CN114758072A - Depth reconstruction method, device and storage medium - Google Patents

Depth reconstruction method, device and storage medium

Info

Publication number
CN114758072A
Authority
CN
China
Prior art keywords
current frame
disparity map
frame
structured light
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210399320.3A
Other languages
Chinese (zh)
Inventor
乔汝坤 (Qiao Rukun)
查红彬 (Zha Hongbin)
姜翰青 (Jiang Hanqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202210399320.3A
Publication of CN114758072A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The application discloses a depth reconstruction method, a depth reconstruction device and a computer-readable storage medium. The method comprises: extracting features of a current frame to obtain a feature matrix of the current frame, where the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern; predicting with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame; and correcting the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame. The method reduces the amount of computation required for depth reconstruction and thereby raises the reconstruction frame rate.

Description

Depth reconstruction method, device and storage medium
Technical Field
The present application relates to the field of machine vision technologies, and in particular, to a depth reconstruction method and apparatus, and a computer-readable storage medium.
Background
Recovering depth information of three-dimensional dynamic scenes is an important problem in computer vision. Three-dimensional information of dynamic scenes is in high demand in applications such as AR/VR, motion analysis and digital healthcare, and many downstream analysis and recognition tasks require highly accurate, dense three-dimensional information in real time as the basis for their algorithms. Structured light is one implementation of monocular stereo vision. A structured light system typically includes a light source (also referred to as the projection light source) and a single camera. The light source projects a designed structured light pattern; if an object is present in the space, the pattern falls on the object surface and the camera captures an image of the projection. The projected pattern is distorted according to the depth of each part of the object surface (i.e., its distance from the structured light system), and analyzing this distortion yields the position and depth information of the object.
Although adding the projection makes depth recovery more accurate and robust than with a camera alone, meeting real-time online requirements for dynamic scenes remains very difficult: the deformed pattern in every frame must be globally matched and optimized against the original pattern over the whole image, and this amount of computation lowers the reconstruction frame rate and makes real-time operation hard to achieve.
Disclosure of Invention
The application provides a depth reconstruction method, a depth reconstruction device and a computer-readable storage medium, which address the difficulty of reconstructing disparity maps of dynamic scenes in real time in existing structured light depth reconstruction.
To solve the above technical problem, one technical solution adopted by the application is to provide a depth reconstruction method. The method comprises: extracting features of a current frame to obtain a feature matrix of the current frame, where the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern; predicting with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame; and correcting the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame.
Optionally, predicting with the feature matrix of the current frame and the feature matrix of the previous frame to obtain the predicted disparity map of the current frame includes: calculating a pattern flow between the current frame and the previous frame using the feature matrix of the current frame and the feature matrix of the previous frame, where the pattern flow is the optical flow caused by the change of the spatial projection of the structured light pattern between the current frame and the previous frame; and transforming the corrected disparity map of the previous frame with the pattern flow to obtain the predicted disparity map of the current frame.
Optionally, calculating the pattern flow between the current frame and the previous frame using the feature matrix of the current frame and the feature matrix of the previous frame includes: obtaining the pattern flow through iterative computation by a first recurrent update block, where the input of the first recurrent update block includes the feature matrix of the current frame and the feature matrix of the previous frame, and the output of the current iteration is used as the input of the next iteration until the number of iterations reaches a preset value.
Optionally, transforming the corrected disparity map of the previous frame with the pattern flow to obtain the predicted disparity map of the current frame includes: determining, based on the pattern flow, a first difference between the disparity map of the current frame and the disparity map of the previous frame, where the disparity map of the current frame is the disparity map between the current frame and the structured light pattern, and the disparity map of the previous frame is the disparity map between the previous frame and the structured light pattern; and summing the first difference and the corrected disparity map of the previous frame to obtain the predicted disparity map of the current frame.
Optionally, correcting the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern includes: predicting with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a first disparity map of the current frame; obtaining a second disparity map of the current frame from the first disparity map of the current frame and the predicted disparity map of the current frame; and correcting the second disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern.
Optionally, obtaining the second disparity map of the current frame from the first disparity map of the current frame and the predicted disparity map of the current frame includes: fusing the first disparity map of the current frame and the predicted disparity map of the current frame to obtain the second disparity map of the current frame.
Optionally, correcting the second disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern includes: performing iterative computation by a second recurrent update block to obtain the corrected disparity map of the current frame, where the input of the second recurrent update block includes the second disparity map of the current frame, the feature matrix of the current frame and the feature matrix of the structured light pattern, and the output of the current iteration is used as the input of the next iteration until the number of iterations reaches a preset value.
Optionally, the structured light pattern is a pseudo-random binary pattern that is unique along one of the row and column directions and repeats periodically along the other direction.
Optionally, the method further comprises: performing self-supervised training of at least one of a feature extraction module, a first prediction module and a correction module using the photometric error between the current frame and the structured light pattern, where the feature extraction module is configured to extract the features of the current frame, the first prediction module is configured to predict with the feature matrix of the current frame and the feature matrix of the previous frame, and the correction module is configured to correct the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern.
Optionally, the method further comprises: performing self-supervised training using the corrected disparity map of the current frame output by the correction module.
Optionally, the loss functions of the self-supervised training include a pattern flow photometric loss function, a prediction loss function and a disparity loss function; the loss function used to train the feature extraction module is calculated from the pattern flow photometric loss function, the prediction loss function and the disparity loss function; the loss function used to train the first prediction module includes the pattern flow photometric loss function and the prediction loss function; and the loss function used to train the correction module includes the disparity loss function.
To solve the above technical problem, another technical solution adopted by the application is to provide a depth reconstruction device, which comprises: a feature extraction module configured to extract features of a current frame to obtain a feature matrix of the current frame, where the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern; a first prediction module configured to predict with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame; and a correction module configured to correct the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame.
To solve the above technical problem, a further technical solution adopted by the application is to provide a depth reconstruction device comprising a processor and a memory coupled to the processor, where the memory stores program instructions and the processor is configured to execute the program instructions stored in the memory to implement the above method.
To solve the above technical problem, a further technical solution adopted by the application is to provide a computer-readable storage medium storing program instructions which, when executed, implement the above method.
In the above solutions, the feature extraction module extracts the features of the current frame to obtain the feature matrix of the current frame, the current frame being the currently processed image captured of the spatial projection of the preset structured light pattern; the first prediction module predicts with the feature matrix of the current frame and the feature matrix of the previous frame to obtain the predicted disparity map of the current frame; and the correction module corrects the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain the corrected disparity map of the current frame. The feature matrix of the previous frame is introduced into the computation of the disparity map of the current frame, and the temporal information carried by the previous frame effectively reduces the amount of computation required for depth reconstruction, thereby raising the reconstruction frame rate and better meeting the real-time requirement of depth reconstruction of dynamic scenes.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a depth reconstruction method according to the present application;
FIG. 2 is a schematic view of the detailed process of S2 in FIG. 1;
FIG. 3 is a schematic geometric relationship diagram of a mode stream transformation in an embodiment of the depth reconstruction method of the present application;
FIG. 4 is a schematic view of the detailed process of S22 in FIG. 2;
FIG. 5 is a schematic view of the detailed process of S3 in FIG. 1;
FIG. 6 is a schematic diagram of the overall framework of the deep learning network TIDE that uses an embodiment of the depth reconstruction method of the present application;
FIG. 7 is a schematic diagram of a network architecture for TIDE;
FIG. 8 is a schematic structural diagram of an embodiment of the depth reconstruction apparatus of the present application;
FIG. 9 is a schematic structural diagram of another embodiment of the depth reconstruction device of the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Fig. 1 is a schematic flowchart of an embodiment of the depth reconstruction method of the present application. It should be noted that the method is not limited to the order of the flow shown in fig. 1, provided substantially the same result is obtained. As shown in fig. 1, the present embodiment may include:
S1: Extract features of the current frame to obtain a feature matrix of the current frame.
The present embodiment performs depth information estimation on consecutive frames in a single-camera structured light system. The current frame is the currently processed image captured of the spatial projection of the preset structured light pattern.
In a single-camera structured light system, the projection light source can be regarded as a second camera whose imaging plane (e.g., a diffractive optical element) carries the reference pattern, so that a binocular disparity analogous to stereo vision is formed. It is assumed that both the camera and the projection light source are calibrated and rectified. For consecutive frames in a single-camera structured light system, the system takes the camera images as input and provides a disparity map for each frame.
In this system, the preset structured light pattern is P, the current frame I_t is the camera image acquired at time t, and D_t is the output disparity map, i.e., the disparity map between the structured light pattern P and I_t, where t may be a specific timestamp or a frame index in a continuous frame sequence.
The structured light pattern may be a designed two-dimensional pattern or a grating-code pattern.
The structured light pattern may be a pseudo-random binary pattern that is unique along one of the row and column directions and repeats periodically along the other. For example, the speckles may be unique along each row, i.e., the speckle arrangement differs from row to row, while repeating periodically in the vertical (column) direction.
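For illustration, a pattern with these properties can be generated as in the following sketch (a minimal example only; the tile height, speckle density, image size and random seed are illustrative choices, not values specified by this disclosure):

```python
import numpy as np

def make_pattern(width=640, tile_height=16, image_height=480, density=0.25, seed=0):
    """Generate a pseudo-random binary speckle pattern.

    Each row of the base tile is an independent random binary sequence, so the
    pattern is (with high probability) distinctive along the row direction; the
    tile is then repeated vertically, so it is periodic along the column direction.
    """
    rng = np.random.default_rng(seed)
    tile = (rng.random((tile_height, width)) < density).astype(np.uint8)
    reps = int(np.ceil(image_height / tile_height))
    return np.tile(tile, (reps, 1))[:image_height] * 255

pattern = make_pattern()
print(pattern.shape, pattern.dtype)  # (480, 640) uint8
```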
Specifically, Local Contrast Normalization (LCN) may be applied to I_t, and the normalized image may be concatenated with I_t and fed into the feature extraction module. The feature extraction module may include two encoders of identical structure, one extracting image features and the other extracting semantic features; the outputs of the two encoders together form the feature matrix of the current frame.
The same feature extraction module can be used to extract the feature matrix of the structured light pattern.
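A minimal sketch of such a feature extraction module is given below, assuming small convolutional encoders; the channel counts, kernel sizes, strides and the LCN window size are illustrative assumptions, not the architecture actually used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_contrast_normalization(img, ksize=9, eps=1e-4):
    """Normalize each pixel by the mean/std of its local window (LCN)."""
    pad = ksize // 2
    mean = F.avg_pool2d(img, ksize, stride=1, padding=pad)
    var = F.avg_pool2d(img * img, ksize, stride=1, padding=pad) - mean * mean
    return (img - mean) / torch.sqrt(var.clamp_min(0) + eps)

class Encoder(nn.Module):
    """A small convolutional encoder; both encoders share this structure."""
    def __init__(self, in_ch=2, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, stride=1, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class FeatureExtractor(nn.Module):
    """One encoder for matching (image) features, one for semantic/context features."""
    def __init__(self):
        super().__init__()
        self.image_encoder = Encoder()
        self.context_encoder = Encoder()
    def forward(self, frame):  # frame: (B, 1, H, W) grayscale image
        x = torch.cat([frame, local_contrast_normalization(frame)], dim=1)
        return self.image_encoder(x), self.context_encoder(x)

feats_img, feats_ctx = FeatureExtractor()(torch.rand(1, 1, 480, 640))
print(feats_img.shape, feats_ctx.shape)
```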
S2: and predicting by adopting the characteristic matrix of the current frame and the characteristic matrix of the previous frame to obtain a predicted disparity map of the current frame.
In dynamic scenes, camera images acquired at different times may be distorted due to changes in the position and/or depth of the object as the structured light system and/or the object in projection space moves. By calculating these distortions, the disparity map of the current frame can be predicted in combination with the corrected disparity map of the previous frame. Such inter-frame estimation can reduce the amount of calculation compared to the existing single-frame calculation.
Optionally, as shown in fig. 2, S2 specifically includes:
s21: and calculating the mode flow between the current frame and the previous frame by using the feature matrix of the current frame and the feature matrix of the previous frame.
The pattern flow F_t is the optical flow caused by the change of the spatial projection of the structured light pattern between the current frame I_t and the previous frame I_{t-1}; it represents the motion of the projected light. The pattern flow can be computed in a similar way to optical flow, but it differs from conventional optical flow in several respects.
First, optical flow is the motion of physical points, whereas the pattern flow is the displacement of projected rays; it therefore moves only along one-dimensional lines, namely the epipolar lines, rather than over the 2D image.
Second, the pattern flow under structured light is always easier to compute, because the pattern always provides enough features for local matching. The network size and the number of iterations of the pattern flow computation can therefore be reduced significantly.
Finally, although the pattern flow has the above advantages over optical flow, it does not guarantee that the tracked points in the two frames belong to the same physical point. The pattern flow lacks physical correspondence information and cannot be used for motion estimation. It can, however, be used to transform the disparity of the previous frame into the disparity of the current frame.
Specifically, the pattern flow may be obtained through iterative computation by a first recurrent update block, whose input includes the feature matrix of the current frame and the feature matrix of the previous frame; the output of the current iteration is used as the input of the next iteration until the number of iterations reaches a preset value. The first recurrent update block includes a temporal convolutional gated recurrent unit (ConvGRU), a gated recurrent unit in which the fully connected layers are replaced by convolutions.
S22: Transform the corrected disparity map of the previous frame with the pattern flow to obtain the predicted disparity map of the current frame.
The geometric relationship underlying the pattern flow transformation is described below.
As shown in FIG. 3, consider a ray emitted by the light source through the point P(x_p, y) on its imaging plane. At time t the ray is projected onto the spatial point p = (X, Y, Z); as the object and/or the structured light system moves, at time t+1 the projected point moves to p + Δp, where Δp = (ΔX, 0, ΔZ). Note that p and p + Δp are not guaranteed to be the same point of the same object, only points on the same projection ray. Hence p moves only within the epipolar plane, and for a calibrated system the vertical direction need not be considered. Let f be the focal length of the camera, b the baseline between the camera and the projection light source, and (x, y) the coordinates on the camera imaging plane, corresponding to the coordinates in the camera image. From the pinhole model (taking the projection light source at distance b from the camera along the baseline direction):

x = fX/Z,  x_p = f(X + b)/Z    (1)

Differentiating formula (1) gives:

Δx = fΔX/Z − fXΔZ/Z²    (2)

Since the projected point moves only along the projection ray,

ΔX/ΔZ = (X + b)/Z

The pattern flow may be represented as a horizontal vector, F_t(x, y) = (u, 0), where

u = Δx

Equation (2) can therefore be simplified to:

u = fbΔZ/Z²    (3)

On the other hand, let Δd be the difference between the pixel values of D_t(x, y) and D_{t−1}(x, y). From the epipolar geometry the disparity satisfies d = x_p − x = fb/Z, and differentiating d gives:

Δd = −fbΔZ/Z²    (4)

From equations (3) and (4):

u = −Δd    (5)

This means that the pattern flow is the negative of the change between D_t and D_{t−1}. According to this relationship, the corrected disparity map of the previous frame can be transformed into the predicted disparity map of the current frame using the pattern flow.
As shown in fig. 4, S22 may specifically include:
S221: Determine, based on the pattern flow, a first difference between the disparity map of the current frame and the disparity map of the previous frame.
The disparity map D_t of the current frame is the disparity map between the current frame I_t and the structured light pattern P, and the disparity map D_{t−1} of the previous frame is the disparity map between the previous frame I_{t−1} and the structured light pattern P. With the geometric relationship above, the first difference can be computed from the pattern flow.
S222: Sum the first difference and the corrected disparity map of the previous frame to obtain the predicted disparity map of the current frame.
The pattern flow explicitly provides the propagation of disparity between frames, which reduces the corresponding errors caused by object motion and improves the robustness of the depth estimation to scene motion.
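The following sketch illustrates this propagation step, assuming the sign convention stated above (the pattern flow equals the negative change of disparity) and a simple forward carry of values along the flow; the exact warping scheme and sign conventions depend on the calibration and are assumptions of this sketch:

```python
import numpy as np

def propagate_disparity(disp_prev, flow_u):
    """Predict the current disparity map from the previous corrected disparity map
    and the horizontal pattern flow, carrying each value to the pixel the flow
    points at and adding the first difference (here taken as -flow_u)."""
    h, w = disp_prev.shape
    disp_pred = np.zeros_like(disp_prev)
    filled = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow_u).astype(int), 0, w - 1)  # flow is horizontal only
    disp_pred[ys, xt] = disp_prev[ys, xs] - flow_u  # previous disparity plus the first difference
    filled[ys, xt] = True
    # fall back to the previous value where no flow vector landed (collisions overwrite arbitrarily)
    disp_pred[~filled] = disp_prev[~filled]
    return disp_pred

d_prev = np.full((4, 8), 10.0)
u = np.ones((4, 8))  # every projected dot shifts one pixel to the right
print(propagate_disparity(d_prev, u)[0])
```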
S3: and correcting the predicted parallax image of the current frame by adopting the characteristic matrix of the current frame and the characteristic matrix of the structured light mode to obtain a corrected parallax image of the current frame.
After the corrected parallax map of the current frame is obtained, the depth map of the current frame can be reconstructed by using a common depth reconstruction algorithm in combination with the parameters of the structured light system.
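For reference, the disparity-to-depth conversion for a calibrated, rectified camera and projector follows Z = f·b/d; a minimal sketch (the focal length and baseline are calibration parameters of the particular system, and the values below are only illustrative):

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m, min_disp=1e-6):
    """Triangulate depth from disparity: Z = f * b / d (0 where disparity is invalid)."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.zeros_like(disp)
    valid = disp > min_disp
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

print(disparity_to_depth([[64.0]], focal_px=640.0, baseline_m=0.1))  # 1.0 m
```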
As shown in fig. 5, S3 may specifically include:
s31: and predicting by using the characteristic matrix of the current frame and the characteristic matrix of the structured light mode to obtain a first parallax image of the current frame.
S32: and obtaining a second disparity map of the current frame by using the first disparity map of the current frame and the predicted disparity map of the current frame.
In this embodiment, the depth information of the current frame needs to be acquired, but for the first frame, the previous frame does not exist, and at this time, the first disparity map needs to be introduced. The first disparity map is obtained based on the current frame and the structured light mode prediction, and the first disparity map can be obtained by prediction by using semi-global block matching (SGBM), for example, without using information of the previous frame.
The first disparity map of the current frame and the predicted disparity map of the current frame may be fused to obtain the second disparity map of the current frame.
To combine the two disparity maps, a small network can be used to estimate a soft mask. The input of the network comprises the current frame I_t and two rendered images I_t^pred and I_t^init, and the output is a soft mask M_t whose values lie in [0, 1]. Here I_t^pred is the image rendered by back-projecting the projection pattern P into the camera view according to the predicted disparity map D_t^pred at time t, and I_t^init is the image rendered by back-projecting the projection pattern P into the camera view according to the first disparity map D_t^init at time t. The second disparity map at time t is then obtained with the following formula:

D_t^fuse = M_t ⊙ D_t^pred + (1 − M_t) ⊙ D_t^init    (6)

where D_t^fuse is the second disparity map at time t, D_t^pred is the predicted disparity map at time t, D_t^init is the first disparity map at time t, and ⊙ denotes element-wise multiplication.
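A minimal sketch of the soft-mask estimation and fusion is given below; the layer sizes of the mask network, the grayscale single-channel inputs, and the assignment of M_t versus 1 − M_t to the two disparity maps are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MaskNet(nn.Module):
    """Small CNN predicting a per-pixel soft mask in [0, 1] from the current frame
    and the two images rendered from the predicted and initial disparity maps."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, frame, rendered_pred, rendered_init):
        return self.net(torch.cat([frame, rendered_pred, rendered_init], dim=1))

def fuse_disparities(mask, disp_pred, disp_init):
    """Blend the temporally predicted disparity with the single-frame (e.g. SGBM)
    disparity using the soft mask, as in equation (6)."""
    return mask * disp_pred + (1.0 - mask) * disp_init

frame = torch.rand(1, 1, 480, 640)
m = MaskNet()(frame, torch.rand(1, 1, 480, 640), torch.rand(1, 1, 480, 640))
fused = fuse_disparities(m, torch.rand(1, 1, 480, 640) * 64, torch.rand(1, 1, 480, 640) * 64)
print(m.shape, fused.shape)
```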
Alternatively, one of the first disparity map and the predicted disparity map may be selected as the second disparity map of the current frame according to the time t. For example, if the frame corresponding to time t is the first frame, the first disparity map is selected as the second disparity map; otherwise the predicted disparity map is selected.
S33: and correcting the second parallax image of the current frame by adopting the characteristic matrix of the current frame and the characteristic matrix of the structured light mode.
Specifically, the corrected disparity map of the current frame is obtained through iterative calculation of the second cyclic updating block, the input of the second cyclic updating block includes the second disparity map of the current frame, the feature matrix of the current frame and the feature matrix of the structured light mode, and the output result of the second cyclic updating block at this time is used as the input of the next iterative calculation until the iteration number reaches a preset value. The second loop update block has the same structure as the first loop update block.
S4: and performing self-supervision training on at least one of the feature extraction module, the first prediction module and the correction module by utilizing the luminosity errors of the current frame and the structured light mode.
The characteristic extraction module is used for extracting the characteristics of the current frame, the first prediction module is used for predicting by adopting the characteristic matrix of the current frame and the characteristic matrix of the previous frame, and the correction module is used for correcting the predicted parallax image of the current frame by adopting the characteristic matrix of the current frame and the characteristic matrix of the structured light mode.
If training is complete, this step can be omitted.
The self-supervision training can be further carried out by utilizing the corrected disparity map of the current frame output by the correction module.
The loss functions of the self-supervised training include a pattern flow photometric loss L_F, a prediction loss L_C and a disparity loss L_D. Their computation is described below.
Pattern flow photometric loss L_F: for the pattern flow F_t estimated from I_t and I_{t−1}, the associated pixels should ideally correspond to the same rays of the reference pattern, i.e., the structured light pattern, and therefore have the same intensity values. The photometric loss of the pattern flow F_t is thus defined as:

L_F = ρ(I_t, π(I_{t−1}; F_t))    (7)

where π is the warping function that expresses the inter-frame relationship and ρ is the speckle-based photometric loss between two images, defined as:

ρ(I_i, I_j) = Σ_{x,y} Σ_{(u,v)∈N(x,y)} |I_i(u,v) − I_j(u,v)|_C    (8)

where N(x, y) is a local window around (x, y) and |·|_C denotes a smooth census transform.
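The disclosure does not spell out the particular smooth census transform; the sketch below uses one common differentiable variant (the window size and smoothing constant are assumptions) to show how a loss of the form of equation (8) can be evaluated:

```python
import torch
import torch.nn.functional as F

def soft_census(img, window=7, eps=0.1):
    """Differentiable ('soft') census transform: compare every pixel with each
    neighbour in a window and squash the signed difference into (-1, 1)."""
    pad = window // 2
    patches = F.unfold(img, window, padding=pad)      # (B, window*window, H*W)
    b, _, hw = patches.shape
    center = img.reshape(b, 1, hw)
    diff = patches - center
    return diff / torch.sqrt(diff * diff + eps * eps)  # soft sign of each comparison

def census_photometric_loss(img_a, img_b, window=7):
    """rho(I_a, I_b): mean absolute difference between the soft census descriptors
    of the two images, a speckle-friendly photometric distance."""
    ca, cb = soft_census(img_a, window), soft_census(img_b, window)
    return (ca - cb).abs().mean()

a, b = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
print(float(census_photometric_loss(a, b)), float(census_photometric_loss(a, a)))  # second is 0
```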
Prediction loss L_C: the goal of the prediction module is to provide the correction module with as accurate a prediction as possible, so the final corrected disparity map should in turn supervise the prediction module. The prediction loss is the L1 distance between the predicted disparity map and the corrected disparity map:

L_C = || D_t^pred − D_t ||_1    (9)

where D_t denotes the corrected disparity map at time t and D_t^pred the predicted disparity map at time t.
Disparity loss L_D: the photometric loss of the disparity is similar to that of the pattern flow, except that the reference pattern is warped to the current frame rather than the previous frame image being warped. The disparity loss is computed with the fused second disparity map and the final corrected disparity map:

L_D = ρ(I_t, π(P; D_t)) + λ · ρ(I_t, π(P; D_t^fuse))    (10)

where λ is a coefficient whose value can be set to 0.8 according to experiments.
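A minimal sketch of the disparity photometric loss, assuming π warps the reference pattern horizontally by the disparity and using a plain mean absolute difference as a stand-in for the census-based ρ of equation (8); the warping sign and the placement of λ are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def warp_pattern_by_disparity(pattern, disp):
    """pi(P; D): sample the reference pattern at (x - D(x, y), y) to render it into
    the camera view.  The horizontal-shift sign convention is an assumption."""
    b, _, h, w = disp.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.to(disp) - disp[:, 0]                   # shift along the epipolar (row) direction
    ys = ys.to(disp).expand_as(xs)
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(pattern.expand(b, -1, -1, -1), grid, align_corners=True)

def disparity_loss(frame, pattern, disp_refined, disp_fused, lam=0.8):
    """L_D in the form of equation (10): photometric error between the frame and the
    pattern warped by the refined and by the fused disparity maps."""
    rho = lambda a, b: (a - b).abs().mean()         # stand-in for the census-based rho
    return (rho(frame, warp_pattern_by_disparity(pattern, disp_refined))
            + lam * rho(frame, warp_pattern_by_disparity(pattern, disp_fused)))

pattern = torch.rand(1, 1, 480, 640)
frame = torch.rand(1, 1, 480, 640)
disp = torch.full((1, 1, 480, 640), 32.0)
print(float(disparity_loss(frame, pattern, disp, disp)))
```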
The loss function used to train the feature extraction module (which may also be referred to as the overall loss function) is calculated from the pattern flow photometric loss, the prediction loss and the disparity loss. Define (t−T+1 : t) as a time window of size T, within which I_{t−T+1:t} and P are the input and the corrected disparity maps D_{t−T+1:t} are the output. The overall loss is:

L = Σ_{i=t−T+1…t} ( L_D(i) + ω_E · ( L_F(i) + L_C(i) ) )    (11)

where ω_E is a relative weight factor whose value can be set to 0.2 according to experiments.
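A minimal sketch of accumulating the training loss over the time window, assuming the per-frame losses are combined as L_D + ω_E·(L_F + L_C); the grouping of terms around ω_E is an assumption of this sketch:

```python
def total_loss(per_frame_losses, omega_e=0.2):
    """Accumulate the self-supervised loss over a time window of T frames.
    per_frame_losses is a list of (L_F, L_C, L_D) tuples, one per frame."""
    return sum(l_d + omega_e * (l_f + l_c) for l_f, l_c, l_d in per_frame_losses)

# e.g. a window of three frames whose losses have already been computed
print(total_loss([(0.12, 0.30, 0.45), (0.10, 0.28, 0.40), (0.09, 0.25, 0.38)]))
```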
The loss function used to train the first prediction module includes the pattern flow photometric loss L_F and the prediction loss L_C. The loss function used to train the correction module includes the disparity loss L_D.
Through this embodiment, the feature matrix of the previous frame is introduced into the computation of the disparity map of the current frame, and the temporal information carried by the previous frame effectively reduces the amount of computation required for depth reconstruction, thereby raising the reconstruction frame rate and better meeting the real-time requirement of depth reconstruction of dynamic scenes. In addition, the network using the depth reconstruction method of this embodiment requires no additional ground-truth data for training; training is self-supervised, which effectively reduces the training difficulty.
A deep learning network using this embodiment may be referred to as Temporal Iterative Disparity Estimation (TIDE); its overall framework is shown in fig. 6 and its specific network structure in fig. 7. In fig. 6, Encoder denotes the feature extraction module, Predictor the first prediction module and Refiner the correction module.
The upper left part of fig. 7 is the feature extraction module, which includes two encoders of identical structure, one extracting image features and the other extracting semantic features. I_t is concatenated with its normalized image and fed into the feature extraction module to obtain the corresponding image features and the semantic features C_t.
Next, the features of the current frame and the features of the previous frame are input into the first recurrent update block to obtain the pattern flow of the current frame. Specifically, the image features of the current frame and of the previous frame may be used to compute correlation features between the two frames; the correlation features are then stacked with the current pattern flow and the semantic features C_t and input into the ConvGRU. The output of the ConvGRU is passed through convolution and upsampling operations to obtain the updated pattern flow, which is taken as the current pattern flow for the next iteration. This is repeated until the number of iterations reaches a preset value, and the pattern flow produced by the last iteration is the pattern flow of the current frame. The initial value of the current pattern flow may be set to 0 or to another value, such as the pattern flow of the previous frame.
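A minimal sketch of such a recurrent update block is shown below. The ConvGRU follows the standard convolutional GRU formulation, while the channel counts, the number of iterations, and the omission of the correlation-volume construction and of the upsampling head are simplifications and assumptions, not the disclosed architecture:

```python
import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    """Gated recurrent unit whose fully connected layers are replaced by 3x3 convolutions."""
    def __init__(self, hidden_ch, input_ch):
        super().__init__()
        self.convz = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convr = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convq = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

class FlowUpdateBlock(nn.Module):
    """First recurrent update block: iteratively refines the pattern flow from
    correlation features between the two frames and the context features."""
    def __init__(self, hidden_ch=64, corr_ch=49, ctx_ch=64):
        super().__init__()
        self.gru = ConvGRU(hidden_ch, corr_ch + ctx_ch + 1)   # +1 for the current flow estimate
        self.flow_head = nn.Conv2d(hidden_ch, 1, 3, padding=1)

    def forward(self, corr, ctx, flow, hidden, iters=4):
        for _ in range(iters):                                # preset number of iterations
            x = torch.cat([corr, ctx, flow], dim=1)
            hidden = self.gru(hidden, x)
            flow = flow + self.flow_head(hidden)              # residual update of the 1-D pattern flow
        return flow, hidden

b, h8, w8 = 1, 60, 80
block = FlowUpdateBlock()
flow, _ = block(torch.rand(b, 49, h8, w8), torch.rand(b, 64, h8, w8),
                torch.zeros(b, 1, h8, w8), torch.zeros(b, 64, h8, w8))
print(flow.shape)
```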
Based on the pattern flow of the current frame, the first difference between the disparity map of the current frame and the disparity map of the previous frame can be determined; the first difference is then summed with the corrected disparity map of the previous frame to obtain the predicted disparity map D_t^pred of the current frame.
The upper right part of fig. 7 is the small network that estimates the soft mask. Using the soft mask, the predicted disparity map D_t^pred and the first disparity map D_t^init obtained by SGBM are fused into the second disparity map D_t^fuse. Then D_t^fuse, the features of the current frame and the features of the structured light pattern are input into the second recurrent update block to obtain the corrected disparity map of the current frame. Specifically, the image features of the current frame and the image features of the structured light pattern (obtained by feeding the structured light pattern into the feature extraction module) may be used to compute correlation features between the current frame and the structured light pattern; the correlation features are then stacked with the current disparity map and the semantic features C_t and input into the ConvGRU. The output of the ConvGRU is passed through convolution and upsampling operations to obtain the updated disparity map, which is taken as the current disparity map for the next iteration. This is repeated until the number of iterations reaches a preset value, and the disparity map produced by the last iteration is the corrected disparity map of the current frame. The initial value of the current disparity map is the second disparity map D_t^fuse.
Fig. 8 is a schematic structural diagram of an embodiment of the depth reconstruction apparatus of the present application. As shown in fig. 8, the depth reconstruction apparatus includes a feature extraction module 11, a first prediction module 12 and a correction module 13.
The feature extraction module 11 is configured to extract features of a current frame to obtain a feature matrix of the current frame, where the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern.
The first prediction module 12 is configured to predict with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame.
The correction module 13 is configured to correct the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame.
For the functions and possible extensions of the modules in this embodiment, reference may be made to the relevant content of the embodiments of the depth reconstruction method of this application.
Fig. 9 is a schematic structural diagram of another embodiment of the depth reconstruction device according to the present application. As shown in fig. 9, the depth reconstruction apparatus includes a processor 21 and a memory 22 coupled to the processor 21.
Wherein the memory 22 stores program instructions for implementing the method of any of the above embodiments; processor 21 is operative to execute program instructions stored by memory 22 to implement the steps of the above-described method embodiments. The processor 21 may also be referred to as a CPU (Central Processing Unit). The processor 21 may be an integrated circuit chip having signal processing capabilities. The processor 21 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application. As shown in fig. 10, the computer readable storage medium 30 of the embodiment of the present application stores program instructions 31, and the program instructions 31 implement the method provided by the above-mentioned embodiment of the present application when executed. The program instructions 31 may form a program file stored in the computer-readable storage medium 30 in the form of a software product, so as to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned computer-readable storage medium 30 includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit. The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims (14)

1. A depth reconstruction method, comprising:
extracting features of a current frame to obtain a feature matrix of the current frame, wherein the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern;
predicting with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame; and
correcting the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame.
2. The method of claim 1, wherein obtaining the predicted disparity map of the current frame by predicting with the feature matrix of the current frame and the feature matrix of the previous frame comprises:
calculating a pattern flow between the current frame and the previous frame using the feature matrix of the current frame and the feature matrix of the previous frame, wherein the pattern flow is the optical flow caused by the change of the spatial projection of the structured light pattern between the current frame and the previous frame; and
transforming the corrected disparity map of the previous frame with the pattern flow to obtain the predicted disparity map of the current frame.
3. The method of claim 2, wherein calculating the pattern flow between the current frame and the previous frame using the feature matrix of the current frame and the feature matrix of the previous frame comprises:
obtaining the pattern flow through iterative computation by a first recurrent update block, wherein the input of the first recurrent update block comprises the feature matrix of the current frame and the feature matrix of the previous frame, and the output of the current iteration is used as the input of the next iteration until the number of iterations reaches a preset value.
4. The method of claim 2, wherein transforming the corrected disparity map of the previous frame with the pattern flow to obtain the predicted disparity map of the current frame comprises:
determining, based on the pattern flow, a first difference between the disparity map of the current frame and the disparity map of the previous frame, wherein the disparity map of the current frame is the disparity map between the current frame and the structured light pattern, and the disparity map of the previous frame is the disparity map between the previous frame and the structured light pattern; and
summing the first difference and the corrected disparity map of the previous frame to obtain the predicted disparity map of the current frame.
5. The method of claim 1, wherein correcting the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern comprises:
predicting with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a first disparity map of the current frame;
obtaining a second disparity map of the current frame from the first disparity map of the current frame and the predicted disparity map of the current frame; and
correcting the second disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern.
6. The method of claim 5, wherein obtaining the second disparity map of the current frame from the first disparity map of the current frame and the predicted disparity map of the current frame comprises:
fusing the first disparity map of the current frame and the predicted disparity map of the current frame to obtain the second disparity map of the current frame.
7. The method of claim 5, wherein correcting the second disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern comprises:
obtaining the corrected disparity map of the current frame through iterative computation by a second recurrent update block, wherein the input of the second recurrent update block comprises the second disparity map of the current frame, the feature matrix of the current frame and the feature matrix of the structured light pattern, and the output of the current iteration is used as the input of the next iteration until the number of iterations reaches a preset value.
8. The method of any one of claims 1 to 7, wherein the structured light pattern is a pseudo-random binary pattern that is unique along one of the row and column directions and repeats periodically along the other direction.
9. The method of any one of claims 1 to 7, further comprising:
performing self-supervised training of at least one of a feature extraction module, a first prediction module and a correction module using the photometric error between the current frame and the structured light pattern, wherein the feature extraction module is configured to extract the features of the current frame, the first prediction module is configured to predict with the feature matrix of the current frame and the feature matrix of the previous frame, and the correction module is configured to correct the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern.
10. The method of claim 9, further comprising:
performing self-supervised training using the corrected disparity map of the current frame output by the correction module.
11. The method of claim 10, wherein the loss function used to train the feature extraction module is calculated from a pattern flow photometric loss function, a prediction loss function and a disparity loss function; the loss function used to train the first prediction module comprises the pattern flow photometric loss function and the prediction loss function; and the loss function used to train the correction module comprises the disparity loss function.
12. A depth reconstruction apparatus, comprising:
a feature extraction module configured to extract features of a current frame to obtain a feature matrix of the current frame, wherein the current frame is the currently processed image captured of the spatial projection of a preset structured light pattern;
a first prediction module configured to predict with the feature matrix of the current frame and the feature matrix of the previous frame to obtain a predicted disparity map of the current frame; and
a correction module configured to correct the predicted disparity map of the current frame with the feature matrix of the current frame and the feature matrix of the structured light pattern to obtain a corrected disparity map of the current frame.
13. A depth reconstruction apparatus comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions; and
the processor is configured to execute the program instructions stored in the memory to implement the method of any one of claims 1 to 11.
14. A computer-readable storage medium, wherein the storage medium stores program instructions that, when executed, implement the method of any one of claims 1 to 11.
CN202210399320.3A 2022-04-15 2022-04-15 Depth reconstruction method, device and storage medium Pending CN114758072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210399320.3A CN114758072A (en) 2022-04-15 2022-04-15 Depth reconstruction method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210399320.3A CN114758072A (en) 2022-04-15 2022-04-15 Depth reconstruction method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114758072A true CN114758072A (en) 2022-07-15

Family

ID=82331801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210399320.3A Pending CN114758072A (en) 2022-04-15 2022-04-15 Depth reconstruction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114758072A (en)


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination