CN115830240A - Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle - Google Patents

Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle

Info

Publication number: CN115830240A
Application number: CN202211618155.2A
Authority: CN (China)
Prior art keywords: focus, relu, conv, image, formula
Prior art date: 2022-12-14
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 闫涛, 盖彦辛
Current Assignee: Shanxi University
Original Assignee: Shanxi University
Filing date: 2022-12-14
Publication date: 2023-03-21
Application filed by Shanxi University
Priority to CN202211618155.2A
Publication of CN115830240A

Abstract

The invention discloses an unsupervised deep learning three-dimensional shape reconstruction method based on an image fusion visual angle. The method comprises the following steps: first, a focal stack of images and the corresponding focus positions are acquired; second, a focus area detection module and a down-sampling focus area detection module are applied iteratively to obtain focus volumes at different scales; the multi-scale focus volumes are then passed through a four-layer hourglass network to output an attention map, from which a predicted depth map and a full-focus image of the scene are obtained; finally, the predicted depth map and the full-focus image are passed through a guided filtering function to obtain the final three-dimensional shape reconstruction result of the scene. The method addresses scene three-dimensional shape reconstruction from an unsupervised viewpoint and can effectively alleviate the difficulty of obtaining real depth annotations during three-dimensional shape reconstruction.

Description

Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an unsupervised deep learning three-dimensional reconstruction method based on an image fusion visual angle.
Background
Vision-based three-dimensional reconstruction is fast, offers good real-time performance, and lends itself to visual analysis; it is widely used for autonomous navigation in robotics, obstacle recognition in computer vision, three-dimensional modeling in architecture, cultural-relic restoration in archaeology, and other fields. The shared requirements of these fields therefore push three-dimensional reconstruction technology toward solutions that are both easy to realize and highly accurate.
In computer vision, additional depth cues are the key to recovering the three-dimensional structure of a scene from two-dimensional images. Traditional three-dimensional reconstruction methods start from depth cues such as defocus, shading, and shape, and often recover the depth of each pixel from the focus position of maximum sharpness. For example, the shape-from-focus (focus-based topography reconstruction) method estimates the depth map of a scene by using the change of focus information across a multi-depth-of-field image sequence as a cue, and is a typical passive optical method. Compared with other methods, shape-from-focus does not rely on high-precision depth-sensing equipment, and scene texture information is effectively preserved during reconstruction. However, the focus measure operator is affected by noise level, contrast, scene texture, and other factors, so the focus volume contains erroneous focus values, which in turn degrades the accuracy of the depth map. Furthermore, computing depth from the sharpness of each pixel is time-consuming and performs poorly on texture-less objects.
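As context for the discussion above, the following is a minimal sketch of the traditional shape-from-focus baseline: a per-pixel focus measure is evaluated over the stack and the focus position of maximum sharpness is taken as the depth. The use of Laplacian energy as the focus measure and the array layout are illustrative assumptions, not details taken from the patent.

```python
import numpy as np
from scipy import ndimage


def shape_from_focus(stack: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Traditional shape-from-focus baseline (illustrative assumptions only).

    stack:     grayscale focal stack, shape (N, H, W)
    positions: focus position of each slice, shape (N,)
    """
    # Per-slice focus measure: squared Laplacian response as a sharpness proxy.
    fm = np.stack([ndimage.laplace(s.astype(np.float32)) ** 2 for s in stack])
    idx = fm.argmax(axis=0)   # index of the sharpest slice at each pixel
    return positions[idx]     # depth = focus position of that slice
```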
Neural networks effectively extract the semantic information of an image and associate pixel information through convolution, so introducing deep learning into depth estimation to predict depth from a focal stack can effectively overcome the shortcomings of traditional methods. For example, because scene depth is strongly correlated with the amount of defocus blur, deep learning methods that recover depth from defocus information can regress depth values directly and obtain more accurate depths than conventional methods. However, such deep learning models require large datasets with ground-truth depth, and ground truth is difficult to obtain in practice, which makes it hard for these models to be widely applied to multi-focus three-dimensional topography reconstruction.
From the above analysis of the current state of research, we consider that existing methods have the following shortcomings: depth-acquisition devices usually require specialized hardware, such as structured-light projectors and confocal laser emitters; the traditional feature-evaluation approach of passive reconstruction needs prior-knowledge intervention and therefore lacks scene applicability and robustness; deep learning helps to overcome these problems, but a typical deep learning model requires depth annotations of real scenes and is difficult to apply in practice. Therefore, how to achieve domain adaptation and effectively exploit defocus information while performing three-dimensional reconstruction without real-scene depth maps is an important problem.
Accordingly, acquiring depth information during the image fusion process to realize unsupervised three-dimensional shape reconstruction effectively alleviates the difficulty of depth annotation in real-scene three-dimensional shape reconstruction.
Disclosure of Invention
In order to overcome the defects of existing solutions, the invention aims to provide an unsupervised deep learning three-dimensional reconstruction method based on an image fusion visual angle, which comprises the following steps:
step 1, a focus stack FS ∈ R^(H×W×N×C) and the corresponding focus positions P ∈ R^(H×W×N×C) are given, where H and W denote the height and width of the focal slices, N is the number of focal slices, C is the number of channels, and R denotes the real number domain;
step 2, the focus stack FS ∈ R^(H×W×N×C) from step 1 is passed through the focus area detection module of equations (1) to (3) to obtain the focus volume FV_1 ∈ R^(H×W×N×C):
F_1 = dilated(FS)    (1)
F_2 = RELU(ResNet(F_1) + FS)    (2)
FV_1 = RELU(conv(RELU(conv(F_2)))) + F_2    (3)
where dilated() denotes the dilated convolution, F_1 is the initial feature, ResNet() denotes the residual module, RELU() denotes the activation function, F_2 is the semantic feature, and conv() denotes a 3D convolution module;
step 3, the focus volume FV_1 obtained in step 2 is passed through the down-sampling focus detection module of equation (4) to output an effective down-sampled feature F_3, and feature extraction is performed on F_3 according to equations (5) and (6) to obtain the second-scale focus volume FV_2:
F_3 = RELU(stride_conv(FV_1) + conv(Maxpooling(FV_1)))    (4)
F_4 = RELU(ResNet(F_3) + FS)    (5)
FV_2 = RELU(conv(RELU(conv(F_4)))) + F_4    (6)
where stride_conv() denotes a strided convolution, Maxpooling() denotes a 3D max-pooling operation, and F_4 is the semantic feature;
step 4, the focus volume FV_2 obtained in step 3 is input to the down-sampling focus detection module of equation (7) to obtain the down-sampled output feature F_5, and the third-scale focus volume FV_3 is then obtained from F_5 according to equations (8) and (9):
F_5 = RELU(stride_conv(FV_2) + conv(Maxpooling(FV_2)))    (7)
F_6 = RELU(ResNet(F_5) + FS)    (8)
FV_3 = RELU(conv(RELU(conv(F_6)))) + F_6    (9)
where F_6 is the semantic feature;
step 5, the focus volumes FV_1, FV_2 and FV_3 obtained in steps 2, 3 and 4 are input to a four-layer hourglass network according to equation (10) to combine and refine the features of different sizes, and the intermediate attention M ∈ R^(H×W×N) of the maximum-sharpness probability at each focus position is output:
M = hourglass(FV_1, FV_2, FV_3)    (10)
Step 6, normalizing the intermediate attention M obtained in the step 5 according to the formula (11) to obtain the depth map attention M depth And a predicted depth map D is obtained by performing dot multiplication on the focal position P according to the equation (12),
Figure BDA0003998912150000031
Figure BDA0003998912150000032
where F denotes the number of pictures of the focal stack, M i,j,t Representing the intermediate attention value at pixel point (i, j) in the t-th image in the focal stack,
Figure BDA0003998912150000033
representing the attention value of the depth map at the pixel point (i, j) in the t-th slice, wherein the value range of the pixel point (i, j) is that i is more than or equal to 1 and less than or equal to H, j is more than or equal to 1 and less than or equal to W, t is a stack subscript, the range of t is more than or equal to 1 and less than or equal to N, and D i,j Representing depth information of a pixel point (i, j) in a depth map, exp () representing an exponential function, ln () representing a logarithmic function;
step 7, the intermediate attention M obtained in step 5 is normalized according to equation (13) to obtain the all-in-focus attention M^AiF, which is dot-multiplied with the focus stack FS according to equation (14) to obtain the full-focus image I (equations (13) and (14) are likewise reproduced only as images in the original publication), where M^AiF_(i,j,t) denotes the attention value of the full-focus image at pixel (i, j) of the t-th slice, and I_(i,j) denotes the gray value of pixel (i, j) in the full-focus image;
step 8, the depth map D obtained in step 6 and the full-focus image I obtained in step 7 are passed through the guided filtering function of equation (15) to obtain the final three-dimensional reconstruction result D_depth of the scene:
D_depth = GT(I, D)    (15)
where GT() denotes the guided filtering function.
Compared with the prior art, the invention has the following advantages:
(1) The proposed three-dimensional reconstruction method fully exploits the relationship between depth estimation and full-focus image estimation, and realizes unsupervised estimation of scene depth information;
(2) The proposed three-dimensional reconstruction method has good scene universality: depth estimation is achieved by extracting focus information that remains invariant during full-focus image estimation, so the method generalizes well across scenes.
Drawings
FIG. 1 is a flow chart of an unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle according to the present invention;
FIG. 2 is a schematic diagram of the unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle according to the present invention;
FIG. 3 is a schematic diagram of the focus area detection module of the unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle according to the present invention;
FIG. 4 is a schematic diagram of the down-sampling focus detection module of the unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle according to the present invention;
FIG. 5 is a schematic diagram of the four-layer hourglass network of the unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle according to the present invention.
Detailed Description
As shown in FIG. 1 and FIG. 2, an unsupervised deep learning three-dimensional topography reconstruction method based on an image fusion view angle comprises the following steps:
step 1, a focus stack FS ∈ R^(H×W×N×C) and the corresponding focus positions P ∈ R^(H×W×N×C) are given, where H and W denote the height and width of the focal slices, N is the number of focal slices, C is the number of channels, and R denotes the real number domain;
step 2, the focus stack FS ∈ R^(H×W×N×C) from step 1 is passed through the focus area detection module of equations (1) to (3), shown in FIG. 3, to obtain the focus volume FV_1 ∈ R^(H×W×N×C):
F_1 = dilated(FS)    (1)
F_2 = RELU(ResNet(F_1) + FS)    (2)
FV_1 = RELU(conv(RELU(conv(F_2)))) + F_2    (3)
where dilated() denotes the dilated convolution, F_1 is the initial feature, ResNet() denotes the residual module, RELU() denotes the activation function, F_2 is the semantic feature, and conv() denotes a 3D convolution module;
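To make the data flow of this module concrete, the following is a minimal PyTorch-style sketch of equations (1) to (3). The channel width, kernel sizes, dilation rate, and the depth of the residual block are assumptions not fixed by the text; the tensor layout (B, C, N, H, W) follows PyTorch's 3D-convolution convention.

```python
import torch
import torch.nn as nn


class FocusAreaDetection(nn.Module):
    """Sketch of the focus area detection module, eqs. (1)-(3).
    Kernel sizes, dilation rate and the residual-block depth are assumptions."""

    def __init__(self, channels: int = 8):
        super().__init__()
        # eq. (1): dilated 3D convolution producing the initial feature F1
        self.dilated = nn.Conv3d(channels, channels, kernel_size=3,
                                 padding=2, dilation=2)
        # eq. (2): a small residual block standing in for ResNet(.)
        self.resnet = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1))
        # eq. (3): two stacked 3D convolutions with ReLU activations
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, fs: torch.Tensor) -> torch.Tensor:
        # fs: focus stack of shape (B, C, N, H, W)
        f1 = self.dilated(fs)                                         # eq. (1)
        f2 = self.relu(self.resnet(f1) + fs)                          # eq. (2)
        fv1 = self.relu(self.conv2(self.relu(self.conv1(f2)))) + f2   # eq. (3)
        return fv1
```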
step 3, the focus volume FV_1 obtained in step 2 is passed through the down-sampling focus detection module of equation (4) (shown in FIG. 4) to output an effective down-sampled feature F_3, and feature extraction is performed on F_3 according to equations (5) and (6) to obtain the second-scale focus volume FV_2:
F_3 = RELU(stride_conv(FV_1) + conv(Maxpooling(FV_1)))    (4)
F_4 = RELU(ResNet(F_3) + FS)    (5)
FV_2 = RELU(conv(RELU(conv(F_4)))) + F_4    (6)
where stride_conv() denotes a strided convolution, Maxpooling() denotes a 3D max-pooling operation, and F_4 is the semantic feature;
step 4, the focus volume FV_2 obtained in step 3 is input to the down-sampling focus detection module of equation (7) (shown in FIG. 4) to obtain the down-sampled output feature F_5, and the third-scale focus volume FV_3 is then obtained from F_5 according to equations (8) and (9):
F_5 = RELU(stride_conv(FV_2) + conv(Maxpooling(FV_2)))    (7)
F_6 = RELU(ResNet(F_5) + FS)    (8)
FV_3 = RELU(conv(RELU(conv(F_6)))) + F_6    (9)
where F_6 is the semantic feature;
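The down-sampling focus detection module shared by steps 3 and 4 (equations (4) and (7)) can be sketched as follows: a strided 3D convolution and a max-pooling-plus-convolution branch are summed and passed through a ReLU. Halving only the spatial dimensions while keeping the stack dimension N, as well as the kernel sizes, are assumptions, since FIG. 4 is not reproduced here.

```python
import torch
import torch.nn as nn


class DownsampleFocusDetection(nn.Module):
    """Sketch of eq. (4) / eq. (7); spatial halving and kernel sizes are assumptions."""

    def __init__(self, channels: int = 8):
        super().__init__()
        # strided-convolution branch: halves H and W, keeps the stack dimension N
        self.stride_conv = nn.Conv3d(channels, channels, kernel_size=3,
                                     stride=(1, 2, 2), padding=1)
        # max-pooling branch followed by an ordinary 3D convolution
        self.pool = nn.MaxPool3d(kernel_size=(1, 2, 2))
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, fv: torch.Tensor) -> torch.Tensor:
        # fv: focus volume of shape (B, C, N, H, W) with even H and W
        return self.relu(self.stride_conv(fv) + self.conv(self.pool(fv)))
```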
step 5, the focus volumes FV_1, FV_2 and FV_3 obtained in steps 2, 3 and 4 are input to a four-layer hourglass network (shown in FIG. 5) according to equation (10) to combine and refine the features of different sizes, and the intermediate attention M ∈ R^(H×W×N) of the maximum-sharpness probability at each focus position is output:
M = hourglass(FV_1, FV_2, FV_3)    (10)
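FIG. 5 is not reproduced in this text, so the sketch below is only a schematic stand-in for the four-layer hourglass of equation (10). It shows one plausible way to fuse the three multi-scale focus volumes in an encoder-decoder with skip connections and to output an attention map M of shape (B, N, H, W); the fusion-by-summation scheme, layer count, and layer widths are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HourglassAttention(nn.Module):
    """Schematic stand-in for the four-layer hourglass of eq. (10).
    Only the interface (three multi-scale volumes in, attention M out) follows
    the description; the internal structure is an assumption."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.enc1 = nn.Conv3d(channels, channels, 3, stride=(1, 2, 2), padding=1)
        self.enc2 = nn.Conv3d(channels, channels, 3, stride=(1, 2, 2), padding=1)
        self.dec2 = nn.ConvTranspose3d(channels, channels, 3, stride=(1, 2, 2),
                                       padding=1, output_padding=(0, 1, 1))
        self.dec1 = nn.ConvTranspose3d(channels, channels, 3, stride=(1, 2, 2),
                                       padding=1, output_padding=(0, 1, 1))
        self.head = nn.Conv3d(channels, 1, 3, padding=1)

    def forward(self, fv1, fv2, fv3):
        # fv1: (B, C, N, H, W); fv2: (B, C, N, H/2, W/2); fv3: (B, C, N, H/4, W/4)
        x1 = F.relu(self.enc1(fv1) + fv2)   # encoder level 1, fused with FV_2
        x2 = F.relu(self.enc2(x1) + fv3)    # encoder level 2, fused with FV_3
        y1 = F.relu(self.dec2(x2) + x1)     # decoder level 2 with skip connection
        y0 = F.relu(self.dec1(y1) + fv1)    # decoder level 1 with skip connection
        return self.head(y0).squeeze(1)     # intermediate attention M: (B, N, H, W)
```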
Step 6, normalizing the intermediate attention M obtained in the step 5 according to the formula (11) to obtain the depth map attention M depth And a predicted depth map D is obtained by performing dot multiplication on the focal position P according to the equation (12),
Figure BDA0003998912150000053
Figure BDA0003998912150000054
where F denotes the number of pictures of the focal stack, M i,j,t Representing the intermediate attention value at pixel point (i, j) in the t-th image in the focal stack,
Figure BDA0003998912150000055
representing the attention value of the depth map at the pixel point (i, j) in the t-th slice, wherein the value range of the pixel point (i, j) is that i is more than or equal to 1 and less than or equal to H, j is more than or equal to 1 and less than or equal to W, t is a stack subscript, the range of t is more than or equal to 1 and less than or equal to N, and D i,j Representing depth information of a pixel point (i, j) in a depth map, exp () representing an exponentFunction, ln () represents a logarithmic function;
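Since equations (11) and (12) are only given as images, the following sketch shows one plausible realisation consistent with the surrounding text: an exp-based (softmax) normalisation over the stack dimension followed by an attention-weighted sum of the focus positions. The exact normalisation used in the patent may differ, since the text also mentions ln().

```python
import torch
import torch.nn.functional as F


def regress_depth(m: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """m: intermediate attention of shape (B, N, H, W);
    p: focus positions of shape (B, N, H, W).
    The softmax normalisation is an assumption standing in for eq. (11)."""
    m_depth = F.softmax(m, dim=1)     # normalise over the stack dimension
    return (m_depth * p).sum(dim=1)   # weighted sum of focus positions -> depth map D
```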
step 7, the intermediate attention M obtained in step 5 is normalized according to equation (13) to obtain the all-in-focus attention M^AiF, which is dot-multiplied with the focus stack FS according to equation (14) to obtain the full-focus image I (equations (13) and (14) are likewise reproduced only as images in the original publication), where M^AiF_(i,j,t) denotes the attention value of the full-focus image at pixel (i, j) of the t-th slice, and I_(i,j) denotes the gray value of pixel (i, j) in the full-focus image;
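Analogously, a plausible sketch of step 7 (equations (13) and (14), also given only as images) fuses the focal slices with the normalised attention; the softmax normalisation is again an assumption.

```python
import torch
import torch.nn.functional as F


def fuse_all_in_focus(m: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    """m: intermediate attention of shape (B, N, H, W);
    fs: focal stack of shape (B, N, H, W), one channel shown for brevity."""
    m_aif = F.softmax(m, dim=1)       # normalise over the stack dimension
    return (m_aif * fs).sum(dim=1)    # attention-weighted fusion -> full-focus image I
```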
step 8, the depth map D obtained in step 6 and the full-focus image I obtained in step 7 are passed through the guided filtering function of equation (15) to obtain the final three-dimensional reconstruction result D_depth of the scene:
D_depth = GT(I, D)    (15)
where GT() denotes the guided filtering function.
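Equation (15) applies a standard guided filter with the full-focus image as the guide. A minimal sketch using OpenCV's contrib module is shown below; the radius and regularisation eps values are assumptions, and cv2.ximgproc requires the opencv-contrib-python package.

```python
import cv2
import numpy as np


def refine_depth(aif: np.ndarray, depth: np.ndarray,
                 radius: int = 8, eps: float = 1e-3) -> np.ndarray:
    """Sketch of step 8, eq. (15): smooth the predicted depth map D with a
    guided filter that uses the all-in-focus image I as guide.
    radius and eps are illustrative assumptions, not values from the patent."""
    guide = aif.astype(np.float32)
    src = depth.astype(np.float32)
    return cv2.ximgproc.guidedFilter(guide, src, radius, eps)
```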

Claims (1)

1. An unsupervised deep learning three-dimensional shape reconstruction method based on an image fusion visual angle, characterized by comprising the following steps:
step 1, a focus stack FS ∈ R^(H×W×N×C) and the corresponding focus positions P ∈ R^(H×W×N×C) are given, where H and W denote the height and width of the focal slices, N is the number of focal slices, C is the number of channels, and R denotes the real number domain;
step 2, the focus stack FS ∈ R^(H×W×N×C) from step 1 is passed through the focus area detection module of equations (1) to (3) to obtain the focus volume FV_1 ∈ R^(H×W×N×C):
F_1 = dilated(FS)    (1)
F_2 = RELU(ResNet(F_1) + FS)    (2)
FV_1 = RELU(conv(RELU(conv(F_2)))) + F_2    (3)
where dilated() denotes the dilated convolution, F_1 is the initial feature, ResNet() denotes the residual module, RELU() denotes the activation function, F_2 is the semantic feature, and conv() denotes a 3D convolution module;
step 3, the focus volume FV_1 obtained in step 2 is passed through the down-sampling focus detection module of equation (4) to output an effective down-sampled feature F_3, and feature extraction is performed on F_3 according to equations (5) and (6) to obtain the second-scale focus volume FV_2:
F_3 = RELU(stride_conv(FV_1) + conv(Maxpooling(FV_1)))    (4)
F_4 = RELU(ResNet(F_3) + FS)    (5)
FV_2 = RELU(conv(RELU(conv(F_4)))) + F_4    (6)
where stride_conv() denotes a strided convolution, Maxpooling() denotes a 3D max-pooling operation, and F_4 is the semantic feature;
step 4, the focus volume FV_2 obtained in step 3 is input to the down-sampling focus detection module of equation (7) to obtain the down-sampled output feature F_5, and the third-scale focus volume FV_3 is then obtained from F_5 according to equations (8) and (9):
F_5 = RELU(stride_conv(FV_2) + conv(Maxpooling(FV_2)))    (7)
F_6 = RELU(ResNet(F_5) + FS)    (8)
FV_3 = RELU(conv(RELU(conv(F_6)))) + F_6    (9)
where F_6 is the semantic feature;
step 5, the focus volumes FV_1, FV_2 and FV_3 obtained in steps 2, 3 and 4 are input to a four-layer hourglass network according to equation (10) to combine and refine the features of different sizes, and the intermediate attention M ∈ R^(H×W×N) of the maximum-sharpness probability at each focus position is output:
M = hourglass(FV_1, FV_2, FV_3)    (10)
Step 6, normalizing the intermediate attention M obtained in the step 5 according to the formula (11) to obtain the depth map attention M depth And a predicted depth map D is obtained by performing dot multiplication on the focal position P according to the equation (12),
Figure FDA0003998912140000021
Figure FDA0003998912140000022
where F denotes the number of pictures of the focal stack, M i,j,t Representing the intermediate attention value at pixel point (i, j) in the t-th image in the focal stack,
Figure FDA0003998912140000023
representing the attention value of the depth map at the pixel point (i, j) in the t-th slice, wherein the value range of the pixel point (i, j) is that i is more than or equal to 1 and less than or equal to H, j is more than or equal to 1 and less than or equal to W, t is a stack subscript, the range of t is more than or equal to 1 and less than or equal to N, and D i,j Representing depth information of a pixel point (i, j) in the depth map, exp () representing an exponential function, ln () representing a logarithmic function;
step 7, the intermediate attention M obtained in step 5 is normalized according to equation (13) to obtain the all-in-focus attention M^AiF, which is dot-multiplied with the focus stack FS according to equation (14) to obtain the full-focus image I, where M^AiF_(i,j,t) denotes the attention value of the full-focus image at pixel (i, j) of the t-th slice, and I_(i,j) denotes the gray value of pixel (i, j) in the full-focus image;
step 8, the depth map D obtained in step 6 and the full-focus image I obtained in step 7 are passed through the guided filtering function of equation (15) to obtain the final three-dimensional reconstruction result D_depth of the scene:
D_depth = GT(I, D)    (15)
where GT() denotes the guided filtering function.
CN202211618155.2A (priority date: 2022-12-14; filing date: 2022-12-14) Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle, Pending, published as CN115830240A (en)

Priority Applications (1)

Application Number: CN202211618155.2A (published as CN115830240A (en))
Priority Date: 2022-12-14
Filing Date: 2022-12-14
Title: Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle


Publications (1)

Publication Number: CN115830240A
Publication Date: 2023-03-21

Family

ID=85545876

Family Applications (1)

Application Number: CN202211618155.2A (Pending, published as CN115830240A (en))
Priority Date: 2022-12-14
Filing Date: 2022-12-14
Title: Unsupervised deep learning three-dimensional reconstruction method based on image fusion visual angle

Country Status (1)

Country Link
CN (1) CN115830240A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823914A (en) * 2023-08-30 2023-09-29 中国科学技术大学 Unsupervised focal stack depth estimation method based on all-focusing image synthesis
CN116823914B (en) * 2023-08-30 2024-01-09 中国科学技术大学 Unsupervised focal stack depth estimation method based on all-focusing image synthesis
CN117274788A (en) * 2023-10-07 2023-12-22 南开大学 Sonar image target positioning method, system, electronic equipment and storage medium
CN117274788B (en) * 2023-10-07 2024-04-30 南开大学 Sonar image target positioning method, system, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination