CN116091706B - Three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching


Info

Publication number
CN116091706B
CN116091706B
Authority
CN
China
Prior art keywords
matching
image
descriptor
mode
distance
Prior art date
Legal status
Active
Application number
CN202310363863.4A
Other languages
Chinese (zh)
Other versions
CN116091706A
Inventor
姚国标
张力
艾海滨
张进
任晓芳
傅青青
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date
Filing date
Publication date
Application filed by Shandong Jianzhu University
Priority to CN202310363863.4A
Publication of CN116091706A
Application granted
Publication of CN116091706B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

In the current real-scene three-dimensional model acquisition process, the discrimination and extraction of terrain elements and their point, line and surface features among large numbers of multi-mode images still depend on manual work.

Description

Three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching
Technical Field
The invention relates to the fields of new-generation information technology, digital photogrammetry, computer vision and artificial intelligence, and their intersection, and in particular to a three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching.
Background
Multi-mode remote sensing data of the same area, such as optical images, infrared images and synthetic aperture radar (Synthetic Aperture Radar, SAR) images, can provide complementary texture, geometry, spectrum and radiation information for real-scene three-dimensional reconstruction once processing technologies such as homonymous feature extraction, image fusion and analysis are applied. However, in the current real-scene three-dimensional model acquisition process, the discrimination and extraction of terrain elements and their point, line and surface features among large numbers of multi-mode images still depend on manual work, which consumes enormous manpower and material resources, and measurement accuracy is limited by the skill of the operators. Specifically, the prior art has the following problems:
(1) Because multi-mode images such as optical and SAR images are derived from different types of sensors, their imaging light sources and imaging mechanisms differ intrinsically. As a result, the homonymous features of multi-mode images tend to have a low repetition rate, small numbers and uneven spatial distribution, and homonymous feature matching is sparse or even fails, which severely constrains the automation and intelligence of current geographic information mapping technology.
(2) Conventional matching methods cannot adapt to geometric deformation between images, and prior information is generally needed to roughly correct the images before matching in order to eliminate the geometric deformation. When such prior information is missing, the geometric transformation between images must be estimated from manually collected ground control points, and manual point selection is time-consuming and labor-intensive, which limits the popularization and application of these methods.
(3) Multi-mode remote sensing images are often large in scale and unordered. Although algorithms for three-dimensional digital modeling from multi-source multi-mode remote sensing data have been well developed at home and abroad, they have not formed a system: the automatic construction of semantic entity models is still immature and requires time-consuming manual operation, and an integrated, efficient computing framework combining fast retrieval of multi-mode overlapping images, deep learning matching and real-scene three-dimensional reconstruction is lacking.
Disclosure of Invention
The invention aims to provide a three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching that not only solves the problem of homonymous feature matching failure but also forms a mature system framework.
This aim is achieved by the following technical scheme:
a three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching comprises the following steps:
(1) deep learning feature extraction and description algorithm construction
Acquiring a multi-mode homonymous image block training data set, learning a trained DNN model based on the data set, and performing feature extraction and multi-channel descriptor output;
(2) feature matching algorithm establishment of full-connection network iteration
Fusing the multi-channel descriptors, constructing a complementary comprehensive descriptor that takes various image information into account together with a matching measure model, and completing the optimal matching of multi-mode image features through adaptive iteration of the fully connected network FCN;
(3) matching algorithm integration and live-action three-dimensional reconstruction application
Constructing an application framework integrating multi-mode overlapping image fast retrieval, deep learning matching and live-action three-dimensional reconstruction.
The method for obtaining the multi-mode homonymous image block training data set comprises the following steps: introducing a multi-mode homonymous image block data set and performing data augmentation with a deep generative matching network to obtain the multi-mode homonymous image block training data set.
The DNN network structure comprises two key subnetworks, namely the affine-invariant learning network HesAffNet and the nonlinear brightness distortion learning network NLIntensNet; the former learns the affine distortion of homonymous regions, and the latter learns the brightness distortion.
The fast retrieval of multi-mode overlapping images comprises the following steps: for multi-mode image combination retrieval based on visual features, the maximally stable extremal region features from computer vision are introduced, the feature principal direction is set to the 0 direction, and feature descriptors are generated from the pixel information of the feature point neighborhoods, yielding a descriptor vector set; a hierarchical K-means algorithm generates visual words and constructs a vocabulary tree; an inverted document is generated at each node and its length recorded; the visual words are weighted, and image combinations with overlapping or similar content are retrieved. A random forest is then constructed over the visual features detected in the images to be matched using a K-d tree algorithm, and the search is accelerated by a best-neighbor search strategy and principal component analysis, so as to determine the overlapping area of the images to be matched.
The training method of the DNN comprises the following steps: multi-mode homonymous image blocks of 64x64 pixels are transformed by $T_1$ and $T_2$ respectively and further cut into homonymous image blocks of 32x32 pixels; these are imported into twin HesAffNet networks with shared weights, which output the affine 4-parameter matrices $A_1$ and $A_2$ through network learning, from which geometrically normalized homonymous image blocks are generated; the normalized homonymous image blocks are imported into twin NLIntensNet networks with shared weights for nonlinear radiation distortion learning, outputting brightness-invariant 3-channel descriptors. According to the established multi-mode loss function

$$L = \frac{1}{k}\sum_{i=1}^{k}\max\left(0,\ 1 + d(m_i, p_i) - d(m_i, n_i)\right)$$

the loss function values of the different channels can be calculated respectively, where $k$ is the number of samples, $d(m_i, p_i)$ is the multi-mode matching descriptor distance, and $d(m_i, n_i)$ is the multi-mode nearest-neighbor non-matching descriptor distance. Finally, based on a large number of training samples, stochastic gradient descent and backward iterative propagation of residuals drive the loss function toward its minimum, completing the joint HesAffNet and NLIntensNet training and global optimization.
Further, the multi-channel descriptors in step (2) comprise the directional gradient descriptor G, the structural information descriptor S and the local extremum response descriptor R. The descriptor vector dimension is reduced with PCA principal component analysis, and the three reduced descriptors can be expressed as

$$G = [g_1, \ldots, g_n],\quad S = [s_1, \ldots, s_n],\quad R = [r_1, \ldots, r_n]$$

where $n$ denotes the number of extracted features and $g_i$, $s_i$ and $r_i$ denote, respectively, the local gradient vector, the overall structural information entropy vector and the neighborhood Laplacian-of-Gaussian extremum vector of a feature. The three descriptors are then fused to construct a comprehensive descriptor $F$ that takes the global and local information of the image into account:

$$F = [G, S, R]$$

A robust matching measure model $EM$ of the comprehensive descriptor is established:

$$EM = k_1 E_1 + k_2 E_2 + k_3 E_3$$

where $E_1$, $E_2$ and $E_3$ denote the gradient space distance, the structural information entropy distance and the local response distance respectively, and $k_1$, $k_2$ and $k_3$ denote the corresponding weights, so that $EM \in [0,1]$.

The distance weights $k_1$, $k_2$, $k_3$ of the descriptor vectors and the matching measure threshold $T_E$ are obtained through adaptive iteration of the fully connected network FCN, completing the optimal matching of the multi-mode image features.
In order to reduce the influence of partial distance extrema and improve the robustness and universality of the matching measure, the gradient space distance, the structural information entropy distance and the local response distance are each normalized to [0,1] using a sigmoid function.
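As an illustration only, the following Python sketch shows one way the sigmoid-normalized distances could be combined into the weighted measure; the sigmoid centering and the acceptance rule (a pair accepted when the measure falls below the threshold) are assumptions not fixed by the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def matching_measure(e_grad, e_entropy, e_resp, k=(1/3, 1/3, 1/3)):
    """Sketch of the robust matching measure EM = k1*E1 + k2*E2 + k3*E3.

    e_grad, e_entropy, e_resp: raw gradient-space, structural-information-
    entropy and local-response distances for one candidate pair; each is
    squashed to (0, 1) with a sigmoid (the centering/scale is an assumption).
    """
    E = sigmoid(np.array([e_grad, e_entropy, e_resp]))
    k = np.asarray(k)            # distance weights, initially 1/3 each
    return float(np.dot(k, E))   # EM lies in [0, 1]

# Assumed acceptance rule: a candidate pair is kept as a homonymous match
# when matching_measure(...) < T_E, with T_E initialized to 0.6.
```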
The adaptive iteration of the fully connected network FCN comprises the following steps:
(1) The three distance weights of the matching measure are initialized to $k_{10} = 1/3$, $k_{20} = 1/3$ and $k_{30} = 1/3$, and the matching measure threshold is initialized to $T_{E0} = 0.6$;
(2) Under the double geometric constraint, based on the comprehensive descriptor $F$, the initialized matching measure model $EM$ and the matching measure threshold $T_{E0}$, an initial homonymous matching set $\Phi_0$ is acquired, and a new homography matrix and a new fundamental matrix of the stereo pair are estimated;
(3) The FCN adaptively generates new distance weights $k_1$, $k_2$, $k_3$ and a new matching measure threshold $T_E$, each parameter having been initialized with the values of step (1);
(4) Steps (2) and (3) are repeated until the number of homonymous features no longer changes; the iteration then terminates and the best matching result is output.
The live-action three-dimensional reconstruction comprises the following steps:
(1) Incorporating the three-dimensional points observed by each sensor into collinearity condition equations and jointly solving the object-space three-dimensional coordinates of each feature point;
(2) Expanding the matching window to the whole overlapping area to be matched using the deep learning matching algorithm and object-space geometric constraints to obtain a pixel-by-pixel dense matching point cloud;
(3) Performing automatic dense matching and three-dimensional reconstruction of the multi-mode images by bundle adjustment.
Preferably, the live-action three-dimensional reconstruction further comprises the following steps:
determining the image point coordinates of spatial points on each image from the three-dimensional model information using the collinearity condition equations, screening target textures based on the angle between the normal vector of the current model face and the vector from the face center to the photographing center, and selecting target textures according to the size of the back-projected texture area;
automatically cropping the target texture with a minimum-area bounding rectangle strategy;
and sampling point by point with a ray tracing method according to the inverse mapping rule, realizing automatic registration of textures and models.
The invention has the advantages that:
aiming at the problems of low repetition rate, rare number, uneven spatial distribution and the like of the homonymous feature extraction of the multi-modal image, the scheme provides a multi-modal feature extraction and description algorithm fused with a deep neural network on the basis of respectively designing an affine invariant feature extraction network and a brightness invariant descriptor network, and improves the repetition rate, the detection number and the spatial distribution uniformity of the features by utilizing the powerful learning and optimizing functions of the invariant network, thereby guaranteeing the invariance, universality and accuracy of feature extraction and description and laying a foundation for reliable matching of the subsequent multi-modal image;
because the scale, azimuth, surface brightness and neighborhood information of the same space target on the multi-mode image are subjected to complex distortion or missing, great difficulty is formed for describing and matching image characteristics;
the multi-mode remote sensing image often presents difficult states such as large scale, disorder and the like, so the multi-mode remote sensing large data intelligent application is oriented, and the scheme provides an integrated efficient computing framework integrating multi-mode overlapped image rapid retrieval, deep learning matching and live-action three-dimensional reconstruction, and finally builds a three-dimensional geographic information product production technology system.
The method is oriented to the multi-mode remote sensing image sequence, can fully automatically identify and reliably extract the homonymous features of various terrain elements, and then replaces manual work with a computer to realize intelligent processing and deep analysis of the multi-mode remote sensing image, so that the geographic information productivity of mapping is liberated and developed, and key technology and high-quality data support are provided for real-scene three-dimensional Chinese construction.
Drawings
FIG. 1 is a flow chart of the overall technical scheme of the invention;
FIG. 2 is a schematic diagram of a deep learning feature extraction and description strategy according to the present invention;
fig. 3 is a block diagram of an adaptive matching algorithm for FCN iteration of the present invention;
FIG. 4 is a diagram of a matching algorithm integration and live three-dimensional reconstruction application framework of the invention;
FIG. 5 is a schematic diagram of the DGMN-based multi-mode image block pair data augmentation strategy of the present invention;
FIG. 6 is a flowchart of invariant feature extraction DNN construction and training according to the present invention;
FIG. 7 shows the fully automatic reconstruction of a three-dimensional surface model fused from a multi-mode image set according to an embodiment of the present invention, wherein (a) is a side view of the reconstructed three-dimensional surface model with pose information and (b) is a top view of the reconstruction;
FIG. 8 shows the fully automatic registration of real textures and the model according to an embodiment of the present invention, wherein (a) is automatic selection of the best texture, (b) is automatic cropping of the best texture, and (c) is automatic texture mapping.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention discloses a three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching. The flow, shown in FIG. 1, comprises three steps: deep learning feature extraction and description algorithm construction; establishment of the fully connected network iterative feature matching algorithm; and matching algorithm integration with live-action three-dimensional reconstruction application. The detailed method is as follows:
step one, deep learning feature extraction and description algorithm construction
Referring to FIG. 2, the description performance of the DNN model on invariant features depends to a certain extent on the breadth and number of training samples, so internationally published multi-source homonymous image block data sets (such as the SEN1-2 and UBC data sets) are introduced; on this basis, a deep generative matching network (Deep Generative Matching Network, DGMN), shown in FIG. 5, is adopted to maximally augment the existing data sets, where G1 denotes an optical-to-SAR image block translator and G2 a SAR-to-optical image block translator. First, a generative adversarial network (Generative Adversarial Network, GAN) generates numerous matched and non-matched image block pairs; then a generative matching network (Generative Matching Network, GMN) outputs the matching label of each image block pair, where '1' and '0' denote the matched and non-matched marks respectively. A sufficient multi-mode homonymous image block training data set is finally obtained, laying a sample foundation for training and learning of the deep neural network (Deep Neural Network, DNN).
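Purely as a structural sketch of this augmentation step: `G1`, `G2` and `gmn` below are hypothetical callables standing in for the two GAN translators and the generative matching network, and the 0.5 score threshold is an assumption.

```python
import torch

def augment_pairs(optical_blocks, sar_blocks, G1, G2, gmn, thresh=0.5):
    """DGMN-style augmentation sketch: translate blocks across modalities
    with the GAN generators, then label each synthesized pair with the
    generative matching network ('1' = match, '0' = non-match)."""
    pairs = []
    with torch.no_grad():
        for o, s in zip(optical_blocks, sar_blocks):
            # Candidate pairs: (optical translated to SAR, real SAR) and
            # (real optical, SAR translated to optical)
            for a, b in ((G1(o), s), (o, G2(s))):
                label = int(gmn(a, b) > thresh)  # hypothetical match score
                pairs.append((a, b, label))
    return pairs
```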
The DNN comprises two key subnetworks: the affine-invariant learning network HesAffNet and the nonlinear brightness distortion learning network NLIntensNet. The former learns the affine distortion of homonymous regions and outputs the affine 4-parameters (4D); the latter learns the brightness distortion and outputs a 3-channel descriptor. The DNN training process first requires designing the Hessian affine-invariant neighborhood parameter extraction network HesAffNet and the nonlinear brightness-invariant descriptor generation network NLIntensNet, from which the feature extraction and description network DNN is constructed; a DNN model with good geometric and brightness invariance is then learned from the multi-mode homonymous image block training data set; Hessian corner features are subsequently extracted from the test images, and the DNN extracts affine-invariant neighborhoods and brightness-invariant multi-channel descriptors at the Hessian corners, yielding geometric- and brightness-invariant descriptors of the multi-mode images. As shown in FIG. 6, the flow is constructed as follows: multi-mode homonymous image blocks of 64x64 pixels are transformed by $T_1$ and $T_2$ respectively and further cut into homonymous image blocks of 32x32 pixels; these are imported into twin HesAffNet networks with shared weights, which output the affine 4-parameter matrices $A_1$ and $A_2$ through network learning, from which geometrically normalized homonymous image blocks are generated; the normalized blocks are then imported into twin NLIntensNet networks with shared weights for nonlinear radiation distortion learning, outputting brightness-invariant 3-channel descriptors. According to the established multi-mode loss function

$$L = \frac{1}{k}\sum_{i=1}^{k}\max\left(0,\ 1 + d(m_i, p_i) - d(m_i, n_i)\right)$$

where $k$ is the number of samples, $d(m_i, p_i)$ is the multi-mode matching descriptor distance and $d(m_i, n_i)$ is the multi-mode nearest-neighbor non-matching descriptor distance, the loss function values of the different channels can be calculated. Finally, based on a large number of training samples, stochastic gradient descent and backward iterative propagation of residuals drive the loss function toward its minimum, completing the joint HesAffNet and NLIntensNet training and global optimization.
Step two, establishing a feature matching algorithm of full-connection network iteration
The geometric- and brightness-invariant descriptors of the multi-mode images are obtained in step one. Step two fuses the multi-channel descriptors to construct a complementary comprehensive descriptor that takes the global structural information, local gradients, extrema and other information of the images into account; on this basis, a robust matching measure model based on the comprehensive descriptor is constructed; reliable distance weights for the descriptors and a reliable adaptive measure threshold are then obtained through adaptive iteration of the fully connected network (Fully Connected Network, FCN), achieving optimal matching of the multi-mode image features.
Referring to fig. 3, the directional gradient descriptor G, the structural information descriptor S and the local extremum response descriptor R are first fused, with the descriptor vector dimension reduced by principal component analysis (Principal Components Analysis, PCA) to improve subsequent efficiency. The three reduced descriptors can be expressed as

$$G = [g_1, \ldots, g_n],\quad S = [s_1, \ldots, s_n],\quad R = [r_1, \ldots, r_n]$$

where $n$ denotes the number of extracted features and $g_i$, $s_i$ and $r_i$ denote, respectively, the local gradient vector, the overall structural information entropy vector and the neighborhood Laplacian-of-Gaussian extremum vector of a feature. The three descriptors are then fused to construct a comprehensive descriptor $F$ that takes the global and local information of the image into account:

$$F = [G, S, R]$$

Next, a robust matching measure calculation model $EM$ based on the comprehensive descriptor is established:

$$EM = k_1 E_1 + k_2 E_2 + k_3 E_3$$

where $E_1$, $E_2$ and $E_3$ denote the gradient space distance, the structural information entropy distance and the local response distance respectively (note: to attenuate the influence of partial distance extrema and improve the robustness and universality of the matching measure, the three distances are each normalized to [0,1] with a sigmoid function), and $k_1$, $k_2$ and $k_3$ denote the corresponding weights, so that $EM \in [0,1]$.

The distance weights $k_1$, $k_2$, $k_3$ of the descriptor vectors and the matching measure threshold $T_E$ are then obtained through adaptive iteration of the fully connected network FCN. The iteration can be summarized as follows: Step1, the three distance weights of the matching measure are initialized to $k_{10} = 1/3$, $k_{20} = 1/3$ and $k_{30} = 1/3$, and the matching measure threshold is initialized to $T_{E0} = 0.6$; Step2, under the double geometric constraint, based on the comprehensive feature descriptor $F$, the initialized matching measure $EM$ and the matching measure threshold $T_{E0}$, an initial homonymous matching set $\Phi_0$ is acquired, and a new homography matrix and a new fundamental matrix of the stereo pair are estimated; Step3, the FCN adaptively generates new, more reliable distance weights $k_1$, $k_2$, $k_3$ and a new matching measure threshold $T_E$, each parameter having been initialized with the values of Step1; Step4, Step2 and Step3 are repeated until the number of homonymous features no longer changes, the iteration terminates, and the best matching result is output.
Step three, matching algorithm integration and live-action three-dimensional reconstruction application
Referring to fig. 4, the scheme establishes an application framework integrating multi-mode overlapping image fast retrieval, deep learning matching and live-action three-dimensional reconstruction.
First, to accurately and efficiently obtain the multi-mode image sequences to be matched from large-scale unordered multi-mode images, maximally stable extremal region (Maximally Stable Extremal Regions, MSERs) features, widely applied in computer vision, are introduced; these features are stable with respect to viewing angle, brightness, noise and the like. The feature principal direction is set to the 0 direction, and feature descriptors are generated from the pixel information of the feature point neighborhoods, yielding a descriptor vector set S. A hierarchical K-means algorithm then generates visual words from the vector set S and constructs a vocabulary tree; an inverted document is generated for each node and its length recorded; TF-IDF (Term Frequency-Inverse Document Frequency) statistics weight the visual words, and image combinations with overlapping or similar content are retrieved. On this basis, a K-d tree algorithm constructs a random forest over the visual features detected in the images to be matched, and a best-neighbor search strategy and principal component analysis accelerate the search, rapidly determining the overlapping area of the images to be matched.
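The sketch below illustrates the vocabulary-tree and TF-IDF ideas; the branch factor, tree depth and the use of scikit-learn's KMeans are assumptions, not the patent's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, branch=10, depth=3):
    """Hierarchical k-means over MSER descriptor vectors; returns the
    leaf-level visual-word centroids (branch/depth are hypothetical)."""
    def split(vecs, level):
        if level == depth or len(vecs) < branch:
            return [vecs.mean(axis=0)]
        km = KMeans(n_clusters=branch, n_init=4).fit(vecs)
        words = []
        for c in range(branch):
            words.extend(split(vecs[km.labels_ == c], level + 1))
        return words
    return np.array(split(np.asarray(descriptors, dtype=float), 0))

def tfidf_weights(image_word_counts):
    """TF-IDF weighting of visual-word histograms: images whose weighted
    histograms are similar are candidate overlapping combinations."""
    counts = np.asarray(image_word_counts, dtype=float)  # (images, words)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = (counts > 0).sum(axis=0)                        # document frequency
    idf = np.log(counts.shape[0] / np.maximum(df, 1))
    return tf * idf
```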
Then, image blocking and parallel distributed processing technologies are introduced to efficiently extract uniformly distributed Hessian feature points from the overlapping area; the HesAffNet and NLIntensNet network models are integrated to realize Hessian affine-invariant neighborhood detection and nonlinear brightness-invariant descriptor generation; on this basis, the FCN adaptive iterative matching technique yields the optimal matching result of the multi-mode image features, realizing joint positioning of the multi-mode, multi-source images.
Finally, in order to further recover the image poses and the dense object point cloud positions, the following technical scheme is adopted:
(1) Recovering the three-dimensional coordinates of the feature points. Considering that the position of the same object point on multi-view images involves at least two sensors, the three-dimensional points observed by each sensor are incorporated into collinearity condition equations, and the object-space three-dimensional coordinates of each feature point are solved jointly.
(2) Extracting the high-density three-dimensional point cloud. The matching window is expanded to the whole overlapping area to be matched using the deep learning matching algorithm and object-space geometric constraints, finally obtaining a pixel-by-pixel dense matching point cloud. During dense matching, an adaptive window and gray-scale weighting strategy automatically compensates for deformation caused by terrain relief, scale change and the like, guaranteeing the reliability of dense matching propagation.
(3) Bundle adjustment. The bundle adjustment follows the formula

$$\min_{P,\,X}\ \sum_{i,j} \rho\left(\left\| x_{ij} - \pi(P_i, X_j) \right\|^2\right)$$

where $P$ denotes the sensor pose parameters, $X$ and $x$ denote the object-space three-dimensional point coordinates and the corresponding image point coordinates respectively, $\pi$ is the collinearity-equation projection, and $\rho$ denotes a loss function that robustly suppresses gross errors. The joint nonlinear least-squares error is estimated with this formula, and minimizing the sum of reprojection errors optimizes the sensor pose parameters and the object-space three-dimensional point coordinates simultaneously. Integrating this highly extensible nonlinear least-squares model for scene optimization, automatic dense matching and three-dimensional reconstruction were performed on the 1190 multi-mode images of the test area; the initial results, shown in fig. 7, demonstrate the feasibility of the method.
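A hedged sketch of the adjustment follows, using SciPy's robust least squares; the `soft_l1` loss stands in for the robust function ρ, and `project` is a hypothetical collinearity-equation projection, both assumptions rather than the patent's exact implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx,
                           obs_xy, project):
    """Residuals x_ij - project(P_i, X_j) for every observation."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)   # pose: 3 rot + 3 trans
    pts = params[n_cams * 6:].reshape(n_pts, 3)     # object coordinates X
    pred = np.array([project(cams[c], pts[p])
                     for c, p in zip(cam_idx, pt_idx)])
    return (pred - obs_xy).ravel()

def bundle_adjust(cams0, pts0, cam_idx, pt_idx, obs_xy, project):
    """Jointly refine sensor poses P and object points X by minimizing
    the robustified sum of reprojection errors."""
    x0 = np.hstack([cams0.ravel(), pts0.ravel()])
    res = least_squares(
        reprojection_residuals, x0, loss='soft_l1',  # robust loss for rho
        args=(len(cams0), len(pts0), cam_idx, pt_idx, obs_xy, project))
    n = len(cams0) * 6
    return res.x[:n].reshape(-1, 6), res.x[n:].reshape(-1, 3)
```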
After the three-dimensional surface of the object is recovered, automatic mapping of real textures is performed. The specific scheme comprises the following steps:
(1) Optimal texture selection. According to the three-dimensional model information, the image point coordinates of spatial points on each image are determined using the collinearity condition equations; target textures are then further screened based on the angle between the normal vector of the current model face and the vector from the face center to the photographing center, and the optimal target texture is preferred according to the size of the back-projected texture area (a sketch of this screening appears after these steps).
(2) Texture cropping and extraction. Automatic cropping with a minimum-area bounding rectangle strategy preserves the realism of the textures while maximally saving storage space.
(3) Automatic texture mapping. According to the inverse mapping rule, a ray tracing method samples point by point, finally realizing automatic registration of textures and models. The texture mapping effect on a local model is shown in fig. 8.
Finally, it should be noted that the foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise forms disclosed; any modifications, equivalents and alternatives falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (2)

1. A three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching, characterized by comprising the following steps:
(1) deep learning feature extraction and description algorithm construction
introducing a multi-mode homonymous image block data set, performing data augmentation with a deep generative matching network to obtain a multi-mode homonymous image block training data set, learning a trained DNN model based on the data set, and performing feature extraction and multi-channel descriptor output;
the network structure of the DNN model comprises two key subnetworks, namely affine invariant learning HesAffNet and nonlinear brightness distortion learning NLIntenNet, wherein the affine distortion of the same-name region is learned by the affine invariant learning HesAffNet, and the brightness distortion is learned by the nonlinear brightness distortion learning NLIntenNet;
the training method of the DNN comprises the following steps: for the multi-mode homonymous image blocks with the size of 64x64 pixels, the multi-mode homonymous image blocks respectively pass through T 1 And T 2 Is further cut into homonymous image blocks with the size of 32x32 pixels; respectively leading the affine 4 parameter matrix A into the twin HesAffNet with shared weight values, and outputting the affine 4 parameter matrix A through network learning 1 And A 2 Further generating a geometric normalized homonymous image block; the method comprises the steps of importing the image blocks with the same name into a twin NLIntenNet with shared weights, carrying out nonlinear radiation distortion learning, outputting a 3-channel descriptor with unchanged brightness, and according to an established multi-mode loss function:
Figure QLYQS_1
the loss function values of different channels can be calculated respectively, wherein k is the number of samples, d (m i ,p i ) For multimode matching descriptor distance, d (m i ,n i ) Distance for multimode nearest neighbor non-matching descriptor; finally, based on a large number of training samples, the random gradient descent method and residual error backward iterative propagation are utilized to enable the loss function to finally tend to be minimum, thereby completing the HesAffNet and NLIntensNet connectionTraining and global optimization are combined;
(2) feature matching algorithm establishment of full-connection network iteration
fusing the multi-channel descriptors, constructing a complementary comprehensive descriptor that takes various image information into account together with a matching measure model, and completing the optimal matching of multi-mode image features through adaptive iteration of the fully connected network FCN;
the multi-channel descriptor comprises a direction gradient descriptor G, a structural information descriptor S and a local extremum response descriptor R, the vector dimension of the descriptor is simplified by using a PCA principal component analysis method, and three descriptors after dimension reduction can be expressed as:
Figure QLYQS_2
wherein n represents the number of extracted features, g i 、s i And r i A local gradient vector, an overall structure information entropy vector and a neighborhood Gaussian Laplace extremum vector which respectively represent the characteristics; then fusing the three descriptors, and constructing a comprehensive descriptor F taking global and local information of the image into consideration:
Figure QLYQS_3
establishing a robust matching measure model EM of the comprehensive descriptor:
EM=k 1 E 1 +k 2 E 2 +k 3 E 3
wherein E is 1 、E 2 、E 3 Respectively representing gradient space distance, structure information entropy distance and local response distance, k 1 、k 2 、k 3 Respectively represent the weighted values corresponding to the three distances, so that EM E [0,1]];
Obtaining the distance weighted value k of the descriptor vector through the self-adaptive iteration of the fully-connected network FCN 1 、k 2 、k 3 And a matching measure threshold T E The optimal matching of the multi-mode image features is completed;
the step of self-adapting iteration of the fully connected network FCN comprises the following steps:
(21) Three distance weighted values in the matching measure are to be initialized to k 10 =1/3、k 20 =1/3、k 30 =1/3, the matching measure threshold is to be initialized to T E0 =0.6;
(22) Under the double geometric constraint, based on the comprehensive descriptor F, the initialized matching measure model EM and the matching measure threshold T E0 Acquiring an initial homonymous matching set phi 0 Estimating a new homography matrix and a new fundamental matrix of the stereopair;
(23) Adaptive generation of new distance weighting value k by FCN 1 、k 2 、k 3 Threshold T of matching measure E Initializing each parameter by referring to the value of the step (1);
(24) Repeating the steps (22) and (23) until the number of the homonymous features is no longer changed, terminating the iteration, and outputting the best matching result;
(3) matching algorithm integration and live-action three-dimensional reconstruction application
constructing an application framework integrating multi-mode overlapping image fast retrieval, deep learning matching and live-action three-dimensional reconstruction;
the multi-mode overlapped image quick search comprises the following steps: the method comprises the steps of based on multi-mode image combination retrieval of visual features, introducing the maximum stable extremum region features in computer vision, setting a feature main direction as 0 direction, generating feature descriptors by utilizing pixel information of feature point neighborhood, thus obtaining a descriptor vector set, generating visual words by using a hierarchical K-means algorithm, constructing a vocabulary tree, generating an inverted document at each node, recording the length of the document, weighting the visual words, and retrieving the image combination with overlapped or similar content; constructing a random forest based on the detected visual features in the images to be matched by a K-d tree algorithm, and adopting an optimal adjacent point searching strategy and a principal component analysis method to accelerate searching so as to determine an overlapping region of the images to be matched;
then, introducing an image blocking and parallel distributed processing technology, efficiently extracting uniformly distributed Hessian characteristic points from an overlapping area, integrating a HesAffNet network model and an NLIntensNet network model, realizing Hessian affine invariant neighborhood detection and nonlinear brightness invariant descriptor generation, and obtaining an optimal matching result of multi-mode image characteristics by applying an FCN adaptive iterative matching technology on the basis, so as to realize the joint positioning of multi-mode multi-source images;
finally, in order to further recover the image gestures and the dense object point cloud positions, the following technical scheme is adopted:
(31) recovering the three-dimensional coordinates of the feature points: considering that the position of the same object point on multi-view images involves at least two sensors, the three-dimensional points observed by each sensor are incorporated into collinearity condition equations, and the object-space three-dimensional coordinates of each feature point are solved jointly;
(32) extracting the high-density three-dimensional point cloud: the matching window is expanded to the whole overlapping area to be matched using the deep learning matching algorithm and object-space geometric constraints, finally obtaining a pixel-by-pixel dense matching point cloud; during dense matching, an adaptive window and gray-scale weighting strategy automatically compensates for deformation caused by terrain relief, scale change and the like, guaranteeing the reliability of dense matching propagation;
(33) bundle adjustment, performed according to the following formula:

$$\min_{P,\,X}\ \sum_{i,j} \rho\left(\left\| x_{ij} - \pi(P_i, X_j) \right\|^2\right)$$

wherein $P$ denotes the sensor pose parameters, $X$ and $x$ denote respectively the object-space three-dimensional point coordinates and the corresponding image point coordinates, $\pi$ is the collinearity-equation projection, and $\rho$ denotes a loss function that robustly suppresses gross errors; the joint nonlinear least-squares error is estimated with the above, and the sum of the reprojection errors is minimized;
(4) after the three-dimensional surface of the object is recovered, automatic mapping of real textures is performed, the specific scheme comprising the following steps:
(41) optimal texture selection: determining the image point coordinates of spatial points on each image from the three-dimensional model information using the collinearity condition equations, then screening target textures based on the angle between the normal vector of the current model face and the vector from the face center to the photographing center, and preferring the optimal target texture according to the size of the back-projected texture area;
(42) texture cropping and extraction: automatic cropping with a minimum-area bounding rectangle strategy, preserving texture realism while maximally saving storage space;
(43) automatic texture mapping: according to the inverse mapping rule, sampling point by point with a ray tracing method, finally realizing automatic registration of textures and models.
2. The three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching according to claim 1, wherein the gradient space distance, the structural information entropy distance and the local response distance are each normalized to [0,1] using a sigmoid function.
CN202310363863.4A, filed 2023-04-07: Three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching (Active, CN116091706B)


Publications (2)

Publication Number | Publication Date
CN116091706A | 2023-05-09
CN116091706B | 2023-06-20






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant