CN116704123A - Three-dimensional reconstruction method combined with image subject extraction technology


Info

Publication number
CN116704123A
Authority
CN
China
Prior art keywords
image
segmentation
dimensional reconstruction
label
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310604414.4A
Other languages
Chinese (zh)
Inventor
吴磊
叶许超
尹志城
程秀超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Dinglian Technology Co ltd
Original Assignee
Hebei Dinglian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Dinglian Technology Co., Ltd.
Priority to CN202310604414.4A (priority date 2023-05-26)
Publication of CN116704123A (2023-09-05)
Legal status: Pending


Classifications

    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/194: Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/33: Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
    • G06V 10/40: Extraction of image or video features
    • G06V 10/757: Matching configurations of points or features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20132: Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional reconstruction method combined with an image subject extraction technology, comprising the following steps: sequentially placing the foreground object in at least two different spatial poses and collecting an image sequence for each pose; performing interactive image segmentation on part of the images, where clicking the foreground generates a positive label and clicking the background generates a negative label, and producing a segmentation result; combining image feature extraction with an image feature matching algorithm, and propagating the positive and negative labels through matched feature points to the image pair with the highest matching degree in turn, thereby completing segmentation of all images; taking the subject-extracted images and the image matching results as input, executing an SfM pipeline to solve the camera parameters, and triangulating a sparse point cloud from the solved parameters; and registering the sparse point clouds obtained under the different spatial poses, transforming the data into a common coordinate system, and executing the subsequent reconstruction steps on the aligned data to complete the three-dimensional reconstruction.

Description

Three-dimensional reconstruction method combined with image subject extraction technology
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, and in particular to a three-dimensional reconstruction method combined with an image subject extraction technology.
Background
Three-dimensional reconstruction is a computer technique that recovers the three-dimensional information (shape, etc.) of an object from its two-dimensional projections. With the rapid development of computer software and hardware, large-scale, high-precision three-dimensional scenes can be drawn in real time with ever less difficulty. Computationally intensive three-dimensional reconstruction techniques have advanced as well, and classical reconstruction algorithms have been applied successfully in reverse engineering, video entertainment, industrial design, cultural relic protection, and urban informatization efforts such as the "digital earth" and "smart city". Three-dimensional modeling approaches fall into three types, each tied to its own data source: geometry-based modeling in three-dimensional software, modeling from distance measurements, and image-based modeling. Image-based model reconstruction offers convenient data acquisition and inexpensive equipment, and has quickly become one of the main methods of three-dimensional modeling. Generating three-dimensional models from two-dimensional images via Structure from Motion (SfM) is a key step in image-based reconstruction, which has made image-based three-dimensional modeling a popular research direction in recent years.
However, the three-dimensional reconstruction process is easily disturbed by the background in the acquired images, and the reconstructed model requires manual intervention to delete the background. The drawbacks of the manual approach are as follows: during image acquisition, the surrounding environment is captured along with the subject, which interferes with feature matching and other stages of the reconstruction and reduces accuracy; and a model that contains the environment must be trimmed by hand to remove the background scene, for which no good method currently exists. Preprocessing the images before reconstruction, removing the background and extracting the subject, reduces both the background interference and the manual effort.
For foreground extraction, Mask R-CNN is a classical instance segmentation deep learning framework that can separate the different instances in a picture on the basis of learned priors. Because it is prior-based, however, it handles unseen objects or object combinations poorly. Three-dimensional reconstruction tasks involve many unusual objects, so Mask R-CNN cannot perform foreground extraction effectively there.
Saliency-detection-based techniques such as SaliencyFilters often fail to distinguish foreground from background correctly, and attempts to combine SaliencyFilters with superpixels have not given ideal results.
In addition, when the image sequence of an object is acquired in a single pose, the bottom of the object cannot be captured, so the generated model must be repaired manually.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a three-dimensional reconstruction method combined with an image subject extraction technology, so as to solve the above technical problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a three-dimensional reconstruction method combined with an image main body extraction technology, which comprises the following steps:
s1, sequentially placing foreground objects into at least two different spatial postures, and respectively collecting image sequences under each spatial posture;
s2, performing interactive image segmentation on part of the image, clicking a front Jing Shengcheng positive label, clicking a background to generate a negative label, and generating a segmentation result according to the positive label and the negative label;
s3, combining image feature extraction and an image feature matching algorithm, sequentially transmitting the positive label and the negative label to an image pair with highest matching degree through image feature matching points, and thus completing image segmentation;
s4, taking the image extracted by the main body and an image matching result as input, executing an SfM process, completing camera parameter calculation, and triangulating a sparse point cloud according to the calculated camera parameters;
and S5, registering the sparse point clouds which finish camera parameter calculation under different spatial postures, uniformly transforming the data to the same coordinate system, realizing coordinate alignment, and executing a subsequent reconstruction step on the data transformed to the uniform coordinate system to finish three-dimensional reconstruction.
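For orientation, the following is a minimal Python sketch of the S1 to S5 pipeline. Every helper function here (load_image, interactive_segment, match_all_features, propagate_labels, solve_cameras, triangulate, register_to, merge, dense_reconstruct) is a hypothetical placeholder for a component described in this disclosure, not an existing API.

```python
# Hypothetical outline of steps S1-S5; all helpers are placeholders.

def reconstruct(image_sequences, user_clicks):
    """image_sequences: one list of image paths per spatial pose (S1)."""
    clouds, cameras = [], []
    for seq in image_sequences:
        images = [load_image(p) for p in seq]
        # S2: interactive segmentation of a few seed images from user clicks.
        masks = {i: interactive_segment(images[i], user_clicks[i])
                 for i in user_clicks}
        # S3: propagate positive/negative labels through feature matches
        # until every matchable image has a mask.
        matches = match_all_features(images)
        masks = propagate_labels(images, masks, matches)
        # S4: SfM on the subject-only images, then triangulation.
        cams = solve_cameras(images, masks, matches)
        clouds.append(triangulate(cams, matches))
        cameras.append(cams)
    # S5: register every later cloud into the first cloud's coordinate frame.
    aligned = [clouds[0]] + [register_to(clouds[0], c) for c in clouds[1:]]
    return dense_reconstruct(merge(aligned), cameras)
```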
Further, the interactive image segmentation in step S2 comprises: after the user clicks on a single image, the network model generates a coarse mask from the clicked regions; the region the user is concerned with is then extracted by an image morphology algorithm and a local prediction is performed there, updating the mask and refining the segmentation; finally, the mask obtained in the previous step and the original image are taken as input to a conditional random field, which combines color consistency and feature similarity to construct an energy function and refine the segmentation boundary, yielding the segmentation result.
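As a concrete illustration of the morphology-plus-local-prediction step, the sketch below crops the user's region of interest out of the coarse mask and re-predicts only there. The `model.predict` method and the `shift_clicks` helper are assumptions standing in for whichever segmentation network is used; the morphology itself is plain OpenCV.

```python
import cv2
import numpy as np

def local_refine(image, coarse_mask, model, clicks, pad=20):
    """Extract the clicked region of interest with morphology, re-predict
    locally, and paste the refined mask back. `coarse_mask` is HxW uint8;
    `model` and `shift_clicks` are hypothetical (sketch only)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(coarse_mask, cv2.MORPH_CLOSE, kernel)
    region = cv2.dilate(closed, kernel, iterations=3)  # add context margin
    ys, xs = np.nonzero(region)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, image.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, image.shape[1])
    # Local prediction on the cropped window only; clicks are re-expressed
    # in window coordinates (shift_clicks is a hypothetical helper).
    local = model.predict(image[y0:y1, x0:x1], shift_clicks(clicks, x0, y0))
    refined = coarse_mask.copy()
    refined[y0:y1, x0:x1] = local  # updated mask, ready for CRF refinement
    return refined
```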
Further, step S3 comprises: searching for the new image with the highest matching degree to an already-segmented image; the feature points matched to the foreground part become positive labels of the new image and the feature points matched to the background part become negative labels, after which the new image is segmented automatically.
By adopting the technical scheme, the invention has the following beneficial effects:
the invention adopts a method based on image technology, collects image sequences of at least two different spatial attitudes of an object, extracts the main body of the image, removes the background, reduces the interference of surrounding environment, improves the precision of three-dimensional reconstruction, and performs pose registration on the intermediate results of the two groups of image sequences in the reconstruction process and transforms the intermediate results into the same coordinate system. The reconstructed three-dimensional model does not need manual intervention either so as to remove the environmental part and generate complete model results.
According to the invention, the user does not need to segment every image interactively: interactive segmentation is performed on only part of the images, after which image feature extraction is combined with an image feature matching algorithm and the positive and negative labels are propagated through matched feature points to the image pair with the highest matching degree, in turn. This process continues until every image that can be matched has been segmented, leaving the foreground extraction and image matching results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments or the prior art are briefly described below. The drawings in the following description show some embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a three-dimensional reconstruction method combined with an image subject extraction technology according to an embodiment of the present invention;
Fig. 2 is a flowchart of interactive image segmentation according to an embodiment of the present invention.
Description of the embodiments
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
Referring to Fig. 1 and Fig. 2, this embodiment provides a specific implementation of a three-dimensional reconstruction method combined with an image subject extraction technology. Specifically, an object A to be reconstructed is placed in a first pose, and image data is acquired according to the shooting requirements of image-based three-dimensional reconstruction, yielding an image sequence L1; the bottom of object A cannot be captured because it is occluded.
The object A to be reconstructed is then placed in a second pose so that the part that could not be captured due to occlusion is fully exposed, and image data is collected to obtain an image sequence L2.
The two image sequences are stored under two file paths, and three-dimensional reconstruction is performed.
Referring to Fig. 1, during reconstruction the image matching process combined with interactive segmentation, the camera parameter solving process, and the sparse point cloud generation process are performed in turn on the two groups of data. Registration is then performed on the point clouds: taking the coordinate space of one group of data as the reference, the other group is coordinate-transformed so that the two groups share a unified coordinate space. After coordinate alignment, dense point cloud reconstruction, meshing, and texture mapping are carried out.
The image matching process combined with interactive image segmentation proceeds as follows. First, interactive image segmentation is performed on a subset of the images in which the subject appears clearly. Foreground and background points are marked by clicking, and a deep learning method segments the image according to the weight mask generated from these points. The segmentation network for this step can be chosen as required, e.g. a fully convolutional network or a semantic segmentation network combining self-attention and Transformers, to produce a coarse segmentation result.
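The patent does not fix how the clicked points are turned into the weight mask fed to the network; a common encoding in interactive segmentation, shown below as an assumption, is a pair of truncated distance-transform maps (one per label polarity) stacked onto the RGB input.

```python
import cv2
import numpy as np

def click_maps(shape, pos_clicks, neg_clicks, truncate=255.0):
    """Encode positive/negative clicks as truncated distance maps to be
    concatenated with the image as extra network input channels."""
    maps = []
    for clicks in (pos_clicks, neg_clicks):
        canvas = np.ones(shape[:2], np.uint8)
        for x, y in clicks:
            canvas[y, x] = 0                     # one seed pixel per click
        dist = cv2.distanceTransform(canvas, cv2.DIST_L2, 5)
        maps.append(np.minimum(dist, truncate))  # cap far-away distances
    return np.stack(maps, axis=-1).astype(np.float32)

# guidance = click_maps(img.shape, pos_clicks=[(120, 80)], neg_clicks=[(5, 5)])
# net_input = np.concatenate([img.astype(np.float32), guidance], axis=-1)
```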
The result generated in the previous step is taken as a mask; a conditional random field combining color consistency and feature similarity is used to construct an energy function and refine the segmentation boundary, yielding a refined segmentation result.
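A sketch of this refinement using the pydensecrf package (one possible implementation, not the one mandated by the patent): the pairwise bilateral term carries the color-consistency constraint, and the unary term comes from the network's soft mask.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob_fg, iters=5):
    """Refine a soft foreground probability map with a fully connected CRF.
    `image` is HxWx3 uint8, `prob_fg` is HxW float in [0, 1]."""
    h, w = prob_fg.shape
    p = np.clip(prob_fg, 1e-5, 1.0 - 1e-5)       # avoid log(0) in the unary
    probs = np.stack([1.0 - p, p]).astype(np.float32)
    d = dcrf.DenseCRF2D(w, h, 2)                 # 2 labels: bg, fg
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)       # spatial smoothness
    d.addPairwiseBilateral(sxy=60, srgb=13,      # color-consistency term
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = np.array(d.inference(iters)).reshape(2, h, w)
    return (q[1] > q[0]).astype(np.uint8)        # refined binary mask
```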
Feature extraction and matching are performed on all images; conventional algorithms such as ORB, SURF, and SIFT, and deep-learning-based algorithms such as SuperPoint and SuperGlue, are commonly used. Segmenting the subset of images yields the corresponding masks: feature points falling in the foreground region carry positive labels, and feature points falling in the background region carry negative labels. The remaining images are then traversed, each time segmenting the image with the highest matching degree to an already-segmented image: among its matched feature point pairs, the points matching positive labels and the points matching negative labels are taken as input to generate an initial mask, from which the image is segmented. When no new matching image remains, the image segmentation process is complete.
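The label hand-off can be illustrated with OpenCV's SIFT and a ratio-test matcher, sketched below. The returned point lists play the role of the clicked positive and negative labels when the new image is segmented automatically.

```python
import cv2

def transfer_labels(img_done, mask_done, img_new, ratio=0.75):
    """Carry positive/negative labels from a segmented image to the
    best-matching new image through SIFT feature matches (sketch)."""
    g1 = cv2.cvtColor(img_done, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img_new, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(g1, None)
    kp2, des2 = sift.detectAndCompute(g2, None)
    pos, neg = [], []
    for m, n in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2):
        if m.distance < ratio * n.distance:       # Lowe's ratio test
            x1, y1 = map(int, kp1[m.queryIdx].pt)
            target = kp2[m.trainIdx].pt           # where the label lands
            # Foreground match -> positive label; background -> negative.
            (pos if mask_done[y1, x1] > 0 else neg).append(target)
    return pos, neg    # seed labels for segmenting img_new automatically
```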
The images that have undergone image matching and foreground extraction are taken as input for camera parameter solving, and a sparse point cloud is generated from the solved camera parameters.
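Full SfM over many views is normally delegated to a package such as COLMAP; as an illustration of the underlying computation, a minimal two-view version with OpenCV is sketched below. It assumes a known intrinsic matrix K and foreground-only correspondences pts1/pts2 (Nx2 float arrays) from the matching step.

```python
import cv2
import numpy as np

def two_view_sparse_cloud(pts1, pts2, K):
    """Recover relative camera pose from masked correspondences and
    triangulate a sparse point cloud (two-view stand-in for full SfM)."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # reference camera
    P2 = K @ np.hstack([R, t])                         # second camera
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T                    # Nx3 sparse points
```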
Once both acquired image sequences have produced sparse point clouds, the two point clouds are registered, solving for 7 degrees of freedom (translation, rotation, and scale) with the CPD (Coherent Point Drift) algorithm. Taking the coordinate space of the data computed from image sequence L1 as the reference, the data computed from image sequence L2 is transformed into that coordinate space, achieving coordinate alignment; the two groups of data, which together cover the complete object to be reconstructed, are then merged for subsequent processing. Dense reconstruction, meshing, and texture mapping are performed with multi-view algorithms such as OpenMVS, realizing the three-dimensional reconstruction.
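CPD estimates this similarity transform without known point correspondences (an off-the-shelf implementation exists in the probreg package, for example). To illustrate the 7 degrees of freedom being solved, the sketch below uses the closed-form Umeyama solution instead, which assumes corresponding point pairs are already available; the final comment lines show the L2-sequence cloud being mapped into the L1-sequence coordinate space.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form 7-DoF (scale, rotation, translation) fit aligning Nx3
    src points to dst points (Umeyama 1991); assumes correspondences."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cs, cd = src - mu_s, dst - mu_d
    cov = cd.T @ cs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / cs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t                       # x_dst ~ s * R @ x_src + t

# Map the cloud from sequence L2 into the coordinate space of sequence L1:
# s, R, t = umeyama_similarity(corr_pts_l2, corr_pts_l1)
# cloud_l2_aligned = s * (R @ cloud_l2.T).T + t
```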
The image-based three-dimensional reconstruction is thus combined with an interactive image segmentation technique; data is acquired for the object to be reconstructed in two different poses, and the two groups of data are coordinate-aligned during reconstruction, completing the final reconstruction and producing a complete result.
Of course, the object A may also be placed in three or more spatial poses, yielding three or more image sequences, each of which is processed in the same way.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (3)

1. A three-dimensional reconstruction method combined with an image subject extraction technology, characterized by comprising the following steps:
s1, sequentially placing foreground objects into at least two different spatial postures, and respectively collecting image sequences under each spatial posture;
s2, performing interactive image segmentation on part of the image, clicking a front Jing Shengcheng positive label, clicking a background to generate a negative label, and generating a segmentation result according to the positive label and the negative label;
s3, combining image feature extraction and an image feature matching algorithm, sequentially transmitting the positive label and the negative label to an image pair with highest matching degree through image feature matching points, and thus completing image segmentation;
s4, taking the image extracted by the main body and an image matching result as input, executing an SfM process, completing camera parameter calculation, and triangulating a sparse point cloud according to the calculated camera parameters;
and S5, registering the sparse point clouds which finish camera parameter calculation under different spatial postures, uniformly transforming the data to the same coordinate system, realizing coordinate alignment, and executing a subsequent reconstruction step on the data transformed to the uniform coordinate system to finish three-dimensional reconstruction.
2. The three-dimensional reconstruction method according to claim 1, wherein the interactive image segmentation in step S2 comprises: after the user clicks on a single image, the network model generates a coarse mask from the clicked regions; the region the user is concerned with is extracted by an image morphology algorithm and a local prediction is performed there, updating the mask and refining the segmentation; the mask obtained in the previous step and the original image are then taken as input to a conditional random field, which combines color consistency and feature similarity to construct an energy function and refine the segmentation boundary, yielding the segmentation result.
3. The three-dimensional reconstruction method according to claim 1, wherein step S3 comprises: searching for the new image with the highest matching degree to an already-segmented image, wherein the feature points matched to the foreground part become positive labels of the new image and the feature points matched to the background part become negative labels, after which the new image is segmented automatically.
CN202310604414.4A (priority date 2023-05-26, filed 2023-05-26): Three-dimensional reconstruction method combined with image subject extraction technology. Status: Pending. Publication: CN116704123A (en).

Priority Applications (1)

Application Number: CN202310604414.4A · Priority Date: 2023-05-26 · Filing Date: 2023-05-26 · Title: Three-dimensional reconstruction method combined with image subject extraction technology

Applications Claiming Priority (1)

Application Number: CN202310604414.4A · Priority Date: 2023-05-26 · Filing Date: 2023-05-26 · Title: Three-dimensional reconstruction method combined with image subject extraction technology

Publications (1)

Publication Number: CN116704123A · Publication Date: 2023-09-05

Family

ID=87834999

Family Applications (1)

Application Number: CN202310604414.4A · Title: Three-dimensional reconstruction method combined with image subject extraction technology · Priority/Filing Date: 2023-05-26

Country Status (1)

Country Link
CN (1) CN116704123A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
CN117710500A * · Priority date: 2023-12-08 · Publication date: 2024-03-15 · Assignee: 广东创意热店互联网科技有限公司 · Title: E-commerce image generation method based on diffusion model


Similar Documents

CN110458939B (en) Indoor scene modeling method based on visual angle generation
Zhang et al. Deep dense multi-scale network for snow removal using semantic and depth priors
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
CN110135455B (en) Image matching method, device and computer readable storage medium
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN111063021B (en) Method and device for establishing three-dimensional reconstruction model of space moving target
Meuleman et al. Progressively optimized local radiance fields for robust view synthesis
CN107481279B (en) Monocular video depth map calculation method
CN111524233A (en) Three-dimensional reconstruction method for dynamic target of static scene
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN115496864B (en) Model construction method, model reconstruction device, electronic equipment and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN116704123A (en) Three-dimensional reconstruction method combined with image subject extraction technology
CN116433843A (en) Three-dimensional model reconstruction method and device based on binocular vision reconstruction route
Rara et al. Model-based 3D shape recovery from single images of unknown pose and illumination using a small number of feature points
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
Huang et al. ES-Net: An efficient stereo matching network
Zhu et al. Occlusion-free scene recovery via neural radiance fields
CN111161348A (en) Monocular camera-based object pose estimation method, device and equipment
CN115115847B (en) Three-dimensional sparse reconstruction method and device and electronic device
CN116878524A (en) Dynamic SLAM dense map construction method based on pyramid L-K optical flow and multi-view geometric constraint
CN113610969B (en) Three-dimensional human body model generation method and device, electronic equipment and storage medium
CN111461141B (en) Equipment pose calculating method and device
CN113111741A (en) Assembly state identification method based on three-dimensional feature points
CN111508063A (en) Three-dimensional reconstruction method and system based on image

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination