CN106952338B - Three-dimensional reconstruction method and system based on deep learning and readable storage medium

Info

Publication number
CN106952338B
CN106952338B (granted on application CN201710150304.XA)
Authority
CN
China
Prior art keywords
picture
scoremap
vanishing point
vanishing
dimensional reconstruction
Prior art date
Legal status
Active
Application number
CN201710150304.XA
Other languages
Chinese (zh)
Other versions
CN106952338A (en)
Inventor
夏侯佐鑫
陈志国
丛林
李晓燕
Current Assignee
Hangzhou Yixian Advanced Technology Co ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN201710150304.XA priority Critical patent/CN106952338B/en
Publication of CN106952338A publication Critical patent/CN106952338A/en
Application granted granted Critical
Publication of CN106952338B publication Critical patent/CN106952338B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The embodiment of the invention provides a method and a system for three-dimensional reconstruction based on deep learning. A plurality of candidate structures are generated by estimating the vanishing points of an input picture; a target structure is determined from the plurality of structures based on structural features extracted by deep learning; and three-dimensional reconstruction is performed according to the target structure and the information of the vanishing points. Because the structural features are extracted by deep learning, they are accurate and robust, which improves the three-dimensional reconstruction result. Moreover, unlike the prior art, in which the same feature extraction operation must be performed on every structure separately, the structural features extracted by deep learning in the embodiment of the invention apply to all the structures at once, which improves the efficiency of three-dimensional reconstruction.

Description

Three-dimensional reconstruction method and system based on deep learning and readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of communication and computers, in particular to a method, a system and a readable storage medium for three-dimensional reconstruction based on deep learning.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the development of science and technology, reconstruction techniques have received more and more attention in the fields of computer vision and computer graphics. Over the last decades, many scholars have proposed methods for three-dimensional reconstruction from two-dimensional images. One example generates possible room scene layouts from ordering constraints, connectivity constraints, vanishing-point constraints, coplanarity constraints and boundary constraints, then selects an optimal result from the many candidate layouts by an energy-maximization method, and finally screens the result from the cube layouts using a structured learning method with a conditional random field; that is, the method screens scene layouts with a conditional random field.
For another example, three-dimensional reconstruction is performed as follows: A. acquire a single image with an ordinary camera, extract the straight-line features in the image, group the lines with an iterative EM (Expectation-Maximization) algorithm, solve for the vanishing points, and calibrate the camera from the vanishing-point information; B. perform support analysis with the classified lines to obtain the wall surface to which each pixel in the image belongs, yielding a preliminary scene structure; C. extract and construct an image scene graph from the preliminary scene structure through user interaction, optimizing the scene structure during the interaction to obtain the final three-dimensional scene structure information; D. register the obtained three-dimensional scene structures into a unified three-dimensional scene using single directed-line-segment features, so that the three-dimensional scene structure under a single camera is extended to a larger range.
Disclosure of Invention
However, the three-dimensional reconstruction methods in the prior art must manually design features for all possible room structures. On the one hand, manually designed features are parameter-sensitive, unstable and poorly robust, so the three-dimensional reconstruction result is poor. On the other hand, the number of structures generated from the vanishing points is large, and running feature extraction on each different structure makes the computational cost of feature extraction and structure screening very high, reducing the efficiency of three-dimensional reconstruction.
Therefore, a new method, a system and a readable storage medium for three-dimensional reconstruction based on deep learning are needed, which utilize deep learning to extract features, avoid artificial design of features, and improve the efficiency and effect of three-dimensional reconstruction.
In this context, embodiments of the present invention are intended to provide a method, a system and a readable storage medium for deep learning based three-dimensional reconstruction.
In a first aspect of embodiments of the present invention, there is provided a method for three-dimensional reconstruction based on deep learning, including: estimating vanishing points of the input pictures to generate a plurality of structures;
determining a target structure from the plurality of structures based on the deep learning extracted structural features;
and performing three-dimensional reconstruction according to the target structure and the information of the vanishing point.
In some embodiments, based on the foregoing scheme, the estimating a vanishing point of the input picture includes:
extracting a plurality of line segments of the picture by adopting an LSD algorithm;
a vanishing point is estimated from a plurality of intersections of the plurality of line segments.
In some embodiments, based on the foregoing scheme, the estimating a vanishing point from a plurality of intersections of the plurality of line segments includes:
and estimating a vanishing point from a plurality of intersection points formed by the plurality of line segments according to the included angle between each intersection point and the midpoint of each line segment and the length of each line segment.
In some embodiments, based on the foregoing, the method comprises:
voting is carried out on each intersection point through the following formula, and a vanishing point is estimated according to the voting value of each intersection point:
v = |L| · exp(−α/σ)

where v represents the vote value of the intersection, α represents the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| is the length of the line segment, and σ represents a constant.
In some embodiments, the number of vanishing points is 3, based on the foregoing scheme.
In some embodiments, based on the foregoing scheme, the deep learning comprises a full convolution neural network FCN.
In some embodiments, based on the foregoing scheme, the determining a target structure from the plurality of structures based on the structural features extracted by deep learning includes:
extracting the ScoreMap of the picture based on the FCN, wherein the ScoreMap is used for describing structural features;
determining a target structure from the plurality of structures according to the ScoreMap.
In some embodiments, based on the foregoing scheme, before determining the target structure from the plurality of structures according to the ScoreMap, the method further includes:
based on the structural features, negative sample suppression is performed on the plurality of structures.
In some embodiments, based on the foregoing solution, the performing negative example suppression on the plurality of structures based on the structural feature includes:
performing a binarization operation on the ScoreMap, then performing at least one of an erosion operation and a dilation operation, and performing negative-sample suppression on the plurality of structures by using the ScoreMap of the processed picture.
In some embodiments, based on the foregoing solution, the determining a target structure from the plurality of structures according to the ScoreMap includes:
calculating a score for each structure by the following formula, and determining a target structure from the scores:

score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )

where y_i is the map formed by the structural features of the generated structure, all_pixels denotes all pixels of the map, and p_i is the ScoreMap in the range [0, 1] output by the FCN.
In some embodiments, based on the foregoing solution, the performing three-dimensional reconstruction according to the target structure and the information of the vanishing point includes:
estimating camera parameters corresponding to the pictures according to the vanishing points;
and determining a three-dimensional structure corresponding to the picture according to the camera parameters, the vanishing point and the determined target structure, so as to realize three-dimensional reconstruction.
In some embodiments, based on the foregoing scheme, the estimating, according to the vanishing point, a camera parameter corresponding to the picture includes:
using the Manhattan assumption, the focal length is estimated by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2

where min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
In a second aspect of the present invention, there is provided a system for deep learning based three-dimensional reconstruction, comprising: the estimation module is used for estimating vanishing points of the input pictures and generating a plurality of structures; a determining module for determining a target structure from the plurality of structures based on the structure features extracted by deep learning; and the reconstruction module is used for performing three-dimensional reconstruction according to the target structure and the information of the vanishing point.
In some embodiments, based on the foregoing, the estimating module includes:
the extraction unit is used for extracting a plurality of line segments of the picture by adopting an LSD algorithm;
an estimating unit configured to estimate a vanishing point from a plurality of intersections composed of the plurality of line segments.
In some embodiments, based on the foregoing scheme, the estimating unit is configured to estimate vanishing points from a plurality of intersections of the plurality of line segments according to an angle between each intersection and a midpoint of each line segment and a length of each line segment.
In some embodiments, based on the foregoing scheme, the estimating unit is configured to vote for each intersection point through the following formula, and estimate the vanishing point according to a vote value of each intersection point:
v = |L| · exp(−α/σ)

where v represents the vote value of the intersection, α represents the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| is the length of the line segment, and σ represents a constant.
In some embodiments, the number of vanishing points is 3, based on the foregoing scheme.
In some embodiments, based on the foregoing scheme, the deep learning comprises a full convolution neural network FCN.
In some embodiments, based on the foregoing, the determining module includes:
an extracting unit, configured to extract ScoreMap of the picture based on the FCN; wherein the Scoremap is used for describing structural features;
a determining unit, configured to determine a target structure from the multiple structures according to the ScoreMap.
In some embodiments, based on the foregoing scheme, the determining module further includes:
a suppressing unit configured to perform negative-sample suppression on the plurality of structures based on the structure feature before the determining unit determines the target structure.
In some embodiments, based on the foregoing scheme, the suppressing unit is configured to perform a binarization operation on the ScoreMap, then perform at least one of an erosion operation and a dilation operation, and perform negative-sample suppression on the plurality of structures by using the ScoreMap of the processed picture.
In some embodiments, based on the foregoing scheme, the determining unit is configured to calculate a score of each structure by the following formula, and determine the target structure according to the score of each structure:
score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )

where y_i is the map formed by the structural features of the generated structure, all_pixels denotes all pixels of the map, and p_i is the ScoreMap in the range [0, 1] output by the FCN.
In some embodiments, based on the foregoing solution, the reconstruction module includes:
the estimation unit is used for estimating camera parameters corresponding to the pictures according to the vanishing points;
and the reconstruction unit is used for determining a three-dimensional structure corresponding to the picture according to the camera parameters, the vanishing point and the determined target structure so as to realize three-dimensional reconstruction.
In some embodiments, based on the foregoing scheme, the estimating unit is configured to estimate the focal length, using the Manhattan assumption, by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2

where min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
In a third aspect of embodiments of the present invention, there is provided a readable storage medium having stored thereon a program which, when executed by a processor, performs the method as in the first aspect.
According to the method and the system for three-dimensional reconstruction based on deep learning, disclosed by the embodiment of the invention, the structural features are extracted by utilizing the deep learning, so that the extracted structural features are accurate and good in robustness, and further the effect of three-dimensional reconstruction is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically shows a flow chart of a method of deep learning based three-dimensional reconstruction according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a location of a vanishing point and a plane of a picture border according to an embodiment of the invention;
FIG. 3 schematically shows a structural diagram obtained according to FIG. 2;
fig. 4 schematically illustrates a diagram after binarization and dilation operations of ScoreMap according to an embodiment of the present invention;
fig. 5 schematically shows a flow chart of a method of estimating a vanishing point of an input picture according to an embodiment of the present invention;
FIG. 6 schematically illustrates a planar positional relationship of an intersection point and a line segment, according to an embodiment of the present invention;
FIG. 7 schematically illustrates a FCN network architecture diagram according to an embodiment of the present invention;
FIG. 8 schematically illustrates a system block diagram of a deep learning based three-dimensional reconstruction of an embodiment of the present invention;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a method, a system and a readable storage medium for three-dimensional reconstruction based on deep learning are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of The Invention
The inventors have found that the three-dimensional reconstruction methods in the prior art require manually designed features for all possible room structures, which reduces reconstruction efficiency and yields a poor three-dimensional reconstruction result.
Therefore, the embodiment of the invention provides a method, a system and a readable storage medium for three-dimensional reconstruction based on deep learning, in the process of three-dimensional reconstruction, a plurality of structures are generated by estimating vanishing points of an input picture, a target structure is determined from the structures based on structural features extracted by the deep learning, and the three-dimensional reconstruction is carried out according to the target structure and the information of the vanishing points, so that feature extraction by the deep learning is realized, artificial design of features is avoided, and the efficiency and the effect of the three-dimensional reconstruction are improved.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
The application scenario is an exemplary application scenario applicable to the present invention, and it is to be understood that the application scenario described herein is only exemplary and not limiting.
In the application scenario, the input is a 2D picture, and a result of 3D reconstruction of the 2D picture needs to be output.
The 2D picture may be an indoor scene picture, an outdoor scene picture, or a 2D picture corresponding to a 3D scene such as an in-package scene.
Exemplary method
In the following, in connection with the above application scenarios, a method of three-dimensional reconstruction based on deep learning according to an exemplary embodiment of the present invention is described with reference to fig. 1 to 7. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Fig. 1 schematically shows a flow chart of a method of deep learning based three-dimensional reconstruction according to an embodiment of the present invention. The method may be applied to the application scenarios described above, but the application scenarios of the method are not limited thereto.
As shown in fig. 1, in step S110, vanishing points of the input picture are estimated, and a plurality of structures are generated.
A Vanishing Point (VP) is the point on the 2D picture at which lines that are parallel in Euclidean space intersect after perspective projection through the camera.
According to some embodiments, the 2D picture may be an indoor scene picture, an outdoor scene picture, or a 2D picture corresponding to a 3D scene such as an in-package scene.
Note that the estimation of the vanishing point will be described later in conjunction with fig. 5.
According to an example embodiment, the number of the vanishing points may be 3, but the invention is not limited thereto, e.g. the number of the vanishing points may also be more.
After the vanishing points are estimated, a plurality of structures can be constructed from rays emitted from the vanishing points.
Fig. 2 is a schematic diagram illustrating a planar position of a vanishing point and a picture frame according to an embodiment of the present invention.
As shown in fig. 2, assume the three estimated vanishing points are vp0, vp1 and vp2, and the border of the picture is already determined. Two rays are emitted at arbitrary angles from vp0 and two rays at arbitrary angles from vp1, subject to the constraint that the quadrangle formed by connecting the four resulting intersection points contains vp2. Lines are then drawn from vp2 through the four intersection points and extended to the border of the picture, and these extended lines separate the walls from the ground. Fig. 3 schematically shows a structural diagram obtained according to fig. 2.
It should be noted that a plurality of structural diagrams can be obtained from the plane positions of the vanishing points and the picture border in fig. 2; for example, different structural diagrams are obtained by changing the angles of the rays emitted from vp0 and vp1, and the structural diagram in fig. 3 is only one example.
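To make this enumeration concrete, the following minimal Python sketch samples ray angles from vp0 and vp1 and keeps the quadrangles that contain vp2. All function and parameter names here (line_intersection, candidate_structures, n_angles) are our own illustration, not taken from the patent, and the uniform angle sampling is an assumption.

import itertools
import numpy as np

def line_intersection(p, theta1, q, theta2):
    # Intersect the line through p at angle theta1 with the line through q at angle theta2.
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    A = np.column_stack([d1, -d2])
    if abs(np.linalg.det(A)) < 1e-9:
        return None  # parallel lines never intersect
    t = np.linalg.solve(A, np.asarray(q, float) - np.asarray(p, float))
    return np.asarray(p, float) + t[0] * d1

def contains_point(quad, pt):
    # Convexity-based containment test: pt is inside when all cross products share a sign.
    signs = []
    for i in range(4):
        a, b = quad[i], quad[(i + 1) % 4]
        cross = (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])
        signs.append(np.sign(cross))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

def candidate_structures(vp0, vp1, vp2, n_angles=8):
    # Two rays from vp0 and two from vp1 give four intersections forming a quadrangle;
    # keep the quadrangle as a candidate only when it contains vp2.
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    for a0, a1 in itertools.combinations(angles, 2):
        for b0, b1 in itertools.combinations(angles, 2):
            corners = [line_intersection(vp0, a, vp1, b)
                       for a, b in ((a0, b0), (a0, b1), (a1, b1), (a1, b0))]
            if any(c is None for c in corners):
                continue
            quad = np.array(corners)
            if contains_point(quad, np.asarray(vp2, float)):
                yield quad  # candidate middle-wall quadrangle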
In step S120, a target structure is determined from the above-described plurality of structures based on the structural features extracted by the deep learning.
According to some embodiments, the deep learning may include, but is not limited to, a full convolutional neural network (FCN).
A Fully Convolutional Network (FCN) is a deep neural network in which every linear mapping layer is a convolution; it contains no fully connected layers.
According to some embodiments, in the process of extracting the structural features, the ScoreMap of the picture may be extracted based on the FCN, the ScoreMap being used to describe the structural features. For example, the ScoreMap may represent the map composed of the probability of each pixel in the picture being a structural feature, each probability value lying between 0 and 1.
It should be noted that, for an indoor scene picture, the structural feature may be a room structure line, and the ScoreMap may be used to represent a graph formed by probabilities that each pixel in the picture becomes the room structure line.
According to some embodiments, after extracting the ScoreMap, a target structure may be determined from the plurality of structures according to the ScoreMap. In determining the target structure from the plurality of structures, the score for each structure may be calculated by the following formula:
score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )          (1)

where y_i is the map formed by the structural features of a generated structure, for example the value of each pixel with respect to the room structure lines in a generated structure diagram; all_pixels denotes all pixels of the map; and p_i is the ScoreMap in the range [0, 1] output by the FCN, i.e. the map composed of the probabilities, extracted by the FCN, of each pixel being a room structure line. The score of each structure is thus the dot product of the pixel values of the structure diagram with the per-pixel probabilities of the ScoreMap output by the FCN, minus a penalty.

In formula (1), y_i · (1 − p_i) is the penalty term: among the plurality of structures, it suppresses erroneous structures that contain more structural features than the ScoreMap output by the FCN supports. For example, when a pixel on a room structure line in the structure diagram has value 1 while the corresponding pixel in the ScoreMap output by the FCN has value 0, then y_i · (1 − p_i) = 1 × (1 − 0) = 1 and y_i · p_i = 1 × 0 = 0, so that pixel lowers the score. When the structure diagram contains more room-structure-line pixels than the ScoreMap output by the FCN, its score decreases, and the score is highest only when the room-structure-line pixels of the structure diagram coincide with the room-structure-line pixels of the ScoreMap output by the FCN.
After the score for each structure is calculated, the target structure is determined based on the score for each structure. In general, the structure with the highest score is selected as the target structure.
In the above embodiment, the score of each structure is calculated by formula (1); the score measures the similarity between the structure and the ScoreMap output by the FCN, so the target structure is determined automatically, which improves both the accuracy and the efficiency of determining the target structure.
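Assuming the candidate structure map and the FCN output are available as H × W NumPy arrays, formula (1) reduces to a few array operations. The names y, p, structure_score and pick_target_structure below are illustrative, not from the patent.

import numpy as np

def structure_score(y, p):
    # Formula (1): y is the binary map of the candidate's structure lines (H x W, 0/1),
    # p is the ScoreMap output by the FCN (H x W, values in [0, 1]).
    # Line pixels landing on high-probability ScoreMap pixels add to the score;
    # line pixels landing on low-probability pixels are penalized.
    y = y.astype(np.float64)
    return float(np.sum(y * p - y * (1.0 - p)))

def pick_target_structure(structures, p):
    # Return the candidate structure map with the highest score.
    return max(structures, key=lambda y: structure_score(y, p))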
In step S130, three-dimensional reconstruction is performed based on the target structure and the vanishing point information.
According to some embodiments, when performing three-dimensional reconstruction, a camera parameter corresponding to the picture may be estimated according to the vanishing point, and then a three-dimensional structure corresponding to the picture may be determined according to the camera parameter, the vanishing point, and the determined target structure, so as to implement three-dimensional reconstruction.
When estimating the camera parameters corresponding to the picture, the Manhattan assumption is used and the focal length is estimated by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2          (2)

where min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
It should be noted that, ignoring the offsets of the camera in the x and y directions, the corresponding camera parameter matrix K can be obtained from f:
K = diag(f, f, 1), i.e. the 3 × 3 matrix with f, f and 1 on the diagonal and zeros elsewhere.
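A minimal sketch of this calibration step, under the assumption that the vanishing-point coordinates are expressed relative to the principal point (function names are our own): because E is quadratic in the square of f_k, its minimizer has a closed form.

import itertools
import numpy as np

def estimate_focal_length(vps):
    # Minimize E(f) = sum over pairs of (v_ix*v_jx + v_iy*v_jy + f^2)^2, the
    # orthogonality constraint on vanishing points under the Manhattan assumption.
    # vps is a list of three (x, y) vanishing points.
    dots = [vps[i][0] * vps[j][0] + vps[i][1] * vps[j][1]
            for i, j in itertools.combinations(range(3), 2)]
    f_squared = -np.mean(dots)  # stationary point of the quadratic in f^2
    if f_squared <= 0:
        raise ValueError("vanishing points inconsistent with an orthogonal triple")
    return float(np.sqrt(f_squared))

def intrinsics(f):
    # Camera parameter matrix K, ignoring the offsets in the x and y directions.
    return np.array([[f, 0.0, 0.0],
                     [0.0, f, 0.0],
                     [0.0, 0.0, 1.0]])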
After the internal parameters of the camera are obtained, the information of the vanishing points and the target structure are combined, so that the corresponding boundary points of each wall surface can be obtained, the three-dimensional structure of each wall surface is determined, and three-dimensional reconstruction is realized.
Furthermore, each wall surface can be warped using the correspondence between the obtained three-dimensional coordinates and the two-dimensional coordinates of the picture, so as to obtain the three-dimensional map of the picture.
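The per-wall warp can be sketched with an OpenCV homography as below; the corner ordering, the output size and the function name are assumptions for illustration.

import cv2
import numpy as np

def warp_wall(image, wall_quad_2d, out_size=(256, 256)):
    # Warp one wall surface to a fronto-parallel texture.
    # wall_quad_2d: 4 x 2 array of the wall's corners in the picture, ordered
    # top-left, top-right, bottom-right, bottom-left.
    w, h = out_size
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], np.float32)
    H = cv2.getPerspectiveTransform(wall_quad_2d.astype(np.float32), dst)
    return cv2.warpPerspective(image, H, out_size)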
In the embodiment of the invention, deep learning is used to extract the structural features, so the extracted features are accurate and robust, which improves the three-dimensional reconstruction result. Unlike the prior art, in which the same feature extraction operation must be performed on every structure, the structural features extracted by deep learning apply to all the structures, which also improves the efficiency of three-dimensional reconstruction.
According to some embodiments, after the ScoreMap is extracted based on the FCN, negative sample suppression may be further performed on the plurality of structures according to the structural feature, so as to suppress a part of the structures.
When the negative sample suppression is performed, the ScoreMap may be binarized, and then at least one of erosion and dilation may be performed, so that the negative sample suppression may be performed on the plurality of structures by using the ScoreMap of the picture after the operation.
Note that the binarization operation on the ScoreMap converts each pixel value of the picture into one of two values, 0 or 1. Performing at least one of erosion and dilation then makes the differences between pixel values more pronounced, so that the structural features in the picture become clearer. The plurality of structures are then screened with the ScoreMap after these operations, and the partially erroneous structures are eliminated.
Fig. 4 schematically shows a diagram after binarization and dilation operations of ScoreMap according to an embodiment of the invention.
As shown in fig. 4, assume a pixel is white when its value is 1 and black when its value is 0; binarizing and dilating the ScoreMap yields the diagram of fig. 4. When negative-sample suppression is performed, the corners of a correct structure necessarily fall within the white pixel area, that is, the pixel value at each corner is necessarily 1, so the generated structures whose corners fall on pixels of value 0 can be suppressed.
When suppressing part of the structures, different degrees of suppression can be set: for example, only the structures in which the pixel values corresponding to all corners are 0 may be suppressed, but the invention is not limited thereto; for example, a structure may be suppressed as soon as the pixel value corresponding to a single corner is 0.
In the embodiment of the invention, the ScoreMap is binarized and then at least one of erosion and dilation is applied, and the processed ScoreMap of the picture is used to perform negative-sample suppression on the plurality of structures. This eliminates part of the erroneous structures and reduces the number of structures that need to be scored, which preserves the three-dimensional reconstruction quality while improving its efficiency.
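The following OpenCV sketch illustrates this suppression step. The 0.5 binarization threshold, the 5 × 5 kernel and the function names are illustrative choices, not values from the patent.

import cv2
import numpy as np

def suppress_negative_samples(scoremap, structures_corners, keep_if_any=False):
    # scoremap: H x W float array in [0, 1] output by the FCN.
    # structures_corners: list of (N, 2) integer arrays of (x, y) corner positions.
    # keep_if_any: looser suppression degree -- keep a structure when at least
    # one corner lands on a nonzero pixel of the processed map.
    binary = (scoremap > 0.5).astype(np.uint8)       # binarization
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(binary, kernel, iterations=1)  # dilation widens structure lines

    kept = []
    for corners in structures_corners:
        hits = [mask[y, x] > 0 for x, y in corners
                if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]]
        ok = any(hits) if keep_if_any else (len(hits) > 0 and all(hits))
        if ok:
            kept.append(corners)
    return kept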
The following describes a process of estimating a vanishing point of an input picture in detail with reference to specific embodiments.
Fig. 5 schematically shows a flowchart of a method of estimating a vanishing point of an input picture according to an embodiment of the present invention.
As shown in fig. 5, in S510, a plurality of line segments of the picture are extracted using the LSD algorithm.
According to some embodiments, when the LSD algorithm is adopted, first, points with similar gradient directions are connected into a region with a uniform orientation through iteration, and then a minimum rectangular structure capable of surrounding the region is found, so that a plurality of line segments of an input picture are extracted.
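In OpenCV builds that ship the LSD detector (cv2.createLineSegmentDetector; absent from some releases for licensing reasons), this step can be sketched as follows. The min_length filter is our own illustrative addition.

import cv2

def extract_line_segments(image_path, min_length=20.0):
    # Detect line segments with LSD and drop very short ones, which carry
    # little evidence about vanishing points.
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    lsd = cv2.createLineSegmentDetector()
    lines, _, _, _ = lsd.detect(gray)  # (N, 1, 4) array of x1, y1, x2, y2
    if lines is None:
        return []
    segments = []
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        if ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5 >= min_length:
            segments.append((x1, y1, x2, y2))
    return segments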
In S520, a vanishing point is estimated from a plurality of intersections composed of the plurality of line segments.
According to some embodiments, a vanishing point is estimated from a plurality of intersections of the plurality of line segments based on an angle of each intersection to a midpoint of each line segment and a length of each line segment.
According to some embodiments, a vote is cast for each intersection point, and a vanishing point is estimated from the vote value for said each intersection point by:
v = |L| · exp(−α/σ)          (3)

where v denotes the vote value of the intersection, α denotes the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| denotes the length of the line segment, and σ denotes a constant for adjusting the weight, generally taken as 0.1.
According to formula (3), a smaller included angle and a longer line segment correspond to a larger vote value.
Fig. 6 schematically shows a planar positional relationship diagram of an intersection point and a line segment according to an embodiment of the present invention. As shown in fig. 6, Q is the midpoint of the line segment L, α represents the angle between the line PQ and the line segment L, and |L| is the length of the line segment.
After the vote value of each intersection is obtained, the group of 3 intersections whose directions can be mutually orthogonal and whose vote values are the highest is selected; these 3 intersections are the estimated vanishing points.
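A minimal sketch of the voting step of formula (3); the function names and the default sigma are illustrative.

import numpy as np

def vote(intersection, seg, sigma=0.1):
    # Vote of line segment seg for the candidate vanishing point `intersection`
    # per formula (3): v = |L| * exp(-alpha / sigma), where alpha is the angle
    # between the segment direction and the line joining the segment midpoint
    # to the intersection.
    x1, y1, x2, y2 = seg
    mid = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    d_seg = np.array([x2 - x1, y2 - y1], dtype=float)
    d_vp = np.asarray(intersection, float) - mid
    length = np.linalg.norm(d_seg)
    if length == 0 or np.linalg.norm(d_vp) == 0:
        return 0.0
    cos_a = abs(np.dot(d_seg, d_vp)) / (length * np.linalg.norm(d_vp))
    alpha = np.arccos(np.clip(cos_a, 0.0, 1.0))  # angle in [0, pi/2]
    return float(length * np.exp(-alpha / sigma))

def total_vote(intersection, segments, sigma=0.1):
    # Accumulate the votes of all extracted segments for one intersection.
    return sum(vote(intersection, s, sigma) for s in segments)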
The following describes in detail a method for extracting ScoreMap of the above-mentioned picture based on the FCN, with reference to a specific application scenario.
When the picture is an indoor scene picture, the room structure lines in the picture are extracted in the process of extracting the ScoreMap. Because the pixels on room structure lines occupy only a small proportion of the picture, a binary classification would suffer from unbalanced training data; the invention therefore adopts a 6-class method, dividing the picture into 6 channels (ground, left wall, middle wall, right wall, ceiling and room structure line), and then extracts the room-structure-line channel. When the structure lines are annotated, the line width is expanded by 8-neighborhood dilation, which makes the structure lines clearer.
Fig. 7 schematically illustrates an FCN network architecture diagram according to an embodiment of the present invention.
As shown in fig. 7, the network includes a downsampling portion and an upsampling portion. The downsampling portion adopts the structure of GoogLeNet, but only uses the layers from conv1 to the dropout layer, without GoogLeNet's fully connected layers.
The upsampling portion first uses a 1 × 1 convolution to map the number of channels to 6, corresponding to the 6 classes; the parameters of this convolution are learned. Then 32× upsampling is performed directly with bilinear interpolation, restoring the feature map to the same scale as the input. Note that since the parameters are learned in the 1 × 1 convolution, no parameters are learned during the bilinear interpolation. Because pixel padding is applied during downsampling, the map formed by the structure lines must be cropped back to the same size as the input by a crop layer after upsampling. Finally, the result is mapped into [0, 1] by a Softmax layer, giving the ScoreMap of the room structure.
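A sketch of this upsampling head in PyTorch. The backbone is stubbed out with random features, and the channel count (1024, as at the end of a GoogLeNet-style backbone), the layer names and the sizes are assumptions consistent with the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LayoutHead(nn.Module):
    # Maps backbone features to a 6-channel ScoreMap: ground, left wall,
    # middle wall, right wall, ceiling and room structure line.
    def __init__(self, in_channels, num_classes=6):
        super().__init__()
        # 1x1 convolution: the only learned part of the upsampling path.
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features, out_size):
        x = self.classifier(features)
        # Fixed bilinear 32x upsampling; no parameters are learned here.
        x = F.interpolate(x, scale_factor=32, mode="bilinear", align_corners=False)
        # Crop back to the input size (padding during downsampling enlarges the map).
        x = x[:, :, :out_size[0], :out_size[1]]
        # Softmax over the 6 channels yields per-pixel probabilities in [0, 1].
        return F.softmax(x, dim=1)

# Usage with a stand-in for the stride-32 backbone output:
head = LayoutHead(in_channels=1024)
feats = torch.randn(1, 1024, 16, 16)
scoremap = head(feats, out_size=(500, 500))  # shape (1, 6, 500, 500)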
It should be noted that, in the above embodiment, the input is a two-dimensional RGB picture scaled to 500 × 500 × 3 (the three R, G, B color channels) and the output is a 500 × 500 × 6 ScoreMap (the 6 channels of ground, left wall, middle wall, right wall, ceiling and room structure line). In the process of extracting the ScoreMap, the input size may be adjusted accordingly, and sizes with different aspect ratios may also be used; the size is preferably between 300-.
In the embodiment of the invention, the ScoreMap is extracted with the FCN, so the extracted structural features are accurate and robust, which improves the three-dimensional reconstruction result; unlike the prior art, the same feature extraction operation does not need to be performed on every structure, which further improves efficiency.
Exemplary System
Having described the method of an exemplary embodiment of the present invention, a system for deep learning based three-dimensional reconstruction of an exemplary embodiment of the present invention will next be described with reference to fig. 8.
Fig. 8 schematically shows a system block diagram of deep learning based three-dimensional reconstruction according to an embodiment of the present invention. The system 800 may implement the corresponding methods described above. The system 800 is described below, and details corresponding to the foregoing method are not repeated.
As shown in fig. 8, the apparatus 800 includes an estimation module 810, a determination module 820, and a reconstruction module 830.
An estimating module 810, configured to estimate vanishing points of the input pictures, and generate a plurality of structures;
a determining module 820 for determining a target structure from the plurality of structures based on the deep learning extracted structural features;
and a reconstruction module 830, configured to perform three-dimensional reconstruction according to the target structure and the information of the vanishing point.
According to some embodiments, the estimation module 810 comprises:
an extracting unit 8102, configured to extract a plurality of line segments of the picture by using an LSD algorithm.
An estimating unit 8104 configured to estimate a vanishing point from a plurality of intersection points formed by the plurality of line segments.
According to some embodiments, the estimating unit 8104 is configured to estimate vanishing points from a plurality of intersections of the plurality of line segments according to an angle between each intersection and a midpoint of each line segment and a length of each line segment.
According to some embodiments, the estimating unit 8104 is configured to vote for each intersection point by the following formula, and estimate the vanishing point according to the vote value of each intersection point:
v = |L| · exp(−α/σ)

where v represents the vote value of the intersection, α represents the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| is the length of the line segment, and σ represents a constant.
According to some embodiments, the number of vanishing points is 3.
According to some embodiments, the deep learning comprises a full convolution neural network, FCN.
According to some embodiments, the determining module 820 comprises:
an extracting unit 8202, configured to extract ScoreMap of the picture based on the FCN; wherein the Scoremap is used for describing structural features;
a determining unit 8204, configured to determine a target structure from the plurality of structures according to the ScoreMap.
According to some embodiments, the determining module 820 further comprises:
a suppressing unit 8206 for performing negative sample suppression on the plurality of structures based on the structure feature before the determination unit determines the target structure.
According to some embodiments, the suppression unit 8206 is configured to perform binarization operation on the ScoreMap, and then perform at least one of erosion and dilation operation, and perform negative sample suppression on the plurality of structures by using the ScoreMap of the operated picture.
According to some embodiments, the determining unit 8204 is configured to calculate a score for each structure by the following formula, and determine a target structure according to the score for each structure:
score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )

where y_i is the map formed by the structural features of the generated structure, all_pixels denotes all pixels of the map, and p_i is the ScoreMap in the range [0, 1] output by the FCN.
According to some embodiments, the reconstruction module 830 comprises:
an estimating unit 8302, configured to estimate, according to the vanishing point, a camera parameter corresponding to the picture;
a reconstruction unit 8304, configured to determine a three-dimensional structure corresponding to the picture according to the camera parameter, the vanishing point, and the determined target structure, so as to implement three-dimensional reconstruction.
According to some embodiments, the estimation unit 8302 is configured to estimate the focal length, using the Manhattan assumption, by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2

where min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
In the embodiment of the invention, deep learning is used to extract the structural features, so the extracted features are accurate and robust, which improves the three-dimensional reconstruction result. Unlike the prior art, in which the same feature extraction operation must be performed on every structure, the structural features extracted by deep learning apply to all the structures, which also improves the efficiency of three-dimensional reconstruction.
Exemplary device
Having described the method and system of an exemplary embodiment of the present invention, an apparatus for deep learning based three-dimensional reconstruction according to another exemplary embodiment of the present invention is described next.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module" or "system."
In some possible embodiments, an apparatus for deep learning based three-dimensional reconstruction according to the present invention may comprise at least one processing unit and at least one storage unit, wherein the storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the method for deep learning based three-dimensional reconstruction according to various exemplary embodiments of the present invention described in the section "Exemplary method" above. For example, the processing unit may perform step S110 as shown in fig. 1: estimating vanishing points of the input picture to generate a plurality of structures; step S120: determining a target structure from the plurality of structures based on the structural features extracted by deep learning; and step S130: performing three-dimensional reconstruction according to the target structure and the information of the vanishing points.
Exemplary program product
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of the method for deep learning based three-dimensional reconstruction according to various exemplary embodiments of the present invention described in the section "Exemplary method" above, when the program product is run on the terminal device. For example, the terminal device may perform step S110 as shown in fig. 1: estimating vanishing points of the input picture to generate a plurality of structures; step S120: determining a target structure from the plurality of structures based on the structural features extracted by deep learning; and step S130: performing three-dimensional reconstruction according to the target structure and the information of the vanishing points.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
It should be noted that although in the above detailed description several modules or units of the system of deep learning based three-dimensional reconstruction are mentioned, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments; the division into aspects is for convenience of description only and does not mean that the features in these aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (15)

1. A method of deep learning based three-dimensional reconstruction, comprising:
estimating vanishing points of the input pictures to generate a plurality of structures;
extracting the ScoreMap of the picture based on a fully convolutional network FCN, performing a binarization operation on the ScoreMap, then performing at least one of an erosion operation and a dilation operation, and performing negative-sample suppression on the plurality of structures by using the ScoreMap of the operated picture; wherein the ScoreMap is used for describing structural features;
calculating a score of each structure according to the Scoremap through the following formula, and determining a target structure from the plurality of structures subjected to negative sample suppression according to the score of each structure:
score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )

wherein y_i is the map formed by the structural features of the generated structure, all_pixels denotes all pixels of the picture, and p_i is the ScoreMap in the range [0, 1] output by the fully convolutional network FCN;
and performing three-dimensional reconstruction according to the target structure and the information of the vanishing point.
2. The method of claim 1, wherein the estimating a vanishing point of the input picture comprises:
extracting a plurality of line segments of the picture by adopting an LSD algorithm;
a vanishing point is estimated from a plurality of intersections of the plurality of line segments.
3. The method of claim 2, wherein said estimating a vanishing point from a plurality of intersections of said plurality of line segments comprises:
and estimating a vanishing point from a plurality of intersection points formed by the plurality of line segments according to the included angle between each intersection point and the midpoint of each line segment and the length of each line segment.
4. The method of claim 3, wherein the method comprises:
voting is carried out on each intersection point through the following formula, and a vanishing point is estimated according to the voting value of each intersection point:
v = |L| · exp(−α/σ)

wherein v represents the vote value of the intersection, α represents the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| is the length of the line segment, and σ represents a constant.
5. The method of any one of claims 1 to 4, wherein the number of vanishing points is 3.
6. The method of claim 1, wherein the three-dimensional reconstruction from the target structure and the vanishing point information comprises:
estimating camera parameters corresponding to the pictures according to the vanishing points;
and determining a three-dimensional structure corresponding to the picture according to the camera parameters, the vanishing point and the determined target structure, so as to realize three-dimensional reconstruction.
7. The method of claim 6, wherein the estimating camera parameters corresponding to the picture according to the vanishing point comprises:
using the Manhattan assumption, the focal length is estimated by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2

wherein min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
8. A system for deep learning based three-dimensional reconstruction, comprising:
the estimation module is used for estimating vanishing points of the input pictures and generating a plurality of structures;
the suppression module is used for extracting the ScoreMap of the picture based on a fully convolutional network FCN, performing a binarization operation on the ScoreMap, then performing at least one of an erosion operation and a dilation operation, and performing negative-sample suppression on the plurality of structures by using the ScoreMap of the operated picture; wherein the ScoreMap is used for describing structural features;
a determining module, configured to calculate a score of each structure according to the ScoreMap by using the following formula, and determine a target structure from the plurality of structures subjected to negative sample suppression according to the score of each structure:
score = Σ over all pixels i of ( y_i · p_i − y_i · (1 − p_i) )

wherein y_i is the map formed by the structural features of the generated structure, all_pixels denotes all pixels of the picture, and p_i is the ScoreMap in the range [0, 1] output by the fully convolutional network FCN;
and the reconstruction module is used for performing three-dimensional reconstruction according to the target structure and the information of the vanishing point.
9. The system of claim 8, wherein the estimation module comprises:
the extraction unit is used for extracting a plurality of line segments of the picture by adopting an LSD algorithm;
an estimating unit configured to estimate a vanishing point from a plurality of intersections composed of the plurality of line segments.
10. The system of claim 9, wherein the estimation unit is configured to estimate vanishing points from a plurality of intersections of the plurality of line segments based on an angle of each intersection to a midpoint of each line segment and a length of each line segment.
11. The system of claim 10, wherein the estimation unit is configured to vote for each intersection point by the following formula, and estimate the vanishing point based on the vote value for each intersection point:
v = |L| · exp(−α/σ)

wherein v represents the vote value of the intersection, α represents the angle between the line segment L and the line joining the intersection P to the midpoint of L, |L| is the length of the line segment, and σ represents a constant.
12. The system of any one of claims 8 to 11, wherein the number of vanishing points is 3.
13. The system of claim 8, wherein the reconstruction module comprises:
the estimation unit is used for estimating camera parameters corresponding to the pictures according to the vanishing points;
and the reconstruction unit is used for determining a three-dimensional structure corresponding to the picture according to the camera parameters, the vanishing point and the determined target structure so as to realize three-dimensional reconstruction.
14. The system of claim 13, wherein the estimation unit is configured to estimate the focal length, using the Manhattan assumption, by minimizing the following loss function:

min over f_k of E(f_k) = Σ over pairs i ≠ j of ( v_ix · v_jx + v_iy · v_jy + f_k^2 )^2

wherein min E(f_k) denotes the minimization operation, E is the optimization objective function, f_k is the variable to be optimized, v_ix and v_iy are the previously calculated x- and y-coordinates of the i-th vanishing point, and f_k represents the focal length value.
15. A readable storage medium, on which a program is stored which, when executed by a processor, carries out the method according to any one of claims 1 to 7.
CN201710150304.XA 2017-03-14 2017-03-14 Three-dimensional reconstruction method and system based on deep learning and readable storage medium Active CN106952338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710150304.XA CN106952338B (en) 2017-03-14 2017-03-14 Three-dimensional reconstruction method and system based on deep learning and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710150304.XA CN106952338B (en) 2017-03-14 2017-03-14 Three-dimensional reconstruction method and system based on deep learning and readable storage medium

Publications (2)

Publication Number Publication Date
CN106952338A CN106952338A (en) 2017-07-14
CN106952338B (en) 2020-08-14

Family

ID=59467450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710150304.XA Active CN106952338B (en) 2017-03-14 2017-03-14 Three-dimensional reconstruction method and system based on deep learning and readable storage medium

Country Status (1)

Country Link
CN (1) CN106952338B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369204B (en) * 2017-07-27 2020-01-07 北京航空航天大学 Method for recovering basic three-dimensional structure of scene from single photo
US10607135B2 (en) 2017-10-19 2020-03-31 General Electric Company Training an auto-encoder on a single class
US10460440B2 (en) 2017-10-24 2019-10-29 General Electric Company Deep convolutional neural network with self-transfer learning
CN108305327A (en) * 2017-11-22 2018-07-20 北京居然设计家家居连锁集团有限公司 A kind of image rendering method
CN109062211B (en) * 2018-08-10 2021-12-10 远形时空科技(北京)有限公司 Method, device and system for identifying adjacent space based on SLAM and storage medium
CN111199574A (en) * 2018-11-16 2020-05-26 青岛海信激光显示股份有限公司 Holographic image generation method and equipment
US10839606B2 (en) 2018-12-28 2020-11-17 National Tsing Hua University Indoor scene structural estimation system and estimation method thereof based on deep learning network
CN111325143A (en) * 2020-02-18 2020-06-23 西北工业大学 Underwater target identification method under unbalanced data set condition
CN111583417B (en) * 2020-05-12 2022-05-03 北京航空航天大学 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium
CN111968245B (en) * 2020-07-07 2022-03-01 北京城市网邻信息技术有限公司 Three-dimensional space marking line display method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413347A (en) * 2013-07-05 2013-11-27 南京邮电大学 Extraction method of monocular image depth map based on foreground and background fusion
CN103413352A (en) * 2013-07-29 2013-11-27 西北工业大学 Scene three-dimensional reconstruction method based on RGBD multi-sensor fusion
CN106327532A (en) * 2016-08-31 2017-01-11 北京天睿空间科技股份有限公司 Three-dimensional registering method for single image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9008460B2 (en) * 2012-04-27 2015-04-14 Adobe Systems Incorporated Automatic adjustment of images using a homography

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413347A (en) * 2013-07-05 2013-11-27 南京邮电大学 Extraction method of monocular image depth map based on foreground and background fusion
CN103413352A (en) * 2013-07-29 2013-11-27 西北工业大学 Scene three-dimensional reconstruction method based on RGBD multi-sensor fusion
CN106327532A (en) * 2016-08-31 2017-01-11 北京天睿空间科技股份有限公司 Three-dimensional registering method for single image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Arun Mallya et al., "Learning Informative Edge Maps for Indoor Scene Layout Prediction," 2015 IEEE International Conference on Computer Vision, 2015, pp. 936-944. *
Varsha Hedau et al., "Recovering the Spatial Layout of Cluttered Rooms," 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 1849-1856. *

Also Published As

Publication number Publication date
CN106952338A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN106952338B (en) Three-dimensional reconstruction method and system based on deep learning and readable storage medium
US10235771B2 (en) Methods and systems of performing object pose estimation
US9189862B2 (en) Outline approximation for point cloud of building
CN110930454A (en) Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
CN107798725B (en) Android-based two-dimensional house type identification and three-dimensional presentation method
US7409108B2 (en) Method and system for hybrid rigid registration of 2D/3D medical images
US10726599B2 (en) Realistic augmentation of images and videos with graphics
CN112348815A (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN109191554B (en) Super-resolution image reconstruction method, device, terminal and storage medium
US11551388B2 (en) Image modification using detected symmetry
US11176425B2 (en) Joint detection and description systems and methods
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112868021A (en) Letter detection device, method and system
CN113450396A (en) Three-dimensional/two-dimensional image registration method and device based on bone features
CN114202632A (en) Grid linear structure recovery method and device, electronic equipment and storage medium
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN116645592A (en) Crack detection method based on image processing and storage medium
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
CN115393423A (en) Target detection method and device
EP4075381B1 (en) Image processing method and system
CN117296078A (en) Optical flow techniques and systems for accurately identifying and tracking moving objects
CN113920525A (en) Text correction method, device, equipment and storage medium
CN111915599A (en) Flame significance detection method based on boundary perception
US9111395B2 (en) Automatic placement of shadow map partitions
CN110349111A (en) A kind of antidote and device comprising image in 2 D code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210512

Address after: 311200 Room 102, 6 Blocks, C District, Qianjiang Century Park, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Yixian Advanced Technology Co.,Ltd.

Address before: 7 / F, building 4, No. 599, Wangshang Road, Changhe street, Hangzhou City, Zhejiang Province 310052

Patentee before: NETEASE (HANGZHOU) NETWORK Co.,Ltd.
