CN114627244A - Three-dimensional reconstruction method and device, electronic equipment and computer readable medium

Info

Publication number: CN114627244A
Authority: CN (China)
Prior art keywords: dimensional, image, frame, matching, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202210291769.8A
Other languages: Chinese (zh)
Inventors: 苏明兰, 张超颖, 郭枝虾, 梁宝林, 王建秀
Current Assignee: China Telecom Corp Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd; priority application CN202210291769.8A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to the technical field of three-dimensional reconstruction, and in particular to a three-dimensional reconstruction method and apparatus, an electronic device, and a computer-readable medium. The method comprises the following steps: acquiring a target image of a scene to be reconstructed; extracting two-dimensional feature descriptors of two adjacent frames of images in the target image, and performing feature matching on the two-dimensional feature descriptors of the two adjacent frames to obtain two-dimensional key point matching point pairs; generating a three-dimensional feature descriptor of each frame of image; performing feature matching on the three-dimensional feature descriptors of the two adjacent frames to obtain three-dimensional key point matching point pairs; filtering the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs; and performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of the reconstructed scene. According to the embodiments of the application, the matching accuracy of key points and the accuracy of three-dimensional reconstruction can be improved when the scene to be reconstructed has a periodic texture distribution.

Description

Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, an electronic device, and a computer-readable medium.
Background
Three-dimensional reconstruction is the inverse of the process that projects objects and scenes in three-dimensional space onto two-dimensional images: the two-dimensional images are restored into an object or scene model containing three-dimensional information, and the reconstructed model can conveniently be represented, processed, and displayed by a computer. Three-dimensional reconstruction technology is an important component of mobile robot autonomous navigation, environment model reconstruction, large-scale digital monitoring, and the like; it is also the basis of Augmented Reality (AR) technology, and the reconstructed three-dimensional model can be applied directly in AR or VR scenes.
Three-dimensional reconstruction falls into two main categories: laser-based reconstruction, which is not amenable to mass production and commercialization because of the high cost of its hardware, and reconstruction based on visual images. Visual-image-based reconstruction usually acquires a target image of the scene to be reconstructed together with its corresponding depth information, performs key point matching on the target image through two-dimensional feature descriptors, and then carries out pose optimization, surface model reconstruction, and similar processes to realize the three-dimensional reconstruction.
At present, the related art achieves a good three-dimensional reconstruction effect only for scenes to be reconstructed without periodic texture distribution. When the scene to be reconstructed has a periodic texture distribution, viewpoint changes and the like easily cause perspective distortion, which leads to mismatching of key point pairs, so the accuracy of the three-dimensional reconstruction cannot be guaranteed.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, an electronic device, and a computer-readable medium.
According to an aspect of an embodiment of the present application, there is provided a three-dimensional reconstruction method, including: acquiring a target image of a scene to be reconstructed; respectively extracting two-dimensional feature descriptors of two adjacent frames of images in the target image, and performing feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs; performing feature description on the three-dimensional key points of each frame of image in the target image to generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image; carrying out feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain a three-dimensional key point matching point pair; filtering the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs; and performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
According to an aspect of an embodiment of the present application, there is provided a three-dimensional reconstruction apparatus, including: the acquisition module is configured to acquire a target image of a scene to be reconstructed; the two-dimensional feature extraction and matching module is configured to respectively extract two-dimensional feature descriptors of two adjacent frames of images in the target image, and perform feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs; the three-dimensional feature descriptor generation module is configured to perform feature description on three-dimensional key points of each frame of image in the target image and generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image; the three-dimensional feature matching module is configured to perform feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain a three-dimensional key point matching point pair; the filtering module is configured to filter the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs; and the camera pose estimation and three-dimensional scene reconstruction module is configured to perform camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the three-dimensional reconstruction method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the three-dimensional reconstruction method as described above.
In the technical scheme provided by the embodiment of the application, aiming at a scene to be reconstructed with periodic texture distribution, firstly, a three-dimensional feature descriptor capable of describing shape information of a local curved surface of a three-dimensional key point in the scene to be reconstructed is constructed, and the constructed three-dimensional feature descriptor is combined with a two-dimensional feature descriptor to perform feature matching to obtain an initial matching point pair; after the initial matching point pair is obtained, the initial matching point pair is screened by combining the depth information constraint and the local neighbor constraint of the target image, so that the screened candidate matching point pair has higher matching precision, the mismatching of key points can be reduced, the matching accuracy of the key points of two adjacent frames of images is improved, and the matching accuracy of the key points and the accuracy of three-dimensional reconstruction are effectively improved. In addition, before camera pose estimation, a random sampling consistency algorithm is used for screening candidate matching point pairs subjected to local neighbor constraint and depth information constraint screening again, so that all wrong matching point pairs can be removed, the matching precision of key point matching point pairs is improved, and the screening time of the matching point pairs can be shortened.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a three-dimensional reconstruction system framework shown in an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart of step S230 in the embodiment shown in FIG. 2 in an exemplary embodiment;
FIG. 4 is a flowchart of step S340 in the embodiment shown in FIG. 3 in an exemplary embodiment;
FIG. 5 is a flow chart of step S240 in the embodiment shown in FIG. 2 in an exemplary embodiment;
FIG. 6 is a flow chart of step S250 in the embodiment shown in FIG. 2 in an exemplary embodiment;
FIG. 7 is a schematic diagram of the local neighbor constraint of step S610 in the embodiment shown in FIG. 6;
FIG. 8 is a flow chart of a three-dimensional reconstruction method shown in another exemplary embodiment of the present application;
FIG. 9 is a block diagram of a three-dimensional reconstruction apparatus shown in an exemplary embodiment of the present application;
FIG. 10 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Reference to "a plurality" in this application means two or more. "and/or" describe the association relationship of the associated objects, meaning that there may be three relationships, e.g., A and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The scheme provided by the embodiments of the application relates to three-dimensional reconstruction technology. It will be appreciated that three-dimensional reconstruction technology represents a real scene as a mathematical model that conforms to a computer's logical expression. In the related art, detection and matching of key points of adjacent frames are usually performed through two-dimensional feature descriptors, and three-dimensional reconstruction is then realized through pose optimization, surface model reconstruction, and similar processes. However, a two-dimensional feature descriptor lacks spatial information and is very sensitive to perspective distortion and the like caused by viewpoint changes. If a periodic texture distribution exists in the scene to be reconstructed, mismatching of key point pairs is easily caused; local mismatches in the key point matching process propagate into the global optimization, errors accumulate further, and the accuracy of the three-dimensional reconstruction is reduced.
When a periodic texture distribution exists in the scene to be reconstructed, in order to effectively avoid the perspective distortion caused by the viewpoint changes that the periodic texture distribution brings, and to improve the accuracy of key point matching and the correctness of three-dimensional reconstruction, the application provides a three-dimensional reconstruction system comprising a terminal and a server, wherein:
the server side obtains a target image of a scene to be reconstructed;
respectively extracting two-dimensional feature descriptors of two adjacent frames of images in the target image, and performing feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs;
performing feature description on three-dimensional key points of each frame of image in the target image to generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image;
carrying out feature matching on three-dimensional feature descriptors of two adjacent frames of images to obtain a three-dimensional key point matching point pair;
filtering the two-dimensional key point matching point pair and the three-dimensional key point matching point pair to obtain a target matching point pair;
and performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pair to obtain a target image of a reconstructed scene.
Namely, the terminal and the server interact; specifically, the server acquires a target image of a scene to be reconstructed from the terminal, and performs key point extraction, feature descriptor generation and key point matching on adjacent frames contained in the target image respectively to obtain a target matching point pair to realize three-dimensional reconstruction.
Fig. 1 is a schematic diagram of a three-dimensional reconstruction system framework shown in an exemplary embodiment of the present application. As can be seen from the framework shown in fig. 1, the server 101 performs the logical computation: the server 101 computes the two-dimensional feature descriptors and three-dimensional feature descriptors of two adjacent frames of images in the target image, and performs feature matching on the two-dimensional feature descriptors of the two adjacent frames to obtain two-dimensional key point matching point pairs; it performs feature matching on the three-dimensional feature descriptors of the two adjacent frames to obtain three-dimensional key point matching point pairs; it then screens and filters the two-dimensional and three-dimensional key point matching point pairs to obtain accurately matched target matching point pairs; and it performs camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of the reconstructed scene.
In an embodiment of the present application, the server 101 is a server, for example, may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform, which is not limited herein. The server 101 may communicate with the terminal 102 through a wireless network such as 3G (third generation mobile information technology), 4G (fourth generation mobile information technology), and 5G (fifth generation mobile information technology), which is not limited herein.
In an embodiment of the present application, the terminal 102 may be a video camera, a digital camera, a smart phone, a tablet computer, a PC (personal computer), or any other electronic device with an image capturing function, which is not limited herein.
It should be understood that the number of terminals 102 and servers 101 in fig. 1 is merely illustrative. There may be any number of terminals 102 and servers 101, as desired.
Based on the application scenario shown in fig. 1, with the technical solution of the embodiment of the present application, the server 101 acquires the target image of the scene to be reconstructed from the terminal 102, constructs a three-dimensional feature descriptor from two adjacent frames of images, and performs feature matching with the constructed three-dimensional feature descriptor in combination with the two-dimensional feature descriptor to obtain initial matching point pairs. Because the three-dimensional feature descriptor captures the change condition and geometric features of the local surface near a three-dimensional key point, it can describe the shape information of the local curved surface at the key point and improve the accuracy of the key point pairs. The initial matching point pairs are then screened by combining the depth information constraint and the local neighbor constraint of the target image, so that the screened candidate matching point pairs have higher matching precision, mismatching of key points is reduced, and the accuracy of key point matching between two adjacent frames of images is improved.
Various implementation details of the technical solution of the embodiments of the present application are set forth in detail below:
fig. 2 is a flowchart illustrating a three-dimensional reconstruction method according to an exemplary embodiment of the present application. As shown in fig. 2, the method may be performed by the server 101 in the framework of the three-dimensional reconstruction system shown in fig. 1. It should be understood that the method may be applied to other exemplary implementation environments and is specifically executed by devices in other implementation environments, and the embodiment does not limit the implementation environment to which the method is applied.
As shown in fig. 2, the three-dimensional reconstruction method at least includes steps S210 to S260, which are described in detail as follows:
step S210, a target image of a scene to be reconstructed is obtained.
The target image of the scene to be reconstructed in the embodiments of the application refers to the initial images, obtained from the terminal device and used for three-dimensional reconstruction, consisting of multiple consecutive frames.
In this embodiment, a target image of a scene to be reconstructed may be acquired from any terminal device having an image capturing function, such as a mobile phone, a tablet computer, and a digital camera, for example, a multi-frame initial image may be acquired by using an RGB-D depth camera, where the RGB-D depth camera may provide not only a color image but also depth information corresponding to a pixel point of the image, that is, distance information from an object to the camera may be acquired. According to actual needs, the target image of the scene to be reconstructed can be directly read from a hard disk of the server.
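As a minimal illustration of this acquisition step, the sketch below (Python with OpenCV; the file paths and the 16-bit depth format are assumptions, not specified by the text) loads a sequence of aligned color/depth frame pairs such as an RGB-D camera might save:

```python
import cv2

def load_rgbd_sequence(color_paths, depth_paths):
    """Load aligned color/depth frame pairs saved by an RGB-D camera.

    color_paths / depth_paths are hypothetical lists of image files; each
    depth image is assumed to be a 16-bit PNG whose pixel values encode
    the distance from the object to the camera."""
    frames = []
    for cpath, dpath in zip(color_paths, depth_paths):
        color = cv2.imread(cpath, cv2.IMREAD_COLOR)
        depth = cv2.imread(dpath, cv2.IMREAD_UNCHANGED)  # keep 16-bit values
        frames.append((color, depth))
    return frames
```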
The scene to be reconstructed in the embodiment of the application refers to a mathematical model which is characterized by a real scene and accords with the logical expression of a computer, for example, three-dimensional construction of a game virtual scene, three-dimensional reconstruction of an indoor scene and the like.
A scene to be reconstructed generally has periodic texture distribution to a varying degree: some texture distributions are rich and some are simple, and the three-dimensional reconstruction effect is directly related to how rich the texture distribution of the scene is. Generally speaking, the richer the texture distribution, the poorer the reconstruction effect, and the simpler the texture distribution, the better the reconstruction effect, because the more texture the scene to be reconstructed contains, the more severe the perspective distortion caused by viewpoint changes, and mismatching of key points readily occurs during key point matching.
Step S220, respectively extracting two-dimensional feature descriptors of two adjacent frames of images in the target image, and performing feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs.
It should be noted that a two-dimensional feature descriptor in the embodiments of the present application is generally a vector: a simplified representation of the image that contains its important information, for example the orientation of the two-dimensional key points and the surrounding pixel information. A two-dimensional feature descriptor has rotation and scale invariance, robustness, and distinguishability; through it, changes in image scale and direction caused by viewpoint changes can be factored out, which facilitates better matching between images.
The two-dimensional feature descriptor in the embodiments of the present application may be obtained by performing feature description on a two-dimensional feature point with a Scale-Invariant Feature Transform (SIFT) feature point descriptor, a Speeded Up Robust Features (SURF) feature point descriptor, an Oriented FAST and Rotated BRIEF (ORB) feature point descriptor, or any other feasible feature point descriptor.
In an embodiment of the present application, the process of extracting two-dimensional feature descriptors of two adjacent frames of images in step S220 may include the following steps, which are described in detail as follows:
respectively extracting two-dimensional key points of two adjacent frames of images in the target image to obtain two-dimensional key points of each frame of image; and respectively carrying out feature description on the two-dimensional key points of each frame of image to obtain two-dimensional feature descriptors respectively corresponding to two adjacent frames of images.
The two-dimensional key points in the embodiment of the application refer to the positions of the feature points in the image, and some two-dimensional key points also have direction and scale information.
The two-dimensional key points in the embodiment of the application can be obtained by detecting the feature point detection modes such as an SIFT key point detection method, an SURF key point detection method and the like.
In an embodiment of the present application, the process of performing feature matching on the two-dimensional feature descriptors of two adjacent frames of images in step S220 to obtain a two-dimensional keypoint matching point pair may include the following steps, which are described in detail as follows:
acquiring two-dimensional key points of one frame of image from two adjacent frames of images; respectively calculating the Euclidean distance between each two-dimensional key point and each two-dimensional key point contained in the other frame of image; and if the Euclidean distance between the two-dimensional key points is smaller than a preset distance value, taking the two-dimensional key points as a two-dimensional key point matching point pair.
The feature matching in the embodiment of the application is performed on two-dimensional feature descriptors, the similarity degree of the two feature descriptors can be reflected by the distance between the two feature descriptors, and the feature matching is to find out the most similar key points from two-dimensional key point sets respectively corresponding to two adjacent frames of images.
The two-dimensional keypoint matching point pairs in the embodiment of the application refer to the most similar keypoint pairs found from two-dimensional keypoint sets respectively corresponding to two adjacent frames of images.
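A minimal sketch of this extract-and-match step, assuming Python with OpenCV and SIFT as the descriptor (the preset distance value max_dist is an illustrative placeholder, not a value from the text):

```python
import cv2

def match_2d_keypoints(img_a, img_b, max_dist=200.0):
    """Detect SIFT keypoints in two adjacent frames and keep descriptor
    pairs whose Euclidean distance is below a preset value (step S220)."""
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    # Brute-force matching on the L2 (Euclidean) distance between descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)

    # Threshold on descriptor distance, as described above.
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt)
            for m in matches if m.distance < max_dist]
```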
Step S230, performing feature description on the three-dimensional key points of each frame of image in the target image to generate a three-dimensional feature descriptor of each frame of image; the three-dimensional keypoints of each frame of image are determined based on the two-dimensional keypoints of each frame of image.
The three-dimensional key points in the embodiment of the application are obtained by fusing the two-dimensional key points and the depth information corresponding to the two-dimensional key points.
It should be understood that the three-dimensional feature descriptor is usually a vector, the three-dimensional feature descriptor considers spatial information of an image to be reconstructed, can describe shape information of a local curved surface of a three-dimensional key point, and can reduce a matching error caused by a view angle change due to periodic texture distribution through the three-dimensional feature descriptor, thereby facilitating better matching between images, improving the accuracy of key point matching, and ensuring the accuracy of three-dimensional reconstruction.
The three-dimensional feature descriptor in the embodiment of the present application may be obtained by performing feature description on a three-dimensional feature point through a SIFT feature point descriptor, a SURF feature point descriptor, an ORB feature point descriptor, or any other feasible feature point descriptor.
In an embodiment of the present application, referring to fig. 3, the process of performing feature description on the three-dimensional key points of each frame of image in the target image in step S230 to generate the three-dimensional feature descriptor of each frame of image may include steps S310 to S340, which are described in detail as follows:
step S310, two-dimensional key point extraction is respectively carried out on each frame of image in the target image, and the two-dimensional key points of each frame of image are determined.
The two-dimensional key points in the embodiment of the application can be obtained by any one of the SIFT, SURF, BRIEF, ORB, or other feature point detection methods.
Illustratively, the two-dimensional key points of two adjacent frames of images in the target image are obtained with the FAST (Features from Accelerated Segment Test) corner detection algorithm.
In step S320, a depth image corresponding to each frame of image in the target image is obtained, and depth information corresponding to each frame of image is determined from the depth image.
The depth image in the embodiment of the present application is a three-dimensional representation of an object, which is also referred to as a range image, and refers to an image in which the distance/depth from an image acquisition device to each point in a scene is used as a pixel value, and the depth image directly reflects the geometric shape of a visible surface of the scene to be reconstructed.
The depth information in the embodiment of the present application refers to a distance between each two-dimensional key point included in a target image of a scene to be reconstructed and an image collector, and the depth information is helpful for performing three-dimensional reconstruction based on depth data.
In the embodiment of the present application, the depth image may be acquired by a stereo camera or a TOF (Time of Flight) camera, and illustratively by an RGB-D depth camera. In addition, depth image acquisition methods include laser radar depth imaging, computer stereo vision imaging, the coordinate measuring machine method, the moire fringe method, the structured light method, and the like.
For example, if the target image of the scene to be reconstructed is acquired by an RGB-D depth camera, then because the RGB-D depth camera captures both color information and depth information of the object, the target image consists of consecutive frames of RGB images and consecutive frames of depth images in one-to-one correspondence: one frame of RGB image corresponds to one frame of depth image, and each pixel in the RGB image maps one-to-one to a pixel in the corresponding depth image. The depth information corresponding to a two-dimensional key point in the RGB image can therefore be found in the depth image from the two-dimensional key point's coordinates in the RGB image.
And step S330, fusing the two-dimensional key points and the depth information in each frame of image to obtain three-dimensional key points of each frame of image.
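A minimal sketch of the fusion in step S330, assuming a pinhole camera model with hypothetical intrinsics (fx, fy, cx, cy) and a raw depth map scaled by depth_scale (e.g. millimetres to metres); none of these parameter names come from the text:

```python
import numpy as np

def lift_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy, depth_scale=1000.0):
    """Fuse 2D keypoints with their depth values into 3D keypoints by
    back-projecting through the pinhole camera model."""
    points_3d = []
    for (u, v) in keypoints_2d:
        z = depth_map[int(v), int(u)] / depth_scale  # raw depth -> metres
        if z <= 0:                                   # skip invalid depth
            continue
        points_3d.append(((u - cx) * z / fx,         # X
                          (v - cy) * z / fy,         # Y
                          z))                        # Z
    return np.asarray(points_3d)
```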
Step S340, generating a three-dimensional feature descriptor of each frame of image according to the change condition and the geometric features of the local surface of the three-dimensional key point of each frame of image.
It should be understood that the three-dimensional feature descriptor in the embodiments of the present application refers to any three-dimensional feature descriptor capable of describing the change situation and the geometric feature of the local surface of the three-dimensional keypoint, and may be, for example, a three-dimensional histogram feature descriptor. In an embodiment of the present application, referring to fig. 4, taking the generation of the three-dimensional histogram feature descriptor as an example, the process of generating the three-dimensional histogram feature descriptor of each frame of image according to the variation condition and the geometric feature of the local surface of the three-dimensional keypoint of each frame of image in step S340 may include steps S410 to S430, which are described in detail as follows:
step S410, calculating a basic three-dimensional feature histogram of each three-dimensional key point in each frame of image.
For example, consider calculating the basic three-dimensional feature histogram of a three-dimensional key point p1. First, with the three-dimensional key point p1 as the center, draw a sphere of radius r. The K three-dimensional key points inside the sphere are the neighboring points of p1, and p1 is connected with each of its neighboring points.
Suppose the K neighboring points of the three-dimensional key point p1 include a neighboring point p2. Connect p1 and p2, and let n1 denote the normal vector of p1 and n2 the normal vector of p2. To quantify the relationship between the normal vectors at the two points p1 and p2, a local coordinate system is defined at p1, whose three unit vectors x, y, z can be defined as follows:

x = n1
y = x × (p2 − p1) / ||p2 − p1||2
z = x × y

where ||p2 − p1||2 denotes the Euclidean distance between the two points p1 and p2.
According to the established local coordinate system, the difference of the normal vectors between the two points can be represented by the following three parameters:

α = y · n2
φ = x · (p2 − p1) / ||p2 − p1||2
θ = arctan(z · n2, x · n2)

where α is the angle between the normal vector n2 and the y axis of the local coordinate system, φ is the angle between the x axis (i.e. the normal vector n1) and the line connecting the three-dimensional key point p1 and its neighboring point p2, and θ is the angle between the projection of n2 onto the xz plane and the x axis; the remaining parameters are the same as in the preceding formulas and are not described again.
The above three parameters are calculated for each pair of points. The range of each parameter is divided into b intervals, generating a b × b × b dimensional histogram, and the number of times each parameter triple falls within each interval combination is counted, yielding the basic three-dimensional feature histogram of the three-dimensional key point p1.
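A sketch of this basic histogram computation, assuming point normals are already available (for example, estimated from the depth map) and using an illustrative bin count b = 5; these assumptions are not fixed by the text:

```python
import numpy as np

def pair_features(p1, n1, p2, n2):
    """Compute the (alpha, phi, theta) triple for one keypoint/neighbor pair
    using the local frame x = n1, y = x × d/||d||, z = x × y."""
    d = p2 - p1
    d_unit = d / np.linalg.norm(d)
    x = n1 / np.linalg.norm(n1)
    y = np.cross(x, d_unit)
    y /= np.linalg.norm(y)              # normalize to a unit vector
    z = np.cross(x, y)
    alpha = np.dot(y, n2)
    phi = np.dot(x, d_unit)
    theta = np.arctan2(np.dot(z, n2), np.dot(x, n2))
    return alpha, phi, theta

def basic_histogram(p, n, neighbors, normals, b=5):
    """Bin the parameter triples of all K neighbors of keypoint p into a
    b × b × b histogram (flattened), giving its basic feature histogram."""
    triples = [pair_features(p, n, q, nq) for q, nq in zip(neighbors, normals)]
    hist, _ = np.histogramdd(np.asarray(triples), bins=(b, b, b),
                             range=((-1, 1), (-1, 1), (-np.pi, np.pi)))
    return hist.ravel()
```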
Step S420, calculating a three-dimensional feature histogram of K adjacent points corresponding to each three-dimensional key point in each frame of image.
Step S430, carrying out weighted calculation on the basic three-dimensional feature histogram of each three-dimensional key point in each frame of image and the three-dimensional feature histograms of K adjacent points corresponding to each three-dimensional key point respectively to obtain a three-dimensional histogram feature descriptor of each frame of image.
In the embodiment of the application, to account for the influence of the neighborhood on the spatial geometry of a three-dimensional key point p, weighted statistics are performed over the three-dimensional feature histograms of the K neighboring points corresponding to each three-dimensional key point.
For example, the weighted calculation may take the following form:

F(p) = F_b(p) + (1/K) Σ_{k=1..K} (1/ω_k) F_b(p_k)

where F(p) is the three-dimensional histogram feature descriptor of a key point p in a frame of image, F_b(p) is the basic three-dimensional feature histogram of the three-dimensional key point p calculated according to the method of step S410, the summation term weights the three-dimensional feature histograms F_b(p_k) of the K neighboring points in the neighborhood of p, and ω_k is the Euclidean distance in space between the neighboring point k and the three-dimensional key point p.
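A minimal sketch of this weighting, taking the basic histogram of p and its K neighbors' histograms as inputs (the guard against a zero distance is an added safety detail):

```python
import numpy as np

def weighted_descriptor(hist_p, neighbor_hists, neighbor_dists):
    """Combine the basic histogram of keypoint p with those of its K
    neighbors, each scaled by 1/w_k (w_k = Euclidean distance to p)."""
    acc = np.zeros_like(hist_p, dtype=float)
    for h, w in zip(neighbor_hists, neighbor_dists):
        acc += h / max(w, 1e-9)   # 1/w_k: closer neighbors weigh more
    return hist_p + acc / len(neighbor_hists)
```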
And step S240, performing feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain three-dimensional key point matching point pairs.
In an embodiment of the present application, referring to fig. 5, the process of performing feature matching on the three-dimensional feature descriptors of two adjacent frames of images in step S240 to obtain a three-dimensional keypoint matching point pair may include steps S510 to S520, which are described in detail as follows:
and step S510, constructing characteristic quantities according to the three-dimensional characteristic descriptors, and comparing and screening the characteristic quantities to obtain a mapping set.
And step S520, performing feature matching on the three-dimensional feature descriptors of the two adjacent frames of images according to the mapping set to obtain a three-dimensional key point matching point pair.
The feature matching method in this embodiment may be any method that includes the above processes; illustratively, the feature matching may be implemented with the FLANN (Fast Library for Approximate Nearest Neighbors) algorithm.
Illustratively, the FLANN algorithm is used to search and cluster the three-dimensional feature descriptors of two adjacent frames of images; for each key point in one frame, the three-dimensional key point with the closest feature vector in the adjacent frame is obtained by matching, forming a three-dimensional key point matching point pair. Because two adjacent images contain a great many key points, searching and clustering with FLANN greatly reduces the amount of calculation.
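A minimal sketch of the FLANN-based matching, assuming OpenCV's FlannBasedMatcher over float32 descriptors (the index and search parameters below are common defaults, not values from the text):

```python
import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1  # OpenCV's identifier for the KD-tree index

def flann_match_3d(desc_a, desc_b):
    """For each 3D descriptor in one frame, find the closest feature
    vector in the adjacent frame (step S240)."""
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=FLANN_INDEX_KDTREE, trees=5),  # index parameters
        dict(checks=50))                              # search parameters
    matches = flann.match(desc_a.astype(np.float32),
                          desc_b.astype(np.float32))
    return [(m.queryIdx, m.trainIdx) for m in matches]
```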
And step S250, filtering the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs.
In an embodiment of the present application, referring to fig. 6, the process of filtering the two-dimensional keypoint matching point pairs and the three-dimensional keypoint matching point pairs in step S250 to obtain target matching point pairs may include steps S610 to S620, which are described in detail as follows:
and step S610, local neighbor constraint and depth information constraint are carried out on the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain candidate matching point pairs.
In the embodiment of the application, the local neighbor constraint screens the matched key points of two adjacent frames of images against a preset distance threshold: in two adjacent frames, if the distance between a key point in one frame and the corresponding key point in the other frame is smaller than the preset distance threshold, the two key points meet the screening condition and can serve as a candidate matching point pair after the local neighbor constraint.
For example, as shown in fig. 7, suppose the target image of the scene to be reconstructed acquired by the RGB-D depth camera includes two consecutive images, frame A and frame B. In the frame-A image, the key point p(A) has neighboring key points p(A)_N1, p(A)_N2 and p(A)_N3; the corresponding key points in the frame-B image are p(B) and p(B)_N1, p(B)_N2, p(B)_N3, and in the frame-B image p(B)_N1, p(B)_N2 and p(B)_N3 are still the neighboring key points of p(B). The local neighbor information constraint requires that, for the neighboring key points of key point p(A) in the frame-A image and their corresponding key points in the adjacent frame, i.e. the pairs {p(A)_N1, p(B)_N1}, {p(A)_N2, p(B)_N2} and {p(A)_N3, p(B)_N3}, the distance between the corresponding points is smaller than a preset distance threshold. The local neighbor information constraint expressions are as follows:

Local neighbor constraint 1:

||F_A(i, q) − F_B(i, q)|| ≤ δ, q = 1, …, Q

where F_A(i, q) represents the local feature corresponding to the q-th neighboring point, selected in frame A, of the i-th key point, F_B(i, q) is the local feature of its corresponding point in frame B, and δ is a preset distance threshold.

Local neighbor constraint 2:

||F_B(i, q) − F_A(i, q)|| ≤ δ, q = 1, …, Q

where F_B(i, q) represents the local feature corresponding to the q-th neighboring point, selected in frame B, of the i-th key point, and the remaining parameters are the same as in the preceding formula.
The depth information constraint in the embodiment of the application screens the matched key points of two adjacent frames of images against a preset depth difference threshold. Because the two depth values corresponding to a correct key point matching point pair of two adjacent frames do not differ greatly, if the absolute difference between the depth value of a key point in one frame and that of the corresponding key point in the other frame is smaller than the preset depth difference threshold, the two key points meet the screening condition and can serve as a candidate matching point pair after the depth information constraint.
Illustratively, as shown in fig. 7, the absolute difference between the depth corresponding to a key point p(A) in the frame-A image and the depth corresponding to the key point p(B) in the frame-B image should be smaller than a preset depth difference threshold. The depth information constraint expression is as follows:

||Depth(p(A)_i) − Depth(p(B)_i)|| ≤ ρ

where Depth(p(A)_i) is the depth corresponding to the i-th key point in the frame-A image, Depth(p(B)_i) is the depth corresponding to the i-th key point in the frame-B image, and ρ is the preset depth difference threshold.
The candidate matching point pairs in the embodiment of the present application are the matching point pairs that pass both the local neighbor constraint screening and the depth information constraint screening. When key point matching is performed on a scene to be reconstructed with periodic texture, the viewpoint changes produced by the periodic texture easily disturb the matching and cause mismatching; the two constraints above screen such mismatched pairs out in advance.
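A sketch of this combined screening under assumed data shapes (depth_a/depth_b map keypoint indices to depths; neigh_feats_x[i] is a (Q, D) array of the local features of keypoint i's Q neighbors); the thresholds delta and rho are illustrative placeholders:

```python
import numpy as np

def screen_candidates(matches, depth_a, depth_b,
                      neigh_feats_a, neigh_feats_b, delta=0.5, rho=0.05):
    """Keep a matched pair only if it satisfies both the depth information
    constraint and the local neighbor constraint."""
    candidates = []
    for ia, ib in matches:
        # Depth constraint: |Depth(p(A)_i) - Depth(p(B)_i)| <= rho
        if abs(depth_a[ia] - depth_b[ib]) > rho:
            continue
        # Local neighbor constraint: every corresponding neighbor's local
        # feature must lie within delta of its counterpart
        dists = np.linalg.norm(neigh_feats_a[ia] - neigh_feats_b[ib], axis=1)
        if np.all(dists <= delta):
            candidates.append((ia, ib))
    return candidates
```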
And step S620, filtering the candidate matching point pairs with a random sample consensus (RANSAC) algorithm to obtain the target matching point pairs.
In the embodiment of the application, the RANSAC algorithm is adopted to further screen the candidate matching point pairs that passed the local neighbor constraint and depth information constraint screening, so that all mismatched points can be removed, further improving the matching accuracy of the key points of two adjacent frames of images and hence the accuracy of the three-dimensional reconstruction. In addition, introducing the matching point pairs already screened by the local neighbor constraint and the depth information constraint into the RANSAC consensus analysis helps guarantee the rapidity and global optimality of the pose estimation.
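A minimal sketch of this final RANSAC screening; the text does not name the geometric model RANSAC fits, so fitting a fundamental matrix over the 2D keypoint locations is one plausible choice:

```python
import cv2
import numpy as np

def ransac_filter(pts_a, pts_b, reproj_thresh=1.0):
    """Estimate a fundamental matrix with RANSAC and keep the inlier
    correspondences as the target matching point pairs."""
    pts_a = np.float32(pts_a)
    pts_b = np.float32(pts_b)
    _, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC,
                                     reproj_thresh, 0.99)
    if mask is None:                  # RANSAC found no consistent model
        return np.empty((0, 2)), np.empty((0, 2))
    inliers = mask.ravel().astype(bool)
    return pts_a[inliers], pts_b[inliers]
```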
And step S260, performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of the reconstructed scene.
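A sketch of the pose estimation step, under the assumption that the target matching point pairs give 3D points in one frame matched to 2D pixel locations in the next (a PnP formulation; the text does not commit to a specific solver):

```python
import cv2
import numpy as np

def estimate_relative_pose(pts3d_prev, pts2d_next, camera_matrix):
    """Recover the camera rotation and translation between two adjacent
    frames from the filtered target matching point pairs."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d_prev), np.float32(pts2d_next),
        camera_matrix, None)          # None: assume undistorted images
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 matrix
    return R, tvec
```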
As can be seen from the above, the embodiment of the present application constructs a three-dimensional feature descriptor containing depth information and performs feature matching with the constructed three-dimensional feature descriptor in combination with the two-dimensional feature descriptor to obtain initial matching point pairs. Because the three-dimensional feature descriptor can describe the shape information of the local curved surface at a three-dimensional key point, the spatial information of the scene to be reconstructed is taken into account, and matching with the three-dimensional and two-dimensional descriptors together improves the accuracy of the key point matching point pairs to a certain extent. After the initial matching point pairs are obtained, in order to screen out mismatched pairs, the initial matching point pairs are filtered by combining the depth information constraint and the local neighbor constraint of the target image to obtain candidate matching point pairs, so that the screened candidate matching point pairs have higher matching precision; mismatching of key points is greatly reduced, and the accuracy of key point matching between two adjacent frames of images and of the three-dimensional reconstruction is effectively improved. In addition, before camera pose estimation, the random sample consensus algorithm performs a second screening of the candidate matching point pairs that passed the local neighbor constraint and depth information constraint screening, removing all wrong matching point pairs and improving the matching precision of the key point matching point pairs; moreover, because the random sample consensus algorithm is applied after the number of wrong matching point pairs has already been reduced by the two constraints, the screening time of the matching point pairs can be shortened. After the wrong matching point pairs are deleted, camera pose estimation is performed on the accurately matched set of matching point pairs, which improves the accuracy of the three-dimensional reconstruction.
Fig. 8 is a flowchart illustrating a three-dimensional reconstruction method according to another exemplary embodiment of the present application. As shown in fig. 8, the method may be performed by the server 101 in the framework of the three-dimensional reconstruction system shown in fig. 1. It should be understood that the method may be applied to other exemplary implementation environments and is specifically executed by devices in other implementation environments, and the embodiment does not limit the implementation environment to which the method is applied.
As shown in fig. 8, the three-dimensional reconstruction method at least includes steps S810 to S860, which are described in detail as follows:
step S810, acquiring a target image of a scene to be reconstructed from the RGB-D camera, wherein the target image of the scene to be reconstructed comprises two adjacent images of a frame A and a frame B.
Optionally, the target image of the scene to be reconstructed is an initial image of consecutive frames for three-dimensional reconstruction.
Step S820, respectively extracting a two-dimensional feature descriptor corresponding to the frame A and a two-dimensional feature descriptor corresponding to the frame B, and performing feature matching according to the two-dimensional feature descriptor of the frame A and the two-dimensional feature descriptor of the frame B to obtain a two-dimensional key point matching point pair of the frame A and the frame B.
In this embodiment, the process of extracting the two-dimensional feature descriptors corresponding to the frame a and the frame B in step S820 may include the following steps, which are described in detail as follows:
respectively carrying out two-dimensional key point detection on the frame A and the frame B by adopting an SIFT key point detection method to obtain two-dimensional key points of the frame A and two-dimensional key points of the frame B; the number of two-dimensional key points of the frame a is multiple, and the number of two-dimensional key points of the frame B is multiple, which is not limited in this embodiment.
And respectively carrying out feature description on the two-dimensional key points of the frame A and the two-dimensional key points of the frame B to obtain a two-dimensional feature descriptor of the frame A and a two-dimensional feature descriptor of the frame B.
In this embodiment, the process of performing feature matching according to the two-dimensional feature descriptor of the frame a and the two-dimensional feature descriptor of the frame B in step S820 to obtain the two-dimensional keypoint matching point pair of the frame a and the frame B may include the following steps, which are described in detail as follows:
and if the Euclidean distance between the two-dimensional key points of the frame A and the two-dimensional key points in the frame B is smaller than a preset distance value, taking the two key points as a two-dimensional key point matching point pair.
Step S830, performing feature description on the three-dimensional key points of the frame a and the frame B to generate three-dimensional feature descriptors of the frame a and the frame B, where the three-dimensional key point of the frame a is determined based on the two-dimensional key point of the frame a, and the three-dimensional key point of the frame B is determined based on the two-dimensional key point of the frame B.
In this embodiment of the present application, the process of performing feature description on the three-dimensional key points of the frame a and the frame B and generating three-dimensional feature descriptors of the frame a and the frame B in step S830 may include the following steps, which are described in detail as follows:
respectively extracting two-dimensional key points of the frame A and the frame B by adopting a FAST key point detection method to determine the two-dimensional key points of the frame A and the two-dimensional key points of the frame B;
acquiring a depth image corresponding to a frame A and a depth image corresponding to a frame B, determining depth information corresponding to the frame A from the depth image corresponding to the frame A, and determining depth information corresponding to the frame B from the depth image corresponding to the frame B;
fusing the two-dimensional key points of the frame A and the depth information corresponding to the frame A to obtain three-dimensional key points of the frame A, and fusing the two-dimensional key points of the frame B and the depth information corresponding to the frame B to obtain three-dimensional key points of the frame B;
and generating a three-dimensional histogram feature descriptor of the frame A according to the change condition and the geometric features of the three-dimensional key point local surface of the frame A, and generating a three-dimensional histogram feature descriptor of the frame B according to the change condition and the geometric features of the three-dimensional key point local surface of the frame B.
Step S840, performing feature matching on the three-dimensional histogram feature descriptors corresponding to frame A and frame B respectively to obtain three-dimensional key point matching point pairs;
step S850, filtering the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs;
and S860, performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
Optionally, please refer to the foregoing embodiment for the specific implementation process of steps S810 to S860, which is not described herein again.
From the above, in the embodiment of the present application, the target image of the scene to be reconstructed is first acquired from the RGB-D camera, and two-dimensional and three-dimensional feature descriptors are extracted from the two adjacent frames of images. Two-dimensional feature matching is performed according to the two-dimensional feature descriptors of the two adjacent frames to obtain two-dimensional key point matching point pairs, and three-dimensional feature matching is performed according to the three-dimensional feature descriptors to obtain three-dimensional key point matching point pairs. The two-dimensional and three-dimensional key point matching point pairs are then screened with the local neighbor constraint and the depth information constraint to obtain candidate matching point pairs of higher matching precision, and the candidate matching point pairs are screened with RANSAC to remove all wrong matching point pairs. This improves the matching precision of the key point matching point pairs, shortens their screening time, and further improves the accuracy of the three-dimensional reconstruction.
Therefore, in the embodiment of the application, the three-dimensional histogram feature descriptor is constructed from the depth information together with the two-dimensional feature descriptor and can describe the change condition and geometric features of the local surface near a three-dimensional key point; that is, the spatial information of the scene to be reconstructed is considered during three-dimensional reconstruction. Compared with the prior art, which completes three-dimensional reconstruction using only two-dimensional feature descriptors for point-pair matching, the matching precision of the feature matching is higher, and the accuracy of key point matching and of the three-dimensional reconstruction is improved to a great extent. In addition, the depth and neighbor information of the image are combined to constrain the key point matching process and screen out mismatched point pairs, which shortens the screening time of the key point matching point pairs while further improving their matching precision.
Fig. 9 is a block diagram of a three-dimensional reconstruction apparatus according to an exemplary embodiment of the present application. The apparatus can be applied to the implementation environment shown in fig. 1, and is specifically configured in the server 101. The apparatus may also be applied to other exemplary implementation environments and specifically configured in other devices, and the embodiment does not limit the implementation environment to which the apparatus is applied.
As shown in fig. 9, the exemplary three-dimensional reconstruction apparatus includes:
an obtaining module 910 configured to obtain a target image of a scene to be reconstructed;
a two-dimensional feature extraction and matching module 920, configured to respectively extract two-dimensional feature descriptors of two adjacent frames of images in the target image, and perform feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain a two-dimensional key point matching point pair;
a three-dimensional feature descriptor generation module 930 configured to perform feature description on the three-dimensional key points of each frame of image in the target image, and generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image;
a three-dimensional feature matching module 940 configured to perform feature matching on the three-dimensional feature descriptors of two adjacent frames of images to obtain a three-dimensional key point matching point pair;
a filtering module 950 configured to filter the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs;
and the camera pose estimation and three-dimensional scene reconstruction module 960 is configured to perform camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
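As a structural illustration only, the modules of Fig. 9 can be pictured as a chain of callables; the class below is a hypothetical composition written for exposition, and none of its names or signatures are defined by the embodiments.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ReconstructionDevice:
        acquire: Callable        # obtaining module 910
        match_2d: Callable       # 2D feature extraction and matching module 920
        describe_3d: Callable    # 3D feature descriptor generation module 930
        match_3d: Callable       # 3D feature matching module 940
        filter_pairs: Callable   # filtering module 950
        reconstruct: Callable    # pose estimation and reconstruction module 960

        def run(self, scene):
            frames = self.acquire(scene)
            pairs_2d = self.match_2d(frames)
            descriptors_3d = self.describe_3d(frames)
            pairs_3d = self.match_3d(descriptors_3d)
            target_pairs = self.filter_pairs(pairs_2d, pairs_3d)
            return self.reconstruct(target_pairs)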
In an embodiment of the present application, the three-dimensional feature descriptor generating module 930 further includes:
the two-dimensional key point determining submodule is configured to extract two-dimensional key points of each frame of image in the target image respectively and determine the two-dimensional key points of each frame of image;
the depth information determining submodule is configured to acquire a depth image corresponding to each frame of image in the target image and determine depth information corresponding to each frame of image from the depth image;
the three-dimensional key point determining submodule is configured to fuse the two-dimensional key points and the depth information in each frame of image to obtain three-dimensional key points of each frame of image;
and the three-dimensional feature descriptor generation sub-module is configured to generate the three-dimensional feature descriptor of each frame of image according to the change condition and the geometric features of the local surface of the three-dimensional key point of each frame of image.
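The fusion performed by the three-dimensional key point determining submodule can be illustrated with a standard pinhole back-projection, as in the sketch below. The intrinsics (fx, fy, cx, cy) and the depth scale are assumed inputs, and dropping key points with invalid depth is a choice of this illustration rather than a requirement of the embodiments.

    import numpy as np

    def lift_keypoints_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy,
                             depth_scale=1000.0):
        """Fuse 2D key points with depth via the pinhole camera model.

        keypoints_2d: (N, 2) array of pixel coordinates (u, v).
        depth_map:    depth image aligned with the RGB frame.
        """
        points_3d = []
        for u, v in keypoints_2d:
            z = depth_map[int(v), int(u)] / depth_scale  # raw units to metres
            if z <= 0:
                continue  # invalid depth: drop this key point
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points_3d.append((x, y, z))
        return np.asarray(points_3d)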
In one embodiment of the present application, the three-dimensional feature descriptor generation sub-module further includes:
the first calculation unit is configured to generate a three-dimensional feature descriptor of each frame of image according to the change condition and the geometric features of the local surface of the three-dimensional key point of each frame of image;
the second calculation unit is configured to calculate a three-dimensional feature histogram of K adjacent points corresponding to each three-dimensional key point in each frame of image;
and the third calculation unit is configured to perform weighted calculation on the basic three-dimensional feature histogram of each three-dimensional key point in each frame of image and the three-dimensional feature histograms of the K adjacent points corresponding to each three-dimensional key point respectively to obtain a three-dimensional histogram feature descriptor of each frame of image.
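The weighted calculation performed by the third calculation unit might look like the following sketch, which blends each key point's basic histogram with those of its K nearest neighbours. The inverse-distance weighting (in the style of FPFH) and the final normalisation are assumptions of this illustration; the embodiments do not prescribe the weights.

    import numpy as np
    from scipy.spatial import cKDTree

    def weighted_histogram_descriptors(points_3d, base_histograms, k=5):
        """Blend each key point's basic 3D feature histogram with the
        histograms of its K nearest neighbours (assumed weighting)."""
        tree = cKDTree(points_3d)
        dists, idx = tree.query(points_3d, k=k + 1)  # first hit is the point itself
        descriptors = base_histograms.astype(np.float64)
        for i in range(len(points_3d)):
            for d, j in zip(dists[i, 1:], idx[i, 1:]):
                descriptors[i] += base_histograms[j] / max(d, 1e-9)
            descriptors[i] /= np.linalg.norm(descriptors[i]) + 1e-12  # normalise
        return descriptors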
In one embodiment of the present application, the three-dimensional feature matching module 940 further includes:
the mapping set determining unit is configured to construct feature quantities according to the three-dimensional feature descriptors, and compare and screen the feature quantities to obtain a mapping set;
and the three-dimensional feature matching unit is configured to perform feature matching on the three-dimensional feature descriptors of the two adjacent frames of images according to the mapping set to obtain a three-dimensional key point matching point pair.
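The embodiments do not fix how the feature quantities are compared and screened to form the mapping set; one plausible reading, sketched below, indexes one frame's three-dimensional descriptors in a KD-tree and keeps only unambiguous nearest-neighbour hits via a Lowe-style ratio test, which is an assumption of this illustration.

    import numpy as np
    from scipy.spatial import cKDTree

    def build_mapping_set(descriptors_a, descriptors_b, ratio=0.8):
        """Map key points of frame A to frame B, keeping only matches whose
        nearest neighbour is clearly better than the second nearest."""
        tree = cKDTree(descriptors_b)
        dists, idx = tree.query(descriptors_a, k=2)
        mapping = {}
        for i in range(len(descriptors_a)):
            if dists[i, 0] < ratio * dists[i, 1]:
                mapping[i] = int(idx[i, 0])  # key point i in A -> key point in B
        return mapping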
In one embodiment of the present application, the two-dimensional feature extraction and matching module 920 further comprises:
the two-dimensional key point determining unit is configured to extract two-dimensional key points of two adjacent frames of images in the target image respectively to obtain two-dimensional key points of each frame of image;
the two-dimensional feature descriptor determining unit is configured to respectively perform feature description on the two-dimensional key points of each frame of image to obtain two-dimensional feature descriptors respectively corresponding to two adjacent frames of images;
the two-dimensional key point acquisition unit is configured to acquire two-dimensional key points of one frame of image from two adjacent frames of images;
a fourth calculation unit configured to calculate euclidean distances between each two-dimensional key point and each two-dimensional key point included in another frame image, respectively;
and the two-dimensional key point matching point pair determining unit is configured to use the two-dimensional key points as the two-dimensional key point matching point pair if the Euclidean distance between the two-dimensional key points is smaller than a preset distance value.
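The distance test performed by the fourth calculation unit and the matching-point-pair determining unit can be sketched as follows. The text leaves open whether the Euclidean distance is taken over pixel coordinates or descriptor vectors; descriptor vectors are assumed here, and the preset distance value is an illustrative parameter.

    import numpy as np

    def match_2d_keypoints(descriptors_prev, descriptors_curr, max_dist=0.7):
        """Pair key points whose descriptor Euclidean distance falls below
        a preset distance value (assumed threshold)."""
        pairs = []
        for i, d in enumerate(descriptors_prev):
            dists = np.linalg.norm(descriptors_curr - d, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < max_dist:
                pairs.append((i, j))  # key point i in one frame, j in the other
        return pairs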
In one embodiment of the present application, the filtering module 950 further includes:
the first filtering unit is configured to perform local neighbor constraint and depth information constraint on the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain candidate matching point pairs;
and the second filtering unit is configured to filter the candidate matching point pairs by using a random sampling consistency algorithm to obtain target matching point pairs.
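A minimal sketch of the two filtering stages follows. A simple depth-consistency test, which presumes small inter-frame motion, stands in for the local neighbor constraint and depth information constraint (their exact form is an assumption of this illustration), and OpenCV's RANSAC-based 3D affine estimator stands in for the random sampling consistency step.

    import numpy as np
    import cv2

    def filter_matched_pairs(pts3d_prev, pts3d_curr, pairs, depth_tol=0.05):
        """Stage 1: depth-consistency screening yields candidate pairs.
        Stage 2: RANSAC over a 3D affine model keeps only inlier pairs."""
        candidates = [(i, j) for i, j in pairs
                      if abs(pts3d_prev[i][2] - pts3d_curr[j][2]) < depth_tol]
        src = np.float32([pts3d_prev[i] for i, _ in candidates]).reshape(-1, 1, 3)
        dst = np.float32([pts3d_curr[j] for _, j in candidates]).reshape(-1, 1, 3)
        _, _, inliers = cv2.estimateAffine3D(src, dst, ransacThreshold=depth_tol)
        if inliers is None:
            return []
        return [pair for pair, keep in zip(candidates, inliers.ravel()) if keep]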
It should be noted that the three-dimensional reconstruction apparatus provided in the foregoing embodiment and the three-dimensional reconstruction method provided in the foregoing embodiments belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here. In practical applications, the functions of the three-dimensional reconstruction apparatus provided in the above embodiment may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above; this is not limited herein.
An embodiment of the present application further provides an electronic device, including: one or more processors; a storage device for storing one or more programs, which when executed by the one or more processors, cause the electronic device to implement the three-dimensional reconstruction method provided in the above-described embodiments.
Fig. 10 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application. It should be noted that the computer system 1000 of the electronic device shown in Fig. 10 is only an example and does not limit the functions or the scope of application of the embodiments of the present application.
As shown in Fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Various programs and data necessary for system operation are also stored in the RAM 1003. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, the various functions defined in the system of the present application are executed.
It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
Another aspect of the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the three-dimensional reconstruction method as set forth above. The computer-readable storage medium may be included in the electronic device described in the above embodiment, or may exist separately without being incorporated in the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the three-dimensional reconstruction method provided in the above embodiments.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of three-dimensional reconstruction, comprising:
acquiring a target image of a scene to be reconstructed;
respectively extracting two-dimensional feature descriptors of two adjacent frames of images in the target image, and performing feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs;
performing feature description on the three-dimensional key points of each frame of image in the target image to generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image;
carrying out feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain a three-dimensional key point matching point pair;
filtering the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs;
and performing camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
2. The method of claim 1, wherein performing feature description on the three-dimensional key points of each frame of image in the target image to generate the three-dimensional feature descriptor of each frame of image comprises:
respectively extracting two-dimensional key points of each frame of image in the target image to determine the two-dimensional key points of each frame of image;
acquiring a depth image corresponding to each frame of image in the target image, and determining depth information corresponding to each frame of image from the depth image;
fusing two-dimensional key points and depth information in each frame of image to obtain three-dimensional key points of each frame of image;
and generating a three-dimensional feature descriptor of each frame of image according to the change condition and the geometric features of the local surface of the three-dimensional key point of each frame of image.
3. The method of claim 2, wherein the three-dimensional feature descriptor comprises a three-dimensional histogram feature descriptor, and the generating of the three-dimensional feature descriptor of each frame of image according to the change condition and the geometric features of the local surface of the three-dimensional key point of each frame of image comprises:
calculating a basic three-dimensional feature histogram of each three-dimensional key point in each frame of image;
calculating three-dimensional feature histograms of K adjacent points corresponding to each three-dimensional key point in each frame of image;
and performing weighted calculation on the basic three-dimensional feature histogram of each three-dimensional key point in each frame of image and the three-dimensional feature histograms of K adjacent points corresponding to each three-dimensional key point respectively to obtain a three-dimensional histogram feature descriptor of each frame of image.
4. The method according to claim 2, wherein the performing feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain a three-dimensional keypoint matching point pair comprises:
constructing characteristic quantities according to the three-dimensional characteristic descriptors, and comparing and screening the characteristic quantities to obtain a mapping set;
and performing feature matching on the three-dimensional feature descriptors of the two adjacent frames of images according to the mapping set to obtain a three-dimensional key point matching point pair.
5. The method according to claim 1, wherein the extracting two-dimensional feature descriptors of two adjacent frames of images respectively comprises:
respectively extracting two-dimensional key points of two adjacent frames of images in the target image to obtain two-dimensional key points of each frame of image;
and respectively carrying out feature description on the two-dimensional key points of each frame of image to obtain two-dimensional feature descriptors respectively corresponding to the two adjacent frames of images.
6. The method according to claim 5, wherein the performing feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional keypoint matching point pairs comprises:
acquiring two-dimensional key points of one frame of image from the two adjacent frames of images;
respectively calculating Euclidean distances between each two-dimensional key point and each two-dimensional key point contained in the other frame of image;
and if the Euclidean distance between the two-dimensional key points is smaller than a preset distance value, taking the two-dimensional key points as the two-dimensional key point matching point pair.
7. The method according to any one of claims 1 to 6, wherein the filtering the two-dimensional keypoint matched point pairs and the three-dimensional keypoint matched point pairs to obtain target matched point pairs comprises:
local neighbor constraint and depth information constraint are carried out on the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain candidate matching point pairs;
and filtering the candidate matching point pairs by using a random sampling consistency algorithm to obtain the target matching point pairs.
8. A three-dimensional reconstruction apparatus, characterized in that the apparatus comprises:
the acquisition module is configured to acquire a target image of a scene to be reconstructed;
the two-dimensional feature extraction and matching module is configured to respectively extract two-dimensional feature descriptors of two adjacent frames of images in the target image, and perform feature matching on the two-dimensional feature descriptors of the two adjacent frames of images to obtain two-dimensional key point matching point pairs;
the three-dimensional feature descriptor generation module is configured to perform feature description on three-dimensional key points of each frame of image in the target image and generate a three-dimensional feature descriptor of each frame of image; the three-dimensional key point of each frame of image is determined based on the two-dimensional key point of each frame of image;
the three-dimensional feature matching module is configured to perform feature matching on the three-dimensional feature descriptors of the two adjacent frames of images to obtain a three-dimensional key point matching point pair;
the filtering module is configured to filter the two-dimensional key point matching point pairs and the three-dimensional key point matching point pairs to obtain target matching point pairs;
and the camera pose estimation and three-dimensional scene reconstruction module is configured to perform camera pose estimation and three-dimensional scene reconstruction according to the target matching point pairs to obtain a target image of a reconstructed scene.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the three-dimensional reconstruction method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the three-dimensional reconstruction method of any one of claims 1 to 7.
CN202210291769.8A 2022-03-22 2022-03-22 Three-dimensional reconstruction method and device, electronic equipment and computer readable medium Pending CN114627244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210291769.8A CN114627244A (en) 2022-03-22 2022-03-22 Three-dimensional reconstruction method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210291769.8A CN114627244A (en) 2022-03-22 2022-03-22 Three-dimensional reconstruction method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN114627244A true CN114627244A (en) 2022-06-14

Family

ID=81903876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210291769.8A Pending CN114627244A (en) 2022-03-22 2022-03-22 Three-dimensional reconstruction method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN114627244A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937546A (en) * 2022-11-30 2023-04-07 北京百度网讯科技有限公司 Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium
CN117115333A (en) * 2023-02-27 2023-11-24 荣耀终端有限公司 Three-dimensional reconstruction method combined with IMU data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination