CN116977596A - Three-dimensional modeling system and method based on multi-view images - Google Patents

Three-dimensional modeling system and method based on multi-view images

Info

Publication number
CN116977596A
Authority
CN
China
Prior art keywords
image
pixel
feature
points
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310828788.4A
Other languages
Chinese (zh)
Inventor
甘智高
岳克强
李文钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202310828788.4A
Publication of CN116977596A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T 17/205 Re-meshing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional modeling system and method based on multi-view images, in which a three-dimensional model of a real scene is reconstructed from two-dimensional multi-view images. The invention comprises image feature point detection and matching, motion recovery structure (structure from motion), multi-view stereo, surface reconstruction, and texture mapping. Feature points on different images are matched one by one through image feature point matching; the camera poses and the three-dimensional coordinates of a sparse point cloud are then calculated from the matched feature points; richer scene information is further acquired from the camera poses; the resulting scene point cloud is then meshed by surface reconstruction; finally, the mesh colors are adjusted by texture mapping so that the three-dimensional model looks more realistic. The method can be deployed on an unmanned aerial vehicle and can reconstruct the images captured by the unmanned aerial vehicle into a three-dimensional model of the scene it observes.

Description

Three-dimensional modeling system and method based on multi-view images
Technical Field
The invention relates to the technical field of computer vision and three-dimensional reconstruction, in particular to a three-dimensional modeling system and method based on multi-view images.
Background
In recent years, as technology has developed toward intelligent applications, three-dimensional vision has attracted wide attention, driven by applications such as smart cities, virtual tours, digital heritage protection, and digital maps.
Image-based three-dimensional model reconstruction captures real-world objects and scenes with sensors such as cameras and processes them with computer vision techniques to obtain a three-dimensional model of the objects; it is a fundamental and active research topic in computer vision and photogrammetry. With the popularization of smartphones, digital cameras, and unmanned aerial vehicles, and the rapid development of the internet, a large number of internet images of a given outdoor scene can be acquired through a search engine. Efficient and accurate three-dimensional reconstruction from these images can provide users with realistic perception and an immersive experience, and also plays an important role in post-disaster rescue with unmanned aerial vehicles.
These application requirements have attracted considerable attention in industry, and a variety of three-dimensional reconstruction methods have emerged, based mainly either on visual geometry or on deep learning. The main techniques involved in conventional visual-geometry-based methods include multi-view geometry, depth map estimation, point cloud processing, mesh reconstruction and optimization, texture mapping, and Markov random fields.
Although great progress has been made in current image-based three-dimensional reconstruction techniques, problems remain. Modeling performance is not robust enough for scenes with repeated or weak textures. Three-dimensional reconstruction also places high demands on image quality: when the tilt angle of an image is too large, feature point detection and matching may fail, so the image must be preprocessed. Consequently, automatic modeling cannot completely replace manual modeling in the short term.
Disclosure of Invention
In order to overcome the defects of the prior art, reduce the probability of matching failure, and enhance the robustness of the system, the invention adopts the following technical scheme:
a three-dimensional modeling system based on multi-view images comprises an image feature point detection and matching module, a motion recovery structure module, a multi-view stereo module, and a texture mapping module;
the image feature point detection and matching module applies a front-view angle matrix transformation to the image, detects the same feature points in images taken under different viewing angles, and associates the same feature points through matching; to address the problem that few feature points are detected when the image is at a large tilt angle, the image is preprocessed with the front-view angle matrix transformation before feature point detection, which effectively increases the number of feature points and yields more high-quality feature points;
the motion recovery structure module calculates the pose parameters of the camera according to the matched characteristic points, and calculates the three-dimensional coordinates of the sparse point cloud mapped by the characteristic points through the pose parameters of the camera;
the multi-view stereo module uses the camera poses and the feature points to compute, by triangulation centered on each feature point, the three-dimensional coordinates corresponding to the pixel block around that feature point, and combines these with the three-dimensional coordinates of the sparse point cloud to obtain the three-dimensional coordinates of a dense point cloud;
the texture mapping module renders the colors of the original images onto the reconstructed model.
Further, the front-view angle matrix transformation maps the original image pixel coordinates (u, v) to the transformed image pixel coordinates (u′, v′) through the elements a of a rotation matrix; the rotation matrix represents the pose state of the camera and can be computed from the corresponding feature points in the images.
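The published transformation formula is not reproduced above, so the following is only a sketch of one standard way to realise such a preprocessing step: warping the image by the homography induced by the estimated camera rotation (assuming NumPy/OpenCV, and that the intrinsic matrix K and rotation R are available).

```python
import numpy as np
import cv2

def rectify_to_front_view(img, K, R):
    """Warp an obliquely captured image toward a fronto-parallel view.

    K : 3x3 camera intrinsic matrix (assumed known or calibrated beforehand).
    R : 3x3 rotation matrix describing the camera tilt, e.g. estimated from
        corresponding feature points as described above.
    A pure rotation induces the pixel homography H = K R^T K^-1, so warping
    by H undoes the tilt; this is one standard way to realise the transform.
    """
    H = K @ R.T @ np.linalg.inv(K)
    h, w = img.shape[:2]
    return cv2.warpPerspective(img, H, (w, h))
```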
Further, the image feature point detection and matching module comprises: a feature point detection unit, a feature point description unit, a feature point matching unit, and a mismatching filtering unit;
the feature point detection unit uses a feature detector to locate local feature points on the image; the feature detector works as follows: first the image gradient is computed, then the Harris matrix of each pixel position is computed from the image gradient, and then the corner response value of each pixel is computed from its Harris matrix; a pixel whose corner response value is higher than a response threshold is taken as a feature point;
the feature point description unit encodes the neighborhood of each feature point with a feature descriptor in preparation for subsequent matching; the feature descriptor is a unique identifier of a pixel block in the form of a multi-dimensional (128-dimensional) 0/1 vector, determined by sampling a group of patches in the image and comparing the pixel values within the patches, setting a bit to 1 if the pixel value is larger than the pixel threshold and to 0 if it is smaller, and taking the resulting multi-dimensional (128-dimensional) 0/1 vector as the descriptor (a minimal sketch of this construction follows the unit descriptions below);
the feature point matching unit uses nearest-neighbor search to compute the distances between feature descriptors in the two images and thereby match the correspondences between feature points;
in the mismatching filtering unit, because factors such as illumination, scale and rotation interfere with the feature point matching process, a mismatching filtering mechanism is set up to improve the matching success rate: mismatched feature points are removed by estimating a camera model, a filtering threshold is set according to the minimum distance between feature descriptors, and a match is considered wrong when the distance between its feature descriptors is larger than the filtering threshold.
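As an illustration of the feature point description unit, the sketch below builds a BRIEF-style 0/1 descriptor by comparing the pixel sums of randomly sampled patches around a keypoint, matching the embodiment described later (128 comparisons of 5x5 patches); the sampling radius and the fixed random pattern are assumptions, since the text only fixes the patch size, the comparison rule, and the 128-bit result.

```python
import numpy as np

def binary_descriptor(gray, kp, n_bits=128, patch=5, radius=15, rng=None):
    """BRIEF-style 0/1 descriptor around keypoint kp = (row, col).

    For each bit two patches are sampled near the keypoint and their pixel
    sums compared: 1 if the first patch is brighter, 0 otherwise.  The
    keypoint is assumed to lie far enough from the image border for every
    sampled patch to fit inside the image.
    """
    rng = rng or np.random.default_rng(0)   # fixed pattern shared by all images
    r, c = kp
    half = patch // 2
    offsets = rng.integers(-radius, radius + 1, size=(n_bits, 2, 2))
    bits = np.zeros(n_bits, dtype=np.uint8)
    for i, ((dr1, dc1), (dr2, dc2)) in enumerate(offsets):
        p1 = gray[r + dr1 - half: r + dr1 + half + 1,
                  c + dc1 - half: c + dc1 + half + 1]
        p2 = gray[r + dr2 - half: r + dr2 + half + 1,
                  c + dc2 - half: c + dc2 + half + 1]
        bits[i] = 1 if p1.sum() > p2.sum() else 0
    return bits
```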
Further, the image gradient is:

∇I(x, y) = (∂I/∂x, ∂I/∂y) = (I_x, I_y)

and the Harris matrix is:

M(x, y) = Σ_{(x, y) ∈ W} w(x, y) [ I_x^2  I_x I_y ; I_x I_y  I_y^2 ]

where (x, y) represents the position of a pixel, I(x, y) represents the pixel value at position (x, y), ∂I/∂x is the partial derivative of the pixel value with respect to the abscissa, ∂I/∂y is the partial derivative of the pixel value with respect to the ordinate, W is the window around the pixel, and w(x, y) represents a weight coefficient, defaulting to 1.
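A minimal sketch of the corner response computed from the Harris matrix of every pixel, assuming NumPy/SciPy; the window size and the constant k = 0.04 in the response det(M) - k·trace(M)^2 are conventional choices rather than values fixed by the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def harris_response(gray, window=3, k=0.04):
    """Corner response of every pixel from its Harris matrix.

    I_x, I_y are the image gradients; the Harris matrix entries are the
    windowed averages of I_x^2, I_x*I_y and I_y^2 (w(x, y) = 1 as above;
    averaging instead of summing only rescales the response).  Pixels whose
    response exceeds a threshold become feature points.
    """
    Iy, Ix = np.gradient(gray.astype(np.float64))   # axis 0 = rows, axis 1 = cols
    Sxx = uniform_filter(Ix * Ix, size=window)
    Syy = uniform_filter(Iy * Iy, size=window)
    Sxy = uniform_filter(Ix * Iy, size=window)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2
```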
In the feature point matching unit, the feature points are matched by comparing the distances between feature descriptors in the two images using the Hamming distance; if the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is smaller than a preset threshold, the match succeeds. The Hamming distance formula is:

D(a, b) = Σ_{i=1}^{n} (a_i ⊕ b_i)

where a and b are the two feature descriptor sequences of the image pixels to be matched, ⊕ represents the exclusive-or operation, and n represents the dimension of the descriptor.
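The sketch below applies exactly this rule to arrays of 0/1 descriptors: brute-force nearest-neighbor search under the Hamming distance followed by the nearest/second-nearest ratio test; the ratio value 0.8 is an assumed example of the preset threshold.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Match binary descriptors by Hamming distance with a ratio test.

    desc1, desc2 : (N, n) and (M, n) arrays of 0/1 descriptors.
    D(a, b) = sum_i a_i XOR b_i; a match (i, j) is kept only if the nearest
    distance is below `ratio` times the second-nearest distance.
    """
    matches = []
    for i, a in enumerate(desc1):
        d = np.count_nonzero(desc2 != a, axis=1)   # Hamming distance to every candidate
        j1, j2 = np.argsort(d)[:2]                 # nearest and second-nearest
        if d[j1] < ratio * d[j2]:
            matches.append((i, int(j1)))
        # otherwise the correspondence is considered ambiguous and dropped
    return matches
```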
Further, the motion recovery structure module calculates the pose of the camera using the camera model and epipolar geometry, where the camera model is represented by the perspective matrix of a pinhole camera:

s [u, v, 1]^T = K [R | t] [x_w, y_w, z_w, 1]^T,  with K = [ f_a  0  u_0 ; 0  f_b  v_0 ; 0  0  1 ]

where s is the projective scale factor; R and t are the extrinsic parameters of the camera, namely the rotation matrix and the translation vector; f_a, f_b, u_0, v_0 are the intrinsic parameters of the camera, f_a and f_b representing the horizontal and vertical conversion factors between the image physical coordinate system and the pixel coordinate system, and u_0, v_0 representing the numbers of horizontal and vertical pixels between the center pixel coordinate of the image and the origin pixel coordinate of the image; (u, v) are the coordinates in the image plane and (x_w, y_w, z_w) are the coordinates in the world coordinate system, the subscript w abbreviating the world coordinate system;
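A minimal sketch of this pinhole projection, assuming NumPy and taking the intrinsic and extrinsic parameters named above as inputs:

```python
import numpy as np

def project(K, R, t, X_w):
    """Project a world point (x_w, y_w, z_w) to pixel coordinates (u, v).

    K = [[f_a, 0, u_0], [0, f_b, v_0], [0, 0, 1]] holds the intrinsics,
    (R, t) the extrinsics; s is the projective scale eliminated by the
    final division.
    """
    X_c = R @ np.asarray(X_w, dtype=float) + t   # world frame -> camera frame
    s_u, s_v, s = K @ X_c                        # perspective projection
    return s_u / s, s_v / s
```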
the epipolar constraint is expressed as:

x_2^T E x_1 = 0, with E = [t]_× R and F = K_2^{-T} E K_1^{-1}

where x1 and x2 are the coordinates of the two pixel points on the normalized plane, F represents the fundamental (basic) matrix, K1 and K2 represent the camera intrinsic matrices of the two images, E = [t]_× R represents the essential matrix, and [·]_× is the mathematical operation that converts a vector into its corresponding antisymmetric matrix so that it can be multiplied with a matrix;
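The sketch below spells these relations out: the antisymmetric matrix [t]_×, the essential matrix E, and the fundamental matrix F obtained from the intrinsics of the two views; R, t, K1 and K2 are assumed to be already known.

```python
import numpy as np

def skew(t):
    """[t]_x: the antisymmetric matrix with skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """E = [t]_x R and F = K2^-T E K1^-1.

    For a correct relative pose, every matched pixel pair p1, p2 written in
    homogeneous coordinates satisfies p2.T @ F @ p1 ~ 0.
    """
    E = skew(t) @ R
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)
```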
if the feature points in space lie on the same plane, the homography matrix H is solved with the direct linear transformation method and the camera pose is then computed from it. The equation for solving the homography matrix H is

p̃_2 ≃ H p̃_1

where p̃_1 and p̃_2 represent the (homogeneous) pixel coordinates of a pair of matching points and H_ij represents an element of the homography matrix; H has 8 degrees of freedom, and each pair of points gives two constraints:
H_11 u_1 + H_12 v_1 + H_13 - H_31 u_1 u_2 - H_32 v_1 u_2 - H_33 u_2 = 0
H_21 u_1 + H_22 v_1 + H_23 - H_31 u_1 v_2 - H_32 v_1 v_2 - H_33 v_2 = 0
Setting H_33 = 1, a total of 4 pairs of feature points are required to solve the homography matrix; (u^(k), v^(k)) denote the pixel coordinates of the k-th pair of matched feature points, the superscript in parentheses indicating the index of the feature point pair.
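A minimal direct linear transformation sketch of this solution: the two constraint rows of each matched pair are stacked and the homography is read off the smallest-singular-value right singular vector, then rescaled so that H_33 = 1 (at least 4 point pairs are assumed; NumPy only).

```python
import numpy as np

def homography_dlt(pts1, pts2):
    """Direct linear transform for H from >= 4 matched point pairs.

    Each pair (u1, v1) <-> (u2, v2) contributes exactly the two constraint
    rows written above; the solution is the right singular vector of A with
    the smallest singular value, rescaled so that H_33 = 1.
    """
    A = []
    for (u1, v1), (u2, v2) in zip(pts1, pts2):
        A.append([u1, v1, 1, 0, 0, 0, -u1 * u2, -v1 * u2, -u2])
        A.append([0, 0, 0, u1, v1, 1, -u1 * v2, -v1 * v2, -v2])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```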
Further, the multi-view stereo module is implemented with a depth map fusion method and comprises: a reference image selection unit, a depth map estimation unit, and a depth map extraction unit;
the reference image selection unit searches, for each image, a group of reference images that can help that image estimate depth, so that each pixel point in the original image has corresponding points in the reference images;
the depth map estimation unit estimates an appropriate depth for each pixel point in the original image by using the photometric consistency of the corresponding pixel points in the reference images (a minimal scoring sketch follows the unit descriptions below);
the depth map extraction unit refines and filters the depth maps, removing cases in which the depths of adjacent depth maps are inconsistent.
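Photometric consistency is typically scored by comparing a patch around a pixel with the patch it maps to in a reference image for every candidate depth, keeping the depth with the best score. The sketch below uses normalized cross-correlation as that score; NCC is an assumption, as the text above only asks for photometric consistency.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equally sized patches.

    Returns a value in [-1, 1]; 1 means the patches look identical up to an
    affine brightness change, so among candidate depths the one whose
    reprojected patch scores highest is kept for the depth map.
    """
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```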
Further, triangulation estimates the pixel depth with minimum reprojection error from the rays that connect the matched feature point pixel coordinates with the camera optical centers; depth optimization adopts a photometric consistency assumption and a bundle adjustment strategy to generate globally consistent camera parameters and scene structure from the relative camera parameters.
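A minimal linear triangulation sketch consistent with this description, assuming the two 3x4 projection matrices K[R|t] are known and NumPy is available:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one 3D point from two views.

    P1, P2 : 3x4 projection matrices K[R|t] of the two cameras.
    x1, x2 : matched pixel coordinates (u, v) in the two images.
    Each view contributes two rows u*P[2]-P[0] and v*P[2]-P[1]; the
    homogeneous point minimising the algebraic error is the last right
    singular vector of A, dehomogenised at the end.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```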
Further, the system also comprises a surface reconstruction module, in which the three-dimensional point cloud obtained by the multi-view stereo module is meshed to facilitate subsequent rendering; the module mines local geometric information and graph structure information in the large-scale point cloud input and converts the three-dimensional point cloud into a three-dimensional mesh using interpolation and approximation methods;
and the texture mapping module adds the color information of the corresponding pixel points in the pictures to the mesh information to obtain the final colored three-dimensional model.
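A minimal sketch of that color lookup: each mesh vertex is projected with the same pinhole model and the pixel it lands on supplies its color; visibility and occlusion handling, which a complete texture mapper would need, are deliberately omitted.

```python
import numpy as np

def vertex_color(vertex, image, K, R, t):
    """Look up the color of one mesh vertex in its source photograph.

    vertex : 3D point in world coordinates.
    image  : the photograph chosen for this vertex, indexed image[row, col].
    The vertex is assumed to be visible and to project inside the image.
    """
    x = K @ (R @ np.asarray(vertex, dtype=float) + t)
    u, v = x[0] / x[2], x[1] / x[2]
    return image[int(round(v)), int(round(u))]
```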
A three-dimensional modeling method based on multi-view images comprises the following steps:
step S101: performing a front-view angle matrix transformation on the image, detecting the same feature points in images under different viewing angles, and associating the same feature points through matching; to address the problem that few feature points are detected when the image is at a large tilt angle, the image is preprocessed with the front-view angle matrix transformation before feature point detection, which effectively increases the number of feature points and yields more high-quality feature points;
step S102: according to the matched feature points, calculating pose parameters of the camera, and calculating three-dimensional coordinates of sparse point clouds mapped by the feature points through the pose parameters of the camera;
step S103: using the camera poses and the feature points, triangulation centered on each feature point computes the three-dimensional coordinates corresponding to the pixel block around it, and these are combined with the three-dimensional coordinates of the sparse point cloud to obtain the dense point cloud three-dimensional coordinates;
step S104: the colors of the original image are rendered by texture mapping.
Further, in the step S104, the creating of the texture image based on multiple views includes the following steps:
step S201: viewing angle selection: selecting a viewing angle based on the scale of the image, the detail richness of the image and the visibility of the image, so that each vertex has a unique viewing angle for acquiring texture information;
step S202: calculating texture coordinates: projecting the grid onto a visual image, determining the corresponding relation of points among the projection triangles, and normalizing texture coordinates;
step S203: creating a texture image: projecting the grids onto the corresponding images, and intercepting the images within the range of the minimum bounding box as texture images;
step S204: color adjustment: since obvious seams appear between different texture patches owing to camera exposure or illumination differences between viewing angles, a color adjustment amount needs to be added to each pixel; the color adjustment amount is obtained by interpolation so that the color difference at the seams is minimized.
step S205: image editing: for regions with severe seams, global color adjustment cannot guarantee complete seam removal, so the foreground image and the background image need to be blended such that in the fused image the pixel values on the boundary are the same as those of the background image while the gradient inside the foreground region matches the guiding gradient field.
The invention has the advantages that:
1. The image is preprocessed by the front-view matrix transformation before feature point detection and matching; compared with conventional algorithms, enough feature points can still be extracted when the image tilt angle is too large, which greatly reduces the probability of matching failure and enhances the robustness of the system.
2. A depth map refining unit is added between motion structure recovery and multi-view stereo matching, which removes cases of inconsistent depth between adjacent depth maps, reduces the redundancy of the dense point cloud generated from the multi-view depth maps, and improves reconstruction efficiency.
3. A boundary representation method is used to describe the three-dimensional object as a set of surfaces; compared with conventional space partitioning methods, geometric operations on the model are more convenient, the surface details of the model can be restored, and stability is high.
Drawings
FIG. 1 is a schematic diagram of a system structure according to an embodiment of the present invention.
FIG. 2 is a diagram of an improved feature point detection and matching process in an embodiment of the present invention.
Fig. 3 is a diagram of an image feature point matching effect in an embodiment of the present invention.
Fig. 4 is a schematic diagram of recovering three-dimensional point cloud coordinates in an embodiment of the present invention.
FIG. 5 is a flow chart of texture mapping in an embodiment of the invention.
Fig. 6 is an effect diagram of an indoor scene reconstructed in an embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
A three-dimensional modeling system based on multi-view images takes pictures as input and, after a series of visual geometry processing steps, outputs a three-dimensional model of the scene in the pictures. Because traditional three-dimensional reconstruction algorithms place high requirements on the images, few feature points are detected when the tilt angle of an image is too large and matching easily fails, so the subsequent steps cannot proceed and reconstruction fails. To improve the success rate of three-dimensional reconstruction, the invention makes a series of improvements to the traditional algorithm. First, preprocessing by the front-view matrix transformation is added to the images, which effectively increases the number of feature points and improves the robustness of the reconstruction system. Meanwhile, a depth map refining step is added to the multi-view stereo matching process, which reduces the redundancy of the point cloud and increases the computation speed.
The system of the present invention comprises five modules, as shown in fig. 1. The system comprises an image feature point detection and matching module, a motion recovery structure module, a multi-view three-dimensional module, a surface reconstruction module and a texture mapping module.
The image feature point detection and matching module detects the same feature points in images taken under different viewing angles and associates them by matching. This module is the foundation of three-dimensional reconstruction. The invention adopts a hand-designed detection algorithm: to address the problem that few feature points are detected when the image is at a large tilt angle, a new algorithm is built on the basis of the Scale-Invariant Feature Transform (SIFT) algorithm, in which the image is preprocessed by the front-view angle matrix transformation before feature point detection; this effectively increases the number of feature points, yields more high-quality feature points, and provides more accurate data support for the subsequent motion structure recovery. In the preprocessing transformation, (u, v) are the original image pixel coordinates and (u′, v′) are the transformed image pixel coordinates.
The image feature point matching module comprises: a feature point detection unit, a feature point description unit, a feature point matching unit, and a mismatching filtering unit; the execution flow is shown in fig. 2, and the feature matching effect is shown in fig. 3.
The feature point detection unit uses a feature detector to locate local feature points on the image; the feature points mainly have to satisfy two requirements, distinctiveness and repeatability, where distinctiveness means they can be detected and repeatability means they can be matched.
The feature detector is determined as follows: first the gradient of the image is calculated, then the Harris matrix of each pixel position, and then the corner response value of each pixel; pixels whose response value is higher than a threshold are taken as feature points.
The image gradient and the Harris matrix are computed as defined above, i.e. ∇I = (∂I/∂x, ∂I/∂y) and M(x, y) = Σ_W w(x, y) [ I_x^2  I_x I_y ; I_x I_y  I_y^2 ].
the characteristic point description unit prepares for subsequent matching by utilizing the neighborhood range of the descriptor coding characteristic points;
and determining a characteristic descriptor, wherein a description vector consists of 128 0 or 1, randomly sampling 128 patches with the size of 5x5 in an image, and comparing the sizes of pixel sums in the patches.
Firstly, calculating an image scale space, then detecting and positioning extreme points, removing edge points, calculating a main direction, and generating a descriptor.
The feature point matching unit uses nearest-neighbor search to calculate the Hamming distances between the descriptors of the two images, and judges feature point correspondences with the matching strategy that the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance must be smaller than a preset threshold.
Feature matching: the distances between feature descriptors in the two images are compared using the Hamming distance, and if the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is smaller than a preset value, the match succeeds, where the Hamming distance formula is D(a, b) = Σ_{i=1}^{n} (a_i ⊕ b_i).
the mismatching filtering unit is used for setting a mismatching mechanism for improving the matching success rate due to the interference of factors such as illumination, scale, rotation and the like in the characteristic point matching process, and removing mismatching characteristic points by estimating a camera model.
And (5) mismatching and filtering: if the distance between descriptors is greater than twice the minimum distance, then a match is considered to be incorrect.
The motion recovery structure module, also called the sparse reconstruction module, recovers the motion structure of the cameras that captured the images, including the positions and attitudes of the cameras. From the feature points associated by the image feature point matching module, the camera pose parameters are calculated, and from the camera pose parameters the three-dimensional coordinates of the sparse point cloud mapped by the feature points are further calculated. The module comprises a scene graph construction and optimization unit and a motion recovery structure unit;
the scene graph construction and optimization unit is used for improving the reconstruction efficiency;
and the motion recovery structural unit is used for calculating the pose of the camera and the sparse point cloud coordinates according to the matched characteristic points.
Specifically, the motion recovery structure module can recover the position and pose of the cameras and the structural information of the scene directly from the images. The module uses the matched feature points and can calculate the camera pose with the camera model and epipolar geometry. The camera model is represented by the perspective matrix of a pinhole camera as given above, where R and t are the camera extrinsic parameters, namely the rotation matrix and the translation vector, f_a, f_b, u_0, v_0 are the camera intrinsic parameters, (u, v) are the coordinates in the image plane, and (x_w, y_w, z_w) are the coordinates in the world coordinate system.
The epipolar constraint is expressed as x_2^T E x_1 = 0, where F is the fundamental matrix, F = K_2^{-T} E K_1^{-1}, and E is the essential matrix, E = [t]_× R;
If the feature points in space lie on the same plane, the homography matrix H is solved with the direct linear transformation method and the camera pose is then calculated. The equation for solving the homography matrix H is p̃_2 ≃ H p̃_1, where H has 8 degrees of freedom and each pair of points gives two constraints;
H_11 u_1 + H_12 v_1 + H_13 - H_31 u_1 u_2 - H_32 v_1 u_2 - H_33 u_2 = 0
H_21 u_1 + H_22 v_1 + H_23 - H_31 u_1 v_2 - H_32 v_1 v_2 - H_33 v_2 = 0
Setting H_33 = 1, a total of 4 pairs of feature points are required to solve the homography matrix;
multi-view stereoscopic modules are also known as dense reconstruction modules: after the pose of the camera is found in the motion structure recovery module, the depth of the pixel points serving as the features is calculated through triangulation on the feature points obtained in the image feature point matching module, so that the corresponding coordinates of the feature points in the three-dimensional space are determined. The coordinates of the three-dimensional points are recovered by triangularization through known camera parameters and matching points, and binding adjustment is carried out subsequently.
According to camera attitude parameters obtained in the motion recovery module, based on reasonable assumptions (such as rigidity of a scene), three-dimensional point cloud information is obtained; the module has the main functions of solving three-dimensional point cloud coordinates of the feature points by using the obtained camera pose information and carrying out fusion optimization on the depth of the feature points.
Specifically, the module is implemented with a depth map fusion method and comprises a reference image selection unit, a depth map estimation unit, and a depth map refining unit;
the reference image selection unit searches, for each image, a group of reference images that can help that image estimate depth, so that each pixel point in the original image has corresponding points in the reference images;
the depth map estimation unit estimates an appropriate depth for each pixel point in the original image by using the photometric consistency of the corresponding pixel points in the reference images;
the depth map refining unit refines and filters the depth maps, removing inconsistent depths between adjacent depth maps.
In the embodiment of the invention, triangulation is used to solve the point cloud coordinates: the pixel depth with minimum reprojection error is estimated from the rays connecting the matched feature point pixel coordinates with the camera optical centers; a schematic of recovering three-dimensional point coordinates from known camera parameters and matched points is shown in fig. 4. Depth optimization adopts a photometric consistency assumption and a bundle adjustment strategy, generating globally consistent camera parameters and scene structure from the relative camera parameters.
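Bundle adjustment refines all camera poses and 3D points by jointly minimizing the reprojection error. The sketch below only builds that residual vector; feeding it to a generic nonlinear least-squares solver is assumed, and the data layout (pose list, point array, observation tuples) is an illustrative choice rather than anything fixed by the patent.

```python
import numpy as np

def reprojection_residuals(K, poses, points, observations):
    """Residual vector minimised by bundle adjustment.

    poses        : list of (R, t) tuples, one per camera.
    points       : (M, 3) array of 3D point coordinates.
    observations : iterable of (cam_idx, point_idx, (u, v)) measurements.
    Feeding these residuals to a nonlinear least-squares solver (e.g.
    scipy.optimize.least_squares) jointly refines cameras and structure.
    """
    res = []
    for cam, pt, (u, v) in observations:
        R, t = poses[cam]
        x = K @ (R @ points[pt] + t)               # project the point
        res.extend([x[0] / x[2] - u, x[1] / x[2] - v])
    return np.asarray(res)
```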
Surface reconstruction module: meshing (converting the three-dimensional point cloud obtained in the multi-view stereo module into mesh information) to facilitate subsequent rendering.
The module mines local geometric information and graph structure information in the large-scale point cloud input and converts the three-dimensional point cloud into a three-dimensional mesh by interpolation and approximation methods (a Delaunay triangulation scheme is adopted in this embodiment); the module may also be omitted depending on application requirements.
Texture mapping module: the colors of the original images are rendered onto the mesh; color information is added to the three-dimensional mesh model obtained by the surface reconstruction module and rendered to obtain a high-resolution color model, i.e., the color information of the corresponding pixel points in the pictures is added to the mesh information obtained by the surface reconstruction module to obtain the final colored three-dimensional model. Texture mapping is a method for improving the realism of a three-dimensional model; its purpose is to improve texture quality and to build a high-fidelity, highly readable real-scene three-dimensional model. The main algorithms include nearest-neighbor sampling, bilinear filtering and similar methods, and global color editing of the mesh model is carried out through view selection and texture coordinate calculation.
The multi-view texture image creation flow has the following 5 steps, as shown in fig. 5.
Step 1, view selection: the choice of viewing angle mainly considers three points: first, the scale of the image; second, the richness of detail in the image; and third, the visibility of the image. Each vertex requires a unique view from which its texture information is obtained.
Step 2, calculating texture coordinates: the mesh is first projected onto the visible image, and after the point correspondences between the projected triangles are determined, the texture coordinates are normalized (a minimal sketch of steps 2 and 3 follows step 5).
Step 3, creating the texture image: the mesh is projected onto the corresponding image, and the image region within the minimum bounding box is cropped as the texture image.
Step 4, color adjustment: because obvious seams appear between different texture patches owing to camera exposure or illumination differences between views, a color adjustment amount must be added to each pixel; the adjustment amount of each pixel is obtained by interpolation so that the color difference at the seams is as small as possible.
Step 5, image editing: in regions with severe seams, global color adjustment cannot ensure complete seam removal, so the foreground image and the background image need to be blended such that in the fused image the pixel values on the boundary are the same as those of the background image while the gradient inside the foreground region matches the guiding gradient field.
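A minimal sketch of steps 2 and 3 for a single mesh triangle, assuming NumPy and a known camera (K, R, t) for the selected view: the three vertices are projected, the minimum bounding box of the projections defines the cropped texture image, and the coordinates are normalized inside that box.

```python
import numpy as np

def face_texture_coords(vertices, K, R, t):
    """Texture coordinates of one mesh triangle (steps 2 and 3 above).

    vertices : three 3D points of the triangle in world coordinates.
    The vertices are projected into the selected view; the minimum bounding
    box of the projections is the region cropped as the texture image, and
    coordinates are normalised to [0, 1] inside that box.
    """
    uv = []
    for X in vertices:
        x = K @ (R @ np.asarray(X, dtype=float) + t)
        uv.append((x[0] / x[2], x[1] / x[2]))
    uv = np.asarray(uv)
    lo, hi = uv.min(axis=0), uv.max(axis=0)          # minimum bounding box
    tex = (uv - lo) / np.maximum(hi - lo, 1e-9)      # normalised texture coordinates
    return tex, (lo, hi)
```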
This completes the whole three-dimensional modeling framework based on multi-view images.
As shown in fig. 6, the invention can reconstruct a three-dimensional model of a scene from the information in the images. Compared with traditional image-based three-dimensional reconstruction algorithms, a depth map refining unit is added, which reduces the redundancy of the dense point cloud and improves reconstruction efficiency. Meanwhile, when the tilt angle of an image is too large, the image is preprocessed with the front-view angle matrix transformation, so enough feature points can still be extracted, which reduces the probability of matching failure and enhances the robustness of the system.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions according to the embodiments of the present invention.

Claims (10)

1. A three-dimensional modeling system based on multi-view images, comprising an image feature point detection and matching module, a motion recovery structure module, a multi-view stereo module and a texture mapping module, characterized in that:
the image feature point detection and matching module applies a front-view angle matrix transformation to the image, detects the same feature points in images under different viewing angles, and associates the same feature points through matching;
the motion recovery structure module calculates the pose parameters of the camera according to the matched characteristic points, and calculates the three-dimensional coordinates of the sparse point cloud mapped by the characteristic points through the pose parameters of the camera;
the multi-view stereo module uses the camera poses and the feature points to compute, by triangulation centered on each feature point, the three-dimensional coordinates corresponding to the pixel block around that feature point, and combines these with the three-dimensional coordinates of the sparse point cloud to obtain the three-dimensional coordinates of a dense point cloud;
the texture mapping module renders the colors of the original images onto the reconstructed model.
2. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the front-view angle matrix transformation maps the original image pixel coordinates (u, v) to the transformed image pixel coordinates (u′, v′) through the elements a of a rotation matrix; the rotation matrix represents the pose state of the camera and can be calculated from the corresponding feature points in the images.
3. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the image feature point detection and matching module comprises: a feature point detection unit, a feature point description unit, a feature point matching unit, and a mismatching filtering unit;
the feature point detection unit uses a feature detector to locate local feature points on the image; the feature detector is determined as follows: first the gradient of the image is calculated, then the Harris matrix of each pixel position is calculated from the image gradient, and then the corner response value of each pixel is calculated from its Harris matrix, a pixel whose corner response value is higher than a response threshold being taken as a feature point;
the feature point description unit encodes the neighborhood of the feature point with a feature descriptor; the feature descriptor is a unique identifier of a pixel block in the form of a multi-dimensional 0/1 vector, determined by sampling a group of patches in the image and comparing the pixel values within the patches, setting 1 when the pixel value is larger than the pixel threshold and 0 when it is smaller, the resulting multi-dimensional 0/1 vector being taken as the descriptor;
the feature point matching unit uses nearest-neighbor search to calculate the distances between feature descriptors in the two images so as to match the correspondences between feature points;
and the mismatching filtering unit is used for setting a filtering threshold according to the minimum distance between the feature descriptors, and considering that the matching is wrong when the distance between the feature descriptors is larger than the filtering threshold.
4. A multi-view image based three-dimensional modeling system according to claim 3, wherein:
the image gradient is:
∇I(x, y) = (∂I/∂x, ∂I/∂y) = (I_x, I_y)
the Harris matrix is:
M(x, y) = Σ_{(x, y) ∈ W} w(x, y) [ I_x^2  I_x I_y ; I_x I_y  I_y^2 ]
where (x, y) represents the position of a pixel, I(x, y) represents the pixel value at position (x, y), ∂I/∂x is the partial derivative of the pixel value with respect to the abscissa, ∂I/∂y is the partial derivative of the pixel value with respect to the ordinate, W is the window around the pixel, and w(x, y) represents a weight coefficient;
in the feature point matching unit, the feature points are matched by comparing the distances between feature descriptors in the two images using the Hamming distance; if the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is smaller than a preset threshold, the match succeeds, where the Hamming distance formula is:
D(a, b) = Σ_{i=1}^{n} (a_i ⊕ b_i)
where a and b are the two feature descriptor sequences of the image pixel points to be matched, ⊕ represents the exclusive-or operation, and n represents the dimension of the descriptor.
5. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the motion recovery structure module calculates the pose of the camera using a camera model and epipolar geometry, the camera model being represented by the perspective matrix of a pinhole camera:
s [u, v, 1]^T = K [R | t] [x_w, y_w, z_w, 1]^T,  with K = [ f_a  0  u_0 ; 0  f_b  v_0 ; 0  0  1 ]
where s is the projective scale factor; R and t are the extrinsic parameters of the camera, namely the rotation matrix and the translation vector; f_a, f_b, u_0, v_0 are the intrinsic parameters of the camera, f_a and f_b representing the horizontal and vertical conversion factors between the image physical coordinate system and the pixel coordinate system, u_0 and v_0 representing the numbers of horizontal and vertical pixels between the center pixel coordinate of the image and the origin pixel coordinate of the image; (u, v) are the coordinates in the image plane and (x_w, y_w, z_w) are the coordinates in the world coordinate system, the subscript w abbreviating the world coordinate system;
the epipolar constraint is expressed as:
x_2^T E x_1 = 0, with E = [t]_× R and F = K_2^{-T} E K_1^{-1}
where x1 and x2 are the coordinates of the two pixel points on the normalized plane, F represents the fundamental (basic) matrix, K1 and K2 represent the camera intrinsic matrices of the two images, E = [t]_× R represents the essential matrix, and [·]_× is the mathematical operation that converts a vector into its corresponding antisymmetric matrix so that it can be multiplied with a matrix;
if the feature points in space lie on the same plane, the homography matrix H is solved with the direct linear transformation method and the camera pose is then calculated from it, the equation for solving the homography matrix H being
p̃_2 ≃ H p̃_1
where p̃_1 and p̃_2 represent the pixel coordinates of a pair of matching points, H_ij represents an element of the homography matrix, H has 8 degrees of freedom, and each pair of points gives two constraints;
H_11 u_1 + H_12 v_1 + H_13 - H_31 u_1 u_2 - H_32 v_1 u_2 - H_33 u_2 = 0
H_21 u_1 + H_22 v_1 + H_23 - H_31 u_1 v_2 - H_32 v_1 v_2 - H_33 v_2 = 0
setting H_33 = 1, a total of 4 pairs of feature points are required to solve the homography matrix, where (u^(k), v^(k)) denote the pixel coordinates of the k-th pair of matched feature points and the superscript in parentheses indicates the index of the feature point pair.
6. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the multi-view stereo module is implemented with a depth map fusion method and comprises: a reference image selection unit, a depth map estimation unit and a depth map extraction unit;
the reference image selection unit searches, for each image, a group of reference images that can help that image estimate depth, so that each pixel point in the original image has corresponding points in the reference images;
the depth map estimation unit estimates a depth for each pixel point in the original image by using the photometric consistency of the corresponding pixel points in the reference images;
the depth map extraction unit refines and filters the depth maps, removing cases in which the depths of adjacent depth maps are inconsistent.
7. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the triangulation estimates the pixel depth that minimizes the reprojection error from the rays connecting the matched feature point pixel coordinates with the camera optical centers, and the depth optimization adopts a photometric consistency assumption and a bundle adjustment strategy to generate globally consistent camera parameters and scene structure from the relative camera parameters.
8. A multi-view image based three-dimensional modeling system according to claim 1, wherein: the device also comprises a surface reconstruction module, wherein the three-dimensional point cloud obtained by the multi-view stereo module is gridded;
and the texture mapping module adds the color information of the corresponding pixel point in the picture to the grid information to obtain a final color three-dimensional model.
9. A three-dimensional modeling method based on multi-view images, characterized by comprising the following steps:
step S101: performing a front-view angle matrix transformation on the image, detecting the same feature points in images under different viewing angles, and associating the same feature points through matching;
step S102: according to the matched feature points, calculating pose parameters of the camera, and calculating three-dimensional coordinates of sparse point clouds mapped by the feature points through the pose parameters of the camera;
step S103: using the camera poses and the feature points, triangulation centered on each feature point computes the three-dimensional coordinates corresponding to the pixel block around it, and these are combined with the three-dimensional coordinates of the sparse point cloud to obtain the dense point cloud three-dimensional coordinates;
step S104: the colors of the original image are rendered by texture mapping.
10. The multi-view image-based three-dimensional modeling method according to claim 9, wherein: in the step S104, the creating of the texture image based on multiple views includes the following steps:
step S201: viewing angle selection: selecting a viewing angle based on the scale of the image, the detail richness of the image and the visibility of the image, so that each vertex has a unique viewing angle for acquiring texture information;
step S202: calculating texture coordinates: projecting the grid onto a visual image, determining the corresponding relation of points among the projection triangles, and normalizing texture coordinates;
step S203: creating a texture image: projecting the grids onto the corresponding images, and intercepting the images within the range of the minimum bounding box as texture images;
step S204: color adjustment: adding a color adjustment amount to each pixel, and obtaining the color adjustment amount by interpolation so as to minimize the color difference at the gap;
step S205: editing an image: the foreground image and the background image are mixed so that the fused image satisfies that the pixel value on the boundary is the same as the background image, and the gradient in the foreground region is the same as the guiding gradient field.
CN202310828788.4A 2023-07-07 2023-07-07 Three-dimensional modeling system and method based on multi-view images Pending CN116977596A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310828788.4A CN116977596A (en) 2023-07-07 2023-07-07 Three-dimensional modeling system and method based on multi-view images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310828788.4A CN116977596A (en) 2023-07-07 2023-07-07 Three-dimensional modeling system and method based on multi-view images

Publications (1)

Publication Number Publication Date
CN116977596A true CN116977596A (en) 2023-10-31

Family

ID=88478871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310828788.4A Pending CN116977596A (en) 2023-07-07 2023-07-07 Three-dimensional modeling system and method based on multi-view images

Country Status (1)

Country Link
CN (1) CN116977596A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252996A (en) * 2023-11-20 2023-12-19 中国船舶集团有限公司第七〇七研究所 Data expansion system and method for special vehicle in cabin environment
CN117252996B (en) * 2023-11-20 2024-05-10 中国船舶集团有限公司第七〇七研究所 Data expansion system and method for special vehicle in cabin environment
CN117557700A (en) * 2024-01-12 2024-02-13 杭州优链时代科技有限公司 Method and equipment for modeling characters
CN117557700B (en) * 2024-01-12 2024-03-22 杭州优链时代科技有限公司 Method and equipment for modeling characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination