CN110176064B - Automatic identification method for the subject object of a photogrammetry-generated three-dimensional model


Info

Publication number
CN110176064B
CN110176064B (application CN201910440234.0A)
Authority
CN
China
Prior art keywords
foreground
dimensional
image
background
triangulation
Prior art date
Legal status
Active
Application number
CN201910440234.0A
Other languages
Chinese (zh)
Other versions
CN110176064A (en)
Inventor
高云龙 (Gao Yunlong)
Current Assignee
Wuhai Dashi Intelligence Technology Co ltd
Original Assignee
Wuhai Dashi Intelligence Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhai Dashi Intelligence Technology Co ltd
Priority to CN201910440234.0A
Publication of CN110176064A
Application granted
Publication of CN110176064B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85: Stereo camera calibration
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G06T2207/10012: Stereo images

Abstract

The invention relates to the technical field of three-dimensional digitization, and in particular to a method for automatically identifying the subject object of a three-dimensional model generated by photogrammetry. The method is dedicated to identifying the subject object of a three-dimensional model generated by photogrammetric techniques. Because the original images required for the photogrammetric computation serve as input, the complexity of the problem is reduced and a feasible path to a solution emerges: the subject object is identified in each image by a deep-learning method, and feature matching among the multi-angle photographs used in photogrammetric modeling, combined with fusion of the per-image subject-object recognition results, greatly improves the accuracy of subject-object identification in the three-dimensional model.

Description

Automatic identification method for the subject object of a photogrammetry-generated three-dimensional model
Technical Field
The invention relates to the technical field of three-dimensional digitization, and in particular to a method for automatically identifying the subject object of a three-dimensional model generated by photogrammetry.
Background
The demand of technologies such as virtual reality for three-dimensional models has grown sharply, which has spurred a large number of semi-automatic modeling techniques. However, background noise is generally present in the models produced by automatic modeling, so the models must be reprocessed: the subject and the background of each model must be identified, the background regions stripped away, and the invalid data reduced. At present, subject identification relies mainly on manual in-office processing, with operators trimming models one by one; this is time-consuming, labor-intensive, and consumes enormous human resources.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for automatically identifying the subject object of a photogrammetry-generated three-dimensional model.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for automatically identifying a subject object of a photogrammetric generation three-dimensional model is designed, and is characterized by comprising the following steps:
s1, sample training: introducing a convolution neural network data preprocessing method used in an experiment, wherein the method comprises the steps of picture scaling and data enhancement, and the picture scaling refers to the following steps: the input image is zoomed into a smaller picture to be trained, the training speed is improved, and the data enhancement can be realized by the following methods: random rotation, random cropping, flipping, gaussian noise, translation and scaling, in addition to these common data enhancement methods, there are some advanced enhancement methods, deep photo style transfer, converting the image from this domain to another;
S2, single-image subject-object recognition based on deep learning and foreground/background optimization by graph-theoretic segmentation;
S3, fusing the deep-learning results: the output of the deep-learning network contains certain errors, so its segmentation result must be post-processed. Considering that the subject object is photographed roughly centered in the frame and that classification accuracy is high in the central region of the deep-learning segmentation, the central part of the foreground region is retained and the boundary part is discarded. The seed-point preprocessing flow is as follows:
M1, point selection: first the bounding rectangle of the foreground region is drawn, and a circle is drawn with the center of the rectangle as its center and 1/4 of the rectangle's diagonal length as its radius; the foreground pixels inside the circle are selected as seed points, so the retained seed points all come from the central region of the original segmentation result, the boundary part is eliminated, and the interference from boundary segmentation errors is weakened when the seed points are passed to the graph-theoretic segmentation algorithm (a minimal sketch of this preprocessing follows this list);
M2, thinning: the number of screened foreground points is still too large and adjacent pixels are highly redundant, so the computation load remains excessive for a program; the pixels are therefore thinned with a step of 5, reducing the data volume to 1/5 of that before thinning. The deep-learning segmentation yielded 151131 foreground points in total; after these two steps, 15827 foreground points remain, about 1/10 of the original number, which markedly reduces the data volume and saves running time for the graph-cut algorithm;
M3, segmentation;
S4, cross-validation and recognition-result optimization;
S5, cutting the geometric model of the subject object out of the three-dimensional model.
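As an illustration of the seed-point preprocessing in M1 and M2, a minimal sketch follows; the 1/4-diagonal radius and the step of 5 come from the text above, while the mask input, the function name, and the flat-list reading of the thinning step are assumptions:

    import numpy as np

    def select_seed_points(mask, step=5):
        """Keep foreground pixels near the mask center (M1), then thin them (M2).

        mask: 2-D array, nonzero = foreground from the deep-learning segmentation.
        """
        ys, xs = np.nonzero(mask)                       # all foreground pixels
        # M1: bounding rectangle of the foreground region and its central circle
        y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
        cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0       # rectangle center
        radius = 0.25 * np.hypot(y1 - y0, x1 - x0)      # 1/4 of the diagonal length
        inside = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
        ys, xs = ys[inside], xs[inside]                 # seeds from the central area only
        # M2: thin the remaining points with a step of 5 (keeps about 1/5 of them)
        return list(zip(ys[::step], xs[::step]))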
Preferably, the S2 single-image subject-object recognition based on deep learning and the foreground/background optimization by graph-theoretic segmentation specifically comprise using a Mask R-CNN network with excellent performance; as deep-learning algorithms improve, any deep-learning network with region-recognition capability can be used for single-image subject-object recognition.
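Since the text names a Mask R-CNN for the S2 recognition, a minimal inference sketch is given below; the COCO-pretrained torchvision model and the single-best-detection heuristic are assumptions, not the network the inventors actually trained:

    import torch
    import torchvision

    # Assumption: a COCO-pretrained Mask R-CNN stands in for the patent's network.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    def subject_mask(image_tensor, score_thresh=0.5):
        """Return a binary foreground mask for the highest-scoring detection.

        image_tensor: float tensor of shape (3, H, W), values scaled to [0, 1].
        """
        with torch.no_grad():
            pred = model([image_tensor])[0]
        if len(pred["scores"]) == 0 or pred["scores"].max() < score_thresh:
            return torch.zeros(image_tensor.shape[1:], dtype=torch.bool)
        best = pred["scores"].argmax()
        # masks has shape (N, 1, H, W) with soft values; binarize at 0.5
        return pred["masks"][best, 0] >= 0.5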
Preferably, the M3 segmentation within the S3 fusion of deep-learning results specifically comprises the OneCut algorithm from graph-theoretic segmentation; OneCut is an improvement on the traditional graph-cut algorithm and can complete image segmentation quickly given only simple foreground and background input.
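OneCut is rarely packaged in mainstream libraries, so the sketch below substitutes OpenCV's GrabCut, a related seed-driven graph-cut method; it illustrates the seed-to-segmentation refinement only and is explicitly not the OneCut implementation the text refers to:

    import cv2
    import numpy as np

    def refine_with_graph_cut(image, seeds, iterations=5):
        """Refine a segmentation from foreground seed points via graph cut.

        image: H x W x 3 BGR uint8 array; seeds: (row, col) foreground pixels.
        Note: cv2.grabCut is a stand-in here for the OneCut algorithm.
        """
        mask = np.full(image.shape[:2], cv2.GC_PR_BGD, np.uint8)  # probable background
        for r, c in seeds:
            mask[r, c] = cv2.GC_FGD                               # sure-foreground seeds
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(image, mask, None, bgd, fgd, iterations, cv2.GC_INIT_WITH_MASK)
        return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))         # binary foreground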
Preferably, the S4 cross-validation and recognition-result optimization specifically comprise the following: during generation of the three-dimensional model, adjacent images must share a certain overlap rate for modeling to succeed, so every vertex of the three-dimensional model can be projected onto several images; the subject part of the model is labeled as foreground in three-dimensional space, each mesh triangle composing the subject part likewise carries the foreground attribute, and its projection onto the images should also be segmented as foreground.
Preferably, since image segmentation is subject to error, the subject part may be segmented as background; the model is therefore projected from multiple viewpoints, the probability that each mesh triangle belongs to the foreground is computed, and each triangle is classified as foreground or background:
a = n_{fore} / (n_{fore} + n_{back})
where n_{fore} denotes the number of images on which the triangle's projection is segmented as foreground, n_{back} the number of images on which the same triangle's projection is segmented as background, and a the probability that the triangle belongs to the foreground over all images; the probability threshold is set to 0.5;
when a ≥ 0.5, the triangle is marked as foreground; when a < 0.5, it is marked as background. For example, the region represented by triangle P appears on 12 images, of which 9 segment it as foreground and 3 as background; a = 0.75, so triangle P is marked as foreground and considered to lie on the subject of the three-dimensional model. The probability threshold is set manually; the aim here is to remove as much background area as possible while preserving the integrity of the model's subject, so the threshold is set to 0.5, and this parameter can be adjusted in practical applications.
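A minimal sketch of this multi-view vote with the 0.5 threshold from the text; the per-triangle count arrays are hypothetical inputs:

    import numpy as np

    def vote_triangles(n_fore, n_back, threshold=0.5):
        """Label each mesh triangle as foreground (True) or background (False).

        n_fore[i] / n_back[i]: number of images on which triangle i's projection
        is segmented as foreground / background; a = n_fore / (n_fore + n_back).
        """
        n_fore = np.asarray(n_fore, dtype=float)
        n_back = np.asarray(n_back, dtype=float)
        a = n_fore / (n_fore + n_back)      # foreground probability per triangle
        return a >= threshold

    # The worked example from the text: triangle P is seen on 12 images,
    # with 9 foreground votes and 3 background votes, so a = 0.75 >= 0.5.
    assert vote_triangles([9], [3], threshold=0.5)[0]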
Preferably, the S5 cutting of the geometric model of the subject object in the three-dimensional model comprises the following steps:
1) The three-dimensional model consists of three parts: vertex coordinates, texture coordinates, and triangle vertex indices. The texture coordinates are not needed when converting three-dimensional points to two-dimensional image pixels; it suffices to compute, for the three-dimensional coordinates behind each triangle's vertex indices, the projected pixel coordinates on the two-dimensional image, and then to assign the foreground or background attribute to the three-dimensional triangle according to whether its projected two-dimensional region belongs to the foreground or the background.
2) In computer vision, an image point x is represented as a 3-dimensional homogeneous vector. In homogeneous coordinates, the coordinate X of a point in the world coordinate system is related to the coordinate X_{cam} of the same point in the image coordinate system by:

X_{cam} = K R [I \mid -C] X

K = \begin{pmatrix} f & 0 & x_0 \\ 0 & f & y_0 \\ 0 & 0 & 1 \end{pmatrix}
where K is the camera calibration matrix; its entries describe the camera's internal configuration and are called internal parameters: f is the camera focal length in pixel units and (x_0, y_0) are the coordinates of the camera's principal point. R is the camera rotation matrix, related to the camera's rotation, pitch, and tilt angles, and C represents the coordinate of the camera center in the world coordinate system. The parameters in R and C are unrelated to the camera's internal configuration; they depend on the camera's position and orientation in the world coordinate system and are called external parameters;
X_{cam} is a three-dimensional homogeneous vector, from which further calculation gives the pixel coordinates (x, y):

x = X_{cam,1} / X_{cam,3}

y = X_{cam,2} / X_{cam,3}
3) Using the formulas in 2), the image coordinates of every mesh vertex on every image can be obtained. Because the model occludes itself, regions on the occluded back of the model also yield coordinates inside the image after this computation, so they must be distinguished and treated separately;
4) After the three-dimensional vertices are mapped down to the two-dimensional image, the model's original three-dimensional triangles are likewise perspective-projected onto the image as two-dimensional triangles, and the foreground and background areas within each two-dimensional triangle are computed; when the foreground area exceeds half of the total area, the two-dimensional triangle and its corresponding three-dimensional triangle are marked as foreground, and otherwise as background:

P = S_{fore} / (S_{fore} + S_{back})

where S_{fore} denotes the area within the triangle segmented as foreground and S_{back} the area segmented as background; when P ≥ 0.5 the triangle is marked as foreground with label 1, and when P < 0.5 it is marked as background with label 0;
5) Through the mapping between two-dimensional pixels and three-dimensional points, the foreground and background attributes of the pixels can be assigned to the three-dimensional points and triangles, dividing the three-dimensional model into subject and background on the basis of a single image. Single-image identification of the model's subject is governed by the quality of the image's foreground/background segmentation: high segmentation precision gives accurate subject identification, while low precision introduces large errors. The geometric model is therefore constrained from multiple viewpoints to raise the accuracy of subject identification.
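Combining steps 1) to 5), a minimal sketch under the pinhole model of step 2) follows; the mesh arrays, the mask input, and the function names are assumptions, and the occlusion handling of step 3) is omitted:

    import numpy as np
    from matplotlib.path import Path

    def project_points(K, R, C, X):
        """Project N x 3 world points X to pixels via X_cam = K R [I | -C] X."""
        Xc = (K @ R @ (X - C).T).T          # N x 3 homogeneous image coordinates
        return Xc[:, :2] / Xc[:, 2:3]       # x = Xcam1/Xcam3, y = Xcam2/Xcam3

    def label_triangles(K, R, C, vertices, faces, fg_mask):
        """Label triangles: 1 if P = S_fore / (S_fore + S_back) >= 0.5, else 0."""
        h, w = fg_mask.shape
        uv = project_points(K, R, C, vertices)
        labels = np.zeros(len(faces), dtype=int)
        for i, face in enumerate(faces):
            tri = uv[face]                                    # projected 2-D triangle
            lo = np.maximum(np.floor(tri.min(axis=0)), 0).astype(int)
            hi = np.minimum(np.ceil(tri.max(axis=0)), [w - 1, h - 1]).astype(int)
            if hi[0] < lo[0] or hi[1] < lo[1]:
                continue                                      # falls outside the image
            xs, ys = np.meshgrid(np.arange(lo[0], hi[0] + 1),
                                 np.arange(lo[1], hi[1] + 1))
            pts = np.column_stack([xs.ravel(), ys.ravel()])
            inside = Path(tri).contains_points(pts)           # pixels inside triangle
            if inside.any():
                fg = fg_mask[pts[inside, 1], pts[inside, 0]]  # mask indexed (row, col)
                labels[i] = int(fg.mean() >= 0.5)             # area ratio by pixel count
        return labels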
The automatic identification method for the subject object of a photogrammetry-generated three-dimensional model has the following advantages: it is dedicated to identifying the subject object of a three-dimensional model generated by photogrammetric techniques, and because the original images required for the photogrammetric computation serve as input, the complexity of the problem is reduced and a feasible path to a solution emerges. This patent provides a method for automatically identifying the subject object of a photogrammetry-generated three-dimensional model: a deep-learning method identifies the subject object in each image, and feature matching among the multi-angle photographs used in photogrammetric modeling, together with fusion of the multi-angle subject-object recognition results, greatly improves the accuracy of subject-object identification in the three-dimensional model.
Drawings
Fig. 1 is a flowchart of a method for automatically identifying a subject object of a three-dimensional model generated by photogrammetry according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Referring to fig. 1, a method for automatically identifying the subject object of a photogrammetry-generated three-dimensional model comprises the following steps:
S1, sample training: the convolutional-neural-network data preprocessing used in the experiments comprises picture scaling and data augmentation. Picture scaling means that the input image is scaled down to a smaller picture before training, which raises the training speed. Data augmentation can be realized by random rotation, random cropping, flipping, Gaussian noise, translation, and scaling; beyond these common methods there are also advanced ones such as deep photo style transfer, which converts an image from one domain to another (a minimal augmentation sketch is given after the method steps below);
S2, single-image subject-object recognition based on deep learning and foreground/background optimization by graph-theoretic segmentation: a Mask R-CNN network with excellent performance is adopted; as deep-learning algorithms improve, any deep-learning network with region-recognition capability can be used for single-image subject-object recognition;
S3, fusing the deep-learning results: the output of the deep-learning network contains certain errors, so its segmentation result must be post-processed. Considering that the subject object is photographed roughly centered in the frame and that classification accuracy is high in the central region of the deep-learning segmentation, the central part of the foreground region is retained and the boundary part is discarded. The seed-point preprocessing flow is as follows:
M1, point selection: first the bounding rectangle of the foreground region is drawn, and a circle is drawn with the center of the rectangle as its center and 1/4 of the rectangle's diagonal length as its radius; the foreground pixels inside the circle are selected as seed points, so the retained seed points all come from the central region of the original segmentation result, the boundary part is removed, and the interference from boundary segmentation errors is weakened when the seed points are passed to the graph-theoretic segmentation algorithm;
M2, thinning: the number of screened foreground points is still too large and adjacent pixels are highly redundant, so the computation load remains excessive for a program; the pixels are therefore thinned with a step of 5, reducing the data volume to 1/5 of that before thinning. The deep-learning segmentation yielded 151131 foreground points in total; after these two steps, 15827 foreground points remain, about 1/10 of the original number, which markedly reduces the data volume and saves running time for the graph-cut algorithm;
M3, segmentation: the OneCut algorithm from graph-theoretic segmentation is adopted; OneCut is an improvement on the traditional graph-cut algorithm and can complete image segmentation quickly given only simple foreground and background input;
S4, cross-validation and recognition-result optimization: during generation of the three-dimensional model, adjacent images must share a certain overlap rate for modeling to succeed, so every vertex of the three-dimensional model can be projected onto several images; the subject part of the model is labeled as foreground in three-dimensional space, each mesh triangle composing the subject part likewise carries the foreground attribute, and its projection onto the images should also be segmented as foreground;
since image segmentation is subject to error, the subject part may be segmented as background; the model is therefore projected from multiple viewpoints, the probability that each mesh triangle belongs to the foreground is computed, and each triangle is classified as foreground or background:
a = n_{fore} / (n_{fore} + n_{back})
where n_{fore} denotes the number of images on which the triangle's projection is segmented as foreground, n_{back} the number of images on which the same triangle's projection is segmented as background, and a the probability that the triangle belongs to the foreground over all images; the probability threshold is set to 0.5;
when a ≥ 0.5, the triangle is marked as foreground; when a < 0.5, it is marked as background. For example, the region represented by triangle P appears on 12 images, of which 9 segment it as foreground and 3 as background; a = 0.75, so triangle P is marked as foreground and considered to lie on the subject of the three-dimensional model. The probability threshold is set manually; the aim here is to remove as much background area as possible while preserving the integrity of the model's subject, so the threshold is set to 0.5, and this parameter can be adjusted in practical applications.
S5, cutting the geometric model of the subject object out of the three-dimensional model, comprising the following steps:
1) The three-dimensional model consists of three parts: vertex coordinates, texture coordinates, and triangle vertex indices. The texture coordinates are not needed when converting three-dimensional points to two-dimensional image pixels; it suffices to compute, for the three-dimensional coordinates behind each triangle's vertex indices, the projected pixel coordinates on the two-dimensional image, and then to assign the foreground or background attribute to the three-dimensional triangle according to whether its projected two-dimensional region belongs to the foreground or the background.
2) In computer vision, an image point x is represented as a 3-dimensional homogeneous vector. In homogeneous coordinates, the coordinate X of a point in the world coordinate system is related to the coordinate X_{cam} of the same point in the image coordinate system by:

X_{cam} = K R [I \mid -C] X

K = \begin{pmatrix} f & 0 & x_0 \\ 0 & f & y_0 \\ 0 & 0 & 1 \end{pmatrix}
where K is the camera calibration matrix; its entries describe the camera's internal configuration and are called internal parameters: f is the camera focal length in pixel units and (x_0, y_0) are the coordinates of the camera's principal point. R is the camera rotation matrix, related to the camera's rotation, pitch, and tilt angles, and C represents the coordinate of the camera center in the world coordinate system. The parameters in R and C are unrelated to the camera's internal configuration; they depend on the camera's position and orientation in the world coordinate system and are called external parameters;
X_{cam} is a three-dimensional homogeneous vector, from which further calculation gives the pixel coordinates (x, y):

x = X_{cam,1} / X_{cam,3}

y = X_{cam,2} / X_{cam,3}
3) Using the formulas in step 2), the image coordinates of every mesh vertex on every image can be obtained. Because the model occludes itself, regions on the occluded back of the model also yield coordinates inside the image after this computation, so they must be distinguished and treated separately;
4) After the three-dimensional vertices are mapped down to the two-dimensional image, the model's original three-dimensional triangles are likewise perspective-projected onto the image as two-dimensional triangles, and the foreground and background areas within each two-dimensional triangle are computed; when the foreground area exceeds half of the total area, the two-dimensional triangle and its corresponding three-dimensional triangle are marked as foreground, and otherwise as background:

P = S_{fore} / (S_{fore} + S_{back})

where S_{fore} denotes the area within the triangle segmented as foreground and S_{back} the area segmented as background; when P ≥ 0.5 the triangle is marked as foreground with label 1, and when P < 0.5 it is marked as background with label 0;
5) Through the mapping between two-dimensional pixels and three-dimensional points, the foreground and background attributes of the pixels can be assigned to the three-dimensional points and triangles, dividing the three-dimensional model into subject and background on the basis of a single image. Single-image identification of the model's subject is governed by the quality of the image's foreground/background segmentation: high segmentation precision gives accurate subject identification, while low precision introduces large errors. The geometric model is therefore constrained from multiple viewpoints to raise the accuracy of subject identification.
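For the S1 preprocessing above, a minimal torchvision augmentation pipeline covering the listed operations (scaling, random rotation, random cropping, flipping, Gaussian noise, translation) is sketched below; all sizes and parameter values are illustrative assumptions:

    import torch
    from torchvision import transforms

    class AddGaussianNoise:
        """Add zero-mean Gaussian noise to a tensor image (assumed std value)."""
        def __init__(self, std=0.02):
            self.std = std
        def __call__(self, t):
            return (t + torch.randn_like(t) * self.std).clamp(0.0, 1.0)

    # Picture scaling plus the augmentations named in S1; values are illustrative.
    train_transform = transforms.Compose([
        transforms.Resize((512, 512)),                        # scale input down for speed
        transforms.RandomRotation(degrees=15),                # random rotation
        transforms.RandomResizedCrop(448, scale=(0.8, 1.0)),  # random crop plus scaling
        transforms.RandomHorizontalFlip(),                    # flipping
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
        transforms.ToTensor(),
        AddGaussianNoise(std=0.02),                           # Gaussian noise
    ])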
The invention has the advantages that:
1. the problem of identifying the subject object of a three-dimensional model is solved by identifying the subject object in multi-angle images;
2. a deep-learning method is used to identify the subject object, and the identification precision is improved through graph-cut optimization;
3. the multi-angle image recognition results are fused and optimized, and the two-dimensional-to-three-dimensional conversion is exploited to overcome the insufficient recognition precision of a single image, improving the recognition precision of the subject object of the three-dimensional model.
The above description covers only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (6)

1. A method for automatically identifying the subject object of a photogrammetry-generated three-dimensional model, characterized by comprising the following steps:
S1, sample training: the convolutional-neural-network data preprocessing used in the experiments comprises picture scaling and data augmentation; picture scaling means that the input image is scaled down to a smaller picture before training, which raises the training speed, and data augmentation can be realized by random rotation, random cropping, flipping, Gaussian noise, translation, and scaling;
S2, single-image subject-object recognition based on deep learning and foreground/background optimization by graph-theoretic segmentation;
S3, fusing the deep-learning results: the output of the deep-learning network contains certain errors, so its segmentation result must be post-processed; considering that the subject object is photographed roughly centered in the frame and that classification accuracy is high in the central region of the deep-learning segmentation, the central part of the foreground region is retained and the boundary part is discarded, the seed-point preprocessing flow being as follows:
M1, point selection: first the bounding rectangle of the foreground region is drawn, a circle is drawn with the center of the rectangle as its center and 1/4 of the rectangle's diagonal length as its radius, and the foreground pixels inside the circle are selected as seed points, so that the retained seed points all come from the central region of the original segmentation result and the boundary part is removed;
M2, thinning: the number of screened foreground points is still excessive and adjacent pixels are highly redundant, so the computation load remains excessive for a program; the pixels are therefore thinned with a step of 5, reducing the data volume to 1/5 of that before thinning; the deep-learning segmentation yielded 151131 foreground points in total, and after these two steps 15827 foreground points remain, about 1/10 of the original number;
M3, segmentation;
S4, cross-validation and recognition-result optimization;
S5, cutting the geometric model of the subject object out of the three-dimensional model.
2. The method according to claim 1, wherein the S2 single-image subject-object recognition based on deep learning and the foreground/background optimization by graph-theoretic segmentation specifically comprise using a Mask R-CNN network with excellent performance; as deep-learning algorithms improve, any deep-learning network with region-recognition capability can be used for single-image subject-object recognition.
3. The method for automatically identifying the subject object of a photogrammetry-generated three-dimensional model according to claim 1, wherein the M3 segmentation within the S3 fusion of deep-learning results specifically comprises the OneCut algorithm from graph-theoretic segmentation, the OneCut algorithm being an improvement on the traditional graph-cut algorithm.
4. The method of claim 1, wherein the S4 cross-validation and recognition-result optimization specifically comprise: during generation of the three-dimensional model, adjacent images must share a certain overlap rate for modeling to succeed, so that every vertex of the three-dimensional model can be projected onto several images; the subject part of the three-dimensional model is labeled as foreground in three-dimensional space, each mesh triangle composing the subject part likewise carries the foreground attribute, and its projection onto the images should also be segmented as foreground.
5. The method of claim 4, wherein, since image segmentation is subject to error, the subject part may be segmented as background, so that the model is projected from multiple viewpoints, the probability that each mesh triangle belongs to the foreground is computed, and each triangle is classified as foreground or background:
a = n_{fore} / (n_{fore} + n_{back})
where n_{fore} denotes the number of images on which the triangle's projection is segmented as foreground, n_{back} the number of images on which the same triangle's projection is segmented as background, and a the probability that the triangle belongs to the foreground over all images; the probability threshold is set to 0.5;
when a ≥ 0.5, the triangle is marked as foreground; when a < 0.5, it is marked as background. For example, the region represented by triangle P appears on 12 images, of which 9 segment it as foreground and 3 as background; a = 0.75, so triangle P is marked as foreground and considered to lie on the subject of the three-dimensional model. The probability threshold is set manually; the aim here is to remove as much background area as possible while preserving the integrity of the model's subject, so the threshold is set to 0.5, and this parameter can be adjusted in practical applications.
6. The method of claim 1, wherein the step of S5 cutting the geometric model of the subject object in the three-dimensional model comprises the steps of:
1) The three-dimensional model consists of three parts: vertex coordinates, texture coordinates, and triangle vertex indices; the texture coordinates are not needed when converting three-dimensional points to two-dimensional image pixels, it sufficing to compute, for the three-dimensional coordinates behind each triangle's vertex indices, the projected pixel coordinates on the two-dimensional image, and then to assign the foreground or background attribute to the three-dimensional triangle according to whether its projected two-dimensional region belongs to the foreground or the background;
2) In computer vision, an image point x is represented as a 3-dimensional homogeneous vector; in homogeneous coordinates, the coordinate X of a point in the world coordinate system is related to the coordinate X_{cam} of the same point in the image coordinate system by:

X_{cam} = K R [I \mid -C] X

K = \begin{pmatrix} f & 0 & x_0 \\ 0 & f & y_0 \\ 0 & 0 & 1 \end{pmatrix}
where K is the camera calibration matrix; its entries describe the camera's internal configuration and are called internal parameters: f is the camera focal length in pixel units and (x_0, y_0) are the coordinates of the camera's principal point. R is the camera rotation matrix, related to the camera's rotation, pitch, and tilt angles, and C represents the coordinate of the camera center in the world coordinate system. The parameters in R and C are unrelated to the camera's internal configuration; they depend on the camera's position and orientation in the world coordinate system and are called external parameters;
X_{cam} is a three-dimensional homogeneous vector, from which further calculation gives the pixel coordinates (x, y):

x = X_{cam,1} / X_{cam,3}

y = X_{cam,2} / X_{cam,3}
3) Using the formulas in 2), the image coordinates of every mesh vertex on every image can be obtained; because the model occludes itself, regions on the occluded back of the model also yield coordinates inside the image after this computation, so they must be distinguished and treated separately;
4) After the three-dimensional vertices are mapped down to the two-dimensional image, the model's original three-dimensional triangles are likewise perspective-projected onto the image as two-dimensional triangles, and the foreground and background areas within each two-dimensional triangle are computed; when the foreground area exceeds half of the total area, the two-dimensional triangle and its corresponding three-dimensional triangle are marked as foreground, and otherwise as background:

P = S_{fore} / (S_{fore} + S_{back})

where S_{fore} denotes the area within the triangle segmented as foreground and S_{back} the area segmented as background; when P ≥ 0.5 the triangle is marked as foreground with label 1, and when P < 0.5 it is marked as background with label 0;
5) Through the mapping between two-dimensional pixels and three-dimensional points, the foreground and background attributes of the pixels can be assigned to the three-dimensional points and triangles, dividing the three-dimensional model into subject and background on the basis of a single image; single-image identification of the model's subject is governed by the quality of the image's foreground/background segmentation, accurate identification requiring high image segmentation precision.
CN201910440234.0A 2019-05-24 2019-05-24 Automatic identification method for main body object of photogrammetric generation three-dimensional model Active CN110176064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910440234.0A CN110176064B (en) 2019-05-24 2019-05-24 Automatic identification method for main body object of photogrammetric generation three-dimensional model


Publications (2)

Publication Number Publication Date
CN110176064A CN110176064A (en) 2019-08-27
CN110176064B (en) 2022-11-18

Family

ID=67695659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910440234.0A Active CN110176064B (en) 2019-05-24 2019-05-24 Automatic identification method for main body object of photogrammetric generation three-dimensional model

Country Status (1)

Country Link
CN (1) CN110176064B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008557A (en) * 2019-10-30 2020-04-14 长安大学 Vehicle fine granularity identification method based on geometric constraint
CN111768381A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Part defect detection method and device and electronic equipment
TWI760813B (en) 2020-08-10 2022-04-11 國立臺灣科技大學 Earthquake monitoring system and earthquake monitoring method
WO2022052052A1 (en) * 2020-09-11 2022-03-17 Siemens Aktiengesellschaft Method and system for identifying objects

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134234B (en) * 2014-07-16 2017-07-25 中国科学技术大学 A kind of full automatic three-dimensional scene construction method based on single image
CN105100775B (en) * 2015-07-29 2017-12-05 努比亚技术有限公司 A kind of image processing method and device, terminal
CN108734189A (en) * 2017-04-20 2018-11-02 天津工业大学 Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather

Also Published As

Publication number Publication date
CN110176064A (en) 2019-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant