CN117057086A

CN117057086A - Three-dimensional reconstruction method, device and equipment based on target identification and model matching

Info

Publication number: CN117057086A
Application number: CN202310731172.5A
Authority: CN
Inventors: 李靖; 雷子钒; 赵宏杰; 陆川
Original assignee: Chengdu Guoxing Aerospace Technology Co ltd
Current assignee: Chengdu Guoxing Aerospace Technology Co ltd
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-11-14

Abstract

The application provides a three-dimensional reconstruction method, device and equipment based on target identification and model matching, relates to the technical field of three-dimensional modeling, and aims to solve the problem of low three-dimensional reconstruction efficiency. The method comprises the following steps: determining the corrected real geographic coordinates of a plurality of pixel points according to the pixel coordinates of the plurality of pixel points corresponding to each image target in one acquired image; the acquired image is an image which is acquired for a target area and contains depth information, and the plurality of pixel points comprise a centroid and boundary angular points; respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library; the model sample library comprises a plurality of facility model samples and sample information of the facility model samples; and constructing a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the plurality of pixel points and the target model samples.

Description

Three-dimensional reconstruction method, device and equipment based on target identification and model matching

Technical Field

The application relates to the technical field of three-dimensional modeling, and provides a three-dimensional reconstruction method, device and equipment based on target identification and model matching.

Background

In recent years, immersive experience has become a future development direction, wherein a three-dimensional reconstruction of a real environment to a virtual network environment is an indispensable basis for immersive experience. Specifically, three-dimensional reconstruction can be classified into indoor three-dimensional reconstruction and outdoor three-dimensional reconstruction according to the reconstruction target.

At present, indoor three-dimensional reconstruction has two modes of manual modeling and automatic modeling. Wherein, manual modeling can be performed through modeling software commonly found in markets, which is time-consuming and labor-consuming; and the automatic modeling can automatically generate a three-dimensional model by processing corresponding data through a computer in a mode of combining software and hardware. Further, the automatic modeling can be specifically divided into an active measurement method and a passive measurement method. Active measurement can be performed by transmitting a controllable signal to the target object, calculating depth information of the target based on the transmitted information number and the return signal, and further modeling and measuring, such as: laser ranging, etc., has limited applications because the required instruments are expensive in large scale and not easily portable. The passive measurement can acquire three-dimensional information of a target through an image, the common passive measurement is multi-view three-dimensional reconstruction, the essence of the passive measurement is a focal length method, the focal length method is similar to a laser ranging principle, the distances between different positions of the target and the camera can be measured by adjusting the focal length of the camera, namely, the depth of a measured point is calculated by utilizing a lens imaging formula, when ranging is performed on different areas, the focusing position is required to be continuously changed to acquire depth information of different positions, and a plurality of images are required to be acquired at the same position, so that the data acquisition work is complicated, and the whole calculation process is complicated. In addition, as the amount of data increases, the efficiency also doubles.

The outdoor three-dimensional reconstruction can comprehensively sense a large-range complex scene, can intuitively reflect the attributes such as appearance texture, position and height of the ground object through the three-dimensional reconstruction data acquisition equipment and the data result generated by the professional three-dimensional reconstruction system, and provides guarantee for real effect and mapping level precision. Currently, unmanned aerial vehicle oblique photogrammetry is one of the most widely used fields of outdoor three-dimensional reconstruction. According to the oblique photography technology, a plurality of sensors are carried on a flight platform, and images are collected from different angles such as a vertical side view, a four side view and the like, so that more abundant side texture information can be obtained. The principle is as follows: the position, the posture and other external parameters of the camera and the horizontal, vertical visual angles, main points, focal lengths and other internal parameters are known, the spatial position of a target is calculated according to the rear intersection principle, and then the outdoor three-dimensional reconstruction is realized.

In summary, whether it is indoor three-dimensional reconstruction or outdoor three-dimensional reconstruction, three-dimensional reconstruction is basically performed through a large number of pictures and a large amount of time at present, which results in lower three-dimensional reconstruction efficiency. Therefore, how to improve the three-dimensional reconstruction efficiency is a problem to be solved at present.

Disclosure of Invention

The application provides a three-dimensional reconstruction method, device and equipment based on target identification and model matching, which are used for solving the problem of low three-dimensional reconstruction efficiency.

In one aspect, a three-dimensional reconstruction method based on object recognition and model matching is provided, the method comprising:

determining the corrected real geographic coordinates of a plurality of pixel points according to the pixel coordinates of the plurality of pixel points corresponding to each image target in one acquired image; the acquired image is an image which is acquired for a target area and contains depth information, and the plurality of pixel points comprise a centroid and boundary angular points;

respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library; the model sample library comprises a plurality of facility model samples and sample information of the facility model samples;

and constructing a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the plurality of pixel points and the target model samples.

In one aspect, there is provided a three-dimensional reconstruction apparatus based on object recognition and model matching, the apparatus comprising:

The real geographic coordinate determining unit is used for determining the corrected real geographic coordinates of a plurality of pixel points according to the pixel coordinates of the pixel points corresponding to each image target in one acquired image; the acquired image is an image which is acquired for a target area and contains depth information, and the plurality of pixel points comprise a centroid and boundary angular points;

the target model sample determining unit is used for respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library; the model sample library comprises a plurality of facility model samples and sample information of the facility model samples;

and the three-dimensional scene model construction unit is used for constructing a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the plurality of pixel points and the target model samples.

Optionally, the apparatus further includes a sample information determining unit, the sample information determining unit including:

extracting features of the acquired image, and determining respective corresponding color information and texture information of each image target in the acquired image;

Performing feature recognition on the acquired image to determine shape information corresponding to each image target in the acquired image;

and determining target model samples corresponding to the image targets meeting the preset similarity condition from a preset model sample library respectively, wherein the target model samples comprise:

and respectively determining target model samples corresponding to the image targets meeting the preset similarity condition from a preset model sample library according to the color information, the texture information and the shape information.

Optionally, the object model sample determining unit is further configured to:

determining a target model set matched with the semantic category of any image target from the preset model sample library according to the semantic category of any image target aiming at any image target in the image targets;

and determining a target model sample corresponding to the any image target meeting a preset similarity condition from the target model set according to the color information, the texture information and the shape information of the any image target.

Optionally, the apparatus further includes a semantic category determining unit, where the semantic category determining unit includes:

According to a preset neural network model, carrying out semantic recognition on each image target in the acquired image, and determining the semantic category corresponding to each image target.

Optionally, the apparatus further includes a model sample library construction unit, which includes:

three-dimensional modeling is carried out on various outdoor facilities and various outdoor facilities in the real world by adopting preset three-dimensional modeling software, so as to obtain three-dimensional model samples which are applied outside the various rooms and respectively correspond to the various outdoor facilities;

and constructing the model sample library according to the three-dimensional model samples respectively corresponding to the various outdoor facilities and applied to the various outdoor facilities.

Optionally, the real geographic coordinate determining unit is further configured to:

determining first world geographic coordinates of the plurality of pixel points according to the depth information of the plurality of pixel points, the pixel coordinates of the plurality of pixel points, the focal length of the image acquisition equipment when acquiring the acquired image and the length and width of the acquired image;

according to the equipment geographic coordinates of the image acquisition equipment when acquiring the acquired image, carrying out translation operation on the first world geographic coordinates of the plurality of pixel points, and determining the second world geographic coordinates of the plurality of pixel points;

And according to the gesture information of the image acquisition equipment when acquiring the acquired image, performing rotation operation on the second world geographic coordinates of the plurality of pixel points, and determining the corrected real geographic coordinates of the plurality of pixel points.

Optionally, the three-dimensional scene model building unit is further configured to:

aiming at any image target in the image targets, acquiring the rotation angle and the size of a target model sample corresponding to the any image target according to the corrected real geographic coordinates of each boundary corner point corresponding to the any image target;

carrying out affine transformation on a target model sample corresponding to any image target according to the rotation angle, the size and the corrected real geographic coordinates of the mass center corresponding to any image target, and constructing a three-dimensional monomer model corresponding to any image target;

and constructing a three-dimensional scene model of the target area according to the three-dimensional monomer models corresponding to the image targets.

In one aspect, there is provided an apparatus for three-dimensional reconstruction based on object recognition and model matching, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods described above when executing the computer program.

In one aspect, there is provided a computer storage medium having stored thereon computer program instructions which, when executed by a processor, perform the steps of any of the methods described above.

In the embodiment of the application, firstly, the corrected real geographic coordinates of the centroid and the boundary corner point corresponding to each image target can be determined according to the pixel coordinates of the centroid and the boundary corner point corresponding to each image target in an acquired image acquired for a target area; furthermore, target model samples corresponding to the image targets meeting the preset similarity conditions can be respectively determined from a preset model sample library; and finally, constructing a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the center of mass and the boundary corner points corresponding to each image target and each target model sample. Therefore, in the embodiment of the application, the three-dimensional reconstruction is performed by only one acquired image, so that a large number of repeated images of the same target at different view angles are not required, and further, a large number of steps of feature extraction, feature matching and point cloud generation are not required, huge calculation amount is saved, and the three-dimensional reconstruction efficiency is greatly improved. In addition, because the target model samples matched with the image targets are directly obtained from the preset model sample library, and the three-dimensional scene model is built based on the target model samples, the three-dimensional reconstruction efficiency is further improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to the provided drawings without inventive effort for those skilled in the art.

Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;

FIG. 2 is a schematic flow chart of a three-dimensional reconstruction method based on object recognition and model matching according to an embodiment of the present application;

FIG. 3 is a schematic diagram of determining real geographic coordinates according to an embodiment of the present application;

FIG. 4 is a schematic diagram of obtaining a rotation angle according to an embodiment of the present application;

fig. 5 is a schematic diagram of a three-dimensional reconstruction device based on object recognition and model matching according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Embodiments of the application and features of the embodiments may be combined with one another arbitrarily without conflict. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.

Currently, three-dimensional reconstruction can be classified into indoor three-dimensional reconstruction and outdoor three-dimensional reconstruction according to a reconstruction target. When the indoor three-dimensional reconstruction is carried out, if manual modeling is adopted, the time and the labor are very consumed, the required instrument is high in price and not easy to carry, and the application occasions are limited; if the automatic modeling is adopted, a plurality of images need to be acquired for the same target, the data acquisition work is complicated, the whole calculation flow is complicated, and the efficiency is reduced by times along with the increase of the data quantity. When outdoor three-dimensional reconstruction is performed, the shot images have high overlapping degree, large redundancy, low calculation efficiency and incapability of realizing model singulation.

Based on the above, the embodiment of the application provides a three-dimensional reconstruction method based on object identification and model matching, in the method, firstly, the corrected real geographic coordinates of the centroid and boundary corner point corresponding to each image object in an acquired image acquired for a target area can be determined only according to the pixel coordinates of the centroid and boundary corner point corresponding to each image object in the acquired image; furthermore, target model samples corresponding to the image targets meeting the preset similarity conditions can be respectively determined from a preset model sample library; and finally, constructing a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the center of mass and the boundary corner points corresponding to each image target and each target model sample. Therefore, in the embodiment of the application, the three-dimensional reconstruction is performed by only one acquired image, so that a large number of repeated images of the same target at different view angles are not required, and further, a large number of steps of feature extraction, feature matching and point cloud generation are not required, huge calculation amount is saved, and the three-dimensional reconstruction efficiency is greatly improved. In addition, because the target model samples matched with the image targets are directly obtained from the preset model sample library, and the three-dimensional scene model is built based on the target model samples, the three-dimensional reconstruction efficiency is further improved.

After the design idea of the embodiment of the present application is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present application, and it should be noted that the application scenarios described below are only used for illustrating the embodiment of the present application and are not limiting. In the specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.

Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. The application scene may include a three-dimensional reconstruction device 10 and an image acquisition device 20.

The image capturing device 20 may be used to capture an image of a target area in the real world, for example, a camera, a mobile phone, or the like having an image capturing function. The three-dimensional reconstruction device 10 may be used for three-dimensional reconstruction of acquired images, for example, it may be a personal computer (personal computer, PC) or the like. The three-dimensional reconstruction device 10 may include one or more processors 101, memory 102, I/O interfaces 103, and a database 104. Specifically, the processor 101 may be a central processing unit (central processing unit, CPU), or a digital processing unit or the like. The memory 102 may be a volatile memory (RAM), such as a random-access memory (RAM); the memory 102 may also be a non-volatile memory (non-volatile memory), such as a read-only memory, a flash memory (flash memory), a Hard Disk Drive (HDD) or a Solid State Drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 102 may be a combination of the above. The memory 102 may store program instructions of the three-dimensional reconstruction method based on object recognition and model matching provided by the embodiment of the present application, where the program instructions, when executed by the processor 101, can be used to implement steps of the three-dimensional reconstruction method based on object recognition and model matching provided by the embodiment of the present application, so as to solve the problem of low three-dimensional reconstruction efficiency. The database 104 may be used to store data such as a plurality of facility model samples and sample information of a plurality of facility model samples involved in the solution provided by the embodiment of the present application.

In the embodiment of the present application, when the three-dimensional reconstruction of the target area is required, the image acquisition device 20 may acquire an image of the target area, then, the three-dimensional reconstruction device 10 may acquire an acquired image acquired by the image acquisition device 20 through the I/O interface 103 and transmit the acquired image to the memory 102, and further, the processor 101 performs three-dimensional scene reconstruction on an acquired image packet in the memory 102 according to the program instruction of the three-dimensional reconstruction method based on target recognition and model matching provided by the embodiment of the present application. In addition, data such as a number of facility model samples and sample information of a number of facility model samples involved in the overall three-dimensional scene reconstruction process may also be stored in the database 104.

Of course, the method provided by the embodiment of the present application is not limited to the application scenario shown in fig. 1, but may be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 1 will be described together in the following method embodiments, which are not described in detail herein. The method according to the embodiment of the present application will be described below with reference to the accompanying drawings.

As shown in fig. 2, a schematic flow chart of a three-dimensional reconstruction method based on object recognition and model matching according to an embodiment of the present application may be implemented by the three-dimensional reconstruction device 10 in fig. 1, and specifically, the flow chart of the method is described below.

Step 201: and determining the corrected real geographic coordinates of the plurality of pixel points only according to the pixel coordinates of the plurality of pixel points corresponding to each image target in one acquired image.

In an embodiment of the present application, the acquired image may be an image acquired for the target area and including depth information, and the plurality of pixel points may include a centroid and a boundary corner point.

Specifically, in order to reduce the calculation amount, in the embodiment of the present application, when determining the corrected real geographic coordinates of each image object in the acquired image, only the centroid and the boundary corner point corresponding to each image object may be determined, where, when modeling a three-dimensional scene, the centroid may be used to determine the position of the image object in the real world, and the boundary corner point may be used to determine the size of the image object in the real world.

In one possible implementation, to make the three-dimensional scene model more accurate, the internal and external parameters of the image capturing device 20 may be used to calculate corrected real geographic coordinates of a plurality of pixels corresponding to each image object in the captured image. As shown in fig. 3, a schematic diagram of determining real geographic coordinates according to an embodiment of the present application is shown, where a specific determining process is as follows:

Step 301: and determining first world geographic coordinates of the plurality of pixel points according to the depth information of the plurality of pixel points, the pixel coordinates of the plurality of pixel points, the focal length of the image acquisition equipment when acquiring one acquired image and the length and width of the acquired image.

In the embodiment of the present application, assuming that the device geographical coordinates of the image capturing device 20 are (0, 0), the following equation (1) may be used to determine the first world geographical coordinates (Xc, yc, zc) of the plurality of pixel points:

the upper left corner of the acquired image is assumed to be an origin (0, 0), and x and y are pixel coordinates of pixel points in the acquired image; xc is the longitude of the pixel in the captured image, yc is the latitude of the pixel in the captured image, zc is the depth information of the pixel in the captured image (the distance between the image capturing device 20 and each pixel in the real world); (u) ₀ ，v ₀ ) To acquire the coordinates of the main pixel point of the image, typically the center point of the acquired image, for example, a 640 x 480 picture, the coordinates of the main pixel point are (320, 240); fx, fy are the horizontal and vertical focal lengths of the image capturing apparatus 20, respectively, and specifically, fx, fy can be determined by deduction using the following formulas (2) - (5):

width＝2*f*tan(FovX) (2)

height＝2*f*tan(FovY) (3)

the width and hh of the imaging plane inside the image capturing device 20 can be checked in the image attribute, for example, 640×480 pixels, that is, the image length is 640 and the width is 480; f is the focal length of the image capturing device 20, and can be obtained directly from the manufacturer who produces or sells the imaging device; fovX is the horizontal viewing angle and FovY is the vertical viewing angle.

Further, the correspondence between the length and width of the imaging plane inside the image capturing apparatus 20 and the length and width of the captured image can be determined by using the following formulas (4), (5):

f/fx＝width/pixelX→fx＝f*pixelX/width (4)

f/fy＝height/pixelY→fy＝f*pixelY/height (5)

wherein pixelX, pixelY is the length and width of the acquired image, respectively. Further, the horizontal direction focal length fx and the vertical direction focal length fy can be derived from the formulas (2) - (5). Further, the focal lengths fx and fy will be derived, and the coordinates of the main pixel point in the acquired image (u ₀ ，v ₀ ) Pixel coordinates (x, y) of the centroid and the boundary corner point corresponding to each image object, and depth information Zc of the centroid and the boundary corner point corresponding to each image object to determine each image objectFirst world geographical coordinates (Xc, yc, zc) of the centroid and the boundary corner point.

In one possible implementation manner, the depth information Zc of the centroid and the boundary corner corresponding to each image object may be obtained in the following manners:

(1) An image is acquired directly with a depth camera (e.g., microsoft Kinect camera) to obtain depth information for each pixel in the acquired image.

(2) And acquiring an image by using a binocular camera, and directly acquiring depth information of each pixel point in the acquired image.

(3) The parameter learning method comprises the following steps: the input training image dataset (known depth information) is modeled using a Markov random field (Markov Random Field, MRF) model to invert the unknown parameters. And forward calculating the depth information of the acquired image by using the calculated parameters.

(4) Non-parametric learning method: based on the input training image dataset (known depth information), the depth information of the acquired image is predicted on the premise that similar images have similar depths.

(5) And estimating the depth of the single image by using an artificial neural network.

A. The relationship between the image and depth information is fitted by a complex multi-layer convolutional neural network (Convolutional Neural Networks, CNN) structure.

B. The convolutional neural network CNN is combined with a conditional random field (Conditional Random Field, CRF), and the training of the CNN is constrained by a CRF model, so that the prediction is more accurate.

Step 302: and carrying out translation operation on the first world geographic coordinates of the plurality of pixel points according to the equipment geographic coordinates of the image acquisition equipment when acquiring one acquired image, and determining the second world geographic coordinates of the plurality of pixel points.

In the embodiment of the present application, since the first world geographic coordinates obtained according to the above-described formulas (1) - (5) are calculated on the assumption that the true geographic coordinates of the image capturing apparatus 20 are (0, 0). Therefore, to obtain the coordinates of the centroid and the boundary corner of each image object in the real world, the translation operation needs to be performed on the first world geographic coordinates of the plurality of pixels according to the device geographic coordinates of the image capturing device 20 when capturing one captured image, so as to determine the second world geographic coordinates of the plurality of pixels. The device geographical coordinates may be directly obtained in the real world according to the positioning device in the image capturing device 20, for example, when the mobile phone is used for capturing an image, the device geographical coordinates of the mobile phone when capturing an image may be directly obtained according to the gyroscope of the mobile phone.

Specifically, the second world geographic coordinates (X ₀ ，Y ₀ ，Z ₀ ) Can be determined by using the following equation (6):

(X ₀ ，Y ₀ ，Z ₀ ) ＝ (X _C +longitude，Y _c +latitude，Z _c +height) (6)

wherein longitude, latitude, hh are the geographical coordinates of the image capturing device 20 in the real world, respectively.

Step 303: and according to the gesture information of the image acquisition equipment when acquiring one acquired image, performing rotation operation on the second world geographic coordinates of the plurality of pixel points, and determining the corrected real geographic coordinates of the plurality of pixel points.

In the embodiment of the application, since the transformation from the camera coordinate system to the world coordinate system belongs to rigid transformation, namely, the object is not deformed, only translation and rotation operations are needed. Therefore, after the second world geographic coordinates are obtained by the panning operation, it is also necessary to perform a rotation operation on the second world geographic coordinates of the plurality of pixel points based on the posture information of the image capturing apparatus 20 at the time of obtaining one captured image, that is, the above-described second world geographic coordinates (X ₀ ，Y ₀ ，Z ₀ ) Euler transformation is performed to obtain corrected real geographic coordinates (X _T ，Y _T ，Z _T ). Specifically, the second world geographic coordinates (X ₀ ，Y ₀ ，Z ₀ ) Can be determined by using the following formulas (7) - (10):

Where α is the yaw angle head, β is the roll angle, and γ is the pitch angle pitch.

And then, according to the determined real geographic coordinates of the plurality of pixel points corresponding to each image target, the real position and the size of each image target can be determined. Further, the relative positional relationship between the respective image objects can also be analyzed.

Step 202: and respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library.

In an embodiment of the present application, the model sample library includes a plurality of facility model samples and sample information of the plurality of facility model samples. The sample information may include color information, texture information, and shape information.

In practical application, specifically, whether a matched target model sample exists or not can be searched from a preset model sample library through color information, texture information and shape information of an image target. The preset similarity condition may be that "a similarity between color information of the image object and color information of the facility model sample is greater than a first similarity, a similarity between texture information of the image object and texture information of the facility model sample is greater than a second similarity, and a similarity between shape information of the image object and shape information of the facility model sample is greater than a third similarity. In this case, it is possible that more than one model sample is selected for one image object, and thus, one model sample may be selected from among the selected model samples as the object model sample of the image object. The first similarity, the second similarity and the third similarity can be set according to the user's requirement.

Of course, the preset similarity condition may be any one of the 3 sub-conditions or any combination of any 2 of the 3 sub-conditions, for example, the target model sample corresponding to each image target may be determined from the preset model sample library according to only the sub-condition that "the similarity between the texture information of the image target and the texture information of the facility model sample is greater than the second similarity". In addition, the model sample having the highest similarity to the image target may be directly selected as the target model sample of the image target.

In the embodiment of the application, the similarity between the image target and the model sample can be calculated in the following ways:

(1) Cosine similarity calculation: the images are represented as a vector, and the similarity between the two images is characterized by calculating the cosine distance between the vectors.

(2) And (3) similarity calculation of a hash algorithm: and extracting the image fingerprints by using a hash algorithm, and measuring the similarity of the images according to the difference value between the image fingerprints, wherein the smaller the difference value is, the higher the similarity is.

(3) Histogram similarity calculation: the closer the histogram distribution degree between the images is, the higher the similarity is.

(4) Structural similarity (Structural Similarity, SSIM) calculation: image similarity is measured in terms of brightness, contrast, and structure, respectively.

In one possible implementation, the color information and the texture information corresponding to each image target in the acquired image may be determined by performing feature extraction on the acquired image, and the shape information corresponding to each image target in the acquired image may be determined by performing feature recognition on the acquired image. Furthermore, according to the color information, the texture information and the shape information corresponding to each image target, target model samples corresponding to each image target meeting the preset similarity condition can be respectively determined from a preset model sample library. In an embodiment of the present application, in the present application,

specifically, the RGB (Red, green, blue) color value of each pixel of the image target is analyzed, and then the color histogram statistics is performed to obtain the color distribution condition and the texture information of the image target. In addition, when the feature extraction is performed, the boundary feature of each image object in the acquired image can be extracted. Specifically, the following methods may be used to perform edge detection on the acquired image, so as to determine respective boundary features corresponding to each image target in the acquired image:

(1) Sobel operator Soble algorithm: the gray values in the four fields of up, down, left and right of each pixel in the image are weighted and the extreme value is reached at the edge to detect the edge. It is a discrete difference operator used to compute an approximation of the gradient of the image luminance function. Using this operator at any point in the image will result in a corresponding gradient vector or normal vector thereof.

(2) The Prewitt algorithm of proud: the edge detection of the first-order differential operator utilizes the gray level difference of the adjacent points of the pixel point up and down and left and right to reach the extreme value at the edge to detect the edge, and removes part of pseudo edges, thereby having smoothing effect on noise. The principle is that two direction templates are used for carrying out neighborhood convolution on the image in the image space, wherein the two direction templates are used for detecting horizontal edges and vertical edges.

(3) Cannizing Canny Algorithm: and judging whether the edge point is an edge point or not by adopting non-maximum value inhibition. The non-maximum suppression is to find the local maximum of the pixel point, if the pixel point is the local maximum, the local maximum is reserved;

otherwise it is deleted (value set to 0).

In addition, in the feature recognition, artificial intelligence (Artificial Intelligence, AI) recognition of the extracted image target may be performed by machine deep learning using convolutional neural network CNN or computer vision technology, for example, the recognized shape is a desk shape, a bookshelf shape, a sofa shape, a car shape, a wagon shape, a house shape, or the like.

Specifically, the following methods may be used to perform feature recognition on the image target:

(1) Fast regional convolutional neural network (Faster Regions with CNN features, fast RCNN) algorithm: second order calculations may be employed based on CNN. The first order aims at finding out the position where the image target appears, and the second order classifies and refines the position identified by the first order. The recognition accuracy is high.

(2) Only one (You Only Look Once, YOLO) series (v 1-v 4) algorithm is required: first order calculations may be employed based on CNN. The method combines the two-order algorithm into one, and completes the prediction of searching the appearance position and shape of the object in one stage. The method is simple and has higher speed.

In one possible implementation manner, to further improve efficiency of three-dimensional scene reconstruction, when determining the target model sample, preliminary screening may be performed according to a semantic class of the image target, and then the target model sample may be determined according to sample information, so as to reduce the calculation amount. In the embodiment of the application, semantic recognition can be performed on each image target in the acquired image according to the preset neural network model, so that the semantic category corresponding to each image target is determined. Specifically, the semantic category recognition of the image object can also be performed by using the (Faster RCNN algorithm and the YOLO series (v 1-v 4) algorithm.

On the basis, when the target model sample is determined, the target model set matched with the semantic category of any image target can be determined from a preset model sample library according to the semantic category of any image target for any image target. Further, according to the color information, texture information and shape information of any image object, determining a corresponding object model sample of any image object meeting a preset similarity condition from the object model set. The target model samples corresponding to the image targets can be found out from a preset model sample library respectively, so that the model is individualized.

Step 203: and constructing a three-dimensional scene model of the target area according to the real geographic coordinates of a plurality of pixel points corresponding to each image target and each target model sample.

In the embodiment of the application, after determining the real geographic coordinates of a plurality of pixel points corresponding to each image target and each target model sample, in order to enable the constructed three-dimensional scene model to be more consistent with the target area, for any image target in each image target, the rotation angle and the size of the target model sample corresponding to any image target can be obtained according to the real geographic coordinates corrected by each boundary corner point corresponding to any image target, then affine transformation is carried out on the target model sample corresponding to any image target according to the rotation angle, the size and the real geographic coordinates corrected by the centroid corresponding to any image target, so that the target model sample is rotated according to the rotation angle, scaled according to the size and placed in a translation manner according to the real position determined by the corrected real geographic coordinates of the centroid of the image target, so as to construct the three-dimensional single model corresponding to any image target, and further, the three-dimensional scene model of the target area can be constructed according to the three-dimensional single model corresponding to each image target.

Specifically, in the case of performing three-dimensional scene reconstruction, since a specific orientation of the target model sample needs to be considered, it is necessary to calculate the rotation angle of the target model sample. Assume that the north direction is 0 deg., and that one clockwise rotation is 360 deg.. In the embodiment of the present application, as shown in fig. 4, a schematic diagram for acquiring a rotation angle according to the embodiment of the present application may be provided, where the leftmost boundary intersection point 3 and the rightmost boundary corner point 2 may be extracted as representative points based on the real geographic coordinates of the boundary corner points, and then an included angle between a connection line between the boundary intersection point 3 and the boundary corner point 2 and the north direction is calculated, where the included angle is the rotation angle of the target model sample. Then, since the boundary intersection point 3 coordinates are (116, 30) and the boundary intersection point 2 coordinates are (117, 31), the rotation angle is 225 ° and further, in the affine transformation, the target model sample needs to be rotated by 225 °.

In one possible implementation manner, in order to make the accuracy of the constructed three-dimensional scene model higher, in the embodiment of the present application, before target model samples corresponding to each image target meeting the preset similarity condition are respectively determined from a preset model sample library, preset three-dimensional modeling software may be further adopted to perform three-dimensional modeling on various outdoor facilities and various outdoor facilities in the real world, so as to obtain three-dimensional model samples corresponding to each outdoor facility and each outdoor facility, and further, a model sample library is constructed according to the three-dimensional model samples corresponding to each outdoor facility and each outdoor facility.

In practical application, color information, texture information and shape information of common outdoor facilities such as buildings, vehicles, roads and the like and indoor facilities such as indoor tables and chairs, furniture and floors and the like can be collected in the field, and 3DS Max, maya, rhino and the like modeling software is adopted for three-dimensional modeling, so that a model sample library with rich model samples is established. Furthermore, since the model samples in the model sample library are all fine models, the accuracy of the three-dimensional reconstruction result can be made higher.

In one possible embodiment, the acquired image may also be preprocessed before model matching is performed in order to improve the accuracy of the three-dimensional reconstruction result. Specifically, the preprocessing may include denoising, filtering, enhancing, etc. operations to improve the efficiency and accuracy of subsequent target recognition and feature extraction.

In summary, in the embodiment of the application, since the three-dimensional reconstruction is performed by only one acquired image, a large number of repeated images of the same target at different viewing angles are not required, and further, a large number of steps of feature extraction, feature matching and point cloud generation are not required, the requirement on original data is lower, huge calculation is saved, and the three-dimensional reconstruction efficiency is greatly improved. In addition, because the target model samples matched with the image targets are directly obtained from the preset model sample library, and the three-dimensional scene model is built based on the target model samples, the three-dimensional reconstruction efficiency is further improved. Moreover, since the model samples in the model sample library are all fine models, the three-dimensional reconstruction result accuracy is higher.

Based on the same inventive concept, an embodiment of the present application provides a three-dimensional reconstruction apparatus 50 based on object recognition and model matching, as shown in fig. 5, the apparatus includes:

a real geographic coordinate determining unit 501, configured to determine real geographic coordinates after correction of a plurality of pixel points according to pixel coordinates of a plurality of pixel points corresponding to each image object in one acquired image; the acquired image is an image which is acquired for a target area and contains depth information, and the plurality of pixel points comprise a centroid and boundary angular points;

the target model sample determining unit 502 is configured to determine target model samples corresponding to each image target that meets a preset similarity condition from a preset model sample library; the model sample library comprises a plurality of facility model samples and sample information of the facility model samples;

the three-dimensional scene model construction unit 503 is configured to construct a three-dimensional scene model of the target area according to the corrected real geographic coordinates of the plurality of pixels and each target model sample.

Optionally, the apparatus 50 further includes a sample information determining unit 504, the sample information determining unit 504 including:

extracting features of an acquired image, and determining color information and texture information corresponding to each image target in the acquired image;

Performing feature recognition on an acquired image, and determining shape information corresponding to each image target in the acquired image;

respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library, wherein the target model samples comprise:

and respectively determining target model samples corresponding to the image targets meeting the preset similarity conditions from a preset model sample library according to the color information, the texture information and the shape information.

Optionally, the object model sample determining unit 502 is further configured to:

aiming at any image target in all image targets, determining a target model set matched with the semantic category of any image target from a preset model sample library according to the semantic category of any image target;

and determining a target model sample corresponding to any image target meeting the preset similarity condition from the target model set according to the color information, the texture information and the shape information of any image target.

Optionally, the apparatus further includes a semantic category determining unit 505, the semantic category determining unit 505 includes:

according to a preset neural network model, carrying out semantic recognition on each image target in one acquired image, and determining the semantic category corresponding to each image target.

Optionally, the apparatus further includes a model sample library construction unit 506, and the model sample library construction unit 506 includes:

adopting preset three-dimensional modeling software to perform three-dimensional modeling on various outdoor facilities and various outdoor facilities in the real world, and obtaining three-dimensional model samples corresponding to the various outdoor facilities and the various outdoor facilities respectively;

and constructing a model sample library according to the three-dimensional model samples corresponding to the outdoor facilities and the outdoor facilities.

Optionally, the real geographic coordinate determining unit 501 is further configured to:

determining first world geographic coordinates of a plurality of pixel points according to depth information of the plurality of pixel points, pixel coordinates of the plurality of pixel points, a focal length of an image acquisition device when acquiring an acquired image and a length and width of the acquired image;

according to the equipment geographic coordinates of the image acquisition equipment when acquiring an acquired image, carrying out translation operation on the first world geographic coordinates of a plurality of pixel points, and determining the second world geographic coordinates of the plurality of pixel points;

and according to the gesture information of the image acquisition equipment when acquiring one acquired image, performing rotation operation on the second world geographic coordinates of the plurality of pixel points, and determining the corrected real geographic coordinates of the plurality of pixel points.

Optionally, the three-dimensional scene model building unit 503 is further configured to:

aiming at any image target in all image targets, acquiring the rotation angle and the size of a target model sample corresponding to any image target according to the corrected real geographic coordinates of each boundary corner point corresponding to any image target;

carrying out affine transformation on a target model sample corresponding to any image target according to the rotation angle, the size and the real geographic coordinates after the centroid correction corresponding to any image target, and constructing a three-dimensional monomer model corresponding to any image target;

The apparatus may be used to execute the method executed by the video processing apparatus in the embodiments shown in fig. 2 to 4, so the descriptions of the functions and the like that can be implemented by the functional modules of the apparatus may be referred to in the embodiments shown in fig. 2 to 4, and are not repeated.

In some possible embodiments, aspects of the method provided by the present application may also be implemented in the form of a program product comprising program code for causing a computer device to carry out the steps of the method according to the various exemplary embodiments of the application described above when said program product is run on the computer device, for example the computer device may carry out the method as carried out by the video processing apparatus in the embodiment shown in fig. 2 to 4.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk or an optical disk, or the like, which can store program codes. Alternatively, the above-described integrated units of the present invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A three-dimensional reconstruction method based on object recognition and model matching, the method comprising:

2. The method of claim 1, wherein before determining the object model samples corresponding to the respective image objects meeting a preset similarity condition from a preset model sample library, the method further comprises:

3. The method according to claim 2, wherein the determining, from a preset model sample library, the target model samples corresponding to the respective image targets meeting a preset similarity condition according to the color information, the texture information, and the shape information includes:

4. A method according to claim 3, wherein before determining the target model samples of the respective image targets meeting a preset similarity condition from a preset model sample library, respectively, the method further comprises:

5. The method of claim 1, wherein before determining the object model samples corresponding to the respective image objects meeting a preset similarity condition from a preset model sample library, the method further comprises:

6. The method of claim 1, wherein determining the corrected true geographic coordinates of the plurality of pixels based only on the pixel coordinates of the plurality of pixels corresponding to each of the image objects in the one acquired image, comprises:

7. The method of claim 6, wherein constructing a three-dimensional scene model of the target region from the corrected true geographic coordinates of the plurality of pixels and the respective target model samples comprises:

8. A three-dimensional reconstruction apparatus based on object recognition and model matching, the apparatus comprising:

9. An electronic device, the device comprising:

a memory for storing program instructions;

a processor for invoking program instructions stored in said memory and for performing the steps comprised in the method according to any of claims 1-7 in accordance with the obtained program instructions.

10. A storage medium storing computer-executable instructions for causing a computer to perform the steps comprised by the method of any one of claims 1-7.