CN111260794A - Outdoor augmented reality application method based on cross-source image matching - Google Patents

Outdoor augmented reality application method based on cross-source image matching

Info

Publication number
CN111260794A
Authority
CN
China
Prior art keywords
image, local, camera image, cross, augmented reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010034538.XA
Other languages
Chinese (zh)
Other versions
CN111260794B (en)
Inventor
王程
刘伟权
卞学胜
沈雪仑
赖柏锜
李渊
李永川
贾宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202010034538.XA
Publication of CN111260794A
Application granted
Publication of CN111260794B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation

Abstract

The invention provides an outdoor augmented reality application method based on cross-source image matching, which comprises the following steps: acquiring a camera image and a rendering image correspondingly matched with the camera image, and processing the camera image and the rendering image to acquire a local camera image block and a local rendering image block which are matched in pairs; constructing a deep learning model according to the automatic coding machine and the twin network, and training the deep learning model; extracting feature descriptors of local camera image blocks and local rendering image blocks to be matched based on a trained deep learning model, and performing cross-source image matching on the local camera image blocks and the local rendering image blocks to be matched according to the extracted feature descriptors to obtain a cross-source image matching result; acquiring a corresponding relation of the cross-source images according to the cross-source image matching result, and calculating a virtual-real registration transformation relation according to the corresponding relation; and the application of outdoor augmented reality is realized according to the virtual-real registration transformation relation, so that the augmented reality effect is improved.

Description

Outdoor augmented reality application method based on cross-source image matching
Technical Field
The invention relates to the technical field of outdoor augmented reality, in particular to an outdoor augmented reality application method based on cross-source image matching, a computer readable storage medium and computer equipment.
Background
In the related art, augmented reality applications mainly focus on indoor scenes, where virtual-real registration is assisted by pre-placed markers. In outdoor scenes, however, pre-placing markers is impractical because of the increased scale and complexity of the scene. Most outdoor augmented reality applications are therefore based on sensor positioning and vision methods and are mainly applied to static scenes, and the fusion accuracy of multiple sensors is not robust to illumination changes and occlusion, which degrades the augmented reality effect.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the art described above. Therefore, one objective of the present invention is to provide an outdoor augmented reality application method based on cross-source image matching, in which a corresponding relationship is obtained by matching the cross-source images, and a virtual-real registration transformation relationship is obtained according to the corresponding relationship, so as to improve an augmented reality effect.
A second object of the invention is to propose a computer-readable storage medium.
A third object of the invention is to propose a computer device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides an outdoor augmented reality application method based on cross-source image matching, including the following steps: acquiring a camera image and a rendering image correspondingly matched with the camera image, and processing the camera image and the rendering image to acquire a local camera image block and a local rendering image block which are matched in pairs; constructing a deep learning model according to an automatic coding machine and a twin network, and training the deep learning model according to paired matched local camera image blocks and local rendering image blocks; extracting feature descriptors of local camera image blocks and local rendering image blocks to be matched based on a trained deep learning model, and performing cross-source image matching on the local camera image blocks and the local rendering image blocks to be matched according to the extracted feature descriptors to obtain a cross-source image matching result; acquiring a corresponding relation of the cross-source images according to the cross-source image matching result, and calculating a virtual-real registration transformation relation according to the corresponding relation; and realizing the application of outdoor augmented reality according to the virtual-real registration transformation relation.
According to the outdoor augmented reality application method based on cross-source image matching of the embodiment of the invention, a camera image and a rendering image correspondingly matched with the camera image are first acquired, and the camera image and the rendering image are processed to obtain pair-matched local camera image blocks and local rendering image blocks. A deep learning model is then constructed according to an automatic coding machine and a twin network and trained on the pair-matched local camera image blocks and local rendering image blocks. Feature descriptors of the local camera image blocks and local rendering image blocks to be matched are extracted with the trained deep learning model, and cross-source image matching is performed on them according to the extracted feature descriptors to obtain a cross-source image matching result. The corresponding relation of the cross-source images is then obtained according to the cross-source image matching result, the virtual-real registration transformation relation is calculated according to the corresponding relation, and finally the application to outdoor augmented reality is realized according to the virtual-real registration transformation relation. In this way, the corresponding relation is obtained by matching the cross-source images and the virtual-real registration transformation relation is derived from it, so that the augmented reality effect is improved.
In addition, the outdoor augmented reality application method based on cross-source image matching proposed according to the above embodiment of the present invention may further have the following additional technical features:
optionally, acquiring a camera image and a rendered image correspondingly matched with the camera image includes: acquiring a camera image; acquiring an aerial image, and performing three-dimensional reconstruction on the aerial image by adopting an SFM algorithm to obtain a three-dimensional image point cloud of an outdoor scene; and acquiring image information according to the camera image, and rendering a rendering image which is correspondingly matched with the camera image in the three-dimensional image point cloud according to the image information.
Optionally, processing the camera image and the rendered image to obtain pairs of matched local camera tiles and local rendered tiles includes: acquiring a perspective transformation matrix of the camera image and the rendered image; labeling the segmented sample in the camera image with a LabelMe toolkit; constructing a segmentation network, and training the segmentation network according to the marked camera image; segmenting the camera image based on the trained segmentation network to segment a segmentation sample of the camera image; extracting all key points of the segmentation sample by using a detector with scale-invariant feature transformation, selecting a plurality of key points from all key points so that the distance between each selected key point is greater than a first preset threshold value, and deleting other unselected key points; and taking the selected multiple key points as a center, acquiring corresponding local camera image blocks according to a preset size, and mapping the local camera image blocks onto the rendered image according to the perspective transformation matrix to acquire the corresponding local rendered image blocks.
Optionally, the deep learning model comprises: an encoder, a decoder and an STN block.
Optionally, when the deep learning model is trained according to the pair-matched local camera image block and local rendering image block, the method further includes: and adjusting the optimizer and the hyper-parameters according to the training requirements of the deep learning model, wherein the hyper-parameters comprise a learning step length, a learning rate and a batch size.
Optionally, performing cross-source image matching on the local camera image block to be matched and the local rendering image block according to the extracted feature descriptors, including: acquiring a feature descriptor of a corresponding local rendering image block meeting a first preset condition by using a nearest neighbor retrieval method and taking the feature descriptor of the local camera image block as a reference; and filtering error matching by adopting a RANSAC algorithm according to the retrieved feature descriptors of the matched local camera image blocks and the feature descriptors of the local rendering image blocks, and calculating the central points of the remaining paired matched local camera image blocks and local rendering image blocks to obtain the matching relationship between the local camera image blocks and the local rendering image blocks and obtain the cross-source image matching result.
Optionally, calculating a virtual-real registration transformation relationship according to the correspondence includes: acquiring, according to the image information, the projection matrix P from the three-dimensional image point cloud M to the rendered image R_I corresponding to the camera image C_I, namely P·M → R_I; acquiring the transformation relation T between the camera image C_I and the corresponding rendered image R_I, namely T·R_I → C_I; and obtaining, according to the projection relation and the matching relation, the virtual-real registration transformation relation from the three-dimensional image point cloud M to the camera image C_I, namely the transformation relation between the three-dimensional space and the two-dimensional space, T·(P·M) → C_I.
Optionally, the implementing of the application to outdoor augmented reality according to the virtual-real registration transformation relationship includes: acquiring the position of a three-dimensional virtual target to be superposed in an outdoor scene; placing the three-dimensional virtual target into a three-dimensional image point cloud; and mapping the three-dimensional virtual target to the camera image according to the virtual-real registration transformation relation.
To achieve the above object, a second embodiment of the present invention provides a computer-readable storage medium, on which an outdoor augmented reality application based on cross-source image matching is stored, and when executed by a processor, the outdoor augmented reality application based on cross-source image matching implements the outdoor augmented reality application method based on cross-source image matching as described above.
According to the computer-readable storage medium of the embodiment of the invention, the outdoor augmented reality application program based on cross-source image matching is stored, so that the processor realizes the outdoor augmented reality application method based on cross-source image matching when the outdoor augmented reality application program based on cross-source image matching is executed, and the effect of augmented reality is improved.
In order to achieve the above object, a third embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for applying outdoor augmented reality based on cross-source image matching as described above is implemented.
According to the computer device of the embodiment of the invention, the computer program which can run on the processor is stored through the memory, so that the processor can realize the outdoor augmented reality application method based on cross-source image matching when executing the computer program, and the augmented reality effect is improved.
Drawings
Fig. 1 is a schematic flowchart of an outdoor augmented reality application method based on cross-source image matching according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a three-dimensional image point cloud result of an outdoor scene obtained after an aerial image is three-dimensionally reconstructed by an SFM algorithm according to an embodiment of the invention;
FIG. 3 is a schematic diagram of acquiring a rendered image corresponding to a match of a camera image according to one embodiment of the invention;
FIG. 4 is a schematic diagram of a partitioned network according to one embodiment of the present invention;
FIG. 5 shows pair-matched cross-source image blocks according to one embodiment of the invention;
FIG. 6 is a schematic structural diagram of a deep learning model according to an embodiment of the present invention;
FIG. 7 is a block diagram of deep learning model branch 1 according to an embodiment of the present invention;
FIG. 8 is a block diagram of deep learning model branch 2 according to an embodiment of the present invention;
FIG. 9 is a cross-source image matching result according to one embodiment of the invention;
FIG. 10 shows the cross-source image matching result with the image block center points connected by lines according to one embodiment of the present invention;
fig. 11 is a diagram illustrating an effect of outdoor augmented reality according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Fig. 1 is a schematic flow diagram of an outdoor augmented reality application method based on cross-source image matching according to an embodiment of the present invention, and as shown in fig. 1, the outdoor augmented reality application method based on cross-source image matching according to the embodiment of the present invention includes the following steps:
step 101, acquiring a camera image and a rendering image corresponding to the camera image, and processing the camera image and the rendering image to acquire a local camera image block and a local rendering image block which are matched in pairs.
As one embodiment, acquiring a camera image and a rendered image corresponding to the camera image includes: acquiring a camera image; acquiring an aerial image, and performing three-dimensional reconstruction on the aerial image by adopting an SFM algorithm to obtain a three-dimensional image point cloud of an outdoor scene; and acquiring image information according to the camera image, and rendering a rendering image which is correspondingly matched with the camera image in the three-dimensional image point cloud according to the image information.
As a specific example, the camera image may be obtained by shooting with a mobile phone.
As a specific example, the aerial images may be captured by a drone.
As a specific example, as shown in fig. 2, an outdoor scene is obliquely photographed by an unmanned aerial vehicle to obtain a large number of aerial images I_i, i = 1, 2, …, N, where N is the number of aerial images, and three-dimensional reconstruction is performed on the aerial images by the SfM algorithm to obtain a three-dimensional image point cloud M of the outdoor scene.
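In practice the SfM reconstruction is run with a full incremental pipeline over all N aerial images. Purely as an illustration of the geometric core of that process (feature matching, relative pose recovery and triangulation), a minimal two-view sketch in Python/OpenCV might look as follows; the image file names and the intrinsic matrix K are assumptions, and a real reconstruction would iterate this over many views with bundle adjustment.

```python
import cv2
import numpy as np

# Hypothetical inputs: two overlapping aerial images and an assumed pinhole intrinsic matrix K.
img1 = cv2.imread("aerial_001.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("aerial_002.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[2000.0, 0.0, 960.0],
              [0.0, 2000.0, 540.0],
              [0.0, 0.0, 1.0]])

# Detect and match SIFT features between the two views (Lowe ratio test).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Recover the relative camera pose from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
inliers = mask.ravel() > 0

# Triangulate the inlier correspondences into a sparse 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
points3d = (pts4d[:3] / pts4d[3]).T
print("Reconstructed", len(points3d), "sparse 3D points from one image pair")
```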
As a specific embodiment, as shown in fig. 3, an outdoor scene is photographed by a mobile phone to obtain a camera image C_I; virtual positioning is performed in the reconstructed three-dimensional image point cloud scene M by using the positioning information of the mobile phone, the shooting direction determined by the external parameters of the mobile phone is used to obtain the projection matrix P from the three-dimensional image point cloud M to the camera image, and a rendered image R_I of the same size is rendered in the reconstructed three-dimensional image point cloud M by taking the size of the image shot by the mobile phone as the standard.
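The text does not spell out how the rendered image R_I is produced from the point cloud; one simple reading is that each 3D point is projected through the projection matrix P derived from the phone's positioning and orientation and splatted into an image of the same size as the phone photo. A minimal sketch under that assumption (all variable names and numeric values are illustrative):

```python
import numpy as np

def render_point_cloud(points_xyz, colors, P, height, width):
    """Project a colored point cloud through a 3x4 projection matrix P and
    splat each visible point into an image of the given size."""
    rendered = np.zeros((height, width, 3), dtype=np.uint8)
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # N x 4 homogeneous
    proj = (P @ pts_h.T).T                                          # N x 3 image points
    valid = proj[:, 2] > 0                                          # in front of the camera
    uv = np.round(proj[valid, :2] / proj[valid, 2:3]).astype(int)   # perspective division
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    rendered[uv[inside, 1], uv[inside, 0]] = colors[valid][inside]
    return rendered

# Illustrative usage with a random point cloud and an assumed P = K [R | t]
# built from the phone's intrinsics and the pose obtained by virtual positioning.
points = np.random.rand(100000, 3) * 50.0
colors = (np.random.rand(100000, 3) * 255).astype(np.uint8)
K = np.array([[1500.0, 0.0, 540.0], [0.0, 1500.0, 960.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [60.0]])])
rendered = render_point_cloud(points, colors, K @ Rt, height=1920, width=1080)
```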
In addition, taking the camera image C_I as a reference, the rendered image R_I rendered from the positioning information is matched with the camera image C_I, and the camera image C_I and the rendered image R_I are together referred to as cross-source images.
As one embodiment, processing a camera image and a rendered image to obtain pairs of matched local camera tiles and local rendered tiles includes: acquiring a perspective transformation matrix of a camera image and a rendering image; marking the segmentation samples in the camera image by adopting a LabelMe toolkit; constructing a segmentation network, and training the segmentation network according to the marked camera image; segmenting the camera image based on the trained segmentation network to segment a segmentation sample of the camera image; extracting all key points of the segmentation sample by using a detector with scale-invariant feature transformation, selecting a plurality of key points from all the key points so that the distance between each selected key point is greater than a first preset threshold value, and deleting other unselected key points; and taking the selected multiple key points as a center, acquiring corresponding local camera image blocks according to a preset size, and mapping the local camera image blocks onto the rendered image according to a perspective transformation matrix to acquire the corresponding local rendered image blocks.
Note that the transformation relation between the matched camera image C_I and rendered image R_I is preset as a perspective transformation.
As a specific example, at least 4 groups of matching corresponding points between the camera image C_I and the rendered image R_I are first selected manually, and the perspective transformation matrix T between the two cross-source images is calculated from them. A segmentation network is then constructed using the U-Net framework, 200 camera images C_I are labeled with the LabelMe toolkit, taking buildings as the target segmentation samples, and the 200 labeled samples are input into the constructed segmentation network for training; the constructed segmentation network is shown in fig. 4. Next, the camera image C_I is input into the trained segmentation network to segment the building, a SIFT (Scale Invariant Feature Transform) detector is used to extract the SIFT key points of the building segmented from the camera image C_I, a plurality of key points are selected from all SIFT key points so that the distance between any two selected SIFT key points is greater than 30 pixels, and the other unselected SIFT key points are deleted. Taking the selected SIFT key points as centers, a local camera image block of a certain size is acquired for each SIFT key point, and the local camera image blocks are mapped onto the rendered image matched with the camera image according to the calculated perspective transformation matrix T to obtain the correspondingly matched local rendering image blocks. The matched pairs of local camera image blocks and local rendering image blocks are shown in fig. 5, where the first row in fig. 5 contains local camera image blocks and the second row contains local rendering image blocks.
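As a concrete illustration of this patch-generation step, the following sketch detects SIFT key points inside the segmented building, thins them so that the selected key points are more than 30 pixels apart, crops fixed-size local camera image blocks, and maps their centers into the rendered image with the perspective transformation matrix T. The file names, the patch size of 128 pixels, and the loading of the mask and of T are assumptions; only the 30-pixel spacing and the perspective mapping follow the description.

```python
import cv2
import numpy as np

PATCH = 128        # assumed local patch size (the text only says "a certain size")
MIN_DIST = 30      # minimum spacing between selected key points, in pixels

camera_img = cv2.imread("camera_image.jpg")
rendered_img = cv2.imread("rendered_image.jpg")
building_mask = cv2.imread("building_mask.png", cv2.IMREAD_GRAYSCALE)  # from the segmentation network
T = np.load("perspective_T.npy")  # 3x3 matrix estimated from >= 4 manual correspondences

# 1. SIFT key points restricted to the segmented building.
sift = cv2.SIFT_create()
keypoints = sift.detect(camera_img, building_mask)

# 2. Greedy thinning: keep a key point only if it lies farther than MIN_DIST from all kept ones.
kept = []
for kp in sorted(keypoints, key=lambda k: -k.response):
    if all(np.hypot(kp.pt[0] - q.pt[0], kp.pt[1] - q.pt[1]) > MIN_DIST for q in kept):
        kept.append(kp)

# 3. Crop camera patches and map their centers into the rendered image via T.
half = PATCH // 2
centers = np.float32([kp.pt for kp in kept]).reshape(-1, 1, 2)
mapped = cv2.perspectiveTransform(centers, T).reshape(-1, 2)
camera_patches, rendered_patches = [], []
for (x, y), (u, v) in zip(centers.reshape(-1, 2), mapped):
    x, y, u, v = int(x), int(y), int(u), int(v)
    cam = camera_img[y - half:y + half, x - half:x + half]
    ren = rendered_img[v - half:v + half, u - half:u + half]
    if cam.shape[:2] == (PATCH, PATCH) and ren.shape[:2] == (PATCH, PATCH):
        camera_patches.append(cam)
        rendered_patches.append(ren)
```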
And 102, constructing a deep learning model according to the automatic coding machine and the twin network, and training the deep learning model according to the pair-matched local camera image blocks and the local rendering image blocks.
As one example, as shown in FIG. 6, the deep learning model Y-Net is shaped like the letter Y, and includes: an encoder, a decoder and an STN block.
There are two such encoders, each with the structure: C(32,5,2)-BN-SeLU-C(64,5,2)-BN-SeLU-P(3,2)-C(96,3,1)-BN-SeLU-C(256,3,1)-BN-SeLU-P(3,2)-C(384,3,1)-BN-SeLU-C(384,3,1)-BN-SeLU-C(256,3,1)-BN-SeLU-P(3,2)-C(128,7,1)-BN-SeLU; where C(n, k, s) denotes a convolutional layer containing n convolution kernels of size k with stride s; P(k, s) denotes a max pooling layer with a sliding window of k and stride s; BN is batch normalization; and SeLU is the activation function.
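Read literally, this shorthand corresponds to a convolutional stack such as the following PyTorch sketch. It is a non-authoritative reading: the padding of each convolution and the spatial pooling of the final feature map into a 128-dimensional descriptor are assumptions that the text does not specify.

```python
import torch
import torch.nn as nn

def conv_bn_selu(in_ch, out_ch, k, s):
    """C(n, k, s)-BN-SeLU block; 'same'-style padding of k // 2 is an assumption."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.SELU(inplace=True),
    )

class Encoder(nn.Module):
    """One Y-Net encoder branch, following the stated C/P layer sequence."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_selu(3, 32, 5, 2),
            conv_bn_selu(32, 64, 5, 2),
            nn.MaxPool2d(kernel_size=3, stride=2),   # P(3, 2)
            conv_bn_selu(64, 96, 3, 1),
            conv_bn_selu(96, 256, 3, 1),
            nn.MaxPool2d(kernel_size=3, stride=2),
            conv_bn_selu(256, 384, 3, 1),
            conv_bn_selu(384, 384, 3, 1),
            conv_bn_selu(384, 256, 3, 1),
            nn.MaxPool2d(kernel_size=3, stride=2),
            conv_bn_selu(256, 128, 7, 1),
        )

    def forward(self, x):
        f = self.features(x)        # B x 128 x h x w feature map
        return f.mean(dim=[2, 3])   # global average pooling to a 128-d descriptor (an assumption)

# e.g. Encoder()(torch.rand(4, 3, 256, 256)) -> tensor of shape (4, 128)
```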
The decoder is a shared decoder with the structure: FC(128,1024)-TC(128,4,2)-SeLU-TC(64,4,2)-SeLU-TC(32,4,2)-SeLU-TC(16,4,2)-SeLU-TC(8,4,2)-SeLU-TC(4,4,2)-SeLU-TC(3,4,2)-Sigmoid; where FC(p, q) denotes a fully connected layer mapping a p-dimensional vector to a q-dimensional vector; TC(n, k, s) denotes a deconvolution (transposed convolution) layer with output depth n, convolution kernel size k × k and stride s; and SeLU and Sigmoid are activation functions.
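The shared decoder can be read analogously; in the sketch below, the reshaping of the 1024-dimensional fully connected output into a 256 × 2 × 2 feature map (so that seven stride-2 deconvolutions recover a 256 × 256 × 3 reconstruction) is an assumption, as is the padding of each transposed convolution.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Shared decoder G_De: FC(128, 1024) followed by seven TC(n, 4, 2) stages."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 1024)

        def tc(in_ch, out_ch):
            # TC(n, 4, 2): transposed convolution, kernel 4, stride 2;
            # padding=1 (doubling the spatial size each stage) is an assumption.
            return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)

        self.net = nn.Sequential(
            tc(256, 128), nn.SELU(inplace=True),
            tc(128, 64), nn.SELU(inplace=True),
            tc(64, 32), nn.SELU(inplace=True),
            tc(32, 16), nn.SELU(inplace=True),
            tc(16, 8), nn.SELU(inplace=True),
            tc(8, 4), nn.SELU(inplace=True),
            tc(4, 3), nn.Sigmoid(),
        )

    def forward(self, descriptor):                   # descriptor: B x 128
        x = self.fc(descriptor).view(-1, 256, 2, 2)  # reshape of the 1024-d vector (assumed)
        return self.net(x)                           # B x 3 x 256 x 256 under these assumptions
```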
It should be noted that the inputs of the deep learning model Y-Net are pair-matched local camera image blocks and local rendering image blocks, where the input of one branch is a local camera image block and the input of the other branch is a local rendering image block; these image blocks are first resized to 256 × 256 × 3 before being input into the deep learning model Y-Net. The output of the deep learning model Y-Net is two 128-dimensional feature vectors, namely the feature descriptors. The decomposition diagrams of the two branches of the deep learning model Y-Net are shown in FIGS. 7 and 8.
In FIG. 7, the input of branch 1 of the deep learning model Y-Net is a local rendering image block R; the encoder F_En1 extracts the feature f_AE1, and the feature f_AE1 is passed through the decoder G_De1 to recover an image denoted as R'. In FIG. 8, the input of branch 2 of the deep learning model Y-Net is a local camera image block C; the STN module is combined with the encoder into one large encoder F_En2, the local camera image block C passes through the encoder F_En2 to extract the feature f_AE2, and the feature f_AE2 is passed through the decoder G_De2 to recover an image denoted as C'.
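Putting the pieces together, the two branches might be wired as in the sketch below, reusing the Encoder and Decoder sketches above. A minimal spatial transformer is placed in front of the camera-patch encoder as described; its internal layer sizes are assumptions, since the text only names the STN block.

```python
import torch
import torch.nn as nn

class STN(nn.Module):
    """Minimal spatial transformer: predicts a 2x3 affine warp and resamples the input.
    Layer sizes are illustrative; the text only names the STN block."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, 7, stride=2), nn.ReLU(),
            nn.Conv2d(8, 10, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        self.loc[-1].weight.data.zero_()   # start from the identity transform
        self.loc[-1].bias.data.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = nn.functional.affine_grid(theta, x.size(), align_corners=False)
        return nn.functional.grid_sample(x, grid, align_corners=False)

class YNet(nn.Module):
    """Branch 1: rendering patch R -> F_En1 -> f_AE1 -> shared decoder -> R'.
       Branch 2: camera patch C -> STN -> F_En2 -> f_AE2 -> shared decoder -> C'."""
    def __init__(self):
        super().__init__()
        self.stn = STN()
        self.enc_render = Encoder()    # Encoder / Decoder from the sketches above
        self.enc_camera = Encoder()
        self.decoder = Decoder()       # shared between the two branches

    def forward(self, render_patch, camera_patch):
        f_ae1 = self.enc_render(render_patch)
        f_ae2 = self.enc_camera(self.stn(camera_patch))
        return f_ae1, f_ae2, self.decoder(f_ae1), self.decoder(f_ae2)
```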
As a specific embodiment, when the deep learning model is constructed according to the automatic coding machine and the twin network, a cross-source constraint loss function is further designed to optimize the deep learning model Y-Net, wherein the cross-source constraint loss function comprises a content loss and a feature-consistency loss.
Firstly, the Mean Squared Error (MSE) is applied to the image blocks input into the deep learning model Y-Net and their reconstructions, specifically as follows:
L_AE1(R, R′) = (1/(W·H·N)) Σ_i (R_i − R′_i)²
L_AE2(R, C′) = (1/(W·H·N)) Σ_i (R_i − C′_i)²
L_GEN(R′, C′) = (1/(W·H·N)) Σ_i (R′_i − C′_i)²
wherein, W × H is the size of the input image block, and N is the number of channels of the image; combining these three MSE losses yields a content loss as follows:
L_Content = L_AE1(R, R′) + L_AE2(R, C′) + L_GEN(R′, C′)
Secondly, the feature-consistency loss constrains the features f_AE1 and f_AE2 extracted by the two branches of the deep learning model Y-Net, using the Euclidean distance, specifically as follows:
L_Feature = √( Σ_{k=1}^{K} ( f_AE1(k) − f_AE2(k) )² )
wherein K is the dimension of the features f_AE1 and f_AE2; in the present invention, K = 128.
And finally, the content loss and the feature-consistency loss are combined to obtain the cross-source constraint loss function, as follows:
L_Y-Net = L_Content + λ · L_Feature
wherein λ is a weight parameter.
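In code, the content loss and the feature-consistency loss reduce to mean squared errors plus a Euclidean distance between the two descriptors; a sketch consistent with the formulas above (with λ exposed as a tunable weight) could be:

```python
import torch

def ynet_loss(R, R_rec, C_rec, f_ae1, f_ae2, lam=1.0):
    """Cross-source constraint loss L_Y-Net = L_Content + lambda * L_Feature.
    R: input rendering patch; R_rec, C_rec: reconstructions of the two branches;
    f_ae1, f_ae2: the two 128-d descriptors; lam: the weight parameter lambda."""
    mse = torch.nn.functional.mse_loss
    l_content = mse(R_rec, R) + mse(C_rec, R) + mse(C_rec, R_rec)   # L_AE1 + L_AE2 + L_GEN
    l_feature = (f_ae1 - f_ae2).pow(2).sum(dim=1).sqrt().mean()     # Euclidean distance over K = 128
    return l_content + lam * l_feature
```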
As an embodiment, when the deep learning model is trained according to the pair-matched local camera image block and local rendering image block, the method further includes: and adjusting the optimizer and the hyper-parameters according to the training requirements of the deep learning model, wherein the hyper-parameters comprise a learning step length, a learning rate and a batch size.
As a specific example, the deep learning model Y-Net is implemented in PyTorch, using the RMSprop optimizer with an initial learning rate of 0.001, decayed by a factor of 0.99 every 4 epochs.
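A training setup matching these hyper-parameters could be sketched as follows; the batch size, the number of epochs and the placeholder dataset are assumptions, and YNet and ynet_loss refer to the sketches above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = YNet()                                                    # from the sketch above
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
# Multiply the learning rate by 0.99 every 4 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.99)

# Placeholder tensors; real training would load the pair-matched patch dataset.
render_patches = torch.rand(64, 3, 256, 256)
camera_patches = torch.rand(64, 3, 256, 256)
loader = DataLoader(TensorDataset(render_patches, camera_patches), batch_size=16, shuffle=True)

for epoch in range(20):                                           # number of epochs is illustrative
    for R, C in loader:
        f1, f2, R_rec, C_rec = model(R, C)
        loss = ynet_loss(R, R_rec, C_rec, f1, f2, lam=1.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```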
103, extracting feature descriptors of the local camera image blocks and the local rendering image blocks to be matched based on the trained deep learning model, and performing cross-source image matching on the local camera image blocks and the local rendering image blocks to be matched according to the extracted feature descriptors to obtain a cross-source image matching result.
As an embodiment, before extracting the feature descriptors of the local camera image blocks and the local rendering image blocks to be matched, the building of the camera image is further segmented by the trained segmentation network; extracting SIFT key points of a building segmented from a camera image by using an SIFT detector, selecting a plurality of key points from all the key points so that the distance between each selected key point is more than 30 pixels, deleting other unselected key points, and taking the SIFT key points as the center to obtain a local camera image block; randomly selecting 3000 points on a corresponding rendering image, and taking the random points as a center to obtain a local rendering image block; thereby obtaining the same number of local camera image blocks and local rendering image blocks to be matched.
As a specific embodiment, the obtained local camera image blocks and local rendering image blocks to be matched, equal in number, are input into the trained deep learning model Y-Net to extract their feature descriptors. Taking the feature descriptors of the local camera image blocks as the reference, a nearest neighbor retrieval method is adopted to acquire the feature descriptors of the local rendering image blocks that meet the following two conditions: 1) the feature descriptor of the local rendering image block closest to the feature descriptor of the local camera image block; 2) the feature descriptor of the local rendering image block whose similarity to the feature descriptor of the local camera image block is greater than 0.92. The RANSAC algorithm is then adopted to filter out erroneous matches from the retrieved matched feature descriptors of the local camera image blocks and local rendering image blocks, and the centers of the remaining matched local camera image blocks and local rendering image blocks are used to calculate the transformation relation between the two images, completing the cross-source image matching. The final cross-source image block matching result is shown in fig. 9, and the center points of these image blocks are connected as shown in fig. 10.
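The retrieval-and-filtering step can be sketched as below: for each camera-patch descriptor the nearest rendering-patch descriptor is kept only if its similarity exceeds 0.92, and the surviving patch-center correspondences are filtered with RANSAC while the cross-source transformation is estimated. Treating the similarity as a cosine similarity over L2-normalised descriptors and modelling the transformation as a homography are illustrative readings of the text, not the only possible ones.

```python
import cv2
import numpy as np

def match_patches(cam_desc, ren_desc, cam_centers, ren_centers, sim_thresh=0.92):
    """cam_desc: N x 128, ren_desc: M x 128 (assumed L2-normalised) descriptors;
    cam_centers, ren_centers: the corresponding patch centers in image coordinates."""
    # Nearest-neighbour retrieval with a similarity threshold (cosine similarity assumed).
    sim = cam_desc @ ren_desc.T                                    # N x M similarity matrix
    nn_idx = sim.argmax(axis=1)
    keep = sim[np.arange(len(cam_desc)), nn_idx] > sim_thresh

    src = np.float32(cam_centers[keep])
    dst = np.float32(ren_centers[nn_idx[keep]])

    # RANSAC rejects erroneous matches while estimating the cross-source transformation
    # (modelled here as a homography between the patch centers).
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
    inliers = inlier_mask.ravel().astype(bool)
    return H, src[inliers], dst[inliers]
```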
And 104, acquiring the corresponding relation of the cross-source images according to the cross-source image matching result, and calculating a virtual-real registration transformation relation according to the corresponding relation.
As an embodiment, calculating the virtual-real registration transformation relation according to the corresponding relation includes: obtaining, according to the image information, the projection matrix P from the three-dimensional image point cloud M to the rendered image R_I corresponding to the camera image C_I, namely P·M → R_I; acquiring the perspective transformation matrix T between the camera image C_I and the corresponding rendered image R_I, namely T·R_I → C_I; and obtaining, according to the projection matrix P and the perspective transformation matrix T, the virtual-real registration transformation relation from the three-dimensional image point cloud M to the camera image C_I, namely the transformation relation between the three-dimensional space and the two-dimensional space, T·(P·M) → C_I.
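Composed, the projection matrix P and the perspective transformation matrix T give a direct mapping from a 3D point of the point cloud M into the camera image C_I; a minimal sketch in homogeneous coordinates (variable names illustrative):

```python
import numpy as np

def project_to_camera_image(points_xyz, P, T):
    """Map 3D points of the point cloud M into the camera image C_I via T · (P · M).
    P: 3x4 projection onto the rendered image; T: 3x3 perspective matrix R_I -> C_I."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])   # N x 4 homogeneous
    in_rendered = P @ pts_h.T                                        # 3 x N points in R_I
    in_camera = T @ in_rendered                                      # 3 x N points in C_I
    return (in_camera[:2] / in_camera[2]).T                          # N x 2 pixel coordinates
```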
And 105, realizing application of outdoor augmented reality according to the virtual-real registration transformation relation.
As an embodiment, the application of outdoor augmented reality is realized according to the virtual-real registration transformation relation, including: acquiring the position at which a three-dimensional virtual target is to be superimposed in the outdoor scene; placing the three-dimensional virtual target into the three-dimensional image point cloud; and mapping the three-dimensional virtual target onto the camera image according to the virtual-real registration transformation relation. Fig. 11 shows several effects of the outdoor augmented reality application based on the present invention, where the superimposed virtual content is real-time information of a library.
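The last step, placing a virtual target into the point cloud and drawing it on the camera image, then amounts to running the vertices of the virtual model through the same composed transformation. A hedged sketch (the anchor position, the cube geometry, the stored P and T files and the drawing style are all purely illustrative):

```python
import cv2
import numpy as np

# Assumed inputs: the camera frame, the matrices P and T obtained earlier, and a virtual
# target anchored at a chosen 3D position in the point cloud (all values illustrative).
camera_frame = cv2.imread("camera_image.jpg")
P = np.load("projection_P.npy")         # 3x4, point cloud -> rendered image
T = np.load("perspective_T.npy")        # 3x3, rendered image -> camera image

anchor = np.array([12.0, 4.5, 30.0])    # where the virtual target is placed in the point cloud
cube = anchor + np.array([[x, y, z] for x in (0, 2) for y in (0, 2) for z in (0, 2)], dtype=float)

# Map the virtual target's vertices into the camera image and draw them.
pts2d = project_to_camera_image(cube, P, T)    # from the sketch above
for u, v in pts2d:
    cv2.circle(camera_frame, (int(u), int(v)), 6, (0, 0, 255), -1)
cv2.imwrite("augmented_frame.jpg", camera_frame)
```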
In summary, according to the outdoor augmented reality application method based on cross-source image matching provided by the invention, a camera image and a rendering image correspondingly matched with the camera image are first acquired, and the camera image and the rendering image are processed to obtain pair-matched local camera image blocks and local rendering image blocks. A deep learning model is then constructed according to an automatic coding machine and a twin network and trained on the pair-matched local camera image blocks and local rendering image blocks. Feature descriptors of the local camera image blocks and local rendering image blocks to be matched are extracted with the trained deep learning model, and cross-source image matching is performed on them according to the extracted feature descriptors to obtain a cross-source image matching result. The corresponding relation of the cross-source images is then obtained according to the cross-source image matching result, the virtual-real registration transformation relation is calculated according to the corresponding relation, and finally the application to outdoor augmented reality is realized according to the virtual-real registration transformation relation. In this way, the corresponding relation is obtained by matching the cross-source images, and the virtual-real registration transformation relation is derived from it, so that the augmented reality effect is improved.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, on which an outdoor augmented reality application based on cross-source image matching is stored, and when being executed by a processor, the outdoor augmented reality application based on cross-source image matching implements the above outdoor augmented reality application method based on cross-source image matching.
According to the computer-readable storage medium of the embodiment of the invention, the outdoor augmented reality application program based on cross-source image matching is stored, so that the processor realizes the outdoor augmented reality application method based on cross-source image matching when the outdoor augmented reality application program based on cross-source image matching is executed, and the effect of augmented reality is improved.
In addition, the embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the above outdoor augmented reality application method based on cross-source image matching is implemented.
According to the computer device of the embodiment of the invention, the computer program which can run on the processor is stored through the memory, so that the processor can realize the outdoor augmented reality application method based on cross-source image matching when executing the computer program, and the augmented reality effect is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above should not be understood to necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An outdoor augmented reality application method based on cross-source image matching is characterized by comprising the following steps:
acquiring a camera image and a rendering image correspondingly matched with the camera image, and processing the camera image and the rendering image to acquire a local camera image block and a local rendering image block which are matched in pairs;
constructing a deep learning model according to an automatic coding machine and a twin network, and training the deep learning model according to paired matched local camera image blocks and local rendering image blocks;
extracting feature descriptors of local camera image blocks and local rendering image blocks to be matched based on a trained deep learning model, and performing cross-source image matching on the local camera image blocks and the local rendering image blocks to be matched according to the extracted feature descriptors to obtain a cross-source image matching result;
acquiring a corresponding relation of the cross-source images according to the cross-source image matching result, and calculating a virtual-real registration transformation relation according to the corresponding relation;
and realizing the application of outdoor augmented reality according to the virtual-real registration transformation relation.
2. The outdoor augmented reality application method based on cross-source image matching of claim 1, wherein acquiring a camera image and a rendered image corresponding matching the camera image comprises:
acquiring a camera image;
acquiring an aerial image, and performing three-dimensional reconstruction on the aerial image by adopting an SFM algorithm to obtain a three-dimensional image point cloud of an outdoor scene;
and acquiring image information according to the camera image, and rendering a rendering image which is correspondingly matched with the camera image in the three-dimensional image point cloud according to the image information.
3. The method for outdoor augmented reality application based on cross-source image matching according to claim 1, wherein processing the camera image and the rendered image to obtain pairs of matched local camera tiles and local rendered tiles comprises:
acquiring a perspective transformation matrix of the camera image and the rendered image;
labeling the segmented sample in the camera image with a LabelMe toolkit;
constructing a segmentation network, and training the segmentation network according to the marked camera image;
segmenting the camera image based on the trained segmentation network to segment a segmentation sample of the camera image;
extracting all key points of the segmentation sample by using a detector with scale-invariant feature transformation, selecting a plurality of key points from all key points so that the distance between each selected key point is greater than a first preset threshold value, and deleting other unselected key points;
and taking the selected multiple key points as a center, acquiring corresponding local camera image blocks according to a preset size, and mapping the local camera image blocks onto the rendered image according to the perspective transformation matrix to acquire the corresponding local rendered image blocks.
4. The outdoor augmented reality application method based on cross-source image matching of claim 1 wherein the deep learning model comprises: an encoder, a decoder and an STN block.
5. The outdoor augmented reality application method based on cross-source image matching of claim 1, wherein when training the deep learning model from pairs of matched local camera patch and local rendering patch, further comprising:
and adjusting the optimizer and the hyper-parameters according to the training requirements of the deep learning model, wherein the hyper-parameters comprise a learning step length, a learning rate and a batch size.
6. The outdoor augmented reality application method based on cross-source image matching of claim 1 wherein cross-source image matching of the local camera image block to be matched and the local rendering image block according to the extracted feature descriptors comprises:
acquiring a feature descriptor of a corresponding local rendering image block meeting a first preset condition by using a nearest neighbor retrieval method and taking the feature descriptor of the local camera image block as a reference;
and filtering error matching by adopting a RANSAC algorithm according to the retrieved feature descriptors of the matched local camera image blocks and the feature descriptors of the local rendering image blocks, and calculating the central points of the remaining paired matched local camera image blocks and local rendering image blocks to obtain perspective transformation matrixes of the local camera image blocks and the local rendering image blocks and obtain a cross-source image matching result.
7. The outdoor augmented reality application method based on cross-source image matching as claimed in claim 2, wherein calculating a virtual-real registration transformation relation according to the correspondence comprises:
acquiring, according to the image information, the projection matrix P from the three-dimensional image point cloud M to the rendered image R_I corresponding to the camera image C_I, namely P·M → R_I;
acquiring the perspective transformation matrix T between the camera image C_I and the corresponding rendered image R_I, namely T·R_I → C_I;
and obtaining, according to the projection matrix P and the perspective transformation matrix T, the virtual-real registration transformation relation from the three-dimensional image point cloud M to the camera image C_I, namely the transformation relation between the three-dimensional space and the two-dimensional space, T·(P·M) → C_I.
8. The outdoor augmented reality application method based on cross-source image matching according to claim 1, wherein the application of outdoor augmented reality is realized according to the virtual-real registration transformation relation, and comprises the following steps:
acquiring the position of a three-dimensional virtual target to be superposed in an outdoor scene;
placing the three-dimensional virtual target into a three-dimensional image point cloud;
and mapping the three-dimensional virtual target to the camera image according to the virtual-real registration transformation relation.
9. A computer-readable storage medium having stored thereon a cross-source image matching based outdoor augmented reality application that, when executed by a processor, implements a cross-source image matching based outdoor augmented reality application method according to any one of claims 1-8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the cross-source image matching based outdoor augmented reality application method of any one of claims 1-8.
CN202010034538.XA 2020-01-14 2020-01-14 Outdoor augmented reality application method based on cross-source image matching Active CN111260794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010034538.XA CN111260794B (en) 2020-01-14 2020-01-14 Outdoor augmented reality application method based on cross-source image matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010034538.XA CN111260794B (en) 2020-01-14 2020-01-14 Outdoor augmented reality application method based on cross-source image matching

Publications (2)

Publication Number Publication Date
CN111260794A true CN111260794A (en) 2020-06-09
CN111260794B CN111260794B (en) 2022-07-08

Family

ID=70950401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010034538.XA Active CN111260794B (en) 2020-01-14 2020-01-14 Outdoor augmented reality application method based on cross-source image matching

Country Status (1)

Country Link
CN (1) CN111260794B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426705A (en) * 2011-09-30 2012-04-25 北京航空航天大学 Behavior splicing method of video scene
WO2016071896A1 (en) * 2014-11-09 2016-05-12 L.M.Y. Research & Development Ltd. Methods and systems for accurate localization and virtual object overlay in geospatial augmented reality applications
CN107292965A (en) * 2017-08-03 2017-10-24 北京航空航天大学青岛研究院 A kind of mutual occlusion processing method based on depth image data stream
CN110021029A (en) * 2019-03-22 2019-07-16 南京华捷艾米软件科技有限公司 A kind of real-time dynamic registration method and storage medium suitable for RGBD-SLAM
CN110390302A (en) * 2019-07-24 2019-10-29 厦门大学 A kind of objective detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIQUAN LIU et al.: "Ground Camera Images and UAV 3D Model Registration for Outdoor Augmented Reality", 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 15 August 2019 (2019-08-15), pages 1050-1051 *
黄碧辉 et al.: "An improved three-dimensional registration method for outdoor mobile augmented reality", Geomatics and Information Science of Wuhan University (武汉大学学报(信息科学版)), 5 December 2019 (2019-12-05), pages 1865-1873 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112785687A (en) * 2021-01-25 2021-05-11 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN112861952A (en) * 2021-01-29 2021-05-28 云南电网有限责任公司电力科学研究院 Partial discharge image matching deep learning method
CN112861952B (en) * 2021-01-29 2023-04-28 云南电网有限责任公司电力科学研究院 Partial discharge image matching deep learning method
CN117078975A (en) * 2023-10-10 2023-11-17 四川易利数字城市科技有限公司 AR space-time scene pattern matching method based on evolutionary algorithm
CN117078975B (en) * 2023-10-10 2024-01-02 四川易利数字城市科技有限公司 AR space-time scene pattern matching method based on evolutionary algorithm

Also Published As

Publication number Publication date
CN111260794B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
Melekhov et al. Dgc-net: Dense geometric correspondence network
US11200424B2 (en) Space-time memory network for locating target object in video content
Zhang et al. Densely connected pyramid dehazing network
Wang et al. 360sd-net: 360 stereo depth estimation with learnable cost volume
CN111260794B (en) Outdoor augmented reality application method based on cross-source image matching
Truong et al. Pdc-net+: Enhanced probabilistic dense correspondence network
CN109416727A (en) Glasses minimizing technology and device in a kind of facial image
AU2019268184B2 (en) Precise and robust camera calibration
CN115797350B (en) Bridge disease detection method, device, computer equipment and storage medium
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
Chelani et al. How privacy-preserving are line clouds? recovering scene details from 3d lines
Malav et al. DHSGAN: An end to end dehazing network for fog and smoke
CN116012432A (en) Stereoscopic panoramic image generation method and device and computer equipment
Ali et al. Single image Façade segmentation and computational rephotography of House images using deep learning
Basak et al. Monocular depth estimation using encoder-decoder architecture and transfer learning from single RGB image
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
CN114241141A (en) Smooth object three-dimensional reconstruction method and device, computer equipment and storage medium
CN112465796B (en) Light field feature extraction method integrating focal stack and full-focus image
CN113744280A (en) Image processing method, apparatus, device and medium
Maiwald A window to the past through modern urban environments: Developing a photogrammetric workflow for the orientation parameter estimation of historical images
CN112070181A (en) Image stream-based cooperative detection method and device and storage medium
CN116311218A (en) Noise plant point cloud semantic segmentation method and system based on self-attention feature fusion
CN115953471A (en) Indoor scene multi-scale vector image retrieval and positioning method, system and medium
JP2022036075A (en) Method for training neural network to deliver viewpoints of objects using unlabeled pairs of images, and corresponding system
Li SuperGlue-Based Deep Learning Method for Image Matching from Multiple Viewpoints

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant