CN115937546A - Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium - Google Patents


Info

Publication number: CN115937546A
Application number: CN202211534768.8A
Authority: CN (China)
Prior art keywords: information, target image, matched, image, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: 么仕曾
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211534768.8A
Publication of CN115937546A

Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure provides an image matching method, a three-dimensional image reconstruction method, an image matching apparatus, a three-dimensional image reconstruction apparatus, an electronic device, and a medium, and relates to the field of computer technology, in particular to the technical fields of network and computing power technology, artificial intelligence, virtual reality, augmented reality, metaverse technology, computer vision, and the like. The specific implementation scheme is as follows: rotating the image to be matched of the target object to obtain a plurality of candidate images to be matched, wherein the texture of the target object has at least one of weak texture and repetitiveness; determining a target image to be matched from the plurality of candidate images to be matched according to first feature information of the target image and second feature information of the candidate images to be matched; determining matching information between the target image and the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched; and matching the target image and the target image to be matched according to the matching information.

Description

Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to the fields of network and computing power technology, artificial intelligence technology, virtual reality, augmented reality, metaverse technology, and computer vision. More particularly, the present disclosure relates to an image matching method, a three-dimensional image reconstruction method, corresponding apparatuses, an electronic device, and a medium.
Background
With the continuous development of computer technology, the demand for processing images is increasing. For example, images may be matched against one another. Image matching may refer to the process of determining matching images by performing similarity and consistency analysis on the content, features, structure, and gray scale of the images.
Disclosure of Invention
The disclosure provides an image matching method, an image matching device, a three-dimensional image reconstruction method, a three-dimensional image reconstruction device, an electronic device and a medium.
According to an aspect of the present disclosure, there is provided an image matching method including: rotating an image to be matched of a target object to obtain a plurality of candidate images to be matched, wherein the texture of the target object has at least one of weak texture and repetitiveness; determining a target image to be matched from the candidate images to be matched according to first feature information of the target image and second feature information of the candidate images to be matched, wherein the target image is an image of the target object; determining matching information between the target image and the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched, wherein the third feature information comprises first feature point information and first feature point descriptor information of a first feature point, and the fourth feature information comprises second feature point information and second feature point descriptor information of a second feature point; and matching the target image and the target image to be matched according to the matching information.
According to another aspect of the present disclosure, there is provided a three-dimensional image reconstruction method including: obtaining matching information between the target image and the target image to be matched by using an image matching method; and performing three-dimensional reconstruction on the target object according to the matching information.
According to another aspect of the present disclosure, there is provided an image matching apparatus including: a rotation module, configured to rotate the image to be matched of the target object to obtain a plurality of candidate images to be matched, wherein the texture of the target object has at least one of weak texture and repetitiveness; a first determining module, configured to determine a target image to be matched from the multiple candidate images to be matched according to first feature information of the target image and second feature information of the multiple candidate images to be matched, where the target image is an image of the target object; a second determining module, configured to determine matching information between the target image and the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched, where the third feature information includes first feature point information and first feature point descriptor information of a first feature point, and the fourth feature information includes second feature point information and second feature point descriptor information of a second feature point; and a matching module, configured to match the target image with the target image to be matched according to the matching information.
According to another aspect of the present disclosure, there is provided a three-dimensional image reconstruction apparatus including: a fifth obtaining module, configured to obtain, by using an image matching apparatus, matching information between the target image and the target image to be matched; and the three-dimensional reconstruction module is used for performing three-dimensional reconstruction on the target object according to the matching information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to the disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above in the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above in the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture to which an image matching method, a three-dimensional image reconstruction method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of an image matching method according to an embodiment of the present disclosure;
fig. 3 schematically illustrates an example schematic diagram of performing cropping processing on at least one of an original image and an original image to be matched to obtain a target image and an image to be matched according to an embodiment of the disclosure;
fig. 4A schematically illustrates an example schematic diagram of a method of determining third feature information of a target image and fourth feature information of a target image to be matched according to an embodiment of the present disclosure;
fig. 4B schematically illustrates an example diagram of a method of determining third feature information of a target image and fourth feature information of a target image to be matched according to another embodiment of the present disclosure;
fig. 5A schematically illustrates an example diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to an embodiment of the present disclosure;
fig. 5B schematically illustrates an example schematic diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to another embodiment of the present disclosure;
fig. 5C schematically illustrates an example schematic diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to another embodiment of the present disclosure;
FIG. 6 schematically shows a flow chart of a three-dimensional image reconstruction method according to an embodiment of the present disclosure;
fig. 7A schematically illustrates an example schematic diagram of a three-dimensional image reconstruction method according to an embodiment of the disclosure;
FIG. 7B schematically illustrates an example schematic of a three-dimensional image reconstruction method according to another embodiment of the present disclosure;
FIG. 7C schematically shows an example schematic of a three-dimensional image reconstruction result according to an embodiment of the disclosure;
fig. 8 schematically shows a block diagram of an image matching apparatus according to an embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of a three-dimensional image reconstruction apparatus according to an embodiment of the present disclosure; and
fig. 10 schematically shows a block diagram of an electronic device adapted to implement an image matching method and a three-dimensional image reconstruction method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Image matching may include a process of determining corresponding matching points between a plurality of images in which a target object exists. Image matching can be applied to fields such as three-dimensional reconstruction, image registration, and remote sensing detection. The target object may be at least one of a static target object and a dynamic target object. The static target object may include at least one of a pose-variable static target object and a fixed-pose static target object.
The three-dimensional reconstruction may refer to a process of reconstructing coordinates of a matching point corresponding to the target object in the two images in a three-dimensional space by determining the matching point corresponding to the target object in the two images.
The three-dimensional reconstruction may be based on multi-view images. A multi-view image may be an image of the target object captured on different multi-turn orbits by an image acquisition device with an unknown pose.
Three-dimensional reconstruction based on multi-view images may refer to a process of determining the same point of the target object in each multi-view image and determining the three-dimensional depth of the target object at that point using the relative positional relationship of the views. Because the target object may be placed in different poses between tracks, multi-view image collection over multiple orbit circles can cover all of the external visual information of the target object.
However, since the relative pose of the image capture device between the multi-turn orbits is unknown, it is necessary to transform the multi-view images into the same three-dimensional coordinate system based on the texture information of the target object itself. In this case, for a target object having a weak texture or a repetitive texture, since it is difficult to accurately estimate the camera pose of each multi-view image, the three-dimensional reconstruction effect of the target object based on the multi-view images is poor.
To this end, the present disclosure proposes an image matching scheme. For example, the image to be matched of the target object is rotated to obtain a plurality of candidate images to be matched. The texture of the target object may have at least one of weak texture and repetitiveness. A target image to be matched is determined from the candidate images to be matched according to the first feature information of the target image and the second feature information of the candidate images to be matched. The target image may be an image of the target object. Matching information between the target image and the target image to be matched is determined according to the third feature information of the target image and the fourth feature information of the target image to be matched. The third feature information includes first feature point information and first feature point descriptor information of the first feature point. The fourth feature information includes second feature point information and second feature point descriptor information of the second feature point. The target image and the target image to be matched are then matched according to the matching information.
According to the embodiment of the disclosure, the target image to be matched is determined from the candidate images to be matched according to the first characteristic information of the target image and the second characteristic information of the candidate images to be matched, and the candidate images to be matched are obtained by rotating the images to be matched of the target object, so that the influence of weak texture and repetitive texture can be reduced by using the target image to be matched, and the accuracy of image matching is ensured. In addition, according to the third characteristic information of the target image and the fourth characteristic information of the target image to be matched, the matching information between the target image and the target image to be matched is determined, image matching is achieved according to the matching information, and accuracy of image matching is improved.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, before the personal information of the user is obtained or collected, the authorization or the consent of the user is obtained.
Fig. 1 schematically shows an exemplary system architecture to which the image matching method, the three-dimensional image reconstruction method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the image matching method, the three-dimensional image reconstruction method, and the apparatus may be applied may include a terminal device, but the terminal device may implement the image matching method, the three-dimensional image reconstruction method, and the apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105. The network 104 may include various connection types. E.g., at least one of wired and wireless communication links, etc. The terminal device may comprise at least one of the first terminal device 101, the second terminal device 102 and the third terminal device 103.
The user may interact with the server 105 via the network 104 using at least one of the first terminal device 101, the second terminal device 102 and the third terminal device 103 to receive or send messages and the like. At least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 may be installed with various communication client applications. For example, at least one of a knowledge-reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing. For example, the electronic device may include at least one of a smartphone, a tablet, a laptop computer, a desktop computer, and the like.
The server 105 may be a server that provides various services. For example, the server 105 may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the defects of high management difficulty and weak service extensibility in conventional physical hosts and VPS (Virtual Private Server) services.
It should be noted that the image matching method and the three-dimensional image reconstruction method provided by the embodiments of the present disclosure may be generally executed by one of the first terminal device 101, the second terminal device 102, and the third terminal device 103. Correspondingly, the image matching device and the three-dimensional image reconstruction device provided by the embodiments of the present disclosure may also be disposed in one of the first terminal device 101, the second terminal device 102, and the third terminal device 103.
Alternatively, the image matching method and the three-dimensional image reconstruction method provided by the embodiments of the present disclosure may also be generally executed by the server 105. Accordingly, the image matching apparatus and the three-dimensional image reconstruction apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The image matching method and the three-dimensional image reconstruction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Accordingly, the image matching apparatus and the three-dimensional image reconstruction apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105.
It should be understood that the number of first terminal devices, second terminal devices, third terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of first terminal devices, second terminal devices, third terminal devices, networks and servers, as desired for the implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of an image matching method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S240.
In operation S210, the image to be matched of the target object is rotated to obtain a plurality of candidate images to be matched.
In operation S220, a target image to be matched is determined from a plurality of candidate images to be matched according to the first feature information of the target image and the second feature information of the plurality of candidate images to be matched.
In operation S230, matching information between the target image and the target image to be matched is determined according to the third feature information of the target image and the fourth feature information of the target image to be matched.
In operation S240, the target image and the target image to be matched are matched according to the matching information.
According to an embodiment of the present disclosure, the texture of the target object may have at least one of weak texture and repetitiveness. The target image may be an image for a target object. The third feature information may include first feature point information and first feature point descriptor information of the first feature point. The fourth feature information may include second feature point information and second feature point descriptor information of the second feature point.
According to an embodiment of the present disclosure, the target object may refer to an object of interest in an image matching process. The target object may include at least one of a static target object and a dynamic target object. For example, the static target object may include at least one of: static characters, static objects, static scenery, etc. Alternatively, the dynamic target object may comprise at least one of: dynamic characters, dynamic objects, dynamic scenes and the like.
According to an embodiment of the present disclosure, the static target object may include at least one of: a fixed-pose static target object and a pose-variable static target object. A fixed-pose static target object may refer to a target object whose relative position remains constant and whose pose is fixed with the earth as the reference frame. For example, a fixed-pose static target object may include at least one of: houses, buildings, street views, and the like. A pose-variable static target object may refer to a target object whose relative position and pose may change with the earth as the reference frame. The change in relative position may include rotation, flipping, tilting, and the like. For example, a pose-variable static target object may include at least one of: mechanical equipment, office supplies, and the like.
According to an embodiment of the present disclosure, the target image may refer to an image including a target object. The image to be matched may refer to an image that needs to be matched. The image to be matched may comprise at least one of: the image matching method comprises an initial image to be matched and a preprocessed initial image to be matched.
According to an embodiment of the present disclosure, the target object may have a texture. Texture may refer to a semi-periodic or regularly arranged pattern of small shapes that is present within a certain range of an image. The texture of an image may refer to image features that are quantized in image computation. Texture may describe the spatial color distribution and light intensity distribution of the image or of small regions therein. Through the gray distribution of pixels and their surrounding spatial neighborhoods, texture can represent parameters such as the shape, size, distribution, and direction of a target object in an image, and reflect the visual characteristics of homogeneous phenomena in the image. Texture may embody the slowly varying or periodically varying structural arrangement of the surface of the target object in the image. Texture can be expressed by the degree of texture richness, the texture distribution, the texture repetition, and the like.
According to an embodiment of the present disclosure, the target image and the image to be matched may have texture features. The texture features may include at least one of: statistical texture features, model-type texture features, signal-processing texture features, and structural texture features. For example, statistical texture features may be determined by calculating the spatial correlation of image gray levels based on a Gray-Level Co-occurrence Matrix (GLCM). Alternatively, model-type texture features may be determined based on random field methods and fractal methods. Alternatively, signal-processing texture features may be determined by converting the texture to a transform domain using a linear transform, filter, or filter bank and applying an energy criterion. Alternatively, structural texture features may be determined by modeling the texture to be detected and searching for repetitive patterns in the image.
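As an illustration of the statistical route above, the following is a minimal sketch of GLCM-based texture statistics, assuming scikit-image; the function name and the choice of properties are illustrative and not taken from the disclosure.

```python
# A minimal sketch, assuming scikit-image, of statistical texture
# features computed from a gray-level co-occurrence matrix (GLCM).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(gray: np.ndarray) -> dict:
    """Compute simple GLCM statistics for an 8-bit grayscale image."""
    # Co-occurrence of gray levels at distance 1 in four directions.
    glcm = graycomatrix(
        gray,
        distances=[1],
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
        levels=256,
        symmetric=True,
        normed=True,
    )
    # Average each property over the four directions.
    return {
        prop: float(graycoprops(glcm, prop).mean())
        for prop in ("contrast", "homogeneity", "energy", "correlation")
    }
```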
According to an embodiment of the present disclosure, weak texture may mean that the image of the target object lacks significant texture features, such as corners and boundaries, within a certain neighborhood of a given point. A region in which the target object has weak texture may be referred to as a weak texture region. Weak texture regions may have symmetry and isotropy. Repetitiveness may mean that images of the target object have the same or similar texture features. A region in which the target object has repetitiveness may be referred to as a repetitive texture region.
According to the embodiment of the disclosure, the target image and the image to be matched can be obtained by shooting the same target object on different multi-turn tracks. The method for obtaining the target image and the image to be matched can be set according to actual business requirements, and is not limited herein. For example, the first image capturing device may be disposed on a spiral first track, and the second image capturing device may be disposed on a spiral second track different from the first track, in which case, the target image may be obtained by shooting through the first image capturing device, and the image to be matched may be obtained by shooting through the second image capturing device.
According to the embodiment of the disclosure, the target image and the image to be matched can also be determined from video files shot on different multi-turn tracks. For example, the target image may be determined from a key frame in a first video sequence captured by a first video capture device on a first track, and the image to be matched may be determined from a random frame in a second video sequence captured by a second video capture device on a second track. The first video sequence and the second video sequence may comprise all or part of a segment of a pre-fetched video file. The video file may be captured by the executing entity using a video capture unit, or by a video capture device communicatively connected to the executing entity, or may be stored in a storage unit of the executing entity or of another electronic apparatus communicatively connected to it.
According to the embodiment of the disclosure, after the image to be matched of the target object is obtained, the image to be matched of the target object may be rotated to obtain a plurality of candidate images to be matched. For example, the images to be matched of the target object may be rotated by 90 degrees, 180 degrees and 270 degrees, respectively, to obtain a plurality of candidate images to be matched at different angles. After the target image and the candidate images to be matched are obtained, feature extraction can be performed on the target image to obtain first feature information of the target image. And respectively extracting the features of the candidate images to be matched to obtain second feature information of the candidate images to be matched.
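As a concrete illustration of this rotation step, the following is a minimal sketch assuming OpenCV; the function name is illustrative, and whether the unrotated image is also kept as a candidate is an assumption.

```python
# A minimal sketch, assuming OpenCV, of generating candidate images to
# be matched by rotating the image to be matched by 90/180/270 degrees.
import cv2
import numpy as np

def rotation_candidates(image_to_match: np.ndarray) -> list[np.ndarray]:
    """Return 90/180/270-degree rotations as candidate images to be matched."""
    return [
        cv2.rotate(image_to_match, cv2.ROTATE_90_CLOCKWISE),
        cv2.rotate(image_to_match, cv2.ROTATE_180),
        cv2.rotate(image_to_match, cv2.ROTATE_90_COUNTERCLOCKWISE),
    ]
```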
According to the embodiment of the present disclosure, after the first feature information of the target image and the second feature information of the plurality of candidate images to be matched are obtained, the similarity between the target object and the plurality of candidate images to be matched may be determined based on the first feature information of the target image and the second feature information of the plurality of candidate images to be matched. And determining a target image to be matched from the candidate images to be matched according to the similarity between the target object and the candidate images to be matched.
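One plausible realization of this selection step is to compare global feature vectors with cosine similarity and keep the best candidate; the disclosure does not fix a similarity measure, so the sketch below is an assumption.

```python
# A minimal sketch of selecting the target image to be matched: pick the
# candidate whose second feature information is most similar to the
# target image's first feature information (cosine similarity assumed).
import numpy as np

def select_best_candidate(first_feat: np.ndarray,
                          second_feats: list[np.ndarray]) -> int:
    """Return the index of the candidate with the highest cosine similarity."""
    sims = [
        float(np.dot(first_feat, f)
              / (np.linalg.norm(first_feat) * np.linalg.norm(f) + 1e-12))
        for f in second_feats
    ]
    return int(np.argmax(sims))
```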
According to the embodiment of the disclosure, after the target image to be matched is obtained, feature extraction can be performed on the target image to obtain the third feature information of the target image, and feature extraction can be performed on the target image to be matched to obtain the fourth feature information of the target image to be matched. The third feature information may include first feature point information and first feature point descriptor information of the first feature point. The fourth feature information may include second feature point information and second feature point descriptor information of the second feature point. A feature point may refer to at least one of a pixel and a pixel block reflecting an invariant feature of the target object. Feature points tend to appear at corners or where the texture changes dramatically in the image, and may include corner points, blobs, end points, and the like. Feature point information may refer to information describing a feature point and may include at least one of: location information, scale information, and direction information. Feature point descriptor information may serve as the basis for determining whether two feature points form a matching feature point pair. The first feature point and the second feature point have the same meaning as the feature point, where the first feature point is a feature point of the target image and the second feature point is a feature point of the target image to be matched. Similarly, the first feature point information and the second feature point information have the same meaning as the feature point information, where the first feature point information is feature point information of the target image and the second feature point information is feature point information of the target image to be matched. The first feature point descriptor information and the second feature point descriptor information have the same meaning as the feature point descriptor information, where the first feature point descriptor information is feature point descriptor information of the target image and the second feature point descriptor information is feature point descriptor information of the target image to be matched.
According to the embodiment of the present disclosure, the manner of obtaining the first feature information may be the same as or different from the manner of obtaining the second feature information, and is not limited herein, as long as the feature information can be extracted. For example, the feature information extraction method may include at least one of: a traditional feature information extraction method and a feature information extraction method based on deep learning. The traditional feature information extraction method may include at least one of: SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), and the like. The feature information extraction method based on deep learning may include at least one of: a D2-Net-based feature information extraction method, an R2D2-based feature information extraction method, and the like.
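As an illustration of the traditional route, the following is a minimal sketch of SIFT extraction, assuming OpenCV; it is one of the listed options, not the disclosure's mandated method.

```python
# A minimal sketch, assuming OpenCV, of extracting feature points
# (feature point information) and descriptors with SIFT.
import cv2
import numpy as np

def extract_sift(gray: np.ndarray):
    """Return SIFT keypoints and descriptors for a grayscale image.

    `descriptors` may be None if no keypoints are found.
    """
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
```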
According to the embodiment of the disclosure, after the third feature information of the target image and the fourth feature information of the target image to be matched are obtained, the matching information between the target image and the target image to be matched can be determined according to the third feature information of the target image and the fourth feature information of the target image to be matched based on a feature point matching method. The feature point matching method may include at least one of: a traditional feature point matching method and a feature point matching method based on deep learning.
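One concrete instance of the traditional feature point matching named above is brute-force matching with Lowe's ratio test; the sketch below assumes OpenCV and float descriptors such as SIFT.

```python
# A minimal sketch, assuming OpenCV, of descriptor matching with the
# ratio test; the returned pairs are one form of matching information.
import cv2

def match_descriptors(desc_a, desc_b, ratio: float = 0.75):
    """Return index pairs (query_idx, train_idx) that pass the ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)
    good = []
    for pair in candidates:
        # Keep a match only if it is clearly better than the runner-up.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append((pair[0].queryIdx, pair[0].trainIdx))
    return good
```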
According to the embodiment of the disclosure, the target image to be matched is determined from the candidate images to be matched according to the first characteristic information of the target image and the second characteristic information of the candidate images to be matched, and the candidate images to be matched are obtained by rotating the images to be matched of the target object, so that the influence of weak texture and repetitive texture can be reduced by using the target image to be matched, and the accuracy of image matching is ensured. In addition, according to the third characteristic information of the target image and the fourth characteristic information of the target image to be matched, the matching information between the target image and the target image to be matched is determined, image matching is achieved according to the matching information, and accuracy of image matching is improved.
The image matching method according to the embodiment of the disclosure is further described with reference to fig. 3, fig. 4A, fig. 4B, fig. 5A, fig. 5B, and fig. 5C in conjunction with specific embodiments.
According to an embodiment of the present disclosure, the image matching method 200 may further include the following operations.
And processing the target image and the candidate images to be matched by using the characterization model to obtain first characteristic information of the target image and second characteristic information of the candidate images to be matched.
According to an embodiment of the present disclosure, the characterization model may be obtained by training a self-supervised model using sample feature information of a positive sample image and sample feature information of a plurality of negative sample images corresponding to the positive sample image. The plurality of negative sample images may be determined from a plurality of candidate negative sample images corresponding to the positive sample image.
According to embodiments of the present disclosure, the self-supervised model may comprise at least one of: CPC (Contrastive Predictive Coding), AMDIM (Augmented Multiscale Deep InfoMax), MoCo (Momentum Contrast), SimCLR (Simple Framework for Contrastive Learning of Visual Representations), BYOL (Bootstrap Your Own Latent), and the like.
According to an embodiment of the present disclosure, the self-supervised model may include a first encoder and a second encoder. Multiple rounds of training may be performed on the first encoder and the second encoder until a predetermined condition is satisfied. The trained second encoder is determined as the characterization model.
According to an embodiment of the present disclosure, in contrast learning, a child sample image obtained by data enhancement of a parent sample image may be regarded as a positive sample image for the parent sample image. The parent sample image may refer to a sample image as a target of data enhancement processing. In the disclosed embodiment, the positive sample image may include a parent sample image and a positive sample image obtained by performing data enhancement on the parent sample image. A negative sample image may refer to other sample images of a different category than the parent sample image.
According to an embodiment of the present disclosure, a momentum queue may refer to a queue having a certain length. The queue elements in the momentum queue are pieces of sample feature information; that is, the momentum queue may include multiple pieces of sample feature information, each corresponding to a negative sample image. The sample feature information in the momentum queue can be dynamically updated, i.e., each training round has a momentum queue corresponding to that round. The momentum queue for the current round is obtained by adding the sample feature information of the parent sample image from the previous round to the previous round's momentum queue and removing one piece of sample feature information from that queue in chronological order, so that the number of entries in the momentum queue stays unchanged.
According to an embodiment of the present disclosure, performing multiple rounds of training on the first encoder and the second encoder may include: processing the parent sample image corresponding to the current round with the first encoder corresponding to the current round to obtain the sample feature information of the parent sample image for the current round; and processing the positive sample image corresponding to the current round with the second encoder corresponding to the current round to obtain the sample feature information of the positive sample image for the current round. The positive sample image is obtained by performing data enhancement on the parent sample image. Based on the target loss function, the first encoder and the second encoder corresponding to the current round are trained using the sample feature information of the parent sample image, the sample feature information of the positive sample image, and the sample feature information of the multiple negative sample images for the current round. The sample feature information of the negative sample images for the current round is obtained from the momentum queue corresponding to the current round and the sample feature information of the parent sample image, based on the sample selection policy corresponding to the current round. The momentum queue includes sample feature information of the candidate negative sample images obtained by processing the candidate negative sample images with the second encoder. The target loss function may include at least one of: the InfoNCE (Info Noise-Contrastive Estimation) loss, the NCE (Noise-Contrastive Estimation) loss, and the like.
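The training round described above closely mirrors MoCo-style contrastive learning. The following is a minimal PyTorch sketch of one such round; shapes, names, and hyperparameters are illustrative assumptions, and the disclosure's negative sample selection policy is omitted (all queue entries are used as negatives here).

```python
# A minimal sketch, assuming PyTorch, of one MoCo-style round: a momentum
# queue of negatives, an InfoNCE loss, and a momentum-updated key encoder.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, momentum: float = 0.999):
    # EMA update of the second (key) encoder from the first (query) encoder.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(momentum).add_(p_q.data, alpha=1.0 - momentum)

def contrastive_round(encoder_q, encoder_k, queue, parent, positive,
                      temperature: float = 0.07):
    """One round: parent through the first encoder, its augmented positive
    through the second encoder, queue entries (K, D) as negatives."""
    q = F.normalize(encoder_q(parent), dim=1)            # (B, D)
    with torch.no_grad():
        momentum_update(encoder_q, encoder_k)
        k = F.normalize(encoder_k(positive), dim=1)      # (B, D)

    l_pos = (q * k).sum(dim=1, keepdim=True)             # (B, 1) positive logits
    l_neg = q @ queue.t()                                # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    loss = F.cross_entropy(logits, labels)               # InfoNCE

    # FIFO update: drop the oldest entries, enqueue the new features.
    # (MoCo enqueues key features; the disclosure describes enqueueing the
    # previous round's parent features, which would be analogous.)
    new_queue = torch.cat([queue[k.size(0):], k.detach()], dim=0)
    return loss, new_queue
```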
According to the embodiment of the disclosure, by determining the negative sample images from the plurality of candidate negative sample images, negative samples in the momentum queue that differ too little from the positive sample are prevented from participating in the training of the model, thereby reducing the probability of the self-supervised model overfitting during the training stage. In addition, the first feature information of the target image is obtained by processing the target image with the characterization model, and the second feature information of the candidate images to be matched is obtained by processing the candidate images to be matched with the characterization model, which ensures the comprehensiveness of the feature information.
According to an embodiment of the present disclosure, the image matching method may further include the following operations.
In the case that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is determined to be greater than or equal to a predetermined threshold, at least one of the original image and the original image to be matched is subjected to cropping processing to obtain the target image and the image to be matched.
According to an embodiment of the present disclosure, a size difference between a target object in a target image and a target object in an image to be matched may be smaller than a predetermined threshold.
According to the embodiment of the disclosure, in response to detecting the image matching request, an original image and an original image to be matched corresponding to the target object may be acquired. After obtaining the original image and the original image to be matched, an absolute value of a size difference between a target object in the original image and a target object in the original image to be matched may be determined. For example, a first size of the target object in the original image may be determined from the original image, and a second size of the target object in the original image to be matched may be determined from the original image to be matched. Based on the first size and the second size, a difference between the first size and the second size is determined. An absolute value of the difference is determined based on the difference between the first dimension and the second dimension.
According to the embodiment of the disclosure, after the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is determined, whether the cropping processing is required or not can be determined according to the relation between the absolute value of the difference and the predetermined threshold. The predetermined threshold may be set according to actual traffic demands, and is not limited herein. For example, the predetermined threshold may be set to 2.
According to the embodiment of the disclosure, in the case that it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is greater than or equal to the predetermined threshold, at least one of the original image and the original image to be matched may be subjected to cropping processing, so as to obtain the target image and the image to be matched. In the case that it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is smaller than the predetermined threshold, the original image may be directly determined as the target image, and the original image to be matched may be determined as the image to be matched.
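A minimal sketch of this decision follows, assuming bounding-box area as the size measure; the disclosure fixes neither the measure nor the units of the threshold, so both are assumptions.

```python
# A minimal sketch of the size gate: crop only when the absolute size
# difference between the two detected objects reaches the threshold.
def needs_cropping(box_original, box_to_match, threshold: float = 2.0) -> bool:
    """box = (x_min, y_min, x_max, y_max); True if cropping is required."""
    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])
    return abs(area(box_original) - area(box_to_match)) >= threshold
```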
According to the embodiment of the disclosure, performing cropping processing on at least one of the original image and the original image to be matched to obtain the target image and the image to be matched may include the following operations.
In the case that the original image is determined to be cropped, target detection is performed on the original image to obtain first detection information. The first detection information may include first position information of a first detection frame corresponding to the target object. An image region corresponding to the target object is extracted from the original image according to the first position information to obtain the target image.
According to the embodiment of the disclosure, in the case that it is determined that the original image is to be cropped, target detection may be performed on the original image to obtain the first detection information. The target detection method may include at least one of: a traditional target detection method and a deep-learning-based target detection method. The traditional target detection method may include at least one of: the AdaBoost algorithm framework, Histogram of Oriented Gradients (HOG), and Support Vector Machine (SVM). The deep-learning-based target detection method may include at least one of: one-stage target detection and two-stage target detection.
According to the embodiment of the disclosure, the structure of a one-stage target detector may comprise functional modules such as a feature extraction layer, an additional layer, a transition layer, a feature pyramid and information fusion layer, and a detection classification prediction layer. These functional modules can be used to enhance the characterization capability of the network, fuse the feature information, and complete the positioning and classification of the target using the feature information. The structure of a two-stage target detector may comprise a feature extraction layer, a region proposal network layer for target region screening and fine-tuning, and a prediction network layer for precise target positioning and classification. Two-stage target detectors may include Faster R-CNN, R-FCN (Region-based Fully Convolutional Network), and the like.
According to the embodiment of the disclosure, the target detection can be performed on the original image based on the target detection method, so as to obtain the first global feature information. And obtaining at least one candidate first detection frame according to the first global feature information. Each of the at least one candidate first detection box may be used to characterize a candidate first image region. According to the at least one candidate first detection frame, a first detection frame corresponding to the target object is determined. After the first detection frame is determined, a first coordinate system may be established with a predetermined position of the original image as an origin and a predetermined numerical value as a unit. And determining the coordinate value of the vertex of the first detection frame according to the distance between the first detection frame and the coordinate axis so as to obtain the first position information of the first detection frame.
According to an embodiment of the present disclosure, after the first position information is obtained, a first image region of the original image corresponding to the target object may be determined according to the first position information. For example, a first image region corresponding to the target object may be cut out from the original image according to the first position information to obtain the target image.
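A minimal sketch of this extraction step follows, assuming the first position information reduces to the corner coordinates (x_min, y_min, x_max, y_max) of the detection frame; the coordinate convention is an assumption.

```python
# A minimal sketch of cutting the image region inside the detection
# frame out of the original image to obtain the target image.
import numpy as np

def crop_by_detection(original: np.ndarray, box) -> np.ndarray:
    """box = (x_min, y_min, x_max, y_max) in pixel coordinates."""
    x_min, y_min, x_max, y_max = (int(v) for v in box)
    return original[y_min:y_max, x_min:x_max].copy()
```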
According to the embodiment of the disclosure, at least one of the original image and the original image to be matched is subjected to cropping processing to obtain the target image and the image to be matched, and the method may further include the following operation.
And under the condition that the original image to be matched is determined to be cut, performing target detection on the original image to be matched to obtain second detection information. The second detection information may include second position information of a second detection frame corresponding to the target object. And according to the second position information, extracting an image area corresponding to the target object from the original image to be matched to obtain the image to be matched.
According to the embodiment of the disclosure, under the condition that the original image to be matched is determined to be cut, the target detection can be performed on the original image to be matched to obtain the second detection information. The target detection method may include at least one of: a conventional target detection method and a deep learning-based target detection method.
According to the embodiment of the disclosure, the original image to be matched can be subjected to target detection based on a target detection method, so that second global feature information is obtained. And obtaining at least one candidate second detection frame according to the second global feature information. Each of the at least one candidate second detection box may be used to characterize a candidate second image region. And determining a second detection frame corresponding to the target object according to the at least one candidate second detection frame. After the second detection frame is determined, a second coordinate system may be established with a predetermined position of the original image to be matched as an origin and a predetermined numerical value as a unit. And determining the coordinate value of the vertex of the second detection frame according to the distance between the second detection frame and the coordinate axis so as to obtain second position information of the second detection frame.
According to the embodiment of the disclosure, after the second position information is obtained, a second image area of the original image to be matched corresponding to the target object may be determined according to the second position information. For example, the second image region corresponding to the target object may be cut out from the original image to be matched according to the second position information to obtain the image to be matched.
According to the embodiment of the disclosure, in the case that it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is greater than or equal to the predetermined threshold, by performing the cropping processing on at least one of the original image and the original image to be matched, the target object in the image can be accurately positioned. In addition, because the size difference between the target object in the target image and the target object in the image to be matched is smaller than the preset threshold value, the resolution of the target object can be improved, and the image matching effect and the three-dimensional image reconstruction effect are further improved.
According to an embodiment of the present disclosure, the image matching method may further include the following operations.
Under the condition that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is smaller than a preset threshold value, the original image is determined as a target image and the original image to be matched is determined as an image to be matched.
According to the embodiment of the present disclosure, if it is determined that the absolute value of the size difference between the target object of the original image and the target object of the original image to be matched is smaller than the predetermined threshold, it may be said that the original image and the original image to be matched do not need to be cropped, and thus, the original image may be determined as the target image. And determining the original image to be matched as the image to be matched.
Fig. 3 schematically illustrates an example schematic diagram of performing a cropping process on at least one of an original image and an original image to be matched to obtain a target image and an image to be matched according to an embodiment of the present disclosure.
As shown in fig. 3, an absolute value 303 of a size difference between a target object in an original image 301 and a target object in an original image to be matched 302 may be determined from the original image 301 and the original image to be matched 302.
In the case where it is determined that the absolute value 303 of the size difference between the target object in the original image 301 and the target object in the original image to be matched 302 is greater than or equal to the predetermined threshold and the original image 301 is to be cropped, target detection may be performed on the original image 301, resulting in the first detection information 304. The first detection information 304 may include first position information 304_1 of a first detection box corresponding to the target object. An image region corresponding to the target object is extracted from the original image 301 based on the first position information 304_1, resulting in a target image 305.
In the case that it is determined that the absolute value 303 of the size difference between the target object in the original image 301 and the target object in the original image to be matched 302 is greater than or equal to the predetermined threshold and the original image to be matched 302 is to be cropped, the original image to be matched 302 may be subjected to target detection to obtain second detection information 306. The second detection information 306 may include second position information 306_1 of a second detection box corresponding to the target object. According to the second position information 306_1, an image area corresponding to the target object is extracted from the original image to be matched 302, and an image to be matched 307 is obtained.
According to an embodiment of the present disclosure, the image matching method may further include the following operations.
And respectively extracting the features of the target image and the target image to be matched to obtain a first feature vector of the target image and a second feature vector of the target image to be matched. And obtaining third feature information of the target image according to the first feature vector. And obtaining fourth feature information of the target image to be matched according to the second feature vector of the target image to be matched.
According to an embodiment of the present disclosure, the third feature information of the target image may be obtained from the first feature vector of the target image. The first feature vector of the target image may be obtained by feature extraction of the target image. For example, feature extraction may be performed on the target image to obtain a first feature vector of the target image. And decoding the first characteristic vector of the target image to obtain third characteristic information of the target image.
According to the embodiment of the disclosure, the fourth feature information of the target image to be matched may be obtained according to the second feature vector of the target image to be matched. The second feature vector of the target image to be matched may be obtained by performing feature extraction on the target image to be matched. For example, feature extraction may be performed on the target image to be matched to obtain a second feature vector of the target image to be matched. And decoding the second characteristic vector of the target image to be matched to obtain fourth characteristic information of the target image to be matched.
According to an embodiment of the present disclosure, the first feature vector of the target image may be obtained by processing the target image using the feature information extraction model. The second feature vector of the target image to be matched may be obtained by processing the target image to be matched using the feature information extraction model. The feature information extraction model may be obtained by training the first deep learning model using the sample image pair and the real sample feature information of the sample image pair. The sample image pair may include a first sample image and a second sample image. The second sample image may be a homography transform of the first sample image. The real sample feature information of the sample image pair may include first real sample feature information and second real sample feature information. The first real sample feature information may be the real sample feature information of the first sample image. The second real sample feature information may be the real sample feature information of the second sample image. The first real sample feature information may include real feature point information and real feature point descriptor information of the real feature points of the first sample image. The second real sample feature information may include real feature point information and real feature point descriptor information of the real feature points of the second sample image.
According to an embodiment of the present disclosure, the first deep learning model may include a first deep learning submodel and a second deep learning submodel. The first deep learning submodel may include a first encoder and a first decoder. The first decoder may include a second decoder and a third decoder. The second deep learning submodel may include a second encoder and a fourth decoder. The fourth decoder may include a fifth decoder and a sixth decoder. During training of the first deep learning model, the values of the model parameters of the first deep learning submodel and the second deep learning submodel are kept consistent. The feature information extraction model may be one of the trained first deep learning submodel and the trained second deep learning submodel. For example, the feature information extraction model may include the trained first encoder and the trained first decoder. Alternatively, the feature information extraction model may include the trained second encoder and the trained fourth decoder.
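As a minimal sketch, the encoder/two-decoder structure described above can be written in PyTorch as follows. The layer sizes, activation choices, and descriptor dimension are illustrative assumptions; the disclosure does not specify a concrete architecture.

```python
import torch
import torch.nn as nn

class FeatureInfoExtractor(nn.Module):
    """First encoder plus second decoder (points) and third decoder (descriptors)."""
    def __init__(self, desc_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(                 # first encoder
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.point_head = nn.Conv2d(128, 1, 1)        # second decoder
        self.desc_head = nn.Conv2d(128, desc_dim, 1)  # third decoder

    def forward(self, image):
        feat = self.encoder(image)                     # feature vector
        points = torch.sigmoid(self.point_head(feat))  # feature point heatmap
        desc = nn.functional.normalize(self.desc_head(feat), dim=1)
        return points, desc

# Keeping the two submodels' parameters consistent can be realized by simply
# running both sample images through the same module instance (a Siamese setup).
```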
According to an embodiment of the present disclosure, obtaining the feature information extraction model by training the first deep learning model using the sample image pair and the real sample feature information of the sample image pair may include: adjusting a model parameter of the first deep learning model according to a loss function value to obtain the feature information extraction model. The loss function value may be determined based on a first loss function value, a second loss function value, and a third loss function value. The first loss function value may be obtained, based on the first loss function, from the real feature point information of the real feature point of the first sample image and the predicted feature point information of the predicted feature point of the first sample image. The second loss function value may be obtained, based on the first loss function, from the real feature point information of the real feature point of the second sample image and the predicted feature point information of the predicted feature point of the second sample image. The third loss function value may be obtained, based on the second loss function, from the real feature point descriptor information of the real feature point of the first sample image, the predicted feature point descriptor information of the real feature point of the first sample image, the real feature point descriptor information of the real feature point of the second sample image, and the predicted feature point descriptor information of the real feature point of the second sample image.
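The three-term loss can be sketched as follows. Binary cross-entropy for the two feature point terms and a contrastive (hinge) term for the descriptor term are assumed here as plausible instantiations of the first and second loss functions, which the passage above does not name.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_pts_a, true_pts_a, pred_pts_b, true_pts_b,
               desc_a, desc_b, match_mask, margin=1.0):
    # First loss: predicted vs. real feature points of the first sample image.
    l1 = F.binary_cross_entropy(pred_pts_a, true_pts_a)
    # Second loss: predicted vs. real feature points of the second sample image.
    l2 = F.binary_cross_entropy(pred_pts_b, true_pts_b)
    # Third loss: descriptor agreement across the homography-related pair;
    # match_mask[i, j] = 1 where descriptors i and j correspond.
    dist = torch.cdist(desc_a, desc_b)        # (N, M) pairwise distances
    l3 = (match_mask * dist**2 +
          (1 - match_mask) * torch.clamp(margin - dist, min=0)**2).mean()
    return l1 + l2 + l3
```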
According to an embodiment of the present disclosure, the predicted feature point information of the predicted feature point of the first sample image may be obtained by processing the predicted feature vector of the first sample image with the first decoder. The predicted feature point descriptor information of the predicted feature point of the first sample image may be obtained by processing the predicted feature vector of the first sample image with the second decoder. The predicted feature vector of the first sample image may be obtained by processing the first sample image using the first encoder.
According to an embodiment of the present disclosure, the predicted feature point information of the predicted feature point of the second sample image may be obtained by processing the predicted feature vector of the second sample image with the third decoder. The predicted feature point descriptor information of the predicted feature point of the second sample image may be obtained by processing the predicted feature vector of the second sample image with the fourth decoder. The predicted feature vector of the second sample image may be obtained by processing the second sample image using the second encoder.
According to an embodiment of the present disclosure, the first real sample feature information may be obtained by processing the first sample image using a feature point extraction model. The second real sample feature information may be obtained by processing the second sample image using the feature point extraction model. The feature point extraction model may be obtained by training a second deep learning model using synthetic data. The synthetic data may include at least one of corner points, end points, and the like.
According to the embodiment of the disclosure, obtaining the third feature information of the target image according to the first feature vector may include the following operations.
And performing first decoding on the first feature vector to obtain first feature point information of a first feature point of the target image. And performing second decoding on the first feature vector to obtain first feature point descriptor information of the first feature point of the target image.
According to an embodiment of the present disclosure, a first feature vector of a target image may be processed by a trained second decoder of a feature information extraction model to obtain first feature point information of a first feature point of the target image. A trained third decoder of the feature information extraction model may be utilized to process the first feature vector of the target image to obtain first feature point descriptor information for the first feature point of the target image.
According to an embodiment of the present disclosure, fourth feature information of the target image to be matched may be obtained according to the second feature vector. For example, third decoding may be performed on the second feature vector to obtain second feature point information of a second feature point of the target image to be matched. And fourth decoding may be performed on the second feature vector to obtain second feature point descriptor information of the second feature point of the target image to be matched.
According to the embodiment of the disclosure, the trained second decoder of the feature information extraction model may be utilized to process the second feature vector of the target image to be matched, so as to obtain second feature point information of the second feature point of the target image to be matched. A trained third decoder of the feature information extraction model may be utilized to process a second feature vector of the target image to be matched to obtain second feature point descriptor information of a second feature point of the target image to be matched.
According to an embodiment of the present disclosure, the first feature vector can represent the features of the target image, and the second feature vector can represent the features of the target image to be matched. Obtaining the third feature information of the target image according to the first feature vector, and the fourth feature information of the target image to be matched according to the second feature vector, therefore ensures the correctness of the third feature information and the fourth feature information, which is beneficial to improving the effect of image matching.
Fig. 4A schematically illustrates an example schematic diagram of a method of determining third feature information of a target image and fourth feature information of a target image to be matched according to an embodiment of the present disclosure.
As shown in fig. 4A, a target image 402 and an image to be matched 403 of a target object 401 may be determined. And rotating the image 403 to be matched of the target object 401 to obtain a plurality of candidate images to be matched. The target image to be matched 404 is determined from a plurality of candidate images to be matched. Feature extraction is performed on the target image 402 to obtain a first feature vector 405 of the target image 402. And extracting the features of the target image to be matched 404 to obtain a second feature vector 406 of the target image to be matched 404.
After obtaining the first feature vector 405 of the target image 402, third feature information 407 of the target image 402 may be obtained according to the first feature vector 405. For example, the first feature vector 405 may be subjected to first decoding to obtain first feature point information 4071 of the first feature point of the target image 402, and to second decoding to obtain first feature point descriptor information 4072 of the first feature point of the target image 402.
After obtaining the second feature vector 406 of the target image to be matched 404, fourth feature information 408 of the target image to be matched 404 may be obtained according to the second feature vector 406. For example, the second feature vector 406 may be subjected to third decoding to obtain second feature point information 4081 of the second feature point of the target image to be matched 404, and to fourth decoding to obtain second feature point descriptor information 4082 of the second feature point of the target image to be matched 404.
According to an embodiment of the present disclosure, the image matching method may further include the following operations.
And respectively extracting feature point information of the target image and the target image to be matched to obtain first feature point information of the target image and second feature point information of the target image to be matched. And obtaining first feature point descriptor information of the target image according to the first feature point information of the target image. And obtaining second feature point descriptor information of the target image to be matched according to the second feature point information of the target image to be matched.
According to an embodiment of the present disclosure, the target image can be processed by using a traditional feature information extraction method to obtain the third feature information of the target image. The target image to be matched can be processed by using the traditional feature information extraction method to obtain the fourth feature information of the target image to be matched.
According to an embodiment of the present disclosure, feature point information extraction may be performed on the target image to obtain first feature point information of the first feature point of the target image. The first feature point information of the first feature point may include at least one of first position information and first scale information of the first feature point. First direction information of the first feature point may be determined according to the first feature point information of the first feature point. First feature point descriptor information of the first feature point of the target image may be obtained according to the first direction information of the first feature point. Third feature information of the target image may be obtained according to the first feature point information and the first feature point descriptor information of the first feature point of the target image.
According to an embodiment of the present disclosure, feature point information extraction may be performed on the target image to be matched to obtain second feature point information of the second feature point of the target image to be matched. The second feature point information of the second feature point may include at least one of second position information and second scale information of the second feature point. Second direction information of the second feature point may be determined according to the second feature point information of the second feature point. Second feature point descriptor information of the second feature point of the target image to be matched may be obtained according to the second direction information of the second feature point. Fourth feature information of the target image to be matched may be obtained according to the second feature point information and the second feature point descriptor information of the second feature point of the target image to be matched.
According to an embodiment of the present disclosure, the first feature point information and the first feature point descriptor information of the target image can represent the features of the target image, and the second feature point information and the second feature point descriptor information of the target image to be matched can represent the features of the target image to be matched. Therefore, the first feature point information and the first feature point descriptor information can ensure the correctness of the third feature information, and the second feature point information and the second feature point descriptor information can ensure the correctness of the fourth feature information, which is beneficial to improving the image matching effect.
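For the classical route, OpenCV's SIFT is one concrete example of a pipeline that yields position, scale, and orientation information together with descriptors. It is used below purely as an illustration (the file names are placeholders); the disclosure does not name a specific classical method.

```python
import cv2

img_target = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)
img_to_match = cv2.imread("target_to_match.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# Keypoints carry position, scale, and orientation; descriptors are 128-d.
kp1, desc1 = sift.detectAndCompute(img_target, None)
kp2, desc2 = sift.detectAndCompute(img_to_match, None)
```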
Fig. 4B schematically illustrates an example schematic diagram of a method of determining third feature information of a target image and fourth feature information of a target image to be matched according to another embodiment of the present disclosure.
As shown in fig. 4B, a target image 410 of a target object 409 and an image to be matched 411 may be determined. And rotating the image 411 to be matched of the target object 409 to obtain a plurality of candidate images to be matched. The target image to be matched 412 is obtained from a plurality of candidate images to be matched. Feature point information extraction is performed on the target image 410 to obtain first feature point information 413 of the target image 410. First feature point descriptor information 414 of the target image 410 is obtained from the first feature point information 413 of the target image 410. Third feature information of the target image 410 is obtained from the first feature point information 413 and the first feature point descriptor information 414.
Feature point information extraction may be performed on the target image to be matched 412, so as to obtain second feature point information 415 of the target image to be matched 412. And obtaining second feature point descriptor information 416 of the target image to be matched 412 according to the second feature point information 415 of the target image to be matched 412. And obtaining fourth feature information of the target image to be matched 412 according to the second feature point information 415 and the second feature point descriptor information 416.
According to an embodiment of the present disclosure, operation S220 may include the following operations.
And determining the similarity between the target object and the candidate images to be matched according to the first characteristic information of the target image and the second characteristic information of the candidate images to be matched. And determining the target image to be matched from the candidate images to be matched according to the similarity between the target object and the candidate images to be matched.
According to an embodiment of the present disclosure, the similarity between the target object and the plurality of candidate images to be matched can be determined according to the first feature information of the target image and the second feature information of the plurality of candidate images to be matched based on an image similarity determination method. The image similarity determination method may include at least one of: an image similarity determination method based on a hash value, an image similarity determination method based on template matching, and an image similarity determination method based on Structural Similarity (SSIM).
According to an embodiment of the present disclosure, the image similarity determination method based on a hash value may include at least one of: the average hash algorithm (i.e., aHash), the perceptual hash algorithm (i.e., pHash), and the differential hash algorithm (i.e., dHash). The image similarity determination method based on template matching may include at least one of: squared error matching and correlation matching.
According to an embodiment of the present disclosure, after the similarity between the target object and the candidate images to be matched is obtained, the target image to be matched can be determined from the candidate images to be matched according to the similarity. For example, the plurality of candidate images to be matched may be ranked according to the similarity between the target object and the candidate images, so as to obtain ranking information, and the target image to be matched may be determined from the candidate images according to the ranking information. For example, if the candidate images are ranked in descending order of similarity, the top-ranked candidate image to be matched may be determined as the target image to be matched. Alternatively, if the candidate images are ranked in ascending order of similarity, the last-ranked candidate image to be matched may be determined as the target image to be matched.
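A minimal average-hash (aHash) sketch of this ranking step is shown below. The 8x8 hash size and the Hamming-distance scoring are conventional choices, not values fixed by the disclosure.

```python
import cv2
import numpy as np

def ahash(gray_image, size=8):
    small = cv2.resize(gray_image, (size, size), interpolation=cv2.INTER_AREA)
    return (small > small.mean()).flatten()     # 64-bit boolean hash

def hamming_similarity(h1, h2):
    return 1.0 - np.count_nonzero(h1 != h2) / h1.size

def pick_target_to_match(target_gray, candidates_gray):
    """Rank candidates by hash similarity to the target and keep the best."""
    ht = ahash(target_gray)
    sims = [hamming_similarity(ht, ahash(c)) for c in candidates_gray]
    return candidates_gray[int(np.argmax(sims))]
```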
According to an embodiment of the present disclosure, operation S230 may include the following operations.
Based on a traditional feature point matching method, according to first feature point descriptor information of a first feature point of a target image and second feature point descriptor information of a second feature point of the target image to be matched, matching information between the target image and the target image to be matched is determined.
According to an embodiment of the present disclosure, a conventional feature point matching method may include at least one of: a fast nearest neighbor search method, a random sample consensus method, a randomized K-D tree method, a graph transformation matching method, and the like.
According to an embodiment of the present disclosure, the matching information may be determined based on a conventional feature point matching method according to first feature point descriptor information of a first feature point of the target image and second feature point descriptor information of a second feature point of the target image to be matched. The matching information may include matching feature point pairs between the target image and the target image to be matched. The pair of matching feature points may include a first matching feature point and a second matching feature point. The first matching feature point may be a first feature point of the target image. The second matching feature point may be a second feature point of the target image to be matched.
According to the embodiment of the disclosure, the matching information between the target image and the target image to be matched is determined based on the traditional feature point matching method, so that an additional intermediate step is avoided, and therefore, the efficiency of determining the matching information is improved.
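The conventional matching step can be sketched with OpenCV as follows, combining a fast nearest neighbor (FLANN) search, Lowe's ratio test, and random-sample-consensus filtering. It reuses kp1/desc1 and kp2/desc2 from the earlier SIFT sketch; the ratio and reprojection thresholds are conventional defaults, not values fixed by the disclosure.

```python
import cv2
import numpy as np

flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
knn = flann.knnMatch(desc1, desc2, k=2)
good = [m for m, n in (p for p in knn if len(p) == 2)
        if m.distance < 0.75 * n.distance]          # Lowe ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
# Random-sample-consensus filtering of the matched feature point pairs.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
matches = [m for m, keep in zip(good, inlier_mask.ravel()) if keep]
```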
Fig. 5A schematically illustrates an example schematic diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to an embodiment of the present disclosure.
As shown in fig. 5A, matching information 503 between the target image and the target image to be matched may be determined based on a conventional feature point matching method from first feature point descriptor information 501 of a first feature point of the target image and second feature point descriptor information 502 of a second feature point of the target image to be matched.
According to an embodiment of the present disclosure, operation S230 may include the following operations.
And obtaining matching descriptor information of the feature points of the target image and the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched based on an attention-based graph learning method. And determining a matching degree evaluation matrix between the target image and the target image to be matched according to the first matching descriptor information of the first feature point of the target image and the second matching descriptor information of the second feature point of the target image to be matched. And determining the matching information between the target image and the target image to be matched according to the matching degree evaluation matrix.
According to an embodiment of the present disclosure, the matching descriptor information of a feature point may be determined from the feature point information and the feature point descriptor information of the feature point based on the attention-based graph learning method. The matching descriptor information may include a matching descriptor matrix. The first matching descriptor information of the first feature point and the second matching descriptor information of the second feature point both have the meaning of the matching descriptor information of a feature point described here. The matching degree evaluation matrix can be used for evaluating the matching degree between the first feature point of the target image and the second feature point of the target image to be matched.
According to an embodiment of the present disclosure, the third feature information of the target image and the fourth feature information of the target image to be matched can be processed by using a feature point matching model, so as to obtain the matching information between the target image and the target image to be matched. The feature point matching model may be obtained by training a third deep learning model using a second sample image pair. The second sample image pair may include a third sample image and a fourth sample image. The model structure of the third deep learning model may be configured according to actual business requirements, and is not limited herein. For example, the third deep learning model may include a third encoder and a graph neural network module. The graph neural network module may include an attention-strategy-based graph neural network module.
According to an embodiment of the present disclosure, since the matching descriptor information between the target image and the target image to be matched is determined based on the attention-based graph learning method, not only the feature point descriptor information of the feature points but also the geometric relationships and constraints between the feature points are considered. Therefore, the accuracy of the matching information is improved.
Fig. 5B schematically illustrates an example schematic diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to another embodiment of the present disclosure.
As shown in fig. 5B, first matching descriptor information 505 of the first feature point of the target image may be obtained from the third feature information 504 of the target image based on an attention-based graph learning method. Second matching descriptor information 507 of the second feature point of the target image to be matched may be obtained from the fourth feature information 506 of the target image to be matched based on the attention-based graph learning method.
And determining a matching degree evaluation matrix 508 between the target image and the target image to be matched according to the first matching descriptor information 505 of the first feature point of the target image and the second matching descriptor information 507 of the second feature point of the target image to be matched. And determining matching information 509 between the target image and the target image to be matched according to the matching degree evaluation matrix 508.
According to an embodiment of the present disclosure, obtaining the first matching descriptor information of the first feature point of the target image and the second matching descriptor information of the second feature point of the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched based on the attention-based graph learning method may include the following operations.
Respectively extracting the features of first feature point information of the first feature point of the target image and second feature point information of the second feature point of the target image to be matched to obtain first intermediate feature point information of the first feature point of the target image and second intermediate feature point information of the second feature point of the target image to be matched. And obtaining first fusion information of the first feature point of the target image according to the first intermediate feature point information and the first feature point descriptor information. And obtaining second fusion information of the second feature point of the target image to be matched according to the second intermediate feature point information and the second feature point descriptor information. And obtaining first matching descriptor information of the first feature point of the target image and second matching descriptor information of the second feature point of the target image to be matched according to the first fusion information and the second fusion information based on the attention-based graph learning method.
According to the embodiment of the disclosure, feature extraction may be performed on first feature point information of a first feature point of a target image to obtain first intermediate feature point information of the first feature point of the target image. The feature extraction may be performed on the second feature point information of the second feature point of the target image to be matched to obtain second intermediate feature point information of the second feature point of the target image to be matched. The first intermediate feature point information of the feature point of the target image and the first feature point descriptor information may be fused to obtain first fusion information of the first feature point of the target image. The second intermediate feature point information of the target image to be matched and the second feature point descriptor information can be fused to obtain second fusion information of the second feature point of the target image to be matched.
According to an embodiment of the present disclosure, the third encoder may be utilized to process the first intermediate feature point information and the first feature point descriptor information of the first feature point of the target image, so as to obtain the first fusion information of the first feature point of the target image. The third encoder may also be utilized to process the second intermediate feature point information and the second feature point descriptor information of the second feature point of the target image to be matched, so as to obtain the second fusion information of the second feature point of the target image to be matched.
According to an embodiment of the present disclosure, a graph neural network method based on an attention strategy may process the first fusion information of the first feature point of the target image and the second fusion information of the second feature point of the target image to be matched, so as to obtain the first matching descriptor information of the first feature point of the target image and the second matching descriptor information of the second feature point of the target image to be matched. The attention strategy may include a self-attention strategy and a cross-attention strategy. For example, the first fusion information of the target image and the second fusion information of the target image to be matched may be processed by using a graph neural network module based on the attention strategy to obtain the first matching descriptor information of the target image and the second matching descriptor information of the target image to be matched.
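A compact PyTorch sketch of this fusion-plus-attention step is given below: an MLP plays the role of the third encoder, fusion is realized as addition, and self-/cross-attention layers produce the matching descriptors. The layer sizes, the (x, y, score) keypoint encoding, and the residual connections are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MatchDescriptorNet(nn.Module):
    def __init__(self, desc_dim=128, n_heads=4):
        super().__init__()
        # Third encoder: lifts (x, y, score) feature point information.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, desc_dim))
        self.self_attn = nn.MultiheadAttention(desc_dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(desc_dim, n_heads, batch_first=True)

    def fuse(self, points, descs):
        # Fusion of intermediate feature point info with descriptor info.
        return descs + self.point_mlp(points)

    def forward(self, points_a, descs_a, points_b, descs_b):
        fa, fb = self.fuse(points_a, descs_a), self.fuse(points_b, descs_b)
        fa = fa + self.self_attn(fa, fa, fa)[0]     # self-attention strategy
        fb = fb + self.self_attn(fb, fb, fb)[0]
        ma = fa + self.cross_attn(fa, fb, fb)[0]    # cross-attention strategy
        mb = fb + self.cross_attn(fb, fa, fa)[0]
        return ma, mb   # first and second matching descriptor information
```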
According to the embodiment of the present disclosure, determining the matching degree evaluation matrix between the target image and the target image to be matched according to the first matching descriptor information and the second matching descriptor information may include the following operations.
And determining the similarity between the first characteristic point of the target image and the second characteristic point of the target image to be matched according to the first matching descriptor information and the second matching descriptor information. And obtaining a matching degree evaluation matrix between the target image and the target image to be matched according to the similarity between the first characteristic point of the target image and the second characteristic point of the target image to be matched.
According to an embodiment of the present disclosure, the similarity may characterize a degree of similarity between two feature points. The relationship between the similarity value and the degree of similarity may be configured according to actual service requirements, and is not limited herein. For example, a greater similarity value may characterize a greater degree of similarity between two feature points, and a smaller value a lesser degree. Alternatively, a smaller similarity value may characterize a greater degree of similarity between two feature points, and a greater value a lesser degree. The similarity measure may also be configured according to actual service requirements, and is not limited herein. For example, the similarity may include one of a cosine similarity, a Pearson correlation coefficient, a Euclidean distance, a Jaccard distance, and the like.
According to the embodiment of the disclosure, determining the matching information between the target image and the target image to be matched according to the matching degree evaluation matrix may include the following operations.
And processing the matching degree evaluation matrix based on the optimal transmission method to obtain a distribution matrix. And determining matching information between the target image and the target image to be matched according to the distribution matrix.
According to the embodiment of the disclosure, the matching degree evaluation matrix can be processed based on the optimal transmission method to obtain the distribution matrix. The optimal transmission method may include one of: the Sinkhorn method and the entropy regularization method. After obtaining the allocation matrix, matching information between the target image and the target image to be matched may be determined according to the allocation matrix.
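The scoring and optimal-transport step can be sketched as follows: an inner-product matching degree evaluation matrix between the matching descriptors is normalized by a few Sinkhorn iterations in log space to yield the allocation matrix. The iteration count and the omission of a dustbin row/column are simplifying assumptions.

```python
import torch

def sinkhorn_assignment(ma, mb, n_iters=20):
    # Matching degree evaluation matrix from descriptor similarity.
    scores = torch.einsum("nd,md->nm", ma, mb)
    log_p = scores.log_softmax(dim=-1)
    for _ in range(n_iters):
        log_p = log_p - log_p.logsumexp(dim=1, keepdim=True)  # row normalize
        log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)  # column normalize
    return log_p.exp()   # allocation matrix

# Matches can then be read off, e.g., as mutual row/column argmax pairs.
```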
Fig. 5C schematically illustrates an example schematic diagram of determining matching information between a target image and a target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched according to another embodiment of the present disclosure.
As shown in fig. 5C, the third feature information 510 of the target image may include first feature point information 5101 of the first feature point of the target image and first feature point descriptor information 5102 of the first feature point of the target image. The fourth feature information 511 of the target image to be matched may include second feature point information 5111 of the second feature point of the target image to be matched and second feature point descriptor information 5112 of the second feature point of the target image to be matched.
The first feature point information 5101 of the first feature point of the target image may be subjected to feature extraction to obtain first intermediate feature point information 512 of the first feature point of the target image. First fusion information 513 of the first feature point of the target image is determined from the first intermediate feature point information 512 and the first feature point descriptor information 5102 of the target image.
Feature extraction may be performed on the second feature point information 5111 of the second feature point of the target image to be matched, so as to obtain second intermediate feature point information 514 of the second feature point of the target image to be matched. Second fusion information 515 of the second feature point of the target image to be matched is determined according to the second intermediate feature point information 514 and the second feature point descriptor information 5112 of the target image to be matched.
Based on the attention-based graph learning method, first matching descriptor information 516 of the first feature point of the target image and second matching descriptor information 517 of the second feature point of the target image to be matched are determined from the first fusion information 513 of the first feature point of the target image and the second fusion information 515 of the second feature point of the target image to be matched.
And determining the similarity 518 between the first feature points of the target image and the second feature points of the target image to be matched according to the first matching descriptor information 516 of the target image and the second matching descriptor information 517 of the second feature points of the target image to be matched. And determining a matching degree evaluation matrix 519 between the target image and the target image to be matched according to the similarity 518.
The match evaluation matrix 519 is processed based on an optimal transmission method to obtain an allocation matrix 520. According to the distribution matrix 520, matching information 521 between the target image and the target image to be matched is determined.
The above is only an exemplary embodiment, but is not limited thereto, and other image matching methods known in the art may be included as long as the accuracy of image matching can be improved.
Fig. 6 schematically shows a flow chart of a method of three-dimensional image reconstruction according to an embodiment of the present disclosure.
As shown in fig. 6, the method 600 includes operations S610 to S620.
In operation S610, matching information between a target image and a target image to be matched is acquired.
In operation S620, a three-dimensional reconstruction is performed on the target object according to the matching information.
According to the embodiment of the disclosure, the matching information between the target image and the target image to be matched may be obtained by using the image matching method according to the embodiment of the disclosure.
According to the embodiment of the disclosure, the matching information between the target image and the target image to be matched can be determined according to the third characteristic information of the target image and the fourth characteristic information of the target image to be matched by using an image matching method. The third feature information of the target image may include first feature point information and first feature point descriptor information of the first feature point of the target image. The fourth feature information of the target image to be matched may include second feature point information and second feature point descriptor information of a second feature point of the target image to be matched.
According to an embodiment of the present disclosure, after the matching information is obtained, the target object may be three-dimensionally reconstructed according to the matching information. Three-dimensional reconstruction may refer to the creation of a mathematical model of the target object that is suitable for computer representation and processing, in order to manipulate and analyze the properties of the target object in a computer environment. For example, the target object may be three-dimensionally reconstructed from the matching information using a predetermined three-dimensional reconstruction algorithm. The predetermined three-dimensional reconstruction algorithm may include at least one of: the HMR (Human Mesh Recovery) algorithm, the SMPLify-X algorithm, and the Total Capture algorithm.
According to an embodiment of the present disclosure, obtaining the matching information between the target image and the target image to be matched by using the image matching method ensures the correctness of the matching information and improves its stability. Further, since the texture of the target object has at least one of weak texture and repetitiveness, the influence of weak and repetitive textures is reduced. On this basis, performing three-dimensional reconstruction of the target object according to the matching information ensures the success rate of three-dimensional image reconstruction and improves its effect.
The three-dimensional image reconstruction method according to the embodiment of the disclosure is further described with reference to fig. 7A, 7B, and 7C in conjunction with specific embodiments.
Operation S620 may include the following operations according to an embodiment of the present disclosure.
And determining the camera pose of the camera according to the matching information. And generating a three-dimensional point cloud model according to the camera pose and the matching information. And generating a mesh model according to the three-dimensional point cloud model. And generating a map model according to the mesh model, the target image, and the target image to be matched.
According to the embodiment of the disclosure, the target image and the target image to be matched can be obtained by shooting according to cameras with different view angles. The matching information may include matching feature point pairs between the target image and the target image to be matched. The pair of matching feature points may include a first matching feature point and a second matching feature point. The first matching feature points may be feature points of the target image. The second matching feature points may be feature points of the target image to be matched.
According to an embodiment of the present disclosure, after the matching information between the target image and the target image to be matched is obtained by using the image matching method, the camera pose of the camera can be determined according to the matching information. Determining the camera pose may refer to solving the coordinates and rotation angles of the camera in a coordinate system from feature points with known coordinates and their imaged positions in the image. The manner of determining the camera pose may include at least one of: determining the camera pose based on a feature point method and determining the camera pose based on a direct method.
According to the embodiment of the disclosure, the camera array can be calibrated through the calibration board according to the matching information, and the target camera and the camera pose corresponding to the target camera are determined from the camera array. And determining a search area of the target camera by using the camera pose corresponding to the target camera and the geometric constraint information. And determining a target camera to be matched and a camera pose corresponding to the target camera to be matched in the search area according to the search area of the target camera. The camera pose may include at least one of: position information of the camera and pose information of the camera. The position information of the camera can be determined through conversion between the world coordinate system, the camera coordinate system, the image coordinate system and the pixel coordinate system. The pose information of the camera may include at least one of: the pitch attitude of the camera and the camera's shooting perspective.
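One standard feature-point-based realization of this step is to estimate an essential matrix from the matched point pairs and decompose it, as sketched below; the known intrinsic matrix K is an assumption of the example.

```python
import cv2

def relative_pose(pts_target, pts_to_match, K):
    """pts_* are (N, 2) float arrays of matched pixel coordinates."""
    E, inliers = cv2.findEssentialMat(pts_target, pts_to_match, K,
                                      method=cv2.RANSAC, threshold=1.0)
    # R, t: rotation and translation of the second camera relative to the
    # first (the camera pose, up to scale).
    _, R, t, _ = cv2.recoverPose(E, pts_target, pts_to_match, K, mask=inliers)
    return R, t
```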
According to an embodiment of the present disclosure, after determining the camera pose, a three-dimensional point cloud model may be generated from the camera pose and the matching information. For example, point cloud data conversion processing can be performed on the matching information according to the camera pose, so that matching point cloud data to be registered is obtained. Point cloud registration processing is performed on the matching point cloud data to be registered to obtain the matched point cloud data, and the three-dimensional point cloud model is generated according to the matched point cloud data. A three-dimensional point cloud model may refer to a data set of points of the target object under a certain coordinate system. The three-dimensional point cloud model may include the three-dimensional coordinates of the target object, RGB color information, and the like.
According to an embodiment of the present disclosure, a point cloud registration processing method may include at least one of: a coarse registration method and a fine registration method. The coarse registration method may include, for example, the 4-Points Congruent Sets (4PCS) method. The fine registration method may include, for example, Discriminative Optimization (DO) and the Iterative Closest Point (ICP) algorithm, etc.
According to the embodiment of the disclosure, the target position information of the target image in the target camera can be determined according to the camera pose corresponding to the target camera. And determining the target to-be-matched position information of the target to-be-matched image in the target to-be-matched camera according to the camera pose corresponding to the target to-be-matched camera. And determining the depth information of the target image according to the camera pose corresponding to the target camera, the target position information, the camera pose corresponding to the target camera to be matched and the target position information to be matched. And determining the depth information of the target image to be matched according to the depth information of the target image. Information fusion can be carried out according to the depth information of the target image and the depth information of the target image to be matched, and the fused depth information is obtained. And point cloud reconstruction is carried out according to the fused depth information to obtain a three-dimensional point cloud model.
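The depth recovery described above can be sketched as triangulation of the matched points from the two camera poses; the projection matrices below are built from the pose recovered in the previous sketch.

```python
import cv2
import numpy as np

def triangulate(K, R, t, pts_target, pts_to_match):
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at origin
    P2 = K @ np.hstack([R, t])                          # second camera pose
    pts4d = cv2.triangulatePoints(P1, P2, pts_target.T, pts_to_match.T)
    pts3d = (pts4d[:3] / pts4d[3]).T                    # homogeneous -> 3-D
    return pts3d   # point cloud; depths are the z components in camera 1
```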
According to an embodiment of the present disclosure, after obtaining the three-dimensional point cloud model, a mesh model may be generated from the three-dimensional point cloud model. For example, a polygonal mesh model may be constructed from the contiguous point cloud of the target object. The faces of the mesh model may include at least one of: triangles, quadrilaterals, or other convex polygons.
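As one simple illustration of mesh construction, the reconstructed points can be triangulated on an image-plane projection (a 2.5-D Delaunay triangulation); practical pipelines typically use more robust surface reconstruction methods, so this is only a sketch.

```python
from scipy.spatial import Delaunay

def mesh_from_points(pts3d):
    tri = Delaunay(pts3d[:, :2])    # triangulate on the x-y projection
    return tri.simplices            # (n_triangles, 3) vertex index faces
```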
According to an embodiment of the present disclosure, after obtaining the mesh model, a map model may be generated from the mesh model. The map model may refer to a mesh model that includes color information. For example, the map model may be rendered according to the texture coordinates of each mesh face and the corresponding texture image.
Fig. 7A schematically illustrates an example schematic diagram of a three-dimensional image reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 7A, in an embodiment of the present disclosure, the texture of the target object has repeatability. Taking the target object as a "magic cube" as an example, the target image 701 of the "magic cube" and the image to be matched 7021 may be obtained.
The image to be matched 7021 of the target object may be rotated to obtain a candidate image to be matched 7022 and a candidate image to be matched 7023. First feature information 703 of the target image 701 is determined. Second feature information 7041 of the image to be matched 7021 is determined. Second feature information 7042 of the candidate image to be matched 7022 is determined. Second feature information 7043 of the candidate image to be matched 7023 is determined.
The target image to be matched 705 can be determined according to the first feature information 703 of the target image 701, the second feature information 7041 of the image to be matched 7021, the second feature information 7042 of the candidate image to be matched 7022, and the second feature information 7043 of the candidate image to be matched 7023.
After the target image to be matched 705 is determined, fourth feature information 706 of the target image to be matched 705 may be determined. Third feature information 707 of the target image 701 is determined. According to the third characteristic information 707 of the target image 701 and the fourth characteristic information 706 of the target image to be matched 705, matching information 708 between the target image 701 and the target image to be matched 705 is determined.
After determining the matching information 708, a camera pose 709 of the camera may be determined from the matching information 708. From the camera pose 709 and the matching information 708, a three-dimensional point cloud model 710 is generated. From the three-dimensional point cloud model 710, a mesh model 711 is generated. A map model 712 is determined according to the mesh model 711, the target image 701, and the target image to be matched 705.
Fig. 7B schematically illustrates an example schematic diagram of a three-dimensional image reconstruction method according to another embodiment of the present disclosure.
As shown in fig. 7B, in an embodiment of the present disclosure, the texture of the target object has weak texture. Taking the target object as an "architectural model" as an example, a target image 713 of the "architectural model" and an image 7141 to be matched may be acquired.
The image to be matched 7141 of the target object may be rotated to obtain a candidate image to be matched 7142 and a candidate image to be matched 7143. First feature information 715 of the target image 713 is determined. Second feature information 7161 of the image to be matched 7141 is determined. Second feature information 7162 of the candidate image to be matched 7142 is determined. Second feature information 7163 of the candidate image to be matched 7143 is determined.
The target image to be matched 717 may be determined from the first feature information 715 of the target image 713, the second feature information 7161 of the image to be matched 7141, the second feature information 7162 of the candidate image to be matched 7142, and the second feature information 7163 of the candidate image to be matched 7143.
After the target image to be matched 717 is determined, fourth feature information 718 of the target image to be matched 717 may be determined. Third feature information 719 of the target image 713 is determined. According to the third feature information 719 of the target image 713 and the fourth feature information 718 of the target image to be matched 717, matching information 720 between the target image 713 and the target image to be matched 717 is determined.
After determining the matching information 720, a camera pose 721 of the camera may be determined from the matching information 720. From the camera pose 721 and the matching information 720, a three-dimensional point cloud model 722 is generated. A mesh model 723 is generated from the three-dimensional point cloud model 722. A map model 724 is determined according to the mesh model 723, the target image 713, and the target image to be matched 717.
Fig. 7C schematically illustrates an example schematic of a three-dimensional image reconstruction result according to an embodiment of the disclosure.
As shown in fig. 7C, in the case where the texture of the target object has repeatability, the target object is taken as a "magic cube" as an example. An existing three-dimensional image reconstruction result may be shown as map model 725. The three-dimensional image reconstruction result obtained by the three-dimensional image reconstruction method according to the embodiment of the disclosure may be shown as map model 726.
According to the embodiment of the disclosure, the image matching method can reduce the influence of repeated textures. On the basis, the target object is subjected to three-dimensional reconstruction according to the matching information, so that the three-dimensional image reconstruction effect of the target object with repeated textures is improved.
As shown in fig. 7C, in the case where the texture of the target object has weak texture, the target object is taken as an "architectural model" as an example. An existing three-dimensional image reconstruction result may be shown as map model 727. The three-dimensional image reconstruction result obtained by the three-dimensional image reconstruction method according to the embodiment of the disclosure may be shown as map model 728.
According to the embodiment of the disclosure, the image matching method can reduce the influence of weak texture. On the basis, the target object is subjected to three-dimensional reconstruction according to the matching information, so that the three-dimensional image reconstruction effect of the target object with weak texture is improved.
The above is only an exemplary embodiment, but is not limited thereto, and other three-dimensional image reconstruction methods known in the art may be included as long as the three-dimensional image reconstruction effect can be improved.
Fig. 8 schematically shows a block diagram of an image matching apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the image matching apparatus 800 may include a rotation module 810, a first determination module 820, a second determination module 830, and a matching module 840.
The rotating module 810 is configured to rotate an image to be matched of the target object to obtain a plurality of candidate images to be matched. The texture of the target object has at least one of weak texture and repeatability.
The first determining module 820 is configured to determine a target image to be matched from the multiple candidate images to be matched according to the first feature information of the target image and the second feature information of the multiple candidate images to be matched. The target image is an image for a target object.
The second determining module 830 is configured to determine matching information between the target image and the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched. The third feature information includes first feature point information and first feature point descriptor information of the first feature point. The fourth feature information includes second feature point information and second feature point descriptor information of the second feature point.
And the matching module 840 is used for matching the target image and the target image to be matched according to the matching information.
According to an embodiment of the present disclosure, the image matching apparatus 800 may further include a feature extraction module, a first obtaining module, and a second obtaining module.
And the feature extraction module is used for respectively extracting features of the target image and the target image to be matched to obtain a first feature vector of the target image and a second feature vector of the target image to be matched.
And the first obtaining module is used for obtaining third feature information of the target image according to the first feature vector.
And the second obtaining module is used for obtaining fourth feature information of the target image to be matched according to the second feature vector of the target image to be matched.
According to an embodiment of the present disclosure, the first obtaining module may include a first decoding unit and a second decoding unit.
And the first decoding unit is used for carrying out first decoding on the first feature vector to obtain first feature point information of the first feature point of the target image.
And the second decoding unit is used for carrying out second decoding on the first feature vector to obtain first feature point descriptor information of the first feature point of the target image.
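The two decoding branches may, for example, be sketched as two lightweight heads over a shared feature map; the layer shapes, the 65-channel detection head (a SuperPoint-style design), and the use of PyTorch are illustrative assumptions of this sketch rather than details of the disclosure.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadDecoder(nn.Module):
    # Two decoding branches over a shared feature map: the first yields
    # feature point information, the second yields descriptor information.
    def __init__(self, c_in=128, c_desc=256):
        super().__init__()
        self.point_head = nn.Conv2d(c_in, 65, kernel_size=1)  # 64 cells + "no point" bin
        self.desc_head = nn.Conv2d(c_in, c_desc, kernel_size=1)

    def forward(self, feat):  # feat: (B, c_in, H/8, W/8)
        scores = F.softmax(self.point_head(feat), dim=1)[:, :-1]  # first decoding
        desc = F.normalize(self.desc_head(feat), p=2, dim=1)      # second decoding
        return scores, desc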
According to an embodiment of the present disclosure, the image matching apparatus 800 may further include a feature point information extraction module, a third obtaining module, and a fourth obtaining module.
And the feature point information extraction module is used for respectively extracting the feature point information of the target image and the target image to be matched to obtain first feature point information of the target image and second feature point information of the target image to be matched.
And the third obtaining module is used for obtaining the first feature point descriptor information of the target image according to the first feature point information of the target image.
And the fourth obtaining module is used for obtaining second feature point descriptor information of the target image to be matched according to the second feature point information of the target image to be matched.
According to an embodiment of the present disclosure, the second determination module 830 may include a first determination unit.
The first determining unit is used for determining matching information between the target image and the target image to be matched according to first feature point descriptor information of a first feature point of the target image and second feature point descriptor information of a second feature point of the target image to be matched based on a traditional feature point matching method.
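A minimal sketch of such a traditional descriptor-based matching step, assuming OpenCV's brute-force matcher and Lowe's ratio test (the ratio value and float32 descriptor arrays are assumptions of this sketch):

import cv2

def traditional_match(desc1, desc2, ratio=0.8):
    # Nearest-neighbour matching on descriptors followed by the ratio test;
    # desc1 and desc2 are (N, D) float32 descriptor arrays.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc1, desc2, k=2)
    return [m for m, n in knn if m.distance < ratio * n.distance]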
According to an embodiment of the present disclosure, the second determining module 830 may include a first obtaining unit, a second determining unit, and a third determining unit.
And the first obtaining unit is used for obtaining first matching descriptor information of the first feature point of the target image and second matching descriptor information of the second feature point of the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched, based on the attention map learning method.
And the second determining unit is used for determining a matching degree evaluation matrix between the target image and the target image to be matched according to the first matching descriptor information and the second matching descriptor information.
And the third determining unit is used for determining the matching information between the target image and the target image to be matched according to the matching degree evaluation matrix.
According to an embodiment of the present disclosure, the first obtaining unit may include a feature extraction subunit, a first obtaining subunit, a second obtaining subunit, and a third obtaining subunit.
And the feature extraction subunit is used for respectively performing feature extraction on the first feature point information of the first feature point of the target image and the second feature point information of the second feature point of the target image to be matched to obtain first intermediate feature point information of the first feature point of the target image and second intermediate feature point information of the second feature point of the target image to be matched.
And the first obtaining subunit is configured to obtain first fusion information of the first feature point of the target image according to the first intermediate feature point information and the first feature point descriptor information.
And the second obtaining subunit is configured to obtain second fusion information of the second feature point of the target image to be matched according to the second intermediate feature point information and the second feature point descriptor information.
And the third obtaining subunit is used for obtaining first matching descriptor information of the first feature point of the target image and second matching descriptor information of the second feature point of the target image to be matched according to the first fusion information and the second fusion information, based on the attention map learning method.
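The fusion and attention steps above may, for example, be sketched as follows; the keypoint-encoder layout, the layer counts, and the additive fusion of intermediate feature point information with descriptor information follow a SuperGlue-style design and are assumptions of this sketch, not details of the disclosure.

import torch
import torch.nn as nn

class AttentionMatcher(nn.Module):
    def __init__(self, d=256, heads=4, layers=3):
        super().__init__()
        # Feature extraction on raw feature point information (x, y, score).
        self.kpt_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, d))
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d, heads, batch_first=True)
            for _ in range(2 * layers))

    def forward(self, kpts1, desc1, kpts2, desc2):
        # Fusion: intermediate feature point information + descriptor information.
        x1 = desc1 + self.kpt_encoder(kpts1)
        x2 = desc2 + self.kpt_encoder(kpts2)
        for i, layer in enumerate(self.attn):
            if i % 2 == 0:  # self-attention within each image
                x1 = x1 + layer(x1, x1, x1)[0]
                x2 = x2 + layer(x2, x2, x2)[0]
            else:           # cross-attention between the two images
                y1 = x1 + layer(x1, x2, x2)[0]
                y2 = x2 + layer(x2, x1, x1)[0]
                x1, x2 = y1, y2
        return x1, x2       # matching descriptor information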
According to an embodiment of the present disclosure, the second determining unit may include a first determining subunit and a fourth obtaining subunit.
And the first determining subunit is used for determining the similarity between the first feature point of the target image and the second feature point of the target image to be matched according to the first matching descriptor information and the second matching descriptor information.
And the fourth obtaining subunit is used for obtaining a matching degree evaluation matrix between the target image and the target image to be matched according to the similarity between the first feature point of the target image and the second feature point of the target image to be matched.
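For instance, the matching degree evaluation matrix may be computed as a scaled inner product between the two sets of matching descriptors; the attention-style scaling by the square root of the descriptor dimension is an assumption of this sketch.

import torch

def matching_degree_matrix(mdesc1, mdesc2):
    # mdesc1: (B, N, D), mdesc2: (B, M, D) matching descriptor information.
    # Entry (n, m) scores how well feature point n matches feature point m.
    d = mdesc1.shape[-1]
    return torch.einsum("bnd,bmd->bnm", mdesc1, mdesc2) / d ** 0.5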
According to an embodiment of the present disclosure, the third determining unit may include a fifth obtaining subunit and a second determining subunit.
And the fifth obtaining subunit is used for processing the matching degree evaluation matrix based on the optimal transport method to obtain an assignment matrix.
And the second determining subunit is used for determining the matching information between the target image and the target image to be matched according to the assignment matrix.
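The optimal transport step may, for example, be realised with Sinkhorn iterations in log space; the regularisation strength and iteration count below are assumptions of this sketch.

import torch

def sinkhorn(scores, iters=20, eps=1.0):
    # Entropy-regularised optimal transport, iterated in log space for
    # numerical stability; returns the assignment matrix.
    log_p = scores / eps
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=2, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # columns
    return log_p.exp()

# Matching information can then be read off, e.g., as mutual best
# assignments whose probability exceeds a threshold.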
According to an embodiment of the present disclosure, the first determination module 820 may include a fourth determination unit and a fifth determination unit.
And the fourth determining unit is used for determining the similarity between the target object and the plurality of candidate images to be matched according to the first feature information of the target image and the second feature information of the plurality of candidate images to be matched.
And the fifth determining unit is used for determining the target image to be matched from the candidate images to be matched according to the similarity between the target object and the candidate images to be matched.
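A minimal sketch of this selection step, assuming cosine similarity over global feature vectors (the similarity measure is an assumption of this sketch):

import torch
import torch.nn.functional as F

def select_target_candidate(target_feat, candidate_feats):
    # target_feat: (D,) first feature information of the target image;
    # candidate_feats: (K, D) second feature information of K candidates.
    sims = F.cosine_similarity(target_feat.unsqueeze(0), candidate_feats, dim=1)
    best = int(torch.argmax(sims))  # index of the target image to be matched
    return best, sims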
According to an embodiment of the present disclosure, the image matching apparatus 800 may further include a processing module.
And the processing module is used for processing the target image and the candidate images to be matched by using the characterization model to obtain first feature information of the target image and second feature information of the candidate images to be matched. The characterization model is obtained by training a self-supervised model using the sample feature information of a positive sample image and the sample feature information of a plurality of negative sample images corresponding to the positive sample image. The plurality of negative sample images are determined from a plurality of candidate negative sample images corresponding to the positive sample image.
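The contrastive training of such a self-supervised characterization model may, for example, use an InfoNCE-style objective over one positive sample and several negative samples; the temperature value and the specific loss form are assumptions of this sketch.

import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (D,) sample feature vectors; negatives: (K, D).
    pos = F.cosine_similarity(anchor, positive, dim=0).unsqueeze(0)
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1)
    logits = torch.cat([pos, neg]).unsqueeze(0) / temperature
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, target)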
According to an embodiment of the present disclosure, the image matching apparatus 800 may further include a cropping processing module.
And the cropping processing module is used for performing cropping processing on at least one of the original image and the original image to be matched, in the case where it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is greater than or equal to a predetermined threshold, so as to obtain the target image and the image to be matched. The size difference between the target object in the target image and the target object in the image to be matched is smaller than the predetermined threshold.
According to an embodiment of the present disclosure, in a case where it is determined that an original image is to be cropped, a cropping processing module may include a first object detection unit and a first extraction unit.
And the first target detection unit is used for carrying out target detection on the original image to obtain first detection information. The first detection information includes first position information of a first detection frame corresponding to the target object.
And the first extraction unit is used for extracting an image area corresponding to the target object from the original image according to the first position information to obtain a target image.
According to an embodiment of the present disclosure, in a case where it is determined that the original image to be matched is to be cropped, the cropping processing module may include a second object detection unit and a second extraction unit.
And the second target detection unit is used for carrying out target detection on the original image to be matched to obtain second detection information. The second detection information includes second position information of a second detection frame corresponding to the target object.
And the second extraction unit is used for extracting an image area corresponding to the target object from the original image to be matched according to the second position information to obtain the image to be matched.
According to an embodiment of the present disclosure, the image matching apparatus 800 may further include a third determining module.
And the third determining module is used for determining the original image as the target image and determining the original image to be matched as the image to be matched, in the case where it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is smaller than the predetermined threshold.
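The size-alignment logic of the cropping processing module and the third determining module may be sketched together as follows; the relative-area size metric and the threshold value are assumptions of this sketch.

def align_by_cropping(orig, orig_to_match, box, box_to_match, threshold=0.2):
    # box = (x1, y1, x2, y2): detection frame of the target object.
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    diff = abs(area(box) - area(box_to_match)) / max(area(box), area(box_to_match))
    if diff < threshold:
        # Sizes already close enough: use the originals directly.
        return orig, orig_to_match
    def crop(img, b):
        # Extract the image region inside the detection frame.
        return img[b[1]:b[3], b[0]:b[2]]
    return crop(orig, box), crop(orig_to_match, box_to_match)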
Fig. 9 schematically shows a block diagram of a three-dimensional image reconstruction apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the three-dimensional image reconstruction apparatus 900 may include a fifth obtaining module 910 and a three-dimensional reconstruction module 920.
A fifth obtaining module 910, configured to obtain, by using the image matching apparatus 800, matching information between the target image and the target image to be matched.
And a three-dimensional reconstruction module 920, configured to perform three-dimensional reconstruction on the target object according to the matching information.
According to the embodiment of the disclosure, the target image and the target image to be matched are captured by cameras with different viewing angles.
According to an embodiment of the present disclosure, the three-dimensional reconstruction module 920 may include a sixth determination unit, a first generation unit, a second generation unit, and a third generation unit.
And the sixth determining unit is used for determining the camera pose of the camera according to the matching information.
And the first generating unit is used for generating a three-dimensional point cloud model according to the camera pose and the matching information.
And the second generation unit is used for generating a mesh model according to the three-dimensional point cloud model.
And the third generation unit is used for generating a map model according to the mesh model, the target image and the target image to be matched.
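As a hedged illustration of the pose-determination step, the camera pose may be estimated from the matching information with OpenCV's essential-matrix routines; the RANSAC settings and the two-view formulation are assumptions of this sketch.

import cv2

def camera_pose_from_matches(pts1, pts2, K):
    # pts1, pts2: (N, 2) arrays of matched feature point coordinates;
    # K: 3x3 camera intrinsic matrix.
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K,
                                          method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
    # Triangulating the inlier matches under (R, t) yields the
    # three-dimensional point cloud, from which the mesh model and the
    # map model are generated in turn.
    return R, t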
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described in the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method described in the present disclosure.
Fig. 10 schematically shows a block diagram of an electronic device adapted to implement the image matching method and the three-dimensional image reconstruction method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the respective methods and processes described above, such as the image matching method and the three-dimensional image reconstruction method. For example, in some embodiments, the image matching method and the three-dimensional image reconstruction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the image matching method and the three-dimensional image reconstruction method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (for example, by means of firmware) to perform the image matching method and the three-dimensional image reconstruction method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (35)

1. An image matching method, comprising:
rotating an image to be matched of a target object to obtain a plurality of candidate images to be matched, wherein the texture of the target object has at least one of weak texture and repeatability;
determining a target image to be matched from the candidate images to be matched according to first feature information of the target image and second feature information of the candidate images to be matched, wherein the target image is an image aiming at the target object;
determining matching information between the target image and the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched, wherein the third feature information comprises first feature point information and first feature point descriptor information of first feature points, and the fourth feature information comprises second feature point information and second feature point descriptor information of second feature points; and
and matching the target image with the target image to be matched according to the matching information.
2. The method of claim 1, further comprising:
respectively extracting features of the target image and the target image to be matched to obtain a first feature vector of the target image and a second feature vector of the target image to be matched;
obtaining third feature information of the target image according to the first feature vector; and
and obtaining fourth feature information of the target image to be matched according to the second feature vector of the target image to be matched.
3. The method of claim 2, wherein the obtaining third feature information of the target image according to the first feature vector comprises:
performing first decoding on the first feature vector to obtain first feature point information of a first feature point of the target image; and
and performing second decoding on the first feature vector to obtain first feature point descriptor information of the first feature point of the target image.
4. The method of claim 1, further comprising:
respectively extracting feature point information of the target image and the target image to be matched to obtain first feature point information of the target image and second feature point information of the target image to be matched;
obtaining first characteristic point descriptor information of the target image according to the first characteristic point information of the target image; and
and obtaining second feature point descriptor information of the target image to be matched according to the second feature point information of the target image to be matched.
5. The method according to any one of claims 1 to 4, wherein the determining matching information between the target image and the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched comprises:
and determining matching information between the target image and the target image to be matched according to first feature point descriptor information of a first feature point of the target image and second feature point descriptor information of a second feature point of the target image to be matched based on a traditional feature point matching method.
6. The method according to any one of claims 1 to 4, wherein the determining matching information between the target image and the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched comprises:
obtaining first matching descriptor information of a first feature point of the target image and second matching descriptor information of a second feature point of the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched based on an attention map learning method;
determining a matching degree evaluation matrix between the target image and the target image to be matched according to the first matching descriptor information and the second matching descriptor information; and
and determining matching information between the target image and the target image to be matched according to the matching degree evaluation matrix.
7. The method according to claim 6, wherein the obtaining, based on the attention map learning method, first matching descriptor information of a first feature point of the target image and second matching descriptor information of a second feature point of the target image to be matched according to the third feature information of the target image and the fourth feature information of the target image to be matched comprises:
respectively extracting the features of first feature point information of a first feature point of the target image and second feature point information of a second feature point of the target image to be matched to obtain first intermediate feature point information of the first feature point of the target image and second intermediate feature point information of the second feature point of the target image to be matched;
obtaining first fusion information of a first characteristic point of the target image according to the first intermediate characteristic point information and the first characteristic point descriptor information;
obtaining second fusion information of a second characteristic point of the target image to be matched according to the second intermediate characteristic point information and the second characteristic point descriptor information; and
and obtaining first matching descriptor information of a first feature point of the target image and second matching descriptor information of a second feature point of the target image to be matched according to the first fusion information and the second fusion information based on the attention map learning method.
8. The method according to claim 6 or 7, wherein the determining a matching degree evaluation matrix between the target image and the target image to be matched according to the first matching descriptor information and the second matching descriptor information comprises:
determining the similarity between a first feature point of the target image and a second feature point of the target image to be matched according to the first matching descriptor information and the second matching descriptor information; and
and obtaining a matching degree evaluation matrix between the target image and the target image to be matched according to the similarity between the first feature point of the target image and the second feature point of the target image to be matched.
9. The method according to any one of claims 6 to 8, wherein the determining matching information between the target image and the target image to be matched according to the matching degree evaluation matrix comprises:
processing the matching degree evaluation matrix based on an optimal transport method to obtain an assignment matrix; and
and determining matching information between the target image and the target image to be matched according to the assignment matrix.
10. The method according to any one of claims 1 to 9, wherein the determining a target image to be matched from the plurality of candidate images to be matched according to the first feature information of the target image and the second feature information of the plurality of candidate images to be matched comprises:
determining similarity between the target object and the plurality of candidate images to be matched according to the first feature information of the target image and the second feature information of the plurality of candidate images to be matched; and
and determining the target image to be matched from the candidate images to be matched according to the similarity between the target object and the candidate images to be matched.
11. The method of any of claims 1-10, further comprising:
processing the target image and the plurality of candidate images to be matched by using a characterization model to obtain first feature information of the target image and second feature information of the plurality of candidate images to be matched, wherein the characterization model is obtained by training a self-supervised model by using sample feature information of a positive sample image and sample feature information of a plurality of negative sample images corresponding to the positive sample image, and the plurality of negative sample images are determined from a plurality of candidate negative sample images corresponding to the positive sample image.
12. The method of any of claims 1-11, further comprising:
in the case where it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is greater than or equal to a predetermined threshold, performing cropping processing on at least one of the original image and the original image to be matched to obtain the target image and the image to be matched, wherein the size difference between the target object in the target image and the target object in the image to be matched is smaller than the predetermined threshold.
13. The method according to claim 12, wherein the cropping at least one of the original image and the original image to be matched to obtain the target image and the image to be matched comprises:
in the case where it is determined that the original image is to be cropped,
performing target detection on the original image to obtain first detection information, wherein the first detection information comprises first position information of a first detection frame corresponding to the target object; and
extracting an image area corresponding to the target object from the original image according to the first position information to obtain the target image;
in case it is determined that the original image to be matched is to be cropped,
performing target detection on the original image to be matched to obtain second detection information, wherein the second detection information comprises second position information of a second detection frame corresponding to the target object; and
and extracting an image area corresponding to the target object from the original image to be matched according to the second position information to obtain the image to be matched.
14. The method of claim 12 or 13, further comprising:
determining the original image as the target image and determining the original image to be matched as the image to be matched if the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is smaller than the predetermined threshold.
15. A three-dimensional image reconstruction method, comprising:
obtaining matching information between the target image and the target image to be matched by using the method of any one of claims 1 to 14; and
and performing three-dimensional reconstruction on the target object according to the matching information.
16. The method of claim 15, wherein the target image and the target image to be matched are captured by cameras with different viewing angles;
wherein the three-dimensional reconstruction of the target object according to the matching information comprises:
determining a camera pose of the camera according to the matching information;
generating a three-dimensional point cloud model according to the camera pose and the matching information;
generating a mesh model according to the three-dimensional point cloud model; and
and generating a map model according to the mesh model, the target image and the target image to be matched.
17. An image matching apparatus comprising:
the rotation module is used for rotating the image to be matched of the target object to obtain a plurality of candidate images to be matched, wherein the texture of the target object has at least one of weak texture and repeatability;
a first determining module, configured to determine a target image to be matched from the multiple candidate images to be matched according to first feature information of the target image and second feature information of the multiple candidate images to be matched, where the target image is an image for the target object;
a second determining module, configured to determine matching information between the target image and the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched, where the third feature information includes first feature point information and first feature point descriptor information of a first feature point, and the fourth feature information includes second feature point information and second feature point descriptor information of a second feature point; and
and the matching module is used for matching the target image with the target image to be matched according to the matching information.
18. The apparatus of claim 17, further comprising:
the feature extraction module is used for respectively extracting features of the target image and the target image to be matched to obtain a first feature vector of the target image and a second feature vector of the target image to be matched;
the first obtaining module is used for obtaining third feature information of the target image according to the first feature vector; and
and the second obtaining module is used for obtaining fourth feature information of the target image to be matched according to the second feature vector of the target image to be matched.
19. The apparatus of claim 18, wherein the first obtaining module comprises:
a first decoding unit, configured to perform first decoding on the first feature vector to obtain first feature point information of a first feature point of the target image; and
and the second decoding unit is used for performing second decoding on the first feature vector to obtain first feature point descriptor information of the first feature point of the target image.
20. The apparatus of claim 17, further comprising:
the feature point information extraction module is used for respectively extracting feature point information of the target image and the target image to be matched to obtain first feature point information of the target image and second feature point information of the target image to be matched;
a third obtaining module, configured to obtain first feature point descriptor information of the target image according to the first feature point information of the target image; and
and the fourth obtaining module is used for obtaining second feature point descriptor information of the target image to be matched according to the second feature point information of the target image to be matched.
21. The apparatus of any of claims 17-20, wherein the second determining module comprises:
a first determining unit, configured to determine matching information between the target image and the target image to be matched according to first feature point descriptor information of a first feature point of the target image and second feature point descriptor information of a second feature point of the target image to be matched based on a conventional feature point matching method.
22. The apparatus of any of claims 17-20, wherein the second determining module comprises:
a first obtaining unit, configured to obtain, based on an attention map learning method, first matching descriptor information of a first feature point of the target image and second matching descriptor information of a second feature point of the target image to be matched according to third feature information of the target image and fourth feature information of the target image to be matched;
a second determining unit, configured to determine, according to the first matching descriptor information and the second matching descriptor information, a matching degree evaluation matrix between the target image and the target image to be matched; and
and the third determining unit is used for determining the matching information between the target image and the target image to be matched according to the matching degree evaluation matrix.
23. The apparatus of claim 22, wherein the first obtaining unit comprises:
a feature extraction subunit, configured to perform feature extraction on first feature point information of a first feature point of the target image and second feature point information of a second feature point of the target image to be matched, respectively, to obtain first intermediate feature point information of the first feature point of the target image and second intermediate feature point information of the second feature point of the target image to be matched;
a first obtaining subunit, configured to obtain first fusion information of a first feature point of the target image according to the first intermediate feature point information and the first feature point descriptor information;
a second obtaining subunit, configured to obtain, according to the second intermediate feature point information and the second feature point descriptor information, second fusion information of a second feature point of the target image to be matched; and
and the third obtaining subunit is configured to obtain, based on an attention map learning method, first matching descriptor information of the first feature point of the target image and second matching descriptor information of the second feature point of the target image to be matched according to the first fusion information and the second fusion information.
24. The apparatus of claim 22 or 23, wherein the second determining unit comprises:
a first determining subunit, configured to determine, according to the first matching descriptor information and the second matching descriptor information, a similarity between a first feature point of the target image and a second feature point of the target image to be matched; and
and the fourth obtaining subunit is configured to obtain a matching degree evaluation matrix between the target image and the target image to be matched according to the similarity between the first feature point of the target image and the second feature point of the target image to be matched.
25. The apparatus according to any one of claims 22 to 24, wherein the third determining unit includes:
a fifth obtaining subunit, configured to process the matching degree evaluation matrix based on an optimal transport method to obtain an assignment matrix; and
and the second determining subunit is used for determining matching information between the target image and the target image to be matched according to the assignment matrix.
26. The apparatus of any of claims 17-25, wherein the first determining module comprises:
a fourth determining unit, configured to determine similarity between the target object and the multiple candidate images to be matched according to the first feature information of the target image and the second feature information of the multiple candidate images to be matched; and
a fifth determining unit, configured to determine the target image to be matched from the multiple candidate images to be matched according to similarities between the target object and the multiple candidate images to be matched.
27. The apparatus of any one of claims 17-26, further comprising:
the processing module is used for processing the target image and the plurality of candidate images to be matched by using a characterization model to obtain first feature information of the target image and second feature information of the plurality of candidate images to be matched, wherein the characterization model is obtained by training a self-supervised model by using sample feature information of a positive sample image and sample feature information of a plurality of negative sample images corresponding to the positive sample image, and the plurality of negative sample images are determined from a plurality of candidate negative sample images corresponding to the positive sample image.
28. The apparatus of any one of claims 17-27, further comprising:
and the cropping processing module is used for performing cropping processing on at least one of the original image and the original image to be matched, in the case where it is determined that the absolute value of the size difference between the target object in the original image and the target object in the original image to be matched is greater than or equal to a predetermined threshold, so as to obtain the target image and the image to be matched, wherein the size difference between the target object in the target image and the target object in the image to be matched is smaller than the predetermined threshold.
29. The apparatus of claim 28, wherein the crop processing module comprises:
in the case where it is determined that the original image is to be cropped,
a first target detection unit, configured to perform target detection on the original image to obtain first detection information, where the first detection information includes first position information of a first detection frame corresponding to the target object; and
a first extraction unit, configured to extract an image region corresponding to the target object from the original image according to the first position information, so as to obtain the target image;
in case it is determined that the original image to be matched is to be cropped,
the second target detection unit is used for carrying out target detection on the original image to be matched to obtain second detection information, wherein the second detection information comprises second position information of a second detection frame corresponding to the target object; and
and the second extraction unit is used for extracting an image area corresponding to the target object from the original image to be matched according to the second position information to obtain the image to be matched.
30. The apparatus of claim 28 or 29, further comprising:
a third determining module, configured to determine the original image as the target image and the original image to be matched as the image to be matched, when it is determined that an absolute value of a size difference between the target object in the original image and the target object in the original image to be matched is smaller than the predetermined threshold.
31. A three-dimensional image reconstruction apparatus comprising:
a fifth obtaining module, configured to obtain matching information between the target image and the target image to be matched by using the apparatus according to any one of claims 17 to 30; and
and the three-dimensional reconstruction module is used for performing three-dimensional reconstruction on the target object according to the matching information.
32. The apparatus of claim 31, wherein the target image and the target image to be matched are captured by cameras with different viewing angles;
wherein the three-dimensional reconstruction module comprises:
a sixth determining unit configured to determine a camera pose of the camera according to the matching information;
the first generation unit is used for generating a three-dimensional point cloud model according to the camera pose and the matching information;
the second generating unit is used for generating a mesh model according to the three-dimensional point cloud model; and
and the third generation unit is used for generating a map model according to the mesh model, the target image and the target image to be matched.
33. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 16.
34. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1-16.
35. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 16.
CN202211534768.8A 2022-11-30 2022-11-30 Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium Pending CN115937546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211534768.8A CN115937546A (en) 2022-11-30 2022-11-30 Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium


Publications (1)

Publication Number Publication Date
CN115937546A (en) 2023-04-07

Family

ID=86555330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211534768.8A Pending CN115937546A (en) 2022-11-30 2022-11-30 Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium

Country Status (1)

Country Link
CN (1) CN115937546A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363858A (en) * 2019-06-18 2019-10-22 新拓三维技术(深圳)有限公司 A kind of three-dimensional facial reconstruction method and system
CN111143601A (en) * 2019-12-31 2020-05-12 深圳市芭田生态工程股份有限公司 Image processing method
CN115104126A (en) * 2020-02-21 2022-09-23 哈曼国际工业有限公司 Image processing method, apparatus, device and medium
CN113313125A (en) * 2021-06-15 2021-08-27 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and computer readable medium
CN114332125A (en) * 2021-12-30 2022-04-12 科大讯飞股份有限公司 Point cloud reconstruction method and device, electronic equipment and storage medium
CN114627244A (en) * 2022-03-22 2022-06-14 中国电信股份有限公司 Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
CN115205489A (en) * 2022-06-06 2022-10-18 广州中思人工智能科技有限公司 Three-dimensional reconstruction method, system and device in large scene
CN115311469A (en) * 2022-08-08 2022-11-08 北京百度网讯科技有限公司 Image labeling method, training method, image processing method and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117314900A (en) * 2023-11-28 2023-12-29 诺比侃人工智能科技(成都)股份有限公司 Semi-self-supervision feature matching defect detection method
CN117314900B (en) * 2023-11-28 2024-03-01 诺比侃人工智能科技(成都)股份有限公司 Semi-self-supervision feature matching defect detection method
CN117808807A (en) * 2024-02-29 2024-04-02 中国人民解放军国防科技大学 Optical satellite remote sensing image instance level change detection method
CN117808807B (en) * 2024-02-29 2024-05-14 中国人民解放军国防科技大学 Optical satellite remote sensing image instance level change detection method

Similar Documents

Publication Publication Date Title
US10268917B2 (en) Pre-segment point cloud data to run real-time shape extraction faster
US11810326B2 (en) Determining camera parameters from a single digital image
CN115937546A (en) Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium
CN111325271B (en) Image classification method and device
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN110998671B (en) Three-dimensional reconstruction method, device, system and storage medium
Xiao et al. Building segmentation and modeling from airborne LiDAR data
CN115423946B (en) Large scene elastic semantic representation and self-supervision light field reconstruction method and device
CN114565916A (en) Target detection model training method, target detection method and electronic equipment
CN115330940A (en) Three-dimensional reconstruction method, device, equipment and medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
CN113793370B (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN115578432B (en) Image processing method, device, electronic equipment and storage medium
EP4293623A1 (en) Image depth prediction method and electronic device
CN113610856B (en) Method and device for training image segmentation model and image segmentation
Wang et al. Efficient multi-plane extraction from massive 3D points for modeling large-scale urban scenes
CN113139540B (en) Backboard detection method and equipment
Gong et al. Fine feature sensitive marching squares
Zhang et al. Edge detection from RGB-D image based on structured forests
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
CN114463503A (en) Fusion method and device of three-dimensional model and geographic information system
CN111428729A (en) Target detection method and device
Chu et al. Robust registration of aerial and close‐range photogrammetric point clouds using visual context features and scale consistency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination