CN118053004A - Image processing method, apparatus, device, storage medium, and program product

Image processing method, apparatus, device, storage medium, and program product

Info

Publication number
CN118053004A
Authority
CN
China
Prior art keywords
image
feature map
candidate
determining
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211405939.7A
Other languages
Chinese (zh)
Inventor
杜斯亮
刘明忠
齐汉超
陈旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211405939.7A priority Critical patent/CN118053004A/en
Publication of CN118053004A publication Critical patent/CN118053004A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide an image processing method, apparatus, device, storage medium, and program product, relating to the field of artificial intelligence. For example, the present disclosure provides an image processing method. The method may include: determining a candidate image matched with a first image based on the similarity information, wherein the first image and the candidate image each comprise at least one key point, the position information of the at least one key point in the candidate image is known, and the candidate image comprises a second image; respectively acquiring a first feature map and a candidate feature map from the first image and the candidate image; determining location information of the at least one keypoint on the second image on the first image by determining matched pairs of pixels in the first feature map and the candidate feature map; and determining a transformation relationship of the first image to the second image based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image. Through this scheme, the transformation relationship between heterogeneous images can be determined quickly and accurately, and the user experience is improved.

Description

Image processing method, apparatus, device, storage medium, and program product
Technical Field
Embodiments of the present disclosure relate generally to the field of artificial intelligence. More particularly, embodiments of the present disclosure relate to image processing methods, apparatuses, electronic devices, computer-readable storage media, and computer program products.
Background
As artificial intelligence techniques such as Deep Neural Network (DNN) algorithms have been applied to image processing fields such as three-dimensional reconstruction and SLAM (simultaneous localization and mapping), fusion of images from heterogeneous sensors has become possible. Fusion of heterogeneous images essentially involves acquiring the coordinates of matching keypoints between the heterogeneous images. Due to radiometric differences and differences in camera models, heterogeneous matching becomes very difficult, and existing methods are difficult to adapt directly to matching across these heterogeneous sensors. For example, while there are heterogeneous image matching approaches that can handle radiometric differences such as between day and night, they have difficulty handling the matching of images from different camera models and can only address part of the heterogeneous sensor alignment problem. Furthermore, since heterogeneous sensors typically employ different operators for feature extraction, it is difficult to achieve direct matching between the different operators. In summary, fusion techniques for heterogeneous images need to be further improved.
Disclosure of Invention
To facilitate fusion of heterogeneous images, embodiments of the present disclosure provide a new image processing scheme.
In a first aspect of the present disclosure, an image processing method is provided. The method may include: determining a candidate image matched with a first image based on the similarity information, wherein the first image and the candidate image comprise at least one key point, the position information of the at least one key point in the candidate image is known, and the candidate image comprises a second image; respectively acquiring a first feature map and a candidate feature map from the first image and the candidate image; determining location information of the at least one keypoint on the second image on the first image by determining matched pairs of pixels in the first feature map and the candidate feature map; and determining a transformation relationship of the first image to the second image based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image.
The present disclosure performs matching of key points at the level of feature vectors by extracting feature maps from a query image and a reference image, thereby determining the location information of the key points in the query image. Since the position information of the key points on the reference image is known, the transformation relationship between the two images can be determined. Therefore, the heterogeneous image fusion scheme of the present disclosure can be applied to fusion operations on image data acquired by different types of sensors, and is not subject to consistency problems between different operators and descriptors, thereby improving user experience.
In an implementation manner of the first aspect, determining the location information of the at least one key point on the first image may include: determining matching scores of a plurality of pixels in the first feature map and a plurality of pixels in the candidate feature map; acquiring position information of the pixel pairs with the matching score larger than a threshold score; acquiring a partial feature map from the first feature map based on the position information of the pixel pairs in response to the at least one key point conforming to the position information of the pixel pairs; acquiring a partial candidate feature map from the candidate feature map based on the position information of the at least one key point in the candidate image; and determining location information of the at least one keypoint on the first image based on the partial feature map and the partial candidate feature map. In this way, the coordinates on the first image of the key points known on the second image can be determined more quickly and accurately.
In an implementation manner of the first aspect, acquiring the first feature map and the candidate feature map respectively may include: acquiring a first sub-feature map of a first resolution and a second sub-feature map of a second resolution of the first image; and acquiring a third sub-feature map of the first resolution and a fourth sub-feature map of the second resolution of the candidate image, the first resolution being less than the second resolution. It should be appreciated that feature maps of different resolutions may be used for different computing tasks, thereby ensuring the accuracy of the computation while conserving computing resources.
In an implementation manner of the first aspect, determining the location information of the at least one key point on the first image may include: determining a matching score of at least one pixel in the first sub-feature map and at least one pixel in the third sub-feature map; acquiring position information of the pixel pairs with the matching score larger than a threshold score; acquiring a partial feature map from the second sub-feature map based on the position information of the pixel pairs in response to the at least one key point conforming to the position information of the pixel pairs; acquiring a partial candidate feature map from the fourth sub-feature map based on the position information of the at least one key point in the candidate image; and determining location information of the at least one keypoint on the first image based on the partial feature map and the partial candidate feature map. In this way, by performing a coarse matching process with the lower-resolution feature maps and a fine matching process with the higher-resolution feature maps, the coordinates of the key points on the first image can be accurately calculated while computational resources are saved in the coarse matching process.
In one implementation manner of the first aspect, the sizes of the partial feature map and the partial candidate feature map may be smaller than the sizes of the first feature map and the candidate feature map.
In one implementation manner of the first aspect, determining the matching score may include: updating the first sub-feature map based on the first sub-feature map and the pixel location of the at least one pixel; updating the third sub-feature map based on the third sub-feature map and the pixel location of the at least one pixel; and processing the updated first sub-feature map and the third sub-feature map through a neural network model to determine the matching score. In this way, each feature vector to be matched corresponding to a pixel in the feature map includes both the information of the feature map itself and the information of the corresponding feature map, so that the matching result can be more accurate.
In an implementation manner of the first aspect, determining the location information of the at least one key point on the first image may include: updating the partial feature map based on the partial feature map and pixel positions of pixels in the partial feature map; updating the partial candidate feature map based on the partial candidate feature map and pixel positions of pixels in the partial candidate feature map; and processing the updated partial feature map and the partial candidate feature map through the neural network model to determine the position information of the at least one key point on the first image. In this way, each feature vector to be matched corresponding to the pixels in the partial feature map obtained by clipping contains both the information of the feature map itself and the information of the corresponding feature map, so that the matching result can be more accurate.
In one implementation of the first aspect, the neural network model may be a Transformer model.
In one implementation manner of the first aspect, determining the candidate image matching the first image may include: acquiring at least the first image and pose information from a first map; acquiring a plurality of images from a second map different from the first map; dividing the first image into a plurality of sub-images based on the pose information; determining similarity information of each of the plurality of sub-images and each of the plurality of images; and determining the candidate image from the plurality of images based on the similarity information. It should be appreciated that the division into multiple sub-images may be into multiple search grids, thereby enabling inter-picture sub-pixel level matching.
In an implementation manner of the first aspect, determining a transformation relationship of the first image to the second image may include: determining the number of matching points based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image; determining pose information for the first image in response to the number being greater than a threshold number; and determining a transformation relationship of the first image to the second image based at least on pose information of the first image.
In an implementation manner of the first aspect, the method may further include: and determining a fusion image of the first image and the second image based on the transformation relation.
In a second aspect of the present disclosure, an image processing apparatus is provided. The apparatus comprises a functional module for implementing the method of the first aspect or any implementation manner of the first aspect.
In a third aspect of the present disclosure, an electronic device is provided. The electronic device includes: at least one computing unit; at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, the instructions when executed by the at least one computing unit cause the apparatus to perform the method of the first aspect or any one of the implementations of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores one or more computer instructions, where the one or more computer instructions are executable by a processor to implement the first aspect or any of the implementations of the first aspect.
In a fifth aspect of the present disclosure, a computer program product is provided. The computer program product comprises computer executable instructions which, when executed by a processor, cause the computer to perform some or all of the steps of the method of the first aspect or any implementation of the first aspect.
It will be appreciated that the image processing apparatus of the second aspect, the electronic device of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are all adapted to implement the method provided by the first aspect. Accordingly, the explanations regarding the first aspect are equally applicable to the second aspect, the third aspect, the fourth aspect, and the fifth aspect. For the advantages achieved by the second, third, fourth, and fifth aspects, reference is made to the advantages of the corresponding method, which are not described here again.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example system in which various embodiments of the present disclosure may be implemented;
FIG. 2 shows a flow chart of a process of image processing according to an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a process of determining candidate images according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of an example system for acquiring feature maps in accordance with an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a portion of an example system for determining location information of keypoints according to embodiments of the disclosure;
FIG. 6 illustrates a schematic diagram of another portion of an example system for determining location information of keypoints according to embodiments of the disclosure;
FIG. 7 illustrates a flowchart of a process of determining a transformation relationship between images, according to an embodiment of the present disclosure;
Fig. 8 illustrates a schematic block diagram of an image processing apparatus according to some embodiments of the present disclosure; and
Fig. 9 illustrates a block diagram of a computing unit capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and the like should be taken to be open-ended, i.e., including but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below. Further, herein, "and/or" is used to denote at least one of a plurality of objects. For example, "A and/or B" means one of "A", "B", and "A and B". Furthermore, all specific values herein are for purposes of example and are not intended to limit the scope of the present disclosure.
As discussed above, conventional heterogeneous image fusion techniques can only address the alignment problems of a portion of heterogeneous sensors, and it is difficult to achieve direct matching between the different operators of different heterogeneous sensors, which poses challenges to the image processing capabilities of heterogeneous image fusion techniques.
Conventional image processing techniques for heterogeneous image fusion include, for example, the COLMAP or SuperPoint + SuperGlue algorithms. A drawback of COLMAP is that, for incrementally added images, feature points need to be repeatedly extracted and matching and three-dimensional reconstruction repeatedly performed, which results in a large amount of duplicated work, and COLMAP's depth map estimation is slow and less accurate. In addition, COLMAP uses traditional operators rather than deep-learning-based feature extraction, so its matching effect on heterogeneous data is mediocre. On the other hand, SuperPoint + SuperGlue has the disadvantage that a descriptor description of the feature points is required, which is difficult to adapt to data from different sensors. In addition, the SuperPoint + SuperGlue method adopts a detection-then-matching paradigm in which the detected points must have fairly distinctive features, which is unfriendly to matching in regions where features are not distinctive. Therefore, SuperPoint + SuperGlue matches data from different sensors poorly, and repeated feature points cannot be effectively matched in regions with indistinct features.
Therefore, conventional heterogeneous image fusion techniques have limitations in their application scenarios and effectiveness.
In order to solve the above-described problems, the present disclosure provides a new image processing scheme. As an example, the present disclosure achieves fusion of heterogeneous images by arranging an image retrieval module, a keypoint matching module, and an image alignment module. The image retrieval module is used to retrieve a reference image similar to the query image; the keypoint matching module is used to extract feature maps from the images, perform coarse matching on the feature maps, and perform a refined regression on the coarse matching result so as to determine the matching result of the key points; the image alignment module is used to determine a transformation relationship between the query map and the reference map so as to enable subsequent image fusion.
In order to more accurately describe the concepts of the present disclosure, an example system according to an embodiment of the present disclosure is described in detail below in conjunction with fig. 1.
Fig. 1 illustrates a schematic diagram of an example system 100 in which various embodiments of the present disclosure may be implemented.
In fig. 1, system 100 includes map data 110, computing device 120, and fused image 130. It should be appreciated that the map data 110 may be a historical map database that may contain cell phone maps, drone maps, vehicle maps, satellite maps, etc. loaded at different times. Since these maps are typically photographed by different devices, the map data 110 may be referred to as heterogeneous image data. As shown in fig. 1, the map data 110 includes a first image 111 and a reference image 112. As an example, the first image 111 may be an image in a query map and the reference image 112 may be an image in a reference map.
It should also be appreciated that map data 110 may also be image data under different parameter systems in a human-created virtual space (e.g., a metaverse). Map data 110 is typically stored in a storage device. The storage device includes a register for storing data. The storage device may also be one or more storage disks. The storage disk can be various types of devices having a storage function, including, but not limited to, a hard disk (HDD), a Solid State Disk (SSD), a removable disk, any other magnetic storage device, and any other optical storage device, or any combination thereof.
Further, a trained machine learning model or deep neural network may be disposed in the computing device 120. Computing device 120 may be any device having computing capabilities. As non-limiting examples, computing device 120 may be any type of fixed, mobile, or portable computing device including, but not limited to, a server, desktop, laptop, notebook, netbook, tablet, smartphone, or the like. All or a portion of the components of computing device 120 may be distributed across the cloud. Computing device 120 may also employ a cloud-edge architecture.
As shown in fig. 1, when the computing device 120 receives the first image 111 and the reference image 112, the computing device 120 may process the first image 111 and the reference image 112 using an image retrieval module 121, a keypoint matching module 122, and an image alignment module 123 disposed thereon. As an example, the image retrieval module 121 may retrieve candidate images similar to the first image 111 from the reference image 112. The keypoint matching module 122 may extract feature maps from the first image and the candidate images, perform coarse matching on the feature maps, and perform a refined regression on the coarse matching results, thereby determining a second image that matches the first image 111 and determining the location information on the first image of the keypoints known on the second image. Further, the image alignment module 123 may determine a transformation relationship from the first image to the second image based on the location information of the known keypoints in order to achieve subsequent image fusion. In this way, the present disclosure may perform image fusion of any two heterogeneous images, for example, map fusion of a vehicle-mounted map collected some time ago with a recently updated satellite map. The heterogeneous image fusion scheme of the present disclosure can be applied to fusion operations on image data acquired by different types of sensors, and is free from the problem of consistency between different operators and descriptors, thereby improving user experience.
It should be understood that the architecture and functionality in the example system 100 are described for illustrative purposes only and are not meant to imply any limitation on the scope of the present disclosure. Embodiments of the present disclosure may also be applied to other environments having different structures and/or functions.
A process according to an embodiment of the present disclosure will be described in detail below in conjunction with fig. 2. For ease of understanding, specific data set forth in the following description are intended to be exemplary and are not intended to limit the scope of the disclosure. It will be appreciated that the embodiments described below may also include additional actions not shown and/or may omit shown actions, the scope of the present disclosure being not limited in this respect.
Fig. 2 shows a flowchart of a process 200 of image processing according to an embodiment of the present disclosure. In some embodiments, process 200 may be implemented in computing device 120 of fig. 1. A process 200 for determining a transformation relationship between heterogeneous images according to an embodiment of the present disclosure is now described with reference to fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
As shown in fig. 2, at 202, the computing device 120 may determine candidate images from the reference image 112 that match the first image 111 based on the similarity information. It is to be appreciated that the first image 111 may be an image of a query map, the first image 111 and the candidate image each including at least one keypoint, the location information of which in the candidate image is known, and the candidate images contain a second image, i.e. the image that is ultimately determined to be the closest match to the first image 111. It should also be appreciated that the key points may be guide points or landmark points in the map. In some embodiments, the reference image 112 is from reference map data, which is heterogeneous with respect to the first image 111.
At 204, the computing device 120 may obtain a first feature map and a plurality of candidate feature maps from the first image 111 and a plurality of candidate images selected from the reference image 112, respectively. As an example, the first image 111 and the matched plurality of candidate images may be applied to a pre-trained neural network model, thereby extracting the first feature map and the plurality of candidate feature maps from the first image 111 and the candidate images. It should be understood that a feature map is the feature data of an image. As an example, the neural network model may be a convolutional neural network. That is, the convolutional neural network may be used to extract features from the first image 111 and the candidate images. It should be appreciated that a convolutional neural network of any structure (such as VGG, ResNet, DenseNet, MobileNet, etc.) and operators that can be used to improve the effect of the network (deformable convolution, SE, dilated convolution, Inception, etc.) can be used here. Thus, the computing device 120 may extract feature maps from the first image 111 and the candidate images. It should also be appreciated that a feature map may be represented by a tensor with height H, width W, and channel number C. Further, the feature map may be regarded as a set of H×W feature vectors, where the dimension of each feature vector is equal to the channel number C of the feature map. For example, when H equals 3, W equals 3, and C equals 4, the feature map is composed of 9 feature vectors, and the dimension of each feature vector equals 4. It should be understood that the numerical values in the present embodiment are exemplary and are not intended to limit the scope of the present disclosure.
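To make the tensor view concrete, the following is a minimal sketch (not the patented implementation) of how a feature map produced by a small stand-in convolutional backbone can be flattened into H×W feature vectors of dimension C. The backbone layers and tensor sizes are illustrative assumptions chosen to reproduce the H=3, W=3, C=4 example above.

```python
import torch
import torch.nn as nn

# Stand-in convolutional feature extractor (any CNN such as VGG/ResNet could be used).
backbone = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((3, 3)),          # force a 3x3 spatial output for this example
)

image = torch.randn(1, 3, 64, 64)          # a query image, batch size 1
feature_map = backbone(image)              # shape (1, C=4, H=3, W=3)

# View the H x W x C feature map as H*W feature vectors of dimension C.
B, C, H, W = feature_map.shape
feature_vectors = feature_map.flatten(2).transpose(1, 2)   # (1, H*W, C) = (1, 9, 4)
print(feature_vectors.shape)               # torch.Size([1, 9, 4]): 9 vectors of dimension 4
```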
At 206, the computing device 120 may determine location information on the first image 111 of at least one keypoint on the second image by determining matching pairs of pixels in the first feature map and the candidate feature map. In other words, by finding the corresponding key points on the first image 111 at the feature vector level, the coordinates of the key points can be determined. It should be understood that the coordinates may be three-dimensional coordinates of the keypoint in three-dimensional space.
At 208, the computing device 120 may determine a transformation relationship of the first image 111 to the second image based on the location information of the at least one keypoint on the first image 111 and the location information of the at least one keypoint on the second image, ultimately determining the fused image 130. In this way, the heterogeneous image fusion scheme of the present disclosure may be applicable to fusion operations on image data acquired by different types of sensors.
To describe the various steps of the present disclosure in more detail, a process 300 for determining a plurality of candidate images matching the first image 111 from the reference image 112 according to an embodiment of the present disclosure will now be described with reference to fig. 3. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
As shown in fig. 3, at 302, the computing device 120 may obtain at least the first image 111 and pose information from the first map. As an example, the first map may be a query map, for example, the above-described vehicle-mounted map, and the pose information may be view angle information of a picture of the map.
At 304, the computing device 120 may obtain a plurality of images from a second map that is different from the first map. In other words, the first map and the second map are heterogeneous maps, for example, the first map is a vehicle-mounted map and the second map is a satellite map, both of which are acquired by different sensors.
At 306, the computing device 120 may divide the first image 111 into a plurality of sub-images based on the pose information. As an example, the first image 111 may be divided into a plurality of grids.
At 308, the computing device 120 may determine similarity information for each of the plurality of sub-images to each of the plurality of images of the second map. Further, the plurality of images of the second map may be ordered by similarity magnitude.
At 310, the computing device 120 may determine a candidate image from the plurality of images of the second map based on the similarity information. In other words, a predetermined number of the images with the highest similarity will be determined as the candidate images.
In some embodiments, the image retrieval module 121 in the computing device 120 may estimate the principal plane and the normal direction using Principal Component Analysis (PCA) based on the position coordinates of the first image 111 serving as the query map, then project the coordinates of the images onto the estimated plane and divide the plane range into an N x N grid, N being a positive integer. Further, the image retrieval module 121 sorts the images in each grid cell by retrieval similarity and selects the top-K ranked images in each grid cell as the retrieval result, K being a positive integer. In this way, the present disclosure enables inter-picture sub-pixel level matching.
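The snippet below is a hedged sketch of one plausible reading of that retrieval step: camera positions of the query images and per-image retrieval similarities are assumed to be given, the principal plane is estimated with PCA, the plane range is divided into an N x N grid, and the top-K images are kept per grid cell. All function names, shapes, and defaults are illustrative assumptions, not the exact implementation of the disclosure.

```python
import numpy as np

def retrieve_candidates(query_positions, similarities, n=4, k=3):
    """query_positions: (M, 3) camera positions of the images; similarities: (M,) retrieval scores."""
    # PCA: the two leading principal components span the principal plane.
    centered = query_positions - query_positions.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_coords = centered @ vt[:2].T                       # project positions onto the plane

    # Divide the plane range into an N x N grid of search cells.
    mins, maxs = plane_coords.min(axis=0), plane_coords.max(axis=0)
    cell = np.floor((plane_coords - mins) / (maxs - mins + 1e-9) * n).clip(0, n - 1)
    cell_ids = (cell[:, 0] * n + cell[:, 1]).astype(int)

    # In each grid cell, keep the K images with the highest retrieval similarity.
    selected = []
    for gid in np.unique(cell_ids):
        idx = np.where(cell_ids == gid)[0]
        selected.extend(idx[np.argsort(-similarities[idx])[:k]].tolist())
    return selected
```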
After a number of matching candidate images are determined, feature extraction needs to be performed on these images as well as the first image in order to find a second image matching the first image 111 from the reference image 112 at the feature data level. Fig. 4 illustrates a schematic diagram of an example system 400 for acquiring feature maps in accordance with an embodiment of the present disclosure.
It should be appreciated that the example system 400 of fig. 4 may be implemented in the keypoint matching module 122 of fig. 1. In fig. 4, in order to acquire a first sub-feature map of a first resolution and a second sub-feature map of a second resolution of the first image 411, and a third sub-feature map of the first resolution and a fourth sub-feature map of the second resolution of the candidate image 412, the first image 411 and the candidate image 412 may each be input into the convolutional neural network 420, thereby obtaining the first sub-feature map 421 of the first resolution and the third sub-feature map 422 of the first resolution, respectively. Thereafter, the first sub-feature map 421 and the third sub-feature map 422 are input to an FPN (feature pyramid network) to obtain the second sub-feature map 431 of the second resolution and the fourth sub-feature map 432 of the second resolution, respectively. As an example, the first resolution is less than the second resolution. As an example, coarse feature maps of the first image (from the query map) and the reference image (from the reference map) may be extracted using convolutional neural network 420. The coarse feature maps may then be upsampled using the FPN and added to feature maps of the same scale to obtain fine feature maps. Since feature maps with different resolutions can be used for different calculation tasks, this approach can save computational resources while ensuring calculation accuracy.
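Below is a minimal sketch, under assumed channel counts and strides, of producing a coarse (lower-resolution) and a fine (higher-resolution) feature map in the CNN + FPN style just described; it is not the exact backbone of the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseFineExtractor(nn.Module):
    """Toy CNN + FPN producing a coarse (1/8 scale) and a fine (1/2 scale) feature map."""
    def __init__(self, c_fine=64, c_coarse=128):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, c_fine, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(c_fine, c_coarse, 3, stride=4, padding=1), nn.ReLU())
        self.lateral = nn.Conv2d(c_fine, c_coarse, 1)        # FPN lateral connection

    def forward(self, x):
        f_half = self.stage1(x)                              # 1/2 resolution features
        f_coarse = self.stage2(f_half)                       # 1/8 resolution (the "first resolution")
        up = F.interpolate(f_coarse, size=f_half.shape[-2:], mode="bilinear", align_corners=False)
        f_fine = up + self.lateral(f_half)                   # 1/2 resolution (the "second resolution")
        return f_coarse, f_fine

extractor = CoarseFineExtractor()
coarse_q, fine_q = extractor(torch.randn(1, 3, 256, 256))    # first image (query)
coarse_r, fine_r = extractor(torch.randn(1, 3, 256, 256))    # candidate image
```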
After the feature maps are determined, they need to be matched to determine the location information of the key points in the first image. Fig. 5 illustrates a schematic diagram of a portion of an example system 500 for determining location information of a keypoint in accordance with an embodiment of the disclosure.
First, to determine the matching score, the computing device 120 may update the first sub-feature map 421 based on the first sub-feature map 421 and the position encoding 511 of the pixel location of the at least one pixel, and update the third sub-feature map 422 based on the third sub-feature map 422 and the position encoding 512 of the pixel location of the at least one pixel. Further, the updated first and third sub-feature maps are processed by the neural network model 520 to determine a matching score 530.
As shown in fig. 5, for the first sub-feature map 421 of the first resolution and the third sub-feature map 422 of the first resolution, position codes 511 and 512 are first generated and added to the respective feature maps. Thereafter, the first sub-feature map 421 with the position code 511 added and the third sub-feature map 422 with the position code 512 added are input to the neural network model 520 to determine a matching score 530 of at least one pixel in the first sub-feature map 421 and at least one pixel in the third sub-feature map 422. Further, the computing device 120 may select the location information of pixel pairs whose matching score is greater than a threshold score. As an example, the threshold score may be a value such as 0.2, 0.25, or 0.3. In this way, coarse matching at the feature level is achieved. Because coarse matching uses the lower-resolution feature maps, the matching process saves computational resources.
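As a hedged sketch of this coarse-matching stage, the snippet below adds a positional encoding to each low-resolution feature map, updates the features with a Transformer-style layer, scores all pixel pairs, and keeps pairs above a threshold. The dual-softmax score and the placeholder positional encoding are illustrative choices, not necessarily those of the disclosure; C is assumed to be divisible by the number of attention heads.

```python
import torch
import torch.nn as nn

def coarse_match(f_q, f_r, threshold=0.2):
    """f_q, f_r: (1, C, H, W) coarse feature maps of the query / candidate image."""
    B, C, H, W = f_q.shape
    pos = torch.randn(1, C, H, W)                    # placeholder positional encoding
    q = (f_q + pos).flatten(2).transpose(1, 2)       # (1, H*W, C)
    r = (f_r + pos).flatten(2).transpose(1, 2)

    layer = nn.TransformerEncoderLayer(d_model=C, nhead=4, batch_first=True)
    q, r = layer(q), layer(r)                        # Transformer update (self-attention only here)

    sim = torch.einsum("bic,bjc->bij", q, r) / C ** 0.5
    score = sim.softmax(dim=1) * sim.softmax(dim=2)  # dual-softmax matching score
    keep = (score > threshold).nonzero()             # (batch, query_idx, candidate_idx) triples
    return score, keep
```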
In certain embodiments, the neural network model 520 is a Transformer model.
After the location information of the keypoints is determined by coarse matching, a fine regression of the coarse results is required to determine the exact location information of the keypoints in the first image. Fig. 6 illustrates a schematic diagram of another portion of an example system 600 for determining location information of keypoints according to embodiments of the disclosure.
As shown in fig. 6, when at least one keypoint matches the coarse position information of the pixel pair, the computing device 120 may obtain a partial feature map from the second sub-feature map 431 of the second resolution based on the coarse position information 530 of the pixel pair. As an example, a partial feature map of a predetermined size may be cropped by the cropping module 611, centered on the coordinates indicated by the coarse position information 530. Accordingly, the computing device 120 may obtain a partial candidate feature map from the fourth sub-feature map 432 of the second resolution based on the location information of the keypoint in the candidate image. As an example, a partial candidate feature map of a predetermined size may be cropped by the cropping module 612, centered on the coordinates indicated by the position information 613 of the keypoint in the candidate image. Further, the computing device 120 may determine the location information 640 of the keypoint on the first image based on the cropped partial feature map and partial candidate feature map using the neural network model 630. It should be appreciated that the dimensions of the partial feature map and the partial candidate feature map are smaller than the dimensions of the first feature map and the candidate feature map. Preferably, the coordinates of the keypoint can further be regressed using an MLP (multi-layer perceptron), resulting in its corresponding coordinates on the first image. In this way, each feature vector to be matched corresponding to a pixel in the cropped partial feature map includes both the information of the feature map itself and the information of the corresponding feature map, so that the matching result can be more accurate.
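The following is a small sketch, with an assumed window size, of this fine-matching step: a window is cropped from each high-resolution feature map around the coarse match and the known keypoint, the keypoint's feature vector is correlated against the query window, and a sub-pixel location is regressed as the expectation over the resulting heatmap. The expectation-based regression stands in for the MLP regression mentioned above and is an assumption; the crop also assumes the window stays inside the feature map.

```python
import torch

def fine_locate(fine_q, fine_r, coarse_xy, keypoint_xy, win=5):
    """fine_q, fine_r: (1, C, H, W) fine feature maps; coarse_xy / keypoint_xy: integer (x, y) pixels."""
    def crop(fmap, xy):
        x, y = xy
        r = win // 2
        return fmap[:, :, y - r:y + r + 1, x - r:x + r + 1]   # (1, C, win, win) window

    patch_q = crop(fine_q, coarse_xy)          # partial feature map around the coarse match
    patch_r = crop(fine_r, keypoint_xy)        # partial candidate feature map around the keypoint

    # Correlate the keypoint's (center) feature vector against the query window -> heatmap.
    center = patch_r[:, :, win // 2, win // 2]                      # (1, C)
    heat = torch.einsum("bc,bcij->bij", center, patch_q)            # (1, win, win)
    heat = heat.flatten(1).softmax(-1).view(1, win, win)

    # Expected offset inside the window -> sub-pixel keypoint location on the first image.
    ys, xs = torch.meshgrid(torch.arange(win), torch.arange(win), indexing="ij")
    dx = (heat * xs).sum() - win // 2
    dy = (heat * ys).sum() - win // 2
    return coarse_xy[0] + dx.item(), coarse_xy[1] + dy.item()
```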
In certain embodiments, the neural network model 630 is a Transformer model.
Fig. 7 illustrates a flowchart of a process 700 of determining a transformation relationship between images according to an embodiment of the present disclosure. In some embodiments, process 700 may be implemented in computing device 120 of fig. 1. A process 700 for determining an image transformation relationship according to an embodiment of the present disclosure is now described with reference to fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
As shown in fig. 7, at 702, the computing device 120 may determine the number of matching points based on the location information of the at least one keypoint on the first image and the location information of the at least one keypoint on the second image. At 704, the number is compared to a threshold number. As an example, computing device 120 may determine whether the number of matching points for each query picture is greater than the threshold number, and discard the picture if the requirement is not met. When the number is greater than the threshold number, at 706, the computing device 120 may determine pose information for the first image. As an example, computing device 120 may calculate the pose of the query picture using a PnP (Perspective-n-Point) algorithm. Further, at 708, the computing device 120 may determine a transformation relationship of the first image to the second image based at least on the pose information of the first image. As an example, the computing device 120 may determine the transformation relationship (rotation, translation, scaling, etc.) of the first image to the second image using, for example, a RANSAC algorithm, taking the newly calculated pose of each query image as reference coordinates. Further, the transformation relationship may be applied to, for example, the query map, resulting in a fusion of the query map into the reference map.
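As a hedged sketch of this alignment step, the snippet below discards a query picture with too few matches, recovers its pose with PnP plus RANSAC via OpenCV (used here only for illustration; the disclosure does not mandate a specific library), and notes how the map-to-map transformation could then be fit from the aligned camera centers. The function name and threshold value are assumptions.

```python
import cv2
import numpy as np

def query_pose(points_3d, points_2d, K, min_matches=20):
    """points_3d: (M, 3) keypoints in the reference map; points_2d: (M, 2) matches on the query image."""
    if len(points_3d) < min_matches:
        return None                                   # too few matches: discard this query picture
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()                      # camera center of the query image

# With the camera centers of several query images known in both maps, a similarity
# transformation (rotation, translation, scale) can be fit robustly, for example with
# cv2.estimateAffine3D(src_centers, dst_centers), which performs RANSAC-style outlier rejection.
```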
In some embodiments, if the original reconstruction data of the query map exists (such as map points, matching relationships, and the like), the fine matching points can be used as control points, the pose obtained in the previous fusion step can be used as an initial pose, and nonlinear optimization can be adopted to further optimize the pose, thereby obtaining a fusion result with higher precision.
In some embodiments, to determine the location information of at least one keypoint on the first image 111, the computing device 120 may determine matching scores of a plurality of pixels in the first feature map and a plurality of pixels in the candidate feature map. The computing device 120 may also obtain the location information of pixel pairs having a matching score greater than a threshold score. If the at least one keypoint matches the acquired location information of a pixel pair, the computing device 120 may acquire a partial feature map from the first feature map based on the location information of the pixel pair. Further, the computing device 120 may obtain a partial candidate feature map from the candidate feature map based on the location information of the at least one keypoint in the candidate image. Based on the partial feature map and the partial candidate feature map, the computing device 120 may determine the location information of the at least one keypoint on the first image 111. It should be appreciated that although the feature maps in this embodiment have the same resolution, the location information of the keypoints can still be determined.
Through the above embodiments, the present disclosure provides a heterogeneous image fusion scheme that achieves keypoint matching using feature maps. According to the keypoint matching scheme, the three-dimensional coordinates of keypoints in images with different descriptors are transferred to other images, so that repeated point-by-point matching operations are avoided. In addition, the coarse-to-fine matching strategy can overcome problems such as weak texture and large texture differences, so that the fusion precision is higher.
Fig. 8 illustrates a schematic block diagram of an image processing apparatus 800 according to some embodiments of the present disclosure. As shown in fig. 8, the apparatus 800 may include a candidate image determining module 802 configured to determine a candidate image that matches a first image based on similarity information, the first image and the candidate image each including at least one keypoint, location information of the at least one keypoint in the candidate image being known, the candidate image including a second image. The apparatus 800 may further comprise a feature map obtaining module 804 configured to obtain a first feature map and a candidate feature map from the first image and the candidate image, respectively. The apparatus 800 may further comprise a location information determining module 806 for determining location information of the at least one keypoint on the second image on the first image by determining matched pairs of pixels in the first feature map and the candidate feature map. Further, the apparatus 800 may comprise a transformation relation determination module 808 for determining a transformation relation of the first image to the second image based on the position information of the at least one keypoint on the first image and the position information of the at least one keypoint on the second image.
In some embodiments, the location information determination module 806 may include: a matching score determination module configured to determine matching scores of a plurality of pixels in the first feature map and a plurality of pixels in the candidate feature map; a position information acquisition module configured to acquire position information of the pixel pairs whose matching score is greater than a threshold score; a partial feature map acquisition module configured to acquire a partial feature map from the first feature map based on the position information of the pixel pair in response to the at least one key point conforming to the position information of the pixel pair; a partial candidate feature map acquisition module configured to acquire a partial candidate feature map from the candidate feature map based on the position information of the at least one key point in the candidate image; and a determining module configured to determine location information of the at least one keypoint on the first image based on the partial feature map and the partial candidate feature map.
In some embodiments, the feature map acquisition module 804 is configured to: acquiring a first sub-feature map of a first resolution and a second sub-feature map of a second resolution of the first image; and acquiring a third sub-feature map of the first resolution and a fourth sub-feature map of the second resolution of the candidate image, the first resolution being less than the second resolution.
In some embodiments, the location information determination module 806 includes: a matching score determination module configured to determine a matching score of at least one pixel in the first sub-feature map and at least one pixel in the third sub-feature map; a position information acquisition module configured to acquire position information of the pixel pairs whose matching score is greater than a threshold score; a partial feature map acquisition module configured to acquire a partial feature map from the second sub-feature map based on the position information of the pixel pair in response to the at least one key point conforming to the position information of the pixel pair; a partial candidate feature map acquisition module configured to acquire a partial candidate feature map from the fourth sub-feature map based on the position information of the at least one key point in the candidate image; and a determining module configured to determine location information of the at least one keypoint on the first image based on the partial feature map and the partial candidate feature map.
In some embodiments, the dimensions of the partial feature map and the partial candidate feature map are smaller than the dimensions of the first feature map and the candidate feature map.
In some embodiments, the matching score determination module is configured to: updating the first sub-feature map based on the first sub-feature map and the pixel location of the at least one pixel; updating the third sub-feature map based on the third sub-feature map and the pixel location of the at least one pixel; and processing the updated first sub-feature map and the third sub-feature map through a neural network model to determine the matching score.
In some embodiments, the location information determination module 806 is configured to: updating the partial feature map based on the partial feature map and pixel positions of pixels in the partial feature map; updating the partial candidate feature map based on the partial candidate feature map and pixel positions of pixels in the partial candidate feature map; and processing the updated partial feature map and the partial candidate feature map through the neural network model to determine the position information of the at least one key point on the first image.
In certain embodiments, the neural network model is a Transformer model.
In some embodiments, the candidate image determination module 802 is configured to: acquiring at least the first image and pose information from a first map; acquiring the plurality of second images from a second map different from the first map; dividing the first image into a plurality of sub-images based on the pose information; determining similarity information of each of the plurality of sub-images with each of the plurality of second images; and determining the candidate image based on the similarity information.
In some embodiments, the transformation relationship determination module 808 is configured to: determining the number of matching points based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image; determining pose information for the first image in response to the number being greater than a threshold number; and determining a transformation relationship of the first image to the second image based at least on pose information of the first image.
In certain embodiments, the apparatus 800 further comprises: and the fusion module is used for determining a fusion image of the first image and the second image based on the transformation relation.
Fig. 9 shows a schematic block diagram of an example device 900 that may be used to implement embodiments of the present disclosure. As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various suitable actions and processes according to computer program instructions stored in a Random Access Memory (RAM) 903 and/or a Read Only Memory (ROM) 902 or computer program instructions loaded from a storage unit 908 into the RAM 903 and/or ROM 902. In the RAM 903 and/or ROM 902, various programs and data required for the operation of the device 900 may also be stored. The computing unit 901 and the RAM 903 and/or ROM 902 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the various methods and processes described above, such as processes 200, 300, 700. For example, in some embodiments, the processes 200, 300, 700 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the RAM 903 and/or ROM 902 and/or communication unit 909. One or more of the steps of the processes 200, 300, 700 described above may be performed when a computer program is loaded into RAM 903 and/or ROM 902 and executed by the computing unit 901. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the processes 200, 300, 700 by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Moreover, although operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (15)

1. An image processing method, the method comprising:
determining a candidate image that matches a first image based on similarity information, wherein the first image and the candidate image each comprise at least one key point, position information of the at least one key point in the candidate image is known, and the candidate image comprises a second image;
acquiring a first feature map and a candidate feature map from the first image and the candidate image, respectively;
determining, on the first image, position information of the at least one key point on the second image by determining matched pixel pairs in the first feature map and the candidate feature map; and
determining a transformation relationship of the first image to the second image based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image.
2. The method of claim 1, wherein determining the position information of the at least one key point on the first image comprises:
determining matching scores between a plurality of pixels in the first feature map and a plurality of pixels in the candidate feature map;
acquiring position information of pixel pairs whose matching scores are greater than a threshold score;
in response to the at least one key point conforming to the position information of the pixel pairs, acquiring a partial feature map from the first feature map based on the position information of the pixel pairs;
acquiring a partial candidate feature map from the candidate feature map based on the position information of the at least one key point in the candidate image; and
determining the position information of the at least one key point on the first image based on the partial feature map and the partial candidate feature map.
3. The method of claim 1, wherein respectively acquiring the first feature map and the candidate feature map comprises:
acquiring a first sub-feature map at a first resolution and a second sub-feature map at a second resolution of the first image; and
acquiring a third sub-feature map at the first resolution and a fourth sub-feature map at the second resolution of the candidate image, wherein the first resolution is lower than the second resolution.
4. The method of claim 3, wherein determining the position information of the at least one key point on the first image comprises:
determining a matching score between at least one pixel in the first sub-feature map and at least one pixel in the third sub-feature map;
acquiring position information of pixel pairs whose matching scores are greater than a threshold score;
in response to the at least one key point conforming to the position information of the pixel pairs, acquiring a partial feature map from the second sub-feature map based on the position information of the pixel pairs;
acquiring a partial candidate feature map from the fourth sub-feature map based on the position information of the at least one key point in the candidate image; and
determining the position information of the at least one key point on the first image based on the partial feature map and the partial candidate feature map.
5. The method of claim 2 or 4, wherein the dimensions of the partial feature map and the partial candidate feature map are smaller than the dimensions of the first feature map and the candidate feature map.
6. The method of claim 4, wherein determining the matching score comprises:
updating the first sub-feature map based on the first sub-feature map and the pixel position of the at least one pixel;
updating the third sub-feature map based on the third sub-feature map and the pixel position of the at least one pixel; and
processing the updated first sub-feature map and the updated third sub-feature map through a neural network model to determine the matching score.
7. The method of claim 6, wherein determining the position information of the at least one key point on the first image comprises:
updating the partial feature map based on the partial feature map and pixel positions of pixels in the partial feature map;
updating the partial candidate feature map based on the partial candidate feature map and pixel positions of pixels in the partial candidate feature map; and
processing the updated partial feature map and the updated partial candidate feature map through the neural network model to determine the position information of the at least one key point on the first image.
8. The method of claim 6 or 7, wherein the neural network model is a Transformer model.
9. The method of any of claims 1-4, wherein determining the candidate image that matches the first image comprises:
acquiring at least the first image and pose information from a first map;
acquiring a plurality of images from a second map different from the first map;
dividing the first image into a plurality of sub-images based on the pose information;
determining similarity information between each of the plurality of sub-images and each of the plurality of images; and
determining the candidate image from the plurality of images based on the similarity information.
10. The method of any of claims 1-4, wherein determining a transformation relationship of the first image to the second image comprises:
determining a number of matching points based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image;
determining pose information for the first image in response to the number being greater than a threshold number; and
determining the transformation relationship of the first image to the second image based at least on the pose information of the first image.
11. The method of any of claims 1-4, further comprising:
determining a fused image of the first image and the second image based on the transformation relationship.
12. An image processing apparatus, comprising:
a candidate image determination module configured to determine a candidate image that matches a first image based on similarity information, the first image and the candidate image each comprising at least one key point, position information of the at least one key point in the candidate image being known, and the candidate image comprising a second image;
a feature map acquisition module configured to acquire a first feature map and a candidate feature map from the first image and the candidate image, respectively;
a position information determination module configured to determine, on the first image, position information of the at least one key point on the second image by determining matched pixel pairs in the first feature map and the candidate feature map; and
a transformation relationship determination module configured to determine a transformation relationship of the first image to the second image based on the position information of the at least one key point on the first image and the position information of the at least one key point on the second image.
13. An electronic device, comprising:
at least one computing unit;
at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, wherein the instructions, when executed by the at least one computing unit, cause the electronic device to perform the method of any of claims 1-11.
14. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any of claims 1-11.
15. A computer program product comprising computer-executable instructions that, when executed by a processor, implement the method of any of claims 1-11.
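To make the claimed pipeline easier to follow, the sketches below walk through the main steps of claims 1-11 in short Python examples. They are illustrative sketches only, not the patented implementation; every function name, network, window size and threshold in them is an assumption introduced for the example. The first sketch corresponds to the candidate retrieval of claims 1 and 9: the first image is split into sub-images (a plain grid stands in here for the pose-guided split of claim 9) and the image from the second map with the highest similarity to any sub-image is selected as the candidate.

# Illustrative candidate retrieval for claims 1 and 9 (assumed names and
# a toy descriptor; not the patent's retrieval network).
import numpy as np


def split_into_subimages(image: np.ndarray, rows: int = 2, cols: int = 2):
    """Cut the first image into a grid of sub-images (stand-in for the
    pose-guided split of claim 9)."""
    h, w = image.shape[:2]
    return [image[r * h // rows:(r + 1) * h // rows,
                  c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]


def descriptor(image: np.ndarray) -> np.ndarray:
    """Toy global descriptor: per-channel mean, L2-normalised."""
    d = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-8)


def select_candidate(first_image, second_map_images):
    """Return the index of the second-map image whose similarity to any
    sub-image of the first image is highest (the 'similarity information')."""
    subs = [descriptor(s) for s in split_into_subimages(first_image)]
    best_idx, best_sim = -1, -np.inf
    for i, img in enumerate(second_map_images):
        d = descriptor(img)
        sim = max(float(s @ d) for s in subs)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx, best_sim


first = np.random.rand(128, 128, 3)
candidates = [np.random.rand(128, 128, 3) for _ in range(5)]
print(select_candidate(first, candidates))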
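Claims 3 and 4 ask for a coarse sub-feature map at a lower (first) resolution and a fine sub-feature map at a higher (second) resolution from each image. A minimal sketch, assuming a small convolutional backbone is acceptable (the claims do not specify the network, so the strides and channel count here are purely illustrative):

# Hypothetical two-resolution feature extractor for claims 3-4.
import torch
from torch import nn


class TwoResolutionBackbone(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.stem = nn.Sequential(            # 1/2 resolution: fine map
            nn.Conv2d(3, channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.down = nn.Sequential(            # 1/8 resolution: coarse map
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1))

    def forward(self, image: torch.Tensor):
        fine = self.stem(image)               # "second" sub-feature map
        coarse = self.down(fine)              # "first" sub-feature map
        return coarse, fine


backbone = TwoResolutionBackbone()
coarse_a, fine_a = backbone(torch.randn(1, 3, 256, 256))   # first image
coarse_b, fine_b = backbone(torch.randn(1, 3, 256, 256))   # candidate image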
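Claims 2, 4 and 5 describe a coarse-to-fine scheme: matching scores are computed between pixels of the coarse maps, pixel pairs above a threshold score are kept, and refinement is restricted to small "partial" feature maps cut around those positions (smaller than the full maps, per claim 5). A NumPy sketch under the assumption of cosine-similarity scores and a fixed 5x5 crop window:

# Coarse-to-fine matching sketch for claims 2/4/5. Feature maps are
# (H, W, C) arrays; the window size and threshold are illustrative.
import numpy as np

WINDOW = 5          # partial feature maps are 5x5 crops (claim 5: smaller)
THRESHOLD = 0.8     # hypothetical "threshold score"


def coarse_matches(feat_a, feat_b):
    """Return (ya, xa, yb, xb) for pixel pairs whose cosine matching
    score on the coarse maps exceeds THRESHOLD (claims 2/4)."""
    a = feat_a.reshape(-1, feat_a.shape[-1])
    b = feat_b.reshape(-1, feat_b.shape[-1])
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    scores = a @ b.T                                  # all pixel pairs
    ia, ib = np.nonzero(scores > THRESHOLD)
    ya, xa = np.unravel_index(ia, feat_a.shape[:2])
    yb, xb = np.unravel_index(ib, feat_b.shape[:2])
    return list(zip(ya, xa, yb, xb))


def crop_partial(feat, y, x, window=WINDOW):
    """Cut the partial (local) feature map centred on a matched pixel."""
    r = window // 2
    return feat[max(y - r, 0): y + r + 1, max(x - r, 0): x + r + 1]


def refine_keypoint(partial_a, partial_b):
    """Place the key point at the best-scoring pixel of the partial
    first-image map against the centre of the partial candidate map."""
    centre = partial_b[partial_b.shape[0] // 2, partial_b.shape[1] // 2]
    scores = partial_a @ centre
    dy, dx = np.unravel_index(int(np.argmax(scores)), scores.shape)
    return dy, dx   # offset of the key point inside the partial map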
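Claims 6 to 8 update each sub-feature map with the pixel positions of its pixels and let a neural network model, a Transformer per claim 8, produce the matching scores. A hedged PyTorch sketch; the concatenated positional channels, the self-attention-only encoder and the dot-product score read-out are assumptions made for illustration, not the patent's network:

# Sketch of claims 6-8: add pixel-position information to both feature
# maps, run them through a Transformer encoder, and read out pairwise
# matching scores. All hyper-parameters are illustrative.
import torch
from torch import nn


def add_positions(feat: torch.Tensor) -> torch.Tensor:
    """Append normalised (y, x) pixel positions to a (B, C, H, W) map
    and flatten it into a (B, H*W, C+2) token sequence."""
    b, c, h, w = feat.shape
    ys = torch.linspace(0, 1, h).view(1, 1, h, 1).expand(b, 1, h, w)
    xs = torch.linspace(0, 1, w).view(1, 1, 1, w).expand(b, 1, h, w)
    feat = torch.cat([feat, ys, xs], dim=1)          # "updated" map
    return feat.flatten(2).transpose(1, 2)           # tokens


class MatchingTransformer(nn.Module):
    def __init__(self, dim: int = 66, heads: int = 2, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, feat_a, feat_b):
        tokens_a = self.encoder(add_positions(feat_a))
        tokens_b = self.encoder(add_positions(feat_b))
        # one matching score per pixel pair (claim 6)
        return torch.einsum("bnc,bmc->bnm", tokens_a, tokens_b)


model = MatchingTransformer()
scores = model(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))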
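Finally, claims 1, 10 and 11 estimate the transformation relationship from the matched key point positions, but only when the number of matching points exceeds a threshold number, and then build a fusion image from it. A sketch using OpenCV's RANSAC homography and a simple blend; the 8-point threshold and the 50/50 blend weights are illustrative choices, not values from the patent:

# Hedged sketch of claims 1/10/11: estimate the first-to-second image
# transformation from matched key points and fuse the images with it.
import numpy as np
import cv2

MIN_MATCHES = 8  # hypothetical "threshold number" of matching points


def estimate_transform(points_on_first, points_on_second):
    """Return a 3x3 homography mapping the first image onto the second,
    or None when there are too few matched key points (claim 10)."""
    if len(points_on_first) <= MIN_MATCHES:
        return None
    H, _inliers = cv2.findHomography(
        np.asarray(points_on_first, dtype=np.float32),
        np.asarray(points_on_second, dtype=np.float32),
        cv2.RANSAC, 3.0)
    return H


def fuse_images(first_image, second_image, H):
    """Warp the first image into the second image's frame and blend the
    two, giving the fused image of claim 11."""
    h, w = second_image.shape[:2]
    warped = cv2.warpPerspective(first_image, H, (w, h))
    return cv2.addWeighted(warped, 0.5, second_image, 0.5, 0.0)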
CN202211405939.7A 2022-11-10 2022-11-10 Image processing method, apparatus, device, storage medium, and program product Pending CN118053004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211405939.7A CN118053004A (en) 2022-11-10 2022-11-10 Image processing method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211405939.7A CN118053004A (en) 2022-11-10 2022-11-10 Image processing method, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN118053004A true CN118053004A (en) 2024-05-17

Family

ID=91047060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211405939.7A Pending CN118053004A (en) 2022-11-10 2022-11-10 Image processing method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN118053004A (en)

Similar Documents

Publication Publication Date Title
CN111523414B (en) Face recognition method, device, computer equipment and storage medium
CN109960742B (en) Local information searching method and device
KR102592270B1 (en) Facial landmark detection method and apparatus, computer device, and storage medium
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
JP6756406B2 (en) Image processing equipment, image processing method and image processing program
CN113313763A (en) Monocular camera pose optimization method and device based on neural network
CN110930386B (en) Image processing method, device, equipment and storage medium
CN113344016A (en) Deep migration learning method and device, electronic equipment and storage medium
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN111986299A (en) Point cloud data processing method, device, equipment and storage medium
CN113793370A (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
CN116152334A (en) Image processing method and related equipment
CN111582013A (en) Ship retrieval method and device based on gray level co-occurrence matrix characteristics
CN115049731A (en) Visual mapping and positioning method based on binocular camera
CN113592015B (en) Method and device for positioning and training feature matching network
CN110880003B (en) Image matching method and device, storage medium and automobile
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN116069801B (en) Traffic video structured data generation method, device and medium
CN116740160A (en) Millisecond level multi-plane real-time extraction method and device in complex traffic scene
CN115239776B (en) Point cloud registration method, device, equipment and medium
CN118053004A (en) Image processing method, apparatus, device, storage medium, and program product
CN115619954A (en) Sparse semantic map construction method, device, equipment and storage medium
CN112257686B (en) Training method and device for human body posture recognition model and storage medium
CN112131902A (en) Closed loop detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication