CN113538310A - Image processing method and device, electronic equipment and storage medium - Google Patents

Image processing method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113538310A
Authority
CN
China
Prior art keywords
feature
image
wide
angle
tele
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110801048.2A
Other languages
Chinese (zh)
Inventor
孙文秀
谢佳芯
严琼
王腾飞
陈启峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TetrasAI Technology Co Ltd
Original Assignee
Shenzhen TetrasAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TetrasAI Technology Co Ltd filed Critical Shenzhen TetrasAI Technology Co Ltd
Priority to CN202110801048.2A priority Critical patent/CN113538310A/en
Publication of CN113538310A publication Critical patent/CN113538310A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06T3/02
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The present disclosure relates to an image processing method and apparatus, an electronic device, and a storage medium, the method including: acquiring a wide-angle image and a tele image of the same scene; performing feature extraction on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image; determining matching feature patches corresponding to the wide-angle feature patches of the wide-angle feature map from a plurality of tele feature patches of the tele feature map; for any wide-angle feature patch, transforming the matching feature patch corresponding to the wide-angle feature patch to obtain an alignment feature patch aligned with the wide-angle feature patch; and performing feature fusion on the plurality of alignment feature patches and the wide-angle feature map to obtain a fused target image. The embodiments of the present disclosure can help improve the image quality of the fused target image.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Camera fusion is an important task in computer vision, and aims to fuse clear texture details in a tele image acquired by a tele camera of the same electronic device (e.g., a mobile phone) into a wide image acquired by a wide camera of the electronic device.
With the development of deep learning technology, camera fusion in the related art can be performed based on image block matching or pixel-by-pixel matching. However, pixel-by-pixel matching is generally unstable and prone to distortion, while image block matching tends to produce misalignment within an image block and can introduce irrelevant noise information into the wide-angle image, thereby affecting the quality of the fused image.
Disclosure of Invention
The present disclosure proposes an image processing technical solution.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring a wide-angle image and a tele image for the same scene; performing feature extraction on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image; determining matching feature patches corresponding to the wide-angle feature patches of the wide-angle feature map from the plurality of tele feature patches of the tele feature map; for any wide-angle feature patch, transforming the matching feature patch corresponding to the wide-angle feature patch to obtain an alignment feature patch aligned with the wide-angle feature patch; and performing feature fusion on the plurality of alignment feature patches and the wide-angle feature map to obtain a fused target image. By this method, the introduction of irrelevant noise information into the wide-angle image can be reduced, and the image quality of the fused target image can be improved.
In one possible implementation, the determining, from a plurality of tele feature patches of the tele feature map, a matching feature patch corresponding to each of the wide feature patches of the wide feature map includes: respectively segmenting the wide-angle feature map and the tele feature map according to a preset segmentation rule to obtain a plurality of wide-angle feature map blocks and a plurality of tele feature map blocks; for any wide-angle feature pattern block, determining a matching feature pattern block corresponding to the wide-angle feature pattern block from the plurality of tele feature pattern blocks according to the similarity between the wide-angle feature pattern block and the plurality of tele feature pattern blocks. By the method, the best matched tele characteristic image block can be effectively found for each wide characteristic image block, namely, the matched characteristic image block corresponding to each wide characteristic image block is determined.
In one possible implementation, for any wide angle feature tile, transforming a matching feature tile corresponding to the wide angle feature tile to obtain an aligned feature tile aligned with the wide angle feature tile includes: determining, for any wide-angle feature tile, a mapping between the wide-angle feature tile and the matching feature tile, the mapping comprising an affine matrix between the wide-angle feature tile and the matching feature tile; and carrying out affine transformation on the matched feature pattern blocks according to the affine matrix to obtain the aligned feature pattern blocks. By the method, the alignment feature pattern blocks aligned with the wide-angle feature pattern blocks can be effectively obtained according to the affine matrix between the wide-angle feature pattern blocks and the matching feature pattern blocks.
In one possible implementation, the method further includes: for any wide-angle feature pattern block, determining a confidence level of the matching feature pattern block according to the similarity between the wide-angle feature pattern block and the corresponding matching feature pattern block, wherein the confidence level is used for indicating the matching degree of the wide-angle feature pattern block and the matching feature pattern block; wherein, the feature fusion is carried out on a plurality of alignment feature image blocks and the wide-angle feature image to obtain a fused target image, and the method comprises the following steps: splicing the plurality of alignment feature image blocks to obtain an alignment feature map; performing feature fusion on the wide-angle feature map and the aligned feature map according to the confidence degrees of the multiple matched feature map blocks to obtain a fused feature map; and generating the target image according to the fusion feature map. By the method, the introduction of irrelevant noise information into the wide-angle feature map can be reduced, more effective feature information in the fusion alignment feature map is facilitated, and the image quality of the target image is improved.
In one possible implementation manner, the performing feature fusion on the wide-angle feature map and the alignment feature map according to the confidence degrees of the plurality of matching feature patches to obtain a fused feature map includes: determining, among the confidence degrees of the plurality of matching feature patches, the confidence degrees higher than a confidence threshold; and performing feature fusion on the feature values indicated by the confidence degrees higher than the confidence threshold in the alignment feature map and the feature values of the wide-angle feature map to obtain the fused feature map. By this method, the introduction of irrelevant noise features into the fused feature map can be reduced, and the image quality of the target image can be improved.
In one possible implementation, the tele feature map includes I scales and the wide-angle feature map includes I scales, where I is a positive integer, and the generating the target image according to the fused feature map includes: up-sampling the fused feature map of the (i-1)-th scale to obtain a feature map of the i-th scale, where (I+1) ≥ i ≥ 2; performing feature fusion on the feature map of the i-th scale and the alignment feature map of the i-th scale to obtain a fused feature map of the i-th scale, where the alignment feature map of the i-th scale is obtained according to the tele feature map of the i-th scale and the wide-angle feature map of the i-th scale; and if i is equal to (I+1), decoding the feature map of the (I+1)-th scale to obtain the target image. By this method, the target image can be effectively generated from the multi-scale tele feature map and the multi-scale wide-angle feature map, the introduction of irrelevant noise information is reduced, and the image quality of the target image is improved.
In a possible implementation manner, the performing feature fusion on the feature map of the ith scale and the alignment feature map of the ith scale to obtain a fused feature map of the ith scale includes: performing feature fusion on the feature map of the ith scale, the alignment feature map of the ith scale and the wide-angle feature map of the ith scale to obtain a fused feature map of the ith scale. By this method, the loss of feature information in the wide-angle feature maps of different scales can be reduced, so that the image quality of the generated target image is higher.
In one possible implementation, the method further includes: recording the matching characteristic image blocks corresponding to the wide-angle characteristic image blocks through the index map; wherein, the splicing the plurality of alignment feature blocks to obtain the alignment feature map comprises: and splicing the plurality of alignment feature image blocks according to the index map to obtain an alignment feature map. By the method, the matching characteristic diagram blocks can be conveniently recorded through the index diagram, and meanwhile, the splicing of the alignment characteristic diagram can be quickly realized.
In one possible implementation, the method further includes: extracting image details of the tele image to obtain a detail image of the tele image; and carrying out image fusion on the target image and the detail image to obtain a fused image. By the method, the high-frequency details of the tele image can be fused into the target image, so that the fused image has higher image quality.
In a possible implementation manner, the image fusing the target image and the detail image to obtain a fused image includes: according to the similarity between a plurality of target image blocks of the target image and a plurality of detail image blocks of the detail image, respectively determining a matching image block corresponding to each target image block from the plurality of detail image blocks, and determining the confidence coefficient of the matching image block; aiming at any target image block, transforming the matched image block according to the mapping relation between the target image block and the corresponding matched image block to obtain an aligned image block aligned with the target image block; and carrying out image fusion on the plurality of aligned image blocks and the target image according to the confidence degrees of the plurality of matched image blocks to obtain the fused image. By the method, irrelevant noise information introduced into the target image can be reduced, more useful image details in the fused detail image can be obtained, and the image quality of the fused image can be improved.
In one possible implementation, the image processing method is implemented by an image processing network, which includes a feature extraction sub-network, an alignment attention sub-network, and an adaptive fusion sub-network; wherein the performing feature extraction on the wide-angle image and the tele image to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image includes: performing feature extraction on the acquired wide-angle image and tele image through the feature extraction sub-network to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image. By this method, the image processing method can be implemented efficiently and accurately through the image processing network.
In one possible implementation, the alignment attention sub-network includes: a matching network layer, an alignment network layer and a mapping relationship determination network layer; wherein the determining, from the plurality of tele feature patches of the tele feature map, a matching feature patch corresponding to each wide-angle feature patch of the wide-angle feature map includes: determining, by the matching network layer, the matching feature patches corresponding to the respective wide-angle feature patches of the wide-angle feature map from the plurality of tele feature patches of the tele feature map. By this method, the image processing method can be implemented efficiently and accurately through the image processing network.
In one possible implementation, the transforming, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch to obtain an alignment feature patch aligned with the wide-angle feature patch includes: determining, by the mapping relationship determination network layer, the mapping relationship between the wide-angle feature patch and the matching feature patch; and transforming, by the alignment network layer and for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch according to the mapping relationship between the wide-angle feature patch and the matching feature patch, to obtain the alignment feature patch aligned with the wide-angle feature patch. By this method, the image processing method can be implemented efficiently and accurately through the image processing network.
In one possible implementation, the adaptive fusion subnetwork comprises: a confidence learning network layer and a fusion network layer; wherein, the feature fusion is carried out on a plurality of alignment feature image blocks and the wide-angle feature image to obtain a fused target image, and the method comprises the following steps: determining the confidence of each matched feature pattern block through the confidence learning network layer; and performing feature fusion on the plurality of aligned feature pattern blocks and the wide-angle feature pattern through a fusion network layer according to the confidence degrees of the matched feature pattern blocks to obtain a fused target image. By the method, the image processing method can be efficiently and accurately realized through the image processing network.
According to an aspect of the present disclosure, there is provided an image processing apparatus including: the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a wide-angle image and a tele image aiming at the same scene; the characteristic extraction module is used for carrying out characteristic extraction on the wide-angle image and the tele image to obtain a wide-angle characteristic diagram of the wide-angle image and a tele characteristic diagram of the tele image; a matching module, configured to determine, from a plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide feature patches of the wide feature map; the transformation module is used for transforming the matching feature pattern blocks corresponding to any wide-angle feature pattern block to obtain an alignment feature pattern block aligned with the wide-angle feature pattern block; and the characteristic fusion module is used for carrying out characteristic fusion on the plurality of aligned characteristic image blocks and the wide-angle characteristic image to obtain a fused target image.
In one possible implementation, the matching module includes: the segmentation submodule is used for segmenting the wide-angle feature map and the tele feature map respectively according to a preset segmentation rule to obtain a plurality of wide-angle feature image blocks and a plurality of tele feature image blocks; and the matching sub-module is used for determining a matching characteristic image block corresponding to the wide-angle characteristic image block from the plurality of tele characteristic image blocks according to the similarity between the wide-angle characteristic image block and the plurality of tele characteristic image blocks aiming at any wide-angle characteristic image block.
In one possible implementation, the transformation module includes: a mapping relationship determination submodule to determine, for any wide-angle feature tile, a mapping relationship between the wide-angle feature tile and the matching feature tile, the mapping relationship comprising an affine matrix between the wide-angle feature tile and the matching feature tile; and the transformation submodule is used for carrying out affine transformation on the matched feature pattern blocks according to the affine matrix to obtain the aligned feature pattern blocks.
In one possible implementation, the apparatus further includes: a confidence determination module, configured to determine, for any wide-angle feature block, a confidence of the matching feature block according to a similarity between the wide-angle feature block and the corresponding matching feature block, where the confidence is used to indicate a matching degree between the wide-angle feature block and the matching feature block; wherein, the fusion module comprises: the characteristic image block splicing submodule is used for splicing the plurality of alignment characteristic image blocks to obtain an alignment characteristic image; the feature fusion submodule is used for carrying out feature fusion on the wide-angle feature map and the alignment feature map according to the confidence degrees of a plurality of matched feature map blocks to obtain a fusion feature map; and the target image generation submodule is used for generating the target image according to the fusion feature map.
In one possible implementation manner, the performing feature fusion on the wide-angle feature map and the aligned feature map according to the confidence degrees of the plurality of matched feature patches to obtain a fused feature map includes: determining a confidence in the confidence of the plurality of matching feature patches that is above a confidence threshold; and performing feature fusion on the feature value indicated by the confidence coefficient higher than the confidence coefficient threshold value in the alignment feature map and the feature value of the wide-angle feature map to obtain a fused feature map.
In one possible implementation, the tele feature map includes I scales and the wide-angle feature map includes I scales, where I is a positive integer, and the generating the target image according to the fused feature map includes: up-sampling the fused feature map of the (i-1)-th scale to obtain a feature map of the i-th scale, where (I+1) ≥ i ≥ 2; performing feature fusion on the feature map of the i-th scale and the alignment feature map of the i-th scale to obtain a fused feature map of the i-th scale, where the alignment feature map of the i-th scale is obtained according to the tele feature map of the i-th scale and the wide-angle feature map of the i-th scale; and if i is equal to (I+1), decoding the feature map of the (I+1)-th scale to obtain the target image.
In a possible implementation manner, the performing feature fusion on the feature map of the ith scale and the alignment feature map of the ith scale to obtain a fused feature map of the ith scale includes: and performing feature fusion on the feature map of the ith scale, the alignment feature map of the ith scale and the wide-angle feature map of the ith scale to obtain a fusion feature map of the ith scale.
In one possible implementation, the apparatus further includes: the recording module is used for recording the matching characteristic image blocks corresponding to the wide-angle characteristic image blocks through the index map; wherein, the splicing the plurality of alignment feature blocks to obtain the alignment feature map comprises: and splicing the plurality of alignment feature image blocks according to the index map to obtain an alignment feature map.
In one possible implementation, the apparatus further includes: the image detail extraction module is used for extracting the image details of the tele image to obtain a detail image of the tele image; and the image fusion module is used for carrying out image fusion on the target image and the detail image to obtain a fusion image.
In one possible implementation, the image fusion module includes: the image block matching sub-module is used for respectively determining a matching image block corresponding to each target image block from the plurality of detail image blocks according to the similarity between the plurality of target image blocks of the target image and the plurality of detail image blocks of the detail image, and determining the confidence coefficient of the matching image block; the image block transformation sub-module is used for transforming the matched image blocks according to the mapping relation between the target image blocks and the corresponding matched image blocks aiming at any target image block to obtain aligned image blocks aligned with the target image blocks; and the image fusion sub-module is used for carrying out image fusion on the plurality of aligned image blocks and the target image according to the confidence degrees of the plurality of matched image blocks to obtain the fused image.
In one possible implementation, the image processing device is implemented by an image processing network comprising a feature extraction sub-network, an alignment attention sub-network, and an adaptive fusion sub-network; wherein the performing feature extraction on the wide-angle image and the tele image to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image includes: performing feature extraction on the acquired wide-angle image and tele image through the feature extraction sub-network to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image.
In one possible implementation, the alignment attention sub-network includes: a matching network layer, an alignment network layer and a mapping relationship determination network layer; wherein the determining, from the plurality of tele feature patches of the tele feature map, a matching feature patch corresponding to each wide-angle feature patch of the wide-angle feature map includes: determining, by the matching network layer, the matching feature patches corresponding to the respective wide-angle feature patches of the wide-angle feature map from the plurality of tele feature patches of the tele feature map.
In one possible implementation, the transforming, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch to obtain an alignment feature patch aligned with the wide-angle feature patch includes: determining, by the mapping relationship determination network layer, the mapping relationship between the wide-angle feature patch and the matching feature patch; and transforming, by the alignment network layer and for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch according to the mapping relationship between the wide-angle feature patch and the matching feature patch, to obtain the alignment feature patch aligned with the wide-angle feature patch.
In one possible implementation, the adaptive fusion subnetwork comprises: a confidence learning network layer and a fusion network layer; wherein, the feature fusion is carried out on a plurality of alignment feature image blocks and the wide-angle feature image to obtain a fused target image, and the method comprises the following steps: determining the confidence of each matched feature pattern block through the confidence learning network layer; and performing feature fusion on the plurality of aligned feature pattern blocks and the wide-angle feature pattern through a fusion network layer according to the confidence degrees of the matched feature pattern blocks to obtain a fused target image.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, matching feature image blocks corresponding to the wide-angle feature image blocks are determined from the plurality of tele feature image blocks, so that the matching feature image blocks can be accurately determined by using image features in a feature space; and then, the matching characteristic pattern blocks are transformed to obtain an alignment characteristic pattern aligned with the wide-angle characteristic pattern, so that the dislocation phenomenon of the alignment characteristic pattern blocks and the wide-angle characteristic pattern blocks can be reduced, the introduction of irrelevant noise information into the wide-angle image is reduced, and the image quality of the fused target image is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
Fig. 2a shows a schematic diagram of a wide-angle image according to an embodiment of the present disclosure.
Fig. 2b shows a schematic diagram of a tele image according to an embodiment of the disclosure.
FIG. 3a shows a schematic diagram of a signature graph according to an embodiment of the present disclosure.
Fig. 3b shows a schematic diagram of a feature tile according to an embodiment of the present disclosure.
FIG. 4 illustrates a schematic diagram of a detail image in accordance with an embodiment of the present disclosure.
FIG. 5 shows a block diagram of an image processing network according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure.
Fig. 7 illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
FIG. 8 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Fig. 9 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure, which may be performed by an electronic device such as a terminal device or a server, the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer-readable instruction stored in a memory. Alternatively, the method may be performed by a server. As shown in fig. 1, the image processing method includes:
in step S11, a wide image and a tele image for the same scene are acquired;
in step S12, feature extraction is performed on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image;
in step S13, a matching feature pattern block corresponding to each wide-angle feature pattern block of the wide-angle feature map is determined from the plurality of tele feature pattern blocks of the tele feature map;
in step S14, for any wide-angle feature pattern block, transforming the matching feature pattern block corresponding to the wide-angle feature pattern block to obtain an alignment feature pattern block aligned with the wide-angle feature pattern block;
in step S15, feature fusion is performed on the multiple alignment feature patches and the wide-angle feature map, so as to obtain a fused target image.
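For orientation only, the following is a minimal sketch of how steps S11 to S15 might be chained together in code; the names encoder, matcher, aligner and fuser are hypothetical stand-ins for the sub-networks described later in this disclosure, not components defined by it.

```python
def fuse_wide_and_tele(wide_img, tele_img, encoder, matcher, aligner, fuser):
    # S12: extract a wide-angle feature map and a tele feature map
    wide_feat = encoder(wide_img)
    tele_feat = encoder(tele_img)
    # S13: for each wide-angle feature patch, find the matching tele feature patch
    match_idx, confidence = matcher(wide_feat, tele_feat)
    # S14: transform each matching patch into an alignment patch
    aligned_patches = aligner(wide_feat, tele_feat, match_idx)
    # S15: fuse the alignment patches with the wide-angle feature map
    return fuser(wide_feat, aligned_patches, confidence)
```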
In one possible implementation, in step S11, the wide-angle image may be acquired by a wide-angle camera installed on the electronic device, and the tele image may be acquired by a tele camera installed on the electronic device. It should be understood that the positions of the wide-angle camera and the tele camera on the electronic device may be fixed, for example, the wide-angle camera and the tele camera of a mobile phone. It can be understood that the wide-angle image and the tele image respectively acquired by the fixed-position wide-angle camera and tele camera can be for the same scene.
Fig. 2a shows a schematic diagram of a wide-angle image and fig. 2b shows a schematic diagram of a tele image according to an embodiment of the present disclosure. As shown in fig. 2a, the wide-angle image collected by the wide-angle camera has a large view angle, but the texture details of distant objects may not be clear enough; as shown in fig. 2b, the tele image collected by the tele camera has a small view angle but captures clear texture details of distant objects. It should be understood that by fusing the sharp texture details captured in the tele image into the wide-angle image, a high-quality image with a large viewing angle and sharp texture details can be obtained.
In one possible implementation manner, in step S12, feature extraction may be performed on the wide-angle image and the tele image through a feature extraction network, that is, an encoding network, to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image. The feature extraction network may be, for example, a convolutional neural network, and the embodiments of the present disclosure are not limited to the network structure, the network type, and the like of the feature extraction network.
The wide-angle feature map and the tele-feature map extracted by the feature extraction network may be in multiple scales, respectively, and the scales of the wide-angle feature map and the tele-feature map may correspond to each other. For example, the wide-angle feature map may include 128 × 128, 64 × 64, 32 × 32, etc. dimensions, and correspondingly, the tele feature map may also include 128 × 128, 64 × 64, 32 × 32, etc. dimensions.
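As an illustration of such a multi-scale feature extraction network, a toy PyTorch encoder producing three scales might look as follows; the channel widths and strides are assumptions made for the sketch, not the structure used in this disclosure.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Illustrative encoder producing feature maps at three scales
    (e.g. 128x128, 64x64 and 32x32 for a 128x128 input)."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.s1 = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())
        self.s2 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.s3 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.s1(x)   # full resolution
        f2 = self.s2(f1)  # 1/2 resolution
        f3 = self.s3(f2)  # 1/4 resolution
        return f1, f2, f3

enc = MultiScaleEncoder()
f1, f2, f3 = enc(torch.randn(1, 3, 128, 128))  # shapes: 128x128, 64x64, 32x32
```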
It should be understood that the smaller the feature map scale, the more prominent and semantically rich the features, and the higher the accuracy for determining the feature similarity. In step S13, determining a matching feature patch corresponding to each wide-angle feature patch of the wide-angle feature map from the plurality of tele feature patches of the tele feature map may include: determining a matching characteristic image block corresponding to each wide-angle characteristic image block from the plurality of tele characteristic image blocks according to the similarity between the plurality of wide-angle characteristic image blocks of the wide-angle characteristic image with the minimum scale and the plurality of tele characteristic image blocks of the tele characteristic image with the minimum scale.
The wide-angle characteristic image blocks can be obtained by segmenting the wide-angle characteristic image according to a preset segmentation rule, and the tele characteristic image blocks can be obtained by segmenting the tele characteristic image according to the segmentation rule. The segmentation rule may include a segmentation step size and a segmentation size.
For example, fig. 3a shows a schematic diagram of a feature map according to an embodiment of the present disclosure, and fig. 3b shows a schematic diagram of feature patches according to an embodiment of the present disclosure. For the 4 × 4 feature map shown in fig. 3a, the 4 feature patches shown in fig. 3b can be obtained by performing segmentation according to a segmentation rule with a segmentation step size of 1 and a segmentation size of 3 × 3.
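A minimal sketch of this segmentation using PyTorch's unfold, reproducing the example above (a 4 × 4 feature map, step size 1, size 3 × 3, yielding 4 patches):

```python
import torch
import torch.nn.functional as F

feat = torch.arange(16.0).reshape(1, 1, 4, 4)        # toy 4x4, single-channel feature map
cols = F.unfold(feat, kernel_size=3, stride=1)       # (1, 9, 4): 4 columns of 3x3 values
patches = cols.transpose(1, 2).reshape(1, 4, 3, 3)   # step size 1, segmentation size 3x3
print(patches.shape)                                 # torch.Size([1, 4, 3, 3])
```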
It should be understood that a higher similarity may represent a higher degree of matching between a wide-angle feature patch and a tele feature patch. Determining a matching feature patch corresponding to each wide-angle feature patch of the wide-angle feature map from the plurality of tele feature patches may include: for any wide-angle feature patch, taking the tele feature patch with the highest similarity to the wide-angle feature patch as the matching feature patch corresponding to the wide-angle feature patch.
The similarity between the wide angle feature blocks and the telephoto feature blocks may be determined by using a known similarity calculation method, for example, cosine similarity, euclidean distance, and the like, which is not limited in the embodiment of the present disclosure.
It should be understood that a misalignment may exist between the matching feature patch determined based on the above segmentation and similarity and the corresponding wide-angle feature patch. The misalignment between the matching feature patch and the wide-angle feature patch can be reduced by transforming the matching feature patch into the alignment feature patch in step S14, so that less irrelevant noise information is merged into the wide-angle feature map and the image quality of the fused target image is improved.
In one possible implementation, transforming the matching feature tile corresponding to the wide-angle feature tile to obtain an alignment feature tile aligned with the wide-angle feature map in step S14 may include: and according to the mapping relation between the wide-angle characteristic image block and the corresponding matching characteristic image block, carrying out geometric transformation on the matching characteristic image block to obtain an alignment characteristic image block with the position aligned with the wide-angle characteristic image block. The geometric transformation may include an affine transformation or a projective transformation, among others.
The mapping relationship between the wide-angle feature pattern block and the corresponding matching feature pattern block can be determined through a neural network (such as a convolutional neural network), and the mapping relationship can be represented by an affine matrix or a projection matrix. The embodiments of the present disclosure are not limited to the network type, network structure, training mode, and the like of the neural network.
It should be noted that the above determination of the mapping relationship through the neural network is an implementation manner disclosed in the embodiments of the present disclosure, and in fact, a person skilled in the art may determine the mapping relationship between the wide-angle feature pattern block and the matching feature pattern block in any manner known in the art, and the embodiments of the present disclosure are not limited thereto. For example, an affine matrix or a projection matrix may be calculated according to coordinates of a pair of matching feature points between the wide-angle feature pattern block and the matching feature pattern block, that is, a mapping relationship between the wide-angle feature pattern block and the matching feature pattern block is obtained.
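For instance, assuming three pairs of matching feature points are available (the coordinates below are made up for illustration), the affine matrix and the warped patch could be computed with OpenCV as follows; this only sketches the alternative mentioned above, not the neural-network approach of this disclosure.

```python
import numpy as np
import cv2

# Three pairs of matching feature points (hypothetical coordinates)
pts_wide = np.float32([[10, 10], [40, 12], [12, 42]])   # in the wide-angle feature patch
pts_tele = np.float32([[13, 11], [43, 14], [14, 45]])   # in the matching tele feature patch

A = cv2.getAffineTransform(pts_tele, pts_wide)            # 2x3 affine matrix, tele -> wide
tele_patch = np.random.rand(48, 48).astype(np.float32)    # stand-in for the matching patch
aligned_patch = cv2.warpAffine(tele_patch, A, (48, 48))   # warped to align with the wide patch
```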
In one possible implementation manner, in step S15, performing feature fusion on the multiple alignment feature patches and the wide-angle feature map to obtain a fused target image, which may include: splicing the multiple alignment feature blocks to obtain an alignment feature map; and performing feature fusion on the alignment feature map and the wide-angle feature map to obtain a fused target image.
Wherein, splicing the plurality of alignment feature blocks to obtain the alignment feature map may include: and splicing the alignment feature pattern blocks corresponding to the wide-angle feature pattern blocks according to the positions of the wide-angle feature pattern blocks relative to the wide-angle feature pattern to obtain the alignment feature pattern.
The feature fusion between the alignment feature map and the wide-angle feature map may be implemented by using a feature fusion manner known in the art, for example, a manner of adding (add) the two feature maps and keeping the number of channels unchanged, or a manner of merging (concat) the two feature maps in the channel dimension and increasing the number of channels may be used, which is not limited in the embodiment of the present disclosure.
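A minimal sketch of the two fusion options mentioned above, with illustrative tensor shapes:

```python
import torch

wide_feat = torch.randn(1, 64, 32, 32)     # wide-angle feature map (illustrative shape)
aligned_feat = torch.randn(1, 64, 32, 32)  # alignment feature map of the same shape

fused_add = wide_feat + aligned_feat                      # "add": channel count unchanged (64)
fused_cat = torch.cat([wide_feat, aligned_feat], dim=1)   # "concat": channel count doubled (128)
```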
As described above, the wide-angle feature map and the corresponding alignment feature map may be feature maps with the smallest scale, and in one possible implementation, the feature map obtained by fusing the alignment feature map and the wide-angle feature map may be subjected to up-sampling, decoding, and the like step by step to obtain a fused target image.
In the embodiment of the disclosure, by determining the matching feature pattern blocks corresponding to the wide-angle feature pattern blocks from the plurality of tele feature pattern blocks, the matching feature pattern blocks can be more accurately determined by using image features in the feature space; and then, the matching characteristic pattern blocks are transformed to obtain an alignment characteristic pattern aligned with the wide-angle characteristic pattern, so that the dislocation phenomenon of the alignment characteristic pattern blocks and the wide-angle characteristic pattern blocks can be reduced, irrelevant noise information is reduced from being blended into the wide-angle image, and the image quality of the fused target image is improved.
As described above, the wide-angle feature map and the telephoto feature map may be segmented by a preset segmentation rule, where the segmentation rule includes a segmentation step size and a segmentation size. In one possible implementation, in step S13, determining a matching feature patch corresponding to each of the wide feature patches of the wide feature map from the plurality of tele feature patches of the tele feature map includes:
respectively segmenting the wide-angle characteristic diagram and the tele characteristic diagram according to a preset segmentation rule to obtain a plurality of wide-angle characteristic diagram blocks and a plurality of tele characteristic diagram blocks;
and for any wide-angle characteristic image block, determining a matching characteristic image block corresponding to the wide-angle characteristic image block from the plurality of tele characteristic image blocks according to the similarity between the wide-angle characteristic image block and the plurality of tele characteristic image blocks.
It should be understood that different numbers of wide feature patches and tele feature patches may be segmented based on different segmentation rules and different scales of wide and tele feature patches.
As described above, the higher the similarity indicates a higher matching degree between the two feature image blocks, where determining a matching feature image block corresponding to the wide-angle feature image block from the plurality of tele feature image blocks according to the similarities between the wide-angle feature image block and the plurality of tele feature image blocks may include: and taking the tele characteristic image block corresponding to the maximum similarity in the similarities between the wide-angle characteristic image block and the tele characteristic image blocks as a matching characteristic image block corresponding to the wide-angle characteristic image block.
In one possible implementation, the similarity between the wide angle feature patch and the plurality of tele feature patches may be calculated by equation (1):
s_{m,n} = < P_m^{LR} / ||P_m^{LR}||, P_n^{Ref} / ||P_n^{Ref}|| >    (1)

wherein s_{m,n} represents the cosine similarity between the m-th wide-angle feature patch and the n-th tele feature patch; P_m^{LR} represents the m-th wide-angle feature patch of the wide-angle feature map φ(I_LR) of the wide-angle image I_LR; P_n^{Ref} represents the n-th tele feature patch of the tele feature map φ(I_Ref) of the tele image I_Ref; and φ represents the feature extraction network. The matching feature patch may be represented as P_matched = P^{Ref}_{argmax_n s_{m,n}}, that is, the tele feature patch corresponding to the maximum cosine similarity is taken as the matching feature patch.
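A small PyTorch sketch of equation (1) and the arg-max matching; it assumes each feature patch has already been flattened into a vector, which is a simplification made for the example.

```python
import torch
import torch.nn.functional as F

def match_patches(wide_patches, tele_patches):
    """For each wide-angle patch, return the index of the most similar tele patch
    and the corresponding cosine similarity (used later as a confidence cue)."""
    # wide_patches: (M, D), tele_patches: (N, D) -- flattened feature patches
    wide_n = F.normalize(wide_patches, dim=1)
    tele_n = F.normalize(tele_patches, dim=1)
    sim = wide_n @ tele_n.t()            # s_{m,n}: (M, N) cosine similarities
    conf, idx = sim.max(dim=1)           # best-matching tele patch per wide patch
    return idx, conf

wide = torch.randn(16, 288)   # M = 16 wide-angle patches, D = 3*3*32 features (illustrative)
tele = torch.randn(20, 288)   # N = 20 tele patches
idx, conf = match_patches(wide, tele)
```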
In the embodiment of the present disclosure, the best matching tele characteristic pattern block can be effectively found for each wide characteristic pattern block, that is, the matching characteristic pattern block corresponding to each wide characteristic pattern block is determined.
As described above, the mapping relationship between the wide-angle feature patches and the matching feature patches may be characterized using an affine matrix or a projection matrix. In one possible implementation, in step S14, transforming the matching feature tile to obtain an alignment feature tile aligned with the wide-angle feature tile for any wide-angle feature tile may include:
determining a mapping relation between the wide-angle feature pattern blocks and the matching feature pattern blocks aiming at any wide-angle feature pattern block, wherein the mapping relation comprises an affine matrix between the wide-angle feature pattern blocks and the matching feature pattern blocks;
and carrying out affine transformation on the matched feature image blocks according to the affine matrix to obtain the aligned feature image blocks.
The affine matrix can reflect the geometric transformation relationship between the wide-angle feature pattern block and the corresponding matching feature pattern block. And performing affine transformation on the matching feature pattern blocks according to the affine matrix to obtain the alignment feature pattern blocks with the positions aligned with the wide-angle feature pattern blocks.
The affine matrix between the wide-angle feature pattern block and the corresponding matching feature pattern block can be determined by formula (2):
A = T(concat(P_LR, P_matched))    (2)

wherein A represents the affine matrix, T represents a neural network for determining the affine matrix, P_LR represents the wide-angle feature patch, P_matched represents the matching feature patch, and concat(P_LR, P_matched) represents the concatenation of the wide-angle feature patch and the matching feature patch; the concatenated feature patch is input into T to obtain A.
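A hedged PyTorch sketch of formula (2) followed by the affine warping of the matching patch; the layer sizes of the network T below are assumptions, and affine_grid/grid_sample are used here only as one way to apply a predicted affine matrix to a feature patch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffinePredictor(nn.Module):
    """Hypothetical stand-in for the network T in formula (2): it maps the
    concatenated (wide, matched) patches to a 2x3 affine matrix A."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )

    def forward(self, wide_patch, matched_patch):
        x = torch.cat([wide_patch, matched_patch], dim=1)   # concat(P_LR, P_matched)
        return self.net(x).view(-1, 2, 3)                   # affine matrix A

def align_patch(matched_patch, theta):
    # Warp the matching tele patch so that it is aligned with the wide-angle patch.
    grid = F.affine_grid(theta, matched_patch.shape, align_corners=False)
    return F.grid_sample(matched_patch, grid, align_corners=False)

wide_p = torch.randn(4, 32, 12, 12)     # 4 wide-angle patches (illustrative sizes)
matched_p = torch.randn(4, 32, 12, 12)  # their matching tele patches
T = AffinePredictor(channels=32)
aligned_p = align_patch(matched_p, T(wide_p, matched_p))
```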
In the embodiment of the present disclosure, the alignment feature pattern block aligned with the wide-angle feature pattern block can be effectively obtained according to the affine matrix between the wide-angle feature pattern block and the matching feature pattern block.
It should be understood that the field of view of the wide-angle image is larger than that of the tele image, so that there are areas in the wide-angle image that are not covered by the tele image. If the alignment feature map and the wide-angle feature map are directly fused, the target image generated based on the fused feature map may contain more irrelevant noise information.
That is, the matching feature pattern block determined in the above manner has the highest similarity with respect to other tele feature pattern blocks, but there may be a case where the similarity between the matching feature pattern block and the wide-angle feature pattern block is actually lower, and based on this, if the region with lower similarity in the alignment feature pattern is also fused to the wide-angle feature pattern, irrelevant feature information may be introduced into the wide-angle feature pattern, that is, irrelevant noise information may be introduced into the wide-angle image, which affects the image quality of the target image.
For example, suppose the similarities between a wide-angle feature patch P_m^{LR} and four tele feature patches P_1^{Ref}, P_2^{Ref}, P_3^{Ref} and P_4^{Ref} are 0.01, 0.011, 0.009 and 0.0089, respectively. By the above method, the matching feature patch corresponding to the wide-angle feature patch P_m^{LR} is P_2^{Ref}, but the similarity between P_2^{Ref} and the wide-angle feature patch P_m^{LR} is in fact still low.
In one possible implementation, the method further includes:
for any wide-angle feature pattern block, determining a confidence level of the matching feature pattern block according to the similarity between the wide-angle feature pattern block and the corresponding matching feature pattern block, wherein the confidence level indicates the matching degree between the wide-angle feature pattern block and the matching feature pattern block.
In one possible implementation, determining the confidence of the matching feature patch according to the similarity between the wide-angle feature patch and the corresponding matching feature patch may include: directly taking the similarity between the wide-angle feature patch and the corresponding matching feature patch as the confidence of the matching feature patch. The confidence of the matching feature patch of the m-th wide-angle feature patch can be represented as c_m = max_n(s_{m,n}), that is, the maximum cosine similarity among the cosine similarities between the m-th wide-angle feature patch and the plurality of tele feature patches.
In one possible implementation, determining the confidence of the matching feature pattern block according to the similarity between the wide-angle feature pattern block and the corresponding matching feature pattern block may further include: the similarity between the wide-angle feature image blocks and the matching feature image blocks is enhanced (can be understood as weighting) or amplified through the learnable neural network, so that the confidence degrees with more obvious similarity difference are obtained, and therefore in the process of indicating feature fusion based on the confidence degrees, the feature fusion under different confidence degrees can be processed more accurately and flexibly.
Based on the above determined confidence of the matching feature image blocks, in one possible implementation manner, in step S15, feature fusion is performed on a plurality of alignment feature image blocks and the wide-angle feature map, so as to obtain a fused target image, including:
splicing the multiple alignment feature blocks to obtain an alignment feature map; fusing the wide-angle feature map and the alignment feature map according to the confidence degrees of the plurality of matched feature map blocks to obtain a fused feature map; and generating a target image according to the fusion feature map.
The process of splicing the alignment feature blocks in the embodiments of the present disclosure may be referred to above, and details are not repeated herein.
In a possible implementation manner, the wide-angle feature map and the alignment feature map are fused according to the confidence degrees of the multiple matched feature map blocks through a fusion network, so that a fusion feature map is obtained. The confidence degrees of the multiple matched feature image blocks can be used for guiding the fusion network to realize the fusion of the wide-angle feature image and the alignment feature image. The embodiments of the present disclosure are not limited to the network structure, the network type, and the training mode of the converged network.
In one possible implementation, the fusing the wide-angle feature map and the aligned feature map according to the confidence degrees of the multiple matching feature image blocks to obtain a fused feature map may include: and fusing the characteristic value indicated by the higher confidence degree in the alignment characteristic map with the characteristic value in the wide-angle characteristic map. In this way, the introduction of extraneous noise information is advantageously reduced.
In a possible implementation manner, the fusing the wide-angle feature map and the aligned feature map according to the confidence degrees of the multiple matching feature image blocks to obtain a fused feature map, which may further include: and weighting the feature values in the alignment feature map according to the confidence degrees of the multiple matching feature map blocks, and fusing the weighted alignment feature map and the wide-angle feature map. In this way, the weight of the feature value indicated by higher confidence coefficient can be increased, and the weight of the feature value indicated by lower confidence coefficient can be decreased, so as to fuse and align more useful feature information in the feature map, which is beneficial to reducing the introduction of irrelevant noise information.
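One way the weighting described here might look, assuming the per-patch confidences have already been broadcast into a spatial confidence map with values in [0, 1]; a sketch, not the patent's exact fusion network.

```python
import torch

def weighted_fusion(wide_feat, aligned_feat, conf_map):
    # conf_map: (B, 1, H, W); higher confidence gives the aligned features more weight
    return wide_feat + conf_map * aligned_feat
```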
As described above, aligning the feature map with the wide-angle feature map may be a feature map with a smallest scale, wherein generating the target image from the fused feature map may include: and performing upsampling, encoding and other processing on the fusion characteristic graph to generate a target image.
In the embodiment of the disclosure, based on the confidence of the matching feature image block, the fusion of the alignment feature image and the wide-angle feature image is indicated, so that the introduction of irrelevant noise information into the wide-angle feature image can be reduced, more effective feature information in the fusion alignment feature image is facilitated, and the image quality of the target image is improved.
As described above, the feature value indicated by the higher confidence in the aligned feature map may be fused with the feature value in the wide-angle feature map to obtain a fused feature map, and in one possible implementation, the fusing the wide-angle feature map with the aligned feature map according to the confidence of the multiple matching feature map blocks to obtain the fused feature map includes:
determining a confidence level above a confidence level threshold among the confidence levels of the plurality of matched feature patches;
and fusing the characteristic value indicated by the confidence coefficient higher than the confidence coefficient threshold value in the alignment characteristic diagram with the characteristic value of the wide-angle characteristic diagram to obtain a fused characteristic diagram.
The confidence threshold may be determined based on the confidences of a plurality of matching feature patches, for example, 0.5 times of the average of the confidences may be set as the confidence threshold, or the confidence threshold may also be manually set according to historical experience, which is not limited by the embodiment of the present disclosure.
It should be understood that a higher confidence may represent a higher similarity between the alignment feature patch (i.e., the matching feature patch) and the wide-angle feature patch, and a feature value indicated by a confidence above the confidence threshold may be considered as a feature value in a region of the alignment feature map that has a higher similarity to the wide-angle feature map (i.e., the region in which a higher-similarity alignment feature patch is located).
In one possible implementation, fusing the feature value indicated by the confidence higher than the confidence threshold in the aligned feature map with the feature value of the wide-angle feature map may include: adding the feature value indicated by the confidence higher than the confidence threshold in the alignment feature map to the feature value of the wide-angle feature map to obtain the fused feature map; or it may include: weighting the feature value indicated by the confidence higher than the confidence threshold in the alignment feature map by the confidence indicating that feature value to obtain a weighted feature value, and adding the weighted feature value to the feature value of the wide-angle feature map to obtain the fused feature map, which is not limited by the embodiment of the present disclosure.
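For illustration only, the following sketch (Python/PyTorch, with hypothetical variable names and shapes) shows one way the threshold-based selection and the confidence-weighting strategy described above could be realized; it is a sketch under these assumptions, not the specific network used in the embodiments.

    import torch

    def fuse_by_threshold(wide_feat, aligned_feat, confidence, threshold):
        # wide_feat, aligned_feat: (C, H, W); confidence: (1, H, W)
        mask = (confidence > threshold).float()   # keep only feature values indicated by high confidences
        return wide_feat + aligned_feat * mask    # add the selected aligned feature values to the wide-angle features

    def fuse_by_weighting(wide_feat, aligned_feat, confidence):
        # weight every aligned feature value by its confidence before adding it
        return wide_feat + aligned_feat * confidence

    wide_feat = torch.randn(64, 32, 32)
    aligned_feat = torch.randn(64, 32, 32)
    confidence = torch.rand(1, 32, 32)
    threshold = 0.5 * confidence.mean()           # e.g. 0.5 times the mean confidence as the threshold
    fused = fuse_by_threshold(wide_feat, aligned_feat, confidence, threshold)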
In the embodiment of the disclosure, by fusing the feature value indicated by the confidence higher than the confidence threshold in the aligned feature map with the feature value of the wide-angle feature map, the introduction of irrelevant noise features into the fused feature map can be reduced, and the image quality of the target image can be improved.
As described above, the tele feature map may include multiple scales, and the wide-angle feature map may include multiple scales. In one possible implementation, the tele feature map includes I scales and the wide-angle feature map includes I scales, I being a positive integer, wherein generating the target image from the fused feature map includes:
up-sampling the fused feature map of the (i-1)-th scale to obtain the feature map of the i-th scale, wherein I+1 ≥ i ≥ 2;
fusing the feature map of the ith scale with the alignment feature map of the ith scale to obtain a fused feature map of the ith scale, wherein the alignment feature map of the ith scale is obtained according to the tele feature map of the ith scale and the wide feature map of the ith scale;
and if i is equal to I+1, decoding the feature map of the (I+1)-th scale to obtain the target image.
For example, assuming that I is 3, the fused feature map of the 1st scale may be obtained by fusing the wide-angle feature map of the 1st scale with the corresponding alignment feature map of the 1st scale; the fused feature map of the 1st scale is up-sampled to obtain the feature map of the 2nd scale; the feature map of the 2nd scale is fused with the alignment feature map of the 2nd scale to obtain the fused feature map of the 2nd scale; similarly, the fused feature map of the 2nd scale is up-sampled to obtain the feature map of the 3rd scale; the feature map of the 3rd scale is fused with the alignment feature map of the 3rd scale to obtain the fused feature map of the 3rd scale; the fused feature map of the 3rd scale is up-sampled to obtain the feature map of the 4th scale, and the feature map of the 4th scale is decoded to obtain the target image. It should be understood that the feature map of the 4th scale may have the same scale as the wide-angle image.
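A minimal sketch of this coarse-to-fine loop, assuming I = 3 scales, PyTorch tensors of shape (N, C, H, W), and placeholder fusion and decoding operators rather than the networks actually used:

    import torch
    import torch.nn.functional as F

    def coarse_to_fine(wide_feat_1, aligned_feats, fuse, decode):
        # wide_feat_1: wide-angle feature map of the 1st (smallest) scale
        # aligned_feats: list of alignment feature maps, one per scale, smallest first
        fused = fuse(wide_feat_1, aligned_feats[0])                # fused feature map of the 1st scale
        for i in range(1, len(aligned_feats)):
            feat = F.interpolate(fused, scale_factor=2, mode='bilinear', align_corners=False)
            fused = fuse(feat, aligned_feats[i])                   # fused feature map of the i-th scale
        final = F.interpolate(fused, scale_factor=2, mode='bilinear', align_corners=False)
        return decode(final)                                       # decode the (I+1)-th scale feature map

    # toy usage with stand-in fuse / decode operators
    fuse = lambda a, b: a + b
    decode = lambda x: x
    wide_feat_1 = torch.randn(1, 8, 16, 16)
    aligned_feats = [torch.randn(1, 8, 16 * 2 ** j, 16 * 2 ** j) for j in range(3)]
    target = coarse_to_fine(wide_feat_1, aligned_feats, fuse, decode)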
The alignment feature image blocks may be obtained according to the tele feature map of the i-th scale and the wide-angle feature map of the i-th scale in the manner of steps S13 to S14 described above in the embodiments of the present disclosure, and the alignment feature image blocks may be spliced to obtain the alignment feature map of the i-th scale. In this way, the alignment feature maps determined at the respective scales can be more accurate.
In a possible implementation manner, the fusing the feature map of the ith scale with the alignment feature map of the ith scale to obtain a fused feature map of the ith scale may include: determining confidence for fusing the feature map of the ith scale and the aligned feature map of the ith scale; and according to the confidence coefficient, fusing the feature map of the ith scale and the alignment feature map of the ith scale to obtain a fused feature map of the ith scale.
In one possible implementation, determining the confidence for fusing the feature map of the i-th scale and the alignment feature map of the i-th scale may include: up-sampling, step by step, the plurality of confidences determined according to the wide-angle feature map of the 1st scale and the tele feature map of the 1st scale, to obtain the confidence for fusing the feature map of the i-th scale with the alignment feature map of the i-th scale. By this method, the amount of computation for determining the confidence can be reduced, and the image fusion efficiency can be improved.
In a possible implementation manner, determining a confidence for fusing the feature map of the ith scale and the aligned feature map of the ith scale may further include: according to the above method for determining the confidence degrees of multiple matching feature image blocks in the embodiment of the present disclosure, the confidence degree for fusing the feature image of the ith scale and the alignment feature image of the ith scale is determined according to the wide-angle feature image of the ith scale and the tele feature image of the ith scale. By this means, the determined confidence level can be made more accurate.
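To illustrate the step-by-step up-sampling approach mentioned above, a small sketch (assuming bilinear interpolation, PyTorch tensors of shape (N, 1, H, W), and hypothetical names) could look as follows:

    import torch.nn.functional as F

    def upsample_confidences(conf_1, num_scales):
        # conf_1: confidence map determined at the 1st (smallest) scale
        confs = [conf_1]
        for _ in range(num_scales - 1):
            confs.append(F.interpolate(confs[-1], scale_factor=2, mode='bilinear', align_corners=False))
        return confs  # one confidence map per scale, smallest first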
In a possible implementation manner, the process of fusing the feature map of the i-th scale with the alignment feature map of the i-th scale to obtain the fused feature map of the i-th scale can be expressed as formula (3). In formula (3), the feature map Fi of the i-th scale and the alignment feature map of the i-th scale are fused, under the guidance of the confidence Ci, through the network layers g and h of the fusion network, to obtain the fused feature map of the i-th scale, wherein Ci represents the confidence for fusing the feature map of the i-th scale with the alignment feature map of the i-th scale, and Fi represents the feature map of the i-th scale. It can be understood that Ci may, for example, include a plurality of confidences cm.
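A hedged sketch of such a confidence-guided fusion block follows (Python/PyTorch); the particular composition of the layers g and h shown here, and the residual form of the output, are assumptions for illustration rather than the exact formula (3) of the disclosure.

    import torch
    import torch.nn as nn

    class ConfidenceGuidedFusion(nn.Module):
        # fuses the feature map of the i-th scale with the alignment feature map of the i-th scale
        def __init__(self, channels):
            super().__init__()
            self.g = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)  # stand-in for network layer g
            self.h = nn.Conv2d(channels, channels, kernel_size=3, padding=1)      # stand-in for network layer h

        def forward(self, feat_i, aligned_i, conf_i):
            # conf_i: confidence map broadcastable to (N, channels, H, W)
            mixed = self.g(torch.cat([feat_i, aligned_i], dim=1))   # combine the two feature maps
            return feat_i + self.h(mixed * conf_i)                  # confidence-guided fusion (assumed residual form)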
In a possible implementation manner, the fusing the feature map of the ith scale with the alignment feature map of the ith scale to obtain a fused feature map of the ith scale may include: and fusing the feature map of the ith scale, the alignment feature map of the ith scale and the wide-angle feature map of the ith scale to obtain a fused feature map of the ith scale. By the method, the feature information in the wide-angle feature maps with different scales is fused, so that the loss of the feature information in the wide-angle feature maps with different scales can be reduced, and the image quality of the generated target image is higher.
In a possible implementation manner, the process of fusing the feature map of the i-th scale, the alignment feature map of the i-th scale, and the wide-angle feature map of the i-th scale to obtain the fused feature map of the i-th scale can be expressed as formula (4). Formula (4) expresses that the feature map obtained by fusing Fi with the alignment feature map of the i-th scale under the guidance of Ci is further fused with the wide-angle feature map of the i-th scale to obtain the fused feature map of the i-th scale.
It should be understood that the fusion between the feature maps can be achieved by using the above-mentioned feature fusion method known in the art; the fused feature map of the I-th scale may be the feature map of the largest scale obtained by fusion.
It should be understood that the feature map of the (I+1)-th scale may have the same scale as the wide-angle image. A decoding network corresponding to the encoding network may be adopted to decode the feature map of the (I+1)-th scale to obtain the target image; the embodiments of the present disclosure do not limit the network structure, the network type, and the like of the decoding network.
In the embodiment of the disclosure, the target image can be effectively generated aiming at the multi-scale tele characteristic diagram and the multi-scale wide characteristic diagram, which is beneficial to reducing the introduction of irrelevant noise information and improving the image quality of the target image.
In one possible implementation, the method further includes: recording, through an index map, the matching feature image blocks corresponding to the respective wide-angle feature image blocks; wherein the splicing the multiple alignment feature blocks to obtain the alignment feature map may include: splicing the multiple alignment feature blocks according to the index map to obtain the alignment feature map.
As described above, the matching feature pattern blocks corresponding to the respective wide-angle feature pattern blocks may be determined according to the similarities between the plurality of wide-angle feature pattern blocks and the plurality of tele feature pattern blocks, and the matching feature pattern blocks corresponding to the respective wide-angle feature pattern blocks may be recorded in the form of an index map, thereby facilitating subsequent splicing of the alignment feature patterns.
In one possible implementation, a plurality of tele feature blocks may be numbered in advance, so that the index map may record the numbers of matching feature blocks; or the index map may also directly record the position coordinates of the matching feature pattern blocks relative to the tele feature pattern to record the matching feature pattern blocks corresponding to the wide feature pattern blocks, which is not limited in this embodiment of the disclosure.
As described above, the alignment feature tiles are feature tiles obtained by transforming matching feature tiles, and the index map may indicate the alignment feature tiles corresponding to the respective wide-angle feature tiles. It can be understood that the position of each matching feature pattern block recorded in the index map may also reflect a splicing position when each alignment feature pattern block is spliced, and in a possible implementation manner, the splicing a plurality of alignment feature pattern blocks according to the index map to obtain the alignment feature map may include: and splicing the plurality of alignment feature image blocks according to the recorded positions of the matching feature image blocks in the index map to obtain the alignment feature map.
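As a sketch only, assuming non-overlapping square patches, NumPy arrays, and hypothetical names, splicing the alignment feature image blocks according to an index map could look as follows:

    import numpy as np

    def stitch_by_index(aligned_patches, index_map, patch_size, channels):
        # aligned_patches: dict mapping a tele feature patch number to its aligned patch, shape (channels, p, p)
        # index_map: 2-D array; entry (r, c) is the number of the matching patch for that wide-angle patch position
        p = patch_size
        rows, cols = index_map.shape
        aligned_map = np.zeros((channels, rows * p, cols * p), dtype=np.float32)
        for r in range(rows):
            for c in range(cols):
                aligned_map[:, r * p:(r + 1) * p, c * p:(c + 1) * p] = aligned_patches[int(index_map[r, c])]
        return aligned_map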
It should be understood that, recording the matching feature pattern blocks through the index map is one implementation manner provided in the embodiments of the present disclosure, and a person skilled in the art may select any information recording manner, for example, the matching feature patterns corresponding to the respective wide-angle feature pattern blocks may also be recorded through a matrix, an array, and the like, which is not limited to the embodiments of the present disclosure.
In the embodiment of the disclosure, the matching feature map blocks can be conveniently and effectively recorded through the index map, and meanwhile, the splicing of the alignment feature maps can be quickly realized, so that the alignment feature maps corresponding to dimensions such as the scale and the channel of the wide-angle feature map can be effectively obtained.
It is considered that, when the tele image is fused into the wide-angle image in the feature space by the embodiments of the present disclosure described above, high-frequency details of the tele image may be lost, where the high-frequency details may characterize image details of the tele image with strong gray-level variation, for example, at image edges of the tele image. In one possible implementation, the method further includes:
extracting image details of the tele image to obtain a detail image of the tele image;
and carrying out image fusion on the target image and the detail image to obtain a fused image.
In one possible implementation, the image details of the tele image may be extracted through a residual network; or the detail image of the tele image may be obtained by performing a Fourier transform on the tele image to extract the image details of the tele image, which is not limited by the embodiments of the present disclosure. The image details may be the high-frequency details of the tele image. Fig. 4 illustrates a schematic diagram of a detail image according to an embodiment of the present disclosure; as shown in Fig. 4, the detail image may reflect the high-frequency details of the tele image.
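For illustration, one simple high-pass operation that yields such a high-frequency detail image is subtracting a Gaussian-blurred copy of the tele image; this is a stand-in sketch, and the residual-network or Fourier-transform variants mentioned above would replace it. The function and parameter names are hypothetical.

    import numpy as np
    import cv2

    def extract_detail_image(tele_image, blur_kernel=11):
        # high-pass: subtract a Gaussian-blurred copy of the tele image to keep only high-frequency details
        blurred = cv2.GaussianBlur(tele_image, (blur_kernel, blur_kernel), 0)
        return tele_image.astype(np.float32) - blurred.astype(np.float32)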
The image fusion technique known in the art, for example, a weighted average method, a wavelet transform method, a high-pass filtering method, etc., may be used to implement image fusion between the target image and the detail image to obtain a fused image, which is not limited in the embodiment of the present disclosure.
In the embodiment of the disclosure, the high-frequency details of the tele image are fused into the target image, so that the fused image has higher image quality.
In a possible implementation manner, image fusion is performed on the target image and the detail image to obtain a fused image, including:
according to the similarity between a plurality of target image blocks of a target image and a plurality of detail image blocks of a detail image, respectively determining a matching image block corresponding to each target image block from the plurality of detail image blocks, and determining the confidence coefficient of the matching image block;
aiming at any target image block, transforming the matched image block according to the mapping relation between the target image block and the corresponding matched image block to obtain an aligned image block aligned with the target image block;
and carrying out image fusion on the plurality of aligned image blocks and the target image according to the confidence degrees of the plurality of matched image blocks to obtain a fused image.
In a possible implementation manner, the target image and the detail image may be respectively segmented according to a preset image segmentation rule to obtain a plurality of target image blocks and a plurality of detail image blocks. The image segmentation rule may be the same as the segmentation rule used for segmenting the feature maps; alternatively, considering that the resolution of an image is generally large, an independently set image segmentation rule may also be used, for example, an image segmentation step size of 2 and an image segmentation size of 5 × 5, which is not limited by the embodiments of the present disclosure.
In a possible implementation manner, determining, according to similarities between a plurality of target image blocks of a target image and a plurality of detail image blocks of a detail image, a matching image block corresponding to each target image block from the plurality of detail image blocks may include: and for any target image block, determining a matched image block corresponding to the target image block from the plurality of detail image blocks according to the similarity between the target image block and the plurality of detail image blocks.
The similarity between the target image blocks and the detail image blocks may be determined by using a known similarity calculation method, for example, cosine similarity, euclidean distance, and the like, which is not limited in the embodiment of the present disclosure.
Determining a matching image block corresponding to the target image block from the plurality of detail image blocks according to the similarity between the target image block and the plurality of detail image blocks may include: and taking the detail image block with the highest similarity with the target image block as a matching image block corresponding to the target image block.
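A minimal sketch (NumPy, hypothetical names) of matching each target image block to its most similar detail image block by cosine similarity, with the similarity reused as the confidence:

    import numpy as np

    def match_blocks(target_blocks, detail_blocks):
        # target_blocks: (M, d), detail_blocks: (N, d); each row is a flattened image block
        t = target_blocks / (np.linalg.norm(target_blocks, axis=1, keepdims=True) + 1e-8)
        d = detail_blocks / (np.linalg.norm(detail_blocks, axis=1, keepdims=True) + 1e-8)
        sim = t @ d.T                       # cosine similarity between every target block and detail block
        match_index = sim.argmax(axis=1)    # most similar detail block = matching image block
        confidence = sim.max(axis=1)        # its similarity can serve as the confidence
        return match_index, confidence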
Considering that the matching image block determined by the above method may have a misalignment with the target image block, the misalignment between the target image block and the matching image block can be reduced by transforming the matching image block according to the mapping relationship between the target image block and the corresponding matching image block, and the irrelevant noise information is reduced to be blended into the target image, which is beneficial to improving the image quality of the blended image.
Wherein, transforming the matching image block according to the mapping relationship between the target image block and the corresponding matching image block to obtain an aligned image block aligned with the target image block may include: determining an affine matrix between the target image block and the corresponding matching image block; performing affine transformation on the matched image blocks according to the affine matrix to obtain aligned image blocks; the method can also comprise the following steps: determining a projection matrix between the target image block and the corresponding matching image block; and performing projection transformation on the matched image blocks according to the projection matrix to obtain aligned image blocks, which is not limited in the embodiment of the present disclosure.
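A hedged OpenCV sketch of the affine-transform variant described above, assuming point correspondences between the matching image block and the target image block are already available (how they are obtained is not specified here):

    import numpy as np
    import cv2

    def align_matching_block(matching_block, src_pts, dst_pts):
        # src_pts / dst_pts: (K, 2) corresponding points in the matching block and the target block, K >= 3
        affine, _ = cv2.estimateAffine2D(np.asarray(src_pts, np.float32), np.asarray(dst_pts, np.float32))
        h, w = matching_block.shape[:2]
        return cv2.warpAffine(matching_block, affine, (w, h))   # aligned image block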
In one possible implementation, determining the confidence of the matching image block according to the similarities between the plurality of target image blocks of the target image and the plurality of detail image blocks of the detail image may include: directly taking the similarity between the target image block and the corresponding matching image block as the confidence of the matching image block; or enhancing (which may be understood as weighting) or amplifying the similarity between the target image block and the matching image block through a learnable neural network, so as to obtain confidences with more significant similarity differences, which is not limited by the embodiments of the present disclosure.
It should be understood that the detail image is generated based on the tele image, the target image is generated based on the wide image, and the field of view of the wide image is larger than that of the tele image, so that there is also an area in the target image that is not covered by the detail image, and if the alignment image of the detail image is directly image-fused with the target image, the uncovered area may introduce irrelevant noise information into the target image. The confidence degrees can reflect the matching degree between the aligned image blocks and the target image blocks, and the aligned image blocks and the target image are subjected to image fusion according to the confidence degrees of the matched image blocks, so that irrelevant noise information introduced into the target image can be reduced, and effective image information in a detailed image can be fused more.
In a possible implementation manner, performing image fusion on the multiple aligned image blocks and the target image according to the confidence degrees of the multiple matched image blocks to obtain a fused image may include: splicing the plurality of aligned image blocks according to the position of the target image block relative to the target image to obtain an aligned image; and carrying out image fusion on the aligned image and the target image according to the confidence degrees of the plurality of matched image blocks to obtain a fused image.
The image stitching technology known in the art may be adopted to implement stitching of multiple aligned images to obtain aligned images, and the embodiments of the present disclosure are not limited thereto.
The image fusion of the aligned image and the target image according to the confidence degrees of the plurality of matched image blocks to obtain a fused image may include: determining a confidence level that is higher than a confidence level threshold among the confidence levels of the plurality of matched image blocks; and carrying out image fusion on the pixel value indicated by the confidence coefficient higher than the confidence coefficient threshold value in the aligned image and the pixel value in the target image to obtain a fused image. By the method, the pixel value indicated by the confidence coefficient higher than the confidence coefficient threshold value in the aligned image is subjected to image fusion with the pixel value of the target image, so that irrelevant noise information can be reduced from being introduced into the fused image, and the image quality of the fused image can be improved.
Wherein, according to the confidence degrees of the plurality of matching image blocks, the image fusion is performed on the aligned image and the target image to obtain a fused image, and the method further comprises the following steps: and weighting the pixel values in the aligned images according to the confidence degrees of the plurality of matched image blocks, and carrying out image fusion on the weighted aligned images and the target image to obtain a fused image. By the method, the weight of the pixel value indicated by the higher confidence coefficient can be increased, the weight of the pixel value indicated by the lower confidence coefficient can be reduced, and the introduction of irrelevant noise information into the target image can be reduced by fusing more effective image information in the detail image.
In a possible implementation manner, the process of performing image fusion on the plurality of aligned image blocks and the target image according to the confidences of the plurality of matching image blocks to obtain the fused image can be expressed as formula (5). In formula (5), the aligned image of the detail image IHF and the target image, obtained by decoding the fused feature map of the I-th scale through the decoding network, are fused under the guidance of k(C) to obtain the fused image ISR, wherein ISR represents the fused image, C represents the confidences of the plurality of matching image blocks, and k represents a neural network used for enhancing or amplifying the confidences C.
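Only as an illustrative sketch (Python/PyTorch): a small learnable network k that enhances the confidence map C before it weights the fusion of the aligned image with the target image. The layer choices and the additive form of the fusion are assumptions, not the network actually used by the disclosure.

    import torch
    import torch.nn as nn

    class GuidedImageFusion(nn.Module):
        def __init__(self):
            super().__init__()
            # k: a small learnable network that enhances / amplifies the confidence map C
            self.k = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(8, 1, kernel_size=3, padding=1), nn.Sigmoid())

        def forward(self, target_image, aligned_image, confidence):
            weight = self.k(confidence)                    # k(C)
            return target_image + weight * aligned_image   # fuse the aligned image under the guidance of k(C)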
In the embodiment of the disclosure, the dislocation phenomenon between the detail image blocks and the target image blocks can be reduced, the introduction of irrelevant noise information into the target image can be reduced, more useful image details in the detail image can be fused, and the improvement of the image quality of the fused image is facilitated.
In a possible implementation manner, the image processing method is implemented by an image processing network, and fig. 5 shows a schematic diagram of a framework of an image processing network according to an embodiment of the present disclosure, and as shown in fig. 5, the image processing network includes a feature extraction sub-network, an attention alignment sub-network, and an adaptive fusion sub-network.
In step S12, the feature extraction of the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image includes: and performing feature extraction on the acquired wide-angle image and the acquired tele image through a feature extraction sub-network to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image.
In one possible implementation, as shown in fig. 5, the alignment attention sub-network includes: a matching network layer, an alignment network layer, and a mapping relation determining network layer. In step S13, determining a matching feature pattern block corresponding to each of the wide-angle feature pattern blocks from the plurality of tele feature pattern blocks of the tele feature map includes: determining, through the matching network layer, the matching feature pattern blocks corresponding to the respective wide-angle feature pattern blocks of the wide-angle feature map from the plurality of tele feature pattern blocks of the tele feature map.
In one possible implementation, in step S14, transforming, for any wide-angle feature pattern block, the matching feature pattern block corresponding to the wide-angle feature pattern block to obtain the alignment feature pattern block aligned with the wide-angle feature pattern block includes: determining, through the mapping relation determining network layer, the mapping relation between the wide-angle feature pattern block and the matching feature pattern block; and transforming, through the alignment network layer and for any wide-angle feature pattern block, the matching feature pattern block corresponding to the wide-angle feature pattern block according to the mapping relation between the wide-angle feature pattern block and the matching feature pattern block, to obtain the alignment feature pattern block aligned with the wide-angle feature pattern block.
In one possible implementation, as shown in fig. 5, the adaptive fusion sub-network includes: a confidence learning network layer and a fusion network layer. In step S15, performing feature fusion on the multiple alignment feature patches and the wide-angle feature map to obtain a fused target image, including: determining the confidence of each matched feature pattern block through a confidence learning network layer; and performing feature fusion on the plurality of aligned feature pattern blocks and the wide-angle feature pattern through the fusion network layer according to the confidence degrees of the matched feature pattern blocks to obtain a fused target image.
The method steps implemented by each sub-network and each network layer of each sub-network in the image processing network may refer to the image processing method in the embodiment of the present disclosure, which are not described herein again.
It should be understood that the network framework of the image processing network is an implementation manner disclosed in the embodiments of the present disclosure, and a person skilled in the art may design a network structure, a network type, a training manner, and the like of the image processing network according to actual needs to enable the image processing network to implement the image processing method of the embodiments of the present disclosure. The embodiments of the present disclosure are not limited to the network structures and network types of the sub-networks in the image processing network and the network layers in the sub-networks.
In the embodiment of the disclosure, the image processing method can be efficiently and accurately realized through an image processing network.
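A high-level skeleton, under stated assumptions, of how the three sub-networks could be wired together (Python/PyTorch); every class name, argument, and layer choice below is illustrative rather than the disclosed architecture, and the concrete sub-network modules are left as inputs.

    import torch.nn as nn

    class ImageProcessingNetwork(nn.Module):
        def __init__(self, feature_extractor, alignment_attention, adaptive_fusion):
            super().__init__()
            self.feature_extractor = feature_extractor      # feature extraction sub-network
            self.alignment_attention = alignment_attention  # matching / mapping-relation / alignment layers
            self.adaptive_fusion = adaptive_fusion          # confidence learning + fusion layers

        def forward(self, wide_image, tele_image):
            wide_feat = self.feature_extractor(wide_image)
            tele_feat = self.feature_extractor(tele_image)
            aligned_feat, confidence = self.alignment_attention(wide_feat, tele_feat)
            return self.adaptive_fusion(wide_feat, aligned_feat, confidence)   # fused target image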
Fig. 6 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method includes:
extracting, through a feature extraction network, a multi-scale wide-angle feature map FLR of the wide-angle image ILR and a multi-scale tele feature map FRef of the tele image IRef;

calculating, through a similarity calculation module, the similarities between a plurality of wide-angle feature image blocks of the wide-angle feature map FLR and a plurality of tele feature image blocks of the tele feature map FRef, to obtain index maps Dj and confidence maps Cj, j ∈ {1, 2, 3}, wherein the index map Dj is used for indicating the matching feature image blocks corresponding to the respective wide-angle feature image blocks of the wide-angle feature map of the j-th scale, and the confidence map Cj is used for representing the similarities of the matching feature image blocks corresponding to the respective wide-angle feature image blocks of the wide-angle feature map of the j-th scale;

inputting the tele feature map of the 1st scale and the index map D1 into an alignment attention module to obtain the alignment feature map of the 1st scale; and inputting the confidence map C1, the alignment feature map of the 1st scale, and the wide-angle feature map of the 1st scale into an adaptive fusion module to obtain the feature map F2 of the 2nd scale;

wherein the alignment attention module is used for obtaining the alignment feature map of the 1st scale according to the tele feature map of the 1st scale and the index map D1; the adaptive fusion module is used for fusing, under the guidance of the confidence map C1, the alignment feature map of the 1st scale with the wide-angle feature map of the 1st scale to obtain the fused feature map of the 1st scale, and up-sampling the fused feature map of the 1st scale to obtain the feature map F2 of the 2nd scale;

inputting the tele feature map of the 2nd scale and the index map D2 into the alignment attention module to obtain the alignment feature map of the 2nd scale; and inputting the confidence map C2, the alignment feature map of the 2nd scale, and the feature map F2 of the 2nd scale into the adaptive fusion module to obtain the feature map F3 of the 3rd scale;

wherein the alignment attention module is further used for obtaining the alignment feature map of the 2nd scale according to the tele feature map of the 2nd scale and the index map D2; the adaptive fusion module is further used for fusing, under the guidance of the confidence map C2, the alignment feature map of the 2nd scale with the feature map F2 of the 2nd scale to obtain the fused feature map of the 2nd scale, and up-sampling the fused feature map of the 2nd scale to obtain the feature map F3 of the 3rd scale;

inputting the tele feature map of the 3rd scale and the index map D3 into the alignment attention module to obtain the alignment feature map of the 3rd scale; and inputting the confidence map C3, the alignment feature map of the 3rd scale, and the feature map F3 of the 3rd scale into the adaptive fusion module to obtain the feature map F4 of the 4th scale;

wherein the alignment attention module is further used for obtaining the alignment feature map of the 3rd scale according to the tele feature map of the 3rd scale and the index map D3; the adaptive fusion module is further used for fusing, under the guidance of the confidence map C3, the alignment feature map of the 3rd scale with the feature map F3 of the 3rd scale to obtain the fused feature map of the 3rd scale, and up-sampling the fused feature map of the 3rd scale to obtain the feature map F4 of the 4th scale;

decoding, through a decoding module, the feature map F4 of the 4th scale to obtain the target image I;

inputting the detail image IHF of the tele image and an index map D into the alignment attention module to obtain an aligned image; and inputting a confidence map C, the aligned image, and the target image I into the adaptive fusion module to obtain the fused image ISR.

The index map D is calculated by the similarity calculation module based on a plurality of target image blocks of the target image and a plurality of detail image blocks of the detail image, and is used for indicating the matching image block corresponding to each target image block of the target image; the confidence map C is calculated by the similarity calculation module based on the plurality of target image blocks of the target image and the plurality of detail image blocks of the detail image, and is used for representing the similarity between each target image block of the target image and the corresponding matching image block.

The alignment attention module is further used for obtaining the aligned image according to the detail image IHF and the index map D; the adaptive fusion module is further used for fusing, under the guidance of the confidence map C, the aligned image with the target image I to obtain the fused image ISR.
It should be understood that obtaining the index maps corresponds to determining the matching feature image blocks and the matching image blocks, and that each confidence map includes a plurality of confidences. For the method of determining the plurality of confidences, and for the processes of obtaining the alignment feature maps and the aligned image through the alignment attention module, reference may be made to the foregoing embodiments of the present disclosure, which are not described herein again.
According to the embodiments of the present disclosure, after the matching feature image blocks and the matching image blocks are obtained through feature matching, the alignment attention module learns the mapping relations through a neural network and obtains the alignment feature image blocks and the aligned image blocks based on these mapping relations; the alignment feature map and the wide-angle feature map, as well as the aligned image and the target image, are then input into the adaptive fusion module, so that the tele image can be better fused into the wide-angle image, a better image fusion result can be obtained, and the image quality can be improved.
According to the embodiments of the present disclosure, the alignment attention module can reduce the misalignment between the matching image blocks and the target image blocks and between the matching feature image blocks and the wide-angle feature image blocks, so that a better fusion effect can be obtained. The alignment attention module is learnable; when a larger data set is adopted, it can fit the data set through self-learning, so that the matching and alignment effect can be improved.
Compared with an image fusion mode based on image stitching in the related art, the image processing method disclosed by the embodiment of the disclosure can reduce obvious artifacts generated by a target image and a fused image, and simultaneously retains texture details of most of tele images.
According to the image processing method, compared with an image fusion mode based on a deep learning technology in the related art, the alignment image and the alignment feature map can be provided through the alignment attention module, and the alignment image and the alignment feature map are respectively and well fused with the target image and the wide-angle feature map by adopting the self-adaptive fusion module.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form a combined embodiment without departing from the logic of the principle, which is limited by the space, and the detailed description of the present disclosure is omitted. Those skilled in the art will appreciate that in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possibly their inherent logic.
In addition, the present disclosure also provides an image processing apparatus, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the image processing methods provided by the present disclosure, and the descriptions and corresponding descriptions of the corresponding technical solutions and the corresponding descriptions in the methods section are omitted for brevity.
Fig. 7 illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure, the apparatus including, as illustrated in fig. 7:
an obtaining module 101, configured to obtain a wide-angle image and a tele image for a same scene;
a feature extraction module 102, configured to perform feature extraction on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image;
a matching module 103, configured to determine, from a plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide feature patches of the wide feature map;
a transformation module 104, configured to transform, for any wide-angle feature tile, a matching feature tile corresponding to the wide-angle feature tile to obtain an alignment feature tile aligned with the wide-angle feature tile;
and the feature fusion module 105 is configured to perform feature fusion on the multiple aligned feature image blocks and the wide-angle feature map to obtain a fused target image.
In a possible implementation manner, the matching module 103 includes: the segmentation submodule is used for segmenting the wide-angle feature map and the tele feature map respectively according to a preset segmentation rule to obtain a plurality of wide-angle feature image blocks and a plurality of tele feature image blocks; and the matching sub-module is used for determining a matching characteristic image block corresponding to the wide-angle characteristic image block from the plurality of tele characteristic image blocks according to the similarity between the wide-angle characteristic image block and the plurality of tele characteristic image blocks aiming at any wide-angle characteristic image block.
In one possible implementation, the transformation module 104 includes: a mapping relationship determination submodule to determine, for any wide-angle feature tile, a mapping relationship between the wide-angle feature tile and the matching feature tile, the mapping relationship comprising an affine matrix between the wide-angle feature tile and the matching feature tile; and the transformation submodule is used for carrying out affine transformation on the matched feature pattern blocks according to the affine matrix to obtain the aligned feature pattern blocks.
In one possible implementation, the apparatus further includes: a confidence determination module, configured to determine, for any wide-angle feature block, a confidence of the matching feature block according to a similarity between the wide-angle feature block and the corresponding matching feature block, where the confidence is used to indicate a matching degree between the wide-angle feature block and the matching feature block; wherein the feature fusion module 105 comprises: the characteristic image block splicing submodule is used for splicing the plurality of alignment characteristic image blocks to obtain an alignment characteristic image; the feature fusion submodule is used for carrying out feature fusion on the wide-angle feature map and the alignment feature map according to the confidence degrees of a plurality of matched feature map blocks to obtain a fusion feature map; and the target image generation submodule is used for generating the target image according to the fusion feature map.
In one possible implementation manner, the performing feature fusion on the wide-angle feature map and the aligned feature map according to the confidence degrees of the plurality of matched feature patches to obtain a fused feature map includes: determining a confidence in the confidence of the plurality of matching feature patches that is above a confidence threshold; and performing feature fusion on the feature value indicated by the confidence coefficient higher than the confidence coefficient threshold value in the alignment feature map and the feature value of the wide-angle feature map to obtain a fused feature map.
In one possible implementation, the tele feature map includes I scales, and the wide-angle feature map includes I scales, where I is a positive integer, and wherein the generating the target image from the fused feature map includes: up-sampling the fused feature map of the (i-1)-th scale to obtain the feature map of the i-th scale, wherein I+1 ≥ i ≥ 2; performing feature fusion on the feature map of the i-th scale and the alignment feature map of the i-th scale to obtain the fused feature map of the i-th scale, wherein the alignment feature map of the i-th scale is obtained according to the tele feature map of the i-th scale and the wide-angle feature map of the i-th scale; and if i is equal to I+1, decoding the feature map of the (I+1)-th scale to obtain the target image.
In a possible implementation manner, the performing feature fusion on the feature map of the ith scale and the alignment feature map of the ith scale to obtain a fused feature map of the ith scale includes: and performing feature fusion on the feature map of the ith scale, the alignment feature map of the ith scale and the wide-angle feature map of the ith scale to obtain a fusion feature map of the ith scale.
In one possible implementation, the apparatus further includes: the recording module is used for recording the matching characteristic image blocks corresponding to the wide-angle characteristic image blocks through the index map; wherein, the splicing the plurality of alignment feature blocks to obtain the alignment feature map comprises: and splicing the plurality of alignment feature image blocks according to the index map to obtain an alignment feature map.
In one possible implementation, the apparatus further includes: the image detail extraction module is used for extracting the image details of the tele image to obtain a detail image of the tele image; and the image fusion module is used for carrying out image fusion on the target image and the detail image to obtain a fusion image.
In one possible implementation, the image fusion module includes: the image block matching sub-module is used for respectively determining a matching image block corresponding to each target image block from the plurality of detail image blocks according to the similarity between the plurality of target image blocks of the target image and the plurality of detail image blocks of the detail image, and determining the confidence coefficient of the matching image block; the image block transformation sub-module is used for transforming the matched image blocks according to the mapping relation between the target image blocks and the corresponding matched image blocks aiming at any target image block to obtain aligned image blocks aligned with the target image blocks; and the image fusion sub-module is used for carrying out image fusion on the plurality of aligned image blocks and the target image according to the confidence degrees of the plurality of matched image blocks to obtain the fused image.
In one possible implementation, the image processing device is implemented by an image processing network comprising a feature extraction sub-network, an alignment attention sub-network, and an adaptive fusion sub-network; wherein the performing feature extraction on the wide-angle image and the tele image to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image includes: performing feature extraction on the obtained wide-angle image and tele image through the feature extraction sub-network to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image.
In one possible implementation, the alignment attention subnetwork comprises: matching the network layer, aligning the network layer and determining the network layer by mapping relation; wherein determining, from the plurality of tele feature patches of the tele feature map, a matching feature patch corresponding to each wide feature patch of the wide feature map comprises: determining, by the matching network layer, matching feature patches corresponding to respective Wide feature patches of the Wide feature map from a plurality of tele feature patches of the tele feature map.
In one possible implementation, for any wide angle feature tile, transforming a matching feature tile corresponding to the wide angle feature tile to obtain an aligned feature tile aligned with the wide angle feature tile includes: determining, by the mapping determination network layer, a mapping between the wide-angle feature tile and the matching feature tile; and transforming the matching feature pattern blocks corresponding to the wide-angle feature pattern blocks according to the mapping relation between the wide-angle feature pattern blocks and the matching feature pattern blocks by aiming at any wide-angle feature pattern block through the alignment network layer to obtain the alignment feature pattern blocks aligned with the wide-angle feature pattern blocks.
In one possible implementation, the adaptive fusion subnetwork comprises: a confidence learning network layer and a fusion network layer; wherein, the feature fusion is carried out on a plurality of alignment feature image blocks and the wide-angle feature image to obtain a fused target image, and the method comprises the following steps: determining the confidence of each matched feature pattern block through the confidence learning network layer; and performing feature fusion on the plurality of aligned feature pattern blocks and the wide-angle feature pattern through a fusion network layer according to the confidence degrees of the matched feature pattern blocks to obtain a fused target image.
In the embodiment of the disclosure, matching feature image blocks corresponding to the wide-angle feature image blocks are determined from the plurality of tele feature image blocks, so that the matching feature image blocks can be accurately determined by using image features in a feature space; and then, the matching characteristic pattern blocks are transformed to obtain an alignment characteristic pattern aligned with the wide-angle characteristic pattern, so that the dislocation phenomenon of the alignment characteristic pattern blocks and the wide-angle characteristic pattern blocks can be reduced, the introduction of irrelevant noise information into the wide-angle image is reduced, and the image quality of the fused target image is improved.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 8 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 9 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 9, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system of Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the instructions to personalize the circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

1. An image processing method, comprising:
acquiring a wide-angle image and a tele image of the same scene;
performing feature extraction on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image;
determining, from a plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide-angle feature patches of the wide-angle feature map;
for any wide-angle feature patch, transforming the matching feature patch corresponding to the wide-angle feature patch to obtain an aligned feature patch aligned with the wide-angle feature patch;
and performing feature fusion on the plurality of aligned feature patches and the wide-angle feature map to obtain a fused target image.
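As one way to read claim 1, the five steps can be chained as in the following minimal Python sketch; extract, match, align, and fuse are hypothetical callables standing in for the sub-networks introduced in claims 11-14, not the patent's actual implementation, and sketches of the individual steps follow the later claims.

```python
# A minimal sketch of the overall flow in claim 1, assuming `extract`, `match`,
# `align` and `fuse` are callables for the individual steps (hypothetical names,
# not the patent's notation).
def process(wide_img, tele_img, extract, match, align, fuse):
    wide_feat = extract(wide_img)                                  # wide-angle feature map
    tele_feat = extract(tele_img)                                  # tele feature map
    index_map, conf, tele_patches = match(wide_feat, tele_feat)    # per-patch matching
    aligned_patches = align(tele_patches, index_map)               # alignment step
    target_img = fuse(aligned_patches, wide_feat, conf)            # feature fusion
    return target_img
```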
2. The method of claim 1, wherein the determining, from the plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide-angle feature patches of the wide-angle feature map comprises:
respectively segmenting the wide-angle feature map and the tele feature map according to a preset segmentation rule to obtain a plurality of wide-angle feature patches and a plurality of tele feature patches;
for any wide-angle feature patch, determining the matching feature patch corresponding to the wide-angle feature patch from the plurality of tele feature patches according to similarities between the wide-angle feature patch and the plurality of tele feature patches.
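One possible concrete form of the segmentation and similarity matching in claim 2 is sketched below in PyTorch; the patch size, cosine similarity, and non-overlapping stride are illustrative assumptions, not requirements of the claim.

```python
# Hypothetical sketch of claim 2: split the wide-angle and tele feature maps into
# patches and, for each wide-angle patch, pick the most similar tele patch.
import torch
import torch.nn.functional as F

def match_patches(wide_feat, tele_feat, patch_size=8):
    # wide_feat, tele_feat: (1, C, H, W) feature maps from the extraction step
    # Unfold both maps into flattened patches of shape (num_patches, C*patch_size*patch_size)
    wide_patches = F.unfold(wide_feat, kernel_size=patch_size, stride=patch_size)[0].t()
    tele_patches = F.unfold(tele_feat, kernel_size=patch_size, stride=patch_size)[0].t()

    # Cosine similarity between every wide-angle patch and every tele patch
    wide_n = F.normalize(wide_patches, dim=1)
    tele_n = F.normalize(tele_patches, dim=1)
    sim = wide_n @ tele_n.t()                      # (N_wide, N_tele)

    # For each wide-angle patch: index of the best-matching tele patch and its similarity
    best_sim, index_map = sim.max(dim=1)
    return index_map, best_sim, tele_patches
```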
3. The method of claim 1 or 2, wherein the transforming, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch to obtain an aligned feature patch aligned with the wide-angle feature patch comprises:
determining, for any wide-angle feature patch, a mapping relationship between the wide-angle feature patch and the matching feature patch, the mapping relationship comprising an affine matrix between the wide-angle feature patch and the matching feature patch;
and performing affine transformation on the matching feature patch according to the affine matrix to obtain the aligned feature patch.
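A sketch of the affine alignment in claim 3, assuming the affine matrix has already been estimated from the mapping relationship; theta is the standard 2x3 matrix expected by PyTorch's grid sampling and is an assumed input here.

```python
# Hypothetical sketch of claim 3: align one matching tele patch to its wide-angle patch
# with an affine warp. The 2x3 matrix `theta` would come from the mapping-relationship
# step (e.g. a small regression layer); here it is an assumed input.
import torch
import torch.nn.functional as F

def align_patch(matching_patch, theta):
    # matching_patch: (1, C, p, p) tele feature patch; theta: (1, 2, 3) affine matrix
    grid = F.affine_grid(theta, size=matching_patch.shape, align_corners=False)
    aligned_patch = F.grid_sample(matching_patch, grid, align_corners=False)
    return aligned_patch  # aligned with the corresponding wide-angle patch
```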
4. The method according to any one of claims 1-3, further comprising:
for any wide-angle feature patch, determining a confidence of the matching feature patch according to a similarity between the wide-angle feature patch and the corresponding matching feature patch, wherein the confidence indicates a degree of matching between the wide-angle feature patch and the matching feature patch;
wherein the performing feature fusion on the plurality of aligned feature patches and the wide-angle feature map to obtain the fused target image comprises:
stitching the plurality of aligned feature patches to obtain an aligned feature map;
performing feature fusion on the wide-angle feature map and the aligned feature map according to the confidences of the plurality of matching feature patches to obtain a fused feature map;
and generating the target image according to the fused feature map.
5. The method of claim 4, wherein the performing feature fusion on the wide-angle feature map and the aligned feature map according to the confidences of the plurality of matching feature patches to obtain the fused feature map comprises:
determining, among the confidences of the plurality of matching feature patches, confidences that are above a confidence threshold;
and performing feature fusion on the feature values of the aligned feature map indicated by the confidences above the confidence threshold and the feature values of the wide-angle feature map to obtain the fused feature map.
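Claims 4 and 5 can be read as a confidence-gated blend; the sketch below assumes the per-patch confidence has already been broadcast to a per-pixel map in [0, 1], and the threshold value is illustrative only.

```python
# Hypothetical sketch of claims 4-5: fuse the aligned feature map with the wide-angle
# feature map, but only where the patch-matching confidence clears a threshold.
import torch

def fuse_with_confidence(wide_feat, aligned_feat, confidence, threshold=0.5):
    # wide_feat, aligned_feat: (1, C, H, W); confidence: (1, 1, H, W) in [0, 1]
    mask = (confidence > threshold).float()
    # Below the threshold keep the wide-angle features; above it blend in the
    # aligned tele features weighted by their confidence.
    fused = wide_feat * (1 - mask * confidence) + aligned_feat * (mask * confidence)
    return fused
```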
6. The method of claim 4 or 5, wherein the tele feature map comprises I scales and the wide-angle feature map comprises I scales, I being a positive integer, and wherein the generating the target image according to the fused feature map comprises:
up-sampling the fused feature map of the (i-1)-th scale to obtain a feature map of the i-th scale, wherein (I+1) ≥ i ≥ 2;
performing feature fusion on the feature map of the i-th scale and the aligned feature map of the i-th scale to obtain a fused feature map of the i-th scale, wherein the aligned feature map of the i-th scale is obtained according to the tele feature map of the i-th scale and the wide-angle feature map of the i-th scale;
and if i is equal to (I+1), decoding the feature map of the (I+1)-th scale to obtain the target image.
7. The method according to claim 6, wherein the performing feature fusion on the feature map of the i-th scale and the aligned feature map of the i-th scale to obtain the fused feature map of the i-th scale comprises:
performing feature fusion on the feature map of the i-th scale, the aligned feature map of the i-th scale, and the wide-angle feature map of the i-th scale to obtain the fused feature map of the i-th scale.
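Claims 6 and 7 describe a coarse-to-fine loop over I scales; the sketch below assumes the per-scale maps are supplied coarsest-first and that fuse and decode are placeholders for the per-scale fusion and decoding operations, and for brevity it decodes right after the finest fusion rather than after a further up-sampling step.

```python
# Hypothetical sketch of claims 6-7: coarse-to-fine fusion over I scales.
import torch
import torch.nn.functional as F

def multi_scale_fuse(fused_coarsest, aligned_maps, wide_maps, fuse, decode):
    # fused_coarsest: fused feature map at the coarsest scale
    # aligned_maps / wide_maps: per-scale maps for the remaining scales, coarsest-first
    feat = fused_coarsest
    for aligned_i, wide_i in zip(aligned_maps, wide_maps):
        # Up-sample the previous fused map to the current scale ...
        feat = F.interpolate(feat, size=aligned_i.shape[-2:], mode="bilinear",
                             align_corners=False)
        # ... and fuse it with this scale's aligned and wide-angle feature maps
        feat = fuse(feat, aligned_i, wide_i)
    # After the finest scale, decode the result into the target image
    return decode(feat)
```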
8. The method according to any one of claims 4-7, further comprising:
recording, by an index map, the matching feature patches corresponding to the respective wide-angle feature patches;
wherein the stitching the plurality of aligned feature patches to obtain the aligned feature map comprises:
and stitching the plurality of aligned feature patches according to the index map to obtain the aligned feature map.
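A sketch of claim 8's index-map stitching, reusing the flattened-patch layout from the matching sketch after claim 2; non-overlapping patches are assumed so that folding needs no overlap normalization.

```python
# Hypothetical sketch of claim 8: use the index map from the matching step to gather
# the aligned tele patches and stitch them back into an aligned feature map with the
# same layout as the wide-angle feature map.
import torch
import torch.nn.functional as F

def stitch_aligned_map(aligned_patches, index_map, out_size, patch_size=8):
    # aligned_patches: (N_tele, C*patch_size*patch_size) flattened aligned patches
    # index_map: (N_wide,) best tele patch index for each wide-angle patch position
    gathered = aligned_patches[index_map]                  # (N_wide, C*p*p)
    # Fold the gathered patches back into a (1, C, H, W) aligned feature map
    aligned_map = F.fold(gathered.t().unsqueeze(0), output_size=out_size,
                         kernel_size=patch_size, stride=patch_size)
    return aligned_map
```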
9. The method according to any one of claims 1-8, further comprising:
extracting image details of the tele image to obtain a detail image of the tele image;
and performing image fusion on the target image and the detail image to obtain a fused image.
10. The method according to claim 9, wherein the performing image fusion on the target image and the detail image to obtain the fused image comprises:
determining, according to similarities between a plurality of target image blocks of the target image and a plurality of detail image blocks of the detail image, a matching image block corresponding to each target image block from the plurality of detail image blocks, and determining a confidence of the matching image block;
for any target image block, transforming the matching image block according to a mapping relationship between the target image block and the corresponding matching image block to obtain an aligned image block aligned with the target image block;
and performing image fusion on the plurality of aligned image blocks and the target image according to the confidences of the plurality of matching image blocks to obtain the fused image.
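Claims 9 and 10 move the same match-align-fuse idea from feature space to image space; the simplified sketch below shows only the detail-injection part, with a Gaussian blur as one possible detail extractor (the patent does not fix the extractor, and the confidence map is assumed to be per-pixel).

```python
# Hypothetical sketch of claims 9-10: extract high-frequency detail from the tele image
# and blend it into the fused target image where the matches are trusted.
import torch
import torchvision.transforms.functional as TF

def fuse_detail(target_img, tele_img, confidence, sigma=2.0):
    # target_img, tele_img: (1, 3, H, W) tensors at the same resolution
    detail = tele_img - TF.gaussian_blur(tele_img, kernel_size=9, sigma=sigma)
    # Add the tele detail back onto the target image, weighted by match confidence
    return target_img + confidence * detail
```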
11. The method according to any of claims 1-10, wherein the image processing method is implemented by an image processing network comprising a feature extraction sub-network, an alignment attention sub-network, and an adaptive fusion sub-network;
wherein the performing feature extraction on the wide-angle image and the tele image to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image comprises:
performing feature extraction on the acquired wide-angle image and tele image through the feature extraction sub-network to obtain the wide-angle feature map of the wide-angle image and the tele feature map of the tele image.
12. The method of claim 11, wherein the alignment attention sub-network comprises: a matching network layer, an alignment network layer, and a mapping relationship determination network layer;
wherein the determining, from the plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide-angle feature patches of the wide-angle feature map comprises:
determining, by the matching network layer, the matching feature patches corresponding to the respective wide-angle feature patches of the wide-angle feature map from the plurality of tele feature patches of the tele feature map.
13. The method of claim 12, wherein the transforming, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch to obtain the aligned feature patch aligned with the wide-angle feature patch comprises:
determining, by the mapping relationship determination network layer, a mapping relationship between the wide-angle feature patch and the matching feature patch;
and transforming, by the alignment network layer, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch according to the mapping relationship between the wide-angle feature patch and the matching feature patch, to obtain the aligned feature patch aligned with the wide-angle feature patch.
14. The method of claim 11, wherein the adaptive fusion sub-network comprises: a confidence learning network layer and a fusion network layer;
wherein the performing feature fusion on the plurality of aligned feature patches and the wide-angle feature map to obtain the fused target image comprises:
determining a confidence of each matching feature patch through the confidence learning network layer;
and performing feature fusion on the plurality of aligned feature patches and the wide-angle feature map through the fusion network layer according to the confidences of the matching feature patches to obtain the fused target image.
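Claims 11-14 name three sub-networks without fixing their internals; the skeleton below is an assumed minimal PyTorch layout showing only where a confidence learning layer and a fusion layer could sit, with illustrative layer choices and channel counts.

```python
# Hypothetical skeleton of the sub-networks named in claims 11-14. Layer choices and
# channel counts are illustrative assumptions; the claims do not specify them.
import torch
import torch.nn as nn

class ImageProcessingNetwork(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Feature extraction sub-network (shared for wide-angle and tele images)
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # Adaptive fusion sub-network: confidence learning layer + fusion layer
        self.confidence_learning = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid())
        self.fusion = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def fuse(self, wide_feat, aligned_feat):
        pair = torch.cat([wide_feat, aligned_feat], dim=1)
        conf = self.confidence_learning(pair)          # per-pixel matching confidence
        return self.fusion(torch.cat([wide_feat, aligned_feat * conf], dim=1))
```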
15. An image processing apparatus, comprising:
an acquisition module configured to acquire a wide-angle image and a tele image of the same scene;
a feature extraction module configured to perform feature extraction on the wide-angle image and the tele image to obtain a wide-angle feature map of the wide-angle image and a tele feature map of the tele image;
a matching module configured to determine, from a plurality of tele feature patches of the tele feature map, matching feature patches corresponding to respective wide-angle feature patches of the wide-angle feature map;
a transformation module configured to transform, for any wide-angle feature patch, the matching feature patch corresponding to the wide-angle feature patch to obtain an aligned feature patch aligned with the wide-angle feature patch;
and a feature fusion module configured to perform feature fusion on the plurality of aligned feature patches and the wide-angle feature map to obtain a fused target image.
16. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 14.
17. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 14.
CN202110801048.2A 2021-07-15 2021-07-15 Image processing method and device, electronic equipment and storage medium Withdrawn CN113538310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801048.2A CN113538310A (en) 2021-07-15 2021-07-15 Image processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110801048.2A CN113538310A (en) 2021-07-15 2021-07-15 Image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113538310A true CN113538310A (en) 2021-10-22

Family

ID=78099434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801048.2A Withdrawn CN113538310A (en) 2021-07-15 2021-07-15 Image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113538310A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782296A (en) * 2022-04-08 2022-07-22 Honor Device Co., Ltd. Image fusion method, device and storage medium


Similar Documents

Publication Publication Date Title
US11532180B2 (en) Image processing method and device and storage medium
CN109829501B (en) Image processing method and device, electronic equipment and storage medium
US20210042474A1 (en) Method for text recognition, electronic device and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN110287874B (en) Target tracking method and device, electronic equipment and storage medium
CN110889469B (en) Image processing method and device, electronic equipment and storage medium
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
CN109344832B (en) Image processing method and device, electronic equipment and storage medium
US11288531B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN109934275B (en) Image processing method and device, electronic equipment and storage medium
CN111243011A (en) Key point detection method and device, electronic equipment and storage medium
CN111435432B (en) Network optimization method and device, image processing method and device and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN110781813A (en) Image recognition method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN111242303A (en) Network training method and device, and image processing method and device
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN112529846A (en) Image processing method and device, electronic equipment and storage medium
CN111626086A (en) Living body detection method, living body detection device, living body detection system, electronic device, and storage medium
CN110415258B (en) Image processing method and device, electronic equipment and storage medium
CN111583142A (en) Image noise reduction method and device, electronic equipment and storage medium
CN113139484B (en) Crowd positioning method and device, electronic equipment and storage medium
CN111311588B (en) Repositioning method and device, electronic equipment and storage medium
CN110929545A (en) Human face image sorting method and device
CN112967264A (en) Defect detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20211022