CN115953339A - Image fusion processing method, device, equipment, storage medium and chip

Image fusion processing method, device, equipment, storage medium and chip

Info

Publication number
CN115953339A
Authority
CN
China
Prior art keywords
image
fusion
boundary
target
indication
Prior art date
Legal status: Pending
Application number
CN202211567607.9A
Other languages
Chinese (zh)
Inventor
吴彩林
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202211567607.9A
Publication of CN115953339A

Landscapes

  • Image Processing (AREA)

Abstract

The disclosure relates to an image fusion processing method, device, equipment, storage medium and chip, wherein the method comprises the following steps: acquiring a first image and a second image; affine transformation is carried out on the second image to the coordinate system of the first image, and a third image is obtained; determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image; and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image. The method and the device can avoid the problem of ghost image generation at the fusion boundary, and further improve the quality of the output image of the electronic equipment.

Description

Image fusion processing method, device, equipment, storage medium and chip
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image fusion processing method, apparatus, device, storage medium, and chip.
Background
With the development of terminal technology, the functions of terminal devices such as smart phones have become increasingly rich, and image processing is one of the important functions of a terminal device. Because of the constraints of its spatial structure, the camera module of a terminal device cannot reach the shooting level of a professional camera. In the related art, therefore, multi-frame images captured by a plurality of cameras in the camera module are fused to obtain a fused image and improve imaging quality. However, the image fusion method in the related art suffers from fusion ghosting, which affects the quality of the output image.
Disclosure of Invention
In order to overcome the problems in the related art, embodiments of the present disclosure provide an image fusion processing method, apparatus, device, storage medium, and chip, so as to solve the defects in the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided an image fusion processing method, the method including:
acquiring a first image and a second image, wherein the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to the coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
In some embodiments, the affine-transforming the second image into the coordinate system of the first image to obtain a third image includes:
carrying out image registration based on the first image and the second image to obtain an affine transformation relation of the images;
and affine transforming the second image to a coordinate system of the first image based on the image affine transformation relation to obtain a third image.
In some embodiments, the method further comprises:
preprocessing the first image and the second image based on a preset processing mode to obtain the preprocessed first image and second image, wherein the preset processing mode comprises at least one of brightness correction and color correction;
and performing image registration based on the first image and the second image after preprocessing to obtain an image affine transformation relation.
In some embodiments, the determining a target fusion mask image based on the first image and the third image comprises:
determining pixel differences and optical flow information for the first image and the third image;
determining a first fusion mask image based on the pixel difference and the optical flow information, wherein the first fusion mask image comprises a third indication area and a fourth indication area, the third indication area is used for indicating a fusion area, the fourth indication area is used for indicating a non-fusion area, and the boundary of the third indication area and the fourth indication area is a first fusion boundary;
performing down-sampling processing on the first fusion mask image to obtain a second fusion mask image, wherein the second fusion mask image comprises a fifth indication region and a sixth indication region, the fifth indication region is used for indicating a fusion region, the sixth indication region is used for indicating a non-fusion region, and the boundary between the fifth indication region and the sixth indication region is a second fusion boundary;
adjusting a second fusion boundary in the second fusion mask image to a position where image differences meet a set condition, so as to obtain an adjusted second fusion mask image, where the image differences include a difference between the first image and the third image;
and performing up-sampling processing on the adjusted second fusion mask image to obtain the target fusion mask image, wherein the target fusion boundary is obtained based on the up-sampling of the adjusted second fusion boundary.
In some embodiments, said determining a first fusion mask image based on said pixel differences and said optical flow information comprises:
determining a first fusion mask image using a preset algorithm based on the pixel difference and the optical flow information, the preset algorithm including at least one of a motion detection algorithm and an occlusion detection algorithm.
In some embodiments, the adjusting the second fusion boundary in the second fusion mask image to a position where the image difference satisfies a set condition to obtain an adjusted second fusion mask image includes:
performing distance transformation on the sixth indication area in the second fusion mask image to obtain a first distance transformation image;
determining an adjustment range of the second fusion boundary in the second fusion mask image based on the first distance transformed image;
determining pixel differences of the first image and the third image;
determining a Gaussian weighted average value of pixel differences in a set window corresponding to each pixel in the adjustment range based on the pixel differences to obtain a Gaussian weighted average value image;
constructing an objective function by using a graph cut algorithm based on the gaussian weighted average image, wherein the objective function comprises a constraint term and a smoothing term, the constraint term is used for limiting the positions of the fifth indication area and the sixth indication area in the second fusion mask image, the smoothing term is used for adjusting the second fusion boundary within the adjustment range to a position where an image difference meets a set condition based on the gaussian weighted average image, and the image difference comprises a difference between the first image and the third image;
and solving the objective function based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
In some embodiments, said fusing said third image with said first image based on said target fusion mask image to obtain a target fusion image comprises:
fusing the third image with the first image based on a predetermined fusion transition weight in a fusion transition region of the target fusion image, wherein the fusion transition region corresponds to a preset fusion transition region in the target fusion mask image;
reserving the pixel value of the third image in a fusion area of the target fusion image, wherein the fusion area corresponds to an area except for the preset fusion transition area in the first indication area;
and reserving pixel values of the first image in a non-fusion area of the target fusion image, wherein the non-fusion area corresponds to the second indication area.
In some embodiments, the method further comprises determining the fusion transition weight in advance based on:
performing distance transformation on the second indication area in the target fusion mask image to obtain a second distance transformation image;
and determining the fusion transition weight corresponding to the preset fusion transition region based on the second distance transformation image.
In some embodiments, the method further comprises:
carrying out style migration on the third image to obtain a fourth image;
and performing, with the fourth image in place of the third image, the operation of fusing the third image with the first image based on the target fusion mask image to obtain a target fusion image.
In some embodiments, the performing style migration on the third image to obtain a fourth image includes:
inputting the third image into a generator network in a pre-trained generative adversarial network to obtain a style migration residual image from the third image to the first image;
and based on a predetermined style transition weight, superposing the style migration residual image to a style transition region of the third image to obtain a fourth image, wherein the style transition region of the third image corresponds to a preset style transition region of the target fusion mask image.
In some embodiments, the method further comprises determining the style transition weights in advance based on:
performing distance transformation on the second indication area in the target fusion mask image to obtain a third distance transformation image;
and determining style transition weights corresponding to preset style transition regions of the target fusion mask image based on the third distance conversion image.
In some embodiments, the method further comprises training the generative adversarial network in advance based on:
acquiring a plurality of first sample images and a plurality of third sample images, wherein the third sample images comprise images obtained by affine transforming second sample images to a coordinate system of the first sample images, the first sample images are acquired by a first sample camera, the second sample images are acquired by a second sample camera, and the field angle of the first sample camera is larger than that of the second sample camera;
taking the plurality of first sample images and the plurality of third sample images as a discriminator training set, and training a discriminator network in the generative adversarial network to obtain a trained discriminator network, wherein the discriminator network is used for classification based on comparison results of image styles;
respectively sliding on the first sample image and the third sample image based on a preset sliding window;
in response to detecting that the structural similarity SSIM between two images in the sliding window is greater than or equal to a set threshold value, taking the two images as a group of training images;
repeating the process of obtaining a group of training images, and taking the obtained multiple groups of training images as a generator training set;
and training the generator network in the generative adversarial network based on the generator training set to obtain the trained generator network.
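Purely as an illustration of the sliding-window pairing step above, the following is a minimal Python sketch assuming OpenCV and scikit-image are available; the window size, stride, and SSIM threshold are hypothetical values chosen for the example, not taken from the disclosure.

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def collect_training_pairs(first_sample, third_sample, win=128, stride=64, ssim_thresh=0.85):
    """Slide a window over both sample images and keep co-located patch pairs
    whose structural similarity (SSIM) is greater than or equal to the threshold."""
    pairs = []
    h, w = first_sample.shape[:2]
    g1 = cv2.cvtColor(first_sample, cv2.COLOR_BGR2GRAY)
    g3 = cv2.cvtColor(third_sample, cv2.COLOR_BGR2GRAY)
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            # SSIM is computed on the grayscale patches (uint8, so the data range is implicit)
            if structural_similarity(g1[y:y + win, x:x + win],
                                     g3[y:y + win, x:x + win]) >= ssim_thresh:
                pairs.append((first_sample[y:y + win, x:x + win],
                              third_sample[y:y + win, x:x + win]))
    return pairs
```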
In some embodiments, said training a generator network in said generative adversarial network based on said generator training set comprises:
constructing a generator loss function, wherein the loss function comprises at least one of a constraint term, a feature similarity term and a generation term, the constraint term is determined based on the image content difference between the style-transformed third sample image and the third sample image, the feature similarity term is determined based on the image content difference between the style-transformed third sample image and the first sample image, and the generation term is determined based on the output probability of the discriminator network;
and solving the loss function based on a preset optimization algorithm to obtain a trained generator network.
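The disclosure does not give concrete formulas for these loss terms, so the following is only a hedged PyTorch sketch of how such a generator loss might be assembled; the generator G (which outputs a style-migration residual), the discriminator D (which outputs a probability), the feature extractor feat, and the weighting factors are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, feat, third_batch, first_batch,
                   w_constraint=1.0, w_feature=1.0, w_gan=0.1):
    """Sketch of a generator loss with a constraint term, a feature-similarity term,
    and a generation (adversarial) term, as enumerated in the disclosure."""
    residual = G(third_batch)              # style-migration residual
    stylized = third_batch + residual      # style-transformed third sample image
    # constraint term: keep the image content of the third sample image
    loss_constraint = F.l1_loss(stylized, third_batch)
    # feature-similarity term: pull the stylized content toward the first sample image
    loss_feature = F.l1_loss(feat(stylized), feat(first_batch))
    # generation term: based on the discriminator's output probability
    loss_gan = -torch.log(D(stylized) + 1e-8).mean()
    return w_constraint * loss_constraint + w_feature * loss_feature + w_gan * loss_gan
```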
According to a second aspect of the embodiments of the present disclosure, there is provided an image fusion processing apparatus, the apparatus including:
the image acquisition module is used for acquiring a first image and a second image, the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
the image transformation module is used for affine transforming the second image to a coordinate system of the first image to obtain a third image;
a mask determination module, configured to determine a target fusion mask image based on the first image and the third image, where the target fusion mask image includes a first indication region and a second indication region, the first indication region is used to indicate a fusion region, the second indication region is used to indicate a non-fusion region, a boundary of the first indication region and the second indication region is a target fusion boundary, the target fusion boundary is set at a position where an image difference satisfies a set condition, and the image difference includes a difference between the first image and the third image;
and the image fusion module is used for fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
In some embodiments, the image transformation module comprises:
the relationship acquisition unit is used for carrying out image registration on the basis of the first image and the second image to obtain an affine transformation relationship of the images;
and the image transformation unit is used for affine transforming the second image to the coordinate system of the first image based on the image affine transformation relation to obtain a third image.
In some embodiments, the image transformation module comprises:
the preprocessing unit is used for preprocessing the first image and the second image based on a preset processing mode to obtain the preprocessed first image and second image, wherein the preset processing mode comprises at least one of brightness correction and color correction;
the relationship acquisition unit is further configured to perform, based on the preprocessed first image and second image, the operation of performing image registration based on the first image and the second image to obtain an image affine transformation relationship.
In some embodiments, the mask determination module comprises:
a difference information determination unit for determining pixel differences and optical flow information of the first image and the third image;
a first mask determination unit configured to determine a first fusion mask image based on the pixel difference and the optical flow information, the first fusion mask image including a third indication region and a fourth indication region, the third indication region indicating a fusion region, the fourth indication region indicating a non-fusion region, a boundary between the third indication region and the fourth indication region being a first fusion boundary;
a second mask obtaining unit, configured to perform downsampling on the first fusion mask image to obtain a second fusion mask image, where the second fusion mask image includes a fifth indication region and a sixth indication region, the fifth indication region is used to indicate a fusion region, the sixth indication region is used to indicate a non-fusion region, and a boundary between the fifth indication region and the sixth indication region is a second fusion boundary;
a fusion boundary adjusting unit, configured to adjust a second fusion boundary in the second fusion mask image to a position where an image difference satisfies a set condition, so as to obtain an adjusted second fusion mask image, where the image difference includes a difference between the first image and the third image;
and the target mask acquisition unit is used for performing up-sampling processing on the adjusted second fusion mask image to obtain the target fusion mask image, wherein the target fusion boundary is obtained based on the up-sampling of the adjusted second fusion boundary.
In some embodiments, the first mask determination unit is further configured to determine a first fused mask image using a preset algorithm including at least one of a motion detection algorithm and an occlusion detection algorithm based on the pixel difference and the optical flow information.
In some embodiments, the fusion boundary adjustment unit is further configured to:
performing distance transformation on the sixth indication area in the second fusion mask image to obtain a first distance transformation image;
determining an adjustment range of the second fusion boundary in the second fusion mask image based on the first distance transformed image;
determining pixel differences of the first image and the third image;
determining a Gaussian weighted average value of pixel differences in a set window corresponding to each pixel in the adjustment range based on the pixel differences to obtain a Gaussian weighted average value image;
constructing an objective function by using a graph cut algorithm based on the gaussian weighted average image, wherein the objective function comprises a constraint term and a smoothing term, the constraint term is used for limiting the positions of the fifth indication area and the sixth indication area in the second fusion mask image, the smoothing term is used for adjusting the second fusion boundary within the adjustment range to a position where an image difference meets a set condition based on the gaussian weighted average image, and the image difference comprises a difference between the first image and the third image;
and solving the objective function based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
In some embodiments, the image fusion module comprises:
a first fusion unit, configured to fuse the third image and the first image based on a predetermined fusion transition weight in a fusion transition region of the target fusion image, where the fusion transition region corresponds to a preset fusion transition region in the target fusion mask image;
a second fusion unit, configured to reserve a pixel value of the third image in a fusion region of the target fusion image, where the fusion region corresponds to a region of the first indication region excluding the preset fusion transition region;
a third fusing unit, configured to reserve pixel values of the first image in a non-fused region of the target fused image, where the non-fused region corresponds to the second indication region.
In some embodiments, the image fusion module further comprises a fusion transition weight determination unit;
the fusion transition weight determination unit is configured to:
performing distance transformation on the second indication area in the target fusion mask image to obtain a second distance transformation image;
and determining the fusion transition weight corresponding to the preset fusion transition region based on the second distance transformation image.
In some embodiments, the apparatus further comprises:
the style migration module is used for carrying out style migration on the third image to obtain a fourth image;
the image fusion module is further configured to perform an operation of fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image, based on the fourth image instead of the third image.
In some embodiments, the style migration module includes:
a residual image obtaining unit, configured to input the third image to a generator network in a pre-trained generative adversarial network, so as to obtain a style migration residual image from the third image to the first image;
and the fourth image acquisition unit is used for superposing the style migration residual image to a style transition region of the third image based on a predetermined style transition weight to obtain a fourth image, wherein the style transition region of the third image corresponds to a preset style transition region of the target fusion mask image.
In some embodiments, the style migration module further comprises a style transition weight determination unit;
the style transition weight determination unit is configured to:
performing distance transformation on the second indication area in the target fusion mask image to obtain a third distance transformation image;
and determining style transition weights corresponding to preset style transition regions of the target fusion mask image based on the third distance conversion image.
In some embodiments, the apparatus further comprises a generative adversarial network training module;
the generative adversarial network training module comprises:
the device comprises a sample acquisition unit and a processing unit, wherein the sample acquisition unit is used for acquiring a plurality of first sample images and a plurality of third sample images, the third sample images comprise images obtained by affine transformation of second sample images to a coordinate system of the first sample images, the first sample images are acquired by a first sample camera, the second sample images are acquired by a second sample camera, and the field angle of the first sample camera is larger than that of the second sample camera;
a discriminator training unit, configured to train a discriminator network in the generative adversarial network by using the plurality of first sample images and the plurality of third sample images as a discriminator training set, so as to obtain a trained discriminator network, wherein the discriminator network is used for classification based on comparison results of image styles;
a window sliding unit configured to slide on the first sample image and the third sample image, respectively, based on a preset sliding window;
the image determining unit is used for responding to the fact that the structural similarity SSIM between the two images in the sliding window is larger than or equal to a set threshold value, and using the two images as a group of training images;
the training set determining unit is used for repeating the process of obtaining a group of training images and taking a plurality of groups of obtained training images as a generator training set;
and the generator training unit is used for training the generator network in the generative adversarial network based on the generator training set to obtain the trained generator network.
In some embodiments, the generator training unit is further to:
constructing a generator loss function, wherein the loss function comprises at least one of a constraint term, a feature similarity term and a generation term, the constraint term is determined based on the image content difference between the style-transformed third sample image and the third sample image, the feature similarity term is determined based on the image content difference between the style-transformed third sample image and the first sample image, and the generation term is determined based on the output probability of the discriminator network;
and solving the loss function based on a preset optimization algorithm to obtain a trained generator network.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic apparatus, the apparatus comprising:
the system comprises a first camera, a second camera, a processor and a memory for storing computer programs;
wherein the processor is configured to, when executing the computer program, implement:
acquiring a first image and a second image, wherein the first image is acquired by the first camera, the second image is acquired by the second camera, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to a coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements:
acquiring a first image and a second image, wherein the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to a coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a chip including:
a processor and an interface;
the processor is used for reading instructions through the interface to execute any one of the image fusion processing methods.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the method and device, a first image and a second image are acquired, the second image is affine transformed to the coordinate system of the first image to obtain a third image, a target fusion mask image is determined based on the first image and the third image, and the third image and the first image can then be fused based on the target fusion mask image to obtain a target fusion image. Because the target fusion boundary in the target fusion mask image is set at a position where the image difference satisfies the set condition, the set condition can limit the target fusion boundary to the position where the image difference is minimum, so that the problem of ghost images being generated at the fusion boundary of the target fusion image during image fusion can be avoided. In addition, the width of the fusion transition region can be appropriately reduced while the fusion effect is ensured; compared with the fusion transition manner in the related art, a narrower fusion transition can be achieved, which further avoids ghosting at the fusion boundary and improves the quality of the image output by the electronic device.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1A is a flow diagram illustrating a method of image fusion processing according to an exemplary embodiment of the present disclosure;
FIG. 1B is a schematic illustration of a first image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 1C is a schematic illustration of a third image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 1D is a schematic illustration of a target fusion mask image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 1E is a schematic diagram of a target fusion image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating how a third image is obtained according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating how a target fusion mask image is determined based on the first image and the third image according to an exemplary embodiment of the present disclosure;
FIG. 4A is a flowchart illustrating how to adjust the second fusion boundary in the second fusion mask image to a position where the image difference satisfies a set condition according to an exemplary embodiment of the present disclosure;
FIG. 4B is a schematic diagram illustrating a second fusion mask image according to an exemplary embodiment of the present disclosure;
FIG. 4C is a schematic diagram of a first distance transformed image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 4D is a schematic diagram illustrating an adjustment range of a second blending boundary according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating how the fused transition weights are determined according to yet another exemplary embodiment of the present disclosure;
FIG. 6A is a flowchart illustrating how the third image is style migrated in accordance with an exemplary embodiment of the present disclosure;
FIG. 6B is a schematic diagram of a third image shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating how the style transition weights are determined according to an exemplary embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating how the generative adversarial network is trained in accordance with an exemplary embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating how a generator network in the generative adversarial network is trained based on the generator training set according to an exemplary embodiment of the present disclosure;
fig. 10 is a block diagram illustrating an image fusion processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 11 is a block diagram illustrating still another image fusion processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 12 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the exemplary embodiments below do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1A is a flow diagram illustrating a method of image fusion processing in accordance with an exemplary embodiment; the method of the present embodiment may be applied to an electronic apparatus having a first camera (e.g., a wide-angle camera, etc.) and a second camera (e.g., a tele-camera, etc.).
As shown in fig. 1A, the method comprises the following steps S101-S104:
in step S101, a first image and a second image are acquired.
In this embodiment, the first image may be collected by a first camera of the electronic device, and the second image may be collected by a second camera of the electronic device.
For example, the first image may be an image captured by a wide-angle lens with a lower resolution and a wider field of view (as shown in fig. 1B), and the second image may be an image captured by a telephoto lens with a higher resolution and a smaller field of view (not shown).
In step S102, the second image is affine-transformed into the coordinate system of the first image, so as to obtain a third image.
In this embodiment, after the first image and the second image are obtained, the second image may be affine-transformed (warp) to the coordinate system of the first image to obtain a third image (as shown in fig. 1C).
It should be noted that the above-mentioned manner of performing affine transformation on the second image can be referred to the explanation and description in the related art, and this embodiment does not limit this.
In other embodiments, the above-mentioned manner of affine transforming the second image into the coordinate system of the first image can also be referred to the following embodiment shown in fig. 2, and will not be described in detail here.
In step S103, a target fusion mask image is determined based on the first image and the third image.
For example, fig. 1D is a schematic illustration of a target fusion mask image shown in accordance with an exemplary embodiment of the present disclosure. As shown in fig. 1D, the target fusion Mask (Mask) image includes a first indication region (i.e., a white region in fig. 1D) for indicating a fusion region and a second indication region (i.e., a black region in fig. 1D) for indicating a non-fusion region, and a boundary of the first indication region and the second indication region is a target fusion boundary.
In this embodiment, the target fusion boundary is set at a position where an image difference satisfies a set condition (e.g., an image difference is minimum), where the image difference includes a difference between the first image and the third image.
In some embodiments, after the second image is affine-transformed to the coordinate system of the first image to obtain the third image, an initial fusion mask image may be generated based on a mask image generation method in the related art, and then a fusion boundary in the initial fusion mask image is adjusted to be located at a position where an image difference is minimum.
In other embodiments, the above-mentioned manner of determining the target fusion mask image based on the first image and the third image can also be seen in the embodiment shown in fig. 3 described below, which will not be described in detail herein.
In step S104, the third image and the first image are fused based on the target fusion mask image, so as to obtain a target fusion image.
In this embodiment, after determining a target fusion mask image based on the first image and the third image, the third image and the first image may be fused based on the target fusion mask image to obtain a target fusion image.
For example, fig. 1E is a schematic diagram of a target fusion image shown in accordance with an exemplary embodiment of the present disclosure. In the present embodiment, when performing image fusion, a fusion transition region (hereinafter referred to as a preset fusion transition region for ease of distinction) may be set in advance in the target fusion mask image in order to make the image fusion transition more natural. For example, a region extending outward from the target fusion boundary in the target fusion mask image by a set width (for example, 5) may be used as the preset fusion transition region.
As shown in fig. 1E, within a fusion transition region of the target fusion image (i.e. Sb, which corresponds to a preset fusion transition region in the target fusion mask image), the third image and the first image may be fused based on a predetermined fusion transition weight.
In an embodiment, the determination manner of the above fusion transition weight can be referred to the following embodiment shown in fig. 5, and will not be described in detail here.
On the other hand, within a fusion region of the target fusion image, which corresponds to a region of the first indication region in the target fusion mask image except for the preset fusion transition region (i.e., a portion of the fusion region except for the fusion transition region), the pixel values of the third image may be retained.
And in a non-fusion area of the target fusion image, the pixel values of the first image may be retained, wherein the non-fusion area corresponds to the second indication area in the target fusion mask image.
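As a minimal sketch of this pixel-level composition, assuming float images in [0, 1] and a precomputed per-pixel fusion transition weight map (0 in the non-fusion region, 1 inside the fusion region, ramping across the transition band); the names are illustrative.

```python
import numpy as np

def compose_target_fusion(first_img, third_img, transition_weight):
    """first_img, third_img: H x W x 3 float arrays in [0, 1].
    transition_weight: H x W map that is 0 in the non-fusion region, 1 inside the
    fusion region, and ramps from 0 to 1 across the fusion transition band."""
    w = transition_weight[..., None]          # broadcast over the color channels
    # w == 1 keeps the third image, w == 0 keeps the first image, in between blends them
    return w * third_img + (1.0 - w) * first_img
```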
In some embodiments, in order to ensure that the overall image style of the target fusion image is consistent, before the third image and the first image are fused based on the target fusion mask image, style migration may be performed on the third image to obtain a fourth image, and then the fourth image is used to replace the third image and perform image fusion with the first image to obtain the target fusion image.
It should be noted that, the manner of performing style migration on the third image may refer to an image style migration scheme in the related art, which is not limited in this embodiment.
In other embodiments, the manner of performing style migration on the third image may also be referred to the following embodiment shown in fig. 6A, which is not described in detail herein.
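As a toy illustration of the residual-overlay idea only (the generator network itself is not shown), assuming float images and a precomputed per-pixel style transition weight map in [0, 1]:

```python
import numpy as np

def apply_style_residual(third_img, style_residual, style_weight):
    """Superpose the style-migration residual onto the third image inside the style
    transition region; style_weight is 0 outside that region and ramps up to 1 inside."""
    w = style_weight[..., None]                     # broadcast over the color channels
    fourth_img = third_img + w * style_residual     # the "fourth image"
    return np.clip(fourth_img, 0.0, 1.0)
```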
As can be seen from the above description, in the present embodiment, a first image and a second image are acquired, the second image is affine transformed into the coordinate system of the first image to obtain a third image, a target fusion mask image is determined based on the first image and the third image, and the third image and the first image can then be fused based on the target fusion mask image to obtain a target fusion image. Because the target fusion boundary in the target fusion mask image is set at a position where the image difference satisfies a set condition, the set condition can limit the target fusion boundary to the position where the image difference is minimum, so that the problem of generating ghost images at the fusion boundary in the target fusion image during image fusion can be avoided, and the width of the fusion transition region can be appropriately reduced while the fusion effect is ensured.
FIG. 2 is a flow chart illustrating how a third image is obtained according to an exemplary embodiment of the present disclosure; the present embodiment exemplifies how to obtain the third image on the basis of the above-described embodiments. As shown in fig. 2, the affine transformation of the second image into the coordinate system of the first image in the step S102 to obtain the third image may include the following steps S201 to S202:
in step S201, image registration is performed based on the first image and the second image, so as to obtain an affine transformation relationship of the images.
In this embodiment, the registration of the first image and the second image may be completed by algorithms such as feature point matching and optical flow. Illustratively, feature point matching may be performed using an ORB (Oriented FAST and Rotated BRIEF) method, and/or optical flow information of the first image and the second image may be determined based on a PWC-Net (optical flow learning network) algorithm.
In some embodiments, in order to ensure, for example, that the color and/or brightness of the target fusion image obtained by the subsequent image fusion are consistent overall, the first image and the second image may be preprocessed based on a preset processing manner before image registration to obtain the preprocessed first image and second image, and image registration may then be performed based on the preprocessed first image and second image to obtain the image affine transformation relationship.
The preset processing mode may include at least one of brightness correction and color correction.
In step S202, the second image is affine transformed into the coordinate system of the first image based on the image affine transformation relation, so as to obtain a third image.
In this embodiment, after performing image registration based on the first image and the second image to obtain an image affine transformation relationship, the second image may be affine transformed to the coordinate system of the first image based on the image affine transformation relationship to obtain a third image.
For example, after the above-mentioned image affine transformation relationship is obtained, based on the transformation relationship of each pixel point recorded in the image affine transformation relationship, each pixel point in the second image may be affine-transformed into the coordinate system of the first image, so that the third image may be obtained.
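A minimal sketch of this registration-and-warp step, assuming OpenCV; ORB matching with a RANSAC-estimated affine model stands in for whichever registration the implementation actually uses (PWC-Net optical flow is not shown here).

```python
import cv2
import numpy as np

def warp_second_to_first(first_img, second_img):
    """Register the second (tele) image to the first (wide) image and warp it into
    the first image's coordinate system, yielding the third image."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(first_img, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(second_img, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:500]
    src = np.float32([k2[m.queryIdx].pt for m in matches])   # points in the second image
    dst = np.float32([k1[m.trainIdx].pt for m in matches])   # corresponding points in the first image
    # image affine transformation relation (2 x 3 matrix), estimated robustly with RANSAC
    A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    h, w = first_img.shape[:2]
    return cv2.warpAffine(second_img, A, (w, h)), A
```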
As can be seen from the above description, in this embodiment, an image affine transformation relationship is obtained by performing image registration based on the first image and the second image, and the second image is affine transformed to the coordinate system of the first image based on that relationship to obtain the third image. In this way the second image can be accurately affine transformed to the coordinate system of the first image, the obtained third image can subsequently be fused with the first image, the problem of generating a ghost at the fusion boundary can be avoided, and the quality of the output image of the electronic device can be improved.
FIG. 3 is a flowchart illustrating how a target fusion mask image is determined based on the first image and the third image according to an exemplary embodiment of the present disclosure; the present embodiment exemplifies how to determine the target fusion mask image based on the first image and the third image on the basis of the above-described embodiments.
As shown in fig. 3, the determining of the target fusion mask image based on the first image and the third image in the above step S103 may include the following steps S301 to S305:
in step S301, pixel differences and optical flow information of the first image and the third image are determined.
In this embodiment, when it is necessary to determine a target fusion mask image based on the first image and the third image, the pixel difference and the optical flow information of the first image and the third image may be determined in advance.
The pixel difference between the first image and the third image may be calculated based on a pixel difference calculation method in the related art, which is not limited in this embodiment.
In some embodiments, optical flow information for the first image and the third image may be determined based on a PWC-Net (optical flow learning network) algorithm.
In step S302, a first fusion mask image is determined based on the pixel difference and the optical flow information.
In this embodiment, after determining the pixel difference and the optical flow information of the first image and the third image, a first fusion mask image may be determined based on the pixel difference and the optical flow information, where the first fusion mask image includes a third indication area and a fourth indication area, the third indication area is used for indicating a fusion area, the fourth indication area is used for indicating a non-fusion area, and a boundary of the third indication area and the fourth indication area is a first fusion boundary.
In some embodiments, the first fusion mask image may be determined using a preset algorithm based on the pixel difference and optical flow information described above. Illustratively, the preset algorithm may include at least one of a motion detection algorithm and an occlusion detection algorithm in the related art.
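As an illustration only, here is a toy sketch of deriving a first fusion mask from pixel differences and optical-flow magnitude, assuming OpenCV; dense Farneback flow is used as a stand-in for the PWC-Net network mentioned above, and the thresholds are hypothetical.

```python
import cv2
import numpy as np

def first_fusion_mask(first_img, third_img, diff_thresh=20, flow_thresh=2.0):
    """Mark a pixel as non-fusion (0) when its pixel difference or optical-flow magnitude
    suggests motion or occlusion, and as fusion (255) otherwise."""
    g1 = cv2.cvtColor(first_img, cv2.COLOR_BGR2GRAY)
    g3 = cv2.cvtColor(third_img, cv2.COLOR_BGR2GRAY)
    pixel_diff = cv2.absdiff(g1, g3)
    flow = cv2.calcOpticalFlowFarneback(g1, g3, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    flow_mag = np.linalg.norm(flow, axis=2)
    inconsistent = (pixel_diff > diff_thresh) | (flow_mag > flow_thresh)
    # 255 = third indication region (fusion), 0 = fourth indication region (non-fusion)
    return np.where(inconsistent, 0, 255).astype(np.uint8)
```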
In step S303, a down-sampling process is performed on the first fusion mask image to obtain a second fusion mask image.
In this embodiment, after determining the first fusion mask image based on the pixel difference and the optical flow information, in order to increase the speed of performing subsequent fusion boundary adjustment, the first fusion mask image is subjected to down-sampling processing to obtain a second fusion mask image.
It should be noted that the down-sampling factor may be set flexibly based on business experience or the requirements of the actual scene; for example, the factor may be set to 4.
The second fusion mask image includes a fifth indication region and a sixth indication region, the fifth indication region is used for indicating a fusion region, the sixth indication region is used for indicating a non-fusion region, and a boundary between the fifth indication region and the sixth indication region is a second fusion boundary (i.e., a first fusion boundary after down-sampling).
In step S304, the second fusion boundary in the second fusion mask image is adjusted to a position where the image difference satisfies a set condition, so as to obtain an adjusted second fusion mask image.
In this embodiment, after the first fusion mask image is down-sampled to obtain a second fusion mask image, the second fusion boundary in the second fusion mask image may be adjusted to a position where the image difference satisfies the set condition, so as to obtain an adjusted second fusion mask image. Wherein the image difference comprises a difference between the first image and the third image.
Illustratively, the position where the above-described image difference satisfies the setting condition includes a position where the image difference is minimum.
In some embodiments, the second fusion boundary in the second fusion mask image may be adjusted to a position where the image difference satisfies a set condition based on a difference between the first image and the third image.
In other embodiments, the manner of adjusting the second fusion boundary in the second fusion mask image to the position where the image difference satisfies the setting condition may also refer to the following embodiment shown in fig. 4A, which is not repeated herein.
In step S305, the adjusted second fusion mask image is up-sampled to obtain the target fusion mask image.
In this embodiment, after the second fusion boundary in the second fusion mask image is adjusted to a position where the image difference satisfies the set condition to obtain the adjusted second fusion mask image, the adjusted second fusion mask image may be subjected to upsampling to obtain the target fusion mask image, that is, the adjusted second fusion mask image is subjected to size conversion (Resize) to the size of the first fusion mask image to obtain the target fusion mask image. And the target fusion boundary in the target fusion mask image is the "adjusted second fusion boundary" after the up-sampling.
It is to be understood that, in the case where the down-sampling factor is 4, the adjusted second fusion mask image is correspondingly up-sampled by a factor of 4, so that the target fusion mask image returns to the size of the first fusion mask image.
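A minimal sketch of this down-sample / adjust / up-sample round trip, assuming OpenCV and a down-sampling factor of 4; nearest-neighbour interpolation keeps the mask binary.

```python
import cv2

def adjust_mask_at_low_resolution(first_fusion_mask, adjust_fn, factor=4):
    """Down-sample the first fusion mask, adjust its fusion boundary at the small scale
    with the caller-supplied adjust_fn, then resize back to the original size."""
    h, w = first_fusion_mask.shape[:2]
    small = cv2.resize(first_fusion_mask, (w // factor, h // factor),
                       interpolation=cv2.INTER_NEAREST)        # second fusion mask image
    small_adjusted = adjust_fn(small)                          # boundary adjustment (e.g. graph cut)
    return cv2.resize(small_adjusted, (w, h),
                      interpolation=cv2.INTER_NEAREST)         # target fusion mask image
```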
As can be seen from the above description, in this embodiment, the pixel difference and the optical flow information of the first image and the third image are determined, a first fusion mask image is determined based on the pixel difference and the optical flow information, the first fusion mask image is down-sampled to obtain a second fusion mask image, the second fusion boundary in the second fusion mask image is adjusted to a position where the image difference satisfies the set condition to obtain an adjusted second fusion mask image, and the adjusted second fusion mask image is then up-sampled to obtain the target fusion mask image. The target fusion mask image can therefore be determined accurately, which lays a foundation for subsequently fusing the third image and the first image based on the target fusion mask image. Moreover, because the first fusion mask image is down-sampled first and the fusion boundary adjustment is performed on the resulting second fusion mask image, the boundary adjustment can be carried out at a small scale; compared with performing the boundary adjustment directly at the original scale, this improves the efficiency of producing the subsequent output image.
FIG. 4A is a flowchart illustrating how to adjust the second fusion boundary in the second fusion mask image to a position where the image difference satisfies a set condition according to an exemplary embodiment of the present disclosure; FIG. 4B is a schematic diagram illustrating a second fusion mask image according to an exemplary embodiment of the present disclosure; FIG. 4C is a schematic diagram of a first distance transformed image shown in accordance with an exemplary embodiment of the present disclosure; fig. 4D is a schematic diagram illustrating an adjustment range of a second blending boundary according to an exemplary embodiment of the present disclosure.
The present embodiment exemplifies how to adjust the second fusion boundary in the second fusion mask image to a position where the image difference satisfies the setting condition on the basis of the above-described embodiments.
As shown in fig. 4A, the adjusting of the second fusion boundary in the second fusion mask image to a position where the image difference satisfies the set condition in the above step S304 to obtain the adjusted second fusion mask image may include the following steps S401 to S406:
in step S401, distance conversion is performed on the sixth indication region in the second fusion mask image, so as to obtain a first distance conversion image.
In this embodiment, when a second blending boundary (i.e., a boundary between a black area and a white area shown in fig. 4B) in a second blending mask image (such as the image shown in fig. 4B) needs to be adjusted, distance transformation may be performed on the sixth indication area in the second blending mask image to obtain a first distance transformation image (such as the image shown in fig. 4C).
As shown in fig. 4C, the first distance transformed image may be used to characterize the distance between each pixel in the second fusion mask image and the second fusion boundary. That is, different color regions in the image may be used to indicate different distances from the second blend boundary.
It should be noted that, the above-mentioned manner of performing distance transformation may refer to explanation and description in the related art, and this embodiment does not limit this.
In step S402, an adjustment range of the second fusion boundary in the second fusion mask image is determined based on the first distance transformed image.
In this embodiment, after the distance conversion is performed on the sixth indication region in the second fusion mask image to obtain a first distance conversion image, an adjustment range of the second fusion boundary in the second fusion mask image may be determined based on the first distance conversion image (e.g., the adjustment range S shown in fig. 4D).
As shown in fig. 4D, the range of distances [a, b] may be set as the adjustment range S of the second fusion boundary. Illustratively, a ≤ 0 (i.e., a may be a negative number) and b = 20; that is, the second blending boundary is expanded outward by a distance of 20 to form the adjustment range S of the second blending boundary.
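One possible way to obtain such an adjustment range from a distance transform, sketched with OpenCV under the assumption that a signed distance to the second fusion boundary is wanted; a and b follow the illustrative values above.

```python
import cv2
import numpy as np

def boundary_adjustment_range(second_fusion_mask, a=0, b=20):
    """second_fusion_mask: uint8, 255 in the fifth (fusion) region, 0 in the sixth
    (non-fusion) region. Build a signed distance to the second fusion boundary and
    select the band a <= d <= b as the adjustment range S."""
    fusion = (second_fusion_mask > 0).astype(np.uint8)
    d_in = cv2.distanceTransform(fusion, cv2.DIST_L2, 3)       # distance of fusion pixels to the boundary
    d_out = cv2.distanceTransform(1 - fusion, cv2.DIST_L2, 3)  # distance of non-fusion pixels to the boundary
    signed = d_in - d_out          # positive inside the fusion region, negative outside
    return (signed >= a) & (signed <= b)
```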
In step S403, pixel differences of the first image and the third image are determined.
In this embodiment, after determining the adjustment range of the second fusion boundary in the second fusion mask image based on the first distance transform image, the pixel difference between the first image and the third image may be determined.
For example, assume that the first image is I_1(p_i) and the third image is I_3(p_i), where p_i is the i-th point in the image; the pixel difference between the first image and the third image may then be:
|I_1(p_i) − I_3(p_i)|;    (4-1)
in step S404, a gaussian weighted average of pixel differences in a setting window corresponding to each pixel in the adjustment range is determined based on the pixel differences, so as to obtain a gaussian weighted average image.
In this embodiment, after determining the pixel difference between the first image and the third image, a gaussian weighted average of the pixel difference in the setting window corresponding to each pixel in the adjustment range may be determined based on the pixel difference, so as to obtain a gaussian weighted average image, where the image may be used to describe the pixel difference between the first image and the third image.
For example, the above-described Gaussian weighted average image D(p) can be determined based on the following equation (4-2):
D(p) = Σ_{i∈N} w_i · |I_1(p_i) − I_3(p_i)|;    (4-2)
In the above formula, N is the set of pixels in the set window corresponding to the position of point p in the original image, p_i is the i-th pixel in that window, w_i is the Gaussian weight within the set window, and I_1(p_i) and I_3(p_i) represent the first image and the third image, respectively. For example, the size of the set window may be 7 × 7.
In step S405, an objective function is constructed using a graph cut algorithm based on the gaussian weighted average image.
In this embodiment, the objective function includes a constraint term and a smoothing term, the constraint term is used to limit the positions of the fifth indication region and the sixth indication region in the second fusion mask image, and the smoothing term is used to adjust the second fusion boundary within the adjustment range to a position where an image difference satisfies a set condition based on the gaussian weighted average image, where the image difference includes a difference between the first image and the third image.
For example, an objective function E as described in the following equation (4-3) may be constructed:
E = Σ_{p∈S} E_d(p) + Σ_{p,q∈N} E_s(p,q);  (4-3)

In the above formula, S is the set of points within the adjustment range; N is the set of pixels in the setting window centered at the position of point p in the original image, and q is a point around p within that window; E_d(p) is the constraint term, used to limit the positions of the fifth indication region (i.e., the region indicating the fusion region) and the sixth indication region (i.e., the region indicating the non-fusion region) in the second fusion mask image; E_s(p,q) is the smoothing term, used to adjust the second fusion boundary within the adjustment range to a position where the image difference satisfies the set condition (e.g., the image difference is minimal), based on the Gaussian weighted average image.
In some embodiments, the constraint term E_d(p) may take the form of the following equation (4-5):

E_d(p) = 0,  if S(p) ∉ {a, b} (p is outside the adjustment range);
E_d(p) = 0,  if p lies on boundary line a and M̂(p) belongs to the sixth indication region;
E_d(p) = ∞,  if p lies on boundary line a and M̂(p) belongs to the fifth indication region;
E_d(p) = 0,  if p lies on boundary line b and M̂(p) belongs to the fifth indication region;
E_d(p) = ∞,  if p lies on boundary line b and M̂(p) belongs to the sixth indication region;  (4-5)

where M̂ denotes the adjusted second fusion mask.
As is apparent from the above equation (4-5):

(1) When S(p) ∉ {a, b}, i.e., the point p in S is not within the adjustment range {a, b}, E_d(p) = 0, indicating that the cost of this case is small;

as can be seen from (1), in the present embodiment, points outside the adjustment range are not considered when the second fusion boundary is adjusted.

(2) When the point p in S lies on the boundary line a and is located in the sixth indication region (the region indicating the non-fusion region), the constraint term E_d(p) = 0, indicating that the cost of this case is small;

(3) when the point p in S lies on the boundary line a and is located in the fifth indication region (the region indicating the fusion region), the constraint term E_d(p) = ∞, indicating that the cost of this case is large;

as can be seen from (2) and (3), in the present embodiment, it is predetermined that the boundary line a belongs to the sixth indication region when the second fusion boundary is adjusted.

(4) When the point p in S lies on the boundary line b and is located in the fifth indication region, the constraint term E_d(p) = 0, indicating that the cost of this case is small;

(5) when the point p in S lies on the boundary line b and is located in the sixth indication region, the constraint term E_d(p) = ∞, indicating that the cost of this case is large;

as can be seen from (4) and (5), in the present embodiment, it is predetermined that the boundary line b belongs to the fifth indication region when the second fusion boundary is adjusted.

That is, in the present embodiment, when the second fusion boundary in the second fusion mask image is adjusted, {a, b} (i.e., the region between boundary line a and boundary line b) is taken as the adjustment range, the region inside boundary line a (including boundary line a) is set as the sixth indication region (i.e., the region indicating the non-fusion region), and the region outside boundary line b (including boundary line b) is set as the fifth indication region (i.e., the region indicating the fusion region); in the subsequent step, the second fusion boundary is adjusted within the adjustment range {a, b}.
In some embodiments, the smoothing term E_s(p, q) may take the form of the following equation (4-6):

E_s(p, q) = max(D(p), D(q)),  if M̂(p) ≠ M̂(q);
E_s(p, q) = 0,                if M̂(p) = M̂(q);  (4-6)

where D(p) and D(q) are the values of the Gaussian weighted average image at points p and q, respectively, and M̂(p) and M̂(q) are the values of the adjusted fusion mask image at points p and q, respectively.
According to the above equation (4-6), when M̂(p) ≠ M̂(q), E_s(p, q) = max(D(p), D(q)), indicating that the cost of this case is large; and when M̂(p) = M̂(q), E_s(p, q) = 0, indicating that the cost of this case is small.

That is, in the present embodiment, when the second fusion boundary in the second fusion mask image is adjusted, minimizing the smoothing term forces the label changes (i.e., the positions where M̂(p) ≠ M̂(q), which form the second fusion boundary) toward positions where max(D(p), D(q)) has a smaller value, so that the second fusion boundary in the second fusion mask image can be adjusted to a position where the image difference is minimal.
In step S406, the objective function is solved based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
In this embodiment, after an objective function is constructed by using a graph cut algorithm based on the gaussian weighted average image, the objective function may be solved based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
It should be noted that the preset function solving algorithm may be set based on actual service needs, for example, the preset function solving algorithm is set as a Boykov-Kolmogorov algorithm, which is not limited in this embodiment.
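The sketch below shows one way such an objective could be minimized with a max-flow/min-cut solver. It assumes the third-party PyMaxflow package (whose default solver is a Boykov-Kolmogorov implementation); the encoding of the constraint term as "pinning" terminal edges and the approximation of the pairwise weight max(D(p), D(q)) by per-node weights D(p) are simplifications for illustration, not the embodiment's exact construction.

```python
import numpy as np
import maxflow  # PyMaxflow package, assumed available

def adjust_second_boundary(D: np.ndarray, second_mask: np.ndarray,
                           band: np.ndarray) -> np.ndarray:
    """Sketch: re-cut the second fusion boundary inside the adjustment range.

    D           : Gaussian weighted average image (equation 4-2), float32 (H, W)
    second_mask : bool, True = fifth indication region (fusion) before adjustment
    band        : bool, True inside the adjustment range {a, b}
    Returns the adjusted mask, True = fifth indication region.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(D.shape)

    # Smoothing term (equation 4-6): a label change across an edge costs roughly D,
    # so the minimum cut prefers positions where the image difference is small.
    g.add_grid_edges(nodes, weights=D, symmetric=True)

    # Constraint term (equation 4-5), simplified: outside the adjustment range the
    # original labels are pinned with a very large terminal capacity, which also
    # realises "boundary line a stays non-fusion, boundary line b stays fusion".
    inf = 1e9
    pin_fusion = ((~band) & second_mask).astype(np.float32)
    pin_non_fusion = ((~band) & ~second_mask).astype(np.float32)
    g.add_grid_tedges(nodes, inf * pin_fusion, inf * pin_non_fusion)

    g.maxflow()  # Boykov-Kolmogorov max-flow/min-cut
    # get_grid_segments is True on the sink side; treat the source side as fusion.
    return ~g.get_grid_segments(nodes)
```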
As can be seen from the above description, in this embodiment, a first distance transformation image is obtained by performing distance transformation on the sixth indication region in the second fusion mask image; an adjustment range of the second fusion boundary in the second fusion mask image is determined based on the first distance transformation image; the pixel difference between the first image and the third image is determined; a Gaussian weighted average of the pixel differences in the setting window corresponding to each pixel in the adjustment range is determined based on the pixel differences, so as to obtain a Gaussian weighted average image; an objective function is then constructed using a graph cut algorithm based on the Gaussian weighted average image, and the objective function is solved based on a preset function solving algorithm to obtain the adjusted second fusion mask image. By constructing and solving the objective function, the second fusion boundary in the second fusion mask image is adjusted to a position where the image difference satisfies the set condition, which provides the basis for subsequently up-sampling the adjusted second fusion mask image to obtain the target fusion mask image.
FIG. 5 is a flow chart illustrating how the fused transition weights are determined according to yet another exemplary embodiment of the present disclosure; on the basis of the above embodiments, the present embodiment exemplifies how to determine the fusion transition weight.
As shown in fig. 5, the image fusion processing method of the present embodiment may further include, on the basis of the above-mentioned embodiment, determining the fusion transition weight in advance based on the following steps S501 to S502:
in step S501, distance conversion is performed on the second indication region in the target fusion mask image to obtain a second distance conversion image.
In this embodiment, after obtaining the target fusion mask image, distance conversion may be performed on the second indication region in the target fusion mask image to obtain a second distance conversion image.
The second distance transformation image may be used to characterize the distance between each pixel in the target fusion mask image and the target fusion boundary.
It should be noted that, the above-mentioned manner of performing distance transformation may refer to explanation and description in the related art, and this embodiment does not limit this.
In step S502, a fusion transition weight corresponding to the preset fusion transition region is determined based on the second distance transformed image.
In this embodiment, after performing distance transformation on the second indication region in the target fusion mask image to obtain a second distance transformation image, the fusion transition weight corresponding to the preset fusion transition region may be determined based on the second distance transformation image.
In some embodiments, after the second distance transformation image is obtained, the fusion transition weight of a point in the preset fusion transition region may be set larger the closer the point is to the boundary of the preset fusion transition region, and smaller the farther the point is from that boundary, based on the second distance transformation image.
For example, the fusion transition weight W_b may be determined based on the following equation (5-1), in which the weight decreases linearly with the distance d given by the second distance transformation image:

W_b = 1 − d / s_b,  if d < s_b;
W_b = 0,            otherwise;  (5-1)

where s_b is the boundary distance of the preset fusion transition region, i.e., the range of the preset fusion transition region is [0, s_b]; therefore, the points with d < s_b are the points in the preset fusion transition region.
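A minimal sketch of steps S501-S502 under the linear-ramp form assumed above for equation (5-1); the function name and the default s_b = 20 are illustrative assumptions.

```python
import cv2
import numpy as np

def fusion_transition_weight(target_mask: np.ndarray, s_b: float = 20.0) -> np.ndarray:
    """Sketch: weight W_b of equation (5-1) inside the preset fusion transition region.

    target_mask: uint8 image, 255 = first indication region (fusion),
                 0 = second indication region (non-fusion).
    """
    # Second distance transformation image: distance of each fusion-side pixel
    # to the target fusion boundary.
    d = cv2.distanceTransform(target_mask, cv2.DIST_L2, 3)
    w_b = np.clip(1.0 - d / s_b, 0.0, 1.0)   # 1 at the boundary, 0 beyond s_b
    w_b[target_mask == 0] = 0.0              # only defined on the fusion side
    return w_b
```

In the fusion transition region the fused pixel could then be taken, for example, as W_b·I_1 + (1 − W_b)·I_3 (or with the fourth image in place of the third image), so that the result meets the first image at the target fusion boundary and the third image at distance s_b; this blending convention is likewise an assumption, since the embodiment only specifies that the fusion uses the predetermined fusion transition weight.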
As can be seen from the above description, in this embodiment, the second indication region is subjected to distance transformation in the target fusion mask image to obtain a second distance transformation image, and the fusion transition weight corresponding to the preset fusion transition region is determined based on the second distance transformation image, so that the fusion transition weight can be accurately determined, and then the third image and the first image are fused based on the fusion transition weight in the fusion transition region of the target fusion image, so that the problem of ghost at the fusion boundary can be better avoided, and the quality of the image output by the electronic device can be improved.
FIG. 6A is a flowchart illustrating how a style migration may be performed on the third image according to an exemplary embodiment of the present disclosure; the present embodiment exemplifies how to perform style migration on the third image on the basis of the above embodiments.
As shown in fig. 6A, the image fusion processing method of the present embodiment may further include, on the basis of the foregoing embodiment, performing style migration on the third image in advance based on the following steps S601-S602:
in step S601, the third image is input to a generator network in a pre-trained generation countermeasure network, so as to obtain a style transition residual image from the third image to the first image.
In this embodiment, when the style migration of the third image is required, the third image may be input to a generator network in a pre-trained generation countermeasure network, so as to generate a style migration residual image from the third image to the first image through the generator network.
For example, a first sample image and a second sample image may be acquired by a first sample camera and a second sample camera, respectively, and a third sample image may be obtained by affine-transforming the second sample image into the coordinate system of the first sample image; the above-mentioned generation countermeasure network (GAN) may then be trained based on the first sample images and the third sample images. The generator network in the trained generation countermeasure network may be used to generate a style migration residual image from the third image to the first image based on the input third image.
The first sample camera may be a wide-angle camera having the same model as the first camera, and the second sample camera may be a telephoto camera having the same model as the second camera.
In step S602, based on a predetermined style transition weight, the style transition residual image is superimposed on the style transition region of the third image, so as to obtain a fourth image.
For example, fig. 6B is a schematic diagram of a third image shown in accordance with an exemplary embodiment of the present disclosure;
in this embodiment, when performing image fusion, a style transition region (hereinafter referred to as a preset style transition region for convenience of distinction) may be set in advance in the target fusion mask image in order to make the image style fusion more natural. For example, a region with a set width (e.g., 20, etc.) outside the target fusion boundary in the target fusion mask image may be used as the preset style transition region.
As shown in fig. 6B, after the style migration residual image is obtained, the style migration residual image is superimposed onto the style transition region of the third image (i.e., Sa, which corresponds to the preset style transition region in the target fusion mask image) based on the predetermined style transition weight, so as to obtain a fourth image.
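A minimal sketch of step S602 under the assumptions above: the style transition weight W_a is a per-pixel map that is largest at the target fusion boundary and zero outside the preset style transition region (for example, computed like W_b above with s_a in place of s_b), and the fourth image is the third image plus the weighted residual; both conventions are illustrative assumptions.

```python
import numpy as np

def apply_style_residual(third: np.ndarray, residual: np.ndarray,
                         w_a: np.ndarray) -> np.ndarray:
    """Sketch of step S602: superimpose the style migration residual image onto
    the style transition region of the third image to obtain the fourth image.

    third    : float32 image I_3 aligned to the first image's coordinate system
    residual : float32 generator output G(I_3), same shape as `third`
    w_a      : float32 style transition weights in [0, 1], zero outside the
               preset style transition region
    """
    if third.ndim == 3 and w_a.ndim == 2:
        w_a = w_a[..., None]              # broadcast the weight over color channels
    fourth = third + w_a * residual       # weighted superposition of the residual
    return np.clip(fourth, 0.0, 255.0)
```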
In an embodiment, the style transition weight may be determined in the following embodiment shown in fig. 7, which will not be described in detail.
As can be seen from the above description, in this embodiment, the third image is input into a generator network in a pre-trained generation countermeasure network to obtain a style migration residual image from the third image to the first image, and the style migration residual image is superimposed onto the style transition region of the third image based on a predetermined style transition weight to obtain a fourth image, so that style migration of the third image can be implemented. Image fusion can then be performed on the basis of the fourth image, instead of the third image, together with the first image to obtain the target fusion image, which ensures that the overall image style of the target fusion image is consistent, avoids the problem of seam marks at the fusion boundary, and further improves the quality of the image output by the electronic device.
FIG. 7 is a flow chart illustrating how the style transition weights are determined according to an exemplary embodiment of the present disclosure; the present embodiment is exemplified by how to determine the style transition weight on the basis of the above embodiments.
As shown in fig. 7, the image fusion processing method of the present embodiment may further include, on the basis of the foregoing embodiment, determining style transition weights in advance based on the following steps S701 to S702:
in step S701, performing distance transformation on the second indication region in the target fusion mask image to obtain a third distance transformation image;
in this embodiment, after obtaining the target fusion mask image, distance conversion may be performed on the second indication region in the target fusion mask image to obtain a third distance conversion image.
The third distance transformed image may be used to characterize the distance between each pixel in the target fusion mask image and the target fusion boundary.
It should be noted that, the above-mentioned manner of performing distance transformation may refer to explanation and description in the related art, and this embodiment does not limit this.
In step S702, style transition weights corresponding to preset style transition regions of the target fusion mask image are determined based on the third distance transform image.
In this embodiment, after performing distance transformation on the second indication region in the target fusion mask image to obtain a third distance transformation image, the style transition weight corresponding to the preset style transition region of the target fusion mask image may be determined based on the third distance transformation image.
In some embodiments, after the third distance transformation image is obtained, the style transition weight of a point in the preset style transition region may be set larger the closer the point is to the boundary of the preset style transition region, and smaller the farther the point is from that boundary, based on the third distance transformation image.
For example, the style transition weight W_a may be determined based on the following equation (7-1), in which the weight decreases linearly with the distance d given by the third distance transformation image:

W_a = 1 − d / s_a,  if d < s_a;
W_a = 0,            otherwise;  (7-1)

where s_a is the boundary distance of the preset style transition region, i.e., the range of the preset style transition region is [0, s_a]; therefore, the points with d < s_a are the points in the preset style transition region.
As can be seen from the above description, in this embodiment, the second indication region is subjected to distance transformation in the target fusion mask image to obtain a third distance transformation image, and the style transition weight corresponding to the preset style transition region of the target fusion mask image is determined based on the third distance transformation image, so that the style transition weight can be accurately determined, and then the style migration residual image is superimposed on the style transition region of the third image based on the style transition weight to obtain a fourth image, so that the style consistency of the fusion images can be ensured, the problem of stitching marks generated at the fusion boundary can be avoided, and the quality of the image output by the electronic device can be further improved.
FIG. 8 is a flow chart illustrating how the generative warfare network is trained in accordance with an exemplary embodiment of the present disclosure; the present embodiment exemplifies how to train the generation countermeasure network on the basis of the above-described embodiments.
As shown in fig. 8, the image fusion processing method of the present embodiment may further include training the generation countermeasure network based on the following steps S801 to S806:
in step S801, a plurality of first sample images and a plurality of third sample images are acquired.
In this embodiment, the third sample image may include an image obtained by affine-transforming a second sample image into a coordinate system of the first sample image, the first sample image may be acquired by a first sample camera, and the second sample image may be acquired by a second sample camera. The first sample camera may be a wide-angle camera of the same type as the first camera, and the second sample camera may be a telephoto camera of the same type as the second camera, so that a field angle of the first sample camera is larger than a field angle of the second sample camera.
In step S802, the plurality of first sample images and the plurality of third sample images are used as a decision maker training set, and a decision maker network in the generated countermeasure network is trained to obtain a trained decision maker network.
The above-mentioned decider network may be configured to classify based on a comparison result of image styles (i.e., a result of comparing a fourth image obtained by superimposing the third image and the style migration residual image with the style of the first image), for example: when the similarity is higher than or equal to the set similarity threshold, outputting 1; and when the similarity is lower than the set similarity threshold, outputting 0.
For example, a large number of first sample images may be labeled as 1, a large number of third sample images may be labeled as 0, and these sample images are used as a decision device training set to perform classification training on a pre-constructed decision device network, so as to obtain a trained decision device network D. The type of the decider network may be set based on actual needs, such as a VGG16 model, and the VGG16 model may have 13 convolutional layers and 3 fully-connected layers.
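A sketch of such a decider network in PyTorch, assuming torchvision's VGG16 (13 convolutional and 3 fully connected layers) with the last fully connected layer replaced by a single-logit binary head and a sigmoid output standing in for the 0/1 decision; the loss and learning rate are illustrative choices, not values specified by the embodiment.

```python
import torch
import torch.nn as nn
import torchvision

def build_decider() -> nn.Module:
    """Sketch: VGG16-based decider D that scores how 'first-image-like' a patch is."""
    vgg = torchvision.models.vgg16(weights=None)   # 13 conv layers + 3 FC layers
    vgg.classifier[6] = nn.Linear(4096, 1)         # replace the 1000-way head with one logit
    return nn.Sequential(vgg, nn.Sigmoid())        # probability in (0, 1)

# Training sketch: first sample patches are labelled 1, warped third sample patches 0.
decider = build_decider()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(decider.parameters(), lr=1e-4)
```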
In step S803, sliding is performed on the first sample image and the third sample image, respectively, based on a preset sliding window.
In this embodiment, a preset sliding window may be used to slide on the first sample image and the third sample image, and detect a Structural Similarity (SSIM) between the two images in the sliding window.
It should be noted that the size of the sliding window may be set based on actual needs, for example, set to 512 × 512, which is not limited in this embodiment.
In step S804, in response to detecting that the structural similarity SSIM between two images within the sliding window is greater than or equal to a set threshold, the two images are taken as a set of training images.
In this embodiment, when it is detected that the structural similarity SSIM between two images in the sliding window is greater than or equal to the set threshold T, the two images may be regarded as a set of training images.
It should be noted that the size of the setting threshold T may be set based on actual needs, such as 0.95, and this embodiment is not limited thereto.
In step S805, the process of acquiring a set of training images is repeated, and the obtained sets of training images are used as a generator training set.
In this embodiment, the process in step S804 is repeated in multiple pairs of images, so that multiple sets of training images can be obtained, and a generator training set can be further formed.
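A sketch of steps S803-S805 using scikit-image's SSIM; the window stride, the assumption of uint8 inputs, and the default threshold are illustrative choices.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def collect_training_pairs(first: np.ndarray, third: np.ndarray,
                           win: int = 512, stride: int = 256, T: float = 0.95):
    """Sketch: slide a window over an aligned (first, third) sample pair and keep
    patch pairs whose structural similarity is >= T as generator training images.
    Assumes uint8 images so that the SSIM data range is inferred from the dtype."""
    pairs = []
    h, w = first.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            p1 = first[y:y + win, x:x + win]
            p3 = third[y:y + win, x:x + win]
            score = ssim(p1, p3, channel_axis=-1 if p1.ndim == 3 else None)
            if score >= T:
                pairs.append((p1, p3))
    return pairs
```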
In step S806, the generator network in the generator countermeasure network is trained based on the generator training set, resulting in a trained generator network.
In this embodiment, after the generator training set is obtained, the generator network in the generation countermeasure network is trained based on the generator training set, so as to obtain a trained generator network G.
After training is complete, a style migration residual image from the third image to the first image may be generated based on the generator network.
As can be seen from the above description, in this embodiment, training of the generation countermeasure network based on sample images can be implemented, and the third image can then be input into the generator network of the trained generation countermeasure network to obtain a style migration residual image from the third image to the first image, thereby implementing style migration of the third image and ensuring that the overall image style of the target fusion image is consistent, so that the problem of stitching marks at the fusion boundary can be avoided and the quality of the image output by the electronic device can be further improved.
FIG. 9 is a flowchart illustrating how a generator network in the generative confrontation network is trained based on the generator training set according to an exemplary embodiment of the present disclosure; the present embodiment exemplifies how to train the generator network in the generation countermeasure network based on the generator training set on the basis of the above-described embodiments.
As shown in fig. 9, training the generator network in the generator countermeasure network based on the generator training set in the above step S806 may include the following steps S901 to S902:
in step S901, a generator loss function is constructed.
The loss function comprises at least one of a constraint term, a feature similarity term, and a generation term, wherein the constraint term is determined based on the difference between the image content of the third sample image after style transformation and the image content of the third sample image itself, the feature similarity term is determined based on the difference between the image content of the third sample image after style transformation and the image content of the first sample image, and the generation term is determined based on the output probability of the decider network.
For example, the generator Loss function Loss may be constructed based on the following equation (9-1):
Loss = α·l_mse + β·l_vgg/n + γ·l_gen;  (9-1)

In the above formula, l_mse is the constraint term, representing the MSE (Mean Squared Error) between the third image after the style transformation and the third image itself, which keeps the image content stable after the style transformation of the third image;

l_vgg/n is the feature similarity term, representing the MSE between the n-th layer VGG network feature maps of the style-transformed third image and of the first image; illustratively, n = 7;

l_gen is the generation term, characterizing the negative logarithm of the output probability of the decider network, so as to force the probability that the style-transformed third image is judged as the first image (i.e., the probability that the similarity between the style-transformed third image and the first image is judged to be higher than the set similarity threshold) to increase;

α, β, and γ are the weight adjustment factors of the respective terms.
Illustratively, the constraint term l_mse can be represented by the following equation (9-2):

l_mse = (1 / (W·H)) · Σ_{x=1..W} Σ_{y=1..H} ( (G(I^(3)) + I^(3))_{x,y} − I^(3)_{x,y} )²;  (9-2)

In the formula, W and H are the width and height of the third image, respectively, I^(3)_{x,y} denotes the point with coordinates (x, y) in the third image, and G(I^(3))_{x,y} denotes the point with coordinates (x, y) in the style migration residual image from the third image to the first image.
Illustratively, the feature similarity term l_vgg/n can be represented by the following equation (9-3):

l_vgg/n = (1 / (W_n·H_n)) · Σ_{x=1..W_n} Σ_{y=1..H_n} ( φ_n(G(I^(3)) + I^(3))_{x,y} − φ_n(I^(1))_{x,y} )²;  (9-3)

In the formula, φ_n(·) denotes the n-th layer VGG network feature map, W_n and H_n are the width and height of the n-th layer VGG network feature map, respectively, and φ_n(I^(1))_{x,y} denotes the point with coordinates (x, y) in the n-th layer VGG network feature map of the first image.
Illustratively, the generation term l_gen can be represented by the following equation (9-4):

l_gen = −log{ D( G(I^(3)) + I^(3) ) };  (9-4)

In the formula, D( G(I^(3)) + I^(3) ) is the output probability of the decider network D.
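A sketch of equations (9-1)-(9-4) in PyTorch; the truncated VGG16 feature extractor standing in for the n-th layer feature map φ_n and the weight values α, β, γ are illustrative assumptions.

```python
import torch
import torchvision

# Truncated VGG16 feature extractor standing in for the n-th layer feature map phi_n.
phi_n = torchvision.models.vgg16(weights=None).features[:16].eval()

def generator_loss(third, first, residual, decider,
                   alpha=1.0, beta=0.01, gamma=0.001):
    """Sketch of Loss = alpha*l_mse + beta*l_vgg/n + gamma*l_gen (equations 9-1..9-4).

    third, first : batches of I_3 and I_1, shape (B, 3, H, W)
    residual     : generator output G(I_3), same shape
    decider      : trained decider network D, returning a probability in (0, 1)
    """
    styled = third + residual                      # style-transformed third image
    l_mse = torch.mean((styled - third) ** 2)      # (9-2): keeps image content stable
    with torch.no_grad():
        feat_first = phi_n(first)                  # phi_n(I_1)
    l_vgg = torch.mean((phi_n(styled) - feat_first) ** 2)          # (9-3)
    l_gen = -torch.log(decider(styled) + 1e-8).mean()              # (9-4)
    return alpha * l_mse + beta * l_vgg + gamma * l_gen
```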
In step S902, the loss function is solved based on a preset optimization algorithm to obtain a trained generator network.
In this embodiment, after the generator loss function is constructed, the loss function may be solved based on a preset optimization algorithm to obtain a trained generator network, that is, based on the preset optimization algorithm, a generator network parameter that minimizes the loss function is determined and is used as a parameter of the trained generator network.
It should be noted that the preset optimization algorithm may be set based on actual service needs, for example, the preset optimization algorithm is set as a Boykov-Kolmogorov algorithm, and the present embodiment is not limited thereto.
As can be seen from the above description, in this embodiment, by constructing a generator loss function, and solving the loss function based on a preset optimization algorithm, a generator network after training is obtained, so that the generator network can be accurately trained, and then the third image can be subsequently input to a generator network in a pre-trained generation countermeasure network, so as to obtain a style migration residual image from the third image to the first image, thereby realizing style migration of the third image, and ensuring that the overall image styles of the target fusion image are consistent, so that the problem of seam marks at the fusion boundary can be avoided, and the quality of the image output by the electronic device can be further improved.
Fig. 10 is a block diagram illustrating an image fusion processing apparatus according to an exemplary embodiment of the present disclosure; the apparatus of the present embodiment may be applied to an electronic device having a first camera (e.g., a wide-angle camera, etc.) and a second camera (e.g., a tele camera, etc.).
As shown in fig. 10, the apparatus includes: an image acquisition module 110, an image transformation module 120, a mask determination module 130, and an image fusion module 140, wherein:
the image acquiring module 110 is configured to acquire a first image and a second image, where the first image is acquired by a first camera of the electronic device, the second image is acquired by a second camera of the electronic device, and a field angle of the first camera is larger than a field angle of the second camera;
the image transformation module 120 is configured to affine-transform the second image into a coordinate system of the first image to obtain a third image;
a mask determining module 130, configured to determine a target fusion mask image based on the first image and the third image, where the target fusion mask image includes a first indication region and a second indication region, the first indication region is used to indicate a fusion region, the second indication region is used to indicate a non-fusion region, a boundary of the first indication region and the second indication region is a target fusion boundary, the target fusion boundary is set at a position where an image difference satisfies a set condition, and the image difference includes a difference between the first image and the third image;
and an image fusion module 140, configured to fuse the third image and the first image based on the target fusion mask image to obtain a target fusion image.
As can be seen from the above description, the apparatus of this embodiment obtains the first image and the second image, affine-transforms the second image to a coordinate system of the first image to obtain a third image, determines a target fusion mask image based on the first image and the third image, and further can fuse the third image and the first image based on the target fusion mask image to obtain a target fusion image, where a target fusion boundary in the target fusion mask image is set at a position where an image difference satisfies a set condition, so that the target fusion boundary can be limited to a position where the image difference is minimum by the set condition, thereby avoiding a problem of generating a ghost at the fusion boundary in the target fusion image during image fusion, and also can appropriately reduce a width of a fusion transition region on the basis of ensuring a fusion effect, compared with a fusion transition manner in the related art, which can achieve an effect of narrow fusion transition, and further avoid a problem of generating a ghost at the fusion boundary, thereby improving quality of an output image of an electronic device.
Fig. 11 is a block diagram illustrating still another image fusion processing apparatus according to an exemplary embodiment of the present disclosure; the apparatus of the present embodiment may be applied to an electronic device having a first camera (e.g., a wide-angle camera, etc.) and a second camera (e.g., a tele-camera, etc.).
The image obtaining module 210, the image transforming module 220, the mask determining module 230, and the image fusing module 240 are the same as the image obtaining module 110, the image transforming module 120, the mask determining module 130, and the image fusing module 140 in the embodiment shown in fig. 10, and are not described herein again.
As shown in fig. 11, the image transformation module 220 may include:
a relationship obtaining unit 221, configured to perform image registration based on the first image and the second image to obtain an affine transformation relationship of the images;
and the image transformation unit 222 is configured to affine-transform the second image into the coordinate system of the first image based on the image affine transformation relationship to obtain a third image.
In some embodiments, the image transformation module 220 may further include:
a preprocessing unit 223, configured to preprocess the first image and the second image based on a preset processing manner, so as to obtain the preprocessed first image and second image, where the preset processing manner includes at least one of brightness correction and color correction;
furthermore, the relationship obtaining unit 221 may be further configured to perform, based on the first image and the second image after the preprocessing, an operation of performing image registration based on the first image and the second image to obtain an affine transformation relationship of the images.
In some embodiments, the mask determination module 230 may include:
a difference information determining unit 231 for determining pixel differences and optical flow information of the first image and the third image;
a first mask determining unit 232, configured to determine a first fusion mask image based on the pixel difference and the optical flow information, where the first fusion mask image includes a third indication area and a fourth indication area, the third indication area is used to indicate a fusion area, the fourth indication area is used to indicate a non-fusion area, and a boundary of the third indication area and the fourth indication area is a first fusion boundary;
a second mask obtaining unit 233, configured to perform downsampling on the first fusion mask image to obtain a second fusion mask image, where the second fusion mask image includes a fifth indication region and a sixth indication region, the fifth indication region is used to indicate a fusion region, the sixth indication region is used to indicate a non-fusion region, and a boundary between the fifth indication region and the sixth indication region is a second fusion boundary;
a fusion boundary adjusting unit 234, configured to adjust a second fusion boundary in the second fusion mask image to a position where an image difference satisfies a set condition, so as to obtain an adjusted second fusion mask image, where the image difference includes a difference between the first image and the third image;
a target mask obtaining unit 235, configured to perform upsampling processing on the adjusted second fusion mask image to obtain the target fusion mask image, where the target fusion boundary is obtained based on upsampling of the adjusted second fusion boundary.
In some embodiments, the first mask determining unit 232 may be further configured to determine the first fused mask image by using a preset algorithm based on the pixel difference and the optical flow information, the preset algorithm including at least one of a motion detection algorithm and an occlusion detection algorithm.
In some embodiments, the blending boundary adjustment unit 234 may be further configured to:
performing distance transformation on the sixth indication area in the second fusion mask image to obtain a first distance transformation image;
determining an adjustment range of the second fusion boundary in the second fusion mask image based on the first distance transformed image;
determining pixel differences of the first image and the third image;
determining a Gaussian weighted average value of pixel differences in a set window corresponding to each pixel in the adjustment range based on the pixel differences to obtain a Gaussian weighted average value image;
constructing an objective function by using a graph cut algorithm based on the gaussian weighted average image, wherein the objective function comprises a constraint term and a smoothing term, the constraint term is used for limiting the positions of the fifth indication region and the sixth indication region in the second fusion mask image, the smoothing term is used for adjusting the second fusion boundary within the adjustment range to a position where the image difference meets a set condition based on the gaussian weighted average image, and the image difference comprises the difference between the first image and the third image;
and solving the objective function based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
In some embodiments, the image fusion module 240 may include:
a first fusing unit 241, configured to fuse the third image and the first image based on a predetermined fusion transition weight in a fusion transition region of the target fusion image, where the fusion transition region corresponds to a preset fusion transition region in the target fusion mask image;
a second fusion unit 242, configured to reserve a pixel value of the third image in a fusion region of the target fusion image, where the fusion region corresponds to a region of the first indication region excluding the preset fusion transition region;
a third fusing unit 243, configured to reserve pixel values of the first image in a non-fused region of the target fused image, where the non-fused region corresponds to the second indication region.
In some embodiments, the image fusion module 240 may further include a fusion transition weight determination unit 244;
a fused transition weight determination unit 244, which may be configured to:
performing distance transformation on the second indication area in the target fusion mask image to obtain a second distance transformation image;
and determining the fusion transition weight corresponding to the preset fusion transition region based on the second distance transformation image.
In some embodiments, the apparatus may further include:
a style migration module 250, configured to perform style migration on the third image to obtain a fourth image;
furthermore, the image fusion module 240 may be further configured to perform the operation of fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image, based on the fourth image instead of the third image.
In some embodiments, the style migration module 250 may include:
a residual image obtaining unit 251, configured to input the third image to a generator network in a pre-trained generation countermeasure network, so as to obtain a style transition residual image from the third image to the first image;
a fourth image obtaining unit 252, configured to superimpose the style migration residual image onto a style transition region of the third image based on a predetermined style transition weight to obtain a fourth image, where the style transition region of the third image corresponds to a preset style transition region of the target fusion mask image.
In some embodiments, the style migration module 250 may further include a style transition weight determination unit 253;
the style transition weight determining unit 253 may be configured to:
performing distance transformation on the second indication area in the target fusion mask image to obtain a third distance transformation image;
and determining style transition weights corresponding to preset style transition regions of the target fusion mask image based on the third distance conversion image.
In some embodiments, the apparatus further includes a generation countermeasure network training module 260;

the generation countermeasure network training module 260 may include:
a sample acquiring unit 261, configured to acquire a plurality of first sample images and a plurality of third sample images, where the third sample images include an image obtained by affine-transforming a second sample image into a coordinate system of the first sample image, the first sample image is acquired by a first sample camera, the second sample image is acquired by a second sample camera, and a field angle of the first sample camera is larger than a field angle of the second sample camera;
a decision device training unit 262, configured to train a decision device network in the generated countermeasure network by using the multiple first sample images and the multiple third sample images as a decision device training set, so as to obtain a trained decision device network, where the decision device network is used for classifying based on a comparison result of image styles;
a window sliding unit 263, configured to respectively slide on the first sample image and the third sample image based on a preset sliding window;
an image determining unit 264, configured to, in response to detecting that the structural similarity SSIM between two images in the sliding window is greater than or equal to a set threshold, treat the two images as a set of training images;
a training set determining unit 265, configured to repeat a process of obtaining a group of training images, and use the obtained multiple groups of training images as a generator training set;
a generator training unit 266, configured to train a generator network in the generated countermeasure network based on the generator training set, so as to obtain a trained generator network.
In some embodiments, the generator training unit 266 is further configured to:
constructing a generator loss function, wherein the loss function comprises at least one of a constraint item, a characteristic similar item and a generation item, the constraint item is determined based on the image content difference between the third sample image after the style transformation and the third sample image, the characteristic similar item is determined based on the image content difference between the third sample image after the style transformation and the first sample image, and the generation item is determined based on the output probability of the determiner network;
and solving the loss function based on a preset optimization algorithm to obtain a trained generator network.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 12 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 12, device 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia components 908 include a screen that provides an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when device 900 is in an operational mode, such as a call mode, a record mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the device 900. For example, the sensor component 914 may detect an open/closed state of the device 900, the relative positioning of components, such as a display and keypad of the device 900, the sensor component 914 may also detect a change in the position of the device 900 or a component of the device 900, the presence or absence of user contact with the device 900, orientation or acceleration/deceleration of the device 900, and a change in the temperature of the device 900. The sensor assembly 914 may also include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the device 900 and other devices in a wired or wireless manner. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G or 5G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a chip comprising a processor and an interface;
the processor is used for reading instructions through the interface to execute the method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. An image fusion processing method, characterized in that the method comprises:
acquiring a first image and a second image, wherein the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to a coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
2. The method of claim 1, wherein affine transforming the second image into the coordinate system of the first image to obtain a third image comprises:
carrying out image registration based on the first image and the second image to obtain an affine transformation relation of the images;
and performing affine transformation on the second image to a coordinate system of the first image based on the image affine transformation relation to obtain a third image.
3. The method of claim 2, further comprising:
preprocessing the first image and the second image based on a preset processing mode to obtain the preprocessed first image and second image, wherein the preset processing mode comprises at least one of brightness correction and color correction;
and performing image registration based on the first image and the second image after preprocessing to obtain an image affine transformation relation.
4. The method of claim 1, wherein determining a target fusion mask image based on the first image and the third image comprises:
determining pixel differences and optical flow information for the first image and the third image;
determining a first fusion mask image based on the pixel difference and the optical flow information, wherein the first fusion mask image comprises a third indication area and a fourth indication area, the third indication area is used for indicating a fusion area, the fourth indication area is used for indicating a non-fusion area, and the boundary of the third indication area and the fourth indication area is a first fusion boundary;
performing down-sampling processing on the first fusion mask image to obtain a second fusion mask image, wherein the second fusion mask image comprises a fifth indication region and a sixth indication region, the fifth indication region is used for indicating a fusion region, the sixth indication region is used for indicating a non-fusion region, and the boundary between the fifth indication region and the sixth indication region is a second fusion boundary;
adjusting a second fusion boundary in the second fusion mask image to a position where an image difference satisfies a set condition, to obtain an adjusted second fusion mask image, where the image difference includes a difference between the first image and the third image;
and performing up-sampling processing on the adjusted second fusion mask image to obtain the target fusion mask image, wherein the target fusion boundary is obtained based on the up-sampling of the adjusted second fusion boundary.
5. The method of claim 4, wherein said determining a first fusion mask image based on said pixel differences and said optical flow information comprises:
determining a first fusion mask image using a preset algorithm based on the pixel difference and the optical flow information, the preset algorithm including at least one of a motion detection algorithm and an occlusion detection algorithm.
6. The method according to claim 4, wherein the adjusting the second fusion boundary in the second fusion mask image to a position where the image difference satisfies a set condition, and obtaining an adjusted second fusion mask image comprises:
performing distance transformation on the sixth indication area in the second fusion mask image to obtain a first distance transformation image;
determining an adjustment range of the second fusion boundary in the second fusion mask image based on the first distance transformed image;
determining pixel differences of the first image and the third image;
determining a Gaussian weighted average value of pixel differences in a set window corresponding to each pixel in the adjustment range based on the pixel differences to obtain a Gaussian weighted average value image;
constructing an objective function by using a graph cut algorithm based on the gaussian weighted average image, wherein the objective function comprises a constraint term and a smoothing term, the constraint term is used for limiting the positions of the fifth indication area and the sixth indication area in the second fusion mask image, the smoothing term is used for adjusting the second fusion boundary within the adjustment range to a position where an image difference meets a set condition based on the gaussian weighted average image, and the image difference comprises a difference between the first image and the third image;
and solving the target function based on a preset function solving algorithm to obtain an adjusted second fusion mask image.
7. The method of claim 1, wherein said fusing the third image with the first image based on the target fused mask image to obtain a target fused image comprises:
fusing the third image with the first image based on a predetermined fusion transition weight in a fusion transition region of the target fusion image, wherein the fusion transition region corresponds to a preset fusion transition region in the target fusion mask image;
reserving the pixel value of the third image in a fusion area of the target fusion image, wherein the fusion area corresponds to an area except for the preset fusion transition area in the first indication area;
and reserving pixel values of the first image in a non-fusion area of the target fusion image, wherein the non-fusion area corresponds to the second indication area.
8. The method according to claim 7, further comprising determining the fusion transition weight in advance based on:
performing distance transformation on the second indication area in the target fusion mask image to obtain a second distance transformation image;
and determining the fusion transition weight corresponding to the preset fusion transition region based on the second distance transformation image.
9. The method of claim 1 or 7, further comprising:
carrying out style migration on the third image to obtain a fourth image;
and replacing the third image with the fourth image, and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
10. The method of claim 9, wherein the style migration of the third image to obtain a fourth image comprises:
inputting the third image into a generator network in a pre-trained generative adversarial network to obtain a style migration residual image from the third image to the first image;
and superimposing the style migration residual image onto a style transition region of the third image based on a predetermined style transition weight to obtain a fourth image, wherein the style transition region of the third image corresponds to a preset style transition region of the target fusion mask image.
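A sketch of claim 10's residual superposition, assuming a PyTorch generator that outputs a residual image and a style transition weight map computed as in claim 11; the normalization, tensor layout, and names are illustrative assumptions.

```python
import numpy as np
import torch

def apply_style_residual(img3, generator, style_weight):
    # generator: assumed trained torch.nn.Module mapping a 1x3xHxW tensor of
    # the third image to a residual with the first image's style.
    # style_weight: H x W float map in [0, 1], non-zero in the style
    # transition region derived from the target fusion mask.
    x = torch.from_numpy(img3.astype(np.float32) / 255.0).permute(2, 0, 1)[None]
    with torch.no_grad():
        residual = generator(x)[0].permute(1, 2, 0).numpy()   # H x W x 3 residual
    w = style_weight[..., None]
    img4 = img3.astype(np.float32) / 255.0 + w * residual      # weighted superposition
    return np.clip(img4 * 255.0, 0, 255).astype(np.uint8)
```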
11. The method of claim 10, further comprising determining the style transition weights in advance based on:
performing distance transformation on the second indication area in the target fusion mask image to obtain a third distance transformation image;
and determining style transition weights corresponding to preset style transition regions of the target fusion mask image based on the third distance conversion image.
12. The method of claim 10, further comprising training the generative adversarial network in advance based on:
acquiring a plurality of first sample images and a plurality of third sample images, wherein the third sample images comprise images obtained by affine transforming second sample images to a coordinate system of the first sample images, the first sample images are acquired by a first sample camera, the second sample images are acquired by a second sample camera, and the field angle of the first sample camera is larger than that of the second sample camera;
taking the plurality of first sample images and the plurality of third sample images as a discriminator training set, and training a discriminator network in the generative adversarial network to obtain a trained discriminator network, wherein the discriminator network is used for classification based on a comparison of image styles;
sliding a preset window over the first sample image and the third sample image respectively;
in response to detecting that the structural similarity (SSIM) between the two images in the sliding windows is greater than or equal to a set threshold, taking the two images as a group of training images;
repeating the process of obtaining a group of training images, and taking the obtained multiple groups of training images as a generator training set;
and training the generator network in the generative adversarial network based on the generator training set to obtain the trained generator network.
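The patch-pairing step of claim 12 might look like the sketch below, using scikit-image's SSIM on color patches; the window size, stride, and threshold are hypothetical.

```python
import numpy as np
from skimage.metrics import structural_similarity

def collect_generator_pairs(first_imgs, third_imgs, win=128, stride=64, ssim_thresh=0.8):
    # Patches from the first/third sample images whose SSIM is high enough are
    # assumed to show the same content, differing mainly in style, which makes
    # them usable as (input, style target) generator training pairs.
    pairs = []
    for im1, im3 in zip(first_imgs, third_imgs):
        h, w = im1.shape[:2]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                p1 = im1[y:y + win, x:x + win]
                p3 = im3[y:y + win, x:x + win]
                score = structural_similarity(p1, p3, channel_axis=-1, data_range=255)
                if score >= ssim_thresh:
                    pairs.append((p3, p1))
    return pairs
```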
13. The method of claim 12, wherein training the generator network in the generative adversarial network based on the generator training set comprises:
constructing a generator loss function, wherein the loss function comprises at least one of a constraint term, a feature similarity term and a generation term, the constraint term is determined based on the image content difference between the third sample image after style migration and the third sample image, the feature similarity term is determined based on the image content difference between the third sample image after style migration and the first sample image, and the generation term is determined based on the output probability of the discriminator network;
and solving the loss function based on a preset optimization algorithm to obtain a trained generator network.
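A sketch of the generator loss in claim 13, assuming PyTorch, a residual-output generator as in claim 10, and an external feature extractor for the feature similarity term; the term weights and the specific distance functions are assumptions, since the claim only names the three terms and what each is based on.

```python
import torch
import torch.nn.functional as F

def generator_loss(gen, disc, feat_extractor, patch3, patch1,
                   w_constraint=1.0, w_feat=1.0, w_gen=0.1):
    # patch3 / patch1: third / first sample patches as N x C x H x W tensors in [0, 1].
    styled = patch3 + gen(patch3)                      # style-migrated third patch (residual output)
    # Constraint term: keep the content of the third patch (pixel-level L1, assumed).
    constraint = F.l1_loss(styled, patch3)
    # Feature similarity term: match the first patch in a feature space (assumed).
    feat = F.l1_loss(feat_extractor(styled), feat_extractor(patch1))
    # Generation term: fool the discriminator (disc assumed to output logits).
    logits = disc(styled)
    gen_term = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return w_constraint * constraint + w_feat * feat + w_gen * gen_term
```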
14. An image fusion processing apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a first image and a second image, wherein the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
the image transformation module is used for affine transforming the second image to a coordinate system of the first image to obtain a third image;
a mask determination module, configured to determine a target fusion mask image based on the first image and the third image, where the target fusion mask image includes a first indication region and a second indication region, the first indication region is used to indicate a fusion region, the second indication region is used to indicate a non-fusion region, a boundary of the first indication region and the second indication region is a target fusion boundary, the target fusion boundary is set at a position where an image difference satisfies a set condition, and the image difference includes a difference between the first image and the third image;
and the image fusion module is used for fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
15. An electronic device, characterized in that the device comprises:
the system comprises a first camera, a second camera, a processor and a memory for storing computer programs;
wherein the processor is configured, upon execution of the computer program, to implement:
acquiring a first image and a second image, wherein the first image is acquired by the first camera, the second image is acquired by the second camera, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to the coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
16. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements:
acquiring a first image and a second image, wherein the first image is acquired by a first camera of the electronic equipment, the second image is acquired by a second camera of the electronic equipment, and the field angle of the first camera is larger than that of the second camera;
affine transformation is carried out on the second image to a coordinate system of the first image, and a third image is obtained;
determining a target fusion mask image based on the first image and the third image, wherein the target fusion mask image comprises a first indication area and a second indication area, the first indication area is used for indicating a fusion area, the second indication area is used for indicating a non-fusion area, the boundary of the first indication area and the second indication area is a target fusion boundary, the target fusion boundary is arranged at a position where an image difference meets a set condition, and the image difference comprises a difference between the first image and the third image;
and fusing the third image and the first image based on the target fusion mask image to obtain a target fusion image.
17. A chip, comprising:
a processor and an interface;
the processor is configured to read instructions through the interface and execute the image fusion processing method of any one of claims 1 to 13.
CN202211567607.9A 2022-12-07 2022-12-07 Image fusion processing method, device, equipment, storage medium and chip Pending CN115953339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211567607.9A CN115953339A (en) 2022-12-07 2022-12-07 Image fusion processing method, device, equipment, storage medium and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211567607.9A CN115953339A (en) 2022-12-07 2022-12-07 Image fusion processing method, device, equipment, storage medium and chip

Publications (1)

Publication Number Publication Date
CN115953339A true CN115953339A (en) 2023-04-11

Family

ID=87288589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211567607.9A Pending CN115953339A (en) 2022-12-07 2022-12-07 Image fusion processing method, device, equipment, storage medium and chip

Country Status (1)

Country Link
CN (1) CN115953339A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116801093A (en) * 2023-08-25 2023-09-22 荣耀终端有限公司 Image processing method, device and storage medium
CN116801093B (en) * 2023-08-25 2023-11-28 荣耀终端有限公司 Image processing method, device and storage medium

Similar Documents

Publication Publication Date Title
WO2020224457A1 (en) Image processing method and apparatus, electronic device and storage medium
CN111885294B (en) Shooting method, device and equipment
CN106651955B (en) Method and device for positioning target object in picture
CN107798669B (en) Image defogging method and device and computer readable storage medium
KR101727169B1 (en) Method and apparatus for generating image filter
CN111461182B (en) Image processing method, image processing apparatus, and storage medium
CN107948510B (en) Focal length adjusting method and device and storage medium
CN107944367B (en) Face key point detection method and device
CN108259759A (en) focusing method, device and storage medium
CN112634160A (en) Photographing method and device, terminal and storage medium
CN114096994A (en) Image alignment method and device, electronic equipment and storage medium
CN110796012B (en) Image processing method and device, electronic equipment and readable storage medium
CN115953339A (en) Image fusion processing method, device, equipment, storage medium and chip
CN109003272B (en) Image processing method, device and system
CN113313788A (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN107992894B (en) Image recognition method, image recognition device and computer-readable storage medium
CN107239758B (en) Method and device for positioning key points of human face
CN106469446B (en) Depth image segmentation method and segmentation device
CN112188096A (en) Photographing method and device, terminal and storage medium
CN116437222B (en) Image processing method and electronic equipment
CN114143471B (en) Image processing method, system, mobile terminal and computer readable storage medium
CN115908120A (en) Image processing method and electronic device
CN114640815A (en) Video processing method and device, electronic equipment and storage medium
CN115147466A (en) Image registration method and apparatus, image processing method and apparatus, and storage medium
CN114143442B (en) Image blurring method, computer device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination