CN114782297B - Image fusion method based on motion-friendly multi-focus fusion network - Google Patents


Info

Publication number
CN114782297B
Authority
CN
China
Prior art keywords
image
focus
input
motion
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210396277.5A
Other languages
Chinese (zh)
Other versions
CN114782297A (en)
Inventor
刘帅成 (Liu Shuaicheng)
郑梓楠 (Zheng Zinan)
陈才 (Chen Cai)
章程 (Zhang Cheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210396277.5A priority Critical patent/CN114782297B/en
Publication of CN114782297A publication Critical patent/CN114782297A/en
Application granted granted Critical
Publication of CN114782297B publication Critical patent/CN114782297B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction (G06T5/00 Image enhancement or restoration)
    • G06N3/02 Neural networks; G06N3/08 Learning methods (G06N3/00 Computing arrangements based on biological models)
    • G06T2207/10024 Color image (G06T2207/10 Image acquisition modality)
    • G06T2207/20081 Training; Learning (G06T2207/20 Special algorithmic details)
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging (G06T2207/20212 Image combination)


Abstract

The invention relates to the technical field of image enhancement and computer vision, and discloses an image fusion method based on a motion-friendly multi-focus fusion network, comprising the following steps. Step S1: shoot two partially focused images with a camera, one with the foreground in focus and the background blurred, the other with the foreground blurred and the background in focus. Step S2: judge whether the camera moved between the two images; if so, enter step S3, and if not, enter step S3 or step S4; likewise judge whether an object in the two images moved; if so, enter step S3, and if not, enter step S3 or step S4. Step S3: fuse the two input photos with the motion-friendly multi-focus fusion network, which directly outputs two fused photos in one-to-one correspondence with the input images. Step S4: fuse the two input photos with a fusion network and output two fused photos in one-to-one correspondence with the input images.

Description

Image fusion method based on motion-friendly multi-focus fusion network
Technical Field
The invention relates to the technical field of image enhancement and computer vision, and in particular to an image fusion method based on a motion-friendly multi-focus fusion network, which achieves effective multi-focus image fusion by specially processing multi-focus images that contain motion.
Background
Due to hardware limitations, the depth of field of an optical lens is finite: only objects within the depth of field appear sharp in a photograph, so at most one of the near scene and the far scene can be rendered clearly. However, a sharp image is more easily observed and perceived by human vision than a blurred one, and a sharp all-in-focus image provides more content and detail. Multi-focus image fusion is a technique that generates an everywhere-in-focus picture from a set of partially focused pictures taken of the same scene. It is an effective way to extend the depth of field of an optical lens, is of great significance in digital photography, optical microscopy, integral imaging and other fields, and is an important area of image processing.
The two source images in multi-focus fusion must be identical in every respect except for which region is in focus, which places very high demands on the shooting scene, shooting conditions and shooting equipment. In real life, however, most photos are taken with handheld devices: the photographed object may move, and the camera may shake in the hand. At present, when the two captured images are not perfectly "matched", i.e., when such motion is present, no corresponding technique exists to fuse the two source images effectively, and existing focused-image fusion assumes a static scene. An image fusion method is therefore needed that uses deep learning to perform multi-focus fusion on images containing motion.
Disclosure of Invention
The invention aims to provide an image fusion method based on a motion-friendly multi-focus fusion network, which achieves effective multi-focus image fusion by specially processing multi-focus images that contain motion.
The invention is realized by the following technical scheme: an image fusion method based on a motion-friendly multi-focus fusion network comprises the following steps:
step S1: shooting two partially focused images by using a camera, wherein one image has the foreground in focus and the background blurred, and the other has the foreground blurred and the background in focus;
step S2: aligning the intermediate features of the images by using a feature alignment module; judging whether the camera moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4; judging whether an object moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4;
step S3: fusing the two input photos by using the motion-friendly multi-focus fusion network MTMFNet, which directly outputs two fused photos in one-to-one correspondence with the input images;
step S4: fusing the two input photos by using a fusion network and outputting two fused photos in one-to-one correspondence with the input images.
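The routing logic of steps S2-S4 can be sketched as follows. The function and parameter names here (`route_fusion`, `has_camera_motion`, `has_object_motion`, the two fusion callables) are hypothetical placeholders, since the patent does not specify how motion is detected:

```python
def route_fusion(img_a, img_b, has_camera_motion, has_object_motion,
                 fuse_motion_friendly, fuse_plain):
    """Route a multi-focus image pair to the appropriate fusion network.

    If either the camera or the photographed object moved between the two
    shots, the motion-friendly network (step S3) must be used; otherwise
    either network is acceptable, and this sketch defaults to the plain
    one (step S4).
    """
    if has_camera_motion or has_object_motion:
        return fuse_motion_friendly(img_a, img_b)   # step S3
    return fuse_plain(img_a, img_b)                 # step S4


# Minimal demonstration with stub fusion functions.
fused = route_fusion("near-focus", "far-focus",
                     has_camera_motion=True, has_object_motion=False,
                     fuse_motion_friendly=lambda a, b: ("MTMFNet", a, b),
                     fuse_plain=lambda a, b: ("plain", a, b))
```

Since motion was flagged, the pair is routed to the motion-friendly branch.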
To better implement the present invention, further, the feature alignment module includes a deconvolution layer.
To better implement the present invention, further, the motion-friendly multi-focus fusion network MTMFNet in step S3 includes:
the motion-friendly multi-focus fusion network MTMFNet comprises a dataset;
the dataset includes a DAVIS video dataset and a Cityscapes street view image dataset.
To better implement the invention, further, the dataset comprises:
the data set selects a moving object in the DAVIS video data set as the image foreground, and selects a street-view image in Cityscapes as the image background;
the image foreground is a non-rigid moving object with random direction and random amplitude, and the image background is likewise subjected to a non-rigid transformation with random direction and random amplitude.
In order to better implement the present invention, further, the calculation method for fusing the two input photos by the motion-friendly multi-focus fusion network MTMFNet in step S3 includes:
in training the motion-friendly multi-focus fusion network MTMFNet, the loss function used is:

L = L_spa + λ·L_freq

wherein

L_spa = (1/(3·H·W)) Σ_c Σ_{x,y} ( |O1(x,y,c) − G1(x,y,c)| + |O2(x,y,c) − G2(x,y,c)| )

L_freq = (1/(3·H·W)) Σ_c Σ_{u,v} ( |F(O1)(u,v,c) − F(G1)(u,v,c)| + |F(O2)(u,v,c) − F(G2)(u,v,c)| )

wherein L_spa represents the spatial-domain loss and L_freq represents the frequency-domain loss; O1 and O2 represent the output images corresponding to input image 1 and input image 2, and G1 and G2 represent the ground-truth images corresponding to input image 1 and input image 2; F(O1) and F(G1) represent the Fourier transforms of the output image and the ground-truth image corresponding to input image 1, and F(O2) and F(G2) those corresponding to input image 2; H and W represent the height and width of the input image, c represents the channel index of the RGB image, and λ is a weight balancing the two terms.
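A minimal NumPy sketch of this two-term loss. The balancing weight `lam` and the exact normalization are assumptions, as the published text does not preserve them:

```python
import numpy as np

def mtmf_loss(out1, gt1, out2, gt2, lam=0.1):
    """Spatial-domain L1 loss plus frequency-domain L1 loss on the 2-D
    Fourier transforms, summed over both output/ground-truth pairs.

    All images are (H, W, 3) arrays; lam is an assumed balancing weight.
    """
    def l1(a, b):
        return np.mean(np.abs(a - b))

    def freq_l1(a, b):
        # FFT over the spatial axes, channel-wise; L1 on the complex
        # magnitude of the difference.
        fa = np.fft.fft2(a, axes=(0, 1))
        fb = np.fft.fft2(b, axes=(0, 1))
        return np.mean(np.abs(fa - fb))

    l_spa = l1(out1, gt1) + l1(out2, gt2)
    l_freq = freq_l1(out1, gt1) + freq_l1(out2, gt2)
    return l_spa + lam * l_freq
```

The loss is zero when both outputs exactly match their ground truths, and grows with both pixel-space and spectrum-space deviations.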
In order to better implement the present invention, further, the fusion network in step S4 includes a multi-focus image fusion network.
Compared with the prior art, the invention has the following advantages:
(1) Existing focused-image fusion assumes a static scene; when the two captured images are not perfectly matched, i.e., when motion is present, no corresponding technique existed to fuse the two source images effectively in a multi-focus manner. After obtaining the images, the invention judges whether an object in them has moved and whether the camera has shaken: if so, the motion-friendly multi-focus fusion network must be used for processing; if not, images of static objects without camera shake can be processed by either the motion-friendly multi-focus fusion network or another focus-fusion network.
Drawings
The invention is further described below with reference to the drawings and embodiments, and all inventive concepts of the invention are to be regarded as disclosed and protected.
Fig. 1 is a schematic diagram of the structure of the motion-friendly multi-focus fusion network MTMFNet in the image fusion method based on the motion-friendly multi-focus fusion network.
Fig. 2 is a schematic diagram comparing multi-focus image fusion results of the image fusion method based on the motion-friendly multi-focus fusion network.
Fig. 3 is a schematic flow chart of the image fusion method based on the motion-friendly multi-focus fusion network.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, they are described below clearly and completely with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, embodiments of the present invention and should not be regarded as limiting the scope of protection. All other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "disposed," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or a communication link between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by a person of ordinary skill in the art.
Example 1:
the image fusion method based on the motion-friendly multi-focus fusion network of this embodiment, as shown in figs. 1-3, comprises the following steps:
step S1: shooting two partially focused images by using a camera, wherein one image has the foreground in focus and the background blurred, and the other has the foreground blurred and the background in focus;
step S2: judging whether the camera moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4; judging whether an object moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4;
step S3: fusing the two input photos by using the motion-friendly multi-focus fusion network MTMFNet, which directly outputs two fused photos in one-to-one correspondence with the input images;
step S4: fusing the two input photos by using a fusion network and outputting two fused photos in one-to-one correspondence with the input images.
In this embodiment, as shown in fig. 1, the input is a pair of multi-focus images containing motion. The outputs are two all-in-focus images, in one-to-one correspondence with the inputs. Effective multi-focus image fusion is achieved by specially processing the multi-focus images that contain motion; the aligned image features are used to fuse the multi-focus image features, thereby achieving image fusion. When a user holds a photographing apparatus to capture images, multi-focus images containing motion may be obtained. Through processing by the deep-learning network, clear all-in-focus images can be obtained.
Two partially focused photographs are taken with a camera: one has the foreground in focus and the background blurred, the other has the foreground blurred and the background in focus. Slight camera motion between the two shots is permitted.
The two input photos are fused by using the motion-friendly multi-focus fusion network MTMFNet (Motion Tolerant Multi-focus Fusion Net); the network directly outputs two fused photos in one-to-one correspondence with the input images.
The motion-friendly multi-focus fusion network MTMFNet of the invention performs multi-focus fusion on two input multi-focus images that contain motion. MTMFNet consists of four modules: feature extraction, feature alignment, feature fusion and image restoration. Because the photographed object moves between the two input photos, the network would otherwise struggle to learn; the invention therefore uses the feature alignment module to align the intermediate features of the images, making it easier for the network to grasp the key points.
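The four-stage structure described above can be sketched as a simple forward chain; the stage callables here are placeholders standing in for the convolutional modules of the actual network, and the function name `mtmfnet_forward` is an illustrative assumption:

```python
def mtmfnet_forward(img1, img2, extract, align, fuse, restore):
    """Skeleton of the MTMFNet pipeline: extract features from each input,
    align the two feature sets, fuse them, and restore two all-in-focus
    images (one per input). The four callables are placeholder stages."""
    f1, f2 = extract(img1), extract(img2)
    f1a, f2a = align(f1, f2)          # feature alignment absorbs the motion
    fused = fuse(f1a, f2a)
    return restore(fused, img1), restore(fused, img2)

# Trivial stand-in stages to show the data flow.
out1, out2 = mtmfnet_forward(
    "A", "B",
    extract=lambda x: ("feat", x),
    align=lambda a, b: (a, b),
    fuse=lambda a, b: (a, b),
    restore=lambda fused, ref: ("allfocus", ref),
)
```

The key design point is that alignment happens on intermediate features rather than raw pixels, so the later fusion stage sees a motion-compensated pair.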
Real object motion is random, flexible and of indefinite range, and the information the input images provide is too rich and complex for a neural network to learn its characteristics easily. The invention therefore simplifies the motion to reduce this complexity, approximating the flexible motion of foreground and background as rigid motion so as to lower the learning difficulty of the multi-focus image-fusion network. Existing focused-picture fusion methods select images of objects that do not move and do not tolerate camera shake; by contrast, the method based on the motion-friendly multi-focus fusion network proposed here tolerates such motion.
Example 2:
This embodiment is further optimized based on embodiment 1. Fig. 2 shows a comparison of multi-focus image fusion results. The data set for training the MTMFNet network is a multi-focus image data set with motion, built from the DAVIS video data set and the Cityscapes street-view image data set. A moving object from the DAVIS video data set serves as the image foreground and a street-view image from Cityscapes as the image background; to simulate moving images under real conditions, non-rigid moving objects with random direction and random amplitude are selected, and the background is likewise transformed with non-rigid motion of random direction and random amplitude. The training set has 17,915 image pairs and the test set has 2,083 image pairs. The names DAVIS and Cityscapes are proper nouns with no Chinese equivalent.
Other portions of this embodiment are the same as those of embodiment 1, and thus will not be described in detail.
Example 3:
This embodiment is further optimized based on embodiment 1 or 2. The loss function used in training the neural network MTMFNet is:

L = L_spa + λ·L_freq

wherein

L_spa = (1/(3·H·W)) Σ_c Σ_{x,y} ( |O1(x,y,c) − G1(x,y,c)| + |O2(x,y,c) − G2(x,y,c)| )

L_freq = (1/(3·H·W)) Σ_c Σ_{u,v} ( |F(O1)(u,v,c) − F(G1)(u,v,c)| + |F(O2)(u,v,c) − F(G2)(u,v,c)| )

wherein L_spa represents the spatial-domain loss and L_freq represents the frequency-domain loss; O1 and O2 represent the output images corresponding to input image 1 and input image 2, and G1 and G2 represent the ground-truth images corresponding to input image 1 and input image 2; F(O1) and F(G1) represent the Fourier transforms of the output image and the ground-truth image corresponding to input image 1, and F(O2) and F(G2) those corresponding to input image 2; H and W represent the height and width of the input image, c represents the channel index of the RGB image, and λ is a weight balancing the two terms.
In the feature alignment module, the two input images are passed through convolution layers to obtain pyramid features; the features of each level are then recovered step by step through deconvolution and up-sampling, and alignment is finally achieved by concatenation. Among the fusion networks considered, only the motion-friendly multi-focus fusion network achieves this alignment.
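A rough NumPy illustration of the pyramid construction and multi-level concatenation described above, with average pooling standing in for the strided convolutions and nearest-neighbour upsampling standing in for deconvolution (both simplifications, and all function names, are assumptions):

```python
import numpy as np

def build_pyramid(feat, levels=3):
    """Build a feature pyramid by repeated 2x2 average pooling."""
    pyr = [feat]
    for _ in range(levels - 1):
        f = pyr[-1]
        h, w = f.shape[0] // 2, f.shape[1] // 2
        pooled = f[:2*h, :2*w].reshape(h, 2, w, 2, -1).mean(axis=(1, 3))
        pyr.append(pooled)
    return pyr

def upsample(f, target_hw):
    """Nearest-neighbour upsampling, a stand-in for deconvolution."""
    h, w = target_hw
    ry, rx = h // f.shape[0], w // f.shape[1]
    return np.repeat(np.repeat(f, ry, axis=0), rx, axis=1)

def align_by_concat(feat1, feat2, levels=3):
    """Bring every pyramid level of both images back to full resolution and
    concatenate along the channel axis, as a crude proxy for the cascaded
    alignment the patent describes."""
    p1, p2 = build_pyramid(feat1, levels), build_pyramid(feat2, levels)
    full = feat1.shape[:2]
    ups = [upsample(f, full) for f in p1 + p2]
    return np.concatenate(ups, axis=-1)

f1 = np.random.default_rng(1).random((32, 32, 4))
f2 = np.random.default_rng(2).random((32, 32, 4))
aligned = align_by_concat(f1, f2)
```

With 3 levels and 4-channel features per image, the concatenated result stacks 6 feature maps, giving 24 channels at full resolution.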
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification and equivalent variation of the above embodiment according to the technical matter of the present invention falls within the scope of the present invention.

Claims (4)

1. An image fusion method based on a motion-friendly multi-focus fusion network, characterized by comprising the following steps: step S1: shooting two partially focused images by using a camera, wherein one image has the foreground in focus and the background blurred, and the other has the foreground blurred and the background in focus; step S2: aligning the intermediate features of the images by using a feature alignment module; judging whether the camera moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4; judging whether an object moved between the two images, if so, entering step S3, and if not, entering step S3 or step S4; step S3: fusing the two input photos by using the motion-friendly multi-focus fusion network MTMFNet, which directly outputs two fused photos in one-to-one correspondence with the input images;
the motion-friendly multi-focus fusion network MTMFNet comprises a feature extraction module, a feature alignment module, a feature fusion module and an image recovery module which are connected in sequence;
the feature extraction module includes a feature pyramid block pyramid feature block;
the characteristic alignment Module comprises a deformable alignment Module PCD Module and a deconvolution lamination conv which are connected in sequence;
the feature fusion module comprises an expanded convolution residual error intensive block DRDB and a self-adaptive convolution layer adaptive conv which are connected in sequence;
the image restoration module includes an image restoration module image reconstruction;
step S4: fusing the two input photos by using a fusion network, and outputting the two fused photos, wherein the two fused photos correspond to the input images one by one;
the step S3 includes: the motion-friendly multi-focus fusion network MTMFNet comprises a dataset;
the data set comprises a DAVIS video data set and a Cityscapes street view image data set;
the calculation method for fusing the two input photos by the motion-friendly multi-focus fusion network MTMFNet in the step S3 comprises the following steps:
in training the motion-friendly multi-focus fusion network MTMFNet, the loss function used is L = L_spa + λ·L_freq; wherein L_spa = (1/(3·H·W)) Σ_c Σ_{x,y} ( |O1(x,y,c) − G1(x,y,c)| + |O2(x,y,c) − G2(x,y,c)| ) and L_freq = (1/(3·H·W)) Σ_c Σ_{u,v} ( |F(O1)(u,v,c) − F(G1)(u,v,c)| + |F(O2)(u,v,c) − F(G2)(u,v,c)| ); wherein L_spa represents the spatial-domain loss and L_freq represents the frequency-domain loss; O1 and O2 represent the output images corresponding to input image 1 and input image 2; G1 and G2 represent the corresponding ground-truth images; F(·) denotes the Fourier transform; H and W represent the height and width of the input image; c represents the channel index of the RGB image; and λ is a weight balancing the two terms.
2. The method of claim 1, wherein the feature alignment module comprises a deconvolution layer.
3. The image fusion method based on a motion-friendly multi-focus fusion network of claim 2, wherein the dataset comprises: the data set selects a moving object in the DAVIS video data set as the image foreground, and selects a street-view image in Cityscapes as the image background; the image foreground is a non-rigid moving object with random direction and random amplitude, and the image background is likewise subjected to a non-rigid transformation with random direction and random amplitude.
4. The method of claim 1, wherein the fusion network in step S4 comprises a multi-focus image fusion network.
CN202210396277.5A 2022-04-15 2022-04-15 Image fusion method based on motion-friendly multi-focus fusion network Active CN114782297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210396277.5A CN114782297B (en) 2022-04-15 2022-04-15 Image fusion method based on motion-friendly multi-focus fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210396277.5A CN114782297B (en) 2022-04-15 2022-04-15 Image fusion method based on motion-friendly multi-focus fusion network

Publications (2)

Publication Number Publication Date
CN114782297A CN114782297A (en) 2022-07-22
CN114782297B true CN114782297B (en) 2023-12-26

Family

ID=82428279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210396277.5A Active CN114782297B (en) 2022-04-15 2022-04-15 Image fusion method based on motion-friendly multi-focus fusion network

Country Status (1)

Country Link
CN (1) CN114782297B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533623A (Lanzhou Jiaotong University) * 2019-09-06 2019-12-03 兰州交通大学 A full convolutional neural network multi-focus image fusion method based on supervised learning
CN110569832A (en) * 2018-11-14 2019-12-13 安徽艾睿思智能科技有限公司 text real-time positioning and identifying method based on deep learning attention mechanism
CN112215788A (en) * 2020-09-15 2021-01-12 湖北工业大学 Multi-focus image fusion algorithm based on improved generation countermeasure network
CN112379231A (en) * 2020-11-12 2021-02-19 国网浙江省电力有限公司信息通信分公司 Equipment detection method and device based on multispectral image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
占哲琦 (Zhan Zheqi); 陈鹏 (Chen Peng); 桑永胜 (Sang Yongsheng); 彭德中 (Peng Dezhong). Application of a deep neural network fusing dual attention in UAV target detection. 现代计算机 (Modern Computer), 2020, (11): 30-35. *
郑梓楠 (Zheng Zinan). Multi-focus image fusion based on deep learning. 中国优秀硕士学位论文全文数据库 信息科技辑 (China Master's Theses Full-text Database, Information Science and Technology), 2023, (1): I138-2300. *

Also Published As

Publication number Publication date
CN114782297A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
Agrawal et al. Coded exposure deblurring: Optimized codes for PSF estimation and invertibility
Tai et al. Image/video deblurring using a hybrid camera
JP5468404B2 (en) Imaging apparatus and imaging method, and image processing method for the imaging apparatus
US20220148297A1 (en) Image fusion method based on fourier spectrum extraction
CN102369722A (en) Imaging device and method, and image processing method for imaging device
Cao et al. Ntire 2023 challenge on 360deg omnidirectional image and video super-resolution: Datasets, methods and results
CN108024058A (en) Image virtualization processing method, device, mobile terminal and storage medium
CN112651911A (en) High dynamic range imaging generation method based on polarization image
CN115115516A (en) Real-world video super-resolution algorithm based on Raw domain
CN110278366B (en) Panoramic image blurring method, terminal and computer readable storage medium
CN111489300B (en) Screen image Moire removing method based on unsupervised learning
CN114202472A (en) High-precision underwater imaging method and device
CN112150363B (en) Convolutional neural network-based image night scene processing method, computing module for operating method and readable storage medium
CN107295261B (en) Image defogging method and device, storage medium and mobile terminal
CN114782297B (en) Image fusion method based on motion-friendly multi-focus fusion network
Xue Blind image deblurring: a review
CN109257540A (en) Take the photograph photography bearing calibration and the camera of lens group more
CN107392986A (en) A kind of image depth rendering intent based on gaussian pyramid and anisotropic filtering
Kim et al. Light field angular super-resolution using convolutional neural network with residual network
Wong A new method for creating a depth map for camera auto focus using an all in focus picture and 2D scale space matching
CN112532856B (en) Shooting method, device and system
CN113935910A (en) Image fuzzy length measuring method based on deep learning
Nagahara et al. Programmable aperture camera using LCoS
CN113379624A (en) Image generation method, training method, device and equipment of image generation model
JP6006506B2 (en) Image processing apparatus, image processing method, program, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant