WO2022042049A1 - Image fusion method, and training method and apparatus for image fusion model - Google Patents

Image fusion method, and training method and apparatus for image fusion model

Info

Publication number
WO2022042049A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
fusion
target
infrared
color
Prior art date
Application number
PCT/CN2021/104634
Other languages
English (en)
French (fr)
Inventor
吴华珍
许婷婷
黄芝娟
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21859889.4A (published as EP4198875A4)
Publication of WO2022042049A1
Priority to US18/176,240 (published as US20230214976A1)

Classifications

    • G06T 5/70 Denoising; Smoothing
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 5/73 Deblurring; Sharpening
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10024 Color image
    • G06T 2207/10048 Infrared image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30232 Surveillance
    • G06T 2207/30236 Traffic on road, railway or crossing

Definitions

  • the embodiments of the present application relate to the field of computer vision, and in particular, to an image fusion method, a training method and apparatus for an image fusion model.
  • Computer vision is an integral part of intelligent/autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and military applications. It is concerned with obtaining knowledge of the data and information of the subject being photographed. Figuratively speaking, it installs eyes (cameras/camcorders) and a brain (algorithms) on the computer so that, in place of the human eye, the computer can identify, track and measure targets and thereby perceive the environment. Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of making artificial systems "perceive" from images or multidimensional data. In general, computer vision uses various imaging systems in place of the visual organ to obtain input information, and then uses the computer in place of the brain to process and interpret that information. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously.
  • the image quality has a significant impact on the image processing effect.
  • the current photographing device can obtain better imaging results under the condition of high illumination, for example, in an ideal daytime condition.
  • however, in low-illumination scenes such as nighttime, the captured images or videos have problems such as low resolution, poor contrast, and loss of image details.
  • Current devices usually use near-infrared supplementary light to improve the imaging quality of low-illumination scenes, but due to its imaging characteristics, an infrared image cannot restore the true color of objects.
  • a fusion image can be obtained by fusing the color image and the infrared image to improve the imaging quality.
  • the current fusion methods cannot guarantee the fusion effect, and a lot of details are lost in the output image, which affects the quality of the output image.
  • the present application provides an image fusion method, a training method and device for an image fusion model, which can make the fused image contain more image details, improve the quality of the fused image, and at the same time ensure that the color of the fused image is accurate and natural.
  • a first aspect provides an image fusion method, the method including: acquiring a color image to be processed, an infrared image and a background reference image, where the infrared image and the color image to be processed are shot for the same scene, and "the same scene" means that the similarity between the color image to be processed and the infrared image is greater than a first threshold;
  • the color image to be processed is an image formed by the scene's reflection of visible light, and the infrared image is an image formed by the scene's reflection of light in the infrared band;
  • the color image to be processed, the infrared image and the background reference image are input into the trained image fusion model for feature extraction, and image fusion is performed based on the extracted features to obtain a fusion image; the similarity between the background reference image and the color image to be processed is greater than a second threshold.
  • the color image has rich color information
  • the infrared image has more texture information
  • the fusion image obtained by fusing the two has natural color and rich texture information, which significantly improves the foreground quality of the fusion image.
  • in addition, the background blur that may be caused by the flashlight effect of infrared imaging can be resolved, greatly improving the background quality of the output image; that is, the quality of the foreground area and the background area of the output image is enhanced at the same time, achieving full-frame image enhancement.
  • the similarity in this embodiment of the present application may be image texture similarity.
  • the similarity between the color image to be processed and the infrared image may be the image texture similarity between the color image to be processed and the infrared image.
  • the similarity between the background reference image and the color image to be processed may be the image texture similarity between the background reference image and the color image to be processed.
  • the background area in the background reference image is the same as the background area in the color image to be processed.
  • the similarity between the background reference image and the color image to be processed is greater than the second threshold, which may be that the similarity between the background area of the background reference image and the background area of the color image to be processed is greater than the second threshold.
  • the background region may be determined by the prior art, which is not limited in this embodiment of the present application.
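The embodiments leave the similarity measure open, noting only that it may be image texture similarity. As an illustration only (not the measure the patent prescribes), a texture-similarity score can be computed as the normalized cross-correlation of gradient magnitudes; the function name and formulation below are hypothetical.

```python
import numpy as np

def texture_similarity(img_a, img_b):
    """Hypothetical texture-similarity score in [0, 1] for two grayscale
    images of equal shape, based on the normalized cross-correlation of
    their gradient magnitudes (gradients approximate local texture)."""
    gy_a, gx_a = np.gradient(img_a.astype(np.float64))
    gy_b, gx_b = np.gradient(img_b.astype(np.float64))
    ta = np.hypot(gx_a, gy_a).ravel()
    tb = np.hypot(gx_b, gy_b).ravel()
    ta -= ta.mean()
    tb -= tb.mean()
    denom = np.linalg.norm(ta) * np.linalg.norm(tb)
    if denom == 0:
        # Both textureless, or one textureless and the other not.
        return 1.0 if np.allclose(ta, tb) else 0.0
    # Map the correlation coefficient from [-1, 1] to [0, 1].
    return 0.5 * (1.0 + ta @ tb / denom)
```

A score above the first (or second) threshold would then qualify two images as "the same scene" under this illustrative measure.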
  • the infrared image may be an infrared image captured on the same area at the same time as the color image to be processed.
  • the background reference image can be input into the image fusion model in the form of a color map, or it can be input into the image fusion model in the form of a grayscale map.
  • the method further includes: acquiring a fusion weight and inputting the fusion weight into the image fusion model, where the fusion weight is used to weight the color image to be processed and the infrared image.
  • the fusion weight is used to adjust the fusion ratio of the color image to be processed and the infrared image in the fusion image.
  • the fusion images obtained by using one and the same image fusion model cannot meet the fusion requirements of different application scenarios.
  • by adjusting the fusion weight, the fusion ratio of the color image and the infrared image can be changed to suit different application scenarios. That is to say, there is no need to train multiple image fusion models for different application scenarios; adjusting the fusion weight alone adapts the model to different scenarios, which improves the degree of freedom of the model.
  • the fusion weight corresponds to some or all of the fusion images.
  • the fusion weight corresponds to all the fusion images, and there is only one fusion weight in the whole fusion image. In any region in the fused image, the fusion ratio of the color image to be processed and the infrared image is the same.
  • the fusion weight corresponding to a part of the fusion image can be understood as the fusion weight corresponding to a region in the fusion image.
  • the number of fusion weights may be multiple, and the multiple fusion weights respectively correspond to different regions in the fusion image.
  • different regions correspond to different fusion weights, so as to meet the requirements for image fusion of different regions in the same image, which is beneficial to improve the image quality of the output image.
  • the fusion weight is greater than or equal to 0 and less than or equal to 1, and the proportion of infrared images in the fusion image is positively correlated with the fusion weight.
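Since the model's fusion actually operates on extracted features, the scalar blend below is only a hedged illustration of the property just stated: a weight in [0, 1] whose value is positively correlated with the infrared proportion. The function name is hypothetical.

```python
import numpy as np

def blend_luma(vis_y, ir_y, w):
    """Illustrative pixel-level blend: the infrared proportion grows with
    the fusion weight w (0 = all visible light, 1 = all infrared).
    The real model applies such weighting to learned features instead."""
    w = float(np.clip(w, 0.0, 1.0))
    return w * ir_y + (1.0 - w) * vis_y
```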
  • the color images to be processed include N frames of color images
  • the infrared images include N frames of infrared images corresponding to the N frames of color images
  • the background reference images corresponding to the N frames of color images are determined according to the background reference images of M frames among the N frames of color images, where M is a positive integer, N is a positive integer greater than 1, and N > M.
  • this method can be used for image fusion of any frame of color image and infrared image in the video.
  • the N frames of infrared images corresponding to the N frames of color images may be obtained when the same area is photographed at the same time as the N frames of color images. That is to say, there is a one-to-one correspondence between N frames of color images and N frames of infrared images.
  • the features of the background reference image of the previous frame are multiplexed, and it is not necessary to extract the features of the background reference image in each fusion process, which reduces the amount of calculation and the hardware overhead while ensuring imaging quality; even when the computing resources of the device are limited, image fusion can still be achieved.
  • the features of the N frames of color images and the features of the N frames of infrared images are respectively extracted; the features of the M background reference images corresponding to the M frames of color images are respectively extracted; and the features of the N frames of color images, the features of the N frames of infrared images, and the features of the M background reference images are reconstructed to obtain N fusion images.
  • N frames of color images and N frames of infrared images can be input into the image fusion model at the same time, so that the features of the N frames of color images and the features of the N frames of infrared images can be simultaneously extracted, thereby further improving the processing speed.
  • N frames of color images and the N frames of infrared images can also be input into the image fusion model in sequence, and the features of the N frames of color images and the features of the N frames of infrared images are sequentially extracted.
  • multiple frames of images are simultaneously fused, the processing speed is improved, and the features of the background reference image are multiplexed, which reduces the amount of calculation in the process of extracting the features of the background reference image and reduces the hardware overhead.
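The feature-multiplexing idea above can be sketched as a simple caching loop. Here `extract` and `fuse` are stand-ins for the model's (unspecified) feature-extraction and reconstruction stages, and `refresh_every` is a hypothetical policy for choosing the M frames whose reference features are actually recomputed.

```python
def fuse_sequence(color_frames, ir_frames, ref_frames,
                  extract, fuse, refresh_every=4):
    """Sketch of multiplexing background-reference features across a
    video: reference features are recomputed only every `refresh_every`
    frames (M < N) and reused for the frames in between."""
    fused, ref_feat = [], None
    for i, (c, r, ref) in enumerate(zip(color_frames, ir_frames, ref_frames)):
        if ref_feat is None or i % refresh_every == 0:
            ref_feat = extract(ref)  # costly step, done only M times
        fused.append(fuse(extract(c), extract(r), ref_feat))
    return fused
```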
  • the background reference image is obtained in any of the following ways: obtaining the background reference image according to multiple frames before the color image to be processed;
  • the long-exposure frame is used as the background reference image, and the long-exposure frame is the frame obtained when the exposure duration is greater than the third threshold;
  • the result of temporal noise reduction of the color image to be processed is used as the background reference image;
  • the fused image of the frame before the color image is used as the background reference image.
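Of the acquisition options listed above, aggregating the frames preceding the color image to be processed is easy to sketch. The per-pixel median below is one plausible choice (an assumption, not mandated by the embodiment), since it suppresses both temporal noise and transient foreground objects.

```python
import numpy as np

def background_reference(prev_frames):
    """Build a background reference image by temporally aggregating
    several frames preceding the color image to be processed. A per-pixel
    median is more robust to moving foreground than a plain mean."""
    stack = np.stack([f.astype(np.float64) for f in prev_frames])
    return np.median(stack, axis=0)
```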
  • the trained image fusion model is obtained by taking the first color image and the first infrared image as the input of the image fusion model and training the image fusion model with the goal of making the value of the loss function smaller than a fourth threshold; the loss function includes a first loss function, the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image, and the target fusion image is determined from the target color image and the target infrared image; the first color image, the first infrared image, the target color image and the target infrared image are shot for the same scene, where "the same scene" means that the similarity between any two of these images is greater than the first threshold; the signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image.
  • the target fusion image is determined by the target color image and the target infrared image, and the image fusion model is trained based on the target fusion image, so that the image fusion model can make full use of infrared information, which is conducive to fusing more texture information into the output image and preserving more image details.
  • the loss function further includes a second loss function, where the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
  • the second loss constrains the image output by the image fusion model to be as similar as possible to the target color image, which not only ensures the effect of noise reduction, but also ensures that the color of the output image is consistent with the target color image, so as to avoid wrong colors appearing in the output image.
  • the denoising task and the fusion task are performed collaboratively to reduce information loss, which not only ensures that rich texture details are retained in the fused image, but also ensures that the fused image achieves high resolution and true color information.
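The two-part objective described above can be sketched as follows. The choice of L1 distance and the weights `alpha`/`beta` are assumptions; the embodiment does not fix the distance measure or the weighting between the two loss terms.

```python
import numpy as np

def total_loss(output, target_fusion, target_color, alpha=1.0, beta=1.0):
    """Hedged sketch of the combined objective: the first term pulls the
    model output toward the target fusion image (texture detail), the
    second toward the target color image (denoising and color fidelity)."""
    first = np.mean(np.abs(output - target_fusion))    # first loss function
    second = np.mean(np.abs(output - target_color))    # second loss function
    return alpha * first + beta * second
```

Training would then drive this value below the fourth threshold.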
  • the target fusion image is an image of a brightness channel
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the brightness channel of the image output by the image fusion model and the brightness channel of the target fusion image.
  • training at the luminance channel level is beneficial to the fusion of more texture features and reduces the influence of other factors on the image fusion process.
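The brightness (luminance) channel referred to above can be obtained from an RGB image by a standard luma conversion. The BT.601 coefficients below are an assumption, as the embodiment does not specify a color space.

```python
import numpy as np

def luminance(rgb):
    """Luma channel of an RGB image with channels in the last axis,
    using the ITU-R BT.601 weighting (an assumed choice)."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```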
  • a second aspect provides a training method for an image fusion model, the training method including: acquiring at least one training sample, the training sample including a first color image, a first infrared image, a target color image and a target infrared image, where the first color image, the first infrared image, the target color image and the target infrared image are shot for the same scene, and "the same scene" means that the similarity between any two of the first color image, the first infrared image, the target color image and the target infrared image is greater than a first threshold; the first color image and the target color image are images formed by the scene's reflection of visible light, and the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band; the signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image;
  • the first color image and the first infrared image are used as the input of the image fusion model
  • the image fusion model is trained with the goal of making the value of the loss function smaller than the fourth threshold, and a trained image fusion model is obtained; the loss function includes a first loss function, the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image, and the target fusion image is determined from the target color image and the target infrared image.
  • the color image may also be referred to as a visible light image.
  • the first infrared image may be an infrared image captured at the same time as the first color image.
  • the target infrared image may be an infrared image captured at the same time as the target color image.
  • Shooting for the same scene can be understood as the same content in the image, such as the same scene shot at the same location.
  • the similarity in this embodiment of the present application may be image texture similarity.
  • the similarity between any two images among the first color image, the first infrared image, the target color image, and the target infrared image may be the image texture similarity between those two images.
  • the color image has rich color information
  • the infrared image has more texture information.
  • the fusion image obtained by the fusion of the two has natural color and rich texture information.
  • the target fusion image is determined from the target color image and the target infrared image, and the image fusion model is trained based on the target fusion image, so that the image fusion model can make full use of infrared information, which is beneficial for fusing more texture information into the output image and retaining more image details.
  • taking the first color image and the first infrared image as the input of the image fusion model and training the image fusion model with the goal of making the value of the loss function smaller than the fourth threshold to obtain a trained image fusion model includes: taking a first fusion weight, the first color image and the first infrared image as the input of the image fusion model, and training the image fusion model with the same goal to obtain the trained image fusion model, where the first fusion weight is used to weight the first color image and the first infrared image, and the target fusion image is determined according to the first fusion weight, the target color image and the target infrared image.
  • the first fusion weight is used to adjust the fusion ratio of the first color image and the first infrared image in the image output by the image fusion model.
  • determining the target fusion image according to the first fusion weight, the target color image and the target infrared image includes: determining a supervision image according to the target color image and the target infrared image, and weighting the supervision image and the target color image according to the first fusion weight.
  • the proportion of the supervised image and the target color image in the target fusion image is adjusted according to the first fusion weight.
  • the fusion images obtained by using one and the same image fusion model cannot meet the fusion requirements of different application scenarios.
  • by adjusting the fusion weight, the fusion ratio of the color image and the infrared image can be changed to suit different application scenarios. That is to say, there is no need to train multiple image fusion models for different application scenarios; adjusting the fusion weight alone adapts the model to different scenarios, which improves the degree of freedom of the model.
  • the first fusion weight corresponds to part or all of the images output by the image fusion model.
  • the first fusion weight corresponds to the entire image output by the image fusion model; that is, there is only one first fusion weight for the whole output image, and in any region of the output image the fusion ratio of the first color image and the first infrared image is the same.
  • the first fusion weight corresponds to a part of the image output by the image fusion model, and the first fusion weight corresponds to an area in the image output by the image fusion model.
  • the number of the first fusion weights may be multiple, and the multiple first fusion weights respectively correspond to different regions in the image output by the image fusion model.
  • the first fusion weight can be understood as a local weight. The local weight is used to indicate the fusion weight of the local region during the image fusion process. During the fusion process, different regions may adopt different first fusion weights.
  • different regions correspond to different fusion weights, so as to meet the requirements for image fusion of different regions in the same image, which is beneficial to improve the image quality of the output image.
  • the first fusion weight may be input into the image fusion model in the form of a parameter, or may be input into the image fusion model in the form of a fusion weight map, which is not limited in this application.
  • Representing the first fusion weight in the form of a fusion weight map can reduce the complexity of adjusting the first fusion weight.
  • the form of the fusion weight map is more favorable for representing the region corresponding to the first fusion weight.
  • taking the first color image and the first infrared image as the input of the image fusion model and training the image fusion model with the goal of making the value of the loss function smaller than the fourth threshold to obtain a trained image fusion model includes: taking a first background reference image, the first color image and the first infrared image as the input of the image fusion model, and training the image fusion model with the same goal to obtain the trained image fusion model, where the similarity between the first background reference image and the first color image is greater than the second threshold.
  • the background area in the first background reference image is the same as the background area in the first color image.
  • the similarity between the first background reference image and the first color image is greater than the second threshold, which may be that the similarity between the background area of the first background reference image and the background area of the first color image is greater than the second threshold.
  • the background region may be determined by the prior art, which is not limited in this embodiment of the present application.
  • the first background reference image may be input into the image fusion model in the form of a color image, or may be input into the image fusion model in the form of a grayscale image.
  • in addition, the background quality of the output image is improved; that is, the quality of the foreground area and the background area of the output image is enhanced at the same time, realizing full-frame image enhancement.
  • the loss function further includes a second loss function, where the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
  • the second loss constrains the image output by the image fusion model to be as similar as possible to the target color image, which not only ensures the effect of noise reduction, but also ensures that the color of the output image is consistent with the target color image, so as to avoid wrong colors appearing in the output image.
  • the denoising task and the fusion task are performed collaboratively to reduce information loss, which not only ensures that rich texture details are retained in the fused image, but also ensures that the fused image achieves high resolution and true color information.
  • the target fusion image is an image of a brightness channel
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the brightness channel of the image output by the image fusion model and the brightness channel of the target fusion image.
  • training at the luminance channel level is beneficial to the fusion of more texture features and reduces the influence of other factors on the image fusion process.
  • the target fusion image satisfies the following formula:
  • y_fuse_adj = y_fuse × IN_FuseMap + (1 − IN_FuseMap) × y_gt_Vis;
  • y_fuse_adj represents the target fusion image
  • y_fuse represents the fusion image obtained from the brightness channel of the target color image and the brightness channel of the target infrared image
  • IN_FuseMap represents the fusion weight map
  • y_gt_Vis represents the brightness channel of the target color image.
  • the values on different regions in the fusion weight map respectively indicate the corresponding weights of different regions of the image.
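The formula above transcribes directly into code. The function name below is hypothetical, and the per-region weights are expressed as an array (the fusion weight map) broadcast against the luminance planes.

```python
import numpy as np

def target_fusion(y_fuse, y_gt_vis, fuse_map):
    """Transcription of y_fuse_adj = y_fuse * IN_FuseMap
    + (1 - IN_FuseMap) * y_gt_Vis, with per-region weights in [0, 1]."""
    fuse_map = np.clip(np.asarray(fuse_map, dtype=np.float64), 0.0, 1.0)
    return y_fuse * fuse_map + (1.0 - fuse_map) * y_gt_vis
```

With a weight of 1 in a region, the target fusion image there equals the luminance fusion of the target pair; with a weight of 0, it equals the target color image's luminance.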
  • a third aspect provides an image fusion apparatus, which includes a module/unit for performing the method in any one of the implementation manners of the first aspect.
  • a fourth aspect provides an apparatus for training an image fusion model, which includes a module/unit for executing the method in any one of the implementation manners of the second aspect.
  • a fifth aspect provides an image fusion apparatus, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementation manners of the first aspect.
  • a sixth aspect provides a training device for an image fusion model, the device comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementation manners of the second aspect.
  • a computer-readable medium stores program code for execution by a device, the program code comprising instructions for executing the method in any one of the implementation manners of the first aspect or the second aspect.
  • a computer program product containing instructions, which, when the computer program product runs on a computer, causes the computer to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • a ninth aspect provides a chip, the chip including a processor and a data interface, where the processor reads instructions stored in a memory through the data interface and executes the method in any one of the implementation manners of the first aspect or the second aspect.
  • the chip may further include a memory, in which instructions are stored, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the The processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • the above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • a tenth aspect provides an electronic device, where the electronic device includes the apparatus in any one of the implementation manners of the third aspect to the fourth aspect.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a color image and an infrared image captured at night;
  • FIG. 6 is a schematic diagram of a device for acquiring a color image and an infrared image provided by an embodiment of the present application
  • FIG. 7 is a schematic diagram of another device for acquiring a color image and an infrared image provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of yet another device for acquiring a color image and an infrared image provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an image fusion apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a training method for an image fusion model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a training sample provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a background reference image provided by an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of an image fusion model provided by an embodiment of the present application.
  • FIG. 14 is a schematic block diagram of another image fusion model provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a method for obtaining fusion weights provided by an embodiment of the present application.
  • FIG. 16 is a schematic flowchart of an image fusion method provided by an embodiment of the present application.
  • FIG. 17 is a schematic diagram of fused images under different fusion weights provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a fusion result provided by an embodiment of the present application.
  • FIG. 20 is a schematic block diagram of an apparatus for training an image fusion model provided by an embodiment of the present application.
  • FIG. 21 is a schematic block diagram of an image fusion apparatus provided by an embodiment of the present application.
  • FIG. 22 is a schematic block diagram of an apparatus for training an image fusion model provided by an embodiment of the present application.
  • FIG. 23 is a schematic block diagram of an image fusion apparatus provided by an embodiment of the present application.
  • the image fusion method provided by the embodiment of the present application can be applied to video surveillance, safe city, night shooting, and other scenes where image quality needs to be improved.
  • the image fusion method according to the embodiment of the present application can be applied to video surveillance and nighttime shooting, and the video surveillance and nighttime shooting are briefly introduced below.
  • Video surveillance is an important means of comprehensive public security and traffic supervision in the current city. With the development of imaging technology, the current monitoring equipment can obtain better imaging results under ideal conditions during the day. However, in some unsatisfactory situations, such as rainy weather or nighttime scenes with poor illumination, the collected surveillance images have problems such as low resolution, poor contrast, and loss of image details.
  • the method provided by the embodiment of the present application can significantly improve the imaging quality of the collected surveillance video, better meet the monitoring personnel's requirement for the definition of the surveillance video, and facilitate the monitoring personnel to view and obtain valuable information.
  • the quality of nighttime imaging can be improved and the user experience can be improved.
  • the quality of nighttime imaging can be significantly improved, the user's demand for nighttime photography can be met, the user's post-processing time is saved, and the user experience is improved.
  • the I/O interface 112 of the execution device 110 can send the images processed by the execution device (such as fused images), together with the to-be-processed color image and infrared image input by the user, to the database 130 as a pair of training data, so that the training data maintained by the database 130 is more abundant, thereby providing richer training data for the training work of the training device 120.
  • the training method of the image fusion model provided in the embodiment of the present application involves the processing of computer vision, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning.
  • the training data (such as the first color image, the target color image, the first infrared image and the target infrared image) are symbolized and formalized for intelligent information modeling, extraction, preprocessing, training, etc., and finally a trained image fusion network is obtained;
  • the image fusion method can use the above-mentioned trained image fusion network: input data (such as the color image and infrared image to be processed in this application) is input into the trained image fusion network to obtain output data (such as the fused image in this application).
  • the image fusion network training method and the image fusion method provided in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of a system, or two stages of an overall process: for example, a model training phase and a model application phase.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes xs and intercept 1 as inputs, and the output of the operation unit can be: h = f(∑_{s=1}^{n} Ws·xs + b), where:
  • s = 1, 2, …, n, where n is a natural number greater than 1;
  • Ws is the weight of xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to perform nonlinear transformation on the features obtained in the neural network, and convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
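  • To make the neural-unit formula above concrete, here is a minimal sketch in plain Python (the input, weight, and bias values are hypothetical, not taken from the patent):

```python
import math

def neural_unit(xs, ws, b):
    """Output of a single neural unit: f(sum_s Ws * xs + b),
    using the sigmoid activation mentioned above."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid: maps z into (0, 1)

# Hypothetical inputs, weights, and bias.
out = neural_unit(xs=[0.5, -1.0, 2.0], ws=[0.1, 0.4, 0.2], b=0.05)
print(round(out, 3))  # z = 0.1, sigmoid(0.1) ≈ 0.525
```

The sigmoid here is only one choice of the nonlinear transformation f; any other activation (ReLU, tanh, etc.) fits the same formula.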
  • a neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • the DNN is divided according to the positions of different layers.
  • the neural network inside the DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • although a DNN looks complicated, the work of each layer is not complicated. In short, each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • each layer simply performs this operation on an input vector x to obtain the output vector y. Due to the large number of DNN layers, the number of coefficient matrices W and offset vectors b is also large.
  • the coefficients W in the DNN are defined as follows. Take a three-layer DNN as an example: the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
  • in summary, the coefficient from the kth neuron in the (L-1)-th layer to the jth neuron in the L-th layer is defined as W^L_{jk}.
  • the input layer does not have a W parameter.
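  • The per-layer computation y = α(W·x + b) described above, with the W^L_{jk} indexing convention, can be sketched in plain Python as follows (the layer sizes and values are hypothetical, not taken from the patent):

```python
def dense_layer(W, x, b, activation):
    """One DNN layer: y = activation(W @ x + b), written without
    external dependencies. W is a list of rows; W[j][k] is the weight
    from input neuron k to output neuron j, matching the W_{jk} indexing."""
    z = [sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j
         for row, b_j in zip(W, b)]
    return [activation(v) for v in z]

relu = lambda v: max(0.0, v)  # one common choice of activation

# Hypothetical 2-input, 3-output layer.
W = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]
y = dense_layer(W, x=[2.0, 1.0], b=[0.0, 0.1, 1.0], activation=relu)
print(y)  # approximately [1.0, 1.6, 0.0]
```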
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • a convolutional neural network contains a feature extractor consisting of convolutional layers and subsampling layers, and the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • in a convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons in the adjacent layer.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way of extracting features independent of location.
  • the convolution kernel can be formalized in the form of a matrix of random size, and the convolution kernel can be learned to obtain reasonable weights during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the pixel value of the image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • the pixel value is 256*Red+100*Green+76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, the smaller the value, the lower the brightness, and the larger the value, the higher the brightness.
  • the pixel values can be grayscale values.
  • YUV is a color space
  • Y represents the brightness (Luminance or Luma), that is, the grayscale value
  • "U" and "V" represent the chrominance (Chrominance or Chroma), which is used to describe the color and saturation of the image and to specify the color of a pixel.
  • “U” and “V” are the two components that make up the color.
  • the importance of using the YUV color space is that its luminance signal Y and chrominance signals U, V are separated. If there is only the Y signal component and no U, V signal components, then the image represented in this way is a black and white grayscale image.
  • the luma signal may also be referred to as the luma channel
  • the chrominance signal may also be referred to as the chrominance channel.
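  • As an illustration of the Y/U/V separation described above, the following sketch converts an RGB pixel to YUV using the BT.601 full-range coefficients (one common convention; the patent does not fix a particular one):

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (BT.601 full-range coefficients).
    Keeping only Y yields the black-and-white grayscale image
    mentioned above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b          # luma (grayscale value)
    u = -0.14713 * r - 0.28886 * g + 0.436 * b     # chroma U
    v = 0.615 * r - 0.51499 * g - 0.10001 * b      # chroma V
    return y, u, v

y, u, v = rgb_to_yuv(255, 255, 255)
print(round(y), abs(round(u, 2)), abs(round(v, 2)))  # white: full luma, (near-)zero chroma
```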
  • the encoder is used to extract the features of the input image.
  • the encoder may employ a neural network, for example, a convolutional neural network.
  • the decoder is used to restore the extracted features into an image.
  • the decoder may employ a neural network, eg, a convolutional neural network.
  • an embodiment of the present application provides a system architecture 100 .
  • a data collection device 160 is used to collect training data.
  • the training data in this embodiment of the present application may include: a first color image, a target color image, a first infrared image, and a target infrared image; after collecting the training data, the data acquisition device 160 stores the training data in the database 130 , the training device 120 obtains the target model/rule 101 by training based on the training data maintained in the database 130 .
  • the training device 120 processes the first color image and the first infrared image, and compares the output image with the target fusion image until the difference between the image output by the training device 120 and the target fusion image is less than a certain threshold, thereby completing the training of the target model/rule 101.
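  • The train-until-the-difference-is-below-a-threshold procedure described above can be sketched as follows; `model_step` is a hypothetical stand-in for one training iteration of the fusion network, and the mean absolute (L1) difference stands in for whatever image-difference measure the implementation actually uses:

```python
def l1_difference(img_a, img_b):
    """Mean absolute difference between two images, used here as the
    'difference from the target fusion image' stopping criterion."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    return sum(abs(a - b) for a, b in zip(flat_a, flat_b)) / len(flat_a)

def train_until_close(model_step, target, threshold=1.0, max_iters=1000):
    """Call a training step until the output image is within `threshold`
    of the target fusion image; returns the number of steps taken."""
    for i in range(max_iters):
        output = model_step()
        if l1_difference(output, target) < threshold:
            return i + 1
    return max_iters

# Toy stand-in for the fusion network: each step moves the output
# halfway toward the target (purely illustrative dynamics).
target = [[10.0, 20.0], [30.0, 40.0]]
state = {"out": [[0.0, 0.0], [0.0, 0.0]]}
def step():
    state["out"] = [[o + 0.5 * (t - o) for o, t in zip(orow, trow)]
                    for orow, trow in zip(state["out"], target)]
    return state["out"]

print(train_until_close(step, target))  # converges after 5 halving steps
```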
  • the target model/rule 101 can be used to implement the image fusion method provided in the embodiment of the present application, that is, the images to be processed, such as the color image and infrared image to be processed, are input into the target model/rule 101 after relevant preprocessing, and the fused image can be obtained.
  • the target model/rule 101 in this embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the target model/rule 101 completely based on the training data maintained by the database 130, and may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.
  • the target model/rule 101 obtained by training with the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server or the cloud.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140, and the input data may include the color image and infrared image to be processed.
  • the preprocessing module 113 is configured to perform preprocessing according to the input data received by the I/O interface 112 (such as the color image and infrared image to be processed), for example, obtaining fusion weights from the to-be-processed color image or infrared image.
  • the preprocessing module 114 may be used to obtain a background reference image.
  • the preprocessing module 113 and the preprocessing module 114 may also be absent, and the calculation module 111 may be directly used to process the input data.
  • when the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculations and other related processing, the execution device 110 can call the data, codes, etc. in the data storage system 150 for corresponding processing, and the data and instructions obtained by the corresponding processing may also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the fused image obtained above, to the client device 140, so as to be provided to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete The above task, thus providing the user with the desired result.
  • the user can manually specify input data, which can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data as shown in the figure, and store them in the database 130 .
  • the I/O interface 112 directly stores, as new sample data, the input data input into the I/O interface 112 and the output result of the I/O interface 112 as shown in the figure into the database 130.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110 , and in other cases, the data storage system 150 may also be placed in the execution device 110 .
  • a target model/rule 101 is obtained by training according to the training device 120.
  • the target model/rule 101 may be the neural network in the present application.
  • the neural network in the present application may include a CNN, a deep convolutional neural network (DCNN), etc.
  • CNN is a very common neural network
  • a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture; a deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons can respond to images fed into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230 .
  • the convolutional/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 may include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • the feature maps extracted by the multiple weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the number of weight matrices described above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to extract unwanted noise in the image.
  • the multiple weight matrices have the same size (row ⁇ column), and the size of the feature maps extracted from the multiple weight matrices with the same size is also the same, and then the multiple extracted feature maps with the same size are combined to form a convolution operation. output.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions .
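  • A minimal sketch of the sliding-weight-matrix (convolution) operation described above, in plain Python; the edge-detecting kernel and image values below are hypothetical:

```python
def conv2d(image, kernel, stride=1):
    """Slide a weight matrix (kernel) over a single-channel image with the
    given stride; each output pixel is the weighted sum of the pixels
    currently under the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# A horizontal edge-detecting kernel, as in the edge-extraction example above.
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [9, 9, 9, 9],
       [9, 9, 9, 9]]
edges = conv2d(img, kernel=[[1, 1], [-1, -1]], stride=1)
print(edges)  # [[0, 0, 0], [-18, -18, -18], [0, 0, 0]] — the edge row stands out
```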
  • the features extracted by the initial convolutional layer (e.g., 221) are generally low-level features;
  • as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers (e.g., 226) become more and more complex, such as high-level semantic features;
  • features with higher-level semantics are more suitable for the problem to be solved.
  • the pooling layer can be a single pooling layer following a convolutional layer;
  • the pooling layer can also be one or more pooling layers following multiple convolutional layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
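  • The average- and max-pooling operators described above can be sketched as follows (plain Python, hypothetical values); note the 4×4 input shrinks to a 2×2 output, as the text describes:

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Downsample a single-channel image: each output pixel is the max or
    average of the corresponding size x size sub-region of the input."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [image[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

img = [[1, 3, 2, 4],
       [5, 7, 6, 8],
       [9, 2, 1, 0],
       [3, 4, 5, 6]]
print(pool2d(img, mode="max"))  # [[7, 8], [9, 6]]
print(pool2d(img, mode="avg"))  # [[4.0, 5.0], [4.5, 3.0]]
```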
  • after being processed by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output or a set of outputs of the required number of classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2), and the parameters contained in the multiple hidden layers may be pre-trained based on the relevant training data of specific task types; for example, the task types can include image recognition, image classification, image super-resolution reconstruction, etc.
  • after the multiple hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the categorical cross-entropy and is specifically used to calculate the prediction error.
  • once the forward propagation of the entire convolutional neural network 200 (as shown in FIG. 2, the propagation from 210 to 240) is completed, the back propagation (as shown in FIG. 2, the propagation from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
  • the convolutional neural network 200 shown in FIG. 2 is only an example of a convolutional neural network; in specific applications, the convolutional neural network can also exist in the form of other network models, including only a part of the network structure shown in FIG. 2. For example, the convolutional neural network adopted in this embodiment of the present application may only include the input layer 210, the convolutional layer/pooling layer 220, and the output layer 240.
  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 30 .
  • the chip can be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111 .
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101 .
  • the algorithms of each layer in the convolutional neural network shown in Figure 2 can be implemented in the chip shown in Figure 3.
  • Both the image fusion method and the training method of the image fusion model in the embodiments of the present application can be implemented in the chip as shown in FIG. 3 .
  • the neural network processor 30 may be a neural-network processing unit (NPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or any other processor suitable for large-scale neural network computation.
  • the NPU is mounted on the main central processing unit (CPU) (host CPU) as a co-processor, and the main CPU assigns tasks.
  • the core part of the NPU is the operation circuit 303, and the controller 304 controls the operation circuit 303 to extract the data in the memory (weight memory or input memory) and perform operations.
  • TPU is Google's fully customized artificial intelligence accelerator application-specific integrated circuit for machine learning.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the weight data of the matrix B from the weight memory 302 and buffers it on each PE in the arithmetic circuit 303 .
  • the arithmetic circuit 303 fetches the input data of the matrix A from the input memory 301 , performs matrix operations according to the input data of the matrix A and the weight data of the matrix B, and stores the partial result or the final result of the matrix in the accumulator 308 .
  • the vector calculation unit 307 can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • the vector computing unit 307 can be used for network computation of non-convolutional/non-FC layers in the neural network, such as pooling, batch normalization, local response normalization, etc. .
  • the vector computation unit 307 can store the processed output vectors to the unified buffer 306 .
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 307 generates normalized values, merged values, or both.
  • vector computation unit 307 stores the processed vectors to unified memory 306 .
  • the vector processed by the vector computing unit 307 can be used as the activation input of the arithmetic circuit 303, for example, for use in subsequent layers in the neural network, as shown in FIG. 2, if the current processing layer is the hidden layer 1 (231), the vector processed by the vector calculation unit 307 can also be used for calculation in the hidden layer 2 (232).
  • Unified memory 306 is used to store input data and output data.
  • the weight data is directly stored in the weight memory 302 through a storage unit access controller (direct memory access controller, DMAC) 305.
  • Input data is also stored in unified memory 306 via the DMAC.
  • the bus interface unit (BIU) 310 is used for the interaction between the DMAC and the instruction fetch buffer 309; the bus interface unit 310 is also used for the instruction fetch memory 309 to obtain instructions from the external memory; the bus interface unit 310 is further used for the storage unit access controller 305 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to store the input data in the external memory DDR into the unified memory 306 , or store the weight data into the weight memory 302 , or store the input data into the input memory 301 .
  • the instruction fetch memory (instruction fetch buffer) 309 connected with the controller 304 is used to store the instructions used by the controller 304;
  • the controller 304 is used for invoking the instructions cached in the memory 309 to realize the working process of controlling the operation accelerator
  • the unified memory 306, the input memory 301, the weight memory 302 and the instruction fetch memory 309 are all on-chip memories, and the external memory is the memory outside the NPU; the external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or other readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 2 may be performed by the operation circuit 303 or the vector calculation unit 307 .
  • both the training method of the image fusion model and the image fusion method in the embodiment of the present application may be executed by the operation circuit 303 or the vector calculation unit 307 .
  • an embodiment of the present application provides a system architecture 400 .
  • the system architecture includes a local device 401, a local device 402, an execution device 410 and a data storage system 450, wherein the local device 401 and the local device 402 are connected with the execution device 410 through a communication network.
  • Execution device 410 may be implemented by one or more servers.
  • the execution device 410 may be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 410 may be arranged on one physical site, or distributed across multiple physical sites.
  • the execution device 410 may use the data in the data storage system 450 or call the program code in the data storage system 450 to implement the training method of the image fusion model in this embodiment of the present application.
  • the execution device 410 may perform the following processes:
  • the training sample includes a first color image, a first infrared image, a target color image and a target infrared image, where the four images are captured for the same scene; the same scene means that the similarity between any two of the first color image, the first infrared image, the target color image and the target infrared image is greater than a first threshold; the first color image and the target color image are images formed by the scene's reflection of visible light, and the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band;
  • the signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image;
  • the first color the
  • an image fusion model can be obtained, and the image fusion model can be used to obtain a fused image.
  • a user may operate respective user devices (eg, local device 401 and local device 402 ) to interact with execution device 410 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, etc.
  • Each user's local device can interact with the execution device 410 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the local device 401 and the local device 402 obtain the image fusion model from the execution device 410, deploy the image fusion model on the local device 401 and the local device 402, and use the image fusion model to perform image fusion.
  • the image fusion model may be directly deployed on the execution device 410, and the execution device 410 obtains the images to be processed from the local device 401 and the local device 402, and uses the image fusion model to perform image fusion on the images to be processed.
• the above execution device 410 may also be a cloud device, in which case the execution device 410 may be deployed in the cloud; or the execution device 410 may be a terminal device, in which case it may be deployed on the user terminal side. This is not limited in the embodiments of the present application.
• in low-illumination scenes, the captured images or videos have problems such as low resolution, poor contrast, and loss of image details;
• for example, a human face in such an image may be basically unrecognizable.
  • Current devices usually use near-infrared supplementary light to improve imaging quality in low-light scenes.
• As shown in FIG. 5(b), better human body details and face details can be obtained from near-infrared imaging in low-light scenes, but due to its imaging characteristics, an infrared image cannot restore the true color of objects. Because color images and infrared images are complementary, a fused image can be obtained by fusing a color image with an infrared image.
• the traditional fusion method is usually based on fusion of the luminance channel, that is, the color image is first converted into the YUV color space, the luminance channel Y is fused with the corresponding infrared image at multiple scales, and the fused Y channel is recombined with the original UV channels to obtain the final fusion result;
• the fusion image obtained by fusing only the luminance channel will have problems such as reduced image saturation, color distortion, and more noise.
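As a concrete illustration of the traditional pipeline described above, the sketch below converts an RGB image to YUV, blends the Y channel with the infrared image, and recombines with the original UV channels. The BT.601 conversion matrix and the simple weighted blend (standing in for the multi-scale fusion mentioned in the text) are assumptions for illustration, not the patent's actual algorithm.

```python
import numpy as np

# BT.601 full-range RGB -> YUV conversion matrix (an assumed, common choice)
_RGB2YUV = np.array([[0.299, 0.587, 0.114],
                     [-0.14713, -0.28886, 0.436],
                     [0.615, -0.51499, -0.10001]])

def rgb_to_yuv(rgb):
    """Convert an (H, W, 3) RGB image to YUV."""
    return rgb @ _RGB2YUV.T

def yuv_to_rgb(yuv):
    """Convert an (H, W, 3) YUV image back to RGB."""
    return yuv @ np.linalg.inv(_RGB2YUV).T

def luminance_fusion(rgb, nir, alpha=0.5):
    """Traditional luminance-channel fusion: fuse the Y channel of the
    color image with the infrared image, keep the original UV channels.
    A weighted average stands in for the multi-scale fusion step."""
    yuv = rgb_to_yuv(rgb)
    yuv[..., 0] = alpha * yuv[..., 0] + (1.0 - alpha) * nir
    return yuv_to_rgb(yuv)
```

Because only Y is touched, the chroma (UV) of the result is inherited from the color image, which is exactly why this approach suffers from the saturation and color-distortion problems noted above.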
• the fusion of color images and infrared images can also be achieved by deep learning. However, current approaches usually target the fusion of high-definition color images with infrared images. When the color image is of low quality, the infrared image is only used as a reference for denoising the color image, so the fusion effect cannot be guaranteed and many details are lost in the output image, which affects the quality of the output image.
  • the embodiments of the present application propose a training method and an image fusion method for an image fusion model, which can improve imaging quality in a low illumination scene.
  • the solutions of the embodiments of the present application are applicable to scenarios where color images and infrared images can be obtained.
• the following examples illustrate three ways of obtaining color images and infrared images.
  • Example 1 Obtaining a color image and an infrared image based on a beam splitting prism.
• the beam-splitting prism includes a prism 6020 and a filter 6030;
• the incident light received by the lens 6010 is split into visible light and near-infrared light by the beam-splitting prism, and the visible light and the near-infrared light are imaged by two sensors, the color sensor 6040 and the near-infrared sensor 6050, respectively, so that a color image and an infrared image are obtained at the same time.
  • Example 2 Obtaining color images and infrared images based on time-sharing and frame interpolation.
• the fill-light control unit 7030 periodically turns the infrared fill-light unit 7010 on and off to control the type of light transmitted by the lens 7020 to the surface of the sensor 7040, namely visible light or infrared light, so that the visible light and the infrared light in the photographed scene are imaged separately.
• the infrared image shown in FIG. 7 may also be a composite image of an infrared image and a color image; under low illumination, the color image component of the composite image carries little information, so the composite image can be used as the infrared image in the embodiments of this application. A frame-interpolation algorithm is then used to obtain a color image and an infrared image at the same moment; interpolation refers to obtaining the image of an intermediate frame from the image information of the two frames before and after it.
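The frame-interpolation step can be illustrated with a minimal sketch. A plain linear blend of the two neighboring frames is assumed here for simplicity; practical systems typically use motion-compensated interpolation instead.

```python
import numpy as np

def interpolate_frame(prev_frame, next_frame, t=0.5):
    """Estimate the frame at fractional time t (0 <= t <= 1) between two
    captured frames, as in the time-sharing scheme above. Linear blending
    is an illustrative placeholder for a real frame-interpolation
    algorithm with motion estimation."""
    return (1.0 - t) * prev_frame + t * next_frame
```

With time-shared capture, this would be applied to two consecutive color frames to synthesize the color frame at the instant the infrared frame was exposed (or vice versa).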
  • Example 3 Obtaining a color image and an infrared image based on an RGB-Near-infrared (NIR) sensor.
  • FIG. 9 is a schematic diagram of an image fusion apparatus 600 according to an embodiment of the present application.
  • the function of each module in FIG. 9 is briefly described below.
• the apparatus 600 may be a cloud service device or a terminal device, for example, a computer, a server, or another device with sufficient computing power to perform image fusion, or a system composed of a cloud service device and a terminal device.
  • the apparatus 600 may be the training device 120 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the local device or execution device in FIG. 4 , or the like.
  • the apparatus 600 includes a background reference image acquisition module 610 , a fusion weight acquisition module 620 and an image fusion module 630 .
  • the enhancement of the color image is realized by fusing the color image and the infrared image, and the image fusion model can also be understood as an image enhancement model.
  • the background reference image acquisition module 610 is configured to acquire the background reference image, and input the background reference image into the image fusion module 630 .
  • the background area in the background reference image is the same as the background area in the color image.
  • the background reference image acquisition module may acquire the background reference image based on the color image. It should be understood that FIG. 9 is only an example, the background reference image module 610 may also acquire the background reference image in other manners, and the method for acquiring the background reference image may refer to the method 800 hereinafter.
  • the background reference image acquisition module 610 is an optional module.
  • the fusion weight obtaining module 620 is used to obtain the fusion weight, and input the fusion weight into the image fusion module 630 .
  • the fusion weight is used to adjust the fusion ratio of the color image and the infrared image in the image output by the image fusion model.
  • the fusion weight acquisition module can acquire fusion weights based on infrared images.
  • the fusion weight obtaining module can also obtain the fusion weight based on the color image.
• FIG. 9 is only an example; the fusion weight obtaining module 620 may also obtain the fusion weight in other manners, and for the specific method of obtaining the fusion weight, refer to the method 900 below.
  • the fusion weight obtaining module 620 is an optional module.
  • the image fusion module 630 is used for image fusion of the color image and the infrared image to obtain a fusion image.
  • the image fusion module 630 may perform image fusion on the color image and the infrared image based on the background reference image to obtain a fusion image.
  • the image fusion module 630 performs image fusion on the background reference image, the color image and the infrared image to obtain a fusion image.
  • the image fusion module 630 may perform image fusion on the color image and the infrared image based on the fusion weight to obtain a fusion image.
  • FIG. 10 is a training method 700 of an image fusion model provided by an embodiment of the present application.
  • the method shown in FIG. 10 can be performed by a training device for an image fusion model.
• the training device for the image fusion model may be a cloud service device or a terminal device, for example, a computer, a server, or another device with sufficient computing power to perform the training method of the image fusion model;
• the training device may also be a system composed of cloud service equipment and terminal equipment.
  • the method 700 may be performed by the training device 120 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the execution device 410 in FIG. 4 , or a local device.
  • the method 700 may be specifically performed by the training device 120 shown in FIG. 1 , and the first color image, the target color image, the first infrared image and the target infrared image in the method 700 may be stored in the database 130 shown in FIG. 1 .
  • S720 and S730 of the method 700 may be executed in the training device 120, or may be pre-executed by other functional modules before the training device 120, that is, the training data received or obtained from the database 130 is first preprocessed, According to the acquisition process described in S720 and S730, the first background reference image and the first fusion weight are obtained as the input of the training device 120, and the training device 120 performs steps S710 and S740.
  • the method 700 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 700 may be performed by a computing module on a monitoring device.
  • the method 700 may be processed by the CPU, or may be jointly processed by the CPU and the GPU, or other processors suitable for neural network computing may be used without using the GPU, which is not limited in this application.
  • the method 700 includes steps S710 to S740. Steps S710 to S740 will be described in detail below.
  • the training samples include a first color image, a target color image, a first infrared image, and a target infrared image.
  • the color image may also be referred to as a visible light image.
  • the first color image and the target color image are images formed by the scene's reflection of visible light
  • the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band.
  • color images may be obtained by visible light imaging sensors, and infrared images may be obtained by infrared imaging sensors.
  • first color image, the first infrared image, the target color image and the target infrared image are taken for the same scene.
  • the same scene means that the similarity between any two images among the first color image, the first infrared image, the target color image, and the target infrared image is greater than the first threshold.
  • the similarity in this embodiment of the present application may be image texture similarity.
• that is, the similarity between any two images among the first color image, the first infrared image, the target color image, and the target infrared image may be the image texture similarity between those two images;
• there is a one-to-one correspondence between the first color image and the first infrared image.
  • the first infrared image may be an infrared image captured at the same time as the first color image.
  • the target infrared image is the infrared image at the same time as the target color image.
• reference may be made to FIG. 6 to FIG. 8 for the manner of acquiring the color image and the corresponding infrared image, which is not limited in this embodiment of the present application.
  • the signal-to-noise ratio of the target color image is higher than the signal-to-noise ratio of the first color image.
  • the target color image can be understood as a high-definition image corresponding to the first color image.
  • the target color image may be a high-definition image captured during the day, and the first color image may be a noisy image captured at night.
  • the signal-to-noise ratio refers to the ratio of signal to noise, for example, the ratio of the power spectrum of the signal to the noise, or the ratio of the variance of the signal to the noise, etc. The higher the signal-to-noise ratio, the better the image quality and the clearer the image.
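The power-ratio definition of signal-to-noise ratio mentioned above can be computed, for example, as follows. Expressing the ratio in decibels is an added convention for illustration, not something the text specifies.

```python
import numpy as np

def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels: power of the reference (clean)
    signal divided by power of the residual noise. A higher value means
    better image quality, as described in the text."""
    noise = noisy - clean
    signal_power = np.sum(np.asarray(clean, dtype=float) ** 2)
    noise_power = np.sum(np.asarray(noise, dtype=float) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)
```

For instance, a unit-amplitude signal with a constant 0.1 offset as "noise" yields a power ratio of 100, i.e. 20 dB.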
• the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image.
  • the target infrared image can be understood as a high-definition image corresponding to the first infrared image.
  • the target infrared image may be a high-definition image captured during the day, and the first infrared image may be a noisy image captured at night.
  • Shooting for the same scene can be understood as the same picture content in the image, for example, for images shot in the same area, the same area is the same scene.
• the picture contents of the images shown in FIG. 11 are the same, that is, they are images captured of the same scene.
  • the first infrared image and the target infrared image may be the same image.
  • the training samples include the first color image, the target color image and the first infrared image, that is, the training samples include three types of images.
  • the similarity between the first background reference image and the first color image is greater than the second threshold.
  • the background area in the first background reference image is the same as the background area in the first color image.
  • the similarity between the first background reference image and the first color image is greater than the second threshold, which may be that the similarity between the background area of the first background reference image and the background area of the first color image is greater than the second threshold.
  • the background region may be determined by the prior art, which is not limited in this embodiment of the present application.
  • the background signal-to-noise ratio of the background reference image is generally higher than the background signal-to-noise ratio of the first color image.
  • the background area of the image in the embodiment of the present application may be set as required.
  • the background area in the image may include buildings in the image, or may not include buildings in the image, and the method for dividing the background area is not limited in this embodiment of the present application.
  • Step S720 is an optional step.
  • the training samples may also include a first background reference image.
  • the first background reference image is acquired.
  • the first background reference image may be input into the image fusion model in the form of a color image, or may be input into the image fusion model in the form of a grayscale image.
  • the first background reference image is directly input into the image fusion model.
  • the luminance channel of the first background reference image may be input into the image fusion model.
  • Step S730 is an optional step.
  • the training samples may also include first fusion weights.
  • the first fusion weight is acquired.
  • the first fusion weight is used to weight the first color image and the first infrared image.
  • the first fusion weight is used to adjust the fusion ratio of the first color image and the first infrared image in the image output by the image fusion model.
  • the first fusion weight is used to adjust the fusion ratio of the color image and the infrared image in the image fusion process.
  • the first fusion weight is used to adjust the ratio of the information content of the first color image and the information content of the first infrared image contained in the image output by the image fusion model.
  • the first fusion weight corresponds to all images output by the image fusion model.
  • the first fusion weight may be a global weight.
• the global weight is used to indicate the fusion weight of the whole image during the image fusion process; that is, all regions of the image use the same fusion weight during fusion, there is only one first fusion weight for the entire image output by the image fusion model, and the fusion ratio of the first color image and the first infrared image is the same in any region of the output image.
• when the global weight corresponding to the infrared image is larger, the fusion image output by the image fusion model contains more information from the infrared image, that is, the fusion image is more similar to the infrared image;
• when the global weight corresponding to the color image is larger, the fusion image contains more information from the color image, that is, the fusion image is more similar to the color image.
  • the first fusion weight corresponds to an image output by the image fusion model.
• alternatively, the first fusion weight corresponds to a part of the image output by the image fusion model, that is, to an area in the output image.
  • the number of the first fusion weights may be multiple, and the multiple first fusion weights respectively correspond to different regions in the image output by the image fusion model.
  • the first fusion weight can be understood as a local weight.
  • the local weight is used to indicate the fusion weight of the local region during the image fusion process. That is to say, during the fusion process, different regions may adopt different first fusion weights.
  • the weight corresponding to the infrared image in the area A is larger, and the weight corresponding to the infrared image in the area B is smaller.
  • the area A contains more information of the infrared image
  • the area B contains more information of the color image. That is, the area A is more similar to the area A in the infrared image, and the area B is more similar to the area B in the color image.
  • the first fusion weight may be input into the image fusion model in the form of a parameter, or may be input into the image fusion model in the form of a fusion weight map, which is not limited in this application.
  • the values in the fusion weight map can be used to indicate the first fusion weight.
  • the values of different regions in the fusion weight map may be used to represent multiple first fusion weights.
  • Representing the first fusion weight in the form of a fusion weight map can reduce the complexity of adjusting the first fusion weight.
• in particular, when there are multiple first fusion weights, the form of the fusion weight map is more favorable for representing the region corresponding to each first fusion weight.
  • Figure 11 shows a schematic diagram of a training sample.
  • (a) in FIG. 11 is the first color image In_Vis
  • (b) in FIG. 11 is the target color image Gt_Vis
  • (c) in FIG. 11 is the luminance channel In_VisRef_Y of the first background reference image
  • (d) in FIG. 11 is the first infrared image In_Nir
  • (e) in FIG. 11 is the target infrared image Gt_Nir
  • (f) in FIG. 11 is the fusion weight map In_FuseMap.
• FIG. 11 is only for illustration; the training samples may include neither the luminance channel In_VisRef_Y of the first background reference image nor the fusion weight map In_FuseMap, or may include only one of the two, for example, only the fusion weight map In_FuseMap, or only the luminance channel In_VisRef_Y of the first background reference image.
• in FIG. 11, the first background reference image exists in the form of a luminance channel, that is, it is input into the image fusion model as a luminance channel; this is just an example;
• the first background reference image may also exist in the form of a color map, that is, it may be input into the image fusion model as a color map.
  • the first fusion weight exists in the form of a fusion weight map, that is, it is input into the image fusion model in the form of a weight fusion map.
  • the first fusion weight may also exist in the form of parameters, that is, input into the image fusion model in the form of parameters.
• in FIG. 11 there are two first fusion weights: the weight values inside the two rectangular boxes are the same, and the weight values outside the rectangular boxes are the same. This is just an example; more first fusion weights may be set, or the first fusion weight may be a global weight.
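A fusion weight map like the one just described, where pixels inside some rectangular regions share one weight and all other pixels share another, could be constructed as below. The weight values and box coordinates are illustrative only, not values from the embodiment.

```python
import numpy as np

def make_fusion_weight_map(shape, boxes, inside=0.8, outside=0.2):
    """Build a local fusion weight map: pixels inside each (top, left,
    bottom, right) rectangle get the `inside` weight, all other pixels
    the `outside` weight. Larger values here are taken to favor the
    infrared image in that region, per the text's convention."""
    w = np.full(shape, outside, dtype=np.float32)
    for top, left, bottom, right in boxes:
        w[top:bottom, left:right] = inside
    return w
```

Tuning the map is then just a matter of editing rectangles or values, which is the complexity reduction the text attributes to the weight-map representation.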
  • the loss function includes the first loss function.
  • the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image.
  • the target fusion image is determined according to the target color image and the target infrared image.
  • the target fusion image can be a color image or a grayscale image.
  • the target fusion image may be an image of a luminance channel.
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the luminance channel of the image output by the image fusion model and the target fusion image.
  • the image fusion model is trained with the goal of reducing the value of the first loss function, that is, the difference between the luminance channel of the image output by the image fusion model and the target fusion image is continuously reduced.
  • This training process can be understood as a fusion task.
  • the first loss function can be understood as the loss function corresponding to the fusion task.
  • determining the target fusion image according to the target color image and the target infrared image including:
  • the target fusion image is determined according to the brightness channel of the target color image and the brightness channel of the target infrared image.
  • the luminance channel is described below.
  • the luminance channel includes structure (Structure) information S, contrast (Contrast) information C and luminance (Luminance) mean value L. It can also be understood that the luminance channel can be decomposed into structure information S, contrast information C and luminance mean value L.
• the luminance channel y_k of image block k can be decomposed into the luminance mean l_k of image block k, the structure information s_k of image block k, and the contrast information c_k of image block k;
• the luminance channel y_k, the luminance mean l_k, the structure information s_k and the contrast information c_k satisfy the formula y_k = c_k · s_k + l_k.
  • the luminance channel of the image can be decomposed to obtain the structure information, contrast information and luminance mean value of the image.
  • the luminance channel of the target color image and the luminance channel of the target infrared image satisfy the following formulas.
• y_gt_Vis = c_gt_Vis · s_gt_Vis + l_gt_Vis;
• y_gt_Nir = c_gt_Nir · s_gt_Nir + l_gt_Nir;
  • y gt_Vis represents the brightness channel of the target color image
  • c gt_Vis represents the contrast of the target color image
  • s gt_Vis represents the structural information of the target color image
  • l gt_Vis represents the brightness mean of the target color image
  • y gt_Nir represents the brightness channel of the target infrared image
  • c gt_Nir represents the contrast of the target infrared image
  • s gt_Nir represents the structural information of the target infrared image
  • l gt_Nir represents the average brightness of the target infrared image.
  • the brightness channel of the image can be obtained from the structure information, the contrast information and the mean brightness value of the image. It should be understood that the above is only for illustration, and the structure information, contrast information and average brightness of the image may also be obtained in other ways.
  • the value in the above formula may be a value corresponding to the entire image, or may be a value corresponding to an image block in the image.
  • image fusion may be performed in units of images, or image fusion may be performed in units of image blocks, which are not limited in the embodiments of the present application.
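A minimal sketch of the decomposition y_k = c_k · s_k + l_k described above. Taking the patch mean as the luminance mean, the norm of the mean-removed residual as the contrast, and the normalized residual as the structure is an assumption consistent with the formulas here, not a construction the text spells out.

```python
import numpy as np

def decompose_luminance(y):
    """Decompose a luminance patch y into contrast c, structure s and
    luminance mean l such that y = c * s + l."""
    l = y.mean()                      # luminance mean
    residual = y - l
    c = np.linalg.norm(residual)      # contrast: magnitude of variation
    s = residual / c if c > 0 else np.zeros_like(y)  # unit-norm structure
    return c, s, l
```

The decomposition is exactly invertible, which is what lets the target fusion image later be reassembled from a chosen contrast, structure and luminance mean.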
• determining the target fusion image according to the brightness channel of the target color image and the brightness channel of the target infrared image includes: determining the contrast of the target fusion image according to the contrast of the target color image and/or the contrast of the target infrared image;
• determining the structural information of the target fusion image according to the structural information of the target color image and/or the structural information of the target infrared image;
• and determining the mean brightness of the target fusion image according to the mean brightness of the target color image and/or the mean brightness of the target infrared image.
• determining the contrast of the target fusion image according to the contrast of the target color image and/or the contrast of the target infrared image includes: taking the larger of the contrast of the target color image and the contrast of the target infrared image as the contrast of the target fusion image, that is, c_fuse = max(c_gt_Vis, c_gt_Nir);
• c_gt_Vis represents the contrast of the target color image, and c_gt_Nir represents the contrast of the target infrared image.
  • determining the contrast of the target fusion image according to the contrast of the target color image and/or the contrast of the target infrared image includes: taking the contrast of the target infrared image as the contrast of the target fusion image.
  • the contrast of the infrared image is relatively large. Taking the contrast of the infrared image as the contrast of the target fusion image can make the target fusion image contain more texture information, and at the same time improve the processing speed.
  • determining the contrast of the target fusion image according to the contrast of the target color image and/or the contrast of the target infrared image may also include: according to the contrast of the image blocks in the target color image and/or the contrast of the image blocks in the target infrared image , which determines the contrast of image patches in the target fused image.
• determining the structure information of the target fusion image according to the structure information of the target color image and/or the structure information of the target infrared image includes: performing a weighted average on the structure information of the target color image and the structure information of the target infrared image, and using the result as the structural information of the target fusion image.
  • the structure weight corresponding to the target color image and the structure weight corresponding to the target infrared image may be preset, or may be determined according to the contrast of the target color image and the contrast of the target infrared image.
  • the structure weight corresponding to the target color image is determined according to the contrast of the target color image.
  • the structure weight corresponding to the target infrared image is determined according to the contrast of the target infrared image.
• the larger the contrast, the larger the structure weight value. In this way, the image with the higher contrast occupies a larger proportion of the structural information in the target fusion image, so that the fusion image can contain more texture information.
• the structure information of the target fusion image may thus satisfy s_fuse = (w(c_gt_Vis) · s_gt_Vis + w(c_gt_Nir) · s_gt_Nir) / (w(c_gt_Vis) + w(c_gt_Nir));
• w() represents the function for calculating the structure weight;
• s_gt_Vis represents the structural information of the target color image, and w(c_gt_Vis) represents the structure weight corresponding to the target color image determined according to the contrast of the target color image;
• s_gt_Nir represents the structural information of the target infrared image, and w(c_gt_Nir) represents the structure weight corresponding to the target infrared image determined according to the contrast of the target infrared image.
  • the structure weight corresponding to the target color image and the structure weight corresponding to the target infrared image are determined according to the ratio between the contrast of the target color image and the contrast of the target infrared image.
  • determining the structure information of the target fusion image according to the structure information of the target color image and/or the structure information of the target infrared image includes: using the structure information of the target infrared image as the structure information of the target fusion image.
  • the infrared image has more structural information. Taking the structural information of the infrared image as the structural information of the target fusion image can make the fusion image contain more texture information and improve the processing speed at the same time.
• determining the structure information of the target fusion image according to the structure information of the target color image and/or the structure information of the target infrared image may also include: determining the structure information of the image blocks in the target fusion image according to the structure information of the image blocks in the target color image and/or the structure information of the image blocks in the target infrared image.
  • determining the mean brightness value of the target fusion image according to the mean brightness value of the target color image and/or the mean brightness value of the target infrared image includes: using the mean brightness value of the target color image as the mean brightness value of the target fusion image.
  • determining the brightness mean value of the target fusion image according to the brightness mean value of the target color image and/or the brightness mean value of the target infrared image including: performing a weighted average of the brightness mean value of the target color image and the brightness mean value of the target infrared image, and applying the weighted The averaged result is used as the mean brightness of the target fused image.
  • the brightness weight corresponding to the target color image and the brightness weight corresponding to the target infrared image may be preset.
  • determining the brightness mean value of the target fusion image according to the brightness mean value of the target color image and/or the brightness mean value of the target infrared image may also include: according to the brightness mean value of the image blocks in the target color image and/or the brightness mean value of the target infrared image The mean brightness of the image block, which determines the mean brightness of the image blocks in the target fusion image.
  • the target fusion image can be obtained from the contrast, structure value and average brightness of the target fusion image.
• the target fusion image y_fuse satisfies y_fuse = c_fuse · s_fuse + l_fuse, where c_fuse, s_fuse and l_fuse are the contrast, structure information and luminance mean of the target fusion image determined above.
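Putting the pieces together, the sketch below builds a target fusion patch from the two luminance patches: the contrast takes the larger of the two, the structure is a contrast-weighted average of the two structures (using w(c) = c as an assumed weighting function, since the text does not fix w), and the luminance mean is taken from the color image.

```python
import numpy as np

def _decompose(y):
    """y = c * s + l decomposition (mean, residual norm, unit residual)."""
    l = y.mean()
    r = y - l
    c = np.linalg.norm(r)
    s = r / c if c > 0 else np.zeros_like(y)
    return c, s, l

def target_fusion_patch(y_vis, y_nir):
    """Construct the supervision patch y_fuse = c_fuse * s_fuse + l_fuse
    from the luminance patches of the target color and infrared images."""
    c_v, s_v, l_v = _decompose(y_vis)
    c_n, s_n, _ = _decompose(y_nir)
    c_fuse = max(c_v, c_n)                     # larger contrast wins
    s_mix = c_v * s_v + c_n * s_n              # contrast-weighted structures
    norm = np.linalg.norm(s_mix)
    s_fuse = s_mix / norm if norm > 0 else s_mix  # renormalize structure
    l_fuse = l_v                               # luminance mean from color image
    return c_fuse * s_fuse + l_fuse
```

When the two inputs are identical, the construction returns the input patch unchanged, which is a useful sanity check on the decomposition and reassembly.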
  • the difference between the luminance channel of the target fusion image and the image output by the image fusion model may be determined by a structural similarity index measure (SSIM) between the two images.
  • SSIM structural similarity index measure
  • the SSIM-based loss constraint enables the output image to retain as much structural information as possible.
  • the first loss function Lfuse satisfies the following formula.
• L_fuse = 1 - SSIM(y_fuse, y_out);
  • y out represents the luminance channel of the output image of the image fusion model.
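A sketch of the first loss L_fuse = 1 − SSIM(y_fuse, y_out). A single global SSIM window over the whole patch is assumed for brevity; practical SSIM implementations use a sliding (often Gaussian) window, and the stability constants below assume inputs scaled to [0, 1].

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Structural similarity computed over a single global window."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def fusion_loss(y_fuse, y_out):
    """First loss function: L_fuse = 1 - SSIM(y_fuse, y_out)."""
    return 1.0 - ssim_global(y_fuse, y_out)
```

The loss is zero when the output luminance matches the target fusion image exactly, and grows as structure, contrast or luminance diverge, which is the constraint that lets the output retain structural information.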
• if the method 700 includes step S720 or the training sample includes the first background reference image, the first background reference image is added as an input of the image fusion model.
• if the method 700 includes step S730 or the training sample includes the first fusion weight, the first fusion weight is added as an input of the image fusion model.
  • determining the target fusion image according to the target color image and the target infrared image may include: determining the target fusion image according to the first fusion weight, the target color image and the target infrared image.
• determining the target fusion image according to the first fusion weight, the target color image, and the target infrared image may include: determining a supervision image according to the target color image and the target infrared image, and weighting the supervision image and the target color image according to the first fusion weight.
  • the supervision image can be determined by the method of determining the target fusion image y_fuse described above; that is, the target fusion image y_fuse is used as the supervision image.
  • adjusting the proportion of the supervision image and the target color image in the target fusion image according to the first fusion weight includes: adjusting the proportion of the luminance channel of the supervision image and the luminance channel of the target color image in the target fusion image according to the first fusion weight, and using the adjusted result as the target fusion image.
  • the first fusion weight may correspond to part of the image output by the image fusion model.
  • the multiple first fusion weights respectively correspond to different positions in the target fusion image. That is, the multiple first fusion weights are respectively used to indicate the proportions of the supervision image and the target color image at different positions in the target fusion image.
  • the supervision image contains more information from the infrared image.
  • it can also be understood that the more information of the supervision image an area contains, the more information of the infrared image that area contains.
  • the first fusion weight may be a fusion weight map.
  • the target fusion image y_fuse_adj satisfies the following formula.
  • y_fuse_adj = y_fuse × IN_FuseMap + (1 - IN_FuseMap) × y_gt_Vis;
  • IN_FuseMap represents the fusion weight map.
  • the values on different regions in the fusion weight map respectively indicate the corresponding weights of different regions of the image.
  • multiplying y_fuse by the fusion weight map can be understood as multiplying each pixel value in y_fuse by the weight corresponding to the region where that pixel is located in the fusion weight map.
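A minimal sketch of the weighted blend above, the region-wise combination y_fuse_adj = y_fuse × IN_FuseMap + (1 - IN_FuseMap) × y_gt_Vis, assuming the images and the weight map are same-shaped arrays with values in [0, 1]:

```python
import numpy as np

def blend_with_weight_map(y_fuse, y_gt_vis, fuse_map):
    # Element-wise blend: each region keeps its own weight, so different
    # areas of the target fusion image can favor the supervision image
    # (fuse_map -> 1) or the target color image (fuse_map -> 0).
    return y_fuse * fuse_map + (1.0 - fuse_map) * y_gt_vis
```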
  • the difference between the luminance channel of the target fusion image and the image output by the image fusion model may be determined by SSIM between the two images.
  • the first loss function L_fuse_adj may satisfy the following formula.
  • L_fuse_adj = 1 - SSIM(y_fuse_adj, y_out);
  • the target fusion image is an image of a brightness channel
  • determining the target fusion image according to the first fusion weight, the target color image, and the target infrared image may include: adjusting the proportion of the luminance channel of the target color image and the luminance channel of the target infrared image in the target fusion image according to the first fusion weight, and using the adjusted result as the target fusion image.
  • when there is no target infrared image in the training sample, the target fusion image may also be determined according to the target color image and the first infrared image.
  • the determination method is the same as that with the target infrared image, and will not be repeated here. In this way, when the signal-to-noise ratio of the first infrared image is high, the storage space of the training samples is saved and the storage pressure is reduced while the training effect of the image fusion model is ensured.
  • the loss function further includes a second loss function.
  • the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
  • decreasing the value of the second loss function means continuously optimizing the parameters of the image fusion model to reduce the difference between the image output by the image fusion model and the target color image.
  • This training process can be understood as a noise reduction task.
  • the second loss function can be understood as the loss function corresponding to the noise reduction task.
  • the second loss function L_denoise may satisfy the following formula.
  • P represents the set of pixels at different positions;
  • p represents a pixel in the pixel set;
  • C represents the set of RGB color channels;
  • c represents a channel among the RGB color channels;
  • Gt_vis represents the target color image;
  • Out represents the image output by the image fusion model.
  • this loss constrains the output image of the image fusion model to be as similar as possible to the target color image, which not only ensures the noise reduction effect, but also ensures that the color of the output image is consistent with that of the target color image, avoiding color errors in the output image.
  • the noise reduction task and the fusion task are implemented collaboratively.
  • the loss function L of the image fusion model can satisfy the following formula.
  • L = L_denoise + λ × L_fuse_adj;
  • λ is a parameter used to ensure that the loss function L_denoise of the noise reduction task and the loss function L_fuse_adj of the fusion task are of the same order of magnitude.
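The joint loss L = L_denoise + λ × L_fuse_adj can be sketched as below. The text does not reproduce the exact form of L_denoise, so a mean absolute (L1) difference over the pixel set P and the RGB channels C is assumed here for illustration, and the SSIM-based L_fuse_adj term is stubbed with a mean-squared difference purely to keep the sketch self-contained; λ (`lam`) is the balancing parameter.

```python
import numpy as np

def denoise_loss(gt_vis, out):
    # Mean absolute difference over all pixels p in P and channels c in C;
    # the L1 form is an assumption made for this sketch.
    return float(np.abs(gt_vis - out).mean())

def total_loss(gt_vis, out_rgb, y_fuse_adj, y_out_luma, lam=0.1):
    # L = L_denoise + lam * L_fuse_adj. The fusion term here is a
    # mean-squared stand-in for the SSIM-based loss, used only so the
    # sketch runs without an SSIM implementation.
    l_fuse_adj = float(np.mean((y_fuse_adj - y_out_luma) ** 2))
    return denoise_loss(gt_vis, out_rgb) + lam * l_fuse_adj
```

In practice, λ is chosen so that the two terms stay within the same order of magnitude during training.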
  • in some schemes, noise reduction processing is performed on the images before or after image fusion, for example, filtering with a filter.
  • this approach causes the noise reduction task and the fusion task to affect each other, resulting in a poor fusion effect or a poor noise reduction effect in the output image, so the image quality cannot be guaranteed.
  • the noise reduction task and the fusion task are performed cooperatively to reduce information loss, which can not only ensure that rich texture details are retained in the fused image, but also ensure that the fused image achieves higher resolution and true color information.
  • the color image has rich color information
  • the infrared image has more texture information
  • the fusion image obtained by merging the two has natural color and rich texture information.
  • the target fusion image is determined according to the target color image and the target infrared image, and the image fusion model is trained based on the target fusion image, so that the image fusion model can make full use of the infrared information, which is beneficial for fusing more texture information into the output image and retaining more image details.
  • different application scenarios may have different requirements for image fusion.
  • the fusion images obtained by using the same image fusion model cannot meet the fusion requirements of different application scenarios.
  • the fusion ratio of color images and infrared images can be adjusted, which is beneficial for application to different scenarios. That is to say, it is not necessary to train multiple image fusion models for different application scenarios; the model can be adapted to different scenarios simply by adjusting the fusion weight, which improves the degree of freedom of the model.
  • different regions in the same image may have different requirements for image fusion.
  • the face region tends to fuse more infrared information to retain more texture information
  • the human body region tends to fuse more color information to ensure the authenticity of the output image color.
  • different regions correspond to different fusion weights, so as to meet the requirements for image fusion of different regions in the same image, which is beneficial to improve the image quality of the output image.
  • the background blur problem that may be caused by the flashlight effect of the infrared image can be solved, and the background quality of the output image can be greatly improved.
  • by enhancing the quality of both the foreground area and the background area of the output image, full-screen image enhancement can be achieved.
  • the infrared image relies on active fill light and exhibits a flashlight effect. As shown in (b) of FIG. 5, the infrared image is brighter at the center of the picture and darker at the edges. In addition, infrared images are prone to overexposure; to ensure the image quality of the central fill-light area, the brightness of the surrounding area is usually reduced, resulting in a large difference in signal-to-noise ratio between the foreground and background of the infrared image, with the background area having a low signal-to-noise ratio.
  • if the infrared image is directly used as the reference input for the image fusion task in a low-light scene, the output result may have a background blur problem.
  • the embodiment of the present application provides a training method 800 for an image fusion model.
  • by adding a background reference image as an input of the image fusion model, the training effect of the image fusion model is improved.
  • the method 800 can be performed by a training device for an image fusion model, which can be a cloud service device or a terminal device, for example, a computer, a server, or another device whose computing power is sufficient to execute the training method, or a system composed of cloud service devices and terminal devices.
  • the method 800 may be performed by the training device 120 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the execution device 410 in FIG. 4 , or a local device.
  • the method 800 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 800 may be performed by a computing module on a monitoring device.
  • the method 800 includes steps S810 to S820, and steps S810 to S820 are described in detail below.
  • Step S810 acquiring a first background reference image, a first color image and a first infrared image.
  • the similarity between the first background reference image and the first color image is greater than the second threshold.
  • the background area in the first background reference image is the same as the background area in the first color image.
  • the similarity between the first background reference image and the first color image is greater than the second threshold, which may be that the similarity between the background area of the first background reference image and the background area of the first color image is greater than the second threshold.
  • the background region may be determined by the prior art, which is not limited in this embodiment of the present application.
  • the background signal-to-noise ratio of the first background reference image is higher than the background signal-to-noise ratio of the first color image.
  • the first color image is the input to the image fusion model.
  • the first color image and the first infrared image are captured for the same scene.
  • the same scene means that the similarity between the first color image and the first infrared image is greater than the first threshold; the first color image is an image formed by the scene's reflection of visible light, and the first infrared image is an image formed by the scene's reflection of light in the infrared band.
  • the first background reference image may be a color image or a grayscale image. That is to say, the first background reference image may be input into the image fusion model in the form of a color image, or may be input into the image fusion model in the form of a grayscale image.
  • Background reference images can be obtained in a number of ways.
  • the following example illustrates how to obtain the background reference image.
  • the background reference image can be obtained in any of the following ways. It should be understood that the following are only examples, and the background reference image may also be obtained in other manners, which is not limited in this application.
  • the first background reference image is a background reference image corresponding to the first color image, and can be acquired in any of the following ways.
  • the background reference image is determined based on the similarity to the color image.
  • the similarity between the images in the gallery and the color images is determined.
  • the image in the gallery with the highest similarity to the color image is used as the background reference image.
  • the gallery can be a high-definition image gallery; for example, an image in the gallery has a higher signal-to-noise ratio than the color image.
  • the similarity of two images can be determined by parameters such as SSIM.
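The gallery-based selection above can be sketched as follows: pick the gallery image with the highest similarity to the color image as the background reference. The similarity function is pluggable (the text mentions SSIM); in the test a negative L1 distance is used only as a stand-in.

```python
def pick_background_reference(color_img, gallery, similarity):
    # Return the gallery image with the highest similarity score to the
    # color image (higher score means more similar, e.g. SSIM).
    return max(gallery, key=lambda g: similarity(color_img, g))
```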
  • the background reference image is determined according to the similarity with the background area of the color image.
  • the similarity between the background area of the image in the gallery and the background area of the color image is determined.
  • the image in the gallery with the highest similarity to the background region of the color image is used as the background reference image.
  • the long exposure image corresponding to the color image is used as the background reference image.
  • a long exposure image refers to an image captured with a long exposure.
  • the long-exposure image corresponding to the color image refers to an image that is captured in a long-exposure manner in the area captured by the color image.
  • the long-exposure image corresponding to the color image may be an image captured by a device that captures the color image in a long-exposure manner at the location where the color image was captured.
  • the long-exposure image is an image obtained when the exposure duration is greater than the third threshold.
  • the background reference image of the color image is determined according to a plurality of color images corresponding to the color image.
  • the plurality of color images corresponding to the color images refer to images captured in the area where the color images were captured.
  • the plurality of color images corresponding to the color image may be images captured by a device that captured the color image at the location where the color image was captured.
  • the result of performing temporal noise reduction on the color image is used as the background reference image of the color image.
  • the method 800 may be applied to a video mode, that is, a scene for video fusion.
  • the image fusion model trained by the method 800 can be applied to a video scene.
  • the image fusion model obtained by the method 800 may be used to perform image fusion, thereby obtaining a fused image/fused video.
  • the background reference image can also be acquired in any of the following ways.
  • the first background reference image is a background reference image corresponding to the first color image, and may also be acquired in any of the following manners.
  • the background reference image is determined from multiple frames of color images preceding the color image.
  • the background reference images corresponding to the color images can be obtained by using consecutive multiple frames of color images.
  • the color image currently input into the image fusion model is used as the target frame, the multiple frames of color images before the target frame are accumulated to obtain an accumulated frame, and the accumulated frame is used as the background reference image of the target frame.
  • the background area of the accumulated frame has good signal-to-noise ratio, and there may be motion blur in the foreground area.
  • n is an integer greater than 1. The larger the value of n, the clearer the background area in the obtained background reference image.
  • the background reference image Ref_cur of the target frame can satisfy the following formula.
  • Frame_i represents the i-th frame;
  • cur represents the current frame number, that is, the target frame is the cur-th frame.
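The accumulation step can be sketched as averaging the n frames before the target frame. Note that the averaging form is a plausible reading only, since the exact accumulation formula is not reproduced in the text above.

```python
import numpy as np

def accumulated_reference(frames, cur, n):
    # Background reference for the cur-th frame as the mean of the n
    # frames preceding it (frames cur-n .. cur-1); averaging is an
    # assumption standing in for the unreproduced accumulation formula.
    stack = np.stack(frames[cur - n:cur], axis=0)
    return stack.mean(axis=0)
```

The background area of such an accumulated frame has a good signal-to-noise ratio, at the cost of possible motion blur in the foreground.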
  • the cumulative frames are generated recursively.
  • the background reference image of the target frame is determined according to the background reference image of the frame preceding the target frame.
  • the background reference image Ref_cur of the target frame may satisfy the following formula.
  • Ref_cur = weight × Ref_old + (1 - weight) × Vis_cur;
  • Ref_old represents the background reference image of the frame before the target frame, or the accumulated frame corresponding to the frame before the target frame.
  • Vis_cur represents the currently acquired color image, that is, the target frame; weight represents the cumulative weight.
  • the larger the cumulative weight, the higher the background signal-to-noise ratio of the background reference image, but the more obvious the motion smear.
  • Fig. 12(a) shows the grayscale image of the background reference image obtained when the cumulative weight is 0.5
  • Fig. 12(b) shows the grayscale image of the background reference image obtained when the cumulative weight is 0.9.
  • the SNR of the background region in the background reference image in (b) of FIG. 12 is significantly higher than that of the background region in the background reference image in (a) of FIG. 12 .
  • the image fusion model can better suppress motion blur, so the cumulative weight can be set higher to achieve a better background enhancement effect; for example, the cumulative weight is set to 0.9.
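The recursive update Ref_cur = weight × Ref_old + (1 - weight) × Vis_cur is a one-line running average; the sketch below uses the cumulative weight of 0.9 suggested above as the default.

```python
import numpy as np

def recursive_reference(ref_old, vis_cur, weight=0.9):
    # One recursive accumulation step: a high cumulative weight favors
    # the old accumulated background (higher background SNR, more smear).
    return weight * ref_old + (1.0 - weight) * vis_cur
```

Generating the reference recursively means only one accumulated frame needs to be cached per stream, which reduces image cache and storage pressure.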
  • Generating background reference images recursively can reduce image cache and storage pressure.
  • the background reference image and the infrared image corresponding to the color image have a good complementary effect, which will not affect the picture quality of the foreground area.
  • the long exposure frame before the target frame is used as the background reference image of the target frame.
  • a long exposure frame refers to a frame shot with a long exposure.
  • a long exposure frame is a frame obtained when the exposure duration is greater than the third threshold.
  • the background reference image of the previous frame is used as the background reference image of the target frame. That is, the background reference image of the frame before the color image is taken as the background reference image.
  • the fused image of the previous frame of the target frame is used as the background reference image of the target frame. That is, the fused image of the frame before the color image output by the image fusion model is used as the background reference image.
  • for example, frame A is input into the image fusion model as the first color image to obtain fused frame A; fused frame A is used as the background reference image of frame A+1; then frame A+1 is used as the first color image, and fused frame A is input into the image fusion model as the first background reference image.
  • the result of performing temporal noise reduction on the target frame is used as the background reference image of the target frame.
  • the background changes less, and the background part of the picture has a high degree of similarity.
  • the color image taken under the condition of high illumination can be used as the background reference image.
  • a color image taken during the day with clear weather is used as a background reference image for a color image taken at night.
  • Step S820 using the first background reference image, the first color image and the first infrared image as the input of the image fusion model to train the image fusion model.
  • the image fusion model includes an encoder network and a decoder network.
  • the encoder network is used to extract the features of the input image, and the decoder network is used to obtain the fused image according to the extracted features.
  • the fused image is the fusion result of the first color image.
  • the encoder network may employ a neural network, eg, a convolutional neural network.
  • the decoder network may employ a neural network, eg, a convolutional neural network.
  • the encoder network includes a first encoder, a second encoder and a third encoder.
  • the first encoder is used to extract the features of the background reference image
  • the second encoder is used to extract the features of the input color image
  • the third encoder is used to extract the features of the input infrared image.
  • the first encoder, the second encoder and the third encoder may be the same encoder, or may be different encoders.
  • the first encoder is used to extract the features of the background reference image
  • the second encoder and the third encoder are the same encoder, and are used to extract the features of the input color image and the input infrared image.
  • the background reference image is input into the encoder 11# (an example of the first encoder) in the fusion model, and the features of the background reference image are extracted by the encoder 11# and input into the decoder 12#.
  • the color image and the infrared image are input into the encoder 13# (an example of the second encoder, which can also be understood as an example of the third encoder); the encoder 13# extracts the features of the input color image and the input infrared image and inputs them into the decoder 12#.
  • the fused image is reconstructed by the decoder 12# according to the input features.
  • the encoder 11#, the encoder 13# and the decoder 12# can all be convolutional neural networks.
  • the input color image may be the first color image
  • the input infrared image may be the first infrared image.
  • the method 800 may be applied to a video mode, that is, a scene for video fusion.
  • the feature of the background reference image of the previous frame of the first color image may be used as the feature of the first background reference image. That is to say, the features of the background reference image of one frame are multiplexed in the image fusion process of the multi-frame color images.
  • the background reference images can be the same for different frames in the video. For example, a color image taken during the day with clear weather is used as a background reference image for a color image taken at night.
  • frame A, background reference image A and infrared image A are input into the image fusion model, which extracts the features of frame A, background reference image A and infrared image A respectively and then reconstructs the fused image, that is, the fusion result of frame A, according to the extracted features.
  • frame A+1 and infrared image A+1 are input into the image fusion model, which extracts the features of frame A+1 and infrared image A+1 respectively, uses the features of background reference image A as the features of the background reference image of frame A+1, and then reconstructs the fused image, that is, the fusion result of frame A+1, according to the extracted features.
  • Image fusion can still be achieved when the computing resources of the device are limited.
  • the encoder network includes M first encoders and N second encoders
  • the decoder network includes N decoders. That is, the image fusion model includes M first encoders, N second encoders and N decoders. M is a positive integer, N is a positive integer greater than 1, and N>M.
  • the first color image may include N frames of color images
  • the first infrared image may include N frames of infrared images corresponding to the N frames of color images.
  • the N frames of color images and the N frames of infrared images are used as the input of the image fusion model, and the image fusion model can output fused images corresponding to the N frames of color images; this specifically includes the following steps.
  • (1) the N frames of color images and the N frames of infrared images are respectively input into the N second encoders; the N second encoders respectively extract the features of the N frames of color images and the features of the N frames of infrared images and input them into the N decoders.
  • (2) the M background reference images corresponding to M frames of color images among the N frames of color images are respectively input into the M first encoders; the M first encoders respectively extract the features of the M background reference images, and the features of the M background reference images are respectively input into the N decoders, so that each decoder receives the features of one of the M background reference images.
  • for each decoder, the color image whose frame number is closest to that of the color image received by the decoder is selected from the M frames of color images, and the features of the background reference image of that closest color image are input into the decoder.
  • for example, the features input to decoder A in step (1) are the features of frame A and the features of the infrared image corresponding to frame A. If frame A is one of the M frames of color images, the features of the background reference image of frame A are input into decoder A in step (2). If frame A does not belong to the M frames of color images, step (2) inputs into decoder A the features of the background reference image of the color image whose frame number is closest to that of frame A among the M frames of color images.
  • (3) N fused images are respectively reconstructed.
  • the N decoders respectively reconstruct N fused images according to the features input into them; the features input into the N decoders include the features of the N frames of color images, the features of the N frames of infrared images, and the features of the M background reference images.
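The nearest-frame-number rule described in step (2) can be sketched as follows; `frame_ids` and `ref_frame_ids` are illustrative names for the N decoded frames and the M frames that have background references.

```python
def nearest_reference(frame_ids, ref_frame_ids):
    # For each decoded frame, pick the background reference whose source
    # frame number is closest (the M < N sharing rule described above).
    return {f: min(ref_frame_ids, key=lambda r: abs(r - f))
            for f in frame_ids}
```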
  • an image fusion model includes a first encoder, two second encoders, and two decoders.
  • the encoder network includes encoder 21# (an example of the first encoder), encoder 22# and encoder 23# (examples of the second encoder), and the decoder network includes decoder 24# and decoder 25# (examples of the decoder). The color image of the i-th frame and the infrared image of the i-th frame are input into encoder 22#, which extracts their features and inputs them into decoder 24#.
  • the background reference image of the i-th frame is input into the encoder 21#, and the features of the background reference image are extracted by the encoder 21# and input into the decoder 24# and the decoder 25#.
  • the fused image, that is, the i-th frame output image, is reconstructed by decoder 24# according to the features extracted by encoder 22# and encoder 21#.
  • the fused image, that is, the (i+1)-th frame output image, is reconstructed by decoder 25# according to the features extracted by encoder 23# and encoder 21#. That is, the features of the background reference image of the i-th frame are multiplexed in the image fusion of the i-th frame and the (i+1)-th frame. In this way, image fusion of two frames can be performed at the same time, which improves the processing speed; meanwhile, the features of the background reference image do not need to be extracted twice, which reduces the amount of calculation. With essentially no loss in effect, the overhead of the scheme can be reduced by 25%.
  • the image fusion model in step S820 is only for illustration, and other models capable of realizing image fusion may also be used as the image fusion model in this embodiment of the present application.
  • the training process may not adopt the aforementioned method 700 but adopt other training methods, which is not limited in this application.
  • improving the background quality means enhancing the quality of the foreground area and the background area of the output image at the same time, so as to realize image enhancement of the whole picture.
  • the embodiment of the present application proposes a training method 900 for an image fusion model, which adjusts the output image by adding a fusion weight as an input, so as to meet the requirements of different application scenarios.
  • the method 900 can be performed by a training device for an image fusion model, which can be a cloud service device or a terminal device, for example, a computer, a server, or another device whose computing power is sufficient to execute the training method, or a system composed of cloud service devices and terminal devices. Illustratively, the method 900 may be performed by the training device 120 in FIG. 1, the neural network processor 30 in FIG. 3, the execution device 410 in FIG. 4, or a local device.
  • the method 900 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 900 may be performed by a computing module on a monitoring device.
  • the method 900 includes steps S910 to S920, and the steps S910 to S920 will be described in detail below.
  • Step S910 acquiring the first fusion weight, the first color image and the first infrared image.
  • the first fusion weight is used to weight the first color image and the first infrared image.
  • the first fusion weight is used to adjust the fusion ratio of the first color image and the first infrared image in the image output by the image fusion model.
  • the first fusion weight may be in the form of a parameter, or may be in the form of an image, that is, a fusion weight map. That is to say, the first fusion weight may be input into the image fusion model in the form of parameters, or may be input into the image fusion model in the form of images.
  • the first fusion weight may correspond to all images output by the image fusion model. That is, the first fusion weight may be a global weight.
  • the first fusion weight may correspond to part of the image output by the image fusion model, that is, the first fusion weight may be a local weight. Different first fusion weights respectively correspond to different regions in the output image.
  • the first fusion weight can be obtained in various ways.
  • the following example illustrates how to obtain the first fusion weight.
  • the first fusion weight can be obtained in any of the following ways. It should be understood that the following are only examples, and the first fusion weight may also be obtained in other manners, which is not limited in this application.
  • the first fusion weight may be preset.
  • the first fusion weight may be set manually.
  • Mode 2: determine the first fusion weight according to the intensity of the infrared image.
  • multiple first fusion weights may be set, and the multiple first fusion weights are determined according to the brightness values of different regions of the infrared image. Specifically, in an area with higher brightness in the infrared image, the corresponding first fusion weight is higher. Since the signal-to-noise ratio of the area with higher brightness is higher, the weight value can be adjusted adaptively according to the intensity of the infrared image, and a higher weight can be set in the area with higher brightness, which is beneficial to make the quality of the fusion image higher.
  • the infrared image may be the first infrared image or the target infrared image.
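A sketch of Mode 2: map infrared brightness to fusion weights so that brighter areas, which have a higher signal-to-noise ratio under active fill light, get a higher infrared weight. The linear mapping and the `w_min`/`w_max` bounds are illustrative assumptions, not values from the text.

```python
import numpy as np

def weight_from_ir_intensity(ir, w_min=0.1, w_max=0.9):
    # Normalize infrared brightness to [0, 1], then map linearly to
    # [w_min, w_max]: brighter (higher-SNR) areas get a higher IR weight.
    norm = (ir - ir.min()) / max(ir.max() - ir.min(), 1e-8)
    return w_min + (w_max - w_min) * norm
```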
  • Mode 3: determine the first fusion weight according to the information entropy of the color image and the information entropy of the infrared image.
  • the plurality of first fusion weight values are determined according to the information entropy of the color image and the information entropy of the infrared image.
  • if the information entropy at area A of the infrared image is greater than the information entropy at area A of the color image, the weight value corresponding to the infrared image at area A is higher.
  • if the information entropy at area B of the infrared image is smaller than the information entropy at area B of the color image, the weight value corresponding to the infrared image at area B is lower.
  • the infrared image may be the first infrared image or the target infrared image.
  • the color image may be the first color image or the target color image.
  • Sources of information entropy include, but are not limited to, gradient information, contrast information, and the like.
  • the first fusion weight is obtained by adaptively adjusting the weight values through the information entropy of the images, so that areas with high information entropy correspond to higher weights, which is conducive to a higher-quality fusion image.
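A sketch of Mode 3: compare the per-region information entropy of the infrared and color images and give the higher-entropy source the larger weight. The histogram-based entropy, the bin count, and the `hi`/`lo` weight values are illustrative assumptions; as noted above, gradient or contrast information could serve as the entropy source instead.

```python
import numpy as np

def region_entropy(region, bins=16):
    # Shannon entropy of a region's intensity histogram (values in [0, 1]);
    # one possible information-entropy measure among several.
    hist, _ = np.histogram(region, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ir_weight_for_region(color_region, ir_region, hi=0.7, lo=0.3):
    # The infrared image gets the higher weight where it carries more
    # information entropy than the color image; hi/lo are illustrative.
    return hi if region_entropy(ir_region) > region_entropy(color_region) else lo
```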
  • Mode 4: determine the first fusion weight according to the face information.
  • a higher weight value of the infrared image may be set at the face area, and a lower weight value of the infrared image may be set at other areas outside the face area, that is, a higher weight value of the color image.
  • the acquisition methods of the face area include but are not limited to methods such as face detection, image segmentation, or face heatmap.
  • the infrared image may be the first infrared image or the target infrared image.
  • the infrared image contains more image information, that is, more texture information. Therefore, by setting a higher infrared weight value in the face area, more infrared information can be fused and retained there, which improves the clarity of the face area and the accuracy of face recognition.
  • the colors in the color image are more realistic. Therefore, by setting a lower infrared weight value in the other areas, more color information can be fused there, ensuring natural colors in those areas and making the fused image look more natural.
  • Fig. 15 shows a schematic diagram of a method for obtaining fusion weights.
  • the method can be applied to the scene of face checkpoint monitoring.
  • the face region in (a) of FIG. 15 is acquired, and a weighted fusion image is generated according to the face region, as shown in (b) of FIG. 15 .
  • the weight of the face area is higher than that of other areas.
  • the weight value is used to indicate the proportion of the infrared image in the fusion image.
  • the weight of the face region in (b) of FIG. 15 is 0.6, and the weight of other regions is 0.1.
  • the face area may be a face frame.
  • the face frame may also have other shapes, for example, a circular frame or an irregular frame.
  • the image used for face detection may be a color image, such as a first color image, or an infrared image, such as a first infrared image. Face detection can be performed on color images to obtain face frames, or face detection can be performed on infrared images to obtain face frames. For example, in (a) of FIG. 15 , face detection is performed on an infrared image.
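The Fig. 15 weight map can be sketched as a rectangular face frame filled with a higher infrared weight. The 0.6 / 0.1 defaults follow the example values given for Fig. 15; the `(top, left, bottom, right)` box format is an assumption about how a detector would report the face frame.

```python
def face_fusion_weight_map(height, width, face_box, w_face=0.6, w_other=0.1):
    """Fusion weight map for the infrared image: w_face inside the face
    frame, w_other elsewhere, as in the Fig. 15 example.

    face_box = (top, left, bottom, right), assumed to come from face
    detection on either the color image or the infrared image."""
    weights = [[w_other] * width for _ in range(height)]
    top, left, bottom, right = face_box
    for y in range(top, bottom):
        for x in range(left, right):
            weights[y][x] = w_face
    return weights
```

For a circular or irregular face frame, the inner loop would test membership in that region instead of a rectangle.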
  • the first fusion weight is only represented in the form of a fusion weight map in FIG. 15 as an example, and does not constitute a limitation on the solution of the embodiment of the present application, and the first fusion weight may also be represented in other forms, for example, in the form of parameter values.
  • the first fusion weight indicating the weight value of the infrared image is taken as an example for description, and does not constitute a limitation on the solution of the embodiment of the present application.
  • the first fusion weight can also be used to indicate the weight value of the color image, and so on.
  • Step S920: train the image fusion model by using the first fusion weight, the first color image, and the first infrared image as the input of the image fusion model.
  • the training process may not adopt the aforementioned method 700 but adopt other training methods, which is not limited in this application.
  • the method 800 and the method 900 may be used in combination, that is, the image fusion model is trained by using the first fusion weight, the first background reference image, the first color image and the first infrared image as the input of the image fusion model.
  • for the image fusion model in FIG. 13, the first fusion weight, the first infrared image, and the first color image are input into the second encoder, and the first background reference image is input into the first encoder for training.
  • Different application scenarios may have different requirements for image fusion.
  • the fusion images obtained by using the same image fusion model cannot meet the fusion requirements of different application scenarios.
  • the fusion ratio of the color image and the infrared image can be adjusted, which is beneficial to applying the model to different application scenarios. That is to say, there is no need to train multiple image fusion models for different application scenarios; the model can be applied to different scenarios simply by adjusting the fusion weight, which improves the degree of freedom of the model.
  • the face area is more concerned with the recognition rate and tends to fuse more infrared information, making the fusion result closer to the infrared path, while the human body area is more concerned with color accuracy and tends to use the infrared image only as a noise reduction reference, making the fusion result closer to the color path and improving the naturalness of the image.
  • different fusion processing is performed on different positions in the image according to the fusion weight, which is beneficial to improve the imaging quality of the image in a targeted manner.
  • the infrared images of different regions have different reference values for image fusion.
  • by setting the fusion weights of the infrared image at different positions, it can be ensured that while the infrared image is used to improve the foreground clarity, the background signal of the image is not degraded, that is, the influence of the flashlight effect of the infrared image on the background area is reduced.
  • increase the fusion weight of the infrared image in the foreground area so that the foreground area in the fused image can fuse more information from the infrared image;
  • decrease the fusion weight of the infrared image in the background area so that the background area in the fused image can incorporate more information from the color image.
  • An embodiment of the present application provides a schematic flowchart of an image fusion method 1000.
  • the method may be executed by an apparatus or device capable of image fusion; the device may be a cloud service device or a terminal device, for example, a computer, a server, or another device with sufficient computing power to execute the image fusion method, or a system composed of a cloud service device and a terminal device.
  • the method 1000 may be executed by the execution device 110 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the execution device 410 in FIG. 4 , or a local device.
  • the method 1000 may be specifically executed by the execution device 110 shown in FIG. 1, and the color image and infrared image to be processed in the method 1000 may be the input data given by the client device 140 shown in FIG. 1.
  • the preprocessing module 113 in the execution device 110 may be used to perform the acquisition of the background reference image described in S1020 in the method 1000.
  • the preprocessing module 114 in the execution device 110 may be used to perform the acquisition of the fusion weight described in S1030 in the method 1000.
  • the computing module 111 may be used to perform the image fusion described in S1040 in the method 1000.
  • the method 1000 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 1000 may be performed by a computing module on a monitoring device.
  • the method 1000 may be processed by the CPU, or may be jointly processed by the CPU and the GPU, or other processors suitable for neural network computing may be used without the use of the GPU, which is not limited in this application.
  • the image fusion model used in the image fusion method 1000 may be constructed by the method in FIG. 10 described above.
  • the method 1000 includes steps S1010 to S1040.
  • for the specific implementation manner of steps S1010 to S1040 in the method 1000, reference may be made to the foregoing method 700. To avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the method 1000 below.
  • S1010: Acquire the color image and infrared image to be processed.
  • the color image to be processed is an image formed by the scene's reflection of visible light
  • the infrared image is an image formed by the scene's reflection of light in the infrared band.
  • the infrared image and the color image to be processed are taken for the same scene.
  • the same scene means that the similarity between the color image to be processed and the infrared image is greater than the first threshold.
  • shooting for the same scene means that the picture content of the infrared image and the color image to be processed are the same.
  • the infrared image may be an infrared image captured on the same area at the same time as the color image to be processed.
  • refer to FIG. 6 to FIG. 8 for the manner of acquiring the color image and the corresponding infrared image, which is not limited in this embodiment of the present application.
  • the similarity between the background reference image and the color image to be processed is greater than the second threshold.
  • the background area in the background reference image is the same as the background area in the color image to be processed.
  • the similarity between the background reference image and the color image to be processed is greater than the second threshold, which may be that the similarity between the background area of the background reference image and the background area of the color image to be processed is greater than the second threshold.
  • the background SNR of the background reference image is usually higher than that of the color image to be processed.
  • the background region may be determined by the prior art, which is not limited in this embodiment of the present application.
  • Step S1020 is an optional step.
  • the background reference image may be input into the image fusion model in the form of a color image, or may be input into the image fusion model in the form of a grayscale image.
  • the background reference image is directly fed into the image fusion model.
  • the luminance channel of the background reference image can be input into the image fusion model.
  • the fusion weight is used to weight the color image and infrared image to be processed.
  • the fusion weight is used to adjust the fusion ratio of the color image to be processed and the infrared image in the fusion image.
  • the fusion weight is used to adjust the ratio of the information content of the color image to be processed and the information content of the infrared image contained in the fusion image.
  • Step S1030 is an optional step.
  • the fusion weights may be global weights.
  • when the weight corresponding to the infrared image is larger, the fusion image contains more information from the infrared image, that is, the fusion image is more similar to the infrared image.
  • when the weight corresponding to the color image is larger, the fusion image contains more information from the color image to be processed, that is, the fusion image is more similar to the color image to be processed.
  • the fusion weight may correspond to the entire fused image.
  • when the fusion weight corresponds to the entire fused image, there is only one fusion weight for the whole image: in any region of the fused image, the fusion ratio of the color image to be processed and the infrared image is the same. Such a fusion weight may be referred to as a global weight.
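The semantics of a global weight can be illustrated with a plain per-pixel blend. Note that the actual model is a learned neural network, not a pixel-wise average, so this is only a sketch of what the fusion ratio means:

```python
def blend_global(color, ir, w_ir):
    """Blend two single-channel images with one global infrared weight:
    fused = w_ir * ir + (1 - w_ir) * color, applied identically at
    every position (an illustration of the global-weight semantics)."""
    return [[w_ir * i + (1.0 - w_ir) * c for i, c in zip(row_ir, row_c)]
            for row_ir, row_c in zip(ir, color)]
```

With a local weight, `w_ir` would instead be a per-region or per-pixel map, letting area A lean toward the infrared image while area B leans toward the color image.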
  • alternatively, the fusion weight may correspond to a part of the fused image.
  • the fusion weight corresponding to a part of the fusion image can be understood as the fusion weight corresponding to a region in the fusion image.
  • the number of fusion weights may be multiple, and the multiple fusion weights respectively correspond to different regions in the fusion image.
  • This fusion weight may be referred to as a local weight.
  • the weight corresponding to the infrared image in the area A is larger, and the weight corresponding to the infrared image in the area B is smaller.
  • the area A contains more information of the infrared image
  • the area B contains more information of the color image to be processed. That is, the area A is more similar to the area A in the infrared image, and the area B is more similar to the area B in the color image to be processed.
  • the fusion weight may be input into the image fusion model in the form of a parameter, or may be input into the image fusion model in the form of a fusion weight map, which is not limited in this application.
  • the values in the fusion weight map can be used to indicate fusion weights.
  • the values of different regions in the fusion weight map may be used to represent multiple fusion weights corresponding to different regions in the fusion image.
  • Representing the fusion weight in the form of a fusion weight map can reduce the complexity of adjusting the fusion weight.
  • the form of the fusion weight map is more favorable for representing different regions corresponding to the multiple fusion weights.
  • S1040: Input the color image and infrared image to be processed into an image fusion model to perform feature extraction, and perform image fusion based on the extracted features to obtain a fusion image.
  • the image fusion model is obtained by training the image fusion model with the first color image and the first infrared image as the input of the image fusion model, and taking the value of the loss function smaller than the fourth threshold as the target.
  • the loss function includes a first loss function.
  • the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image.
  • the target fusion image is determined according to the target color image and the target infrared image.
  • the first color image, the first infrared image, the target color image, and the target infrared image are taken for the same scene; the signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image.
  • the same scene means that the similarity between any two images among the first color image, the first infrared image, the target color image, and the target infrared image is greater than the first threshold.
  • the target fusion image is an image of a brightness channel
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the brightness channel of the image output by the image fusion model and the target fusion image
  • the loss function further includes a second loss function, where the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
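The two loss terms can be sketched as below. The use of a mean-absolute (L1) distance and the balancing factor `lam` are assumptions for illustration; the text only states what each loss compares (output luminance vs. target fusion image, and output vs. target color image).

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two flat lists of pixel values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def fusion_loss(out_luma, target_fusion_luma, out_color, target_color, lam=1.0):
    """First loss: luminance channel of the model output vs. the target
    fusion image (itself a luminance image). Second loss: model output
    vs. the target color image. lam balances the two terms and is an
    assumed hyperparameter."""
    first = mean_abs_diff(out_luma, target_fusion_luma)
    second = mean_abs_diff(out_color, target_color)
    return first + lam * second
```

Training would then drive this value below the fourth threshold mentioned above.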
  • step S1040 further includes: inputting the background reference image into the image fusion model to perform image fusion.
  • step S1040 further includes: inputting the fusion weight into the image fusion model to perform image fusion.
  • the target fusion image is determined by the target color image and the target infrared image, and the image fusion model is trained based on the target fusion image, so that the image fusion model can make full use of infrared information, which is conducive to fusing more texture information into the output image to preserve more image details.
  • Fig. 18 shows a schematic diagram of the effect of image fusion using the method 1000.
  • (a) of Fig. 18 is a color image obtained in a low-illumination scene, and the illumination of the scene is 0.2Lux. As shown in the figure, the signal-to-noise ratio of the color image is poor, and the face area is blurred and almost unrecognizable.
  • (b) of FIG. 18 is an infrared image corresponding to the color image, and a high-definition infrared image can be obtained by using near-infrared supplementary light. As shown in the figure, the definition of the face and the human body in the near-infrared image is relatively high.
  • (c) of FIG. 18 is a fused image obtained by the method 1000.
  • the fusion image fully combines the advantages of the color image and the near-infrared image, and improves the imaging quality of the fusion image in the low-illumination scene.
  • Figure 19 shows a comparison chart of the effect of image fusion using different methods.
  • Fig. 19(a) is a color image to be processed; its signal-to-noise ratio is poor, and the face area is blurred and almost unrecognizable. In addition, due to the heavy noise, there is some error in the estimation of the image's white balance parameters, giving the image a yellowish cast (the white clothes appear yellowish).
  • Figure 19(b) is a fusion image obtained by using the traditional brightness fusion scheme, which can improve the signal-to-noise ratio of the face part; however, as shown by the arrow in the figure, this scheme leads to color distortion in the human body area, and the dark-colored trousers appear grayish.
  • (c) of FIG. 19 is a fusion image obtained by adopting the method 1000. While the definition of the face region in the fusion image is improved, the real color of the human body region is maintained.
  • Table 1 shows the test results of face recognition on fused images obtained by different methods.
  • the similarity between the face area in 1424 fused images captured in the range of 0.2 Lux to 5 Lux and the standard ID photo was measured, and the face recall was calculated as the fraction of faces whose similarity was greater than 0.85.
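The recall figure described above can be computed straightforwardly; the similarity scores themselves would come from a face recognition model, which is outside this sketch.

```python
def face_recall(similarities, threshold=0.85):
    """Fraction of fused-image faces whose similarity to the standard
    ID photo exceeds the threshold (0.85 per the test described above)."""
    if not similarities:
        return 0.0
    return sum(s > threshold for s in similarities) / len(similarities)
```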
  • the method 700 is the training stage of the image fusion model (the stage performed by the training device 120 shown in FIG. 1), in which the image fusion model provided in the method 700 is trained; the method 1000 can be understood as the application stage of the image fusion model (the stage executed by the execution device 110 shown in FIG. 1), in which the image fusion model trained by the method 700 is used to obtain the output image, that is, the fused image, according to the color image and infrared image to be processed input by the user.
  • FIG. 16 shows another image fusion method 1100 proposed by an embodiment of the present application.
  • by introducing a background reference image, the imaging quality is improved.
  • the method 1100 can be performed by an image fusion apparatus; the apparatus can be a cloud service device or a terminal device, for example, a computer, a server, or another device with sufficient computing power to execute the image fusion method, or a system composed of a cloud service device and a terminal device.
  • the method 1100 may be executed by the execution device 110 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the execution device 410 in FIG. 4 , or a local device.
  • the method 1100 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 1100 may be performed by a computing module on a monitoring device.
  • the image fusion model used in the image fusion method 1100 may be constructed by the method 800 described above.
  • the method 1100 includes steps S1110 to S1120.
  • for the specific implementation manner of steps S1110 to S1120 in the method 1100, reference may be made to the foregoing method 800. To avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the method 1100 below.
  • Step S1110: Acquire the color image, infrared image, and background reference image to be processed.
  • the similarity between the background reference image and the color image to be processed is greater than the second threshold.
  • the background area in the background reference image is the same as the background area in the color image to be processed.
  • the similarity between the background reference image and the color image to be processed is greater than the second threshold, which may be that the similarity between the background area of the background reference image and the background area of the color image to be processed is greater than the second threshold.
  • the background SNR of the background reference image is higher than that of the color image to be processed.
  • the color image and the infrared image to be processed were taken for the same scene.
  • the same scene means that the similarity between the color image to be processed and the infrared image is greater than the first threshold.
  • the similarity in this embodiment of the present application may be image texture similarity.
  • the similarity between the color image to be processed and the infrared image may be the image texture similarity between the color image to be processed and the infrared image
  • the similarity between the background reference image and the color image to be processed may be the image texture similarity between the background reference image and the color image to be processed.
  • the background reference image may be a color image or a grayscale image. That is, the background reference image can be input into the image fusion model in the form of a color map, or it can be input into the image fusion model in the form of a grayscale map.
  • the method 1100 can be applied to a video mode, i.e., a scenario for video fusion.
  • the background reference image can be acquired in any of the following ways.
  • the background reference image is determined from a plurality of color images preceding the color image to be processed.
  • the color image to be processed currently input into the image fusion model is taken as the target frame; the multi-frame color images before the target frame are accumulated to obtain an accumulated frame, and the accumulated frame is used as the background reference image of the target frame.
  • the signal-to-noise ratio in the background area of the accumulated frame is good, and there may be motion blur in the foreground area.
  • n is an integer greater than 1; the larger the value of n, the clearer the background area in the obtained background reference image.
  • the background reference image Ref cur of the target frame can satisfy the following formula.
  • Frame i represents the ith frame
  • cur represents the current frame number, that is, the target frame is the cur th frame.
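Assuming the accumulation is a plain average of the n frames preceding the target frame (the exact accumulation formula is not reproduced in this text, so simple averaging is an assumption), the background reference image can be sketched as:

```python
def accumulated_reference(frames, cur, n):
    """Average the n frames before the cur-th frame (0-indexed lists of
    rows of pixels) to form the background reference image Ref_cur.

    Plain averaging is an assumption; a weighted accumulation would
    follow the same structure with per-frame coefficients."""
    window = frames[cur - n:cur]
    height, width = len(window[0]), len(window[0][0])
    return [[sum(f[y][x] for f in window) / n for x in range(width)]
            for y in range(height)]
```

As noted above, the averaged background is clean while moving foreground objects may smear, which the fusion model is expected to handle.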
  • the cumulative frames are generated recursively.
  • the background reference image of the target frame is determined according to the background reference image of the frame preceding the target frame.
  • the background reference image Ref cur of the target frame may satisfy the following formula.
  • Ref cur = weight × Ref old + (1 - weight) × Vis cur ;
  • Ref old represents the background reference image of the frame before the target frame, or the accumulated frame corresponding to the frame before the target frame.
  • Vis cur represents the currently acquired color image, that is, the target frame, and weight represents the cumulative weight.
  • the larger the cumulative weight, the higher the background signal-to-noise ratio of the background reference image, but the more obvious the motion smear.
  • the image fusion model can better suppress the problem of motion blur. Therefore, the cumulative weight can be set higher to produce a better effect on the background enhancement. For example, set the cumulative weight to 0.9.
  • Generating background reference images recursively can reduce image cache and storage pressure.
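The recursive update above, with the suggested cumulative weight of 0.9, can be sketched as:

```python
def update_background_reference(ref_old, vis_cur, weight=0.9):
    """Ref_cur = weight * Ref_old + (1 - weight) * Vis_cur, computed
    per pixel; weight=0.9 follows the example given in the text."""
    return [[weight * r + (1.0 - weight) * v for r, v in zip(row_r, row_v)]
            for row_r, row_v in zip(ref_old, vis_cur)]
```

Because only `Ref_old` and the current frame are needed, this recursion avoids caching the full window of past frames, which is the storage saving described above.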
  • the first background reference image and the first infrared image have a good complementary effect and will not affect the picture quality of the foreground area.
  • the long exposure frame before the color image to be processed is used as the background reference image.
  • a long exposure frame is a frame obtained when the exposure duration is greater than the third threshold.
  • the background reference image is the background reference image of the frame preceding the color image to be processed.
  • the color image to be processed can be understood as the target frame, and the fused image of the previous frame of the target frame is used as the background reference image of the target frame. That is, the processing result of the previous frame of the target frame output by the image fusion model is used as the background reference image of the target frame.
  • for example, frame A is input into the image fusion model as the color image to be processed to obtain the fused frame A, and the fused frame A is used as the background reference image of frame A+1;
  • then frame A+1 as the color image to be processed and the fused frame A as its background reference image are input into the image fusion model.
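This frame-to-frame chaining can be sketched as a loop in which each fused output becomes the next frame's background reference. The `fuse` callable below stands in for the image fusion model and is a hypothetical placeholder:

```python
def fuse_video(color_frames, ir_frames, first_ref, fuse):
    """Fuse a sequence of frames; the fused result of frame A serves as
    the background reference image of frame A+1."""
    ref = first_ref
    fused_frames = []
    for color, ir in zip(color_frames, ir_frames):
        fused = fuse(color, ir, ref)   # stand-in for the image fusion model
        fused_frames.append(fused)
        ref = fused                    # chain: fused frame -> next reference
    return fused_frames
```

`first_ref` for the very first frame would come from one of the other acquisition modes (an accumulated frame, a long exposure frame, or a high-illumination capture).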
  • the result of performing temporal noise reduction on the color image to be processed is used as the background reference image.
  • in a video, the background changes little between frames, and the background parts of the pictures are highly similar.
  • the color image taken under the condition of high illumination can be used as the background reference image.
  • a color image taken during the day with clear weather is used as a background reference image for a color image taken at night.
  • for details, refer to step S810 in the foregoing method 800. It should be understood that the above manner of acquiring the background reference image is only an example, which is not limited in this application.
  • S1120: Input the color image, infrared image, and background reference image to be processed into an image fusion model to perform feature extraction, and perform image fusion based on the extracted features to obtain a fusion image.
  • the image fusion model may refer to step S820 in the method 800 .
  • the background reference image is input into the encoder 11# (an example of the first encoder) in the fusion model, and the features of the background reference image are extracted by the encoder 11# and input into the decoder 12#.
  • the color image and the infrared image are input into the encoder 13# (an example of the second encoder, which can also be understood as an example of the third encoder);
  • the feature of the input color image and the input infrared image is extracted by the encoder 13#,
  • the fused image is reconstructed by the decoder 12# according to the input features.
  • the encoder 11#, the encoder 13# and the decoder 12# can all be convolutional neural networks.
  • the input color image may be a color image to be processed
  • the input infrared image may be an infrared image.
  • the method 1100 may be applied to a video mode, that is, a scene for video fusion.
  • the characteristics of the background reference image of the previous frame of the color image to be processed may be used as the characteristics of the background reference image. That is to say, the features of the background reference image of one frame are multiplexed in the image fusion process of the multi-frame color images.
  • for example, frame A, background reference image A, and infrared image A are input into the image fusion model, their features are extracted respectively, and the fused image is then reconstructed according to the extracted features, that is, the fusion result corresponding to frame A.
  • frame A+1 and infrared image A+1 are input into the image fusion model, their features are extracted respectively, the features of background reference image A are used as the features of the background reference image of frame A+1, and the fused image is then reconstructed according to the extracted features, that is, the fusion result corresponding to frame A+1.
  • Image fusion can still be achieved when the computing resources of the device are limited.
  • the encoder network includes M first encoders and N second encoders
  • the decoder network includes N decoders. That is, the image fusion model includes M first encoders, N second encoders and N decoders. M is a positive integer, N is a positive integer greater than 1, and N>M.
  • the color images to be processed may include N frames of color images, and the infrared images include N frames of infrared images corresponding to the N frames of color images.
  • the N frames of infrared images corresponding to the N frames of color images may be obtained when the same area is photographed at the same time as the N frames of color images. That is to say, there is a one-to-one correspondence between N frames of color images and N frames of infrared images.
  • step S1120 includes:
  • the N frames of color images and the N frames of infrared images are respectively input into the N second encoders; the N second encoders respectively extract the features of the N frames of color images and the features of the N frames of infrared images, and input these features into the N decoders respectively.
  • the M background reference images corresponding to M frames of color images among the N frames of color images are respectively input into the M first encoders; the M first encoders respectively extract the features of the M background reference images, and the features of the M background reference images are respectively input into the N decoders, with each decoder receiving the features of one of the M background reference images;
  • N fusion images are reconstructed respectively.
  • the N decoders reconstruct the N fused images according to the features input to them; the features input to the N decoders include the features of the N frames of color images, the features of the N frames of infrared images, and the features of the M background reference images.
  • for each decoder, the color image closest in frame number to the color image received by that decoder is selected from the M frames of color images, and the features of the background reference image corresponding to that closest color image are input into the decoder.
  • the input features of the decoder A are the features of the frame A and the features of the infrared image corresponding to the frame A. If frame A is one of the M frames of color images, the features of the background reference image of frame A are input into decoder A. If the frame A does not belong to the M frames of color images, the characteristics of the background reference image of a frame of color images that are closest to the frame number of the frame A in the M frames of color images are input into the decoder A.
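The nearest-frame selection described above can be sketched as follows. Frame numbers are assumed to be integers, and `ref_frame_ids` names the M frames whose background-reference features were extracted (both names are illustrative):

```python
def nearest_reference_frame(ref_frame_ids, frame_id):
    """Of the M frames whose background-reference features were
    extracted, pick the one closest in frame number to frame_id; its
    reference features are routed to the decoder handling frame_id."""
    return min(ref_frame_ids, key=lambda rid: abs(rid - frame_id))
```

With M=1 reference per pair of frames, each odd frame reuses the reference features of the preceding even frame, which is the 2-frames-at-a-time scheme described below.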
  • the features of the background reference image of the i-th frame are multiplexed into the image fusion of the i-th frame and the i+1-th frame.
  • two frames can be fused at the same time, which improves the processing speed.
  • the overhead of the scheme can be reduced by 25%.
  • the method 800 is the training stage of the image fusion model, in which the image fusion model provided in the method 800 is trained.
  • the method 1100 can be understood as the application stage of the image fusion model, in which the image fusion model trained by the method 800 may be used to obtain the output image, that is, the fused image in the method 1100, according to the color image and infrared image to be processed input by the user.
  • the method 1100 may use the image fusion model trained by the method 800, or may not use the image fusion model trained by the method 800.
  • by using the background reference image as an input of the image fusion model, the background blur problem that may be caused by the torch effect of the infrared image can be solved, and the background quality of the output image can be greatly improved; that is, the quality of the foreground area and the quality of the background area of the output image can be enhanced at the same time, achieving full-image enhancement.
  • the embodiment of the present application proposes an image fusion method 1200, which adjusts the output image by introducing a fusion weight, so as to meet the requirements of different application scenarios.
  • the method 1200 can be performed by an image fusion apparatus, which can be a cloud service device or a terminal device, for example, a computer, a server, or another device with computing power sufficient to execute the image fusion method.
  • the method 1200 may be executed by the execution device 110 in FIG. 1 , the neural network processor 30 in FIG. 3 , or the execution device 410 in FIG. 4 , or a local device.
  • the method 1200 may be specifically performed by a local device as shown in FIG. 4 , and the local device may be a monitoring device. Specifically, method 1200 may be performed by a computing module on a monitoring device.
  • the image fusion model used in the image fusion method 1200 may be constructed by the method 900 described above.
  • the method 1200 includes steps S1210 to S1220.
  • for the specific implementation of steps S1210 to S1220 of the method 1200, reference may be made to the aforementioned method 900; to avoid unnecessary repetition, repeated descriptions are appropriately omitted when the method 1200 is introduced below.
  • S1210: Acquire the color image to be processed, the infrared image and the fusion weight.
  • the fusion weights are used to weight the color and infrared images to be processed.
  • the fusion weight is used to adjust the fusion ratio of the color image to be processed and the infrared image in the fusion image.
  • the fusion weight may be in the form of a parameter, or may be in the form of an image, that is, a fusion weight map. That is to say, the fusion weight can be input into the image fusion model in the form of parameters, and can also be input into the image fusion model in the form of images.
  • the fusion weight corresponds to the whole fused image.
  • when the fusion weight corresponds to the whole fused image, there is only one fusion weight for the entire fused image: in any region of the fused image, the fusion ratio of the color image to be processed and the infrared image is the same. Such a fusion weight may be referred to as a global weight.
  • the fusion weight corresponds to part of the fused image.
  • the fusion weight corresponding to a part of the fusion image can be understood as the fusion weight corresponding to a region in the fusion image.
  • the number of fusion weights may be multiple, and the multiple fusion weights respectively correspond to different regions in the fusion image.
  • This fusion weight may be referred to as a local weight.
  • the fusion weight is greater than or equal to 0 and less than or equal to 1, and the proportion of the infrared image in the fusion image is positively correlated with the fusion weight.
  • the value range of the fusion weight is [0, 1], and the fusion weight can be used to indicate the proportion of the infrared image in the fusion image.
  • the larger the fusion weight, the larger the proportion of the infrared image in the fused image, that is, the more infrared information is fused into the fused image.
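A minimal sketch of this weighting, assuming a simple per-pixel linear blend on the luminance channel (the patent does not specify the blend formula; this only illustrates how a weight in [0, 1] controls the infrared proportion):

```python
def fuse_pixel(color_y, infrared_y, w):
    """Blend the luminance of a color pixel and an infrared pixel.
    w is the fusion weight in [0, 1]; a larger w gives the infrared
    image a larger proportion in the fused result."""
    assert 0.0 <= w <= 1.0
    return (1.0 - w) * color_y + w * infrared_y

def fuse_global(color, infrared, w):
    """Apply one global fusion weight to every pixel of two equally
    sized single-channel images (lists of rows)."""
    return [[fuse_pixel(c, i, w) for c, i in zip(cr, ir)]
            for cr, ir in zip(color, infrared)]
```

With w = 0 the fused pixel equals the color pixel, and with w = 1 it equals the infrared pixel, matching the positive correlation described above.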
  • the fusion weight can be obtained in any of the following ways; it should be understood that these are only examples, and the fusion weight may also be obtained in other ways, which is not limited in this application.
  • Mode 1: the fusion weight is preset.
  • the fusion weights can be artificially set.
  • Mode 2: determine the fusion weight according to the intensity of the infrared image.
  • multiple fusion weights are determined based on the brightness values of different regions of the infrared image; specifically, a region with higher brightness in the infrared image corresponds to a higher fusion weight. Since a region with higher brightness has a higher signal-to-noise ratio, adaptively adjusting the weight value according to the intensity of the infrared image and setting a higher weight in brighter regions helps to improve the quality of the fused image.
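A sketch of this adaptive mapping, with the linear brightness-to-weight curve and the bounds `w_min`/`w_max` chosen purely for illustration (they are not values fixed by the patent):

```python
def weight_from_brightness(region_mean, w_min=0.1, w_max=0.6):
    """Map a region's mean infrared brightness (0..255) to a fusion
    weight: brighter regions, which have a higher signal-to-noise
    ratio in the infrared image, receive a higher weight."""
    t = max(0.0, min(1.0, region_mean / 255.0))
    return w_min + t * (w_max - w_min)
```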
  • Mode 3: determine the fusion weight according to the information entropy of the color image to be processed and the information entropy of the infrared image.
  • the plurality of fusion weights are determined according to the information entropy of the color image to be processed and the information entropy of the infrared image.
  • if the information entropy at area A of the infrared image is greater than the information entropy at area A of the color image to be processed, the weight value corresponding to the infrared image at area A is higher; if the information entropy at area B of the infrared image is smaller than the information entropy at area B of the color image to be processed, the weight value corresponding to the infrared image at area B is lower.
  • Sources of information entropy include, but are not limited to, gradient information, contrast information, and the like.
  • the weight value is adaptively adjusted according to the information entropy of the images to obtain the fusion weight, so that an area with high information entropy corresponds to a higher weight, which helps to improve the quality of the fused image.
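A sketch of an entropy-based weight choice; the histogram entropy, the bin count, and the weight values 0.6/0.1 are illustrative assumptions, not values fixed by the patent:

```python
import math

def patch_entropy(patch, bins=8):
    """Shannon entropy of an 8-bit single-channel patch (list of rows),
    a rough measure of how much information the region carries."""
    hist = [0] * bins
    n = 0
    for row in patch:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
            n += 1
    ent = 0.0
    for h in hist:
        if h:
            p = h / n
            ent -= p * math.log2(p)
    return ent

def pick_weight(ir_patch, color_patch, w_hi=0.6, w_lo=0.1):
    """Give the infrared image the higher weight in a region where it
    carries more information entropy than the color image."""
    return w_hi if patch_entropy(ir_patch) > patch_entropy(color_patch) else w_lo
```

A flat (low-entropy) color patch paired with a textured (high-entropy) infrared patch yields the higher infrared weight, as described above; gradient- or contrast-based entropy sources would follow the same comparison pattern.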
  • Mode 4: determine the fusion weight according to the face information.
  • a higher weight value of the infrared image can be set at the face area, and a lower weight value of the infrared image can be set at other areas outside the face area, that is, the color image to be processed has a higher weight value.
  • the acquisition methods of the face area include but are not limited to methods such as face detection, image segmentation, or face heatmap.
  • the infrared image contains more image information, that is, more texture information; therefore, by setting a higher weight value for the infrared image in the face area, more infrared information is fused into the face area, which improves the clarity of the face area and the accuracy of face recognition.
  • the colors in the color image to be processed are more realistic; therefore, by setting a lower weight value for the infrared image in the other areas, more color information is fused into those areas, which preserves the naturalness of the colors there and makes the fused image look more natural.
  • the face area in FIG. 15( a ) is acquired, and a fusion weight map is generated according to the face area, as shown in FIG. 15( b ).
  • the weight of the face area is higher than that of other areas.
  • the weight value is used to indicate the proportion of the infrared image in the fusion image.
  • the weight of the face region in (b) of FIG. 15 is 0.6, and the weight of other regions is 0.1.
  • the face area may be a face frame.
  • the face frame may also have other shapes, for example, a circular frame or an irregular frame.
  • the image used for face detection can be a color image to be processed or an infrared image. Face detection can be performed on color images to obtain face frames, or face detection can be performed on infrared images to obtain face frames. For example, in (a) of FIG. 15 , face detection is performed on an infrared image.
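A sketch of building a fusion weight map from a rectangular face frame, using the 0.6/0.1 values from the FIG. 15(b) example (the box-shaped map and the helper name are illustrative; an irregular frame or heatmap would be filled in analogously):

```python
def face_weight_map(height, width, face_box, w_face=0.6, w_other=0.1):
    """Per-pixel fusion weight map: pixels inside the detected face
    box get a higher infrared weight (sharper face), everything else
    keeps mostly color information, as in the FIG. 15(b) example."""
    top, left, bottom, right = face_box
    return [[w_face if top <= r < bottom and left <= c < right else w_other
             for c in range(width)]
            for r in range(height)]
```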
  • the above description takes the fusion weight represented in the form of a fusion weight map only as an example, which does not limit the solution of the embodiment of the present application; the fusion weight may also be represented in other forms, for example, in the form of a parameter value.
  • FIG. 15 only takes the fusion weight indicating the weight value of the infrared image as an example for description, and does not limit the solution of the embodiment of the present application.
  • the fusion weight can also be used to indicate the weight value of the color image.
  • for obtaining the fusion weight, reference may also be made to step S910 in the foregoing method 900. It should be understood that the above manners of obtaining the fusion weight are only examples, which are not limited in this application.
  • S1220: Input the color image to be processed, the infrared image and the fusion weight into an image fusion model to perform feature extraction, and perform image fusion based on the extracted features to obtain a fused image.
  • the image fusion model may be obtained by training with the method 900.
  • Figure 17 shows fused images obtained with different fusion weights.
  • (a) of FIG. 17 adopts a global weight, and the weight value corresponding to the infrared image is 0.1; the fused image is similar to a noise-reduced color image, little infrared information is fused, and the definition of the picture is low; in particular, the face area is blurry.
  • (b) of FIG. 17 adopts a global weight, and the weight value corresponding to the infrared image is 0.6.
  • the definition of the fused image is high, and the definition of the face area is improved, which is conducive to subsequent processing such as face recognition; however, as indicated by the arrow in the figure, more texture information is fused in the human body area, giving the human body area a heavy oil-painting feel and lowering the naturalness of the image.
  • (c) of FIG. 17 adopts the fusion weight shown in (b) of FIG. 15 , that is, the weight value corresponding to the infrared image in the face area is 0.6, and the weight value corresponding to the infrared image in other areas is 0.1.
  • the face definition of the fusion image is high, and the naturalness of other areas is guaranteed at the same time.
  • the method 900 is the training stage of the image fusion model, in which the image fusion model provided in the method 900 is trained; the method 1200 can be understood as the application stage of the image fusion model, in which the image fusion model trained by the method 900 is used to obtain the output image, that is, the fused image in the method 1200, according to the color image and infrared image to be processed that are input by the user.
  • the method 1200 may use the image fusion model trained by the method 900, or may not use the image fusion model trained by the method 900.
  • the method 1100 and the method 1200 may be used in combination, that is, the fusion weight, the infrared image, the color image to be processed and the background reference image are input into the image fusion model to perform image fusion to obtain a fusion image.
  • the fusion weight map, the infrared image, and the color image to be processed are input into the second encoder, and the background reference image is input into the first encoder to perform image fusion.
  • fusion weights are introduced.
  • the fusion ratio of the color image and the infrared image can be adjusted to suit different application scenarios; that is to say, there is no need to train multiple image fusion models for different application scenarios, and a single model can be applied to different scenarios simply by adjusting the fusion weight, which improves the degree of freedom of the model.
  • different regions correspond to different fusion weights, so as to meet the requirements for image fusion of different regions in the same image, which is beneficial to improve the image quality of the output image.
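A sketch of per-region fusion with a weight map, assuming the same per-pixel linear blend as the global-weight case (the blend formula is an illustrative assumption, not the patent's model):

```python
def fuse_with_map(color, infrared, weight_map):
    """Blend each pixel with its own weight from the fusion weight map,
    so the face area and the background can use different fusion
    ratios within the same image."""
    return [[(1.0 - w) * c + w * i
             for c, i, w in zip(cr, ir, wr)]
            for cr, ir, wr in zip(color, infrared, weight_map)]
```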
  • FIG. 20 is a schematic block diagram of an apparatus for training an image fusion model according to an embodiment of the present application.
  • the image fusion model training apparatus 2000 shown in FIG. 20 includes an acquisition unit 2010 and a processing unit 2020 .
  • the acquiring unit 2010 and the processing unit 2020 may be configured to execute the training method 700 , the method 800 or the method 900 of the image fusion model according to the embodiment of the present application.
  • the obtaining unit 2010 is configured to obtain at least one training sample, where the training sample includes a first color image, a first infrared image, a target color image and a target infrared image, all taken of the same scene; the same scene means that the similarity between any two of the first color image, the first infrared image, the target color image and the target infrared image is greater than the first threshold. The first color image and the target color image are images formed by the scene's reflection of visible light, and the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band. The signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image.
  • the processing unit 2020 is configured to: take the first color image and the first infrared image as the input of the image fusion model, and train the image fusion model with the goal of making the value of the loss function less than the fourth threshold, to obtain a trained image fusion model; the loss function includes a first loss function, the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image, and the target fusion image is determined according to the target color image and the target infrared image.
  • the processing unit 2020 is specifically configured to: take the first fusion weight, the first color image and the first infrared image as the input of the image fusion model, and train the image fusion model with the goal of making the value of the loss function less than the fourth threshold, to obtain a trained image fusion model.
  • the first fusion weight is used to weight the first color image and the first infrared image.
  • the target fusion image is determined based on the first fusion weight, the target color image and the target infrared image.
  • the first fusion weight corresponds to part or all of the images output by the image fusion model.
  • the processing unit 2020 is further configured to: take the first background reference image, the first color image and the first infrared image as the input of the image fusion model, and train the image fusion model with the goal of making the value of the loss function less than the fourth threshold, to obtain a trained image fusion model; the similarity between the first background reference image and the first color image is greater than the second threshold.
  • the loss function further includes a second loss function, where the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
  • the target fusion image is an image of a brightness channel
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the brightness channel of the image output by the image fusion model and the target fusion image.
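The described two-part loss can be sketched as follows, using an L1 distance as a stand-in for the unspecified difference measure; the balancing factor `lam` is an illustrative assumption:

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def total_loss(out_y, out_img, target_fusion_y, target_color, lam=1.0):
    """First loss: luminance channel of the model output vs. the
    (luminance-channel) target fusion image. Second loss: model output
    vs. the clean target color image. lam balances the two terms."""
    return l1(out_y, target_fusion_y) + lam * l1(out_img, target_color)
```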
  • FIG. 21 is a schematic block diagram of an image fusion apparatus according to an embodiment of the present application.
  • the image fusion apparatus 3000 shown in FIG. 21 includes an acquisition unit 3010 and a processing unit 3020 .
  • the acquiring unit 3010 and the processing unit 3020 may be configured to execute the image fusion method 1000 , the method 1100 or the method 1200 in the embodiments of the present application.
  • the acquiring unit 3010 is configured to acquire the color image to be processed, the infrared image and the background reference image; the infrared image and the color image to be processed are taken of the same scene, where the same scene means that the similarity between the color image to be processed and the infrared image is greater than the first threshold; the color image to be processed is an image formed by the scene's reflection of visible light, and the infrared image is an image formed by the scene's reflection of light in the infrared band.
  • the processing unit 3020 is configured to input the color image to be processed, the infrared image and the background reference image into the trained image fusion model for feature extraction, and perform image fusion based on the extracted features to obtain a fused image; the similarity between the background reference image and the color image to be processed is greater than the second threshold.
  • the processing unit 3020 is further configured to: obtain a fusion weight, and input the fusion weight into the image fusion model; wherein, the fusion weight is used to weight the color image and infrared image to be processed.
  • the fusion weight corresponds to some or all of the fusion images.
  • the color images to be processed include N frames of color images
  • the infrared images include N frames of infrared images corresponding to the N frames of color images
  • the background reference images corresponding to the N frames of color images are determined based on the background reference images of M frames of color images among the N frames, where M is a positive integer, N is a positive integer greater than 1, and N>M.
  • the image fusion model includes M first encoders, N second encoders, and N decoders
  • the processing unit 3020 is specifically configured to: extract the features of the N frames of color images and the features of the N frames of infrared images; respectively extract the features of the M background reference images corresponding to the M frames of color images; and respectively reconstruct N fused images according to the features of the N frames of color images, the features of the N frames of infrared images, and the features of the M background reference images.
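The data flow of the M first encoders, N second encoders and N decoders can be sketched as follows; all callables are placeholders for the real networks, and the closest-frame-number rule mirrors the background-reference selection described earlier:

```python
def run_fusion_model(color_frames, ir_frames, bg_refs,
                     first_enc, second_enc, decoder):
    """N second-encoder passes and M first-encoder passes (M < N):
    each decoder receives the features of one color frame, its
    matching infrared frame, and the background reference whose
    source frame number is closest."""
    bg_feats = {idx: first_enc(img) for idx, img in bg_refs.items()}  # M passes
    fused = []
    for i in range(len(color_frames)):
        feats = second_enc(color_frames[i], ir_frames[i])
        nearest = min(bg_feats, key=lambda idx: abs(idx - i))  # closest frame number
        fused.append(decoder(feats, bg_feats[nearest]))
    return fused

# identity stand-ins make the routing visible
out = run_fusion_model(["c0", "c1", "c2", "c3"], ["i0", "i1", "i2", "i3"],
                       {0: "b0", 2: "b2"},
                       first_enc=lambda b: b,
                       second_enc=lambda c, i: (c, i),
                       decoder=lambda f, bg: (f, bg))
```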
  • the background reference image is obtained by any of the following methods: obtaining the background reference image according to multiple frames before the color image to be processed; using the long exposure frame before the color image to be processed as the background reference image, the long exposure frame is the frame obtained when the exposure duration is greater than the third threshold; the result of temporal noise reduction of the color image to be processed is used as the background reference image; or the frame before the color image to be processed corresponds to The fused image is used as the background reference image.
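The first listed option (building the background reference from the frames preceding the color image to be processed) can be sketched as a simple temporal average; this averaging choice is illustrative, since the patent leaves the exact multi-frame combination open:

```python
def background_from_history(frames):
    """Average the frames preceding the color image to be processed
    (a simple form of temporal noise reduction) to form the background
    reference; frames are equally sized single-channel images."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames) for c in range(w)]
            for r in range(h)]
```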
  • the trained image fusion model is obtained by taking the first color image and the first infrared image as the input of the image fusion model and training the image fusion model with the goal of making the value of the loss function less than the fourth threshold.
  • the loss function includes a first loss function, the first loss function is used to indicate the difference between the image output by the image fusion model and the target fusion image, and the target fusion image is determined according to the target color image and the target infrared image.
  • the first color image, the first infrared image, the target color image and the target infrared image are taken of the same scene, where the same scene means that the similarity between any two of the first color image, the first infrared image, the target color image and the target infrared image is greater than the first threshold; the signal-to-noise ratio of the target color image is higher than that of the first color image, and the signal-to-noise ratio of the target infrared image is higher than that of the first infrared image.
  • the loss function further includes a second loss function, where the second loss function is used to indicate the difference between the target color image and the image output by the image fusion model.
  • the target fusion image is an image of a brightness channel
  • the difference between the image output by the image fusion model and the target fusion image is the difference between the brightness channel of the image output by the image fusion model and the target fusion image.
  • apparatus 2000 and apparatus 3000 are embodied in the form of functional units.
  • unit here can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of the two that realizes the above-mentioned functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (e.g., a shared processor, a dedicated processor, or a group processor) and a memory, combinational logic, and/or other suitable components that support the described functions.
  • the units of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 22 is a schematic diagram of a hardware structure of an apparatus for training an image fusion model provided by an embodiment of the present application.
  • the image fusion model training apparatus 4000 shown in FIG. 22 includes a memory 4001 , a processor 4002 , a communication interface 4003 and a bus 4004 .
  • the memory 4001 , the processor 4002 , and the communication interface 4003 are connected to each other through the bus 4004 for communication.
  • the memory 4001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 4001 can store a program.
  • the processor 4002 and the communication interface 4003 are used to execute each step of the image fusion model training method in the embodiment of the present application.
  • the processor 4002 can execute the above method 700, 800 or 900.
  • the processor 4002 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processor, or one or more integrated circuits for executing related functions.
  • the program is used to realize the functions required to be performed by the units in the training apparatus of the image fusion model of the embodiment of the present application, or to execute the training method of the image fusion model of the method embodiment of the present application.
  • the processor 4002 may also be an integrated circuit chip with signal processing capability. For example, it may be the chip shown in FIG. 3 .
  • each step of the training method of the image fusion model of the present application can be completed by the hardware integrated logic circuit in the processor 4002 or the instructions in the form of software.
  • the above-mentioned processor 4002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 4001, and the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions required to be performed by the units included in the apparatus for training the image fusion model of the embodiment of the present application, or executes the training method of the image fusion model of the method embodiment of the present application.
  • the communication interface 4003 implements communication between the device 4000 and other devices or a communication network using a transceiver device such as, but not limited to, a transceiver.
  • for example, training data (e.g., the first color image, the first infrared image, the target color image and the target infrared image in the method 700) may be acquired through the communication interface 4003.
  • Bus 4004 may include a pathway for communicating information between various components of device 4000 (eg, memory 4001, processor 4002, communication interface 4003).
  • the acquisition unit 2010 in the image fusion model training apparatus 2000 is equivalent to the communication interface 4003 in the image fusion model training apparatus 4000
  • the processing unit 2020 may be equivalent to the processor 4002 .
  • FIG. 23 is a schematic diagram of a hardware structure of an image fusion apparatus provided by an embodiment of the present application.
  • the image fusion apparatus 5000 shown in FIG. 23 (the apparatus 5000 may specifically be a computer device) includes a memory 5001 , a processor 5002 , a communication interface 5003 and a bus 5004 .
  • the memory 5001, the processor 5002, and the communication interface 5003 realize the communication connection with each other through the bus 5004.
  • the memory 5001 may be a ROM, a static storage device, a dynamic storage device or a RAM.
  • the memory 5001 may store a program. When the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 and the communication interface 5003 are used to execute each step of the image fusion method of the embodiment of the present application.
  • the processor 5002 may adopt a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute a related program, so as to implement the functions required to be performed by the units in the image fusion apparatus of the embodiment of the present application, or to execute the image fusion method of the method embodiment of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the image fusion method of the present application can be completed by an integrated logic circuit of hardware in the processor 5002 or instructions in the form of software.
  • the above-mentioned processor 5002 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 5001, and the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the image fusion apparatus of the embodiment of the present application, or performs the image fusion method of the method embodiment of the present application.
  • the communication interface 5003 implements communication between the apparatus 5000 and other devices or a communication network using a transceiving device such as, but not limited to, a transceiver.
  • input data (such as the color image and infrared image to be processed in the embodiment of the present application) can be acquired through the communication interface 5003 .
  • the bus 5004 may include a pathway for communicating information between the various components of the device 5000 (eg, the memory 5001, the processor 5002, the communication interface 5003).
  • the acquisition unit 3010 in the image fusion apparatus 3000 is equivalent to the communication interface 5003 in the image fusion apparatus 5000 ; the processing unit 3020 in the image fusion apparatus 3000 may be equivalent to the processor 5002 .
  • although the apparatuses 4000 and 5000 shown in FIG. 22 and FIG. 23 only show a memory, a processor, and a communication interface, in the specific implementation process, those skilled in the art should understand that the apparatuses 4000 and 5000 also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatuses 4000 and 5000 may further include hardware devices for implementing other additional functions. In addition, those skilled in the art should understand that the apparatuses 4000 and 5000 may include only the devices necessary for implementing the embodiments of the present application, and need not include all the devices shown in FIG. 22 or FIG. 23.
  • the apparatus 4000 is equivalent to the training device 120 in FIG. 1.
  • the apparatus 5000 is equivalent to the execution apparatus 110 in FIG. 1 .
  • Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces, or as indirect coupling or communication connections between devices or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include media that can store program code, such as a Universal Serial Bus flash disk (USB flash disk, UFD; a UFD may also be referred to as a U disk or USB flash drive), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

An image fusion method, a training method for an image fusion model, and an apparatus, relating to the field of artificial intelligence, and specifically to the field of computer vision. The image fusion method comprises: acquiring a color image to be processed, an infrared image, and a background reference image (S1110), the infrared image and the color image to be processed being captured for the same scene; and inputting the color image to be processed, the infrared image, and the background reference image into an image fusion model for feature extraction, and performing image fusion on the basis of the extracted features to obtain a fused image (S1120). The method can improve the image quality of the fused image while ensuring that the colors of the fused image are accurate and natural.

Description

图像融合方法、图像融合模型的训练方法和装置 技术领域
本申请实施例涉及计算机视觉领域,尤其涉及一种图像融合方法、图像融合模型的训练方法及装置。
背景技术
计算机视觉是各个应用领域,如制造业、检验、文档分析、医疗诊断,和军事等领域中各种智能/自主系统中不可分割的一部分,它是一门关于如何运用照相机/摄像机和计算机来获取我们所需的,被拍摄对象的数据与信息的学问。形象地说,就是给计算机安装上眼睛(照相机/摄像机)和大脑(算法)用来代替人眼对目标进行识别、跟踪和测量等,从而使计算机能够感知环境。因为感知可以看作是从感官信号中提取信息,所以计算机视觉也可以看作是研究如何使人工系统从图像或多维数据中“感知”的科学。总的来说,计算机视觉就是用各种成象系统代替视觉器官获取输入信息,再由计算机来代替大脑对这些输入信息完成处理和解释。计算机视觉的最终研究目标就是使计算机能像人那样通过视觉观察和理解世界,具有自主适应环境的能力。
成像质量的高低对图像处理效果具有显著影响。随着成像技术的发展,当前拍摄设备在照度较高的情况下,例如白天的理想情况下,能够获得较好的成像结果。然而,在低照度的场景下,例如阴雨天气或夜间场景,拍摄的图像或视频存在分辨率低、对比度差、图像细节丢失等问题,当前的设备通常采用近红外补光的方式来提升低照度场景下的成像质量,但由于其成像特性,红外图像无法还原物体的真实色彩。
由于彩色图像和红外图像存在一定的互补性,通过融合彩色图像和红外图像能够得到融合图像,以提高成像质量。然而,目前的融合方法无法保证融合效果,输出图像中丢失大量细节,影响了输出图像的质量。
因此,如何提高融合图像的质量成为一个亟待解决的问题。
发明内容
本申请提供一种图像融合方法、图像融合模型的训练方法及装置,能够使融合图像包含更多图像细节,提高融合图像的质量,同时保证融合图像颜色准确自然。
第一方面,提供一种图像融合方法,该方法包括:获取待处理的彩色图像、红外图像以及背景参考图像,红外图像和待处理的彩色图像是针对同一场景拍摄的,同一场景指的是待处理的彩色图像和红外图像之间的相似度大于第一阈值;待处理的彩色图像为场景对可见光的反射形成的影像,红外图像为场景对红外波段的光的反射形成的影像;将待处理的彩色图像、红外图像和背景参考图像输入训练好的图像融合模型中进行特征提取,基于提取的特征进行图像融合,以得到融合图像;其中,背景参考图像与待处理的彩色图像之间的相似度大于第二阈值。
根据本申请实施例的方案,彩色图像拥有丰富的色彩信息,红外图像拥有更多的纹理信息,两者融合得到的融合图像具有自然的色彩以及丰富的纹理信息,显著提高了融合图像的前景质量;通过增加背景参考图像,能够解决红外图像手电筒效应可能导致的背景模糊问题,很大程度上提升输出图像的背景质量,即同时增强输出图像的前景区域质量和背景区域质量,实现全画面的图像增强。
本申请实施例中的相似度可以为图像纹理相似度。例如,待处理的彩色图像和红外图像之间的相似度可以为待处理的彩色图像和红外图像之间的图像纹理相似度。背景参考图像与待处理的彩色图像之间的相似度可以为背景参考图像与待处理的彩色图像之间的图像纹理相似度。
背景参考图像中的背景区域与待处理的彩色图像中的背景区域相同。背景参考图像与待处理的彩色图像之间的相似度大于第二阈值,可以为,背景参考图像的背景区域与待处理的彩色图像的背景区域之间的相似度大于第二阈值。背景区域可以通过现有技术确定,本申请实施例对此不作限定。
针对同一场景拍摄可以理解为红外图像和待处理的彩色图像的画面内容相同。例如,红外图像可以是与待处理的彩色图像在同一时刻对同一区域拍摄的红外图像。
背景参考图像可以以彩色图的形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。
结合第一方面,在第一方面的某些实现方式中,该方法还包括:获取融合权重,将融合权重输入图像融合模型中;其中,融合权重用于对待处理的彩色图像和红外图像进行加权。
也就是说,融合权重用于调整待处理的彩色图像和红外图像在融合图像中的融合比例。
根据本申请实施例的方案,使用相同的图像融合模型进行融合得到的融合图像无法满足不同应用场景的融合要求,通过引入融合权重,能够调整彩色图像和红外图像的融合比例,有利于应用于不同的应用场景。也就是说无需针对不同的应用场景分别训练多个图像融合模型,仅通过调整融合权重,即可应用于不同的场景,提高了模型使用的自由度。
结合第一方面,在第一方面的某些实现方式中,融合权重对应部分或全部的融合图像。
其中,融合权重对应全部的融合图像可以理解为,在整个融合图像中,融合权重仅有一个。在该融合图像中的任意区域中,待处理的彩色图像和红外图像的融合比例是相同的。
融合权重对应部分融合图像可以理解为,融合权重对应融合图像中的一个区域。在该情况下,融合权重的数量可以为多个,多个融合权重分别对应融合图像中的不同区域。
根据本申请实施例的方案,不同区域对应不同的融合权重,以满足同一图像中的不同区域对图像融合的要求,有利于提高输出图像的图像质量。
结合第一方面,在第一方面的某些实现方式中,融合权重大于等于0,且小于等于1,红外图像在融合图像中的比例与融合权重呈正相关关系。
融合权重越大,红外图像在融合图像中的比例越大,即融合图像中融合的红外信息越多。
结合第一方面,在第一方面的某些实现方式中,待处理的彩色图像包括N帧彩色图像,红外图像包括N帧彩色图像对应的N帧红外图像,N帧彩色图像对应的背景参考图像是根据N帧彩色图像中的M帧彩色图像的背景参考图像确定的,M为正整数,N为大于1的正整数,N>M。
具体地,视频中的任一帧彩色图像和红外图像均可以采用该方法进行图像融合。
示例性地,N帧彩色图像对应的N帧红外图像可以是与N帧彩色图像在同一时刻对同一区域进行拍摄的情况下获得的。也就是说N帧彩色图像与N帧红外图像是一一对应的。
根据本申请实施例的方案,复用之前的帧的背景参考图像的特征,无需在每次融合的过程中均提取背景参考图像的特征,减少了计算量,能够在保证成像质量的同时减少硬件开销,在设备的计算资源有限的情况下,仍然可以实现图像融合。
结合第一方面,在第一方面的某些实现方式中,分别提取N帧彩色图像的特征和N帧红外图像的特征;分别提取M帧彩色图像对应的M个背景参考图像的特征;根据N帧彩色图像的特征和N帧红外图像的特征以及M个背景参考图像的特征分别重建得到N个融合图像。
示例性地,N帧彩色图像和N帧红外图像可以同时输入图像融合模型,这样,可以同时提取N帧彩色图像的特征和N帧红外图像的特征,进一步提高处理速度。
应理解,N帧彩色图像和N帧红外图像也可以依次输入图像融合模型,依次提取N帧彩色图像的特征和N帧红外图像的特征。
根据本申请实施例的方案,同时对多帧图像进行融合,提高处理速度,且复用背景参考图像的特征,减少了背景参考图像的特征的提取过程中的计算量,降低硬件开销。
结合第一方面,在第一方面的某些实现方式中,背景参考图像是通过以下任一方式获得的:根据待处理的彩色图像之前的多帧得到背景参考图像;将待处理的彩色图像之前的长曝光帧作为背景参考图像,长曝光帧为在曝光时长大于第三阈值的情况下得到的帧;将待处理的彩色图像进行时域降噪后的结果作为背景参考图像;或者将待处理的彩色图像之前的帧的融合图像作为背景参考图像。
结合第一方面,在第一方面的某些实现方式中,训练好的图像融合模型是通过以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练得到的;损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值,目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比。
根据本申请实施例的方案,通过目标彩色图像和目标红外图像确定目标融合图像,基于目标融合图像训练图像融合模型,使得图像融合模型能够充分利用红外信息,有利于在输出图像中融合更多的纹理信息,保留更多的图像细节。
结合第一方面,在第一方面的某些实现方式中,损失函数还包括第二损失函数,第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
根据本申请实施例的方案,该损失约束图像融合模型输出的图像与目标彩色图像尽可能相似,既能保证降噪的效果,又保证输出的图像颜色与目标彩色图像一致,避免出现输出的图像颜色错误的问题。此外,降噪任务和融合任务协同执行,减少信息损失,既能够保证融合图像中保留丰富的纹理细节,又能够保证融合图像达到较高的分辨率以及真实的彩色信息。
结合第一方面,在第一方面的某些实现方式中,目标融合图像为亮度通道的图像,图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
根据本申请实施例的方案,在亮度通道层面上进行训练,有利于融合更多的纹理特征,减少其他因素对图像融合过程的影响。
第二方面,提供一种图像融合模型的训练方法,该训练方法包括:获取至少一个训练样本,训练样本包括第一彩色图像、第一红外图像、目标彩色图像和目标红外图像,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值,第一彩色图像和目标彩色图像为场景对可见光的反射形成的影像,第一红外图像和目标红外图像为场景对红外波段的光的反射形成的影像;目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比;以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型;其中,损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的。
其中,彩色图像也可以称为可见光图像。
第一彩色图像与第一红外图像是一一对应的。例如,第一红外图像可以是与第一彩色图像在同一时刻拍摄的红外图像。
目标彩色图像与目标红外图像是一一对应的。例如,目标红外图像是与目标彩色图像在同一时刻的红外图像。
针对同一场景拍摄可以理解为图像中的画面内容相同,例如在同一位置拍摄的同样的场景。
本申请实施例中的相似度可以为图像纹理相似度。例如,第一彩色图像、第一红外图像、目标彩色图 像和目标红外图像中的任意两个图像之间的相似度可以为第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的图像纹理相似度。
在本申请实施例的方案中,彩色图像拥有丰富的色彩信息,红外图像拥有更多的纹理信息,两者融合得到的融合图像具有自然的色彩以及丰富的纹理信息,根据目标彩色图像和目标红外图像确定目标融合图像,基于目标融合图像训练图像融合模型,使得图像融合模型能够充分利用红外信息,有利于在输出图像中融合更多的纹理信息,保留更多的图像细节。
结合第二方面,在第二方面的某些实现方式中,以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,包括:以第一融合权重、第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,第一融合权重用于对第一彩色图像和第一红外图像进行加权,目标融合图像是根据第一融合权重、目标彩色图像和目标红外图像确定的。
也就是说,第一融合权重用于调整第一彩色图像和第一红外图像在图像融合模型输出的图像中的融合比例。
可选地,根据第一融合权重、目标彩色图像和目标红外图像确定目标融合图像,包括:根据目标彩色图像和目标红外图像确定监督图像,根据第一融合权重对监督图像和目标彩色图像进行加权。
也就是说,根据第一融合权重调整监督图像和目标彩色图像在目标融合图像中所占的比例。
根据本申请实施例的方案,使用相同的图像融合模型进行融合得到的融合图像无法满足不同应用场景的融合要求,通过引入融合权重,能够调整彩色图像和红外图像的融合比例,有利于应用于不同的应用场景。也就是说无需针对不同的应用场景分别训练多个图像融合模型,仅通过调整融合权重,即可应用于不同的场景,提高了模型使用的自由度。
结合第二方面,在第二方面的某些实现方式中,第一融合权重对应部分或全部的图像融合模型输出的图像。
其中,第一融合权重对应全部的图像融合模型输出的图像可以理解为,在整个图像融合模型输出的图像中,第一融合权重仅有一个。在该图像融合模型输出的图像中的任意区域中,第一彩色图像和第一红外图像的融合比例是相同的。
第一融合权重对应部分图像融合模型输出的图像可以理解为,第一融合权重对应图像融合模型输出的图像中的一个区域。在该情况下,第一融合权重的数量可以为多个,多个第一融合权重分别对应图像融合模型输出的图像中的不同区域。第一融合权重可以理解为局部权重。局部权重用于指示在图像融合过程中局部区域的融合权重。在融合过程中,不同的区域可以采用不同的第一融合权重。
本申请实施例中,不同区域对应不同的融合权重,以满足同一图像中的不同区域对图像融合的要求,有利于提高输出图像的图像质量。
第一融合权重可以以参数形式输入图像融合模型中,也可以以融合权重图的形式输入图像融合模型中,本申请对此不做限定。
以融合权重图的形式表示第一融合权重,能够降低调整第一融合权重的复杂度。在有多个第一融合权重情况下,通过融合权重图更有利于表示第一融合权重对应的区域。尤其在第一融合权重对应的区域为不规则的形状的情况下,融合权重图的形式更有利于表示第一融合权重对应的区域。
结合第二方面,在第二方面的某些实现方式中,以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,包括:以第一背景参考图、第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,第一背景参考图像与第一彩色图像之间的相似度大于第二阈值。
其中,第一背景参考图像中的背景区域与第一彩色图像中的背景区域相同。第一背景参考图像与第一彩色图像之间的相似度大于第二阈值,可以为,第一背景参考图像的背景区域与第一彩色图像的背景区域之间的相似度大于第二阈值。背景区域可以通过现有技术确定,本申请实施例对此不作限定。
示例性地,第一背景参考图像可以以彩色图的形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。
根据本申请实施例的方案,通过将背景参考图像增加为图像融合模型的输入,并基于此训练图像融合模型,能够解决红外图像手电筒效应可能导致的背景模糊问题,很大程度上提升输出图像的背景质量,即同时增强输出图像的前景区域质量和背景区域质量,实现全画面的图像增强。
结合第二方面,在第二方面的某些实现方式中,损失函数还包括第二损失函数,第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
根据本申请实施例的方案,该损失约束图像融合模型输出的图像与目标彩色图像尽可能相似,既能保证降噪的效果,又保证输出的图像颜色与目标彩色图像一致,避免出现输出的图像颜色错误的问题。此外,降噪任务和融合任务协同执行,减少信息损失,既能够保证融合图像中保留丰富的纹理细节,又能够保证融合图像达到较高的分辨率以及真实的彩色信息。
结合第二方面,在第二方面的某些实现方式中,目标融合图像为亮度通道的图像,图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
根据本申请实施例的方案,在亮度通道层面上进行训练,有利于融合更多的纹理特征,减少其他因素对图像融合过程的影响。
结合第二方面,在第二方面的某些实现方式中,目标融合图像满足如下公式:
$$y_{fuse\_adj}=y_{fuse}\times IN\_FuseMap+(1-IN\_FuseMap)\times y_{gt\_Vis}$$
其中，$y_{fuse\_adj}$ 表示目标融合图像，$y_{fuse}$ 表示由目标彩色图像的亮度通道和目标红外图像的亮度通道得到的融合图像，IN_FuseMap表示融合权重图，$y_{gt\_Vis}$ 表示目标彩色图像的亮度通道。融合权重图中的不同区域上的值分别指示图像的不同区域对应的权重。
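作为示意，上述按融合权重图加权的过程可以用如下Python（NumPy）代码片段表示。其中的数组取值仅为说明用途，并非本申请的具体实现：

```python
import numpy as np

def fuse_with_weight_map(y_fuse, y_gt_vis, fuse_map):
    # 按区域权重融合：y_fuse_adj = y_fuse × IN_FuseMap + (1 - IN_FuseMap) × y_gt_Vis
    fuse_map = np.clip(fuse_map, 0.0, 1.0)  # 融合权重取值范围为[0, 1]
    return y_fuse * fuse_map + (1.0 - fuse_map) * y_gt_vis

# 2×2示例：权重为1时完全保留y_fuse，权重为0时完全保留y_gt_Vis
y_fuse = np.array([[0.8, 0.8], [0.8, 0.8]])
y_vis = np.array([[0.2, 0.2], [0.2, 0.2]])
w = np.array([[1.0, 0.0], [0.5, 0.5]])
y_adj = fuse_with_weight_map(y_fuse, y_vis, w)
```

由该示例可见，权重图中不同区域的值直接决定了该区域中红外融合信息与目标彩色图像信息所占的比例。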
第三方面,提供一种图像融合装置,该装置包括用于执行第一方面中的任意一种实现方式中的方法的模块/单元。
第四方面,提供一种图像融合模型的训练装置,该装置包括用于执行第二方面中的任意一种实现方式中的方法的模块/单元。
第五方面,提供一种图像融合装置,该装置包括:存储器,用于存储程序;处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行第一方面中的任意一种实现方式中的方法。
第六方面,提供一种图像融合模型的训练装置,该装置包括:存储器,用于存储程序;处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行第二方面中的任意一种实现方式中的方法。
第七方面,提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行第一方面或第二方面中的任意一种实现方式中的方法。
第八方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第二方面中的任意一种实现方式中的方法。
第九方面,提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行上述第一方面或第二方面中的任意一种实现方式中的方法。
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行上述第一方面或第二方面中 的任意一种实现方式中的方法。
上述芯片具体可以是现场可编程门阵列(field-programmable gate array,FPGA)或者专用集成电路(application-specific integrated circuit,ASIC)。
第十方面,提供一种电子设备,该电子设备包括上述第三方面至第四方面中的任意一种实现方式中的装置。
附图说明
下面对本申请实施例用到的附图进行介绍。
图1是本申请实施例提供的系统架构的结构示意图;
图2是本申请实施例提供的一种卷积神经网络的示意图;
图3是本申请实施例提供的一种芯片硬件结构示意图;
图4是本申请实施例提供的一种系统架构的示意图;
图5是夜间拍摄的彩色图像和红外图像的示意图;
图6是本申请实施例提供的一种获取彩色图像和红外图像的装置的示意图;
图7是本申请实施例提供的另一种获取彩色图像和红外图像的装置的示意图;
图8是本申请实施例提供的再一种获取彩色图像和红外图像的装置的示意图;
图9是本申请实施例提供的一种图像融合装置的示意性框图;
图10是本申请实施例提供的一种图像融合模型的训练方法的示意性流程图;
图11是本申请实施例提供的一个训练样本的示意图;
图12是本申请实施例提供的背景参考图像的示意图;
图13是本申请实施例提供的一种图像融合模型的示意性框图;
图14是本申请实施例提供的另一种图像融合模型的示意性框图;
图15是本申请实施例提供的一种获取融合权重的方法的示意图;
图16是本申请实施例提供的一种图像融合方法的示意性流程图;
图17是本申请实施例提供的不同融合权重下的融合图像的示意图;
图18是本申请实施例提供的融合结果的示意图;
图19是本申请实施例提供的不同方法得到的融合图像的效果对比图;
图20是本申请实施例提供的图像融合模型的训练装置的示意性框图;
图21是本申请实施例提供的图像融合装置的示意性框图;
图22是本申请实施例提供的图像融合模型的训练装置的示意性框图;
图23是本申请实施例提供的图像融合装置的示意性框图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例提供的图像融合方法能够应用在视频监控、平安城市、夜间拍摄以及需要提升图像质量的场景。具体而言,本申请实施例的图像融合方法能够应用在视频监控和夜间拍摄中,下面分别对视频监控和夜间拍摄进行简单的介绍。
视频监控:
视频监控是当前城市综合治安和交通监管的重要手段。随着成像技术的发展,当前监控设备在白天理想情况下,可获得较好的成像效果。但在一些不理想的情况下,例如阴雨天气或夜间等照度不佳的场景下,采集到的监控画面存在分辨率低、对比度差以及图像细节丢失等问题。
本申请实施例提供的方法能够显著提高采集到的监控视频的成像质量,更好地满足监控人员对监控视频的清晰度要求,便于监控人员查看,获取有价值的信息。
夜间拍摄:
当用户需要在夜间拍摄照片或视频时,通过提高图像融合的方式,能够提高夜间成像的质量,提升用户体验。
利用本申请实施例的方法,能够显著提高夜间成像的质量,满足用户对夜间拍摄的需求,节省了用户后期处理的时间,提高了用户体验。
本申请实施例提供的方法和装置还可以用于扩充训练数据库,如图1所示执行设备110的I/O接口112可以将经执行设备处理过的图像(如融合图像)和用户输入的待处理的彩色图像和红外图像一起作为训练数据对发送给数据库130,以使得数据库130维护的训练数据更加丰富,从而为训练设备120的训练工作提供更丰富的训练数据。
下面从模型训练侧和模型应用侧对本申请提供的方法进行描述:
本申请实施例提供的图像融合模型的训练方法,涉及计算机视觉的处理,具体可以应用于数据训练、机器学习、深度学习等数据处理方法,对训练数据(如本申请中的第一彩色图像、目标彩色图像、第一红外图像和目标红外图像)进行符号化和形式化的智能信息建模、抽取、预处理、训练等,最终得到训练好的图像融合网络;并且,本申请实施例提供的图像融合方法可以运用上述训练好的图像融合网络,将输入数据(如本申请中的待处理的彩色图像和红外图像)输入到所述训练好的图像融合网络中,得到输出数据(如本申请中的融合图像)。需要说明的是,本申请实施例提供的图像融合网络的训练方法和图像融合方法是基于同一个构思产生的发明,也可以理解为一个系统中的两个部分,或一个整体流程的两个阶段:如模型训练阶段和模型应用阶段。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以为:
$$h_{W,b}(x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于对神经网络中获取到的特征进行非线性变换,将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
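作为示意，单个神经单元的运算可以用如下Python（NumPy）代码片段表示，其中以sigmoid作为激活函数，输入与权重的取值仅为说明用途：

```python
import numpy as np

def neuron(xs, ws, b):
    # 单个神经单元：输出为 f(sum_s W_s * x_s + b)，f 取 sigmoid
    z = float(np.dot(ws, xs)) + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid激活函数

# 示例：z = 0.5*1.0 + (-0.5)*2.0 + 0.5 = 0，sigmoid(0) = 0.5
out = neuron(xs=[1.0, 2.0], ws=[0.5, -0.5], b=0.5)
```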
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$
其中，$\vec{x}$ 是输入向量，$\vec{y}$ 是输出向量，$\vec{b}$ 是偏移向量，$W$ 是权重矩阵（也称系数），$\alpha(\cdot)$ 是激活函数。每一层仅仅是对输入向量 $\vec{x}$ 经过如此简单的操作得到输出向量 $\vec{y}$。由于DNN层数多，系数 $W$ 和偏移向量 $\vec{b}$ 的数量也比较多。这些参数在DNN中的定义如下所述：以系数 $W$ 为例：假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为 $W_{24}^{3}$。上标3代表系数 $W$ 所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。综上，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为 $W_{jk}^{L}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
卷积神经网络(convolutional neural network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器，该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中，一个神经元可以只与部分邻层神经元连接。一个卷积层中，通常包含若干个特征平面，每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重，这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化，在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外，共享权重带来的直接好处是减少卷积神经网络各层之间的连接，同时又降低了过拟合的风险。
(4)损失函数
在训练深度神经网络的过程中，因为希望深度神经网络的输出尽可能的接近真正想要预测的值，所以可以通过比较当前网络的预测值和真正想要的目标值，再根据两者之间的差异情况来更新每一层神经网络的权重向量（当然，在第一次更新之前通常会有初始化的过程，即为深度神经网络中的各层预先配置参数），比如，如果网络的预测值高了，就调整权重向量让它预测低一些，不断地调整，直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此，就需要预先定义“如何比较预测值和目标值之间的差异”，这便是损失函数（loss function）或目标函数（objective function），它们是用于衡量预测值和目标值的差异的重要方程。其中，以损失函数举例，损失函数的输出值（loss）越高表示差异越大，那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(5)像素值
图像的像素值可以是一个红绿蓝(RGB)颜色值,像素值可以是表示颜色的长整数。例如,像素值为256*Red+100*Green+76*Blue,其中,Blue代表蓝色分量,Green代表绿色分量,Red代表红色分量。各个颜色分量中,数值越小,亮度越低,数值越大,亮度越高。对于灰度图像来说,像素值可以是灰度值。
(6)YUV
YUV是一种颜色空间,Y表示明亮度(Luminance或Luma),也就是灰阶值;“U”和“V”表示色度(Chrominance或Chroma),用于描述图像色彩及饱和度,用于指定像素的颜色。“U”和“V”是构成彩色的两个分量。采用YUV色彩空间的重要性是它的亮度信号Y和色度信号U、V是分离的。如果只有Y信号分量而没有U、V信号分量,那么这样表示的图像就是黑白灰度图像。亮度信号也可以称为亮度通道,色度信号也可以称为色度通道。
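作为示意，RGB像素的亮度分量Y可以按如下Python代码片段计算。此处采用常见的BT.601亮度加权系数，仅为一种示例性取法，并非本申请限定的具体转换方式：

```python
def rgb_to_y(r, g, b):
    # BT.601亮度加权：Y = 0.299R + 0.587G + 0.114B
    return 0.299 * r + 0.587 * g + 0.114 * b

y_white = rgb_to_y(255, 255, 255)  # 纯白像素的亮度最大
y_red = rgb_to_y(255, 0, 0)        # 纯红像素仅由R分量贡献亮度
```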
(7)编码器、解码器
编码器(encoder)用于提取输入图像的特征。具体地,编码器可以采用神经网络,例如,卷积神经网 络。
解码器(decoder)用于将提取的特征恢复为图像。具体地,解码器可以采用神经网络,例如,卷积神经网络。
下面介绍本申请实施例提供的系统架构。
参见附图1,本申请实施例提供了一种系统架构100。如系统架构100所示,数据采集设备160用于采集训练数据。示例性地,本申请实施例中训练数据可以包括:第一彩色图像、目标彩色图像、第一红外图像和目标红外图像;在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。
下面对训练设备120基于训练数据得到目标模型/规则101进行描述。示例性地,训练设备120对第一彩色图像和第一红外图像进行处理,将输出的图像与目标融合图像进行对比,直到训练设备120输出的图像与目标融合图像的差值小于一定的阈值,从而完成目标模型/规则101的训练。
该目标模型/规则101能够用于实现本申请实施例提供的图像融合方法,即,将待处理的图像,例如待处理的彩色图像和红外图像通过相关预处理后输入该目标模型/规则101,即可得到融合后的图像。本申请实施例中的目标模型/规则101具体可以为神经网络。需要说明的是,在实际的应用中,数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图1所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在附图1中,执行设备110配置有(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:待处理的彩色图像和红外图像。
预处理模块113用于根据I/O接口112接收到的输入数据(如所述待处理的彩色图像和红外图像)进行预处理,在本申请实施例中,预处理模块113可以用于基于待处理的彩色图像或红外图像获得融合权重。
示例性地,预处理模块114可以用于获得背景参考图像。
在本申请实施例中,也可以没有预处理模块113和预处理模块114,而直接采用计算模块111对输入数据进行处理。
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果,如上述得到的融合后的图像返回给客户设备140,从而提供给用户。
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在附图1中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直 接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
值得注意的是,附图1仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在附图1中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
如图1所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是本申请中的神经网络,具体的,本申请中的神经网络可以包括CNN或深度卷积神经网络(deep convolutional neural networks,DCNN)等等。
由于CNN是一种非常常见的神经网络,下面结合图2重点对CNN的结构进行详细的介绍。如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
如图2所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及全连接层(fully connected layer)230。
卷积层/池化层220:
卷积层:
如图2所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图2中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
全连接层230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用全连接层230来生成一个或者一组所需要的类的数量的输出。因此,在全连接层230中可以包括多层隐含层(如图2所示的231、232至23n),该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等……
在全连接层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图2由210至240方向的传播为前向传播)完成,反向传播(如图2由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图2所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,仅包括图2中所示的网络结构的一部分,比如,本申请实施例中所采用的卷积神经网络可以仅包括输入层210、卷积层/池化层220和输出层240。
下面介绍本申请实施例提供的一种芯片硬件结构。
图3为本申请实施例提供的一种芯片硬件结构,该芯片包括神经网络处理器30。该芯片可以被设置在如图1所示的执行设备110中,用以完成计算模块111的计算工作。该芯片也可以被设置在如图1所示的训练设备120中,用以完成训练设备120的训练工作并输出目标模型/规则101。如图2所示的卷积神经网络中各层的算法均可在如图3所示的芯片中得以实现。本申请实施例中的图像融合方法以及图像融合模型的训练方法均可在如图3所示的芯片中得以实现。
神经网络处理器30可以是神经网络处理器(neural-network processing unit,NPU),张量处理器(tensor processing unit,TPU),或者图形处理器(graphics processing unit,GPU)等一切适合用于大规模异或运算处理的处理器。以NPU为例:神经网络处理器NPU30作为协处理器挂载到主中央处理器(central processing unit,CPU)(host CPU)上,由主CPU分配任务。NPU的核心部分为运算电路303,控制器304控制运算电路303提取存储器(权重存储器或输入存储器)中的数据并进行运算。其中,TPU是谷歌(google)为机器学习全定制的人工智能加速器专用集成电路。
在一些实现中,运算电路303内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路303是二维脉动阵列。运算电路303还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路303是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路303从权重存储器302中取矩阵B的权重数据,并缓存在运算电路303中的每一个PE上。运算电路303从输入存储器301中取矩阵A的输入数据,根据矩阵A的输入数据与矩阵B的权重数据进行矩阵运算,得到的矩阵的部分结果或最终结果, 保存在累加器(accumulator)308中。
向量计算单元307可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元307可以用于神经网络中非卷积/非FC层的网络计算,如池化(pooling),批归一化(batch normalization),局部响应归一化(local response normalization)等。
在一些实现中，向量计算单元307能将经处理的输出向量存储到统一存储器306。例如，向量计算单元307可以将非线性函数应用到运算电路303的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元307生成归一化的值、合并值，或二者均有。在一些实现中，向量计算单元307将经处理的向量存储到统一存储器306。在一些实现中，经向量计算单元307处理过的向量能够用作运算电路303的激活输入，例如用于神经网络中后续层中的使用，如图2所示，若当前处理层是隐含层1（231），则经向量计算单元307处理过的向量还可以被用到隐含层2（232）中的计算。
统一存储器306用于存放输入数据以及输出数据。
权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)305,被存入到权重存储器302中。输入数据也通过DMAC被存入到统一存储器306中。
总线接口单元(bus interface unit,BIU)310，用于DMAC和取指存储器(instruction fetch buffer)309的交互；总线接口单元310还用于取指存储器309从外部存储器获取指令；总线接口单元310还用于存储单元访问控制器305从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据存入到统一存储器306中,或将权重数据存入到权重存储器302中,或将输入数据存入到输入存储器301中。
与控制器304连接的取指存储器(instruction fetch buffer)309,用于存储控制器304使用的指令;
控制器304，用于调用取指存储器309中缓存的指令，实现对该运算加速器工作过程的控制。
一般地,统一存储器306,输入存储器301,权重存储器302以及取指存储器309均为片上(On-Chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。
其中,图2所示的卷积神经网络中各层的运算可以由运算电路303或向量计算单元307执行。示例性地,本申请实施例中的图像融合模型的训练方法以及图像融合方法均可以由运算电路303或向量计算单元307执行。
如图4所示,本申请实施例提供了一种系统架构400。该系统架构包括本地设备401、本地设备402以及执行设备410和数据存储系统450,其中,本地设备401和本地设备402通过通信网络与执行设备410连接。
执行设备410可以由一个或多个服务器实现。可选的,执行设备410可以与其它计算设备配合使用,例如:数据存储器、路由器、负载均衡器等设备。执行设备410可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备410可以使用数据存储系统450中的数据,或者调用数据存储系统450中的程序代码来实现本申请实施例的时间序列预测模型的训练方法。
具体地,在一种实现方式中,执行设备410可以执行以下过程:
获取至少一个训练样本,训练样本包括第一彩色图像、第一红外图像、目标彩色图像和目标红外图像,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值,第一彩色图像和目标彩色图像为场景对可见光的反射形成的影像,第一红外图像和目标红外图像为场景对红外波段的光的反射形成的影像;目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比;以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数 的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型;其中,损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的。
通过上述过程执行设备410能够获得一个图像融合模型,该图像融合模型可以用于得到融合后的图像。
用户可以操作各自的用户设备(例如本地设备401和本地设备402)与执行设备410进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备410进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。
在一种实现方式中,本地设备401、本地设备402从执行设备410获取到图像融合模型,将图像融合模型部署在本地设备401、本地设备402上,利用该图像融合模型进行图像融合。
在另一种实现中,执行设备410上可以直接部署图像融合模型,执行设备410通过从本地设备401和本地设备402获取待处理的图像,并采用图像融合模型对待处理的图像进行图像融合。
上述执行设备410也可以为云端设备,此时,执行设备410可以部署在云端;或者,上述执行设备410也可以为终端设备,此时,执行设备410可以部署在用户终端侧,本申请实施例对此并不限定。
在低照度的场景下,例如阴雨天气或夜间场景,拍摄的图像或视频存在分辨率低、对比度差、图像细节丢失等问题。例如,如图5的(a)所示的低照度场景下彩色路成像中,人脸基本无法辨识。当前的设备通常采用近红外补光的方式来提升低照度场景下的成像质量。如图5的(b)所示,从低照度场景下的近红外路成像中可以获得较好的人体细节以及人脸细节,但由于其成像特性,红外图像无法还原物体的真实色彩。由于彩色图像和红外图像存在一定的互补性,通过融合彩色图像和红外图像能够得到融合图像。传统的融合方式通常是基于亮度通道的融合,即先将彩色路图像转换到YUV颜色空间,然后将亮度通道Y与对应的红外图像进行多尺度融合,融合后的Y通道与原始的UV结合得到最终的融合结果。然而基于亮度通道的融合得到的融合图像会产生图像饱和度降低、色彩失真、噪声较多等问题。采用深度学习的方式可以实现彩色图像和红外图像的融合。然而,目前通常针对的是高清彩色图像和红外图像的融合任务,在彩色图像的质量较低的情况下,红外图像仅作为彩色图像的降噪参考,无法保证融合效果,输出图像中丢失大量细节,影响了输出图像的质量。
因此,如何在低照度的场景下提高成像质量成为一个亟待解决的问题。
本申请实施例提出一种图像融合模型的训练方法和图像融合方法，能够在低照度的场景下提高成像质量。
由于本申请实施例的图像融合方法以彩色图像和红外图像作为输入数据,因此,本申请实施例的方案适用于能够获得彩色图像和红外图像的场景,下面举例说明三种获得彩色图像和红外图像的方法。
示例1:基于分光棱镜获得彩色图像和红外图像。
如图6所示,分光棱镜包括棱镜6020和滤光片6030。利用分光棱镜可以将镜头6010接收的入射光分为可见光和近红外光,通过彩色传感器6040和近红外传感器6050两个传感器,分别对可见光和近红外光进行成像,同时得到彩色图像和红外图像。
示例2:基于分时和插帧的方式获得彩色图像和红外图像。
如图7所示,补光控制单元7030通过周期性地开启和关闭红外补光单元7010,控制镜头7020传输到传感器7040表面的光线类型,即可见光或红外光,对被摄场景中的可见光和红外光分别成像。应理解,图7中示出的红外图像也可以为红外图像和彩色图像的合成图像,在低照度的情况下,该合成图像中的彩色图像的信息量较少,因此该合成图像可以作为本申请实施例中的红外图像。通过插帧算法,获得同一时刻的彩色图像和红外图像。插帧指的是通过前后两帧的图像信息,获得中间帧的图像。
示例3:基于RGB-近红外(Near-infrared,NIR)传感器获得彩色图像和红外图像。
如图8所示,利用RGB-NIR这种信息传感器的设计,在一次成像中,同时获得彩色路和红外图像。
下面结合附图对本申请实施例的图像融合模型的训练方法和图像融合方法进行详细的介绍。
图9是本申请实施例的图像融合装置600的示意图。为了更好的了解本申请实施例中的方法,下面对图9中的各个模块的功能进行简单的描述。
装置600可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来训练时间序列预测模型的设备,也可以是云服务设备和移动设备构成的系统。示例性地,该装置600可以为图1中的训练设备120或图3中的神经网络处理器30或图4中的本地设备或执行设备等。
装置600包括背景参考图像获取模块610、融合权重获取模块620和图像融合模块630。
在本申请实施例中,对彩色图像的增强是通过将彩色图像与红外图像融合实现的,该图像融合模型也可以理解为图像增强模型。
背景参考图像获取模块610用于获取背景参考图像,并将该背景参考图像输入图像融合模块630中。背景参考图像中的背景区域和彩色图像中的背景区域相同。
如图9所示,背景参考图像获取模块可以基于彩色图像获取背景参考图像。应理解,图9中仅为示例,背景参考图像模块610还可以采用其他方式获取背景参考图像,背景参考图像的获取方式可以参考后文中的方法800。
需要说明的是,背景参考图像获取模块610为可选模块。
融合权重获取模块620用于获取融合权重,并将融合权重输入图像融合模块630中。融合权重用于调整图像融合模型输出的图像中彩色图像和红外图像的融合比例。
如图9所示，融合权重获取模块可以基于红外图像获取融合权重。或者融合权重获取模块也可以基于彩色图像获取融合权重。应理解，图9中仅为示例，融合权重获取模块620还可以基于其他方式获取融合权重，融合权重的具体获取方式可以参考后文中的方法900。
需要说明的是,融合权重获取模块620为可选模块。
图像融合模块630用于对彩色图像和红外图像进行图像融合,得到融合图像。
在装置600包括背景参考图像获取模块610的情况下,图像融合模块630可以基于背景参考图像对彩色图像和红外图像进行图像融合,得到融合图像。或者,也可以理解为,图像融合模块630对背景参考图像、彩色图像和红外图像进行图像融合,得到融合图像。
在装置600包括融合权重获取模块610的情况下,图像融合模块630可以基于融合权重对彩色图像和红外图像进行图像融合,得到融合图像。
下面结合图10至图15对本申请实施例的图像融合模型的训练方法进行详细介绍。
图10为本申请实施例提供的一种图像融合模型的训练方法700。图10所示的方法可以由图像融合模型的训练装置来执行,该图像融合模型的训练装置可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来执行图像融合模型的训练方法的装置,也可以是由云服务设备和终端设备构成的系统。示例性地,方法700可以由图1中的训练设备120、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
例如,方法700具体可以由如图1所示的训练设备120执行,方法700中的第一彩色图像、目标彩色图像、第一红外图像和目标红外图像可以是如图1所示的数据库130中维护的训练数据。可选的,方法700的S720和S730可以在训练设备120中执行,也可以在训练设备120之前由其他功能模块预先执行,即先对从数据库130中接收或者获取到的训练数据进行预处理,如S720和S730所述的获取过程,得到第一背景参考图像和第一融合权重,作为训练设备120的输入,并由训练设备120执行步骤S710和步骤S740。
再如,方法700具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法700 可以由监控设备上的计算模块执行。
可选地,方法700可以由CPU处理,也可以由CPU和GPU共同处理,也可以不用GPU,而使用其他适合用于神经网络计算的处理器,本申请不做限制。
方法700包括步骤S710至步骤S740。下面对步骤S710至步骤S740进行详细说明。
S710,获取至少一个训练样本。训练样本包括第一彩色图像、目标彩色图像、第一红外图像和目标红外图像。
在本申请实施例中,彩色图像也可以称为可见光图像。
第一彩色图像和目标彩色图像为场景对可见光的反射形成的影像,第一红外图像和目标红外图像为场景对红外波段的光的反射形成的影像。
示例性地,彩色图像可以通过可见光成像传感器获得,红外图像是通过红外成像传感器获得。其中,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的。
同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值。
本申请实施例中的相似度可以为图像纹理相似度。例如,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度可以为第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的图像纹理相似度。第一彩色图像与第一红外图像是一一对应的。例如,第一红外图像可以是与第一彩色图像在同一时刻拍摄的红外图像。
目标彩色图像与目标红外图像是一一对应的。例如,目标红外图像是与目标彩色图像在同一时刻的红外图像。
获取彩色图像以及对应的红外图像的方式可以参考图6至图8,本申请实施例对此不做限定。
目标彩色图像的信噪比高于第一彩色图像的信噪比。目标彩色图像可以理解为第一彩色图像对应的高清图像。例如,目标彩色图像可以是在白天拍摄的高清图像,第一彩色图像可以是夜间拍摄的带噪声的图像。
信噪比指的是信号与噪声之比,例如,信号与噪声的功率谱之比,或者,信号与噪声的方差之比等。信噪比越高,图像质量越好,图像越清晰。
目标红外图像的信噪比高于第一红外图像的信噪比。目标红外图像可以理解为第一红外图像对应的高清图像。例如，目标红外图像可以是在白天拍摄的高清图像，第一红外图像可以是夜间拍摄的带噪声的图像。
针对同一场景拍摄可以理解为图像中的画面内容相同,例如,对同一区域拍摄的图像,该同一区域即为同一场景。如图11所示的图像中的画面内容相同,也就是针对同一场景拍摄的图像。
在一些实施例中,第一红外图像与目标红外图像可以是同一张图像。在该情况下,训练样本包括第一彩色图像、目标彩色图像和第一红外图像,即训练样本包括三类图像。
S720,获取第一背景参考图像。
第一背景参考图像与第一彩色图像之间的相似度大于第二阈值。
示例性地,第一背景参考图像中的背景区域与第一彩色图像中的背景区域相同。第一背景参考图像与第一彩色图像之间的相似度大于第二阈值,可以为,第一背景参考图像的背景区域与第一彩色图像的背景区域之间的相似度大于第二阈值。背景区域可以通过现有技术确定,本申请实施例对此不作限定。
背景参考图像的背景信噪比通常高于第一彩色图像的背景信噪比。
其中,本申请实施例中的图像的背景区域可以根据需要设定。以图12为例,图像中的背景区域可以包括图像中的建筑物,也可以不包括图像中的建筑物,本申请实施例对背景区域的划分方法不做限定。
步骤S720为可选步骤。
在一些实施方式中,训练样本还可以包括第一背景参考图像。在步骤S710中获取至少一个训练样本 时,即获取了第一背景参考图像。
具体的第一背景参考图像的获取方式可以参考后文中的方法800。
示例性地,第一背景参考图像可以以彩色图的形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。例如,直接将第一背景参考图像输入图像融合模型。再如,可以将第一背景参考图像的亮度通道输入图像融合模型。
S730,获取第一融合权重。
步骤S730为可选步骤。
在一些实施方式中,训练样本还可以包括第一融合权重。在步骤S710中获取至少一个训练样本时,即获取了第一融合权重。
具体的第一融合权重的获取方式可以参考后文中的方法900。
第一融合权重用于对第一彩色图像和第一红外图像进行加权。
也就是说,第一融合权重用于调整图像融合模型输出的图像中第一彩色图像和第一红外图像的融合比例。
第一融合权重用于调整图像融合过程中的彩色图像和红外图像的融合比例。或者说,第一融合权重用于调整在图像融合模型输出的图像中所包含的第一彩色图像的信息量和第一红外图像的信息量的比例。
可选地,第一融合权重对应全部的图像融合模型输出的图像。
也就是说,第一融合权重可以为全局权重。
全局权重用于指示在图像融合过程中整个图像的融合权重。也就是说整个图像中所有区域在图像融合过程中采用相同的融合权重。在整个图像融合模型输出的图像中,第一融合权重仅有一个。在该图像融合模型输出的图像中的任意区域中,第一彩色图像和第一红外图像的融合比例是相同的。
例如,当红外图像对应的全局权重较大时,图像融合模型输出的融合图像中所包含的红外图像的信息较多,即融合图像与红外图像更相似。当彩色图像对应的全局权重较大时,图像融合模型输出的融合图像中所包含的彩色图像的信息较多,即融合图像与彩色图像更相似。
可选地,第一融合权重对应部分的图像融合模型输出的图像。
第一融合权重对应部分图像融合模型输出的图像可以理解为,第一融合权重对应图像融合模型输出的图像中的一个区域。在该情况下,第一融合权重的数量可以为多个,多个第一融合权重分别对应图像融合模型输出的图像中的不同区域。
第一融合权重可以理解为局部权重。局部权重用于指示在图像融合过程中局部区域的融合权重。也就是说在融合过程中,不同的区域可以采用不同的第一融合权重。
例如,在区域A处的红外图像对应的权重较大,区域B处的红外图像对应的权重较小。在图像融合模型输出的融合图像中,区域A处包含的红外图像的信息较多,区域B处包含的彩色图像的信息较多。即区域A处与红外图像中的区域A处更相似,区域B处与彩色图像中的区域B处更相似。
示例性地,第一融合权重可以以参数形式输入图像融合模型中,也可以以融合权重图的形式输入图像融合模型中,本申请对此不做限定。
融合权重图中的值可以用于指示第一融合权重。例如,融合权重图中的不同区域的值可以用于表示多个第一融合权重。
以融合权重图的形式表示第一融合权重,能够降低调整第一融合权重的复杂度。在第一融合权重对应部分的图像融合模型输出的图像的情况下,通过融合权重图更有利于表示第一融合权重对应的区域。尤其在第一融合权重对应的区域为不规则的形状的情况下,融合权重图的形式更有利于表示第一融合权重对应的区域。
图11示出了一个训练样本的示意图。如图11所示,图11中的(a)为第一彩色图像In_Vis,图11中 的(b)为目标彩色图像Gt_Vis,图11中的(c)为第一背景参考图像的亮度通道In_VisRef_Y,图11中的(d)为第一红外图像In_Nir,图11中的(e)为目标红外图像Gt_Nir,图11中的(f)融合权重图In_FuseMap。
应理解,图11中仅为示意,训练样本中可以不包括第一背景参考图像的亮度通道In_VisRef_Y和融合权重图In_FuseMap,也可以包括两者中的一个,例如包括融合权重图In_FuseMap,或者包括第一背景参考图像的亮度通道In_VisRef_Y。在图11的训练样本中,第一背景参考图像是以亮度通道的形式存在,也就是以亮度通道的形式输入图像融合模型中,此处仅为示例,例如,第一背景参考图像还可以以彩色图的形式存在,也就是彩色图的形式输入图像融合模型中。在图11的训练样本中,第一融合权重是以融合权重图的形式存在,也就是以权重融合图的形式输入图像融合模型中。此处仅为示例,第一融合权重还可以以参数形式存在,也就是以参数形式输入图像融合模型中。此外,图11的训练样本中,存在两个第一融合权重,两个矩形框中的权重值相同,矩形框外的权重值相同。此处仅为示例,还可以设置更多第一融合权重,或者,第一融合权重也可以为全局权重。
S740,以第一彩色图像与第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型。
其中,损失函数包括第一损失函数。第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异。目标融合图像是根据目标彩色图像和目标红外图像确定的。
目标融合图像可以为彩色图像,也可以为灰度图像。
可选地,目标融合图像可以为亮度通道的图像。图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
在该情况下,以减小第一损失函数的值为目标对图像融合模型进行训练,也就是不断缩小图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。该训练过程可以理解为融合任务。第一损失函数可以理解为融合任务对应的损失函数。
这样,可以在亮度通道层面上进行训练,有利于融合更多的纹理特征,减少其他因素对图像融合过程的影响。
进一步地,根据目标彩色图像与目标红外图像确定目标融合图像,包括:
根据目标彩色图像的亮度通道与目标红外图像的亮度通道确定目标融合图像。
为了更好的描述确定目标融合图像的方法,下面对亮度通道进行说明。
亮度通道包括结构(Structure)信息S,对比度(Contrast)信息C和亮度(Luminance)均值L。也可以理解为亮度通道可以分解为结构信息S,对比度信息C和亮度均值L。
例如，图像块k的亮度通道 $y_k$ 可以分解为图像块k的亮度均值 $l_k$、图像块k的结构信息 $s_k$ 和图像块k的对比度信息 $c_k$。
亮度通道 $y_k$、亮度均值 $l_k$、结构信息 $s_k$ 和对比度信息 $c_k$ 满足如下公式：
$$y_k=c_k\times s_k+l_k$$
$$c_k=\left\|y_k-\mu_{y_k}\right\|$$
$$s_k=\frac{y_k-\mu_{y_k}}{\left\|y_k-\mu_{y_k}\right\|}$$
$$l_k=\mu_{y_k}$$
其中，$\mu_{y_k}$ 表示图像块k的亮度均值。
根据上述方式可以对图像的亮度通道进行分解,得到图像的结构信息、对比度信息和亮度均值。例如,目标彩色图像的亮度通道和目标红外图像的亮度通道满足如下公式。
$$y_{gt\_Vis}=c_{gt\_Vis}\times s_{gt\_Vis}+l_{gt\_Vis}$$
$$y_{gt\_Nir}=c_{gt\_Nir}\times s_{gt\_Nir}+l_{gt\_Nir}$$
其中，$y_{gt\_Vis}$ 表示目标彩色图像的亮度通道，$c_{gt\_Vis}$ 表示目标彩色图像的对比度，$s_{gt\_Vis}$ 表示目标彩色图像的结构信息，$l_{gt\_Vis}$ 表示目标彩色图像的亮度均值，$y_{gt\_Nir}$ 表示目标红外图像的亮度通道，$c_{gt\_Nir}$ 表示目标红外图像的对比度，$s_{gt\_Nir}$ 表示目标红外图像的结构信息，$l_{gt\_Nir}$ 表示目标红外图像的亮度均值。
相应地,根据上述方式可以由图像的结构信息、对比度信息和亮度均值得到图像的亮度通道。应理解,以上仅为示意,还可以通过其他方式得到图像的结构信息、对比度信息和亮度均值。
需要说明的是,上述公式中的值可以是整个图像对应的值,也可以是图像中的图像块对应的值。在本申请实施例的方案中,可以以图像为单位进行图像融合,也可以以图像块为单位进行图像融合,本申请实施例对此不做限定。
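作为示意，上述亮度通道分解可以用如下Python（NumPy）代码片段表示。其中以L2范数作为对比度的度量仅为一种示例性取法：

```python
import numpy as np

def decompose_patch(y):
    # 将亮度图像块分解为亮度均值l、对比度c、结构s，满足 y = c × s + l
    l = y.mean()                       # 亮度均值
    residual = y - l
    c = np.linalg.norm(residual)       # 对比度（此处取L2范数，为示例性选择）
    s = residual / c if c > 0 else np.zeros_like(y)  # 结构（单位化残差）
    return c, s, l

patch = np.array([[0.2, 0.4], [0.6, 0.8]])
c, s, l = decompose_patch(patch)
recon = c * s + l  # 由三个分量可无损重建原图像块
```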
可选地,根据目标彩色图像的亮度通道与目标红外图像的亮度通道确定目标融合图像,包括:根据目标彩色图像的对比度和/或目标红外图像的对比度确定目标融合图像的对比度;根据目标彩色图像的结构信息和/或目标红外图像的结构信息确定目标融合图像的结构信息;根据目标彩色图像的亮度均值和/或目标红外图像的亮度均值确定目标融合图像的亮度均值。
示例性地,根据目标彩色图像的对比度和/或目标红外图像的对比度确定目标融合图像的对比度,包括:将目标彩色图像的对比度与目标红外图像的对比度中较大的对比度作为目标融合图像的对比度。
目标融合图像的对比度 $\hat{c}$ 可以满足如下公式：
$$\hat{c}=\max\left(c_{gt\_Vis},\,c_{gt\_Nir}\right)$$
其中，$c_{gt\_Vis}$ 表示目标彩色图像的对比度，$c_{gt\_Nir}$ 表示目标红外图像的对比度。
将较大的对比度作为目标融合图像的对比度可以使得目标融合图像中包含更多的纹理信息。
示例性地,根据目标彩色图像的对比度和/或目标红外图像的对比度确定目标融合图像的对比度,包括:将目标红外图像的对比度作为目标融合图像的对比度。
通常,红外图像的对比度较大,将红外图像的对比度作为目标融合图像的对比度,可以使得目标融合图像中包含更多的纹理信息,同时提高处理速度。
应理解,根据目标彩色图像的对比度和/或目标红外图像的对比度确定目标融合图像的对比度,也可以包括:根据目标彩色图像中的图像块的对比度和/或目标红外图像中的图像块的对比度,确定目标融合图像中的图像块的对比度。
示例性地,根据目标彩色图像的结构信息和/或目标红外图像的结构信息确定目标融合图像的结构信息,包括:对目标彩色图像的结构信息和目标红外图像的结构信息进行加权平均,将得到的结果作为目标融合图像的结构信息。
其中,目标彩色图像对应的结构权重和目标红外图像对应的结构权重可以是预先设定的,也可以是根据目标彩色图像的对比度和目标红外图像的对比度确定的。
例如,目标彩色图像对应的结构权重是根据目标彩色图像的对比度确定的。目标红外图像对应的结构权重是根据目标红外图像的对比度确定的。对比度越大,结构权重值越大。这样得到目标融合图像中的对比度较高的图像所占的结构信息的比重较大,使融合图像能够包含更多的纹理信息。
在该情况下，目标融合图像的结构信息 $\hat{s}$ 可以满足如下公式：
$$\hat{s}=\frac{w(c_{gt\_Vis})\times s_{gt\_Vis}+w(c_{gt\_Nir})\times s_{gt\_Nir}}{w(c_{gt\_Vis})+w(c_{gt\_Nir})}$$
其中，$w(\cdot)$ 表示计算结构权重的函数，$s_{gt\_Vis}$ 表示目标彩色图像的结构信息，$w(c_{gt\_Vis})$ 表示根据目标彩色图像的对比度确定的目标彩色图像对应的结构权重，$s_{gt\_Nir}$ 表示目标红外图像的结构信息，$w(c_{gt\_Nir})$ 表示根据目标红外图像的对比度确定的目标红外图像对应的结构权重。
再如,目标彩色图像对应的结构权重和目标红外图像对应的结构权重是根据目标彩色图像的对比度和目标红外图像的对比度之间的比值确定的。
示例性地,根据目标彩色图像的结构信息和/或目标红外图像的结构信息确定目标融合图像的结构信息,包括:将目标红外图像的结构信息作为目标融合图像的结构信息。
通常,在红外补光的场景下,红外图像的结构信息较多,将红外图像的结构信息作为目标融合图像的结构信息,可以使得融合图像中包含更多的纹理信息,同时提高处理速度。
应理解,根据目标彩色图像的结构信息和/或目标红外图像的结构信息确定目标融合图像的结构信息,也可以包括:根据目标彩色图像中的图像块的结构信息和/或目标红外图像中的图像块的结构信息,确定目标融合图像中的图像块的结构信息。
示例性地,根据目标彩色图像的亮度均值和/或目标红外图像的亮度均值确定目标融合图像的亮度均值,包括:将目标彩色图像的亮度均值作为目标融合图像的亮度均值。
这样可以保证融合图像中的颜色不失真,得到色彩自然的融合图像。
示例性地,根据目标彩色图像的亮度均值和/或目标红外图像的亮度均值确定目标融合图像的亮度均值,包括:将目标彩色图像的亮度均值和目标红外图像的亮度均值进行加权平均,将加权平均后的结果作为目标融合图像的亮度均值。
例如,目标彩色图像对应的亮度权重和目标红外图像对应的亮度权重可以是预先设定的。
应理解,根据目标彩色图像的亮度均值和/或目标红外图像的亮度均值确定目标融合图像的亮度均值,也可以包括:根据目标彩色图像中的图像块的亮度均值和/或目标红外图像中的图像块的亮度均值,确定目标融合图像中的图像块的亮度均值。
由目标融合图像的对比度、结构值和亮度均值可以得到目标融合图像。
示例性地，目标融合图像y_fuse满足如下公式。
y_fuse=c_fuse×s_fuse+l_fuse；
其中，c_fuse、s_fuse和l_fuse分别表示目标融合图像的对比度、结构信息和亮度均值。
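上述按分量融合再重建的过程可以用如下 Python 片段示意：对比度取较大者，结构按对比度加权平均，亮度均值取彩色图像的均值。这里假设结构权重函数为 w(c)=c，仅为一种示意性取法，原文未限定具体的权重函数。

```python
def fuse_luma(c_vis, s_vis, l_vis, c_nir, s_nir, l_nir):
    """由彩色/红外两路的对比度c、结构s、亮度均值l合成融合亮度 y_fuse。"""
    c_f = max(c_vis, c_nir)                       # 对比度取较大者，保留更多纹理
    w_v, w_n = c_vis, c_nir                       # 假设 w(c)=c：对比度越大结构权重越大
    s_f = [(w_v * a + w_n * b) / (w_v + w_n) for a, b in zip(s_vis, s_nir)]
    l_f = l_vis                                   # 亮度均值取彩色图像，保证色彩不失真
    return [c_f * v + l_f for v in s_f]           # y_fuse = c_fuse×s_fuse + l_fuse

y_fuse = fuse_luma(0.3, [0.5, -0.5], 0.4, 0.6, [0.7, -0.7], 0.2)
```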
可选地,目标融合图像与图像融合模型输出的图像的亮度通道之间的差异可以通过两个图像之间的结构相似性指数(structural similarity index measure,SSIM)确定。SSIM越大,两个图像的结构相似性越高。基于SSIM的损失(loss)约束能够使输出图像保持尽可能多的结构信息。
示例性地，第一损失函数L_fuse满足如下公式。
L_fuse=1-SSIM(y_fuse,y_out)；
其中，y_out表示图像融合模型输出的图像的亮度通道。
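基于SSIM的损失可以用如下简化的 Python 片段示意。这里对整条亮度序列计算一次全局SSIM（常数 c1、c2 为假设取值），实际实现通常按滑动窗口逐块计算后取平均。

```python
def ssim(x, y, c1=1e-4, c2=9e-4):
    """简化的全局SSIM（示意，非逐窗口实现）。"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((a - my) ** 2 for a in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def fuse_loss(y_fuse, y_out):
    """L_fuse = 1 - SSIM(y_fuse, y_out)：SSIM越大，损失越小。"""
    return 1 - ssim(y_fuse, y_out)
```

两幅图像完全相同时SSIM为1，损失为0；结构差异越大，损失越大。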
进一步地，在方法700包括步骤S720或训练样本包括第一背景参考图像的情况下，将第一背景参考图像增加为图像融合模型的输入。
进一步地,在方法700包括步骤S730或训练样本包括第一融合权重的情况下,将第一融合权重增加为图像融合模型的输入。
可选地,根据目标彩色图像和目标红外图像确定目标融合图像,可以包括:根据第一融合权重、目标 彩色图像和目标红外图像确定目标融合图像。
示例性地,根据第一融合权重、目标彩色图像和目标红外图像确定目标融合图像,可以包括:根据目标彩色图像和目标红外图像确定监督图像,根据第一融合权重对监督图像和目标彩色图像进行加权。
或者可以理解为,根据第一融合权重调整监督图像和目标彩色图像在目标融合图像中所占的比例。
其中，监督图像可以采用前文中确定目标融合图像y_fuse的方式确定。也就是将目标融合图像y_fuse作为监督图像。
具体地,根据第一融合权重调整监督图像和目标彩色图像在目标融合图像中所占的比例,包括:根据第一融合权重调整监督图像的亮度通道和目标彩色图像的亮度通道在目标融合图像中的比例,将调整的结果作为目标融合图像。
如前所述,第一融合权重可以对应部分的图像融合模型输出的图像。多个第一融合权重分别对应于目标融合图像中的不同位置。也就是多个第一融合权重分别用于指示在目标融合图像中的不同位置上监督图像和目标彩色图像所占的比重。
例如,目标融合图像中一个区域中监督图像所占的比重越多,该区域所包含的监督图像的信息越多,也就是目标融合图像的该区域与监督图像的该区域越相似。目标融合图像中一个区域中监督图像所占的比重越少,该区域所包含的目标彩色图像的信息越多,也就是目标融合图像的该区域与目标彩色图像中的该区域越相似。
如果第一彩色图像和第一红外图像为夜间拍摄的图像，并将前文中的目标融合图像y_fuse作为监督图像，则监督图像中所包含的红外图像的信息更多。在该情况下，一个区域所包含的监督图像的信息越多，也可以理解为该区域所包含的红外图像的信息越多。
示例性地,第一融合权重可以为融合权重图。
例如，目标融合图像y_fuse_adj满足如下公式。
y_fuse_adj=y_fuse×IN_FuseMap+(1-IN_FuseMap)×y_gt_Vis；
其中，IN_FuseMap表示融合权重图。融合权重图中的不同区域上的值分别指示图像的不同区域对应的权重。例如，上式中等号右侧的第一项中，y_fuse与融合权重图相乘可以理解为将y_fuse中的像素值分别与融合权重图中该像素值所在区域对应的权重相乘。
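上式的逐像素加权可以用如下 Python 片段示意（为简洁起见，将图像与权重图展平为一维列表）。

```python
def adjust_fusion(y_fuse, y_gt_vis, fuse_map):
    """y_fuse_adj = y_fuse×IN_FuseMap + (1-IN_FuseMap)×y_gt_Vis，逐像素计算。"""
    return [f * w + (1 - w) * v for f, v, w in zip(y_fuse, y_gt_vis, fuse_map)]

# 权重为1的位置完全取监督图像，权重为0的位置完全取目标彩色图像
out = adjust_fusion([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.5, 0.0])
```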
示例性地,目标融合图像与图像融合模型输出的图像的亮度通道之间的差异可以通过两个图像之间的SSIM确定。SSIM越大,两个图像的结构相似性越高。
例如，第一损失函数L_fuse_adj可以满足如下公式。
L_fuse_adj=1-SSIM(y_fuse_adj,y_out)；
示例性地,目标融合图像为亮度通道的图像,根据第一融合权重、目标彩色图像和目标红外图像确定目标融合图像,可以包括:根据第一融合权重调整目标彩色图像的亮度通道和目标红外图像的亮度通道在目标融合图像中的比例,将调整的结果作为目标融合图像。
在一些实施例中，当训练样本中没有目标红外图像时，目标融合图像也可以是根据目标彩色图像和第一红外图像确定的。确定方法与使用目标红外图像时相同，此处不再赘述。这样，在第一红外图像信噪比较高的情况下，在保证图像融合模型的训练效果的同时，节约训练样本的存储空间，减小存储压力。
可选地,损失函数还包括第二损失函数。第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
减小第二损失函数的值也就是不断优化图像融合模型的参数,以减小图像融合模型输出的图像与目标彩色图像之间的差异。该训练过程可以理解为降噪任务。第二损失函数可以理解为降噪任务对应的损失函数。
示例性地，第二损失函数L_denoise可以满足如下公式。
L_denoise=∑_{p∈P}∑_{c∈C}|Gt_vis-Out|；
其中，P表示不同位置的像素集合，p表示像素集合中的像素，C表示RGB的不同颜色通道，c表示RGB颜色通道中的一个通道，Gt_vis表示目标彩色图像，Out表示图像融合模型输出的图像。
该损失约束图像融合模型输出的图像与目标彩色图像尽可能相似,既能保证降噪的效果,又保证输出的图像颜色与目标彩色图像一致,避免出现输出的图像颜色错误的问题。
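该逐像素、逐通道求绝对差之和的L1损失可以用如下 Python 片段示意。

```python
def l1_denoise_loss(gt_vis, out):
    """L_denoise = Σ_p Σ_c |Gt_vis - Out|，gt_vis/out 为形如 [[R,G,B], ...] 的像素列表。"""
    return sum(abs(g - o)
               for px_gt, px_out in zip(gt_vis, out)
               for g, o in zip(px_gt, px_out))

loss = l1_denoise_loss([[10, 20, 30], [0, 0, 0]],
                       [[12, 20, 27], [1, 1, 1]])   # |Δ| 逐项求和
```

输出与目标彩色图像越接近，损失越小；完全相同时损失为0。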
可选地,降噪任务和融合任务是协同实现的。
例如,图像融合模型的损失函数L可以满足如下公式。
L=L_denoise+γL_fuse_adj；
其中，γ为参数，用于保证降噪任务的损失函数L_denoise和融合任务的损失函数L_fuse_adj在同一个量级内。
应理解，该公式仅以融合任务的损失函数为L_fuse_adj为例进行说明，不对本申请实施例的方案构成限定。融合任务的损失函数还可以为步骤S740中的L_fuse。
在低照度场景下采集到的图像噪声较高,通常在进行图像融合前或图像融合后对图像进行降噪处理,例如,采用滤波器进行滤波。然而该方式会导致降噪任务和融合任务相互影响,造成输出图像中融合效果较差,或降噪效果较差,图像质量无法保证。根据本申请实施例的方案,降噪任务和融合任务协同执行,减少信息损失,既能够保证融合图像中保留丰富的纹理细节,又能够保证融合图像达到较高的分辨率以及真实的彩色信息。
在本申请实施例的方案中,彩色图像拥有丰富的色彩信息,红外图像拥有更多的纹理信息,两者融合得到的融合图像具有自然的色彩以及丰富的纹理信息。根据目标彩色图像和目标红外图像确定目标融合图像,基于目标融合图像训练图像融合模型,使得图像融合模型能够充分利用红外信息,有利于在输出图像中融合更多的纹理信息,保留更多的图像细节。
此外,不同应用场景对图像融合的要求可能不同,使用相同的图像融合模型进行融合得到的融合图像无法满足不同应用场景的融合要求,通过引入融合权重,能够调整彩色图像和红外图像的融合比例,有利于应用于不同的应用场景。也就是说无需针对不同的应用场景分别训练多个图像融合模型,仅通过调整融合权重,即可应用于不同的场景,提高了模型使用的自由度。
此外,同一图像中的不同区域对图像融合的要求可能不同,例如,对于图像中的人像,人脸区域倾向于融合更多的红外信息,以保留更多的纹理信息,人体区域倾向于融合更多的彩色信息,以保证输出图像色彩的真实性。本申请实施例中,不同区域对应不同的融合权重,以满足同一图像中的不同区域对图像融合的要求,有利于提高输出图像的图像质量。
此外,通过将背景参考图像增加为图像融合模型的输入,并基于此训练图像融合模型,能够解决红外图像手电筒效应可能导致的背景模糊问题,很大程度上提升输出图像的背景质量,即同时增强输出图像的前景区域质量和背景区域质量,实现全画面的图像增强。
红外图像为主动补光,存在手电筒效应。如图5的(b)所示,红外图像中呈现出画面中心较亮、边缘较暗的现象。而且,红外图像容易存在过曝问题,为了保证补光中心区域的画面质量,通常会降低周围区域的亮度,导致红外图像前景与背景信噪比差异大,中间区域的信噪比较高,周围区域的信噪比较低。直接将红外图像作为低照场景下图像融合任务的参考输入,输出结果可能存在背景模糊问题。
本申请实施例提供一种图像融合模型的训练方法800,通过增加背景参考图像作为图像融合模型的输入,提高图像融合模型的训练效果。
方法800可以由图像融合模型的训练装置来执行，该图像融合模型的训练装置可以是云服务设备，也可以是终端设备，例如，电脑、服务器等运算能力足以用来执行图像融合模型的训练方法的装置，也可以是由云服务设备和终端设备构成的系统。示例性地，方法800可以由图1中的训练设备120、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
例如,方法800具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法800可以由监控设备上的计算模块执行。
方法800包括步骤S810至步骤S820,下面对步骤S810至步骤S820进行详细说明。
步骤S810,获取第一背景参考图像、第一彩色图像和第一红外图像。
第一背景参考图像与第一彩色图像之间的相似度大于第二阈值。
其中,第一背景参考图像中的背景区域与第一彩色图像中的背景区域相同。第一背景参考图像与第一彩色图像之间的相似度大于第二阈值,可以为,第一背景参考图像的背景区域与第一彩色图像的背景区域之间的相似度大于第二阈值。背景区域可以通过现有技术确定,本申请实施例对此不作限定。
第一背景参考图像的背景信噪比高于第一彩色图像的背景信噪比。第一彩色图像为图像融合模型的输入。
第一彩色图像和第一红外图像是针对同一场景拍摄的。同一场景指的是第一彩色图像、第一红外图像之间的相似度大于第一阈值,第一彩色图像为场景对可见光的反射形成的影像,第一红外图像为场景对红外波段的光的反射形成的影像。
示例性地,第一背景参考图像可以为彩色图,也可以为灰度图。也就是说,第一背景参考图像可以以彩色图的形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。
背景参考图像可以通过多种方式获得。下面举例说明背景参考图像的获取方式,背景参考图像可以通过以下任一种方式获取。应理解,以下仅为示例,还可以通过其他方式获取背景参考图像,本申请对此不做限定。第一背景参考图像为第一彩色图像对应的背景参考图像,可以通过以下任一种方式获取。
例如,根据与彩色图像的相似度确定背景参考图像。
具体地,确定图库中的图像与彩色图像的相似度。将图库中与彩色图像的相似度最高的图像作为背景参考图像。该图库可以为高清图像库。例如,图库中的图像的信噪比高于该彩色图像的信噪比。
比如,可以通过SSIM等参数确定两张图像的相似度。
进一步地,根据与彩色图像的背景区域的相似度确定背景参考图像。
具体地,确定图库中的图像的背景区域与彩色图像的背景区域的相似度。将图库中与彩色图像的背景区域的相似度最高的图像作为背景参考图像。
再如,将与彩色图像对应的长曝光图像作为背景参考图像。
长曝光图像指的是采用长曝光的方式拍摄的图像。
与彩色图像对应的长曝光图像指的是在彩色图像拍摄的区域采用长曝光的方式拍摄的图像。例如,与彩色图像对应的长曝光图像可以是拍摄彩色图像的设备在拍摄彩色图像的位置采用长曝光的方式拍摄的图像。长曝光图像为在曝光时长大于第三阈值的情况下得到的图像。
再如,根据与彩色图像对应的多张彩色图像确定该彩色图像的背景参考图像。
与彩色图像对应的多张彩色图像指的是:在该彩色图像拍摄的区域拍摄的图像。例如,与彩色图像对应的多张彩色图像可以是拍摄该彩色图像的设备在拍摄该彩色图像的位置拍摄的图像。
再如,将彩色图像进行时域降噪后的结果作为该彩色图像的背景参考图像。
可选地,方法800可以应用于视频模式,也就是用于视频融合的场景。
也就是说,方法800训练得到的图像融合模型可以应用于视频场景中。对于视频中的任一帧均可以采用方法800得到的图像融合模型进行图像融合,进而得到融合图像/融合视频。
示例性地,在该情况下,背景参考图像还可以通过以下任一种方式获取。
第一背景参考图像为第一彩色图像对应的背景参考图像,也可以通过以下任一种方式获取。
例如,根据彩色图像之前的多帧彩色图像确定背景参考图像。
由于连续几帧图像的背景区域较为接近,可以通过连续多帧彩色图像获得该彩色图像对应的背景参考图像。
具体地,当前输入图像融合模型的彩色图像作为目标帧,对目标帧之前的多帧彩色图像进行累积,获得累积帧,将累积帧作为目标帧的背景参考图像。该累积帧的背景区域信噪比较好,前景区域可能存在运动模糊。
比如,计算目标帧之前的n帧彩色图像的平均值得到累积帧,该累积帧即为目标帧的背景参考图像。n为大于1的整数。n的值越大,则得到背景参考图像中的背景区域越清晰。
目标帧的背景参考图像Ref_cur可以满足如下公式。
Ref_cur=(1/n)∑_{i=cur-n}^{cur-1}Frame_i；
其中，Frame_i表示第i帧彩色图像，cur表示当前帧数，即目标帧为第cur帧。
可替换地,采用递归的方式生成累积帧。
也可以理解为,根据目标帧之前的帧的背景参考图像确定目标帧的背景参考图像。
比如，目标帧的背景参考图像Ref_cur可以满足如下公式。
Ref_cur=weight×Ref_old+(1-weight)×Vis_cur；
其中，Ref_old表示目标帧之前的帧的背景参考图像，或者说是目标帧之前的帧对应的累积帧；Vis_cur表示当前获取的彩色图像，即目标帧；weight表示累积权重。累积权重越大，背景参考图像的背景信噪比越高，运动拖影也越明显。图12的(a)示出了累积权重为0.5时获得的背景参考图像的灰度图，图12的(b)示出了累积权重为0.9时获得的背景参考图像的灰度图。图12的(b)中的背景参考图像中的背景区域的信噪比明显高于图12的(a)中的背景参考图像中的背景区域的信噪比。图像融合模型可以较好地抑制运动模糊的问题，因此，可以将累积权重设置得高一点，以对背景提升产生更好的效果。例如，将累积权重设为0.9。
通过递归的方式生成背景参考图像可以减少图像缓存,降低存储压力。
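递归累积的更新方式可以用如下 Python 片段示意（图像展平为一维列表，累积权重取原文示例中的0.9）。

```python
def update_background_ref(ref_old, vis_cur, weight=0.9):
    """Ref_cur = weight×Ref_old + (1-weight)×Vis_cur，逐像素递归累积。"""
    return [weight * r + (1 - weight) * v for r, v in zip(ref_old, vis_cur)]

# 静止背景逐帧累积：累积帧逐渐逼近背景的真实亮度
ref = [0.0, 0.0]
for frame in ([1.0, 1.0], [1.0, 1.0], [1.0, 1.0]):
    ref = update_background_ref(ref, frame)
```

只需缓存一帧累积结果即可持续更新，因而无需保存目标帧之前的n帧图像。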
虽然通过累积帧的方式得到背景参考图像的前景区域存在运动模糊,但背景参考图像和彩色图像对应的红外图像存在较好的互补效应,不会影响前景区域的画面质量。
再如,将目标帧之前的长曝光帧作为目标帧的背景参考图像。长曝光帧指的是采用长曝光的方式拍摄的帧。长曝光帧为在曝光时长大于第三阈值的情况下得到的帧。
再如,将前一帧的背景参考图像作为目标帧的背景参考图像。也就是将该彩色图像之前的帧的背景参考图像作为背景参考图像。
这样可以复用之前的背景参考图像的特征,减少计算量。
再如,将对目标帧的前一帧的融合图像作为目标帧的背景参考图像。也就是将图像融合模型输出的该彩色图像之前的帧的融合图像作为背景参考图像。
例如,将帧A作为第一彩色图像输入图像融合模型,得到融合后的帧A,将融合后的帧A作为帧A+1的背景参考图像,然后将帧A+1作为第一彩色图像、融合后的帧A作为第一参考图像输入图像融合模型。
再如,将目标帧进行时域降噪后的结果作为目标帧的背景参考图像。
由于大部分监控场景下,背景变化较少,画面的背景部分存在较高的相似度。对于监控设备位置不变的场景,可以将照度高的情况下拍摄的彩色图像作为背景参考图像。例如,将天气晴朗的白天拍摄的彩色图像作为夜间拍摄的彩色图像的背景参考图像。
应理解,以上获取背景参考图像的方式仅为示例,本申请对此不做限定。
步骤S820,将第一背景参考图像、第一彩色图像和第一红外图像作为图像融合模型的输入,对图像融合模型进行训练。
示例性地,图像融合模型包括编码器(encoder)网络和解码器(decoder)网络。
编码器网络用于提取输入图像的特征,解码器网络用于根据提取的特征得到融合后的图像。该融合后的图像即为第一彩色图像的融合结果。
编码器网络可以采用神经网络,例如,卷积神经网络。解码器网络可以采用神经网络,例如,卷积神经网络。
示例性地,该编码器网络包括第一编码器、第二编码器和第三编码器。
其中,第一编码器用于提取背景参考图像的特征,第二编码器用于提取输入的彩色图像的特征,第三编码器用于提取输入的红外图像的特征。
需要说明的是,第一编码器、第二编码器和第三编码器可以为同一个编码器,也可以为不同的编码器。
例如,第一编码器用于提取背景参考图像的特征,第二编码器和第三编码器为同一个编码器,用于提取输入的彩色图像和输入的红外图像的特征。
如图13所示,将背景参考图像输入融合模型中的编码器11#(第一编码器的一例),由编码器11#提取背景参考图像的特征,并输入解码器12#中。将彩色图像和红外图像输入编码器13#(第二编码器的一例,也可以理解为第三编码器的一例),由编码器13#提取输入的彩色图像和输入的红外图像的特征,并输入解码器12#中。由解码器12#根据输入的特征重建得到融合后的图像。编码器11#、编码器13#和解码器12#均可以为卷积神经网络。例如,该输入的彩色图像可以为第一彩色图像,输入的红外图像可以为第一红外图像。
可选地,方法800可以应用于视频模式,也就是用于视频融合的场景。
进一步地,可以将第一彩色图像的前一帧的背景参考图像的特征作为第一背景参考图像的特征。也就是说,将其中一帧的背景参考图像的特征复用于多帧彩色图像的图像融合过程中。
如前所述,视频中的不同帧的背景参考图像可以是相同的。例如,将天气晴朗的白天拍摄的彩色图像作为夜间拍摄的彩色图像的背景参考图像。
例如,将帧A、背景参考图像A和红外图像A输入图像融合模型,分别提取帧A、背景参考图像A和红外图像A的特征,然后根据提取的特征重建得到融合后的图像,即帧A的融合结果。将帧A+1和红外图像A+1输入图像融合模型,分别提取帧A+1和红外图像A+1的特征,并将背景参考图像A的特征作为帧A+1的背景参考图像的特征,然后根据提取的特征重建得到融合后的图像,即帧A+1的融合结果。
这样,无需在每次融合的过程中均提取背景参考图像的特征,减少了计算量,能够在保证成像质量的同时减少硬件开销,在设备的计算资源有限的情况下,仍然可以实现图像融合。
示例性地,编码器网络包括M个第一编码器、N个第二编码器,解码器网络包括N个解码器。也就是图像融合模型包括M个第一编码器、N个第二编码器和N个解码器。M为正整数,N为大于1的正整数,N>M。
第一彩色图像可以包括N帧彩色图像,第一红外图像可以包括N帧彩色图像对应的N帧红外图像。
将该N帧彩色图像和N帧红外图像作为图像融合模型的输入,该图像融合模型可以输出N帧彩色图像对应的融合图像,具体包括以下步骤。
(1)分别提取N帧彩色图像的特征和N帧红外图像的特征。
具体地,将N帧彩色图像和N帧红外图像分别输入N个第二编码器中,N个第二编码器分别提取N帧彩色图像的特征和N帧红外图像的特征,并分别输入N个解码器中。
(2)分别提取M个背景参考图像的特征。
具体地,将N帧彩色图像中的M帧彩色图像对应的M个背景参考图像分别输入M个第一编码器中, M个第一编码器分别提取M个背景参考图像的特征,并将M个背景参考图像的特征分别输入N个解码器中,使每个解码器均接收到M个背景参考图像中的一个背景参考图像的特征。
具体地,对于每个解码器,从M帧彩色图像中选择与该解码器接收到的彩色图像的帧数最接近的彩色图像,将该最接近的彩色图像的背景参考图像输入该解码器中。
例如,对于解码器A,步骤(1)中输入该解码器A的特征为帧A的特征和帧A对应的红外图像的特征。若帧A为M帧彩色图像中的一帧图像,则步骤(2)中将帧A的背景参考图像的特征输入解码器A中。若帧A不属于M帧彩色图像,则步骤(2)将M帧彩色图像中与帧A的帧数最接近的一帧彩色图像的背景参考图像的特征输入解码器A中。
(3)根据N帧彩色图像的特征和N帧红外图像的特征以及M个背景参考图像的特征分别重建得到N个融合图像。
具体地,N个解码器分别根据输入N个解码器的特征重建得到N个融合图像,输入N个解码器的特征包括:N帧彩色图像的特征、N帧红外图像的特征和M个背景参考图像的特征。
例如,图像融合模型包括一个第一编码器、两个第二编码器和两个解码器。如图14所示,编码器网络包括编码器21#(第一编码器的一例)、编码器22#和编码器23#(第二编码器的一例),解码器网络包括解码器24#和解码器25#(解码器的一例)。将第i帧的彩色图像和第i帧的红外图像输入编码器22#中,由编码器22#提取第i帧的彩色图像和第i帧的红外图像的特征,并输入解码器24#中。将第i+1帧的彩色图像和第i+1帧的红外图像输入编码器23#中,由编码器23#提取第i+1帧的彩色图像和第i+1帧的红外图像的特征,并输入解码器25#中。将第i帧的背景参考图像输入编码器21#中,由编码器21#提取背景参考图像的特征,并输入解码器24#和解码器25#中。由解码器24#根据编码器22#和编码器21#提取的特征重建得到融合的图像,即第i帧输出图像。由解码器25#根据编码器23#和编码器21#提取的特征重建得到融合的图像,即第i+1帧输出图像。也就是将第i帧的背景参考图像的特征复用于第i帧和第i+1帧的图像融合。这样可以同时对两帧进行图像融合,提高了处理速度,同时无需提取两次背景参考图像的特征,减少了计算量,在效果基本无损的情况下,可以将方案的开销降低25%。
通过增加编码器和解码器的数量,能够同时对多帧图像进行融合,提高处理速度,且复用背景参考图像的特征,减少了背景参考图像的特征的提取过程中的计算量,降低硬件开销。
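其中，步骤(2)里“为每个解码器选择帧数最接近的彩色图像对应的背景参考图像”的选择逻辑可以用如下 Python 片段示意。

```python
def assign_ref_frames(decoder_frame_ids, ref_frame_ids):
    """为每个解码器（按其处理的彩色图像帧号）选择帧号最接近的背景参考图像帧号。"""
    return [min(ref_frame_ids, key=lambda r: abs(r - f)) for f in decoder_frame_ids]

# 图14的例子：第i帧与第i+1帧共用第i帧的背景参考图像（此处取 i=5）
assignment = assign_ref_frames([5, 6], [5])
```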
需要说明的是,步骤S820中的图像融合模型仅为示意,其他能够实现图像融合的模型也可以作为本申请实施例中的图像融合模型。
示例性地,训练过程可以参考前述方法700,此处不再赘述。
需要说明的是,训练过程也可以不采用前述方法700而采用其他训练方法,本申请对此不做限定。
根据本申请实施例的方案,通过将背景参考图像增加为图像融合模型的输入,并基于此训练图像融合模型,能够解决红外图像手电筒效应可能导致的背景模糊问题,很大程度上提升输出图像的背景质量,即同时增强输出图像的前景区域质量和背景区域质量,实现全画面的图像增强。
本申请实施例提出一种图像融合模型的训练方法900,通过增加融合权重调整输出图像,满足不同的应用场景。
方法900可以由图像融合模型的训练装置来执行,该图像融合模型的训练装置可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来执行图像融合模型的训练方法的装置,也可以是由云服务设备和终端设备构成的系统。示例性地,方法900可以由图1中的训练设备120、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
例如,方法900具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法900可以由监控设备上的计算模块执行。
方法900包括步骤S910至步骤S920,下面对步骤S910至步骤S920进行详细说明。
步骤S910,获取第一融合权重、第一彩色图像和第一红外图像。
第一融合权重用于对第一彩色图像和第一红外图像进行加权。
第一融合权重用于调整第一彩色图像和第一红外图像在图像融合模型输出的图像中的融合比例。
示例性地,第一融合权重可以为参数形式,也可以为图像形式,即融合权重图。也就是说,第一融合权重可以以参数的形式输入图像融合模型中,也可以以图像形式输入图像融合模型中。
第一融合权重可以对应全部的图像融合模型输出的图像。即第一融合权重可以为全局权重。
第一融合权重可以对应部分的图像融合模型输出的图像,即第一融合权重可以为局部权重。不同的第一融合权重分别对应输出图像中的不同区域。
第一融合权重可以通过多种方式获得。下面举例说明第一融合权重的获取方式。第一融合权重可以通过以下任意一种方式获得。应理解,以下仅为示例,还可以通过其他方式获取第一融合权重,本申请对此不做限定。
方式1:第一融合权重可以是预先设置的。例如,第一融合权重可以是人为设定的。
方式2:根据红外图像的强度确定第一融合权重。
例如,可以设置多个第一融合权重,多个第一融合权重是根据红外图像不同区域的亮度值确定的。具体地,在红外图像中的亮度越高的区域,对应的第一融合权重越高。由于亮度越高的区域信噪比越高,可以根据红外图像的强度自适应调节权重值,在亮度越高的区域设置更高的权重,有利于使融合图像的质量更高。
该红外图像可以是第一红外图像,也可以是目标红外图像。
方式3:根据彩色图像的信息熵和红外图像的信息熵确定第一融合权重。
多个第一融合权重值是根据彩色图像的信息熵和红外图像的信息熵确定的。
例如,红外图像的区域A处的信息熵大于彩色图像的区域A处的信息熵,则区域A处红外图像对应的权重值较高。红外图像的区域B处的信息熵小于彩色图像的区域B处的信息熵,则区域B处红外图像对应的权重值较低。
该红外图像可以是第一红外图像,也可以是目标红外图像。
该彩色图像可以是第一彩色图像,也可以是目标彩色图像。
信息熵来源包括但不限于梯度信息、对比度信息等。
通常图像的信息熵越大,图像越清楚,通过图像的信息熵自适应调节权重值,得到第一融合权重,使信息熵高的区域对应的权重较高,有利于使融合图像的质量更高。
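按信息熵自适应确定红外权重的思路可以用如下 Python 片段示意，其中以红外信息熵占两者之和的比例作为红外图像的权重。这只是一种假设的归一化方式，原文未限定具体公式。

```python
def ir_weight_from_entropy(vis_entropy, nir_entropy):
    """红外图像信息熵相对越大，对应区域的红外融合权重越高。"""
    total = vis_entropy + nir_entropy
    return nir_entropy / total if total > 0 else 0.5

w_a = ir_weight_from_entropy(1.0, 3.0)  # 区域A：红外熵更大，红外权重更高
w_b = ir_weight_from_entropy(3.0, 1.0)  # 区域B：彩色熵更大，红外权重更低
```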
方式4:根据人脸信息确定第一融合权重。
具体地,在人脸区域处可以设置较高的红外图像的权重值,在人脸区域之外的其他区域设置较低的红外图像的权重值,即彩色图像的权重值较高。
人脸区域的获取方式包括但不限于人脸检测、图像分割或人脸热力图等方式。
该红外图像可以是第一红外图像,也可以是目标红外图像。
红外图像所包含的图像信息较多,即包含更多的纹理信息,因此,通过在人脸区域设置较高的红外图像的权重值,能够使人脸区域融合更多的红外信息,保留更多的细节,提高人脸区域的清晰度,有利于提高人脸识别的准确率。彩色图像中的色彩更真实,因此,通过在其他区域设置较低的红外图像的权重值,能够使其他区域融合更多的彩色信息,保证其他区域色彩的自然程度,使融合图像的效果更自然。
图15示出了一种获取融合权重的方法的示意图。示例性地,该方法可以应用于人脸卡口监控的场景中。
具体地，获取图15的(a)中的人脸区域，根据人脸区域生成权重融合图像，如图15的(b)所示。人脸区域的权重高于其他区域的权重。该权重值用于指示红外图像在融合图像中所占的比重。图15的(b)中的人脸区域的权重为0.6，其他区域的权重为0.1。
示例性地,人脸区域可以为人脸框。例如,图15中的矩形框。可替换地,人脸框还可以其他形状,例如,圆形框或不规则框等。
用于进行人脸检测的图像可以为彩色图像,例如,第一彩色图像,也可以为红外图像,例如,第一红外图像。既可以在彩色图像上进行人脸检测,获得人脸框,也可以在红外图像上进行人脸检测,获得人脸框。例如,图15的(a)中为在红外图像上进行人脸检测。
应理解,图15中仅以融合权重图的形式表示第一融合权重作为示例,不对本申请实施例的方案构成限定,还可以以其他形式表示第一融合权重,例如,以参数值的形式表示第一融合权重。
需要说明的是,图15中仅以第一融合权重指示红外图像的权重值为例进行说明,不对本申请实施例的方案构成限定,第一融合权重还可以用于指示彩色图像的权重值等。
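根据人脸框生成融合权重图的过程可以用如下 Python 片段示意。其中0.6/0.1的取值来自图15的示例；人脸框以 (top, left, bottom, right) 表示，为一种假设的接口形式。

```python
def face_weight_map(height, width, face_box, face_w=0.6, other_w=0.1):
    """生成融合权重图：人脸框内的红外权重为 face_w，其余区域为 other_w。"""
    top, left, bottom, right = face_box
    return [[face_w if top <= i < bottom and left <= j < right else other_w
             for j in range(width)]
            for i in range(height)]

weight_map = face_weight_map(4, 6, (1, 2, 3, 5))  # 4×6 图像，人脸框位于中部
```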
步骤S920,将第一融合权重、第一彩色图像和第一红外图像作为图像融合模型的输入,对图像融合模型进行训练。
示例性地,训练过程可以参考前述方法700,此处不再赘述。
需要说明的是,训练过程也可以不采用前述方法700而采用其他训练方法,本申请对此不做限定。
示例性地,方法800和方法900可以结合使用,也就是将第一融合权重、第一背景参考图像、第一彩色图像和第一红外图像作为图像融合模型的输入,对图像融合模型进行训练。例如,采用图13中的图像融合模型,将第一融合权重、第一红外图像、第一彩色图像输入第二编码器中,将第一背景参考图像输入第一编码器中,对图像融合模型进行训练。
不同应用场景对图像融合的要求可能不同,使用相同的图像融合模型进行融合得到的融合图像无法满足不同应用场景的融合要求,通过引入融合权重,能够调整彩色图像和红外图像的融合比例,有利于应用于不同的应用场景。也就是说无需针对不同的应用场景分别训练多个图像融合模型,仅通过调整融合权重,即可应用于不同的场景,提高了模型使用的自由度。
例如,人脸区域更关心识别率,倾向于融合更多的红外信息,使融合结果更接近红外路,人体区域更关心颜色准确度,倾向于将红外图像作为降噪参考,使融合结果更接近彩色路,提高图像的自然度。根据本申请实施例的方案,根据融合权重对图像中不同位置做不同的融合处理,有利于针对性地提高图像的成像质量。
此外,不同区域的红外图像对图像融合的参考价值不同。通过调整不同位置的红外图像的融合权重可以保证利用红外图像提升前景清晰度的同时,图像的背景信号不退化,即降低红外图像的手电筒效应对背景区域带来的影响。例如,调高前景区域的红外图像的融合权重,使融合后的图像中的前景区域能够融合更多的红外图像的信息;降低背景区域的红外图像的融合权重,使融合后的图像中的背景区域能够融合更多的彩色图像的信息。
本申请实施例提出一种图像融合方法1000的示意性流程图,该方法可以由能够进行图像融合的装置或设备执行,能够进行图像融合的装置可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来执行图像融合方法的装置,也可以是由云服务设备和终端设备构成的系统。示例性地,方法1000可以由图1中的执行设备110、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
方法1000具体可以由如图1所示的执行设备110执行，方法1000中的待处理的彩色图像和红外图像可以是如图1所示的客户设备140给出的输入数据，执行设备110中的预处理模块113可以用来执行方法1000中S1020所述的获取背景参考图像，执行设备110中的预处理模块114可以用来执行方法1000中的S1030所述的获取融合权重，执行设备110中的计算模块111可以用于执行方法1000中的S1040所述的图像融合。
再如,方法1000具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法1000可以由监控设备上的计算模块执行。
可选的,方法1000可以由CPU处理,也可以由CPU和GPU共同处理,也可以不用GPU,而使用其他适合用于神经网络计算的处理器,本申请不做限制。
图像融合方法1000中使用的图像融合模型可以是通过上述图10中的方法构建的。方法1000包括步骤S1010至步骤S1040。方法1000中的具体实现方式可以参照前述方法700,为了避免不必要的重复,下面在介绍方法1000时适当省略重复的描述。
S1010,获取待处理的彩色图像和红外图像。
其中,待处理的彩色图像为场景对可见光的反射形成的影像,红外图像为场景对红外波段的光的反射形成的影像。
红外图像与待处理的彩色图像是针对同一场景拍摄的。同一场景指的是待处理的彩色图像和红外图像之间的相似度大于第一阈值。
或者,可以理解为针对同一场景拍摄指的是红外图像和待处理的彩色图像的画面内容相同。例如,红外图像可以是与待处理的彩色图像在同一时刻对同一区域拍摄的红外图像。获取彩色图像以及对应的红外图像的方式可以参考图6至图8,本申请实施例对此不做限定。
S1020,获取背景参考图像。
其中,背景参考图像与待处理的彩色图像之间的相似度大于第二阈值。
背景参考图像中的背景区域与待处理的彩色图像中的背景区域相同。背景参考图像与待处理的彩色图像之间的相似度大于第二阈值,可以为,背景参考图像的背景区域与待处理的彩色图像的背景区域之间的相似度大于第二阈值。背景参考图像的背景信噪比通常高于待处理的彩色图像的背景信噪比。背景区域可以通过现有技术确定,本申请实施例对此不作限定。
步骤S1020为可选步骤。
具体的背景参考图像的获取方法可以参考前文中的方法800。
示例性地,背景参考图像可以以彩色图的形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。例如,直接将背景参考图像输入图像融合模型。再如,可以将背景参考图像的亮度通道输入图像融合模型。
S1030,获取融合权重。
其中,融合权重用于对待处理的彩色图像和红外图像进行加权。
也就是说,融合权重用于调整待处理的彩色图像和红外图像在融合图像中的融合比例。或者说,融合权重用于调整融合图像中所包含的待处理的彩色图像的信息量和红外图像的信息量的比例。
步骤S1030为可选步骤。
具体的融合权重的获取方式可以参考前文中的方法900。
可选地,融合权重可以为全局权重。
例如,当红外图像对应的权重较大时,融合图像中所包含的红外图像的信息较多,即融合图像与红外图像更相似。当待处理的彩色图像对应的权重较大时,融合图像中所包含的待处理的彩色图像的信息较多,即融合图像与待参考的彩色图像更相似。
可选地,融合权重对应全部的融合图像。
其中,融合权重对应全部的融合图像可以理解为,在整个融合图像中,融合权重仅有一个。在该融合图像中的任意区域中,待处理的彩色图像和红外图像的融合比例是相同的。该融合权重可以称为全局权重。
可选地，融合权重对应部分的融合图像。
融合权重对应部分的融合图像可以理解为，融合权重对应融合图像中的一个区域。在该情况下，融合权重的数量可以为多个，多个融合权重分别对应融合图像中的不同区域。该融合权重可以称为局部权重。
例如,在区域A处的红外图像对应的权重较大,区域B处的红外图像对应的权重较小。在融合图像中,区域A处包含的红外图像的信息较多,区域B处包含的待处理的彩色图像的信息较多。即区域A处与红外图像中的区域A处更相似,区域B处与待处理的彩色图像中的区域B处更相似。
示例性地,融合权重可以以参数形式输入图像融合模型中,也可以以融合权重图的形式输入图像融合模型中,本申请对此不做限定。
融合权重图中的值可以用于指示融合权重。例如,在设置多个融合权重的情况下,融合权重图中的不同区域的值可以用于表示对应融合图像中不同区域的多个融合权重。
以融合权重图的形式表示融合权重,能够降低调整融合权重的复杂度。在设置多个融合权重的情况下,通过融合权重图更有利于表示多个融合权重对应的区域。尤其在多个融合权重对应的区域为不规则的形状的情况下,融合权重图的形式更有利于表示多个融合权重对应的不同区域。
S1040,将待处理的彩色图像和红外图像输入图像融合模型中进行特征提取,基于提取的特征进行图像融合,以得到融合图像。
其中,图像融合模型是通过以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练得到的。
损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比。同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值。
可选地,目标融合图像为亮度通道的图像,图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
可选地,损失函数还包括第二损失函数,第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
在方法1000包括步骤S1020的情况下,步骤S1040还包括:将背景参考图像输入图像融合模型中,以执行图像融合。
在方法1000包括步骤S1030的情况下,步骤S1040还包括:将融合权重输入图像融合模型中,以执行图像融合。
具体的训练过程可以参见方法700中的S740,此处不再赘述。
根据本申请实施例的方案,通过目标彩色图像和目标红外图像确定目标融合图像,基于目标融合图像训练图像融合模型,使得图像融合模型能够充分利用红外信息,有利于在输出图像中融合更多的纹理信息,保留更多的图像细节。
图18示出了采用方法1000进行图像融合的效果示意图,图18的(a)为低照度场景下得到的彩色图像,该场景的照度为0.2Lux。如图所示,彩色图像的信噪比较差,人脸区域较为模糊,几乎无法识别。图18的(b)为该彩色图像对应的红外图像,通过近红外补光,可以获得清晰度较高的红外图像。如图所示,该近红外图像中的人脸和人体的清晰度较高。但该红外图像不存在色彩信息,且由于补光灯的手电筒效应,导致背景区域的信号几乎为0。图18的(c)为采用方法1000获得的融合图像。如图所示,该融合图像充分结合了彩色图像和近红外图像的优点,在低照度的场景下,提高了融合图像的成像质量。
图19示出了采用不同方法进行图像融合的效果对比图。图19的(a)为待处理的彩色图像，信噪比较差，人脸区域较为模糊，几乎不可识别。且由于噪声较大，图像白平衡参数估计出现一定误差，导致图像存在一定偏黄问题（白衣服偏黄）。图19的(b)为采用传统的亮度融合方案得到的融合图像，该方案可以提高人脸部分的信噪比，但如图中箭头处所示，该方案导致人体区域的色彩失真，深色裤子出现偏灰色的颜色错误。图19的(c)为采用方法1000得到的融合图像，该融合图像中的人脸区域清晰度得到提高的同时，保持了人体区域的真实颜色。
表1示出了对不同方法得到融合图像进行人脸识别的测试结果。
具体地,测量照度范围为0.2Lux-5Lux范围内1424张融合图像中的人脸区域与标准证件照之间的相似度,统计相似度大于0.85时人脸召回情况。
表1
成像模式　　　测试目标数　相似度>0.85的人脸数　相似度>0.85的召回率
近红外　　　　1424　　　　425　　　　　　　　　29.85%
传统融合　　　1424　　　　260　　　　　　　　　18.26%
本申请的方案　1424　　　　465　　　　　　　　　32.65%
如表1所示,传统的亮度融合方案得到的融合图像中,人脸识别效果最差,本申请实施例中的方案得到的融合图像综合了红外图像和彩色图像的优点,人脸识别效果最好。
可以理解方法700为该图像融合模型的训练阶段(如图1所示的训练设备120执行的阶段),具体训练是采用由方法700中提供的图像融合模型进行的;而方法1000则可以理解为是该图像融合模型的应用阶段(如图1所示的执行设备110执行的阶段),具体可以体现为采用由方法700训练得到的图像融合模型,并根据用户输入的待处理的彩色图像和红外图像,从而得到输出图像,即融合图像。
图16示出了本申请实施例提出的另一种图像融合方法1100,通过增加背景参考图像作为图像融合模型的输入,提高成像质量。
方法1100可以由图像融合装置来执行,该图像融合装置可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来执行图像融合方法的装置,也可以是由云服务设备和终端设备构成的系统。示例性地,方法1100可以由图1中的执行设备110、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
例如,方法1100具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法1100可以由监控设备上的计算模块执行。
示例性地,图像融合方法1100中使用的图像融合模型可以是通过上述方法800构建的。方法1100包括步骤S1110至步骤S1120。方法1100中的具体实现方式可以参照前述方法800,为了避免不必要的重复,下面在介绍方法1100时适当省略重复的描述。
步骤S1110,获取待处理的彩色图像、红外图像和背景参考图像。
背景参考图像与待处理的彩色图像之间的相似度大于第二阈值。
示例性地,背景参考图像中的背景区域与待处理的彩色图像中的背景区域相同。背景参考图像与待处理的彩色图像之间的相似度大于第二阈值,可以为,背景参考图像的背景区域与待处理的彩色图像的背景区域之间的相似度大于第二阈值。背景参考图像的背景信噪比高于待处理的彩色图像的背景信噪比。
待处理的彩色图像和红外图像是针对同一场景拍摄的。同一场景指的是待处理的彩色图像和红外图像之间的相似度大于第一阈值。
本申请实施例中的相似度可以为图像纹理相似度。例如,待处理的彩色图像和红外图像之间的相似度可以为待处理的彩色图像和红外图像之间的图像纹理相似度;背景参考图像与待处理的彩色图像之间的相似度可以为背景参考图像与待处理的彩色图像之间的图像纹理相似度。
示例性地,背景参考图像可以为彩色图,也可以为灰度图。也就是说,背景参考图像可以以彩色图的 形式输入图像融合模型,也可以以灰度图的形式输入图像融合模型。
背景参考图像可以通过多种方式获得。
方法1100可以应用于视频模式,也就是用于视频融合的场景。示例性地,在该情况下,背景参考图像可以通过以下任一种方式获取。
例如,根据待处理的彩色图像之前的多帧彩色图像确定背景参考图像。
具体地,当前输入图像融合模型的待处理的彩色图像作为目标帧,对目标帧之前的多帧彩色图像进行累积,获得累积帧,将累积帧作为目标帧的背景参考图像。该累积帧的背景区域信噪比较好,前景区域可能存在运动模糊。
比如,计算目标帧之前的n帧彩色图像的平均值得到累积帧,该累积帧即为目标帧的背景参考图像。n为大于1的整数。n的值越大,则得到背景参考图像中的背景区域越清晰。
目标帧的背景参考图像Ref_cur可以满足如下公式。
Ref_cur=(1/n)∑_{i=cur-n}^{cur-1}Frame_i；
其中，Frame_i表示第i帧彩色图像，cur表示当前帧数，即目标帧为第cur帧。
可替换地,采用递归的方式生成累积帧。
也可以理解为,根据目标帧之前的帧的背景参考图像确定目标帧的背景参考图像。
比如，目标帧的背景参考图像Ref_cur可以满足如下公式。
Ref_cur=weight×Ref_old+(1-weight)×Vis_cur；
其中，Ref_old表示目标帧之前的帧的背景参考图像，或者说是目标帧之前的帧对应的累积帧；Vis_cur表示当前获取的彩色图像，即目标帧；weight表示累积权重。累积权重越大，背景参考图像的背景信噪比越高，运动拖影也越明显。图像融合模型可以较好地抑制运动模糊的问题，因此，可以将累积权重设置得高一点，以对背景提升产生更好的效果。例如，将累积权重设为0.9。
通过递归的方式生成背景参考图像可以减少图像缓存,降低存储压力。
虽然通过累积帧的方式得到背景参考图像的前景区域存在运动模糊,但该第一背景参考图像和第一红外图像存在较好的互补效应,不会影响前景区域的画面质量。
可选地,将待处理的彩色图像之前的长曝光帧作为背景参考图像。长曝光帧为在曝光时长大于第三阈值的情况下得到的帧。
再如,背景参考图像为待处理的彩色图像之前的帧的背景参考图像。
这样可以复用之前的背景参考图像的特征,减少计算量。
再如,待处理的彩色图像可以理解为目标帧,将对目标帧的前一帧的融合图像作为目标帧的背景参考图像。也就是将图像融合模型输出的对目标帧的前一帧的处理结果作为目标帧的背景参考图像。
例如,将帧A作为待处理的彩色图像输入图像融合模型,得到融合后的帧A,将融合后的帧A作为帧A+1的背景参考图像,然后将帧A+1作为待处理的彩色图像、融合后的帧A作为帧A+1的背景参考图像输入图像融合模型。
再如,将待处理的彩色图像进行时域降噪后的结果作为背景参考图像。
由于大部分监控场景下,背景变化较少,画面的背景部分存在较高的相似度。对于监控设备位置不变的场景,可以将照度高的情况下拍摄的彩色图像作为背景参考图像。例如,将天气晴朗的白天拍摄的彩色图像作为夜间拍摄的彩色图像的背景参考图像。
以上仅为示例,其他获取方式可以参考前述方法800中的步骤S810。应理解,以上获取背景参考图像的方式仅为示例,本申请对此不做限定。
S1120,将待处理的彩色图像、红外图像和背景参考图像输入图像融合模型中进行特征提取,基于提取的特征进行图像融合,得到融合图像。
示例性地,图像融合模型可以参考方法800中的步骤S820。
例如,如图13所示,将背景参考图像输入融合模型中的编码器11#(第一编码器的一例),由编码器11#提取背景参考图像的特征,并输入解码器12#中。将的彩色图像和红外图像输入编码器13#(第二编码器的一例,也可以理解为第三编码器的一例),由编码器13#提取输入的彩色图像和输入的红外图像的特征,并输入解码器12#中。由解码器12#根据输入的特征重建得到融合后的图像。编码器11#、编码器13#和解码器12#均可以为卷积神经网络。例如,该输入的彩色图像可以为待处理的彩色图像,输入的红外图像可以为红外图像。
可选地,方法1100可以应用于视频模式,也就是用于视频融合的场景。
进一步地,可以将待处理的彩色图像的前一帧的背景参考图像的特征作为背景参考图像的特征。也就是说,将其中一帧的背景参考图像的特征复用于多帧彩色图像的图像融合过程中。
例如,将帧A、背景参考图像A和红外图像A输入图像融合模型,分别提取帧A、背景参考图像A和红外图像A的特征,然后根据提取的特征重建得到融合后的图像,即帧A对应的融合结果。将帧A+1和红外图像A+1输入图像融合模型,分别提取帧A+1和红外图像A+1的特征,并将背景参考图像A的特征作为帧A+1的背景参考图像的特征,然后根据提取的特征重建得到融合后的图像,即帧A+1对应的融合结果。
这样,无需在每次融合的过程中均提取背景参考图像的特征,减少了计算量,能够在保证成像质量的同时减少硬件开销,在设备的计算资源有限的情况下,仍然可以实现图像融合。
示例性地,编码器网络包括M个第一编码器、N个第二编码器,解码器网络包括N个解码器。也就是图像融合模型包括M个第一编码器、N个第二编码器和N个解码器。M为正整数,N为大于1的正整数,N>M。
待处理的彩色图像可以包括N帧彩色图像,红外图像包括该N帧彩色图像对应的N帧红外图像。
示例性地,N帧彩色图像对应的N帧红外图像可以是与N帧彩色图像在同一时刻对同一区域进行拍摄的情况下获得的。也就是说N帧彩色图像与N帧红外图像是一一对应的。
可选地,步骤S1120包括:
(1)分别提取N帧彩色图像的特征和N帧红外图像的特征。
具体地,将N帧彩色图像和N帧红外图像分别输入N个第二编码器中,N个第二编码器分别提取N帧彩色图像的特征和N帧红外图像的特征,并分别输入N个解码器中;
(2)分别提取M个背景参考图像的特征;
具体地,将N帧彩色图像中的M帧彩色图像对应的M个背景参考图像分别输入M个第一编码器中,M个第一编码器分别提取M个背景参考图像的特征,并将M个背景参考图像的特征分别输入N个解码器中,每个解码器接收M个背景参考图像中的一个背景参考图像的特征;
(3)根据N帧彩色图像的特征和N帧红外图像的特征以及M个背景参考图像的特征分别重建得到N个融合图像。
具体地,N个解码器分别根据输入N个解码器的特征重建得到N个融合图像,输入N个解码器的特征包括:N帧彩色图像的特征、N帧红外图像的特征和M个背景参考图像的特征。
具体地,对于每个解码器,从M帧彩色图像中选择与该解码器接收到的彩色图像的帧数最接近的彩色图像,将该最接近的彩色图像对应的背景参考图像输入该解码器中。
例如，对于解码器A，输入该解码器A的特征为帧A的特征和帧A对应的红外图像的特征。若帧A为M帧彩色图像中的一帧图像，将帧A的背景参考图像的特征输入解码器A中。若帧A不属于M帧彩色图像，将M帧彩色图像中与帧A的帧数最接近的一帧彩色图像的背景参考图像的特征输入解码器A中。
例如,如图14所示,将第i帧的背景参考图像的特征复用于第i帧和第i+1帧的图像融合。这样可以同时对两帧进行融合,提高了处理速度,同时无需提取两次背景参考图像的特征,减少了计算量,在效果基本无损的情况下,可以将方案的开销降低25%。
可以理解方法800为该图像融合模型的训练阶段,具体训练采用由方法800中提供的图像融合模型进行的,方法1100可以理解为图像融合模型的应用阶段,具体可以体现为采用由方法800训练得到的图像融合模型,并根据用户输入的待处理的彩色图像和红外图像,得到输出图像,即方法1100中的融合图像。
需要说明的是,方法1100可以采用方法800训练得到的图像融合模型,也可以不采用由方法800训练得到的图像融合模型。
根据本申请实施例的方案,通过将背景参考图像增加为图像融合模型的输入,能够解决红外图像手电筒效应可能导致的背景模糊问题,很大程度上提升输出图像的背景质量,即同时增强输出图像的前景区域质量和背景区域质量,实现全画面的图像增强。
本申请实施例提出一种图像融合方法1200,通过增加融合权重调整输出图像,满足不同的应用场景。
方法1200可以由图像融合装置来执行,该图像融合装置可以是云服务设备,也可以是终端设备,例如,电脑、服务器等运算能力足以用来执行图像融合方法的装置,也可以是由云服务设备和终端设备构成的系统。示例性地,方法1200可以由图1中的执行设备110、图3中的神经网络处理器30或图4中的执行设备410或本地设备执行。
例如,方法1200具体可以由如图4中的本地设备执行,该本地设备可以为监控设备。具体地,方法1200可以由监控设备上的计算模块执行。
示例性地,图像融合方法1200中使用的图像融合模型可以是通过上述方法900构建的。方法1200包括步骤S1210至步骤S1220。方法1200中的具体实现方式可以参照前述方法900,为了避免不必要的重复,下面在介绍方法1200时适当省略重复的描述。
S1210,获取待处理的彩色图像、红外图像和融合权重。
融合权重用于对待处理的彩色图像和红外图像进行加权。
也就是说,融合权重用于调整待处理的彩色图像和红外图像在融合图像中的融合比例。
示例性地,融合权重可以为参数形式,也可以为图像形式,即融合权重图。也就是说,融合权重可以以参数的形式输入图像融合模型中,也可以以图像形式输入图像融合模型中。
可选地,融合权重对应全部的融合图像。
其中,融合权重对应全部的融合图像可以理解为,在整个融合图像中,融合权重仅有一个。在该融合图像中的任意区域中,待处理的彩色图像和红外图像的融合比例是相同的。该融合权重可以称为全局权重。
可选地，融合权重对应部分的融合图像。
融合权重对应部分融合图像可以理解为,融合权重对应融合图像中的一个区域。在该情况下,融合权重的数量可以为多个,多个融合权重分别对应融合图像中的不同区域。该融合权重可以称为局部权重。
示例性地,融合权重大于等于0,且小于等于1,红外图像在融合图像中的比例与融合权重呈正相关关系。
也就是说,融合权重的取值范围为[0,1],融合权重可以用于指示红外图像在融合图像中所占的比重。融合权重越大,红外图像在融合图像中的比例越大,即融合图像中融合的红外信息越多。
融合权重可以通过多种方式获得。下面举例说明融合权重的获取方式。融合权重可以通过以下任意一种方式获得。应理解,以下仅为示例,还可以通过其他方式获取融合权重,本申请对此不做限定。
方式1:融合权重可以是预先设置的。例如,融合权重可以是人为设定的。
方式2:根据红外图像的强度确定融合权重。
例如,多个融合权重是根据红外图像不同区域的亮度值确定的。具体地,在红外图像中的亮度越高的区域,对应的融合权重越高。由于亮度越高的区域信噪比越高,可以根据红外图像的强度自适应调节权重值,在亮度越高的区域设置更高的权重,有利于使融合图像的质量更高。
方式3:根据待处理的彩色图像的信息熵和红外图像的信息熵确定融合权重。
多个融合权重是根据待处理的彩色图像的信息熵和红外图像的信息熵确定的。
例如,红外图像的区域A处的信息熵大于待处理的彩色图像的区域A处的信息熵,则区域A处红外图像对应的权重值较高。红外图像的区域B处的信息熵小于待处理的彩色图像的区域B处的信息熵,则区域B处红外图像对应的权重值较低。
信息熵来源包括但不限于梯度信息、对比度信息等。
通常图像的信息熵越大,图像越清楚,通过图像的信息熵自适应调节权重值,得到融合权重,使信息熵高的区域对应的权重较高,有利于使融合图像的质量更高。
方式4:根据人脸信息确定融合权重。
具体地,在人脸区域处可以设置较高的红外图像的权重值,在人脸区域之外的其他区域设置较低的红外图像的权重值,即待处理的彩色图像的权重值较高。
人脸区域的获取方式包括但不限于人脸检测、图像分割或人脸热力图等方式。
红外图像所包含的图像信息较多，即包含更多的纹理信息，因此，通过在人脸区域设置较高的红外图像的权重值，能够使人脸区域融合更多的红外信息，保留更多的细节，提高人脸区域的清晰度，有利于提高人脸识别的准确率。待处理的彩色图像中的色彩更真实，因此，通过在其他区域设置较低的红外图像的权重值，能够使其他区域融合更多的彩色信息，保证其他区域色彩的自然程度，使融合图像的效果更自然。
例如,如图15所示,获取图15的(a)中的人脸区域,根据人脸区域生成权重融合图,如图15的(b)所示。人脸区域的权重高于其他区域的权重。该权重值用于指示红外图像在融合图像中所占的比重。图15的(b)中的人脸区域的权重为0.6,其他区域的权重为0.1。
示例性地,人脸区域可以为人脸框。例如,图15中的矩形框。可替换地,人脸框还可以其他形状,例如,圆形框或不规则框等。
用于进行人脸检测的图像可以为待处理的彩色图像,也可以为红外图像。既可以在彩色图像上进行人脸检测,获得人脸框,也可以在红外图像上进行人脸检测,获得人脸框。例如,图15的(a)中为在红外图像上进行人脸检测。
应理解,图15中仅以融合权重图的形式表示融合权重作为示例,不对本申请实施例的方案构成限定,还可以以其他形式表示融合权重,例如,以参数值的形式表示融合权重。
需要说明的是,图15中仅以融合权重指示红外图像的权重值为例进行说明,不对本申请实施例的方案构成限定,融合权重还可以用于指示彩色图像的权重值等。
以上仅为示例,其他获取方式可以参考前述方法900中的步骤S910。应理解,以上获取融合权重的方式仅为示例,本申请对此不做限定。
S1220,将待处理的彩色图像、红外图像和融合权重输入图像融合模型中进行特征提取,基于提取的特征进行图像融合,得到融合图像。
该图像融合模型可以采用方法900训练得到图像融合模型。
图17示出了采用不同的融合权重得到的融合图像。图17的(a)采用全局权重，红外图像对应的权重值为0.1，该融合图像类似于对彩色图像进行降噪的结果，融合的红外信息较少，画面的清晰度较低，尤其是人脸区域较为模糊。图17的(b)采用全局权重，红外图像对应的权重值为0.6，该融合图像的清晰度较高，人脸区域的清晰度提高，有利于进行人脸识别等后续处理，但如图中的箭头处所示，人体区域融合的纹理信息较多，导致人体区域油画感较重，图像自然程度低。图17的(c)采用图15的(b)所示的融合权重，即人脸区域红外图像对应的权重值为0.6，其他区域红外图像对应的权重值为0.1。该融合图像的人脸清晰度较高，同时保证了其他区域的自然程度。
可以理解方法900为该图像融合模型的训练阶段,具体训练采用由方法900中提供的图像融合模型进行的,方法1200可以理解为图像融合模型的应用阶段,具体可以体现为采用由方法900训练得到的图像融合模型,并根据用户输入的待处理的彩色图像和红外图像,得到输出图像,即方法1200中的融合图像。
需要说明的是,方法1200可以采用方法900训练得到的图像融合模型,也可以不采用由方法900训练得到的图像融合模型。
示例性地,方法1100和方法1200可以结合使用,也就是将融合权重、红外图像、待处理的彩色图像和背景参考图像输入图像融合模型中进行图像融合,得到融合图像。例如,采用图13中的图像融合模型,将融合权重图、红外图像、待处理的彩色图像输入第二编码器中,将背景参考图像输入第一编码器中,以执行图像融合。
使用相同的图像融合模型进行融合得到的融合图像无法满足不同应用场景的融合要求,根据本申请实施例的方案,引入融合权重,通过调整融合权重,能够调整彩色图像和红外图像的融合比例,有利于应用于不同的应用场景。也就是说无需针对不同的应用场景分别训练多个图像融合模型,仅通过调整融合权重,即可应用于不同的场景,提高了模型使用的自由度。
此外,根据本申请实施例的方案,不同区域对应不同的融合权重,以满足同一图像中的不同区域对图像融合的要求,有利于提高输出图像的图像质量。
下面结合图20至图23对本申请实施例的装置进行说明。应理解,下面描述的装置能够执行前述本申请实施例的方法,为了避免不必要的重复,下面在介绍本申请实施例的装置时适当省略重复的描述。
图20是本申请实施例的图像融合模型的训练装置的示意性框图。图20所示的图像融合模型的训练装置2000包括获取单元2010和处理单元2020。
获取单元2010和处理单元2020可以用于执行本申请实施例的图像融合模型的训练方法700、方法800或方法900。
示例性地,获取单元2010,用于获取至少一个训练样本,训练样本包括第一彩色图像、第一红外图像、目标彩色图像和目标红外图像,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值,第一彩色图像和目标彩色图像为场景对可见光的反射形成的影像,第一红外图像和目标红外图像为场景对红外波段的光的反射形成的影像;目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比。处理单元2020用于:以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型;其中,损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的。
可选地,作为一个实施例,处理单元2020具体用于:以第一融合权重、第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,第一融合权重用于对第一彩色图像和第一红外图像进行加权,目标融合图像是根据第一融合权重、目标彩色图像和目标红外图像确定的。
可选地,作为一个实施例,第一融合权重对应部分或全部的图像融合模型输出的图像。
可选地,作为一个实施例,处理单元2020还用于以第一背景参考图、第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练,得到训练好的图像融合模型,第一背景参考图像与第一彩色图像之间的相似度大于第二阈值。
可选地,作为一个实施例,损失函数还包括第二损失函数,第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
可选地,作为一个实施例,目标融合图像为亮度通道的图像,图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
图21是本申请实施例的图像融合装置的示意性框图。图21所示的图像融合装置3000包括获取单元3010和处理单元3020。
获取单元3010和处理单元3020可以用于执行本申请实施例的图像融合方法1000、方法1100或方法1200。
示例性地,获取单元3010用于获取待处理的彩色图像、红外图像以及背景参考图像,红外图像和待处理的彩色图像是针对同一场景拍摄的,同一场景指的是待处理的彩色图像和红外图像之间的相似度大于第一阈值;待处理的彩色图像为场景对可见光的反射形成的影像,红外图像为场景对红外波段的光的反射形成的影像。处理单元3020用于将待处理的彩色图像、红外图像和背景参考图像输入训练好的图像融合模型中进行特征提取,基于提取的特征进行图像融合,以得到融合图像;其中,背景参考图像与待处理的彩色图像之间的相似度大于第二阈值。
可选地,作为一个实施例,处理单元3020还用于:获取融合权重,将融合权重输入图像融合模型中;其中,融合权重用于对待处理的彩色图像和红外图像进行加权。
可选地,作为一个实施例,融合权重对应部分或全部的融合图像。
可选地,作为一个实施例,待处理的彩色图像包括N帧彩色图像,红外图像包括N帧彩色图像对应的N帧红外图像,N帧彩色图像对应的背景参考图像是根据N帧彩色图像中的M帧彩色图像的背景参考图像确定的,M为正整数,N为大于1的正整数,N>M。
可选地,作为一个实施例,图像融合模型包括M个第一编码器、N个第二编码器和N个解码器,以及处理单元3020具体用于:分别提取N帧彩色图像的特征和N帧红外图像的特征;分别提取M帧彩色图像对应的M个背景参考图像的特征;根据N帧彩色图像的特征和N帧红外图像的特征以及M个背景参考图像的特征分别重建得到N个融合图像。
可选地,作为一个实施例,背景参考图像是通过以下任一方式获得的:根据待处理的彩色图像之前的多帧得到背景参考图像;将待处理的彩色图像之前的长曝光帧作为背景参考图像,长曝光帧为在曝光时长大于第三阈值的情况下得到的帧;将待处理的彩色图像进行时域降噪后的结果作为背景参考图像;或者将待处理的彩色图像之前的帧对应的融合图像作为背景参考图像。
可选地,作为一个实施例,训练好的图像融合模型是通过以第一彩色图像和第一红外图像作为图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练得到的;损失函数包括第一损失函数,第一损失函数用于指示图像融合模型输出的图像与目标融合图像之间的差异,目标融合图像是根据目标彩色图像和目标红外图像确定的,第一彩色图像、第一红外图像、目标彩色图像和目标红外图像是针对同一场景拍摄的,同一场景指的是第一彩色图像、第一红外图像、目标彩色图像和目标红外图像中的任意两个图像之间的相似度大于第一阈值,目标彩色图像的信噪比高于第一彩色图像的信噪比,目标红外图像的信噪比高于第一红外图像的信噪比。
可选地,作为一个实施例,损失函数还包括第二损失函数,第二损失函数用于指示目标彩色图像与图像融合模型输出的图像之间的差异。
可选地,作为一个实施例,目标融合图像为亮度通道的图像,图像融合模型输出的图像与目标融合图像之间的差异为图像融合模型输出的图像的亮度通道与目标融合图像之间的差异。
需要说明的是,上述装置2000和装置3000以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
因此,在本申请的实施例中描述的各示例的单元,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
图22是本申请实施例提供的一种图像融合模型的训练装置的硬件结构示意图。图22所示的图像融合模型的训练装置4000(该装置4000具体可以是一种计算机设备)包括存储器4001、处理器4002、通信接口4003以及总线4004。其中,存储器4001、处理器4002、通信接口4003通过总线4004实现彼此之间的通信连接。
存储器4001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器4001可以存储程序,当存储器4001中存储的程序被处理器4002执行时,处理器4002和通信接口4003用于执行本申请实施例中的图像融合模型的训练方法的各个步骤。具体地,处理器4002可以执行上文中方法700、方法800或方法900。
处理器4002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的图像融合模型的训练装置中的单元所需执行的功能,或者执行本申请方法实施例的图像融合模型的训练方法。
处理器4002还可以是一种集成电路芯片,具有信号的处理能力。例如,可以是图3所示的芯片。在实现过程中,本申请的图像融合模型的训练方法的各个步骤可以通过处理器4002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述的处理器4002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器4001,处理器4002读取存储器4001中的信息,结合其硬件完成本申请实施例的图像融合模型的训练装置中包括的单元所需执行的功能,或者执行本申请方法实施例的图像融合模型的训练方法。
通信接口4003使用例如但不限于收发器一类的收发装置,来实现装置4000与其他设备或通信网络之间的通信。例如,可以通过通信接口4003获取训练数据(如方法700中的第一彩色图像、第一红外图像、目标彩色图像和目标红外图像)。
总线4004可包括在装置4000各个部件(例如,存储器4001、处理器4002、通信接口4003)之间传送信息的通路。
应理解,图像融合模型的训练装置2000中的获取单元2010相当于图像融合模型的训练装置4000中的通信接口4003,处理单元2020可以相当于处理器4002。
图23是本申请实施例提供的图像融合装置的硬件结构示意图。图23所示的图像融合装置5000（该装置5000具体可以是一种计算机设备）包括存储器5001、处理器5002、通信接口5003以及总线5004。其中，存储器5001、处理器5002、通信接口5003通过总线5004实现彼此之间的通信连接。
存储器5001可以是ROM,静态存储设备,动态存储设备或者RAM。存储器5001可以存储程序,当存储器5001中存储的程序被处理器5002执行时,处理器5002和通信接口5003用于执行本申请实施例的图像融合方法的各个步骤。
处理器5002可以采用通用的CPU,微处理器,ASIC,GPU或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的图像融合装置中的单元所需执行的功能,或者执行本申请方法实施例的图像融合方法。
处理器5002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的图像融合方法的各个步骤可以通过处理器5002中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器5002还可以是通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器5001,处理器5002读取存储器5001中的信息,结合其硬件完成本申请实施例的图像融合装置中包括的单元所需执行的功能,或者执行本申请方法实施例的图像融合方法。
通信接口5003使用例如但不限于收发器一类的收发装置,来实现装置5000与其他设备或通信网络之间的通信。例如,可以通过通信接口5003获取输入数据(如本申请实施例中的待处理的彩色图像和红外图像)。
总线5004可包括在装置5000各个部件(例如,存储器5001、处理器5002、通信接口5003)之间传送信息的通路。
应理解,图像融合装置3000中的获取单元3010相当于图像融合装置5000中的通信接口5003;图像融合装置3000中的处理单元3020可以相当于处理器5002。
应注意,尽管图22和图23所示的装置4000和5000仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,装置4000和5000还包括实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当理解,装置4000和5000还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,装置4000和5000也可仅仅包括实现本申请实施例所必须的器件,而不必包括图22或图23中所示的全部器件。
可以理解，装置4000相当于图1中的训练设备120，装置5000相当于图1中的执行设备110。本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者 也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:通用串行总线闪存盘(USB flash disk,UFD),UFD也可以简称为U盘或者优盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (34)

  1. 一种图像融合方法,其特征在于,包括:
    获取待处理的彩色图像、红外图像以及背景参考图像,所述红外图像和所述待处理的彩色图像是针对同一场景拍摄的,所述同一场景指的是所述待处理的彩色图像和所述红外图像之间的相似度大于第一阈值;所述待处理的彩色图像为所述场景对可见光的反射形成的影像,所述红外图像为所述场景对红外波段的光的反射形成的影像,所述背景参考图像与所述待处理的彩色图像之间的相似度大于第二阈值;
    将所述待处理的彩色图像、所述红外图像和所述背景参考图像输入训练好的图像融合模型中进行特征提取,基于提取的特征进行图像融合,以得到融合图像。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取融合权重;
    将所述融合权重输入所述训练好的图像融合模型中;其中,所述融合权重用于对所述待处理的彩色图像和所述红外图像进行加权。
  3. 根据权利要求2所述的方法,其特征在于,所述融合权重对应部分或全部的所述融合图像。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述待处理的彩色图像包括N帧彩色图像,所述红外图像包括所述N帧彩色图像对应的N帧红外图像,所述N帧彩色图像对应的背景参考图像是根据所述N帧彩色图像中的M帧彩色图像的背景参考图像确定的,M为正整数,N为大于1的正整数,N>M。
  5. 根据权利要求4所述的方法,其特征在于,所述将所述待处理的彩色图像、所述红外图像和所述背景参考图像输入训练好的图像融合模型中进行特征提取,基于提取的特征进行图像融合,以得到融合图像,包括:
    分别提取所述N帧彩色图像的特征和所述N帧红外图像的特征;
    分别提取所述M帧彩色图像对应的M个背景参考图像的特征;
    根据所述N帧彩色图像的特征和所述N帧红外图像的特征以及所述M个背景参考图像的特征分别重建得到N个融合图像。
  6. 根据权利要求1至3中任一项所述的方法,其特征在于,所述背景参考图像是通过以下任一方式获得的:
    根据所述待处理的彩色图像之前的多帧得到所述背景参考图像;
    将所述待处理的彩色图像之前的长曝光帧作为所述背景参考图像,所述长曝光帧为在曝光时长大于第三阈值的情况下得到的帧;
    将所述待处理的彩色图像进行时域降噪后的结果作为所述背景参考图像;或者
    将所述待处理的彩色图像之前的帧的融合图像作为所述背景参考图像。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述训练好的图像融合模型是通过以第一彩色图像和第一红外图像作为所述图像融合模型的输入,以损失函数的值小于第四阈值为目标对图像融合模型进行训练得到的;
    所述损失函数包括第一损失函数,所述第一损失函数用于指示所述图像融合模型输出的图像与目标融合图像之间的差异,所述目标融合图像是根据目标彩色图像和目标红外图像确定的,所述第一彩色图像、所述第一红外图像、所述目标彩色图像和所述目标红外图像是针对所述同一场景拍摄的,所述同一场景指的是所述第一彩色图像、所述第一红外图像、所述目标彩色图像和所述目标红外图像中的任意两个图像之间的相似度大于所述第一阈值,所述目标彩色图像的信噪比高于所述第一彩色图像的信噪比,所述目标红外图像的信噪比高于所述第一红外图像的信噪比。
  8. 根据权利要求7所述的方法，其特征在于，所述损失函数还包括第二损失函数，所述第二损失函数用于指示所述目标彩色图像与所述图像融合模型输出的图像之间的差异。
  9. 根据权利要求7或8所述的方法,其特征在于,所述目标融合图像为亮度通道的图像,所述图像融合模型输出的图像与目标融合图像之间的差异为所述图像融合模型输出的图像的亮度通道与所述目标融合图像之间的差异。
  10. A training method for an image fusion model, comprising:
    obtaining at least one training sample, the training sample comprising a first color image, a first infrared image, a target color image, and a target infrared image, wherein the first color image, the first infrared image, the target color image, and the target infrared image are captured for a same scene, the same scene meaning that a similarity between any two of the first color image, the first infrared image, the target color image, and the target infrared image is greater than a first threshold; the first color image and the target color image are images formed by the scene's reflection of visible light, and the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band; a signal-to-noise ratio of the target color image is higher than a signal-to-noise ratio of the first color image, and a signal-to-noise ratio of the target infrared image is higher than a signal-to-noise ratio of the first infrared image; and
    training the image fusion model with the first color image and the first infrared image as inputs of the image fusion model and with the goal of making a value of a loss function less than a fourth threshold, to obtain a trained image fusion model;
    wherein the loss function comprises a first loss function, the first loss function being used to indicate a difference between an image output by the image fusion model and a target fused image, and the target fused image is determined based on the target color image and the target infrared image.
  11. The method according to claim 10, wherein the training the image fusion model with the first color image and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, comprises:
    training the image fusion model with a first fusion weight, the first color image, and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, the first fusion weight being used to weight the first color image and the first infrared image, and the target fused image being determined based on the first fusion weight, the target color image, and the target infrared image.
  12. The method according to claim 11, wherein the first fusion weight corresponds to a part or all of the image output by the image fusion model.
  13. The method according to any one of claims 10 to 12, wherein the training the image fusion model with the first color image and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, comprises:
    training the image fusion model with a first background reference image, the first color image, and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, a similarity between the first background reference image and the first color image being greater than a second threshold.
  14. The method according to any one of claims 10 to 13, wherein the loss function further comprises a second loss function, the second loss function being used to indicate a difference between the target color image and the image output by the image fusion model.
  15. The method according to any one of claims 10 to 14, wherein the target fused image is an image of the luminance channel, and the difference between the image output by the image fusion model and the target fused image is a difference between the luminance channel of the image output by the image fusion model and the target fused image.
  16. An image fusion apparatus, comprising:
    an obtaining unit, configured to obtain a to-be-processed color image, an infrared image, and a background reference image, wherein the infrared image and the to-be-processed color image are captured for a same scene, the same scene meaning that a similarity between the to-be-processed color image and the infrared image is greater than a first threshold; the to-be-processed color image is an image formed by the scene's reflection of visible light, the infrared image is an image formed by the scene's reflection of light in the infrared band, and a similarity between the background reference image and the to-be-processed color image is greater than a second threshold; and
    a processing unit, configured to input the to-be-processed color image, the infrared image, and the background reference image into a trained image fusion model for feature extraction and perform image fusion based on the extracted features to obtain a fused image.
  17. The apparatus according to claim 16, wherein the processing unit is further configured to:
    obtain a fusion weight; and
    input the fusion weight into the trained image fusion model,
    wherein the fusion weight is used to weight the to-be-processed color image and the infrared image.
  18. The apparatus according to claim 17, wherein the fusion weight corresponds to a part or all of the fused image.
  19. The apparatus according to any one of claims 16 to 18, wherein the to-be-processed color image comprises N frames of color images, the infrared image comprises N frames of infrared images corresponding to the N frames of color images, and the background reference images corresponding to the N frames of color images are determined based on the background reference images of M frames of color images among the N frames of color images, wherein M is a positive integer, N is a positive integer greater than 1, and N > M.
  20. The apparatus according to claim 19, wherein the processing unit is specifically configured to:
    separately extract features of the N frames of color images and features of the N frames of infrared images;
    separately extract features of the M background reference images corresponding to the M frames of color images; and
    separately reconstruct N fused images based on the features of the N frames of color images, the features of the N frames of infrared images, and the features of the M background reference images.
  21. The apparatus according to any one of claims 16 to 18, wherein the background reference image is obtained in any one of the following manners:
    obtaining the background reference image based on a plurality of frames preceding the to-be-processed color image;
    using a long-exposure frame preceding the to-be-processed color image as the background reference image, the long-exposure frame being a frame obtained with an exposure duration greater than a third threshold;
    using a result of performing temporal noise reduction on the to-be-processed color image as the background reference image; or
    using a fused image of a frame preceding the to-be-processed color image as the background reference image.
  22. The apparatus according to any one of claims 16 to 21, wherein the trained image fusion model is obtained by training an image fusion model with a first color image and a first infrared image as inputs of the image fusion model and with the goal of making a value of a loss function less than a fourth threshold;
    the loss function comprises a first loss function, the first loss function being used to indicate a difference between an image output by the image fusion model and a target fused image, and the target fused image being determined based on a target color image and a target infrared image; the first color image, the first infrared image, the target color image, and the target infrared image are captured for the same scene, the same scene meaning that a similarity between any two of the first color image, the first infrared image, the target color image, and the target infrared image is greater than the first threshold; a signal-to-noise ratio of the target color image is higher than a signal-to-noise ratio of the first color image, and a signal-to-noise ratio of the target infrared image is higher than a signal-to-noise ratio of the first infrared image.
  23. The apparatus according to claim 22, wherein the loss function further comprises a second loss function, the second loss function being used to indicate a difference between the target color image and the image output by the image fusion model.
  24. The apparatus according to claim 22 or 23, wherein the target fused image is an image of the luminance channel, and the difference between the image output by the image fusion model and the target fused image is a difference between the luminance channel of the image output by the image fusion model and the target fused image.
  25. A training apparatus for an image fusion model, comprising:
    an obtaining unit, configured to obtain at least one training sample, the training sample comprising a first color image, a first infrared image, a target color image, and a target infrared image, wherein the first color image, the first infrared image, the target color image, and the target infrared image are captured for a same scene, the same scene meaning that a similarity between any two of the first color image, the first infrared image, the target color image, and the target infrared image is greater than a first threshold; the first color image and the target color image are images formed by the scene's reflection of visible light, and the first infrared image and the target infrared image are images formed by the scene's reflection of light in the infrared band; a signal-to-noise ratio of the target color image is higher than a signal-to-noise ratio of the first color image, and a signal-to-noise ratio of the target infrared image is higher than a signal-to-noise ratio of the first infrared image; and
    a processing unit, configured to train the image fusion model with the first color image and the first infrared image as inputs of the image fusion model and with the goal of making a value of a loss function less than a fourth threshold, to obtain a trained image fusion model;
    wherein the loss function comprises a first loss function, the first loss function being used to indicate a difference between an image output by the image fusion model and a target fused image, and the target fused image is determined based on the target color image and the target infrared image.
  26. The apparatus according to claim 25, wherein the processing unit is specifically configured to:
    train the image fusion model with a first fusion weight, the first color image, and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, the first fusion weight being used to weight the first color image and the first infrared image, and the target fused image being determined based on the first fusion weight, the target color image, and the target infrared image.
  27. The apparatus according to claim 26, wherein the first fusion weight corresponds to a part or all of the image output by the image fusion model.
  28. The apparatus according to any one of claims 25 to 27, wherein the processing unit is further configured to train the image fusion model with a first background reference image, the first color image, and the first infrared image as inputs of the image fusion model and with the goal of making the value of the loss function less than the fourth threshold, to obtain the trained image fusion model, a similarity between the first background reference image and the first color image being greater than a second threshold.
  29. The apparatus according to any one of claims 25 to 28, wherein the loss function further comprises a second loss function, the second loss function being used to indicate a difference between the target color image and the image output by the image fusion model.
  30. The apparatus according to any one of claims 25 to 29, wherein the target fused image is an image of the luminance channel, and the difference between the image output by the image fusion model and the target fused image is a difference between the luminance channel of the image output by the image fusion model and the target fused image.
  31. An image fusion apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 9.
  32. A training apparatus for an image fusion model, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 10 to 15.
  33. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 9 or 10 to 15.
  34. An electronic device, comprising a processor and a memory, the processor being coupled to the memory, wherein the processor is configured to read instructions stored in the memory to perform the method according to any one of claims 1 to 9 or 10 to 15.
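The operations recited in the claims above can be illustrated with a short sketch. Note that everything below is a hypothetical illustration, not the patented implementation: the claims leave the model architecture, the background-reference derivation, and the exact loss forms open, so the temporal mean, the linear per-pixel weighting, the mean-squared-error losses, and all function names here are assumptions chosen only for clarity.

```python
import numpy as np

def background_reference(prev_frames):
    # One option in claim 6: derive the background reference from several
    # frames preceding the frame to be processed. A plain temporal mean is
    # assumed here; the claims do not fix the aggregation method.
    return np.mean(np.stack(prev_frames), axis=0)

def fuse(color_luma, infrared, weight):
    # Weighted combination of the color image's luminance channel with the
    # infrared image (claims 2 and 11). `weight` may be a scalar applied to
    # the whole image, or a per-pixel map restricting the infrared
    # contribution to part of the image (claims 3 and 12).
    return weight * color_luma + (1.0 - weight) * infrared

def training_loss(output, target_fused, target_color, alpha=1.0):
    # Composite training objective of claims 10 and 14: a first loss against
    # the target fused image plus an optional second loss against the
    # high-SNR target color image. MSE is an assumption; training stops when
    # this value drops below the "fourth threshold" of the claims.
    first_loss = np.mean((output - target_fused) ** 2)
    second_loss = np.mean((output - target_color) ** 2)
    return first_loss + alpha * second_loss
```

As a usage sketch, a fusion weight of 0.5 blends the two sources equally over the whole frame, while a per-pixel weight map confines the infrared contribution to selected regions, matching the "part or all of the fused image" wording of claims 3, 12, 18, and 27.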
PCT/CN2021/104634 2020-08-31 2021-07-06 Image fusion method, and training method and apparatus for image fusion model WO2022042049A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21859889.4A EP4198875A4 (en) 2020-08-31 2021-07-06 IMAGE FUSION METHOD AND TRAINING METHOD AND APPARATUS FOR IMAGE FUSION MODEL
US18/176,240 US20230214976A1 (en) 2020-08-31 2023-02-28 Image fusion method and apparatus and training method and apparatus for image fusion model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010901107.9A 2020-08-31 2020-08-31 Image fusion method, and training method and apparatus for image fusion model
CN202010901107.9 2020-08-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/176,240 Continuation US20230214976A1 (en) 2020-08-31 2023-02-28 Image fusion method and apparatus and training method and apparatus for image fusion model

Publications (1)

Publication Number Publication Date
WO2022042049A1 (zh)

Family

ID=80352555

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104634 WO2022042049A1 (zh) 2020-08-31 2021-07-06 图像融合方法、图像融合模型的训练方法和装置

Country Status (4)

Country Link
US (1) US20230214976A1 (zh)
EP (1) EP4198875A4 (zh)
CN (1) CN114119378A (zh)
WO (1) WO2022042049A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861380A (zh) * 2023-02-16 2023-03-28 深圳市瓴鹰智能科技有限公司 End-to-end UAV visual target tracking method and apparatus for foggy, low-illumination scenes
CN116048323A (zh) * 2022-05-27 2023-05-02 Honor Device Co., Ltd. Image processing method and electronic device
CN116236164A (zh) * 2022-12-20 2023-06-09 哈尔滨海鸿基业科技发展有限公司 Real-time revascularization assessment apparatus
CN116757988A (zh) * 2023-08-17 2023-09-15 Qilu University of Technology (Shandong Academy of Sciences) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
TWI837039B (zh) 2023-07-18 2024-03-21 Chang Gung University Image superimposition method, image display method, and clinical test method for augmented reality of human acupuncture points

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN114666458A (zh) * 2020-12-22 2022-06-24 Fu Tai Hua Industry (Shenzhen) Co., Ltd. Image fusion method and apparatus, electronic device, and storage medium
US20220374625A1 (en) * 2021-05-07 2022-11-24 Google Llc Machine-Learned Models for Unsupervised Image Transformation and Retrieval
TWI830230B (zh) * 2022-05-18 2024-01-21 Feng Chia University Automatic object tracking system and detection method thereof
CN115578797B (zh) * 2022-09-30 2023-08-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Model training method, image recognition method, apparatus, and electronic device
CN116725467B (zh) * 2023-03-31 2024-03-01 苏州宇懋医学科技有限公司 Endoscope apparatus, endoscopic medical assistance system, and endoscopic image processing method
CN116452466B (zh) * 2023-06-14 2023-10-20 Honor Device Co., Ltd. Image processing method, apparatus, and device, and computer-readable storage medium
CN117834891B (zh) * 2024-03-06 2024-05-07 成都凌亚科技有限公司 Video signal compression processing and transmission method

Citations (4)

Publication number Priority date Publication date Assignee Title
EP3319040A1 (en) * 2015-08-05 2018-05-09 Wuhan Guide Infrared Co., Ltd. Visible light image and infrared image fusion processing system and fusion method
CN108875669A (zh) * 2018-06-28 2018-11-23 武汉市哈哈便利科技有限公司 Commodity recognition technique based on fusion of visible light and infrared images
CN109360175A (zh) * 2018-10-12 2019-02-19 Yunnan University Infrared and visible light image fusion method
CN109919887A (zh) * 2019-02-25 2019-06-21 Army Engineering University of PLA Unsupervised image fusion method based on deep learning

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN101727665B (zh) * 2008-10-27 2011-09-07 广州飒特电力红外技术有限公司 Method and apparatus for fusing an infrared image and a visible light image
CN106780392B (zh) * 2016-12-27 2020-10-02 Zhejiang Dahua Technology Co., Ltd. Image fusion method and apparatus


Non-Patent Citations (2)

Title
See also references of EP4198875A4 *
XIE Chunyu; XU Jian; LI Xinde; WU We: "Infrared and Visible Image Fusion Method Based on Deep Learning", Command Information System and Technology, vol. 11, no. 2, 28 April 2020 (2020-04-28), CN, pages 15-20+38, XP055902944, ISSN: 1674-909X, DOI: 10.15908/j.cnki.cist.2020.02.003 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN116048323A (zh) * 2022-05-27 2023-05-02 Honor Device Co., Ltd. Image processing method and electronic device
CN116048323B (zh) * 2022-05-27 2023-11-24 Honor Device Co., Ltd. Image processing method and electronic device
CN116236164A (zh) * 2022-12-20 2023-06-09 哈尔滨海鸿基业科技发展有限公司 Real-time revascularization assessment apparatus
CN116236164B (zh) * 2022-12-20 2023-12-08 哈尔滨海鸿基业科技发展有限公司 Real-time revascularization assessment apparatus
CN115861380A (zh) * 2023-02-16 2023-03-28 深圳市瓴鹰智能科技有限公司 End-to-end UAV visual target tracking method and apparatus for foggy, low-illumination scenes
TWI837039B (zh) 2023-07-18 2024-03-21 Chang Gung University Image superimposition method, image display method, and clinical test method for augmented reality of human acupuncture points
CN116757988A (zh) * 2023-08-17 2023-09-15 Qilu University of Technology (Shandong Academy of Sciences) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
CN116757988B (zh) * 2023-08-17 2023-12-22 Qilu University of Technology (Shandong Academy of Sciences) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks

Also Published As

Publication number Publication date
EP4198875A1 (en) 2023-06-21
US20230214976A1 (en) 2023-07-06
EP4198875A4 (en) 2024-02-21
CN114119378A (zh) 2022-03-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21859889; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021859889; Country of ref document: EP; Effective date: 20230314)
NENP Non-entry into the national phase (Ref country code: DE)