CN117876943A - Image processing method, training method of image enhancement model and electronic equipment

Info

Publication number: CN117876943A
Application number: CN202410279945.5A
Authority: CN (China)
Prior art keywords: image data, image, frequency, RGB, low
Legal status: Pending
Original language: Chinese (zh)
Inventors: 杨建权, 吴日辉, 杨永兴
Applicant and current assignee: Honor Device Co Ltd
Application filed by Honor Device Co Ltd; priority to CN202410279945.5A

Abstract

The application discloses an image processing method, a training method of an image enhancement model, and an electronic device, relating to the field of image processing. The electronic device may include a multispectral device, an RGB camera and a processor. The multispectral device may be used to acquire multispectral image data, the RGB camera may be used to acquire RGB image data, and the processor may be used to fuse the multispectral image data with the RGB image data to obtain a fused image. Because the multispectral image and the RGB image are fused, their spectral information is combined, so the fused image has richer spectral information, and therefore richer detail, than the original RGB image. Fusing the multispectral image with the RGB image thus yields a fused image of higher quality and greater clarity than the original RGB image.

Description

Image processing method, training method of image enhancement model and electronic equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an image processing method, a training method of an image enhancement model, and an electronic device.
Background
The RGGB sensor is a sensor whose color filter array (Color Filter Array, CFA) arranges red, green, green and blue filters in sequence. This arrangement allows the sensor to convert grayscale information into color information, so that a color effect close to that of the human eye can be displayed.
Currently, the sensors in the cameras of some electronic devices are RGGB sensors, and such devices image through the RGGB sensor.
The RGGB sensor, however, collects information of limited quality: the signal-to-noise ratio of signals captured in dim light is poor, and the number of effective pixels obtained in telephoto shooting is small. Therefore, when an electronic device shoots a portrait, challenging scenes such as dim light or long-distance shooting lead to degraded image quality and unclear images.
Disclosure of Invention
The application provides an image processing method, a training method of an image enhancement model, and an electronic device, in order to improve the quality and clarity of RGB images.
In order to achieve the above purpose, the present application adopts the following technical solutions:
In a first aspect, the present application provides an electronic device, which may include a multispectral device, an RGB camera and a processor. The multispectral device may be used to acquire multispectral image data, the RGB camera may be used to acquire RGB image data, and the processor may be used to fuse the multispectral image data with the RGB image data to obtain a fused image. Because the multispectral image and the RGB image are fused, their spectral information is combined, so the fused image has richer spectral information, and therefore richer detail, than the original RGB image; fusing the two thus yields a fused image of higher quality and greater clarity than the original RGB image.
In some possible implementations, the multispectral device includes at least eight channels, a spectral width of 10nm to 20nm, and a spectral range of 400nm to 1000nm, and the processor may be specifically configured to input multispectral image data and RGB image data into a pre-trained image enhancement model for fusion.
In some possible implementations, the processor may determine a target portrait effect for the fused image and, according to that effect, select multispectral image data of a target wavelength from the multispectral image data. For example, the multispectral image data may include a multispectral image with a wavelength of 460 nm, a multispectral image with a wavelength of 645 nm, and so on. Skin pores and dark spots of a portrait show obvious reflectance differences in short-wave-band images, so fusing a short-wave-band multispectral image with the RGB image yields an image with more detail; in long-wave-band images the skin reflectance of a portrait is relatively uniform and the signal-to-noise ratio is relatively higher, so fusing a long-wave-band multispectral image with the RGB image can produce a skin-beautifying effect. The image enhancement model may also be called a multispectral-assisted portrait enhancement algorithm network, and its structure may be as shown in fig. 4.
Specifically, the processor may be configured to downsample the multispectral image data and the RGB image data by discrete wavelet transform (DWT) to obtain a low-frequency image and a high-frequency image corresponding to the multispectral image data and a low-frequency image and a high-frequency image corresponding to the RGB image data, perform feature fusion on the low-frequency and high-frequency images to obtain a high-low-frequency fused image, and finally perform residual connection on the high-low-frequency fused image to obtain the fused image. A sketch of this decomposition step is given below.
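As an illustration of this step, the following is a minimal sketch of a single-level 2-D DWT decomposition and its inverse using the PyWavelets library; the wavelet basis ('haar') and the array shapes are assumptions, since the patent does not specify them.

```python
# Minimal sketch of the DWT decomposition/reconstruction step (PyWavelets;
# the 'haar' basis is an assumption, the patent does not name a wavelet).
import numpy as np
import pywt

def dwt_decompose(img):
    """Split a single-channel image into a low-frequency band and three
    high-frequency bands (horizontal, vertical, diagonal), halving resolution."""
    low, (high_h, high_v, high_d) = pywt.dwt2(img, "haar")
    return low, high_h, high_v, high_d

def idwt_reconstruct(low, high_h, high_v, high_d):
    """Inverse step, as used by the image reconstruction module to restore resolution."""
    return pywt.idwt2((low, (high_h, high_v, high_d)), "haar")

# Example: decompose one multispectral channel and the luma of the RGB image.
ms_channel = np.random.rand(256, 256).astype(np.float32)
rgb_luma = np.random.rand(256, 256).astype(np.float32)
ms_bands = dwt_decompose(ms_channel)
rgb_bands = dwt_decompose(rgb_luma)
```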
In some implementations, the image enhancement model may include a first type of network branch and a second type of network branch, and the processor may be specifically configured to:
perform multiple downsamplings of the multispectral image data and the RGB image data with a multi-layer DWT, and feature-fuse the low-frequency image data and high-frequency image data obtained from each downsampling to obtain the corresponding high-low-frequency fused images; in the first-type network branch, perform residual connection (RCB) on the high-low-frequency fused image and input the image data output by the RCB to the second-type network branch; in the second-type network branch, perform RCB on the high-low-frequency fused image, and perform selective kernel feature fusion (SKFF) either between the image data output by the RCB and the image data input from the first-type network branch, or between the image data output by the RCB of this branch and the image data obtained by SKFF in the next-layer network branch, where the next-layer network branch is the branch corresponding to the next DWT.
For example, the first-type network branch may contain one network branch and the second-type network branch may contain two network branches, and the processor is specifically configured to perform RCB twice in the first-type network branch and to perform RCB and SKFF twice each in every branch of the second type, with RCB and SKFF performed alternately.
The network structure of the RCB may be as shown in fig. 6, and the network structure of the SKFF may be as shown in fig. 7.
In some possible implementations, the high-frequency image data may include horizontal (transverse), vertical (longitudinal) and diagonal (oblique) high-frequency image data, and the processor may be specifically configured to: fuse the low-frequency image data corresponding to the multispectral image data with the low-frequency image data corresponding to the RGB image data; from the horizontal high-frequency image data of the multispectral image data and of the RGB image data, determine the horizontal high-frequency image data whose pixels have the larger absolute values; likewise determine the vertical high-frequency image data and the diagonal high-frequency image data whose pixels have the larger absolute values; and perform feature fusion on the determined horizontal, vertical and diagonal high-frequency image data. A sketch of this maximum-absolute-value selection is given below.
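The following minimal NumPy sketch shows a per-pixel maximum-absolute-value selection; per-pixel selection is one reading of the text (it can also be read as selecting a whole sub-band frame), and the names are illustrative.

```python
# Sketch of maximum-absolute-value selection for one high-frequency sub-band
# (per-pixel variant; names are illustrative, not from the patent).
import numpy as np

def select_max_abs(hf_ms, hf_rgb):
    """Keep, per pixel, whichever coefficient has the larger absolute value,
    i.e. the richer detail."""
    return np.where(np.abs(hf_ms) >= np.abs(hf_rgb), hf_ms, hf_rgb)

# Applied separately to the horizontal, vertical and diagonal sub-bands.
hf_ms_h = np.random.randn(128, 128).astype(np.float32)
hf_rgb_h = np.random.randn(128, 128).astype(np.float32)
fused_h = select_max_abs(hf_ms_h, hf_rgb_h)
```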
In some possible implementations, the RGB images may include low-noise high-definition RGB image data and high-noise low-definition RGB image data. The RGB data input to the image enhancement model is the high-noise low-definition RGB image data. The pre-trained image enhancement model is obtained by training with a predicted value and a pseudo label, and with the multispectral image data and the high-noise low-definition RGB image data input to the model, where the predicted value is the image data output by the image enhancement model and the pseudo label is the low-noise high-definition RGB image data corresponding to the high-noise low-definition RGB image data input to the model.
In a second aspect, the present application provides an image processing method, which may be applied to an electronic device, where the electronic device includes a multispectral device and an RGB camera, and the method may include:
In response to a photographing event being triggered, the multispectral device is used to capture multispectral image data and the RGB camera is used to capture RGB image data; the multispectral image data and the RGB image data are then input into a pre-trained image enhancement model for fusion to obtain a fused image. Because the multispectral image and the RGB image are fused, their spectral information is combined, so the fused image has richer spectral information, and therefore richer detail, than the original RGB image; the fusion thus yields an image of higher quality and greater clarity than the original RGB image.
In some possible implementations, the target portrait effect in the fused image can be determined, and multispectral image data of a target wavelength is determined from the multispectral image data according to that effect. For example, the multispectral image data may include a multispectral image with a wavelength of 460 nm, a multispectral image with a wavelength of 645 nm, and so on. Skin pores and dark spots of a portrait show obvious reflectance differences in short-wave-band images, so fusing a short-wave-band multispectral image with the RGB image yields an image with more detail; in long-wave-band images the skin reflectance of a portrait is relatively uniform and the signal-to-noise ratio is relatively higher, so fusing a long-wave-band multispectral image with the RGB image can produce a skin-beautifying effect. The image enhancement model may also be called a multispectral-assisted portrait enhancement algorithm network.
In some possible implementation manners, the multispectral image data and the RGB image data are input into a pre-trained image enhancement model for fusion, which may specifically be:
The multispectral image data and the RGB image data are downsampled by discrete wavelet transform (DWT) to obtain a low-frequency image and a high-frequency image corresponding to the multispectral image data and a low-frequency image and a high-frequency image corresponding to the RGB image data; feature fusion is performed on the low-frequency and high-frequency images to obtain a high-low-frequency fused image; and finally residual connection is performed on the high-low-frequency fused image to obtain the fused image.
In some implementations, downsampling is performed multiple times, and the low-frequency image data and high-frequency image data obtained from each downsampling are feature-fused to obtain the corresponding high-low-frequency fused images. The image enhancement model may include a first-type network branch and a second-type network branch, and performing residual connection (RCB) on the high-low-frequency fused images to obtain the fused image may specifically be:
The multispectral image data and the RGB image data are downsampled multiple times with a multi-layer DWT, and the low-frequency and high-frequency image data obtained from each downsampling are feature-fused to obtain the corresponding high-low-frequency fused images. In the first-type network branch, residual connection (RCB) is performed on the high-low-frequency fused image and the image data output by the RCB is input to the second-type network branch. In the second-type network branch, RCB is performed on the high-low-frequency fused image, and selective kernel feature fusion (SKFF) is performed either between the image data output by the RCB and the image data input from the first-type network branch, or between the image data output by the RCB of this branch and the image data obtained by SKFF in the next-layer network branch, where the next-layer network branch is the branch corresponding to the next DWT.
For example, the first-type network branch may contain one network branch and the second-type network branch may contain two network branches; RCB is performed twice in the first-type network branch, and RCB and SKFF are each performed twice in every branch of the second type, with RCB and SKFF performed alternately.
In some possible implementations, the high-frequency image data may include horizontal, vertical and diagonal high-frequency image data, and the feature fusion of the low-frequency and high-frequency images may specifically be:
the low-frequency image data corresponding to the multispectral image data is fused with the low-frequency image data corresponding to the RGB image data; from the horizontal high-frequency image data of the multispectral image data and of the RGB image data, the horizontal high-frequency image data whose pixels have the larger absolute values is determined; likewise the vertical and diagonal high-frequency image data whose pixels have the larger absolute values are determined; and feature fusion is performed on the determined horizontal, vertical and diagonal high-frequency image data.
In some possible implementations, the RGB images may include low-noise high-definition RGB image data and high-noise low-definition RGB image data. The RGB data input to the image enhancement model is the high-noise low-definition RGB image data. The pre-trained image enhancement model is obtained by training with a predicted value and a pseudo label, and with the multispectral image data and the high-noise low-definition RGB image data input to the model, where the predicted value is the image data output by the image enhancement model and the pseudo label is the low-noise high-definition RGB image data corresponding to the high-noise low-definition RGB image data input to the model.
In a third aspect, the present application provides a training method for an image enhancement model. The image enhancement model may include a feature fusion module and an image reconstruction module, and the training method may include: acquiring multispectral image data and the image data corresponding to two frames of RGB images, the image data including high-noise low-definition RGB image data and low-noise high-definition RGB image data; determining the multispectral image and the high-noise low-definition RGB image data as training data of the image reconstruction module; determining the image data output by the image reconstruction module as the predicted value and the low-noise high-definition RGB image data as the pseudo label of the image reconstruction module; then training the image reconstruction module according to its training data, predicted value and pseudo label, and training the feature fusion module according to the multispectral image data and the high-noise low-definition RGB image data.
Specifically, training the image reconstruction module according to the training data, the predicted value and the pseudo tag of the image reconstruction module may specifically be:
The image reconstruction module is iteratively trained according to its training data, predicted value and pseudo label, and the parameters of the image reconstruction module are adjusted in each iteration until the loss function of the image reconstruction module reaches a minimum or no longer decreases significantly. The loss function of the image reconstruction module may comprise a pixel mean square error loss and a structural similarity loss, as shown in formula (1):

L_rec = α · L_MSE + β · L_SSIM    (1)

where α and β are weight parameters that balance the contributions of the MSE and SSIM terms to the loss L_rec. By adjusting the weight parameters, the relative importance the model assigns to the different loss terms during optimization can be controlled; those skilled in the art can set α and β according to actual requirements, for example each set to 0.5. L_MSE is the pixel mean square error loss and L_SSIM is the structural similarity (SSIM) loss. A code sketch of this loss is given below.
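For illustration, a PyTorch sketch of a loss of the form in formula (1) follows; the ssim() helper is assumed to come from an external package such as pytorch-msssim, and implementing the SSIM term as 1 − SSIM is an assumption.

```python
# Sketch of the reconstruction loss of formula (1): alpha * MSE + beta * SSIM-loss.
# The ssim() import is an assumed external dependency (e.g. pytorch-msssim).
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed dependency

def reconstruction_loss(pred, pseudo_label, alpha=0.5, beta=0.5):
    mse = F.mse_loss(pred, pseudo_label)                          # pixel mean square error
    ssim_term = 1.0 - ssim(pred, pseudo_label, data_range=1.0)    # structural similarity loss
    return alpha * mse + beta * ssim_term
```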
Specifically, training the feature fusion module according to multispectral image data and low-definition RGB image data with high noise may specifically be:
The feature fusion module is iteratively trained according to the multispectral image data and the high-noise low-definition RGB image data, and the parameters of the feature fusion module are adjusted in each iteration until the loss function of the feature fusion module reaches a minimum or no longer decreases significantly. The loss function of the feature fusion module includes a gradient loss, a pixel intensity difference loss, and a structural similarity loss of the low-frequency image data.
As shown in formula (2):

L_FFM = λ1 · L_grad + λ2 · L_int + λ3 · L_SSIM    (2)

where L_grad, L_int and L_SSIM denote the gradient, pixel intensity and structural similarity terms, and the weights λ1, λ2 and λ3 can each be set according to actual requirements, for example to 10, 2 and 1 respectively.
(3)

In formula (3), the gradient map of the low-frequency band refers to the low-frequency images output by the DWT modules of the second and third network branches, the low-frequency image refers to the low-frequency image output by the DWT module of the first network branch, and the remaining term is the fusion map of the low-frequency images.

(4)

(5)

In formulas (4) and (5), one of the inputs is a multispectral image.
In a fourth aspect, the present application provides a computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of the implementations of the second aspect.
According to the technical scheme, the application has the following beneficial effects:
the application provides an image processing method, a training method of an image enhancement model and electronic equipment, wherein the electronic equipment can comprise: the multi-spectral device, the RGB camera and the processor, wherein the multi-spectral device can be used for acquiring multi-spectral image data, the RGB camera can be used for acquiring RGB image data, the processor can be used for fusing the multi-spectral image data with the RGB image data to acquire a fused image, and the multi-spectral image and the spectrum information of the RGB image can be fused due to the fact that the multi-spectral image and the RGB image are fused, so that the fused image has richer spectrum information, namely richer details, compared with the original RGB image, and therefore the multi-spectral image and the RGB image can be fused to obtain the fused image with higher quality and clearer definition compared with the original RGB image.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic diagram of an open camera application according to an embodiment of the present application;
fig. 3A is a schematic view of a shooting preview interface provided in an embodiment of the present application;
fig. 3B is a schematic diagram of a camera application switching photographing mode according to an embodiment of the present application;
FIG. 4 is a block diagram of a multispectral assisted portrait enhancement algorithm network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a feature fusion module according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image reconstruction module according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a selective kernel feature fusion module according to an embodiment of the present application;
fig. 8 is a flowchart of an image processing method according to an embodiment of the present application;
fig. 9 is an interface diagram of a portrait effect selection window of an output image according to an embodiment of the present application;
fig. 10 is a schematic diagram of fusion of an image with a target wavelength and an RGB image input into a multispectral auxiliary portrait enhancement algorithm network according to an embodiment of the present application;
fig. 11 is a software architecture diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terms first, second, third and the like in the description and in the claims and drawings are used for distinguishing between different objects and not for limiting the specified sequence.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The RGGB sensor is a sensor that adopts an RGGB arrangement, i.e. a red, green, green, blue arrangement of color filters; this arrangement allows grayscale information to be converted into color information, so that a color effect similar to that perceived by the human eye can be displayed.
RGGB refers to an arrangement of color filter arrays (Color Filter Array, CFA) that is commonly used in digital image sensors, such as those of digital cameras and cell phone cameras. In the RGGB arrangement, each pixel is covered with a filter of red (R), green (G) or blue (B). This arrangement helps capture full color images.
The RGGB sensor is thus a sensor whose color filter array arranges red, green, green and blue filters in sequence, allowing the sensor to convert grayscale information into color information and display a color effect close to that of the human eye.
Currently, the sensors in the cameras of some electronic devices are RGGB sensors, and such devices image through the RGGB sensor.
The RGGB sensor, however, collects information of limited quality: the signal-to-noise ratio of signals captured in dim light is poor, and the number of effective pixels obtained in telephoto shooting is small. Therefore, when an electronic device shoots, challenging scenes such as dim light or long-distance shooting lead to degraded image quality and unclear images.
In view of this, the present application proposes an image processing method that can be applied to an electronic device including a multispectral device and an RGB camera. The electronic device can capture a multispectral image with the multispectral camera and an RGB image with the RGB camera, then determine the image of the target wavelength in the multispectral image according to the determined portrait effect of the output image, and input the image of the target wavelength and the RGB image into a multispectral-assisted portrait enhancement algorithm network for fusion, thereby obtaining a fused image.
The method provided in the present application may be executed on an electronic device, and in some embodiments, the electronic device may be a mobile phone, a tablet computer, a desktop, a laptop, a notebook, an Ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (Personal Digital Assistant, PDA), a wearable electronic device, a smart watch, etc., and the specific form of the electronic device is not limited in the present application. In this embodiment, the structure of the electronic device may be shown in fig. 1, and fig. 1 is a schematic structural diagram of the electronic device according to the embodiment of the present application.
As shown in fig. 1, the electronic device may include a multispectral device 110, an RGB camera 120, a processor 130, a display screen 140, and the like.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The multispectral device 110 may be a multispectral camera, i.e. a camera capable of capturing optical signals in a plurality of different spectral bands (different wavelength ranges). Compared with an ordinary RGB camera, a multispectral camera can acquire richer and more detailed spectral information; it generally has more than 8 channels, a spectral width of 10-20 nm, and a spectral range of 400-1000 nm.
The RGB camera 120 is a camera that can capture color images. Unlike a general black-and-white camera, an RGB camera uses three independent CCD sensors to acquire three basic color signals of red, green, and blue, thereby being capable of photographing a color image.
In particular, the RGB camera 120 may capture still images or video. The object generates an optical image through the lens and projects it onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the image signal processor (ISP) to be converted into a digital image signal. The ISP outputs the digital image signal to the digital signal processor (DSP) for processing, and the DSP converts the digital image signal into an image signal in a standard RGB format.
The processor 130 may include one or more processing units, for example an application processor (AP), a controller, and so on. The different processing units may be separate devices or may be integrated in one or more processors.
The processor 130 may be configured to fuse the multispectral image acquired by the multispectral device 110 with the RGB image acquired by the RGB camera through an algorithm model to output an image with more detail and more detailed spectral information.
Specifically, the processor 130 may downsample the multispectral image data and the RGB image data by using a discrete wavelet transform DWT to obtain a low-frequency image and a high-frequency image corresponding to the multispectral image data and a low-frequency image and a high-frequency image corresponding to the RGB image data, perform feature fusion on the low-frequency image and the high-frequency image to obtain a high-low frequency fused image, and perform residual connection on the high-low frequency fused image to obtain a fused image.
The controller may be a neural hub and command center of the electronic device. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 130 for storing instructions and data. In some embodiments, the memory in the processor 130 is a cache memory. The memory may hold instructions or data that the processor 130 has just used or recycled. If the processor 130 needs to reuse the instruction or data, it may be called directly from the memory. Repeated accesses are avoided and the latency of the processor 130 is reduced, thereby improving the efficiency of the system.
The display screen 140 is used to display images, videos, and the like. The display screen 140 may include a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 140, N being a positive integer greater than 1.
A series of Graphical User Interfaces (GUI) may be displayed on the display screen, these GUIs being the home screen of the electronic device. Generally, the size of the display screen of an electronic device is fixed and only limited controls can be displayed in the display screen of the electronic device. A control is a GUI element that is a software component contained within an application program that controls all data processed by the application program and interactive operations on that data, and a user can interact with the control by direct manipulation (direct manipulation) to read or edit information about the application program. Such as camera APP icons, camera interfaces, etc.
The image processing method provided in the present application is briefly described below in connection with a scenario.
The electronic device provided by the embodiments of the application can shoot photos through a camera. For example, the electronic device may shoot through a single camera or through multiple cameras, which is not limited in this application. When the electronic device shoots through multiple cameras, the multiple cameras may be, for example, a primary RGB camera and a secondary multispectral camera. The embodiments of the application are not limited to single-camera or multi-camera shooting; those skilled in the art can decide whether the electronic device uses a single camera or multiple cameras according to the actual scene. The following description takes multi-camera shooting, with the RGB camera as the primary camera and the multispectral camera as the secondary camera, and a portrait photographing scene as an example. It should be noted that the image processing method provided by the application is not limited to portrait photographing scenes; the portrait scene is used only as an example and does not limit the application.
When a user performs a touch operation on the touch sensor, the touch sensor may acquire the touch operation of the user and report the touch operation to the processor, and after receiving the touch operation transmitted by the touch sensor, the processor may respond to the touch operation and start an application corresponding to the touch operation.
As shown in fig. 2, fig. 2 shows a schematic diagram of a user opening the camera application. For example, when the user's touch operation is opening the camera, the touch sensor may receive the touch operation on the camera icon and report it to the processor; after receiving the touch operation on the camera icon, the processor may, in response, start the application corresponding to the camera icon (referred to simply as the camera application) and display the shooting preview interface of the camera on the display screen. In addition, in the embodiments of the application, the electronic device may also start the camera application in other ways, for example according to a voice instruction or gesture issued by the user, which is not limited here.
In some possible implementations, after the electronic device starts the camera application, the processor may identify, frame by frame, the video stream collected by the RGB camera, and when it identifies that a portrait exists in the video stream, it may automatically start the portrait enhancement mode, that is, trigger photographing through both the RGB camera and the multispectral camera.
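The patent does not say how a portrait is recognised in the preview stream; as a stand-in, the following sketch uses OpenCV's Haar cascade face detector to make the frame-by-frame check concrete.

```python
# Illustrative stand-in for the frame-by-frame portrait check (the actual detector
# used by the device is not specified in the patent).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def frame_has_portrait(frame_bgr) -> bool:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```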
In some possible implementations, the electronic device may also turn on the enhanced portrait mode according to a user's selection operation.
Specifically, as shown in fig. 3A, a shooting preview interface of the camera is displayed on a display screen of the electronic device, and various functional controls such as a mode selection control 301, a shooting control 302 and the like can be included on the shooting preview interface of the camera application. The touch sensor can receive touch operation of the user on the functional control, and trigger the camera application to enter different shooting modes, wherein the shooting modes can comprise a common mode, an enhanced portrait mode and the like.
By way of example, fig. 3B shows a schematic diagram of the camera application switching photographing modes. The user may switch the shooting mode through the shooting control 302 while using the camera application; for example, the normal photographing mode of the camera may be switched to the enhanced portrait mode.
After the electronic device determines the photographing mode, the user may trigger a photographing operation through the photographing control 302 in the shooting preview interface of the camera application. For example, if the current photographing mode of the electronic device is the enhanced portrait mode, when the user triggers the photographing operation, the touch sensor may acquire the operation and report it to the processor. The processor may then control the RGB camera and the multispectral camera to photograph the target scene at the same time: the RGB camera photographs the target scene to obtain an RGB image, and the multispectral camera photographs the target scene to obtain a multispectral image. The obtained RGB image and multispectral image are then sent to the processor, so that the processor can fuse them to obtain a fused RGB image with richer spectral information and more complete detail.
Specifically, the processor may input the RGB image and the multispectral image into a multispectral auxiliary portrait enhancement algorithm network for fusion processing.
The structure of the multispectral-assisted portrait enhancement algorithm network provided by the present application is described below. As shown in fig. 4, which is a structural diagram of the multispectral-assisted portrait enhancement algorithm network provided by the present application, the network may include a feature extraction module, a feature fusion module, and an image reconstruction module.
Specifically, the feature extraction module may use a three-layer discrete wavelet transform DWT (denoted by (1) in fig. 4) instead of the pooling layers of a CNN to perform downsampling, gradually extracting multi-scale image information from low frequency to high frequency. The DWT is a method that can decompose a signal on different scales: it decomposes a signal or image into several sub-band signals of different scales, where each sub-band signal represents a component of the original signal in a different frequency range. In the DWT module, the input image signal may be decomposed into two parts representing the low-frequency information and the high-frequency information of the image respectively. By sending these two parts into different branch networks, the DWT module can extract the low-frequency and high-frequency features of the image at the same time; that is, the DWT can decompose the input image data into low-frequency image data and high-frequency image data, where the high-frequency image data can be further divided into horizontal, vertical and diagonal high-frequency image data. This improves the feature extraction capability of the network. Compared with a conventional CNN, the DWT brings faster convergence and better performance, mainly because the DWT module can extract image features more effectively and optimize the network structure. It should be noted that the feature extraction module is described here only by taking three layers of DWT as an example; those skilled in the art can set the specific number of DWT layers according to actual requirements, which is not limited here. An illustrative three-level decomposition is sketched below.
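As a small illustration of the three-layer decomposition, the following PyWavelets sketch performs a three-level 2-D wavelet decomposition; the wavelet basis and input size are assumptions.

```python
# Sketch of a three-level 2-D wavelet decomposition (PyWavelets; 'haar' assumed).
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)
coeffs = pywt.wavedec2(img, "haar", level=3)
# coeffs[0]  : coarsest low-frequency approximation
# coeffs[1:] : per scale (coarse to fine), a tuple of (horizontal, vertical, diagonal) detail bands
for i, (ch, cv, cd) in enumerate(coeffs[1:]):
    print(f"detail set {i} (coarse to fine): shape {ch.shape}")
```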
The feature fusion module may also be called the FFM module. The FFM module may include feature fusion units; it is described here by taking three feature fusion units as an example. The structure of a feature fusion unit may be as shown in fig. 5 and may include an image selection module, a low-frequency image fusion module, and a high-frequency image fusion module. The image selection module can determine, by a maximum-absolute-value rule, the horizontal high-frequency image whose pixels have the larger absolute values (a large absolute value indicates richer detail in the corresponding image) from the input horizontal high-frequency image data, the vertical high-frequency image whose pixels have the larger absolute values from the input vertical high-frequency image data, and the diagonal high-frequency image whose pixels have the larger absolute values from the input diagonal high-frequency image data, so that three frames of high-frequency images are obtained.
The high-frequency image fusion module is used for carrying out feature fusion on the determined three frames of high-frequency images. The low-frequency image fusion module is used for fusing the low-frequency images.
Finally, the FFM module can concatenate the fused low-frequency image and the fused high-frequency image, and then input the concatenated image data into the image reconstruction module. The fusion process of the feature fusion unit is carried out separately on the extracted low-frequency band and high-frequency band of the image space. The low-frequency band is fused with a convolutional neural network (CNN) structure, with intensity and gradient losses as the loss constraints. The high-frequency band is fused by the maximum-absolute-value method: the input high-frequency information is selected and the selected high-frequency information is fused.
The image reconstruction module upsamples using the inverse discrete wavelet transform IDWT (denoted by (2) in fig. 4) and improves the multi-scale fusion effect using the residual connection block RCB and SKFF. The IDWT can reconstruct the signal decomposed by the DWT module into the original signal, i.e. by combining and reconstructing the sub-band signals according to a specific algorithm, the original signal or image is restored. RCBs are generally used to address gradient vanishing and representation bottleneck problems in deep neural networks: the residual connection allows the network to learn an identity mapping, making deep networks easier to train. The network structure can be as shown in fig. 6, where convolution layers and nonlinear activation functions (e.g. ReLU) are combined into a basic convolution block, and the input is then added directly to the output of the convolution block through a residual connection, forming the residual convolution block. Such a design allows the network to learn more complex feature representations during training while mitigating gradient vanishing, so that the network can be trained more effectively. RCBs are commonly used in various deep neural network architectures to reduce the training error of the model and may improve its generalization ability.
A selective kernel feature fusion (SKFF) module is used to fuse multi-resolution features. Its structure can be as shown in fig. 7. Because simple feature concatenation or feature addition has limited expressive power, SKFF uses a self-attention mechanism to fuse these features in a nonlinear fashion. This fusion makes more effective use of features of different resolutions, thereby improving the performance of the model.
Having described the structure of the multispectral-assisted portrait enhancement algorithm network provided by the application, the image processing method provided by the application is described in detail below with reference to that structure. As shown in fig. 8, fig. 8 is a schematic diagram of an image processing method, which is executed in the electronic device with the structure shown in fig. 1 and is described by taking an enhanced-portrait scene as an example. The method includes:
s801: multispectral images and RGB images are acquired.
The electronic device acquires a multispectral image and an RGB image.
Specifically, when the electronic device triggers photographing through both the RGB camera and the multispectral camera according to a user operation or in another way, the electronic device may use the multispectral camera to obtain a multispectral image and the RGB camera to obtain an RGB image. The multispectral image is an image containing a plurality of wavelength bands, each band representing a specific spectral range; the RGB image, also called a true-color image, is a digital image format in which RGB denotes the three color channels Red, Green and Blue. It should be noted that the multispectral image has richer spectral information than the RGB image, for example it may include spectral information outside visible light. The acquired multispectral image may be referred to as multispectral image data, and the RGB image as RGB image data.
S802, determining an image of a target wavelength in the multispectral image according to the determined portrait effect of the output image.
After the multispectral image and the RGB image are acquired, the electronic device can determine the image of the target wavelength in the multispectral image according to the determined portrait effect of the output image. The multispectral image acquired by the electronic device can consist of several multispectral images of different wavelengths, such as a multispectral image with a wavelength of 460 nm and a multispectral image with a wavelength of 645 nm. Skin pores and dark spots of a portrait show obvious reflectance differences in short-wave-band images, so selecting a short-wave-band multispectral image to fuse with the RGB image produces an image with more detail; in long-wave-band images the skin reflectance of a portrait is relatively uniform and the signal-to-noise ratio is relatively higher, so selecting a long-wave-band multispectral image to fuse with the RGB image can produce a skin-beautifying effect.
In some possible implementations, the electronic device may determine a portrait effect of the output image based on a user's selection.
For example, as shown in fig. 9, after the electronic device determines that the photographing mode of the camera is the enhanced portrait mode, a portrait-effect selection window for the output image may be presented on the preview interface of the camera. The window may include different effect options for the output image, for example an option 901 and an option 902, where selecting option 901 indicates that the output image should have more detail and selecting option 902 indicates that the output image should have a skin-beautifying effect; the user selects the corresponding option according to actual requirements.
The electronic device may determine the image of the target wavelength in the multispectral image according to the option selected by the user. For example, if the user selects option 901, i.e. wants an output image with more detail, the electronic device may determine the image of the target wavelength to be a short-wave-band multispectral image, for example the 460 nm multispectral image; if the user selects option 902, i.e. wants an output image with a skin-beautifying effect, the electronic device may determine the image of the target wavelength to be a long-wave-band multispectral image, for example the 645 nm multispectral image. After the electronic device acquires the multispectral image, step S803 may then be performed. An illustrative mapping is sketched below.
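To make the mapping concrete, the following sketch maps the selected portrait effect to a target wavelength and picks the matching multispectral frame; the 460 nm and 645 nm values come from the examples above, while the helper names and dictionary layout are hypothetical.

```python
# Illustrative mapping from the user's selected effect to a target wavelength
# (helper names and data layout are hypothetical; wavelengths follow the text).
def select_target_wavelength(effect: str) -> int:
    if effect == "more_detail":      # option 901: short-wave band, richer detail
        return 460
    if effect == "skin_beautify":    # option 902: long-wave band, higher SNR, smoother skin
        return 645
    raise ValueError(f"unknown portrait effect: {effect}")

def pick_ms_frame(ms_frames, effect):
    """ms_frames: dict mapping wavelength in nm -> multispectral image array."""
    return ms_frames[select_target_wavelength(effect)]
```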
S803: and inputting the image of the target wavelength and the RGB image into a multispectral auxiliary portrait enhancement algorithm network for fusion.
The electronic device may input the determined target-wavelength image and the RGB image into the multispectral-assisted portrait enhancement algorithm network for fusion. In some possible implementations, the application only intends to change the brightness of the RGB image, not its color, so the electronic device may first convert the RGB image into a YCbCr image and then input the determined target-wavelength image and the YCbCr image converted from the RGB image into the multispectral-assisted portrait enhancement algorithm network for fusion. As shown in fig. 10, (a) in fig. 10 is the determined target-wavelength image and (b) in fig. 10 is the RGB image; inputting them into the multispectral-assisted portrait enhancement algorithm network for fusion yields the fused image shown in (c) of fig. 10. A sketch of this color-space step is given below.
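A minimal OpenCV sketch of the color-space step follows: the luminance channel is separated so only brightness is touched by the fusion. Note that OpenCV orders the converted channels as Y, Cr, Cb, and the way color is restored after fusion here is an assumption about how the output is assembled.

```python
# Sketch of the RGB -> YCbCr split used so fusion only changes brightness.
# OpenCV stores the converted image in Y, Cr, Cb channel order.
import cv2
import numpy as np

rgb = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in RGB frame
ycrcb = cv2.cvtColor(rgb, cv2.COLOR_RGB2YCrCb)
y, cr, cb = cv2.split(ycrcb)

# ... fuse `y` with the target-wavelength multispectral image here ...
y_fused = y  # placeholder for the network output

out = cv2.cvtColor(cv2.merge([y_fused, cr, cb]), cv2.COLOR_YCrCb2RGB)
```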
The multispectral assisted portrait enhancement algorithm network provided in the present application is described below by taking three layers of DWT as an example.
As shown in fig. 4, each layer of DWT module may correspond to one network branch, so the multispectral-assisted portrait enhancement algorithm network provided in the present application may include three network branches: the branch corresponding to the first DWT downsampling may be called the first network branch, the branch corresponding to the second DWT downsampling the second network branch, and the branch corresponding to the last DWT downsampling the third network branch.
Specifically, the following describes the processing flow of the first network branch.
As shown in fig. 4, the image of the target wavelength and the YCbCr image are input to the discrete wavelet transform DWT module of the first network branch for downsampling, so that the features of the images can be extracted effectively.
In the DWT module, the input image signals, that is, the image of the target wavelength and the YCbCr image, may each be decomposed into a low-frequency image and a high-frequency image, where the low-frequency image contains the low-frequency features of the input image (regions where the pixel intensity changes slowly or relatively uniformly) and the high-frequency image contains the high-frequency features (regions where the pixel intensity changes sharply, such as edges, textures, or noise).
As shown in fig. 4, after the image of the target wavelength and the YCbCr image are input to the discrete wavelet transform DWT for downsampling, the low-frequency image and the high-frequency images of the target-wavelength image and the low-frequency image and the high-frequency images of the YCbCr image are obtained; the high-frequency images of each input are three, namely a horizontal high-frequency image, a vertical high-frequency image and a diagonal high-frequency image.
The electronic device can input the obtained low-frequency image and high-frequency images of the target-wavelength image and the low-frequency image and high-frequency images of the YCbCr image into the FFM module for feature fusion.
Specifically, as shown in fig. 5, the input low-frequency images, that is, the low-frequency image of the target-wavelength image and the low-frequency image of the YCbCr image, are feature-fused through a CNN convolutional network; the fusion may be, for example, feature concatenation or feature addition, which those skilled in the art can set according to actual requirements and is not limited here.
The high-frequency images are input to the image selection module. The horizontal high-frequency image of the target-wavelength image and the horizontal high-frequency image of the YCbCr image can be regarded as one group, the vertical high-frequency images as another group, and the diagonal high-frequency images as a third group. From each group, the frame whose pixels have the larger absolute values (i.e. richer detail) is selected by the maximum-absolute-value rule, so three frames of high-frequency images are obtained. Feature fusion is then performed on the three frames of high-frequency images; the fusion may be, for example, feature concatenation or feature addition, which those skilled in the art can set according to actual requirements and is not limited here.
Furthermore, the electronic device can perform feature stitching on the fused low-frequency image and the fused high-frequency image to obtain a feature fused image, and then the feature fused image is input to the image reconstruction module for processing.
Specifically, as shown in fig. 6, the feature-fused image is processed by a 3x3 convolution, then by a ReLU function, and the result of the ReLU is input into another 3x3 convolution, yielding a feature matrix of size H x W x C, where H is the height, W the width and C the number of channels of the feature matrix. For example, a 256x256 RGB image is stored as a three-dimensional array of shape [256, 256, 3]: the first dimension is the height, the second the width, and the third the number of channels. A 3x3 convolution uses a filter of size 3x3; one advantage of the 3x3 convolution kernel is that it can capture local features of the input image at relatively low computational cost, and because the kernel is small it reduces the number of parameters, lowering the risk of overfitting and speeding up training. The ReLU function is often used as the activation function of a convolutional neural network (CNN) to extract feature information from images.
Then, on one hand, the H x W x C feature matrix is input into a first Reshape unit for dimensional conversion to obtain a 1 x HW x C feature matrix.
On the other hand, the H x W x C feature matrix may be input into a 1x1 convolution to obtain an H x W x 1 feature matrix, which is then input into a second Reshape unit for dimensional conversion to obtain a 1 x 1 x HW feature matrix and finally normalized by a Softmax function. The Softmax-processed feature matrix and the 1 x HW x C feature matrix obtained by the first Reshape unit are then multiplied and dimensionally transformed to obtain a 1 x 1 x C feature matrix.
Then, the 1 x 1 x C feature matrix is convolved by a 1x1 convolution, processed by a ReLU function, and convolved again by a 1x1 convolution to obtain a 1 x 1 x C feature matrix. This 1 x 1 x C feature matrix is feature-added to the H x W x C feature matrix to obtain an H x W x C feature matrix; after a ReLU function, this H x W x C feature matrix is feature-added to the feature-fused image originally input to the RCB, giving a feature matrix that is still H x W x C, which is finally output. A sketch of this block is given below.
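The following PyTorch sketch follows the shapes described above (3x3 convolutions, a 1x1-convolution/Softmax branch that pools the H x W x C features into a 1 x 1 x C descriptor, and a final residual addition); it is one reading of this text, not the patent's exact implementation, and the channel count in the example is an assumption.

```python
# Sketch of the residual convolution block (RCB) described above (PyTorch;
# an interpretation of the text, not the patent's reference implementation).
import torch
import torch.nn as nn

class RCB(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(                      # 3x3 conv -> ReLU -> 3x3 conv
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.to_map = nn.Conv2d(channels, 1, 1)         # H x W x 1 map for the Softmax branch
        self.channel_mlp = nn.Sequential(               # 1x1 conv -> ReLU -> 1x1 conv on 1 x 1 x C
            nn.Conv2d(channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.body(x)                                # B x C x H x W
        b, c, h, w = f.shape
        attn = torch.softmax(self.to_map(f).view(b, 1, h * w), dim=-1)  # 1 x 1 x HW weights
        flat = f.view(b, c, h * w).transpose(1, 2)      # 1 x HW x C
        desc = torch.bmm(attn, flat).transpose(1, 2).unsqueeze(-1)      # 1 x 1 x C descriptor
        desc = self.channel_mlp(desc)
        out = self.act(f + desc)                        # broadcast add, then ReLU
        return out + x                                  # residual connection to the block input

# Example: y = RCB(64)(torch.randn(1, 64, 128, 128))
```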
The processing flow of the second network branch and the third network branch is similar to that of the first network branch and is not repeated here. It can be seen that each network branch includes two RCB modules: the output of the first RCB module of the third network branch is input to the second RCB module of the third network branch for residual connection, and, after being upsampled by the inverse discrete wavelet transform IDWT, is also input to the first SKFF module of the second network branch, together with the output of the first RCB module of the second network branch, for fusion.
Specifically, as shown in fig. 7, feature addition is performed on two input image signals, and then feature stitching is performed.
The feature map obtained by feature stitching is input to a global average pooling layer, which adds up all pixel values on the feature map and divides the sum by the total number of pixels to obtain a global feature value. This global feature value can be seen as a global statistic of the feature map, such as average luminance or average color. By converting each feature map into a global feature, global average pooling can effectively reduce the complexity of the model and improve its generalization capability.
In addition, global average pooling can avoid the overfitting problem caused by fully connected layers. In conventional convolutional neural networks, the fully connected layer typically introduces a large number of parameters and tends to cause overfitting. Global average pooling can replace the fully connected layer and convert each feature map into a global feature, which greatly reduces the number of model parameters and lowers the risk of overfitting.
Then, the feature map from the global average pooling layer is input into a 1*1 convolution and processed through a ReLU function; the obtained result is input into two separate 1*1 convolutions, each followed by a softmax function, to obtain a first result and a second result. The first result is multiplied with one of the image signals input to the SKFF, the second result is multiplied with the other image signal input to the SKFF, and the two products are subjected to feature addition to obtain the output of the SKFF.
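The following PyTorch sketch illustrates the SKFF flow described above (feature addition of the two inputs, global average pooling, 1*1 convolutions, softmax weighting and weighted recombination); the reduction ratio, the arrangement of the softmax and the class name are assumptions rather than the exact configuration of the application.

```python
import torch
import torch.nn as nn

class SKFFSketch(nn.Module):
    """Minimal sketch of selective kernel feature fusion for two inputs of identical shape."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global average pooling -> B x C x 1 x 1
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # One 1*1 convolution per input branch produces its attention logits.
        self.branch_a = nn.Conv2d(hidden, channels, kernel_size=1)
        self.branch_b = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        summed = a + b                               # feature addition of the two inputs
        z = self.squeeze(summed)
        logits = torch.stack([self.branch_a(z), self.branch_b(z)], dim=0)
        weights = torch.softmax(logits, dim=0)       # normalise across the two branches
        return weights[0] * a + weights[1] * b       # per-channel weighted recombination
```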
The output result of the first SKFF module of the second network branch is input to the first SKFF module of the first network branch together with the output result of the first RCB module of the first network branch for fusion after IDWT is carried out, and the output result of the first SKFF module of the second network branch is input to the second RCB module of the second network branch for residual connection;
the output result of the second RCB module of the third network branch is input to the second SKFF module of the second network branch together with the output result of the second RCB module of the second network branch for fusion after IDWT is carried out;
inputting the output result of the first SKFF module of the first network branch to the second RCB module of the first network branch for residual connection, and inputting the output result of the second SKFF module of the second network branch to the second SKFF module of the first network branch for fusion together with the output result of the second RCB module of the first network branch after IDWT;
and carrying out IDWT on the output result of the second SKFF module of the first network branch to obtain a fusion image.
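The branch wiring described above can be summarized by the following data-flow sketch, in which rcb, skff and idwt are stand-in callables for the RCB module, the SKFF module and the IDWT up-sampling; all names are illustrative, and the sketch only mirrors the connections listed in this section.

```python
def reconstruct(fused_1, fused_2, fused_3, rcb, skff, idwt):
    """Data-flow sketch of the three-branch reconstruction described above.

    fused_1/2/3 are the high/low-frequency fusion results at the first
    (finest), second and third (coarsest) DWT levels.
    """
    # Third (coarsest) branch: two RCBs chained by residual connection.
    r3_1 = rcb(fused_3)
    r3_2 = rcb(r3_1)

    # Second branch, first stage: fuse with the upsampled coarse features.
    r2_1 = rcb(fused_2)
    s2_1 = skff(r2_1, idwt(r3_1))

    # First branch, first stage.
    r1_1 = rcb(fused_1)
    s1_1 = skff(r1_1, idwt(s2_1))

    # Second stage on branches two and one.
    r2_2 = rcb(s2_1)
    s2_2 = skff(r2_2, idwt(r3_2))

    r1_2 = rcb(s1_1)
    s1_2 = skff(r1_2, idwt(s2_2))

    # Final IDWT on the second SKFF output of the first branch yields the fused image.
    return idwt(s1_2)
```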
In some possible implementations, the image reconstruction module in the image enhancement model may include a first type of network branch and a second type of network branch. In the first type of network branch, the high-frequency and low-frequency fused image is subjected to residual connection RCB, and the image data obtained by the RCB is input to the second type of network branch. In the second type of network branch, the high-frequency and low-frequency fused image is subjected to RCB, and the image data obtained by the RCB is subjected to selective kernel feature fusion SKFF with the image data input by the first type of network branch, or the image data obtained by the RCB of the network branch of the present layer is subjected to SKFF with the image data obtained by the SKFF in the network branch of the next layer, where the network branch of the next layer is the network branch corresponding to the next DWT. It should be noted that a person skilled in the art may set the number of network branches included in the first type and in the second type according to requirements, and may likewise set the number of times RCB and SKFF are performed in each network branch of the two types. The embodiments above, in which the first type includes one network branch, the second type includes two network branches, RCB is performed twice in the first type of network branch, and RCB and SKFF are each performed twice in every network branch of the second type, are described only by way of example, and the application is not limited thereto.
S804: and displaying the obtained fusion image in a preview interface of the electronic equipment.
The electronic device may convert the fused image obtained through step S803 into an RGB image, and then display the RGB image in a preview interface of the electronic device for the user to preview.
According to the method, when the electronic device triggers dual-camera shooting with the RGB camera and the multispectral camera according to a user operation or in another manner, the electronic device can obtain a multispectral image by shooting with the multispectral camera and an RGB image by shooting with the RGB camera. The image with the target wavelength in the multispectral image can then be determined according to the determined portrait effect of the output image, and the image with the target wavelength and the RGB image are input into the spectrum-assisted portrait enhancement algorithm network for fusion to obtain a fused image. Because the fusion involves DWT, low-frequency feature fusion, high-frequency feature fusion and other processing, the spectrum information of the multispectral image and of the RGB image can be fused, so that the fused image has richer spectrum information, namely richer details, than the original RGB image. In addition, during the fusion, the spectrum-assisted portrait enhancement algorithm network can also denoise the image data, so that the obtained fused image has lower noise than the originally input RGB image. A fused image with higher quality can therefore be obtained.
The present application further provides a training method for training the spectrum-assisted portrait enhancement algorithm network provided in the present application; the training method is described in detail below with reference to fig. 4.
The loss constraint is divided into two parts (two stages). The first part (stage 1) is the loss constraint of the RCB modules and SKFF modules in the image reconstruction module, and comprises a pixel mean square error loss and a structural similarity (SSIM) loss. The loss function of the first part can be written as formula (1):
L1 = α·L_MSE + β·L_SSIM    (1)
where α and β are weight parameters for balancing the contributions of the MSE loss and the SSIM loss to the loss function L1. By adjusting the weight parameters, the importance the model attaches to different loss items during optimization can be controlled; a person skilled in the art can set α and β according to actual requirements, for example, each to 0.5. L_MSE is the pixel mean square error loss, and L_SSIM is the structural similarity loss.
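A minimal sketch of the stage-1 loss in formula (1) is given below, assuming a PyTorch training setup and the third-party pytorch_msssim package for the SSIM term; the symbol names alpha and beta and the SSIM implementation are assumptions, not requirements of the application.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed available)

def stage1_loss(pred, pseudo_label, alpha=0.5, beta=0.5):
    """Weighted sum of pixel MSE loss and structural similarity loss, as in formula (1)."""
    mse_loss = F.mse_loss(pred, pseudo_label)
    ssim_loss = 1.0 - ssim(pred, pseudo_label, data_range=1.0)  # SSIM = 1 means identical images
    return alpha * mse_loss + beta * ssim_loss
```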
Specifically, training data is obtained, the training data can be multispectral images and RGB images, it is to be noted that two frames of RGB images can be obtained by shooting with an RGB camera, one frame is a low-definition RGB image with higher noise, one frame is a high-definition RGB image with lower noise, the low-definition RGB image with higher noise can be used as training data and input into a model, and the high-definition RGB image with lower noise is used as a pseudo tag.
Further, training data can be input into a spectrum-assisted portrait enhancement algorithm network, an output fusion image can be obtained, the fusion image is a predicted value, and then parameters of an RCB module and an SKFF module in the spectrum-assisted portrait enhancement algorithm network can be adjusted according to the predicted value and the pseudo tag, so that the value of a first partial loss function is changed, and the difference between the predicted value and the pseudo tag is reduced. And repeating the steps, carrying out iterative training on the RCB module and the SKFF module, and stopping training the first part when the loss function reaches the minimum value or the value of the loss function is not obviously reduced any more.
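For illustration, a stage-1 training loop might look like the following sketch, which reuses the stage1_loss function from the previous sketch; the model.reconstruction_parameters() helper and the data loader format are hypothetical.

```python
import torch

def train_stage1(model, loader, num_epochs=50, lr=1e-4):
    """Sketch of the stage-1 iterative training described above.

    Only the RCB/SKFF (reconstruction) parameters are assumed to be exposed
    via a hypothetical model.reconstruction_parameters() helper.
    """
    opt = torch.optim.Adam(model.reconstruction_parameters(), lr=lr)
    for _ in range(num_epochs):
        for ms_img, noisy_rgb, clean_rgb in loader:   # pseudo label = clean_rgb
            fused = model(ms_img, noisy_rgb)          # predicted value
            loss = stage1_loss(fused, clean_rgb)      # formula (1), defined in the sketch above
            opt.zero_grad()
            loss.backward()
            opt.step()
```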
The second part (stage 2) is the loss constraint of the feature fusion module, and comprises a low-frequency fusion constraint and a high-frequency fusion constraint. The high-frequency fusion constraint is the same as the feature fusion strategy of the high-frequency images, namely the maximum-absolute-value selection method is adopted. The low-frequency fusion constraint relies on a loss function comprising three parts: a gradient-related loss of the low-frequency map L_grad, a pixel intensity difference loss L_int, and a structural similarity loss L_SSIM. The loss function is shown in formula (2):
L2 = λ1·L_grad + λ2·L_int + λ3·L_SSIM    (2)
where λ1, λ2 and λ3 are weight parameters that can be set according to actual requirements, for example to 10, 2 and 1, respectively. Formula (3) defines the gradient-related loss L_grad in terms of the gradient map of the low-frequency band (i.e. the low-frequency images output by the DWT modules in the second and third network branches), the low-frequency image output by the DWT module of the first network branch, and the fusion map of the low-frequency images. Formulas (4) and (5) define the pixel intensity difference loss L_int and the structural similarity loss L_SSIM, in which the multispectral image is also involved.
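A hedged sketch of the stage-2 low-frequency loss in formula (2) follows; the weighted combination matches the text above, while the concrete definitions of the gradient, intensity and SSIM terms are illustrative stand-ins for formulas (3) to (5), which are not reproduced here.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed SSIM implementation, as in the stage-1 sketch

def gradient_map(x):
    # Simple finite-difference gradient magnitude; the patent's exact
    # gradient operator in formula (3) is not reproduced here.
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return F.pad(dx.abs(), (0, 1, 0, 0)) + F.pad(dy.abs(), (0, 0, 0, 1))

def stage2_lowfreq_loss(fused_low, rgb_low, ms_low, lambdas=(10.0, 2.0, 1.0)):
    """Weighted combination of gradient, intensity and SSIM terms, as in formula (2)."""
    l_grad = F.l1_loss(gradient_map(fused_low), gradient_map(ms_low))   # stand-in for formula (3)
    l_int = F.l1_loss(fused_low, rgb_low)                               # stand-in for formula (4)
    l_ssim = 1.0 - ssim(fused_low, rgb_low, data_range=1.0)             # stand-in for formula (5)
    lam1, lam2, lam3 = lambdas
    return lam1 * l_grad + lam2 * l_int + lam3 * l_ssim
```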
The input multispectral image and the RGB image are used as training data, and parameters of the feature fusion module are adjusted by comparing the multispectral image and the RGB image, so that the value of the second partial loss function is changed. Repeating the steps, performing iterative training on the feature fusion module, and stopping training on the second part when the loss function reaches the minimum value or the value of the loss function is not obviously reduced any more.
Training the first part of the model can enhance the denoising capability of the spectrum-assisted portrait enhancement algorithm network, and training the second part can enhance its feature fusion capability, improve robustness, and strengthen the feature representation capability.
As shown in fig. 11, the software architecture of the electronic device of the present application may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the invention takes an Android system with a layered architecture as an example and exemplarily describes the Android system.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime), a system library, and a kernel layer.
The application layer may include a series of application packages, such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message applications.
The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for the applications, and includes a number of predefined functions. In an embodiment of the present application, the application framework layer may include a view system, a display system Android Interface Definition Language (Android Interface Definition Language, AIDL) interface, and an activity recognition management service ARMS.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a camera application may include a view displaying text and a view displaying a picture.
The application layer and the framework layer run in virtual machines. The virtual machine executes java files of the application layer and the framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The kernel layer is a layer between hardware and software. The kernel layer provided by the embodiment of the application comprises a display driver, an audio driver and a sensor driver.
The technical solution of the present embodiment, in essence or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in the respective embodiments. The aforementioned storage medium includes: a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disk, and the like.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

1. An electronic device, comprising: multispectral device, RGB camera and processor;
the multispectral device is used for acquiring multispectral image data;
the RGB camera is used for acquiring RGB image data;
the processor is used for fusing the multispectral image data with the RGB image data to obtain a fused image, wherein the fused image comprises the spectrum information of the multispectral image data and the spectrum information of the RGB image data.
2. The electronic device of claim 1, wherein the multispectral device comprises at least eight channels with a spectral width of 10nm to 20nm and a spectral range of 400nm to 1000nm, and wherein the processor is configured to input the multispectral image data and the RGB image data into a pre-trained image enhancement model for fusion.
3. The electronic device of claim 2, wherein the processor is specifically configured to:
downsampling the multispectral image data and the RGB image data by using a Discrete Wavelet Transform (DWT) to obtain low-frequency image data and high-frequency image data corresponding to the multispectral image data and low-frequency image data and high-frequency image data corresponding to the RGB image data;
performing feature fusion on the low-frequency image data and the high-frequency image data to obtain a high-frequency and low-frequency fusion image;
and carrying out residual connection on the high-frequency and low-frequency fusion images to obtain fusion images.
4. The electronic device of claim 3, wherein the image enhancement model comprises a first type of network branch and a second type of network branch, the processor being specifically configured to:
downsampling the multispectral image data and the RGB image data a plurality of times by using a multilayer DWT;
performing feature fusion on the low-frequency image data and the high-frequency image data obtained by downsampling each time to obtain corresponding high-frequency and low-frequency fusion images;
performing residual connection RCB on the high-low frequency fusion image in the first type network branch, and inputting image data obtained by the RCB into a second type network branch;
and performing RCB on the high-low frequency fusion image in the second type network branch, performing selective kernel feature fusion SKFF on image data obtained by the RCB and image data input by the first type network branch, or performing SKFF on image data obtained by the RCB of the network branch of the layer and image data obtained by the SKFF in the next network branch, wherein the next network branch is the network branch corresponding to the next DWT.
5. The electronic device of claim 4, wherein the first type of network branch comprises a number of network branches of one and the second type of network branch comprises a number of network branches of two, the processor being specifically configured to perform RCB twice in the first type of network branch, RCB twice in each network branch of the second type of network branch, and SKFF, respectively, the RCB being spaced from the SKFF.
6. The electronic device of claim 3, wherein the high frequency image data comprises lateral high frequency image data, longitudinal high frequency image data, and oblique high frequency image data, the processor being specifically configured to:
fusing the low-frequency image data corresponding to the multispectral image data with the low-frequency image data corresponding to the RGB image data;
determining lateral high-frequency image data having large absolute values of pixels from lateral high-frequency image data of the multispectral image data and RGB image data, determining vertical high-frequency image data having large absolute values of pixels from vertical high-frequency image data of the multispectral image data and RGB image data, and determining diagonal high-frequency image data having large absolute values of pixels from diagonal high-frequency image data of the multispectral image data and RGB image data;
and carrying out feature fusion on the determined transverse high-frequency image data, the determined longitudinal high-frequency image data and the determined oblique high-frequency image data.
7. The electronic device according to claim 2, wherein the RGB image data includes high-definition RGB image data with low noise and low-definition RGB image data with high noise, the RGB data input to the image enhancement model is the low-definition RGB image data with high noise, and the pre-trained image enhancement model is obtained by training according to a prediction value and a pseudo tag, the prediction value being image data output by the image enhancement model, the pseudo tag being the high-definition RGB image data with low noise corresponding to the low-definition RGB image data with high noise input to the image enhancement model, and by training according to the multispectral image data input to the image enhancement model and the low-definition RGB image data with high noise.
8. The electronic device of claim 1, wherein the processor is specifically configured to:
determining a target portrait effect in the fused image;
determining multispectral image data of target wavelength in the multispectral image data according to the target portrait effect;
and inputting the multispectral image data of the target wavelength and the RGB image data into a pre-trained image enhancement model for fusion.
9. An image processing method, which is applied to an electronic device, wherein the electronic device comprises a multispectral device and an RGB camera, and the method comprises:
in response to triggering a photographing event, photographing by using the multispectral device to obtain multispectral image data, and photographing by using the RGB camera to obtain RGB image data;
and inputting the multispectral image data and the RGB image data into a pre-trained image enhancement model for fusion to obtain a fusion image, wherein the fusion image comprises the spectrum information of the multispectral image data and the spectrum information of the RGB image data.
10. The method of claim 9, wherein the inputting the multispectral image data and the RGB image data into a pre-trained image enhancement model for fusion comprises:
downsampling the multispectral image data and the RGB image data by using a Discrete Wavelet Transform (DWT) to obtain low-frequency image data and high-frequency image data corresponding to the multispectral image data and low-frequency image data and high-frequency image data corresponding to the RGB image data;
performing feature fusion on the low-frequency image data and the high-frequency image data to obtain a high-frequency and low-frequency fusion image;
and carrying out residual connection RCB on the high-frequency and low-frequency fusion image to obtain a fusion image.
11. The method according to claim 10, wherein the number of downsampling is a plurality of times, the feature fusion is performed on the low-frequency image data and the high-frequency image data obtained by downsampling each time to obtain corresponding high-frequency and low-frequency fused images, the image enhancement model includes a first type network branch and a second type network branch, and the performing residual connection RCB on the high-frequency and low-frequency fused images to obtain the fused images includes:
performing residual connection RCB on the high-low frequency fusion image in the first type network branch, and inputting image data obtained by the RCB into a second type network branch;
and performing RCB on the high-frequency and low-frequency fusion image in the second type network branch, performing selective kernel feature fusion SKFF on image data obtained by the RCB and image data input by the first type network branch to obtain a fusion image, or performing SKFF on image data obtained by the RCB of the network branch of the layer and image data obtained by performing SKFF on the image data in the next network branch of the layer, wherein the next network branch is the network branch corresponding to the next DWT.
12. The method of claim 11, wherein the first type of network branches comprises a number of network branches of one, wherein the second type of network branches comprises a number of network branches of two, wherein RCB is performed twice in the first type of network branches, wherein RCB is performed twice in each network branch of the second type of network branches, and wherein SKFF is performed separately from the RCB.
13. The method of claim 10, wherein the high frequency image data comprises lateral high frequency image data, longitudinal high frequency image data, and oblique high frequency image data, and wherein feature fusing the low frequency image and the high frequency image comprises:
fusing the low-frequency image data corresponding to the multispectral image data with the low-frequency image data corresponding to the RGB image data;
determining lateral high-frequency image data with large absolute value of pixels from the lateral high-frequency image data of the multispectral image data and the RGB image data, determining longitudinal high-frequency image data with large absolute value of pixels from the longitudinal high-frequency image data of the multispectral image data and the RGB image data, and determining oblique high-frequency image data with large absolute value of pixels from the oblique high-frequency image data of the multispectral image data and the RGB image data;
and carrying out feature fusion on the determined transverse high-frequency image data, the determined longitudinal high-frequency image data and the determined oblique high-frequency image data.
14. The method according to claim 9, wherein the RGB image data includes high-definition RGB image data with low noise and low-definition RGB image data with high noise, the RGB data input to the image enhancement model is the low-definition RGB image data with high noise, and the pre-trained image enhancement model is obtained by training with a prediction value, which is image data output by the image enhancement model, and a pseudo tag, and by training with the multispectral image data input to the image enhancement model and the low-definition RGB image data with high noise, the pseudo tag being the high-definition RGB image data with low noise that corresponds to the low-definition RGB image data with high noise input to the image enhancement model.
15. The method of claim 9, wherein the inputting the multispectral image data and the RGB image data into a pre-trained image enhancement model for fusion comprises:
determining a target portrait effect in the fused image;
determining multispectral image data of target wavelength in the multispectral image data according to the target portrait effect;
and inputting the multispectral image data of the target wavelength and the RGB image data into a pre-trained image enhancement model for fusion.
16. A method of training an image enhancement model, wherein the image enhancement model comprises a feature fusion module and an image reconstruction module, and the training method comprises:
acquiring multispectral image data and image data corresponding to two frames of RGB images, wherein the image data comprises low-definition RGB image data with high noise and high-definition RGB image data with low noise;
determining the multispectral image data and the low-definition RGB image data with high noise as training data of the image reconstruction module, determining the image data output by the image reconstruction module as a predicted value, and determining the high-definition RGB image data with low noise as a pseudo tag of the image reconstruction module;
training the image reconstruction module according to the training data, the predicted value and the pseudo tag of the image reconstruction module;
and training the feature fusion module according to the multispectral image data and the low-definition RGB image data with high noise.
17. The method of claim 16, wherein the training the image reconstruction module based on training data of the image reconstruction module, a predicted value, and a pseudo tag comprises:
performing iterative training on the image reconstruction module according to training data, a predicted value and a pseudo tag of the image reconstruction module, and adjusting parameters of the image reconstruction module in each iterative process until a loss function of the image reconstruction module reaches a minimum value, wherein the loss function of the image reconstruction module comprises: a pixel mean square error loss and a structural similarity loss.
18. The method of claim 16, wherein training the feature fusion module from the multispectral image data and the noisy low-definition RGB image data comprises:
and carrying out iterative training on the feature fusion module according to the multispectral image data and the low-definition RGB image data with high noise, and adjusting parameters of the feature fusion module in each iterative process until a loss function of the feature fusion module reaches a minimum value, wherein the loss function of the feature fusion module comprises: a gradient-related loss of the low-frequency image data, a pixel intensity difference loss, and a structural similarity loss.
19. A computer storage medium comprising computer instructions which, when run on an electronic device, perform the method of any of claims 9-18.