CN115587956A - Image processing method and device, computer readable storage medium and terminal - Google Patents


Info

Publication number
CN115587956A
Authority
CN
China
Prior art keywords
image
frame
parameters
enhancement
fusion
Prior art date
Legal status
Pending
Application number
CN202211337819.8A
Other languages
Chinese (zh)
Inventor
吴倩
邵娜
赵磊
Current Assignee
Beijing Ziguang Zhanrui Communication Technology Co Ltd
Original Assignee
Beijing Ziguang Zhanrui Communication Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Ziguang Zhanrui Communication Technology Co Ltd
Priority to CN202211337819.8A
Publication of CN115587956A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20208 High dynamic range [HDR] image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

An image processing method and device, a computer readable storage medium and a terminal are provided. The method comprises the following steps: acquiring F frames of first images, wherein F is an integer greater than 1; inputting the F frames of first images into a pre-trained neural network model to obtain fusion parameters output by the neural network model, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise a mapping vector corresponding to each pixel point of each frame of first image, and the weight parameters comprise a weight value corresponding to each pixel point of each frame of first image; enhancing the F frames of first images at least according to the enhancement parameters to obtain F frames of second images; and fusing the F frames of second images according to the weight parameters to obtain a target image. According to the scheme provided by the present application, multiple frames of images can be fused efficiently to obtain a high-quality image.

Description

Image processing method and device, computer readable storage medium and terminal
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a computer-readable storage medium, and a terminal.
Background
Fusing multiple frames of images to extend the dynamic range is currently the mainstream approach in the industry for acquiring a High Dynamic Range (HDR) image, but it is difficult to achieve both good image quality and high processing efficiency when fusing multiple frames. In existing schemes, improving the quality of the fused image requires a complex algorithm with a large amount of computation, so the processing efficiency is low; conversely, improving the processing efficiency requires a simple fusion method, in which case good image quality cannot be obtained.
Disclosure of Invention
An object of the present invention is to provide an image processing method capable of efficiently fusing a plurality of frame images to obtain a high-quality image.
In order to solve the foregoing technical problem, an embodiment of the present application provides an image processing method, where the method includes: acquiring a first image of an F frame, wherein F is an integer larger than 1; inputting the F frames of first images into a neural network model obtained by pre-training to obtain fusion parameters, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first images, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first images; enhancing the first image of the F frame at least according to the enhancement parameters to obtain a second image of the F frame; and performing fusion processing on the F frame second image according to the weight parameter to obtain a target image.
Optionally, the step of processing the F frame first image by the neural network model includes: down-sampling the F frame first image to obtain down-sampled data; performing first convolution processing on the down-sampling data to obtain intermediate enhancement data; and performing upsampling operation on the intermediate enhancement data to obtain the enhancement parameters.
Optionally, the fusion parameters further include guide parameters, the guide parameters comprising a scene vector corresponding to each pixel point, and the scene vector comprising association weights between the pixel point and each scene; the enhancing the F frame first image at least according to the enhancement parameters includes: enhancing the F frame first image according to the guide parameters and the enhancement parameters to obtain the F frame second image.
Optionally, the step of processing the F frame first image by the neural network model includes: and performing second convolution processing on the F frame first image to obtain the guide parameter.
Optionally, the step of processing the F frame first image by the neural network model includes: and performing third convolution processing on the F frame first image to obtain the weight parameter.
Optionally, the step of obtaining the neural network model includes: obtaining training data, the training data comprising: f frame sample input image and single frame sample target image; and training the neural network model by adopting the training data until the neural network model is converged.
Optionally, the acquiring the F frame first image includes: acquiring an F frame original image; and if the F frame original image is acquired by an HDR camera, taking the F frame original image as the F frame first image; otherwise, aligning the F frame original image to obtain the F frame first image.
In order to solve the above technical problem, an embodiment of the present application further provides an image processing apparatus, including: the acquisition module is used for acquiring a first image of an F frame, wherein F is an integer larger than 1; the parameter calculation module is used for inputting the F-frame first images into a neural network model obtained through pre-training to obtain fusion parameters, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first images, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first images; the enhancement module is used for enhancing the first image of the F frame at least according to the enhancement parameters to obtain a second image of the F frame; and the fusion module is used for carrying out fusion processing on the F frame second image according to the weight parameter to obtain a target image.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the image processing method described above.
The embodiment of the present application further provides a terminal, which includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the processor executes the steps of the image processing method when executing the computer program.
Compared with the prior art, the technical scheme of the embodiment of the application has the following beneficial effects:
in the scheme of the embodiment of the application, the first image of the F frame is obtained, the first image of the F frame is input to a neural network model obtained through pre-training, and fusion parameters are obtained through calculation and comprise enhancement parameters and weight parameters. The enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first image, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first image. And further, enhancing the first image of the F frame at least according to the enhancement parameters to obtain a second image of the F frame, and then fusing the second image of the F frame according to the weight parameters to obtain a target image.
Compared with the prior art, the scheme of the embodiment of the application provides an end-to-end image processing method. Specifically, the fusion parameters for fusing the images are calculated by adopting a neural network model, so that the fusion parameters can be efficiently obtained, and the efficiency of the overall fusion algorithm is improved. In addition, because the enhancement parameters include the mapping vectors corresponding to the pixel points of each frame of the first image, the pixel points in each frame of the first image can be enhanced according to the enhancement parameters to enhance the color and/or brightness of each frame of the first image, so that the quality of the finally obtained image can be improved. In addition, because the weight parameters comprise the weight values corresponding to the pixel points of each frame of the first image, the fusion processing of the F frame of the second image according to the weight parameters can be carried out according to the weight of each pixel point in each frame, and the ghost can be effectively inhibited. Therefore, the scheme provided by the embodiment of the application can efficiently obtain the high dynamic range image with better quality.
Further, in the scheme of the embodiment of the present application, a down-sampling is performed on the first image of the F frame, and then a first convolution processing is performed after the down-sampling to obtain intermediate enhanced data, and then an up-sampling operation is performed on the intermediate enhanced data, so as to obtain an enhanced parameter. By adopting the scheme, the calculation amount of the calculation enhancement parameters is reduced, the efficiency of calculating the enhancement parameters is improved, and the overall efficiency of the fusion algorithm is improved.
Further, in the scheme of the embodiment of the application, the fusion parameters further include a guide parameter, and the F-frame first image is enhanced according to the guide parameter and the enhancement parameter to obtain the F-frame second image. The guiding parameters comprise the scene vector corresponding to each pixel point, and the scene vector comprises the association weight of the pixel point and each scene, so that the scene channel is expanded in the scheme of the embodiment of the application, and the association degree of the pixel point and the scene is considered during image enhancement, so that the whole algorithm has strong self-adaptive capacity, can be applied to different shooting scenes, and can obtain better fusion quality under different shooting scenes.
Drawings
FIG. 1 is a schematic flow chart of an image processing method in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a neural network model in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an image processing apparatus in an embodiment of the present application.
Detailed Description
As described in the background art, it is difficult for existing multi-frame image fusion schemes to achieve both good fused-image quality and high fusion efficiency.
One existing multi-frame image fusion method comprises the following steps: ghost detection, morphological processing of a ghost mask, Fourier transform, spectrum fusion, inverse Fourier transform, and the like. Another conventional fusion method divides an image into a texture region and a flat region, processes the two regions separately, and then fuses them. Such methods require many parameters to be tuned, involve a large amount of computation, and are time-consuming. Computationally intensive methods are often impractical on low-end devices that are not equipped with acceleration hardware. For example, when no acceleration hardware such as a Neural-Network Processing Unit (NPU), a Graphics Processing Unit (GPU) or a Digital Signal Processor (DSP) is provided in the device, a multi-frame image fusion method with a large amount of computation may not be able to run. Moreover, in some scenes with high real-time requirements, a computationally intensive multi-frame fusion method cannot meet the real-time requirement because it takes too long. Simple fusion methods, on the other hand, tend to produce fused images of low quality.
In order to solve the foregoing technical problem, an embodiment of the present application provides an image processing method, and in a scheme of the embodiment of the present application, an F frame first image is obtained, and the F frame first image is input to a neural network model obtained through pre-training to obtain a fusion parameter, where the fusion parameter includes an enhancement parameter and a weight parameter. The enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first image, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first image. And further, enhancing the first image of the F frame at least according to the enhancement parameters to obtain a second image of the F frame, and then fusing the second image of the F frame according to the weight parameters to obtain a target image.
Compared with the prior art, the scheme of the embodiment of the application provides an end-to-end image processing method. Specifically, the fusion parameters for fusing the images are calculated by adopting a neural network model, so that the fusion parameters can be efficiently obtained, and the efficiency of the overall fusion algorithm is improved. In addition, because the enhancement parameters include the mapping vectors corresponding to the pixel points of each frame of the first image, the pixel points in each frame of the first image can be enhanced according to the enhancement parameters to enhance the color and/or brightness of each frame of the first image, so that the quality of the finally obtained image can be improved. In addition, because the weight parameters comprise the weight values corresponding to the pixel points of each frame of the first image, the fusion processing of the F frame of the second image according to the weight parameters can be carried out according to the weight of each pixel point in each frame, and ghost can be effectively inhibited.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image processing method in an embodiment of the present application. The method may be executed by a terminal, for example, but not limited to, a mobile phone, a computer, a tablet computer, an internet of things device, a wearable device, a vehicle-mounted terminal, a server, and the like. The terminal may be configured with a camera to have functions of photographing, and the like. Alternatively, the terminal may be an edge device, for example, a camera with a data processing function, such as a camera used for monitoring security. Alternatively, the terminal may also be a device without a camera, and the terminal may acquire the multiple frames of images for processing from the outside, which is not limited in this embodiment. The image processing method shown in fig. 1 may include the steps of:
step S11: acquiring a first image of an F frame, wherein F is an integer larger than 1;
step S12: inputting the F frames of first images into a neural network model obtained by pre-training to obtain fusion parameters, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first images, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first images;
step S13: enhancing the first image of the F frame at least according to the enhancement parameters to obtain a second image of the F frame;
step S14: and performing fusion processing on the F frame second image according to the weight parameter to obtain a target image.
It is understood that, in the implementation, the method may be implemented by using a software program, where the software program runs in a processor integrated inside a chip or a chip module; alternatively, the method may be implemented by hardware or a combination of hardware and software, for example, by using a dedicated chip or a chip module, or by using a dedicated chip or a chip module and a software program; alternatively, the method may be implemented in hardware.
In a specific implementation of step S11, F frames of the first image may be acquired, where F is an integer greater than 1.
Specifically, the F-frame first image may be a continuous multi-frame image captured by a camera. The F frame images may be acquired by using the same exposure value, or may be acquired by using different exposure values.
It should be noted that the format of the first image is not limited in this embodiment, and the first image may be a Bayer (Bayer) image. The Bayer image may be acquired by a Bayer pattern (Bayer pattern) based camera, and the format of the Bayer image may be RGGB, BGGR, or the like. Alternatively, the first image may be a multi-channel image obtained by processing a bayer image, and may be an RGB image, a YUV image, or the like, for example.
It should be further noted that, in the embodiment of the present application, the value of F is not limited, that is, the number of images used for fusion is not limited in the embodiment. In one non-limiting example, F =3. In practical applications, F may also take other values greater than 1.
In one particular example, the F frame first image may be captured by an HDR camera. Specifically, the F frame first image may be obtained by performing a shooting operation by the HDR camera.
In a specific implementation, the HDR camera performs a shooting operation to obtain F frame original images, and since the F frame original images are obtained by performing a shooting operation by the HDR camera, the F frame original images are captured almost synchronously, and therefore the F frame original images are already aligned, in which case, the F frame original images may not need to be aligned. Thus, the F frame original image can be directly used as the F frame first image.
In another specific example, the F frame first image may be obtained by performing alignment processing on an F frame original image, where the F frame original image may be acquired by a non-HDR camera.
In a specific implementation, the non-HDR camera continuously performs shooting operations for multiple times, and obtains F-frame original images. Further, the F frame original image may be aligned to obtain the F frame first image.
The aligning the F-frame original image may include: one frame is selected from the F frame original image as a reference frame, and then other frames except the reference frame in the F frame original image are aligned to the reference frame to obtain an F frame first image. More specifically, if the exposure values of the F frame images are different, a frame having an exposure value that is an intermediate value may be selected as the reference frame. If the exposure values of the F frame images are the same, a frame whose timing is in the middle may be selected as the reference frame. In particular implementations, various suitable existing alignment methods may be used to align other frames to the reference frame, for example, but not limited to, an optical flow-based image alignment method, a keypoint-based image alignment method, an image block-based alignment method, and the like.
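As a non-limiting illustration only, a keypoint-based alignment of one frame to the reference frame might be sketched as follows; the sketch assumes 8-bit grayscale inputs and uses OpenCV, and the function name align_to_reference is a hypothetical helper rather than anything defined in this application.

```python
import cv2
import numpy as np

def align_to_reference(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Warp `frame` onto `reference` using ORB keypoints and a homography.

    Both images are expected as 8-bit grayscale arrays of the same size.
    """
    orb = cv2.ORB_create(2000)
    kp_f, des_f = orb.detectAndCompute(frame, None)
    kp_r, des_r = orb.detectAndCompute(reference, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_f, des_r), key=lambda m: m.distance)[:500]

    src = np.float32([kp_f[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```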
In the specific implementation of step S12, the F frame first image may be input to a neural network model obtained by pre-training, so as to obtain a fusion parameter. That is, the fusion parameters in this embodiment are calculated from the first image of the F frame by the neural network model trained in advance.
Specifically, the neural network model is pre-constructed and pre-trained, and the pre-trained neural network model is deployed in the terminal. When a plurality of frames of images need to be fused, the plurality of frames of images are input into the neural network model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a neural network model in an embodiment of the present application. The structure, training process, processing steps, etc. of the neural network model are described in the following without limitation in conjunction with fig. 2.
As shown in fig. 2, the neural network model may include an input module 20, a first convolution module 21, a second convolution module 22, and a third convolution module 23. Wherein the first convolution module 21 may be used to calculate the enhancement parameters, the second convolution module 22 may be used to calculate the guiding parameters, and the third convolution module 23 may be used to calculate the weighting parameters.
The input module 20 may be configured to obtain an F-frame image input to the neural network model, where the F-frame image is an F-frame sample input image in the model training stage, and the F-frame image is an F-frame first image in the model application stage.
In one example, input module 20 may concatenate (Concat) the input F frame images in the channel direction to obtain input data having dimensions H × W × F × C, where H denotes the height of the image, W denotes the width of the image, F denotes the number of frames, and C denotes the number of channels of each image. For example, if the input image is a Bayer image in an RGGB format, C = 4. For another example, if the input image is a YUV image, C = 3, and the 3 channels are the Y channel, the U channel and the V channel, respectively. For another example, if the input image is an RGB image, C = 3, and the 3 channels are the R channel, the G channel and the B channel, respectively.
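Purely for illustration, such a channel-direction concatenation might be written as follows in PyTorch; the tensor layout chosen here (frames stacked along the channel axis and then viewed as F × C) is one possible realization of the H × W × F × C input data described above, not a requirement of this application.

```python
import torch

# F frames, each with C channels and spatial size H x W (dummy data).
F, C, H, W = 3, 4, 512, 512
frames = [torch.rand(C, H, W) for _ in range(F)]

# Concatenate in the channel direction: (F*C, H, W).
input_data = torch.cat(frames, dim=0)

# The same values can be viewed as an F x C x H x W block when the
# per-frame structure is needed by later modules.
per_frame = input_data.view(F, C, H, W)
print(input_data.shape, per_frame.shape)
```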
In another example, the input module 20 may also directly transmit the input multiple frames of images to the subsequent first convolution module 21, second convolution module 22, third convolution module 23, and so on. That is, the input data may be an F frame image.
Further, the output of the input module 20 may be connected to the first convolution module 21, the second convolution module 22 and the third convolution module 23, respectively.
In a first aspect, the input of the first convolution module 21 may be the input data output by the input module 20, and the output of the first convolution module 21 is the enhancement parameter. The enhancement parameters may be a result of convolution processing of the input data. The convolution process performed by the first convolution module 21 may be referred to as a first convolution process. The enhancement parameters may include mapping vectors corresponding to each pixel point of each frame of image, and the mapping vectors may be used to map each pixel point from an original vector space to an enhanced vector space. And enhancing each pixel point in each frame of image through the mapping vector, so that the information of each pixel point can be enhanced, and the enhanced image is obtained.
More specifically, the dimension of the input data of the first convolution module 21 is H × W × F × C, and the dimension of the output of the first convolution module 21 is H × W × F × S × P. That is, the dimension of the mapping vector is S P. In other words, each pixel in each frame of image corresponds to one S × P mapping vector. Where S represents the number of scenes. S may be preset. In one non-limiting example, S =4. In other embodiments, S may take other values. P represents the dimension of the mapping vector, the value of P may depend on the enhancement purpose. More specifically, if the enhancement purpose is to enhance only the image luminance, P =1; if the enhancement purpose is to enhance the color and brightness of an image, P = C × (C + 1).
As shown in fig. 2, in one embodiment, the first convolution module 21 may include a downsampling unit 211, a first convolution unit 212, and an upsampling unit 213.
The down-sampling unit 211 may be configured to down-sample the input data to obtain down-sampled data. More specifically, the down-sampling unit 211 may down-sample each frame of image and concatenate the down-sampled images in the channel direction to obtain the down-sampled data. The down-sampling method is not limited in this embodiment; for example, bilinear interpolation may be used. In one specific example, the dimensions of the down-sampled data may be 256 × 256 × F × C.
Further, the first convolution unit 212 may be configured to perform a first convolution process on the down-sampled data, resulting in an intermediate enhancement parameter. The operator employed by the first convolution unit 212 may be any one or combination of: a general convolution operator, a separable convolution operator, a residual operator, an activation function operator, a pooling operator, and the like. In one non-limiting example, the convolution operator is concatenated with the activation function operator in the first convolution unit 212, wherein the convolution operator may include 4 sets of 3 × 3 convolution operators, and the step size of the convolution operator may be 2, and the number of convolution kernels of 4 sets of convolution operators may be 16, 32, 64, C', respectively. Wherein C' = F × S × P. The activation function operator may be a ReLU function. The dimension of the intermediate enhancement parameter output by the first convolution unit 212 may be N × C', where N represents the spatial resolution. Alternatively, N may be set to 16.
Further, the upsampling unit 213 may be configured to upsample the intermediate enhancement parameter, resulting in an enhancement parameter. It should be noted that, this embodiment does not limit the upsampling method, and for example, bilinear interpolation may be used to perform upsampling, so as to obtain an enhancement parameter with dimensions H × W × F × S × P.
From the above, the processing steps of the first convolution module 21 shown in fig. 2 for the input image can be represented by the following formula:
O_1 = U(B_1(G(D(I_f))))

where O_1 denotes the enhancement parameter, I_f denotes the f-th frame image, 1 ≤ f ≤ F and f is a positive integer, D denotes the down-sampling processing, G denotes the concatenation in the channel direction, B_1 denotes the first convolution processing, and U denotes the up-sampling processing.
In other embodiments of the present application, the first convolution module 21 may not have the downsampling unit 211 and the upsampling unit 213. That is, the first convolution processing may be directly performed on the input data having the dimension H × W × F × C to obtain the enhancement parameter having the dimension H × W × F × S × P.
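A minimal PyTorch sketch of such an enhancement-parameter branch is given below, assuming the example values above (bilinear resampling, four stride-2 3 × 3 convolutions with 16, 32, 64 and C' kernels, and a 16 × 16 intermediate resolution); the class name EnhancementBranch and the exact tensor layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancementBranch(nn.Module):
    """Sketch of the first convolution module: O1 = U(B1(G(D(I_f))))."""

    def __init__(self, frames: int, channels: int, scenes: int, p: int):
        super().__init__()
        c_in = frames * channels          # channel-concatenated input
        c_out = frames * scenes * p       # C' = F * S * P
        widths = [16, 32, 64, c_out]
        layers, prev = [], c_in
        for w in widths:                  # 4 groups of 3x3 conv, stride 2, with ReLU
            layers += [nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1), nn.ReLU()]
            prev = w
        self.body = nn.Sequential(*layers)
        self.frames, self.scenes, self.p = frames, scenes, p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, F*C, H, W)
        h, w = x.shape[-2:]
        x_small = F.interpolate(x, size=(256, 256), mode="bilinear", align_corners=False)
        feats = self.body(x_small)                         # (B, F*S*P, 16, 16)
        feats = F.interpolate(feats, size=(h, w), mode="bilinear", align_corners=False)
        # Reshape to (B, F, S, P, H, W): one S x P mapping vector per pixel per frame.
        return feats.view(-1, self.frames, self.scenes, self.p, h, w)

# Example: F=3 Bayer-like frames with C=4 channels, S=4 scenes, P=1 (brightness only).
model = EnhancementBranch(frames=3, channels=4, scenes=4, p=1)
out = model(torch.rand(1, 12, 512, 512))
print(out.shape)  # torch.Size([1, 3, 4, 1, 512, 512])
```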
In a second aspect, the input of the second convolution module 22 is the input data output by the input module 20, and the output of the second convolution module 22 is the pilot parameter. The guiding parameters may be obtained by convolution processing of the input data. The convolution process performed by the second convolution module 22 may be referred to as a second convolution process. The guidance parameter may include a scene vector corresponding to each pixel point, the scene vector includes an association weight between a pixel point and each scene, and a sum of the association weights between the pixel point and each scene in each scene vector may be 1. The association weight can be used for representing the association degree of the pixel point and the scene, and the higher the association degree is, the larger the association weight is. It should be noted that the scene vectors corresponding to the pixels at the same position in each frame of image are the same, that is, the guiding parameter may be used to describe information of each position in the whole multi-frame image.
More specifically, the dimension of the input data of the second convolution module 22 is H × W × F × C, and the dimension of the output data of the second convolution module 22 is H × W × S. That is, the dimension of the scene vector is 1 × S. In other words, each pixel location corresponds to a 1 × S scene vector.
As shown in fig. 2, in one embodiment, the second convolution module 22 may include: a second convolution unit 221 and a first normalization unit 222. The second convolution unit 221 may be configured to perform a second convolution process on the input data to obtain an intermediate guiding parameter. More specifically, the operator employed by the second convolution unit 221 may be any one or combination of: a general convolution operator, a separable convolution operator, a residual operator, an activation function operator, a pooling operator, and the like. In one non-limiting example, the convolution operators in the second convolution unit 221 are connected in series with the activation function operators, wherein the convolution operators may include 2 sets of convolution operators 3 × 3, and the step size of the convolution operator may be 1, and the number of convolution kernels of 2 sets of convolution operators may be: 16,S. The activation function operator may be a ReLU function. The dimension of the intermediate pilot parameter output by the second convolution unit 221 may be H × W × S.
Further, the first normalization unit 222 may be configured to perform normalization processing on the intermediate guidance parameter to obtain the guidance parameter. Specifically, the first normalization unit 222 may perform normalization processing on the intermediate guidance parameter by using a normalization operator, so that a value of an association weight in a scene vector corresponding to each pixel point is between [0,1], thereby obtaining the guidance parameter.
The normalization operator employed by the first normalization unit 222 can be represented by the following formula:
O_2(i,j,k) = O'_2(i,j,k) / \sum_{k'=1}^{S} O'_2(i,j,k')

where i denotes the index in the H dimension, j denotes the index in the W dimension, k denotes the index in the S dimension, 1 ≤ i ≤ H, 1 ≤ j ≤ W, 1 ≤ k ≤ S, O_2 denotes the guide parameter, and O'_2 denotes the intermediate guide parameter.
From the above, the processing steps of the second convolution module 22 shown in fig. 2 for the image can be represented by the following formula:
O_2 = N(B_2(G(I_f)))

where O_2 denotes the guide parameter, I_f denotes the f-th frame image, 1 ≤ f ≤ F, G denotes the concatenation in the channel direction, B_2 denotes the second convolution processing, and N denotes the normalization processing.
It should be noted that, in other embodiments, the second convolution module 22 may not be provided. That is, the second convolution module 22 is optional. It will be appreciated that if the neural network model does not include the second convolution module 22, the dimension of the enhancement parameters may be H × W × F × P.
In a specific example, if the first convolution module 21 does not include the down-sampling unit 211 and the up-sampling unit 213, the second convolution module 22 may not be provided. It should be further noted that, if the first convolution module 21 does not include the down-sampling unit 211 and the up-sampling unit 213, the second convolution module 22 may also be provided, which is not limited in this application.
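Under the same illustrative PyTorch assumptions, the guide-parameter branch (two 3 × 3 convolutions with 16 and S kernels followed by per-pixel normalization over the S scene channels) might be sketched as follows; the simple sum normalization used here is one possible choice of normalization operator.

```python
import torch
import torch.nn as nn

class GuidanceBranch(nn.Module):
    """Sketch of the second convolution module: O2 = N(B2(G(I_f)))."""

    def __init__(self, frames: int, channels: int, scenes: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(frames * channels, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, scenes, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, F*C, H, W) -> intermediate guide parameter (B, S, H, W)
        guide = self.body(x)
        # Normalize over the scene dimension so that the association weights of
        # each pixel lie in [0, 1] and sum to 1.
        return guide / guide.sum(dim=1, keepdim=True).clamp_min(1e-6)

guide = GuidanceBranch(frames=3, channels=4, scenes=4)(torch.rand(1, 12, 512, 512))
print(guide.shape)  # torch.Size([1, 4, 512, 512])
```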
In a third aspect, the input of the third convolution module 23 may be the input data output by the input module 20, and the output of the third convolution module 23 is the weight parameter. The weight parameter may be obtained by performing convolution processing on the input data. The convolution process performed by the third convolution module 23 may be referred to as a third convolution process. The weight parameters comprise weight values corresponding to all pixel points of each frame of image.
More specifically, the dimension of the input data of the third convolution module 23 is H × W × F × C, and the dimension of the output data of the third convolution module 23 is H × W × F.
As shown in fig. 2, in one embodiment, the third convolution module 23 may include a third convolution unit 231 and a second normalization unit 232. The third convolution unit 231 is configured to perform third convolution processing on the input data to obtain an intermediate weight parameter. More specifically, the operator employed by the third convolution unit 231 may be any one or a combination of: a general convolution operator, a separable convolution operator, a residual operator, an activation function operator, a pooling operator, and the like. In one non-limiting example, the convolution operators in the third convolution unit 231 are concatenated with activation function operators, wherein the convolution operators may include 2 sets of 3x3 convolution operators. The step size of the convolution operator may be 1, the number of convolution kernels of 2 sets of convolution operators may be: 16,F. The activation function operator may be a ReLU function. The dimension of the intermediate weight parameter output by the third convolution unit 231 may be H × W × F.
Further, the second normalization unit 232 may be configured to perform normalization on the intermediate weight parameter to obtain a weight parameter. Specifically, the second normalization unit 232 may use a normalization operator to normalize the intermediate weight parameter, so that the weight value corresponding to each pixel point is between [0,1], thereby obtaining the weight parameter.
The normalization operator employed by the second normalization unit 232 can be represented by the following formula:
O_3(i,j,l) = O'_3(i,j,l) / \sum_{l'=1}^{F} O'_3(i,j,l')

where i denotes the index in the H dimension, j denotes the index in the W dimension, l denotes the index in the F dimension, 1 ≤ i ≤ H, 1 ≤ j ≤ W, 1 ≤ l ≤ F, O_3 denotes the weight parameter, and O'_3 denotes the intermediate weight parameter.
From the above, the processing steps of the third convolution module 23 shown in fig. 2 for the image can be represented by the following formula:
O_3 = N(B_3(G(I_f)))

where O_3 denotes the weight parameter, I_f denotes the f-th frame image, 1 ≤ f ≤ F, G denotes the concatenation in the channel direction, B_3 denotes the third convolution processing, and N denotes the normalization processing.
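The weight-parameter branch is structurally similar; a brief sketch under the same illustrative assumptions (two 3 × 3 convolutions with 16 and F kernels, then per-pixel normalization over the F frame channels) is shown below.

```python
import torch
import torch.nn as nn

class WeightBranch(nn.Module):
    """Sketch of the third convolution module: O3 = N(B3(G(I_f)))."""

    def __init__(self, frames: int, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(frames * channels, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, frames, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, F*C, H, W) -> per-pixel, per-frame fusion weights (B, F, H, W)
        w = self.body(x)
        return w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)

weights = WeightBranch(frames=3, channels=4)(torch.rand(1, 12, 512, 512))
print(weights.shape)  # torch.Size([1, 3, 512, 512])
```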
Further, the neural network model may also include an output module 26. In one particular example, the output module 26 may be used to output enhancement parameters, guidance parameters, and weight parameters. In particular, the input of the output module 26 may be connected to the outputs of the first, second and third convolution modules 21, 22 and 23, respectively, so that the neural network model outputs the calculated enhancement, guidance and weight parameters.
In another specific example, the output module 26 may be used to output the fused image. In the model training phase, the output of the output module 26 may be a single frame sample target image; in the model application phase, the output of the output module 26 is a single frame target image.
Specifically, the neural network model may further include: an enhancement module 24 and a fusion module 25. The enhancing module 24 may be configured to perform enhancement processing on the input data according to the enhancing parameter and the guiding parameter to obtain an enhanced image, and the fusing module 25 may be configured to perform fusion processing on the enhanced image according to the weight parameter to obtain a fused image.
More specifically, the input of the enhancement module 24 is connected to the outputs of the input module 20, the first convolution module 21 and the second convolution module 22, respectively. The input of the fusion module 25 is connected to the output of the enhancement module 24 and the third convolution module 23, respectively. The output of the fusion module 25 is connected to the output module 26.
For more contents of the enhancing module 24 and the fusing module 25, reference may be made to the following description about step S13 and step S14, which are not described herein again.
In the model training phase, the neural network model may be trained using the training data until the neural network model converges.
Specifically, the training data may include: an F-frame sample input image and a single-frame sample target image. Wherein the image content of the sample input image and the sample target image are the same, but the image quality is different, and the image quality of the sample target image is better than the sample input image.
In a specific example, the single-frame sample target image may be obtained by processing the F-frame sample input image by using other existing multi-frame image fusion methods.
In another specific example, the sample input image and the sample target image may be photographed with the same photographing contents in the same photographing scene using different photographing apparatuses. More specifically, the F-frame sample input image may be acquired by a first photographing apparatus, and the single-frame sample target image may be acquired by a second photographing apparatus having a photographing performance superior to that of the first photographing apparatus. For example, the first photographing apparatus may be a mobile phone and the second photographing apparatus may be a single-lens reflex camera.
In a further specific example, the sample target image may be captured by the second capturing device described above, and the sample target image has a high image quality. The sample input image may be an inverse transform of the sample target image. For example, the sample target image may be subjected to inverse transformation of image brightness and image contrast to obtain an F-frame sample input image.
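As an illustration of this last option, F degraded sample input images might be synthesized from a single high-quality sample target image roughly as follows; the particular brightness, contrast-like and noise transforms are examples only and are not prescribed by this application.

```python
import torch

def synthesize_inputs(target: torch.Tensor, frames: int = 3) -> torch.Tensor:
    """Derive F sample input images from one sample target image.

    target: (C, H, W) tensor with values in [0, 1].
    Returns a (F, C, H, W) tensor of degraded frames (example transforms only).
    """
    inputs = []
    for _ in range(frames):
        gain = 0.4 + 0.8 * torch.rand(1)   # random brightness change
        gamma = 0.8 + 0.6 * torch.rand(1)  # random contrast-like change
        frame = (target * gain).clamp(0, 1) ** gamma
        frame = (frame + 0.01 * torch.randn_like(frame)).clamp(0, 1)  # mild noise
        inputs.append(frame)
    return torch.stack(inputs, dim=0)

samples = synthesize_inputs(torch.rand(3, 256, 256))
print(samples.shape)  # torch.Size([3, 3, 256, 256])
```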
Further, the built neural network model can be trained by adopting training data. The loss function used in the training process may be an L1 loss function, an L2 loss function, a Structural Similarity (SSIM) loss function, but is not limited thereto. Wherein the L1 loss function is based on comparing differences pixel by pixel and then taking the absolute value; the L2 loss function is based on comparing the difference pixel by pixel, then taking the square; the SSIM loss function considers the indexes of brightness (luminance), contrast (contrast) and structure (structure), and is more beneficial to embodying image details. The training method may be a training method of various existing deep learning algorithms, and this embodiment does not limit this.
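A single training step with a pixel-wise L1 loss might then look like the following sketch; fusion_model stands for any end-to-end network that maps the concatenated F-frame input to one fused image, and an SSIM term could be added from a third-party implementation if desired.

```python
import torch
import torch.nn as nn

def train_step(fusion_model: nn.Module,
               optimizer: torch.optim.Optimizer,
               sample_inputs: torch.Tensor,   # (B, F*C, H, W)
               sample_target: torch.Tensor    # (B, C, H, W)
               ) -> float:
    """One optimization step with a pixel-wise L1 loss (sketch only)."""
    criterion = nn.L1Loss()
    optimizer.zero_grad()
    fused = fusion_model(sample_inputs)        # predicted single-frame image
    loss = criterion(fused, sample_target)
    loss.backward()
    optimizer.step()
    return loss.item()
```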
Therefore, an end-to-end method is adopted in the scheme of the embodiment, the parameters of the neural network model can be obtained by learning based on the training data, and the parameters do not need to be manually debugged. The neural network model obtained by training can calculate enhancement parameters, guide parameters and weight parameters based on the input images, and can also fuse the images based on the obtained enhancement parameters, guide parameters and weight parameters to obtain fused images.
In a specific implementation, the trained neural network model may be deployed in different hardware units within the terminal. Different hardware units deployed in the terminal may mean that different modules of the neural network model may be disposed in different hardware units. In practical applications, the neural network model may be deployed in one or more hardware units of the terminal according to the type of the terminal. For example, the GPU and the DSP have better acceleration effects on the processing of down-sampling, up-sampling, mapping, etc., and the first convolution module 21 may be deployed in the GPU or the DSP. For another example, the NPU has a better acceleration effect on the convolutional neural network, the second convolution module 22 and the third convolution module 23 may be disposed in the NPU, or the first convolution module 21 may also be disposed in the NPU.
With continued reference to fig. 1, in a specific implementation of step S12, the F frame first image may be input into a neural network model, and a fusion parameter for fusing the F frame first image is obtained, where the fusion parameter may include an enhancement parameter and a weight parameter. Further, the fusion parameters may also include guidance parameters. The enhancement parameters, the guide parameters and the weight parameters can be calculated by a pre-trained neural network model on the first image of the F frame.
For specific contents of calculating the fusion parameters by the neural network model, see the related description above with respect to fig. 2, and are not described herein again.
In step S13, the first image of the F frame may be enhanced according to the enhancement parameter to obtain the second image of the F frame. More specifically, the pixel point may be mapped by using a mapping vector corresponding to each pixel point of each frame of the first image, so as to obtain an F frame of the second image.
Or, the F frame first image may be enhanced according to the enhancement parameter and the guidance parameter, so as to obtain the F frame second image. In an embodiment of the present application, the neural network model is the structure shown in fig. 2, the dimension of the enhancement parameter output by the neural network model is H × W × F × S × P, and the dimension of the guidance parameter is H × W × S. On one hand, a 1 × S scene vector may be used to perform first enhancement on the first image of the F frame to increase information of a scene channel, and on the other hand, an S × P mapping vector may be used to perform second enhancement on each pixel of the first image of the F frame to enhance image color and/or brightness, thereby obtaining a second image of the F frame.
In a specific example, the enhancement purpose is to enhance only the image luminance; as described above, when only the image luminance is enhanced, P = 1. Accordingly, the dimension of the enhancement parameter O_1 is H × W × F × S × 1, and the dimension of the guide parameter O_2 is H × W × S. In other words, each pixel point of the first image corresponds to an S × 1 mapping vector, and each pixel point corresponds to a 1 × S scene vector. Further, step S13 may be represented by the following formula:

T_f(h,w,c) = \sum_{s=1}^{S} O_2(h,w,s) \cdot O_1(h,w,f,s,1) \cdot I_f(h,w,c)

where T_f denotes the f-th frame second image, I_f denotes the f-th frame first image, h denotes the index in the H dimension, w denotes the index in the W dimension, c denotes the index in the C dimension, 1 ≤ h ≤ H, 1 ≤ w ≤ W, 1 ≤ c ≤ C; T_f(h,w,c) denotes the value at index (h,w,c) in the f-th frame second image, I_f(h,w,c) denotes the value at index (h,w,c) in the f-th frame first image, s denotes the index in the S dimension, 1 ≤ s ≤ S, f denotes the index in the F dimension, 1 ≤ f ≤ F, O_1(h,w,f,s,1) denotes the value at index (h,w,f,s,1) in the enhancement parameter, O_2(h,w,s) denotes the value at index (h,w,s) in the guide parameter, and h, w, c, f and s are positive integers. Thereby, the F frame second image can be obtained, whose brightness is enhanced compared with the first image.
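Assuming tensors laid out as in the illustrative sketches above, this brightness-only enhancement can be evaluated for all frames at once, for example with a single einsum:

```python
import torch

# Example shapes: F frames, C channels, S scenes, image size H x W.
F_, C, S, H, W = 3, 4, 4, 512, 512
first_images = torch.rand(F_, C, H, W)   # I_f
o1 = torch.rand(F_, S, H, W)             # enhancement parameter, P = 1
o2 = torch.rand(S, H, W)                 # guide parameter
o2 = o2 / o2.sum(dim=0, keepdim=True)    # scene weights sum to 1 per pixel

# T_f(h,w,c) = sum_s O2(h,w,s) * O1(h,w,f,s,1) * I_f(h,w,c)
gain = torch.einsum("shw,fshw->fhw", o2, o1)      # per-pixel gain per frame
second_images = first_images * gain.unsqueeze(1)  # broadcast over channels
print(second_images.shape)  # torch.Size([3, 4, 512, 512])
```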
In another specific example, the enhancement purpose is to enhance both the color and the brightness of the image. As described above, in this case P = C × (C + 1). Accordingly, the dimension of the enhancement parameter O_1 is H × W × F × S × C × (C + 1), and the dimension of the guide parameter O_2 is H × W × S.
As an example, the mapping parameter O_1 may be decomposed into a first mapping parameter R_1 and a second mapping parameter T_1, where the dimension of R_1 is H × W × F × S × C × C, and the dimension of T_1 is H × W × F × S × 1 × C.
Further, step S13 can be represented by the following formula:
T_f(h,w) = \sum_{s=1}^{S} O_2(h,w,s) \cdot ( I_f(h,w) R_1(h,w,f,s) + T_1(h,w,f,s) )

where f denotes the index in the F dimension, 1 ≤ f ≤ F, T_f denotes the f-th frame second image, I_f denotes the f-th frame first image, h denotes the index in the H dimension, w denotes the index in the W dimension, 1 ≤ h ≤ H, 1 ≤ w ≤ W; T_f(h,w) denotes the 1 × C vector corresponding to the pixel point in row h and column w of the f-th frame second image, I_f(h,w) denotes the 1 × C vector corresponding to the pixel point in row h and column w of the f-th frame first image, s denotes the index in the S dimension, 1 ≤ s ≤ S, R_1(h,w,f,s) denotes the C × C matrix at index (h,w,f,s) in R_1, T_1(h,w,f,s) denotes the 1 × C vector at index (h,w,f,s) in T_1, and O_2(h,w,s) denotes the value at index (h,w,s) in the guide parameter. h, w, f and s are positive integers.
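Under the same assumed layout, the color-and-brightness case (a per-pixel, per-scene C × C matrix R_1 plus a 1 × C offset T_1) can likewise be evaluated with einsum, for example:

```python
import torch

F_, C, S, H, W = 3, 3, 4, 256, 256         # e.g. RGB input, C = 3
first_images = torch.rand(F_, C, H, W)      # I_f
r1 = torch.rand(F_, S, C, C, H, W)          # first mapping parameter R_1
t1 = torch.rand(F_, S, C, H, W)             # second mapping parameter T_1
o2 = torch.rand(S, H, W)
o2 = o2 / o2.sum(dim=0, keepdim=True)       # guide parameter

# T_f(h,w) = sum_s O2(h,w,s) * ( I_f(h,w) @ R_1(h,w,f,s) + T_1(h,w,f,s) )
mixed = torch.einsum("fchw,fscdhw->fsdhw", first_images, r1) + t1
second_images = torch.einsum("shw,fsdhw->fdhw", o2, mixed)
print(second_images.shape)  # torch.Size([3, 3, 256, 256])
```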
From the above, by performing step S13, the F-frame second image can be obtained. The second images and the first images are in one-to-one correspondence, and each frame of second image is obtained after the corresponding first image is enhanced. It should be noted that the above description only exemplarily gives the calculation process of the enhancement processing, and other calculation methods may also be adopted to calculate the F frame second image according to the enhancement parameters and the guidance parameters of different dimensions, which is not limited in this embodiment.
In a specific implementation of step S14, the F-frame second image may be subjected to a fusion process according to the weighting parameter. Since the weight parameter includes a weight value corresponding to each pixel point of each frame of the first image, and the second image and the first image correspond to each other one to one, for this purpose, the weight parameter may include a weight value corresponding to each pixel point of each frame of the second image. The value of each pixel point in the target image can be obtained by weighting and summing the pixel values of the corresponding pixel points in the second image of each frame according to the weighting parameters.
Specifically, step S14 can be represented by the following formula:
Y(h,w,c) = \sum_{f=1}^{F} O_3(h,w,f) \cdot T_f(h,w,c)

where h denotes the index in the H dimension, w denotes the index in the W dimension, c denotes the index in the C dimension, 1 ≤ h ≤ H, 1 ≤ w ≤ W, 1 ≤ c ≤ C; Y(h,w,c) denotes the value of the target image at index (h,w,c), T_f(h,w,c) denotes the value at index (h,w,c) in the f-th frame second image, and O_3(h,w,f) denotes the value at index (h,w,f) in the weight parameter O_3.
Thus, a single-frame target image can be obtained. Because the weight parameters include the weight corresponding to each pixel point of each frame of image, the fusion process can apply different weights to different frames in different regions, and ghosting can be effectively suppressed.
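Continuing the illustrative layout above, the weighted fusion of step S14 reduces to one more einsum:

```python
import torch

F_, C, H, W = 3, 4, 512, 512
second_images = torch.rand(F_, C, H, W)   # T_f
o3 = torch.rand(F_, H, W)
o3 = o3 / o3.sum(dim=0, keepdim=True)     # weight parameter: per-pixel weights sum to 1

# Y(h,w,c) = sum_f O3(h,w,f) * T_f(h,w,c)
target = torch.einsum("fhw,fchw->chw", o3, second_images)
print(target.shape)  # torch.Size([4, 512, 512])
```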
In view of the above, the embodiment of the present application provides an end-to-end image processing method, where an F-frame first image is input to a neural network model obtained through pre-training, so that a single-frame target image output by the model can be obtained. The method is based on fusion of multiple frames of images, and the target image with high definition and high dynamic range is obtained. The method can be applied to shooting scenes with high dynamic range, low brightness, backlight and the like, and can solve the key problems of alignment, ghost image removal, image fusion, image color/brightness enhancement and the like so as to obtain clearer high-quality images.
In addition, the scheme has small calculation amount and less time consumption. In practical applications, the neural network model may be deployed on a low-end device to perform the method provided above. Or the method provided by the above can be applied to scenes with high requirements on algorithm real-time performance, such as a preview mode and a camera shooting mode, and can give consideration to both processing efficiency and fusion quality. As an example, the method can be applied to a vehicle-mounted device, and the vehicle-mounted device has a very high requirement on the real-time performance of the algorithm. By the method, the target image can be quickly obtained, the definition of the target image is high, and areas with different brightness can be well exposed, so that the accuracy of target detection or scene analysis can be improved.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an image processing apparatus in an embodiment of the present application, and the apparatus shown in fig. 3 may include:
an obtaining module 31, configured to obtain F frames of first images, where F is an integer greater than 1;
a parameter calculation module 32, configured to input the F frame first images to a neural network model obtained through pre-training, so as to obtain a fusion parameter, where the fusion parameter includes an enhancement parameter and a weight parameter, the enhancement parameter includes a mapping vector corresponding to each pixel point of each frame of the first images, and the weight parameter includes a weight value corresponding to each pixel point of each frame of the first images;
the enhancement module 33 is configured to perform enhancement processing on the F-frame first image at least according to the enhancement parameter to obtain an F-frame second image;
and the fusion module 34 is configured to perform fusion processing on the F frame second image according to the weight parameter to obtain a target image.
For more contents of the operation principle, the operation method, the beneficial effects, and the like of the image processing apparatus in the embodiment of the present application, reference may be made to the above description about the image processing method, and details are not repeated here.
In a specific implementation, the Image Processing apparatus shown in fig. 3 may correspond to a chip having an Image Processing function in the terminal, or correspond to a chip having an Image Processing function, such as an Image Signal Processing (ISP) chip; or to a chip module having an image processing function, or to a terminal.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the image processing method described above. The storage medium may include ROM, RAM, magnetic or optical disks, etc. The storage medium may further include a non-volatile memory (non-volatile) or a non-transitory memory (non-transient), and the like.
The embodiment of the present application further provides a terminal, which includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the processor executes the steps of the image processing method when executing the computer program. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
The embodiment of the application also provides a chip, and the chip can be used for executing the image processing method. Alternatively, the chip may include the image processing apparatus shown in fig. 3. In one particular example, the chip may be an ISP chip.
The embodiment of the application also provides a chip module, and the chip module can be used for executing the image processing method. Alternatively, the chip module may include the image processing apparatus shown in fig. 3.
It should be understood that, in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer program may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of units is only a division of logical functions, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit. For each device or product applied to or integrated into a chip, the modules/units included in the device or product may all be implemented by hardware such as circuits; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated within the chip, and the remaining (if any) modules/units may be implemented by hardware such as circuits. For each device or product applied to or integrated into a chip module, the modules/units included in the device or product may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the chip module or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated within the chip module, and the remaining (if any) modules/units may be implemented by hardware such as circuits. For each device or product applied to or integrated into a terminal, the modules/units included in the device or product may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the terminal or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated within the terminal, and the remaining (if any) modules/units may be implemented by hardware such as circuits.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in this document indicates an "or" relationship between the associated objects before and after it.
The "plurality" appearing in the embodiments of the present application means two or more. The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application. Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the application, and the scope of protection is defined by the claims.
Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be made by any person skilled in the art without departing from the spirit and scope of the present application, and the scope of protection shall be subject to the appended claims.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring F frames of first images, wherein F is an integer greater than 1;
inputting the F frames of first images into a neural network model obtained through pre-training to obtain fusion parameters, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise mapping vectors corresponding to all pixel points of each frame of first images, and the weight parameters comprise weight values corresponding to all pixel points of each frame of first images;
performing enhancement processing on the F frames of first images at least according to the enhancement parameters to obtain F frames of second images;
and performing fusion processing on the F frames of second images according to the weight parameters to obtain a target image.
2. The image processing method according to claim 1, wherein processing of the F frames of first images by the neural network model comprises:
down-sampling the F frames of first images to obtain down-sampled data;
performing first convolution processing on the down-sampled data to obtain intermediate enhancement data;
and performing an up-sampling operation on the intermediate enhancement data to obtain the enhancement parameters.
3. The image processing method according to claim 1, wherein the fusion parameters further comprise a guide parameter, the guide parameter comprises a scene vector corresponding to each pixel point, the scene vector comprises association weights between the pixel point and respective scenes, and the enhancing the F frames of first images at least according to the enhancement parameters comprises:
performing enhancement processing on the F frames of first images according to the guide parameter and the enhancement parameters to obtain the F frames of second images.
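One possible reading of claim 3, offered purely as an assumption, is that the mapping vector supplies one candidate gain per scene and the scene vector blends those gains per pixel. The sketch below illustrates that reading; the function name, array shapes, and the blending rule are hypothetical.

```python
# Assumed interpretation of claim 3 only: the mapping vector holds one gain per
# scene and the per-pixel scene vector blends those gains. Not the patented formula.
import numpy as np

def enhance_with_guide(first_images, enhancement_params, guide_params):
    """first_images:       (F, H, W, C) input frames in [0, 1].
    enhancement_params: (F, H, W, S) one candidate gain per scene, S scenes.
    guide_params:       (F, H, W, S) per-pixel association weights over the S scenes.
    """
    # Normalize the scene association weights so they sum to 1 per pixel.
    scene_w = guide_params / (guide_params.sum(axis=-1, keepdims=True) + 1e-6)
    gain = (scene_w * enhancement_params).sum(axis=-1, keepdims=True)   # (F, H, W, 1)
    return np.clip(first_images * gain, 0.0, 1.0)
```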
4. The image processing method according to claim 3, wherein processing of the F frames of first images by the neural network model comprises:
performing second convolution processing on the F frames of first images to obtain the guide parameter.
5. The image processing method according to claim 1, wherein processing of the F frames of first images by the neural network model comprises:
performing third convolution processing on the F frames of first images to obtain the weight parameters.
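The following PyTorch sketch illustrates a model with the three branches described in claims 2, 4, and 5: a down-sample/convolution/up-sample branch for the enhancement parameters, a second convolution branch for the guide parameters, and a third convolution branch for the weight parameters. Channel counts, kernel sizes, activations, and the class name are assumptions; the patent does not specify a concrete architecture.

```python
# Minimal sketch of a three-branch parameter-prediction network (assumed layout).
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class FusionParamNet(nn.Module):
    def __init__(self, frames=3, channels=3, scenes=4, map_dim=2):
        super().__init__()
        in_ch = frames * channels
        # Branch of claim 2: down-sample -> "first convolution" -> up-sample.
        self.enh_conv = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, frames * map_dim, 3, padding=1))
        # Branch of claim 4: "second convolution" producing the guide parameter.
        self.guide_conv = nn.Conv2d(in_ch, frames * scenes, 3, padding=1)
        # Branch of claim 5: "third convolution" producing the weight parameters.
        self.weight_conv = nn.Conv2d(in_ch, frames, 3, padding=1)

    def forward(self, x):                                    # x: (N, frames*channels, H, W)
        h, w = x.shape[-2:]
        down = Fn.avg_pool2d(x, kernel_size=4)               # down-sampling
        enh = self.enh_conv(down)                            # first convolution
        enh = Fn.interpolate(enh, size=(h, w), mode='bilinear',
                             align_corners=False)            # up-sampling
        guide = torch.sigmoid(self.guide_conv(x))            # per-pixel scene association weights
        weight = torch.softmax(self.weight_conv(x), dim=1)   # per-frame weight maps
        return enh, guide, weight
```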
6. The image processing method according to claim 1, wherein obtaining the neural network model comprises:
obtaining training data, the training data comprising: F frames of sample input images and a single-frame sample target image;
and training the neural network model with the training data until the neural network model converges.
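A minimal training loop consistent with claim 6 might look as follows. The L1 loss against the single-frame sample target image, the optimizer, and the convergence test are assumptions; fuse_fn stands in for a differentiable version of the enhancement-and-fusion steps and is a hypothetical, caller-supplied function.

```python
# Training sketch for claim 6; the loss, optimizer and convergence test are assumptions.
import torch

def train(model, loader, fuse_fn, epochs=50, lr=1e-4, tol=1e-4):
    """model: parameter-prediction network; loader yields (inputs, target) with
    inputs of shape (N, F*C, H, W) and target of shape (N, C, H, W);
    fuse_fn(inputs, enh, guide, weight) is a differentiable enhancement-and-fusion
    step supplied by the caller (hypothetical here)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev_total = float('inf')
    for _ in range(epochs):
        total = 0.0
        for inputs, target in loader:
            enh, guide, weight = model(inputs)
            fused = fuse_fn(inputs, enh, guide, weight)
            loss = torch.nn.functional.l1_loss(fused, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev_total - total) < tol:   # crude convergence check (an assumption)
            break
        prev_total = total
    return model
```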
7. The image processing method according to claim 1, wherein acquiring the F frames of first images comprises:
acquiring F frames of original images;
and if the F frames of original images are acquired by an HDR camera, taking the F frames of original images as the F frames of first images; otherwise, performing alignment processing on the F frames of original images to obtain the F frames of first images.
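The branch in claim 7 could be sketched as below. The feature-based registration (ORB matching plus a RANSAC homography against the first frame) is only one possible alignment choice and is assumed here; the claim does not prescribe a specific registration method.

```python
# Sketch of the acquisition branch in claim 7; the ORB/homography alignment is an
# assumed choice, not taken from the patent.
import cv2
import numpy as np

def prepare_first_images(raw_frames, from_hdr_camera):
    """raw_frames: list of F BGR frames. Returns the F frames of first images."""
    if from_hdr_camera:
        return list(raw_frames)                      # use the original frames directly
    ref = raw_frames[0]
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(ref_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    h, w = ref_gray.shape
    aligned = [ref]
    for frame in raw_frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        kp, des = orb.detectAndCompute(gray, None)
        matches = sorted(matcher.match(des, des_ref), key=lambda m: m.distance)[:200]
        src = np.float32([kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        Hmat, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # frame -> reference
        aligned.append(cv2.warpPerspective(frame, Hmat, (w, h)))
    return aligned
```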
8. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire F frames of first images, wherein F is an integer greater than 1;
a parameter calculation module, configured to input the F frames of first images into a neural network model obtained through pre-training to obtain fusion parameters, wherein the fusion parameters comprise enhancement parameters and weight parameters, the enhancement parameters comprise a mapping vector corresponding to each pixel point of each frame of the first images, and the weight parameters comprise a weight value corresponding to each pixel point of each frame of the first images;
an enhancement module, configured to perform enhancement processing on the F frames of first images at least according to the enhancement parameters to obtain F frames of second images;
and a fusion module, configured to perform fusion processing on the F frames of second images according to the weight parameters to obtain a target image.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method of any one of claims 1 to 7.
10. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, characterized in that the processor, when executing the computer program, performs the steps of the image processing method of any of claims 1 to 7.
CN202211337819.8A 2022-10-28 2022-10-28 Image processing method and device, computer readable storage medium and terminal Pending CN115587956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211337819.8A CN115587956A (en) 2022-10-28 2022-10-28 Image processing method and device, computer readable storage medium and terminal

Publications (1)

Publication Number Publication Date
CN115587956A true CN115587956A (en) 2023-01-10

Family

ID=84781298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211337819.8A Pending CN115587956A (en) 2022-10-28 2022-10-28 Image processing method and device, computer readable storage medium and terminal

Country Status (1)

Country Link
CN (1) CN115587956A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination