WO2023010754A1 - Image processing method and apparatus, terminal device and storage medium - Google Patents

Image processing method and apparatus, terminal device and storage medium

Info

Publication number
WO2023010754A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
processed
enhanced
processing
Prior art date
Application number
PCT/CN2021/138137
Other languages
English (en)
French (fr)
Inventor
章政文
陈翔宇
董超
乔宇
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Publication of WO2023010754A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • The present application relates to the technical field of image processing, and in particular to an image processing method, apparatus, terminal device, and storage medium.
  • Image optimization processing tasks generally include image retouching and color grading, image beautification, image denoising, image super-resolution, image enhancement, and other image-oriented optimization tasks. Some optimization tasks operate on each video frame of a video to be processed and can also be regarded as image optimization tasks, for example converting SDR video to HDR video, video denoising, and video super-resolution. Compared with the original image, the optimized image obtained after image processing can better reflect the visual information of the real scene.
  • In the prior art, the original image is generally processed using only a deep learning model related to the processing task, for example using only an image denoising model to denoise the original image, or using an HDR conversion model to convert SDR video frames into HDR video frames. When an image optimization task is carried out in this way, much information may be lost, leaving many artifacts and color deviations in the optimized image.
  • Embodiments of the present application provide an image processing method, device, terminal device, and storage medium, which can improve the quality of an optimized image in an image optimization processing task.
  • In a first aspect, an embodiment of the present application provides a method, which includes: performing target-type optimization processing on an image to be processed using a trained optimization model to obtain an initial optimized image; performing local enhancement processing on the initial optimized image through a trained local enhancement model to obtain an enhanced image; and inputting the enhanced image and an overexposure mask image of the image to be processed into a trained compensation model for processing, performing information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area.
  • With the image processing method provided by this application, the optimization model can first perform target-type optimization processing on the image to be processed to obtain the initial optimized image. For the local information lost in the initial optimized image, the local enhancement model can be used for enhancement processing, reconstructing the lost texture detail information to obtain an enhanced image. Then, based on the highlighted area indicated by the overexposure mask image corresponding to the image to be processed, the enhanced image is processed through the compensation model to compensate for the content information lost in the overexposed area. This application uses multiple deep learning models in series to perform information compensation on the initial optimized image obtained in an image optimization processing task, which can avoid artifacts and color deviations in the optimized image and improve its quality.
  • the local enhancement model includes: a downsampling module, an upsampling module, and multiple residual networks arranged between the downsampling module and the upsampling module.
  • Optionally, the pixel value of a pixel in the overexposure mask image is determined according to formula (2) below, where I_mask(x, y) represents the pixel value of the overexposure mask image at position (x, y), I_S(x, y) represents the pixel value of the image to be processed at position (x, y), and λ represents the preset overexposure threshold.
  • Optionally, the compensation model includes a generator. Inputting the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model and performing information compensation on the highlighted area of the enhanced image includes: inputting the enhanced image into the trained generator for processing to obtain global exposure information; determining the overexposure information of the highlighted area according to the overexposure mask image of the image to be processed and the global exposure information; and compensating the highlighted area with the overexposure information to obtain the compensated image.
  • Optionally, the initial optimization model, the initial local enhancement model, and the initial compensation model are trained separately to obtain the corresponding optimization model, local enhancement model, and compensation model.
  • Optionally, the training method of the generator includes: constructing a generative adversarial network comprising an initial generator model and a discriminator; and performing adversarial training on the generative adversarial network using a preset loss function and a training set to obtain the generator, where the training set includes enhanced image samples, overexposure mask image samples, and compensation image samples corresponding to multiple image samples to be processed. The loss function describes a composite loss combining the absolute error loss between the compensation image sample and the predicted image, the perceptual loss between the compensation image sample and the predicted image, and the discriminator loss of the predicted image. The predicted image is obtained by processing the enhanced image sample with the initial generator model, multiplying the result by the overexposure mask image sample, and then adding the product to the enhanced image sample.
  • Optionally, the target-type optimization processing refers to HDR conversion processing, and the image to be processed is a video frame extracted from an SDR video; the compensated images output after each video frame of the SDR video is processed in turn by the optimization model, the local enhancement model, and the compensation model are combined into frames to obtain the HDR video corresponding to the SDR video.
  • In a second aspect, an embodiment of the present application provides an image processing apparatus, including: an optimization unit, configured to perform target-type optimization processing on the image to be processed using the trained optimization model to obtain an initial optimized image; an enhancement unit, configured to perform local enhancement processing on the initial optimized image through the trained local enhancement model to obtain an enhanced image; and a compensation unit, configured to input the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model for processing and perform information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area.
  • In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method of any one of the above first aspects is implemented.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method of any one of the above first aspects is implemented.
  • In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
  • Fig. 1 is a flow chart of an image processing method provided by an embodiment of the present application;
  • Fig. 2 is a schematic structural diagram of an optimization model provided by an embodiment of the present application;
  • Fig. 3 is a schematic structural diagram of a local enhancement model provided by an embodiment of the present application;
  • Fig. 4 is a schematic structural diagram of a compensation model provided by an embodiment of the present application;
  • Fig. 5 is a schematic diagram of the ranges of the HDR and SDR color gamuts provided by an embodiment of the present application;
  • Fig. 6 is a schematic structural diagram of an initial compensation model provided by an embodiment of the present application;
  • Fig. 7 is a flow chart of converting an SDR video into an HDR video provided by an embodiment of the present application;
  • Fig. 8 is a schematic comparison of the image processing results of multiple models provided by an embodiment of the present application;
  • Fig. 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application;
  • Fig. 10 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • In image optimization, the original image is generally processed using only a deep learning model related to the processing task, which causes the optimized image to lose a lot of information. For example, in the process of denoising a normal original image, the edge information of the original image is generally smoothed to achieve denoising; when the contrast of the original image is low, it is difficult for traditional image denoising methods to retain much detail information while ensuring the denoising effect, resulting in poor quality of the optimized image. Alternatively, when the exposure of the original image is high, the information of some highlighted areas is not easy to extract; if the overexposed original image is processed according to the optimization method intended for normally exposed images, the optimized image loses part of the content information of the highlighted areas, causing deviations in its colors. Likewise, in the HDR video conversion task, after converting SDR video frames into HDR video frames, generally only a neural network is used to obtain the color mapping relationship between the SDR video frame and the HDR video frame to realize the HDR conversion of the SDR video frame, so the HDR video frame loses much detail information and highlight-area information, and the resulting HDR video quality is poor.
  • To improve the quality of image optimization, a single large network is currently usually built and trained for the optimization objective. For example, in the HDR video conversion task, to handle both the low-frequency conversion (color mapping) and the high-frequency conversion (detail enhancement) from SDR video to HDR video frames, a large network model that covers both color mapping and detail enhancement is usually constructed and trained as a whole, so that it can perform both functions. However, the quality of the optimized image obtained in this way is not significantly improved, and the optimization effect is clearly poor, especially in color transition areas.
  • To address the problems in image optimization processing tasks, the present application provides an image processing method that optimizes the image to be processed through multiple deep learning models connected in series. Specifically, the optimization model first performs target-type optimization processing on the image to be processed to obtain an initial optimized image; the local enhancement model then performs local enhancement processing on the initial optimized image; and the compensation model then compensates the information of the highlighted area of the enhanced image, so as to reconstruct the detail information and the highlight-area content information lost in the initial optimized image. By decoupling the image optimization task, multiple deep learning models are selected to perform the tasks corresponding to the different kinds of information lost in the optimized image; these models are connected in series to optimize the image to be processed and compensate for the lost information, improving the quality of the optimized image.
  • In a possible implementation, an image processing method provided by this application is introduced by way of example with reference to Fig. 1. The image processing method can be applied to an image processing device, which may be a mobile terminal such as a smartphone, tablet computer, or video camera, or a device capable of processing image data such as a desktop computer, robot, or server. As shown in Fig. 1, the image processing method provided by this application includes: S100, performing target-type optimization processing on the image to be processed using the trained optimization model to obtain an initial optimized image.
  • In one embodiment, the image to be processed can be any image requiring processing such as retouching and color grading, beautification, denoising, or super-resolution; it can also be any video frame extracted from a video to be processed that requires retouching and color grading, beautification, denoising, or video conversion.
  • devices with camera functions such as smartphones, tablet computers, cameras, desktop computers, and robots can be used to obtain images or videos to be processed.
  • In a possible implementation, for different types of image processing tasks, the image to be processed may be given target-type optimization processing based on a deep learning method. In one embodiment, the image to be processed can be converted into an initial optimized image using a fully convolutional neural network, for example one containing three convolutional layers with 1×1 convolution kernels. Other network structures can also be added on top of the fully convolutional neural network to form a new network model, as illustrated by the sketch below.
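  • The following is a minimal sketch, in PyTorch, of the kind of fully convolutional network described above. The layer width (64 channels) and the choice of ReLU activations are illustrative assumptions, not taken from the patent text.

```python
import torch
import torch.nn as nn

# A three-layer fully convolutional network with 1x1 kernels that maps
# an RGB image to an RGB image, as a stand-in for the network described
# in the text; channel width and activations are assumptions.
class TinyFCN(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Usage: convert a (batch, 3, H, W) image tensor into an initial optimized image.
model = TinyFCN()
initial_optimized = model(torch.rand(1, 3, 256, 256))
```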
  • As an example, the present application provides an optimization model for performing target-type optimization processing on the image to be processed to obtain the initial optimized image.
  • The optimization model provided by this application is shown in Fig. 2. The optimization model consists of a main network and a color condition network. The color condition network includes at least one color condition module (Color Condition Block, CCB) and a feature conversion module connected in sequence. The at least one color condition module is used to extract global color feature information from a low-resolution image of the image to be processed. The feature conversion module is used to convert the global color feature information into N sets of adjustment parameters, which are respectively used to adjust N intermediate features extracted by the main network in the process of converting the image to be processed into an optimized image, where N is an integer greater than or equal to 1.
  • As an example, the image to be processed may be downsampled by a certain factor (for example, by a factor of 4) to obtain the corresponding low-resolution image. Suppose the low-resolution image is obtained by downsampling the image to be processed by a factor of 4; the low-resolution image then has the same size as the image to be processed, but the number of pixels per unit area of the image to be processed is 4 times that of the low-resolution image.
  • As shown in Fig. 2, the color condition module includes sequentially connected convolutional layers, pooling layers, first activation functions, and IN (Instance Normalization) layers. The color condition module can perform global feature extraction on the input low-resolution image; compared with methods based on local feature extraction, it can effectively represent the global feature information of the image to be processed, thereby avoiding the introduction of artificial artifacts into the optimized image.
  • The feature conversion module includes a Dropout layer, a convolutional layer, a pooling layer, and N fully connected layers. The Dropout layer, the convolutional layer, and the pooling layer are connected in sequence to process the global color feature information extracted by the at least one color condition module to obtain a condition vector. The N fully connected layers are used to perform feature conversion on the condition vector to obtain N sets of adjustment parameters. It should be noted that each fully connected layer processes the condition vector to obtain one set of adjustment parameters, so the number of fully connected layers can equal the number of sets of adjustment parameters.
  • the optimization model shown in FIG. 2 includes four color condition modules connected in sequence.
  • the size of the convolution kernel in the convolution layer is 1 ⁇ 1, and the pooling layer adopts average pooling.
  • the first activation function is a non-linear activation function LeakyReLU.
  • the main network includes N global feature modulation (Global Feature Modulation, GFM) layers, and N sets of adjustment parameters are input to the N GFM layers.
  • the GFM layer can adjust the intermediate features input to the GFM layer according to the adjustment parameters.
  • the main network further includes N convolutional layers and N-1 second activation functions, and the N GFM layers are respectively connected to the output terminals of the N convolutional layers.
  • the main network is used to convert the image to be processed into an optimized image, and during the conversion process, N convolutional layers can be used to extract N intermediate features.
  • the size of the convolution kernel in each convolution layer is 1 ⁇ 1.
  • the second activation function may be a nonlinear activation function ReLU.
  • the number of fully connected layers in the color conditional network and the number of groups of correspondingly generated adjustment parameters should be designed based on the number of convolutional layers in the main network. For example, if the main network includes N convolutional layers, it means that the N intermediate features generated by the N convolutional layers need to be adjusted. Therefore, the color conditional network needs to output N sets of adjustment parameters corresponding to the N intermediate features, and the main network needs to have N GFM layers to adjust the N intermediate features according to the N sets of adjustment parameters.
  • the main network includes 3 convolution (Conv) layers, 3 GFM layers and 2 second activation function (ReLU) layers.
  • the main network sequentially includes a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, and a GFM layer from input to output.
  • Correspondingly, in the color condition network, the color condition module includes 4 sequentially connected CCB layers; the feature conversion module can include a Dropout layer, a convolution (Conv) layer, and an average pooling (Avgpool) layer connected in sequence, plus 3 fully connected (FC) layers each connected to the condition vector (Condition Vector) output by the average pooling layer. Each fully connected layer can convert the condition vector into a corresponding set of adjustment parameters (λ, β), and the color condition network outputs 3 sets of adjustment parameters in total (adjustment parameter 1, adjustment parameter 2, and adjustment parameter 3).
  • Each GFM layer in the main network adjusts the intermediate features input to it according to the corresponding adjustment parameters, which can be expressed as formula (1):
  • GFM(x_i) = λ · x_i + β    (1)
  • where x_i represents the i-th intermediate feature input to the GFM layer, and GFM(x_i) represents the result of the GFM layer adjusting x_i according to the adjustment parameters (λ, β).
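  • A minimal sketch of such a global feature modulation layer follows, assuming formula (1) is the scale-and-shift form given above; the condition-vector dimension and channel count are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One fully connected layer turns the condition vector into a (lambda, beta)
# pair per channel, which scales and shifts an intermediate feature map,
# implementing GFM(x_i) = lambda * x_i + beta of formula (1).
class GFMLayer(nn.Module):
    def __init__(self, cond_dim: int, channels: int):
        super().__init__()
        # Produces both the scale and the shift for this GFM layer.
        self.fc = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.fc(cond).chunk(2, dim=1)   # (B, C) each
        scale = scale.unsqueeze(-1).unsqueeze(-1)      # (B, C, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return scale * x + shift                       # formula (1)

# Usage: modulate a (B, 64, H, W) intermediate feature with a 32-dim condition vector.
gfm = GFMLayer(cond_dim=32, channels=64)
out = gfm(torch.rand(2, 64, 16, 16), torch.rand(2, 32))
```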
  • It can be seen that the optimization model uses the color condition network to extract the color feature information of the image to be processed as prior information for adjusting the intermediate features in the main network, so that the optimization model can adaptively output the initial optimized image corresponding to each image to be processed based on its color prior feature information, which avoids artificial artifacts in the initial optimized image.
  • the target type optimization process may also be performed on the image to be processed through a color lookup table or a traditional digital image processing method to obtain an initial optimized image.
  • The initial optimized image obtained by the methods described in the above embodiments has higher color richness, contrast, or sharpness, but in the process of optimizing the image to be processed, the edge texture information and the highlight-area information of the initial optimized image may be lost. Therefore, to ensure its quality, the initial optimized image needs further processing to compensate for its missing detail information and highlight-area content information.
  • Image enhancement technology can be used to adjust the original image so that it better reflects the visual information of the real environment, which facilitates later image analysis and processing.
  • In one embodiment, the initial optimized image may be enhanced using a neural-network-based method, or local enhancement may be performed on it through conventional digital image processing methods.
  • a method based on a neural network is taken as an example to introduce an exemplary local enhancement process of an initially optimized image.
  • the local enhancement model provided by the present application is shown in FIG. 3 .
  • the local enhancement model includes a downsampling module, an upsampling module and multiple residual networks arranged between the downsampling module and the upsampling module.
  • The downsampling module includes at least one set of alternately arranged first convolutional layers (Conv1) and first activation layers (ReLU1), and the upsampling module includes an upsampling layer and at least one set of alternately arranged second convolutional layers (Conv2) and second activation layers (ReLU2).
  • the upsampling layer may be a pixel reorganization layer (Pixel shuffle).
  • The residual network includes a third convolutional layer (Conv3), an activation layer, a fourth convolutional layer (Conv4), and a skip connection. The first image feature input to the residual network is processed sequentially by the third convolutional layer, the activation layer, and the fourth convolutional layer to obtain a second image feature; through the skip connection, the first image feature is fused with the second image feature, and the fusion result is used as the input of the next layer.
  • the activation layer can be a nonlinear activation function ReLU.
  • The initial optimized image is input into the local enhancement model to enhance its edge texture detail information and obtain an enhanced image. A sketch of one such residual block follows.
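  • This is a minimal sketch of one residual block of the local enhancement model as described above (Conv3, ReLU, Conv4, and a skip connection); the channel count and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One residual block: Conv3 -> ReLU -> Conv4, with a skip connection that
# fuses the block input (first image feature) with the block output
# (second image feature) by addition.
class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv4 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The fusion result is used as the input of the next layer.
        return x + self.conv4(self.relu(self.conv3(x)))
```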
  • In one embodiment, a deep learning method may be used, with a trained neural network model compensating the information of the overexposed area. As an example, an embodiment of the present application provides a compensation model: the enhanced image and the over-exposed mask image (over-exposed mask) of the image to be processed are input into the trained compensation model, and information compensation is performed on the highlighted area of the enhanced image.
  • The compensation model includes a generator. Specifically, the enhanced image is input into the trained generator for processing to obtain the global exposure information; the overexposure information of the highlighted area is determined according to the overexposure mask image of the image to be processed and the global exposure information; and the overexposure information is used to compensate the highlighted area to obtain the compensated image.
  • In one embodiment, the overexposure mask image of the image to be processed can be obtained by formula (2), where I_mask(x, y) represents the pixel value of the overexposure mask image at position (x, y); I_S(x, y) represents the pixel value of the image to be processed at position (x, y); and λ is the preset overexposure threshold, which controls the overexposure degree of the image to be processed and can be set according to actual needs. The highlight area in the image to be processed can then be determined from the pixel values of the pixels in the overexposure mask image.
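  • Formula (2) is given as an image in the original filing and is not reproduced here; a plausible minimal reading, assumed purely for illustration, is a binary threshold on pixel intensity, as in the sketch below. The channel-averaging step and the threshold value are assumptions.

```python
import torch

# Assumed interpretation of formula (2): mark a pixel as overexposed (1)
# when its brightness exceeds the preset threshold lambda, else 0.
def overexposure_mask(image: torch.Tensor, threshold: float = 0.95) -> torch.Tensor:
    """image: (B, C, H, W) tensor with values in [0, 1]."""
    # Average over channels so the mask reflects overall brightness.
    luminance = image.mean(dim=1, keepdim=True)
    return (luminance > threshold).float()
```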
  • In one embodiment, the generator can be any neural network structure containing convolutional layers, used to obtain global exposure information from the enhanced image.
  • As an example, the structure of the generator (Generator) provided by the present application is shown in Fig. 4. The generator includes a plurality of sequentially connected downsampling modules and a plurality of upsampling modules corresponding to the downsampling modules.
  • the down-sampling module includes a convolution layer and a down-sampling layer (DownSample)
  • the up-sampling module includes an up-sampling layer (UpSample) and a convolution layer.
  • the enhanced image is input into the trained generator, and the global exposure information can be obtained from the enhanced image.
  • In one embodiment, determining the overexposure information of the highlighted area according to the overexposure mask image of the image to be processed and the global exposure information, and compensating the highlighted area with the overexposure information to obtain the compensated image, specifically includes: multiplying the global exposure information pixel by pixel with the overexposure mask image to obtain the overexposure information of the highlighted area; and adding the overexposure information to the enhanced image to obtain the compensated image.
  • This process can be expressed as formula (3):
  • I_H = I_mask ⊗ G(I_LE) + I_LE    (3)
  • where I_H represents the compensated image, I_mask represents the overexposure mask image, I_LE represents the enhanced image, G(I_LE) represents the generator's output on the enhanced image, and ⊗ denotes pixel-by-pixel multiplication, so that I_mask ⊗ G(I_LE) is the overexposure information of the highlighted area.
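  • A minimal sketch of this compensation step follows; `generator` is a placeholder for any trained generator of the kind described above.

```python
import torch

# Compensation per formula (3): keep only the generator output in the
# highlighted area via the overexposure mask, then add it to the
# enhanced image.
def compensate(generator, enhanced: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    global_exposure = generator(enhanced)       # G(I_LE)
    highlight_info = mask * global_exposure     # I_mask (x) G(I_LE)
    return enhanced + highlight_info            # I_H
```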
  • In summary, the image processing method provided by this application uses the optimization model to perform target-type optimization processing on the image to be processed to obtain an initial optimized image. For the local information lost in the initial optimized image, the local enhancement model performs enhancement processing and reconstructs the lost texture detail information to obtain an enhanced image. Global exposure information is then extracted from the enhanced image, and the overexposure information of the highlighted area is determined through the overexposure mask image of the image to be processed; after the overexposure information is fused with the enhanced image, the content information missing from its highlighted parts is compensated. By connecting multiple neural network models in series to optimize the image to be processed and compensate for the lost information, the highlighted area of the final optimized image (that is, the compensated image) carries more content information than that of the initial optimized image, and its edge texture information is richer, which avoids artifacts and color deviations in the optimized image and improves the quality of the optimized image in image optimization processing tasks.
  • It should be noted that the optimization model, local enhancement model, and compensation model provided by this application are all general-purpose. On the one hand, each of the three models can perform its corresponding task independently: the optimization model can be applied to any task that requires color optimization or color conversion of an image or video frame to be processed; the local enhancement model can be applied to any task that needs to enhance the texture detail information of images or video frames; and the compensation model can be applied to any task that needs to compensate the content information of the highlighted areas of an image or video frame. On the other hand, the optimization model can be connected in series with either the local enhancement model or the compensation model, to compensate respectively the detail information or the highlight-area information of the initial optimized image obtained in an image optimization task (such as color optimization or color conversion). The optimization model, local enhancement model, and compensation model can also all be connected in series, so that the local enhancement model and the compensation model compensate both the detail information and the highlight-area information of the initial optimized image.
  • Image processing tasks include image editing, image retouching and toning, image coloring, SDR (Standard Dynamic Range) video conversion to HDR (High Dynamic Range) video, image denoising, image super-resolution processing, etc.
  • FIG. 5 is a schematic diagram showing ranges of HDR and SDR color gamuts.
  • Among them, BT.709 and BT.2020 are television parameter standards issued by the ITU (International Telecommunication Union), and DCI-P3 is a color gamut standard formulated by the American film industry for digital cinema. As can be seen from Fig. 5, among DCI-P3, BT.709, and BT.2020, BT.2020 has the largest color gamut, followed by DCI-P3, while BT.709 has the smallest.
  • SDR video uses the BT.709 color gamut, while HDR video uses the wider BT.2020 or DCI-P3 color gamut. Therefore, HDR video can show higher contrast and richer colors than SDR video.
  • a common video conversion method is to convert SDR data into HDR data through image coding technology, so that the HDR data can be played on the HDR terminal device.
  • a super-resolution conversion method is required to convert the low-resolution SDR video content into high-resolution HDR video content conforming to the HDR video standard.
  • Existing video conversion methods have a high computational cost, and some detail information is lost in the converted HDR video. If the exposure of the SDR video content is too high, the information of some highlighted areas is not easy to extract; if overexposed SDR video content is processed according to the optimization method intended for normally exposed images, the HDR video loses part of the content information of the highlighted areas, which affects video quality.
  • In contrast, the image processing method provided by this application uses the optimization model, the local enhancement model, and the compensation model to process SDR video frames directly, converting each SDR video frame into an HDR video frame and further enhancing the detail information and highlight-area content information of the HDR video frame, thereby avoiding artifacts and color deviations in the HDR video.
  • In one embodiment, for different image processing tasks, the initial optimization model, the initial local enhancement model, and the initial compensation model can be trained with correspondingly designed training sets and loss functions, so as to obtain optimization, local enhancement, and compensation models suited to different tasks.
  • In one embodiment, the network structure of the initial optimization model is the same as that of the optimization model shown in Fig. 2. The training process of the initial optimization model is as follows:
  • Step 1: obtain the training set.
  • the training set may include multiple SDR video frame samples and HDR video frame samples corresponding to the multiple SDR video frame samples one-to-one.
  • an SDR video sample and its corresponding HDR video sample are acquired first.
  • SDR video samples and corresponding HDR video samples can be obtained from public video websites. It is also possible to perform SDR and HDR processing on videos in the same RAW data format, respectively, to obtain SDR video samples and corresponding HDR video samples. It is also possible to use the SDR camera and the HDR camera respectively to shoot corresponding SDR video samples and HDR video samples in the same scene.
  • Then, frame extraction is performed on the SDR video samples and their corresponding HDR video samples to obtain multiple SDR video frame samples and HDR video frame samples that correspond to the SDR video frame samples one-to-one and are aligned with them in time and space.
  • Step 2: use the training set and the preset loss function to train the initial optimization model to obtain the optimization model.
  • After the initial optimization model is built, the SDR video frame samples are input into its main network. The multiple SDR video frame samples are also downsampled to obtain multiple low-resolution images, and these low-resolution images are input into the color condition network of the initial optimization model to obtain the adjustment parameters used to adjust the HDR video frames predicted by the initial optimization model.
  • The preset loss function f_1 is used to describe the L2 loss between the HDR video frame Ĥ predicted by the initial optimization model and the HDR video frame sample H. It can be expressed as formula (4):
  • f_1 = ||Ĥ - H||_2^2    (4)
  • The initial optimization model can then be iteratively trained by gradient descent until the model converges, yielding the trained optimization model. A sketch of one such training step follows.
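  • This is a minimal sketch of one gradient-descent training step under the L2 loss of formula (4); `model`, the optimizer choice, and the (sdr, hdr) sample pair are placeholders for the networks and training set described above.

```python
import torch
import torch.nn as nn

# One training step: predict an HDR frame from an SDR frame sample and
# minimize the L2 loss of formula (4) against the HDR frame sample.
def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               sdr: torch.Tensor, hdr: torch.Tensor) -> float:
    optimizer.zero_grad()
    predicted = model(sdr)                          # predicted HDR video frame
    loss = nn.functional.mse_loss(predicted, hdr)   # L2 loss, formula (4)
    loss.backward()
    optimizer.step()                                # gradient descent update
    return loss.item()
```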
  • In one embodiment, the structure of the initial local enhancement model is the same as that of the local enhancement model shown in Fig. 3. The training process of the initial local enhancement model is as follows:
  • Step 1: obtain the training set. First, SDR video samples and their corresponding HDR video samples are obtained, and frame extraction is performed on them to obtain multiple SDR video frame samples and HDR video frame samples that correspond to them in time and space.
  • For each SDR video frame sample, the sample can be input into the trained optimization model provided by this application or another trained neural network model for HDR conversion processing, or HDR conversion can be performed on it through a color lookup table, converting the SDR video frame sample into an initial optimized image sample, which is in fact an image sample of HDR data. Therefore, when training the initial local enhancement model, the training set includes multiple training samples, each containing the initial optimized image sample and the HDR video frame sample corresponding to one SDR video frame sample.
  • Step 2: use the training set and the preset loss function to train the initial local enhancement model to obtain the local enhancement model. For each training sample in the training set, the initial optimized image sample in the training sample is input into the initial local enhancement model shown in Fig. 3 for training. Specifically, the detail information of the initial optimized image sample is enhanced by the downsampling module, the multiple residual networks, and the upsampling module in sequence to obtain a predicted enhanced image. Based on the predicted enhanced image and the HDR video frame sample corresponding to the initial optimized image sample, the model is iteratively trained on the loss function until it converges, yielding the local enhancement model. As an example, an L2 loss function may be used and minimized iteratively by gradient descent.
  • In one embodiment, the generator in the initial compensation model can be trained by constructing a generative adversarial network and performing adversarial training on it with a preset loss function and a training set to obtain the generator. The training set includes enhanced image samples, overexposure mask image samples, and compensation image samples corresponding to multiple image samples to be processed. As an example, the initial compensation model provided by this application is shown in Fig. 6; the model includes the initial generator model and a discriminator, which together constitute a generative adversarial network.
  • The process of training the initial compensation model is as follows:
  • Step 1: obtain the training set. First, SDR video samples and their corresponding HDR video samples are obtained, and frame extraction is performed on them to obtain multiple SDR video frame samples and HDR video frame samples that correspond to them in time and space.
  • For each SDR video frame sample, HDR conversion can be performed through the trained optimization model, a color lookup table, or another trained neural network model to obtain an initial optimized image sample, and the trained local enhancement model or another trained neural network model is used to enhance the detail information of the initial optimized image sample to obtain the corresponding enhanced image sample. Formula (2) above can also be used to obtain the overexposure mask image sample corresponding to each SDR video frame sample. Therefore, when training the initial compensation model, the training set includes multiple training samples, each containing the enhanced image sample, the overexposure mask image sample, and the HDR video frame sample corresponding to one SDR video frame sample.
  • Step 2: input the enhanced image samples and the overexposure mask image samples in the training set into the initial compensation model for processing to obtain predicted images. Specifically, the enhanced image sample in a training sample is input into the initial generator model for processing to obtain global exposure information; after the global exposure information is multiplied pixel by pixel with the overexposure mask image sample, the overexposure information of the highlighted area is obtained; the overexposure information is then fused with the enhanced image sample to obtain the predicted image.
  • Step 3: input the predicted images and the corresponding HDR video frame samples into the discriminator for iterative training to obtain the compensation model. Specifically, the predicted image and the corresponding HDR video frame sample in each training sample are input into the discriminator to obtain the discrimination result for that training sample. Based on the preset loss function and the discrimination results, the initial compensation model is iteratively trained to obtain the trained compensation model.
  • As an example, the preset loss function L provided by the embodiment of the present application can be expressed as formula (5):
  • L = α L_1 + β L_p + γ L_GAN    (5)
  • where L_1 represents the absolute error loss between the HDR video frame sample I_GT (i.e., the compensation image sample) and the predicted image I_H, L_p represents the perceptual loss between them, L_GAN represents the generative adversarial loss of the predicted image, and α, β, and γ are all hyperparameters.
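  • The following is a minimal sketch of this composite generator loss. The perceptual loss is shown with pre-extracted feature maps for brevity (the feature network, e.g. a pretrained VGG, is omitted), the non-saturating form of the adversarial term is assumed, and the default weights are placeholders to be tuned per task.

```python
import torch
import torch.nn as nn

# Composite generator loss per formula (5): weighted sum of the absolute
# error loss L1, the perceptual loss Lp (distance between feature maps),
# and the adversarial loss L_GAN (assumed non-saturating form).
def generator_loss(pred: torch.Tensor, gt: torch.Tensor,
                   pred_feat: torch.Tensor, gt_feat: torch.Tensor,
                   disc_score: torch.Tensor,
                   alpha: float = 1.0, beta: float = 0.1,
                   gamma: float = 0.01) -> torch.Tensor:
    l1 = nn.functional.l1_loss(pred, gt)              # absolute error loss L_1
    lp = nn.functional.mse_loss(pred_feat, gt_feat)   # perceptual loss L_p
    lgan = -disc_score.mean()                         # adversarial loss L_GAN
    return alpha * l1 + beta * lp + gamma * lgan      # formula (5)
```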
  • During iterative training, gradient descent can be used. When the preset loss function meets certain requirements, the model has converged; that is, training of the initial compensation model is complete and the compensation model is obtained.
  • the trained optimization model, local enhancement model and compensation model can convert SDR video to HDR video.
  • Fig. 7 is a flow chart of converting an SDR video into an HDR video provided by an embodiment of the present application. Specifically, frame extraction is performed on the SDR video to be processed to obtain SDR video frames. Each SDR video frame is input into the trained optimization model for HDR conversion processing, that is, converted into HDR data to obtain the initial optimized image. The texture detail information of the initial optimized image is enhanced by the trained local enhancement model to obtain the enhanced image. Then, according to the overexposure mask image of the SDR video frame, the trained compensation model compensates the content information of the highlighted area of the enhanced image to obtain the final compensated image. The compensated images corresponding to the SDR video frames are combined to obtain the HDR video corresponding to the SDR video to be processed. A sketch of this pipeline follows.
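  • This is a minimal sketch of the Fig. 7 pipeline with the three trained models chained in series; frame I/O is abstracted away, and the three models and the mask function are placeholders for the components described above.

```python
import torch

# SDR-to-HDR conversion: each extracted SDR frame passes through the
# optimization model, the local enhancement model, and the compensation
# generator in series; the compensated frames are then reassembled.
@torch.no_grad()
def sdr_to_hdr(frames, optimization_model, enhancement_model,
               compensation_generator, mask_fn):
    hdr_frames = []
    for sdr_frame in frames:                      # frame extraction
        initial = optimization_model(sdr_frame)   # HDR conversion
        enhanced = enhancement_model(initial)     # texture detail enhancement
        mask = mask_fn(sdr_frame)                 # overexposure mask of the SDR frame
        compensated = enhanced + mask * compensation_generator(enhanced)  # formula (3)
        hdr_frames.append(compensated)
    return torch.stack(hdr_frames)                # frame merging into the HDR video
```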
  • Among the models compared, the residual network (ResNet), the cycle generative adversarial network (CycleGAN), and the pixel-to-pixel generation network (Pixel2Pixel) are algorithm models for image-to-image translation; the comparison also includes the High Dynamic Range Network (HDRNet), the Conditional Sequential Retouching Network (CSRNet), and the adaptive 3D lookup table (Ada-3DLUT); and the deep super-resolution inverse tone-mapping method (Deep SR-ITM) and the GAN-based joint super-resolution and inverse tone-mapping network (GAN-Based Joint Super-Resolution and Inverse Tone-Mapping, JSI-GAN) are algorithm models for SDR-to-HDR video conversion.
  • Fig. 8 shows examples of the optimized images obtained after processing the same pictures with each model listed in Table 1. From the two examples in Fig. 8, it can be seen that when the method provided by the present application, based on multiple models in series, is used to optimize an image, the optimization effect in color transition areas is clearly better. It follows that optimizing the image to be processed with multiple deep learning models connected in series reduces the information lost during optimization and clearly improves the quality of the optimized image compared with the prior art.
  • An embodiment of the present application also provides an image processing apparatus. The apparatus 400 includes: an optimization unit 401, configured to perform target-type optimization processing on the image to be processed using the trained optimization model to obtain an initial optimized image; an enhancement unit 402, configured to perform local enhancement processing on the initial optimized image through the trained local enhancement model to obtain an enhanced image; and a compensation unit 403, configured to input the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model for processing and perform information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area.
  • the local enhancement model includes: a downsampling module, an upsampling module, and multiple residual networks arranged between the downsampling module and the upsampling module.
  • Optionally, the pixel value of a pixel in the overexposure mask image is determined according to formula (2) above, where I_mask(x, y) represents the pixel value of the overexposure mask image at position (x, y), I_S(x, y) represents the pixel value of the image to be processed at position (x, y), and λ represents the preset overexposure threshold.
  • Optionally, the compensation model includes a generator. Inputting the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model and performing information compensation on the highlighted area of the enhanced image includes: inputting the enhanced image into the trained generator for processing to obtain global exposure information; determining the overexposure information of the highlighted area according to the overexposure mask image of the image to be processed and the global exposure information; and compensating the highlighted area with the overexposure information to obtain the compensated image.
  • Optionally, the initial optimization model, the initial local enhancement model, and the initial compensation model are trained separately to obtain the corresponding optimization model, local enhancement model, and compensation model.
  • Optionally, the training method of the generator includes: constructing a generative adversarial network comprising an initial generator model and a discriminator; and performing adversarial training on the generative adversarial network using a preset loss function and a training set to obtain the generator, where the training set includes enhanced image samples, overexposure mask image samples, and compensation image samples corresponding to multiple image samples to be processed. The loss function describes a composite loss combining the absolute error loss between the compensation image sample and the predicted image, the perceptual loss between the compensation image sample and the predicted image, and the discriminator loss of the predicted image. The predicted image is obtained by processing the enhanced image sample with the initial generator model, multiplying the result by the overexposure mask image sample, and then adding the product to the enhanced image sample.
  • Optionally, the target-type optimization processing refers to HDR conversion processing, and the image to be processed is a video frame extracted from an SDR video; the compensated images output after each video frame of the SDR video is processed in turn by the optimization model, the local enhancement model, and the compensation model are combined into frames to obtain the HDR video corresponding to the SDR video.
  • a terminal device 500 in this embodiment includes: a processor 501 , a memory 502 , and a computer program 504 stored in the memory 502 and operable on the processor 501 .
  • The computer program 504 can be run by the processor 501 to generate instructions 503, and the processor 501 can implement the steps of the above image processing method embodiments according to the instructions 503.
  • When the processor 501 executes the computer program 504, the functions of the modules/units in the above apparatus embodiments are implemented, for example the functions of unit 401 and unit 402 shown in Fig. 9.
  • the computer program 504 can be divided into one or more modules/units, and one or more modules/units are stored in the memory 502 and executed by the processor 501 to complete the present application.
  • One or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 504 in the terminal device 500 .
  • Those skilled in the art can understand that Fig. 10 is only an example of the terminal device 500 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device 500 may also include input and output devices, network access devices, a bus, and the like.
  • the processor 501 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), on-site Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 502 may be an internal storage unit of the terminal device 500, such as a hard disk or memory of the terminal device 500.
  • the memory 502 can also be an external storage device of the terminal device 500, such as a plug-in hard disk equipped on the terminal device 500, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card) and so on. Further, the memory 502 may also include both an internal storage unit of the terminal device 500 and an external storage device.
  • the memory 502 is used to store computer programs and other programs and data required by the terminal device 500 .
  • the memory 502 can also be used to temporarily store data that has been output or will be output.
  • the terminal device provided in this embodiment can execute the foregoing method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in the foregoing method embodiment is implemented.
  • the embodiment of the present application further provides a computer program product, which, when the computer program product runs on a terminal device, enables the terminal device to implement the method described in the foregoing method embodiments when executed.
  • If the above integrated units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application can be completed by instructing the relevant hardware through a computer program stored in a computer-readable storage medium; when executed by a processor, the computer program can realize the steps of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable storage medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a mobile hard disk, a magnetic disk, or an optical disk.
  • references to "one embodiment” or “some embodiments” or the like in this application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", and the like in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • first and second are used for description purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • The terms "connection" and "connected" should be understood in a broad sense: for example, a connection may be mechanical or electrical, direct or indirect through an intermediate medium, and may be internal communication between two elements or an interaction relationship between two elements. Unless otherwise clearly defined, those of ordinary skill in the art can understand the specific meaning of the above terms in this application according to the specific situation.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

This application provides an image processing method, apparatus, terminal device, and storage medium, applied to the field of image processing. The image processing method provided by this application includes: performing target-type optimization processing on an image to be processed using a trained optimization model to obtain an initial optimized image; performing local enhancement processing on the initial optimized image through a trained local enhancement model to obtain an enhanced image; and inputting the enhanced image and an overexposure mask image of the image to be processed into a trained compensation model for processing, performing information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area. The image processing method, apparatus, terminal device, and storage medium provided by this application can improve the quality of the optimized image in image optimization processing tasks.

Description

Image processing method, apparatus, terminal device and storage medium
Technical Field
The present application relates to the technical field of image processing, and in particular to an image processing method, apparatus, terminal device, and storage medium.
Background
Image optimization processing tasks generally include image retouching and color grading, image beautification, image denoising, image super-resolution, image enhancement, and other image-oriented optimization tasks. Some optimization tasks optimize each video frame of a video to be processed and can also be regarded as image optimization tasks, such as converting SDR video to HDR video, video denoising, and video super-resolution. Compared with the original image, the optimized image obtained after image processing can better reflect the visual information of the real scene. In the prior art, when traditional image processing methods are used to perform the above tasks, generally only a deep learning model related to the processing task is used to process the original image, for example only an image denoising model is used to denoise the original image, or an HDR conversion model is used to convert SDR video frames into HDR video frames. When image optimization processing tasks are implemented by such methods, much information may be lost, resulting in many artifacts and color deviations in the optimized image and poor quality of the optimized image.
Summary
Embodiments of the present application provide an image processing method, apparatus, terminal device, and storage medium, which can improve the quality of the optimized image in image optimization processing tasks.
In a first aspect, an embodiment of the present application provides a method, including: performing target-type optimization processing on an image to be processed using a trained optimization model to obtain an initial optimized image; performing local enhancement processing on the initial optimized image through a trained local enhancement model to obtain an enhanced image; and inputting the enhanced image and an overexposure mask image of the image to be processed into a trained compensation model for processing, performing information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area.
With the image processing method provided by this application, the optimization model can first perform target-type optimization processing on the image to be processed to obtain the initial optimized image. For the local information lost in the initial optimized image, the local enhancement model can be used for enhancement processing, reconstructing the lost texture detail information to obtain an enhanced image. Then, based on the highlighted area indicated by the overexposure mask image corresponding to the image to be processed, the enhanced image is processed through the compensation model to compensate for the content information lost in the overexposed area. This application uses multiple deep learning models in series to perform information compensation on the initial optimized image obtained in an image optimization processing task, which can avoid artifacts and color deviations in the optimized image and improve its quality.
Optionally, the local enhancement model includes: a downsampling module, an upsampling module, and multiple residual networks arranged between the downsampling module and the upsampling module.
Optionally, the method for determining the pixel value of a pixel in the overexposure mask image includes: determining the pixel value according to the formula
Figure PCTCN2021138137-appb-000001
where I_mask(x, y) represents the pixel value of the overexposure mask image at position (x, y), I_S(x, y) represents the pixel value of the image to be processed at position (x, y), and λ represents the preset overexposure threshold.
Optionally, the compensation model includes a generator. Inputting the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model and performing information compensation on the highlighted area of the enhanced image includes: inputting the enhanced image into the trained generator for processing to obtain global exposure information; determining the overexposure information of the highlighted area according to the overexposure mask image of the image to be processed and the global exposure information; and compensating the highlighted area with the overexposure information to obtain the compensated image.
Optionally, the initial optimization model, the initial local enhancement model, and the initial compensation model are trained separately to obtain the corresponding optimization model, local enhancement model, and compensation model.
Optionally, the training method of the generator includes: constructing a generative adversarial network comprising an initial generator model and a discriminator; and performing adversarial training on the generative adversarial network using a preset loss function and a training set to obtain the generator, where the training set includes enhanced image samples, overexposure mask image samples, and compensation image samples corresponding to multiple image samples to be processed; the loss function describes a composite loss combining the absolute error loss between the compensation image sample and the predicted image, the perceptual loss between the compensation image sample and the predicted image, and the discriminator loss of the predicted image; the predicted image is obtained by processing the enhanced image sample with the initial generator model, multiplying the result by the overexposure mask image sample, and then adding the product to the enhanced image sample.
Optionally, the target-type optimization processing refers to HDR conversion processing, and the image to be processed is a video frame extracted from an SDR video; the compensated images output after each video frame of the SDR video is processed in turn by the optimization model, the local enhancement model, and the compensation model are combined into frames to obtain the HDR video corresponding to the SDR video.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: an optimization unit, configured to perform target-type optimization processing on the image to be processed using the trained optimization model to obtain an initial optimized image; an enhancement unit, configured to perform local enhancement processing on the initial optimized image through the trained local enhancement model to obtain an enhanced image; and a compensation unit, configured to input the enhanced image and the overexposure mask image of the image to be processed into the trained compensation model for processing and perform information compensation on the highlighted area of the enhanced image to obtain a compensated image, where the overexposure mask image indicates the highlighted area.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method of any one of the above first aspects is implemented.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method of any one of the above first aspects is implemented.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
可以理解的是,上述第二方面至第五方面的有益效果可以参见上述第一方面和第一方面的各可能的实施方式所带来的有益效果的相关描述,在此不再赘述。
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required for describing the embodiments or the prior art. Obviously, the drawings described below show merely some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an optimization model provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a local enhancement model provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a compensation model provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of the HDR and SDR color gamut ranges provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an initial compensation model provided by an embodiment of the present application;
Fig. 7 is a flowchart of converting an SDR video into an HDR video provided by an embodiment of the present application;
Fig. 8 is a schematic comparison diagram of the image processing results of multiple models provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description of the Embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.
In image optimization, the original image is generally processed only by a deep learning model related to the processing task, which causes the optimized image to lose a great deal of information. For example, in the process of denoising a normal original image, the edge information of the original image is generally smoothed to achieve denoising; when the contrast of the original image is low, conventional denoising methods can hardly preserve much detail information while guaranteeing the denoising effect, resulting in a poor-quality optimized image. Alternatively, when the exposure of the original image is high, the information of some highlight regions is not easily extracted; if the over-exposed original image is processed in the same way as a normally exposed image, part of the content information of the highlight regions will be lost in the optimized image, causing color deviations. Likewise, in the HDR video conversion task, when SDR video frames are converted into HDR video frames, generally only a neural network is used to obtain the color mapping relationship between the SDR video frames and the HDR video frames to implement the HDR conversion, so that the HDR video frames lose much detail information and highlight-region information, and the resulting HDR video is of poor quality.
To improve the quality of image optimization, a single large network is currently constructed and trained on the basis of the optimization objective. For example, in the HDR video conversion task, in order to handle both the low-frequency conversion (color mapping) and the high-frequency conversion (detail enhancement) from SDR video to HDR video frames, a large network model covering both color mapping and detail enhancement is usually constructed and trained as a whole so that it can perform both functions. However, the quality of the optimized image obtained in this way is not significantly improved either; especially in color transition regions, the optimization effect is clearly poor.
To address these problems in image optimization processing tasks, the present application provides an image processing method that optimizes the image to be processed through a plurality of cascaded deep learning models. Specifically, target-type optimization processing is first performed on the image to be processed by an optimization model to obtain an initial optimized image; local enhancement processing is then performed on the initial optimized image by a local enhancement model; and the information of the highlight region of the enhanced image is then compensated by a compensation model, so as to reconstruct the detail information and the highlight-region content information lost in the initial optimized image. By decoupling the image optimization task, selecting multiple deep learning models to handle the different kinds of information lost in the optimized image, and cascading these models to optimize the image to be processed, the lost information is compensated and the quality of the optimized image is improved.
The technical solutions of the present application are described in detail below with reference to the drawings. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they shall not be construed as limiting it.
In one possible implementation, an image processing method provided by the present application is introduced by way of example with reference to Fig. 1. The image processing method can be applied to an image processing device, which may be a mobile terminal such as a smartphone, a tablet computer, or a video camera, or a device capable of processing image data such as a desktop computer, a robot, or a server. As shown in Fig. 1, the image processing method provided by the present application includes:
S100: performing target-type optimization processing on the image to be processed by using a trained optimization model to obtain an initial optimized image.
In one embodiment, the image to be processed may be any image on which image processing such as retouching and color grading, beautification, denoising, or super-resolution needs to be performed; it may also be any video frame extracted from a video to be processed that needs retouching and color grading, beautification, denoising, or video conversion. In addition, the image or video to be processed can be acquired with a device having a camera function, such as a smartphone, tablet computer, video camera, desktop computer, or robot.
In one possible implementation, for different types of image processing tasks, target-type optimization processing can be performed on the image to be processed on the basis of deep learning. In one embodiment, a fully convolutional neural network can be used to convert the image to be processed into the initial optimized image, for example a fully convolutional network containing three convolutional layers with 1×1 kernels. Other network structures can also be added to the fully convolutional network to form a new network model. By way of example, the present application provides an optimization model for performing target-type optimization processing on the image to be processed to obtain the initial optimized image.
The optimization model provided by the present application is shown in Fig. 2. The optimization model includes a main network and a color condition network. The color condition network includes at least one sequentially connected Color Condition Block (CCB) and a feature transform module. The at least one color condition block is used to extract global color feature information from a low-resolution version of the image to be processed. The feature transform module is used to convert the global color feature information into N sets of adjustment parameters. The N sets of adjustment parameters are respectively used to adjust N intermediate features extracted by the main network in the process of converting the image to be processed into the optimized image, where N is an integer greater than or equal to 1.
By way of example, the image to be processed can be downsampled by a certain factor (for example, by a factor of 4) to obtain the corresponding low-resolution image. Assuming the image to be processed is downsampled by a factor of 4, the low-resolution image has the same physical size as the image to be processed, except that the image to be processed contains four times as many pixels per unit area as the low-resolution image.
As shown in Fig. 2, a color condition block includes a convolutional layer, a pooling layer, a first activation function, and an IN (Instance Normalization) layer connected in sequence. The color condition block performs global feature extraction on the input low-resolution image; compared with methods based on local feature extraction, it can effectively represent the global feature information of the image to be processed and thereby avoid introducing artificial artifacts into the optimized image.
The feature transform module includes a Dropout layer, a convolutional layer, a pooling layer, and N fully connected layers. The Dropout layer, the convolutional layer, and the pooling layer are connected in sequence and are used to process the global color feature information extracted by the at least one color condition block to obtain a condition vector. The N fully connected layers are respectively used to perform feature transformation on the condition vector to obtain the N sets of adjustment parameters. It should be noted that each fully connected layer processes the condition vector separately to obtain one set of adjustment parameters, so the number of fully connected layers equals the number of sets of adjustment parameters.
By way of example, the optimization model shown in Fig. 2 includes four sequentially connected color condition blocks. In the color condition blocks and the feature transform module, the convolution kernels of the convolutional layers are all 1×1, and the pooling layers all use average pooling. The first activation function is the nonlinear activation function LeakyReLU.
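By way of example, the color condition block and the feature transform module described above can be sketched in PyTorch as follows. This is a minimal illustration only; the channel widths, pooling size, dropout rate, and head widths are assumptions of this sketch, not values fixed by the present application.

    import torch.nn as nn

    class ColorConditionBlock(nn.Module):
        # CCB of Fig. 2: 1x1 convolution -> average pooling ->
        # LeakyReLU -> instance normalization.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1),
                nn.AvgPool2d(2),
                nn.LeakyReLU(0.1, inplace=True),
                nn.InstanceNorm2d(out_ch),
            )

        def forward(self, x):
            return self.body(x)

    class FeatureTransform(nn.Module):
        # Dropout -> 1x1 conv -> average pooling yields the condition
        # vector; each fully connected head turns it into one
        # (gamma, beta) pair for the corresponding GFM layer.
        def __init__(self, ch=64, widths=(64, 64, 3)):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Dropout(0.5),
                nn.Conv2d(ch, ch, kernel_size=1),
                nn.AdaptiveAvgPool2d(1),   # global average pooling
                nn.Flatten(),
            )
            self.heads = nn.ModuleList([nn.Linear(ch, 2 * w) for w in widths])

        def forward(self, x):
            cond = self.trunk(x)                       # condition vector
            params = []
            for head in self.heads:
                gamma, beta = head(cond).chunk(2, dim=1)
                params.append((gamma[..., None, None], beta[..., None, None]))
            return params

A complete condition network in the sense of Fig. 2 would stack four such ColorConditionBlock instances before the FeatureTransform.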
In the embodiments of the present application, the main network includes N Global Feature Modulation (GFM) layers, and the N sets of adjustment parameters are input to the N GFM layers. A GFM layer adjusts the intermediate feature input to it according to the adjustment parameters.
In one example, the main network further includes N convolutional layers and N−1 second activation functions, and the N GFM layers are respectively connected to the outputs of the N convolutional layers. The main network is used to convert the image to be processed into the optimized image, and during the conversion the N convolutional layers extract N intermediate features. The convolution kernels of all convolutional layers are 1×1. The second activation function may be the nonlinear activation function ReLU.
It should be noted that the number of fully connected layers in the color condition network, and the number of sets of adjustment parameters generated correspondingly, should be designed based on the number of convolutional layers in the main network. For example, if the main network includes N convolutional layers, the N intermediate features generated by those layers need to be adjusted. Therefore, the color condition network needs to output N sets of adjustment parameters corresponding to the N intermediate features, and the main network needs N GFM layers to adjust the N intermediate features according to these N sets of adjustment parameters.
By way of example, as shown in Fig. 2, assuming N = 3, the main network includes three convolutional (Conv) layers, three GFM layers, and two second activation function (ReLU) layers. Specifically, from input to output the main network consists of a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, and a GFM layer. Correspondingly, in the color condition network, the color condition module includes four sequentially connected CCB layers; the feature transform module may include a sequentially connected Dropout layer, convolutional (Conv) layer, and average pooling (Avgpool) layer, as well as three fully connected (FC) layers each connected to the condition vector output by the average pooling layer. Each fully connected layer converts the condition vector into a corresponding set of adjustment parameters (γ, β), so the color condition network outputs three sets of adjustment parameters in total (adjustment parameters 1, 2, and 3). Each GFM layer in the main network adjusts the intermediate feature input to it according to the corresponding adjustment parameters, which can be expressed as formula (1):
GFM(x_i) = γ · x_i + β        (1)
In formula (1), x_i denotes the i-th intermediate feature input to the GFM layer, and GFM(x_i) denotes the result of the GFM layer adjusting the input intermediate feature x_i according to the adjustment parameters (γ, β).
It can be understood that different color mapping relationships exist between images to be processed containing different scenes and their initial optimized images. In the optimization model provided by the present application, the color condition network extracts the color feature information of the image to be processed as prior information for adjusting the intermediate features of the main network, so that the optimization model can adaptively output the initial optimized image corresponding to each image to be processed based on its color prior feature information, which can prevent artificial artifacts from appearing in the initial optimized image.
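By way of example, the global feature modulation of formula (1) and the N = 3 main network of Fig. 2 can be sketched in PyTorch as follows. This is a minimal illustration that consumes the (γ, β) pairs produced by a condition network such as the FeatureTransform sketched above; the 64-channel width is an assumption.

    import torch.nn as nn

    class GFM(nn.Module):
        # Global Feature Modulation, formula (1): GFM(x) = gamma * x + beta.
        def forward(self, x, gamma, beta):
            # gamma and beta broadcast over the spatial dimensions of x.
            return gamma * x + beta

    class MainNetwork(nn.Module):
        # Main network of Fig. 2 with N = 3: three 1x1 convolutions,
        # each followed by a GFM layer, with ReLU between the stages.
        def __init__(self, ch=64):
            super().__init__()
            self.convs = nn.ModuleList([
                nn.Conv2d(3, ch, kernel_size=1),
                nn.Conv2d(ch, ch, kernel_size=1),
                nn.Conv2d(ch, 3, kernel_size=1),
            ])
            self.gfm = GFM()
            self.act = nn.ReLU(inplace=True)

        def forward(self, x, params):
            # params: three (gamma, beta) pairs from the condition network.
            for i, conv in enumerate(self.convs):
                x = self.gfm(conv(x), *params[i])
                if i < len(self.convs) - 1:
                    x = self.act(x)
            return x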
In another possible implementation, target-type optimization processing can also be performed on the image to be processed through a color lookup table or conventional digital image processing methods to obtain the initial optimized image.
Compared with the image to be processed, the initial optimized image obtained by the methods of the above embodiments has richer colors, higher contrast, or greater sharpness, but in the process of optimizing the image to be processed, the edge texture information and highlight information of the initial optimized image may be lost. Therefore, to guarantee the quality of the initial optimized image, it needs further processing to compensate for its detail information and the missing content information of its highlight regions.
S200: performing local enhancement processing on the initial optimized image by using the trained local enhancement model to obtain an enhanced image.
In image optimization processing tasks, when the original image suffers from blurred details, uneven brightness distribution, poor contrast, or severe noise pollution, image enhancement techniques can be used to adjust the original image so that it better reflects the visual information of the real environment or facilitates subsequent image analysis and processing.
In one possible implementation, the initial optimized image can be enhanced by a neural-network-based method, or locally enhanced by conventional digital image processing methods. The embodiments of the present application take the neural-network-based method as an example to introduce the local enhancement processing of the initial optimized image.
In one embodiment, the local enhancement model provided by the present application is shown in Fig. 3. The local enhancement model includes a downsampling module, an upsampling module, and a plurality of residual networks arranged between them. The downsampling module includes at least one group of alternately arranged first convolutional layers (Conv1) and first activation layers (ReLU1); the upsampling module includes an upsampling layer and at least one group of alternately arranged second convolutional layers (Conv2) and second activation layers (ReLU2). By way of example, as shown in Fig. 3, the upsampling layer may be a pixel shuffle layer.
In one embodiment, as shown in Fig. 3, a residual network includes a third convolutional layer (Conv3), an activation layer, a fourth convolutional layer (Conv4), and a skip connection. Specifically, for any residual network, the first image feature input to it is processed in turn by the third convolutional layer, the activation layer, and the fourth convolutional layer to obtain a second image feature; the first image feature and the second image feature are fused through the skip connection, and the fusion result serves as the input of the next layer. By way of example, the activation layer may be the nonlinear activation function ReLU.
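By way of example, such a residual network and the overall local enhancement model of Fig. 3 might be sketched in PyTorch as follows. The 3×3 kernels, channel width, block count, and single 2× downsampling/upsampling stage are assumptions of this sketch, not values fixed by the present application.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Residual network of Fig. 3: Conv3 -> ReLU -> Conv4, with a skip
        # connection fusing the input feature and the transformed feature.
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)

    class LocalEnhancementModel(nn.Module):
        # Downsampling module, stacked residual networks, and an
        # upsampling module built around PixelShuffle, as in Fig. 3.
        def __init__(self, ch=64, n_blocks=8):
            super().__init__()
            self.down = nn.Sequential(
                nn.Conv2d(3, ch, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            self.blocks = nn.Sequential(
                *[ResidualBlock(ch) for _ in range(n_blocks)])
            self.up = nn.Sequential(
                nn.Conv2d(ch, 4 * ch, kernel_size=3, padding=1),
                nn.PixelShuffle(2),            # upsampling layer
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, 3, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return self.up(self.blocks(self.down(x)))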
In image color mapping tasks, when the image to be processed is converted into the initial optimized image, generally only the color mapping is considered while the extraction of detailed feature information is neglected, so part of the detail information is lost in the initial optimized image. Therefore, the initial optimized image needs local enhancement processing. In the present application, the initial optimized image is input to the local enhancement model to enhance its edge texture details and obtain the enhanced image.
S300: inputting the enhanced image and the over-exposed mask image of the image to be processed into the trained compensation model for processing, performing information compensation on the highlight region of the enhanced image to obtain a compensated image, where the over-exposed mask image indicates the highlight region.
In one possible implementation, a deep learning approach can be adopted, using a trained neural network model to compensate the information of the over-exposed region. By way of example, an embodiment of the present application provides a compensation model: the enhanced image and the over-exposed mask image of the image to be processed are input into the trained compensation model to perform information compensation on the highlight region of the enhanced image.
The compensation model provided by the present application is described by way of example with reference to Fig. 4. The compensation model includes a generator. Specifically, the enhanced image is input into the trained generator for processing to obtain global exposure information; the over-exposure information of the highlight region is determined according to the over-exposed mask image of the image to be processed and the global exposure information; and the highlight region is compensated with the over-exposure information to obtain the compensated image.
In one embodiment, the over-exposed mask image of the image to be processed can be obtained through formula (2):
I_mask(x, y) = 1 if I_S(x, y) > λ, and I_mask(x, y) = 0 otherwise        (2)
In formula (2), I_mask(x, y) denotes the pixel value of the pixel at (x, y) in the over-exposed mask image; I_S(x, y) denotes the pixel value of the pixel at (x, y) in the image to be processed; and λ is a preset over-exposure threshold used to control the degree of over-exposure of the image to be processed, which can be set to an appropriate value as needed. The highlight region of the image to be processed can be determined from the pixel values of the over-exposed mask image.
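For illustration, the over-exposed mask of formula (2) can be computed as in the following sketch. Treating the per-channel maximum as the pixel value I_S(x, y) and the 0.95 default threshold are assumptions of this sketch, since the present application leaves λ to be set as needed.

    import torch

    def overexposure_mask(img, lam=0.95):
        # Formula (2): a mask pixel is 1 where the corresponding pixel
        # value of the image to be processed exceeds lambda, else 0.
        # img: (B, 3, H, W) tensor with values normalized to [0, 1].
        value = img.max(dim=1, keepdim=True).values   # per-pixel intensity
        return (value > lam).float()                  # binary mask (B, 1, H, W)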
It should be noted that the generator may be any neural network structure containing convolutional layers that obtains the global exposure information from the enhanced image. By way of example, the structure of the generator provided by the present application is shown in Fig. 4. The generator includes a plurality of sequentially connected downsampling modules and a plurality of upsampling modules in one-to-one correspondence with the downsampling modules. A downsampling module includes a convolutional layer and a downsampling layer (DownSample), and an upsampling module includes an upsampling layer (UpSample) and a convolutional layer. In the present application, the enhanced image is input into the trained generator, and the global exposure information is obtained from the enhanced image.
In one embodiment, determining the over-exposure information of the highlight region according to the over-exposed mask image of the image to be processed and the global exposure information, and compensating the highlight region with the over-exposure information to obtain the compensated image, specifically includes: multiplying the global exposure information by the over-exposed mask image pixel by pixel to obtain the over-exposure information of the highlight region; and adding the over-exposure information to the enhanced image to obtain the compensated image. This process can be expressed as formula (3):
I_H = I_mask × G(I_LE) + I_LE         (3)
In formula (3), I_H denotes the compensated image; I_mask denotes the over-exposed mask image; I_LE denotes the enhanced image; and G(I_LE) denotes the exposure information produced by the generator from the enhanced image, from which the over-exposure information of the highlight region is obtained.
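The fusion of formula (3) reduces to a few tensor operations, as in the following sketch; it assumes a generator that maps the enhanced image to an exposure map of the same size.

    def compensate(enhanced, mask, generator):
        # Formula (3): I_H = I_mask x G(I_LE) + I_LE.
        exposure = generator(enhanced)   # global exposure information G(I_LE)
        highlight = mask * exposure      # over-exposure info of the highlight region
        return enhanced + highlight      # compensated image I_H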
With the image processing method provided by the present application, the optimization model performs target-type optimization processing on the image to be processed to obtain the initial optimized image. The detail information lost in the initial optimized image is enhanced by the local enhancement model, which reconstructs the lost texture details to obtain the enhanced image. At the same time, global exposure information is extracted from the enhanced image, the over-exposure information of the highlight region of the enhanced image is determined through the over-exposed mask image of the image to be processed, and after the over-exposure information is fused with the enhanced image, the missing content information of the highlight portion of the enhanced image is made up. By optimizing the image to be processed through a plurality of cascaded neural network models to compensate the lost information, the highlight region of the final optimized image (i.e., the compensated image) carries more feature information than that of the initial optimized image, and its edge texture information is also richer. This prevents artifacts and color deviations from appearing in the optimized image and improves the quality of optimized images in image optimization processing tasks.
The optimization model, local enhancement model, and compensation model provided by the present application are all general-purpose. On the one hand, each of the three types of models can perform its corresponding task independently. Specifically, the optimization model can be applied to any task requiring color optimization or color conversion of an image or video frame to be processed. The local enhancement model can be applied to any task requiring enhancement of the texture details of an image or video frame. The compensation model can be applied to any task requiring compensation of the content information of the highlight region of an image or video frame. On the other hand, the optimization model can be cascaded with either the local enhancement model or the compensation model to further enhance, respectively, the details or the highlight-region information of the initial optimized image obtained in an image optimization task (such as color optimization or color conversion); the optimization model, the local enhancement model, and the compensation model can also all be cascaded together, so that the local enhancement model and the compensation model successively compensate the detail information and the highlight-region information of the initial optimized image obtained after the target-type optimization by the optimization model. Image processing tasks include image editing, image retouching and color grading, image colorization, SDR (Standard Dynamic Range) to HDR (High Dynamic Range) video conversion, image denoising, image super-resolution, and so on.
Taking SDR-to-HDR video conversion as an example: limited by shooting equipment, existing HDR video resources are scarce, and the large stock of existing SDR videos needs to be converted into HDR videos to meet user demand. Fig. 5 is a schematic diagram of the HDR and SDR color gamut ranges. BT.709 and BT.2020 are both television parameter standards issued by the ITU (International Telecommunication Union), and DCI-P3 is the color gamut standard formulated by the American film industry for digital cinemas. As can be seen from Fig. 5, among DCI-P3, BT.709, and BT.2020, the largest gamut is BT.2020, followed by DCI-P3, and BT.709 represents the smallest gamut. At present, SDR video uses the BT.709 gamut, while HDR video uses the wider BT.2020 or DCI-P3 gamut. For the same video, whether the HDR version uses BT.2020 or DCI-P3, it can present higher contrast and richer colors than the SDR version.
Common video conversion methods convert SDR data into HDR data through image coding techniques so that the HDR data can be played on HDR terminal devices. In addition, super-resolution conversion methods are needed to convert low-resolution SDR video content into high-resolution HDR video content conforming to HDR video standards. Existing video conversion methods are computationally expensive, and part of the detail information is lost in the converted HDR video. If the exposure of the SDR video content is too high, the information of some highlight regions is not easily extracted; if the over-exposed SDR video content is processed in the same way as normally exposed images, part of the content information of the highlight regions will be lost in the HDR video, affecting its quality. Compared with existing methods, the image processing method provided by the present application directly processes the SDR video frames with the optimization model, the local enhancement model, and the compensation model in turn, converting the SDR video frames into HDR video frames and further enhancing the detail information and the highlight-region content information of the HDR video frames, thereby avoiding artifacts and color deviations in the HDR video.
It can be understood that, for different tasks, the initial optimization model, the initial local enhancement model, and the initial compensation model can be trained separately by designing corresponding training sets and loss functions, so as to obtain optimization, local enhancement, and compensation models suited to different tasks.
Taking the SDR-to-HDR video conversion task as an example, the training processes and applications of the initial optimization model, the initial local enhancement model, and the initial compensation model provided by the present application are each described below by way of example.
The network structure of the initial optimization model is the same as that of the optimization model shown in Fig. 2. In one embodiment, the training process of the initial optimization model is as follows:
Step 1: obtaining a training set.
For the optimization-model-based SDR-to-HDR video conversion task, the training set may include a plurality of SDR video frame samples and HDR video frame samples in one-to-one correspondence with them.
Specifically, SDR video samples and their corresponding HDR video samples are obtained first. By way of example, SDR video samples and corresponding HDR video samples can be obtained from public video websites. Alternatively, videos in the same RAW data format can be processed separately into SDR and HDR to obtain SDR video samples and their corresponding HDR video samples. An SDR camera and an HDR camera can also be used to shoot corresponding SDR and HDR video samples of the same scene. After the SDR video samples and their corresponding HDR video samples are obtained, frames are extracted from them respectively to obtain a plurality of SDR video frame samples and HDR video frame samples in one-to-one temporal and spatial correspondence with them.
Step 2: training the initial optimization model with the training set and a preset loss function to obtain the optimization model.
After the initial optimization model is built, the SDR video frame samples are input into its main network. Each SDR video frame sample is downsampled to obtain a low-resolution image, and the low-resolution images are input into the color condition network of the initial optimization model to obtain the adjustment parameters that adjust the HDR video frames predicted by the initial optimization model.
The preset loss function f_1 is used to describe the L2 loss between the HDR video frame Ĥ predicted by the initial optimization model and the HDR video frame sample H. It can be expressed as formula (4):
f_1 = ||Ĥ − H||_2^2        (4)
Based on the training set and the above preset loss function, the initial optimization model can be trained iteratively by gradient descent until the model converges, yielding the trained optimization model.
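A single training step of this procedure might be sketched as follows. The 4× bicubic downsampling for the condition branch, the optimizer choice, and the model signature are assumptions of this sketch.

    import torch.nn.functional as F

    def train_step(model, optimizer, sdr_frame, hdr_frame):
        # One gradient-descent step on the L2 loss of formula (4).
        lowres = F.interpolate(sdr_frame, scale_factor=0.25,
                               mode='bicubic', align_corners=False)
        pred = model(sdr_frame, lowres)        # main + color condition network
        loss = F.mse_loss(pred, hdr_frame)     # f_1 = ||H_pred - H||_2^2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()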
The structure of the initial local enhancement model is the same as that of the local enhancement model shown in Fig. 3. In another embodiment, the training process of the initial local enhancement model is as follows:
Step 1: obtaining a training set.
First, SDR video samples and their corresponding HDR video samples are obtained; for the specific acquisition method, refer to the description of the training process of the initial optimization model above. Frames are then extracted from the SDR video samples and their corresponding HDR video samples to obtain a plurality of SDR video frame samples and HDR video frame samples in one-to-one temporal and spatial correspondence with them.
For each SDR video frame sample, the SDR video frame sample can be input into the trained optimization model provided by the present application, or into another trained neural network model, for HDR conversion processing, or HDR conversion can be performed on it through a color lookup table, so as to convert the SDR video frame sample into an initial optimized image sample, which is in fact an image sample of HDR data. Therefore, when training the initial local enhancement model, the training set includes a plurality of training samples, each of which includes the initial optimized image sample and the HDR video frame sample corresponding to an SDR video frame sample.
Step 2: training the initial local enhancement model with the training set and a preset loss function to obtain the local enhancement model.
For each training sample in the training set, the initial optimized image sample in the training sample is input into the initial local enhancement model shown in Fig. 3 for training. Specifically, the detail information of the initial optimized image sample is enhanced in turn by the downsampling module, the plurality of residual networks, and the upsampling module to obtain a predicted enhanced image. The model is trained iteratively on the loss between the predicted enhanced image and the HDR video frame sample corresponding to the initial optimized image sample until the model converges, yielding the local enhancement model.
By way of example, when training the initial local enhancement model, the L2 loss function can be used, optimized iteratively by gradient descent.
In one embodiment, the generator in the initial compensation model can be trained by constructing a generative adversarial network. The generative adversarial network is adversarially trained with a preset loss function and a training set to obtain the generator, where the training set includes enhanced image samples, over-exposed mask image samples, and compensated image samples corresponding to a plurality of image samples to be processed. The initial compensation model provided by the present application is shown in Fig. 6; it includes the initial model of the generator and a discriminator, which together form the generative adversarial network. The training process of the initial compensation model is as follows:
Step 1: obtaining a training set.
First, SDR video samples and their corresponding HDR video samples are obtained; for the specific acquisition method, refer to the description of the training process of the initial optimization model above. Frames are then extracted from the SDR video samples and their corresponding HDR video samples to obtain a plurality of SDR video frame samples and HDR video frame samples in one-to-one temporal and spatial correspondence with them.
In this embodiment, for each SDR video frame sample, HDR conversion can be performed on it through the trained optimization model, a color lookup table, or another trained neural network model to obtain an initial optimized image sample, and the detail information of the initial optimized image sample can be enhanced with the trained local enhancement model or another trained neural network model to obtain the corresponding enhanced image sample. In addition, the over-exposed mask image sample corresponding to the SDR video frame sample can be obtained with formula (2) above. Therefore, when training the initial compensation model, the training set includes a plurality of training samples, each of which includes the enhanced image sample, the over-exposed mask image sample, and the HDR video frame sample corresponding to an SDR video frame sample.
Step 2: inputting the enhanced image samples and over-exposed mask image samples in the training set into the initial compensation model for processing to obtain predicted images.
Specifically, for each training sample, the enhanced image sample in the training sample is input into the initial model of the generator for processing to obtain global exposure information. The global exposure information is multiplied pixel by pixel by the corresponding over-exposed mask image sample to obtain the over-exposure information of the highlight region. The over-exposure information is fused with the enhanced image sample to obtain the predicted image.
Step 3: inputting the predicted images and the corresponding HDR video frame samples into the discriminator for iterative training to obtain the compensation model.
In one embodiment, for each training sample, the predicted image and the corresponding HDR video frame sample in the training sample are input into the discriminator for processing to obtain a discrimination result for the training sample. The initial compensation model is trained iteratively according to the discrimination result of each training sample and the preset loss function to obtain the trained compensation model.
In this embodiment, the preset loss function is used to describe a composite loss value consisting of the absolute error loss between the HDR video frame sample (i.e., the compensated image sample) and the predicted image, L_1 = ||I_GT − I_H||_1, the perceptual loss between the HDR video frame sample (i.e., the compensated image sample) and the predicted image, L_p = ||φ(I_GT) − φ(I_H)|| (where φ(·) denotes features extracted by a fixed pre-trained network), and the discriminator loss of the predicted image, L_GAN = −log D(I_H). The preset loss function L provided in this embodiment of the present application can be expressed as formula (5):
L = α · L_1 + β · L_p + γ · L_GAN        (5)
In formula (5), L_1 denotes the absolute error loss; L_p denotes the perceptual loss; L_GAN denotes the generative adversarial loss; I_GT denotes the HDR video frame sample (i.e., the compensated image sample); I_H denotes the predicted image; D(·) denotes the output of the discriminator; and α, β, and γ are hyperparameters.
By way of example, training can be performed by gradient descent; when the preset loss function meets certain requirements, the model has converged, i.e., the initial compensation model has finished training and the compensation model is obtained.
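The generator side of the composite loss of formula (5) can be sketched as follows. The loss weights, the VGG-style feature extractor feat, and a discriminator disc that outputs probabilities in (0, 1) are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def generator_loss(pred, target, disc, feat, a=1.0, b=0.05, c=0.01):
        # Composite loss of formula (5): L = a*L1 + b*Lp + c*LGAN.
        l1 = F.l1_loss(pred, target)                 # absolute error loss L_1
        lp = F.l1_loss(feat(pred), feat(target))     # perceptual loss L_p
        lgan = -torch.log(disc(pred) + 1e-8).mean()  # L_GAN = -log D(I_H)
        return a * l1 + b * lp + c * lgan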
The trained optimization model, local enhancement model, and compensation model can convert an SDR video into an HDR video. Fig. 7 is a flowchart of converting an SDR video into an HDR video provided by an embodiment of the present application. Specifically, frames are extracted from the SDR video to be processed to obtain SDR video frames. Each SDR video frame is input into the trained optimization model for HDR conversion processing, that is, the SDR video frame is converted into HDR data to obtain an initial optimized image. The texture detail information of the initial optimized image is enhanced by the trained local enhancement model to obtain an enhanced image. According to the over-exposed mask image of the SDR video frame, the content information of the highlight region of the enhanced image is compensated with the trained compensation model to obtain the final compensated image. The compensated images corresponding to the SDR video frames are merged into frames to obtain the HDR video corresponding to the SDR video to be processed.
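The whole inference pipeline of Fig. 7 then amounts to chaining the three trained models over the extracted frames, as in the following sketch; frame extraction and merging are left to the caller, the model signatures follow the sketches above, and the mask helper from the sketch after formula (2) is reused.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def sdr_frames_to_hdr(frames, opt_model, enh_model, comp_model, lam=0.95):
        # Chain the three trained models over every SDR frame (Fig. 7).
        hdr_frames = []
        for sdr in frames:                       # sdr: (1, 3, H, W) in [0, 1]
            lowres = F.interpolate(sdr, scale_factor=0.25,
                                   mode='bicubic', align_corners=False)
            base = opt_model(sdr, lowres)        # HDR conversion
            enhanced = enh_model(base)           # local detail enhancement
            mask = overexposure_mask(sdr, lam)   # highlight mask of the SDR frame
            hdr = enhanced + mask * comp_model(enhanced)   # formula (3)
            hdr_frames.append(hdr)
        return hdr_frames                        # to be merged into the HDR video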
Taking the cascaded model obtained by cascading the optimization model of Fig. 2, the local enhancement model of Fig. 3, and the compensation model of Fig. 4 as an example, the performance of the cascaded model provided by the present application is described below in conjunction with Table 1.
In Table 1, the residual network (ResNet), the cycle-consistent generative adversarial network (CycleGAN), and the pixel-to-pixel generative network (Pixel2Pixel) are algorithm models for image-to-image translation. The High Dynamic Range Network (HDRNet), the Conditional Sequential Retouching Network (CSRNet), and the Adaptive 3D lookup table (Ada-3DLUT) network are algorithm models for photo retouching. The deep super-resolution and inverse tone-mapping method (Deep SR-ITM) and the GAN-based joint super-resolution and inverse tone-mapping network (JSI-GAN) are algorithm models for SDR-to-HDR video conversion.
Table 1

Model            Params    PSNR     SSIM      SR-SIM    ΔE_ITP    HDR-VDP3
ResNet           1.37M     37.32    0.9720    0.9950     9.02     8.391
Pixel2Pixel      11.38M    25.80    0.8777    0.9871    44.25     7.136
CycleGAN         11.38M    21.33    0.8496    0.9595    77.74     6.941
HDRNet           482K      35.73    0.9664    0.9957    11.52     8.462
CSRNet           36K       35.04    0.9625    0.9955    14.28     8.400
Ada-3DLUT        594K      36.22    0.9658    0.9967    10.89     8.423
Deep SR-ITM      2.87M     37.10    0.9686    0.9950     9.24     8.233
JSI-GAN          1.06M     37.01    0.9694    0.9928     9.36     8.169
Cascaded model   37.2M     37.21    0.9699    0.9968     9.11     8.569
As can be seen from Table 1, the cascaded model provided by the present application achieves very good experimental results on performance metrics such as the Peak Signal-to-Noise Ratio (PSNR), the structural similarity index measure (SSIM), the spectral residual based similarity index measure (SR-SIM), the color fidelity metric ΔE_ITP, and the High Dynamic Range Visible Difference Predictor (HDR-VDP3).
Fig. 8 shows examples of the optimized images obtained after processing the same picture with each of the models listed in Table 1. As can be seen from the two examples in Fig. 8, with the method provided by the present application, which optimizes the image with multiple cascaded models, the optimization effect in color transition regions is clearly better.
In summary, the method provided by the present application, which optimizes the image to be processed through a plurality of cascaded deep learning models, can reduce the information loss in the image optimization process and, compared with the prior art, clearly improves the quality of the optimized image.
Based on the same inventive concept, an embodiment of the present application further provides an image processing apparatus. As shown in Fig. 9, the apparatus 400 includes: an optimization unit 401, configured to perform target-type optimization processing on an image to be processed by using a trained optimization model to obtain an initial optimized image; an enhancement unit 402, configured to perform local enhancement processing on the initial optimized image by using a trained local enhancement model to obtain an enhanced image; and a compensation unit 403, configured to input the enhanced image and the over-exposed mask image of the image to be processed into a trained compensation model for processing, so as to perform information compensation on the highlight region of the enhanced image and obtain a compensated image, where the over-exposed mask image indicates the highlight region.
Optionally, the local enhancement model includes a downsampling module, an upsampling module, and a plurality of residual networks arranged between the downsampling module and the upsampling module.
Optionally, the method for determining the pixel values of the pixels in the over-exposed mask image includes: determining the pixel value of each pixel in the over-exposed mask image according to the formula
I_mask(x, y) = 1 if I_s(x, y) > λ, and I_mask(x, y) = 0 otherwise,
where I_mask(x, y) denotes the pixel value of the pixel at position (x, y) in the over-exposed mask image, I_s(x, y) denotes the pixel value of the pixel at position (x, y) in the image to be processed, and λ denotes a preset over-exposure threshold.
Optionally, the compensation model includes a generator; inputting the enhanced image and the over-exposed mask image of the image to be processed into the trained compensation model to perform information compensation on the highlight region of the enhanced image includes: inputting the enhanced image into the trained generator for processing to obtain global exposure information; determining over-exposure information of the highlight region according to the over-exposed mask image of the image to be processed and the global exposure information; and compensating the highlight region with the over-exposure information to obtain the compensated image.
Optionally, an initial optimization model, an initial local enhancement model, and an initial compensation model are trained separately to obtain the corresponding optimization model, local enhancement model, and compensation model.
Optionally, the training method of the generator includes: constructing a generative adversarial network, the generative adversarial network including an initial model of the generator and a discriminator; and performing adversarial training on the generative adversarial network with a preset loss function and a training set to obtain the generator, where the training set includes enhanced image samples, over-exposed mask image samples, and compensated image samples corresponding to a plurality of image samples to be processed. The loss function is used to describe a composite loss value of the absolute error loss between the compensated image sample and a predicted image, the perceptual loss between the compensated image sample and the predicted image, and the discriminator loss of the predicted image. The predicted image is the image obtained by processing the enhanced image sample with the initial model of the generator, multiplying the result by the over-exposed mask image sample, and superimposing the product on the enhanced image sample.
Optionally, the target-type optimization processing refers to HDR conversion processing, the image to be processed is a video frame extracted from an SDR video, each video frame of the SDR video is processed in turn by the optimization model, the local enhancement model, and the compensation model to output a compensated image, and the compensated images are merged into frames to obtain an HDR video corresponding to the SDR video.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the functional units and modules above is only used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
Based on the same inventive concept, an embodiment of the present application further provides a terminal device. As shown in Fig. 10, the terminal device 500 of this embodiment includes a processor 501, a memory 502, and a computer program 504 stored in the memory 502 and executable on the processor 501. The computer program 504 can be run by the processor 501 to generate instructions 503, and the processor 501 can implement the steps of the above image processing method embodiments according to the instructions 503. Alternatively, the processor 501, when executing the computer program 504, implements the functions of the modules/units in the above apparatus embodiments, for example the functions of units 401 and 402 shown in Fig. 9.
By way of example, the computer program 504 can be divided into one or more modules/units, which are stored in the memory 502 and executed by the processor 501 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 504 in the terminal device 500.
Those skilled in the art can understand that Fig. 10 is merely an example of the terminal device 500 and does not constitute a limitation on it; the terminal device 500 may include more or fewer components than shown, or combine certain components, or use different components; for example, it may further include input/output devices, network access devices, buses, and so on.
The processor 501 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 502 may be an internal storage unit of the terminal device 500, such as a hard disk or internal memory of the terminal device 500. The memory 502 may also be an external storage device of the terminal device 500, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 500. Further, the memory 502 may include both an internal storage unit and an external storage device of the terminal device 500. The memory 502 is used to store the computer program and the other programs and data required by the terminal device 500, and may also be used to temporarily store data that has been or will be output.
The terminal device provided by this embodiment can perform the above method embodiments; its implementation principle and technical effects are similar and will not be repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements the method described in the above method embodiments.
An embodiment of the present application further provides a computer program product that, when run on a terminal device, causes the terminal device to implement the method described in the above method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can be implemented by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or certain intermediate forms. The computer-readable storage medium may include at least: any entity or apparatus capable of carrying the computer program code to a photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
Reference in this application to "one embodiment" or "some embodiments" and the like means that one or more embodiments of the present application include a particular feature, structure, or characteristic described in connection with that embodiment. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variations all mean "including but not limited to", unless specifically emphasized otherwise.
In the description of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features defined with "first" and "second" may explicitly or implicitly include at least one such feature.
In addition, in the present application, unless otherwise expressly specified and limited, the terms "connected", "connection", and the like should be understood in a broad sense; for example, a connection may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be internal communication between two elements or an interaction relationship between two elements. Unless otherwise expressly limited, those of ordinary skill in the art can understand the specific meanings of the above terms in the present application according to the specific situation.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some or all of the technical features therein; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. An image processing method, comprising:
    performing target-type optimization processing on an image to be processed by using a trained optimization model to obtain an initial optimized image;
    performing local enhancement processing on the initial optimized image by using a trained local enhancement model to obtain an enhanced image; and
    inputting the enhanced image and an over-exposed mask image of the image to be processed into a trained compensation model for processing, so as to perform information compensation on a highlight region of the enhanced image and obtain a compensated image, wherein the over-exposed mask image indicates the highlight region.
  2. The method according to claim 1, wherein the local enhancement model comprises: a downsampling module, an upsampling module, and a plurality of residual networks arranged between the downsampling module and the upsampling module.
  3. The method according to claim 1, wherein the method for determining the pixel values of the pixels in the over-exposed mask image comprises:
    determining the pixel value of each pixel in the over-exposed mask image according to the formula
    I_mask(x, y) = 1 if I_s(x, y) > λ, and I_mask(x, y) = 0 otherwise,
    wherein I_mask(x, y) denotes the pixel value of the pixel at position (x, y) in the over-exposed mask image, I_s(x, y) denotes the pixel value of the pixel at position (x, y) in the image to be processed, and λ denotes a preset over-exposure threshold.
  4. The method according to claim 1, wherein the compensation model comprises a generator;
    the inputting the enhanced image and the over-exposed mask image of the image to be processed into the trained compensation model to perform information compensation on the highlight region of the enhanced image comprises:
    inputting the enhanced image into the trained generator for processing to obtain global exposure information;
    determining over-exposure information of the highlight region according to the over-exposed mask image of the image to be processed and the global exposure information; and
    compensating the highlight region with the over-exposure information to obtain the compensated image.
  5. The method according to claim 1, wherein an initial optimization model, an initial local enhancement model, and an initial compensation model are trained separately to obtain the corresponding optimization model, local enhancement model, and compensation model.
  6. The method according to claim 4, wherein the training method of the generator comprises:
    constructing a generative adversarial network, the generative adversarial network comprising an initial model of the generator and a discriminator; and
    performing adversarial training on the generative adversarial network with a preset loss function and a training set to obtain the generator, wherein the training set comprises enhanced image samples, over-exposed mask image samples, and compensated image samples corresponding to a plurality of image samples to be processed;
    the loss function is used to describe a composite loss value of the absolute error loss between the compensated image sample and a predicted image, the perceptual loss between the compensated image sample and the predicted image, and the discriminator loss of the predicted image; and the predicted image is the image obtained by processing the enhanced image sample with the initial model of the generator, multiplying the result by the over-exposed mask image sample, and superimposing the product on the enhanced image sample.
  7. The method according to any one of claims 1 to 6, wherein the target-type optimization processing refers to HDR conversion processing, the image to be processed is a video frame extracted from an SDR video, each video frame of the SDR video is processed in turn by the optimization model, the local enhancement model, and the compensation model to output the compensated image, and the compensated images are merged into frames to obtain an HDR video corresponding to the SDR video.
  8. An image processing apparatus, comprising:
    an optimization unit, configured to perform target-type optimization processing on an image to be processed by using a trained optimization model to obtain an initial optimized image;
    an enhancement unit, configured to perform local enhancement processing on the initial optimized image by using a trained local enhancement model to obtain an enhanced image; and
    a compensation unit, configured to input the enhanced image and an over-exposed mask image of the image to be processed into a trained compensation model for processing, so as to perform information compensation on a highlight region of the enhanced image and obtain a compensated image, wherein the over-exposed mask image indicates the highlight region.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2021/138137 2021-08-02 2021-12-15 一种图像处理方法、装置、终端设备及存储介质 WO2023010754A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110882192.3 2021-08-02
CN202110882192.3A CN113781320A (zh) 2021-08-02 2021-08-02 一种图像处理方法、装置、终端设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023010754A1 true WO2023010754A1 (zh) 2023-02-09

Family

ID=78836457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138137 WO2023010754A1 (zh) 2021-08-02 2021-12-15 一种图像处理方法、装置、终端设备及存储介质

Country Status (2)

Country Link
CN (1) CN113781320A (zh)
WO (1) WO2023010754A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580269A (zh) * 2023-07-13 2023-08-11 荣耀终端有限公司 训练模型的方法、处理图像的方法、电子设备及存储介质
CN116778095A (zh) * 2023-08-22 2023-09-19 苏州海赛人工智能有限公司 一种基于人工智能的三维重建方法

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781320A (zh) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 一种图像处理方法、装置、终端设备及存储介质
CN113920493B (zh) * 2021-12-15 2022-04-05 深圳佑驾创新科技有限公司 遗落物品的检测方法、装置、设备及存储介质
CN114494063B (zh) * 2022-01-25 2023-04-07 电子科技大学 一种基于生物视觉机制的夜间交通图像增强方法
CN115082358B (zh) * 2022-07-21 2022-12-09 深圳思谋信息科技有限公司 图像增强方法、装置、计算机设备和存储介质
CN115953333A (zh) * 2023-03-15 2023-04-11 杭州魔点科技有限公司 一种动态背光补偿方法和系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100637A (zh) * 2015-08-31 2015-11-25 联想(北京)有限公司 一种图像处理方法及电子设备
CN106791471A (zh) * 2016-12-29 2017-05-31 宇龙计算机通信科技(深圳)有限公司 图像优化方法、图像优化装置和终端
CN108648163A (zh) * 2018-05-17 2018-10-12 厦门美图之家科技有限公司 一种人脸图像的增强方法及计算设备
US20210150681A1 (en) * 2019-11-18 2021-05-20 Shinyfields Limited Systems and Methods for Selective Enhancement of Objects in Images
CN113781320A (zh) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 一种图像处理方法、装置、终端设备及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107635102B (zh) * 2017-10-30 2020-02-14 Oppo广东移动通信有限公司 高动态范围图像曝光补偿值获取方法和装置
CN109345485B (zh) * 2018-10-22 2021-04-16 北京达佳互联信息技术有限公司 一种图像增强方法、装置、电子设备及存储介质
CN112348747A (zh) * 2019-08-08 2021-02-09 苏州科达科技股份有限公司 图像增强方法、装置及存储介质
WO2021026822A1 (zh) * 2019-08-14 2021-02-18 深圳市大疆创新科技有限公司 图像处理方法、装置、图像拍摄设备及移动终端
CN111861940A (zh) * 2020-07-31 2020-10-30 中国科学院深圳先进技术研究院 一种基于条件连续调节的图像调色增强方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100637A (zh) * 2015-08-31 2015-11-25 联想(北京)有限公司 一种图像处理方法及电子设备
CN106791471A (zh) * 2016-12-29 2017-05-31 宇龙计算机通信科技(深圳)有限公司 图像优化方法、图像优化装置和终端
CN108648163A (zh) * 2018-05-17 2018-10-12 厦门美图之家科技有限公司 一种人脸图像的增强方法及计算设备
US20210150681A1 (en) * 2019-11-18 2021-05-20 Shinyfields Limited Systems and Methods for Selective Enhancement of Objects in Images
CN113781320A (zh) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 一种图像处理方法、装置、终端设备及存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580269A (zh) * 2023-07-13 2023-08-11 荣耀终端有限公司 训练模型的方法、处理图像的方法、电子设备及存储介质
CN116580269B (zh) * 2023-07-13 2023-09-19 荣耀终端有限公司 训练模型的方法、处理图像的方法、电子设备及存储介质
CN116778095A (zh) * 2023-08-22 2023-09-19 苏州海赛人工智能有限公司 一种基于人工智能的三维重建方法
CN116778095B (zh) * 2023-08-22 2023-10-27 苏州海赛人工智能有限公司 一种基于人工智能的三维重建方法

Also Published As

Publication number Publication date
CN113781320A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2023010754A1 (zh) 一种图像处理方法、装置、终端设备及存储介质
US10861133B1 (en) Super-resolution video reconstruction method, device, apparatus and computer-readable storage medium
US20200258197A1 (en) Method for generating high-resolution picture, computer device, and storage medium
US9501818B2 (en) Local multiscale tone-mapping operator
US10579908B2 (en) Machine-learning based technique for fast image enhancement
WO2023010749A1 (zh) 一种hdr视频转换方法、装置、设备及计算机存储介质
JP2021521517A (ja) ニューラルネットワークマッピングを用いるhdr画像表現
WO2023010750A1 (zh) 一种图像颜色映射方法、装置、终端设备及存储介质
EP3275190A1 (en) Chroma subsampling and gamut reshaping
US20220261961A1 (en) Method and device, electronic equipment, and storage medium
WO2023010751A1 (zh) 图像高亮区域的信息补偿方法、装置、设备及存储介质
WO2020215180A1 (zh) 图像处理方法、装置和电子设备
WO2021213336A1 (zh) 一种画质增强装置及相关方法
CN111226256A (zh) 用于图像动态范围调整的系统和方法
CN110717864B (zh) 一种图像增强方法、装置、终端设备及计算机可读介质
CN111738951A (zh) 图像处理方法及装置
CN109102463B (zh) 一种超分辨率图像重建方法及装置
Shao et al. Hybrid conditional deep inverse tone mapping
WO2022266955A1 (zh) 图像解码及处理方法、装置及设备
US10567777B2 (en) Contrast optimization and local adaptation approach for high dynamic range compression
CN111861940A (zh) 一种基于条件连续调节的图像调色增强方法
WO2023010753A1 (zh) 一种色域映射方法、装置、终端设备及存储介质
CN114240767A (zh) 一种基于曝光融合的图像宽动态范围处理方法及装置
JP2019165434A (ja) 減少したクリップ領域を有するhdrイメージを生成する方法及び装置
WO2023178648A1 (zh) 视频处理方法及装置、电子设备、计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE