WO2023039863A1 - Image processing model training method and high dynamic range image generation method - Google Patents

Image processing model training method and high dynamic range image generation method

Info

Publication number
WO2023039863A1
WO2023039863A1 (application PCT/CN2021/119180, CN2021119180W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
dynamic range
processing model
high dynamic
image processing
Prior art date
Application number
PCT/CN2021/119180
Other languages
English (en)
French (fr)
Inventor
孙梦笛 (Sun Mengdi)
陈冠男 (Chen Guannan)
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to PCT/CN2021/119180 priority Critical patent/WO2023039863A1/zh
Priority to CN202180002597.1A priority patent/CN116157825A/zh
Publication of WO2023039863A1 publication Critical patent/WO2023039863A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an image processing model training method, a high dynamic range image generation method, electronic equipment, and a computer-readable storage medium.
  • High Dynamic Range Imaging (HDRI) technology is an image representation method used to achieve a larger exposure range than ordinary digital images.
  • High Dynamic Range (HDR) images can provide a larger brightness variation range and more light and dark details than ordinary digital images, which enables high dynamic range images to present brightness variation information closer to real scenes.
  • Currently, there are techniques for converting Low Dynamic Range (LDR) images into high dynamic range images so as to restore illuminance information approximating the real scene.
  • an image processing model training method comprising: inputting a low dynamic range image into a first initial image processing model, performing high dynamic range reconstruction processing on the low dynamic range image, and generating a first high dynamic range image.
  • a loss function is generated according to the data pair of the second high dynamic range image and the real high dynamic range image; wherein the real high dynamic range image is a real high dynamic range image corresponding to the low dynamic range image.
  • the first initial image processing model and the second initial image processing model are trained using the loss function.
  • the first coefficient is a weight coefficient generated by analyzing structural features of the low dynamic range image by the second initial image processing model.
  • the weight coefficients are a 1×1×3 matrix.
  • the low dynamic range image, the first high dynamic range image and the second high dynamic range image include a fourth channel in addition to the image's three RGB channels; the loss function is also related to the value of the fourth channel.
  • the fourth channel is the pixel maximum value of the image.
  • the loss function includes the L1 loss and the tone-mapping loss of the RGB data pair of the second high dynamic range image and the real high dynamic range image, and the L1 loss of the pixel-maximum data pair of the second high dynamic range image and the real high dynamic range image.
  • the generating a second high dynamic range image according to the first high dynamic range image and the first coefficient includes: multiplying the first high dynamic range image by the first coefficient to generate the second high dynamic range image.
  • before inputting the low dynamic range image into the second initial image processing model, the method further includes: performing downsampling processing on the low dynamic range image to generate a downsampled low dynamic range image.
  • the number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model.
  • the network optimizer uses an Adam optimizer with a learning rate of 1e-4.
  • a method for generating a high dynamic range image comprising: inputting an image to be processed into a first target image processing model, performing high dynamic range reconstruction processing on the image to be processed, and generating a first processed image;
  • the image to be processed is a low dynamic range image.
  • a second processed image is generated according to the first processed image and the second coefficient.
  • the second coefficient is a weight coefficient generated by analyzing structural features of the image to be processed by the second target image processing model.
  • the generating the second processed image according to the first processed image and the second coefficient includes: multiplying the first processed image by the second coefficient to generate the second processed image.
  • before inputting the image to be processed into the second target image processing model, the method further includes: performing downsampling processing on the image to be processed to generate a downsampled image to be processed.
  • the number of layers of the second target image processing model is smaller than the number of layers of the first target image processing model.
  • an electronic device including: a processor, a memory, and a computer program stored in the memory and operable on the processor.
  • the processor is configured to input a low dynamic range image into a first initial image processing model, perform high dynamic range reconstruction processing on the low dynamic range image, and generate a first high dynamic range image;
  • the processor is further configured to input the low dynamic range image into a second initial image processing model to generate a first coefficient; the processor is further configured to generate a second high dynamic range image according to the first high dynamic range image and the first coefficient.
  • the processor is further configured to generate a loss function according to the data pair of the second high dynamic range image and the real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image; the processor is also configured to train the first initial image processing model and the second initial image processing model using the loss function; the memory is configured to store the data of the first high dynamic range image, the first coefficient, and the data of the second high dynamic range image.
  • the processor is further configured to multiply the first high dynamic range image by the first coefficient to generate the second high dynamic range image.
  • the processor is further configured to perform downsampling processing on the low dynamic range image to generate a downsampled low dynamic range image before inputting the low dynamic range image into the second initial image processing model.
  • the processor is further configured to input the image to be processed into a first target image processing model, perform high dynamic range reconstruction processing on the image to be processed, and generate a first processed image;
  • the image to be processed is a low dynamic range image;
  • the processor is further configured to input the image to be processed into a second target image processing model to generate a second coefficient; wherein the first target image processing model and the second target image processing model are obtained by training with the image processing model training method described in any one of the above;
  • the processor is also configured to generate a second processed image according to the first processed image and the second coefficient.
  • the memory is configured to store the first processed image, the second coefficients and the second processed image.
  • the processor is further configured to multiply the first processed image by the second coefficient to generate the second processed image.
  • the processor is further configured to perform downsampling processing on the image to be processed before inputting the image to be processed into the second target image processing model to generate a downsampled image to be processed.
  • a computer-readable storage medium, wherein computer program instructions are stored in the computer-readable storage medium, and when the computer program instructions are run on a processor, the processor is caused to execute one or more steps in the image processing model training method and/or the high dynamic range image generation method described in some of the above embodiments.
  • a computer program product includes computer program instructions, and when the computer program instructions are executed on the computer, the computer program instructions cause the computer to execute one or more steps in the image processing model training method as described in some of the above embodiments, and/or one or more steps in the image processing method described in some of the above embodiments.
  • a computer program; when the computer program is executed on a computer, it causes the computer to execute one or more steps in the image processing model training method as described in some of the above embodiments, and/or one or more steps in the image processing method described in some of the above embodiments.
  • FIG. 1 is a flowchart of a high dynamic range image training method according to some embodiments
  • FIG. 2 is another flowchart of a high dynamic range image training method according to some embodiments
  • FIG. 3 is a step diagram of a high dynamic range image training method according to some embodiments.
  • FIG. 4 is a flowchart of a method for generating a high dynamic range image according to some embodiments
  • FIG. 5 is another flowchart of a method for generating a high dynamic range image according to some embodiments.
  • FIG. 6 is a step diagram of a method for generating a high dynamic range image according to some embodiments.
  • FIG. 7 is a block diagram of an electronic device according to some embodiments.
  • the terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present disclosure, unless otherwise specified, "plurality" means two or more.
  • the expressions “coupled” and “connected” and their derivatives may be used.
  • the term “connected” may be used in describing some embodiments to indicate that two or more elements are in direct physical or electrical contact with each other.
  • the term “coupled” may be used when describing some embodiments to indicate that two or more elements are in direct physical or electrical contact.
  • the terms “coupled” or “communicatively coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • the embodiments disclosed herein are not necessarily limited by the context herein.
  • "A and/or B" includes the following three combinations: A only, B only, and the combination of A and B.
  • the term “if” is optionally interpreted to mean “when” or “at” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrases "if it is determined that ..." or "if [the stated condition or event] is detected" are optionally construed to mean "upon determining ..." or "in response to determining ..." or "upon detection of [the stated condition or event]" or "in response to detection of [the stated condition or event]", depending on the context.
  • High dynamic range images can be used to describe realistic scenes with a wide range of brightness changes, and can better show the optical characteristics of high-brightness areas and low-dark areas in the scene.
  • images captured by ordinary cameras with a limited dynamic range are generally referred to as low dynamic range images; to accurately reflect the real scene, the low dynamic range image needs to be converted into a high dynamic range image.
  • in the related art, for a single low dynamic range image, a deep-learning-based high dynamic range reconstruction method passes the low dynamic range image through a preset image generation network to complete the conversion from the low dynamic range image to the high dynamic range image.
  • however, the high dynamic range images obtained by the methods in the related art have low fidelity and cannot accurately restore the illuminance information of the real scene.
  • some embodiments of the present disclosure provide an image processing model training method, as shown in FIG. 1 , FIG. 2 and FIG. 3 , the training method includes S1-S7.
  • the low dynamic range image is a randomly selected low dynamic range image that has a corresponding real high dynamic range image, referred to as the real high dynamic range image.
  • the image data pair of the low dynamic range image and its corresponding real high dynamic range image is selected from the existing public data set, and the low dynamic range image is used as a sample for the first initial image processing model and the second initial image processing model for training.
  • the first initial image processing model and the second initial image processing model are image processing networks, for example neural networks, with initially set parameters; the first initial image processing model has an initial mapping relationship from low dynamic range images to high dynamic range images and can convert a low dynamic range image into a high dynamic range image.
  • the second initial image processing model performs structural feature analysis on the low dynamic range image to generate weight coefficients for the first high dynamic range image.
  • the first coefficient is a weight coefficient generated by analyzing the structural features of the low dynamic range image by the second initial image processing model.
  • the first coefficient can supplement and repair details of the first high dynamic range image.
  • the weight coefficients are a 1×1×3 matrix.
  • before inputting the low dynamic range image into the second initial image processing model, the method also includes: S2', performing downsampling processing on the low dynamic range image to generate a downsampled low dynamic range image.
  • for a low dynamic range image of size M×N, downsampling by a factor of s yields an image of size (M/s)×(N/s), where s is a common divisor of M and N; that is, the image within each s×s window of the original image becomes one pixel, whose value is the mean of all pixels in the window.
  • in this case, the number of pixels of the image is reduced to 1/s² of the original.
  • for 2× bicubic downsampling, to obtain the value of each pixel (X, Y) in the target image B one must first find the pixel (x, y) in the original image A corresponding to (X, Y), and then use the 16 pixels of A closest to (x, y) as the parameters for computing the pixel value at B(X, Y): the Bicubic basis function gives the weights a_ij of the 16 pixels, and the value of pixel (X, Y) of B equals the weighted sum of the 16 pixels.
  • the low dynamic range image is downsampled and shrunk, so the image size is reduced, which can speed up the subsequent steps; for example, it can improve the efficiency of the structural feature analysis of the low dynamic range image in S2 and increase the training speed of the image processing models.
  • the number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model.
  • since the image size has been reduced, inputting the downsampled low dynamic range image into a smaller network model makes the image size better match the size of the network model, and also increases the training speed of the second initial image processing model.
  • the first initial image processing model is a ResNet network model or a DenseNet network model; for example, the first initial image processing model is a 34-layer, 50-layer or even 101-layer ResNet network model.
  • the second initial image processing model is a VGG network model, GoogleNet Inception V1 network model, or MobileNets network model, for example, the second initial image processing model is a 16-layer or 19-layer VGG network model.
  • the ResNet network model is a residual network, which is characterized by easy optimization and the ability to increase accuracy by increasing considerable depth. Its internal residual block uses skip connections, which alleviates the problem of gradient disappearance caused by increasing depth in deep neural networks.
  • the VGG network uses multiple convolutional layers with smaller kernels (3×3) in place of a convolutional layer with a larger kernel (7×7); on the one hand this reduces the number of parameters, and on the other hand it amounts to more nonlinear mappings, which increases the fitting/expressive capacity of the network.
  • the second initial image processing model and the first initial image processing model may also use the same network model.
  • S1 and S2 may be executed simultaneously; S1 may be executed first, and then S2 may be executed, or S2 may be executed first, and then S1 may be executed.
  • the first high dynamic range image and the first coefficient are respectively obtained from the above S1 and S2, and S3 can be executed.
  • S3 includes: multiplying the first high dynamic range image by the first coefficient to generate the second high dynamic range image.
  • the image used in the present disclosure is a color image
  • the image data is a three-dimensional matrix with three channels, namely the first channel (R), the second channel (G) and the third channel (B); for example, the low dynamic range image is an M×N×3 picture, where M and N are the numbers of pixel rows and columns of the image and 3 is the RGB three channels of each pixel.
  • the pictures mentioned in this disclosure, i.e., the low dynamic range image, the first high dynamic range image and the second high dynamic range image, include a fourth channel in addition to the image's three RGB channels.
  • the fourth channel can reflect the brightness information of the picture.
  • the fourth channel is a pixel maximum of the image.
  • before inputting the low dynamic range image into the first initial image processing model, the method also includes: extracting the pixel maximum value of the low dynamic range image as the fourth channel and performing channel concatenation.
  • in S1, the four-channel low dynamic range image is input into the first initial image processing model.
  • the pixel maximum value is a single value, which is expanded into an M×N×1 matrix; this M×N×1 matrix is channel-concatenated with the three channels of the low dynamic range image (image size M×N×3) to obtain a low dynamic range image with an image size of M×N×4.
  • the generated first high dynamic range image is a three-dimensional matrix.
  • the first high dynamic range image also includes four channels.
  • the weight coefficient of the first high dynamic range image is a 1×1×3 matrix, where the number 3 corresponds to the three RGB channels of the image itself.
  • the first high dynamic range image is multiplied by the first coefficient to obtain a final result, that is, to output a second high dynamic range image
  • the second high dynamic range image includes four channels.
  • the first coefficient is the weight coefficient generated by analyzing the structural features of the low dynamic range image by the second initial image processing model
  • the second high dynamic range image is based on the first high dynamic range image with its details repaired and supplemented through the adjustment by the weight coefficient, so compared with the first high dynamic range image it is closer to the real high dynamic range image corresponding to the low dynamic range image and can reflect the illuminance information of the real scene more faithfully.
  • the value of the loss function is calculated, and the loss value of the low dynamic range image in the first initial image processing model and the second initial image processing model can be obtained.
  • the loss function is obtained according to the data pair of the second high dynamic range image and the real high dynamic range image.
  • since the low dynamic range image, the first high dynamic range image and the second high dynamic range image include a fourth channel in addition to the image's three RGB channels, the loss function is related not only to the values of the image's three RGB channels but also to the value of the fourth channel, i.e., to the pixel maximum value of the image.
  • the loss function includes an L1 loss and a tone-mapping loss for the RGB data pair of the second high dynamic range image and the real high dynamic range image, and an L1 loss for the pixel-maximum data pair of the two images; the formula for the loss function Loss is given in the original filing as an image (not reproduced here).
  • in the formula, one symbol denotes the RGB data of the second high dynamic range image and I_gt denotes the RGB data of the real high dynamic range image; two further symbols denote the pixel-maximum data of the second high dynamic range image and of the real high dynamic range image, respectively.
  • the coefficients in this formula are preset.
  • because the extreme values of an image differ under different exposure conditions, the pixel maximum value of the image is extracted during training as the image's fourth channel, and the constraint on the pixel maxima of the second high dynamic range image and the real high dynamic range image is added to the loss function; this further constrains the training process, improves the training accuracy of the first initial image processing model and the second initial image processing model in a more targeted way, optimizes the training of the network models, and improves the high dynamic range reconstruction performance of the finally obtained first target image processing model and second target image processing model.
  • the network optimizer is the Adam optimizer, with a learning rate of 1e-4.
  • the first initial image processing model and the second initial image processing model are trained multiple times; during training, the parameters in the two models are continuously updated so that the second high dynamic range image output each time comes ever closer to the dynamic range of the real scene, and the value of the loss function gradually decreases until it no longer decreases, i.e., the loss function converges.
  • in this way, the first initial image processing model and the second initial image processing model establish a mapping relationship from a single low dynamic range image to a high dynamic range image; the first initial image processing model and the second initial image processing model obtained in the last training iteration serve as the trained first target image processing model and second target image processing model.
  • with the trained first target image processing model and second target image processing model, the reconstruction from a low dynamic range image to a high dynamic range image can be completed directly in the subsequent image generation process.
  • the image processing model training method combines the first initial image processing model and the second initial image processing model and trains the two network models jointly: the training input fed into the first initial image processing model generates the first high dynamic range image, while the second initial image processing model generates the first coefficient, the weight coefficient of the first high dynamic range image obtained by structural feature analysis of the low dynamic range image.
  • through the first coefficient, the details of the first high dynamic range image can be restored, and the final high dynamic range image is obtained from the outputs of the two models, so that it is closer to the real high dynamic range image; this guarantees the fidelity of the finally trained first target image processing model and second target image processing model and ensures the high dynamic range reconstruction quality of the images.
  • moreover, the second initial image processing model adopts a small network, which improves training efficiency and reduces working time.
  • some embodiments of the present disclosure also provide a method for generating a high dynamic range image, including:
  • the first target image processing model is trained through the image processing model training method introduced above.
  • the image to be processed is any low dynamic range image that requires high dynamic range reconstruction.
  • the second coefficient is a weight coefficient generated by analyzing the structural features of the image to be processed by the second target image processing model.
  • the second coefficient enables detail supplementation and restoration of the first processed image.
  • the second target image processing model is trained through the image processing model training method introduced above.
  • before inputting the image to be processed into the second target image processing model to generate the weight coefficient of the high dynamic range image, the method also includes: S20', performing downsampling processing on the image to be processed to generate a downsampled image to be processed.
  • the number of layers of the second initial image processing model is smaller than that of the first initial image processing model. In some embodiments, the number of layers of the second target image processing model is smaller than that of the first target image processing model.
  • the first target image processing model is a ResNet network model or a DenseNet network model
  • the second target image processing model is a VGG network model, a GoogleNet Inception V1 network model, or a MobileNets network model.
  • the second target image processing model adopts a smaller network model, which can increase the processing rate of the image.
  • the second processed image is a high dynamic range image, and the image quality of the second processed image is higher than that of the first processed image, and is closer to the real scene.
  • S30 includes: multiplying the data of the first processed image by the second coefficient to generate the second processed image.
  • the high dynamic range image generation method uses the first target image processing model and the second target image processing model, trained by the image processing model training method described above, to process the low dynamic range image.
  • the low dynamic range image undergoes high dynamic range reconstruction through the first target image processing model, which outputs the first processed image; at the same time, the second target image processing model analyzes the structural features of the low dynamic range image and generates a weight coefficient (the second coefficient), which supplements and repairs the details of the first processed image, so that the second processed image obtained from the first processed image and the weight coefficient comes closer to the effect of a real high dynamic range image.
  • thus, with the trained first target image processing model and second target image processing model, a low dynamic range image can be reconstructed into as realistic a high dynamic range image as possible.
  • the method improves the fidelity of the generated high dynamic range images and can restore the illuminance information of the real scene more accurately.
  • some embodiments of the present disclosure also provide an electronic device 10, which includes: a processor 1, a memory 2, and a computer program stored in the memory 2 and runnable on the processor 1.
  • the processor 1 is configured to input a low dynamic range image into a first initial image processing model, perform high dynamic range reconstruction processing on the low dynamic range image, and generate a first high dynamic range image.
  • the processor 1 is further configured to input the low dynamic range image into the second initial image processing model to generate the first coefficients.
  • the processor 1 is further configured to generate a second high dynamic range image according to the data of the first high dynamic range image and the first coefficient.
  • the processor 1 is further configured to generate a loss function according to the data pair of the second high dynamic range image and the real high dynamic range image; wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image.
  • the processor 1 is also configured to perform training on the first initial image processing model and the second initial image processing model using a loss function.
  • the memory 2 is configured to store data of a first high dynamic range image, first coefficients, and data of a second high dynamic range image.
  • the processor 1 is further configured to multiply the data of the first high dynamic range image by the first coefficient to generate the second high dynamic range image.
  • the processor 1 is further configured to perform downsampling processing on the low dynamic range image to generate a downsampled low dynamic range image before inputting the low dynamic range image into the second initial image processing model.
  • the processor 1 is further configured to input the image to be processed into the first target image processing model, perform high dynamic range reconstruction processing on the image to be processed, and generate the first processed image; the image to be processed is a low dynamic range image.
  • the processor 1 is further configured to input the image to be processed into a second target image processing model to generate second coefficients.
  • the first target image processing model and the second target image processing model are trained through the image processing model training method provided by some embodiments of the present disclosure.
  • the processor 1 is further configured to generate a second processed image according to the first processed image and the second coefficients.
  • the memory 2 is configured to store the first processed image, the second coefficients and the second processed image.
  • the processor 1 is further configured to multiply the first processed image by the second coefficient to generate the second processed image.
  • the processor 1 is further configured to perform downsampling processing on the image to be processed before inputting the image to be processed into the second target image processing model to generate a downsampled image to be processed.
  • the processor can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, and can also be an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processor through a communication bus. Memory can also be integrated with the processor.
  • Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium).
  • Computer program instructions are stored in the computer-readable storage medium.
  • the processor is made to execute the image processing model training method provided in any of the above embodiments, and/or the high dynamic range image generation method provided in any of the above embodiments.
  • the computer-readable storage medium may include, but is not limited to: magnetic storage devices (such as hard disks, floppy disks, or magnetic tapes), optical disks (such as CDs (Compact Disks) and DVDs (Digital Versatile Disks)), and smart cards and flash memory devices (for example, EPROM (Erasable Programmable Read-Only Memory), cards, sticks, or key drives).
  • machine-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • Some embodiments of the present disclosure also provide a computer program product.
  • the computer program product includes computer program instructions; when the computer program instructions are executed on the computer, they cause the computer to perform one or more steps in the image processing model training method described in some of the above embodiments, and/or one or more steps in the image processing method described in some of the above embodiments.
  • Some embodiments of the present disclosure also provide a computer program.
  • when the computer program is executed on a computer, the computer program causes the computer to execute one or more steps in the image processing model training method as described in some of the above embodiments, and/or one or more steps in the image processing method described in some of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

An image processing model training method, comprising: inputting a low dynamic range image into a first initial image processing model, and performing high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image; inputting the low dynamic range image into a second initial image processing model to generate a first coefficient; generating a second high dynamic range image according to the first high dynamic range image and the first coefficient; generating a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image; and training the first initial image processing model and the second initial image processing model using the loss function.

Description

Image Processing Model Training Method and High Dynamic Range Image Generation Method

Technical Field

The present disclosure relates to the field of computer technology, and in particular to an image processing model training method, a high dynamic range image generation method, an electronic device, and a computer-readable storage medium.

Background

High Dynamic Range Imaging (HDRI) is an image representation technique for achieving a larger exposure range than ordinary digital images. High Dynamic Range (HDR) images can provide a larger brightness variation range and more light and dark details than ordinary digital images, which enables high dynamic range images to present brightness variation information closer to real scenes. At present, there are techniques for converting Low Dynamic Range (LDR) images into high dynamic range images so as to restore illuminance information approximating the real scene.
Summary

In one aspect, an image processing model training method is provided, including: inputting a low dynamic range image into a first initial image processing model, and performing high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image; inputting the low dynamic range image into a second initial image processing model to generate a first coefficient; generating a second high dynamic range image according to the first high dynamic range image and the first coefficient; generating a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image; and training the first initial image processing model and the second initial image processing model using the loss function.

In some embodiments, the first coefficient is a weight coefficient generated by the second initial image processing model through structural feature analysis of the low dynamic range image.

In some embodiments, the weight coefficient is a 1×1×3 matrix.

In some embodiments, the low dynamic range image, the first high dynamic range image and the second high dynamic range image each include a fourth channel in addition to the image's three RGB channels; the loss function is also related to the value of the fourth channel.

In some embodiments, the fourth channel is the pixel maximum value of the image. The loss function includes the L1 loss and the tone-mapping loss of the RGB data pair of the second high dynamic range image and the real high dynamic range image, and the L1 loss of the pixel-maximum data pair of the second high dynamic range image and the real high dynamic range image.

In some embodiments, generating the second high dynamic range image according to the first high dynamic range image and the first coefficient includes: multiplying the first high dynamic range image by the first coefficient to generate the second high dynamic range image.

In some embodiments, before the low dynamic range image is input into the second initial image processing model, the method further includes: downsampling the low dynamic range image to generate a downsampled low dynamic range image.

In some embodiments, the number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model.

In some embodiments, in the process of training the first initial image processing model and the second initial image processing model using the loss function, when the loss function has not converged, the parameters in the first initial image processing model and the second initial image processing model are updated; the network optimizer is the Adam optimizer with a learning rate of 1e-4.

In another aspect, a high dynamic range image generation method is provided, including: inputting an image to be processed into a first target image processing model, and performing high dynamic range reconstruction processing on the image to be processed to generate a first processed image, the image to be processed being a low dynamic range image; inputting the image to be processed into a second target image processing model to generate a second coefficient, wherein the first target image processing model and the second target image processing model are trained by the image processing model training method described in any one of the above; and generating a second processed image according to the first processed image and the second coefficient.

In some embodiments, the second coefficient is a weight coefficient generated by the second target image processing model through structural feature analysis of the image to be processed.

In some embodiments, generating the second processed image according to the first processed image and the second coefficient includes: multiplying the first processed image by the second coefficient to generate the second processed image.

In some embodiments, before the image to be processed is input into the second target image processing model, the method further includes: downsampling the image to be processed to generate a downsampled image to be processed.

In some embodiments, the number of layers of the second target image processing model is smaller than the number of layers of the first target image processing model.

In yet another aspect, an electronic device is provided, including a processor, a memory, and a computer program stored in the memory and runnable on the processor. The processor is configured to input a low dynamic range image into a first initial image processing model and perform high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image; the processor is further configured to input the low dynamic range image into a second initial image processing model to generate a first coefficient; the processor is further configured to generate a second high dynamic range image according to the first high dynamic range image and the first coefficient.

The processor is further configured to generate a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image; the processor is further configured to train the first initial image processing model and the second initial image processing model using the loss function; the memory is configured to store the data of the first high dynamic range image, the first coefficient, and the data of the second high dynamic range image.

In some embodiments, the processor is further configured to multiply the first high dynamic range image by the first coefficient to generate the second high dynamic range image. The processor is further configured to downsample the low dynamic range image before it is input into the second initial image processing model, generating a downsampled low dynamic range image.

In some embodiments, the processor is further configured to input an image to be processed into a first target image processing model and perform high dynamic range reconstruction processing on the image to be processed to generate a first processed image, the image to be processed being a low dynamic range image; the processor is further configured to input the image to be processed into a second target image processing model to generate a second coefficient, wherein the first target image processing model and the second target image processing model are trained by the image processing model training method described in any one of the above; the processor is further configured to generate a second processed image according to the first processed image and the second coefficient. The memory is configured to store the first processed image, the second coefficient, and the second processed image.

In some embodiments, the processor is further configured to multiply the first processed image by the second coefficient to generate the second processed image. The processor is further configured to downsample the image to be processed before it is input into the second target image processing model, generating a downsampled image to be processed.

In yet another aspect, a computer-readable storage medium is provided, storing computer program instructions that, when run on a processor, cause the processor to perform one or more steps in the image processing model training method described in some of the above embodiments and/or one or more steps in the image processing method described in some of the above embodiments.

In yet another aspect, a computer program product is provided. The computer program product includes computer program instructions that, when executed on a computer, cause the computer to perform one or more steps in the image processing model training method described in some of the above embodiments and/or one or more steps in the image processing method described in some of the above embodiments.

In yet another aspect, a computer program is provided. When executed on a computer, the computer program causes the computer to perform one or more steps in the image processing model training method described in some of the above embodiments and/or one or more steps in the image processing method described in some of the above embodiments.
Brief Description of the Drawings

To describe the technical solutions in the present disclosure more clearly, the following briefly introduces the drawings needed in some embodiments of the present disclosure. Obviously, the drawings in the following description are only drawings of some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them. In addition, the drawings in the following description may be regarded as schematic diagrams and are not limitations on the actual size of the products, the actual flow of the methods, the actual timing of the signals, etc. involved in the embodiments of the present disclosure.

FIG. 1 is a flowchart of a high dynamic range image training method according to some embodiments;

FIG. 2 is another flowchart of a high dynamic range image training method according to some embodiments;

FIG. 3 is a step diagram of a high dynamic range image training method according to some embodiments;

FIG. 4 is a flowchart of a high dynamic range image generation method according to some embodiments;

FIG. 5 is another flowchart of a high dynamic range image generation method according to some embodiments;

FIG. 6 is a step diagram of a high dynamic range image generation method according to some embodiments;

FIG. 7 is a structural diagram of an electronic device according to some embodiments.

Detailed Description

The technical solutions in some embodiments of the present disclosure will be described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present disclosure fall within the protection scope of the present disclosure.

Unless the context requires otherwise, throughout the specification and claims the term "comprise" and its other forms, such as the third-person singular "comprises" and the present participle "comprising", are interpreted in an open, inclusive sense, i.e., "including, but not limited to". In the description of the specification, the terms "one embodiment", "some embodiments", "exemplary embodiments", "example", "specific example" or "some examples" are intended to indicate that a specific feature, structure, material or characteristic related to the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be included in any one or more embodiments or examples in any suitable manner.

Hereinafter, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present disclosure, unless otherwise specified, "plurality" means two or more.

In describing some embodiments, the expressions "coupled" and "connected" and their derivatives may be used. For example, the term "connected" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact with each other. As another example, the term "coupled" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact. However, the terms "coupled" or "communicatively coupled" may also mean that two or more components are not in direct contact with each other but still cooperate or interact with each other. The embodiments disclosed herein are not necessarily limited to the contents herein.

"A and/or B" includes the following three combinations: A only, B only, and the combination of A and B.

As used herein, the term "if" is optionally interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined that ..." or "if [the stated condition or event] is detected" are optionally construed to mean "upon determining ..." or "in response to determining ..." or "upon detection of [the stated condition or event]" or "in response to detection of [the stated condition or event]", depending on the context.

The use of "adapted to" or "configured to" herein means open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps.

In addition, the use of "based on" means openness and inclusiveness, since a process, step, calculation or other action "based on" one or more stated conditions or values may, in practice, be based on additional conditions or values beyond those stated.

High dynamic range images can be used to describe real scenes with a wide range of brightness variation and can better show the optical characteristics of the highlight and low-dark regions of the scene. Images captured by ordinary cameras with a limited dynamic range are generally called low dynamic range images; to accurately reflect the real scene, the low dynamic range image needs to be converted into a high dynamic range image by certain techniques. In the related art, for a single low dynamic range image, a deep-learning-based high dynamic range reconstruction method passes the low dynamic range image through a preset image generation network to complete the conversion from the low dynamic range image to the high dynamic range image. However, the high dynamic range images obtained by the methods in the related art have low fidelity and cannot accurately restore the illuminance information of the real scene.
Based on this, some embodiments of the present disclosure provide an image processing model training method; as shown in FIG. 1, FIG. 2 and FIG. 3, the training method includes S1 to S7.

S1. Input a low dynamic range image into a first initial image processing model, and perform high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image.

Here, the low dynamic range image is a randomly selected low dynamic range image, and it has a corresponding real high dynamic range image, referred to as the real high dynamic range image. Exemplarily, the image data pair consisting of the low dynamic range image and its corresponding real high dynamic range image is selected from an existing public data set, and the low dynamic range image is used as a sample for training the first initial image processing model and the second initial image processing model.

S2. Input the low dynamic range image into a second initial image processing model to generate a first coefficient.

The first initial image processing model and the second initial image processing model are image processing networks, for example neural networks, with initially set parameters. The first initial image processing model has an initial mapping relationship from low dynamic range images to high dynamic range images and can convert a low dynamic range image into a high dynamic range image.

In some embodiments, in the above step, the second initial image processing model performs structural feature analysis on the low dynamic range image to generate a weight coefficient for the first high dynamic range image. The first coefficient is the weight coefficient generated by the second initial image processing model through structural feature analysis of the low dynamic range image. The first coefficient can supplement and repair details of the first high dynamic range image.

In some embodiments, the weight coefficient is a 1×1×3 matrix.

In some embodiments, as shown in FIG. 2, before the low dynamic range image is input into the second initial image processing model, the method further includes:

S2'. Downsample the low dynamic range image to generate a downsampled low dynamic range image.

Exemplarily, for a low dynamic range image of size M×N, downsampling by a factor of s yields an image of size (M/s)×(N/s), where s is a common divisor of M and N; that is, the image within each s×s window of the original image becomes one pixel, whose value is the mean of all pixels in the window. In this case, the number of pixels of the image is reduced to 1/s² of the original.

For example, 2× bicubic downsampling is applied to the low dynamic range image, yielding an image of size (M/2)×(N/2); the image within each 2×2 window of the original image becomes one pixel, whose value is given by the bicubic interpolation formula (reproduced in the original filing as an image, not shown here). Suppose the original image A has size m×n and the target image B, reduced by a factor of K (here 2), has size M×N, i.e., K = m/M. The value of every pixel of A is known, while the values of the pixels of B are unknown. To obtain the value of each pixel (X, Y) in the target image B, one must first find the pixel (x, y) in the original image A corresponding to (X, Y), and then use the 16 pixels of A nearest to (x, y) as the parameters for computing the pixel value at B(X, Y): the Bicubic basis function gives the weights a_ij of the 16 pixels, and the value of pixel (X, Y) of B equals the weighted sum of the 16 pixels.
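As a concrete illustration of this step (not part of the original filing), the following Python sketch performs the 2× bicubic downsampling with OpenCV; the file path and the scale factor of 2 are illustrative assumptions.

```python
import cv2

# Load the LDR image as float32 in [0, 1]; "ldr.png" is a placeholder path.
ldr = cv2.imread("ldr.png").astype("float32") / 255.0
M, N = ldr.shape[:2]

# 2x bicubic downsampling: each output pixel is a weighted sum of the 16
# nearest source pixels, with weights from the bicubic basis function.
ldr_down = cv2.resize(ldr, (N // 2, M // 2), interpolation=cv2.INTER_CUBIC)
```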
In S2, the downsampled low dynamic range image is input into the second initial image processing model for processing.

Downsampling the low dynamic range image shrinks the image, so the image size is reduced, which can speed up the subsequent steps; for example, it can improve the efficiency of the structural feature analysis of the low dynamic range image in S2 and increase the training speed of the image processing models.

In some embodiments, the number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model. In S2', since the low dynamic range image has been downsampled and its size reduced, inputting the downsampled low dynamic range image into a smaller network model makes the image size better match the size of the network model and also increases the training speed of the second initial image processing model.

Exemplarily, the first initial image processing model is a ResNet network model or a DenseNet network model; for example, the first initial image processing model is a 34-layer, 50-layer or even 101-layer ResNet network model. The second initial image processing model is a VGG network model, a GoogleNet Inception V1 network model, or a MobileNets network model; for example, the second initial image processing model is a 16-layer or 19-layer VGG network model.

The ResNet network model is a residual network, characterized by being easy to optimize and able to improve accuracy by adding considerable depth. Its internal residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks.

The VGG network uses multiple convolutional layers with smaller kernels (3×3) in place of a convolutional layer with a larger kernel (7×7); on the one hand this reduces the number of parameters, and on the other hand it amounts to more nonlinear mappings, which increases the fitting/expressive capacity of the network.

In some embodiments, the second initial image processing model and the first initial image processing model may also use the same network model.
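The filing specifies only that the second model is a shallower network whose output is a 1×1×3 weight matrix; it does not give an exact architecture. The following PyTorch sketch is one hypothetical way to realize such a weight branch; the layer widths and depths are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class WeightBranch(nn.Module):
    """Hypothetical shallow CNN standing in for the second model: it analyzes
    structural features of the (downsampled) LDR input and emits one weight
    per RGB channel, i.e. a 1x1x3 matrix per image."""
    def __init__(self, in_channels: int = 4):  # 4 = RGB + pixel-max channel
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims to 1x1
            nn.Conv2d(64, 3, 1),      # 3 outputs: one weight per RGB channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))  # shape (B, 3, 1, 1)
```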
It should be noted that the execution order of S1 and S2 is not limited: S1 and S2 may be executed simultaneously; S1 may be executed first and then S2, or S2 may be executed first and then S1.

With the first high dynamic range image and the first coefficient obtained from S1 and S2 respectively, S3 can be executed.

S3. Generate a second high dynamic range image according to the first high dynamic range image and the first coefficient.

In some embodiments, S3 includes: multiplying the first high dynamic range image by the first coefficient to generate the second high dynamic range image.

In some embodiments, the images used in the present disclosure are color images, and the image data is a three-dimensional matrix with three channels: the first channel (R), the second channel (G) and the third channel (B). For example, the low dynamic range image is an M×N×3 picture, where M and N are the numbers of pixel rows and columns of the image and 3 is the RGB three channels of each pixel.

In some embodiments, the pictures mentioned in this disclosure, i.e., the low dynamic range image, the first high dynamic range image and the second high dynamic range image, include a fourth channel in addition to the image's three RGB channels; the fourth channel can reflect the brightness information of the picture.

In some examples, the fourth channel is the pixel maximum value of the image.

In some embodiments, before the low dynamic range image is input into the first initial image processing model, the method further includes: extracting the pixel maximum value of the low dynamic range image as the fourth channel and performing channel concatenation; in S1, the four-channel low dynamic range image is input into the first initial image processing model.

Exemplarily, the pixel maximum value is a single value, which is expanded into an M×N×1 matrix; this M×N×1 matrix is channel-concatenated with the three channels of the low dynamic range image (image size M×N×3) to obtain a low dynamic range image with an image size of M×N×4.
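A minimal PyTorch sketch of this channel-concatenation step (illustrative, not from the filing): the per-image maximum is broadcast into an M×N×1 plane and concatenated onto the RGB channels.

```python
import torch

def add_max_channel(ldr: torch.Tensor) -> torch.Tensor:
    """ldr: (B, 3, M, N) RGB tensor. Returns a (B, 4, M, N) tensor whose
    fourth channel is the per-image pixel maximum, expanded spatially."""
    b, _, m, n = ldr.shape
    max_plane = ldr.amax(dim=(1, 2, 3), keepdim=True).expand(b, 1, m, n)
    return torch.cat([ldr, max_plane], dim=1)
```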
In S1, after processing by the first initial image processing model, the generated first high dynamic range image is a three-dimensional matrix; the first high dynamic range image also includes four channels.

In S2, after processing by the second initial image processing model, the weight coefficient of the first high dynamic range image is a 1×1×3 matrix, where the number 3 corresponds to the three RGB channels of the image itself.

In S3, multiplying the first high dynamic range image by the first coefficient gives the final result, i.e., the second high dynamic range image is output; the second high dynamic range image includes four channels. The first coefficient is the weight coefficient generated by the second initial image processing model through structural feature analysis of the low dynamic range image; on the basis of the first high dynamic range image, the second high dynamic range image has its details repaired and supplemented through the adjustment by the weight coefficient. Therefore, compared with the first high dynamic range image, the second high dynamic range image is closer to the real high dynamic range image corresponding to the low dynamic range image and can reflect the illuminance information of the real scene more faithfully.
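Continuing the sketches above, the multiplication in S3 is a simple broadcast of the (B, 3, 1, 1) weight over the RGB channels. Since the 1×1×3 weight covers only RGB, the sketch carries the fourth (pixel-max) channel through unscaled; that handling is an assumption, as the filing does not state it explicitly.

```python
import torch

# hdr1: (B, 4, M, N) first HDR image; w: (B, 3, 1, 1) weight coefficient.
hdr2_rgb = hdr1[:, :3] * w                          # per-channel scaling
hdr2 = torch.cat([hdr2_rgb, hdr1[:, 3:]], dim=1)    # second HDR image, 4 channels
```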
S4. Generate a loss function according to the data pair of the second high dynamic range image and the real high dynamic range image; the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image.

In this step, the value of the loss function is calculated, giving the loss of the low dynamic range image in the first initial image processing model and the second initial image processing model. The loss function is obtained from the data pair of the second high dynamic range image and the real high dynamic range image.

Since in the present disclosure the low dynamic range image, the first high dynamic range image and the second high dynamic range image include a fourth channel in addition to the image's three RGB channels, the loss function is related not only to the values of the image's three RGB channels but also to the value of the fourth channel, i.e., to the pixel maximum value of the image.

In some embodiments, the loss function includes the L1 loss and the tone-mapping loss of the RGB data pair of the second high dynamic range image and the real high dynamic range image, and the L1 loss of the pixel-maximum data pair of the two images. The formula for the loss function Loss is given in the original filing as an image (not reproduced here); in it, one symbol denotes the RGB data of the second high dynamic range image, I_gt denotes the RGB data of the real high dynamic range image, and two further symbols denote the pixel-maximum data of the second and the real high dynamic range image, respectively. The coefficients in the formula are preset.
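Because the exact formula survives only as an image in the filing, the following PyTorch sketch reconstructs a loss with the three described terms, roughly Loss = L1(RGB) + tone-mapping L1(RGB) + L1(pixel max). The μ-law tone-mapping operator and the unit weighting coefficients are assumptions, not values from the patent.

```python
import torch
import torch.nn.functional as F

MU = 5000.0  # tone-mapping compression constant (assumed)

def tonemap(x: torch.Tensor) -> torch.Tensor:
    # mu-law tone mapping, a common choice in HDR reconstruction losses.
    return torch.log1p(MU * x) / torch.log(torch.tensor(1.0 + MU))

def hdr_loss(pred_rgb, gt_rgb, pred_max, gt_max, w_tm=1.0, w_max=1.0):
    """L1 on RGB + tone-mapping loss on RGB + L1 on the pixel-max channel."""
    loss = F.l1_loss(pred_rgb, gt_rgb)
    loss = loss + w_tm * F.l1_loss(tonemap(pred_rgb), tonemap(gt_rgb))
    loss = loss + w_max * F.l1_loss(pred_max, gt_max)
    return loss
```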
Because the extreme values of an image differ under different exposure conditions, the pixel maximum value of the image is extracted during training as the image's fourth channel, and the constraint on the pixel maxima of the second high dynamic range image and the real high dynamic range image is added to the loss function. This further constrains the training process, improves the training accuracy of the first initial image processing model and the second initial image processing model in a more targeted way, optimizes the training of the network models, and improves the high dynamic range reconstruction performance of the finally obtained first target image processing model and second target image processing model.

S5. Train the first initial image processing model and the second initial image processing model using the loss function. Judge whether the value of the loss function has stopped decreasing; if it continues to decrease, execute S6; if it no longer decreases, execute S7.

S6. Update the parameters in the first initial image processing model and the second initial image processing model.

In some embodiments, in the process of training the first initial image processing model and the second initial image processing model using the loss function and updating their parameters, the network optimizer is the Adam optimizer with a learning rate of 1e-4.
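A one-line sketch of the stated optimizer setup, jointly covering both models (the variable names hdr_net and weight_branch are illustrative):

```python
import torch

optimizer = torch.optim.Adam(
    list(hdr_net.parameters()) + list(weight_branch.parameters()), lr=1e-4)
```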
Repeat the above steps S1 to S5 until the value of the loss function no longer decreases, then execute S7.

S7. Take the first initial image processing model and the second initial image processing model obtained in the last training iteration as the trained first target image processing model and second target image processing model.

In S5 to S7, the first initial image processing model and the second initial image processing model are trained multiple times according to the training strategy. During training, by continuously updating the parameters in the two models, the second high dynamic range image output each time comes ever closer to the dynamic range of the real scene, and the value of the loss function decreases step by step until it no longer decreases, i.e., the loss function converges. In this way, the first initial image processing model and the second initial image processing model establish a mapping from a single low dynamic range image to a high dynamic range image. The first initial image processing model and the second initial image processing model obtained in the last training iteration are taken as the trained first target image processing model and second target image processing model; with the trained models, the reconstruction from a low dynamic range image to a high dynamic range image can be completed directly in the subsequent image generation process.
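Putting the sketches above together, one optimization step under the same assumptions might look as follows; downsample stands for the bicubic downsampling shown earlier, hdr_gt is assumed to carry its own pixel-max fourth channel, and the data loop and convergence test are omitted.

```python
def train_step(ldr, hdr_gt):
    """One optimization step over an (LDR, ground-truth HDR) data pair."""
    ldr4 = add_max_channel(ldr)                  # S1 input: RGB + max channel
    hdr1 = hdr_net(ldr4)                         # S1: first HDR image
    w = weight_branch(downsample(ldr4))          # S2: first coefficient
    hdr2_rgb = hdr1[:, :3] * w                   # S3: second HDR image (RGB)
    loss = hdr_loss(hdr2_rgb, hdr_gt[:, :3],     # S4: loss on RGB ...
                    hdr1[:, 3:], hdr_gt[:, 3:])  # ... and on pixel-max channel
    optimizer.zero_grad()
    loss.backward()                              # S6: update both models
    optimizer.step()
    return loss.item()
```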
The image processing model training method provided by some embodiments of the present disclosure combines the first initial image processing model and the second initial image processing model and trains the two network models jointly. The training input fed into the first initial image processing model generates the first high dynamic range image, while the second initial image processing model generates the first coefficient, the weight coefficient of the first high dynamic range image obtained by structural feature analysis of the low dynamic range image. Through the first coefficient, details of the first high dynamic range image can be restored, and the final high dynamic range image is obtained from the outputs of the two models, so that the final high dynamic range image is closer to the real high dynamic range image. This guarantees the fidelity of the finally trained first target image processing model and second target image processing model and ensures the high dynamic range reconstruction quality of the images. Moreover, before the training input enters the second initial image processing model it is downsampled, and the second initial image processing model uses a small network, which improves training efficiency and reduces working time.
As shown in FIGS. 4 to 6, some embodiments of the present disclosure further provide a high dynamic range image generation method, including:

S10. Input an image to be processed into a first target image processing model, and perform high dynamic range reconstruction processing on the image to be processed to generate a first processed image; the image to be processed is a low dynamic range image, and the first processed image is a high dynamic range image.

The first target image processing model is trained by the image processing model training method introduced above. The image to be processed is any low dynamic range image that requires high dynamic range reconstruction.

S20. Input the image to be processed into a second target image processing model to generate a second coefficient.

The second coefficient is the weight coefficient generated by the second target image processing model through structural feature analysis of the image to be processed. The second coefficient can supplement and repair details of the first processed image.

The second target image processing model is trained by the image processing model training method introduced above.

In some embodiments, as shown in FIG. 5, before the image to be processed is input into the second target image processing model to generate the weight coefficient of the high dynamic range image, the method further includes: S20'. Downsample the image to be processed to generate a downsampled image to be processed.

In S20, the downsampled image to be processed is input into the second target image processing model for processing.

For the downsampling of the image to be processed, reference may be made to the downsampling of the low dynamic range image described above, which is not repeated here.

The number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model; in some embodiments, the number of layers of the second target image processing model is smaller than the number of layers of the first target image processing model. Exemplarily, the first target image processing model is a ResNet network model or a DenseNet network model, and the second target image processing model is a VGG network model, a GoogleNet Inception V1 network model, or a MobileNets network model.

In this way, the image to be processed is downsampled to reduce the picture size, and the second target image processing model uses a smaller network model, which can increase the processing speed of the picture.

S30. Generate a second processed image according to the first processed image and the second coefficient.

The second processed image is a high dynamic range image, and its image quality is higher than that of the first processed image and closer to the real scene.

In some embodiments, S30 includes: multiplying the data of the first processed image by the second coefficient to generate the second processed image.

The high dynamic range image generation method provided by some embodiments of the present disclosure uses the first target image processing model and the second target image processing model, trained by the image processing model training method introduced above, to process the low dynamic range image. The low dynamic range image undergoes high dynamic range reconstruction through the first target image processing model, which outputs the first processed image; at the same time, the second target image processing model analyzes the structural features of the low dynamic range image and generates a weight coefficient (the second coefficient), which supplements and repairs the details of the first processed image, so that the second processed image obtained from the first processed image and the weight coefficient comes closer to the effect of a real high dynamic range image. Thus, with the trained first target image processing model and second target image processing model, a low dynamic range image can be reconstructed into as realistic a high dynamic range image as possible. The method improves the fidelity of the generated high dynamic range images and can restore the illuminance information of the real scene more accurately.
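Under the same assumptions as the training sketches above, inference (S10 to S30) reduces to a forward pass through the two trained models:

```python
import torch

@torch.no_grad()
def generate_hdr(ldr):
    """S10-S30: reconstruct an HDR image from a single LDR input."""
    ldr4 = add_max_channel(ldr)              # attach the pixel-max channel
    first = hdr_net(ldr4)                    # S10: first processed image
    w = weight_branch(downsample(ldr4))      # S20: second coefficient
    return first[:, :3] * w                  # S30: second processed image (RGB)
```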
As shown in FIG. 7, some embodiments of the present disclosure further provide an electronic device 10, which includes a processor 1, a memory 2, and a computer program stored in the memory 2 and runnable on the processor 1.

The processor 1 is configured to input a low dynamic range image into a first initial image processing model and perform high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image.

The processor 1 is further configured to input the low dynamic range image into a second initial image processing model to generate a first coefficient.

The processor 1 is further configured to generate a second high dynamic range image according to the data of the first high dynamic range image and the first coefficient.

The processor 1 is further configured to generate a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image.

The processor 1 is further configured to train the first initial image processing model and the second initial image processing model using the loss function.

The memory 2 is configured to store the data of the first high dynamic range image, the first coefficient, and the data of the second high dynamic range image.

In some embodiments, the processor 1 is further configured to multiply the data of the first high dynamic range image by the first coefficient to generate the second high dynamic range image.

The processor 1 is further configured to downsample the low dynamic range image before it is input into the second initial image processing model, generating a downsampled low dynamic range image.

In some embodiments, the processor 1 is further configured to input an image to be processed into a first target image processing model and perform high dynamic range reconstruction processing on the image to be processed to generate a first processed image; the image to be processed is a low dynamic range image.

The processor 1 is further configured to input the image to be processed into a second target image processing model to generate a second coefficient, where the first target image processing model and the second target image processing model are trained by the image processing model training method provided by some embodiments of the present disclosure.

The processor 1 is further configured to generate a second processed image according to the first processed image and the second coefficient.

The memory 2 is configured to store the first processed image, the second coefficient, and the second processed image.

In some embodiments, the processor 1 is further configured to multiply the first processed image by the second coefficient to generate the second processed image.

The processor 1 is further configured to downsample the image to be processed before it is input into the second target image processing model, generating a downsampled image to be processed.

Exemplarily, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a communication bus, or the memory may be integrated with the processor.

The beneficial effects of the above electronic device are the same as those of the image processing model training method and the image processing method described in some of the above embodiments and are not repeated here.

Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) storing computer program instructions that, when run on a processor, cause the processor to perform the image processing model training method provided by any of the above embodiments and/or the high dynamic range image generation method provided by any of the above embodiments.

Exemplarily, the computer-readable storage medium may include, but is not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or magnetic tapes), optical discs (e.g., CDs (Compact Disks) and DVDs (Digital Versatile Disks)), and smart cards and flash memory devices (e.g., EPROM (Erasable Programmable Read-Only Memory), cards, sticks, or key drives).

The various computer-readable storage media described in the present disclosure may represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.

Some embodiments of the present disclosure further provide a computer program product. The computer program product includes computer program instructions that, when executed on a computer, cause the computer to perform one or more steps in the image processing model training method described in some of the above embodiments and/or one or more steps in the image processing method described in some of the above embodiments.

Some embodiments of the present disclosure further provide a computer program. When executed on a computer, the computer program causes the computer to perform one or more steps in the image processing model training method described in some of the above embodiments and/or one or more steps in the image processing method described in some of the above embodiments.

The beneficial effects of the above computer-readable storage medium, computer program product and computer program are the same as those of the image processing model training method and the image processing method described in some of the above embodiments and are not repeated here.

The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art who conceives of changes or substitutions within the technical scope disclosed by the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

  1. An image processing model training method, comprising:
    inputting a low dynamic range image into a first initial image processing model, and performing high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image;
    inputting the low dynamic range image into a second initial image processing model to generate a first coefficient;
    generating a second high dynamic range image according to the first high dynamic range image and the first coefficient;
    generating a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image; and
    training the first initial image processing model and the second initial image processing model using the loss function.
  2. The image processing model training method according to claim 1, wherein the first coefficient is a weight coefficient generated by the second initial image processing model through structural feature analysis of the low dynamic range image.
  3. The image processing model training method according to claim 2, wherein the weight coefficient is a 1×1×3 matrix.
  4. The image processing model training method according to any one of claims 1 to 3, wherein the low dynamic range image, the first high dynamic range image and the second high dynamic range image include a fourth channel in addition to the image's three RGB channels;
    the loss function is also related to the value of the fourth channel.
  5. The image processing model training method according to claim 4, wherein the fourth channel is the pixel maximum value of the image;
    the loss function includes the L1 loss and the tone-mapping loss of the RGB data pair of the second high dynamic range image and the real high dynamic range image, and the L1 loss of the pixel-maximum data pair of the second high dynamic range image and the real high dynamic range image.
  6. The image processing model training method according to any one of claims 1 to 5, wherein generating the second high dynamic range image according to the first high dynamic range image and the first coefficient comprises:
    multiplying the first high dynamic range image by the first coefficient to generate the second high dynamic range image.
  7. The image processing model training method according to claim 1, wherein before the low dynamic range image is input into the second initial image processing model, the method further comprises:
    downsampling the low dynamic range image to generate a downsampled low dynamic range image.
  8. The image processing model training method according to claim 7, wherein the number of layers of the second initial image processing model is smaller than the number of layers of the first initial image processing model.
  9. The image processing model training method according to any one of claims 1 to 7, wherein in the process of training the first initial image processing model and the second initial image processing model using the loss function, when the loss function has not converged, the parameters in the first initial image processing model and the second initial image processing model are updated; the network optimizer is the Adam optimizer with a learning rate of 1e-4.
  10. A high dynamic range image generation method, comprising:
    inputting an image to be processed into a first target image processing model, and performing high dynamic range reconstruction processing on the image to be processed to generate a first processed image, the image to be processed being a low dynamic range image;
    inputting the image to be processed into a second target image processing model to generate a second coefficient, wherein the first target image processing model and the second target image processing model are trained by the image processing model training method according to any one of claims 1 to 9; and
    generating a second processed image according to the first processed image and the second coefficient.
  11. The high dynamic range image generation method according to claim 10, wherein the second coefficient is a weight coefficient generated by the second target image processing model through structural feature analysis of the image to be processed.
  12. The high dynamic range image generation method according to claim 10 or 11, wherein generating the second processed image according to the first processed image and the second coefficient comprises:
    multiplying the first processed image by the second coefficient to generate the second processed image.
  13. The high dynamic range image generation method according to claim 10, wherein before the image to be processed is input into the second target image processing model, the method further comprises:
    downsampling the image to be processed to generate a downsampled image to be processed.
  14. The high dynamic range image generation method according to claim 13, wherein the number of layers of the second target image processing model is smaller than the number of layers of the first target image processing model.
  15. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and runnable on the processor;
    the processor is configured to input a low dynamic range image into a first initial image processing model and perform high dynamic range reconstruction processing on the low dynamic range image to generate a first high dynamic range image;
    the processor is further configured to input the low dynamic range image into a second initial image processing model to generate a first coefficient;
    the processor is further configured to generate a second high dynamic range image according to the first high dynamic range image and the first coefficient;
    the processor is further configured to generate a loss function according to the data pair of the second high dynamic range image and a real high dynamic range image, wherein the real high dynamic range image is the real high dynamic range image corresponding to the low dynamic range image;
    the processor is further configured to train the first initial image processing model and the second initial image processing model using the loss function;
    the memory is configured to store the data of the first high dynamic range image, the first coefficient, and the data of the second high dynamic range image.
  16. The electronic device according to claim 15, wherein the processor is further configured to multiply the data of the first high dynamic range image by the first coefficient to obtain the data of the second high dynamic range image, generating the second high dynamic range image;
    the processor is further configured to downsample the low dynamic range image before it is input into the second initial image processing model, generating a downsampled low dynamic range image.
  17. The electronic device according to claim 15, wherein the processor is further configured to input an image to be processed into a first target image processing model and perform high dynamic range reconstruction processing on the image to be processed to generate a first processed image, the image to be processed being a low dynamic range image;
    the processor is further configured to input the image to be processed into a second target image processing model to generate a second coefficient, wherein the first target image processing model and the second target image processing model are trained by the image processing model training method according to any one of claims 1 to 9;
    the processor is further configured to generate a second processed image according to the first processed image and the second coefficient;
    the memory is configured to store the first processed image, the second coefficient and the second processed image.
  18. The electronic device according to claim 17, wherein the processor is further configured to multiply the first processed image by the second coefficient to generate the second processed image;
    the processor is further configured to downsample the image to be processed before it is input into the second target image processing model, generating a downsampled image to be processed.
  19. A computer-readable storage medium storing computer program instructions that, when run on a processor, cause the processor to perform the image processing model training method according to any one of claims 1 to 9 and/or the high dynamic range image generation method according to any one of claims 10 to 14.
PCT/CN2021/119180 2021-09-17 2021-09-17 Image processing model training method and high dynamic range image generation method WO2023039863A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/119180 WO2023039863A1 (zh) 2021-09-17 2021-09-17 Image processing model training method and high dynamic range image generation method
CN202180002597.1A CN116157825A (zh) 2021-09-17 2021-09-17 Image processing model training method and high dynamic range image generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/119180 WO2023039863A1 (zh) 2021-09-17 2021-09-17 Image processing model training method and high dynamic range image generation method

Publications (1)

Publication Number Publication Date
WO2023039863A1 (zh)

Family

ID=85602331

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119180 WO2023039863A1 (zh) 2021-09-17 2021-09-17 Image processing model training method and high dynamic range image generation method

Country Status (2)

Country Link
CN (1) CN116157825A (zh)
WO (1) WO2023039863A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9697592B1 (en) * 2015-12-30 2017-07-04 TCL Research America Inc. Computational-complexity adaptive method and system for transferring low dynamic range image to high dynamic range image
US20200134787A1 (en) * 2017-06-28 2020-04-30 Huawei Technologies Co., Ltd. Image processing apparatus and method
CN110163808A (zh) * 2019-03-28 2019-08-23 西安电子科技大学 一种基于卷积神经网络的单帧高动态成像方法
CN111709900A (zh) * 2019-10-21 2020-09-25 上海大学 一种基于全局特征指导的高动态范围图像重建方法
CN113096021A (zh) * 2019-12-23 2021-07-09 中国移动通信有限公司研究院 一种图像处理方法、装置、设备及存储介质
CN111292264A (zh) * 2020-01-21 2020-06-16 武汉大学 一种基于深度学习的图像高动态范围重建方法

Also Published As

Publication number Publication date
CN116157825A (zh) 2023-05-23


Legal Events

Code  Title / Description
WWE   WIPO information: entry into national phase. Ref document number: 17910020; country of ref document: US.
121   Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21957144; country of ref document: EP; kind code of ref document: A1.
NENP  Non-entry into the national phase. Ref country code: DE.