WO2022266955A1 - Image decoding method and apparatus, image processing method and apparatus, and device - Google Patents


Info

Publication number
WO2022266955A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature information
information
channel
feature
image
Prior art date
Application number
PCT/CN2021/102173
Other languages
French (fr)
Chinese (zh)
Inventor
元辉
姜世奇
杨烨
李明
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180097934.XA (published as CN117441186A)
Priority to PCT/CN2021/102173 (published as WO2022266955A1)
Publication of WO2022266955A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of image processing, and in particular to an image decoding and processing method, device and equipment.
  • Dynamic range is a term used to define how wide a range of tonal detail a camera can capture in an image, usually the range from the lowest value to the highest value before overflow. Simply put, it describes the ratio between the brightest and darkest tones a camera can record in a single frame. The larger the dynamic range, the more likely it is that information in highlights and shadows is preserved.
  • Embodiments of the present application provide an image decoding and processing method, device, and equipment to reduce the cost of converting a low dynamic range image into a high dynamic range image.
  • the embodiment of the present application provides an image decoding method, including:
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, where the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module among the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the present application provides an image processing method, including:
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, where the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the LDR image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR image to obtain the (N-i+1)-th second feature information of the LDR image; the HDR image of the LDR image is determined according to the second feature information output by the last decoding module among the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the present application provides a model training method, including:
  • feature extraction is performed on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image to obtain the (N-i+1)-th second feature information of the LDR training image;
  • an image decoding device configured to execute the method in the above first aspect or its implementations.
  • the image decoding device includes a functional unit configured to execute the method in the above first aspect or each implementation manner thereof.
  • a decoder including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
  • an image processing device configured to execute the method in the above-mentioned second aspect or various implementations thereof.
  • the device includes a functional unit configured to execute the method in the above second aspect or each implementation manner thereof.
  • an image processing device including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
  • a model training device configured to execute the method in the above third aspect or various implementations thereof.
  • the model training device includes a functional unit for executing the method in the above third aspect or its various implementations.
  • a model training device including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above third aspect or its various implementations.
  • a chip configured to implement any one of the foregoing first to third aspects or the method in each implementation manner thereof.
  • the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes the method in any one of the above-mentioned first to third aspects or any of the implementations thereof.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to third aspects or the method in each implementation manner thereof.
  • a twelfth aspect provides a computer program product, including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to third aspects or the method in each implementation manner.
  • a thirteenth aspect provides a computer program, which, when running on a computer, causes the computer to execute any one of the above first to third aspects or the method in each implementation manner.
  • the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series, where the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module among the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • In this way, LDR images can be converted into HDR images, and HDR image conversion can be realized without increasing the cost of data acquisition, encoding, transmission, storage, and the like, thereby improving the efficiency of HDR image conversion and reducing the cost of obtaining HDR images.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
  • Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • FIG. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application.
  • FIG. 5A is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 5B is a schematic network diagram of a convolution block involved in an embodiment of the present application.
  • FIG. 5C is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 5D is a network diagram of a convolutional attention module involved in an embodiment of the present application.
  • FIG. 5E is a network diagram of a channel attention module involved in an embodiment of the present application.
  • FIG. 5F is a network schematic diagram of a spatial attention module involved in an embodiment of the present application.
  • FIG. 5G is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application.
  • FIG. 7 is a network diagram of a spatial attention module involved in an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an image decoding device provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the present application can be applied to the technical field of point cloud upsampling, for example, can be applied to the technical field of point cloud compression.
  • the application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc.
  • the solution of the present application can be combined with audio and video coding standards (audio video coding standard, AVS for short), for example, the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard and the H.266/versatile video coding (VVC) standard.
  • the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
  • the video codec system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function
  • the decoding device 120 can be understood as a device having a video decoding function; that is, the embodiments of the present application cover a wide range of devices for the encoding device 110 and the decoding device 120, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
  • encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
  • the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
  • the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
  • the decoding device 120 may acquire encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120, such as a web server (eg, for a website), a file transfer protocol (FTP) server, and the like.
  • the encoding device 110 includes a video encoder 112 and an output interface 113 .
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113 .
  • the video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, or a computer graphics system, where the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more pictures or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoding information may include encoded image data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • An SPS may contain parameters that apply to one or more sequences.
  • a PPS may contain parameters applied to one or more images.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
  • the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
  • the decoding device 120 includes an input interface 121 and a video decoder 122 .
  • the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
  • the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
  • the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding.
  • Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, Y means brightness (Luma), Cb (U) means blue chroma, Cr (V) means red chroma, U and V are expressed as chroma (Chroma) for describing color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chroma components (YYYYCbCr)
  • 4:2:2 means that every 4 pixels have 4 luminance components and 4 chroma components (YYYYCbCrCbCr)
  • 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
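As an illustration of the subsampling ratios above, the following minimal helper (a sketch added for this description, not part of the patent) computes the size of one chroma plane for a given luma resolution.

```python
def chroma_plane_size(width, height, fmt):
    """Toy helper: size of one chroma plane (Cb or Cr) for the subsampling formats above."""
    if fmt == "4:2:0":
        return width // 2, height // 2   # 2 chroma samples per 4 luma samples
    if fmt == "4:2:2":
        return width // 2, height        # 4 chroma samples per 4 luma samples
    if fmt == "4:4:4":
        return width, height             # full-resolution chroma
    raise ValueError(f"unknown format: {fmt}")
```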
  • the video encoder 200 reads video data, and for each frame of image in the video data, divides the frame into several coding tree units (coding tree unit, CTU); a CTU may also be referred to as a "largest coding unit" (Largest Coding Unit, LCU for short) or a "coding tree block" (coding tree block, CTB for short).
  • Each CTU may be associated with a pixel block of equal size within the image.
  • Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
  • each CTU may be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128×128, 64×64, 32×32 and so on.
  • a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
  • the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that coding, prediction, and transformation are separated, and processing is more flexible.
  • a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
  • the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders may support 2N×2N or N×N PU sizes for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N or similarly sized symmetric PUs for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
  • the current block may be called a current coding unit (CU) or a current prediction unit (PU).
  • a predicted block may also be referred to as a predicted block to be encoded or an image predicted block, and a reconstructed block to be encoded may also be referred to as a reconstructed block or an image reconstructed block to be encoded.
  • the prediction unit 210 includes an inter prediction unit 211 and an intra prediction unit 212 . Because there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Due to the strong similarity between adjacent frames in video, the inter-frame prediction method is used in video coding and decoding technology to eliminate time redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • the inter-frame prediction can refer to image information of different frames.
  • the inter-frame prediction uses motion information to find a reference block from the reference frame, and generates a prediction block according to the reference block to eliminate temporal redundancy;
  • Frames used for inter-frame prediction may be P frames and/or B frames, P frames refer to forward predictive frames, and B frames refer to bidirectional predictive frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can be an integer pixel or a sub-pixel. If the motion vector is sub-pixel, then it is necessary to use interpolation filtering in the reference frame to make the required sub-pixel block.
  • the block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called a reference block.
  • Some technologies will directly use the reference block as a prediction block, and some technologies will further process the reference block to generate a prediction block. Reprocessing and generating a prediction block based on a reference block can also be understood as taking the reference block as a prediction block and then processing and generating a new prediction block based on the prediction block.
  • inter-frame prediction methods include: geometric partitioning mode (GPM) in the VVC video codec standard, and angular weighted prediction (AWP) in the AVS3 video codec standard. These two inter-frame prediction modes have something in common in principle.
  • the intra-frame prediction unit 212 only refers to the information of the same frame image, and predicts the pixel information in the current block to be encoded, so as to eliminate spatial redundancy.
  • a frame used for intra prediction may be an I frame.
  • the intra prediction method further includes a multiple reference line intra prediction method (multiple reference line, MRL).
  • MRL can use more reference pixels to improve coding efficiency.
  • mode 0 (vertical) copies the reference pixels above the current block into the current block in the vertical direction as the prediction values
  • mode 1 (horizontal) copies the reference pixels on the left into the current block in the horizontal direction as the prediction values
  • mode 2 (DC) uses the average value of the 8 reference points A to D and I to L as the prediction value for all points
  • modes 3 to 8 copy the reference pixels to the corresponding positions of the current block along a certain angle. Because some positions of the current block may not correspond exactly to a reference pixel, it may be necessary to use a weighted average of the reference pixels, i.e., sub-pixels interpolated from the reference pixels. A small sketch of the first three modes is given below.
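The following toy sketch (not part of the patent; the 4×4 block size and the A–D/I–L reference-sample labels follow the description above) shows how the vertical, horizontal and DC modes build a prediction block.

```python
import numpy as np

def intra_predict_4x4(above, left, mode):
    """Toy sketch of the basic intra prediction modes described above for a 4x4 block.
    `above` holds the reference pixels A..D (row above the block),
    `left` holds the reference pixels I..L (column to the left)."""
    above = np.asarray(above, dtype=float)[:4]
    left = np.asarray(left, dtype=float)[:4]
    if mode == 0:                         # vertical: copy the pixels above downward
        return np.tile(above, (4, 1))
    if mode == 1:                         # horizontal: copy the left pixels rightward
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                         # DC: average of the 8 points A..D and I..L
        return np.full((4, 4), (above.sum() + left.sum()) / 8.0)
    raise NotImplementedError("angular modes 3-8 interpolate along a given direction")
```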
  • the intra prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, with a total of 67 prediction modes.
  • the intra-frame modes used by AVS3 include DC, Plane, Bilinear and 63 angle modes, a total of 66 prediction modes.
  • with more prediction modes, intra-frame prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
  • the residual unit 220 may generate a residual block of the CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, residual unit 220 may generate a residual block for a CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction blocks of the PUs of the CU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed block to be encoded associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
  • Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • the loop filtering unit 260 includes a deblocking filtering unit, a sample point adaptive compensation SAO unit, and an adaptive loop filtering ALF unit.
  • the decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
  • intra prediction unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • the basic flow of video coding involved in this application is as follows: at the coding end, the current image is divided into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, a difference between the predicted block and the original block of the current block, and the residual block may also be referred to as residual information.
  • the residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block.
  • the entropy encoding unit 280 receives the quantized transform coefficients output by the transform and quantization unit 230 , may perform entropy encoding on the quantized transform coefficients, and output a code stream.
  • the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the video encoder performs inverse quantization and inverse transformation on the quantized transform coefficients output by the transform and quantization unit 230 to obtain a residual block of the current block, and then adds the residual block of the current block to the prediction block of the current block, Get the reconstructed block of the current block.
  • reconstructed blocks corresponding to other blocks to be encoded in the current image can be obtained, and these reconstructed blocks are spliced to obtain a reconstructed image of the current image.
  • filter the reconstructed image for example, use ALF to filter the reconstructed image to reduce the difference between the pixel value of the pixel in the reconstructed image and the original pixel value of the pixel in the current image difference.
  • the filtered reconstructed image is stored in the decoded image buffer 270, which may serve as a reference frame for inter-frame prediction for subsequent frames.
  • the block division information determined by the encoder as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, and loop filtering, etc., are carried in the code stream when necessary.
  • the decoding end parses the code stream and, from the parsed information, determines the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering and other mode information or parameter information as the encoding end, so as to ensure that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
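To summarize the block-level flow just described, here is a minimal Python sketch; every callable name is a placeholder invented for illustration, not the API of any real codec.

```python
def encode_block(current_block, predict, transform_quantize, entropy_encode,
                 inverse_quantize_transform, decoded_image_buffer):
    """Toy sketch of the block-level hybrid coding flow described above.
    All callables are hypothetical placeholders."""
    pred = predict(current_block)                          # intra or inter prediction
    residual = current_block - pred                        # residual block (residual information)
    coeffs = transform_quantize(residual)                  # remove visually insensitive information
    bits = entropy_encode(coeffs)                          # code stream for this block
    recon = pred + inverse_quantize_transform(coeffs)      # encoder-side reconstructed block
    decoded_image_buffer.append(recon)                     # later filtered and used as a reference
    return bits
```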
  • Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
  • the video decoder 300 can receive code streams.
  • the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
  • the prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • the prediction unit 320 includes an intra prediction unit 321 and an inter prediction unit 322 .
  • Intra prediction unit 321 may perform intra prediction to generate a predictive block for a PU. Intra prediction unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. Intra prediction unit 321 may also determine an intra prediction mode for a PU from one or more syntax elements parsed from a codestream.
  • the inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
  • Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU, and obtain the reconstructed block to be encoded.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • the loop filtering unit 350 includes a deblocking filtering unit, a sample point adaptive compensation SAO unit, and an adaptive loop filtering ALF unit.
  • Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
  • the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction to generate the prediction block of the current block based on the prediction information.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image
  • the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
  • the decoded image can also be referred to as a reconstructed image.
  • the reconstructed image can be displayed by a display device, and on the other hand, it can be stored in the decoded image buffer 360 and serve as a reference frame for inter-frame prediction for subsequent frames.
  • the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
  • HDR: high dynamic range; LDR: low dynamic range.
  • An embodiment of the present application provides a model-based image processing method, which converts an LDR image into an HDR image through a model. That is, the encoding end encodes the LDR image to form a code stream and transmits it to the decoding end. After decoding the LDR image, the decoding end uses the model of the embodiment of the present application to dynamically convert the decoded LDR image to obtain an HDR image. HDR image conversion is achieved while reducing the cost of encoding, transmission, and storage.
  • the image processing method provided in the present application converts an LDR image into an HDR image by using a dynamic conversion model, and the dynamic conversion model is a piece of software code or a chip with data processing functions. Based on this, the training process of the dynamic conversion model is firstly introduced.
  • Fig. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application. As shown in Fig. 4, the training process includes:
  • the above-mentioned LDR training image is a randomly selected LDR training image in the training set, which includes a plurality of LDR training images
  • the training process of the dynamic conversion model using the LDR training images in the training set is an iterative process.
  • the first LDR training image is input into the dynamic conversion model to be trained, and the initial parameters of the dynamic conversion model are adjusted once to obtain the dynamic conversion model trained for the first time.
  • input the second LDR training image into the dynamic conversion model trained for the first time, adjust the parameters of the dynamic conversion model trained for the first time to obtain the dynamic conversion model trained for the second time, and, with reference to the above method, iterate in sequence until the training end condition of the dynamic conversion model is reached.
  • the training end condition of the dynamic conversion model includes that the number of training times reaches a preset number of times, or the loss reaches a preset loss.
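A minimal training-loop sketch of the iterative procedure described above follows; the Adam optimizer, the learning rate and the L1 loss are placeholders chosen for illustration (the loss function is not fixed at this point of the description).

```python
import torch
import torch.nn as nn

def train_dynamic_conversion_model(model, ldr_images, hdr_ground_truths,
                                   max_iterations=10000, loss_threshold=1e-3, lr=1e-4):
    """Adjust the model parameters once per LDR training image until a preset number
    of iterations or a preset loss is reached (the training end condition)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer choice is an assumption
    criterion = nn.L1Loss()                                   # placeholder loss for illustration
    for iteration, (ldr, hdr_gt) in enumerate(zip(ldr_images, hdr_ground_truths), start=1):
        pred_hdr = model(ldr.unsqueeze(0))                    # forward pass of the dynamic conversion model
        loss = criterion(pred_hdr, hdr_gt.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                      # one parameter adjustment per image
        if iteration >= max_iterations or loss.item() <= loss_threshold:
            break
    return model
```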
  • the methods for determining the initial parameters of the above-mentioned dynamic conversion model include but are not limited to the following:
  • the initial parameters of the dynamic conversion model may be preset values, or random values, or empirical values.
  • the second way is to obtain the pre-training parameters obtained during the pre-training of the pre-training model, and determine the pre-training parameters as the initial parameters of the dynamic conversion model.
  • the second way is to determine the pre-training parameters of the pre-training model as the initial parameters of the dynamic conversion model, which can reduce the number of training iterations of the dynamic conversion model and improve its training accuracy.
  • the pre-training model is the VGG-16 network model.
  • the true value of the HDR image of the above-mentioned LDR training image may be an HDR image generated by manually performing dynamic conversion on the LDR training image.
  • the true value of the HDR image of the above-mentioned LDR training image may be an HDR image obtained by converting the LDR training image using an existing high dynamic conversion method.
  • the collected HDR image may be converted into an LDR image, the converted LDR image may be used as an LDR training image, and the collected HDR image may be used as a true value of the HDR image of the LDR training image.
  • the embodiment of the present application does not limit the way of acquiring the LDR training image and the HDR image true value of the LDR training image.
  • the network structure of the dynamic conversion model involved in the embodiment of the present application will be introduced below in conjunction with FIG. 5A. It should be noted that the network structure of the dynamic conversion model in the embodiment of the present application includes but is not limited to the modules shown in FIG. 5A, and may include more or fewer modules than those shown in FIG. 5A.
  • FIG. 5A is a schematic network diagram of a dynamic conversion model according to an embodiment of the present application.
  • the dynamic conversion model can be understood as an autoencoder network composed of N-level encoding components and decoding components.
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and
  • the i-th encoding module is connected to the N-i+1-th decoding module by skip connection.
  • the skip connection can be understood as the connection between the input end of the i-th encoding module and the input end of the N-i+1-th decoding module.
  • the i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information to obtain the i-th first feature information of the LDR training image
  • the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image to obtain the (N-i+1)-th second feature information of the LDR training image, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the above (N-i)-th second feature information is determined according to the N-th first feature information output by the N-th encoding module (that is, when i = N).
  • the above (N-i)-th second feature information is the second feature information output by the (N-i)-th decoding module (that is, when i < N).
  • the i-1th first feature information is determined according to the LDR training image.
  • the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
  • the encoding component includes 4 serial encoding modules
  • the decoding component includes 4 serial decoding modules
  • the output of the last encoding module is connected to the input of the first decoding module.
  • the first coding module is skip-connected to the fourth decoding module
  • the second coding module is skip-connected to the third decoding module
  • the third coding module is skip-connected to the second decoding module
  • the fourth coding module is skip-connected to the first decoding module.
  • Input the LDR training image into the dynamic conversion model to obtain the 0th first feature information. The 0th first feature information can be the LDR training image itself, or a feature map obtained after the LDR training image is processed, which is not limited in the embodiment of the present application.
  • Input the 0th first feature information into the first encoding module and the fourth decoding module respectively. The first encoding module outputs the first first feature information according to the 0th first feature information, and the first first feature information is input into the second encoding module and the third decoding module respectively.
  • the second encoding module obtains the second first feature information according to the first first feature information, and inputs the second first feature information into the third encoding module and the second decoding module respectively.
  • the third encoding module obtains the third first characteristic information according to the second first characteristic information, and inputs the third first characteristic information into the fourth encoding module and the first decoding module respectively.
  • the fourth encoding module outputs the fourth first characteristic information according to the third first characteristic information, and inputs the fourth first characteristic information into the first decoding module.
  • the first decoding module obtains the first second characteristic information according to the fourth first characteristic information and the third first characteristic information, and inputs the first second characteristic information into the second decoding module.
  • the second decoding module obtains the second second characteristic information according to the first second characteristic information and the second first characteristic information, and inputs the second second characteristic information into the third decoding module.
  • the third decoding module obtains the third second characteristic information according to the second second characteristic information and the first first characteristic information, and inputs the third second characteristic information into the fourth decoding module.
  • the fourth decoding module obtains the fourth second characteristic information according to the 0th first characteristic information and the third second characteristic information.
  • the above S403 includes: concatenating the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image, where "C" in FIG. 5A indicates concatenation, and inputting the concatenated feature information into the (N-i+1)-th decoding module for feature extraction to obtain the (N-i+1)-th second feature information of the LDR training image.
  • the fourth first feature information and the third first feature information are concatenated, and the concatenated fourth first feature information and third first feature information are input into the first decoding module to obtain the first second feature information output by the first decoding module.
  • the second second feature information and the first first feature information are concatenated, and the concatenated second second feature information and first first feature information are input into the third decoding module to obtain the third second feature information output by the third decoding module.
  • the 0th first feature information and the third second feature information are concatenated, and the concatenated 0th first feature information and third second feature information are input into the fourth decoding module to obtain the fourth second feature information output by the fourth decoding module.
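Putting the data flow of FIG. 5A together, the following is a minimal PyTorch-style sketch of the four-level structure. The convolution widths follow the feature dimensions given below (64/128/256/512 for encoding and 256/128/64/32 for decoding); the simple conv_block helper and the final 1×1 output layer are assumptions for illustration, and the downsampling and attention modules described later are omitted.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # stand-in for the convolution blocks described below
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.SiLU())

class DynamicConversionModel(nn.Module):
    """Sketch of the N=4 serial encoding/decoding modules with skip connections,
    following the data flow described above."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.enc1, self.enc2 = conv_block(in_ch, 64), conv_block(64, 128)
        self.enc3, self.enc4 = conv_block(128, 256), conv_block(256, 512)
        self.dec1 = conv_block(512 + 256, 256)   # inputs: 4th and 3rd first feature information
        self.dec2 = conv_block(256 + 128, 128)   # inputs: 1st second and 2nd first feature information
        self.dec3 = conv_block(128 + 64, 64)     # inputs: 2nd second and 1st first feature information
        self.dec4 = conv_block(in_ch + 64, 32)   # inputs: 0th first and 3rd second feature information
        self.head = nn.Conv2d(32, 3, 1)          # assumed layer mapping the last output to the HDR image

    def forward(self, f0):                       # f0: the 0th first feature information
        f1 = self.enc1(f0)
        f2 = self.enc2(f1)
        f3 = self.enc3(f2)
        f4 = self.enc4(f3)
        s1 = self.dec1(torch.cat([f4, f3], dim=1))
        s2 = self.dec2(torch.cat([s1, f2], dim=1))
        s3 = self.dec3(torch.cat([s2, f1], dim=1))
        s4 = self.dec4(torch.cat([f0, s3], dim=1))
        return self.head(s4)                     # HDR image from the last second feature information
```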
  • the embodiment of the present application does not limit the specific network structure of the encoding module.
  • each of the N coding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • the feature dimension of the convolution block included in the first encoding module is 64, the feature dimension of the convolution block included in the second encoding module is 128, the feature dimension of the convolution block included in the third encoding module is 256, and the feature dimension of the convolution block included in the fourth encoding module is 512, and so on.
  • the embodiment of the present application does not limit the specific network structure of the decoding module.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • the feature dimension of the convolution block included in the first decoding module is 256, the feature dimension of the convolution block included in the second decoding module is 128, the feature dimension of the convolution block included in the third decoding module is 64, and the feature dimension of the convolution block included in the fourth decoding module is 32, and so on.
  • the network structures of the convolutional blocks included in the encoding modules in the embodiments of the present application may be the same or different.
  • the network structures of the convolutional blocks included in each decoding module may be the same or different.
  • the network structures of the convolutional blocks included in the encoding module and the decoding module may be the same or different, which is not limited in this application.
  • the network structure of the convolutional block included in the encoding module and/or the decoding module includes a convolutional layer 1, a convolutional layer 2, a convolutional layer 3 and an activation function.
  • the convolution kernels of convolution layer 1 and convolution layer 2 are 3×3
  • the convolution kernel of convolution layer 3 is 1×1
  • the activation function is a Sigmoid weighted linear unit (Sigmoid Weighted Linear Unit, SiLU for short).
  • the sizes of the convolution kernels of the above-mentioned convolution layer 1, convolution layer 2, and convolution layer 3 include but are not limited to the above values, and the activation function includes but is not limited to SiLU, and may also be, for example, ReLU, which is not limited in this application.
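A minimal sketch of such a convolution block follows; where exactly the SiLU activation is applied, the padding, and the channel widths are assumptions, since the description above only fixes the kernel sizes and the activation type.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Sketch of a convolution block with two 3x3 convolution layers, one 1x1 convolution
    layer and a SiLU activation; activation placement and padding are assumptions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)   # convolution layer 1 (3x3)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)  # convolution layer 2 (3x3)
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=1)             # convolution layer 3 (1x1)
        self.act = nn.SiLU()                                              # SiLU activation function

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        return self.act(self.conv3(x))
```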
  • the dynamic conversion model further includes: a convolutional block attention module (Convolutional Block Attention Module, CBAM for short).
  • the attention mechanism of this convolutional attention module enables the dynamic transformation model to focus more attention on the relevant parts of the encoding side features and less attention on other irrelevant parts, that is, by The convolutional attention mechanism is used to improve the representation ability of the dynamic conversion model, focusing on important features and suppressing unnecessary features, thus greatly improving the efficiency of the model.
  • one or more CBAMs are included in the skip connections between each encoding module and decoding module.
  • the above S403, namely performing feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image through the (N-i+1)-th decoding module to obtain the (N-i+1)-th second feature information of the LDR training image, includes S403-A and S403-B:
  • S403-B. Use the (N-i+1)-th decoding module to perform feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the LDR training image.
  • the (i-1)-th third feature information and the (N-i)-th second feature information are concatenated, and the concatenated (i-1)-th third feature information and (N-i)-th second feature information are input into the (N-i+1)-th decoding module to obtain the (N-i+1)-th second feature information of the LDR training image output by the (N-i+1)-th decoding module.
  • the embodiment of the present application does not limit the network structure of the convolutional attention module.
  • the convolutional attention module includes: a channel attention module and a spatial attention module.
  • the channel attention module learns the channel information of features by using the inter-channel relationship of features
  • the spatial attention module learns the spatial information of features by using the spatial relationship of features.
  • the channel to which it belongs here can be understood as a feature dimension.
  • if the feature dimension of a piece of feature information is 32, it means that the number of channels of the feature information is 32.
  • Use the spatial attention module to perform spatial information extraction on the fused channel feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
  • the fused channel feature information of the i-1 th first feature information is determined according to the i-1 th first feature information and the channel attention information of the i-1 th first feature information.
  • the convolutional attention module also includes a first multiplication unit, at this time S403-A2 includes S403-A21 and S403-A22:
  • S403-A3. Determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
  • the convolutional attention module further includes a second multiplication unit; in this case, S403-A3 includes: multiplying, through the second multiplication unit, the fused channel feature information of the (i-1)-th first feature information by the spatial attention information to obtain the (i-1)-th third feature information of the LDR training image.
  • the network structure of the convolutional attention module is shown in Figure 5D.
  • the i-1th first feature information is a feature map F
  • the feature map F is input into the CBAM module, the CBAM module sequentially infers attention maps along two independent dimensions (i.e. the channel dimension and the spatial dimension), and each attention map is then multiplied with the input feature map for adaptive feature refinement.
  • the one-dimensional channel attention map Mc is obtained through the channel attention module
  • F' is obtained after multiplying Mc with the input feature map F.
  • the two-dimensional spatial attention map Ms is then obtained from F' through the spatial attention module, and the final feature map F'' is obtained after multiplying Ms with F'; the final feature map is the (i-1)-th third feature information of the LDR training image.
  • the multiplication symbol in FIG. 5D indicates element-wise multiplication of corresponding elements.
  • the dimension of the input feature map F is H×W×C
  • the dimension of the 1D channel attention map Mc is 1×1×C
  • the dimension of the 2D spatial attention map Ms is H×W×1.
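The overall attention flow of FIG. 5D can be sketched as follows; the channel and spatial attention submodules are passed in as arguments here and are sketched after their own descriptions below (this is an illustrative reading, not the patent's reference implementation).

```python
import torch.nn as nn

class CBAM(nn.Module):
    """Sketch of the attention flow in FIG. 5D: F' = Mc(F) * F, then F'' = Ms(F') * F'."""
    def __init__(self, channel_attention: nn.Module, spatial_attention: nn.Module):
        super().__init__()
        self.channel_attention = channel_attention   # produces the 1 x 1 x C map Mc
        self.spatial_attention = spatial_attention   # produces the H x W x 1 map Ms

    def forward(self, f):
        f_prime = self.channel_attention(f) * f                      # first multiplication unit
        f_double_prime = self.spatial_attention(f_prime) * f_prime   # second multiplication unit
        return f_double_prime
```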
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit.
  • both the first space compression unit and the second space compression unit are used to compress the spatial size of the feature map
  • the channel feature extraction unit is used to perform feature extraction on the spatially compressed feature map. That is, as shown in FIG. 5E, in order to efficiently calculate channel attention, the present application compresses the spatial dimension of the input feature map.
  • the above-mentioned first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the above-mentioned first spatial compression unit is a maximum pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multilayer perceptron (Multilayer Perceptron, MLP for short), for example, an MLP including a single hidden layer.
  • extracting channel information from the (i-1)-th first feature information through the channel attention module to obtain the channel attention information of the (i-1)-th first feature information includes S403-A11 to S403-A15:
  • S403-A15. Determine the channel attention information of the (i-1)-th first feature information according to the first channel information and the second channel information of the (i-1)-th first feature information.
  • the channel attention module further includes: a first addition unit and a first activation function.
  • the above S403-A15 includes:
  • the embodiment of the present application does not limit the specific form of the first activation function, which is specifically determined according to actual needs.
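A minimal sketch of such a channel attention module follows; the reduction ratio of the single hidden layer and the use of a sigmoid as the first activation function are assumptions, since the description above does not fix them.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch: compress the spatial size with max pooling and average pooling (the two spatial
    compression units), pass both results through a shared single-hidden-layer MLP (the channel
    feature extraction unit), add them (first addition unit) and apply the first activation
    function to obtain the 1 x 1 x C channel attention map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # first spatial compression unit (max pooling)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # second spatial compression unit (average pooling)
        self.mlp = nn.Sequential(                 # shared single-hidden-layer MLP
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.act = nn.Sigmoid()                   # first activation function (sigmoid assumed)

    def forward(self, x):
        return self.act(self.mlp(self.max_pool(x)) + self.mlp(self.avg_pool(x)))
```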
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit. Both the first channel compression unit and the second channel compression unit are used to compress the channel dimension of the feature map, and the spatial feature extraction unit is used to perform feature extraction on the channel compressed feature map. That is, the spatial attention module shown in Figure 5F generates a spatial attention map by utilizing the spatial relationship between features. Spatial attention complements channel attention. To compute spatial attention, the channel dimensions of the input feature maps are compressed.
  • the first channel compression unit and/or the second channel compression unit include a pooling layer.
  • the first channel compression unit is a maximum pooling layer (MaxPool), and/or the second channel compression unit is an average pooling (AvgPool) layer.
  • the aforementioned spatial feature extraction unit is a convolutional layer.
  • the above S403-A2 uses the spatial attention module to extract the spatial information of the fusion channel feature information of the i-1th first feature information to obtain the spatial attention information of the i-1th first feature information, including S403-A21 to S403-A24:
  • the spatial attention module further includes a second activation function
  • S403-A24 includes: performing non-linear processing on the spatial feature information of the i-1th first feature information through the second activation function to obtain the spatial attention information of the i-1th first feature information.
  • the embodiment of the present application does not limit the specific form of the second activation function, for example, a sigmoid activation function.
  • the spatial attention module utilizes average pooling (i.e., the second channel compression unit) and maximum pooling (i.e., the first channel compression unit) operations to generate corresponding feature maps along the channel axis, and concatenates the two to generate an efficient feature descriptor.
  • a two-dimensional spatial attention feature map Ms is generated after a sigmoid activation function (ie, the second activation function).
  • the spatial dimension of the channel attention information of the i-1th first feature information is 1 × 1.
  • the feature dimension of the spatial attention information of the i-1th first feature information is 1.
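Similarly, a hedged sketch of the spatial attention branch (channel-wise max and average pooling, concatenation, a convolution, and a sigmoid) is given below; the 7 × 7 kernel size is an assumption for illustration.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # spatial feature extraction unit: a convolutional layer over the 2-channel descriptor
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()                   # second activation function

    def forward(self, x):                             # x: B x C x H x W (fused channel feature information F')
        max_map = x.max(dim=1, keepdim=True).values   # first channel compression unit (max along channel axis)
        avg_map = x.mean(dim=1, keepdim=True)         # second channel compression unit (average along channel axis)
        descriptor = torch.cat([max_map, avg_map], dim=1)
        return self.sigmoid(self.conv(descriptor))    # spatial attention map Ms, feature dimension 1

# e.g. SpatialAttention()(torch.randn(2, 64, 32, 32)).shape -> torch.Size([2, 1, 32, 32])
```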
  • the dynamic conversion model provided by the embodiment of the present application adds a convolutional attention module to each branch, and the convolutional attention module includes a channel attention module and a spatial attention module, respectively for channel features and spatial features Learning is carried out, thereby improving the learning of image detail features by the dynamic conversion model, so that the trained dynamic conversion model can reconstruct more detailed features in the image, thereby improving the quality of the HDR image generated by the dynamic conversion model.
  • the dynamic conversion model further includes at least one downsampling unit
  • the training method in the embodiment of the present application further includes: performing spatial dimension downsampling on the feature information output by the encoding module through the downsampling unit. That is, in order to reduce network complexity in the embodiment of the present application, at least one downsampling unit is set in the coding component to reduce the spatial dimension of the feature information output by the coding module.
  • the embodiment of the present application does not limit the number of down-sampling units included in the dynamic conversion model, which is specifically determined according to actual requirements.
  • a downsampling unit is set between two adjacent encoding modules, which is used to downsample the feature information output by the previous encoding unit in a spatial dimension, and then input it into the next encoding module, This not only reduces the amount of data processed by the encoding module and reduces the complexity of the model, but also enables each encoding module to learn features of different sizes to improve the prediction accuracy of the dynamic conversion model.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the training method in the embodiment of the present application further includes: performing spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
  • since at least one down-sampling unit is set in the encoding component, in order to ensure that the size of the decoded image is consistent with the size of the original image, at least one up-sampling unit is set in the decoding component to up-sample the feature information output by the decoding module in the spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
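For illustration, the two sampling units can be sketched as follows; the factor of 2 is consistent with the example architecture described later, but is otherwise an assumption.

```python
import torch
import torch.nn as nn

down = nn.MaxPool2d(kernel_size=2, stride=2)                            # downsampling unit (max pooling)
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)  # upsampling unit (bilinear interpolation)

x = torch.randn(1, 64, 128, 128)
print(down(x).shape)       # torch.Size([1, 64, 64, 64])
print(up(down(x)).shape)   # torch.Size([1, 64, 128, 128])
```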
  • the dynamic conversion model further includes a first convolutional layer; the first convolutional layer is located at the input end of the dynamic conversion model, and is used to process the image input to the dynamic conversion model to obtain an initial feature map of the input image.
  • input the LDR training image into the dynamic conversion model, and extract the features of the LDR training image through the first convolutional layer in the dynamic conversion model to obtain the initial feature map of the LDR training image; input the initial feature map into the first encoding module and the first convolutional attention module respectively, so that the first first feature information output by the first encoding module and the first third feature information output by the first convolutional attention module are obtained.
  • the aforementioned initial feature map can be understood as the aforementioned 0th first feature information.
  • the LDR training image is input into the dynamic conversion model, and the second characteristic information of the LDR training image output by the last decoding module in the dynamic conversion model can be obtained, and then, the following S404 is performed.
  • S404 Determine the HDR image prediction value of the LDR training image according to the second characteristic information of the LDR training image output by the last decoding module among the N decoding modules.
  • the channel of the second feature information of the LDR training image is converted into 3 channels (such as RGB channels) to obtain the predicted value of the HDR image of the LDR training image.
  • the dynamic conversion model further includes a second convolutional layer
  • the above S404 includes: performing feature extraction on the second feature information of the LDR training image output by the last decoding module through the second convolutional layer, and outputting the HDR image prediction value of the LDR training image.
  • the above second convolutional layer also includes an activation function, and the feature dimension of the second convolutional layer is 3; that is, after passing through the second convolutional layer, a 3-channel (such as RGB) image can be output, and the 3-channel image can be used as the HDR image prediction value of the LDR training image.
  • the size of the convolution kernel of the second convolution layer may be 1 × 1.
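As a small illustration (not the reference implementation) of the output head just described, the sketch below maps 32-channel features from the last decoding module to a 3-channel HDR prediction with a 1 × 1 convolution; the 32-channel input width and the use of ReLU as the included activation are assumptions.

```python
import torch
import torch.nn as nn

second_conv = nn.Sequential(
    nn.Conv2d(32, 3, kernel_size=1),   # feature dimension 3, kernel size 1 x 1
    nn.ReLU(inplace=True),             # the included activation (type assumed here)
)

features = torch.randn(1, 32, 256, 256)   # second feature information from the last decoding module
hdr_pred = second_conv(features)          # 3-channel (e.g. RGB) HDR image prediction value
```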
  • S405. Determine the target loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image, and train the dynamic conversion model according to the target loss.
  • after the HDR image prediction value of the LDR training image is obtained according to the above step S404, the HDR image prediction value of the LDR training image is compared with the HDR image true value of the LDR training image to determine the target loss between them, and the parameters in the dynamic conversion model are adjusted according to the target loss, thereby completing one round of training of the dynamic conversion model.
  • the manner of determining the loss in S405 includes S405A: according to a preset loss function, determine a target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image.
  • the aforementioned preset loss function includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
  • the above preset loss function includes a reconstruction loss function, a perceptual loss function, and a style loss function.
  • S405A includes:
  • according to the reconstruction loss, perceptual loss and style loss between the HDR image prediction value and the HDR image true value, the target loss between the HDR image prediction value and the HDR image true value is determined.
  • the reconstruction loss ensures that the HDR image prediction value stays close to the HDR image true value at the pixel level.
  • the perceptual loss evaluates how well the features of the HDR image prediction value match the features extracted from the HDR image true value, and allows the model to produce textures that are perceptually similar to the HDR image true value; that is, the perceptual loss ensures the generation of visually pleasing images with more texture details.
  • the style loss captures both style and texture by comparing global statistics with Gram matrices collected over the entire image, ensuring both style consistency and color consistency of the predicted image.
  • the weighted sum of the reconstruction loss, perceptual loss and style loss can be used as the target loss, i.e., the following formula (1): Loss = L1 + λs·Lst + λp·Lp, where:
  • Loss is the target loss
  • L1 is the reconstruction loss
  • Lst is the perceptual loss
  • Lp is the style loss
  • λs and λp are hyperparameters.
  • the weight of the reconstruction loss is 1
  • the weight of the perceptual loss is λs
  • the weight of the style loss is λp.
  • the above formula (1) is just an example, and the method of determining the target loss in this application includes but is not limited to the above formula (1), such as adding, subtracting, multiplying or dividing in formula (1) A certain parameter, or the equivalent deformation of the above formula (1), etc., all belong to the protection scope of the present application.
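A minimal sketch of how the weighted target loss of formula (1) could be combined is given below; the weight assignment follows the listing above (λs for the perceptual term, λp for the style term), and the numeric default values are placeholders only.

```python
import torch

def target_loss(l_rec: torch.Tensor, l_perc: torch.Tensor, l_style: torch.Tensor,
                lambda_s: float = 1e-2, lambda_p: float = 1e-3) -> torch.Tensor:
    # Loss = L1 + lambda_s * Lst + lambda_p * Lp
    return l_rec + lambda_s * l_perc + lambda_p * l_style
```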
  • in some embodiments, the reconstruction loss is determined in the following manner: according to a preset compressed tone mapping function, the compressed tone mapping value of the HDR image prediction value is determined; according to the compressed tone mapping function, the compressed tone mapping value of the HDR image true value is determined; and the reconstruction loss is determined according to the error between the compressed tone mapping value of the HDR image true value and the compressed tone mapping value of the HDR image prediction value.
  • the reconstruction loss is determined according to the following formula (2): L1 = ‖T(H) - T(GT)‖₁
  • L1 represents the reconstruction loss
  • T is the μ-law compressed tone mapping function
  • T(H) is the compressed tone mapping value of the predicted value of the HDR image
  • T(GT) is the compressed tone mapping value of the true value of the HDR image
  • x is H or GT
  • H is the predicted value of the HDR image output by the dynamic conversion model
  • GT is the true value of the HDR image of the LDR training image
  • ‖·‖₁ indicates the L1 norm
  • μ is the preset parameter of the μ-law tone mapping function.
  • the above formula (2) is just an example, and the method of determining the reconstruction loss in this application includes but is not limited to the above formula (2), such as adding, subtracting, multiplying or multiplying in formula (2) Except for a certain parameter, or the equivalent deformation of the above formula (2), etc., all belong to the protection scope of the present application.
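The following sketch illustrates formula (2) under the assumption that T is the standard μ-law compression T(x) = log(1 + μx)/log(1 + μ); the value of μ and the averaging over pixels are illustrative choices, not values fixed by the present application.

```python
import math
import torch

def mu_law(x: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
    # T(x) = log(1 + mu * x) / log(1 + mu), applied element-wise; mu is assumed here
    return torch.log1p(mu * x) / math.log(1.0 + mu)

def reconstruction_loss(h_pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # L1 = || T(H) - T(GT) ||_1, averaged over all pixels in this sketch
    return torch.mean(torch.abs(mu_law(h_pred) - mu_law(gt)))
```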
  • the perceptual loss is determined in the following manner: obtain the feature map of the l-th layer of the pre-training model; determine the compressed tone mapping value of the HDR image prediction value according to a preset compressed tone mapping function; determine the compressed tone mapping value of the HDR image true value according to the compressed tone mapping function; determine the first feature value corresponding to the compressed tone mapping value of the HDR image prediction value in the feature map of the l-th layer; determine the second feature value corresponding to the compressed tone mapping value of the HDR image true value in the feature map of the l-th layer; and determine the perceptual loss according to the error between the first feature value and the second feature value.
  • the perceptual loss is determined according to the following formula (3): Lp = (1/(Cl·Hl·Wl)) · ‖φl(T(H)) - φl(T(GT))‖₁
  • Lp represents the perceptual loss
  • φl represents the feature map of the l-th layer of the pre-training model, such as the feature map of the l-th layer of VGG-16
  • the size of the feature map is Cl × Hl × Wl
  • φl(T(H)) is the first feature value corresponding to the compressed tone mapping value of the HDR image prediction value in the feature map of the l-th layer
  • φl(T(GT)) is the second feature value corresponding to the compressed tone mapping value of the HDR image true value in the feature map of the l-th layer.
  • the above formula (3) is just an example, and the method of determining the perceptual loss in the present application includes but not limited to the above formula (3), such as adding, subtracting, multiplying or dividing in formula (3) A certain parameter, or the equivalent deformation of the above formula (3), etc., all belong to the protection scope of the present application.
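A hedged PyTorch sketch of the perceptual loss of formula (3) is shown below, using torchvision's pre-trained VGG-16 as the pre-training model; the default layer index, the omission of ImageNet input normalization, and the L1 distance with mean reduction are simplifying assumptions.

```python
import torch
import torchvision

# Load torchvision's pre-trained VGG-16 feature extractor and freeze it.
vgg_features = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def vgg_layer(x: torch.Tensor, layer_idx: int) -> torch.Tensor:
    # Run the feature stack up to and including layer_idx
    # (in torchvision's VGG-16, pool1/pool2/pool3 are indices 4, 9 and 16).
    for idx, layer in enumerate(vgg_features):
        x = layer(x)
        if idx == layer_idx:
            break
    return x

def perceptual_loss(t_pred: torch.Tensor, t_gt: torch.Tensor, layer_idx: int = 16) -> torch.Tensor:
    # mean L1 distance between the layer-l feature maps of T(H) and T(GT),
    # which absorbs the 1/(C_l * H_l * W_l) normalization of formula (3)
    return torch.mean(torch.abs(vgg_layer(t_pred, layer_idx) - vgg_layer(t_gt, layer_idx)))
```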
  • the style loss is determined in the following manner: obtain the Gram matrix of the l-th layer feature map of the pre-training model; determine the compressed tone mapping value of the HDR image prediction value according to a preset compressed tone mapping function; determine the compressed tone mapping value of the HDR image true value according to the compressed tone mapping function; determine the first element value corresponding to the compressed tone mapping value of the HDR image prediction value in the Gram matrix; determine the second element value corresponding to the compressed tone mapping value of the HDR image true value in the Gram matrix; and determine the style loss according to the error between the first element value and the second element value.
  • the style loss is determined according to the following formula (4):
  • Lp represents the perceptual loss function
  • G(·) is the Gram matrix of the l-th layer feature map of the pre-training model
  • G(T(H)) is the first element value corresponding to the compressed tone mapping value of the HDR image prediction value in the Gram matrix
  • G(T(GT)) is the second element value corresponding to the compressed tone mapping value of the HDR image true value in the Gram matrix
  • x H or GT
  • Kl = Cl·Hl·Wl represents the normalization factor of the calculation
  • the feature φ is a matrix of size (Hl·Wl) × Cl; therefore, the size of the Gram matrix is Cl × Cl.
  • in some embodiments, the pre-trained VGG-16 network is used as the pre-training model; the feature maps of the predicted image and the real image are computed at the first three pooling layers pool1, pool2 and pool3 of VGG-16, and the perceptual loss and the style loss are computed for these features separately according to the above formula (3) and formula (4).
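The Gram-matrix style term can be sketched as follows; the Gram matrix is normalized by Kl = Cl·Hl·Wl as described above, while the L1 distance with mean reduction is an assumption for illustration.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: B x C_l x H_l x W_l; flattening gives a (H_l*W_l) x C_l feature matrix per
    # sample, whose inner product is the C_l x C_l Gram matrix, normalized by K_l.
    b, c, h, w = feat.shape
    phi = feat.view(b, c, h * w)
    return torch.bmm(phi, phi.transpose(1, 2)) / (c * h * w)   # K_l = C_l * H_l * W_l

def style_loss(feat_pred: torch.Tensor, feat_gt: torch.Tensor) -> torch.Tensor:
    # error between corresponding elements of G(T(H)) and G(T(GT))
    return torch.mean(torch.abs(gram_matrix(feat_pred) - gram_matrix(feat_gt)))
```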
  • the target loss in the embodiment of the present application includes reconstruction loss, perceptual loss and style loss, so as to reduce reconstruction distortion, artifacts and tone anomalies of high dynamic range images, and further improve the quality of HDR images generated by the model.
  • deep learning models rely on large-scale datasets; since no ready-made dataset of LDR-HDR image pairs can be used directly, this application constructs such a dataset.
  • this application collects images from multiple HDR image datasets and HDR video data, and sets up a virtual camera to capture multiple random regions of each scene using randomly selected camera calibrations.
  • Virtual camera calibration contains parameters for exposure, camera curve, white balance and noise level. The virtual camera parameters are randomly selected, and the camera curve parameters are randomly fitted into the camera curve database. This provides a set of LDR and corresponding HDR images, which are used as input and ground truth for training, respectively. A set of data augmentation operations are then applied to improve the robustness of the predictions.
  • each HDR image is treated as a real scene; a region with random size and position is selected as an image crop, then randomly flipped and resampled to 256 × 256 pixels.
  • the final trained network using these data augmentations generalizes well to a variety of images captured with different cameras.
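A minimal sketch of the crop/flip/resample augmentation described above is given below; the crop-size range and the flip probability are illustrative assumptions.

```python
import random
import torch
import torchvision.transforms.functional as TF

def augment(hdr: torch.Tensor) -> torch.Tensor:
    # hdr: 3 x H x W tensor of linear HDR values
    _, h, w = hdr.shape
    size = random.randint(min(h, w) // 2, min(h, w))          # random crop size (assumed range)
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    crop = TF.crop(hdr, top, left, size, size)                # random position
    if random.random() < 0.5:                                 # random flip
        crop = TF.hflip(crop)
    return TF.resize(crop, [256, 256])                        # resample to 256 x 256 pixels
```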
  • the obtained dataset is then divided into training set and test set. Specifically, two datasets, Fairchild HDR dataset and HDR EYE dataset, are collected from the HDR dataset for testing.
  • the hardware experimental environment of this application is an AMD Ryzen 5 CPU, an NVIDIA GTX 1080 Ti and 16 GB of memory, and the framework is PyTorch.
  • the method is compared with five existing single-image HDR reconstruction techniques, including three conventional non-learning methods: Akyuz method, KOV method and Masia method.
  • learning-based methods such as ExpandNet are also included in the comparison; three objective evaluation methods, PU-PSNR, PU-SSIM and the HDR-VDP Q-score, were used to evaluate the image quality.
  • the perceptually uniform (PU) encoding used in this application converts luminance values into approximately perceptually uniform pixel values of an HDR image.
  • PU-PSNR measures the pixel-wise difference between the predicted image and the reference image.
  • PU-SSIM measures the structural difference between predicted and reference images from the perspective of visual perception.
  • HDR-VDP is a visual metric used to compare reference and test images and predict the quality of an HDR image relative to the reference image. The quality Q-score provided in HDR-VDP is used as the evaluation metric.
  • Table 1 shows a quantitative comparison of reconstructed HDR images using existing methods on the HDR EYE dataset and the Fairchild dataset. Among them, the bold indicates the method with the best experimental results, and the underline indicates the second best algorithm. Our method has the best results in the Fairchild dataset, good Q-score in the HDR EYE dataset, and outperforms other methods in terms of PSNR and SSIM metrics on both datasets.
  • the Fairchild dataset was constructed by the team of Professor Mark D. Fairchild at the Rochester Institute of Technology, and contains a series of more than 100 HDR images and associated data.
  • the embodiment of the present application provides a dynamic conversion model; the model includes N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, the i-th encoding module is skip-connected to the N-i+1-th decoding module, and the model is trained using LDR training images.
  • the training process is: input the LDR training image into the dynamic conversion model,
  • feature extraction is performed on the i-1th first feature information by the i-th encoding module to obtain the i-th first feature information of the LDR training image, and feature extraction is performed on the i-1th first feature information and the N-i-th second feature information of the LDR training image by the N-i+1-th decoding module to obtain the N-i+1-th second feature information of the LDR training image; the HDR image prediction value of the LDR training image is determined according to the second feature information of the LDR training image output by the last decoding module in the N decoding modules; the loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image is determined, and the dynamic conversion model is trained according to the loss.
  • the trained dynamic conversion model can be used to convert an LDR image into an HDR image, thereby realizing HDR conversion without increasing the cost of data acquisition, encoding, transmission, storage, etc., and improving the efficiency of HDR conversion.
  • the dynamic conversion model provided by the embodiment of the present application can also be applied to the video codec framework; for example, it can be applied at the video decoding end to perform high dynamic range conversion on the reconstructed image obtained by the decoding end, so as to obtain the HDR image of the reconstructed image.
  • Fig. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application. As shown in Fig. 6, the method includes:
  • the entropy decoding unit 310 can analyze the code stream to obtain prediction information of the current block, quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain the reconstructed image.
  • the dynamic transformation model is combined with the video coding framework.
  • the input 10-bit HDR data is converted into 8-bit LDR data through a tone mapping module (TM) at the encoding end, and then divided into CTUs and sent to the encoder for encoding.
  • after intra-frame prediction, motion-compensated inter-frame prediction, transformation, quantization, filtering and entropy coding, a code stream is formed.
  • the dynamic conversion model described in the above embodiment is added at the output end of the decoder.
  • the dynamic range of the decoded LDR reconstruction image is extended. Using this model, the quality of the obtained HDR data can be significantly improved, and the decoded image quality can be further improved under the premise of ensuring the bit rate.
  • S602. Input the reconstructed image into a dynamic conversion model to perform dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image.
  • dynamic transformation model comprises: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in N encoding modules is decoded with the first decoding module in N decoding modules The input connection of the module, and the i-th encoding module is skipped and connected to the N-i+1-th decoding module, and the i-th encoding module is used for the i-1th first feature information output by the i-1-th encoding module Perform feature extraction to obtain the i-th first feature information of the reconstructed image, and the N-i+1-th decoding module is used to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the reconstructed image Extracting to obtain the N-i+1th second feature information of the reconstructed image, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the HDR image of the reconstructed image is determined according to the second characteristic information output by the last decoding module among the N decoding modules.
  • the above N-i th second feature information is determined according to the N th first feature information output by the N th encoding module.
  • the above N-i-th second feature information is the second feature information output by the N-i-th decoding module.
  • the i-1th first feature information is determined according to the reconstructed image, for example, the 0th first feature information is the reconstructed image, or is a feature map after processing the reconstructed image.
  • the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
  • the embodiment of the present application does not limit the specific network structure of the encoding module.
  • each of the N coding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • the feature dimension of the convolution block included in the first encoding module is 64
  • the feature dimension of the convolution block included in the second encoding module is 128, and the feature dimension of the convolution block included in the third encoding module is 256
  • the feature dimension of the convolutional block included in the fourth encoding module is 512 and so on.
  • the embodiment of the present application does not limit the specific network structure of the decoding module.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • the feature dimension of the convolution block included in the first decoding module is 256
  • the feature dimension of the convolution block included in the second decoding module is 128,
  • the feature dimension of the convolution block included in the third decoding module is 64
  • the feature dimension of the convolutional block included in the fourth decoding module is 32, and so on.
  • the network structures of the convolutional blocks included in the encoding modules in the embodiments of the present application may be the same or different.
  • the network structures of the convolutional blocks included in each decoding module may be the same or different.
  • the network structures of the convolutional blocks included in the encoding module and the decoding module may be the same or different, which is not limited in this application.
  • the network structure included in the encoding module and/or the decoding module is as shown in FIG. 5B , including a convolutional layer 1, a convolutional layer 2, a convolutional layer 3 and an activation function.
  • the convolution kernels of convolutional layer 1 and convolutional layer 2 are 3 × 3
  • the convolution kernel of convolutional layer 3 is 1 × 1
  • the activation function is a Sigmoid Weighted Linear Unit (SiLU for short).
  • the sizes of the convolution kernels of the above-mentioned convolutional layer 1, convolutional layer 2 and convolutional layer 3 include but are not limited to the above values, and the activation function includes but is not limited to SiLU, for example ReLU, etc.; this is not limited in the present application.
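As an illustration of the convolution block of FIG. 5B described above, a hedged sketch follows; the exact ordering of the three convolutions and where the SiLU activation is applied are not fully specified, so the sequential arrangement here is an assumption.

```python
import torch.nn as nn

def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),   # convolutional layer 1 (3x3)
        nn.SiLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),  # convolutional layer 2 (3x3)
        nn.SiLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),             # convolutional layer 3 (1x1)
        nn.SiLU(inplace=True),                                            # SiLU activation
    )
```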
  • the dynamic conversion model further includes: a convolutional attention module (CBAM) located in the skip connection between the i-th encoding module and the N-i+1-th decoding module.
  • the attention mechanism of this convolutional attention module enables the dynamic conversion model to focus more attention on the relevant parts of the encoder-side features and less attention on other irrelevant parts; that is, the convolutional attention mechanism is used to improve the representation ability of the dynamic conversion model, focusing on important features and suppressing unnecessary features, thus greatly improving the efficiency of the model.
  • one or more CBAMs are included in the skip connections between each encoding module and decoding module.
  • the convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module is used to extract the spatial information and channel information of the i-1th first feature information, to obtain the i-1th third feature information of the reconstructed image.
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+1th second feature of the reconstructed image information.
  • the N-i+1-th decoding module is used to perform feature extraction on the concatenated feature information of the i-1th first feature information and the N-i-th second feature information of the reconstructed image, to obtain the N-i+1-th second feature information of the reconstructed image.
  • the convolutional attention module includes a channel attention module and a spatial attention module.
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information.
  • the spatial attention module is used to extract the spatial information of the i-1 first feature information and the channel attention information of the i-1 first feature information, and obtain the spatial attention of the i-1 first feature information information.
  • the i-1th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the i-1th first feature information.
  • the convolutional attention module also includes a first multiplication unit; the first multiplication unit is used for channel attention information of the i-1 first feature information and the i-1 first feature information Perform multiplication to obtain the fusion channel feature information of the i-1 first feature information.
  • the spatial attention module is used to extract the spatial information of the fusion channel feature information of the i-1th first feature information, to obtain the spatial attention information of the i-1th first feature information.
  • the convolutional attention module also includes a second multiplication unit; the second multiplication unit is used to multiply the fusion channel feature information and spatial attention information of the i-1 first feature information, Obtain the i-1th third feature information of the reconstructed image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit.
  • the first spatial compression unit is used to compress the spatial dimension of the i-1 first feature information to obtain the first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is used to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
  • the channel feature extraction unit is used to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information.
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the first spatial compression unit is a maximum pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module also includes: a first addition unit and a first activation function
  • the first addition unit is configured to add the first channel information and the second channel information of the i-1 first feature information to obtain the fusion channel information of the i-1 first feature information;
  • the first activation function is used to perform non-linear processing on the fusion channel information of the i-1 first feature information to obtain the channel attention information of the i-1 first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fusion channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is used to perform channel dimension compression on the fusion channel feature information of the i-1 first feature information, to obtain the second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is used to perform spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information, to obtain the spatial feature information of the i-1 first feature information;
  • the spatial attention information of the i-1 first feature information is determined according to the spatial feature information of the i-1 first feature information.
  • the first channel compression unit and/or the second channel compression unit includes a pooling layer.
  • the first channel compression unit is a maximum pooling layer
  • the second channel compression unit is an average pooling layer
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module also includes a second activation function
  • the second activation function is used to perform non-linear processing on the spatial feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the spatial dimension of the channel attention information of the i-1th first feature information is 1 × 1.
  • the feature dimension of the spatial attention information of the i-1th first feature information is 1.
  • the dynamic conversion model provided by the embodiment of the present application adds a convolutional attention module to each branch, and the convolutional attention module includes a channel attention module and a spatial attention module, respectively for channel features and spatial features Learning is carried out, thereby improving the learning of image detail features by the dynamic conversion model, so that the dynamic conversion model can reconstruct more detailed features in the image, thereby improving the quality of the HDR image generated by the dynamic conversion model.
  • the dynamic conversion model further includes at least one downsampling unit; the downsampling unit is used for downsampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit; the upsampling unit is used to perform spatial dimension upsampling on the feature information output by the decoding module.
  • the upsampling unit is a bilinear interpolation unit.
  • the dynamic conversion model also includes a first convolutional layer; the first convolutional layer is used to extract features from the reconstructed image, obtain the initial feature map of the reconstructed image, and input the initial feature map into the first in the first encoding module and the first convolutional attention module.
  • the dynamic conversion model also includes a second convolutional layer; the second convolutional layer is used for feature extraction of the second feature information of the reconstructed image output by the last decoding module, and outputs the HDR image of the reconstructed image .
  • the dynamic conversion model includes a first convolutional layer, 4 encoding modules connected in series, 3 down-sampling units, 4 decoding modules connected in series, 3 Upsampling units, 4 CBAMs on the skip connections of the encoding module and decoding module, and the second convolutional layer.
  • the convolution kernel of the first convolutional layer is 3 × 3, and the number of channels is 32, where the number of channels can also be understood as a feature dimension
  • the convolution kernel of the second convolutional layer is 1 × 1, the number of channels is 3, and the second convolutional layer includes an activation function.
  • the first encoding module includes a convolutional block with 64 channels
  • the second encoding module includes a convolutional block with 128 channels
  • the third encoding module includes a convolutional block with 256 channels
  • the fourth encoding module includes A convolutional block with 512 channels.
  • a first down-sampling unit is set between the first coding module and the second coding module
  • a second down-sampling unit is set between the second coding module and the third coding module
  • a third down-sampling unit is set between the third coding module and the fourth coding module
  • the first down-sampling unit, the second down-sampling unit and the third down-sampling unit are all maximum pooling layers with a 2 × 2 kernel and a stride of 2.
  • the first decoding module includes a convolutional block with 256 channels
  • the second decoding module includes a convolutional block with 128 channels
  • the third decoding module includes a convolutional block with 64 channels
  • the fourth decoding module includes A convolutional block with 32 channels.
  • a first upsampling unit is set between the fourth encoding module and the first decoding module
  • a second upsampling unit is set between the first decoding module and the second decoding module
  • a third upsampling unit is set between the second decoding module and the third decoding module
  • the first upsampling unit, the second upsampling unit and the third upsampling unit are all bilinear interpolation units, and the upsampling factor is 2 × 2.
  • each upsampling unit also includes a convolutional layer
  • the first upsampling unit is Bilinear Upsample 2 × 2, Conv 3 × 3 256
  • the second upsampling unit is Bilinear Upsample 2 × 2, Conv 3 × 3 128
  • the third upsampling unit is Bilinear Upsample 2 × 2, Conv 3 × 3 64.
  • the size of the reconstructed image is H × W × 3, where H × W represents the length and width dimensions of the reconstructed image, and 3 represents the number of RGB channels of the reconstructed image.
  • the initial feature map output by the first convolutional layer is input into the first encoding module and the first CBAM respectively, and the convolution block in the first encoding module performs convolution processing on the initial feature map to obtain the first first feature information of the reconstructed image.
  • the first first feature information is input into the second CBAM and the first down-sampling unit respectively, and the size of the first first feature information is H × W × 64.
  • the first down-sampling unit down-samples the first first feature information to H/2 × W/2 × 64, and inputs the sampled first first feature information into the second encoding module.
  • the convolution block in the second encoding module performs convolution processing on the sampled first first feature information to obtain the second first feature information of the reconstructed image, and the second first feature information is input into the third CBAM and the second down-sampling unit respectively; the size of the second first feature information is H/2 × W/2 × 128.
  • the second down-sampling unit down-samples the second first feature information to H/4 × W/4 × 128, and inputs the sampled second first feature information into the third encoding module.
  • the convolution block in the third encoding module performs convolution processing on the sampled second first feature information to obtain the third first feature information of the reconstructed image, and the third first feature information is input into the fourth CBAM and the third down-sampling unit respectively; the size of the third first feature information is H/4 × W/4 × 256.
  • the third down-sampling unit down-samples the third first feature information to H/8 × W/8 × 256, and inputs the sampled third first feature information into the fourth encoding module.
  • the convolution block in the fourth encoding module performs convolution processing on the sampled third first feature information to obtain the fourth first feature information of the reconstructed image, and the fourth first feature information is input into the first upsampling unit; the size of the fourth first feature information is H/8 × W/8 × 512.
  • the first upsampling unit upsamples the fourth first feature information to H/4 × W/4 × 256.
  • the fourth CBAM performs feature extraction on the third first feature information, and outputs the first third feature information of the reconstructed image.
  • the first third feature information is concatenated with the upsampled fourth first feature information and input to the first decoding module.
  • the first decoding module performs feature extraction on the concatenated first third feature information and upsampled fourth first feature information to obtain the first second feature information of the reconstructed image, and the first second feature information is input into the second upsampling unit.
  • the second upsampling unit upsamples the first second feature information to H/2 × W/2 × 128.
  • the third CBAM performs feature extraction on the second first feature information, and outputs the second third feature information of the reconstructed image.
  • the second third feature information is concatenated with the upsampled first second feature information and then input to the second decoding module.
  • the second decoding module performs feature extraction on the concatenated second third feature information and upsampled first second feature information to obtain the second second feature information of the reconstructed image, and the second second feature information is input into the third upsampling unit.
  • the third upsampling unit upsamples the second second feature information to H × W × 64.
  • the second CBAM performs feature extraction on the first first feature information, and outputs the third third feature information of the reconstructed image.
  • the third third feature information is concatenated with the upsampled second second feature information and input to the third decoding module.
  • the third decoding module performs feature extraction on the concatenated third third feature information and the up-sampled second second feature information to obtain the third second feature information of the reconstructed image.
  • the first CBAM performs feature extraction on the initial feature map of the reconstructed image, and outputs the fourth third feature information of the reconstructed image.
  • the fourth third feature information is concatenated with the third second feature information and input to the fourth decoding module.
  • the fourth decoding module performs feature extraction on the concatenated fourth third feature information and third second feature information to obtain the fourth second feature information of the reconstructed image, and the fourth second feature information is input into the second convolutional layer; the size of the fourth second feature information is H × W × 32.
  • the second convolutional layer processes the fourth second feature information and outputs the HDR image of the reconstructed image, and the size of the HDR image is H × W × 3.
  • the above-mentioned dynamic conversion model is used to convert the reconstructed image with a low dynamic range into an image with a high dynamic range, and the whole conversion process is simple and low in cost.
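To tie the walkthrough above together, the following self-contained PyTorch sketch assembles the example architecture (first convolutional layer, four encoding modules, three max-pooling down-sampling units, four CBAMs on the skip connections, three bilinear upsampling units each followed by a convolution, four decoding modules, and the second convolutional layer). It is an illustrative reconstruction rather than the reference implementation; internal details such as the CBAM reduction ratio, the activation after the output convolution and the exact layer ordering inside each convolution block are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniCBAM(nn.Module):
    """Compact CBAM: channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        mc = torch.sigmoid(
            self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
            + self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        ).view(b, c, 1, 1)
        x = mc * x                                            # channel-refined features
        ms = torch.sigmoid(self.spatial(torch.cat(
            [x.max(dim=1, keepdim=True).values,
             x.mean(dim=1, keepdim=True)], dim=1)))
        return ms * x                                         # spatially refined features

def conv_block(cin: int, cout: int) -> nn.Sequential:
    # two 3x3 convolutions, one 1x1 convolution, SiLU activations (ordering assumed)
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.SiLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.SiLU(inplace=True),
        nn.Conv2d(cout, cout, 1), nn.SiLU(inplace=True),
    )

class DynamicConversionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.first_conv = nn.Conv2d(3, 32, 3, padding=1)      # first convolutional layer
        self.enc1, self.enc2 = conv_block(32, 64), conv_block(64, 128)
        self.enc3, self.enc4 = conv_block(128, 256), conv_block(256, 512)
        self.pool = nn.MaxPool2d(2, 2)                        # down-sampling units
        self.cbam0, self.cbam1 = MiniCBAM(32), MiniCBAM(64)   # CBAMs on the skip connections
        self.cbam2, self.cbam3 = MiniCBAM(128), MiniCBAM(256)
        self.up1 = nn.Conv2d(512, 256, 3, padding=1)          # conv after bilinear upsampling
        self.up2 = nn.Conv2d(256, 128, 3, padding=1)
        self.up3 = nn.Conv2d(128, 64, 3, padding=1)
        self.dec1, self.dec2 = conv_block(512, 256), conv_block(256, 128)
        self.dec3, self.dec4 = conv_block(128, 64), conv_block(96, 32)
        self.second_conv = nn.Sequential(nn.Conv2d(32, 3, 1), nn.ReLU(inplace=True))

    def _up(self, x, conv):
        return conv(F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False))

    def forward(self, x):                                     # x: B x 3 x H x W (H, W divisible by 8)
        f0 = self.first_conv(x)                               # initial feature map, 32 channels
        f1 = self.enc1(f0)                                    # first first feature information, 64 ch
        f2 = self.enc2(self.pool(f1))                         # 128 ch, H/2 x W/2
        f3 = self.enc3(self.pool(f2))                         # 256 ch, H/4 x W/4
        f4 = self.enc4(self.pool(f3))                         # 512 ch, H/8 x W/8
        d1 = self.dec1(torch.cat([self.cbam3(f3), self._up(f4, self.up1)], dim=1))
        d2 = self.dec2(torch.cat([self.cbam2(f2), self._up(d1, self.up2)], dim=1))
        d3 = self.dec3(torch.cat([self.cbam1(f1), self._up(d2, self.up3)], dim=1))
        d4 = self.dec4(torch.cat([self.cbam0(f0), d3], dim=1))
        return self.second_conv(d4)                           # HDR prediction, B x 3 x H x W

# e.g. DynamicConversionNet()(torch.randn(1, 3, 256, 256)).shape -> torch.Size([1, 3, 256, 256])
```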
  • the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic transformation model includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
  • the loss function of the dynamic conversion model is as shown in the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • Loss is the loss function of the dynamic conversion model
  • L1 is the reconstruction loss function
  • Lst is the perceptual loss function
  • Lp is the style loss function
  • λs and λp are hyperparameters.
  • the reconstruction loss function of the dynamic conversion model is determined based on the error between the compressed tone mapping value of the HDR image true value and the compressed tone mapping value of the HDR image prediction value, where the compressed tone mapping value of the HDR image prediction value is determined according to a preset compressed tone mapping function and the HDR image prediction value, and the compressed tone mapping value of the HDR image true value is determined according to the compressed tone mapping function and the HDR image true value.
  • the reconstruction loss function of the dynamic conversion model is determined based on the following formula: L1 = ‖T(H) - T(GT)‖₁
  • L1 represents the reconstruction loss function
  • x is H or GT
  • H is the predicted value output by the dynamic conversion model when training the dynamic conversion model
  • GT is the real value of the training image
  • ‖·‖₁ indicates the L1 norm
  • μ is the preset parameter of the compressed tone mapping function.
  • the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, where the first feature value is the feature value corresponding to the compressed tone mapping value of the HDR image prediction value in the feature map of the l-th layer of the pre-training model, the second feature value is the feature value corresponding to the compressed tone mapping value of the HDR image true value in the feature map of the l-th layer, the compressed tone mapping value of the HDR image prediction value is determined according to a preset compressed tone mapping function and the HDR image prediction value, and the compressed tone mapping value of the HDR image true value is determined according to the compressed tone mapping function and the HDR image true value.
  • the perceptual loss function of the dynamic conversion model is determined based on the following formula: Lp = (1/(Cl·Hl·Wl)) · ‖φl(T(H)) - φl(T(GT))‖₁
  • Lp represents the perceptual loss function
  • φl represents the feature map of the l-th layer of the pre-training model
  • the size is Cl × Hl × Wl.
  • the style loss function of the dynamic conversion model is determined based on the error between a first element value and a second element value, where the first element value is the element value corresponding to the compressed tone mapping value of the HDR image prediction value in the Gram matrix of the l-th layer feature map of the pre-training model, the second element value is the element value corresponding to the compressed tone mapping value of the HDR image true value in the Gram matrix, the compressed tone mapping value of the HDR image prediction value is determined according to a preset compressed tone mapping function and the HDR image prediction value, and the compressed tone mapping value of the HDR image true value is determined according to the compressed tone mapping function and the HDR image true value.
  • the style loss function of the dynamic transformation model is determined based on the following formula:
  • Lp represents the perceptual loss function
  • G(.) is the Gram matrix of the l-th layer feature of the pre-trained model
  • φl represents the feature map of the l-th layer of the pre-training model, the size of which is Cl × Hl × Wl
  • the size of K l is C l H l W l .
  • the reconstruction image with a low dynamic range is converted into an image with a high dynamic range by using the above dynamic conversion model, and the whole conversion process is simple and low in cost.
  • the reconstruction loss, perceptual loss and style loss are used to reduce reconstruction distortion, artifacts and tone anomalies in the high dynamic range image, so that the decoded image quality is further improved while the bit rate is maintained.
  • Fig. 8 is a schematic flow chart of an image processing method provided by an embodiment of the present application. As shown in Fig. 8, the method includes:
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1-th decoding module; the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1-th encoding module to obtain the i-th first feature information of the LDR image, and the N-i+1-th decoding module is used to perform feature extraction on the i-1th first feature information and the N-i-th second feature information of the LDR image to obtain the N-i+1-th second feature information of the LDR image.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this article generally indicates that the contextual objects are an "or" relationship.
  • FIG. 9 is a schematic block diagram of an image decoding device provided by an embodiment of the present application.
  • the image decoding device may be the decoder shown in FIG. 3 , or a component in the decoder, such as a processor in the decoder.
  • the image decoding device 10 may include:
  • Decoding unit 11 configured to decode the code stream to obtain a reconstructed image
  • a processing unit 12 configured to input the reconstructed image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image;
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1-th decoding module
  • the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1-th encoding module to obtain the i-th first feature information of the reconstructed image
  • the N-i+1-th decoding module is used to perform feature extraction on the i-1th first feature information and the N-i-th second feature information of the reconstructed image to obtain the N-i+1-th second feature information of the reconstructed image
  • the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module in the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module;
  • the convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image;
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th feature information of the reconstructed image. 1 piece of second characteristic information.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
  • the spatial attention module is used to extract spatial information from the i-1 first feature information and channel attention information of the i-1 first feature information, to obtain the i-1 first feature information Spatial attention information of the first feature information;
  • the i-1 th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  • the convolutional attention module further includes a first multiplication unit
  • the first multiplication unit is configured to multiply the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature Information fusion channel feature information;
  • the spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the convolutional attention module further includes a second multiplication unit
  • the second multiplication unit is used to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
  • the first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
  • the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information;
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function
  • the first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
  • the first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
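The channel attention path described in the items above (two spatial compression units feeding a shared channel feature extraction unit, followed by an addition unit and an activation function) can be sketched as follows. This is a minimal illustrative sketch in PyTorch; the reduction ratio, layer sizes and the use of a single shared MLP for both pooled branches are assumptions, not values taken from this application.

```python
# Hedged sketch of the channel attention module: max pooling and average pooling compress
# the spatial dimension to 1x1, a shared MLP extracts channel features from each branch,
# the two results are added and passed through a sigmoid to give per-channel attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # first spatial compression unit
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # second spatial compression unit
        self.mlp = nn.Sequential(                 # channel feature extraction unit (MLP)
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.sigmoid = nn.Sigmoid()               # first activation function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) first feature information taken from the skip connection
        first_channel_info = self.mlp(self.max_pool(x))    # first channel information
        second_channel_info = self.mlp(self.avg_pool(x))   # second channel information
        fused_channel_info = first_channel_info + second_channel_info  # first addition unit
        # channel attention information; its spatial dimension is 1x1
        return self.sigmoid(fused_channel_info).view(x.size(0), -1, 1, 1)
```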
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compression information and the second channel compression information of the i-1th first feature information, to obtain the spatial feature information of the i-1th first feature information;
  • the spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  • the first channel compression unit and/or the second channel compression unit includes a pooling layer.
  • the first channel compression unit is a maximum pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function
  • the second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
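A corresponding sketch of the spatial attention module and of the complete convolutional attention module (channel attention, first multiplication unit, spatial attention, second multiplication unit) is given below. It reuses the ChannelAttention sketch above; the 7×7 kernel size of the spatial feature extraction convolution is an assumption, not a value from this application.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):  # kernel size is an assumption
        super().__init__()
        # spatial feature extraction unit: a convolution over the two channel-compressed maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()             # second activation function

    def forward(self, fused_channel_feat: torch.Tensor) -> torch.Tensor:
        # first / second channel compression units: max and mean over the channel dimension
        max_compressed, _ = fused_channel_feat.max(dim=1, keepdim=True)
        avg_compressed = fused_channel_feat.mean(dim=1, keepdim=True)
        spatial_feat = self.conv(torch.cat([max_compressed, avg_compressed], dim=1))
        return self.sigmoid(spatial_feat)       # spatial attention information, 1 feature channel

class ConvAttention(nn.Module):
    """Convolutional attention module placed on a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = x * self.channel_attn(x)              # first multiplication unit -> fused channel feature
        return fused * self.spatial_attn(fused)       # second multiplication unit -> third feature information
```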
  • the dynamic conversion model further includes at least one downsampling unit
  • the down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a max pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N coding modules includes at least one convolutional block, wherein the parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if the i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or,
  • if the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
  • if the i is equal to 1, the i-1-th first feature information is determined according to the reconstructed image; or,
  • if the i is greater than 1, the i-1-th first feature information is determined according to the first feature information output by the i-1-th encoding module.
  • the N-i+1th decoding module is configured to perform feature extraction on the concatenated feature information of the i-1th third feature information and the N-ith second feature information, to obtain the N-i+1th second feature information of the reconstructed image.
  • the dynamic conversion model further includes a first convolutional layer
  • the first convolutional layer is used to perform feature extraction on the reconstructed image to obtain an initial feature map of the reconstructed image, and input the initial feature map to the first coding module and the first convolutional attention module respectively middle.
  • the dynamic conversion model further includes a second convolutional layer
  • the second convolutional layer is used to perform feature extraction on the second feature information of the reconstructed image output by the last decoding module, and output an HDR image of the reconstructed image.
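Putting the pieces together, the items above describe a U-Net style arrangement: N serially connected encoding modules, N serially connected decoding modules, max-pooling down-sampling units, bilinear up-sampling units, a first convolutional layer in front, a second convolutional layer at the end, and a convolutional attention module on every skip connection. The sketch below (reusing ConvAttention from the previous sketch) is an illustrative configuration with N = 4; the channel widths, the 3-channel input and output, and the two-convolution blocks are assumptions, not values specified in this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # each encoding/decoding module holds at least one convolutional block;
    # two 3x3 convolutions per module is an illustrative choice
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class DynamicConversionModel(nn.Module):
    def __init__(self):
        super().__init__()
        enc_in, enc_out = (64, 64, 128, 256), (64, 128, 256, 512)        # assumed channel widths
        self.first_conv = nn.Conv2d(3, 64, 3, padding=1)                 # first convolutional layer
        self.encoders = nn.ModuleList(conv_block(i, o) for i, o in zip(enc_in, enc_out))
        self.attentions = nn.ModuleList(ConvAttention(c) for c in enc_in)  # one per skip connection
        dec_in, dec_out = (256 + 512, 128 + 256, 64 + 128, 64 + 64), (256, 128, 64, 64)
        self.decoders = nn.ModuleList(conv_block(i, o) for i, o in zip(dec_in, dec_out))
        self.second_conv = nn.Conv2d(64, 3, 3, padding=1)                # second convolutional layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.first_conv(x)]                    # initial feature map of the input image
        h = feats[0]
        for k, enc in enumerate(self.encoders):
            if k > 0:
                h = F.max_pool2d(h, 2)                  # down-sampling unit (max pooling)
            h = enc(h)                                  # (k+1)-th first feature information
            feats.append(h)
        s = feats[-1]                                   # 0th second feature information (bottleneck)
        for k, dec in enumerate(self.decoders):
            skip = self.attentions[-(k + 1)](feats[-(k + 2)])   # third feature information via attention
            s = F.interpolate(s, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)      # up-sampling unit (bilinear interpolation)
            s = dec(torch.cat([skip, s], dim=1))        # decoding module on the concatenated features
        return self.second_conv(s)                      # HDR prediction for the input image
```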
  • the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the loss function of the dynamic conversion model is as shown in the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • Loss is the loss function of the dynamic conversion model
  • the L1 is the reconstruction loss function
  • the Lst is the style loss function
  • the Lp is the perceptual loss function
  • the λs and λp are hyperparameters.
  • the reconstruction loss function of the dynamic conversion model is determined according to the error between the compressed tone-mapping value of the true value of the HDR image and the compressed tone-mapping value of the predicted value of the HDR image, wherein the compressed tone-mapping value of the predicted value of the HDR image is determined according to the preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
  • L1 = ‖ T(H) − T(GT) ‖1
  • where T(·) denotes the preset compressed tone-mapping function (which uses a preset parameter), the H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, the GT is the real value of the training image, and ‖·‖1 represents the L1 norm.
  • the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the feature map of layer l of the pre-training model, and the second feature value is the feature value corresponding to the compressed tone-mapping value of the true value of the HDR image in the feature map of layer l;
  • the compressed tone-mapping value of the predicted value of the HDR image is determined according to the preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the perceptual loss function of the dynamic conversion model is determined based on the following formula:
  • Lp = Σl (1 / (Cl·Hl·Wl)) · ‖ φl(T(H)) − φl(T(GT)) ‖1
  • where Lp represents the perceptual loss function, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, GT is the real value of the training image, ‖·‖1 represents the L1 norm, and φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl.
  • the style loss function of the dynamic conversion model is determined based on the error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the Gram matrix of the layer-l feature map of the pre-training model, and the second element value is the element value corresponding to the compressed tone-mapping value of the true value of the HDR image in the Gram matrix;
  • the compressed tone-mapping value of the predicted value of the HDR image is determined according to the preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the style loss function of the dynamic conversion model is determined based on the following formula:
  • Lst = Σl (1 / Kl) · ‖ G(φl(T(H))) − G(φl(T(GT))) ‖1
  • where Lst represents the style loss function, G(·) is the Gram matrix of the l-th layer feature of the pre-training model, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, the H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, the GT is the HDR true value of the training image, ‖·‖1 represents the L1 norm, φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl, and the size of Kl is Cl·Hl·Wl.
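The three loss terms described above can be sketched as follows. The μ-law form of the compressed tone-mapping function, the choice of VGG16 as the pre-training model, the layer index, and the weights lam_s and lam_p are all illustrative assumptions; the application itself only specifies a preset compressed tone-mapping function with a preset parameter and a pre-training model whose layer-l feature maps and Gram matrices are used.

```python
import torch
import torch.nn.functional as F
import torchvision

def compressed_tone_map(x: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
    # assumed mu-law style compressor; mu stands in for the preset parameter
    return torch.log(1 + mu * x) / torch.log(torch.tensor(1 + mu))

# assumed pre-training model: VGG16 features pre-trained on ImageNet
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()

def gram(feat: torch.Tensor) -> torch.Tensor:
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)        # normalised by K_l = C_l * H_l * W_l

def total_loss(H: torch.Tensor, GT: torch.Tensor,
               layer: int = 16, lam_s: float = 120.0, lam_p: float = 6.0) -> torch.Tensor:
    tH, tGT = compressed_tone_map(H), compressed_tone_map(GT)
    l_rec = F.l1_loss(tH, tGT)                        # reconstruction loss L1
    with torch.no_grad():
        phi_GT = vgg[:layer](tGT)                     # layer-l features of the tone-mapped ground truth
    phi_H = vgg[:layer](tH)                           # layer-l features of the tone-mapped prediction
    l_per = F.l1_loss(phi_H, phi_GT)                  # perceptual loss Lp
    l_sty = F.l1_loss(gram(phi_H), gram(phi_GT))      # style loss Lst on Gram matrices
    return l_rec + lam_s * l_sty + lam_p * l_per      # Loss = L1 + lam_s*Lst + lam_p*Lp
```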
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 10 shown in FIG. 9 may correspond to the corresponding subject performing the image decoding method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 10 are respectively intended to implement the corresponding processes in the image decoding method; for the sake of brevity, they are not repeated here.
  • Fig. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device 20 may include:
  • An acquisition unit 21 configured to acquire a low dynamic range LDR image to be processed
  • a processing unit 22 configured to input the LDR image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the LDR image;
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1-th decoding module;
  • the i-th encoding module is used to perform feature extraction on the i-1-th first feature information output by the i-1-th encoding module to obtain the i-th first feature information of the LDR image;
  • the N-i+1-th decoding module is used to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the LDR image to obtain the N-i+1-th second feature information of the LDR image;
  • the HDR image of the LDR image is determined according to the second feature information output by the last decoding module in the N decoding modules, and the i is a positive integer less than or equal to N;
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module;
  • the convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the LDR image;
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th feature information of the LDR image. 1 piece of second characteristic information.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
  • the spatial attention module is used to extract spatial information from the i-1th first feature information and the channel attention information of the i-1th first feature information, to obtain the spatial attention information of the i-1th first feature information;
  • the i-1 th third feature information of the LDR image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  • the convolutional attention module further includes a first multiplication unit
  • the first multiplication unit is configured to multiply the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature Information fusion channel feature information;
  • the spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the convolutional attention module further includes a second multiplication unit
  • the second multiplication unit is configured to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the LDR image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
  • the first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
  • the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information;
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  • the first spatial compression unit is a max pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function
  • the first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
  • the first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compression information and the second channel compression information of the i-1th first feature information, to obtain the spatial feature information of the i-1th first feature information;
  • the spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  • the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  • the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function
  • the second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  • the dynamic conversion model further includes at least one downsampling unit
  • the down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a max pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, wherein parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if the i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or,
  • if the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
  • if the i is equal to 1, the i-1-th first feature information is determined according to the LDR image; or,
  • if the i is greater than 1, the i-1-th first feature information is determined according to the first feature information output by the i-1-th encoding module.
  • the N-i+1th decoding module is used to perform feature extraction on the concatenated feature information of the i-1th third feature information and the N-ith second feature information, to obtain the N-i+1th second feature information of the LDR image.
  • the dynamic transformation model further includes a first convolutional layer
  • the first convolutional layer is used to extract features from the LDR image, obtain an initial feature map of the LDR image, and input the initial feature map to the first coding module and the first convolutional attention module respectively middle.
  • the dynamic transformation model further includes a second convolutional layer
  • the second convolutional layer is used to perform feature extraction on the second feature information of the LDR image output by the last decoding module, and output an HDR image of the LDR image.
  • the initial parameters of the dynamic transformation model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic transformation model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the loss function of the dynamic conversion model is as shown in the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • Loss is the loss function of the dynamic conversion model
  • the L1 is the reconstruction loss function
  • the Lst is the style loss function
  • the Lp is the perceptual loss function
  • the λs and λp are hyperparameters.
  • the reconstruction loss function of the dynamic conversion model is determined according to the error between the compressed tone-mapping value of the true value of the HDR image and the compressed tone-mapping value of the predicted value of the HDR image, wherein the compressed tone-mapping value of the predicted value of the HDR image is determined according to the preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
  • L1 = ‖ T(H) − T(GT) ‖1
  • where T(·) denotes the preset compressed tone-mapping function (which uses a preset parameter), the H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, the GT is the real value of the training image, and ‖·‖1 represents the L1 norm.
  • the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the feature map of layer l of the pre-training model, and the second feature value is the feature value corresponding to the compressed tone-mapping value of the true value of the HDR image in the feature map of layer l;
  • the compressed tone-mapping value of the predicted value of the HDR image is determined according to a preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the perceptual loss function of the dynamic conversion model is determined based on the following formula:
  • Lp = Σl (1 / (Cl·Hl·Wl)) · ‖ φl(T(H)) − φl(T(GT)) ‖1
  • where Lp represents the perceptual loss function, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, GT is the real value of the training image, ‖·‖1 represents the L1 norm, and φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl.
  • the style loss function of the dynamic conversion model is determined based on the error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the Gram matrix of the l-th layer feature map of the pre-training model, and the second element value is the element value corresponding to the compressed tone-mapping value of the true value of the HDR image in the Gram matrix;
  • the compressed tone-mapping value of the predicted value of the HDR image is determined according to a preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the true value of the HDR image is determined according to the compressed tone-mapping function and the true value of the HDR image.
  • the style loss function of the dynamic conversion model is determined based on the following formula:
  • Lst = Σl (1 / Kl) · ‖ G(φl(T(H))) − G(φl(T(GT))) ‖1
  • where Lst represents the style loss function, G(·) is the Gram matrix of the l-th layer feature of the pre-training model, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, the H is the predicted value output by the dynamic conversion model when the dynamic conversion model is trained, the GT is the HDR true value of the training image, ‖·‖1 represents the L1 norm, φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl, and the size of Kl is Cl·Hl·Wl.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 20 shown in FIG. 10 may correspond to the corresponding subject performing the image processing method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 20 are respectively intended to implement the corresponding processes in the image processing method; for the sake of brevity, they are not repeated here.
  • Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • model training device 40 comprises:
  • An acquisition unit 41 configured to acquire a low dynamic range LDR training image and a true value of a high dynamic range HDR image of the LDR training image;
  • the processing unit 42 is configured to input the LDR training image into the dynamic conversion model, and perform feature extraction on the i-1th first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image, wherein the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, the i-th encoding module is skip-connected to the N-i+1-th decoding module, the i is a positive integer less than or equal to N, and the N is a positive integer; perform feature extraction on the i-1th first feature information and the N-i-th second feature information of the LDR training image through the N-i+1-th decoding module to obtain the N-i+1-th second feature information of the LDR training image; determine the HDR image prediction value of the LDR training image according to the second feature information output by the last decoding module in the N decoding modules; and determine the loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image, and train the dynamic conversion model according to the loss.
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module, the above-mentioned processing unit 42 , specifically for performing spatial information and channel information extraction on the i-1th first feature information through the convolution attention module to obtain the i-1th third feature information of the LDR training image; by The N-i+1th decoding module performs feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+1th of the LDR training image a second characteristic information.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the above-mentioned processing unit 42 is specifically configured to: extract channel information from the i-1th first feature information through the channel attention module to obtain the channel attention information of the i-1th first feature information; extract spatial information from the fused channel feature information of the i-1th first feature information through the spatial attention module to obtain the spatial attention information of the i-1th first feature information, where the fused channel feature information of the i-1th first feature information is determined according to the i-1th first feature information and the channel attention information of the i-1th first feature information; and determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
  • the convolutional attention module further includes a first multiplication unit, the above-mentioned processing unit 42 is also used to perform the i-1th first feature information and the i-th by the first multiplication unit Multiply the channel attention information of the first feature information to obtain the fusion channel feature information of the i-1 first feature information.
  • the convolutional attention module further includes a second multiplication unit; the above-mentioned processing unit 42 is specifically configured to multiply, through the second multiplication unit, the fused channel feature information of the i-1th first feature information by the spatial attention information to obtain the i-1th third feature information of the LDR training image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit; the above-mentioned processing unit 42 is specifically configured to: perform spatial dimension compression on the i-1th first feature information through the first spatial compression unit to obtain the first spatial compression information of the i-1th first feature information; perform spatial dimension compression on the i-1th first feature information through the second spatial compression unit to obtain the second spatial compression information of the i-1th first feature information; perform channel feature extraction on the first spatial compression information of the i-1th first feature information through the channel feature extraction unit to obtain the first channel information of the i-1th first feature information; perform channel feature extraction on the second spatial compression information of the i-1th first feature information through the channel feature extraction unit to obtain the second channel information of the i-1th first feature information; and determine the channel attention information of the i-1th first feature information according to the first channel information and the second channel information of the i-1th first feature information.
  • the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  • the first spatial compression unit is a max pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function; the above-mentioned processing unit 42 is specifically configured to add, through the first addition unit, the first channel information and the second channel information of the i-1th first feature information to obtain the fused channel information of the i-1th first feature information, and to perform nonlinear processing on the fused channel information through the first activation function to obtain the channel attention information of the i-1th first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit; the above-mentioned processing unit 42 is specifically configured to: perform channel dimension compression on the fused channel feature information of the i-1th first feature information through the first channel compression unit to obtain the first channel compression information of the i-1th first feature information; perform channel dimension compression on the fused channel feature information of the i-1th first feature information through the second channel compression unit to obtain the second channel compression information of the i-1th first feature information; perform spatial feature extraction on the first channel compression information and the second channel compression information of the i-1th first feature information through the spatial feature extraction unit to obtain the spatial feature information of the i-1th first feature information; and determine the spatial attention information of the i-1th first feature information according to the spatial feature information of the i-1th first feature information.
  • the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  • the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function
  • the above-mentioned processing unit 42 is specifically configured to perform nonlinear processing on the spatial feature information of the i-1th first feature information through the second activation function to obtain the spatial attention information of the i-1th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  • the dynamic conversion model further includes at least one down-sampling unit, the above-mentioned processing unit 42 is further configured to down-sample the feature information output by the encoding module through the down-sampling unit in a spatial dimension.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit, the above-mentioned processing unit 42 is further configured to perform spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if the i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or, if the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or, if the i is equal to 1, the i-1-th first feature information is determined according to the LDR training image; or, if the i is greater than 1, the i-1-th first feature information is determined according to the first feature information output by the i-1-th encoding module.
  • the above-mentioned processing unit 42 is specifically configured to concatenate the i-1th third feature information and the N-ith second feature information, and input the concatenated feature information into the N-i+1th decoding module for feature extraction to obtain the N-i+1th second feature information of the LDR training image.
  • the dynamic conversion model further includes a first convolutional layer
  • the above-mentioned processing unit 42 is further configured to perform feature extraction on the LDR training image through the first convolutional layer to obtain an initial feature map of the LDR training image, input the initial feature map into the first encoding module and the first convolutional attention module respectively, obtain the first first feature information output by the first encoding module, and obtain the first third feature information output by the first convolutional attention module.
  • the dynamic conversion model further includes a second convolutional layer
  • the above-mentioned processing unit 42 is specifically configured to perform feature extraction, through the second convolutional layer, on the second feature information of the LDR training image output by the last decoding module, and output the HDR image prediction value of the LDR training image.
  • the processing unit 42 is further configured to obtain pre-training parameters obtained during pre-training of the pre-training model; and determine the pre-training parameters as initial parameters of the dynamic transformation model.
  • the above processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image according to a preset loss function.
  • the preset loss function includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the above-mentioned processing unit 42 is specifically configured to: determine the reconstruction loss between the predicted value of the HDR image and the true value of the HDR image; determine the perceptual loss between the predicted value of the HDR image and the true value of the HDR image; determine the style loss between the predicted value of the HDR image and the true value of the HDR image; and determine the target loss between the predicted value of the HDR image and the true value of the HDR image according to the reconstruction loss, the perceptual loss and the style loss between the predicted value of the HDR image and the true value of the HDR image.
  • the above-mentioned processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image and the true value of the HDR image according to the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • Loss is the target loss
  • the L1 is the reconstruction loss
  • the Lst is the style loss
  • the Lp is the perceptual loss
  • the λs and λp are hyperparameters.
  • the above-mentioned processing unit 42 is specifically configured to: determine the compressed tone-mapping value of the predicted value of the HDR image according to a preset compressed tone-mapping function; determine the compressed tone-mapping value of the true value of the HDR image according to the compressed tone-mapping function; and determine the reconstruction loss according to the error between the compressed tone-mapping value of the true value of the HDR image and the compressed tone-mapping value of the predicted value of the HDR image.
  • the reconstruction loss is determined according to the following formula:
  • L1 = ‖ T(H) − T(GT) ‖1
  • where L1 represents the reconstruction loss, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, H is the predicted value of the HDR image output by the dynamic conversion model, GT is the true value of the HDR image, and ‖·‖1 indicates the L1 norm.
  • the above-mentioned processing unit 42 is specifically configured to: obtain the feature map of layer l of the pre-training model; determine the compressed tone-mapping value of the HDR image prediction value according to a preset compressed tone-mapping function; determine the compressed tone-mapping value of the true value of the HDR image according to the compressed tone-mapping function; determine the first feature value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the feature map of layer l; determine the second feature value corresponding to the compressed tone-mapping value of the true value of the HDR image in the feature map of layer l; and determine the perceptual loss according to the error between the first feature value and the second feature value.
  • the perceptual loss is determined according to the following formula:
  • Lp = Σl (1 / (Cl·Hl·Wl)) · ‖ φl(T(H)) − φl(T(GT)) ‖1
  • where Lp represents the perceptual loss, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, H is the predicted value of the HDR image output by the dynamic conversion model, GT is the true value of the HDR image, ‖·‖1 indicates the L1 norm, and φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl.
  • the above-mentioned processing unit 42 is specifically configured to: obtain the Gram matrix of the l-th layer feature map of the pre-training model; determine the compressed tone-mapping value of the predicted value of the HDR image according to a preset compressed tone-mapping function; determine the compressed tone-mapping value of the true value of the HDR image according to the compressed tone-mapping function; determine the first element value corresponding to the compressed tone-mapping value of the predicted value of the HDR image in the Gram matrix; determine the second element value corresponding to the compressed tone-mapping value of the true value of the HDR image in the Gram matrix; and determine the style loss according to the error between the first element value and the second element value.
  • the style loss is determined according to the following formula:
  • Lst = Σl (1 / Kl) · ‖ G(φl(T(H))) − G(φl(T(GT))) ‖1
  • where Lst represents the style loss, G(·) is the Gram matrix of the l-th layer feature of the pre-training model, T(x) denotes the compressed tone-mapping value of x (x being H or GT), computed by the preset compressed tone-mapping function with a preset parameter, the H is the predicted value of the HDR image output by the dynamic conversion model, the GT is the true value of the HDR image, ‖·‖1 indicates the L1 norm, φl represents the feature map of layer l of the pre-training model, with a size of Cl × Hl × Wl, and Kl is Cl·Hl·Wl.
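A minimal training-step sketch tying the above together is given below; the optimizer, learning rate, and the reuse of the DynamicConversionModel and total_loss sketches from earlier in this section are illustrative assumptions, not details specified by the application.

```python
import torch

# assumes DynamicConversionModel and total_loss from the earlier sketches are in scope
model = DynamicConversionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # optimizer and lr are assumptions

def train_step(ldr_batch: torch.Tensor, hdr_gt_batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    hdr_pred = model(ldr_batch)                  # HDR image prediction value of the LDR training image
    loss = total_loss(hdr_pred, hdr_gt_batch)    # target loss between prediction and ground truth
    loss.backward()
    optimizer.step()
    return loss.item()
```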
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 40 shown in FIG. 11 may correspond to the corresponding subject performing the model training method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 40 are respectively intended to implement the corresponding processes in the model training method; for the sake of brevity, they are not repeated here.
  • the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
  • each step of the method embodiments in the embodiments of the present application can be completed by an integrated logic circuit of hardware in the processor and/or instructions in the form of software, and the steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in the decoding processor.
  • the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the image processing device described in the embodiment of the present application, or a decoder, or a model training device, and the electronic device 30 may include:
  • a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and transmit the program code 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
  • the processor 32 may include, but is not limited to:
  • a digital signal processor (DSP);
  • an application-specific integrated circuit (ASIC);
  • a field-programmable gate array (FPGA).
  • the memory 33 includes but is not limited to:
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • for example, the RAM may be: static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the present application.
  • the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • a transceiver 33 the transceiver 33 can be connected to the processor 32 or the memory 33 .
  • the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include antennas, and the number of antennas may be one or more.
  • bus system includes not only a data bus, but also a power bus, a control bus and a status signal bus.
  • the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transferred from a website, computer, server, or data center by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)), etc. .
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

Abstract

The present application provides an image decoding method and apparatus, an image processing method and apparatus, and a device. The method comprises: decoding a code stream to obtain a reconstructed image, and inputting the reconstructed image into a dynamic conversion model for dynamic conversion, so as to obtain a high dynamic range (HDR) image of the reconstructed image, wherein the dynamic conversion model comprises N encoding modules and N decoding modules; an ith encoding module is in skip connection with a (N-i+1)th decoding module; the ith encoding module is configured to perform feature extraction on a (i-1)th piece of first feature information outputted by a (i-1)th encoding module, so as to obtain an ith piece of first feature information of the reconstructed image, and the (N-i+1)th decoding module is configured to perform feature extraction on the (i-1)th piece of first feature information and a (N-i)th piece of second feature information to obtain a (N-i+1)th piece of second feature information, and the HDR image is determined according to the second feature information outputted by a last decoding module. The present application converts an image having a low dynamic range into an image having a high dynamic range by using the dynamic conversion model, and thus, the process is simple and the cost is low.

Description

图像解码及处理方法、装置及设备Image decoding and processing method, device and equipment 技术领域technical field
本申请涉及图像处理技术领域,尤其涉及一种图像解码及处理方法、装置及设备。The present application relates to the technical field of image processing, and in particular to an image decoding and processing method, device and equipment.
背景技术Background technique
动态范围是用于定义相机可以在多大范围内捕捉图像的影调细节的术语,通常指由最低值到最高溢出值之间的范围。简单地说,它描述的是相机在单帧内可以记录的最亮和最暗影调之间的比率。动态范围越大,则可能保留高光区和阴影区的信息。Dynamic range is a term used to define how wide a range of tonal detail a camera can capture in an image, usually the range from the lowest value to the highest overflow value. Simply put, it describes the ratio between the brightest and darkest tones a camera can record in a single frame. The larger the dynamic range, the more likely it is to preserve information in highlights and shadows.
但是,高动态范围图像的获取相对复杂,在数据采集、传输、存储以及显示等方面对于硬件和算法也提出了更高的要求,目前将低动态范围图像转换为高动态范围图像的转换成本高。However, the acquisition of high dynamic range images is relatively complicated, and higher requirements are placed on hardware and algorithms in terms of data acquisition, transmission, storage, and display. At present, the conversion cost of converting low dynamic range images to high dynamic range images is high. .
发明内容Contents of the invention
本申请实施例提供了一种图像解码及处理方法、装置及设备,以降低将低动态范围图像转换为高动态范围图像的成本。Embodiments of the present application provide an image decoding and processing method, device, and equipment to reduce the cost of converting a low dynamic range image into a high dynamic range image.
第一方面,本申请实施例提供一种图像解码方法,包括:In the first aspect, the embodiment of the present application provides an image decoding method, including:
解码码流,得到重建图像;Decode the code stream to obtain the reconstructed image;
将重建图像输入动态转换模型进行动态转换,得到重建图像的高动态范围HDR图像;Input the reconstructed image into the dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image;
其中,动态转换模型包括:串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,第i个编码模块用于对第i-1个编码模块输出的第i-1个第一特征信息进行特征提取,得到重建图像的第i个第一特征信息,第N-i+1个解码模块用于对第i-1个第一特征信息和重建图像的第N-i个第二特征信息进行特征提取,得到重建图像的第N-i+1个第二特征信息,重建图像的HDR图像是根据N个解码模块中最后一个解码模块输出的第二特征信息确定的,i为小于或等于N的正整数,N为正整数。Wherein, the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules , and the i-th coding module is skip-connected to the N-i+1-th decoding module, and the i-th coding module is used to perform feature extraction on the i-1-th first feature information output by the i-1-th coding module, Obtain the i-th first feature information of the reconstructed image, and the N-i+1 decoding module is used to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the reconstructed image, and obtain the reconstruction The N-i+1th second feature information of the image, the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module in the N decoding modules, i is a positive integer less than or equal to N, and N is a positive integer.
第二方面,本申请提供了一种图像处理方法,包括:In a second aspect, the present application provides an image processing method, including:
获取待处理的低动态范围LDR图像;Obtain the low dynamic range LDR image to be processed;
将LDR图像输入动态转换模型进行动态转换,得到LDR图像的高动态范围HDR图像;Input the LDR image into the dynamic conversion model for dynamic conversion, and obtain the high dynamic range HDR image of the LDR image;
其中,动态转换模型包括:串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,第i个编码模块用于对第i-1个编码模块输出的第i-1个第一特征信息进行特征提取,得到LDR图像的第i个第一特征信息,第N-i+1个解码模块用于对第i-1个第一特征信息和LDR图像的第N-i个第二特征信息进行特征提取,得到LDR图像的第N-i+1个第二特征信息,LDR图像的HDR图像是根据N个解码模块中最后一个解码模块输出的第二特征信息确定的,i为小于或等于N的正整数,N为正整数。Wherein, the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules , and the i-th coding module is skip-connected to the N-i+1-th decoding module, and the i-th coding module is used to perform feature extraction on the i-1-th first feature information output by the i-1-th coding module, Obtain the i-th first feature information of the LDR image, and the N-i+1 decoding module is used to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the LDR image, and obtain the LDR The N-i+1th second feature information of the image, the HDR image of the LDR image is determined according to the second feature information output by the last decoding module in the N decoding modules, i is a positive integer less than or equal to N, and N is a positive integer.
第三方面,本申请提供了一种模型训练方法,包括:In a third aspect, the present application provides a model training method, including:
获取低动态范围LDR训练图像和LDR训练图像的高动态范围HDR图像真值;Obtain the true value of the low dynamic range LDR training image and the high dynamic range HDR image of the LDR training image;
将LDR训练图像输入动态转换模型,通过第i个编码模块对第i-1个第一特征信息进行特征提取,得到LDR训练图像的第i个第一特征信息,其中,动态转换模型包括串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,i为小于或等于N的正整数,N为正整数;Input the LDR training image into the dynamic conversion model, and extract the i-1 first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image, wherein the dynamic conversion model includes serial connection The N encoding modules of N encoding modules and N decoding modules connected in series, the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is connected to the first N-i+1 decoding modules are skipped and connected, i is a positive integer less than or equal to N, and N is a positive integer;
Perform feature extraction on the (i-1)th first feature information and the (N-i)th second feature information of the LDR training image through the (N-i+1)th decoding module to obtain the (N-i+1)th second feature information of the LDR training image;
根据N个解码模块中最后一个解码模块输出的LDR训练图像的第二特征信息,确定LDR训练图像的HDR图像预测值;According to the second feature information of the LDR training image output by the last decoding module in the N decoding modules, determine the HDR image prediction value of the LDR training image;
确定LDR训练图像的HDR图像预测值和LDR训练图像的HDR图像真值之间的损失,并根据损失对动态转换模型进行训练。Determine the loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image, and train the dynamic transformation model according to the loss.
第四方面,提供了一种图像解码装置,用于执行上述第一方面或其各实现方式中的方法。具体地,该图像解码装置包括用于执行上述第一方面或其各实现方式中的方法的功能单元。In a fourth aspect, an image decoding device is provided, configured to execute the method in the above first aspect or its implementations. Specifically, the image decoding device includes a functional unit configured to execute the method in the above first aspect or each implementation manner thereof.
第五方面,提供了一种解码器,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第一方面或其各实现方式中的方法。In a fifth aspect, a decoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
第六方面,提供了一种图像处理装置,用于执行上述第二方面或其各实现方式中的方法。具体地,该装置包括用于执行上述第二方面或其各实现方式中的方法的功能单元。In a sixth aspect, an image processing device is provided, configured to execute the method in the above-mentioned second aspect or various implementations thereof. Specifically, the device includes a functional unit configured to execute the method in the above second aspect or each implementation manner thereof.
第七方面,提供了一种图像处理设备,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第二方面或其各实现方式中的方法。In a seventh aspect, an image processing device is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
第八方面,提供了一种模型训练装置,用于执行上述第三方面或其各实现方式中的方法。具体地,该模型训练装置包括用于执行上述第三方面或其各实现方式中的方法的功能单元。In an eighth aspect, a model training device is provided, configured to execute the method in the above third aspect or various implementations thereof. Specifically, the model training device includes a functional unit for executing the method in the above third aspect or its various implementations.
第九方面,提供了一种模型训练设备,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第三方面或其各实现方式中的方法。In a ninth aspect, a model training device is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above third aspect or its various implementations.
第十方面,提供了一种芯片,用于实现上述第一方面至第三方面中的任一方面或其各实现方式中的方法。具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的设备执行如上述第一方面至第三方面中的任一方面或其各实现方式中的方法。In a tenth aspect, a chip is provided, configured to implement any one of the foregoing first to third aspects or the method in each implementation manner thereof. Specifically, the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes any one of the above-mentioned first to third aspects or any of the implementations thereof. method.
第十一方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序使得计算机执行上述第一方面至第三方面中的任一方面或其各实现方式中的方法。In an eleventh aspect, there is provided a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to third aspects or the method in each implementation manner thereof.
第十二方面,提供了一种计算机程序产品,包括计算机程序指令,该计算机程序指令使得计算机执行上述第一方面至第三方面中的任一方面或其各实现方式中的方法。A twelfth aspect provides a computer program product, including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to third aspects or the method in each implementation manner.
第十三方面,提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第三方面中的任一方面或其各实现方式中的方法。A thirteenth aspect provides a computer program, which, when running on a computer, causes the computer to execute any one of the above first to third aspects or the method in each implementation manner.
Based on the above technical solutions, the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series; the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)th decoding module. The i-th encoding module is configured to perform feature extraction on the (i-1)th first feature information output by the (i-1)th encoding module to obtain the i-th first feature information of the reconstructed image; the (N-i+1)th decoding module is configured to perform feature extraction on the (i-1)th first feature information and the (N-i)th second feature information of the reconstructed image to obtain the (N-i+1)th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module among the N decoding modules; i is a positive integer less than or equal to N, and N is a positive integer. Using this dynamic conversion model, an LDR image can be converted into an HDR image, so that HDR image conversion is achieved without increasing the cost of data acquisition, encoding, transmission, storage, and the like, thereby improving the efficiency of HDR image conversion and reducing the cost of obtaining HDR images.
附图说明Description of drawings
图1为本申请实施例涉及的一种视频编解码系统的示意性框图;FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application;
图2是本申请实施例提供的视频编码器的示意性框图;Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application;
图3是本申请实施例提供的视频解码器的示意性框图;Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application;
图4为本申请一实施例提供的动态转换模型训练方法流程示意图;FIG. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application;
图5A为本申请一实施例涉及的动态转换模型的一种网络示意图;FIG. 5A is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application;
图5B为本申请一实施例涉及的卷积块的一种网络示意图;FIG. 5B is a schematic network diagram of a convolution block involved in an embodiment of the present application;
图5C为本申请一实施例涉及的动态转换模型的一种网络示意图;FIG. 5C is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application;
图5D为本申请一实施例涉及的卷积注意力模块的一种网络示意图;FIG. 5D is a network diagram of a convolutional attention module involved in an embodiment of the present application;
图5E为本申请一实施例涉及的通道注意力模块的一种网络示意图;FIG. 5E is a network diagram of a channel attention module involved in an embodiment of the present application;
图5F为本申请一实施例涉及的空间注意力模块的一种网络示意图;FIG. 5F is a network schematic diagram of a spatial attention module involved in an embodiment of the present application;
图5G为本申请一实施例涉及的动态转换模型的一种网络示意图;FIG. 5G is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application;
图6为本申请一实施例提供的图像解码方法的流程示意图;FIG. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application;
图7为本申请一实施例涉及的空间注意力模块的一种网络示意图;FIG. 7 is a network diagram of a spatial attention module involved in an embodiment of the present application;
图8为本申请一实施例提供的图像处理方法的流程示意图;FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
图9是本申请实施例提供的图像解码装置的示意性框图;FIG. 9 is a schematic block diagram of an image decoding device provided by an embodiment of the present application;
图10是本申请实施例提供的图像处理装置的示意性框图;FIG. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application;
图11是本申请实施例提供的模型训练装置的示意性框图;Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application;
图12是本申请实施例提供的电子设备的示意性框图。Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
具体实施方式detailed description
本申请可应用于点云上采样技术领域,例如可以应用于点云压缩技术领域。The present application can be applied to the technical field of point cloud upsampling, for example, can be applied to the technical field of point cloud compression.
本申请可应用于图像编解码领域、视频编解码领域、硬件视频编解码领域、专用电路视频编解码领域、实时视频编解码领域等。例如,本申请的方案可结合至音视频编码标准(audio video coding standard,简称AVS),例如,H.264/音视频编码(audio video coding,简称AVC)标准,H.265/高效视频编码(high efficiency video coding,简称HEVC)标准以及H.266/多功能视频编码(versatile video coding,简称VVC)标准。或者,本申请的方案可结合至其它专属或行业标准而操作,所述标准包含ITU-TH.261、ISO/IECMPEG-1Visual、ITU-TH.262或ISO/IECMPEG-2Visual、ITU-TH.263、ISO/IECMPEG-4Visual,ITU-TH.264(还称为ISO/IECMPEG-4AVC),包含可分级视频编解码(SVC)及多视图视频编解码(MVC)扩展。应理解,本申请的技术不限于任何特定编解码标准或技术。The application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc. For example, the solution of the present application can be combined with audio and video coding standards (audio video coding standard, referred to as AVS), for example, H.264/audio video coding (audio video coding, referred to as AVC) standard, H.265/high efficiency video coding ( High efficiency video coding (HEVC for short) standard and H.266/versatile video coding (VVC for short) standard. Alternatively, the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-TH.261, ISO/IECMPEG-1Visual, ITU-TH.262 or ISO/IECMPEG-2Visual, ITU-TH.263 , ISO/IECMPEG-4Visual, ITU-TH.264 (also known as ISO/IECMPEG-4AVC), including scalable video codec (SVC) and multi-view video codec (MVC) extensions. It should be understood that the techniques of this application are not limited to any particular codec standard or technology.
为了便于理解,首先结合图1对本申请实施例涉及的视频编解码系统进行介绍。For ease of understanding, the video codec system involved in the embodiment of the present application is first introduced with reference to FIG. 1 .
图1为本申请实施例涉及的一种视频编解码系统的示意性框图。需要说明的是,图1只是一种示例,本申请实施例的视频编解码系统包括但不限于图1所示。如图1所示,该视频编解码系统100包含编码设备110和解码设备120。其中编码设备用于对视频数据进行编码(可以理解成压缩)产生码流,并将码流传输给解码设备。解码设备对编码设备编码产生的码流进行解码,得到解码后的视频数据。FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 . As shown in FIG. 1 , the video codec system 100 includes an encoding device 110 and a decoding device 120 . The encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
本申请实施例的编码设备110可以理解为具有视频编码功能的设备,解码设备120可以理解为具有视频解码功能的设备,即本申请实施例对编码设备110和解码设备120包括更广泛的装置,例如包含智能手机、台式计算机、移动计算装置、笔记本(例如,膝上型)计算机、平板计算机、机顶盒、电视、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机等。The encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function, and the decoding device 120 can be understood as a device having a video decoding function, that is, the embodiment of the present application includes a wider range of devices for the encoding device 110 and the decoding device 120, Examples include smartphones, desktop computers, mobile computing devices, notebook (eg, laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
在一些实施例中,编码设备110可以经由信道130将编码后的视频数据(如码流)传输给解码设备120。信道130可以包括能够将编码后的视频数据从编码设备110传输到解码设备120的一个或多个媒体和/或装置。In some embodiments, the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 . Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
在一个实例中,信道130包括使编码设备110能够实时地将编码后的视频数据直接发射到解码设备120的一个或多个通信媒体。在此实例中,编码设备110可根据通信标准来调制编码后的视频数据,且将调制后的视频数据发射到解码设备120。其中通信媒体包含无线通信媒体,例如射频频谱,可选的,通信媒体还可以包含有线通信媒体,例如一根或多根物理传输线。In one example, channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time. In this example, encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 . The communication medium includes a wireless communication medium, such as a radio frequency spectrum. Optionally, the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
在另一实例中,信道130包括存储介质,该存储介质可以存储编码设备110编码后的视频数据。存储介质包含多种本地存取式数据存储介质,例如光盘、DVD、快闪存储器等。在该实例中,解码设备120可从该存储介质中获取编码后的视频数据。In another example, the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 . The storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like. In this example, the decoding device 120 may acquire encoded video data from the storage medium.
在另一实例中,信道130可包含存储服务器,该存储服务器可以存储编码设备110编码后的视频数据。在此实例中,解码设备120可以从该存储服务器中下载存储的编码后的视频数据。可选的,该存储服务器可以存储编码后的视频数据且可以将该编码后的视频数据发射到解码设备120,例如web服务器(例如,用于网站)、文件传送协议(FTP)服务器等。In another example, channel 130 may include a storage server that may store video data encoded by encoding device 110 . In this instance, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120, such as a web server (eg, for a website), a file transfer protocol (FTP) server, and the like.
一些实施例中,编码设备110包含视频编码器112及输出接口113。其中,输出接口113可以包含调制器/解调器(调制解调器)和/或发射器。In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113 . Wherein, the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
在一些实施例中,编码设备110除了包括视频编码器112和输入接口113外,还可以包括视频源111。In some embodiments, the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the input interface 113 .
The video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, and a computer graphics system, where the video input interface is configured to receive video data from a video content provider and the computer graphics system is configured to generate video data.
视频编码器112对来自视频源111的视频数据进行编码,产生码流。视频数据可包括一个或多个图像(picture)或图像序列(sequence of pictures)。码流以比特流的形式包含了图像或图像序列的编码信息。编码信息可以包含编码图像数据及相关联数据。相关联数据可包含序列参数集(sequence parameter set,简称SPS)、图像参数集(picture parameter set,简称PPS)及其它语法结构。SPS可含有应用于一个或多个序列的参数。PPS可含有应用于一个或多个图像的参数。语法结构是指码流中以指定次序排列的零个或多个语法元素的集合。The video encoder 112 encodes the video data from the video source 111 to generate a code stream. Video data may include one or more pictures or a sequence of pictures. The code stream contains the encoding information of an image or image sequence in the form of a bit stream. Encoding information may include encoded image data and associated data. The associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures. An SPS may contain parameters that apply to one or more sequences. A PPS may contain parameters applied to one or more images. The syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
视频编码器112经由输出接口113将编码后的视频数据直接传输到解码设备120。编码后的视频数据还可存储于存储介质或存储服务器上,以供解码设备120后续读取。The video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 . The encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
在一些实施例中,解码设备120包含输入接口121和视频解码器122。In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122 .
在一些实施例中,解码设备120除包括输入接口121和视频解码器122外,还可以包括显示装置123。In some embodiments, the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
其中,输入接口121包含接收器及/或调制解调器。输入接口121可通过信道130接收编码后的视频数据。Wherein, the input interface 121 includes a receiver and/or a modem. The input interface 121 can receive encoded video data through the channel 130 .
视频解码器122用于对编码后的视频数据进行解码,得到解码后的视频数据,并将解码后的视频数据传输至显示装置123。The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
显示装置123显示解码后的视频数据。显示装置123可与解码设备120整合或在解码设备120外部。显示装置123可包括多种显示装置,例如液晶显示器(LCD)、等离子体显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。The display device 123 displays the decoded video data. The display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 . The display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
此外,图1仅为实例,本申请实施例的技术方案不限于图1,例如本申请的技术还可以应用于单侧的视频编码或单侧的视频解码。In addition, FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 . For example, the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding.
下面对本申请实施例涉及的视频编码器进行介绍。The video encoder involved in the embodiment of the present application is introduced below.
图2是本申请实施例提供的视频编码器的示意性框图。应理解,该视频编码器200可用于对图像进行有损压缩(lossy compression),也可用于对图像进行无损压缩(lossless compression)。该无损压缩可以是视觉无损压缩(visually lossless compression),也可以是数学无损压缩(mathematically lossless compression)。Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images. The lossless compression may be visually lossless compression or mathematically lossless compression.
The video encoder 200 may be applied to image data in a luma-chroma (YCbCr, YUV) format. For example, the YUV sampling ratio may be 4:2:0, 4:2:2, or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chroma, Cr (V) denotes red chroma, and U and V are chroma (Chroma) components describing color and saturation. For example, in terms of color format, 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr); 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr); and 4:4:4 means full sampling (YYYYCbCrCbCrCbCrCbCr).
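For ease of understanding only, the following simplified Python sketch (not part of the described embodiments; the function name and interface are purely illustrative) shows how the luma and chroma plane sizes follow from the sampling ratio:

```python
def yuv_plane_shapes(width, height, subsampling="4:2:0"):
    # Illustrative only: derive luma/chroma plane sizes from the YUV sampling ratio.
    if subsampling == "4:4:4":        # full-resolution chroma
        cw, ch = width, height
    elif subsampling == "4:2:2":      # chroma halved horizontally
        cw, ch = width // 2, height
    elif subsampling == "4:2:0":      # chroma halved horizontally and vertically
        cw, ch = width // 2, height // 2
    else:
        raise ValueError("unsupported subsampling ratio")
    return (width, height), (cw, ch), (cw, ch)   # Y, Cb, Cr plane sizes

print(yuv_plane_shapes(1920, 1080, "4:2:0"))  # ((1920, 1080), (960, 540), (960, 540))
```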
例如,该视频编码器200读取视频数据,针对视频数据中的每帧图像,将一帧图像划分成若干个编码树单元(coding tree unit,CTU)、“最大编码单元”(Largest Coding unit,简称LCU)或“编码树型块”(coding tree block,简称CTB)。每一个CTU可以与图像内的具有相等大小的像素块相关联。每一像素可对应一个亮度(luminance或luma)采样及两个色度(chrominance或chroma)采样。因此,每一个CTU可与一个亮度采样块及两个色度采样块相关联。一个CTU大小例如为128×128、64×64、32×32等。一个CTU又可以继续被划分成若干个编码单元(Coding Unit,CU)进行编码,CU可以为矩形块也可以为方形块。CU可以进一步划分为预测单元(prediction Unit,简称PU)和变换单元(transform unit,简称TU),进而使得编码、预测、变换分离,处理的时候更灵活。在一种示例中,CTU以四叉树方式划分为CU,CU以四叉树方式划分为TU、PU。For example, the video encoder 200 reads video data, and for each frame of image in the video data, divides a frame of image into several coding tree units (coding tree unit, CTU), "largest coding unit" (Largest Coding unit, LCU for short) or "coding tree block" (coding tree block, CTB for short). Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples. Thus, each CTU may be associated with one block of luma samples and two blocks of chroma samples. A CTU size is, for example, 128×128, 64×64, 32×32 and so on. A CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block. The CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that coding, prediction, and transformation are separated, and processing is more flexible. In an example, a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
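As a simplified illustration of the quadtree partitioning mentioned above (not a description of any particular standard's partitioning rules), the following Python sketch recursively splits a CTU into CUs; the should_split decision function is a placeholder, whereas a real encoder typically decides splits by rate-distortion optimization.

```python
def quadtree_split(x, y, size, min_size, should_split):
    # Illustrative recursive CTU -> CU quadtree partitioning.
    # should_split(x, y, size) is a placeholder decision function.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]                      # leaf CU
    half = size // 2
    cus = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        cus.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return cus

# Example: split a 64x64 CTU once, yielding four 32x32 CUs.
print(quadtree_split(0, 0, 64, 8, lambda x, y, s: s > 32))
```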
视频编码器及视频解码器可支持各种PU大小。假定特定CU的大小为2N×2N,视频编码器及视频解码器可支持2N×2N或N×N的PU大小以用于帧内预测,且支持2N×2N、2N×N、N×2N、N×N或类似大小的对称PU以用于帧间预测。视频编码器及视频解码器还可支持2N×nU、2N×nD、nL×2N及nR×2N的不对称PU以用于帧间预测。The video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders may support 2N×2N or N×N PU sizes for intra prediction, and support 2N×2N, 2N×N, N×2N, NxN or similarly sized symmetric PUs for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter prediction.
在一些实施例中,如图2所示,该视频编码器200可包括:预测单元210、残差单元220、变换/量化单元230、反变换/量化单元240、重建单元250、环路滤波单元260、解码图像缓存270和熵编码 单元280。需要说明的是,视频编码器200可包含更多、更少或不同的功能组件。In some embodiments, as shown in FIG. 2 , the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, and a loop filter unit 260. Decoded image cache 270 and entropy encoding unit 280. It should be noted that the video encoder 200 may include more, less or different functional components.
可选的,在本申请中,当前块(current block)可以称为当前编码单元(CU)或当前预测单元(PU)等。预测块也可称为预测待编码块或图像预测块,重建待编码块也可称为重建块或图像重建待编码块。Optionally, in this application, the current block (current block) may be called a current coding unit (CU) or a current prediction unit (PU). A predicted block may also be referred to as a predicted block to be encoded or an image predicted block, and a reconstructed block to be encoded may also be referred to as a reconstructed block or an image reconstructed block to be encoded.
在一些实施例中,预测单元210包括帧间预测单元211和帧内预测单元212。由于视频的一个帧中的相邻像素之间存在很强的相关性,在视频编解码技术中使用帧内预测的方法消除相邻像素之间的空间冗余。由于视频中的相邻帧之间存在着很强的相似性,在视频编解码技术中使用帧间预测方法消除相邻帧之间的时间冗余,从而提高编码效率。In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra prediction unit 212 . Because there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Due to the strong similarity between adjacent frames in video, the inter-frame prediction method is used in video coding and decoding technology to eliminate time redundancy between adjacent frames, thereby improving coding efficiency.
The inter prediction unit 211 may be used for inter prediction. Inter prediction may refer to image information of different frames: it uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, so as to eliminate temporal redundancy. Frames used for inter prediction may be P frames and/or B frames, where a P frame is a forward-predicted frame and a B frame is a bi-directionally predicted frame. The motion information includes the reference frame list in which the reference frame is located, the reference frame index, and a motion vector. The motion vector may have integer-pixel or sub-pixel precision; if it has sub-pixel precision, interpolation filtering needs to be applied in the reference frame to produce the required sub-pixel block. Here, the integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called a reference block. Some techniques use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block. Further processing the reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
The most commonly used inter prediction methods at present include the geometric partitioning mode (GPM) in the VVC video coding standard and angular weighted prediction (AWP) in the AVS3 video coding standard. These two inter prediction modes share some common principles.
帧内预测单元212只参考同一帧图像的信息,预测当前码待编码块内的像素信息,用于消除空间冗余。帧内预测所使用的帧可以为I帧。The intra-frame prediction unit 212 only refers to the information of the same frame image, and predicts the pixel information in the block to be encoded of the current code, so as to eliminate spatial redundancy. A frame used for intra prediction may be an I frame.
在一些实施例中,帧内预测方法还包括多参考行帧内预测方法(multiple reference line,MRL),MRL可以使用更多的参考像素从而提高编码效率。In some embodiments, the intra prediction method further includes a multiple reference line intra prediction method (multiple reference line, MRL). MRL can use more reference pixels to improve coding efficiency.
Intra prediction has multiple prediction modes; H.264 defines 9 modes for intra prediction of 4×4 blocks. Mode 0 copies the pixels above the current block vertically into the current block as prediction values; mode 1 copies the reference pixels on the left horizontally into the current block as prediction values; mode 2 (DC) uses the average of the 8 reference samples A to D and I to L as the prediction value for all positions; and modes 3 to 8 copy the reference pixels to corresponding positions in the current block along a particular angle. Because some positions in the current block do not correspond exactly to a reference pixel, a weighted average of reference pixels, that is, interpolated sub-pixel reference samples, may be needed.
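For ease of understanding, the following simplified Python sketch illustrates modes 0, 1 and 2 for a 4×4 block; reference-sample availability checks, filtering, and the angular modes 3 to 8 are omitted, and the interface is illustrative only.

```python
import numpy as np

def intra_predict_4x4(top, left, mode):
    # top:  4 reference samples above the block (A..D)
    # left: 4 reference samples to the left of the block (I..L)
    top, left = np.asarray(top), np.asarray(left)
    if mode == 0:    # vertical: copy the top row downwards
        return np.tile(top, (4, 1))
    if mode == 1:    # horizontal: copy the left column rightwards
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:    # DC: average of the 8 reference samples A..D and I..L
        dc = int(round((top.sum() + left.sum()) / 8.0))
        return np.full((4, 4), dc)
    raise NotImplementedError("angular modes 3-8 omitted in this sketch")

print(intra_predict_4x4([10, 20, 30, 40], [12, 14, 16, 18], 2))
```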
HEVC使用的帧内预测模式有平面模式(Planar)、DC和33种角度模式,共35种预测模式。VVC使用的帧内模式有Planar、DC和65种角度模式,共67种预测模式。AVS3使用的帧内模式有DC、Plane、Bilinear和63种角度模式,共66种预测模式。The intra prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, a total of 35 prediction modes. The intra-frame modes used by VVC include Planar, DC and 65 angle modes, with a total of 67 prediction modes. The intra-frame modes used by AVS3 include DC, Plane, Bilinear and 63 angle modes, a total of 66 prediction modes.
需要说明的是,随着角度模式的增加,帧内预测将会更加精确,也更加符合对高清以及超高清数字视频发展的需求。It should be noted that with the increase of the angle mode, the intra-frame prediction will be more accurate, and it will be more in line with the demand for the development of high-definition and ultra-high-definition digital video.
The residual unit 220 may generate a residual block of a CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in a prediction block of a PU of the CU.
变换/量化单元230可量化变换系数。变换/量化单元230可基于与CU相关联的量化参数(QP)值来量化与CU的TU相关联的变换系数。视频编码器200可通过调整与CU相关联的QP值来调整应用于与CU相关联的变换系数的量化程度。Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
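For illustration, the following sketch shows scalar quantization controlled by a QP value. The HEVC-style relationship in which the quantization step roughly doubles for every increase of 6 in QP is used here only as an assumption for the example; the rounding and scaling details of any real codec are omitted.

```python
import numpy as np

def quantization_step(qp):
    # Assumed HEVC-style mapping: the step size approximately doubles every 6 QP values.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    return np.round(np.asarray(coeffs) / quantization_step(qp)).astype(np.int32)

def dequantize(levels, qp):
    return np.asarray(levels) * quantization_step(qp)

levels = quantize([100.0, -37.5, 8.0], qp=22)
print(levels, dequantize(levels, qp=22))  # a larger QP gives coarser levels and larger reconstruction error
```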
反变换/量化单元240可分别将逆量化及逆变换应用于量化后的变换系数,以从量化后的变换系数重建残差块。Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
重建单元250可将重建后的残差块的采样加到预测单元210产生的一个或多个预测块的对应采样,以产生与TU相关联的重建待编码块。通过此方式重建CU的每一个TU的采样块,视频编码器200可重建CU的像素块。The reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed block to be encoded associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
环路滤波单元260可执行消块滤波操作以减少与CU相关联的像素块的块效应。Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
在一些实施例中,环路滤波单元260包括去块滤波单元、样点自适应补偿SAO单元、自适应环路滤波ALF单元。In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit, a sample point adaptive compensation SAO unit, and an adaptive loop filtering ALF unit.
解码图像缓存270可存储重建后的像素块。帧间预测单元211可使用含有重建后的像素块的参考 图像来对其它图像的PU执行帧间预测。另外,帧内预测单元212可使用解码图像缓存270中的重建后的像素块来对在与CU相同的图像中的其它PU执行帧内预测。The decoded image buffer 270 may store reconstructed pixel blocks. Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures. In addition, intra prediction unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
熵编码单元280可接收来自变换/量化单元230的量化后的变换系数。熵编码单元280可对量化后的变换系数执行一个或多个熵编码操作以产生熵编码后的数据。 Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
本申请涉及的视频编码的基本流程如下:在编码端,将当前图像划分成块,针对当前块,预测单元210使用帧内预测或帧间预测产生当前块的预测块。残差单元220可基于预测块与当前块的原始块计算残差块,即预测块和当前块的原始块的差值,该残差块也可称为残差信息。该残差块经由变换/量化单元230变换与量化等过程,可以去除人眼不敏感的信息,以消除视觉冗余。可选的,经过变换/量化单元230变换与量化之前的残差块可称为时域残差块,经过变换/量化单元230变换与量化之后的时域残差块可称为频率残差块或频域残差块。熵编码单元280接收到变换量化单元230输出的量化后的变换系数,可对该量化后的变换系数进行熵编码,输出码流。例如,熵编码单元280可根据目标上下文模型以及二进制码流的概率信息消除字符冗余。The basic flow of video coding involved in this application is as follows: at the coding end, the current image is divided into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block. The residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, a difference between the predicted block and the original block of the current block, and the residual block may also be referred to as residual information. The residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy. Optionally, the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform and quantization unit 230 , may perform entropy encoding on the quantized transform coefficients, and output a code stream. For example, the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
In addition, the video encoder performs inverse quantization and inverse transform on the quantized transform coefficients output by the transform/quantization unit 230 to obtain the residual block of the current block, and then adds the residual block of the current block to the prediction block of the current block to obtain the reconstructed block of the current block. As encoding proceeds, reconstructed blocks corresponding to the other blocks to be encoded in the current image are obtained, and these reconstructed blocks are stitched together to obtain a reconstructed image of the current image. Since errors are introduced during encoding, the reconstructed image is filtered to reduce them, for example by using ALF, so as to reduce the difference between the pixel values in the reconstructed image and the original pixel values in the current image. The filtered reconstructed image is stored in the decoded image buffer 270 and can serve as a reference frame for inter prediction of subsequent frames.
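The following Python sketch summarizes this per-block hybrid coding loop at a very high level. Every callable (predict, transform, quantize, entropy_encode and their inverses) is an illustrative placeholder rather than a real codec API, and mode signalling, block partitioning and loop filtering are omitted.

```python
def encode_frame(blocks, get_original, predict, transform, quantize,
                 dequantize, inverse_transform, entropy_encode):
    # All callables are illustrative placeholders, not a real codec API.
    bitstream, recon = [], {}
    for blk in blocks:
        original = get_original(blk)                 # samples of the current block
        pred = predict(recon, blk)                   # intra or inter prediction block
        residual = original - pred                   # residual = original - prediction
        coeffs = quantize(transform(residual))       # transform then quantize (the lossy step)
        bitstream.append(entropy_encode(coeffs))     # entropy-code the quantized coefficients
        # Mirror the decoder so that later predictions use reconstructed, not original, samples.
        recon[blk] = pred + inverse_transform(dequantize(coeffs))
    return bitstream, recon
```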
It should be noted that the block partition information determined by the encoder, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, and the like, is carried in the code stream when necessary. By parsing the code stream and analyzing the available information, the decoder determines the same block partition information and the same prediction, transform, quantization, entropy coding, and loop filtering mode or parameter information as the encoder, thereby ensuring that the decoded image obtained at the encoding end is identical to the decoded image obtained at the decoding end.
图3是本申请实施例提供的视频解码器的示意性框图。Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
如图3所示,视频解码器300包含:熵解码单元310、预测单元320、反量化/变换单元330、重建单元340、环路滤波单元350及解码图像缓存360。需要说明的是,视频解码器300可包含更多、更少或不同的功能组件。As shown in FIG. 3 , the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
视频解码器300可接收码流。熵解码单元310可解析码流以从码流提取语法元素。作为解析码流的一部分,熵解码单元310可解析码流中的经熵编码后的语法元素。预测单元320、反量化/变换单元330、重建单元340及环路滤波单元350可根据从码流中提取的语法元素来解码视频数据,即产生解码后的视频数据。The video decoder 300 can receive code streams. The entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream. The prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
在一些实施例中,预测单元320包括帧内预测单元321和帧间预测单元322。In some embodiments, the prediction unit 320 includes an intra prediction unit 321 and an inter prediction unit 322 .
帧内预测单元321可执行帧内预测以产生PU的预测块。帧内预测单元321可使用帧内预测模式以基于空间相邻PU的像素块来产生PU的预测块。帧内预测单元321还可根据从码流解析的一个或多个语法元素来确定PU的帧内预测模式。Intra prediction unit 321 may perform intra prediction to generate a predictive block for a PU. Intra prediction unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. Intra prediction unit 321 may also determine an intra prediction mode for a PU from one or more syntax elements parsed from a codestream.
帧间预测单元322可根据从码流解析的语法元素来构造第一参考图像列表(列表0)及第二参考图像列表(列表1)。此外,如果PU使用帧间预测编码,则熵解码单元310可解析PU的运动信息。帧间预测单元322可根据PU的运动信息来确定PU的一个或多个参考块。帧间预测单元322可根据PU的一个或多个参考块来产生PU的预测块。The inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
反量化/变换单元330可逆量化(即,解量化)与TU相关联的变换系数。反量化/变换单元330可使用与TU的CU相关联的QP值来确定量化程度。Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
在逆量化变换系数之后,反量化/变换单元330可将一个或多个逆变换应用于逆量化变换系数,以便产生与TU相关联的残差块。After inverse quantizing the transform coefficients, inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
重建单元340使用与CU的TU相关联的残差块及CU的PU的预测块以重建CU的像素块。例如,重建单元340可将残差块的采样加到预测块的对应采样以重建CU的像素块,得到重建待编码块。 Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU, and obtain the reconstructed block to be encoded.
环路滤波单元350可执行消块滤波操作以减少与CU相关联的像素块的块效应。Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
在一些实施例中,环路滤波单元350包括去块滤波单元、样点自适应补偿SAO单元、自适应环 路滤波ALF单元。In some embodiments, the loop filtering unit 350 includes a deblocking filtering unit, a sample point adaptive compensation SAO unit, and an adaptive loop filtering ALF unit.
视频解码器300可将CU的重建图像存储于解码图像缓存360中。视频解码器300可将解码图像缓存360中的重建图像作为参考图像用于后续预测,或者,将重建图像传输给显示装置呈现。Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 . The video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
The basic video decoding process involved in this application is as follows: the entropy decoding unit 310 parses the code stream to obtain the prediction information, the quantized coefficient matrix, and the like of the current block; based on the prediction information, the prediction unit 320 uses intra prediction or inter prediction to generate the prediction block of the current block. The inverse quantization/transform unit 330 performs inverse quantization and inverse transform on the quantized coefficient matrix obtained from the code stream to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block. The reconstructed blocks form a reconstructed image, and the loop filter unit 350 performs loop filtering on the reconstructed image on an image basis or a block basis to obtain a decoded image. The decoded image may also be referred to as a reconstructed image; on the one hand it can be displayed by a display device, and on the other hand it can be stored in the decoded image buffer 360 to serve as a reference frame for inter prediction of subsequent frames.
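The corresponding decoding loop can be sketched in the same placeholder style (again, not a real codec API; loop filtering is reduced to a single call):

```python
def decode_frame(blocks, bitstream, entropy_decode, dequantize,
                 inverse_transform, predict, loop_filter):
    # Illustrative placeholders only: parse -> predict -> add residual -> filter.
    recon = {}
    for blk, payload in zip(blocks, bitstream):
        coeffs, pred_info = entropy_decode(payload)                 # quantized coefficients + prediction info
        pred = predict(recon, blk, pred_info)                       # intra or inter prediction block
        recon[blk] = pred + inverse_transform(dequantize(coeffs))   # reconstructed block
    return loop_filter(recon)                                       # deblocking / SAO / ALF on the reconstructed image
```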
上述是基于块的混合编码框架下的视频编解码器的基本流程,随着技术的发展,该框架或流程的一些模块或步骤可能会被优化,本申请适用于该基于块的混合编码框架下的视频编解码器的基本流程,但不限于该框架及流程。The above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the block-based hybrid coding framework. The basic process of the video codec, but not limited to the framework and process.
Real-world scenes have a very large dynamic range, spanning up to 14 orders of magnitude from a moonless late night to harsh midday sunlight. In such complex environments, the low dynamic range (LDR) images captured by conventional cameras leave some parts of the image overexposed or underexposed and cannot faithfully reproduce the real world, whereas high dynamic range (HDR) images contain the rich light, shadow, and color information of a real scene under various lighting conditions and can more completely record or present texture details in bright and dark regions that are essentially the same as in the real scene. At the same time, acquiring HDR images is relatively complex and places higher demands on hardware and algorithms in terms of data acquisition, transmission, storage, and display.
In recent years, the rapid development of deep learning, and in particular the wide application of convolutional neural networks (CNNs), has made it possible to reconstruct a high dynamic range (HDR) image covering the entire dynamic range from single or multiple exposure low dynamic range (LDR) images of the same scene.
The embodiments of the present application provide a model-based image processing method that converts an LDR image into an HDR image through a model. That is, the encoding end encodes the LDR image into a code stream and transmits it to the decoding end; after decoding the LDR image, the decoding end uses the model of the embodiments of the present application to dynamically convert the decoded LDR image to obtain an HDR image, thereby achieving HDR image conversion without increasing the cost of data acquisition, encoding, transmission, storage, and the like.
下面结合具体的实施例,对本申请实施例涉及的技术方案进行介绍。The technical solutions involved in the embodiments of the present application will be introduced below in conjunction with specific embodiments.
本申请提供的图像处理方法是使用动态转换模型将LDR图像转换为HDR图像,该动态转换模型为一段软件代码或者为具有数据处理功能的芯片。基于此,首先对动态转换模型的训练过程进行介绍。The image processing method provided in the present application converts an LDR image into an HDR image by using a dynamic conversion model, and the dynamic conversion model is a piece of software code or a chip with data processing functions. Based on this, the training process of the dynamic conversion model is firstly introduced.
图4为本申请一实施例提供的动态转换模型训练方法流程示意图,如图4所示,训练过程包括:Fig. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application. As shown in Fig. 4, the training process includes:
S401、获取LDR训练图像和LDR训练图像的HDR图像真值。S401. Acquire the LDR training image and the HDR image truth value of the LDR training image.
上述LDR训练图像为训练集中随机选取的一张LDR训练图像,该训练集中包括多张LDR训练图像,使用训练集中的LDR训练图像对动态转换模型的训练过程为迭代过程。例如,将第一张LDR训练图像输入待训练的动态转换模型中,对动态转换模型的初始参数进行一次调整,得到第一次训练过的动态转换模型。接着,将第二张LDR训练图像输入第一次训练过的动态转换模型中,对第一次训练过的动态转换模型的参数进行一次调整,得到第二次训练过的动态转换模型,参照上述方法,依次迭代,直到达到动态转换模型的训练结束条件为止。其中,动态转换模型的训练结束条件包括训练次数达到预设次数,或者损失达到预设损失。The above-mentioned LDR training image is a randomly selected LDR training image in the training set, which includes a plurality of LDR training images, and the training process of the dynamic conversion model using the LDR training images in the training set is an iterative process. For example, the first LDR training image is input into the dynamic conversion model to be trained, and the initial parameters of the dynamic conversion model are adjusted once to obtain the dynamic conversion model trained for the first time. Next, input the second LDR training image into the dynamic conversion model trained for the first time, adjust the parameters of the dynamic conversion model trained for the first time, and obtain the dynamic conversion model trained for the second time, refer to the above method, iterates in sequence until the training end condition of the dynamic conversion model is reached. Wherein, the training end condition of the dynamic conversion model includes that the number of training times reaches a preset number of times, or the loss reaches a preset loss.
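For ease of understanding, the iterative training described above can be sketched as follows (PyTorch-style pseudocode; the optimizer, learning rate, loss function and stopping thresholds are assumptions made only for this illustration and are not specified by the embodiments):

```python
import torch

def train_dynamic_conversion_model(model, training_pairs, loss_fn,
                                   max_steps=10000, loss_threshold=1e-3, lr=1e-4):
    # training_pairs yields (ldr_image, hdr_ground_truth) tensor pairs; all names are illustrative.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step, (ldr, hdr_gt) in enumerate(training_pairs, start=1):
        hdr_pred = model(ldr)                 # HDR prediction for the current LDR training image
        loss = loss_fn(hdr_pred, hdr_gt)      # loss between HDR prediction and HDR true value
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                      # one parameter adjustment per LDR training image
        if step >= max_steps or loss.item() <= loss_threshold:
            break                             # end condition: preset step count or preset loss
    return model
```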
上述动态转换模型的初始参数的确定方法包括但不限于如下几种:The methods for determining the initial parameters of the above-mentioned dynamic conversion model include but are not limited to the following:
方式一,动态转换模型的初始参数可以为预设值,或者为随机值,或者为经验值。In a first manner, the initial parameters of the dynamic conversion model may be preset values, or random values, or empirical values.
方式二,获取预训练模型在预训练时得到的预训练参数,将该预训练参数确定为动态转换模型的初始参数。The second way is to obtain the pre-training parameters obtained during the pre-training of the pre-training model, and determine the pre-training parameters as the initial parameters of the dynamic conversion model.
In the second manner, the pre-training parameters of the pre-trained model are determined as the initial parameters of the dynamic conversion model, which can reduce the number of training iterations of the dynamic conversion model and improve training accuracy.
本申请实施例对预训练模型的类型不做限制,例如预训练模型为VGG-16网络模型。The embodiment of the present application does not limit the type of the pre-training model, for example, the pre-training model is the VGG-16 network model.
由上述可知,使用训练集中的每张LDR训练图像对动态转换模型进行训练的过程一致,为了便于描述,本申请实施例以一张LDR训练图像为例,对动态转换模型的训练过程进行说明。It can be known from the above that the process of training the dynamic conversion model using each LDR training image in the training set is consistent. For the convenience of description, the embodiment of the present application uses an LDR training image as an example to illustrate the training process of the dynamic conversion model.
In some embodiments, the HDR image true value of the above LDR training image may be an HDR image generated by manually performing dynamic conversion on the LDR training image.
在一些实施例中,上述LDR训练图像的HDR图像真值可以是,使用已有的高动态转换方法将LDR训练图像转换而得到的HDR图像。In some embodiments, the true value of the HDR image of the above-mentioned LDR training image may be an HDR image obtained by converting the LDR training image using an existing high dynamic conversion method.
在一些实施例中,可以将采集的HDR图像转换为LDR图像,将转换得到的LDR图像作为LDR训练图像,将采集的HDR图像作为LDR训练图像的HDR图像真值。In some embodiments, the collected HDR image may be converted into an LDR image, the converted LDR image may be used as an LDR training image, and the collected HDR image may be used as a true value of the HDR image of the LDR training image.
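As one possible way of building such training pairs (the specific tone mapping used to derive the LDR image is not specified here, so the exposure scaling, gamma, and clipping below are only an assumption for illustration):

```python
import numpy as np

def hdr_to_ldr_training_pair(hdr, exposure=1.0, gamma=2.2):
    # hdr: float array of linear radiance values; returns (ldr_uint8, hdr) as a training pair.
    ldr = np.clip(hdr * exposure, 0.0, 1.0) ** (1.0 / gamma)   # clip highlights, apply display gamma
    ldr_uint8 = (ldr * 255.0 + 0.5).astype(np.uint8)           # quantize to 8-bit LDR
    return ldr_uint8, hdr
```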
本申请实施例对获取LDR训练图像,以及获取LDR训练图像的HDR图像真值的方式不做限制。The embodiment of the present application does not limit the way of acquiring the LDR training image and the HDR image true value of the LDR training image.
S402、将LDR训练图像输入动态转换模型进行动态转换,通过第i个编码模块对第i-1个第一特征信息进行特征提取,得到LDR训练图像的第i个第一特征信息。S402. Input the LDR training image into the dynamic conversion model for dynamic conversion, and extract the i-1 first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image.
S403、通过第N-i+1个解码模块对第i-1个第一特征信息和LDR训练图像的第N-i个第二特征信息进行特征提取,得到LDR训练图像的第N-i+1个第二特征信息。S403, perform feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR training image through the N-i+1 decoding module, and obtain the N-i+1th LDR training image Second characteristic information.
下面结合图5A对本申请实施例涉及的动态转换模型的网络结构进行介绍,需要说明的是,本申请实施例的动态转换模型的网络结构包括但不限于图5A所示的模块,还可以包括比图5A更多或更少的模块。The network structure of the dynamic conversion model involved in the embodiment of the present application will be introduced below in conjunction with FIG. 5A. It should be noted that the network structure of the dynamic conversion model in the embodiment of the present application includes but is not limited to the modules shown in FIG. Figure 5A More or less modules.
图5A为本申请一实施例涉及的动态转换模型的一种网络示意图,如图5A所示,动态转换模型可以理解为由N级编码组件和解码组件构成的自编码器网络。动态转换模型包括:串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接(skip connection),跳跃连接可以理解为第i个编码模块的输入端与第N-i+1个解码模块的输入端连接,第i个编码模块用于对第i-1个第一特征信息进行特征提取,得到LDR训练图像的第i个第一特征信息,第N-i+1个解码模块用于对第i-1个第一特征信息和LDR训练图像的第N-i个第二特征信息进行特征提取,得到LDR训练图像的第N-i+1个第二特征信息,i为小于或等于N的正整数,N为正整数。FIG. 5A is a schematic network diagram of a dynamic conversion model according to an embodiment of the present application. As shown in FIG. 5A , the dynamic conversion model can be understood as an autoencoder network composed of N-level encoding components and decoding components. The dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and The i-th encoding module is connected to the N-i+1-th decoding module by skip connection. The skip connection can be understood as the connection between the input end of the i-th encoding module and the input end of the N-i+1-th decoding module. The i-th encoding module is used to perform feature extraction on the i-1-th first feature information to obtain the i-th first feature information of the LDR training image, and the N-i+1-th decoding module is used to extract the i-1-th first feature information The first feature information and the N-i second feature information of the LDR training image are extracted to obtain the N-i+1 second feature information of the LDR training image, i is a positive integer less than or equal to N, and N is positive integer.
其中,若i等于N,则上述第N-i个第二特征信息是根据第N个编码模块输出的第N个第一特征信息确定的。Wherein, if i is equal to N, the above N-i th second feature information is determined according to the N th first feature information output by the N th encoding module.
若i小于N,则上述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的。If i is less than N, the above N-i th second feature information is determined according to the N-i th second feature information output by the N-i th decoding module.
若i等于1,则上述第i-1个第一特征信息是根据LDR训练图像确定的。If i is equal to 1, the i-1th first feature information is determined according to the LDR training image.
若i大于1,则上述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
举例说明,图5A所示,N=4,编码组件包括4个串联的编码模块,解码组件包括4个串联的解码模块,最后一个编码模块的输出与第一个解码模块的输入端连接。第一个编码模块与第四个解码模块跳跃连接,第二个编码模块与第三个解码模块跳跃连接,第三个编码模块与第二个解码模块跳跃连接,第四个编码模块与第一个解码模块跳跃连接。For example, as shown in FIG. 5A , N=4, the encoding component includes 4 serial encoding modules, the decoding component includes 4 serial decoding modules, and the output of the last encoding module is connected to the input of the first decoding module. The first coding module is connected to the fourth decoding module by skipping, the second coding module is connected to the third decoding module by skipping, the third coding module is connected to the second decoding module by skipping, and the fourth coding module is connected to the first skip connections of decoding modules.
The LDR training image is input into the dynamic conversion model to obtain the 0th first feature information. The 0th first feature information may be the LDR training image itself, or a feature map obtained by processing the LDR training image; this is not limited in the embodiments of the present application. The 0th first feature information is input into the first encoding module and the fourth decoding module respectively. The first encoding module outputs the first first feature information according to the 0th first feature information, and the first first feature information is input into the second encoding module and the third decoding module respectively. The second encoding module obtains the second first feature information according to the first first feature information, and the second first feature information is input into the third encoding module and the second decoding module respectively. The third encoding module obtains the third first feature information according to the second first feature information, and the third first feature information is input into the fourth encoding module and the first decoding module respectively. The fourth encoding module outputs the fourth first feature information according to the third first feature information, and the fourth first feature information is input into the first decoding module. The first decoding module obtains the first second feature information according to the fourth first feature information and the third first feature information, and the first second feature information is input into the second decoding module. The second decoding module obtains the second second feature information according to the first second feature information and the second first feature information, and the second second feature information is input into the third decoding module. The third decoding module obtains the third second feature information according to the second second feature information and the first first feature information, and the third second feature information is input into the fourth decoding module. The fourth decoding module obtains the fourth second feature information according to the 0th first feature information and the third second feature information.
In some embodiments, as shown in Fig. 5A, the above S403 includes: concatenating the i-1th first feature information and the N-ith second feature information of the LDR training image ("C" in Fig. 5A denotes concatenation), and inputting the concatenated feature information into the N-i+1th decoding module for feature extraction to obtain the N-i+1th second feature information of the LDR training image. For example, the fourth first feature information and the third first feature information are concatenated and input into the first decoding module to obtain the first second feature information output by the first decoding module. The first second feature information and the second first feature information are concatenated and input into the second decoding module to obtain the second second feature information output by the second decoding module. The second second feature information and the first first feature information are concatenated and input into the third decoding module to obtain the third second feature information output by the third decoding module. Similarly, the 0th first feature information and the third second feature information are concatenated and input into the fourth decoding module to obtain the fourth second feature information output by the fourth decoding module.
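To make the data flow above concrete, the following is a minimal PyTorch sketch of a four-encoder/four-decoder structure with skip connections and concatenation. It is not the exact network of Fig. 5A: the class names, the 32-channel stem, the single-convolution conv_block placeholder (a block closer to Fig. 5B is sketched further below) and the omission of the CBAM and the down-/up-sampling units described later are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # one-layer placeholder for the convolution block described in the text
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.SiLU())

class DynamicConversionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)            # produces the 0th first feature information
        self.enc = nn.ModuleList([conv_block(c_in, c_out)
                                  for c_in, c_out in [(32, 64), (64, 128), (128, 256), (256, 512)]])
        # each decoder consumes the previous output concatenated with a skip feature
        self.dec = nn.ModuleList([conv_block(c_in, c_out)
                                  for c_in, c_out in [(512 + 256, 256), (256 + 128, 128),
                                                      (128 + 64, 64), (64 + 32, 32)]])
        self.head = nn.Sequential(nn.Conv2d(32, 3, 1), nn.ReLU())  # ReLU choice is an assumption

    def forward(self, x):
        feats = [self.stem(x)]                                 # F0
        for enc in self.enc:                                   # F1..F4
            feats.append(enc(feats[-1]))
        y = feats[-1]                                          # output of the last encoding module
        for k, dec in enumerate(self.dec):                     # skip features F3, F2, F1, F0
            skip = feats[len(self.enc) - 1 - k]
            y = dec(torch.cat([y, skip], dim=1))               # concatenation ("C" in Fig. 5A)
        return self.head(y)
```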
本申请实施例对编码模块的具体网络结构不做限制。The embodiment of the present application does not limit the specific network structure of the encoding module.
In an embodiment, each of the N encoding modules includes at least one convolutional block, and the parameters of the convolutional blocks included in the N encoding modules are not all the same. For example, the feature dimension of the convolutional block included in the first encoding module is 64, that of the second encoding module is 128, that of the third encoding module is 256, and that of the fourth encoding module is 512, and so on.
本申请实施例对解码模块的具体网络结构不做限制。The embodiment of the present application does not limit the specific network structure of the decoding module.
In an embodiment, each of the N decoding modules includes at least one convolutional block, and the parameters of the convolutional blocks included in the N decoding modules are not all the same. For example, the feature dimension of the convolutional block included in the first decoding module is 256, that of the second decoding module is 128, that of the third decoding module is 64, and that of the fourth decoding module is 32, and so on.
本申请实施例中各编码模块所包括的卷积块的网络结构可以相同,也可以不同。各解码模块所包括的卷积块的网络结构可以相同,也可以不同。另外,编码模块和解码模块所包括的卷积块的网络结构可以相同,也可以不同,本申请对此不做限制。The network structures of the convolutional blocks included in the encoding modules in the embodiments of the present application may be the same or different. The network structures of the convolutional blocks included in each decoding module may be the same or different. In addition, the network structures of the convolutional blocks included in the encoding module and the decoding module may be the same or different, which is not limited in this application.
在一种可能的实现方式中,编码模块和/或解码模块所包括卷积块的网络结构,包括卷积层1、卷积层2、卷积层3和激活函数。In a possible implementation manner, the network structure of the convolutional block included in the encoding module and/or the decoding module includes a convolutional layer 1, a convolutional layer 2, a convolutional layer 3 and an activation function.
可选的,如图5B所示,卷积层1和卷积层2的卷积核为3×3,卷积层3的卷积核为1×1,激活函数为Sigmoid加权线性单元(Sigmoid Weighted Liner Unit,简称SiLU)。Optionally, as shown in FIG. 5B, the convolution kernels of convolution layer 1 and convolution layer 2 are 3×3, the convolution kernel of convolution layer 3 is 1×1, and the activation function is a Sigmoid weighted linear unit (Sigmoid Weighted Liner Unit, referred to as SiLU).
需要说明的是,上述卷积层1、卷积层2、卷积层3的卷积核大小包括但不限于如上数值,激活函数包括但不限于SiLU,例如还可以是RELU等,本申请对此不做限制。It should be noted that the sizes of the convolution kernels of the above-mentioned convolutional layer 1, convolutional layer 2, and convolutional layer 3 include but are not limited to the above values, and the activation functions include but are not limited to SiLU, such as RELU, etc. This is not limited.
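As an illustration only, a convolution block of the kind shown in Fig. 5B might be implemented as follows; the ordering of the activation relative to the three convolution layers is an assumption.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Hedged sketch of the Fig. 5B block: two 3x3 convolutions, one 1x1 convolution, SiLU activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)   # convolution layer 1
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)  # convolution layer 2
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=1)             # convolution layer 3
        self.act = nn.SiLU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        return self.act(self.conv3(x))
```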
In some embodiments, as shown in Fig. 5C, the dynamic conversion model further includes a Convolutional Block Attention Module (CBAM) located in the skip connection between the ith encoding module and the N-i+1th decoding module. The attention mechanism of the CBAM enables the dynamic conversion model to concentrate more attention on the relevant parts of the encoder-side features and less attention on irrelevant parts; in other words, the convolutional attention mechanism improves the representation capability of the dynamic conversion model by emphasizing important features and suppressing unnecessary ones, which greatly improves the efficiency of the model.
在一种可能的实现方式中,在每个编码模块与解码模块的跳跃连接中均包括一个或多个CBAM。In a possible implementation manner, one or more CBAMs are included in the skip connections between each encoding module and decoding module.
在图5C所示的动态转换模型的基础上,上述S403通过第N-i+1个解码模块对第i-1个第一特征信息和LDR训练图像的第N-i个第二特征信息进行特征提取,得到LDR训练图像的第N-i+1个第二特征信息包括S403-A和S403-B:On the basis of the dynamic conversion model shown in Figure 5C, the above S403 performs feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR training image through the N-i+1th decoding module , the N-i+1 second feature information of the LDR training image includes S403-A and S403-B:
S403-A、通过卷积注意力模块对第i-1个第一特征信息进行空间信息与通道信息提取,得到LDR训练图像的第i-1个第三特征信息。S403-A. Extract the spatial information and channel information of the i-1 th first feature information through the convolutional attention module, and obtain the i-1 th third feature information of the LDR training image.
S403-B、通过第N-i+1个解码模块对第i-1个第三特征信息和第N-i个第二特征信息进行特征提取,得到LDR训练图像的第N-i+1个第二特征信息。例如,将第i-1个第三特征信息和第N-i个第二特征信息进行级联,将级联后的第i-1个第三特征信息和第N-i个第二特征信息输入第N-i+1个解码模块,得到第N-i+1个解码模块输出的LDR训练图像的第N-i+1个第二特征信息。S403-B. Use the N-i+1th decoding module to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+1th second feature information of the LDR training image characteristic information. For example, the i-1th third feature information and the N-ith second feature information are concatenated, and the concatenated i-1th third feature information and N-ith second feature information are input into the N-th The i+1 decoding module obtains the N-i+1th second feature information of the LDR training image output by the N-i+1th decoding module.
本申请实施例对卷积注意力模块的网络结构不做限制。The embodiment of the present application does not limit the network structure of the convolutional attention module.
在一种可能的实现方式中,如图5D所示,卷积注意力模块包括:通道注意力模块和空间注意力模块。其中,通道注意力模块通过利用特征的通道间关系,对特征的通道信息进行学习,空间注意力模块通过利用特征的空间关系,对特征的空间信息进行学习。In a possible implementation, as shown in FIG. 5D , the convolutional attention module includes: a channel attention module and a spatial attention module. Among them, the channel attention module learns the channel information of features by using the inter-channel relationship of features, and the spatial attention module learns the spatial information of features by using the spatial relationship of features.
需要说明的是,这里所属的通道可以理解为特征维度,例如一个特征信息的特征维度为32,则表示该特征信息的通道数为32。It should be noted that the channel to which it belongs here can be understood as a feature dimension. For example, if the feature dimension of a piece of feature information is 32, it means that the number of channels of the feature information is 32.
在图5D的基础上,上述S403-A中通过卷积注意力模块对第i-1个第一特征信息进行空间信息与通道信息提取,得到LDR训练图像的第i-1个第三特征信息包括S403-A1至S403-A3:On the basis of Figure 5D, in the above S403-A, the spatial information and channel information of the i-1th first feature information are extracted through the convolution attention module, and the i-1th third feature information of the LDR training image is obtained Including S403-A1 to S403-A3:
S403-A1、通过通道注意力模块对第i-1个第一特征信息进行通道信息提取,得到第i-1个第一特征信息的通道注意力信息。S403-A1. Perform channel information extraction on the i-1 th first feature information through the channel attention module, and obtain channel attention information of the i-1 th first feature information.
S403-A2、通过空间注意力模块对第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到第i-1个第一特征信息的空间注意力信息。S403-A2. Using the spatial attention module, perform spatial information extraction on the fusion channel feature information of the i-1 first feature information, to obtain the spatial attention information of the i-1 first feature information.
其中,第i-1个第一特征信息的融合通道特征信息是根据第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息确定的。Wherein, the fused channel feature information of the i-1 th first feature information is determined according to the i-1 th first feature information and the channel attention information of the i-1 th first feature information.
在一些实施例中,如图5D所示,卷积注意力模块还包括第一乘法单元,此时S403-A2包括S403-A21和S403-A22:In some embodiments, as shown in Figure 5D, the convolutional attention module also includes a first multiplication unit, at this time S403-A2 includes S403-A21 and S403-A22:
S403-A21、通过第一乘法单元对第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息进行相乘,得到第i-1个第一特征信息的融合通道特征信息。S403-A21. Multiply the i-1 first feature information and the channel attention information of the i-1 first feature information by the first multiplication unit to obtain the fusion of the i-1 first feature information Channel characteristic information.
S403-A22、将第i-1个第一特征信息的融合通道特征信息输入空间注意力模块进行空间信息提取,得到第i-1个第一特征信息的空间注意力信息。S403-A22. Input the fused channel feature information of the i-1 th first feature information into the spatial attention module to extract spatial information, and obtain the spatial attention information of the i-1 th first feature information.
S403-A3、根据第i-1个第一特征信息的通道注意力信息和空间注意力信息,确定LDR训练图像的第i-1个第三特征信息。S403-A3. Determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
在一些实施例中,如图5D所示,卷积注意力模块还包括第二乘法单元,则S403-A3包括:通过第二乘法单元对第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到LDR训练图像的第i-1个第三特征信息。In some embodiments, as shown in FIG. 5D , the convolutional attention module further includes a second multiplication unit, then S403-A3 includes: the fusion channel feature information of the i-1th first feature information through the second multiplication unit Multiply with the spatial attention information to obtain the i-1th third feature information of the LDR training image.
举例说明,卷积注意力模块的网络结构如图5D所示,假设第i-1个第一特征信息为特征图F,将特征图F输入CBAM模块,CBAM模块会沿着两个独立的维度(即通道维度和空间维度)依次推断注意力图,然后将注意力图与输入特征图相乘以进行自适应特征优化。具体来说,首先经过通道注意力模块得到一维通道注意力图MC,将MC与输入特征F进行乘法运算之后得到F’。将F’输入空间注意力模块,经过空间注意力模块得到二维空间注意力图Ms。将Ms与F’进行乘法运算后得到最终的特征图F”,该最终的特征图为LDR训练图像的第i-1个第三特征信息。For example, the network structure of the convolutional attention module is shown in Figure 5D. Assume that the i-1th first feature information is a feature map F, and the feature map F is input into the CBAM module, and the CBAM module will follow two independent dimensions (i.e. channel dimension and spatial dimension) the attention map is sequentially inferred, and then the attention map is multiplied with the input feature map for adaptive feature optimization. Specifically, firstly, the one-dimensional channel attention map MC is obtained through the channel attention module, and F' is obtained after multiplying MC and the input feature F. Input F' into the spatial attention module, and get the two-dimensional spatial attention map Ms through the spatial attention module. The final feature map F" is obtained after multiplying Ms and F', and the final feature map is the i-1th third feature information of the LDR training image.
It should be noted that in Fig. 5D the symbol ⊗ denotes element-wise multiplication of the corresponding elements. Here, if the dimension of the input feature map F is H×W×C, then the dimension of the one-dimensional channel attention map MC is 1×1×C, and the dimension of the two-dimensional spatial attention map Ms is H×W×1.
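A minimal sketch of this CBAM flow, assuming channel_attention and spatial_attention modules of the kind described below with reference to Figs. 5E and 5F; the function name is illustrative.

```python
def cbam_forward(F, channel_attention, spatial_attention):
    Mc = channel_attention(F)          # 1-D channel attention map, shape (B, C, 1, 1)
    F_prime = Mc * F                   # first multiplication unit: F' = Mc (x) F
    Ms = spatial_attention(F_prime)    # 2-D spatial attention map, shape (B, 1, H, W)
    return Ms * F_prime                # second multiplication unit: F'' = Ms (x) F'
```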
下面结合通道注意力模块的网络结构,对上述S403-A1进行说明。The above S403-A1 will be described below in conjunction with the network structure of the channel attention module.
在一些实施例中,如图5E所示,通道注意力模块包括:第一空间压缩单元、第二空间压缩单元和通道特征提取单元。其中,第一空间压缩单元和第二空间压缩单元均用于对特征图进行空间尺寸的压缩,通道特征提取单元用于对空间压缩后的特征图进行特征提取。即如图5F所示,本申请为了有效地计算通道注意力,对输入特征图的空间维度进行了压缩。In some embodiments, as shown in FIG. 5E , the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit. Wherein, both the first space compression unit and the second space compression unit are used to compress the spatial size of the feature map, and the channel feature extraction unit is used to perform feature extraction on the space compressed feature map. That is, as shown in Figure 5F, in order to efficiently calculate channel attention, the present application compresses the spatial dimension of the input feature map.
可选的,上述第一空间压缩单元和/或第二空间压缩单元包括池化层。Optionally, the above-mentioned first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
可选的,上述第一空间压缩单元为最大池化层,和/或第二空间压缩单元为平均池化层。Optionally, the above-mentioned first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
可选的,通道特征提取单元为多层感知机(Multilayer perception,简称MLP),例如MLP为包含单隐层的MLP。Optionally, the channel feature extraction unit is a multilayer perception machine (Multilayer perception, MLP for short), for example, the MLP is an MLP including a single hidden layer.
在图5E的基础上,上述S403-A1中通过通道注意力模块对第i-1个第一特征信息进行通道信息提取,得到第i-1个第一特征信息的通道注意力信息包括S403-A11至S403-A15:On the basis of Figure 5E, in the above S403-A1, channel information is extracted from the i-1 first feature information through the channel attention module, and the channel attention information of the i-1 first feature information is obtained including S403- A11 to S403-A15:
S403-A11、通过第一空间压缩单元对第i-1个第一特征信息进行空间维度压缩,得到第i-1个第一特征信息的第一空间压缩信息。S403-A11. Perform spatial dimension compression on the i-1 th first feature information by the first spatial compression unit, to obtain first spatial compression information of the i-1 th first feature information.
S403-A12、通过第二空间压缩单元对第i-1个第一特征信息进行空间维度压缩,得到第i-1个第一特征信息的第二空间压缩信息。S403-A12. Perform spatial dimension compression on the i-1 th first feature information by the second spatial compression unit, to obtain second spatial compression information of the i-1 th first feature information.
S403-A13、通过通道特征提取单元对第i-1个第一特征信息的第一空间压缩信息进行通道特征提取,得到i-1个第一特征信息的第一通道信息。S403-A13. Perform channel feature extraction on the first spatially compressed information of the i-1 first feature information by the channel feature extraction unit, to obtain the first channel information of the i-1 first feature information.
S403-A14、通过通道特征提取单元对第i-1个第一特征信息的第二空间压缩信息进行通道特征提取,得到i-1个第一特征信息的第二通道信息。S403-A14. Perform channel feature extraction on the i-1 second spatially compressed information of the first feature information by the channel feature extraction unit to obtain the i-1 second channel information of the first feature information.
S403-A15、根据i-1个第一特征信息的第一通道信息和第二通道信息,确定第i-1个第一特征信息的通道注意力信息。S403-A15. Determine the channel attention information of the i-1 first feature information according to the first channel information and the second channel information of the i-1 first feature information.
在一些实施例中,如图5E所示,通道注意力模块还包括:第一加法单元和第一激活函数,此时,上述S403-A15包括:In some embodiments, as shown in FIG. 5E, the channel attention module further includes: a first addition unit and a first activation function. At this time, the above S403-A15 includes:
S403-A151、通过第一加法单元对i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到i-1个第一特征信息的融合通道信息。S403-A151. Add the first channel information and the second channel information of the i-1 pieces of first feature information by the first adding unit to obtain the fusion channel information of the i-1 pieces of first feature information.
S403-A152、通过第一激活函数对i-1个第一特征信息的融合通道信息进行非线性处理,得到第i-1个第一特征信息的通道注意力信息。S403-A152. Perform non-linear processing on the fused channel information of the i-1 pieces of first feature information by using the first activation function to obtain channel attention information of the i-1 th piece of first feature information.
本申请实施例对第一激活函数的具体形式不做限制,具体根据实际需要确定。The embodiment of the present application does not limit the specific form of the first activation function, which is specifically determined according to actual needs.
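A hedged sketch of the channel attention branch of Fig. 5E: max pooling and average pooling compress the spatial dimensions, a shared single-hidden-layer MLP (implemented here with 1×1 convolutions) extracts channel features, the two results are added and passed through a sigmoid. The reduction ratio r is an assumption.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)     # first spatial compression unit
        self.avg_pool = nn.AdaptiveAvgPool2d(1)     # second spatial compression unit
        self.mlp = nn.Sequential(                   # shared channel feature extraction unit
            nn.Conv2d(channels, channels // r, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // r, channels, 1, bias=False))
        self.act = nn.Sigmoid()                     # first activation function

    def forward(self, x):
        # first addition unit: add the two channel feature vectors, then apply the activation
        return self.act(self.mlp(self.max_pool(x)) + self.mlp(self.avg_pool(x)))  # shape (B, C, 1, 1)
```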
下面结合空间注意力模块的网络结构,对上述S403-A2进行说明。The above S403-A2 will be described below in conjunction with the network structure of the spatial attention module.
在一些实施例中,如图5F所示,空间注意力模块包括:第一通道压缩单元、第二通道压缩单元和空间特征提取单元。第一通道压缩单元、第二通道压缩单元均用于对特征图进行通道维度的压缩,空间特征提取单元用于对通道压缩后的特征图进行特征提取。即如图5F所示空间注意力模块,通过利用特征间的空间关系生成空间注意力图。空间注意力与通道注意力相辅相成。为了计算空间注意力,对输入特征图的通道维度进行了压缩。In some embodiments, as shown in FIG. 5F , the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit. Both the first channel compression unit and the second channel compression unit are used to compress the channel dimension of the feature map, and the spatial feature extraction unit is used to perform feature extraction on the channel compressed feature map. That is, the spatial attention module shown in Figure 5F generates a spatial attention map by utilizing the spatial relationship between features. Spatial attention complements channel attention. To compute spatial attention, the channel dimensions of the input feature maps are compressed.
可选的,上述第一通道压缩单元和/或第二通道压缩单元包括池化层。Optionally, the first channel compression unit and/or the second channel compression unit include a pooling layer.
可选的,上述第一通道压缩单元为最大池化层(MaxPool),和/或第二通道压缩单元为平均池化(AvgPool)层。Optionally, the first channel compression unit is a maximum pooling layer (MaxPool), and/or the second channel compression unit is an average pooling (AvgPool) layer.
可选的,上述空间特征提取单元为卷积层。Optionally, the aforementioned spatial feature extraction unit is a convolutional layer.
此时,上述S403-A2通过空间注意力模块对第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到第i-1个第一特征信息的空间注意力信息,包括S403-A21至S403-A24:At this time, the above S403-A2 uses the spatial attention module to extract the spatial information of the fusion channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information, including S403 -A21 to S403-A24:
S403-A21、通过第一通道压缩单元对第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到第i-1个第一特征信息的第一通道压缩信息。S403-A21. Perform channel dimension compression on the fused channel feature information of the i-1 th first feature information by the first channel compression unit, to obtain the first channel compressed information of the i-1 th first feature information.
S403-A22、通过第二通道压缩单元对第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到第i-1个第一特征信息的第二通道压缩信息。S403-A22. Perform channel dimension compression on the fused channel feature information of the i-1 th first feature information by the second channel compression unit to obtain second channel compressed information of the i-1 th first feature information.
S403-A23、通过空间特征提取单元对第i-1个第一特征信息的第一通道压缩信息和第二通道压缩信息进行空间特征提取,得到第i-1个第一特征信息的空间特征信息。S403-A23, performing spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information through the spatial feature extraction unit, to obtain the spatial feature information of the i-1 first feature information .
S403-A24、根据第i-1个第一特征信息的空间特征信息,确定第i-1个第一特征信息的空间注意力信息。S403-A24. Determine the spatial attention information of the i-1 th first feature information according to the spatial feature information of the i-1 th first feature information.
在一些实施例中,如图5F所示,空间注意力模块还包括第二激活函数,S403-A24包括:通过第二激活函数对第i-1个第一特征信息的空间特征信息进行非线性处理,得到第i-1个第一特征信息的空间注意力信息。In some embodiments, as shown in FIG. 5F , the spatial attention module further includes a second activation function, and S403-A24 includes: performing non-linearity on the spatial feature information of the i-1th first feature information through the second activation function processing to obtain the spatial attention information of the i-1th first feature information.
本申请实施例对第二激活函数的具体形式不做限制,例如为sigmoid激活函数。The embodiment of the present application does not limit the specific form of the second activation function, for example, a sigmoid activation function.
在一种具体示例中,例如空间注意力模块利用平均池化(即第二通道压缩单元)和最大池化(即第一通道压缩单元)操作沿着通道(channel)轴生成相应的特征向量,并将两者连接起来生成有效的特征描述符。在此基础上,经过一个卷积层(即空间特征提取单元)降维为一个通道,经过sigmoid激活函数(即第二激活函数)后生成二维的空间注意力特征图Ms。In a specific example, for example, the spatial attention module utilizes average pooling (ie, the second channel compression unit) and maximum pooling (ie, the first channel compression unit) operations to generate corresponding feature vectors along the channel (channel) axis, and concatenate the two to generate efficient feature descriptors. On this basis, after a convolutional layer (ie, the spatial feature extraction unit) is reduced to a channel, a two-dimensional spatial attention feature map Ms is generated after a sigmoid activation function (ie, the second activation function).
可选的,上述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。Optionally, the spatial dimension of the channel attention information of the i-1th first feature information is 1×1.
可选的,上述第i-1个第一特征信息的空间注意力信息的特征维度为1。Optionally, the feature dimension of the spatial attention information of the i-1th first feature information is 1.
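A hedged sketch of the spatial attention branch of Fig. 5F: channel-wise max and mean pooling compress the channel dimension, the two maps are concatenated, a convolution reduces them to a single channel, and a sigmoid produces the two-dimensional spatial attention map Ms. The 7×7 kernel size is an assumption.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)  # spatial feature extraction unit
        self.act = nn.Sigmoid()                                                         # second activation function

    def forward(self, x):
        max_map, _ = torch.max(x, dim=1, keepdim=True)   # first channel compression unit
        avg_map = torch.mean(x, dim=1, keepdim=True)     # second channel compression unit
        return self.act(self.conv(torch.cat([max_map, avg_map], dim=1)))  # shape (B, 1, H, W)
```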
本申请实施例的提供的动态转换模型,通过在每一条分支上增加了卷积注意力模块,该卷积注意 力模块包含通道注意力模块以及空间注意力模块,分别对通道特征和空间特征分别进行学习,进而提高了动态转换模型对图像细节特征的学习,使得训练后的动态转换模型可以重建出图像中更多细节特征,进而提高动态转换模型生成的HDR图像的质量。The dynamic conversion model provided by the embodiment of the present application adds a convolutional attention module to each branch, and the convolutional attention module includes a channel attention module and a spatial attention module, respectively for channel features and spatial features Learning is carried out, thereby improving the learning of image detail features by the dynamic conversion model, so that the trained dynamic conversion model can reconstruct more detailed features in the image, thereby improving the quality of the HDR image generated by the dynamic conversion model.
在一些实施例中,如图5G所示,动态转换模型还包括至少一个下采样单元,本申请实施例的训练方法还包括:通过下采样单元对编码模块输出的特征信息进行空间维度下采样。即本申请实施例为了降低网络复杂度,在编码组件中设置至少一个下采样单元,以降低编码模块输出的特征信息的空间维度。In some embodiments, as shown in FIG. 5G , the dynamic conversion model further includes at least one downsampling unit, and the training method in the embodiment of the present application further includes: performing spatial dimension downsampling on the feature information output by the encoding module through the downsampling unit. That is, in order to reduce network complexity in the embodiment of the present application, at least one downsampling unit is set in the coding component to reduce the spatial dimension of the feature information output by the coding module.
本申请实施例对动态转换模型所包括的下采样单元的个数不做限制,具体根据实际需求确定。The embodiment of the present application does not limit the number of down-sampling units included in the dynamic conversion model, which is specifically determined according to actual requirements.
在一种可能的实现方式中,在相邻的两个编码模块之间设置一个下采样单元,用于对上一个编码单元输出的特征信息进行空间维度下采样后,输入下一个编码模块中,这样不仅降低了编码模块处理的数据量,降低模型的复杂度,并且可以使各编码模块对不同尺寸上的特征进行学习,以提高动态转换模型的预测准确性。In a possible implementation, a downsampling unit is set between two adjacent encoding modules, which is used to downsample the feature information output by the previous encoding unit in a spatial dimension, and then input it into the next encoding module, This not only reduces the amount of data processed by the encoding module and reduces the complexity of the model, but also enables each encoding module to learn features of different sizes to improve the prediction accuracy of the dynamic conversion model.
可选的,下采样单元为最大池化层。Optionally, the downsampling unit is a maximum pooling layer.
在一些实施例中,如图5G所示,动态转换模型还包括至少一个上采样单元,本申请实施例的训练方法还包括:通过上采样单元对解码模块输出的特征信息进行空间维度上采样。In some embodiments, as shown in FIG. 5G , the dynamic conversion model further includes at least one upsampling unit, and the training method in the embodiment of the present application further includes: performing spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
如图5G所示,由于在编码组件中设置了至少一个下采样单元,为了保证解码出的图像的大小与原始图像的大小一致,则在解码组件中设置至少一个上采样单元,用于对解码模块输出的特征信息进行空间维度上采样。As shown in Figure 5G, since at least one down-sampling unit is set in the encoding component, in order to ensure that the size of the decoded image is consistent with the size of the original image, at least one up-sampling unit is set in the decoding component for decoding The feature information output by the module is up-sampled in the spatial dimension.
可选的,上采样单元为双线性插值单元。Optionally, the upsampling unit is a bilinear interpolation unit.
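As a small illustration, the sampling units might be realized as follows; pairing one such unit with each pair of adjacent encoding/decoding modules is an assumption.

```python
import torch.nn as nn

downsample = nn.MaxPool2d(kernel_size=2)  # down-sampling unit: halves the spatial dimensions
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)  # up-sampling unit: bilinear interpolation
```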
在一些实施例中,如图5G所示,动态转换模型还包括第一卷积层,第一卷积层位于动态转换模型的输入端,用于对输入动态转换模型的图像进行处理,得到输入图像的初始特征图。例如,将LDR训练图像输入动态转换模型,通过动态转换模型中的第一卷积层对LDR训练图像进行特征提取,得到LDR训练图像的初始特征图;将初始特征图分别输入第一个编码模块和第一卷积注意力模块中,得到第一个编码模块输出的第一个第一特征信息,以及得到第一个卷积注意力模块输出的第一个第三特征信息。上述初始特征图可以理解为上述的第0个第一特征信息。In some embodiments, as shown in Figure 5G, the dynamic conversion model further includes a first convolutional layer, the first convolutional layer is located at the input end of the dynamic conversion model, and is used to process the image input to the dynamic conversion model to obtain the input The initial feature map of the image. For example, input the LDR training image into the dynamic conversion model, and extract the features of the LDR training image through the first convolutional layer in the dynamic conversion model to obtain the initial feature map of the LDR training image; input the initial feature map into the first encoding module respectively And in the first convolutional attention module, the first first feature information output by the first encoding module, and the first third feature information output by the first convolutional attention module are obtained. The aforementioned initial feature map can be understood as the aforementioned 0th first feature information.
本申请实施例,根据上述方法,将LDR训练图像输入动态转换模型,可以得到动态转换模型中最后一个解码模块输出的LDR训练图像的第二特征信息,接着,执行如下S404。In the embodiment of the present application, according to the above method, the LDR training image is input into the dynamic conversion model, and the second characteristic information of the LDR training image output by the last decoding module in the dynamic conversion model can be obtained, and then, the following S404 is performed.
S404、根据N个解码模块中最后一个解码模块输出的LDR训练图像的第二特征信息,确定LDR训练图像的HDR图像预测值。S404. Determine the HDR image prediction value of the LDR training image according to the second characteristic information of the LDR training image output by the last decoding module among the N decoding modules.
在一些实施例中,将LDR训练图像的第二特征信息的通道转换为3通道(例如RGB通道),得到LDR训练图像的HDR图像预测值。In some embodiments, the channel of the second feature information of the LDR training image is converted into 3 channels (such as RGB channels) to obtain the predicted value of the HDR image of the LDR training image.
在一些实施例中,如图5G所示,动态转换模型还包括第二卷积层,则上述S404包括:通过第二卷积层对最后一个解码模块输出的LDR训练图像的第二特征信息进行特征提取,输出LDR训练图像的HDR图像预测值。In some embodiments, as shown in FIG. 5G, the dynamic conversion model further includes a second convolutional layer, then the above S404 includes: performing the second feature information of the LDR training image output by the last decoding module through the second convolutional layer Feature extraction, output the HDR image prediction value of the LDR training image.
上述第二卷积层还包括激活函数,且该第二卷积层的特征维度为3,即经过该第二卷积层后可以输出3通道(例如RGB)图像,将该3通道图像作为LDR训练图像的HDR图像预测值。The second convolutional layer above also includes an activation function, and the feature dimension of the second convolutional layer is 3, that is, after passing through the second convolutional layer, a 3-channel (such as RGB) image can be output, and the 3-channel image can be used as an LDR HDR image predictors for training images.
可选的,第二卷积层的卷积核的大小可以为1×1。Optionally, the size of the convolution kernel of the second convolution layer may be 1×1.
S405、确定LDR训练图像的HDR图像预测值和LDR训练图像的HDR图像真值之间的目标损失,并根据损失对动态转换模型进行训练。S405. Determine the target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image, and train the dynamic transformation model according to the loss.
根据上述S404的步骤得到LDR训练图像的HDR图像预测值后,将LDR训练图像的HDR图像预测值与LDR训练图像的HDR图像真值进行比较,确定LDR训练图像的HDR图像预测值与LDR训练图像的HDR图像真值之间的目标损失,并根据该目标损失对动态转换模型中的参数进行调整,实现对动态转换模型的一次训练。接着,使用另一张LDR训练图像参照与上述相同的步骤对动态转换模型进行训练,直到动态转换模型训练结束为止。After the HDR image prediction value of the LDR training image is obtained according to the steps of S404 above, the HDR image prediction value of the LDR training image is compared with the HDR image true value of the LDR training image to determine the HDR image prediction value of the LDR training image and the LDR training image The target loss between the true values of the HDR image, and adjust the parameters in the dynamic conversion model according to the target loss, to achieve a training of the dynamic conversion model. Next, use another LDR training image to train the dynamic transformation model by referring to the same steps as above, until the dynamic transformation model training is completed.
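A hedged sketch of one such training step; the optimizer choice is an assumption, and target_loss stands for the loss of formula (1) described below.

```python
import torch

def train_step(model, optimizer, ldr_batch, hdr_gt_batch, target_loss):
    optimizer.zero_grad()
    hdr_pred = model(ldr_batch)                  # HDR image prediction for the LDR training images
    loss = target_loss(hdr_pred, hdr_gt_batch)   # target loss against the HDR ground truth
    loss.backward()                              # back-propagate and adjust the model parameters
    optimizer.step()
    return loss.item()
```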
在一些实施例中,上述S405中确定损失的方式包括S405A:根据预设的损失函数,确定LDR训练图像的HDR图像预测值和LDR训练图像的HDR图像真值之间的目标损失。In some embodiments, the manner of determining the loss in S405 includes S405A: according to a preset loss function, determine a target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image.
可选的,上述预设的损失函数包括重构损失函数、感知损失函数和样式损失函数中的至少一个。Optionally, the aforementioned preset loss function includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
在一种可能的实现方式中,上述预设的损失函数包括重构损失函数、感知损失函数和样式损失函数,此时,S405A包括:In a possible implementation, the above preset loss function includes a reconstruction loss function, a perceptual loss function, and a style loss function. At this time, S405A includes:
确定HDR图像预测值与HDR图像真值之间的重构损失;Determine the reconstruction loss between the predicted value of the HDR image and the true value of the HDR image;
确定HDR图像预测值与HDR图像真值之间的感知损失;Determine the perceptual loss between the predicted value of the HDR image and the true value of the HDR image;
确定HDR图像预测值与HDR图像真值之间的样式损失;Determine the style loss between the predicted value of the HDR image and the true value of the HDR image;
根据HDR图像预测值与HDR图像真值之间的重构损失、感知损失和样式损失,确定HDR图像预测值与HDR图像真值之间的目标损失。According to the reconstruction loss, perceptual loss and style loss between the predicted value of the HDR image and the ground truth value of the HDR image, the target loss between the predicted value of the HDR image and the ground truth value of the HDR image is determined.
其中,重构损失确定HDR图像预测值在像素上逼近HDR图像真值。Among them, the reconstruction loss determines that the predicted value of the HDR image is close to the true value of the HDR image on the pixel.
感知损失评估了HDR图像预测值的特征与从HDR图像真值提取的特征的匹配程度,并允许模型产生在感觉上与HDR图像真值相似的纹理,即感知损失确保生成具有更多纹理细节的视觉上令人愉悦的图像。The perceptual loss evaluates how well the features of the predicted value of the HDR image match the features extracted from the ground truth of the HDR image, and allows the model to produce textures that are perceptually similar to the ground truth of the HDR image, i.e., the perceptual loss ensures the generation of textures with more texture details. Visually pleasing images.
样式损失通过将全局统计数据与整个图像上收集的Gram矩阵进行比较,捕获样式和纹理,保证了预测图像的样式一致性和颜色一致性。The style loss captures both style and texture by comparing global statistics with Gram matrices collected over the entire image, ensuring both style consistency and color consistency of the predicted image.
在一些实施例中,可以将重构损失、感知损失和样式损失的权重和作为目标损失。In some embodiments, the weighted sum of reconstruction loss, perceptual loss and style loss can be used as the target loss.
例如根据如下公式(1),确定HDR图像预测值与HDR图像真值之间的目标损失:For example, according to the following formula (1), determine the target loss between the predicted value of the HDR image and the true value of the HDR image:
Loss = L1 + λs·Lst + λp·Lp     (1)
where Loss is the target loss, L1 is the reconstruction loss, Lst is the style loss, Lp is the perceptual loss, and λs and λp are hyper-parameters. Formula (1) can be understood as assigning a weight of 1 to the reconstruction loss, a weight of λs to the style loss, and a weight of λp to the perceptual loss.
需要说明的是,上述公式(1)只是一种示例,本申请确定目标损失的方式包括但不限于上述公式(1)所示,例如在公式(1)中增加、减少、相乘或相除某一个参数,或者,上述公式(1)的等价变形等,均属于本申请的保护范围。It should be noted that the above formula (1) is just an example, and the method of determining the target loss in this application includes but is not limited to the above formula (1), such as adding, subtracting, multiplying or dividing in formula (1) A certain parameter, or the equivalent deformation of the above formula (1), etc., all belong to the protection scope of the present application.
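A sketch of formula (1), assuming the component losses are implemented as in the sketches given after formulas (2) and (4) below; the default hyper-parameter values are assumptions.

```python
def target_loss(hdr_pred, hdr_gt, lambda_s=1e-2, lambda_p=1e-3):
    # Loss = L1 + λs·Lst + λp·Lp
    return (reconstruction_loss(hdr_pred, hdr_gt)
            + lambda_s * style_loss(hdr_pred, hdr_gt)
            + lambda_p * perceptual_loss(hdr_pred, hdr_gt))
```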
在一种示例中,根据预设的压缩色调映射函数,确定HDR图像预测值的压缩色调映射值;根据压缩色调映射函数,确定HDR图像真值的压缩色调映射值;根据HDR图像真值的压缩色调映射值与HDR图像预测值的压缩色调映射值之间的误差,确定重构损失。In one example, according to the preset compressed tone mapping function, the compressed tone mapping value of the predicted value of the HDR image is determined; according to the compressed tone mapping function, the compressed tone mapping value of the true value of the HDR image is determined; according to the compression of the true value of the HDR image The error between the tonemapped value and the compressed tonemapped value of the HDR image prediction determines the reconstruction loss.
例如,根据如下公式(2)确定重构损失:For example, the reconstruction loss is determined according to the following formula (2):
L1 = ‖T(H) − T(GT)‖1     (2)
where L1 denotes the reconstruction loss, T is the μ-law compressed tone mapping function, T(H) is the compressed tone-mapped value of the HDR image prediction, and T(GT) is the compressed tone-mapped value of the HDR image ground truth. The μ-law mapping takes the standard form T(x) = log(1 + μx) / log(1 + μ), with x = H or GT, where H is the HDR image prediction output by the dynamic conversion model, GT is the HDR image ground truth of the LDR training image, ‖·‖1 denotes the L1 norm, and μ is a preset parameter.
需要说明的是,上述公式(2)只是一种示例,本申请确定重构损失的方式包括但不限于上述公式(2)所示,例如在公式(2)中增加、减少、相乘或相除某一个参数,或者,上述公式(2)的等价变形等,均属于本申请的保护范围。It should be noted that the above formula (2) is just an example, and the method of determining the reconstruction loss in this application includes but is not limited to the above formula (2), such as adding, subtracting, multiplying or multiplying in formula (2) Except for a certain parameter, or the equivalent deformation of the above formula (2), etc., all belong to the protection scope of the present application.
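A sketch of the μ-law tone mapping and the reconstruction loss of formula (2); the value μ = 5000 and the assumption that the HDR values are normalized to [0, 1] are illustrative.

```python
import math
import torch

def mu_law(x, mu=5000.0):
    # μ-law compressed tone mapping T(x) = log(1 + μx) / log(1 + μ)
    return torch.log(1.0 + mu * x) / math.log(1.0 + mu)

def reconstruction_loss(hdr_pred, hdr_gt):
    # L1 = ‖T(H) − T(GT)‖1, averaged over pixels
    return torch.mean(torch.abs(mu_law(hdr_pred) - mu_law(hdr_gt)))
```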
在一种示例中,通过如下方式确定感知损失:获取预训练模型的第l层的特征图;根据预设的压缩色调映射函数,确定HDR图像预测值的压缩色调映射值;根据压缩色调映射函数,确定HDR图像真值的压缩色调映射值;确定HDR图像预测值的压缩色调映射值,在第l层的特征图中对应的第一特征值;确定HDR图像真值的压缩色调映射值,在第l层的特征图中对应的第二特征值;根据第一特征值与第二特征值之间的误差,确定感知损失。In one example, the perceptual loss is determined in the following manner: obtain the feature map of the l-th layer of the pre-training model; determine the compressed tone-mapping value of the HDR image prediction value according to the preset compressed tone-mapping function; according to the compressed tone-mapping function , determine the compressed tone mapping value of the true value of the HDR image; determine the compressed tone mapping value of the predicted value of the HDR image, the first feature value corresponding to the feature map of the l layer; determine the compressed tone mapping value of the true value of the HDR image, in The second eigenvalue corresponding to the feature map of the l-th layer; determining the perceptual loss according to the error between the first eigenvalue and the second eigenvalue.
例如,根据如下公式(3)确定感知损失:For example, the perceptual loss is determined according to the following formula (3):
Lp = Σl (1 / (Cl·Hl·Wl)) · ‖φl(T(H)) − φl(T(GT))‖1     (3)
where Lp denotes the perceptual loss and φl denotes the feature map of the lth layer of the pre-trained model (for example the lth layer of VGG-16), whose size is Cl×Hl×Wl; φl(T(H)) is the first feature value, i.e. the lth-layer feature response of the compressed tone-mapped HDR image prediction, and φl(T(GT)) is the second feature value, i.e. the lth-layer feature response of the compressed tone-mapped HDR image ground truth.
在一种示例中,根据如下方式确定样式损失:获取预训练模型的第l层特征图的格拉姆Gram矩阵;根据预设的压缩色调映射函数,确定HDR图像预测值的压缩色调映射值;根据压缩色调映射函数,确定HDR图像真值的压缩色调映射值;确定HDR图像预测值的压缩色调映射值,在格拉姆Gram矩阵中对应的第一元素值;确定HDR图像真值的压缩色调映射值,在第l层的特征图中对应的第二元素值;根据第一元素值与第二元素值之间的误差,确定样式损失。In one example, the style loss is determined according to the following manner: obtain the Gram Gram matrix of the l-th layer feature map of the pre-training model; determine the compressed tone mapping value of the HDR image prediction value according to the preset compressed tone mapping function; Compressed tone mapping function, determine the compressed tone mapping value of the true value of the HDR image; determine the compressed tone mapping value of the predicted value of the HDR image, the corresponding first element value in the Gram Gram matrix; determine the compressed tone mapping value of the true value of the HDR image , the second element value corresponding to the feature map of the first layer; according to the error between the first element value and the second element value, determine the style loss.
例如,根据如下公式(4)确定样式损失:For example, the style loss is determined according to the following formula (4):
Lst = Σl ‖G(T(H)) − G(T(GT))‖1     (4)
where Lst denotes the style loss and G(·) is the Gram matrix of the lth-layer feature map of the pre-trained model; G(T(H)) and G(T(GT)) are the Gram matrices computed from the lth-layer features of the compressed tone-mapped HDR image prediction and of the compressed tone-mapped HDR image ground truth, respectively. The Gram matrix is computed as G(x) = (1/Kl)·φl(x)ᵀ·φl(x), with x = H or GT, where Kl = Cl·Hl·Wl is the normalization factor of the computation; the feature φ is an (Hl·Wl)×Cl matrix, so the Gram matrix has size Cl×Cl.
Optionally, a pre-trained VGG-16 network is used: the feature maps of the first three pooling layers pool1, pool2 and pool3 of VGG-16 are computed for the predicted image and for the ground-truth image, and the perceptual loss and the style loss over these features are computed according to formula (3) and formula (4), respectively.
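A hedged sketch of the perceptual loss of formula (3) and the style loss of formula (4) over the pool1–pool3 features of a pre-trained VGG-16; the torchvision layer indices, the reuse of the mu_law sketch above, and the omission of ImageNet input normalization are assumptions.

```python
import torch
import torchvision

_vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
_POOL_IDX = (4, 9, 16)  # indices of pool1, pool2, pool3 in torchvision's VGG-16 feature stack

def _vgg_features(x):
    feats, h = [], x
    for idx, layer in enumerate(_vgg):
        h = layer(h)
        if idx in _POOL_IDX:
            feats.append(h)
        if idx >= max(_POOL_IDX):
            break
    return feats

def _gram(f):
    b, c, hh, ww = f.shape
    phi = f.reshape(b, c, hh * ww)
    return phi @ phi.transpose(1, 2) / (c * hh * ww)  # normalized Cl x Cl Gram matrix

def perceptual_loss(hdr_pred, hdr_gt):
    fp, fg = _vgg_features(mu_law(hdr_pred)), _vgg_features(mu_law(hdr_gt))
    return sum(torch.mean(torch.abs(a - b)) for a, b in zip(fp, fg))

def style_loss(hdr_pred, hdr_gt):
    fp, fg = _vgg_features(mu_law(hdr_pred)), _vgg_features(mu_law(hdr_gt))
    return sum(torch.mean(torch.abs(_gram(a) - _gram(b))) for a, b in zip(fp, fg))
```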
本申请实施例的目标损失包含重构损失、感知损失和样式损失,以减少高动态范围图像重建失真,伪影和色调异常,进一步提高模型生成的HDR图像的质量。The target loss in the embodiment of the present application includes reconstruction loss, perceptual loss and style loss, so as to reduce reconstruction distortion, artifacts and tone anomalies of high dynamic range images, and further improve the quality of HDR images generated by the model.
进一步的,下面通过实验的方式对本申请实施例提出的动态转换模型的图像处理能力进行验证。Further, the image processing capability of the dynamic transformation model proposed in the embodiment of the present application is verified through experiments below.
数据集的收集:深度学习模型依赖于大规模的数据集,由于无法使用带有LDR-HDR图像对的数据集。本申请从多个HDR图像数据集和HDR视频数据中进行收集,并设置了一个虚拟相机使用随机选择的相机校准来捕获场景的多个随机区域。虚拟相机校准包含曝光,相机曲线,白平衡和噪声水平的参数。其中虚拟摄像机参数是随机选择的,摄像机曲线参数被随机拟合到摄像机曲线的数据库中。这提供了一组LDR和相应的HDR图像,它们分别用作输入和训练的真实情况。然后应用一组数据增强操作以提高预测的鲁棒性。将每个HDR图像视为真实场景,选择区域作为具有随机大小和位置的图像裁剪,然后随机翻转并重采样为256×256像素。使用这些数据增强功能的最终训练网络可以很好地推广到使用不同相机捕获的各种图像。然后将获得的数据集分为训练集和测试集。具体而言,从HDR数据集中收集了两个数据集,即Fairchild HDR数据集和HDR EYE数据集进行测试。Collection of datasets: Deep learning models rely on large-scale datasets, since datasets with LDR-HDR image pairs cannot be used. This application collects from multiple HDR image datasets and HDR video data, and sets up a virtual camera to capture multiple random regions of the scene using randomly selected camera calibrations. Virtual camera calibration contains parameters for exposure, camera curve, white balance and noise level. The virtual camera parameters are randomly selected, and the camera curve parameters are randomly fitted into the camera curve database. This provides a set of LDR and corresponding HDR images, which are used as input and ground truth for training, respectively. A set of data augmentation operations are then applied to improve the robustness of the predictions. Treating each HDR image as a real scene, a region is selected as an image crop with random size and position, then randomly flipped and resampled to 256×256 pixels. The final trained network using these data augmentations generalizes well to a variety of images captured with different cameras. The obtained dataset is then divided into training set and test set. Specifically, two datasets, Fairchild HDR dataset and HDR EYE dataset, are collected from the HDR dataset for testing.
实验环境:本申请的硬件实验设备为AMD Ryzen 5 CPU,NVIDIA GTX 1080 Ti以及16G内存,框架为PyTorch。Experimental environment: The hardware experimental equipment of this application is AMD Ryzen 5 CPU, NVIDIA GTX 1080 Ti and 16G memory, and the framework is PyTorch.
为了说明本申请提出方法的性能,将该方法与现有的五种单图像HDR重建技术方法进行了比较,其中包括三种常规的非学习方法:Akyuz方法、KOV方法以及Masia方法。除此以外,还有两种基于深度学习技术的方法:ExpandNet与HDRCNN。为了评估通过各种单图像HDR重建方法获得的重建图像的质量,使用三种客观评估方法PU-PSNR,PU-SSIM和HDR-VDP Q得分来评估图像质量。To illustrate the performance of the method proposed in this application, the method is compared with five existing single-image HDR reconstruction techniques, including three conventional non-learning methods: Akyuz method, KOV method and Masia method. In addition, there are two methods based on deep learning technology: ExpandNet and HDRCNN. To evaluate the quality of reconstructed images obtained by various single-image HDR reconstruction methods, three objective evaluation methods PU-PSNR, PU-SSIM and HDR-VDP Q-score were used to evaluate the image quality.
本申请提出的感知统一编码将亮度值转换为HDR图像的近似感知均匀的像素值。在评估指标中,PU-PSNR测量预测图像和参考图像之间的像素差异。PU-SSIM从视觉感知的角度测量预测图像和参考图像之间的结构差异。HDR-VDP是一种视觉度量,用于比较参考图像和测试图像,并相对于参考图像预测HDR图像的质量。HDR-VDP中提供的质量Q得分用作评估指标。The perceptually uniform coding proposed in this application converts luminance values into approximately perceptually uniform pixel values of an HDR image. Among the evaluation metrics, PU-PSNR measures the pixel-wise difference between the predicted image and the reference image. PU-SSIM measures the structural difference between predicted and reference images from the perspective of visual perception. HDR-VDP is a visual metric used to compare reference and test images and predict the quality of an HDR image relative to the reference image. The quality Q-score provided in HDR-VDP is used as the evaluation metric.
在客观指标中,Q值、PU-PSNR和PU-SSIM值越大,表明模型重构的高动态范围图像与原始图像越接近,重构质量就越高。Among the objective indicators, the larger the Q value, PU-PSNR and PU-SSIM value, the closer the high dynamic range image reconstructed by the model is to the original image, and the higher the reconstruction quality is.
表1显示了在HDR EYE数据集和Fairchild数据集上使用现有方法对重建的HDR图像的定量比较。其中,粗体表示具有最佳实验结果的方法,下划线表示次佳算法。我们的方法在Fairchild数据集中具有最佳结果,在HDR EYE数据集中具有良好的Q评分,并且在这两个数据集上就PSNR与SSIM指标而言性能均好于其他方法。Table 1 shows a quantitative comparison of reconstructed HDR images using existing methods on the HDR EYE dataset and the Fairchild dataset. Among them, the bold indicates the method with the best experimental results, and the underline indicates the second best algorithm. Our method has the best results in the Fairchild dataset, good Q-score in the HDR EYE dataset, and outperforms other methods in terms of PSNR and SSIM metrics on both datasets.
Table 1 (image): quantitative comparison of the reconstructed HDR images on the HDR EYE and Fairchild data sets in terms of PU-PSNR, PU-SSIM and HDR-VDP Q-score; bold marks the best result and underline the second best.
其中,Fairchild数据集由罗切斯特理工大学Mark D.Fairchild教授团队构造,包含超过100张的一系列HDR图像和数据。Among them, the Fairchild dataset was constructed by the team of Professor Mark D. Fairchild of Rochester Institute of Technology, and contains a series of HDR images and data of more than 100 pieces.
由表1可知,其他方法无法恢复曝光过度区域的纹理,并且会导致变色、模糊和平铺伪影的结果。与本申请的方法相比,常规方法无法消除噪声或恢复饱和区域中丢失的细节。本申请所提出的模型与现有方法相比具有良好的性能,并且最终获得的HDR图像具有更自然的色彩和更丰富的细节,并且可以有效地抑制低曝光区域中的噪声。As can be seen from Table 1, other methods cannot recover the texture of the overexposed regions and lead to results of discoloration, blurring and tiling artifacts. Compared with the method of this application, conventional methods cannot remove noise or restore lost details in saturated regions. The model proposed in this application has good performance compared with existing methods, and the finally obtained HDR images have more natural colors and richer details, and can effectively suppress noise in low-exposure regions.
本申请实施例提供一种动态转换模型,该模型包括串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,使用LDR训练图像对该模型进行训练,训练过程是:将LDR训练图像输入动态转换模型,通过第i个编码模块对第i-1个第一特征信息进行特征提取,得到LDR训练图像的第i个第一特征信息,通过第N-i+1个解码模块对第i-1个第一特征信息和LDR训练图像的第N-i个第二特征信息进行特征提取,得到LDR训练图像的第N-i+1个第二特征信息;根据N个解码模块中最后一个解码模块输出的LDR训练图像的第二特征信息,确定LDR训练图像的HDR图像预测值;确定LDR训练图像的HDR图像预测值和LDR训练图像的HDR图像真值之间的损失,并根据损失对动态转换模型进行训练。在后续使用时,可以使用训练好的动态转换模型将LDR图像转换为HDR图像,进而实现在不增加数据采集、编码、传输、存储等成本的同时,实现HDR图像的转换,从而提高了HDR图像转换的效率。The embodiment of the present application provides a dynamic conversion model, the model includes N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules and the output of the first encoding module in the N decoding modules The input of a decoding module is connected, and the i-th encoding module is skipped and connected to the N-i+1-th decoding module, and the model is trained using the LDR training image. The training process is: input the LDR training image into the dynamic conversion model, The i-1th first feature information is extracted by the i-th encoding module to obtain the i-th first feature information of the LDR training image, and the i-1-th first feature information is obtained by the N-i+1-th decoding module The first feature information and the N-i second feature information of the LDR training image are extracted to obtain the N-i+1 second feature information of the LDR training image; according to the LDR training output of the last decoding module in the N decoding modules The second feature information of the image is to determine the HDR image prediction value of the LDR training image; determine the loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image, and train the dynamic conversion model according to the loss. In subsequent use, the trained dynamic conversion model can be used to convert the LDR image into an HDR image, and then realize the conversion of the HDR image without increasing the cost of data acquisition, encoding, transmission, storage, etc., thereby improving the quality of the HDR image. conversion efficiency.
上文结合动态转换模型的网络结构,对动态转换模型的训练过程进行介绍,下面对动态转换模型的应用过程进行介绍。Combining with the network structure of the dynamic conversion model, the training process of the dynamic conversion model is introduced above, and the application process of the dynamic conversion model is introduced below.
在一些实施例中,本申请实施例提供的动态转换模型还可以应用于视频编解码框架中,例如可以应用于视频解码端,对解码端得到的重建图像进行高动态转换,得到重建图像的HDR图像。In some embodiments, the dynamic conversion model provided by the embodiment of the present application can also be applied to the video codec framework, for example, it can be applied to the video decoding end to perform high dynamic conversion on the reconstructed image obtained by the decoding end to obtain the HDR of the reconstructed image image.
图6为本申请一实施例提供的图像解码方法的流程示意图,如图6所示,该方法包括:Fig. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application. As shown in Fig. 6, the method includes:
S601、解码码流,得到重建图像。S601. Decode the code stream to obtain a reconstructed image.
例如图3所示,熵解码单元310可解析码流得到当前块的预测信息、量化系数矩阵等,预测单元320基于预测信息对当前块使用帧内预测或帧间预测产生当前块的预测块。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到重建块。重建块组成重建图像,环路滤波单元350基于图像或基于块对重建图像 进行环路滤波,得到重建图像。For example, as shown in FIG. 3 , the entropy decoding unit 310 can analyze the code stream to obtain prediction information of the current block, quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block. The inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block. The reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block. The reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain the reconstructed image.
在本实施例中,将动态转换模型与视频编码框架相结合。In this embodiment, the dynamic transformation model is combined with the video coding framework.
在一种示例中,为了便于编码,在编码端对于输入的10bitHDR数据,经过色调映射模块(TM)转化为8bit的LDR数据,然后切分成CTU送入到编码器中进行编码,经过运动估计、运动补偿、帧内预测、帧间预测、变换、量化、滤波以及熵编码等环节形成码流。在解码器的输出端增加上述实施例所述的动态转换模型。对解码后的LDR重建图像进行动态范围的扩展,利用该模型,可以显著提升获得的HDR数据质量,在保证码率的前提下,进一步提升解码后的图像质量。In one example, in order to facilitate encoding, the input 10-bit HDR data is converted into 8-bit LDR data through a tone mapping module (TM) at the encoding end, and then divided into CTUs and sent to the encoder for encoding. After motion estimation, Motion compensation, intra-frame prediction, inter-frame prediction, transformation, quantization, filtering, and entropy coding form a code stream. The dynamic conversion model described in the above embodiment is added at the output end of the decoder. The dynamic range of the decoded LDR reconstruction image is extended. Using this model, the quality of the obtained HDR data can be significantly improved, and the decoded image quality can be further improved under the premise of ensuring the bit rate.
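A small sketch of this decoder-side use: the decoded LDR reconstruction is passed through the trained dynamic conversion model to obtain the HDR image. The decode_bitstream callable and the tensor layout are illustrative assumptions.

```python
import torch

def decode_to_hdr(bitstream, decode_bitstream, dynamic_conversion_model):
    ldr_reconstruction = decode_bitstream(bitstream)             # reconstructed LDR image, shape (3, H, W)
    with torch.no_grad():
        hdr_image = dynamic_conversion_model(ldr_reconstruction.unsqueeze(0))
    return hdr_image.squeeze(0)                                  # expanded-dynamic-range HDR image
```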
S602、将重建图像输入动态转换模型进行动态转换,得到重建图像的高动态范围HDR图像。S602. Input the reconstructed image into a dynamic conversion model to perform dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image.
参照图5A所示,动态转换模型包括:串联连接的N个编码模块和串联连接的N个解码模块,N个编码模块中的最后一个编码模块的输出与N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,第i个编码模块用于对第i-1个编码模块输出的第i-1个第一特征信息进行特征提取,得到重建图像的第i个第一特征信息,第N-i+1个解码模块用于对第i-1个第一特征信息和重建图像的第N-i个第二特征信息进行特征提取,得到重建图像的第N-i+1个第二特征信息,i为小于或等于N的正整数,N为正整数。Shown in Fig. 5 A with reference to, dynamic transformation model comprises: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in N encoding modules is decoded with the first decoding module in N decoding modules The input connection of the module, and the i-th encoding module is skipped and connected to the N-i+1-th decoding module, and the i-th encoding module is used for the i-1th first feature information output by the i-1-th encoding module Perform feature extraction to obtain the i-th first feature information of the reconstructed image, and the N-i+1-th decoding module is used to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the reconstructed image Extracting to obtain the N-i+1th second feature information of the reconstructed image, where i is a positive integer less than or equal to N, and N is a positive integer.
其中,重建图像的HDR图像是根据N个解码模块中最后一个解码模块输出的第二特征信息确定的。Wherein, the HDR image of the reconstructed image is determined according to the second characteristic information output by the last decoding module among the N decoding modules.
其中,若i等于N,则上述第N-i个第二特征信息是根据第N个编码模块输出的第N个第一特征信息确定的。Wherein, if i is equal to N, the above N-i th second feature information is determined according to the N th first feature information output by the N th encoding module.
若i小于N,则上述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的。If i is less than N, the above N-i th second feature information is determined according to the N-i th second feature information output by the N-i th decoding module.
若i等于1,则上述第i-1个第一特征信息是根据重建图像确定的,例如,第0个第一特征信息是重建图像,或者为对重建图像进行处理后的特征图。If i is equal to 1, the i-1th first feature information is determined according to the reconstructed image, for example, the 0th first feature information is the reconstructed image, or is a feature map after processing the reconstructed image.
若i大于1,则上述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
The embodiments of the present application do not limit the specific network structure of the encoding modules.
In one embodiment, each of the N encoding modules includes at least one convolution block, and the parameters of the convolution blocks included in the N encoding modules are not all identical. For example, the feature dimension of the convolution block in the first encoding module is 64, that in the second encoding module is 128, that in the third encoding module is 256, that in the fourth encoding module is 512, and so on.
The embodiments of the present application do not limit the specific network structure of the decoding modules.
In one embodiment, each of the N decoding modules includes at least one convolution block, and the parameters of the convolution blocks included in the N decoding modules are not all identical. For example, the feature dimension of the convolution block in the first decoding module is 256, that in the second decoding module is 128, that in the third decoding module is 64, that in the fourth decoding module is 32, and so on.
In the embodiments of the present application, the network structures of the convolution blocks in the different encoding modules may be the same or different, as may those in the different decoding modules. In addition, the network structures of the convolution blocks in the encoding modules and in the decoding modules may be the same or different; this application does not limit this.
In one possible implementation, the network structure of the encoding module and/or the decoding module is shown in FIG. 5B and includes convolution layer 1, convolution layer 2, convolution layer 3, and an activation function.
Optionally, the convolution kernels of convolution layer 1 and convolution layer 2 are 3×3, the convolution kernel of convolution layer 3 is 1×1, and the activation function is the Sigmoid Weighted Linear Unit (SiLU).
It should be noted that the kernel sizes of convolution layers 1, 2 and 3 are not limited to the above values, and the activation function is not limited to SiLU; it may also be, for example, ReLU. This application does not limit this.
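For illustration only, the convolution block described above can be sketched in PyTorch as follows. This is a minimal sketch under assumptions not stated in the original (for example the placement of the SiLU activation after every layer and the absence of normalization layers); the class and argument names are hypothetical.

    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        # Two 3x3 convolutions followed by a 1x1 convolution and a SiLU activation,
        # as described for FIG. 5B. in_ch/out_ch correspond to the feature dimensions
        # listed for each encoding/decoding module (e.g. 64, 128, 256, 512).
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
            self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=1)
            self.act = nn.SiLU()

        def forward(self, x):
            x = self.act(self.conv1(x))
            x = self.act(self.conv2(x))
            return self.act(self.conv3(x))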
In some embodiments, as shown in FIG. 5C, the dynamic conversion model further includes a convolutional attention module (CBAM) located in the skip connection between the i-th encoding module and the (N-i+1)-th decoding module. The attention mechanism of the CBAM enables the dynamic conversion model to concentrate more attention on the relevant parts of the encoder-side features and less on irrelevant parts. In other words, the convolutional attention mechanism improves the representation ability of the dynamic conversion model by emphasizing important features and suppressing unnecessary ones, which greatly improves the efficiency of the model.
In one possible implementation, one or more CBAMs are included in the skip connection between each encoding module and the corresponding decoding module.
The CBAM located in the skip connection between the i-th encoding module and the (N-i+1)-th decoding module extracts spatial information and channel information from the (i-1)-th first feature information to obtain the (i-1)-th third feature information of the reconstructed image.
In this case, the (N-i+1)-th decoding module performs feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the reconstructed image. For example, the (N-i+1)-th decoding module performs feature extraction on the feature information obtained by concatenating the (i-1)-th third feature information and the (N-i)-th second feature information of the reconstructed image, to obtain the (N-i+1)-th second feature information of the reconstructed image.
In some embodiments, as shown in FIG. 5D, the CBAM includes a channel attention module and a spatial attention module.
The channel attention module extracts channel information from the (i-1)-th first feature information to obtain the channel attention information of the (i-1)-th first feature information.
The spatial attention module extracts spatial information from the (i-1)-th first feature information and the channel attention information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
The (i-1)-th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the (i-1)-th first feature information.
As shown in FIG. 5E, the CBAM further includes a first multiplication unit. The first multiplication unit multiplies the (i-1)-th first feature information by the channel attention information of the (i-1)-th first feature information to obtain the fused channel feature information of the (i-1)-th first feature information. In this case, the spatial attention module extracts spatial information from the fused channel feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
Continuing to refer to FIG. 5D, the CBAM further includes a second multiplication unit. The second multiplication unit multiplies the fused channel feature information of the (i-1)-th first feature information by the spatial attention information to obtain the (i-1)-th third feature information of the reconstructed image.
In some embodiments, as shown in FIG. 5E, the channel attention module includes a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit.
The first spatial compression unit compresses the (i-1)-th first feature information in the spatial dimension to obtain the first spatially compressed information of the (i-1)-th first feature information.
The second spatial compression unit compresses the (i-1)-th first feature information in the spatial dimension to obtain the second spatially compressed information of the (i-1)-th first feature information.
The channel feature extraction unit performs channel feature extraction on the first spatially compressed information of the (i-1)-th first feature information to obtain the first channel information of the (i-1)-th first feature information, and performs channel feature extraction on the second spatially compressed information of the (i-1)-th first feature information to obtain the second channel information of the (i-1)-th first feature information.
The channel attention information of the (i-1)-th first feature information is determined according to the first channel information and the second channel information of the (i-1)-th first feature information.
Optionally, the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
Optionally, the first spatial compression unit is a max pooling layer, and/or the second spatial compression unit is an average pooling layer.
Optionally, the channel feature extraction unit is a multi-layer perceptron (MLP).
Continuing to refer to FIG. 5E, the channel attention module further includes a first addition unit and a first activation function.
The first addition unit adds the first channel information and the second channel information of the (i-1)-th first feature information to obtain the fused channel information of the (i-1)-th first feature information.
The first activation function performs non-linear processing on the fused channel information of the (i-1)-th first feature information to obtain the channel attention information of the (i-1)-th first feature information.
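Continuing the sketches above, a minimal PyTorch sketch of the channel attention module is given below, assuming the CBAM-style formulation (max pooling and average pooling over the spatial dimensions, a shared MLP, element-wise addition, and a sigmoid as the first activation function). The reduction ratio and the module names are illustrative assumptions, not values taken from the original.

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.max_pool = nn.AdaptiveMaxPool2d(1)   # first spatial compression unit
            self.avg_pool = nn.AdaptiveAvgPool2d(1)   # second spatial compression unit
            self.mlp = nn.Sequential(                 # channel feature extraction unit (MLP)
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
            )
            self.act = nn.Sigmoid()                   # first activation function

        def forward(self, x):
            # Output has spatial size 1x1, i.e. one attention weight per channel.
            return self.act(self.mlp(self.max_pool(x)) + self.mlp(self.avg_pool(x)))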
In some embodiments, as shown in FIG. 5F, the spatial attention module includes a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit.
The first channel compression unit compresses the fused channel feature information of the (i-1)-th first feature information in the channel dimension to obtain the first channel-compressed information of the (i-1)-th first feature information.
The second channel compression unit compresses the fused channel feature information of the (i-1)-th first feature information in the channel dimension to obtain the second channel-compressed information of the (i-1)-th first feature information.
The spatial feature extraction unit performs spatial feature extraction on the first channel-compressed information and the second channel-compressed information of the (i-1)-th first feature information to obtain the spatial feature information of the (i-1)-th first feature information.
The spatial attention information of the (i-1)-th first feature information is determined according to the spatial feature information of the (i-1)-th first feature information.
Optionally, the first channel compression unit and/or the second channel compression unit includes a pooling layer.
Optionally, the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
Optionally, the spatial feature extraction unit is a convolution layer.
Continuing to refer to FIG. 5F, the spatial attention module further includes a second activation function.
The second activation function performs non-linear processing on the spatial feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
Optionally, the spatial dimension of the channel attention information of the (i-1)-th first feature information is 1×1.
Optionally, the feature dimension of the spatial attention information of the (i-1)-th first feature information is 1.
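A corresponding sketch of the spatial attention module, and of a CBAM that combines the two sub-modules with the first and second multiplication units, follows. As above, this is an illustrative PyTorch sketch; the 7×7 kernel of the spatial convolution and the sigmoid as the second activation function are assumptions borrowed from the common CBAM design rather than values stated in the original.

    class SpatialAttention(nn.Module):
        def __init__(self, kernel_size=7):
            super().__init__()
            # spatial feature extraction unit: one convolution over the two pooled maps
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
            self.act = nn.Sigmoid()  # second activation function

        def forward(self, x):
            max_map, _ = torch.max(x, dim=1, keepdim=True)  # first channel compression unit
            avg_map = torch.mean(x, dim=1, keepdim=True)    # second channel compression unit
            # Output has a single feature channel, i.e. one attention weight per pixel.
            return self.act(self.conv(torch.cat([max_map, avg_map], dim=1)))

    class CBAM(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.channel_att = ChannelAttention(channels)
            self.spatial_att = SpatialAttention()

        def forward(self, x):
            x = x * self.channel_att(x)      # first multiplication unit -> fused channel features
            return x * self.spatial_att(x)   # second multiplication unit -> third feature information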
In the dynamic conversion model provided by the embodiments of the present application, a CBAM is added to each skip connection. The CBAM contains a channel attention module and a spatial attention module, which learn channel features and spatial features respectively. This improves the model's learning of detailed image features, so that the dynamic conversion model can reconstruct more detail in the image and thereby improve the quality of the HDR image it generates.
In some embodiments, as shown in FIG. 5G, the dynamic conversion model further includes at least one downsampling unit; the downsampling unit downsamples, in the spatial dimension, the feature information output by an encoding module.
Optionally, the downsampling unit is a max pooling layer.
In some embodiments, as shown in FIG. 5G, the dynamic conversion model further includes at least one upsampling unit; the upsampling unit upsamples, in the spatial dimension, the feature information output by a decoding module.
Optionally, the upsampling unit is a bilinear interpolation unit.
Continuing to refer to FIG. 5G, the dynamic conversion model further includes a first convolution layer; the first convolution layer performs feature extraction on the reconstructed image to obtain an initial feature map of the reconstructed image, and inputs the initial feature map into the first encoding module and the first CBAM respectively.
Continuing to refer to FIG. 5G, the dynamic conversion model further includes a second convolution layer; the second convolution layer performs feature extraction on the second feature information of the reconstructed image output by the last decoding module and outputs the HDR image of the reconstructed image.
In a specific embodiment of the present application, as shown in FIG. 7, the dynamic conversion model includes a first convolution layer, four encoding modules connected in series, three downsampling units, four decoding modules connected in series, three upsampling units, four CBAMs located on the skip connections between the encoding modules and the decoding modules, and a second convolution layer. Illustratively, the first convolution layer has a 3×3 kernel and 32 channels (the number of channels can also be understood as the feature dimension); the second convolution layer has a 1×1 kernel and 3 channels and includes an activation function. The first encoding module includes a convolution block with 64 channels, the second encoding module a convolution block with 128 channels, the third encoding module a convolution block with 256 channels, and the fourth encoding module a convolution block with 512 channels. A first downsampling unit is arranged between the first and second encoding modules, a second downsampling unit between the second and third encoding modules, and a third downsampling unit between the third and fourth encoding modules; these three downsampling units are all max pooling layers with a 2×2 kernel and a stride of 2. The first decoding module includes a convolution block with 256 channels, the second decoding module a convolution block with 128 channels, the third decoding module a convolution block with 64 channels, and the fourth decoding module a convolution block with 32 channels. A first upsampling unit is arranged between the fourth encoding module and the first decoding module, a second upsampling unit between the first and second decoding modules, and a third upsampling unit between the second and third decoding modules; these three upsampling units are all bilinear interpolation units with an upsampling factor of 2×2, and each upsampling unit further includes a convolution layer. For example, the first upsampling unit is Bilinear Upsample 2×2, Conv 3×3 256; the second upsampling unit is Bilinear Upsample 2×2, Conv 3×3 128; and the third upsampling unit is Bilinear Upsample 2×2, Conv 3×3 64.
Suppose the size of the reconstructed image is H×W×3, where H×W denotes its height and width and 3 denotes its RGB channels. The reconstructed image is input into the dynamic conversion model shown in FIG. 7 and processed by the first convolution layer, which outputs the initial feature map of the reconstructed image with size H×W×32. The initial feature map is input into the first encoding module and the first CBAM respectively. The convolution block in the first encoding module convolves the initial feature map to obtain the first first feature information of the reconstructed image, with size H×W×64, which is input into the second CBAM and the first downsampling unit respectively. The first downsampling unit downsamples the first first feature information to H/2×W/2×64 and inputs it into the second encoding module. The convolution block in the second encoding module convolves the downsampled first first feature information to obtain the second first feature information, with size H/2×W/2×128, which is input into the third CBAM and the second downsampling unit respectively. The second downsampling unit downsamples it to H/4×W/4×128 and inputs it into the third encoding module. The convolution block in the third encoding module convolves the downsampled second first feature information to obtain the third first feature information, with size H/4×W/4×256, which is input into the fourth CBAM and the third downsampling unit respectively. The third downsampling unit downsamples it to H/8×W/8×256 and inputs it into the fourth encoding module. The convolution block in the fourth encoding module convolves the downsampled third first feature information to obtain the fourth first feature information, with size H/8×W/8×512, which is input into the first upsampling unit.
The first upsampling unit upsamples the fourth first feature information to H/4×W/4×256. The fourth CBAM performs feature extraction on the third first feature information and outputs the first third feature information of the reconstructed image. The first third feature information is concatenated with the upsampled fourth first feature information and input into the first decoding module. The first decoding module performs feature extraction on the concatenated features to obtain the first second feature information of the reconstructed image, which is input into the second upsampling unit. The second upsampling unit upsamples the first second feature information to H/2×W/2×128. The third CBAM performs feature extraction on the second first feature information and outputs the second third feature information of the reconstructed image. The second third feature information is concatenated with the upsampled first second feature information and input into the second decoding module. The second decoding module performs feature extraction on the concatenated features to obtain the second second feature information of the reconstructed image, which is input into the third upsampling unit. The third upsampling unit upsamples the second second feature information to H×W×64. The second CBAM performs feature extraction on the first first feature information and outputs the third third feature information of the reconstructed image. The third third feature information is concatenated with the upsampled second second feature information and input into the third decoding module. The third decoding module performs feature extraction on the concatenated features to obtain the third second feature information of the reconstructed image. The first CBAM performs feature extraction on the initial feature map of the reconstructed image and outputs the fourth third feature information of the reconstructed image. The fourth third feature information is concatenated with the third second feature information and input into the fourth decoding module. The fourth decoding module performs feature extraction on the concatenated features to obtain the fourth second feature information of the reconstructed image, with size H×W×32, which is input into the second convolution layer. The second convolution layer processes the fourth second feature information and outputs the HDR image of the reconstructed image, with size H×W×3.
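The data flow above is that of a four-level U-Net with CBAM-gated skip connections. For illustration, a minimal PyTorch sketch of this specific embodiment is given below, reusing the ConvBlock and CBAM sketches above; it is an approximation under assumptions (for example the padding choices and the sigmoid as the activation of the second convolution layer) and not the authoritative implementation of FIG. 7.

    class DynamicConversionModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.head = nn.Conv2d(3, 32, 3, padding=1)            # first convolution layer
            self.enc = nn.ModuleList([ConvBlock(c_in, c_out) for c_in, c_out in
                                      [(32, 64), (64, 128), (128, 256), (256, 512)]])
            self.down = nn.MaxPool2d(2)                            # 2x2, stride-2 max pooling
            self.cbam = nn.ModuleList([CBAM(c) for c in [32, 64, 128, 256]])
            self.up = nn.ModuleList([nn.Sequential(nn.Upsample(scale_factor=2, mode='bilinear'),
                                                   nn.Conv2d(c_in, c_out, 3, padding=1))
                                     for c_in, c_out in [(512, 256), (256, 128), (128, 64)]])
            self.dec = nn.ModuleList([ConvBlock(c_in, c_out) for c_in, c_out in
                                      [(512, 256), (256, 128), (128, 64), (96, 32)]])
            self.tail = nn.Sequential(nn.Conv2d(32, 3, 1), nn.Sigmoid())  # second convolution layer

        def forward(self, x):
            f0 = self.head(x)                        # H x W x 32
            f1 = self.enc[0](f0)                     # H x W x 64
            f2 = self.enc[1](self.down(f1))          # H/2 x W/2 x 128
            f3 = self.enc[2](self.down(f2))          # H/4 x W/4 x 256
            f4 = self.enc[3](self.down(f3))          # H/8 x W/8 x 512
            d1 = self.dec[0](torch.cat([self.cbam[3](f3), self.up[0](f4)], dim=1))
            d2 = self.dec[1](torch.cat([self.cbam[2](f2), self.up[1](d1)], dim=1))
            d3 = self.dec[2](torch.cat([self.cbam[1](f1), self.up[2](d2)], dim=1))
            d4 = self.dec[3](torch.cat([self.cbam[0](f0), d3], dim=1))
            return self.tail(d4)                     # H x W x 3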
The embodiments of the present application use the above dynamic conversion model to convert a low dynamic range reconstructed image into a high dynamic range image; the whole conversion process is simple and low-cost.
In some embodiments, the initial parameters of the dynamic conversion model during training are pre-training parameters obtained by pre-training a pre-trained model.
In some embodiments, the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
In one example, the loss function of the dynamic conversion model is given by the following formula:
Loss = L1 + λs·Lst + λp·Lp
where Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the style loss function, Lp is the perceptual loss function, and λs and λp are hyperparameters.
In one example, the reconstruction loss function of the dynamic conversion model is determined based on the error between the compressed tone-mapping value of the HDR image ground truth and the compressed tone-mapping value of the HDR image prediction, where the compressed tone-mapping value of the HDR image prediction is determined according to a preset compressed tone-mapping function and the HDR image prediction, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
For example, the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
L1 = ‖T(H) − T(GT)‖1
where L1 denotes the reconstruction loss function; T(x), with x = H or GT, is the preset compressed tone-mapping function (given in the original by the formula referenced as PCTCN2021102173-appb-000007 and parameterized by the preset parameter μ); H is the prediction output by the dynamic conversion model during training; GT is the ground truth of the training image; and ‖·‖1 denotes the L1 norm.
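For illustration, a sketch of the reconstruction loss in PyTorch follows. The original defines the compressed tone-mapping function T only through a referenced formula image; the μ-law compressor below, T(x) = log(1 + μx) / log(1 + μ), is a common choice in HDR reconstruction work and is used here only as an assumption, with μ = 5000 as an arbitrary example value.

    def tone_map(x, mu=5000.0):
        # Assumed mu-law compressed tone-mapping function T(x); the exact form in the
        # original is given by the referenced formula image and may differ.
        return torch.log(1.0 + mu * x) / torch.log(torch.tensor(1.0 + mu))

    def reconstruction_loss(pred_hdr, gt_hdr):
        # L1 = || T(H) - T(GT) ||_1, here mean-reduced over all elements
        return torch.mean(torch.abs(tone_map(pred_hdr) - tone_map(gt_hdr)))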
In one example, the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, where the first feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image prediction in the feature map of the l-th layer of the pre-trained model, and the second feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image ground truth in the feature map of the l-th layer; the compressed tone-mapping value of the HDR image prediction is determined according to the preset compressed tone-mapping function and the HDR image prediction, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
For example, the perceptual loss function Lp of the dynamic conversion model is obtained by accumulating, over the layers l of the pre-trained model, the error between φl(T(H)) and φl(T(GT)) (the exact expression is given in the original by the formula referenced as PCTCN2021102173-appb-000008), where Lp denotes the perceptual loss function and φl denotes the feature map of the l-th layer of the pre-trained model, with size Cl×Hl×Wl.
In one example, the style loss function of the dynamic conversion model is determined based on the error between a first element value and a second element value, where the first element value is the element value corresponding to the compressed tone-mapping value of the HDR image prediction in the Gram matrix of the l-th layer feature map of the pre-trained model, and the second element value is the element value corresponding to the compressed tone-mapping value of the HDR image ground truth in the Gram matrix; the compressed tone-mapping value of the HDR image prediction is determined according to the preset compressed tone-mapping function and the HDR image prediction, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
For example, the style loss function Lst of the dynamic conversion model is obtained by accumulating, over the layers l of the pre-trained model, the error between G(φl(T(H))) and G(φl(T(GT))) (the exact expression is given in the original by the formulas referenced as PCTCN2021102173-appb-000009 and PCTCN2021102173-appb-000010), where Lst denotes the style loss function, G(·) is the Gram matrix of the l-th layer features of the pre-trained model, φl denotes the feature map of the l-th layer of the pre-trained model with size Cl×Hl×Wl, and Kl = Cl·Hl·Wl.
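Continuing the sketches above, the perceptual and style losses can be illustrated as follows, under common assumptions: a VGG-style pre-trained network supplies the layer features φl, the layer errors are L1 distances, and the Gram matrix is normalized by Kl = Cl·Hl·Wl. The layer choice, the omission of input normalization, and the use of torchvision's VGG-19 are illustrative assumptions, not details from the original.

    import torchvision

    class VGGFeatures(nn.Module):
        # Extracts the feature maps phi_l of a pre-trained network at a few chosen layers.
        def __init__(self, layer_ids=(3, 8, 17, 26)):
            super().__init__()
            self.vgg = torchvision.models.vgg19(weights='DEFAULT').features.eval()
            self.layer_ids = set(layer_ids)
            for p in self.vgg.parameters():
                p.requires_grad_(False)

        def forward(self, x):
            feats = []
            for i, layer in enumerate(self.vgg):
                x = layer(x)
                if i in self.layer_ids:
                    feats.append(x)
            return feats

    def gram_matrix(feat):
        # G(phi_l), normalized by K_l = C_l * H_l * W_l.
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

    def perceptual_and_style_loss(vgg, pred_hdr, gt_hdr):
        lp, lst = 0.0, 0.0
        for fp, fg in zip(vgg(tone_map(pred_hdr)), vgg(tone_map(gt_hdr))):
            lp = lp + torch.mean(torch.abs(fp - fg))                              # perceptual term
            lst = lst + torch.mean(torch.abs(gram_matrix(fp) - gram_matrix(fg)))  # style term
        return lp, lst

The total training objective would then combine the three terms as Loss = reconstruction_loss(...) + λs·lst + λp·lp, with λs and λp chosen as hyperparameters.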
In the embodiments of the present application, the above dynamic conversion model is used to convert a low dynamic range reconstructed image into a high dynamic range image; the whole conversion process is simple and low-cost. In addition, the reconstruction loss, perceptual loss and style loss are set so as to reduce distortion, artifacts and abnormal tones in high dynamic range image reconstruction, further improving the quality of the decoded image while maintaining the bit rate.
The application of the dynamic conversion model to a codec system has been described above; the dynamic conversion model can also be applied to other scenarios in which a low dynamic range image is converted into a high dynamic range image.
FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in FIG. 8, the method includes:
S801. Acquire an LDR image to be processed.
S802. Input the LDR image into a dynamic conversion model for dynamic range conversion to obtain an HDR image of the LDR image.
As shown in FIG. 5A, the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series. The output of the last of the N encoding modules is connected to the input of the first of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module. The i-th encoding module performs feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the LDR image. The (N-i+1)-th decoding module performs feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR image to obtain the (N-i+1)-th second feature information of the LDR image. The HDR image of the LDR image is determined according to the second feature information output by the last of the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
For the network structure of the dynamic conversion model, reference may be made to FIGS. 5A to 5G above; details are described in the foregoing embodiments and are not repeated here.
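A short usage sketch of this image processing method, assuming the PyTorch model defined above and trained weights stored in a hypothetical checkpoint file; the file name, input size, and value range are illustrative assumptions.

    model = DynamicConversionModel()
    model.load_state_dict(torch.load('dynamic_conversion_model.pth'))  # hypothetical checkpoint
    model.eval()

    with torch.no_grad():
        ldr = torch.rand(1, 3, 256, 256)   # S801: LDR image to be processed, values in [0, 1]
        hdr = model(ldr)                   # S802: dynamic range conversion, output H x W x 3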
It should be understood that FIGS. 4 to 8 are only examples of the present application and should not be construed as limiting the present application.
The preferred implementations of the present application have been described above in detail with reference to the accompanying drawings. However, the present application is not limited to the specific details of the above implementations. Within the scope of the technical concept of the present application, various simple modifications can be made to the technical solutions of the present application, and these simple modifications all fall within the protection scope of the present application. For example, the specific technical features described in the above specific implementations can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the various possible combinations are not described separately in this application. As another example, the various implementations of the present application can also be combined arbitrarily, and as long as such combinations do not violate the idea of the present application, they should likewise be regarded as content disclosed in the present application.
It should also be understood that, in the various method embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. In addition, in the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist. Specifically, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The network structure of the dynamic conversion model and the image processing method have been described above with reference to FIGS. 4 to 8; the apparatus embodiments of the present application are described in detail below with reference to FIGS. 9 to 12.
FIG. 9 is a schematic block diagram of an image decoding apparatus provided by an embodiment of the present application. The image decoding apparatus may be the decoder shown in FIG. 3, or a component of the decoder, for example a processor in the decoder.
As shown in FIG. 9, the image decoding apparatus 10 may include:
a decoding unit 11, configured to decode a bitstream to obtain a reconstructed image;
a processing unit 12, configured to input the reconstructed image into a dynamic conversion model for dynamic range conversion to obtain a high dynamic range HDR image of the reconstructed image;
where the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series; the output of the last of the N encoding modules is connected to the input of the first of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is configured to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image; the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last of the N decoding modules; i is a positive integer less than or equal to N, and N is a positive integer.
In one embodiment, the dynamic conversion model further includes a convolutional attention module (CBAM) located in the skip connection between the i-th encoding module and the (N-i+1)-th decoding module;
the CBAM is configured to extract spatial information and channel information from the (i-1)-th first feature information to obtain the (i-1)-th third feature information of the reconstructed image;
the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the reconstructed image.
In one embodiment, the CBAM includes a channel attention module and a spatial attention module;
the channel attention module is configured to extract channel information from the (i-1)-th first feature information to obtain the channel attention information of the (i-1)-th first feature information;
the spatial attention module is configured to extract spatial information from the (i-1)-th first feature information and the channel attention information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information;
the (i-1)-th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the (i-1)-th first feature information.
In one embodiment, the CBAM further includes a first multiplication unit;
the first multiplication unit is configured to multiply the (i-1)-th first feature information by the channel attention information of the (i-1)-th first feature information to obtain the fused channel feature information of the (i-1)-th first feature information;
the spatial attention module is configured to extract spatial information from the fused channel feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
In one embodiment, the CBAM further includes a second multiplication unit;
the second multiplication unit is configured to multiply the fused channel feature information of the (i-1)-th first feature information by the spatial attention information to obtain the (i-1)-th third feature information of the reconstructed image.
In one embodiment, the channel attention module includes a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
the first spatial compression unit is configured to compress the (i-1)-th first feature information in the spatial dimension to obtain the first spatially compressed information of the (i-1)-th first feature information;
the second spatial compression unit is configured to compress the (i-1)-th first feature information in the spatial dimension to obtain the second spatially compressed information of the (i-1)-th first feature information;
the channel feature extraction unit is configured to perform channel feature extraction on the first spatially compressed information of the (i-1)-th first feature information to obtain the first channel information of the (i-1)-th first feature information, and to perform channel feature extraction on the second spatially compressed information of the (i-1)-th first feature information to obtain the second channel information of the (i-1)-th first feature information;
the channel attention information of the (i-1)-th first feature information is determined according to the first channel information and the second channel information of the (i-1)-th first feature information.
In one embodiment, the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
In one embodiment, the first spatial compression unit is a max pooling layer, and/or the second spatial compression unit is an average pooling layer.
In one embodiment, the channel feature extraction unit is a multi-layer perceptron (MLP).
In one embodiment, the channel attention module further includes a first addition unit and a first activation function;
the first addition unit is configured to add the first channel information and the second channel information of the (i-1)-th first feature information to obtain the fused channel information of the (i-1)-th first feature information;
the first activation function is configured to perform non-linear processing on the fused channel information of the (i-1)-th first feature information to obtain the channel attention information of the (i-1)-th first feature information.
In one embodiment, the spatial attention module includes a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
the first channel compression unit is configured to compress the fused channel feature information of the (i-1)-th first feature information in the channel dimension to obtain the first channel-compressed information of the (i-1)-th first feature information;
the second channel compression unit is configured to compress the fused channel feature information of the (i-1)-th first feature information in the channel dimension to obtain the second channel-compressed information of the (i-1)-th first feature information;
the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel-compressed information and the second channel-compressed information of the (i-1)-th first feature information to obtain the spatial feature information of the (i-1)-th first feature information;
the spatial attention information of the (i-1)-th first feature information is determined according to the spatial feature information of the (i-1)-th first feature information.
In one embodiment, the first channel compression unit and/or the second channel compression unit includes a pooling layer.
In one embodiment, the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
In one embodiment, the spatial feature extraction unit is a convolution layer.
In one embodiment, the spatial attention module further includes a second activation function;
the second activation function is configured to perform non-linear processing on the spatial feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
In one embodiment, the spatial dimension of the channel attention information of the (i-1)-th first feature information is 1×1.
In one embodiment, the feature dimension of the spatial attention information of the (i-1)-th first feature information is 1.
In one embodiment, the dynamic conversion model further includes at least one downsampling unit;
the downsampling unit is configured to downsample, in the spatial dimension, the feature information output by an encoding module.
In one embodiment, the downsampling unit is a max pooling layer.
In one embodiment, the dynamic conversion model further includes at least one upsampling unit;
the upsampling unit is configured to upsample, in the spatial dimension, the feature information output by a decoding module.
In one embodiment, the upsampling unit is a bilinear interpolation unit.
In one embodiment, each of the N encoding modules includes at least one convolution block, and the parameters of the convolution blocks included in the N encoding modules are not all identical.
In one embodiment, each of the N decoding modules includes at least one convolution block, and the parameters of the convolution blocks included in the N decoding modules are not all identical.
In one embodiment, if i is equal to N, the (N-i)-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or,
if i is less than N, the (N-i)-th second feature information is determined according to the (N-i)-th second feature information output by the (N-i)-th decoding module; or,
if i is equal to 1, the (i-1)-th first feature information is determined according to the reconstructed image; or,
if i is greater than 1, the (i-1)-th first feature information is determined according to the first feature information output by the (i-1)-th encoding module.
In one embodiment, the (N-i+1)-th decoding module is configured to perform feature extraction on the feature information obtained by concatenating the (i-1)-th third feature information and the (N-i)-th second feature information, to obtain the (N-i+1)-th second feature information of the reconstructed image.
In one embodiment, the dynamic conversion model further includes a first convolution layer;
the first convolution layer is configured to perform feature extraction on the reconstructed image to obtain an initial feature map of the reconstructed image, and to input the initial feature map into the first encoding module and the first convolutional attention module respectively.
In one embodiment, the dynamic conversion model further includes a second convolution layer;
the second convolution layer is configured to perform feature extraction on the second feature information of the reconstructed image output by the last decoding module and to output the HDR image of the reconstructed image.
In one embodiment, the initial parameters of the dynamic conversion model during training are pre-training parameters obtained by pre-training a pre-trained model.
In one embodiment, the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
In one embodiment, the loss function of the dynamic conversion model is given by the following formula:
Loss = L1 + λs·Lst + λp·Lp
where Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the style loss function, Lp is the perceptual loss function, and λs and λp are hyperparameters.
In one embodiment, the reconstruction loss function of the dynamic conversion model is determined according to the error between the compressed tone-mapping value of the HDR image ground truth and the compressed tone-mapping value of the HDR image prediction, where the compressed tone-mapping value of the HDR image prediction is determined according to a preset compressed tone-mapping function and the HDR image prediction, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
For example, the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
L1 = ‖T(H) − T(GT)‖1
where L1 denotes the reconstruction loss function; T(x), with x = H or GT, is the preset compressed tone-mapping function (given in the original by the formula referenced as PCTCN2021102173-appb-000011 and parameterized by the preset parameter μ); H is the prediction output by the dynamic conversion model during training; GT is the ground truth of the training image; and ‖·‖1 denotes the L1 norm.
In one embodiment, the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, where the first feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image prediction in the feature map of the l-th layer of the pre-trained model, and the second feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image ground truth in the feature map of the l-th layer; the compressed tone-mapping value of the HDR image prediction is determined according to the preset compressed tone-mapping function and the HDR image prediction, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
For example, the perceptual loss function Lp of the dynamic conversion model is obtained by accumulating, over the layers l of the pre-trained model, the error between φl(T(H)) and φl(T(GT)) (the exact expression is given in the original by the formulas referenced as PCTCN2021102173-appb-000012 and PCTCN2021102173-appb-000013), where Lp denotes the perceptual loss function; T(x), with x = H or GT, is the preset compressed tone-mapping function parameterized by the preset parameter μ; H is the prediction output by the dynamic conversion model during training; GT is the ground truth of the training image; ‖·‖1 denotes the L1 norm; and φl denotes the feature map of the l-th layer of the pre-trained model, with size Cl×Hl×Wl.
在一种实施例中,所述动态转换模型的样式损失函数是基于第一元素值与第二元素值之间的误差确定的,其中,所述第一元素值为HDR图像预测值的压缩色调映射值在所述预训练模型的第l层特征图的格拉姆Gram矩阵中对应的元素值,所述第二元素值为HDR图像真值的压缩色调映射值在所述Gram矩阵中对应的元素值,所述HDR图像预测值的压缩色调映射值是根据预设的压缩色调映射函数和所述HDR图像预测值确定的,所述HDR图像真值的压缩色调映射值是根据所述压缩色调映射函数和所述HDR图像真值确定的。In one embodiment, the style loss function of the dynamic conversion model is determined based on an error between a first element value and a second element value, wherein the first element value is a compressed tone of an HDR image prediction value The element value corresponding to the mapping value in the Gram Gram matrix of the l-th layer feature map of the pre-training model, and the second element value is the corresponding element in the Gram matrix of the compressed tone mapping value of the true value of the HDR image value, the compressed tone mapping value of the predicted value of the HDR image is determined according to the preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the real value of the HDR image is determined according to the compressed tone mapping function and the ground truth value of the HDR image is determined.
例如,动态转换模型的样式损失函数是基于如下公式确定的:For example, the style loss function of the dynamic transformation model is determined based on the following formula:
L_st = Σ_l ‖G(φ_l(T(H))) − G(φ_l(T(GT)))‖_1

where Lst denotes the style loss function, G(·) is the Gram matrix of the layer-l features of the pre-training model (normalized by K_l), T(·) is the preset compressed tone-mapping function (with preset parameter μ) applied to x = H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the ground-truth HDR value of the training image, "‖·‖_1" denotes the L1 norm, φ_l denotes the feature map of layer l of the pre-training model, of size C_l × H_l × W_l, and K_l = C_l · H_l · W_l.
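Continuing the illustrative sketch above, and reusing its tone_compress and vgg_features helpers, one way to compute a Gram-matrix style loss of this form is shown below. The normalization of the Gram matrix by K_l = C_l · H_l · W_l is an assumption consistent with the description, not a form fixed by the present application.

```python
import torch

def gram_matrix(feat):
    """Gram matrix of a layer-l feature map, normalized by K_l = C_l * H_l * W_l (assumed form)."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(hdr_pred, hdr_gt, mu=5000.0):
    """Lst = sum_l ||G(phi_l(T(H))) - G(phi_l(T(GT)))||_1, reusing the helpers sketched above."""
    loss = 0.0
    for fp, fg in zip(vgg_features(tone_compress(hdr_pred, mu)),
                      vgg_features(tone_compress(hdr_gt, mu))):
        loss = loss + torch.abs(gram_matrix(fp) - gram_matrix(fg)).sum()
    return loss
```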
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the apparatus 10 shown in FIG. 9 may correspond to the entity that performs the image decoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 10 are respectively intended to implement the corresponding procedures of the image decoding method; for brevity, they are not repeated here.
图10是本申请实施例提供的图像处理装置的示意性框图。Fig. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
如图10所示,该图像处理装置20可包括:As shown in Figure 10, the image processing device 20 may include:
获取单元21,用于获取待处理的低动态范围LDR图像;An acquisition unit 21, configured to acquire a low dynamic range LDR image to be processed;
处理单元22,用于将所述LDR图像输入动态转换模型进行动态转换,得到所述LDR图像的高动态范围HDR图像;A processing unit 22, configured to input the LDR image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the LDR image;
Wherein, the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, where the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module. The i-th encoding module is configured to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the LDR image; the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR image to obtain the (N-i+1)-th second feature information of the LDR image; the HDR image of the LDR image is determined according to the second feature information output by the last decoding module among the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
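By way of illustration only, the following PyTorch sketch shows one possible arrangement of the N serially connected encoding modules, the N serially connected decoding modules, and their skip connections. The number of modules N, the channel widths, the convolution parameters, and the omission of the attention modules (sketched further below) are assumptions made for readability, not limitations of the present application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU; used here for both encoding and decoding modules."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

class DynamicConversionNet(nn.Module):
    """N encoding modules and N decoding modules; encoder i is skip-connected to decoder N-i+1."""
    def __init__(self, n=4, base=64):
        super().__init__()
        chs = [base * 2 ** k for k in range(n + 1)]      # per-level feature widths (assumed values)
        self.head = nn.Conv2d(3, base, 3, padding=1)     # first convolutional layer: initial feature map
        self.encoders = nn.ModuleList(ConvBlock(chs[i], chs[i + 1]) for i in range(n))
        self.pool = nn.MaxPool2d(2)                      # down-sampling unit (max pooling)
        self.decoders = nn.ModuleList(
            ConvBlock(chs[i + 1] + chs[i], chs[i]) for i in reversed(range(n)))
        self.tail = nn.Conv2d(base, 3, 3, padding=1)     # second convolutional layer: HDR output

    def forward(self, ldr):
        x = self.head(ldr)
        skips = []
        for k, enc in enumerate(self.encoders):
            skips.append(x)                  # (i-1)-th first feature information, sent across the skip
            x = enc(self.pool(x) if k > 0 else x)
        # x is now the output of the last encoding module, fed to the first decoding module
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = F.interpolate(x, size=skip.shape[-2:], mode='bilinear', align_corners=False)  # up-sampling unit
            # a convolutional attention module (sketched below) would refine `skip` here
            x = dec(torch.cat([skip, x], dim=1))
        return self.tail(x)
```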
在一些实施例中,所述动态转换模型还包括:位于所述第i个编码模块与所述第N-i+1个解码模块的跳跃连接中的卷积注意力模块;In some embodiments, the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module;
所述卷积注意力模块用于对所述第i-1个第一特征信息进行空间信息与通道信息提取,得到所述LDR图像的第i-1个第三特征信息;The convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the LDR image;
所述第N-i+1个解码模块用于对所述第i-1个第三特征信息和所述第N-i个第二特征信息进行特征提取,得到所述LDR图像的第N-i+1个第二特征信息。The N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th feature information of the LDR image. 1 piece of second characteristic information.
在一些实施例中,所述卷积注意力模块包括通道注意力模块和空间注意力模块;In some embodiments, the convolutional attention module includes a channel attention module and a spatial attention module;
所述通道注意力模块用于对所述第i-1个第一特征信息进行通道信息提取,得到所述第i-1个第 一特征信息的通道注意力信息;The channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
所述空间注意力模块用于对所述第i-1个第一特征信息和所述第i-1个第一特征信息的通道注意力信息进行空间信息提取,得到所述第i-1个第一特征信息的空间注意力信息;The spatial attention module is used to extract spatial information from the i-1 first feature information and channel attention information of the i-1 first feature information, to obtain the i-1 first feature information Spatial attention information of the first feature information;
所述LDR图像的第i-1个第三特征信息是根据所述第i-1个第一特征信息的通道注意力信息和空间注意力信息确定的。The i-1 th third feature information of the LDR image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
在一些实施例中,所述卷积注意力模块还包括第一乘法单元;In some embodiments, the convolutional attention module further includes a first multiplication unit;
所述第一乘法单元用于对所述第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息进行相乘,得到所述第i-1个第一特征信息的融合通道特征信息;The first multiplication unit is configured to multiply the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature Information fusion channel feature information;
所述空间注意力模块用于对所述第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到所述第i-1个第一特征信息的空间注意力信息。The spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
在一些实施例中,所述卷积注意力模块还包括第二乘法单元;In some embodiments, the convolutional attention module further includes a second multiplication unit;
所述第二乘法单元用于对所述第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到所述LDR图像的第i-1个第三特征信息。The second multiplication unit is configured to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the LDR image.
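A compact sketch of how the channel attention module, the spatial attention module, and the two multiplication units described above could be composed is given below. ConvAttentionBlock receives the two attention sub-modules as arguments; ChannelAttention and SpatialAttention are sketched after the embodiments that describe them further below, and the composition order (channel attention first, then spatial attention) follows the description.

```python
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Sketch of the convolutional attention module placed in a skip connection."""
    def __init__(self, channel_att: nn.Module, spatial_att: nn.Module):
        super().__init__()
        self.channel_att = channel_att
        self.spatial_att = spatial_att

    def forward(self, f):
        f_c = f * self.channel_att(f)        # first multiplication unit: fused-channel feature information
        return f_c * self.spatial_att(f_c)   # second multiplication unit: third feature information
```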
在一些实施例中,所述通道注意力模块包括:第一空间压缩单元、第二空间压缩单元和通道特征提取单元;In some embodiments, the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
所述第一空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第一空间压缩信息;The first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
所述第二空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第二空间压缩信息;The second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
所述通道特征提取单元用于对所述第i-1个第一特征信息的第一空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第一通道信息,对所述第i-1个第一特征信息的第二空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第二通道信息;The channel feature extraction unit is configured to perform channel feature extraction on the first spatially compressed information of the i-1 first feature information, obtain the first channel information of the i-1 first feature information, and perform the channel feature extraction on the i-1 first feature information. performing channel feature extraction on the second spatial compression information of the i-1 first feature information, and obtaining the second channel information of the i-1 first feature information;
所述第i-1个第一特征信息的通道注意力信息是根据所述i-1个第一特征信息的第一通道信息和第二通道信息确定的。The channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
在一些实施例中,所述第一空间压缩单元和/或所述第二空间压缩单元包括池化层。In some embodiments, the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
在一些实施例中,所述第一空间压缩单元为最大池化层,和/或所述第二空间压缩单元为平均池化层。In some embodiments, the first spatial compression unit is a max pooling layer, and/or the second spatial compression unit is an average pooling layer.
在一些实施例中,所述通道特征提取单元为多层感知机MLP。In some embodiments, the channel feature extraction unit is a multi-layer perceptron MLP.
在一些实施例中,所述通道注意力模块还包括:第一加法单元和第一激活函数;In some embodiments, the channel attention module further includes: a first addition unit and a first activation function;
所述第一加法单元用于对所述i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到所述i-1个第一特征信息的融合通道信息;The first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
所述第一激活函数用于对所述i-1个第一特征信息的融合通道信息进行非线性处理,得到所述第i-1个第一特征信息的通道注意力信息。The first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
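By way of illustration, the channel attention module described above (two spatial compression units, a shared MLP as the channel feature extraction unit, an addition unit, and an activation function) could be sketched as follows; the reduction ratio of the MLP is an assumed value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Max/average pooling over the spatial dimensions, a shared MLP, element-wise
    addition and a sigmoid, following the embodiments above."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                              # channel feature extraction unit (MLP)
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, f):                                      # f: (B, C, H, W)
        max_pooled = F.adaptive_max_pool2d(f, 1)               # first spatial compression unit (max pooling)
        avg_pooled = F.adaptive_avg_pool2d(f, 1)               # second spatial compression unit (average pooling)
        fused = self.mlp(max_pooled) + self.mlp(avg_pooled)    # first addition unit
        return torch.sigmoid(fused)                            # channel attention information, spatial size 1x1
```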
在一些实施例中,所述空间注意力模块包括:第一通道压缩单元、第二通道压缩单元和空间特征提取单元;In some embodiments, the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
所述第一通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第一通道压缩信息;The first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
所述第二通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第二通道压缩信息;The second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
所述空间特征提取单元用于对所述第i-1个第一特征信息的第一通道压缩信息和第二通道压缩信息进行空间特征提取,得到所述第i-1个第一特征信息的空间特征信息;The spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information, to obtain the i-1 first feature information Spatial feature information;
所述第i-1个第一特征信息的空间注意力信息是根据所述第i-1个第一特征信息的空间特征信息确定的。The spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
在一些实施例中,所述第一通道压缩单元和/或所述第二通道压缩单元包括池化层。In some embodiments, the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
在一些实施例中,所述第一通道压缩单元为最大池化层,和/或所述第二通道压缩单元为平均池化层。In some embodiments, the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
在一些实施例中,所述空间特征提取单元为卷积层。In some embodiments, the spatial feature extraction unit is a convolutional layer.
在一些实施例中,所述空间注意力模块还包括第二激活函数;In some embodiments, the spatial attention module further includes a second activation function;
所述第二激活函数用于对所述第i-1个第一特征信息的空间特征信息进行非线性处理,得到所述第i-1个第一特征信息的空间注意力信息。The second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
在一些实施例中,所述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。In some embodiments, the spatial dimension of the channel attention information of the i-1 th first feature information is 1×1.
在一些实施例中,所述第i-1个第一特征信息的空间注意力信息的特征维度为1。In some embodiments, the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
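By way of illustration, the spatial attention module described above (two channel compression units, a convolutional spatial feature extraction unit, and an activation function) could be sketched as follows; the 7x7 kernel size and the channel width used in the composition example are assumed values.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise max/average pooling, a convolution over their concatenation and a sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f):                                       # f: fused-channel feature, (B, C, H, W)
        max_pooled, _ = torch.max(f, dim=1, keepdim=True)       # first channel compression unit (max pooling)
        avg_pooled = torch.mean(f, dim=1, keepdim=True)         # second channel compression unit (average pooling)
        att = self.conv(torch.cat([max_pooled, avg_pooled], 1)) # spatial feature extraction unit (conv layer)
        return torch.sigmoid(att)                               # spatial attention information, single feature channel

# Composition with the sub-modules sketched above (the width 64 is an assumed value):
# cbam = ConvAttentionBlock(ChannelAttention(64), SpatialAttention())
```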
在一些实施例中,所述动态转换模型还包括至少一个下采样单元;In some embodiments, the dynamic conversion model further includes at least one downsampling unit;
所述下采样单元用于对所述编码模块输出的特征信息进行空间维度下采样。The down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
在一些实施例中,所述下采样单元为最大池化层。In some embodiments, the downsampling unit is a max pooling layer.
在一些实施例中,所述动态转换模型还包括至少一个上采样单元;In some embodiments, the dynamic conversion model further includes at least one upsampling unit;
所述上采样单元用于对所述解码模块输出的特征信息进行空间维度上采样。The up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
在一些实施例中,所述上采样单元为双线性插值单元。In some embodiments, the upsampling unit is a bilinear interpolation unit.
在一些实施例中,所述N个编码模块中每个编码模块包括至少一个卷积块,其中所述N个编码模块中每个编码模块所包括的卷积块的参数不完全相同。In some embodiments, each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
在一些实施例中,所述N个解码模块中每个解码模块包括至少一个卷积块,其中所述N个解码模块中每个解码模块所包括的卷积块的参数不完全相同。In some embodiments, each of the N decoding modules includes at least one convolutional block, wherein parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
在一些实施例中,若所述i等于N,则所述第N-i个第二特征信息是根据所述第N个编码模块输出的第N个第一特征信息确定的;或者,In some embodiments, if the i is equal to N, the N-i th second feature information is determined according to the N th first feature information output by the N th encoding module; or,
若所述i小于N,则所述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的;或者,If the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
若所述i等于1,则所述第i-1个第一特征信息是根据所述LDR图像确定的;或者,If the i is equal to 1, the i-1th first feature information is determined according to the LDR image; or,
若所述i大于1,则所述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If the i is greater than 1, the i-1 first feature information is determined according to the first feature information output by the i-1 encoding module.
在一些实施例中,所述第N-i+1个解码模块用于对所述第i-1个第三特征信息和所述第N-i个第二特征信息级联后的特征信息进行特征提取,得到所述LDR图像的第N-i+1个第二特征信息。In some embodiments, the N-i+1th decoding module is used to perform feature extraction on the concatenated feature information of the i-1th third feature information and the N-ith second feature information , to obtain the N-i+1th second feature information of the LDR image.
在一些实施例中,所述动态转换模型还包括第一卷积层;In some embodiments, the dynamic transformation model further includes a first convolutional layer;
The first convolutional layer is configured to perform feature extraction on the LDR image to obtain an initial feature map of the LDR image, and to input the initial feature map into the first encoding module and the first convolutional attention module, respectively.
在一些实施例中,所述动态转换模型还包括第二卷积层;In some embodiments, the dynamic transformation model further includes a second convolutional layer;
所述第二卷积层用于对最后一个解码模块输出的所述LDR图像的第二特征信息进行特征提取,输出所述LDR图像的HDR图像。The second convolutional layer is used to perform feature extraction on the second feature information of the LDR image output by the last decoding module, and output an HDR image of the LDR image.
在一些实施例中,所述动态转换模型在训练时的初始参数是预训练模型在预训练时得到的预训练参数。In some embodiments, the initial parameters of the dynamic transformation model during training are pre-training parameters obtained during pre-training of the pre-training model.
在一些实施例中,所述动态转换模型的损失函数包括重构损失函数、感知损失函数和样式损失函数中的至少一个。In some embodiments, the loss function of the dynamic transformation model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
在一些实施例中,所述动态转换模型的损失函数为如下公式所示:In some embodiments, the loss function of the dynamic conversion model is as shown in the following formula:
Loss = L_1 + λ_s · L_st + λ_p · L_p

where Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the style loss function, Lp is the perceptual loss function, and λ_s and λ_p are hyper-parameters.
在一些实施例中,所述动态转换模型的重构损失函数是根据HDR图像真值的压缩色调映射值与HDR图像预测值的压缩色调映射值之间的误差确定的,其中所述HDR图像预测值的压缩色调映射值是根据预设的压缩色调映射函数和所述HDR图像预测值确定的,所述HDR图像真值的压缩色调映射值是根据所述压缩色调映射函数和所述HDR图像真值确定的。In some embodiments, the reconstruction loss function of the dynamic transformation model is determined from the error between the compressed tone-mapped values of the true value of the HDR image and the compressed tone-mapped value of the predicted value of the HDR image, wherein the predicted HDR image The compressed tone-mapping value of the value is determined according to the preset compressed tone-mapping function and the predicted value of the HDR image, and the compressed tone-mapping value of the real value of the HDR image is determined according to the compressed tone-mapping function and the real value of the HDR image. The value is determined.
例如,动态转换模型的重构损失函数是基于如下公式确定的:For example, the reconstruction loss function of the dynamic transformation model is determined based on the following formula:
L_1 = ‖T(H) − T(GT)‖_1

where L1 denotes the reconstruction loss function, T(·) is the preset compressed tone-mapping function applied to x = H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the ground-truth value of the training image, "‖·‖_1" denotes the L1 norm, and μ is a preset parameter of T(·).
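As a minimal sketch, reusing the tone_compress helper from the perceptual-loss example earlier (whose μ-law form is an assumption), the reconstruction loss could be computed as shown below; the pixel-wise averaging is an implementation choice.

```python
import torch

def reconstruction_loss(hdr_pred, hdr_gt, mu=5000.0):
    """L1 = ||T(H) - T(GT)||_1, here averaged over all pixels of the batch."""
    return torch.mean(torch.abs(tone_compress(hdr_pred, mu) - tone_compress(hdr_gt, mu)))
```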
在一些实施例中,所述动态转换模型的感知损失函数是基于第一特征值与第二特征值之间的误差确定的,其中,所述第一特征值为HDR图像预测值的压缩色调映射值在所述预训练模型的第l层的特征图中对应的特征值,所述第二特征值为HDR图像真值的压缩色调映射值在所述第l层的特征图中对应的特征值,所述HDR图像预测值的压缩色调映射值是根据预设的压缩色调映射函数和所述HDR图像预测值确定的,所述HDR图像真值的压缩色调映射值是根据所述压缩色调映射函数和所述HDR图像真值确定的。In some embodiments, the perceptual loss function of the dynamic transformation model is determined based on an error between a first eigenvalue and a second eigenvalue, wherein the first eigenvalue is a compressed tone map of an HDR image prediction value The value corresponds to the feature value in the feature map of the first layer of the pre-training model, and the second feature value is the corresponding feature value of the compressed tone mapping value of the true value of the HDR image in the feature map of the first layer , the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the real value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image is determined.
例如,动态转换模型的感知损失函数是基于如下公式确定的:For example, the perceptual loss function of the dynamic transition model is determined based on the following formula:
L_p = Σ_l ‖φ_l(T(H)) − φ_l(T(GT))‖_1 / (C_l · H_l · W_l)

where Lp denotes the perceptual loss function, T(·) is the preset compressed tone-mapping function (with preset parameter μ) applied to x = H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the ground-truth value of the training image, "‖·‖_1" denotes the L1 norm, φ_l denotes the feature map of layer l of the pre-training model, of size C_l × H_l × W_l, and the sum runs over the selected layers l of the pre-training model.
在一些实施例中,所述动态转换模型的样式损失函数是基于第一元素值与第二元素值之间的误差确定的,其中,所述第一元素值为HDR图像预测值的压缩色调映射值在所述预训练模型的第l层特征图的格拉姆Gram矩阵中对应的元素值,所述第二元素值为HDR图像真值的压缩色调映射值在所述Gram矩阵中对应的元素值,所述HDR图像预测值的压缩色调映射值是根据预设的压缩色调映射函数和所述HDR图像预测值确定的,所述HDR图像真值的压缩色调映射值是根据所述压缩色调映射函数和所述HDR图像真值确定的。In some embodiments, the style loss function of the dynamic transformation model is determined based on an error between a first element value and a second element value, wherein the first element value is a compressed tone map of an HDR image prediction value The value corresponds to the element value in the Gram Gram matrix of the l-th layer feature map of the pre-training model, and the second element value is the corresponding element value in the Gram matrix of the compressed tone mapping value of the true value of the HDR image , the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the real value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image is determined.
例如,动态转换模型的样式损失函数是基于如下公式确定的:For example, the style loss function of the dynamic transformation model is determined based on the following formula:
L_st = Σ_l ‖G(φ_l(T(H))) − G(φ_l(T(GT)))‖_1

where Lst denotes the style loss function, G(·) is the Gram matrix of the layer-l features of the pre-training model (normalized by K_l), T(·) is the preset compressed tone-mapping function (with preset parameter μ) applied to x = H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the ground-truth HDR value of the training image, "‖·‖_1" denotes the L1 norm, φ_l denotes the feature map of layer l of the pre-training model, of size C_l × H_l × W_l, and K_l = C_l · H_l · W_l.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the apparatus 20 shown in FIG. 10 may correspond to the entity that performs the image processing method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 20 are respectively intended to implement the corresponding procedures of the image processing method; for brevity, they are not repeated here.
图11是本申请实施例提供的模型训练装置的示意性框图。Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application.
如图11所示,模型训练装置40包括:As shown in Figure 11, model training device 40 comprises:
获取单元41,用于获取低动态范围LDR训练图像和所述LDR训练图像的高动态范围HDR图像真值;An acquisition unit 41, configured to acquire a low dynamic range LDR training image and a true value of a high dynamic range HDR image of the LDR training image;
处理单元42,用于将所述LDR训练图像输入动态转换模型,通过第i个编码模块对第i-1个第一特征信息进行特征提取,得到所述LDR训练图像的第i个第一特征信息,其中,所述动态转换模型包括串联连接的N个编码模块和串联连接的所述N个解码模块,所述N个编码模块中的最后一个编码模块的输出与所述N个解码模块中的第一个解码模块的输入连接,且第i个编码模块与第N-i+1个解码模块跳跃连接,所述i为小于或等于N的正整数,所述N为正整数;通过所述第N-i+1个解码模块 对所述第i-1个第一特征信息和所述LDR训练图像的第N-i个第二特征信息进行特征提取,得到所述LDR训练图像的第N-i+1个第二特征信息;根据所述N个解码模块中最后一个解码模块输出的所述LDR训练图像的第二特征信息,确定所述LDR训练图像的HDR图像预测值;确定所述LDR训练图像的HDR图像预测值和所述LDR训练图像的HDR图像真值之间的损失,并根据所述损失对所述动态转换模型进行训练。The processing unit 42 is configured to input the LDR training image into the dynamic conversion model, and extract the i-1 first feature information through the i-th encoding module to obtain the i-th first feature of the LDR training image information, wherein the dynamic conversion model includes N encoding modules connected in series and the N decoding modules connected in series, and the output of the last encoding module in the N encoding modules is the same as that of the N decoding modules The input connection of the first decoding module of , and the i-th encoding module is skipped and connected to the N-i+1 decoding module, the i is a positive integer less than or equal to N, and the N is a positive integer; through the The N-i+1 decoding module performs feature extraction on the i-1 first feature information and the N-i second feature information of the LDR training image to obtain the N-i of the LDR training image. i+1 second feature information; according to the second feature information of the LDR training image output by the last decoding module in the N decoding modules, determine the HDR image prediction value of the LDR training image; determine the LDR The loss between the HDR image prediction value of the training image and the HDR image true value of the LDR training image, and the dynamic conversion model is trained according to the loss.
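By way of illustration only, one possible training iteration matching the procedure described for the processing unit 42 is sketched below. The optimizer, the hyper-parameter values λ_s and λ_p, the parameter μ, and the loss helpers (reconstruction_loss, style_loss, perceptual_loss, sketched earlier) are assumptions for readability.

```python
import torch

def train_step(model, optimizer, ldr_batch, hdr_gt_batch,
               lambda_s=0.01, lambda_p=0.001, mu=5000.0):
    """One illustrative iteration: forward pass, target loss, back-propagation, parameter update."""
    optimizer.zero_grad()
    hdr_pred = model(ldr_batch)                                  # HDR image prediction of the LDR training image
    loss = (reconstruction_loss(hdr_pred, hdr_gt_batch, mu)      # Loss = L1 + lambda_s*Lst + lambda_p*Lp
            + lambda_s * style_loss(hdr_pred, hdr_gt_batch, mu)
            + lambda_p * perceptual_loss(hdr_pred, hdr_gt_batch, mu))
    loss.backward()
    optimizer.step()
    return loss.item()
```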
在一种实施例中,所述动态转换模型还包括:位于所述第i个编码模块与所述第N-i+1个解码模块的跳跃连接中的卷积注意力模块,上述处理单元42,具体用于通过所述卷积注意力模块对所述第i-1个第一特征信息进行空间信息与通道信息提取,得到所述LDR训练图像的第i-1个第三特征信息;通过所述第N-i+1个解码模块对所述第i-1个第三特征信息和所述第N-i个第二特征信息进行特征提取,得到所述LDR训练图像的第N-i+1个第二特征信息。In one embodiment, the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module, the above-mentioned processing unit 42 , specifically for performing spatial information and channel information extraction on the i-1th first feature information through the convolution attention module to obtain the i-1th third feature information of the LDR training image; by The N-i+1th decoding module performs feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+1th of the LDR training image a second characteristic information.
在一些实施例中,所述卷积注意力模块包括通道注意力模块和空间注意力模块,上述处理单元42,具体用于通过所述通道注意力模块对所述第i-1个第一特征信息进行通道信息提取,得到所述第i-1个第一特征信息的通道注意力信息;通过所述空间注意力模块对第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到所述第i-1个第一特征信息的空间注意力信息,所述第i-1个第一特征信息的融合通道特征信息是根据所述第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息确定的;根据所述第i-1个第一特征信息的通道注意力信息和空间注意力信息,确定所述LDR训练图像的第i-1个第三特征信息。In some embodiments, the convolutional attention module includes a channel attention module and a spatial attention module, and the above-mentioned processing unit 42 is specifically configured to perform the i-1th first feature through the channel attention module Extract channel information from the information to obtain the channel attention information of the i-1 first feature information; perform spatial information extraction on the fusion channel feature information of the i-1 first feature information through the spatial attention module , to obtain the spatial attention information of the i-1 first feature information, the fusion channel feature information of the i-1 first feature information is based on the i-1 first feature information and the Determined by the channel attention information of the i-1 first feature information; according to the channel attention information and spatial attention information of the i-1 first feature information, determine the i-1th of the LDR training image A third characteristic information.
在一些实施例中,所述卷积注意力模块还包括第一乘法单元,上述处理单元42,还用于通过所述第一乘法单元对所述第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息进行相乘,得到所述第i-1个第一特征信息的融合通道特征信息。In some embodiments, the convolutional attention module further includes a first multiplication unit, the above-mentioned processing unit 42 is also used to perform the i-1th first feature information and the i-th by the first multiplication unit Multiply the channel attention information of the first feature information to obtain the fusion channel feature information of the i-1 first feature information.
在一些实施例中,所述卷积注意力模块还包括第二乘法单元,上述处理单元42,具体用于通过所述第二乘法单元对所述第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到所述LDR训练图像的第i-1个第三特征信息。In some embodiments, the convolutional attention module further includes a second multiplication unit, the above-mentioned processing unit 42, which is specifically used to perform the fusion channel of the i-1th first feature information through the second multiplication unit The feature information is multiplied by the spatial attention information to obtain the i-1th third feature information of the LDR training image.
在一些实施例中,所述通道注意力模块包括:第一空间压缩单元、第二空间压缩单元和通道特征提取单元,上述处理单元42,具体用于通过所述第一空间压缩单元对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第一空间压缩信息;通过所述第二空间压缩单元对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第二空间压缩信息;通过所述通道特征提取单元对所述第i-1个第一特征信息的第一空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第一通道信息;通过所述通道特征提取单元对所述第i-1个第一特征信息的第二空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第二通道信息;根据所述i-1个第一特征信息的第一通道信息和第二通道信息,确定所述第i-1个第一特征信息的通道注意力信息。In some embodiments, the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit, and the above processing unit 42 is specifically configured to use the first spatial compression unit to analyze the Perform spatial dimension compression on the i-1th first feature information to obtain first spatial compression information of the i-1th first feature information; use the second spatial compression unit to compress the i-1th first feature information A feature information is subjected to spatial dimension compression to obtain the second spatial compression information of the i-1 first feature information; the first spatial dimension of the i-1 first feature information is obtained by the channel feature extraction unit performing channel feature extraction on the compressed information to obtain the first channel information of the i-1 first feature information; performing the second spatial compression information on the i-1 first feature information through the channel feature extraction unit Channel feature extraction, obtaining the second channel information of the i-1 first feature information; determining the i-1th channel information according to the first channel information and second channel information of the i-1 first feature information Channel attention information of the first feature information.
在一些实施例中,所述第一空间压缩单元和/或所述第二空间压缩单元包括池化层。In some embodiments, the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
在一些实施例中,所述第一空间压缩单元为最大池化层,和/或所述第二空间压缩单元为平均池化层。In some embodiments, the first spatial compression unit is a max pooling layer, and/or the second spatial compression unit is an average pooling layer.
在一些实施例中,所述通道特征提取单元为多层感知机MLP。In some embodiments, the channel feature extraction unit is a multi-layer perceptron MLP.
在一些实施例中,所述通道注意力模块还包括:第一加法单元和第一激活函数,上述处理单元42,具体用于通过所述第一加法单元对所述i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到所述i-1个第一特征信息的融合通道信息;通过所述第一激活函数对所述i-1个第一特征信息的融合通道信息进行非线性处理,得到所述第i-1个第一特征信息的通道注意力信息。In some embodiments, the channel attention module further includes: a first addition unit and a first activation function, and the above-mentioned processing unit 42 is specifically configured to perform the i-1 first features through the first addition unit adding the first channel information and the second channel information of the information to obtain the fusion channel information of the i-1 pieces of first feature information; The channel information is fused to perform nonlinear processing to obtain the channel attention information of the i-1 th first feature information.
在一些实施例中,所述空间注意力模块包括:第一通道压缩单元、第二通道压缩单元和空间特征提取单元,上述处理单元42,具体用于通过所述第一通道压缩单元对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第一通道压缩信息;通过所述第二通道压缩单元对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第二通道压缩信息;通过所述空间特征提取单元对所述第i-1个第一特征信息的第一通道压缩信息和第二通道压缩信息进行空间特征提取,得到所述第i-1个第一特征信息的空间特征信息;根据所述第i-1个第一特征信息的空间特征信息,确定所述第i-1个第一特征信息的空间注意 力信息。In some embodiments, the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit, and the above-mentioned processing unit 42 is specifically configured to use the first channel compression unit to Perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain first channel compression information of the i-1 first feature information; use the second channel compression unit to compress the first channel The fusion channel feature information of the i-1 first feature information is compressed in the channel dimension to obtain the second channel compression information of the i-1 first feature information; the i-th first feature information is extracted by the spatial feature extraction unit. The first channel compressed information and the second channel compressed information of one first feature information are subjected to spatial feature extraction to obtain the spatial feature information of the i-1 first feature information; according to the i-1 first The spatial feature information of the feature information is to determine the spatial attention information of the i-1 first feature information.
在一些实施例中,所述第一通道压缩单元和/或所述第二通道压缩单元包括池化层。In some embodiments, the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
在一些实施例中,所述第一通道压缩单元为最大池化层,和/或所述第二通道压缩单元为平均池化层。In some embodiments, the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
在一些实施例中,所述空间特征提取单元为卷积层。In some embodiments, the spatial feature extraction unit is a convolutional layer.
在一些实施例中,所述空间注意力模块还包括第二激活函数,上述处理单元42,具体用于通过所述第二激活函数对所述第i-1个第一特征信息的空间特征信息进行非线性处理,得到所述第i-1个第一特征信息的空间注意力信息。In some embodiments, the spatial attention module further includes a second activation function, and the above-mentioned processing unit 42 is specifically configured to perform the spatial feature information of the i-1th first feature information through the second activation function Perform nonlinear processing to obtain the spatial attention information of the i-1th first feature information.
在一些实施例中,所述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。In some embodiments, the spatial dimension of the channel attention information of the i-1 th first feature information is 1×1.
在一些实施例中,所述第i-1个第一特征信息的空间注意力信息的特征维度为1。In some embodiments, the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
在一些实施例中,所述动态转换模型还包括至少一个下采样单元,上述处理单元42,还用于通过所述下采样单元对所述编码模块输出的特征信息进行空间维度下采样。In some embodiments, the dynamic conversion model further includes at least one down-sampling unit, the above-mentioned processing unit 42 is further configured to down-sample the feature information output by the encoding module through the down-sampling unit in a spatial dimension.
可选的,所述下采样单元为最大池化层。Optionally, the downsampling unit is a maximum pooling layer.
在一些实施例中,所述动态转换模型还包括至少一个上采样单元,上述处理单元42,还用于通过所述上采样单元对所述解码模块输出的特征信息进行空间维度上采样。In some embodiments, the dynamic conversion model further includes at least one upsampling unit, the above-mentioned processing unit 42 is further configured to perform spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
可选的,所述上采样单元为双线性插值单元。Optionally, the upsampling unit is a bilinear interpolation unit.
可选的,所述N个编码模块中每个编码模块包括至少一个卷积块,其中所述N个编码模块中每个编码模块所包括的卷积块的参数不完全相同。Optionally, each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
可选的,所述N个解码模块中每个解码模块包括至少一个卷积块,其中所述N个解码模块中每个解码模块所包括的卷积块的参数不完全相同。Optionally, each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
在一些实施例中,若所述i等于N,则所述第N-i个第二特征信息是根据所述第N个编码模块输出的第N个第一特征信息确定的;或者,若所述i小于N,则所述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的;或者,若所述i等于1,则所述第i-1个第一特征信息是根据所述LDR训练图像确定的;或者,若所述i大于1,则所述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。In some embodiments, if the i is equal to N, the N-i th second feature information is determined according to the N th first feature information output by the N th encoding module; or, if the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or, if the i is equal to 1, then the i-1-th A feature information is determined according to the LDR training image; or, if the i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th coding module of.
在一些实施例中,上述处理单元42,具体用于对所述第i-1个第三特征信息和所述第N-i个第二特征信息进行级联;将级联后的特征信息输入所述第N-i+1个解码模块进行特征提取,得到所述LDR训练图像的第N-i+1个第二特征信息。In some embodiments, the above processing unit 42 is specifically configured to concatenate the i-1th third feature information and the N-ith second feature information; input the concatenated feature information into the The N-i+1th decoding module performs feature extraction to obtain the N-i+1th second feature information of the LDR training image.
在一些实施例中,所述动态转换模型还包括第一卷积层,上述处理单元42,还用于通过所述第一卷积层对所述LDR训练图像进行特征提取,得到所述LDR训练图像的初始特征图;将所述初始特征图分别输入第一个编码模块和第一卷积注意力模块中,得到所述第一个编码模块输出的第一个第一特征信息,以及得到所述第一个卷积注意力模块输出的第一个第三特征信息。In some embodiments, the dynamic conversion model further includes a first convolutional layer, and the above-mentioned processing unit 42 is also configured to perform feature extraction on the LDR training image through the first convolutional layer to obtain the LDR training image. The initial feature map of the image; the initial feature map is input into the first coding module and the first convolution attention module respectively, and the first first feature information output by the first coding module is obtained, and the obtained The first third feature information output by the first convolutional attention module.
在一些实施例中,所述动态转换模型还包括第二卷积层,上述处理单元42,具体用于通过所述第二卷积层对所述最后一个解码模块输出的所述LDR训练图像的第二特征信息进行特征提取,输出所述LDR训练图像的HDR图像预测值。In some embodiments, the dynamic conversion model further includes a second convolutional layer, and the above-mentioned processing unit 42 is specifically used to process the LDR training image output by the last decoding module through the second convolutional layer The second feature information performs feature extraction, and outputs the HDR image prediction value of the LDR training image.
在一些实施例中,上述处理单元42,还用于获取预训练模型在预训练时得到的预训练参数;将所述预训练参数确定为所述动态转换模型的初始参数。In some embodiments, the processing unit 42 is further configured to obtain pre-training parameters obtained during pre-training of the pre-training model; and determine the pre-training parameters as initial parameters of the dynamic transformation model.
在一些实施例中,上述处理单元42,具体用于根据预设的损失函数,确定所述LDR训练图像的HDR图像预测值和所述LDR训练图像的HDR图像真值之间的目标损失。In some embodiments, the above processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image according to a preset loss function.
在一些实施例中,所述预设的损失函数包括重构损失函数、感知损失函数和样式损失函数中的至少一个。In some embodiments, the preset loss function includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
在一些实施例中,上述处理单元42,具体用于确定所述HDR图像预测值与所述HDR图像真值之间的重构损失;确定所述HDR图像预测值与所述HDR图像真值之间的感知损失;确定所述HDR图像预测值与所述HDR图像真值之间的样式损失;根据所述HDR图像预测值与所述HDR图像真值之间的重构损失、感知损失和样式损失,确定所述确定所述HDR图像预测值与所述HDR图像真值之间的目标损失。In some embodiments, the above processing unit 42 is specifically configured to determine the reconstruction loss between the predicted value of the HDR image and the true value of the HDR image; determine the difference between the predicted value of the HDR image and the true value of the HDR image Perceptual loss between; determine the style loss between the predicted value of the HDR image and the true value of the HDR image; according to the reconstruction loss, perceptual loss and style between the predicted value of the HDR image and the true value of the HDR image Loss, determining the target loss between the predicted value of the HDR image and the true value of the HDR image.
在一些实施例中,上述处理单元42,具体用于根据如下公式,确定所述HDR图像预测值与所述 HDR图像真值之间的目标损失:In some embodiments, the above-mentioned processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image and the true value of the HDR image according to the following formula:
Loss = L_1 + λ_s · L_st + λ_p · L_p

where Loss is the target loss, L1 is the reconstruction loss, Lst is the style loss, Lp is the perceptual loss, and λ_s and λ_p are hyper-parameters.
在一些实施例中,上述处理单元42,具体用于根据预设的压缩色调映射函数,确定所述HDR图像预测值的压缩色调映射值;根据所述压缩色调映射函数,确定所述HDR图像真值的压缩色调映射值;根据所述HDR图像真值的压缩色调映射值与所述HDR图像预测值的压缩色调映射值之间的误差,确定所述重构损失。In some embodiments, the above-mentioned processing unit 42 is specifically configured to determine the compressed tone-mapping value of the predicted value of the HDR image according to a preset compressed tone-mapping function; The compressed tone-mapped value of the value; the reconstruction loss is determined according to the error between the compressed tone-mapped value of the true value of the HDR image and the compressed tone-mapped value of the predicted value of the HDR image.
例如,根据如下公式确定所述重构损失:For example, the reconstruction loss is determined according to the following formula:
L_1 = ‖T(H) − T(GT)‖_1

where L1 denotes the reconstruction loss, T(·) is the preset compressed tone-mapping function applied to x = H or GT, H is the HDR image prediction value output by the dynamic conversion model, GT is the HDR image ground-truth value, "‖·‖_1" denotes the L1 norm, and μ is a preset parameter of T(·).
在一些实施例中,上述处理单元42,具体用于获取所述预训练模型的第l层的特征图;根据预设的压缩色调映射函数,确定所述HDR图像预测值的压缩色调映射值;根据所述压缩色调映射函数,确定所述HDR图像真值的压缩色调映射值;确定所述HDR图像预测值的压缩色调映射值,在所述第l层的特征图中对应的第一特征值;确定所述HDR图像真值的压缩色调映射值,在所述第l层的特征图中对应的第二特征值;根据所述第一特征值与所述第二特征值之间的误差,确定所述感知损失。In some embodiments, the above-mentioned processing unit 42 is specifically configured to obtain the feature map of the first layer of the pre-training model; determine the compressed tone-mapping value of the HDR image prediction value according to a preset compressed tone-mapping function; According to the compressed tone mapping function, determine the compressed tone mapping value of the real value of the HDR image; determine the compressed tone mapping value of the predicted value of the HDR image, and the corresponding first feature value in the feature map of the first layer ; Determining the compressed tone mapping value of the true value of the HDR image, the second eigenvalue corresponding to the feature map of the first layer; according to the error between the first eigenvalue and the second eigenvalue, Determine the perceptual loss.
例如,根据如下公式确定所述感知损失:For example, the perceptual loss is determined according to the following formula:
L_p = Σ_l ‖φ_l(T(H)) − φ_l(T(GT))‖_1 / (C_l · H_l · W_l)

where Lp denotes the perceptual loss, T(·) is the preset compressed tone-mapping function (with preset parameter μ) applied to x = H or GT, H is the HDR image prediction value output by the dynamic conversion model, GT is the HDR image ground-truth value, "‖·‖_1" denotes the L1 norm, φ_l denotes the feature map of layer l of the pre-training model, of size C_l × H_l × W_l, and the sum runs over the selected layers l of the pre-training model.
在一些实施例中,上述处理单元42,具体用于获取所述预训练模型的第l层特征图的格拉姆Gram矩阵;根据预设的压缩色调映射函数,确定所述HDR图像预测值的压缩色调映射值;根据所述压缩色调映射函数,确定所述HDR图像真值的压缩色调映射值;确定所述HDR图像预测值的压缩色调映射值,在所述Gram矩阵中对应的第一元素值;确定所述HDR图像真值的压缩色调映射值,在所述Gram矩阵中对应的第二元素值;根据所述第一元素值与所述第二元素值之间的误差,确定所述样式损失。In some embodiments, the above-mentioned processing unit 42 is specifically configured to obtain the Gram Gram matrix of the l-th layer feature map of the pre-training model; determine the compression of the predicted value of the HDR image according to a preset compressed tone mapping function Tone mapping value; according to the compressed tone mapping function, determine the compressed tone mapping value of the true value of the HDR image; determine the compressed tone mapping value of the predicted value of the HDR image, and the corresponding first element value in the Gram matrix ; Determine the compressed tone mapping value of the true value of the HDR image, the corresponding second element value in the Gram matrix; determine the style according to the error between the first element value and the second element value loss.
例如,根据如下公式确定所述样式损失:For example, the style loss is determined according to the following formula:
L_st = Σ_l ‖G(φ_l(T(H))) − G(φ_l(T(GT)))‖_1

where Lst denotes the style loss, G(·) is the Gram matrix of the layer-l features of the pre-training model (normalized by K_l), T(·) is the preset compressed tone-mapping function (with preset parameter μ) applied to x = H or GT, H is the HDR image prediction value output by the dynamic conversion model, GT is the HDR image ground-truth value, "‖·‖_1" denotes the L1 norm, φ_l denotes the feature map of layer l of the pre-training model, of size C_l × H_l × W_l, and K_l = C_l · H_l · W_l.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the apparatus 40 shown in FIG. 11 may correspond to the entity that performs the model training method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 40 are respectively intended to implement the corresponding procedures of the model training method and the other methods; for brevity, they are not repeated here.
上文中结合附图从功能单元的角度描述了本申请实施例的装置和系统。应理解,该功能单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件单元组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件单元组合执行完成。可选地,软件单元可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。The device and system of the embodiments of the present application are described above from the perspective of functional units with reference to the accompanying drawings. It should be understood that the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units. Specifically, each step of the method embodiment in the embodiment of the present application can be completed by an integrated logic circuit of the hardware in the processor and/or instructions in the form of software, and the steps of the method disclosed in the embodiment of the present application can be directly embodied as hardware The decoding processor is executed, or the combination of hardware and software units in the decoding processor is used to complete the execution. Optionally, the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers. The storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
图12是本申请实施例提供的电子设备的示意性框图。Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
如图12所示,该电子设备30可以为本申请实施例所述的图像处理设备,或者解码器,或者为模型训练设备,该电子设备30可包括:As shown in Figure 12, the electronic device 30 may be the image processing device described in the embodiment of the present application, or a decoder, or a model training device, and the electronic device 30 may include:
存储器33和处理器32,该存储器33用于存储计算机程序34,并将该程序代码34传输给该处理器32。换言之,该处理器32可以从存储器33中调用并运行计算机程序34,以实现本申请实施例中的方法。A memory 33 and a processor 32 , the memory 33 is used to store a computer program 34 and transmit the program code 34 to the processor 32 . In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
例如,该处理器32可用于根据该计算机程序34中的指令执行上述方法200中的步骤。For example, the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
在本申请的一些实施例中,该处理器32可以包括但不限于:In some embodiments of the present application, the processor 32 may include, but is not limited to:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。General-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates Or transistor logic devices, discrete hardware components, and so on.
在本申请的一些实施例中,该存储器33包括但不限于:In some embodiments of the present application, the memory 33 includes but is not limited to:
易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。volatile memory and/or non-volatile memory. Among them, the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash. The volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (Static RAM, SRAM), Dynamic Random Access Memory (Dynamic RAM, DRAM), Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory (synch link DRAM, SLDRAM) and Direct Memory Bus Random Access Memory (Direct Rambus RAM, DR RAM).
在本申请的一些实施例中,该计算机程序34可以被分割成一个或多个单元,该一个或者多个单元被存储在该存储器33中,并由该处理器32执行,以完成本申请提供的方法。该一个或多个单元可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述该计算机程序34在该电子设备30中的执行过程。In some embodiments of the present application, the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the present application. Methods. The one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
如图12所示,该电子设备30还可包括:As shown in Figure 12, the electronic device 30 may also include:
收发器33,该收发器33可连接至该处理器32或存储器33。A transceiver 33 , the transceiver 33 can be connected to the processor 32 or the memory 33 .
其中,处理器32可以控制该收发器33与其他设备进行通信,具体地,可以向其他设备发送信息或数据,或接收其他设备发送的信息或数据。收发器33可以包括发射机和接收机。收发器33还可以进一步包括天线,天线的数量可以为一个或多个。Wherein, the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices. Transceiver 33 may include a transmitter and a receiver. The transceiver 33 may further include antennas, and the number of antennas may be one or more.
应当理解,该电子设备30中的各个组件通过总线系统相连,其中,总线系统除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。It should be understood that the various components in the electronic device 30 are connected through a bus system, wherein the bus system includes not only a data bus, but also a power bus, a control bus and a status signal bus.
本申请还提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被计算机执行时使得该计算机能够执行上述方法实施例的方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。The present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments. In other words, the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地产生按照本申请实施例该的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。 该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字点云光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。When implemented using software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part. The computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transferred from a website, computer, server, or data center by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server or data center. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)), etc. .
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those skilled in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or can be Integrate into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. For example, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
The above content describes only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (107)

  1. 一种图像解码方法,其特征在于,包括:An image decoding method, characterized in that, comprising:
    解码码流,得到重建图像;Decode the code stream to obtain the reconstructed image;
    将所述重建图像输入动态转换模型进行动态转换,得到所述重建图像的高动态范围HDR图像;Inputting the reconstructed image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image;
    wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, an output of the last encoding module of the N encoding modules is connected to an input of the first decoding module of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is configured to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image; the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module of the N decoding modules; i is a positive integer less than or equal to N, and N is a positive integer.
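For orientation only, the sketch below illustrates one way the encoder-decoder structure recited in claim 1 could be organized. It is written in Python with PyTorch; the class and function names, channel widths, kernel sizes, and the exact placement of the down-sampling and up-sampling steps are illustrative assumptions and are not taken from the claims, and the convolutional attention module of claim 2 is omitted here (it is sketched separately after claims 10 and 15 below).

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # stand-in for an "encoding module" / "decoding module": two 3x3 convolutions with ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class DynamicRangeNet(nn.Module):
    def __init__(self, n=4, base=32):
        super().__init__()
        chs = [base * 2 ** k for k in range(n + 1)]            # e.g. 32, 64, 128, 256, 512
        self.stem = nn.Conv2d(3, chs[0], 3, padding=1)         # "first convolutional layer" (claim 26)
        self.encoders = nn.ModuleList([conv_block(chs[k], chs[k + 1]) for k in range(n)])
        self.down = nn.MaxPool2d(2)                            # spatial down-sampling unit
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # decoder N-i+1 concatenates the (i-1)-th "first feature" skip with the previous
        # "second feature" coming up the decoding path
        self.decoders = nn.ModuleList([conv_block(chs[k] + chs[k + 1], chs[k])
                                       for k in reversed(range(n))])
        self.head = nn.Conv2d(chs[0], 3, 3, padding=1)         # "second convolutional layer" (claim 27)

    def forward(self, x):
        # input height/width assumed divisible by 2**(n-1) so that shapes match
        f = self.stem(x)
        skips = [f]                                            # 0-th first feature information
        for k, enc in enumerate(self.encoders):
            f = enc(self.down(f) if k > 0 else f)
            if k < len(self.encoders) - 1:
                skips.append(f)                                # i-th first feature information
        for k, dec in enumerate(self.decoders):
            skip = skips[-(k + 1)]
            if f.shape[-1] != skip.shape[-1]:
                f = self.up(f)                                 # spatial up-sampling unit
            f = dec(torch.cat([skip, f], dim=1))
        return self.head(f)                                    # HDR image of the input

A call such as DynamicRangeNet(n=4)(torch.randn(1, 3, 256, 256)) would return a tensor of the same spatial size, standing in for the HDR image of the input.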
  2. 根据权利要求1所述的方法,其特征在于,所述动态转换模型还包括:位于所述第i个编码模块与所述第N-i+1个解码模块的跳跃连接中的卷积注意力模块;The method according to claim 1, wherein the dynamic conversion model further comprises: convolutional attention located in the skip connection between the i-th encoding module and the N-i+1-th decoding module module;
    所述卷积注意力模块用于对所述第i-1个第一特征信息进行空间信息与通道信息提取,得到所述重建图像的第i-1个第三特征信息;The convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image;
    the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the reconstructed image.
  3. 根据权利要求2所述的方法,其特征在于,所述卷积注意力模块包括通道注意力模块和空间注意力模块;The method according to claim 2, wherein the convolution attention module includes a channel attention module and a spatial attention module;
    所述通道注意力模块用于对所述第i-1个第一特征信息进行通道信息提取,得到所述第i-1个第一特征信息的通道注意力信息;The channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
    the spatial attention module is configured to perform spatial information extraction on the (i-1)-th first feature information and the channel attention information of the (i-1)-th first feature information to obtain spatial attention information of the (i-1)-th first feature information;
    所述重建图像的第i-1个第三特征信息是根据所述第i-1个第一特征信息的通道注意力信息和空间注意力信息确定的。The i-1 th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  4. 根据权利要求3所述的方法,其特征在于,所述卷积注意力模块还包括第一乘法单元;The method according to claim 3, wherein the convolution attention module also includes a first multiplication unit;
    the first multiplication unit is configured to multiply the (i-1)-th first feature information by the channel attention information of the (i-1)-th first feature information to obtain fused channel feature information of the (i-1)-th first feature information;
    所述空间注意力模块用于对所述第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到所述第i-1个第一特征信息的空间注意力信息。The spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  5. 根据权利要求4所述的方法,其特征在于,所述卷积注意力模块还包括第二乘法单元;The method according to claim 4, wherein the convolution attention module also includes a second multiplication unit;
    所述第二乘法单元用于对所述第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到所述重建图像的第i-1个第三特征信息。The second multiplication unit is configured to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image.
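As an illustration of claims 2-5, the sketch below composes a channel attention sub-module, a first multiplication unit, a spatial attention sub-module, and a second multiplication unit into one convolutional attention module. The composition assumed here is the widely used CBAM pattern; the two sub-modules are passed in as arguments, and concrete sketches for them are given after claims 10 and 15 below. All names are hypothetical.

import torch.nn as nn

class ConvAttention(nn.Module):
    # assumed CBAM-style composition of the two attention sub-modules (claims 3-5)
    def __init__(self, channel_att: nn.Module, spatial_att: nn.Module):
        super().__init__()
        self.channel_att = channel_att
        self.spatial_att = spatial_att

    def forward(self, x):
        # first multiplication unit: feature x channel attention -> fused channel feature information
        fused = x * self.channel_att(x)
        # second multiplication unit: fused feature x spatial attention -> third feature information
        return fused * self.spatial_att(fused)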
  6. 根据权利要求3所述的方法,其特征在于,所述通道注意力模块包括:第一空间压缩单元、第二空间压缩单元和通道特征提取单元;The method according to claim 3, wherein the channel attention module comprises: a first space compression unit, a second space compression unit and a channel feature extraction unit;
    所述第一空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第一空间压缩信息;The first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
    所述第二空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第二空间压缩信息;The second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
    the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the (i-1)-th first feature information to obtain first channel information of the (i-1)-th first feature information, and to perform channel feature extraction on the second spatial compression information of the (i-1)-th first feature information to obtain second channel information of the (i-1)-th first feature information;
    所述第i-1个第一特征信息的通道注意力信息是根据所述i-1个第一特征信息的第一通道信息和第二通道信息确定的。The channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  7. 根据权利要求6所述的方法,其特征在于,所述第一空间压缩单元和/或所述第二空间压缩单元包括池化层。The method according to claim 6, wherein the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  8. 根据权利要求6所述的方法,其特征在于,所述第一空间压缩单元为最大池化层,和/或所述第二空间压缩单元为平均池化层。The method according to claim 6, wherein the first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
  9. 根据权利要求6所述的方法,其特征在于,所述通道特征提取单元为多层感知机MLP。The method according to claim 6, wherein the channel feature extraction unit is a multi-layer perceptron (MLP).
  10. 根据权利要求6所述的方法,其特征在于,所述通道注意力模块还包括:第一加法单元和第一激活函数;The method according to claim 6, wherein the channel attention module further comprises: a first addition unit and a first activation function;
    所述第一加法单元用于对所述i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到所述i-1个第一特征信息的融合通道信息;The first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
    所述第一激活函数用于对所述i-1个第一特征信息的融合通道信息进行非线性处理,得到所述第i-1个第一特征信息的通道注意力信息。The first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
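One possible reading of the channel attention module of claims 6-10 and 16, following the common CBAM design: max pooling and average pooling compress the spatial dimensions to 1×1, a shared multi-layer perceptron (implemented here with 1×1 convolutions) extracts the two channel descriptors, and their sum is passed through a sigmoid. The reduction ratio is an assumption, not taken from the claims.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # first spatial compression unit: max pooling
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # second spatial compression unit: average pooling
        self.mlp = nn.Sequential(                 # shared channel feature extraction unit (MLP)
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.act = nn.Sigmoid()                   # first activation function

    def forward(self, x):
        # first / second channel information from the two spatially compressed descriptors
        m = self.mlp(self.max_pool(x))
        a = self.mlp(self.avg_pool(x))
        # first addition unit + activation -> channel attention with 1x1 spatial size (claim 16)
        return self.act(m + a)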
  11. 根据权利要求3所述的方法,其特征在于,所述空间注意力模块包括:第一通道压缩单元、第二通道压缩单元和空间特征提取单元;The method according to claim 3, wherein the spatial attention module comprises: a first channel compression unit, a second channel compression unit and a spatial feature extraction unit;
    所述第一通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第一通道压缩信息;The first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
    所述第二通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第二通道压缩信息;The second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
    the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compression information and the second channel compression information of the (i-1)-th first feature information to obtain spatial feature information of the (i-1)-th first feature information;
    所述第i-1个第一特征信息的空间注意力信息是根据所述第i-1个第一特征信息的空间特征信息确定的。The spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  12. 根据权利要求11所述的方法,其特征在于,所述第一通道压缩单元和/或所述第二通道压缩单元包括池化层。The method according to claim 11, wherein the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  13. 根据权利要求11所述的方法,其特征在于,所述第一通道压缩单元为最大池化层,和/或所述第二通道压缩单元为平均池化层。The method according to claim 11, wherein the first channel compression unit is a maximum pooling layer, and/or the second channel compression unit is an average pooling layer.
  14. 根据权利要求11所述的方法,其特征在于,所述空间特征提取单元为卷积层。The method according to claim 11, wherein the spatial feature extraction unit is a convolutional layer.
  15. 根据权利要求11所述的方法,其特征在于,所述空间注意力模块还包括第二激活函数;The method according to claim 11, wherein the spatial attention module further comprises a second activation function;
    所述第二激活函数用于对所述第i-1个第一特征信息的空间特征信息进行非线性处理,得到所述第i-1个第一特征信息的空间注意力信息。The second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
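Correspondingly, the spatial attention module of claims 11-15 and 17 can be sketched as channel-wise max and mean compression followed by a single convolutional layer and a sigmoid; the 7×7 kernel size is an assumption. The last line shows how the three sketches given so far compose, with an arbitrary channel count.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # spatial feature extraction unit: one convolution over the two channel-compressed maps,
        # producing a single-channel map (feature dimension 1, claim 17)
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.act = nn.Sigmoid()                   # second activation function

    def forward(self, fused):
        # first / second channel compression: max and mean over the channel dimension
        max_map, _ = fused.max(dim=1, keepdim=True)
        avg_map = fused.mean(dim=1, keepdim=True)
        return self.act(self.conv(torch.cat([max_map, avg_map], dim=1)))

attention = ConvAttention(ChannelAttention(64), SpatialAttention())  # composing the earlier sketches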
  16. 根据权利要求3-15任一项所述的方法,其特征在于,所述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。The method according to any one of claims 3-15, wherein the spatial dimension of the channel attention information of the i-1 th first feature information is 1×1.
  17. 根据权利要求3-15任一项所述的方法,其特征在于,所述第i-1个第一特征信息的空间注意力信息的特征维度为1。The method according to any one of claims 3-15, wherein the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  18. 根据权利要求2所述的方法,其特征在于,所述动态转换模型还包括至少一个下采样单元;The method according to claim 2, wherein the dynamic conversion model further comprises at least one downsampling unit;
    所述下采样单元用于对所述编码模块输出的特征信息进行空间维度下采样。The down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  19. 根据权利要求18所述的方法,其特征在于,所述下采样单元为最大池化层。The method according to claim 18, wherein the downsampling unit is a maximum pooling layer.
  20. 根据权利要求18所述的方法,其特征在于,所述动态转换模型还包括至少一个上采样单元;The method according to claim 18, wherein the dynamic conversion model further comprises at least one upsampling unit;
    所述上采样单元用于对所述解码模块输出的特征信息进行空间维度上采样。The up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  21. 根据权利要求20所述的方法,其特征在于,所述上采样单元为双线性插值单元。The method according to claim 20, wherein the upsampling unit is a bilinear interpolation unit.
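Claims 18-21 map onto standard layers; in PyTorch terms they could look like the following, where the sampling factor of 2 is an assumption because the claims do not fix the ratio.

import torch.nn as nn

down = nn.MaxPool2d(kernel_size=2)                 # down-sampling unit: max pooling layer (claims 18-19)
up = nn.Upsample(scale_factor=2, mode="bilinear",
                 align_corners=False)              # up-sampling unit: bilinear interpolation (claims 20-21)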
  22. The method according to claim 1, characterized in that each of the N encoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N encoding modules are not all identical.
  23. The method according to claim 1, characterized in that each of the N decoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N decoding modules are not all identical.
  24. 根据权利要求1所述的方法,其特征在于,The method according to claim 1, characterized in that,
    若所述i等于N,则所述第N-i个第二特征信息是根据所述第N个编码模块输出的第N个第一特征信息确定的;或者,If the i is equal to N, the N-i th second feature information is determined according to the N th first feature information output by the N th encoding module; or,
    若所述i小于N,则所述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的;或者,If the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
    若所述i等于1,则所述第i-1个第一特征信息是根据所述重建图像确定的;或者,If the i is equal to 1, the i-1th first feature information is determined according to the reconstructed image; or,
    若所述i大于1,则所述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If the i is greater than 1, the i-1 first feature information is determined according to the first feature information output by the i-1 encoding module.
  25. The method according to claim 2, characterized in that the (N-i+1)-th decoding module is configured to perform feature extraction on the feature information obtained by concatenating the (i-1)-th third feature information and the (N-i)-th second feature information, to obtain the (N-i+1)-th second feature information of the reconstructed image.
  26. 根据权利要求2所述的方法,其特征在于,所述动态转换模型还包括第一卷积层;The method according to claim 2, wherein the dynamic conversion model further comprises a first convolutional layer;
    the first convolutional layer is configured to perform feature extraction on the reconstructed image to obtain an initial feature map of the reconstructed image, and to input the initial feature map into the first encoding module and the first convolutional attention module, respectively.
  27. 根据权利要求2所述的方法,其特征在于,所述动态转换模型还包括第二卷积层;The method according to claim 2, wherein the dynamic transformation model further comprises a second convolutional layer;
    所述第二卷积层用于对最后一个解码模块输出的所述重建图像的第二特征信息进行特征提取,输出所述重建图像的HDR图像。The second convolutional layer is used to perform feature extraction on the second feature information of the reconstructed image output by the last decoding module, and output an HDR image of the reconstructed image.
  28. 根据权利要求2所述的方法,其特征在于,所述动态转换模型在训练时的初始参数是预训练模型在预训练时得到的预训练参数。The method according to claim 2, wherein the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  29. 根据权利要求28所述的方法,其特征在于,所述动态转换模型的损失函数包括重构损失函数、感知损失函数和样式损失函数中的至少一个。The method according to claim 28, wherein the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  30. 根据权利要求29所述的方法,其特征在于,所述动态转换模型的损失函数为如下公式所示:The method according to claim 29, wherein the loss function of the dynamic conversion model is as shown in the following formula:
    Loss = L1 + λs·Lst + λp·Lp
    wherein Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the perceptual loss function, Lp is the style loss function, and λs and λp are hyperparameters.
  31. The method according to claim 30, characterized in that the reconstruction loss function of the dynamic conversion model is determined based on an error between a compressed tone-mapping value of an HDR image ground truth and a compressed tone-mapping value of an HDR image prediction value, wherein the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
  32. The method according to claim 30, characterized in that the perceptual loss function of the dynamic conversion model is determined based on an error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image prediction value in a feature map of the l-th layer of the pre-trained model, the second feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image ground truth in the feature map of the l-th layer, the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
  33. The method according to claim 30, characterized in that the style loss function of the dynamic conversion model is determined based on an error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone-mapping value of the HDR image prediction value in a Gram matrix of the l-th layer feature map of the pre-trained model, the second element value is the element value corresponding to the compressed tone-mapping value of the HDR image ground truth in the Gram matrix, the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
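The loss of claims 29-33 can be sketched as follows. The claims leave the "preset compressed tone mapping function", the pre-trained model, and the layer l unspecified, so a mu-law style compression and a generic feature extractor are assumed here purely for illustration; the weighting follows the formula of claim 30.

import math
import torch
import torch.nn.functional as F

def compress(x, mu=5000.0):
    # placeholder for the preset compressed tone-mapping function (mu-law style assumption)
    return torch.log1p(mu * x) / math.log1p(mu)

def gram(feat):
    # Gram matrix of a (B, C, H, W) feature map, used by the style loss (claim 33)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def total_loss(hdr_pred, hdr_true, feat_extractor, lambda_s=1.0, lambda_p=1.0):
    # feat_extractor stands for the layer-l features of the pre-trained model (an assumption)
    p, t = compress(hdr_pred), compress(hdr_true)
    l_rec = F.l1_loss(p, t)                       # reconstruction loss on compressed tone-mapped values
    fp, ft = feat_extractor(p), feat_extractor(t)
    l_perceptual = F.l1_loss(fp, ft)              # perceptual loss on layer-l features
    l_style = F.l1_loss(gram(fp), gram(ft))       # style loss on Gram matrices
    return l_rec + lambda_s * l_perceptual + lambda_p * l_style   # Loss = L1 + λs·Lst + λp·Lp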
  34. 一种图像处理方法,其特征在于,包括:An image processing method, characterized in that, comprising:
    获取待处理的低动态范围LDR图像;Obtain the low dynamic range LDR image to be processed;
    将所述LDR图像输入动态转换模型进行动态转换,得到所述LDR图像的高动态范围HDR图像;The LDR image is input into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the LDR image;
    wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, an output of the last encoding module of the N encoding modules is connected to an input of the first decoding module of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module; the i-th encoding module is configured to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the LDR image; the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR image to obtain the (N-i+1)-th second feature information of the LDR image; the HDR image of the LDR image is determined according to the second feature information output by the last decoding module of the N decoding modules; i is a positive integer less than or equal to N, and N is a positive integer.
  35. 根据权利要求34所述的方法,其特征在于,所述动态转换模型还包括:位于所述第i个编码模块与所述第N-i+1个解码模块的跳跃连接中的卷积注意力模块;The method according to claim 34, wherein the dynamic conversion model further comprises: convolutional attention located in the skip connection between the i-th encoding module and the N-i+1-th decoding module module;
    所述卷积注意力模块用于对所述第i-1个第一特征信息进行空间信息与通道信息提取,得到所述LDR图像的第i-1个第三特征信息;The convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the LDR image;
    the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the LDR image.
  36. 根据权利要求35所述的方法,其特征在于,所述卷积注意力模块包括通道注意力模块和空间注意力模块;The method according to claim 35, wherein the convolution attention module includes a channel attention module and a spatial attention module;
    所述通道注意力模块用于对所述第i-1个第一特征信息进行通道信息提取,得到所述第i-1个第一特征信息的通道注意力信息;The channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
    the spatial attention module is configured to perform spatial information extraction on the (i-1)-th first feature information and the channel attention information of the (i-1)-th first feature information to obtain spatial attention information of the (i-1)-th first feature information;
    所述LDR图像的第i-1个第三特征信息是根据所述第i-1个第一特征信息的通道注意力信息和空间注意力信息确定的。The i-1 th third feature information of the LDR image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  37. 根据权利要求36所述的方法,其特征在于,所述卷积注意力模块还包括第一乘法单元;The method according to claim 36, wherein the convolutional attention module further comprises a first multiplication unit;
    the first multiplication unit is configured to multiply the (i-1)-th first feature information by the channel attention information of the (i-1)-th first feature information to obtain fused channel feature information of the (i-1)-th first feature information;
    所述空间注意力模块用于对所述第i-1个第一特征信息的融合通道特征信息进行空间信息提取,得到所述第i-1个第一特征信息的空间注意力信息。The spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  38. 根据权利要求37所述的方法,其特征在于,所述卷积注意力模块还包括第二乘法单元;The method according to claim 37, wherein the convolution attention module further comprises a second multiplication unit;
    所述第二乘法单元用于对所述第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到所述LDR图像的第i-1个第三特征信息。The second multiplication unit is configured to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the LDR image.
  39. 根据权利要求36所述的方法,其特征在于,所述通道注意力模块包括:第一空间压缩单元、第二空间压缩单元和通道特征提取单元;The method according to claim 36, wherein the channel attention module comprises: a first space compression unit, a second space compression unit and a channel feature extraction unit;
    所述第一空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第一空间压缩信息;The first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
    所述第二空间压缩单元用于对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第二空间压缩信息;The second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
    the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the (i-1)-th first feature information to obtain first channel information of the (i-1)-th first feature information, and to perform channel feature extraction on the second spatial compression information of the (i-1)-th first feature information to obtain second channel information of the (i-1)-th first feature information;
    所述第i-1个第一特征信息的通道注意力信息是根据所述i-1个第一特征信息的第一通道信息和第二通道信息确定的。The channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  40. 根据权利要求39所述的方法,其特征在于,所述第一空间压缩单元和/或所述第二空间压缩单元包括池化层。The method according to claim 39, wherein the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  41. 根据权利要求39所述的方法,其特征在于,所述第一空间压缩单元为最大池化层,和/或所述第二空间压缩单元为平均池化层。The method according to claim 39, wherein the first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
  42. 根据权利要求39所述的方法,其特征在于,所述通道特征提取单元为多层感知机MLP。The method according to claim 39, wherein the channel feature extraction unit is a multi-layer perceptron (MLP).
  43. 根据权利要求39所述的方法,其特征在于,所述通道注意力模块还包括:第一加法单元和第一激活函数;The method according to claim 39, wherein the channel attention module further comprises: a first addition unit and a first activation function;
    所述第一加法单元用于对所述i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到所述i-1个第一特征信息的融合通道信息;The first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
    所述第一激活函数用于对所述i-1个第一特征信息的融合通道信息进行非线性处理,得到所述第i-1个第一特征信息的通道注意力信息。The first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
  44. 根据权利要求36所述的方法,其特征在于,所述空间注意力模块包括:第一通道压缩单元、第二通道压缩单元和空间特征提取单元;The method according to claim 36, wherein the spatial attention module comprises: a first channel compression unit, a second channel compression unit and a spatial feature extraction unit;
    所述第一通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第一通道压缩信息;The first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
    所述第二通道压缩单元用于对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第二通道压缩信息;The second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
    the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compression information and the second channel compression information of the (i-1)-th first feature information to obtain spatial feature information of the (i-1)-th first feature information;
    所述第i-1个第一特征信息的空间注意力信息是根据所述第i-1个第一特征信息的空间特征信息确定的。The spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  45. 根据权利要求44所述的方法,其特征在于,所述第一通道压缩单元和/或所述第二通道压缩单元包括池化层。The method according to claim 44, wherein the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  46. 根据权利要求44所述的方法,其特征在于,所述第一通道压缩单元为最大池化层,和/或所述第二通道压缩单元为平均池化层。The method according to claim 44, wherein the first channel compression unit is a maximum pooling layer, and/or the second channel compression unit is an average pooling layer.
  47. 根据权利要求44所述的方法,其特征在于,所述空间特征提取单元为卷积层。The method according to claim 44, wherein the spatial feature extraction unit is a convolutional layer.
  48. 根据权利要求44所述的方法,其特征在于,所述空间注意力模块还包括第二激活函数;The method according to claim 44, wherein the spatial attention module further comprises a second activation function;
    所述第二激活函数用于对所述第i-1个第一特征信息的空间特征信息进行非线性处理,得到所述第i-1个第一特征信息的空间注意力信息。The second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
  49. 根据权利要求36-48任一项所述的方法,其特征在于,所述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。The method according to any one of claims 36-48, wherein the spatial dimension of the channel attention information of the i-1 th first feature information is 1×1.
  50. 根据权利要求36-48任一项所述的方法,其特征在于,所述第i-1个第一特征信息的空间注意力信息的特征维度为1。The method according to any one of claims 36-48, characterized in that the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  51. 根据权利要求35所述的方法,其特征在于,所述动态转换模型还包括至少一个下采样单元;The method according to claim 35, wherein the dynamic conversion model further comprises at least one downsampling unit;
    所述下采样单元用于对所述编码模块输出的特征信息进行空间维度下采样。The down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  52. 根据权利要求51所述的方法,其特征在于,所述下采样单元为最大池化层。The method according to claim 51, wherein the downsampling unit is a max pooling layer.
  53. 根据权利要求51所述的方法,其特征在于,所述动态转换模型还包括至少一个上采样单元;The method according to claim 51, wherein the dynamic conversion model further comprises at least one upsampling unit;
    所述上采样单元用于对所述解码模块输出的特征信息进行空间维度上采样。The up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  54. 根据权利要求53所述的方法,其特征在于,所述上采样单元为双线性插值单元。The method according to claim 53, wherein the upsampling unit is a bilinear interpolation unit.
  55. The method according to claim 34, characterized in that each of the N encoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N encoding modules are not all identical.
  56. The method according to claim 34, characterized in that each of the N decoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N decoding modules are not all identical.
  57. 根据权利要求34所述的方法,其特征在于,The method of claim 34, wherein,
    若所述i等于N,则所述第N-i个第二特征信息是根据所述第N个编码模块输出的第N个第一特征信息确定的;或者,If the i is equal to N, the N-i th second feature information is determined according to the N th first feature information output by the N th encoding module; or,
    若所述i小于N,则所述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的;或者,If the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
    若所述i等于1,则所述第i-1个第一特征信息是根据所述LDR图像确定的;或者,If the i is equal to 1, the i-1th first feature information is determined according to the LDR image; or,
    若所述i大于1,则所述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If the i is greater than 1, the i-1 first feature information is determined according to the first feature information output by the i-1 encoding module.
  58. The method according to claim 35, characterized in that the (N-i+1)-th decoding module is configured to perform feature extraction on the feature information obtained by concatenating the (i-1)-th third feature information and the (N-i)-th second feature information, to obtain the (N-i+1)-th second feature information of the LDR image.
  59. 根据权利要求35所述的方法,其特征在于,所述动态转换模型还包括第一卷积层;The method according to claim 35, wherein the dynamic conversion model further comprises a first convolutional layer;
    the first convolutional layer is configured to perform feature extraction on the LDR image to obtain an initial feature map of the LDR image, and to input the initial feature map into the first encoding module and the first convolutional attention module, respectively.
  60. 根据权利要求35所述的方法,其特征在于,所述动态转换模型还包括第二卷积层;The method according to claim 35, wherein the dynamic conversion model further comprises a second convolutional layer;
    所述第二卷积层用于对最后一个解码模块输出的所述LDR图像的第二特征信息进行特征提取,输出所述LDR图像的HDR图像。The second convolutional layer is used to perform feature extraction on the second feature information of the LDR image output by the last decoding module, and output an HDR image of the LDR image.
  61. 根据权利要求35所述的方法,其特征在于,所述动态转换模型在训练时的初始参数是预训练模型在预训练时得到的预训练参数。The method according to claim 35, wherein the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  62. 根据权利要求61所述的方法,其特征在于,所述动态转换模型的损失函数包括重构损失函数、感知损失函数和样式损失函数中的至少一个。The method according to claim 61, wherein the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  63. 根据权利要求62所述的方法,其特征在于,所述动态转换模型的损失函数为如下公式所示:The method according to claim 62, wherein the loss function of the dynamic conversion model is as shown in the following formula:
    Loss = L1 + λs·Lst + λp·Lp
    wherein Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the perceptual loss function, Lp is the style loss function, and λs and λp are hyperparameters.
  64. The method according to claim 63, characterized in that the reconstruction loss function of the dynamic conversion model is determined based on an error between a compressed tone-mapping value of an HDR image ground truth and a compressed tone-mapping value of an HDR image prediction value, wherein the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
  65. The method according to claim 63, characterized in that the perceptual loss function of the dynamic conversion model is determined based on an error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image prediction value in a feature map of the l-th layer of the pre-trained model, the second feature value is the feature value corresponding to the compressed tone-mapping value of the HDR image ground truth in the feature map of the l-th layer, the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
  66. The method according to claim 63, characterized in that the style loss function of the dynamic conversion model is determined based on an error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone-mapping value of the HDR image prediction value in a Gram matrix of the l-th layer feature map of the pre-trained model, the second element value is the element value corresponding to the compressed tone-mapping value of the HDR image ground truth in the Gram matrix, the compressed tone-mapping value of the HDR image prediction value is determined according to a preset compressed tone-mapping function and the HDR image prediction value, and the compressed tone-mapping value of the HDR image ground truth is determined according to the compressed tone-mapping function and the HDR image ground truth.
  67. 一种模型训练方法,其特征在于,包括:A model training method, characterized in that, comprising:
    获取低动态范围LDR训练图像和所述LDR训练图像的高动态范围HDR图像真值;Obtain the true value of the high dynamic range HDR image of the low dynamic range LDR training image and the LDR training image;
    inputting the LDR training image into a dynamic conversion model, and performing feature extraction on the (i-1)-th first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image, wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, an output of the last encoding module of the N encoding modules is connected to an input of the first decoding module of the N decoding modules, the i-th encoding module is skip-connected to the (N-i+1)-th decoding module, i is a positive integer less than or equal to N, and N is a positive integer;
    performing feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image through the (N-i+1)-th decoding module, to obtain the (N-i+1)-th second feature information of the LDR training image;
    根据所述N个解码模块中最后一个解码模块输出的所述LDR训练图像的第二特征信息,确定所述LDR训练图像的HDR图像预测值;Determine the HDR image prediction value of the LDR training image according to the second characteristic information of the LDR training image output by the last decoding module in the N decoding modules;
    确定所述LDR训练图像的HDR图像预测值和所述LDR训练图像的HDR图像真值之间的损失,并根据所述损失对所述动态转换模型进行训练。Determining a loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image, and training the dynamic transformation model according to the loss.
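A minimal training-step sketch for claim 67 follows, reusing the DynamicRangeNet and total_loss sketches given earlier in this section; the optimizer, learning rate, and data format are assumptions and not part of the claims.

import torch

model = DynamicRangeNet(n=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # optimizer choice is an assumption

def train_step(ldr, hdr_true, feat_extractor):
    # ldr / hdr_true: one LDR training image and its HDR ground truth (claim 67, first step)
    optimizer.zero_grad()
    hdr_pred = model(ldr)                     # HDR image prediction value for the LDR training image
    loss = total_loss(hdr_pred, hdr_true, feat_extractor)
    loss.backward()                           # train the dynamic conversion model according to the loss
    optimizer.step()
    return loss.item()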
  68. The method according to claim 67, characterized in that the dynamic conversion model further comprises a convolutional attention module located in the skip connection between the i-th encoding module and the (N-i+1)-th decoding module, and performing, through the (N-i+1)-th decoding module, feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image to obtain the (N-i+1)-th second feature information of the LDR training image comprises:
    performing spatial information and channel information extraction on the (i-1)-th first feature information through the convolutional attention module to obtain the (i-1)-th third feature information of the LDR training image;
    通过所述第N-i+1个解码模块对所述第i-1个第三特征信息和所述第N-i个第二特征信息进行特征提取,得到所述LDR训练图像的第N-i+1个第二特征信息。The N-i+1th decoding module performs feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th of the LDR training image 1 piece of second characteristic information.
  69. The method according to claim 68, characterized in that the convolutional attention module comprises a channel attention module and a spatial attention module, and performing spatial information and channel information extraction on the (i-1)-th first feature information through the convolutional attention module to obtain the (i-1)-th third feature information of the LDR training image comprises:
    通过所述通道注意力模块对所述第i-1个第一特征信息进行通道信息提取,得到所述第i-1个第一特征信息的通道注意力信息;performing channel information extraction on the i-1 first feature information through the channel attention module, to obtain channel attention information of the i-1 first feature information;
    performing spatial information extraction on fused channel feature information of the (i-1)-th first feature information through the spatial attention module to obtain spatial attention information of the (i-1)-th first feature information, wherein the fused channel feature information of the (i-1)-th first feature information is determined according to the (i-1)-th first feature information and the channel attention information of the (i-1)-th first feature information;
    根据所述第i-1个第一特征信息的通道注意力信息和空间注意力信息,确定所述LDR训练图像的第i-1个第三特征信息。Determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
  70. 根据权利要求69所述的方法,其特征在于,所述卷积注意力模块还包括第一乘法单元,所述方法还包括:The method according to claim 69, wherein the convolution attention module also includes a first multiplication unit, and the method also includes:
    通过所述第一乘法单元对所述第i-1个第一特征信息和第i-1个第一特征信息的通道注意力信息进行相乘,得到所述第i-1个第一特征信息的融合通道特征信息。The first multiplication unit multiplies the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature information The fusion channel feature information.
  71. The method according to claim 69, characterized in that the convolutional attention module further comprises a second multiplication unit, and determining the (i-1)-th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the (i-1)-th first feature information comprises:
    通过所述第二乘法单元对所述第i-1个第一特征信息的融合通道特征信息和空间注意力信息进行相乘,得到所述LDR训练图像的第i-1个第三特征信息。The fusion channel feature information and the spatial attention information of the i-1 first feature information are multiplied by the second multiplication unit to obtain the i-1 third feature information of the LDR training image.
  72. The method according to claim 69, characterized in that the channel attention module comprises a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit, and performing channel information extraction on the (i-1)-th first feature information through the channel attention module to obtain the channel attention information of the (i-1)-th first feature information comprises:
    通过所述第一空间压缩单元对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第一空间压缩信息;performing spatial dimension compression on the i-1 th first feature information by the first spatial compression unit to obtain first spatial compression information of the i-1 th first feature information;
    通过所述第二空间压缩单元对所述第i-1个第一特征信息进行空间维度压缩,得到所述第i-1个第一特征信息的第二空间压缩信息;performing spatial dimension compression on the i-1 th first feature information by the second spatial compression unit to obtain second spatial compression information of the i-1 th first feature information;
    通过所述通道特征提取单元对所述第i-1个第一特征信息的第一空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第一通道信息;performing channel feature extraction on the first spatial compression information of the i-1 first feature information by the channel feature extraction unit, to obtain the first channel information of the i-1 first feature information;
    通过所述通道特征提取单元对所述第i-1个第一特征信息的第二空间压缩信息进行通道特征提取,得到所述i-1个第一特征信息的第二通道信息;performing channel feature extraction on the second spatial compression information of the i-1 first feature information by the channel feature extraction unit, to obtain the second channel information of the i-1 first feature information;
    根据所述i-1个第一特征信息的第一通道信息和第二通道信息,确定所述第i-1个第一特征信息的通道注意力信息。Determine the channel attention information of the i-1 first feature information according to the first channel information and the second channel information of the i-1 first feature information.
  73. 根据权利要求72所述的方法,其特征在于,所述第一空间压缩单元和/或所述第二空间压缩单元包括池化层。The method according to claim 72, wherein the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  74. 根据权利要求72所述的方法,其特征在于,所述第一空间压缩单元为最大池化层,和/或所述第二空间压缩单元为平均池化层。The method according to claim 72, wherein the first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
  75. 根据权利要求72所述的方法,其特征在于,所述通道特征提取单元为多层感知机MLP。The method according to claim 72, wherein the channel feature extraction unit is a multi-layer perceptron (MLP).
  76. The method according to claim 72, characterized in that the channel attention module further comprises a first addition unit and a first activation function, and determining the channel attention information of the (i-1)-th first feature information according to the first channel information and the second channel information of the (i-1)-th first feature information comprises:
    通过所述第一加法单元对所述i-1个第一特征信息的第一通道信息和第二通道信息进行相加,得到所述i-1个第一特征信息的融合通道信息;Adding the first channel information and the second channel information of the i-1 pieces of first feature information by the first adding unit to obtain the fusion channel information of the i-1 pieces of first feature information;
    通过所述第一激活函数对所述i-1个第一特征信息的融合通道信息进行非线性处理,得到所述第i-1个第一特征信息的通道注意力信息。Perform non-linear processing on the fused channel information of the i-1 pieces of first feature information by using the first activation function to obtain channel attention information of the i-1 th piece of first feature information.
  77. The method according to claim 69, characterized in that the spatial attention module comprises a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit, and performing spatial information extraction on the fused channel feature information of the (i-1)-th first feature information through the spatial attention module to obtain the spatial attention information of the (i-1)-th first feature information comprises:
    通过所述第一通道压缩单元对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第一通道压缩信息;performing channel dimension compression on the fusion channel feature information of the i-1 first feature information by the first channel compression unit, to obtain the first channel compression information of the i-1 first feature information;
    通过所述第二通道压缩单元对所述第i-1个第一特征信息的融合通道特征信息进行通道维度压缩,得到所述第i-1个第一特征信息的第二通道压缩信息;performing channel dimension compression on the fusion channel feature information of the i-1 first feature information by the second channel compression unit to obtain the second channel compression information of the i-1 first feature information;
    通过所述空间特征提取单元对所述第i-1个第一特征信息的第一通道压缩信息和第二通道压缩信息进行空间特征提取,得到所述第i-1个第一特征信息的空间特征信息;The space feature extraction unit performs spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information to obtain the space of the i-1 first feature information characteristic information;
    根据所述第i-1个第一特征信息的空间特征信息,确定所述第i-1个第一特征信息的空间注意力信息。The spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  78. 根据权利要求77所述的方法,其特征在于,所述第一通道压缩单元和/或所述第二通道压缩单元包括池化层。The method according to claim 77, wherein the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  79. 根据权利要求77所述的方法,其特征在于,所述第一通道压缩单元为最大池化层,和/或所述第二通道压缩单元为平均池化层。The method according to claim 77, wherein the first channel compression unit is a maximum pooling layer, and/or the second channel compression unit is an average pooling layer.
  80. 根据权利要求77所述的方法,其特征在于,所述空间特征提取单元为卷积层。The method according to claim 77, wherein the spatial feature extraction unit is a convolutional layer.
  81. The method according to claim 77, characterized in that the spatial attention module further comprises a second activation function, and determining the spatial attention information of the (i-1)-th first feature information according to the spatial feature information of the (i-1)-th first feature information comprises:
    通过所述第二激活函数对所述第i-1个第一特征信息的空间特征信息进行非线性处理,得到所述第i-1个第一特征信息的空间注意力信息。The spatial feature information of the i-1 th first feature information is nonlinearly processed by the second activation function to obtain the spatial attention information of the i-1 th first feature information.
  82. 根据权利要求69-81任一项所述的方法,其特征在于,所述第i-1个第一特征信息的通道注意力信息的空间维度为1×1。The method according to any one of claims 69-81, wherein the spatial dimension of the channel attention information of the i-1th first feature information is 1×1.
  83. 根据权利要求69-81任一项所述的方法,其特征在于,所述第i-1个第一特征信息的空间注意力信息的特征维度为1。The method according to any one of claims 69-81, wherein the feature dimension of the spatial attention information of the i-1 first feature information is 1.
  84. 根据权利要求68所述的方法,其特征在于,所述动态转换模型还包括至少一个下采样单元,所述方法还包括:The method according to claim 68, wherein the dynamic conversion model further comprises at least one downsampling unit, and the method further comprises:
    通过所述下采样单元对所述编码模块输出的特征信息进行空间维度下采样。The feature information output by the coding module is down-sampled in a spatial dimension by the down-sampling unit.
  85. 根据权利要求84所述的方法,其特征在于,所述下采样单元为最大池化层。The method according to claim 84, wherein the downsampling unit is a max pooling layer.
  86. 根据权利要求84所述的方法,其特征在于,所述动态转换模型还包括至少一个上采样单元,所述方法还包括:The method according to claim 84, wherein the dynamic conversion model further comprises at least one upsampling unit, and the method further comprises:
    通过所述上采样单元对所述解码模块输出的特征信息进行空间维度上采样。The feature information output by the decoding module is subjected to spatial dimension up-sampling by the up-sampling unit.
  87. 根据权利要求86所述的方法,其特征在于,所述上采样单元为双线性插值单元。The method according to claim 86, wherein the upsampling unit is a bilinear interpolation unit.
  88. The method according to claim 67, characterized in that each of the N encoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N encoding modules are not all identical.
  89. The method according to claim 67, characterized in that each of the N decoding modules comprises at least one convolutional block, wherein the parameters of the convolutional blocks comprised in the N decoding modules are not all identical.
  90. 根据权利要求67所述的方法,其特征在于,The method of claim 67, wherein,
    若所述i等于N,则所述第N-i个第二特征信息是根据所述第N个编码模块输出的第N个第一特征信息确定的;或者,If the i is equal to N, the N-i th second feature information is determined according to the N th first feature information output by the N th encoding module; or,
    若所述i小于N,则所述第N-i个第二特征信息是根据第N-i个解码模块输出的第N-i个第二特征信息确定的;或者,If the i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
    若所述i等于1,则所述第i-1个第一特征信息是根据所述LDR训练图像确定的;或者,If the i is equal to 1, the i-1th first feature information is determined according to the LDR training image; or,
    若所述i大于1,则所述第i-1个第一特征信息是根据第i-1个编码模块输出的第一特征信息确定的。If the i is greater than 1, the i-1 first feature information is determined according to the first feature information output by the i-1 encoding module.
  91. The method according to claim 68, characterized in that performing, through the (N-i+1)-th decoding module, feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the LDR training image comprises:
    对所述第i-1个第三特征信息和所述第N-i个第二特征信息进行级联;Concatenating the i-1th third characteristic information and the N-ith second characteristic information;
    将级联后的特征信息输入所述第N-i+1个解码模块进行特征提取,得到所述LDR训练图像的第N-i+1个第二特征信息。Inputting the concatenated feature information into the N-i+1th decoding module for feature extraction to obtain the N-i+1th second feature information of the LDR training image.
  92. 根据权利要求68所述的方法,其特征在于,所述动态转换模型还包括第一卷积层,所述方法还包括:The method according to claim 68, wherein the dynamic conversion model further comprises a first convolutional layer, and the method further comprises:
    通过所述第一卷积层对所述LDR训练图像进行特征提取,得到所述LDR训练图像的初始特征图;Carrying out feature extraction to the LDR training image through the first convolutional layer to obtain an initial feature map of the LDR training image;
    inputting the initial feature map into the first encoding module and the first convolutional attention module, respectively, to obtain the first piece of first feature information output by the first encoding module and the first piece of third feature information output by the first convolutional attention module.
  93. The method according to claim 67, wherein the dynamic conversion model further comprises a second convolution layer, and determining the HDR image prediction value of the LDR training image according to the second feature information of the LDR training image output by the last decoding module among the N decoding modules comprises:
    performing feature extraction, through the second convolution layer, on the second feature information of the LDR training image output by the last decoding module, and outputting the HDR image prediction value of the LDR training image.
  94. The method according to claim 68, wherein the method further comprises:
    obtaining pre-training parameters obtained by a pre-trained model during pre-training;
    determining the pre-training parameters as initial parameters of the dynamic conversion model.
  95. The method according to claim 94, wherein determining the loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image comprises:
    determining a target loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image according to a preset loss function.
  96. The method according to claim 95, wherein the preset loss function comprises at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  97. The method according to claim 96, wherein determining the target loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image according to the preset loss function comprises:
    determining a reconstruction loss between the HDR image prediction value and the HDR image true value;
    determining a perceptual loss between the HDR image prediction value and the HDR image true value;
    determining a style loss between the HDR image prediction value and the HDR image true value;
    determining the target loss between the HDR image prediction value and the HDR image true value according to the reconstruction loss, the perceptual loss and the style loss between the HDR image prediction value and the HDR image true value.
  98. The method according to claim 97, wherein determining the target loss between the HDR image prediction value and the HDR image true value according to the reconstruction loss, the perceptual loss and the style loss between the HDR image prediction value and the HDR image true value comprises:
    determining the target loss between the HDR image prediction value and the HDR image true value according to the following formula:
    Loss = L1 + λs·Lst + λp·Lp
    wherein Loss is the target loss, L1 is the reconstruction loss, Lst is the perceptual loss, Lp is the style loss, and λs and λp are hyperparameters.
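  Claim 98 combines the three losses linearly; a sketch of that combination is below, where the hyperparameter values are placeholders rather than values disclosed in the application.

```python
def target_loss(reconstruction_loss, perceptual_loss, style_loss,
                lambda_s: float = 1e-2, lambda_p: float = 1e-6):
    """Loss = L1 + λs·Lst + λp·Lp, with L1 the reconstruction loss,
    Lst the perceptual loss and Lp the style loss (claim 98)."""
    return reconstruction_loss + lambda_s * perceptual_loss + lambda_p * style_loss
```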
  99. The method according to claim 97, wherein determining the reconstruction loss between the HDR image prediction value and the HDR image true value comprises:
    determining a compressed tone-mapping value of the HDR image prediction value according to a preset compressed tone-mapping function;
    determining a compressed tone-mapping value of the HDR image true value according to the compressed tone-mapping function;
    determining the reconstruction loss according to an error between the compressed tone-mapping value of the HDR image true value and the compressed tone-mapping value of the HDR image prediction value.
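  Claim 99 only requires "a preset compressed tone-mapping function"; the μ-law compressor and the L1 error used in the sketch below are common choices assumed here purely for illustration.

```python
import torch

def compressed_tone_mapping(hdr, mu: float = 5000.0):
    # Assumed μ-law compressor; any preset compressed tone-mapping function could be used.
    return torch.log(1.0 + mu * hdr) / torch.log(torch.tensor(1.0 + mu))

def reconstruction_loss(hdr_pred, hdr_true):
    """Claim 99: error between the compressed tone-mapped prediction value
    and the compressed tone-mapped true value."""
    return torch.mean(torch.abs(compressed_tone_mapping(hdr_pred)
                                - compressed_tone_mapping(hdr_true)))
```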
  100. The method according to claim 97, wherein determining the perceptual loss between the HDR image prediction value and the HDR image true value comprises:
    obtaining a feature map of the l-th layer of the pre-trained model;
    determining a compressed tone-mapping value of the HDR image prediction value according to a preset compressed tone-mapping function;
    determining a compressed tone-mapping value of the HDR image true value according to the compressed tone-mapping function;
    determining a first feature value, in the feature map of the l-th layer, corresponding to the compressed tone-mapping value of the HDR image prediction value;
    determining a second feature value, in the feature map of the l-th layer, corresponding to the compressed tone-mapping value of the HDR image true value;
    determining the perceptual loss according to an error between the first feature value and the second feature value.
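  A sketch of claim 100, reusing compressed_tone_mapping from the claim-99 sketch above; the feature_extractor argument stands in for the l-th layer of the pre-trained model (for example a truncated VGG network), which is an assumption of this example, as is the L1 error.

```python
import torch

def perceptual_loss(hdr_pred, hdr_true, feature_extractor):
    """Compare the l-th layer feature maps of the pre-trained model for the
    tone-mapped prediction value and the tone-mapped true value."""
    first_feature_value = feature_extractor(compressed_tone_mapping(hdr_pred))
    second_feature_value = feature_extractor(compressed_tone_mapping(hdr_true))
    return torch.mean(torch.abs(first_feature_value - second_feature_value))
```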
  101. The method according to claim 97, wherein determining the style loss between the HDR image prediction value and the HDR image true value comprises:
    obtaining a Gram matrix of the l-th layer feature map of the pre-trained model;
    determining a compressed tone-mapping value of the HDR image prediction value according to a preset compressed tone-mapping function;
    determining a compressed tone-mapping value of the HDR image true value according to the compressed tone-mapping function;
    determining a first element value, in the Gram matrix, corresponding to the compressed tone-mapping value of the HDR image prediction value;
    determining a second element value, in the Gram matrix, corresponding to the compressed tone-mapping value of the HDR image true value;
    determining the style loss according to an error between the first element value and the second element value.
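  A sketch of claim 101, again reusing compressed_tone_mapping and a feature_extractor standing in for the l-th layer of the pre-trained model; the Gram-matrix normalization and the L1 error are assumptions of this example.

```python
import torch

def gram_matrix(feature_map):
    """Gram matrix of a (batch, channels, height, width) feature map."""
    b, c, h, w = feature_map.shape
    flat = feature_map.reshape(b, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)

def style_loss(hdr_pred, hdr_true, feature_extractor):
    """Compare Gram-matrix element values for the tone-mapped prediction
    value and the tone-mapped true value (claim 101)."""
    first_element_values = gram_matrix(feature_extractor(compressed_tone_mapping(hdr_pred)))
    second_element_values = gram_matrix(feature_extractor(compressed_tone_mapping(hdr_true)))
    return torch.mean(torch.abs(first_element_values - second_element_values))
```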
  102. An image decoding apparatus, comprising:
    a decoding unit, configured to decode a bitstream to obtain a reconstructed image;
    a processing unit, configured to input the reconstructed image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range (HDR) image of the reconstructed image;
    wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1-th decoding module; the i-th encoding module is configured to perform feature extraction on the i-1-th first feature information output by the i-1-th encoding module to obtain the i-th first feature information of the reconstructed image, and the N-i+1-th decoding module is configured to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the reconstructed image to obtain the N-i+1-th second feature information of the reconstructed image; the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module among the N decoding modules, i is a positive integer less than or equal to N, and N is a positive integer.
  103. An image processing apparatus, comprising:
    an obtaining unit, configured to obtain a low dynamic range (LDR) image to be processed;
    a processing unit, configured to input the LDR image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range (HDR) image of the LDR image;
    wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1-th decoding module; the i-th encoding module is configured to perform feature extraction on the i-1-th first feature information output by the i-1-th encoding module to obtain the i-th first feature information of the LDR image, and the N-i+1-th decoding module is configured to perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the LDR image to obtain the N-i+1-th second feature information of the LDR image; the HDR image of the LDR image is determined according to the second feature information output by the last decoding module among the N decoding modules, i is a positive integer less than or equal to N, and N is a positive integer.
  104. A model training apparatus, comprising:
    an obtaining unit, configured to obtain a low dynamic range (LDR) training image and a high dynamic range (HDR) image true value of the LDR training image;
    a processing unit, configured to: input the LDR training image into a dynamic conversion model, and perform feature extraction on the i-1-th first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image, wherein the dynamic conversion model comprises N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module among the N encoding modules is connected to the input of the first decoding module among the N decoding modules, the i-th encoding module is skip-connected to the N-i+1-th decoding module, i is a positive integer less than or equal to N, and N is a positive integer; perform feature extraction on the i-1-th first feature information and the N-i-th second feature information of the LDR training image through the N-i+1-th decoding module to obtain the N-i+1-th second feature information of the LDR training image; determine the HDR image prediction value of the LDR training image according to the second feature information of the LDR training image output by the last decoding module among the N decoding modules; and determine a loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image, and train the dynamic conversion model according to the loss.
  105. A decoder, comprising a processor and a memory, wherein:
    the memory is configured to store a computer program; and
    the processor is configured to call and run the computer program stored in the memory to perform the method according to any one of claims 1 to 33.
  106. An electronic device, comprising a processor and a memory, wherein:
    the memory is configured to store a computer program; and
    the processor is configured to call and run the computer program stored in the memory to perform the method according to any one of claims 34 to 66 or 67 to 101.
  107. A computer-readable storage medium, configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 33, 34 to 66, or 67 to 101.
PCT/CN2021/102173 2021-06-24 2021-06-24 Image decoding method and apparatus, image processing method and apparatus, and device WO2022266955A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180097934.XA CN117441186A (en) 2021-06-24 2021-06-24 Image decoding and processing method, device and equipment
PCT/CN2021/102173 WO2022266955A1 (en) 2021-06-24 2021-06-24 Image decoding method and apparatus, image processing method and apparatus, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102173 WO2022266955A1 (en) 2021-06-24 2021-06-24 Image decoding method and apparatus, image processing method and apparatus, and device

Publications (1)

Publication Number Publication Date
WO2022266955A1 true WO2022266955A1 (en) 2022-12-29

Family

ID=84543976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102173 WO2022266955A1 (en) 2021-06-24 2021-06-24 Image decoding method and apparatus, image processing method and apparatus, and device

Country Status (2)

Country Link
CN (1) CN117441186A (en)
WO (1) WO2022266955A1 (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074600A1 (en) * 2017-11-28 2020-03-05 Adobe Inc. High dynamic range illumination estimation
CN108805836A (en) * 2018-05-31 2018-11-13 大连理工大学 Method for correcting image based on the reciprocating HDR transformation of depth
CN109447907A (en) * 2018-09-20 2019-03-08 宁波大学 A kind of single image Enhancement Method based on full convolutional neural networks
CN109785263A (en) * 2019-01-14 2019-05-21 北京大学深圳研究生院 A kind of inverse tone mapping (ITM) image conversion method based on Retinex
US20200265567A1 (en) * 2019-02-18 2020-08-20 Samsung Electronics Co., Ltd. Techniques for convolutional neural network-based multi-exposure fusion of multiple image frames and for deblurring multiple image frames
CN111951171A (en) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 HDR image generation method and device, readable storage medium and terminal equipment
CN110717868A (en) * 2019-09-06 2020-01-21 上海交通大学 Video high dynamic range inverse tone mapping model construction and mapping method and device
CN111709900A (en) * 2019-10-21 2020-09-25 上海大学 High dynamic range image reconstruction method based on global feature guidance
CN111292264A (en) * 2020-01-21 2020-06-16 武汉大学 Image high dynamic range reconstruction method based on deep learning
CN111372006A (en) * 2020-03-03 2020-07-03 山东大学 High dynamic range imaging method and system for mobile terminal
CN111914938A (en) * 2020-08-06 2020-11-10 上海金桥信息股份有限公司 Image attribute classification and identification method based on full convolution two-branch network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KINOSHITA YUMA; KIYA HITOSHI: "Convolutional Neural Networks Considering Local and Global Features for Image Enhancement", 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), IEEE, 22 September 2019 (2019-09-22), pages 2110 - 2114, XP033647118, DOI: 10.1109/ICIP.2019.8803194 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115776571A (en) * 2023-02-10 2023-03-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115776571B (en) * 2023-02-10 2023-04-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN117441186A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
JP6182644B2 (en) Layer decomposition in hierarchical VDR coding
JP7239711B2 (en) Chroma block prediction method and apparatus
US20230069953A1 (en) Learned downsampling based cnn filter for image and video coding using learned downsampling feature
CN111800629A (en) Video decoding method, video encoding method, video decoder and video encoder
US20230076920A1 (en) Global skip connection based convolutional neural network (cnn) filter for image and video coding
JP7277586B2 (en) Method and apparatus for mode and size dependent block level limiting
US20230362378A1 (en) Video coding method and apparatus
US11070808B2 (en) Spatially adaptive quantization-aware deblocking filter
WO2023279961A1 (en) Video image encoding method and apparatus, and video image decoding method and apparatus
WO2022266955A1 (en) Image decoding method and apparatus, image processing method and apparatus, and device
Lauga et al. Segmentation-based optimized tone mapping for high dynamic range image and video coding
US20230209096A1 (en) Loop filtering method and apparatus
WO2022179509A1 (en) Audio/video or image layered compression method and apparatus
KR20230129068A (en) Scalable encoding and decoding method and apparatus
WO2023000182A1 (en) Image encoding, decoding and processing methods, image decoding apparatus, and device
WO2023184088A1 (en) Image processing method and apparatus, device, system, and storage medium
TWI834087B (en) Method and apparatus for reconstruct image from bitstreams and encoding image into bitstreams, and computer program product
WO2022194137A1 (en) Video image encoding method, video image decoding method and related devices
WO2023206420A1 (en) Video encoding and decoding method and apparatus, device, system and storage medium
EP4226325A1 (en) A method and apparatus for encoding or decoding a picture using a neural network
TW202228081A (en) Method and apparatus for reconstruct image from bitstreams and encoding image into bitstreams, and computer program product
CN117939157A (en) Image processing method, device and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21946451

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE