WO2022266955A1 - Image decoding method and apparatus, image processing method and apparatus, and device - Google Patents

Image decoding method and apparatus, image processing method and apparatus, and device

Info

Publication number
WO2022266955A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature information
information
channel
feature
image
Application number
PCT/CN2021/102173
Other languages
English (en)
Chinese (zh)
Inventor
元辉
姜世奇
杨烨
李明
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to CN202180097934.XA (CN117441186A)
Priority to PCT/CN2021/102173
Publication of WO2022266955A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the technical field of image processing, and in particular to an image decoding and processing method, device and equipment.
  • Dynamic range is a term used to define how wide a range of tonal detail a camera can capture in an image, usually the range from the lowest value to the highest overflow value. Simply put, it describes the ratio between the brightest and darkest tones a camera can record in a single frame. The larger the dynamic range, the more likely it is to preserve information in highlights and shadows.
  • Embodiments of the present application provide an image decoding and processing method, device, and equipment to reduce the cost of converting a low dynamic range image into a high dynamic range image.
  • the embodiment of the present application provides an image decoding method, including:
  • the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series. The output of the last of the N encoding modules is connected to the input of the first of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module. The i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image. The HDR image of the reconstructed image is determined according to the second feature information output by the last of the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the present application provides an image processing method, including:
  • the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series. The output of the last of the N encoding modules is connected to the input of the first of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module. The i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the LDR image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR image to obtain the (N-i+1)-th second feature information of the LDR image. The HDR image of the LDR image is determined according to the second feature information output by the last of the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the present application provides a model training method, including:
  • feature extraction is performed on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image to obtain the (N-i+1)-th second feature information of the LDR training image;
  • an image decoding device configured to execute the method in the above first aspect or its implementations.
  • the image decoding device includes a functional unit configured to execute the method in the above first aspect or each implementation manner thereof.
  • a decoder including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
  • an image processing device configured to execute the method in the above-mentioned second aspect or various implementations thereof.
  • the device includes a functional unit configured to execute the method in the above second aspect or each implementation manner thereof.
  • an image processing device including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
  • a model training device configured to execute the method in the above third aspect or various implementations thereof.
  • the model training device includes a functional unit for executing the method in the above third aspect or its various implementations.
  • a model training device including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above third aspect or its various implementations.
  • a chip configured to implement any one of the foregoing first to third aspects or the method in each implementation manner thereof.
  • the chip includes: a processor, configured to call and run a computer program from the memory, so that a device installed with the chip executes the method in any one of the above-mentioned first to third aspects or any of the implementations thereof.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to third aspects or the method in each implementation manner thereof.
  • a twelfth aspect provides a computer program product, including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to third aspects or the method in each implementation manner.
  • a thirteenth aspect provides a computer program, which, when running on a computer, causes the computer to execute any one of the above first to third aspects or the method in each implementation manner.
  • the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series. The output of the last of the N encoding modules is connected to the input of the first of the N decoding modules, and the i-th encoding module is skip-connected to the (N-i+1)-th decoding module. The i-th encoding module is used to perform feature extraction on the (i-1)-th first feature information output by the (i-1)-th encoding module to obtain the i-th first feature information of the reconstructed image, and the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the reconstructed image to obtain the (N-i+1)-th second feature information of the reconstructed image. The HDR image of the reconstructed image is determined according to the second feature information output by the last of the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • In this way, LDR images can be converted into HDR images, and HDR image conversion can be realized without increasing the cost of data acquisition, encoding, transmission, storage, etc., thereby improving the efficiency of HDR image conversion and reducing the cost of obtaining HDR images.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
  • Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • FIG. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application.
  • FIG. 5A is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 5B is a schematic network diagram of a convolution block involved in an embodiment of the present application.
  • FIG. 5C is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 5D is a network diagram of a convolutional attention module involved in an embodiment of the present application.
  • FIG. 5E is a network diagram of a channel attention module involved in an embodiment of the present application.
  • FIG. 5F is a network schematic diagram of a spatial attention module involved in an embodiment of the present application.
  • FIG. 5G is a network schematic diagram of a dynamic conversion model involved in an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application.
  • FIG. 7 is a network diagram of a spatial attention module involved in an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an image decoding device provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the present application can be applied to the technical field of point cloud upsampling, for example, to the technical field of point cloud compression.
  • the application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc.
  • the solution of the present application can be combined with audio and video coding standards (audio video coding standard, AVS for short), for example, the H.264/advanced video coding (AVC for short) standard, the H.265/high efficiency video coding (HEVC for short) standard and the H.266/versatile video coding (VVC for short) standard.
  • the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
  • the video codec system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function
  • the decoding device 120 can be understood as a device having a video decoding function; that is, the embodiment of the present application covers a wide range of devices for the encoding device 110 and the decoding device 120, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
  • encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
  • the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
  • the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
  • the decoding device 120 may acquire encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • the storage server, such as a web server (e.g., for a website) or a file transfer protocol (FTP) server, may store the encoded video data and may transmit the encoded video data to the decoding device 120.
  • the encoding device 110 includes a video encoder 112 and an output interface 113 .
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • the video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, or a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more pictures or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoding information may include encoded image data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • An SPS may contain parameters that apply to one or more sequences.
  • a PPS may contain parameters applied to one or more images.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
  • the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
  • the decoding device 120 includes an input interface 121 and a video decoder 122 .
  • the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
  • the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
  • the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding.
  • Fig. 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chrominance, Cr (V) denotes red chrominance, and U and V are collectively referred to as chrominance (Chroma), which describes color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr);
  • 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr);
  • 4:4:4 means full-pixel sampling (YYYYCbCrCbCrCbCrCbCr).
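  • As an illustration of the sampling formats above, the following minimal Python sketch counts the luma and chroma samples in one frame for each format; the 1920×1080 resolution and the helper function name are assumptions used only for this example.

```python
# Minimal sketch: sample counts per frame for the YUV sampling formats above.
# The 1920x1080 resolution and the helper name are illustrative assumptions.

def yuv_sample_counts(width: int, height: int, fmt: str) -> dict:
    """Return the number of Y, Cb and Cr samples in one frame."""
    luma = width * height
    # Horizontal / vertical chroma subsampling factors for each format.
    factors = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
    h, v = factors[fmt]
    chroma = (width // h) * (height // v)  # samples per chroma plane (Cb or Cr)
    return {"Y": luma, "Cb": chroma, "Cr": chroma}

for fmt in ("4:2:0", "4:2:2", "4:4:4"):
    print(fmt, yuv_sample_counts(1920, 1080, fmt))
```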
  • the video encoder 200 reads video data and, for each frame of image in the video data, divides the frame into several coding tree units (coding tree unit, CTU); a CTU may also be called a "largest coding unit" (Largest Coding Unit, LCU for short) or a "coding tree block" (coding tree block, CTB for short).
  • Each CTU may be associated with a pixel block of equal size within the image.
  • Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
  • each CTU may be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128×128, 64×64, 32×32, and so on.
  • a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
  • the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that coding, prediction, and transformation are separated, and processing is more flexible.
  • a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
  • the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
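  • As a concrete illustration of the PU sizes listed above, the following Python sketch enumerates the candidate PU partitions of a 2N×2N CU; the helper name and the (width, height) representation are assumptions of this sketch, not part of any standard's API.

```python
# Minimal sketch: candidate PU partitions of a 2Nx2N CU, as listed above.
# Sizes are (width, height) pairs; the helper name is hypothetical.

def candidate_pu_partitions(cu: int, mode: str) -> dict:
    """cu: CU width/height (2N); mode: 'intra' or 'inter'."""
    n = cu // 2
    if mode == "intra":
        return {"2Nx2N": [(cu, cu)], "NxN": [(n, n)] * 4}
    return {
        "2Nx2N": [(cu, cu)],
        "2NxN": [(cu, n)] * 2,
        "Nx2N": [(n, cu)] * 2,
        "NxN": [(n, n)] * 4,
        # Asymmetric partitions: one quarter / three quarters of the CU.
        "2NxnU": [(cu, cu // 4), (cu, 3 * cu // 4)],
        "2NxnD": [(cu, 3 * cu // 4), (cu, cu // 4)],
        "nLx2N": [(cu // 4, cu), (3 * cu // 4, cu)],
        "nRx2N": [(3 * cu // 4, cu), (cu // 4, cu)],
    }

print(candidate_pu_partitions(64, "inter")["2NxnU"])  # [(64, 16), (64, 48)]
```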
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be called a current coding unit (CU) or a current prediction unit (PU).
  • a predicted block may also be referred to as a predicted block to be encoded or an image predicted block, and a reconstructed block to be encoded may also be referred to as a reconstructed block or an image reconstructed block to be encoded.
  • the prediction unit 210 includes an inter prediction unit 211 and an intra prediction unit 212 . Because there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Due to the strong similarity between adjacent frames in video, the inter-frame prediction method is used in video coding and decoding technology to eliminate time redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • the inter-frame prediction can refer to image information of different frames.
  • the inter-frame prediction uses motion information to find a reference block from the reference frame, and generates a prediction block according to the reference block to eliminate temporal redundancy;
  • Frames used for inter-frame prediction may be P frames and/or B frames, P frames refer to forward predictive frames, and B frames refer to bidirectional predictive frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can be an integer pixel or a sub-pixel. If the motion vector is sub-pixel, then it is necessary to use interpolation filtering in the reference frame to make the required sub-pixel block.
  • the block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called a reference block.
  • Some technologies will directly use the reference block as a prediction block, and some technologies will further process the reference block to generate a prediction block. Reprocessing and generating a prediction block based on a reference block can also be understood as taking the reference block as a prediction block and then processing and generating a new prediction block based on the prediction block.
  • inter-frame prediction methods include: geometric partitioning mode (GPM) in the VVC video codec standard, and angular weighted prediction (AWP) in the AVS3 video codec standard. These two inter-frame prediction modes have something in common in principle.
  • the intra-frame prediction unit 212 refers only to information within the same frame and predicts the pixel information in the current block to be encoded, so as to eliminate spatial redundancy.
  • a frame used for intra prediction may be an I frame.
  • the intra prediction method further includes a multiple reference line intra prediction method (multiple reference line, MRL).
  • MRL can use more reference pixels to improve coding efficiency.
  • mode 0 is to copy the pixels above the current block to the current block in the vertical direction as the prediction value
  • mode 1 is to copy the reference pixels on the left to the current block in the horizontal direction as the prediction value
  • mode 2 (DC) uses the average value of the 8 reference pixels A to D and I to L as the prediction value of all points
  • modes 3 to 8 are to copy the reference pixel to the corresponding position of the current block according to a certain angle. Because some positions of the current block cannot exactly correspond to the reference pixels, it may be necessary to use the weighted average of the reference pixels, or the sub-pixels of the interpolated reference pixels.
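  • The three basic modes described above (vertical, horizontal and DC) can be sketched in a few lines of NumPy; the 4×4 block size and the reference pixel values below are assumptions used only for illustration.

```python
# Minimal sketch of intra prediction modes 0-2 described above, for a 4x4 block
# with reference pixels A-D (the row above) and I-L (the column to the left).
# The reference values are hypothetical.
import numpy as np

above = np.array([10, 12, 14, 16])   # A, B, C, D
left = np.array([11, 13, 15, 17])    # I, J, K, L

pred_vertical = np.tile(above, (4, 1))            # mode 0: copy the pixels above downwards
pred_horizontal = np.tile(left[:, None], (1, 4))  # mode 1: copy the left pixels rightwards
pred_dc = np.full((4, 4), (above.sum() + left.sum() + 4) // 8)  # mode 2: mean of the 8 references

print(pred_vertical[0], pred_horizontal[:, 0], pred_dc[0, 0])
```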
  • the intra prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, with a total of 67 prediction modes.
  • the intra-frame modes used by AVS3 include DC, Plane, Bilinear and 63 angle modes, a total of 66 prediction modes.
  • with more prediction modes, intra-frame prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
  • the residual unit 220 may generate a residual block of the CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate a residual block for a CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
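  • The following Python sketch illustrates the quantization/inverse-quantization idea described above; the QP-to-step-size mapping is a simplified assumption used for illustration and is not the exact formula of any particular standard.

```python
# Simplified sketch of quantizing and dequantizing transform coefficients.
# The QP-to-step-size mapping is an illustrative assumption only.
import numpy as np

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    step = 2.0 ** (qp / 6)                 # larger QP -> coarser quantization
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    step = 2.0 ** (qp / 6)
    return levels * step                   # reconstructed (lossy) coefficients

coeffs = np.array([[52.0, -3.5], [7.2, 0.4]])
levels = quantize(coeffs, qp=24)
print(levels, dequantize(levels, qp=24))
```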
  • the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed block to be encoded associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
  • Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • the loop filtering unit 260 includes a deblocking filtering unit, a sample adaptive offset (SAO) unit, and an adaptive loop filtering (ALF) unit.
  • the decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
  • intra prediction unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • the basic flow of video coding involved in this application is as follows: at the coding end, the current image is divided into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, a difference between the predicted block and the original block of the current block, and the residual block may also be referred to as residual information.
  • the residual block can be transformed and quantized by the transform/quantization unit 230 to remove information to which the human eye is not sensitive, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block.
  • the entropy encoding unit 280 receives the quantized transform coefficients output by the transform and quantization unit 230 , may perform entropy encoding on the quantized transform coefficients, and output a code stream.
  • the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the video encoder performs inverse quantization and inverse transformation on the quantized transform coefficients output by the transform and quantization unit 230 to obtain a residual block of the current block, and then adds the residual block of the current block to the prediction block of the current block to obtain the reconstructed block of the current block.
  • reconstructed blocks corresponding to other blocks to be encoded in the current image can be obtained, and these reconstructed blocks are spliced to obtain a reconstructed image of the current image.
  • the reconstructed image is then filtered, for example with ALF, to reduce the difference between the pixel values of the pixels in the reconstructed image and the original pixel values of the pixels in the current image.
  • the filtered reconstructed image is stored in the decoded image buffer 270, which may serve as a reference frame for inter-frame prediction for subsequent frames.
  • the block division information determined by the encoder as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, and loop filtering, etc., are carried in the code stream when necessary.
  • the decoding end parses the code stream and analyzes the available information to determine the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering and other mode or parameter information as the encoding end, so as to ensure that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • Fig. 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
  • the video decoder 300 can receive code streams.
  • the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
  • the prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • the prediction unit 320 includes an intra prediction unit 321 and an inter prediction unit 322 .
  • Intra prediction unit 321 may perform intra prediction to generate a predictive block for a PU. Intra prediction unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. Intra prediction unit 321 may also determine an intra prediction mode for a PU from one or more syntax elements parsed from a codestream.
  • the inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
  • Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU, and obtain the reconstructed block to be encoded.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • the loop filtering unit 350 includes a deblocking filtering unit, a sample adaptive offset (SAO) unit, and an adaptive loop filtering (ALF) unit.
  • Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
  • the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information, the quantization coefficient matrix, etc. of the current block, and the prediction unit 320 uses intra prediction or inter prediction to generate the prediction block of the current block based on the prediction information.
  • the inverse quantization/transform unit 330 performs inverse quantization and inverse transformation on the quantization coefficient matrix obtained from the code stream to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image
  • the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
  • the decoded image can also be referred to as a reconstructed image.
  • the reconstructed image can be displayed by a display device, and on the other hand, it can be stored in the decoded image buffer 360 and serve as a reference frame for inter-frame prediction for subsequent frames.
  • the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
  • High dynamic range is abbreviated as HDR, and low dynamic range is abbreviated as LDR.
  • An embodiment of the present application provides a model-based image processing method, which converts an LDR image into an HDR image through a model. That is, the encoding end encodes the LDR image to form a code stream and transmits it to the decoding end. After decoding the LDR image, the decoding end uses the model of the embodiment of the present application to dynamically convert the decoded LDR image to obtain an HDR image. HDR image conversion is achieved while reducing the cost of encoding, transmission, and storage.
  • the image processing method provided in the present application converts an LDR image into an HDR image by using a dynamic conversion model, and the dynamic conversion model is a piece of software code or a chip with data processing functions. Based on this, the training process of the dynamic conversion model is firstly introduced.
  • Fig. 4 is a schematic flow chart of a dynamic conversion model training method provided by an embodiment of the present application. As shown in Fig. 4, the training process includes:
  • the above-mentioned LDR training image is a randomly selected LDR training image in the training set, which includes a plurality of LDR training images
  • the training process of the dynamic conversion model using the LDR training images in the training set is an iterative process.
  • the first LDR training image is input into the dynamic conversion model to be trained, and the initial parameters of the dynamic conversion model are adjusted once to obtain the dynamic conversion model trained for the first time.
  • Then, the second LDR training image is input into the dynamic conversion model trained for the first time, and the parameters of the model trained for the first time are adjusted to obtain the dynamic conversion model trained for the second time. Following the above method, the process iterates in sequence until the training end condition of the dynamic conversion model is reached.
  • the training end condition of the dynamic conversion model includes that the number of training times reaches a preset number of times, or the loss reaches a preset loss.
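  • A minimal PyTorch-style sketch of this iterative training procedure is given below; the random training pairs, the placeholder model, the L1 loss and the stop thresholds are all assumptions for illustration rather than the exact choices of this application (a sketch of the dynamic conversion model itself is given later, alongside the description of FIG. 5A).

```python
# Minimal sketch of the iterative training procedure described above.
# The tiny random dataset, the placeholder model, the L1 loss and the stop
# thresholds are illustrative assumptions only.
import torch

training_set = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)) for _ in range(8)]
model = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the dynamic conversion model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
max_iters, target_loss = 100, 1e-3            # preset number of iterations / preset loss

for step, (ldr_image, hdr_ground_truth) in enumerate(training_set):
    hdr_pred = model(ldr_image)               # dynamic conversion: LDR -> HDR
    loss = torch.nn.functional.l1_loss(hdr_pred, hdr_ground_truth)
    optimizer.zero_grad()
    loss.backward()                           # adjust the model parameters once per image
    optimizer.step()
    if step + 1 >= max_iters or loss.item() <= target_loss:   # training end condition
        break
```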
  • the methods for determining the initial parameters of the above-mentioned dynamic conversion model include but are not limited to the following:
  • the initial parameters of the dynamic conversion model may be preset values, or random values, or empirical values.
  • the second way is to obtain the pre-training parameters obtained during the pre-training of the pre-training model, and determine the pre-training parameters as the initial parameters of the dynamic conversion model.
  • The second way, determining the pre-training parameters of the pre-trained model as the initial parameters of the dynamic conversion model, can reduce the number of training iterations of the dynamic conversion model and improve its training accuracy.
  • the pre-training model is the VGG-16 network model.
  • the true value of the HDR image of the above-mentioned LDR training image may be an HDR image generated by manually performing dynamic conversion on the LDR training image.
  • the true value of the HDR image of the above-mentioned LDR training image may be an HDR image obtained by converting the LDR training image using an existing high dynamic conversion method.
  • the collected HDR image may be converted into an LDR image, the converted LDR image may be used as an LDR training image, and the collected HDR image may be used as a true value of the HDR image of the LDR training image.
  • the embodiment of the present application does not limit the way of acquiring the LDR training image and the HDR image true value of the LDR training image.
  • the network structure of the dynamic conversion model involved in the embodiment of the present application will be introduced below in conjunction with FIG. 5A. It should be noted that the network structure of the dynamic conversion model in the embodiment of the present application includes, but is not limited to, the modules shown in FIG. 5A, and may include more or fewer modules than those shown in FIG. 5A.
  • FIG. 5A is a schematic network diagram of a dynamic conversion model according to an embodiment of the present application.
  • the dynamic conversion model can be understood as an autoencoder network composed of N-level encoding components and decoding components.
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and
  • the i-th encoding module is connected to the N-i+1-th decoding module by skip connection.
  • the skip connection can be understood as the connection between the input end of the i-th encoding module and the input end of the N-i+1-th decoding module.
  • the i-th encoding module is used to perform feature extraction on the i-1-th first feature information to obtain the i-th first feature information of the LDR training image
  • the (N-i+1)-th decoding module is used to perform feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image to obtain the (N-i+1)-th second feature information of the LDR training image, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the above N-i th second feature information is determined according to the N th first feature information output by the N th encoding module.
  • the above N-i th second feature information is determined according to the N-i th second feature information output by the N-i th decoding module.
  • the i-1th first feature information is determined according to the LDR training image.
  • the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
  • the encoding component includes 4 serial encoding modules
  • the decoding component includes 4 serial decoding modules
  • the output of the last encoding module is connected to the input of the first decoding module.
  • the first encoding module is connected to the fourth decoding module by a skip connection,
  • the second encoding module is connected to the third decoding module by a skip connection,
  • the third encoding module is connected to the second decoding module by a skip connection, and
  • the fourth encoding module is connected to the first decoding module by a skip connection.
  • The LDR training image is input into the dynamic conversion model to obtain the 0th first feature information. The 0th first feature information can be the LDR training image itself, or a feature map obtained after the LDR training image is processed, which is not limited in the embodiment of the present application.
  • The 0th first feature information is input into the first encoding module and the fourth decoding module respectively. The first encoding module outputs the first first feature information according to the 0th first feature information, and the first first feature information is input into the second encoding module and the third decoding module respectively.
  • the second encoding module obtains the second first feature information according to the first first feature information, and inputs the second first feature information into the third encoding module and the second decoding module respectively.
  • the third encoding module obtains the third first characteristic information according to the second first characteristic information, and inputs the third first characteristic information into the fourth encoding module and the first decoding module respectively.
  • the fourth encoding module outputs the fourth first characteristic information according to the third first characteristic information, and inputs the fourth first characteristic information into the first decoding module.
  • the first decoding module obtains the first second characteristic information according to the fourth first characteristic information and the third first characteristic information, and inputs the first second characteristic information into the second decoding module.
  • the second decoding module obtains the second second characteristic information according to the first second characteristic information and the second first characteristic information, and inputs the second second characteristic information into the third decoding module.
  • the third decoding module obtains the third second characteristic information according to the second second characteristic information and the first first characteristic information, and inputs the third second characteristic information into the fourth decoding module.
  • the fourth decoding module obtains the fourth second characteristic information according to the 0th first characteristic information and the third second characteristic information.
  • the above S403 includes: concatenating the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image ("C" in FIG. 5A indicates concatenation), inputting the concatenated feature information into the (N-i+1)-th decoding module for feature extraction, and obtaining the (N-i+1)-th second feature information of the LDR training image.
  • the fourth first feature information and the third first feature information are concatenated, and the concatenated fourth first feature information and third first feature information are input into the first decoding module to obtain the first second feature information output by the first decoding module.
  • the second second feature information and the first first feature information are concatenated, and the concatenated second second feature information and first first feature information are input into the third decoding module to obtain the third second feature information output by the third decoding module.
  • the 0th first characteristic information and the third second characteristic information are concatenated, and the concatenated 0th first characteristic information and the third second characteristic information are input into the fourth decoding module, The fourth second feature information output by the fourth decoding module is obtained.
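  • A minimal PyTorch sketch of the N=4 data flow just described is given below, with the skip connections realized as channel-wise concatenation (the "C" in FIG. 5A). The simple convolution block, the channel widths, the 3-channel input and the final 1×1 output layer are assumptions of this sketch; CBAM modules and downsampling units, discussed later, are omitted here.

```python
# Minimal sketch of the four-level encoder/decoder data flow described above,
# with skip connections implemented as channel-wise concatenation.
# Channel widths, the simple conv block and the output head are assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Placeholder convolution block; a fuller sketch follows the FIG. 5B description.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.SiLU())

class DynamicConversionModel(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.enc4 = conv_block(256, 512)
        # Each decoder consumes the previous "second feature information"
        # concatenated with the skip-connected "first feature information".
        self.dec1 = conv_block(512 + 256, 256)
        self.dec2 = conv_block(256 + 128, 128)
        self.dec3 = conv_block(128 + 64, 64)
        self.dec4 = conv_block(64 + in_ch, 32)
        self.head = nn.Conv2d(32, 3, 1)   # maps the last second feature info to the HDR image

    def forward(self, f0):                # f0: the 0th first feature information
        f1 = self.enc1(f0)
        f2 = self.enc2(f1)
        f3 = self.enc3(f2)
        f4 = self.enc4(f3)
        s1 = self.dec1(torch.cat([f4, f3], dim=1))   # 1st second feature information
        s2 = self.dec2(torch.cat([s1, f2], dim=1))   # 2nd second feature information
        s3 = self.dec3(torch.cat([s2, f1], dim=1))   # 3rd second feature information
        s4 = self.dec4(torch.cat([f0, s3], dim=1))   # 4th second feature information
        return self.head(s4)                          # HDR image of the LDR input

hdr = DynamicConversionModel()(torch.rand(1, 3, 64, 64))
```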
  • the embodiment of the present application does not limit the specific network structure of the encoding module.
  • each of the N coding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • the feature dimension of the convolution block included in the first encoding module is 64
  • the feature dimension of the convolution block included in the second encoding module is 128, and the feature dimension of the convolution block included in the third encoding module is 256
  • the feature dimension of the convolutional block included in the fourth encoding module is 512 and so on.
  • the embodiment of the present application does not limit the specific network structure of the decoding module.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • the feature dimension of the convolution block included in the first decoding module is 256
  • the feature dimension of the convolution block included in the second decoding module is 128,
  • the feature dimension of the convolution block included in the third decoding module is 64, and
  • the feature dimension of the convolution block included in the fourth decoding module is 32, and so on.
  • the network structures of the convolutional blocks included in the encoding modules in the embodiments of the present application may be the same or different.
  • the network structures of the convolutional blocks included in each decoding module may be the same or different.
  • the network structures of the convolutional blocks included in the encoding module and the decoding module may be the same or different, which is not limited in this application.
  • the network structure of the convolutional block included in the encoding module and/or the decoding module includes a convolutional layer 1, a convolutional layer 2, a convolutional layer 3 and an activation function.
  • the convolution kernels of convolution layer 1 and convolution layer 2 are 3 ⁇ 3
  • the convolution kernel of convolution layer 3 is 1 ⁇ 1
  • the activation function is a Sigmoid weighted linear unit (Sigmoid Weighted Liner Unit, referred to as SiLU).
  • the sizes of the convolution kernels of the above-mentioned convolutional layer 1, convolutional layer 2, and convolutional layer 3 include but are not limited to the above values, and the activation function includes but is not limited to SiLU; it may also be, for example, ReLU, which is not limited in this application.
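  • A minimal PyTorch sketch of such a convolution block is given below; exactly where the SiLU activation is applied and how the channel widths are chosen are assumptions of this sketch rather than details taken from FIG. 5B.

```python
# Minimal sketch of the convolution block of FIG. 5B: two 3x3 convolution layers,
# one 1x1 convolution layer and a SiLU activation. The placement of the
# activation and the channel widths are assumptions.
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)   # convolutional layer 1
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)  # convolutional layer 2
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=1)             # convolutional layer 3
        self.act = nn.SiLU()                                              # Sigmoid-weighted linear unit

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        return self.act(self.conv3(x))
```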
  • the dynamic conversion model further includes: a convolutional block attention module (Convolutional Block Attention Module, Abbreviated as CBAM).
  • the attention mechanism of this convolutional attention module enables the dynamic conversion model to focus more attention on the relevant parts of the encoding-side features and less attention on other irrelevant parts. That is, the convolutional attention mechanism is used to improve the representation ability of the dynamic conversion model, focusing on important features and suppressing unnecessary features, thus greatly improving the efficiency of the model.
  • one or more CBAMs are included in the skip connections between each encoding module and decoding module.
  • in the above S403, performing feature extraction on the (i-1)-th first feature information and the (N-i)-th second feature information of the LDR training image through the (N-i+1)-th decoding module to obtain the (N-i+1)-th second feature information of the LDR training image includes S403-A and S403-B:
  • use the (N-i+1)-th decoding module to perform feature extraction on the (i-1)-th third feature information and the (N-i)-th second feature information to obtain the (N-i+1)-th second feature information of the LDR training image.
  • the (i-1)-th third feature information and the (N-i)-th second feature information are concatenated, and the concatenated (i-1)-th third feature information and (N-i)-th second feature information are input into the (N-i+1)-th decoding module to obtain the (N-i+1)-th second feature information of the LDR training image output by the (N-i+1)-th decoding module.
  • the embodiment of the present application does not limit the network structure of the convolutional attention module.
  • the convolutional attention module includes: a channel attention module and a spatial attention module.
  • the channel attention module learns the channel information of features by using the inter-channel relationship of features
  • the spatial attention module learns the spatial information of features by using the spatial relationship of features.
  • the channel to which it belongs here can be understood as a feature dimension.
  • if the feature dimension of a piece of feature information is 32, it means that the number of channels of the feature information is 32.
  • Use the spatial attention module to perform spatial information extraction on the fused channel feature information of the (i-1)-th first feature information to obtain the spatial attention information of the (i-1)-th first feature information.
  • the fused channel feature information of the i-1 th first feature information is determined according to the i-1 th first feature information and the channel attention information of the i-1 th first feature information.
  • the convolutional attention module also includes a first multiplication unit, at this time S403-A2 includes S403-A21 and S403-A22:
  • S403-A3. Determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
  • the convolutional attention module further includes a second multiplication unit; then S403-A3 includes: multiplying, through the second multiplication unit, the fused channel feature information of the (i-1)-th first feature information by the spatial attention information to obtain the (i-1)-th third feature information of the LDR training image.
  • the network structure of the convolutional attention module is shown in Figure 5D.
  • the i-1th first feature information is a feature map F
  • the feature map F is input into the CBAM module, and the CBAM module sequentially infers attention maps along two independent dimensions (i.e., the channel dimension and the spatial dimension); the attention maps are then multiplied with the input feature map for adaptive feature refinement.
  • the one-dimensional channel attention map MC is obtained through the channel attention module
  • F' is obtained after multiplying MC and the input feature F.
  • the two-dimensional spatial attention map Ms is obtained from F' through the spatial attention module, and the final feature map F'' is obtained after multiplying Ms and F'; the final feature map is the (i-1)-th third feature information of the LDR training image.
  • the multiplication symbol in Figure 5D indicates that corresponding elements are multiplied.
  • the dimension of the input feature map F is H ⁇ W ⁇ C
  • the dimension of the 1D channel attention map MC is 1 ⁇ 1 ⁇ C
  • the dimension of the 2D spatial attention map Ms is H ⁇ W ⁇ 1.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit.
  • both the first space compression unit and the second space compression unit are used to compress the spatial size of the feature map
  • the channel feature extraction unit is used to perform feature extraction on the spatially compressed feature map. That is, as shown in FIG. 5E, in order to efficiently calculate channel attention, the present application compresses the spatial dimension of the input feature map.
  • the above-mentioned first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the above-mentioned first spatial compression unit is a maximum pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multilayer perceptron (Multilayer Perceptron, MLP for short); for example, the MLP is an MLP including a single hidden layer.
  • extracting channel information from the (i-1)-th first feature information through the channel attention module to obtain the channel attention information of the (i-1)-th first feature information includes S403-A11 to S403-A15:
  • S403-A15 Determine the channel attention information of the i-1 first feature information according to the first channel information and the second channel information of the i-1 first feature information.
  • the channel attention module further includes: a first addition unit and a first activation function.
  • the above S403-A15 includes:
  • the embodiment of the present application does not limit the specific form of the first activation function, which is specifically determined according to actual needs.
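  • A minimal PyTorch sketch of this channel attention module is given below; the channel reduction ratio of the MLP and the use of a sigmoid as the first activation function are assumptions of this sketch.

```python
# Minimal sketch of the channel attention module: max pooling and average pooling
# compress the spatial size, a shared single-hidden-layer MLP extracts channel
# features, the two results are added and passed through an activation.
# The reduction ratio and the sigmoid are assumptions.
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # first spatial compression unit
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # second spatial compression unit
        self.mlp = nn.Sequential(                 # channel feature extraction unit (shared MLP)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.act = nn.Sigmoid()                   # first activation function (assumed)

    def forward(self, f):                         # f: (N, C, H, W) feature map
        first = self.mlp(self.max_pool(f))        # first channel information
        second = self.mlp(self.avg_pool(f))       # second channel information
        return self.act(first + second)           # 1x1xC channel attention map Mc
```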
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit. Both the first channel compression unit and the second channel compression unit are used to compress the channel dimension of the feature map, and the spatial feature extraction unit is used to perform feature extraction on the channel compressed feature map. That is, the spatial attention module shown in Figure 5F generates a spatial attention map by utilizing the spatial relationship between features. Spatial attention complements channel attention. To compute spatial attention, the channel dimensions of the input feature maps are compressed.
  • the first channel compression unit and/or the second channel compression unit include a pooling layer.
  • the first channel compression unit is a maximum pooling layer (MaxPool), and/or the second channel compression unit is an average pooling (AvgPool) layer.
  • the aforementioned spatial feature extraction unit is a convolutional layer.
• the above S403-A2, in which the spatial attention module is used to extract the spatial information of the fusion channel feature information of the i-1th first feature information to obtain the spatial attention information of the i-1th first feature information, includes S403-A21 to S403-A24:
  • the spatial attention module further includes a second activation function
  • S403-A24 includes: performing non-linearity on the spatial feature information of the i-1th first feature information through the second activation function processing to obtain the spatial attention information of the i-1th first feature information.
  • the embodiment of the present application does not limit the specific form of the second activation function, for example, a sigmoid activation function.
• the spatial attention module utilizes average pooling (i.e., the second channel compression unit) and maximum pooling (i.e., the first channel compression unit) operations to generate corresponding feature vectors along the channel axis, and concatenates the two to generate an efficient feature descriptor.
  • a two-dimensional spatial attention feature map Ms is generated after a sigmoid activation function (ie, the second activation function).
  • the spatial dimension of the channel attention information of the i-1th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1th first feature information is 1.
• the dynamic conversion model provided by the embodiment of the present application adds a convolutional attention module to each branch; the convolutional attention module includes a channel attention module and a spatial attention module, which learn channel features and spatial features respectively, thereby improving the learning of image detail features by the dynamic conversion model, so that the trained dynamic conversion model can reconstruct more detailed features in the image and improve the quality of the HDR image it generates (see the sketch below).
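• The spatial attention module and the CBAM composition described above can likewise be sketched as follows; the 7 × 7 convolution kernel is an assumption, and the ChannelAttention sketch above can be passed in as `channel_att`.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention sketch: channel-wise max/avg pooling -> concat -> conv -> sigmoid.

    The 7x7 convolution kernel size is an illustrative assumption.
    """
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Spatial feature extraction unit: a convolutional layer over the 2-channel descriptor.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        # Second activation function (e.g. sigmoid, as suggested in the text).
        self.act = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, C, H, W) -> spatial attention map Ms: (B, 1, H, W)
        f_max, _ = f.max(dim=1, keepdim=True)   # first channel compression unit (max over C)
        f_avg = f.mean(dim=1, keepdim=True)     # second channel compression unit (avg over C)
        return self.act(self.conv(torch.cat([f_max, f_avg], dim=1)))


def cbam(f: torch.Tensor, channel_att: nn.Module, spatial_att: nn.Module) -> torch.Tensor:
    """CBAM composition: F' = Mc * F (first multiplication), F'' = Ms * F' (second multiplication)."""
    f_prime = channel_att(f) * f
    return spatial_att(f_prime) * f_prime
```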
  • the dynamic conversion model further includes at least one downsampling unit
  • the training method in the embodiment of the present application further includes: performing spatial dimension downsampling on the feature information output by the encoding module through the downsampling unit. That is, in order to reduce network complexity in the embodiment of the present application, at least one downsampling unit is set in the coding component to reduce the spatial dimension of the feature information output by the coding module.
  • the embodiment of the present application does not limit the number of down-sampling units included in the dynamic conversion model, which is specifically determined according to actual requirements.
• a downsampling unit is set between two adjacent encoding modules; it downsamples the feature information output by the previous encoding module in the spatial dimension before that information is input into the next encoding module. This not only reduces the amount of data processed by the encoding module and reduces the complexity of the model, but also enables each encoding module to learn features of different sizes, improving the prediction accuracy of the dynamic conversion model.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the training method in the embodiment of the present application further includes: performing spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
• since at least one down-sampling unit is set in the encoding component, in order to ensure that the size of the decoded image is consistent with the size of the original image, at least one up-sampling unit is set in the decoding component to up-sample the feature information output by the decoding module in the spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
• the dynamic conversion model further includes a first convolutional layer; the first convolutional layer is located at the input end of the dynamic conversion model and is used to process the image input to the dynamic conversion model to obtain the initial feature map of the input image.
• the LDR training image is input into the dynamic conversion model, and features of the LDR training image are extracted through the first convolutional layer in the dynamic conversion model to obtain the initial feature map of the LDR training image; the initial feature map is then input into the first encoding module and the first convolutional attention module respectively, and the first first feature information output by the first encoding module and the first third feature information output by the first convolutional attention module are obtained.
  • the aforementioned initial feature map can be understood as the aforementioned 0th first feature information.
  • the LDR training image is input into the dynamic conversion model, and the second characteristic information of the LDR training image output by the last decoding module in the dynamic conversion model can be obtained, and then, the following S404 is performed.
  • S404 Determine the HDR image prediction value of the LDR training image according to the second characteristic information of the LDR training image output by the last decoding module among the N decoding modules.
  • the channel of the second feature information of the LDR training image is converted into 3 channels (such as RGB channels) to obtain the predicted value of the HDR image of the LDR training image.
  • the dynamic conversion model further includes a second convolutional layer
• the above S404 includes: performing feature extraction on the second feature information of the LDR training image output by the last decoding module through the second convolutional layer, and outputting the HDR image prediction value of the LDR training image.
• the second convolutional layer above also includes an activation function, and the feature dimension of the second convolutional layer is 3; that is, after passing through the second convolutional layer, a 3-channel (such as RGB) image can be output, and the 3-channel image can be used as the HDR image prediction value of the LDR training image.
• the size of the convolution kernel of the second convolutional layer may be 1 × 1 (a sketch of this output head is given below).
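• A minimal sketch of this 1 × 1 output head, assuming 32 input channels (taken from the architecture described later) and a ReLU activation, since the text only states that the second convolutional layer includes an activation function:

```python
import torch.nn as nn

# Second convolutional layer: 1x1 convolution mapping the 32-channel feature map to a
# 3-channel (e.g. RGB) HDR prediction; the ReLU activation is an illustrative assumption.
hdr_head = nn.Sequential(
    nn.Conv2d(32, 3, kernel_size=1),
    nn.ReLU(inplace=True),
)
```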
  • S405. Determine the target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image, and train the dynamic transformation model according to the loss.
• after the HDR image prediction value of the LDR training image is obtained according to the steps of S404 above, the HDR image prediction value of the LDR training image is compared with the HDR image true value of the LDR training image to determine the target loss between the two, and the parameters in the dynamic conversion model are adjusted according to the target loss to complete one training iteration of the dynamic conversion model.
  • the manner of determining the loss in S405 includes S405A: according to a preset loss function, determine a target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image.
  • the aforementioned preset loss function includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
  • the above preset loss function includes a reconstruction loss function, a perceptual loss function, and a style loss function.
  • S405A includes:
• according to the reconstruction loss, perceptual loss and style loss between the predicted value of the HDR image and the ground truth value of the HDR image, the target loss between the predicted value of the HDR image and the ground truth value of the HDR image is determined.
• the reconstruction loss constrains the predicted HDR image to be close to the ground-truth HDR image at the pixel level.
• the perceptual loss evaluates how well the features of the predicted HDR image match the features extracted from the ground-truth HDR image, and allows the model to produce textures that are perceptually similar to the ground truth; that is, the perceptual loss ensures the generation of visually pleasing images with more texture details.
• the style loss captures both style and texture by comparing global statistics collected with Gram matrices over the entire image, ensuring both style consistency and color consistency of the predicted image.
• the weighted sum of the reconstruction loss, perceptual loss and style loss can be used as the target loss, that is, formula (1): Loss = L1 + λs·Lst + λp·Lp
• Loss is the target loss; L1 is the reconstruction loss; Lst is the perceptual loss; Lp is the style loss; λs and λp are hyperparameters.
• that is, the weight of the reconstruction loss is 1, the weight of the perceptual loss is λs, and the weight of the style loss is λp.
  • the above formula (1) is just an example, and the method of determining the target loss in this application includes but is not limited to the above formula (1), such as adding, subtracting, multiplying or dividing in formula (1) A certain parameter, or the equivalent deformation of the above formula (1), etc., all belong to the protection scope of the present application.
• the reconstruction loss is determined as follows: according to a preset compressed tone mapping function, the compressed tone mapping value of the predicted value of the HDR image is determined; according to the compressed tone mapping function, the compressed tone mapping value of the true value of the HDR image is determined; and the reconstruction loss is determined according to the error between the compressed tone mapping value of the true value of the HDR image and the compressed tone mapping value of the predicted value of the HDR image.
• the reconstruction loss is determined according to the following formula (2): L1 = ‖T(H) − T(GT)‖_1
• L1 represents the reconstruction loss; T is the μ-law compressed tone mapping function, and T(x) is the compressed tone mapping value of x, where x is H or GT, so that T(H) is the compressed tone mapping value of the predicted value of the HDR image and T(GT) is the compressed tone mapping value of the true value of the HDR image; H is the predicted value of the HDR image output by the dynamic conversion model; GT is the true value of the HDR image of the LDR training image; ‖·‖_1 indicates the L1 norm; and μ is the preset parameter of the μ-law compression.
  • the above formula (2) is just an example, and the method of determining the reconstruction loss in this application includes but is not limited to the above formula (2), such as adding, subtracting, multiplying or multiplying in formula (2) Except for a certain parameter, or the equivalent deformation of the above formula (2), etc., all belong to the protection scope of the present application.
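• For illustration, a hedged PyTorch sketch of the compressed tone mapping and the reconstruction loss above; the exact μ-law form T(x) = log(1 + μx) / log(1 + μ) and the value μ = 5000 are assumptions that the text does not fix.

```python
import math
import torch

MU = 5000.0  # preset parameter of the compressed tone mapping (value assumed for illustration)

def tonemap(x: torch.Tensor, mu: float = MU) -> torch.Tensor:
    """Compressed tone mapping T(x); the mu-law form log(1 + mu*x) / log(1 + mu) is an assumption."""
    return torch.log1p(mu * x) / math.log1p(mu)

def reconstruction_loss(h: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """L1 = || T(H) - T(GT) ||_1, averaged over all elements of the image."""
    return torch.mean(torch.abs(tonemap(h) - tonemap(gt)))
```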
• the perceptual loss is determined in the following manner: obtain the feature map of the l-th layer of the pre-training model; determine the compressed tone mapping value of the HDR image predicted value according to a preset compressed tone mapping function; determine the compressed tone mapping value of the HDR image true value according to the compressed tone mapping function; determine the first feature value corresponding to the compressed tone mapping value of the HDR image predicted value in the feature map of the l-th layer; determine the second feature value corresponding to the compressed tone mapping value of the HDR image true value in the feature map of the l-th layer; and determine the perceptual loss according to the error between the first feature value and the second feature value.
• the perceptual loss is determined according to the following formula (3):
• Lp represents the perceptual loss; φ_l represents the feature map of the l-th layer of the pre-training model, such as the l-th layer of VGG-16, whose size is C_l × H_l × W_l;
• φ_l(T(H)) is the first feature value corresponding to the compressed tone mapping value of the HDR image predicted value in the feature map of the l-th layer;
• φ_l(T(GT)) is the second feature value corresponding to the compressed tone mapping value of the HDR image true value in the feature map of the l-th layer.
  • the above formula (3) is just an example, and the method of determining the perceptual loss in the present application includes but not limited to the above formula (3), such as adding, subtracting, multiplying or dividing in formula (3) A certain parameter, or the equivalent deformation of the above formula (3), etc., all belong to the protection scope of the present application.
• the style loss is determined in the following manner: obtain the Gram matrix of the l-th layer feature map of the pre-training model; determine the compressed tone mapping value of the HDR image predicted value according to a preset compressed tone mapping function; determine the compressed tone mapping value of the HDR image true value according to the compressed tone mapping function; determine the first element value corresponding to the compressed tone mapping value of the HDR image predicted value in the Gram matrix; determine the second element value corresponding to the compressed tone mapping value of the HDR image true value in the Gram matrix of the l-th layer feature map; and determine the style loss according to the error between the first element value and the second element value.
• the style loss is determined according to the following formula (4):
• the left-hand side of formula (4) represents the style loss; G(·) is the Gram matrix of the l-th layer feature map of the pre-trained model;
• G(T(H)) is the first element value corresponding to the compressed tone mapping value of the HDR image predicted value in the Gram matrix; G(T(GT)) is the second element value corresponding to the compressed tone mapping value of the HDR image true value in the Gram matrix of the l-th layer feature map;
• x denotes H or GT;
• K_l = C_l·H_l·W_l represents the normalization factor of the calculation;
• the feature φ_l is a (H_l·W_l) × C_l matrix, therefore the size of the Gram matrix is C_l × C_l.
• in practice, a pre-trained VGG-16 network is used: the feature maps of the predicted and real images are computed at the first three pooling layers pool1, pool2 and pool3 of VGG-16, and the perceptual loss and style loss are computed on these features according to formula (3) and formula (4) respectively (a sketch is given below).
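• A hedged sketch of computing the perceptual and style losses on the pool1–pool3 feature maps of a pre-trained VGG-16, as described above; the use of the L1 distance, the torchvision layer indices and the omission of ImageNet input normalization are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class VGGPoolFeatures(nn.Module):
    """Feature maps at the first three pooling layers (pool1-pool3) of a pre-trained VGG-16."""
    def __init__(self):
        super().__init__()
        # torchvision >= 0.13 API; older versions use vgg16(pretrained=True).
        features = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in features.parameters():
            p.requires_grad_(False)
        # Slices ending at pool1, pool2 and pool3 of torchvision's VGG-16 feature extractor.
        self.stages = nn.ModuleList([features[:5], features[5:10], features[10:17]])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats


def gram(f: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (B, C, H, W) feature map, normalized by K = C * H * W."""
    b, c, h, w = f.shape
    phi = f.reshape(b, c, h * w)
    return phi @ phi.transpose(1, 2) / (c * h * w)   # (B, C, C)


def perceptual_and_style_losses(vgg: VGGPoolFeatures, t_h: torch.Tensor, t_gt: torch.Tensor):
    """Perceptual and style losses between tone-mapped prediction T(H) and ground truth T(GT)."""
    perceptual = t_h.new_zeros(())
    style = t_h.new_zeros(())
    for f_h, f_gt in zip(vgg(t_h), vgg(t_gt)):
        perceptual = perceptual + torch.mean(torch.abs(f_h - f_gt))      # feature-space term
        style = style + torch.mean(torch.abs(gram(f_h) - gram(f_gt)))    # Gram-matrix term

    # Target loss of formula (1) is a weighted sum of the reconstruction term
    # (see the tone-mapping sketch above) and these two terms; weights are hyperparameters.
    return perceptual, style
```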
  • the target loss in the embodiment of the present application includes reconstruction loss, perceptual loss and style loss, so as to reduce reconstruction distortion, artifacts and tone anomalies of high dynamic range images, and further improve the quality of HDR images generated by the model.
• deep learning models rely on large-scale datasets, but ready-made datasets of LDR-HDR image pairs are not directly available.
• this application therefore collects data from multiple HDR image datasets and HDR video data, and sets up a virtual camera to capture multiple random regions of each scene using randomly selected camera calibrations.
  • Virtual camera calibration contains parameters for exposure, camera curve, white balance and noise level. The virtual camera parameters are randomly selected, and the camera curve parameters are randomly fitted into the camera curve database. This provides a set of LDR and corresponding HDR images, which are used as input and ground truth for training, respectively. A set of data augmentation operations are then applied to improve the robustness of the predictions.
• each HDR image is treated as a real scene; a region with random size and position is selected as an image crop, which is then randomly flipped and resampled to 256 × 256 pixels.
  • the final trained network using these data augmentations generalizes well to a variety of images captured with different cameras.
• the obtained dataset is then divided into a training set and a test set; specifically, two of the collected HDR datasets, the Fairchild HDR dataset and the HDR EYE dataset, are used for testing (a sketch of the crop/flip/resample step is given below).
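• A hedged sketch of the crop/flip/resample augmentation described above (the virtual-camera simulation of exposure, camera curve, white balance and noise is omitted); the crop-size range and the bilinear resampling mode are assumptions.

```python
import random
import torch
import torch.nn.functional as F

def random_crop_flip_resample(hdr: torch.Tensor, out_size: int = 256) -> torch.Tensor:
    """Select a crop of random size/position, randomly flip it, and resample to out_size x out_size.

    `hdr` is a (C, H, W) tensor; the minimum crop size and bilinear resampling are assumptions.
    """
    _, h, w = hdr.shape
    crop = random.randint(out_size // 2, min(h, w))       # random crop size (range assumed)
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    patch = hdr[:, top:top + crop, left:left + crop]
    if random.random() < 0.5:                             # random horizontal flip
        patch = torch.flip(patch, dims=[-1])
    patch = F.interpolate(patch.unsqueeze(0), size=(out_size, out_size),
                          mode="bilinear", align_corners=False)
    return patch.squeeze(0)
```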
• the hardware used in the experiments of this application is an AMD Ryzen 5 CPU, an NVIDIA GTX 1080 Ti GPU and 16 GB of memory, and the framework is PyTorch.
• the method is compared with five existing single-image HDR reconstruction techniques, including three conventional non-learning methods (the Akyuz, KOV and Masia methods) and learning-based methods such as ExpandNet.
• three objective evaluation methods, PU-PSNR, PU-SSIM and the HDR-VDP Q-score, are used to evaluate image quality.
  • the perceptually uniform coding proposed in this application converts luminance values into approximately perceptually uniform pixel values of an HDR image.
  • PU-PSNR measures the pixel-wise difference between the predicted image and the reference image.
  • PU-SSIM measures the structural difference between predicted and reference images from the perspective of visual perception.
  • HDR-VDP is a visual metric used to compare reference and test images and predict the quality of an HDR image relative to the reference image. The quality Q-score provided in HDR-VDP is used as the evaluation metric.
• Table 1 shows a quantitative comparison of HDR images reconstructed by the existing methods and the proposed method on the HDR EYE dataset and the Fairchild dataset; bold indicates the method with the best experimental results, and underline indicates the second best. The proposed method achieves the best results on the Fairchild dataset, a good Q-score on the HDR EYE dataset, and outperforms the other methods in terms of the PSNR and SSIM metrics on both datasets.
• the Fairchild dataset was constructed by the team of Professor Mark D. Fairchild at the Rochester Institute of Technology and contains a series of more than 100 HDR images and associated data.
• the embodiment of the present application provides a dynamic conversion model; the model includes N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module; the model is trained using LDR training images.
• the training process is: the LDR training image is input into the dynamic conversion model; the i-th encoding module performs feature extraction on the i-1th first feature information to obtain the i-th first feature information of the LDR training image; the N-i+1th decoding module performs feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR training image to obtain the N-i+1th second feature information of the LDR training image; the HDR image prediction value of the LDR training image is determined according to the second feature information of the LDR training image output by the last decoding module in the N decoding modules; the loss between the HDR image prediction value of the LDR training image and the HDR image true value of the LDR training image is determined, and the dynamic conversion model is trained according to the loss.
• the trained dynamic conversion model can be used to convert an LDR image into an HDR image, realizing HDR conversion without increasing the cost of data acquisition, encoding, transmission, storage, etc., thereby improving HDR image conversion efficiency.
• the dynamic conversion model provided by the embodiment of the present application can also be applied to a video codec framework; for example, it can be applied at the video decoding end to perform high dynamic range conversion on the reconstructed image obtained by the decoding end, so as to obtain the HDR image of the reconstructed image.
  • Fig. 6 is a schematic flowchart of an image decoding method provided by an embodiment of the present application. As shown in Fig. 6, the method includes:
  • the entropy decoding unit 310 can analyze the code stream to obtain prediction information of the current block, quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
• the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image on an image basis or on a block basis to obtain the final reconstructed image.
  • the dynamic transformation model is combined with the video coding framework.
  • the input 10-bit HDR data is converted into 8-bit LDR data through a tone mapping module (TM) at the encoding end, and then divided into CTUs and sent to the encoder for encoding.
• intra-frame prediction, inter-frame prediction with motion compensation, transformation, quantization, filtering and entropy coding then produce the code stream.
  • the dynamic conversion model described in the above embodiment is added at the output end of the decoder.
  • the dynamic range of the decoded LDR reconstruction image is extended. Using this model, the quality of the obtained HDR data can be significantly improved, and the decoded image quality can be further improved under the premise of ensuring the bit rate.
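• The following hedged sketch shows how the trained dynamic conversion model can be attached to the decoder output; `decode_fn` is a hypothetical placeholder standing in for the standard decoding pipeline (entropy decoding, prediction, inverse quantization/transform, reconstruction and loop filtering) and is not the API of any particular codec.

```python
from typing import Callable
import torch

@torch.no_grad()
def decode_to_hdr(bitstream: bytes,
                  decode_fn: Callable[[bytes], torch.Tensor],
                  model: torch.nn.Module) -> torch.Tensor:
    """Decode the code stream to an LDR reconstruction, then expand its dynamic range.

    `decode_fn` is a hypothetical placeholder for the decoding pipeline; `model` is the
    trained dynamic conversion model.
    """
    reconstructed = decode_fn(bitstream)   # (1, 3, H, W) LDR reconstructed image
    model.eval()
    return model(reconstructed)            # (1, 3, H, W) HDR image of the reconstruction
```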
  • S602. Input the reconstructed image into a dynamic conversion model to perform dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image.
• the dynamic conversion model comprises: N encoding modules connected in series and N decoding modules connected in series; the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module; the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1th encoding module to obtain the i-th first feature information of the reconstructed image, and the N-i+1th decoding module is used to perform feature extraction on the i-1th first feature information and the N-ith second feature information of the reconstructed image to obtain the N-i+1th second feature information of the reconstructed image, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the HDR image of the reconstructed image is determined according to the second characteristic information output by the last decoding module among the N decoding modules.
  • the above N-i th second feature information is determined according to the N th first feature information output by the N th encoding module.
  • the above N-i th second feature information is determined according to the N-i th second feature information output by the N-i th decoding module.
  • the i-1th first feature information is determined according to the reconstructed image, for example, the 0th first feature information is the reconstructed image, or is a feature map after processing the reconstructed image.
  • the i-1th first feature information is determined according to the first feature information output by the i-1th coding module.
  • the embodiment of the present application does not limit the specific network structure of the encoding module.
  • each of the N coding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • the feature dimension of the convolution block included in the first encoding module is 64
• the feature dimension of the convolution block included in the second encoding module is 128, and the feature dimension of the convolution block included in the third encoding module is 256
  • the feature dimension of the convolutional block included in the fourth encoding module is 512 and so on.
  • the embodiment of the present application does not limit the specific network structure of the decoding module.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • the feature dimension of the convolution block included in the first decoding module is 256
  • the feature dimension of the convolution block included in the second decoding module is 128,
• the feature dimension of the convolution block included in the third decoding module is 64
• the feature dimension of the convolutional block included in the fourth decoding module is 32, and so on.
  • the network structures of the convolutional blocks included in the encoding modules in the embodiments of the present application may be the same or different.
  • the network structures of the convolutional blocks included in each decoding module may be the same or different.
  • the network structures of the convolutional blocks included in the encoding module and the decoding module may be the same or different, which is not limited in this application.
  • the network structure included in the encoding module and/or the decoding module is as shown in FIG. 5B , including a convolutional layer 1, a convolutional layer 2, a convolutional layer 3 and an activation function.
• the convolution kernels of convolutional layer 1 and convolutional layer 2 are 3 × 3
• the convolution kernel of convolutional layer 3 is 1 × 1
• the activation function is the Sigmoid Weighted Linear Unit (SiLU).
• the sizes of the convolution kernels of the above-mentioned convolutional layer 1, convolutional layer 2 and convolutional layer 3 include but are not limited to the above values, and the activation function includes but is not limited to SiLU (for example, ReLU may be used); this application does not limit this. A sketch of this convolutional block is given below.
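• A hedged sketch of the convolutional block of Figure 5B (two 3 × 3 convolutions, one 1 × 1 convolution and a SiLU activation); the exact placement of the activations is an assumption, since the text does not fix it.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolutional block of the encoding/decoding modules: Conv3x3 -> Conv3x3 -> Conv1x1 with SiLU.

    The placement of the SiLU activation between layers is an illustrative assumption.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),   # convolutional layer 1
            nn.SiLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),  # convolutional layer 2
            nn.SiLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),             # convolutional layer 3
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```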
  • the dynamic conversion model further includes: a convolutional attention module (CBAM) located in the skip connection between the i-th encoding module and the N-i+1-th decoding module.
• the attention mechanism of this convolutional attention module enables the dynamic conversion model to focus more attention on the relevant parts of the encoder-side features and less attention on other irrelevant parts; that is, the convolutional attention mechanism improves the representation ability of the dynamic conversion model, focusing on important features and suppressing unnecessary ones, thus greatly improving the efficiency of the model.
  • one or more CBAMs are included in the skip connections between each encoding module and decoding module.
• the convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1th decoding module is used to extract the spatial information and channel information of the i-1th first feature information, and obtain the i-1th third feature information of the reconstructed image.
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+1th second feature of the reconstructed image information.
• the N-i+1th decoding module is used to perform feature extraction on the concatenated feature information of the i-1th first feature information and the N-ith second feature information of the reconstructed image, and obtain the N-i+1th second feature information of the reconstructed image.
  • the convolutional attention module includes a channel attention module and a spatial attention module.
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information.
  • the spatial attention module is used to extract the spatial information of the i-1 first feature information and the channel attention information of the i-1 first feature information, and obtain the spatial attention of the i-1 first feature information information.
  • the i-1th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the i-1th first feature information.
• the convolutional attention module also includes a first multiplication unit; the first multiplication unit is used to multiply the i-1th first feature information and the channel attention information of the i-1th first feature information to obtain the fusion channel feature information of the i-1th first feature information.
• the spatial attention module is used to extract the spatial information of the fusion channel feature information of the i-1th first feature information, and obtain the spatial attention information of the i-1th first feature information.
  • the convolutional attention module also includes a second multiplication unit; the second multiplication unit is used to multiply the fusion channel feature information and spatial attention information of the i-1 first feature information, Obtain the i-1th third feature information of the reconstructed image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit.
  • the first spatial compression unit is used to compress the spatial dimension of the i-1 first feature information to obtain the first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is used to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
• the channel feature extraction unit is used to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information.
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the first spatial compression unit is a maximum pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module also includes: a first addition unit and a first activation function
  • the first addition unit is configured to add the first channel information and the second channel information of the i-1 first feature information to obtain the fusion channel information of the i-1 first feature information;
  • the first activation function is used to perform non-linear processing on the fusion channel information of the i-1 first feature information to obtain the channel attention information of the i-1 first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fusion channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is used to perform channel dimension compression on the fusion channel feature information of the i-1 first feature information, to obtain the second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is used to perform spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information, to obtain the spatial feature information of the i-1 first feature information;
  • the spatial attention information of the i-1 first feature information is determined according to the spatial feature information of the i-1 first feature information.
  • the first channel compression unit and/or the second channel compression unit includes a pooling layer.
  • the first channel compression unit is a maximum pooling layer
  • the second channel compression unit is an average pooling layer
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module also includes a second activation function
  • the second activation function is used to perform non-linear processing on the spatial feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1th first feature information is 1.
• the dynamic conversion model provided by the embodiment of the present application adds a convolutional attention module to each branch; the convolutional attention module includes a channel attention module and a spatial attention module, which learn channel features and spatial features respectively, thereby improving the learning of image detail features by the dynamic conversion model, so that the dynamic conversion model can reconstruct more detailed features in the image and improve the quality of the HDR image it generates.
  • the dynamic conversion model further includes at least one downsampling unit; the downsampling unit is used for downsampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit; the upsampling unit is used to perform spatial dimension upsampling on the feature information output by the decoding module.
  • the upsampling unit is a bilinear interpolation unit.
• the dynamic conversion model also includes a first convolutional layer; the first convolutional layer is used to extract features from the reconstructed image to obtain the initial feature map of the reconstructed image, and the initial feature map is input into the first encoding module and the first convolutional attention module respectively.
  • the dynamic conversion model also includes a second convolutional layer; the second convolutional layer is used for feature extraction of the second feature information of the reconstructed image output by the last decoding module, and outputs the HDR image of the reconstructed image .
  • the dynamic conversion model includes a first convolutional layer, 4 encoding modules connected in series, 3 down-sampling units, 4 decoding modules connected in series, 3 Upsampling units, 4 CBAMs on the skip connections of the encoding module and decoding module, and the second convolutional layer.
• the convolution kernel of the first convolutional layer is 3 × 3 and the number of channels is 32, where the number of channels can also be understood as a feature dimension
• the convolution kernel of the second convolutional layer is 1 × 1 and the number of channels is 3, and the second convolutional layer includes an activation function.
  • the first encoding module includes a convolutional block with 64 channels
  • the second encoding module includes a convolutional block with 128 channels
  • the third encoding module includes a convolutional block with 256 channels
• the fourth encoding module includes a convolutional block with 512 channels.
  • a first down-sampling unit is set between the first coding module and the second coding module
  • a second down-sampling unit is set between the second coding module and the third coding module
• a third down-sampling unit is set between the third coding module and the fourth coding module; the first down-sampling unit, the second down-sampling unit and the third down-sampling unit are all maximum pooling layers with a 2 × 2 kernel and a stride of 2.
  • the first decoding module includes a convolutional block with 256 channels
  • the second decoding module includes a convolutional block with 128 channels
  • the third decoding module includes a convolutional block with 64 channels
• the fourth decoding module includes a convolutional block with 32 channels.
  • a first upsampling unit is set between the fourth coding block and the first decoding module
  • a second upsampling unit is set between the first decoding module and the second decoding module
• a third upsampling unit is set between the second decoding module and the third decoding module; the first upsampling unit, the second upsampling unit and the third upsampling unit are all bilinear interpolation units, and the upsampling factor is 2 × 2.
• each upsampling unit also includes a convolutional layer:
• the first upsampling unit is Bilinear Upsample 2 × 2 followed by Conv 3 × 3 with 256 channels;
• the second upsampling unit is Bilinear Upsample 2 × 2 followed by Conv 3 × 3 with 128 channels;
• the third upsampling unit is Bilinear Upsample 2 × 2 followed by Conv 3 × 3 with 64 channels.
• the size of the reconstructed image is H × W × 3, where H × W represents the length and width of the reconstructed image, and 3 represents the number of its RGB channels.
• the initial feature map output by the first convolutional layer is input into the first encoding module and the first CBAM respectively; the convolutional block in the first encoding module performs convolution processing on the initial feature map to obtain the first first feature information of the reconstructed image.
• the first first feature information is input into the second CBAM and the first down-sampling unit respectively; its size is H × W × 64.
• the first down-sampling unit down-samples the first first feature information to H/2 × W/2 × 64 and inputs the sampled first first feature information into the second encoding module.
• the convolutional block in the second encoding module performs convolution processing on the sampled first first feature information to obtain the second first feature information of the reconstructed image, which is input into the third CBAM and the second down-sampling unit respectively; its size is H/2 × W/2 × 128.
• the second down-sampling unit down-samples the second first feature information to H/4 × W/4 × 128 and inputs the sampled second first feature information into the third encoding module.
• the convolutional block in the third encoding module performs convolution processing on the sampled second first feature information to obtain the third first feature information of the reconstructed image, which is input into the fourth CBAM and the third down-sampling unit respectively; its size is H/4 × W/4 × 256.
• the third down-sampling unit down-samples the third first feature information to H/8 × W/8 × 256 and inputs the sampled third first feature information into the fourth encoding module.
• the convolutional block in the fourth encoding module performs convolution processing on the sampled third first feature information to obtain the fourth first feature information of the reconstructed image, which is input into the first upsampling unit; its size is H/8 × W/8 × 512.
• the first upsampling unit upsamples the fourth first feature information to H/4 × W/4 × 256.
• the fourth CBAM performs feature extraction on the third first feature information and outputs the first third feature information of the reconstructed image.
• the first third feature information is concatenated with the upsampled fourth first feature information and input to the first decoding module.
• the first decoding module performs feature extraction on the concatenated first third feature information and upsampled fourth first feature information to obtain the first second feature information of the reconstructed image, and the first second feature information is input into the second upsampling unit.
• the second upsampling unit upsamples the first second feature information to H/2 × W/2 × 128.
• the third CBAM performs feature extraction on the second first feature information and outputs the second third feature information of the reconstructed image.
• the second third feature information is concatenated with the upsampled first second feature information and then input to the second decoding module.
• the second decoding module performs feature extraction on the concatenated second third feature information and upsampled first second feature information to obtain the second second feature information of the reconstructed image, and the second second feature information is input into the third upsampling unit.
• the third upsampling unit upsamples the second second feature information to H × W × 64.
  • the second CBAM performs feature extraction on the first first feature information, and outputs the third third feature information of the reconstructed image.
  • the third third feature information is concatenated with the upsampled second second feature information and input to the third decoding module.
  • the third decoding module performs feature extraction on the concatenated third third feature information and the up-sampled second second feature information to obtain the third second feature information of the reconstructed image.
• the first CBAM performs feature extraction on the initial feature map of the reconstructed image and outputs the fourth third feature information of the reconstructed image.
  • the fourth third feature information is concatenated with the third second feature information and input to the fourth decoding module.
• the fourth decoding module performs feature extraction on the concatenated fourth third feature information and third second feature information to obtain the fourth second feature information of the reconstructed image, and the fourth second feature information is input into the second convolutional layer; its size is H × W × 32.
• the second convolutional layer processes the fourth second feature information and outputs the HDR image of the reconstructed image; the size of the HDR image is H × W × 3 (an end-to-end sketch of this model is given below).
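• Pulling the pieces above together, the following hedged PyTorch sketch wires up the dynamic conversion model with the channel counts, 2 × 2 max-pooling down-sampling, 2× bilinear up-sampling and skip-level CBAMs described in this walkthrough; the CBAM reduction ratio, the 7 × 7 spatial kernel, the activation placements and the ReLU on the output head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Encoding/decoding convolutional block (see the ConvBlock sketch above)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.SiLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.SiLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 1),
    )

class CBAM(nn.Module):
    """Compact convolutional attention module: channel attention followed by spatial attention."""
    def __init__(self, ch: int, r: int = 16):
        super().__init__()
        hidden = max(ch // r, 1)
        self.mlp = nn.Sequential(nn.Linear(ch, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, ch))
        self.conv = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        mc = torch.sigmoid(self.mlp(F.adaptive_max_pool2d(f, 1).flatten(1)) +
                           self.mlp(F.adaptive_avg_pool2d(f, 1).flatten(1))).view(b, c, 1, 1)
        f = f * mc                                     # first multiplication unit
        ms = torch.sigmoid(self.conv(torch.cat([f.max(1, keepdim=True).values,
                                                f.mean(1, keepdim=True)], dim=1)))
        return f * ms                                  # second multiplication unit

class DynamicConversionModel(nn.Module):
    """U-Net-style dynamic conversion model with CBAM skip connections and the channel counts above."""
    def __init__(self):
        super().__init__()
        self.first_conv = nn.Conv2d(3, 32, 3, padding=1)               # first convolutional layer
        self.enc = nn.ModuleList([conv_block(32, 64), conv_block(64, 128),
                                  conv_block(128, 256), conv_block(256, 512)])
        self.cbam = nn.ModuleList([CBAM(32), CBAM(64), CBAM(128), CBAM(256)])
        self.up_conv = nn.ModuleList([nn.Conv2d(512, 256, 3, padding=1),
                                      nn.Conv2d(256, 128, 3, padding=1),
                                      nn.Conv2d(128, 64, 3, padding=1)])
        self.dec = nn.ModuleList([conv_block(512, 256), conv_block(256, 128),
                                  conv_block(128, 64), conv_block(96, 32)])
        self.head = nn.Sequential(nn.Conv2d(32, 3, 1), nn.ReLU(inplace=True))  # second conv layer

    def _up(self, f: torch.Tensor, conv: nn.Module) -> torch.Tensor:
        """Up-sampling unit: 2x bilinear interpolation followed by a 3x3 convolution."""
        return conv(F.interpolate(f, scale_factor=2, mode="bilinear", align_corners=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f0 = self.first_conv(x)                                        # H x W x 32
        e1 = self.enc[0](f0)                                           # H x W x 64
        e2 = self.enc[1](F.max_pool2d(e1, 2))                          # H/2 x W/2 x 128
        e3 = self.enc[2](F.max_pool2d(e2, 2))                          # H/4 x W/4 x 256
        e4 = self.enc[3](F.max_pool2d(e3, 2))                          # H/8 x W/8 x 512
        d1 = self.dec[0](torch.cat([self.cbam[3](e3), self._up(e4, self.up_conv[0])], dim=1))
        d2 = self.dec[1](torch.cat([self.cbam[2](e2), self._up(d1, self.up_conv[1])], dim=1))
        d3 = self.dec[2](torch.cat([self.cbam[1](e1), self._up(d2, self.up_conv[2])], dim=1))
        d4 = self.dec[3](torch.cat([self.cbam[0](f0), d3], dim=1))     # H x W x 32
        return self.head(d4)                                           # H x W x 3 HDR image
```

• As a quick check, DynamicConversionModel()(torch.rand(1, 3, 256, 256)) returns a 1 × 3 × 256 × 256 tensor, matching the H × W × 3 HDR output described above.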
  • the above-mentioned dynamic conversion model is used to convert the reconstructed image with a low dynamic range into an image with a high dynamic range, and the whole conversion process is simple and low in cost.
  • the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic transformation model includes at least one of a reconstruction loss function, a perceptual loss function, and a style loss function.
• the loss function of the dynamic conversion model is as shown in the following formula: Loss = L1 + λs·Lst + λp·Lp
• Loss is the loss function of the dynamic conversion model; L1 is the reconstruction loss function; Lst is the perceptual loss function; Lp is the style loss function; λs and λp are hyperparameters.
• the reconstruction loss function of the dynamic conversion model is determined based on the error between the compressed tone mapping value of the true value of the HDR image and the compressed tone mapping value of the predicted value of the HDR image, where the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
• the reconstruction loss function of the dynamic conversion model is determined based on the following formula: L1 = ‖T(H) − T(GT)‖_1
• L1 represents the reconstruction loss function; T(x) is the compressed tone mapping value of x, where x is H or GT; H is the predicted value output by the dynamic conversion model when training the dynamic conversion model; GT is the true value of the training image; ‖·‖_1 indicates the L1 norm; and μ is the preset parameter of the compressed tone mapping function.
• the perceptual loss function of the dynamic conversion model is determined based on the error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone mapping value of the predicted HDR image in the layer-l feature map of the pre-training model, and the second feature value is the feature value corresponding to the compressed tone mapping value of the true HDR image in the layer-l feature map; the compressed tone mapping value of the predicted HDR image is determined according to the preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
• the perceptual loss function of the dynamic conversion model is determined based on the following formula:
• Lp represents the perceptual loss function
• φ_l represents the feature map of the l-th layer of the pre-training model
• its size is C_l × H_l × W_l.
• the style loss function of the dynamic conversion model is determined based on the error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone mapping value of the predicted HDR image in the Gram matrix of the layer-l feature map of the pre-training model, and the second element value is the element value corresponding to the compressed tone mapping value of the true HDR image in the Gram matrix; the compressed tone mapping value of the predicted HDR image is determined according to the preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the style loss function of the dynamic transformation model is determined based on the following formula:
• the left-hand side of the formula represents the style loss function
• G(·) is the Gram matrix of the l-th layer feature map of the pre-trained model
• φ_l represents the feature map of the l-th layer of the pre-training model, whose size is C_l × H_l × W_l
• K_l = C_l·H_l·W_l.
• the reconstructed image with a low dynamic range is converted into an image with a high dynamic range by using the above dynamic conversion model, and the whole conversion process is simple and low in cost.
• by combining the reconstruction loss, perceptual loss and style loss to reduce reconstruction distortion, artifacts and abnormal tones of high dynamic range images, the decoded image quality is further improved under the premise of ensuring the bit rate.
  • Fig. 8 is a schematic flow chart of an image processing method provided by an embodiment of the present application. As shown in Fig. 8, the method includes:
• the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module; the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1th encoding module to obtain the i-th first feature information of the LDR image, and the N-i+1th decoding module is used to perform feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR image to obtain the N-i+1th second feature information of the LDR image.
• the sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this article generally indicates that the contextual objects are an "or" relationship.
  • FIG. 9 is a schematic block diagram of an image decoding device provided by an embodiment of the present application.
  • the image decoding device may be the decoder shown in FIG. 3 , or a component in the decoder, such as a processor in the decoder.
  • the image decoding device 10 may include:
  • Decoding unit 11 configured to decode the code stream to obtain a reconstructed image
  • a processing unit 12 configured to input the reconstructed image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the reconstructed image;
• the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module;
• the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1th encoding module to obtain the i-th first feature information of the reconstructed image;
• the N-i+1th decoding module is used to perform feature extraction on the i-1th first feature information and the N-ith second feature information of the reconstructed image to obtain the N-i+1th second feature information of the reconstructed image;
• the HDR image of the reconstructed image is determined according to the second feature information output by the last decoding module in the N decoding modules, where i is a positive integer less than or equal to N, and N is a positive integer.
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module;
  • the convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image;
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th feature information of the reconstructed image. 1 piece of second characteristic information.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
  • the spatial attention module is used to extract spatial information from the i-1 first feature information and channel attention information of the i-1 first feature information, to obtain the i-1 first feature information Spatial attention information of the first feature information;
  • the i-1 th third feature information of the reconstructed image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  • the convolutional attention module further includes a first multiplication unit
  • the first multiplication unit is configured to multiply the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature Information fusion channel feature information;
  • the spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the convolutional attention module further includes a second multiplication unit
  • the second multiplication unit is used to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the reconstructed image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
  • the first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
• the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information;
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit includes a pooling layer.
  • the first spatial compression unit is a maximum pooling layer, and/or the second spatial compression unit is an average pooling layer.
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function
  • the first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
  • the first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information, to obtain the i-1 first feature information Spatial feature information;
  • the spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  • the first channel compression unit and/or the second channel compression unit includes a pooling layer.
  • the first channel compression unit is a maximum pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function
  • the second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
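The channel-then-spatial attention structure described in the preceding items can be illustrated with a short PyTorch sketch. This is only a minimal illustration under assumptions the application does not state: the channel reduction ratio of the shared MLP, the 7x7 kernel of the spatial convolution, and the use of sigmoid as the first and second activation functions are all choices made here for concreteness.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP (channel feature extraction unit) applied to both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.sigmoid = nn.Sigmoid()  # assumed first activation function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        max_desc = torch.amax(x, dim=(2, 3))             # first spatial compression (max pooling)
        avg_desc = torch.mean(x, dim=(2, 3))             # second spatial compression (average pooling)
        fused = self.mlp(max_desc) + self.mlp(avg_desc)  # first addition unit
        return self.sigmoid(fused).view(b, c, 1, 1)      # channel attention, spatial size 1x1

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)  # spatial feature extraction
        self.sigmoid = nn.Sigmoid()  # assumed second activation function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_map = torch.amax(x, dim=1, keepdim=True)     # first channel compression (max pooling)
        avg_map = torch.mean(x, dim=1, keepdim=True)     # second channel compression (average pooling)
        feat = self.conv(torch.cat([max_map, avg_map], dim=1))
        return self.sigmoid(feat)                        # spatial attention, feature dimension 1

class ConvAttention(nn.Module):
    """Channel attention, then spatial attention, with the two multiplication units."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ChannelAttention(channels)
        self.spatial_att = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = x * self.channel_att(x)          # first multiplication -> fused channel feature information
        return fused * self.spatial_att(fused)   # second multiplication -> third feature information
```

On each skip connection, the output of `ConvAttention` plays the role of the third feature information that is passed to the corresponding decoding module.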
  • the dynamic conversion model further includes at least one downsampling unit
  • the down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a max pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N coding modules includes at least one convolutional block, wherein the parameters of the convolutional blocks included in each of the N coding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or,
  • if i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
  • if i is equal to 1, the i-1th first feature information is determined according to the reconstructed image; or,
  • if i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th encoding module.
  • the N-i+1th decoding module is configured to perform feature extraction on the concatenated feature information of the i-1th third feature information and the N-ith second feature information, to obtain the N-i+1th second feature information of the reconstructed image.
  • the dynamic conversion model further includes a first convolutional layer
  • the first convolutional layer is used to perform feature extraction on the reconstructed image to obtain an initial feature map of the reconstructed image, and input the initial feature map to the first coding module and the first convolutional attention module respectively middle.
  • the dynamic conversion model further includes a second convolutional layer
  • the second convolutional layer is used to perform feature extraction on the second feature information of the reconstructed image output by the last decoding module, and output an HDR image of the reconstructed image.
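Putting the pieces together, the encoder/decoder arrangement with attention on the skip connections can be sketched as a small U-Net-style network. The number of modules (N = 3 here), the channel widths, and the two-convolution block used for each encoding/decoding module are illustrative assumptions; `ConvAttention` is the module from the sketch above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # One "convolutional block"; each encoding/decoding module may stack one or more of these.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class DynamicConversionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.first_conv = nn.Conv2d(3, 32, 3, padding=1)   # first convolutional layer
        self.enc1, self.enc2, self.enc3 = conv_block(32, 64), conv_block(64, 128), conv_block(128, 256)
        self.att0, self.att1, self.att2 = ConvAttention(32), ConvAttention(64), ConvAttention(128)
        self.pool = nn.MaxPool2d(2)                         # downsampling unit (max pooling)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)  # upsampling unit
        self.dec1 = conv_block(256 + 128, 128)
        self.dec2 = conv_block(128 + 64, 64)
        self.dec3 = conv_block(64 + 32, 32)
        self.second_conv = nn.Conv2d(32, 3, 3, padding=1)   # second convolutional layer

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        f0 = self.first_conv(image)                  # initial feature map
        f1 = self.enc1(f0)                           # 1st first feature information
        f2 = self.enc2(self.pool(f1))                # 2nd first feature information
        f3 = self.enc3(self.pool(f2))                # 3rd first feature information (fed to the 1st decoder)
        d1 = self.dec1(torch.cat([self.up(f3), self.att2(f2)], dim=1))  # 1st second feature information
        d2 = self.dec2(torch.cat([self.up(d1), self.att1(f1)], dim=1))  # 2nd second feature information
        d3 = self.dec3(torch.cat([d2, self.att0(f0)], dim=1))           # 3rd second feature information
        return self.second_conv(d3)                  # HDR prediction
```

This mirrors the description above: attention-filtered skip features are concatenated with the upsampled decoder features, and the final convolution maps the last second feature information to the HDR output.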
  • the initial parameters of the dynamic conversion model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic conversion model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the loss function of the dynamic conversion model is as shown in the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • where Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the style loss function, Lp is the perceptual loss function, and λs and λp are hyperparameters; a sketch of this weighted sum is given below.
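A minimal sketch of the weighted sum referenced above; the placeholder values for the hyperparameters λs and λp are assumptions, since the application does not specify them.

```python
import torch

def total_loss(l1: torch.Tensor, l_st: torch.Tensor, l_p: torch.Tensor,
               lambda_s: float = 1e-2, lambda_p: float = 1e-3) -> torch.Tensor:
    # Loss = L1 + λs·Lst + λp·Lp, where l1, l_st and l_p are the reconstruction,
    # style and perceptual terms defined in the following items.
    return l1 + lambda_s * l_st + lambda_p * l_p
```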
  • the reconstruction loss function of the dynamic conversion model is determined according to the error between the compressed tone mapping value of the true value of the HDR image and the compressed tone mapping value of the predicted value of the HDR image, wherein the compressed tone mapping value of the predicted value is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
  • L1 = ‖T(H) - T(GT)‖1
  • where T(·) is the preset compressed tone mapping function, H is the predicted value output by the dynamic conversion model during training, GT is the true value of the training image, ‖·‖1 denotes the L1 norm, and the compression parameter of the tone mapping function is a preset parameter; an illustrative sketch of this term follows below.
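A hedged sketch of this reconstruction term. The application only states that the compressed tone mapping function and its compression parameter are preset; the log-based (μ-law style) compressor and the value μ = 5000 used below are assumptions for illustration, with HDR values assumed to be scaled to [0, 1].

```python
import math
import torch

def compress_tone_map(x: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
    # Assumed compressor: T(x) = log(1 + mu*x) / log(1 + mu).
    return torch.log1p(mu * x) / math.log1p(mu)

def reconstruction_loss(pred_hdr: torch.Tensor, gt_hdr: torch.Tensor) -> torch.Tensor:
    # L1 distance between the compressed tone-mapped prediction and ground truth.
    return torch.mean(torch.abs(compress_tone_map(pred_hdr) - compress_tone_map(gt_hdr)))
```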
  • the perceptual loss function of the dynamic conversion model is determined based on an error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the feature map of the l-th layer of the pre-training model, and the second feature value is the feature value corresponding to the compressed tone mapping value of the true value of the HDR image in the feature map of the l-th layer; the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the perceptual loss function of the dynamic conversion model is determined based on the following formula:
  • Lp = ‖Φl(T(H)) - Φl(T(GT))‖1 / (Cl·Hl·Wl)
  • where Lp represents the perceptual loss function, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the true value of the training image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, and Φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl; an illustrative sketch of this term follows below.
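A hedged sketch of this perceptual term. The application does not name the pre-training model or the layer l; using torchvision's VGG19 features with a fixed layer index is an assumption, as is the normalisation by Cl·Hl·Wl and the omission of any input normalisation the pre-trained network might expect.

```python
import torch
import torchvision

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer_index: int = 16):
        super().__init__()
        weights = torchvision.models.VGG19_Weights.DEFAULT
        self.features = torchvision.models.vgg19(weights=weights).features[: layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # the pre-training parameters stay frozen

    def forward(self, pred_tm: torch.Tensor, gt_tm: torch.Tensor) -> torch.Tensor:
        # pred_tm / gt_tm: compressed tone-mapped prediction T(H) and ground truth T(GT).
        phi_pred, phi_gt = self.features(pred_tm), self.features(gt_tm)
        c, h, w = phi_pred.shape[1:]
        # ‖Φl(T(H)) - Φl(T(GT))‖1 / (Cl·Hl·Wl), averaged over the batch.
        return torch.abs(phi_pred - phi_gt).sum(dim=(1, 2, 3)).div(c * h * w).mean()
```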
  • the style loss function of the dynamic conversion model is determined based on an error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the Gram matrix of the l-th layer feature map of the pre-training model, and the second element value is the corresponding element value in the Gram matrix of the compressed tone mapping value of the true value of the HDR image; the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the style loss function of the dynamic conversion model is determined based on the following formula:
  • Lst = ‖G(Φl(T(H))) - G(Φl(T(GT)))‖1
  • where Lst represents the style loss function, G(·) is the Gram matrix of the l-th layer features of the pre-training model, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the HDR true value of the training image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, Φl represents the feature map of layer l of the pre-training model with size Cl × Hl × Wl, and Kl equals Cl·Hl·Wl and is used to normalize the Gram matrix; an illustrative sketch of this term follows below.
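A hedged sketch of this style term, reusing the layer-l features from the perceptual-loss sketch. Normalising the Gram matrix by Kl = Cl·Hl·Wl follows the description above; everything else (batch handling, choice of layer) is an assumption.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: layer-l feature map of size (B, Cl, Hl, Wl); returns G(Φl(x)) / Kl with Kl = Cl*Hl*Wl.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(phi_pred: torch.Tensor, phi_gt: torch.Tensor) -> torch.Tensor:
    # L1 distance between the Gram matrices of the tone-mapped prediction and ground truth features.
    return torch.abs(gram_matrix(phi_pred) - gram_matrix(phi_gt)).sum(dim=(1, 2)).mean()
```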
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 10 shown in FIG. 9 may correspond to the corresponding subject in the image decoding method of the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 10 are respectively intended to implement the corresponding processes in the image decoding method; for the sake of brevity, they are not repeated here.
  • Fig. 10 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device 20 may include:
  • An acquisition unit 21 configured to acquire a low dynamic range LDR image to be processed
  • a processing unit 22 configured to input the LDR image into a dynamic conversion model for dynamic conversion to obtain a high dynamic range HDR image of the LDR image;
  • the dynamic conversion model includes: N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module; the i-th encoding module is used to perform feature extraction on the i-1th first feature information output by the i-1th encoding module to obtain the i-th first feature information of the LDR image;
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR image to obtain the N-i+1th second feature information of the LDR image;
  • the HDR image of the LDR image is determined according to the second feature information output by the last decoding module in the N decoding modules, where i is a positive integer less than or equal to N and N is a positive integer.
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1-th decoding module;
  • the convolutional attention module is used to extract spatial information and channel information from the i-1 first feature information to obtain the i-1 third feature information of the LDR image;
  • the N-i+1th decoding module is used to perform feature extraction on the i-1th third feature information and the N-ith second feature information to obtain the N-i+th feature information of the LDR image. 1 piece of second characteristic information.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the channel attention module is used to extract the channel information of the i-1 first feature information, and obtain the channel attention information of the i-1 first feature information;
  • the spatial attention module is used to extract spatial information from the i-1 first feature information and channel attention information of the i-1 first feature information, to obtain the i-1 first feature information Spatial attention information of the first feature information;
  • the i-1 th third feature information of the LDR image is determined according to the channel attention information and the spatial attention information of the i-1 th first feature information.
  • the convolutional attention module further includes a first multiplication unit
  • the first multiplication unit is configured to multiply the i-1 first feature information and the channel attention information of the i-1 first feature information to obtain the i-1 first feature Information fusion channel feature information;
  • the spatial attention module is configured to extract spatial information from the fused channel feature information of the i-1 first feature information to obtain the spatial attention information of the i-1 first feature information.
  • the convolutional attention module further includes a second multiplication unit
  • the second multiplication unit is configured to multiply the fusion channel feature information and the spatial attention information of the i-1 first feature information to obtain the i-1 third feature information of the LDR image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit;
  • the first spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain first spatial compression information of the i-1 first feature information;
  • the second spatial compression unit is configured to perform spatial dimension compression on the i-1 first feature information to obtain second spatial compression information of the i-1 first feature information;
  • the channel feature extraction unit is configured to perform channel feature extraction on the first spatial compression information of the i-1th first feature information to obtain the first channel information of the i-1th first feature information, and to perform channel feature extraction on the second spatial compression information of the i-1th first feature information to obtain the second channel information of the i-1th first feature information;
  • the channel attention information of the i-1 first feature information is determined according to the first channel information and the second channel information of the i-1 first feature information.
  • the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  • the first spatial compression unit is a max pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function
  • the first adding unit is configured to add the first channel information and the second channel information of the i-1 pieces of first feature information to obtain the fusion channel information of the i-1 pieces of first feature information;
  • the first activation function is used to perform nonlinear processing on the fused channel information of the i-1 pieces of first feature information to obtain channel attention information of the i-1 th piece of first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit;
  • the first channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information, to obtain the first channel compression information of the i-1 first feature information;
  • the second channel compression unit is configured to perform channel dimension compression on the fused channel feature information of the i-1 first feature information to obtain second channel compression information of the i-1 first feature information;
  • the spatial feature extraction unit is configured to perform spatial feature extraction on the first channel compressed information and the second channel compressed information of the i-1 first feature information, to obtain the i-1 first feature information Spatial feature information;
  • the spatial attention information of the i-1 th first feature information is determined according to the spatial feature information of the i-1 th first feature information.
  • the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  • the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function
  • the second activation function is used to perform nonlinear processing on the spatial feature information of the i-1 th first feature information to obtain the spatial attention information of the i-1 th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  • the dynamic conversion model further includes at least one downsampling unit
  • the down-sampling unit is used for down-sampling the feature information output by the encoding module in a spatial dimension.
  • the downsampling unit is a max pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit
  • the up-sampling unit is used for up-sampling the feature information output by the decoding module in a spatial dimension.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, wherein parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or,
  • if i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or,
  • if i is equal to 1, the i-1th first feature information is determined according to the LDR image; or,
  • if i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th encoding module.
  • the N-i+1th decoding module is used to perform feature extraction on the concatenated feature information of the i-1th third feature information and the N-ith second feature information , to obtain the N-i+1th second feature information of the LDR image.
  • the dynamic transformation model further includes a first convolutional layer
  • the first convolutional layer is used to extract features from the LDR image, obtain an initial feature map of the LDR image, and input the initial feature map to the first coding module and the first convolutional attention module respectively middle.
  • the dynamic transformation model further includes a second convolutional layer
  • the second convolutional layer is used to perform feature extraction on the second feature information of the LDR image output by the last decoding module, and output an HDR image of the LDR image.
  • the initial parameters of the dynamic transformation model during training are pre-training parameters obtained during pre-training of the pre-training model.
  • the loss function of the dynamic transformation model includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the loss function of the dynamic conversion model is as shown in the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • where Loss is the loss function of the dynamic conversion model, L1 is the reconstruction loss function, Lst is the style loss function, Lp is the perceptual loss function, and λs and λp are hyperparameters.
  • the reconstruction loss function of the dynamic conversion model is determined according to the error between the compressed tone mapping value of the true value of the HDR image and the compressed tone mapping value of the predicted value of the HDR image, wherein the compressed tone mapping value of the predicted value is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the reconstruction loss function of the dynamic conversion model is determined based on the following formula:
  • L1 = ‖T(H) - T(GT)‖1
  • where T(·) is the preset compressed tone mapping function, H is the predicted value output by the dynamic conversion model during training, GT is the true value of the training image, ‖·‖1 denotes the L1 norm, and the compression parameter of the tone mapping function is a preset parameter.
  • the perceptual loss function of the dynamic conversion model is determined based on an error between a first feature value and a second feature value, wherein the first feature value is the feature value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the feature map of the l-th layer of the pre-training model, and the second feature value is the feature value corresponding to the compressed tone mapping value of the true value of the HDR image in the feature map of the l-th layer; the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the perceptual loss function of the dynamic conversion model is determined based on the following formula:
  • Lp = ‖Φl(T(H)) - Φl(T(GT))‖1 / (Cl·Hl·Wl)
  • where Lp represents the perceptual loss function, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the true value of the training image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, and Φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl.
  • the style loss function of the dynamic conversion model is determined based on an error between a first element value and a second element value, wherein the first element value is the element value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the Gram matrix of the l-th layer feature map of the pre-training model, and the second element value is the corresponding element value in the Gram matrix of the compressed tone mapping value of the true value of the HDR image; the compressed tone mapping value of the predicted value of the HDR image is determined according to a preset compressed tone mapping function and the predicted value of the HDR image, and the compressed tone mapping value of the true value of the HDR image is determined according to the compressed tone mapping function and the true value of the HDR image.
  • the style loss function of the dynamic conversion model is determined based on the following formula:
  • Lst = ‖G(Φl(T(H))) - G(Φl(T(GT)))‖1
  • where Lst represents the style loss function, G(·) is the Gram matrix of the l-th layer features of the pre-training model, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value output by the dynamic conversion model during training, GT is the HDR true value of the training image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, Φl represents the feature map of layer l of the pre-training model with size Cl × Hl × Wl, and Kl equals Cl·Hl·Wl and is used to normalize the Gram matrix.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 20 shown in FIG. 10 may correspond to the corresponding subject in the image processing method of the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 20 are respectively intended to implement the corresponding processes in the image processing method; for the sake of brevity, they are not repeated here.
  • Fig. 11 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • model training device 40 comprises:
  • An acquisition unit 41 configured to acquire a low dynamic range LDR training image and a true value of a high dynamic range HDR image of the LDR training image;
  • the processing unit 42 is configured to input the LDR training image into the dynamic conversion model, and to perform feature extraction on the i-1th first feature information through the i-th encoding module to obtain the i-th first feature information of the LDR training image, wherein the dynamic conversion model includes N encoding modules connected in series and N decoding modules connected in series, the output of the last encoding module in the N encoding modules is connected to the input of the first decoding module in the N decoding modules, and the i-th encoding module is skip-connected to the N-i+1th decoding module, where i is a positive integer less than or equal to N and N is a positive integer; and to perform feature extraction on the i-1th first feature information and the N-ith second feature information of the LDR training image through the N-i+1th decoding module to obtain the N-i+1th second feature information of the LDR training image.
  • the dynamic conversion model further includes: a convolutional attention module located in the skip connection between the i-th encoding module and the N-i+1th decoding module, and the above-mentioned processing unit 42 is specifically configured to perform spatial information and channel information extraction on the i-1th first feature information through the convolutional attention module to obtain the i-1th third feature information of the LDR training image, and to perform feature extraction on the i-1th third feature information and the N-ith second feature information through the N-i+1th decoding module to obtain the N-i+1th second feature information of the LDR training image.
  • the convolutional attention module includes a channel attention module and a spatial attention module
  • the above-mentioned processing unit 42 is specifically configured to extract channel information from the i-1th first feature information through the channel attention module to obtain the channel attention information of the i-1th first feature information; to extract spatial information from the fusion channel feature information of the i-1th first feature information through the spatial attention module to obtain the spatial attention information of the i-1th first feature information, where the fusion channel feature information of the i-1th first feature information is determined according to the i-1th first feature information and the channel attention information of the i-1th first feature information; and to determine the i-1th third feature information of the LDR training image according to the channel attention information and the spatial attention information of the i-1th first feature information.
  • the convolutional attention module further includes a first multiplication unit, and the above-mentioned processing unit 42 is further configured to multiply the i-1th first feature information by the channel attention information of the i-1th first feature information through the first multiplication unit, to obtain the fusion channel feature information of the i-1th first feature information.
  • the convolutional attention module further includes a second multiplication unit, and the above-mentioned processing unit 42 is specifically configured to multiply the fusion channel feature information of the i-1th first feature information by its spatial attention information through the second multiplication unit, to obtain the i-1th third feature information of the LDR training image.
  • the channel attention module includes: a first spatial compression unit, a second spatial compression unit, and a channel feature extraction unit, and the above processing unit 42 is specifically configured to perform spatial dimension compression on the i-1th first feature information through the first spatial compression unit to obtain the first spatial compression information of the i-1th first feature information; to perform spatial dimension compression on the i-1th first feature information through the second spatial compression unit to obtain the second spatial compression information of the i-1th first feature information; to perform channel feature extraction on the first spatial compression information of the i-1th first feature information through the channel feature extraction unit to obtain the first channel information of the i-1th first feature information; to perform channel feature extraction on the second spatial compression information of the i-1th first feature information through the channel feature extraction unit to obtain the second channel information of the i-1th first feature information; and to determine the channel attention information of the i-1th first feature information according to the first channel information and the second channel information of the i-1th first feature information.
  • the first spatial compression unit and/or the second spatial compression unit comprises a pooling layer.
  • the first spatial compression unit is a max pooling layer
  • the second spatial compression unit is an average pooling layer
  • the channel feature extraction unit is a multi-layer perceptron MLP.
  • the channel attention module further includes: a first addition unit and a first activation function, and the above-mentioned processing unit 42 is specifically configured to add the first channel information and the second channel information of the i-1th first feature information through the first addition unit to obtain the fusion channel information of the i-1th first feature information, and to perform nonlinear processing on the fusion channel information of the i-1th first feature information through the first activation function to obtain the channel attention information of the i-1th first feature information.
  • the spatial attention module includes: a first channel compression unit, a second channel compression unit, and a spatial feature extraction unit, and the above-mentioned processing unit 42 is specifically configured to perform channel dimension compression on the fusion channel feature information of the i-1th first feature information through the first channel compression unit to obtain the first channel compression information of the i-1th first feature information; to perform channel dimension compression on the fusion channel feature information of the i-1th first feature information through the second channel compression unit to obtain the second channel compression information of the i-1th first feature information; to perform spatial feature extraction on the first channel compression information and the second channel compression information of the i-1th first feature information through the spatial feature extraction unit to obtain the spatial feature information of the i-1th first feature information; and to determine the spatial attention information of the i-1th first feature information according to the spatial feature information of the i-1th first feature information.
  • the first channel compression unit and/or the second channel compression unit comprises a pooling layer.
  • the first channel compression unit is a max pooling layer, and/or the second channel compression unit is an average pooling layer.
  • the spatial feature extraction unit is a convolutional layer.
  • the spatial attention module further includes a second activation function, and the above-mentioned processing unit 42 is specifically configured to perform nonlinear processing on the spatial feature information of the i-1th first feature information through the second activation function to obtain the spatial attention information of the i-1th first feature information.
  • the spatial dimension of the channel attention information of the i-1 th first feature information is 1 ⁇ 1.
  • the feature dimension of the spatial attention information of the i-1 th first feature information is 1.
  • the dynamic conversion model further includes at least one down-sampling unit, the above-mentioned processing unit 42 is further configured to down-sample the feature information output by the encoding module through the down-sampling unit in a spatial dimension.
  • the downsampling unit is a maximum pooling layer.
  • the dynamic conversion model further includes at least one upsampling unit, the above-mentioned processing unit 42 is further configured to perform spatial dimension upsampling on the feature information output by the decoding module through the upsampling unit.
  • the upsampling unit is a bilinear interpolation unit.
  • each of the N encoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N encoding modules are not completely the same.
  • each of the N decoding modules includes at least one convolutional block, and parameters of the convolutional blocks included in each of the N decoding modules are not completely the same.
  • if i is equal to N, the N-i-th second feature information is determined according to the N-th first feature information output by the N-th encoding module; or, if i is less than N, the N-i-th second feature information is determined according to the N-i-th second feature information output by the N-i-th decoding module; or, if i is equal to 1, the i-1th first feature information is determined according to the LDR training image; or, if i is greater than 1, the i-1th first feature information is determined according to the first feature information output by the i-1th encoding module.
  • the above processing unit 42 is specifically configured to concatenate the i-1th third feature information and the N-ith second feature information, and to input the concatenated feature information into the N-i+1th decoding module for feature extraction, to obtain the N-i+1th second feature information of the LDR training image.
  • the dynamic conversion model further includes a first convolutional layer, and the above-mentioned processing unit 42 is further configured to perform feature extraction on the LDR training image through the first convolutional layer to obtain the initial feature map of the LDR training image, to input the initial feature map into the first encoding module and the first convolutional attention module respectively, to obtain the first first feature information output by the first encoding module, and to obtain the first third feature information output by the first convolutional attention module.
  • the dynamic conversion model further includes a second convolutional layer, and the above-mentioned processing unit 42 is specifically configured to perform feature extraction on the second feature information of the LDR training image output by the last decoding module through the second convolutional layer, and to output the HDR image prediction value of the LDR training image.
  • the processing unit 42 is further configured to obtain pre-training parameters obtained during pre-training of the pre-training model; and determine the pre-training parameters as initial parameters of the dynamic transformation model.
  • the above processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image of the LDR training image and the true value of the HDR image of the LDR training image according to a preset loss function.
  • the preset loss function includes at least one of a reconstruction loss function, a perceptual loss function and a style loss function.
  • the above processing unit 42 is specifically configured to determine the reconstruction loss between the predicted value of the HDR image and the true value of the HDR image; determine the difference between the predicted value of the HDR image and the true value of the HDR image Perceptual loss between; determine the style loss between the predicted value of the HDR image and the true value of the HDR image; according to the reconstruction loss, perceptual loss and style between the predicted value of the HDR image and the true value of the HDR image Loss, determining the target loss between the predicted value of the HDR image and the true value of the HDR image.
  • the above-mentioned processing unit 42 is specifically configured to determine the target loss between the predicted value of the HDR image and the true value of the HDR image according to the following formula:
  • Loss = L1 + λs·Lst + λp·Lp
  • where Loss is the target loss, L1 is the reconstruction loss, Lst is the style loss, Lp is the perceptual loss, and λs and λp are hyperparameters; a sketch of a corresponding training step is given below.
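As referenced above, one training step can be sketched by chaining the earlier pieces together. The helper names (`DynamicConversionNet`, `compress_tone_map`, `PerceptualLoss`, `style_loss`) come from the sketches earlier in this document and are assumptions rather than the application's reference implementation; the optimiser and the hyperparameter values are likewise illustrative.

```python
import torch

def train_step(model, perceptual, optimizer, ldr, gt_hdr,
               lambda_s: float = 1e-2, lambda_p: float = 1e-3) -> torch.Tensor:
    optimizer.zero_grad()
    pred_hdr = model(ldr)                                   # HDR prediction of the LDR training image
    pred_tm, gt_tm = compress_tone_map(pred_hdr), compress_tone_map(gt_hdr)
    l1 = torch.abs(pred_tm - gt_tm).mean()                  # reconstruction loss
    phi_pred, phi_gt = perceptual.features(pred_tm), perceptual.features(gt_tm)
    l_p = torch.abs(phi_pred - phi_gt).mean()               # perceptual loss (simplified normalisation)
    l_st = style_loss(phi_pred, phi_gt)                     # style loss on the Gram matrices
    loss = l1 + lambda_s * l_st + lambda_p * l_p            # Loss = L1 + λs·Lst + λp·Lp
    loss.backward()
    optimizer.step()
    return loss.detach()

# Example wiring (illustrative only):
# model, perceptual = DynamicConversionNet(), PerceptualLoss()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```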
  • the above-mentioned processing unit 42 is specifically configured to determine the compressed tone mapping value of the predicted value of the HDR image according to a preset compressed tone mapping function; to determine the compressed tone mapping value of the true value of the HDR image according to the compressed tone mapping function; and to determine the reconstruction loss according to the error between the compressed tone mapping value of the true value of the HDR image and the compressed tone mapping value of the predicted value of the HDR image.
  • the reconstruction loss is determined according to the following formula:
  • L1 = ‖T(H) - T(GT)‖1
  • where L1 represents the reconstruction loss, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value of the HDR image output by the dynamic conversion model, GT is the true value of the HDR image, ‖·‖1 denotes the L1 norm, and the compression parameter of the tone mapping function is a preset parameter.
  • the above-mentioned processing unit 42 is specifically configured to obtain the feature map of the l-th layer of the pre-training model; to determine the compressed tone mapping value of the predicted value of the HDR image according to a preset compressed tone mapping function; to determine the compressed tone mapping value of the true value of the HDR image according to the compressed tone mapping function; to determine the first feature value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the feature map of the l-th layer; to determine the second feature value corresponding to the compressed tone mapping value of the true value of the HDR image in the feature map of the l-th layer; and to determine the perceptual loss according to the error between the first feature value and the second feature value.
  • the perceptual loss is determined according to the following formula:
  • Lp = ‖Φl(T(H)) - Φl(T(GT))‖1 / (Cl·Hl·Wl)
  • where Lp represents the perceptual loss, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value of the HDR image output by the dynamic conversion model, GT is the true value of the HDR image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, and Φl represents the feature map of layer l of the pre-training model, whose size is Cl × Hl × Wl.
  • the above-mentioned processing unit 42 is specifically configured to obtain the Gram matrix of the l-th layer feature map of the pre-training model; to determine the compressed tone mapping value of the predicted value of the HDR image according to a preset compressed tone mapping function; to determine the compressed tone mapping value of the true value of the HDR image according to the compressed tone mapping function; to determine the first element value corresponding to the compressed tone mapping value of the predicted value of the HDR image in the Gram matrix; to determine the second element value corresponding to the compressed tone mapping value of the true value of the HDR image in the Gram matrix; and to determine the style loss according to the error between the first element value and the second element value.
  • the style loss is determined according to the following formula:
  • Lst = ‖G(Φl(T(H))) - G(Φl(T(GT)))‖1
  • where Lst represents the style loss, G(·) is the Gram matrix of the l-th layer features of the pre-training model, T(·) is the preset compressed tone mapping function applied to x, with x being H or GT, H is the predicted value of the HDR image output by the dynamic conversion model, GT is the true value of the HDR image, ‖·‖1 denotes the L1 norm, the compression parameter of the tone mapping function is a preset parameter, Φl represents the feature map of layer l of the pre-training model with size Cl × Hl × Wl, and Kl equals Cl·Hl·Wl and is used to normalize the Gram matrix.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 40 shown in FIG. 11 may correspond to the corresponding subject in the model training method of the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 40 are respectively intended to implement the corresponding processes in the model training method; for the sake of brevity, they are not repeated here.
  • the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
  • each step of the method embodiments in the embodiments of the present application can be completed by an integrated logic circuit of the hardware in the processor and/or instructions in the form of software, and the steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of the hardware and software units in the decoding processor.
  • the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the image processing device described in the embodiment of the present application, or a decoder, or a model training device, and the electronic device 30 may include:
  • a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and to transmit the program code 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
  • the processor 32 may include, but is not limited to:
  • a Digital Signal Processor (DSP)
  • an Application Specific Integrated Circuit (ASIC)
  • a Field Programmable Gate Array (FPGA)
  • the memory 33 includes but is not limited to:
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • Static Random Access Memory (SRAM)
  • Dynamic Random Access Memory (DRAM)
  • Synchronous Dynamic Random Access Memory (SDRAM)
  • Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM)
  • Enhanced Synchronous Dynamic Random Access Memory (ESDRAM)
  • Synchlink Dynamic Random Access Memory (SLDRAM)
  • Direct Rambus Random Access Memory (DR RAM)
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the methods provided in the present application.
  • the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include antennas, and the number of antennas may be one or more.
  • the bus system includes not only a data bus, but also a power bus, a control bus and a status signal bus.
  • the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transferred from a website, computer, server, or data center by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)), etc.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to an image decoding method and apparatus, an image processing method and apparatus, and a device. The method comprises the following steps: decoding a code stream to obtain a reconstructed image, and inputting the reconstructed image into a dynamic conversion model for dynamic conversion so as to obtain a high dynamic range (HDR) image of the reconstructed image, the dynamic conversion model comprising N encoding modules and N decoding modules, an i-th encoding module being skip-connected to an (N-i+1)-th decoding module; the i-th encoding module is configured to perform feature extraction on an (i-1)-th piece of first feature information output by an (i-1)-th encoding module so as to obtain an i-th piece of first feature information of the reconstructed image, the (N-i+1)-th decoding module is configured to perform feature extraction on the (i-1)-th piece of first feature information and an (N-i)-th piece of second feature information to obtain an (N-i+1)-th piece of second feature information, and the HDR image is determined according to the second feature information output by the last decoding module. The present invention converts an image having a small dynamic range into an image having a large dynamic range by using the dynamic conversion model; the method is therefore simple and the cost is low.
PCT/CN2021/102173 2021-06-24 2021-06-24 Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif WO2022266955A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180097934.XA CN117441186A (zh) 2021-06-24 2021-06-24 图像解码及处理方法、装置及设备
PCT/CN2021/102173 WO2022266955A1 (fr) 2021-06-24 2021-06-24 Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102173 WO2022266955A1 (fr) 2021-06-24 2021-06-24 Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif

Publications (1)

Publication Number Publication Date
WO2022266955A1 true WO2022266955A1 (fr) 2022-12-29

Family

ID=84543976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102173 WO2022266955A1 (fr) 2021-06-24 2021-06-24 Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif

Country Status (2)

Country Link
CN (1) CN117441186A (fr)
WO (1) WO2022266955A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115776571A (zh) * 2023-02-10 2023-03-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 一种图像压缩方法、装置、设备及存储介质
CN117854138A (zh) * 2024-03-07 2024-04-09 深圳航天信息有限公司 基于大数据的信息采集分析方法、装置、设备及存储介质

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805836A (zh) * 2018-05-31 2018-11-13 大连理工大学 基于深度往复式hdr变换的图像校正方法
CN109447907A (zh) * 2018-09-20 2019-03-08 宁波大学 一种基于全卷积神经网络的单图像增强方法
CN109785263A (zh) * 2019-01-14 2019-05-21 北京大学深圳研究生院 一种基于Retinex的逆色调映射图像转换方法
CN110717868A (zh) * 2019-09-06 2020-01-21 上海交通大学 视频高动态范围反色调映射模型构建、映射方法及装置
US20200074600A1 (en) * 2017-11-28 2020-03-05 Adobe Inc. High dynamic range illumination estimation
CN111292264A (zh) * 2020-01-21 2020-06-16 武汉大学 一种基于深度学习的图像高动态范围重建方法
CN111372006A (zh) * 2020-03-03 2020-07-03 山东大学 一种面向移动端的高动态范围成像方法及系统
US20200265567A1 (en) * 2019-02-18 2020-08-20 Samsung Electronics Co., Ltd. Techniques for convolutional neural network-based multi-exposure fusion of multiple image frames and for deblurring multiple image frames
CN111709900A (zh) * 2019-10-21 2020-09-25 上海大学 一种基于全局特征指导的高动态范围图像重建方法
CN111914938A (zh) * 2020-08-06 2020-11-10 上海金桥信息股份有限公司 一种基于全卷积二分支网络的图像属性分类识别方法
CN111951171A (zh) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 Hdr图像生成方法、装置、可读存储介质及终端设备

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074600A1 (en) * 2017-11-28 2020-03-05 Adobe Inc. High dynamic range illumination estimation
CN108805836A (zh) * 2018-05-31 2018-11-13 大连理工大学 基于深度往复式hdr变换的图像校正方法
CN109447907A (zh) * 2018-09-20 2019-03-08 宁波大学 一种基于全卷积神经网络的单图像增强方法
CN109785263A (zh) * 2019-01-14 2019-05-21 北京大学深圳研究生院 一种基于Retinex的逆色调映射图像转换方法
US20200265567A1 (en) * 2019-02-18 2020-08-20 Samsung Electronics Co., Ltd. Techniques for convolutional neural network-based multi-exposure fusion of multiple image frames and for deblurring multiple image frames
CN111951171A (zh) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 Hdr图像生成方法、装置、可读存储介质及终端设备
CN110717868A (zh) * 2019-09-06 2020-01-21 上海交通大学 视频高动态范围反色调映射模型构建、映射方法及装置
CN111709900A (zh) * 2019-10-21 2020-09-25 上海大学 一种基于全局特征指导的高动态范围图像重建方法
CN111292264A (zh) * 2020-01-21 2020-06-16 武汉大学 一种基于深度学习的图像高动态范围重建方法
CN111372006A (zh) * 2020-03-03 2020-07-03 山东大学 一种面向移动端的高动态范围成像方法及系统
CN111914938A (zh) * 2020-08-06 2020-11-10 上海金桥信息股份有限公司 一种基于全卷积二分支网络的图像属性分类识别方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KINOSHITA YUMA; KIYA HITOSHI: "Convolutional Neural Networks Considering Local and Global Features for Image Enhancement", 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), IEEE, 22 September 2019 (2019-09-22), pages 2110 - 2114, XP033647118, DOI: 10.1109/ICIP.2019.8803194 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115776571A (zh) * 2023-02-10 2023-03-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 一种图像压缩方法、装置、设备及存储介质
CN115776571B (zh) * 2023-02-10 2023-04-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 一种图像压缩方法、装置、设备及存储介质
CN117854138A (zh) * 2024-03-07 2024-04-09 深圳航天信息有限公司 基于大数据的信息采集分析方法、装置、设备及存储介质
CN117854138B (zh) * 2024-03-07 2024-05-10 深圳航天信息有限公司 基于大数据的信息采集分析方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN117441186A (zh) 2024-01-23

Similar Documents

Publication Publication Date Title
US20230069953A1 (en) Learned downsampling based cnn filter for image and video coding using learned downsampling feature
JP7239711B2 (ja) クロマブロック予測方法及び装置
JP5666756B2 (ja) 階層的vdr符号化における層分解
TWI834087B (zh) 用於從位元流重建圖像及用於將圖像編碼到位元流中的方法及裝置、電腦程式產品
JP7277586B2 (ja) モードおよびサイズに依存したブロックレベル制限の方法および装置
CN111800629A (zh) 视频解码方法、编码方法以及视频解码器和编码器
WO2022068682A1 (fr) Procédé et appareil de traitement d'images
US20230076920A1 (en) Global skip connection based convolutional neural network (cnn) filter for image and video coding
WO2022266955A1 (fr) Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif
EP4365820A1 (fr) Réseau de super-résolution vidéo, et procédé et dispositif de traitement de codage, décodage et super-résolution vidéo
US11070808B2 (en) Spatially adaptive quantization-aware deblocking filter
US20230362378A1 (en) Video coding method and apparatus
WO2021249290A1 (fr) Procédé et appareil de filtrage à boucle
WO2022194137A1 (fr) Procédé de codage d'image vidéo, procédé de décodage d'image vidéo et dispositifs associés
WO2023279961A1 (fr) Procédé et appareil de codage d'image vidéo, et procédé et appareil de décodage d'image vidéo
KR20230129068A (ko) 확장 가능한 인코딩 및 디코딩 방법 및 장치
Lauga et al. Segmentation-based optimized tone mapping for high dynamic range image and video coding
WO2022179509A1 (fr) Procédé et appareil de compression en couches de contenu audio/vidéo ou d'image
WO2023000182A1 (fr) Procédés de codage, de décodage et de traitement d'image, appareil de décodage d'image et dispositif
WO2023184088A1 (fr) Procédé et appareil de traitement d'image, dispositif, système et support de stockage
EP4226325A1 (fr) Procédé et appareil pour coder ou décoder une image à l'aide d'un réseau neuronal
CN117999784A (zh) 用于基于学习的图像/视频编解码的整形器
CN117939157A (zh) 图像处理方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21946451

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180097934.X

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21946451

Country of ref document: EP

Kind code of ref document: A1