WO2024007977A1 - Image processing method and apparatus, and device - Google Patents

Image processing method and apparatus, and device

Info

Publication number
WO2024007977A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
perform
neural network
code stream
Prior art date
Application number
PCT/CN2023/104420
Other languages
French (fr)
Chinese (zh)
Inventor
邓欣
景俊鹏
高方远
李胜曦
徐迈
吕卓逸
Original Assignee
维沃移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202211256194.2A external-priority patent/CN117395418A/en
Application filed by 维沃移动通信有限公司 filed Critical 维沃移动通信有限公司
Publication of WO2024007977A1 publication Critical patent/WO2024007977A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking

Definitions

  • the present application belongs to the field of image compression technology, and specifically relates to an image processing method, device and equipment.
  • JPEG: Joint Photographic Experts Group
  • the embodiments of the present application provide an image processing method, device and equipment, which can solve the problem in related technologies that the image compression ratio cannot be improved while ensuring the image quality.
  • the first aspect provides an image processing method, including:
  • the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;
  • the encoding end codes and compresses the second image to generate a target code stream.
  • the second aspect provides an image processing method, including:
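  • the decoding end decodes the obtained target code stream to obtain a third image;
  • the decoding end performs a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;
  • the first operation includes any one of the following:
  • a reversible neural network is used to perform a second sampling process on the third image to obtain a fifth image, and image enhancement processing is performed on the fifth image to obtain the fourth image;
  • image enhancement processing is performed on the third image to obtain a sixth image, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.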
  • the third aspect provides an image processing device, including:
  • a processing module configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;
  • a generating module configured to encode and compress the second image and generate a target code stream.
  • the fourth aspect provides an image processing device, including:
  • a decoding module configured to decode the obtained target code stream to obtain a third image;
  • an operation module configured to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;
  • the first operation includes any one of the following:
  • a reversible neural network is used to perform a second sampling process on the third image to obtain a fifth image, and image enhancement processing is performed on the fifth image to obtain the fourth image;
  • image enhancement processing is performed on the third image to obtain a sixth image, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
  • the fifth aspect provides a terminal, including a processor and a memory; the memory stores programs or instructions that can be run on the processor, and when the programs or instructions are executed by the processor, the steps of the method described in the first aspect or the steps of the method described in the second aspect are implemented.
  • the sixth aspect provides a readable storage medium on which programs or instructions are stored; when the programs or instructions are executed by a processor, the steps of the method described in the first aspect or the steps of the method described in the second aspect are implemented.
  • the seventh aspect provides a chip, including a processor and a communication interface; the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the method described in the first aspect or the method described in the second aspect.
  • the eighth aspect provides a computer program/program product; the computer program/program product is stored in a storage medium and is executed by at least one processor to implement the steps of the method described in the first aspect or the steps of the method described in the second aspect.
  • the ninth aspect provides a system, including an encoding end and a decoding end; the encoding end performs the steps of the method described in the first aspect, and the decoding end performs the steps of the method described in the second aspect.
  • in the embodiments of the present application, the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image.
  • the decoding end decodes the obtained target code stream to obtain a third image and performs a first operation on the third image; the first operation includes image enhancement processing, which improves the image quality, so that the compression ratio of the image is increased while the image quality is ensured.
  • Figure 1 is one of the flow diagrams of the image processing method provided by the embodiment of the present application.
  • Figure 2 is a schematic diagram of a rescaling module provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a learnable compression coding and decoding module provided by an embodiment of the present application
  • Figure 4 is the second schematic flowchart of the image processing method provided by the embodiment of the present application.
  • Figure 5 is a schematic diagram of the application framework of the image processing method provided by the embodiment of the present application.
  • Figure 6 is one of the structural diagrams of the image processing device provided by the embodiment of the present application.
  • Figure 7 is the second structural diagram of the image processing device provided by the embodiment of the present application.
  • Figure 8 is a structural diagram of a communication device provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the hardware structure of a terminal provided by an embodiment of the present application.
  • the terms "first", "second", etc. in the description and claims of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited; for example, the first object can be one or multiple.
  • "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the related objects are in an "or" relationship.
  • FIG. 1 is one of the flow charts of the image processing method provided by the embodiment of the present application.
  • the image processing method provided in this embodiment includes the following steps:
  • S101 The encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image.
  • the encoding end may use the rescaling module to perform first sampling processing on the acquired first image to obtain the second image.
  • the rescaling module is implemented based on the structure of the reversible neural network.
  • Figure 2 is a schematic diagram of a rescaling module provided by an embodiment of the present application.
  • the affine coupling layer in Figure 2 is the neural network layer in the rescaling module.
  • the resolution of the second image is lower than the resolution of the first image.
  • the first image is also called the original image, and the second image is also called the low-resolution image.
  • S102 The encoding end encodes and compresses the second image to generate a target code stream.
  • an optional implementation method is that the encoding end inputs the second image to the neural network encoder, and the neural network encoder encodes and compresses the second image to generate a target code stream.
  • Figure 3 is a schematic diagram of a learnable compression encoding and decoding module provided by an embodiment of the present application.
  • the encoder in the learnable compression encoding and decoding module shown in Figure 3 is the above-mentioned neural network encoder.
  • Another optional implementation is that the encoding end inputs the second image to an image encoder in the related art, and the image encoder encodes and compresses the second image to generate a target code stream.
  • in this embodiment, the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image.
  • the decoding end decodes the obtained target code stream to obtain a third image and performs a first operation on the third image; the first operation includes image enhancement processing, which improves the image quality, so that the compression ratio of the image is increased while the image quality is ensured.
  • using a reversible neural network to perform a first sampling process on the acquired first image to obtain the second image includes:
  • the encoding end performs discrete wavelet transformation on the acquired first image to obtain the first component and the second component corresponding to the first image;
  • the encoding end performs downsampling processing on the first component and the second component to obtain a second image.
  • This embodiment specifically explains how to use the rescaling module to perform the first sampling process on the acquired first image.
  • the rescaling module performs a discrete wavelet transform on the first image and decomposes it, according to the frequency corresponding to each pixel of the first image, into a first component and a second component, where the first component is also called the low-frequency component and the second component is also called the high-frequency component.
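  • The decomposition described above can be sketched as a one-level 2-D Haar transform. This is a minimal illustration only: the choice of the Haar wavelet, the function name `haar_dwt2`, and the list-of-rows image layout are assumptions and are not stated in the application.

```python
def haar_dwt2(image):
    """One-level 2-D Haar transform of a list-of-rows image with even height and width.

    Returns (low, high): `low` is the half-resolution low-frequency band and
    `high` is a list of three half-resolution high-frequency (detail) bands.
    The Haar wavelet is an assumption made for this sketch.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    low = [[0.0] * cols for _ in range(rows)]
    high = [[[0.0] * cols for _ in range(rows)] for _ in range(3)]
    for r in range(rows):
        for c in range(cols):
            # The four pixels of one 2x2 block.
            a, b = image[2 * r][2 * c], image[2 * r][2 * c + 1]
            p, q = image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]
            low[r][c] = (a + b + p + q) / 2.0      # local average (low frequency)
            high[0][r][c] = (a - b + p - q) / 2.0  # difference across columns
            high[1][r][c] = (a + b - p - q) / 2.0  # difference across rows
            high[2][r][c] = (a - b - p + q) / 2.0  # diagonal difference
    return low, high
```

  • Each output band has half the width and half the height of the input, which is why the low-frequency band is a natural starting point for a lower-resolution second image.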
  • the first component and the second component are input to the affine coupling layer shown in Figure 2, and after a series of affine coupling layer processes, the second image obtained after downsampling processing is output.
  • the reversible neural network includes multiple substructures, that is, the reversible neural network includes multiple affine coupling layers.
  • the encoding end can perform downsampling processing on the first component and the second component according to the first formula to obtain the second image.
  • the first formula characterizes the output of the i-th substructure in the reversible neural network; the second image is the output of the last substructure in the reversible neural network.
  • the terms of the first formula represent the first component, the second component, and the result of processing the second component by the reversible neural network.
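  • As a hedged illustration, a coupling step of the kind commonly used in invertible rescaling networks could be written as follows. The symbols h_1^i and h_2^i (the low-frequency and high-frequency branches entering the i-th substructure) and the learned function \phi are assumed names for this sketch, not the application's own notation.

```latex
h_1^{i+1} = h_1^{i} + \phi\left(h_2^{i}\right)
```

  • Under this assumption, the low-frequency branch is updated by adding a learned function of the high-frequency branch, and the low-frequency output of the last substructure plays the role of the second image.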
  • in the distortion loss function, D(·) represents the distortion loss function, which is evaluated on the fourth image and the first image and depends on the network parameters; N represents the number of training samples; and l2 represents the difference between the first image and the fourth image.
  • the difference between the first image and the fourth image can be calculated in several ways, which are not limited here.
  • the difference between the first image and the fourth image may be the mean square error (MSE) of the pixels between the first image and the fourth image, the sum of absolute differences (SAD) of the pixels between the two images, the structural similarity (SSIM) between the two images, or the multi-scale structural similarity (MS-SSIM) between the two images.
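  • Two of the difference measures named above, and a distortion loss built on them, can be sketched as follows. Averaging the per-image difference over the N training pairs is an assumption of this sketch, and the function names are hypothetical.

```python
def mse(x, y):
    """Mean squared error over all pixels of two equally sized list-of-rows images."""
    diffs = [(a - b) ** 2 for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)


def sad(x, y):
    """Sum of absolute differences over all pixels of two equally sized images."""
    return sum(abs(a - b) for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y))


def distortion_loss(first_images, fourth_images, metric=mse):
    """Average the chosen difference over N (first image, fourth image) pairs.

    Averaging over N is an assumption made for illustration.
    """
    pairs = list(zip(first_images, fourth_images))
    return sum(metric(x, y) for x, y in pairs) / len(pairs)
```

  • SSIM and MS-SSIM are similarity scores rather than errors, so a loss built on them would typically use one minus the score; that variant is omitted here.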
  • a reversible neural network is used to perform a first sampling process on the acquired first image, thereby reducing the code rate of directly compressing the first image, thereby improving the compression ratio of the image.
  • encoding and compressing the second image to generate the target code stream includes:
  • the encoding end uses a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;
  • the encoding end performs quantization and arithmetic coding on the first feature variable to generate a target code stream.
  • This embodiment specifically explains how to use a neural network encoder to encode and compress the second image.
  • the "original image” in Figure 3 is the second image
  • the "encoder” in Figure 3 is the neural network encoder.
  • the second image is processed by the neural network encoder to obtain the latent variable y; feature extraction is performed on the latent variable y to obtain the first feature variable; and the target code stream is generated by performing quantization and arithmetic coding on the first feature variable.
  • the above arithmetic coding is one form of entropy coding, and other types of entropy coding may also be used in this embodiment.
  • in the rate loss function corresponding to the learnable compression encoding and decoding module shown in Figure 3, R(·) represents the rate loss function, E represents the mathematical expectation, N represents the number of training samples, and y represents the latent variable; the remaining terms characterize the fit of the latent variable and represent the information entropy.
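  • The link between quantization and a rate estimate can be sketched as below. The application fits a learned model to the latent variable; this sketch swaps in a plain empirical histogram as the probability model, and the function names are hypothetical.

```python
import math
from collections import Counter


def quantize(latent):
    """Round each latent value to the nearest integer.

    Python's round() resolves exact .5 ties to the even neighbour.
    """
    return [round(v) for v in latent]


def estimated_bits(symbols):
    """Estimate the code length in bits as the sum of -log2 p(symbol).

    p is the empirical frequency of each symbol, used here as a stand-in
    for a learned entropy model.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-math.log2(counts[s] / total) for s in symbols)
```

  • An entropy coder such as an arithmetic coder can approach this estimate closely, which is why a rate loss is usually expressed as an expected negative log-probability.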
  • the method also includes:
  • the encoding end uses a reversible neural network to perform a first sampling process on at least part of the original feature map corresponding to the first image to obtain a first feature map;
  • when the first sampling process is performed on part of the original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map and the second feature map to generate a feature map code stream; the second feature map is the original feature map corresponding to the first image that has not been subjected to the first sampling process;
  • when the first sampling process is performed on all original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map to generate a feature map code stream.
  • the image processing method provided by the embodiment of the present application can also be applied to processing feature maps.
  • the encoding end can use a neural network to extract the original feature map corresponding to the first image.
  • the above-mentioned neural network includes but is not limited to Feature Pyramid Networks (FPN) and Fast Region-based Convolutional Neural Networks (R-CNN).
  • for example, the first image is composed of 3 color channels and has a resolution of W × H, where W is the image width and H is the image height.
  • FPN is used to extract features from the first image, yielding 4 original feature maps, namely P2, P3, P4 and P5.
  • P2, P3, P4 and P5 each have their own resolution, and the number of channels of each original feature map is 256.
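  • For orientation, the shapes of the four feature maps can be sketched under the conventional FPN strides of 4, 8, 16 and 32 for P2 to P5. These strides are an assumption of this sketch and are not taken from the application; the function name is hypothetical.

```python
def fpn_shapes(width, height, channels=256, strides=(4, 8, 16, 32)):
    """Return {name: (channels, map_height, map_width)} for P2..P5.

    The default strides follow the usual FPN convention and are an assumption.
    """
    names = ("P2", "P3", "P4", "P5")
    return {name: (channels, height // s, width // s) for name, s in zip(names, strides)}
```

  • For a 640 × 480 input this gives a P2 map of 160 × 120 and a P5 map of 20 × 15, each with 256 channels, which shows why the larger maps dominate the size of the feature map code stream.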
  • An optional implementation is that the encoding end uses a reversible neural network to perform the first sampling process on part of the original feature maps. In this case, the encoding end encodes and compresses the second feature map (that is, the original feature map that has not been subjected to the first sampling process) and the first feature map obtained through the first sampling process to generate a feature map code stream. The specific implementation of performing the first sampling process on part of the original feature maps is consistent with the implementation of the first sampling process in the above embodiment and will not be repeated here.
  • Another optional implementation is that the encoding end uses a reversible neural network to perform the first sampling process on all original feature maps. In this case, the encoding end encodes and compresses the first feature map obtained through the first sampling process to generate a feature map code stream.
  • the encoding end can normalize the feature map before performing encoding and compression.
  • in the normalization, val_new represents the value of a sample point after normalization, val_ori represents the value of the sample point before normalization, and norm_max and norm_min represent the maximum value and the minimum value.
  • the encoding end also encodes the above-mentioned maximum value (norm_max) and minimum value (norm_min) and transmits them to the decoding end.
  • normalization disperses the encoded data as much as possible, which reduces the loss during the encoding process and improves the encoding and compression effect.
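  • A minimal sketch of such a normalization is given below, assuming a min-max mapping onto the integer range 0..levels. The exact formula used in the application is not reproduced here; the `levels` parameter and the function names are hypothetical.

```python
def normalize(values, levels=255):
    """Min-max normalize sample values onto integers in [0, levels].

    Returns the normalized values together with norm_min and norm_max,
    which the decoder needs in order to undo the mapping.
    """
    norm_min, norm_max = min(values), max(values)
    span = (norm_max - norm_min) or 1.0  # avoid division by zero for flat input
    scaled = [round((v - norm_min) / span * levels) for v in values]
    return scaled, norm_min, norm_max


def denormalize(scaled, norm_min, norm_max, levels=255):
    """Approximate inverse of normalize(), using the transmitted min and max."""
    span = norm_max - norm_min
    return [s / levels * span + norm_min for s in scaled]
```

  • Spreading the samples over the whole integer range keeps the rounding error small relative to the signal, which matches the stated goal of reducing loss during encoding.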
  • a reversible neural network is used to perform a first sampling process on at least part of the original feature map corresponding to the first image, thereby reducing the code stream for directly compressing the feature map, thereby improving the compression ratio of the feature map.
  • FIG. 4 is the second flow chart of the image processing method provided by the embodiment of the present application.
  • the image processing method provided in this embodiment includes the following steps:
  • the decoder decodes the obtained target code stream and obtains the third image.
  • an optional implementation method is that the decoding end inputs the obtained target code stream to the neural network decoder, and the neural network decoder decodes the target code stream to obtain the third image.
  • Figure 3 is a schematic diagram of a learnable compression coding and decoding module provided by an embodiment of the present application.
  • the decoder in the learnable compression coding and decoding module shown in Figure 3 is the above-mentioned neural network decoder.
  • Another optional implementation is that the decoding end inputs the obtained target code stream to an image decoder in the related art, and the image decoder decodes the target code stream to obtain the third image.
  • the decoder performs a first operation on the third image to obtain a fourth image.
  • the above-mentioned first operation includes any one of the following:
  • using a reversible neural network to perform a second sampling process on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image; or performing image enhancement processing on the third image to obtain a sixth image, and using the reversible neural network to perform a second sampling process on the sixth image to obtain the fourth image.
  • the resolution of the third image is lower than the resolution of the fourth image.
  • An optional implementation manner is to first perform a second sampling process on the third image to obtain a fifth image, and then perform image enhancement processing on the fifth image to obtain a fourth image.
  • the third image can be input into the rescaling module shown in FIG. 2 to obtain the fifth image output by the rescaling module, and then the enhancement module is used to perform image enhancement processing on the fifth image to obtain the fourth image.
  • Another optional implementation is to first perform image enhancement processing on the third image to obtain a sixth image, and then perform second sampling processing on the sixth image to obtain a fourth image.
  • the enhancement module can be used to perform image enhancement processing on the third image to obtain a sixth image, and then the sixth image is input into the rescaling module shown in FIG. 2 to obtain the fourth image output by the rescaling module.
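  • The two orderings of the first operation can be sketched as a small dispatcher. The callables `upsample` (the second sampling process) and `enhance` (the enhancement module), and the flag `enhance_first`, are hypothetical placeholders for this sketch.

```python
def first_operation(third_image, upsample, enhance, enhance_first=True):
    """Apply the first operation in one of its two orderings.

    enhance_first=True : third image -> sixth image (enhanced) -> fourth image
    enhance_first=False: third image -> fifth image (upsampled) -> fourth image
    """
    if enhance_first:
        sixth_image = enhance(third_image)
        return upsample(sixth_image)
    fifth_image = upsample(third_image)
    return enhance(fifth_image)
```

  • The orderings are not interchangeable in general: enhancing first operates on the low-resolution image, while enhancing last operates on the upsampled image at a higher computational cost.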
  • the above enhancement module is based on the Residual Channel Attention Network (RCAN); for example, the number of residual groups in the RCAN network is set to 5, and the number of Residual Channel Attention Blocks (RCAB) in the RCAN network is set to 10.
  • the enhancement module can also be constructed based on other neural networks.
  • in the enhancement loss function corresponding to the enhancement module, E(·) represents the enhancement loss function, N represents the number of training samples, and l2 represents the difference between the image before enhancement processing and the image after enhancement processing.
  • the difference between the image before enhancement processing and the image after enhancement processing may be the mean square error of the pixels between the two images, the sum of absolute differences of the pixels between the two images, the structural similarity between the two images, or the multi-scale structural similarity between the two images.
  • the image quality is improved by using the enhancement module for image enhancement processing.
  • the decoder performing the first operation on the third image includes:
  • the decoding end performs an upsampling process on the target image to obtain a third component, and determines a fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;
  • the decoding end performs inverse discrete wavelet transform on the third component and the fourth component.
  • This embodiment specifically explains how to use the rescaling module to perform the first operation on the acquired third image.
  • the target image is input to the affine coupling layer shown in Figure 2, and the third component and the fourth component corresponding to the target image are output; inverse discrete wavelet transform is then performed on the third component and the fourth component to obtain the fourth image.
  • the enhanced image in Figure 2 is the target image, and the reconstructed image in Figure 2 is the fourth image.
  • the target image is the third image or the sixth image; the third component is the low-frequency component corresponding to the target image, and the fourth component is the high-frequency component corresponding to the target image.
  • the third component can be obtained by upsampling the target image according to the second formula, whose terms represent the third component, the output of the (i+1)-th substructure of the reversible neural network, and the result of processing the fourth component by the reversible neural network; the third image is the output of the last substructure of the reversible neural network.
  • the fourth component corresponding to the target image can be determined according to the third formula, whose terms represent the fourth component, the component information corresponding to the fourth component, the results of processing the component information by the reversible neural network, and a preset parameter.
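  • Why a coupling layer can be run backwards for upsampling can be sketched with a generic affine coupling step. The form below follows the coupling commonly used in invertible rescaling networks and is an assumption, not the application's second or third formula; `phi`, `rho` and `eta` are hypothetical stand-ins for learned sub-networks.

```python
import math


# Hypothetical stand-ins for learned sub-networks (any functions would do).
def phi(h2):
    return [0.5 * v for v in h2]


def rho(h1):
    return [0.1 * v for v in h1]


def eta(h1):
    return [v + 1.0 for v in h1]


def coupling_forward(h1, h2):
    """Downsampling direction: update the low branch h1, then the high branch h2."""
    h1_next = [a + p for a, p in zip(h1, phi(h2))]
    h2_next = [b * math.exp(r) + e
               for b, r, e in zip(h2, rho(h1_next), eta(h1_next))]
    return h1_next, h2_next


def coupling_inverse(h1_next, h2_next):
    """Upsampling direction: undo the two updates in reverse order."""
    h2 = [(b - e) * math.exp(-r)
          for b, r, e in zip(h2_next, rho(h1_next), eta(h1_next))]
    h1 = [a - p for a, p in zip(h1_next, phi(h2))]
    return h1, h2
```

  • The inverse never needs to invert `phi`, `rho` or `eta` themselves; it only re-evaluates them on values that are already known, which is what makes the same network usable for both sampling directions.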
  • decoding the acquired target code stream to obtain the third image includes:
  • the decoding end performs arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable
  • the decoding end uses a neural network decoder to decompress the second feature variable to obtain the third image.
  • This embodiment specifically explains how to use a neural network decoder to decode the obtained target code stream.
  • the "compressed image” in Figure 3 is the third image
  • the "decoder” in Figure 3 is the neural network decoder.
  • the decoder performs arithmetic decoding and inverse quantization on the acquired target code stream to obtain the decoded latent variable; feature extraction is performed on the decoded latent variable to obtain the second feature variable; and a neural network decoder is used to decompress the second feature variable to obtain the third image.
  • the above arithmetic decoding is one form of entropy decoding, and other types of entropy decoding may also be used in this embodiment.
  • in Figure 5, X_H represents the first image, X_L represents the second image, and X_E represents the image after image enhancement processing; the distortion loss represents the image loss between the first image and the fourth image; the enhancement loss represents the image loss between the second image and the image after image enhancement processing; the quality gap represents the image quality gap between the second image and the third image; and the rate loss represents the loss after lossy encoding by the learnable compression encoder.
  • the total loss function corresponding to the application framework shown in Figure 5 is the weighted sum of the above-mentioned rate loss function and distortion loss function, where the enhancement loss function is only used in the training phase of the enhancement module and is not included in the total loss function.
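  • The weighted sum described above can be sketched in one function. The weight names and their default values are hypothetical; the application does not give the weights here.

```python
def total_loss(rate_loss, distortion_loss, rate_weight=1.0, distortion_weight=1.0):
    """Weighted sum of the rate loss and the distortion loss.

    The enhancement loss is deliberately absent: per the description it is
    used only when training the enhancement module.
    """
    return rate_weight * rate_loss + distortion_weight * distortion_loss
```

  • Raising the rate weight favors smaller code streams, while raising the distortion weight favors reconstruction fidelity.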
  • to train the neural network modules included in the application framework shown in Figure 5, one possible implementation is to first train the rescaling module and the learnable compression codec together, and then train the enhancement module separately. After the training of the above three modules is completed, the neural network parameters in the enhancement module are adjusted under the joint action of the rescaling module and the learnable compression codec. Finally, the neural network parameters in the above three modules are adjusted in an end-to-end manner.
  • Another possible implementation is to train the rescaling module, learnable compression codec and enhancement module independently.
  • the encoding end uses the rescaling module to perform the first sampling process on the first image to obtain the second image, and uses the learnable compression codec to encode the second image to obtain the target code stream.
  • the decoding end uses the learnable compression codec to decode the target code stream to obtain the third image, uses the enhancement module to perform image enhancement processing on the third image to obtain the enhanced image, and uses the rescaling module to perform a second sampling process on the enhanced image to obtain the fourth image.
  • the decoder may first use the rescaling module to perform the second sampling process on the third image, and then use the enhancement module to perform image enhancement processing on the image after the second sampling process.
  • the learnable compression codec may include the rescaling module, i.e., the rescaling module is part of the learnable compression codec.
  • the above-mentioned learnable compression codec may be replaced with a codec in related art.
  • the method also includes:
  • when the decoder decodes the obtained feature map code stream to obtain a third feature map, the decoder performs a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;
  • when the decoder decodes the obtained feature map code stream to obtain a third feature map and a partially reconstructed feature map, the decoder performs a first operation on the third feature map to determine a fourth feature map, and determines all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and the fourth feature map.
  • An optional implementation manner is that the decoding end decodes the obtained feature map code stream to obtain the third feature map.
  • a first operation is performed on the third feature map to determine all reconstructed feature maps corresponding to the fourth image.
  • Another optional implementation is that the decoder decodes the obtained feature map code stream to obtain the third feature map and the partially reconstructed feature map.
  • a first operation is performed on the third feature map to determine the fourth feature map, and the partially reconstructed feature map and the fourth feature map are determined as all reconstructed feature maps corresponding to the fourth image.
  • the first operation includes any one of the following: using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process; or performing image enhancement processing on the third feature map, and performing a second sampling process on the third feature map after the image enhancement processing.
  • for the image processing method provided by the embodiments of the present application, the execution subject may be an image processing device. In the following, an image processing device at the encoding end that performs the image processing method is taken as an example to illustrate the image processing device provided by the embodiments of the present application.
  • this embodiment of the present application also provides an image processing device 600, including:
  • the first processing module 601 is configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;
  • the first generation module 602 is used to encode and compress the second image to generate a target code stream.
  • the first processing module 601 is specifically used to:
  • the first generation module 602 is specifically used to:
  • the first feature variable is quantized and arithmetic encoded to generate a target code stream.
  • the image processing device 600 also includes:
  • a second processing module configured to use a reversible neural network to perform first sampling processing on at least part of the original feature map corresponding to the first image to obtain a first feature map
  • a second generation module configured to encode and compress the first feature map and the second feature map to generate a feature map code stream when performing the first sampling process on part of the original feature map corresponding to the first image;
  • the third generation module is configured to encode and compress the first feature map to generate a feature map code stream when performing the first sampling process on all original feature maps corresponding to the first image.
  • the execution subject may be an image processing device.
  • the image processing device is used in the decoder to perform the image processing method as an example to illustrate the image processing device provided by the embodiment of the present application.
  • this embodiment of the present application also provides an image processing device 700, including:
  • the decoding module 701 is used to decode the obtained target code stream to obtain the third image
  • the operation module 702 is used to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;
  • the first operation includes any one of the following:
  • a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
  • the operation module 702 is specifically used to:
  • the target image is the third image or the sixth image;
  • the third component and the fourth component are subjected to an inverse discrete wavelet transform.
  • the decoding module 701 is specifically used to:
  • the image processing device 700 also includes:
  • the first determination module is used to perform a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image when the decoding end decodes the obtained feature map code stream to obtain the third feature map.
  • the second determination module is used to determine all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and the fourth feature map when the decoding end decodes the obtained feature map code stream to obtain the third feature map and the partially reconstructed feature map; the fourth feature map is determined based on the first operation on the third feature map;
  • the first operation includes any one of the following:
  • the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image.
  • the decoding end decodes the obtained target code stream to obtain the third image, and then performs a first operation on the third image. The first operation includes image enhancement processing, which improves image quality, so that the compression ratio of the image is increased while the image quality is improved.
  • the image processing device applied to the encoding end provided by the embodiment of the present application can implement each process implemented by the method embodiment in Figure 1 and achieve the same technical effect. To avoid duplication, the details will not be described here.
  • the image processing device applied to the decoding end provided by the embodiment of the present application can implement each process implemented by the method embodiment in Figure 4 and achieve the same technical effect. To avoid duplication, the details will not be described here.
  • the image processing device in the embodiment of the present application may be an electronic device, such as an electronic device with an operating system, or may be a component in the electronic device, such as an integrated circuit or chip.
  • the electronic device may be a terminal or other devices other than the terminal.
  • terminals may include but are not limited to the types of terminals listed above, and other devices may be servers, network attached storage (Network Attached Storage, NAS), etc., which are not specifically limited in the embodiments of this application.
  • this embodiment of the present application also provides a communication device 800, which includes a processor 801 and a memory 802.
  • the memory 802 stores programs or instructions that can be run on the processor 801. For example, when the communication device 800 is a terminal, the program or instructions, when executed by the processor 801, implement each step of the above image processing method embodiment and achieve the same technical effect.
  • An embodiment of the present application also provides a terminal, including a processor 801 and a communication interface.
  • the processor 801 is configured to perform the following operations:
  • processor 801 is configured to perform the following operations:
  • the first operation includes any one of the following:
  • a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
  • This terminal embodiment corresponds to the above-mentioned terminal-side method embodiment.
  • Each implementation process and implementation manner of the above-mentioned method embodiment can be applied to this terminal embodiment, and can achieve the same technical effect.
  • Figure 9 is a schematic diagram of the hardware structure of a terminal according to an embodiment of this application.
  • the terminal 900 includes but is not limited to: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, a processor 910, and other components.
  • the terminal 900 may also include a power supply (such as a battery) that supplies power to various components.
  • the power supply may be logically connected to the processor 910 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the terminal structure shown in FIG. 9 does not constitute a limitation on the terminal.
  • the terminal may include more or fewer components than shown in the figure, or may combine certain components, or arrange different components, which will not be described again here.
  • the input unit 904 may include a graphics processor (Graphics Processing Unit, GPU) 9041 and a microphone 9042.
  • the graphics processor 9041 processes the image data of still pictures or videos obtained by an image capture device (such as a camera) in video capture mode or image capture mode.
  • the display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 907 includes a touch panel 9071 and at least one of other input devices 9072 .
  • The touch panel 9071 is also known as a touch screen.
  • the touch panel 9071 may include two parts: a touch detection device and a touch controller.
  • Other input devices 9072 may include but are not limited to physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be described again here.
  • after receiving downlink data from the network side device, the radio frequency unit 901 can transmit it to the processor 910 for processing; in addition, the radio frequency unit 901 can send uplink data to the network side device.
  • the radio frequency unit 901 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, etc.
  • Memory 909 may be used to store software programs or instructions as well as various data.
  • the memory 909 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system and an application program or instructions required for at least one function (such as a sound playback function or an image playback function).
  • memory 909 may include volatile memory or nonvolatile memory, or memory 909 may include both volatile and nonvolatile memory.
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically removable memory.
  • Volatile memory can be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synch link DRAM, SLDRAM) and direct memory bus random access memory (Direct Rambus RAM, DRRAM).
  • the processor 910 may include one or more processing units; optionally, the processor 910 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, application programs, etc., and the modem processor, such as a baseband processor, mainly processes wireless communication signals. It can be understood that the above modem processor may not be integrated into the processor 910.
  • the processor 910 is used to perform the following operations:
  • processor 910 is configured to perform the following operations:
  • the first operation includes any one of the following:
  • a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
  • Embodiments of the present application also provide a readable storage medium.
  • Programs or instructions are stored on the readable storage medium.
  • when the program or instructions are executed by a processor, each process of the above image processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the terminal described in the above embodiment.
  • the readable storage medium includes computer readable storage media, such as computer read-only memory ROM, random access memory RAM, magnetic disk or optical disk, etc.
  • An embodiment of the present application further provides a chip.
  • the chip includes a processor and a communication interface.
  • the communication interface is coupled to the processor.
  • the processor is used to run programs or instructions to implement each process of the above image processing method embodiments and achieve the same technical effect. To avoid duplication, details are not described again here.
  • the chip mentioned in the embodiments of this application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip, etc.
  • Embodiments of the present application further provide a computer program/program product.
  • the computer program/program product is stored in a storage medium.
  • the computer program/program product is executed by at least one processor to implement each process of the above image processing method embodiment and achieve the same technical effect. To avoid repetition, details are not described again here.
  • Embodiments of the present application further provide a system.
  • the system includes an encoding end and a decoding end.
  • the encoding end performs each process of the above image processing method embodiment applied to the encoding end.
  • the decoding end performs each process of the above image processing method embodiment applied to the decoding end, and the same technical effect can be achieved. To avoid repetition, details are not described again here.
  • the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, they can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes over the related technologies, can be embodied in the form of a computer software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, disk, CD), including several instructions to cause a terminal (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application relates to the technical field of image compression, and discloses an image processing method and apparatus, and a device. The image processing method in the embodiments of the present application comprises: an encoding end performs first sampling processing on an obtained first image by using an invertible neural network, so as to obtain a second image, the resolution of the second image being lower than that of the first image; and the encoding end encodes and compresses the second image to generate a target code stream.

Description

Image processing method, apparatus, and device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202210803766.8, filed in China on July 7, 2022, and to Chinese Patent Application No. 202211256194.2, filed in China on October 13, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the field of image compression technology, and specifically relates to an image processing method, apparatus, and device.
Background
In the field of image compression, traditional image compression standards such as the Joint Photographic Experts Group (JPEG) standard compress a wide range of images for general application scenarios, but suffer from a low compression ratio; image compression based on deep learning suffers from low quality of the reconstructed image when compressing at a low bit rate. How to increase the compression ratio of an image while ensuring image quality is a technical problem to be solved in this field.
Summary
The embodiments of the present application provide an image processing method, apparatus, and device, which can solve the problem in the related art that the compression ratio of an image cannot be increased while ensuring image quality.
In a first aspect, an image processing method is provided, including:
an encoding end uses a reversible neural network to perform a first sampling process on an acquired first image to obtain a second image, where the resolution of the second image is lower than the resolution of the first image;
the encoding end encodes and compresses the second image to generate a target code stream.
In a second aspect, an image processing method is provided, including:
a decoding end decodes an obtained target code stream to obtain a third image;
the decoding end performs a first operation on the third image to obtain a fourth image, where the resolution of the third image is lower than the resolution of the fourth image;
wherein the first operation includes any one of the following:
using a reversible neural network to perform a second sampling process on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
performing image enhancement processing on the third image to obtain a sixth image, and using a reversible neural network to perform a second sampling process on the sixth image to obtain the fourth image.
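The two alternatives for the first operation differ only in the order of the second sampling process and the image enhancement processing. The following is a minimal sketch of that ordering, not the method of the application: `upsample2x` and `enhance` are hypothetical placeholders (pixel repetition and value clipping) standing in for the inverse pass of the reversible neural network and for the enhancement step, and the `enhance_first` flag is an assumed name.

```python
import numpy as np

def upsample2x(image):
    # Hypothetical placeholder for the second sampling process:
    # repeats every pixel twice along both axes.
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

def enhance(image):
    # Hypothetical placeholder for image enhancement:
    # clips pixel values into the valid range [0, 1].
    return np.clip(image, 0.0, 1.0)

def first_operation(third_image, enhance_first=False):
    """Return the fourth image using one of the two orderings."""
    if enhance_first:
        sixth_image = enhance(third_image)   # enhance, then upsample
        return upsample2x(sixth_image)
    fifth_image = upsample2x(third_image)    # upsample, then enhance
    return enhance(fifth_image)

third = np.array([[0.2, 1.5], [-0.1, 0.7]])
fourth = first_operation(third, enhance_first=True)
print(fourth.shape)
```

With these particular placeholders both orderings give the same result; with learned networks they generally would not, which is presumably why both are listed as alternatives.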
In a third aspect, an image processing apparatus is provided, including:
a processing module, configured to use a reversible neural network to perform a first sampling process on an acquired first image to obtain a second image, where the resolution of the second image is lower than the resolution of the first image;
a generation module, configured to encode and compress the second image to generate a target code stream.
In a fourth aspect, an image processing apparatus is provided, including:
a decoding module, configured to decode an obtained target code stream to obtain a third image;
an operation module, configured to perform a first operation on the third image to obtain a fourth image, where the resolution of the third image is lower than the resolution of the fourth image;
wherein the first operation includes any one of the following:
using a reversible neural network to perform a second sampling process on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
performing image enhancement processing on the third image to obtain a sixth image, and using a reversible neural network to perform a second sampling process on the sixth image to obtain the fourth image.
In a fifth aspect, a terminal is provided. The terminal includes a processor and a memory, the memory stores programs or instructions that can be run on the processor, and the programs or instructions, when executed by the processor, implement the steps of the method described in the first aspect or the steps of the method described in the second aspect.
In a sixth aspect, a readable storage medium is provided. Programs or instructions are stored on the readable storage medium, and the programs or instructions, when executed by a processor, implement the steps of the method described in the first aspect or the steps of the method described in the second aspect.
In a seventh aspect, a chip is provided. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the method described in the first aspect or the method described in the second aspect.
In an eighth aspect, a computer program/program product is provided. The computer program/program product is stored in a storage medium and is executed by at least one processor to implement the steps of the method described in the first aspect or the steps of the method described in the second aspect.
In a ninth aspect, a system is provided. The system includes an encoding end and a decoding end, the encoding end performs the steps of the method described in the first aspect, and the decoding end performs the steps of the method described in the second aspect.
In the embodiments of this application, the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image. The decoding end decodes the obtained target code stream to obtain a third image, and then performs a first operation on the third image. The first operation includes image enhancement processing, which improves image quality, so that the compression ratio of the image is increased while the image quality is improved.
Description of the drawings
Figure 1 is the first schematic flowchart of the image processing method provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a rescaling module provided by an embodiment of the present application;
Figure 3 is a schematic diagram of a learnable compression encoding and decoding module provided by an embodiment of the present application;
Figure 4 is the second schematic flowchart of the image processing method provided by an embodiment of the present application;
Figure 5 is a schematic diagram of the application framework of the image processing method provided by an embodiment of the present application;
Figure 6 is the first structural diagram of the image processing apparatus provided by an embodiment of the present application;
Figure 7 is the second structural diagram of the image processing apparatus provided by an embodiment of the present application;
Figure 8 is a structural diagram of a communication device provided by an embodiment of the present application;
Figure 9 is a schematic diagram of the hardware structure of a terminal provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application fall within the scope of protection of this application.
The terms "first", "second", etc. in the description and claims of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. The objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited; for example, the first object may be one or more than one. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The image processing method applied to the encoding end provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings, through some embodiments and their application scenarios.
Please refer to Figure 1, which is the first flowchart of the image processing method provided by an embodiment of the present application. The image processing method provided in this embodiment includes the following steps:
S101: the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image.
In this step, the encoding end may use a rescaling module to perform the first sampling process on the acquired first image to obtain the second image. The rescaling module is implemented based on the structure of a reversible neural network.
For ease of understanding, please refer to Figure 2, which is a schematic diagram of the rescaling module provided by an embodiment of the present application. The affine coupling layers in Figure 2 are the neural network layers of the rescaling module.
It should be understood that the resolution of the second image is lower than the resolution of the first image. The first image is also called the original image, and the second image is also called the low-resolution image.
S102: the encoding end encodes and compresses the second image to generate a target code stream.
In this step, in one optional implementation, the encoding end inputs the second image to a neural network encoder, and the neural network encoder encodes and compresses the second image to generate the target code stream.
For ease of understanding, please refer to Figure 3, which is a schematic diagram of the learnable compression encoding and decoding module provided by an embodiment of the present application. The encoder in the learnable compression encoding and decoding module shown in Figure 3 is the above-mentioned neural network encoder.
In another optional implementation, the encoding end inputs the second image to an image encoder in the related art, and the image encoder encodes and compresses the second image to generate the target code stream.
In the embodiments of this application, the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image. The decoding end decodes the obtained target code stream to obtain a third image, and then performs a first operation on the third image. The first operation includes image enhancement processing, which improves image quality, so that the compression ratio of the image is increased while the image quality is improved.
Optionally, using a reversible neural network to perform a first sampling process on the acquired first image to obtain the second image includes:
the encoding end performs a discrete wavelet transform on the acquired first image to obtain a first component and a second component corresponding to the first image;
the encoding end performs downsampling processing on the first component and the second component to obtain the second image.
This embodiment explains how the rescaling module is used to perform the first sampling process on the acquired first image.
In this embodiment, the rescaling module performs a discrete wavelet transform on the first image and, according to the frequency corresponding to each pixel of the first image, decomposes the first image into a first component and a second component, where the first component is also called the low-frequency component and the second component is also called the high-frequency component. The first component and the second component are input to the affine coupling layers shown in Figure 2, and after processing by a series of affine coupling layers, the second image obtained by the downsampling processing is output. It should be understood that the reversible neural network includes multiple substructures, that is, the reversible neural network includes multiple affine coupling layers.
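The decomposition into a low-frequency and a high-frequency component can be illustrated with a one-level 2D Haar transform. This is a minimal sketch under stated assumptions: the application does not say which wavelet the rescaling module uses, so the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the single-channel input are all illustrative choices.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency sub-band and the three stacked
    high-frequency sub-bands, each at half the input resolution.
    """
    a = image[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = image[0::2, 1::2]  # top-right pixel
    c = image[1::2, 0::2]  # bottom-left pixel
    d = image[1::2, 1::2]  # bottom-right pixel
    low = (a + b + c + d) / 2.0
    high = np.stack([
        (a - b + c - d) / 2.0,  # differences between columns
        (a + b - c - d) / 2.0,  # differences between rows
        (a - b - c + d) / 2.0,  # diagonal differences
    ])
    return low, high

def haar_idwt2(low, high):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h1, h2, h3 = high
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]), dtype=float)
    out[0::2, 0::2] = (low + h1 + h2 + h3) / 2.0
    out[0::2, 1::2] = (low - h1 + h2 - h3) / 2.0
    out[1::2, 0::2] = (low + h1 - h2 - h3) / 2.0
    out[1::2, 1::2] = (low - h1 - h2 + h3) / 2.0
    return out

image = np.arange(16.0).reshape(4, 4)
low, high = haar_dwt2(image)
restored = haar_idwt2(low, high)
print(low.shape, high.shape, np.allclose(restored, image))
```

The transform is exactly invertible, which matches the role it plays here: the decoding end can apply the inverse transform after the reversible network has been run backwards.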
Specifically, the encoding end can perform downsampling processing on the first component and the second component according to a first formula to obtain the second image. The first formula gives the output of the i-th substructure of the reversible neural network in terms of the first component, the second component, and the result of the reversible neural network's processing of the second component; the second image is the output of the last substructure of the reversible neural network.
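The invertibility of such a substructure can be sketched with an additive coupling step. This is an assumption-laden illustration, not the first formula itself: it assumes the output of a substructure is the first component plus a transform of the second component, and `phi` is a hypothetical fixed function standing in for the learned network.

```python
import numpy as np

def phi(h2):
    # Hypothetical stand-in for the learned transform applied to the
    # second (high-frequency) component; maps (3, H, W) to (H, W).
    return np.tanh(h2).mean(axis=0)

def coupling_forward(h1, h2):
    # Assumed additive update: the first component is shifted by a
    # function of the second component, which passes through unchanged.
    return h1 + phi(h2), h2

def coupling_inverse(y1, y2):
    # Exact inverse: subtract the same shift, computed from y2.
    return y1 - phi(y2), y2

rng = np.random.default_rng(0)
h1 = rng.normal(size=(2, 2))
h2 = rng.normal(size=(3, 2, 2))
y1, y2 = coupling_forward(h1, h2)
x1, x2 = coupling_inverse(y1, y2)
print(np.allclose(x1, h1), np.allclose(x2, h2))
```

Because the shift depends only on the component that is passed through unchanged, the step can be undone exactly no matter how complex `phi` is; stacking several such steps keeps the whole network reversible.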
It should be understood that the distortion loss function corresponding to the rescaling module shown in Figure 2 can be expressed by a formula in which D(θ) denotes the distortion loss function, θ the network parameters, N the number of training samples, and l2 the difference between the first image and the fourth image.
The difference between the first image and the fourth image can be calculated in several ways, which are not limited here. Optionally, the difference may be the mean square error (MSE) over the pixels of the first image and the fourth image, the sum of absolute differences (SAD) over those pixels, the structural similarity (SSIM) between the two images, or the multi-scale structural similarity (MS-SSIM) between the two images.
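As an illustration of the first two difference measures, the sketch below computes MSE and SAD between two images stored as lists of rows, and averages a chosen measure over N training pairs. The averaging over N is an assumption about the form of the distortion loss, and the function names are illustrative.

```python
def mse(img_a, img_b):
    """Mean square error over all pixel pairs of two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def sad(img_a, img_b):
    """Sum of absolute differences over all pixel pairs."""
    return sum(abs(a - b)
               for row_a, row_b in zip(img_a, img_b)
               for a, b in zip(row_a, row_b))


def distortion_loss(reconstructed, originals, metric=mse):
    """Average the per-sample difference over N (reconstructed, original) pairs."""
    n = len(originals)
    return sum(metric(r, o) for r, o in zip(reconstructed, originals)) / n
```

SSIM and MS-SSIM are similarity scores rather than distances, so a loss built on them would typically use something like one minus the score; that detail is not stated in the source.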
本实施例中,利用可逆神经网络对获取到的第一图像进行第一采样处理,减少直接压缩第一图像的码率,以此提高图像的压缩比。In this embodiment, a reversible neural network is used to perform a first sampling process on the acquired first image, thereby reducing the code rate of directly compressing the first image, thereby improving the compression ratio of the image.
Optionally, encoding and compressing the second image to generate the target code stream includes:
所述编码端使用神经网络编码器对所述第二图像进行压缩,得到所述第二图像对应的第一特征变量;The encoding end uses a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;
所述编码端对所述第一特征变量进行量化和算术编码,生成目标码流。The encoding end performs quantization and arithmetic coding on the first feature variable to generate a target code stream.
本实施例具体阐述如何使用神经网络编码器编码压缩第二图像。This embodiment specifically explains how to use a neural network encoder to encode and compress the second image.
Please refer to Figure 3. The "original image" in Figure 3 is the second image, and the "encoder" in Figure 3 is the neural network encoder. In this embodiment, the second image is processed by the neural network encoder to obtain the latent variable y, feature extraction is performed on the latent variable y to obtain the first feature variable, and the target code stream is generated by performing quantization and arithmetic coding on the first feature variable. Arithmetic coding is one type of entropy coding, and other types of entropy coding operations may also be used in this embodiment.
It should be understood that the rate loss function corresponding to the learnable compression encoding and decoding module shown in Figure 3 can be expressed by a formula in which R(θ) denotes the rate loss function, E the mathematical expectation, N the number of training samples, and y the latent variable; the formula also involves the fitted latent variable and its information entropy.
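To make the link between quantization and rate concrete, the sketch below quantizes a list of latent values by rounding and estimates the bits an ideal entropy coder would need from the empirical symbol histogram. Both choices are assumptions for illustration: a learned codec would use a learned probability model rather than a histogram, and the source does not specify the quantizer. Function names are illustrative.

```python
import math
from collections import Counter


def quantize(values):
    """Round each latent value to an integer (Python rounds exact halves to the nearest even integer)."""
    return [round(v) for v in values]


def empirical_entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def estimated_bits(symbols):
    """Approximate total code length for an ideal entropy coder."""
    return empirical_entropy_bits(symbols) * len(symbols)
```

A lower entropy of the quantized latent means a shorter code stream, which is why a rate term of this kind is minimized during training.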
可选地,所述方法还包括:Optionally, the method also includes:
所述编码端利用可逆神经网络对所述第一图像对应的至少部分原始特征图进行第一采样处理,得到第一特征图;The encoding end uses a reversible neural network to perform a first sampling process on at least part of the original feature map corresponding to the first image to obtain a first feature map;
在对所述第一图像对应的部分原始特征图进行第一采样处理的情况下,所述编码端编码压缩所述第一特征图和第二特征图,生成特征图码流;所述第二特征图为所述第一图像对应的未进行第一采样处理的原始特征图;When performing the first sampling process on part of the original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map and the second feature map to generate a feature map code stream; the second The feature map is the original feature map corresponding to the first image that has not been subjected to the first sampling process;
在对所述第一图像对应的全部原始特征图进行第一采样处理的情况下,所述编码端编码压缩所述第一特征图,生成特征图码流。When performing the first sampling process on all original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map to generate a feature map code stream.
本申请实施例提供的图像处理方法还可以应用于对特征图进行处理。The image processing method provided by the embodiment of the present application can also be applied to processing feature maps.
Specifically, after acquiring the first image, the encoding end can use a neural network to extract the original feature maps corresponding to the first image. The neural network includes but is not limited to feature pyramid networks (FPN) and Fast region-based convolutional neural networks (Fast R-CNN).
以下以使用FPN提取第一图像对应的原始特征图为示例进行说明:The following is an example of using FPN to extract the original feature map corresponding to the first image:
The first image in this example consists of 3 color channels and has a resolution of W×H, where W is the image width and H is the image height. FPN is used to extract features from the first image, yielding 4 original feature maps: P2, P3, P4 and P5. Each of P2, P3, P4 and P5 has its own resolution, and every original feature map has 256 channels.
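For orientation, the sketch below computes feature map shapes for a W×H input under the strides commonly used for FPN levels P2 to P5 (4, 8, 16 and 32). These strides are an assumption taken from the usual FPN configuration, not values confirmed by this text; the 256-channel count comes from the example above. The function name is illustrative.

```python
def fpn_shapes(width, height, strides=None, channels=256):
    """Return {level: (channels, height, width)} for each pyramid level.

    The default strides are the usual FPN choice and are assumed here.
    """
    if strides is None:
        strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}
    return {name: (channels, height // s, width // s)
            for name, s in strides.items()}
```

Under these assumed strides, P2 holds by far the most samples of the four maps, so it would be a natural candidate for the first sampling process when only part of the maps is downsampled.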
One optional implementation is that the encoding end uses a reversible neural network to perform the first sampling process on part of the original feature maps. In this implementation, the encoding end encodes and compresses the second feature map (that is, the original feature map that has not been subjected to the first sampling process) together with the first feature map obtained through the first sampling process, generating a feature map code stream. The specific way of performing the first sampling process on part of the original feature maps is the same as the first sampling process in the above embodiment and is not repeated here.
Another optional implementation is that the encoding end uses a reversible neural network to perform the first sampling process on all original feature maps. In this implementation, the encoding end encodes and compresses the first feature map obtained through the first sampling process, generating a feature map code stream.
可选地,编码端在进行编码压缩之前,可以对特征图进行归一化处理。Optionally, the encoding end can normalize the feature map before performing encoding and compression.
Specifically, for each batch of feature map data it receives, the neural network encoder determines the maximum value (norm_max) and minimum value (norm_min) of the feature map data within that batch, and normalizes the feature maps using the following formula:
val_new = (val_ori - norm_min) / (norm_max - norm_min)
where val_new is the value of a sample point after normalization and val_ori is its value before normalization.
It should be understood that during encoding, the encoding end also encodes the maximum value (norm_max) and minimum value (norm_min) and transmits them to the decoding end.
本实施例中,通过对特征图进行归一化处理,在后续的对特征图编码压缩的过程中,使得编码数据尽可能分散,减少编码过程中的损失,进而提升编码压缩效果。In this embodiment, by normalizing the feature map, in the subsequent process of encoding and compressing the feature map, the encoded data is dispersed as much as possible, thereby reducing the loss during the encoding process, thereby improving the encoding and compression effect.
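The batch normalization described above, together with the inverse step the decoding end would need once it has received the minimum and maximum, can be sketched as follows. Feature maps are represented as flat lists of values for brevity; the guard for a constant batch is an addition not discussed in the source, and the function names are illustrative.

```python
def normalize_batch(batch):
    """Min-max normalize every value in a batch of feature maps to [0, 1].

    Returns the normalized batch plus norm_min and norm_max, which must be
    sent to the decoding end so that the step can be undone.
    """
    flat = [v for fmap in batch for v in fmap]
    norm_min, norm_max = min(flat), max(flat)
    span = norm_max - norm_min
    if span == 0:
        # Assumed handling of a constant batch: map everything to 0.
        return [[0.0 for _ in fmap] for fmap in batch], norm_min, norm_max
    normalized = [[(v - norm_min) / span for v in fmap] for fmap in batch]
    return normalized, norm_min, norm_max


def denormalize_batch(normalized, norm_min, norm_max):
    """Invert normalize_batch using the transmitted minimum and maximum."""
    span = norm_max - norm_min
    return [[v * span + norm_min for v in fmap] for fmap in normalized]
```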
本实施例中,利用可逆神经网络对第一图像对应的至少部分原始特征图进行第一采样处理,减少直接压缩特征图的码流,以此提高特征图的压缩比。In this embodiment, a reversible neural network is used to perform a first sampling process on at least part of the original feature map corresponding to the first image, thereby reducing the code stream for directly compressing the feature map, thereby improving the compression ratio of the feature map.
下面结合附图,通过一些实施例及其应用场景对本申请实施例提供的应用于解码端的图像处理方法进行详细地说明。The image processing method provided by the embodiments of the present application and applied to the decoder will be described in detail below with reference to the accompanying drawings through some embodiments and their application scenarios.
请参阅图4,图4是本申请实施例提供的图像处理方法的流程图之二。本实施例提供的图像处理方法包括以下步骤:Please refer to FIG. 4 , which is the second flow chart of the image processing method provided by the embodiment of the present application. The image processing method provided in this embodiment includes the following steps:
S401,解码端解码获取到的目标码流,得到第三图像。S401. The decoder decodes the obtained target code stream and obtains the third image.
本步骤中,一种可选地实施方式为,解码端将获取到的目标码流输入至神经网络解码器,神经网络解码器解码目标码流,得到第三图像。In this step, an optional implementation method is that the decoding end inputs the obtained target code stream to the neural network decoder, and the neural network decoder decodes the target code stream to obtain the third image.
为便于理解,请参阅图3,图3是本申请实施例提供的可学习压缩编解码模块的示意图,图3示出的可学习压缩编解码模块中解码器即上述神经网络解码器。For ease of understanding, please refer to Figure 3. Figure 3 is a schematic diagram of a learnable compression coding and decoding module provided by an embodiment of the present application. The decoder in the learnable compression coding and decoding module shown in Figure 3 is the above-mentioned neural network decoder.
Another optional implementation is that the decoding end inputs the target code stream to an image decoder in the related art, and the image decoder decodes the target code stream to obtain the third image.
S402,所述解码端对所述第三图像进行第一操作,得到第四图像。S402: The decoder performs a first operation on the third image to obtain a fourth image.
本实施例中,上述第一操作包括以下任意一项:In this embodiment, the above-mentioned first operation includes any one of the following:
利用可逆神经网络对第三图像进行第二采样处理后得到第五图像,并对第五图像进行图像增强处理得到第四图像;对第三图像进行图像增强处理后,得到第六图像,并利用可逆神经网络对第六图像进行第二采样处理后得到第四图像。其中,第三图像的分辨率低于第四图像的分辨率。Using a reversible neural network to perform second sampling processing on the third image, a fifth image is obtained, and image enhancement processing is performed on the fifth image to obtain a fourth image; after image enhancement processing is performed on the third image, a sixth image is obtained, and using The reversible neural network performs a second sampling process on the sixth image to obtain a fourth image. Wherein, the resolution of the third image is lower than the resolution of the fourth image.
一种可选地实施方式为,先对第三图像进行第二采样处理后得到第五图像,再对第五图像进行图像增强处理得到第四图像。 An optional implementation manner is to first perform a second sampling process on the third image to obtain a fifth image, and then perform image enhancement processing on the fifth image to obtain a fourth image.
Specifically, the third image can be input into the rescaling module shown in Figure 2 to obtain the fifth image output by the rescaling module, and the enhancement module is then used to perform image enhancement processing on the fifth image to obtain the fourth image. For the specific implementation of how the rescaling module performs the second sampling process on the third image, please refer to subsequent embodiments.
另一种可选地实施方式为,先对第三图像进行图像增强处理得到第六图像,再对第六图像进行第二采样处理得到第四图像。Another optional implementation is to first perform image enhancement processing on the third image to obtain a sixth image, and then perform second sampling processing on the sixth image to obtain a fourth image.
具体而言,可以使用增强模块对第三图像进行图像增强处理得到第六图像,进而将第六图像输入至图2示出的重缩放模块中,得到重缩放模块输出的第四图像。具体的如何使用重缩放模块对第六图像进行第二采样处理的实施方式,请参阅后续实施例。Specifically, the enhancement module can be used to perform image enhancement processing on the third image to obtain a sixth image, and then the sixth image is input into the rescaling module shown in FIG. 2 to obtain a fourth image output by the rescaling module. For a specific implementation of how to use the rescaling module to perform the second sampling process on the sixth image, please refer to subsequent embodiments.
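The two orderings of the first operation can be sketched as a single function that takes the upsampling and enhancement steps as callables. The toy stand-ins below (nearest-neighbour upsampling and clamping) are hypothetical placeholders for the rescaling module and the enhancement module, chosen only so that the example runs; all names are illustrative.

```python
def first_operation(third_image, upsample, enhance, enhance_first=False):
    """Apply the second sampling process and image enhancement in either order."""
    if enhance_first:
        sixth_image = enhance(third_image)   # enhance, then upsample
        return upsample(sixth_image)
    fifth_image = upsample(third_image)      # upsample, then enhance
    return enhance(fifth_image)


def toy_upsample(img):
    """Nearest-neighbour 2x upsampling, standing in for the rescaling module."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]


def toy_enhance(img):
    """Clamp to [0, 1], standing in for the enhancement module."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in img]
```

In the upsample-first ordering the enhancement step runs on four times as many pixels with these toy stand-ins, so the choice of ordering affects enhancement cost as well as quality.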
It should be understood that the enhancement module is based on the Residual Channel Attention Network (RCAN). Optionally, the number of residual groups in the RCAN network is set to 5, and the number of Residual Channel Attention Blocks (RCAB) in the RCAN network is set to 10. In other embodiments, the enhancement module can also be built on other neural networks.
It should be understood that the enhancement loss function corresponding to the enhancement module can be expressed by a formula in which E(θ) denotes the enhancement loss function, N the number of training samples, and l2 the difference between the image before enhancement processing (that is, the third image or the fifth image) and the image after enhancement processing (that is, the fourth image or the sixth image).
增强处理前的图像与增强处理后的图像之间的差异的计算可以有几种实现方法,在此不做限制。可选地,上述增强处理前的图像与增强处理后的图像之间的差异可以是增强处理前的图像与增强处理后的图像之间各像素点的均方误差,可以是增强处理前的图像与增强处理后的图像之间各像素点的差异绝对值总和,可以是增强处理前的图像与增强处理后的图像之间各像素点的结构相似性,可以是增强处理前的图像与增强处理后的图像之间各像素点的多尺度结构相似性。The calculation of the difference between the image before enhancement and the image after enhancement can be implemented in several ways, which are not limited here. Optionally, the difference between the above-mentioned image before enhancement processing and the image after enhancement processing may be the mean square error of each pixel between the image before enhancement processing and the image after enhancement processing, or may be the difference between the image before enhancement processing and the image after enhancement processing. The sum of the absolute values of the differences of each pixel between the image before enhancement and the image after enhancement can be the structural similarity of each pixel between the image before enhancement and the image after enhancement, or it can be the difference between the image before enhancement and the image after enhancement. The multi-scale structural similarity of each pixel between the resulting images.
本步骤中,通过使用增强模块进行图像增强处理,提高了图像质量。In this step, the image quality is improved by using the enhancement module for image enhancement processing.
可选地,所述解码端对所述第三图像进行第一操作包括:Optionally, the decoder performing the first operation on the third image includes:
所述解码端对目标图像进行上采样处理得到第三分量,根据分量信息确定所述目标图像对应的第四分量;所述目标图像为所述第三图像或所述第六图像;The decoding end performs an upsampling process on the target image to obtain a third component, and determines a fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;
所述解码端对所述第三分量和所述第四分量进行离散小波反变换。The decoding end performs inverse discrete wavelet transform on the third component and the fourth component.
This embodiment specifically explains how to use the rescaling module when performing the first operation on the third image.
本实施例中,将目标图像输入至图2示出的仿射耦合层,经过一系列的仿射耦合层处理后,输出目标图像对应的第三分量和第四分量,对上述第三分量和第四分量进行离散小波反变换,得到第四图像。其中,图2中的增强图像即目标图像,图2中的重建图像即第四图像,目标图像为第三图像或第六图像,第三分量为目标图像对应的低频分量,第四分量为目标图像对应的高频分量。 In this embodiment, the target image is input to the affine coupling layer shown in Figure 2. After a series of affine coupling layer processing, the third component and the fourth component corresponding to the target image are output. The above third component and The fourth component is subjected to inverse discrete wavelet transform to obtain the fourth image. Among them, the enhanced image in Figure 2 is the target image, the reconstructed image in Figure 2 is the fourth image, the target image is the third image or the sixth image, the third component is the low-frequency component corresponding to the target image, and the fourth component is the target The high-frequency component corresponding to the image.
Specifically, the target image can be upsampled according to a second formula to obtain the third component. The second formula relates the third component to the output of the (i+1)-th substructure of the reversible neural network and to the result of the reversible neural network's processing of the fourth component; the third image is the output of the last substructure of the reversible neural network.
The fourth component corresponding to the target image can be determined according to a third formula, which expresses the fourth component in terms of the component information corresponding to the fourth component, the result of the reversible neural network's processing of that component information, and a preset parameter α.
可选地,所述解码获取到的所述目标码流,得到所述第三图像包括:Optionally, decoding the acquired target code stream to obtain the third image includes:
所述解码端对获取到的所述目标码流进行算术解码和反量化,得到第二特征变量;The decoding end performs arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;
所述解码端使用神经网络解码器对所述第二特征变量进行解压缩,得到所述第三图像。The decoding end uses a neural network decoder to decompress the second feature variable to obtain the third image.
本实施例具体阐述如何使用神经网络解码器解码获取到的目标码流。This embodiment specifically explains how to use a neural network decoder to decode the obtained target code stream.
Please refer to Figure 3. The "compressed image" in Figure 3 is the third image, and the "decoder" in Figure 3 is the neural network decoder. In this embodiment, the decoding end performs arithmetic decoding and inverse quantization on the acquired target code stream to obtain the decoded latent variable, performs feature extraction on that latent variable to obtain the second feature variable, and uses the neural network decoder to decompress the second feature variable to obtain the third image. Arithmetic decoding is one type of entropy decoding, and other types of entropy decoding operations may also be used in this embodiment.
To facilitate understanding of the overall technical solution, please refer to Figure 5. In Figure 5, X_H denotes the first image, X_L denotes the second image, and X_E denotes the image after image enhancement processing; the third image and the fourth image are also marked. The distortion loss is the image loss between the first image and the fourth image, the enhancement loss is the image loss between the second image and the image after image enhancement processing, the quality gap is the image quality gap between the second image and the third image, and the rate loss is the loss incurred by the lossy encoding of the learnable compression encoder.
应理解,图5示出的应用框架对应的总损失函数是上述率损失函数和失真损失函数的加权和,其中,增强损失函数仅用于增强模块的训练阶段,不包括在总损失函数中。It should be understood that the total loss function corresponding to the application framework shown in Figure 5 is the weighted sum of the above-mentioned rate loss function and distortion loss function, where the enhancement loss function is only used in the training phase of the enhancement module and is not included in the total loss function.
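Written out, the weighted sum described above might take the following form. The weight symbols λ_R and λ_D are illustrative names that do not appear in the source, and their values are not specified there.

```latex
\mathcal{L}(\theta) = \lambda_R \, R(\theta) + \lambda_D \, D(\theta)
```

Increasing λ_R relative to λ_D would favor a smaller code stream at the cost of reconstruction fidelity, and the reverse would favor fidelity.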
图5示出的应用框架包括的各个神经网络模块在实际的训练过程中,一种可能实现的方式为:先对重缩放模块和可学习压缩编解码器共同训练,并单独训练增强模块,在上述三个模块训练完成后,在重缩放模块和可学习压缩编解码器的共同作用下,对增强模块中的神经网络参数进行调整,最后,以端到端的方式调整上述三个模块中的神经网络参数。In the actual training process of each neural network module included in the application framework shown in Figure 5, one possible way to implement it is to first train the rescaling module and the learnable compression codec together, and then train the enhancement module separately. After the training of the above three modules is completed, the neural network parameters in the enhancement module are adjusted under the joint action of the rescaling module and the learnable compression codec. Finally, the neural network parameters in the above three modules are adjusted in an end-to-end manner. Network parameters.
另一种可能实现的方式为:重缩放模块、可学习压缩编解码器和增强模块分别独立训练。Another possible implementation is to train the rescaling module, learnable compression codec and enhancement module independently.
在图5示出的应用场景中,编码端使用重缩放模块对第一图像进行第一采样处理,得到第二图像;编码端使用可学习压缩编解码器对第二图像进行编码,得到目标码流。解码端使用可学习压缩编解码器对目标码流解码,得到第三图像;解码端使用增强模块对第三图像进行图像增强处理,得到图像增强处理后的图像;解码端使用重缩放模块对图像增强处理后的图像进行第二采样处理,得到第四图像。 In the application scenario shown in Figure 5, the encoding end uses the rescaling module to perform the first sampling process on the first image to obtain the second image; the encoding end uses the learnable compression codec to encode the second image to obtain the target code flow. The decoding end uses the learnable compression codec to decode the target code stream to obtain the third image; the decoding end uses the enhancement module to perform image enhancement processing on the third image to obtain the image after image enhancement; the decoding end uses the rescaling module to perform image enhancement on the image. The enhanced image is subjected to a second sampling process to obtain a fourth image.
可选地,解码端在得到第三图像后,可以先使用重缩放模块对第三图像进行第二采样处理,再使用增强模块对第二采样处理后的图像进行图像增强处理。Optionally, after obtaining the third image, the decoder may first use the rescaling module to perform the second sampling process on the third image, and then use the enhancement module to perform image enhancement processing on the image after the second sampling process.
可选地,可学习压缩编解码器包括重缩放模块,即重缩放模块作为可学习压缩编解码器中的一部分。Optionally, the learnable compression codec includes a rescaling module, ie, the rescaling module is included as part of the learnable compression codec.
可选地,可以使用相关技术中的编解码器替换上述可学习压缩编解码器。Optionally, the above-mentioned learnable compression codec may be replaced with a codec in related art.
可选地,所述方法还包括:Optionally, the method also includes:
所述解码端在解码获取到的特征图码流得到第三特征图的情况下,对所述第三特征图进行第一操作,确定所述第四图像对应的全部重建特征图;When the decoder decodes the obtained feature map code stream to obtain a third feature map, the decoder performs a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;
When the decoding end decodes the obtained feature map code stream to obtain a third feature map and partial reconstructed feature maps, it determines all reconstructed feature maps corresponding to the fourth image based on the partial reconstructed feature maps and a fourth feature map, where the fourth feature map is determined by performing the first operation on the third feature map.
一种可选地实施方式为,解码端解码获取到的特征图码流,得到第三特征图。这种实施方式下,对第三特征图进行第一操作,确定第四图像对应的全部重建特征图。An optional implementation manner is that the decoding end decodes the obtained feature map code stream to obtain the third feature map. In this implementation manner, a first operation is performed on the third feature map to determine all reconstructed feature maps corresponding to the fourth image.
另一种可选地实施方式为,解码端解码获取到的特征图码流,得到第三特征图和部分重建特征图。这种实施方式下,对第三特征图进行第一操作,确定第四特征图,并将部分重建特征图和第四特征图,确定为第四图像对应的全部重建特征图。Another optional implementation is that the decoder decodes the obtained feature map code stream to obtain the third feature map and the partially reconstructed feature map. In this implementation manner, a first operation is performed on the third feature map to determine the fourth feature map, and the partially reconstructed feature map and the fourth feature map are determined as all reconstructed feature maps corresponding to the fourth image.
应理解,第一操作包括以下任意一项:利用可逆神经网络对第三特征图进行第二采样处理,并对第二采样处理后的第三特征图进行图像增强处理;对第三特征图进行图像增强处理,并对图像增强处理后的第三特征图进行第二采样处理。上述第一操作的具体实施方式可以参阅上述实施例的内容,在此不做重复阐述。It should be understood that the first operation includes any one of the following: using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process; performing a second sampling process on the third feature map. Image enhancement processing, and performing a second sampling process on the third feature map after the image enhancement processing. For the specific implementation of the above-mentioned first operation, please refer to the contents of the above-mentioned embodiments, and will not be repeated here.
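The two decoding cases for feature maps can be sketched as one helper: maps that were downsampled at the encoding end go through the first operation, and any maps that were transmitted without downsampling are passed through unchanged. The dictionary layout keyed by level name and the function name are assumptions made for illustration.

```python
def assemble_reconstructed_maps(third_maps, first_operation, passthrough_maps=None):
    """Build the full set of reconstructed feature maps at the decoding end.

    third_maps:       {level: decoded low-resolution map} to be restored
    first_operation:  callable that restores one map (second sampling plus enhancement)
    passthrough_maps: {level: decoded map} that never went through the first sampling process
    """
    reconstructed = dict(passthrough_maps or {})
    for level, fmap in third_maps.items():
        reconstructed[level] = first_operation(fmap)  # the "fourth feature map"
    return reconstructed
```

When `passthrough_maps` is empty, this reduces to the first case, where every reconstructed map comes from the first operation.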
本申请实施例提供的图像处理方法,执行主体可以为图像处理装置。本申请实施例中以图像处理装置应用于编码端执行图像处理方法为例,说明本申请实施例提供的图像处理装置。For the image processing method provided by the embodiments of the present application, the execution subject may be an image processing device. In the embodiment of the present application, the image processing device is used at the encoding end to perform the image processing method as an example to illustrate the image processing device provided by the embodiment of the present application.
如图6所示,本申请实施例还提供了一种图像处理装置600,包括:As shown in Figure 6, this embodiment of the present application also provides an image processing device 600, including:
第一处理模块601,用于利用可逆神经网络对获取到的第一图像进行第一采样处理,得到第二图像;所述第二图像的分辨率低于所述第一图像的分辨率;The first processing module 601 is configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;
第一生成模块602,用于编码压缩所述第二图像,生成目标码流。The first generation module 602 is used to encode and compress the second image to generate a target code stream.
可选地,所述第一处理模块601,具体用于:Optionally, the first processing module 601 is specifically used to:
对获取到的第一图像进行离散小波变换,得到所述第一图像对应的第一分量和第二分量;Perform discrete wavelet transform on the acquired first image to obtain the first component and the second component corresponding to the first image;
对所述第一分量和所述第二分量进行下采样处理,得到第二图像。Perform downsampling processing on the first component and the second component to obtain a second image.
可选地,所述第一生成模块602,具体用于:Optionally, the first generation module 602 is specifically used to:
使用神经网络编码器对所述第二图像进行压缩,得到所述第二图像对应的第一特征变量;Use a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;
对所述第一特征变量进行量化和算术编码,生成目标码流。The first feature variable is quantized and arithmetic encoded to generate a target code stream.
可选地,所述图像处理装置600,还包括: Optionally, the image processing device 600 also includes:
第二处理模块,用于利用可逆神经网络对所述第一图像对应的至少部分原始特征图进行第一采样处理,得到第一特征图;A second processing module configured to use a reversible neural network to perform first sampling processing on at least part of the original feature map corresponding to the first image to obtain a first feature map;
第二生成模块,用于在对所述第一图像对应的部分原始特征图进行第一采样处理的情况下,编码压缩所述第一特征图和第二特征图,生成特征图码流;A second generation module, configured to encode and compress the first feature map and the second feature map to generate a feature map code stream when performing the first sampling process on part of the original feature map corresponding to the first image;
第三生成模块,用于在对所述第一图像对应的全部原始特征图进行第一采样处理的情况下,编码压缩所述第一特征图,生成特征图码流。The third generation module is configured to encode and compress the first feature map to generate a feature map code stream when performing the first sampling process on all original feature maps corresponding to the first image.
本申请实施例提供的图像处理方法,执行主体可以为图像处理装置。本申请实施例中以图像处理装置应用于解码端执行图像处理方法为例,说明本申请实施例提供的图像处理装置。For the image processing method provided by the embodiments of the present application, the execution subject may be an image processing device. In the embodiment of the present application, the image processing device is used in the decoder to perform the image processing method as an example to illustrate the image processing device provided by the embodiment of the present application.
如图7所示,本申请实施例还提供了一种图像处理装置700,包括:As shown in Figure 7, this embodiment of the present application also provides an image processing device 700, including:
解码模块701,用于解码获取到的目标码流,得到第三图像;The decoding module 701 is used to decode the obtained target code stream to obtain the third image;
操作模块702,用于对所述第三图像进行第一操作,得到第四图像;所述第三图像的分辨率低于所述第四图像的分辨率;The operation module 702 is used to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;
其中,所述第一操作包括以下任意一项:Wherein, the first operation includes any one of the following:
利用可逆神经网络对所述第三图像进行第二采样处理后得到第五图像,并对所述第五图像进行图像增强处理得到所述第四图像;Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
对所述第三图像进行图像增强处理后,得到第六图像,并利用可逆神经网络对所述第六图像进行第二采样处理后得到所述第四图像。After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
可选地,所述操作模块702,具体用于:Optionally, the operation module 702 is specifically used to:
对目标图像进行上采样处理得到第三分量,根据分量信息确定所述目标图像对应的第四分量;所述目标图像为所述第三图像或所述第六图像;Perform an upsampling process on the target image to obtain a third component, and determine the fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;
对所述第三分量和所述第四分量进行离散小波反变换。The third component and the fourth component are subjected to an inverse discrete wavelet transform.
可选地,所述解码模块701,具体用于:Optionally, the decoding module 701 is specifically used to:
对获取到的所述目标码流进行算术解码和反量化,得到第二特征变量;Perform arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;
使用神经网络解码器对所述第二特征变量进行解压缩,得到所述第三图像。Use a neural network decoder to decompress the second feature variable to obtain the third image.
Optionally, the image processing device 700 further includes:
a first determination module, configured to, when the decoding end decodes the acquired feature map code stream to obtain a third feature map, perform a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;
a second determination module, configured to, when the decoding end decodes the acquired feature map code stream to obtain a third feature map and partially reconstructed feature maps, determine all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature maps and a fourth feature map, where the fourth feature map is determined by performing the first operation on the third feature map;
wherein the first operation includes any one of the following:
performing second sampling processing on the third feature map using a reversible neural network, and performing image enhancement processing on the third feature map after the second sampling processing;
performing image enhancement processing on the third feature map, and performing second sampling processing on the third feature map after the image enhancement processing.
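By way of illustration only, the second determination module's merging step can be sketched as follows. Indexing feature maps by channel number, and the helper names `assemble_reconstructed_maps` and `nearest_upsample_2x`, are assumptions of this sketch. The first operation is passed in as a callable so that either ordering can be used.

```python
from typing import Callable, Dict, List

FeatureMap = List[List[float]]


def nearest_upsample_2x(fmap: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for the first operation on a feature map."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def assemble_reconstructed_maps(
    partial: Dict[int, FeatureMap],
    third: Dict[int, FeatureMap],
    first_operation: Callable[[FeatureMap], FeatureMap],
) -> List[FeatureMap]:
    """Combine directly decoded maps with maps restored by the first operation."""
    # Fourth feature maps: the result of the first operation on each third map.
    fourth = {ch: first_operation(fm) for ch, fm in third.items()}
    overlap = set(partial) & set(fourth)
    if overlap:
        raise ValueError(f"channels decoded twice: {sorted(overlap)}")
    merged = {**partial, **fourth}
    return [merged[ch] for ch in sorted(merged)]  # channel order is assumed
```

When the code stream carries only third feature maps, `partial` is empty and the function reduces to the first determination module's case.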
In the embodiments of this application, the encoding end uses a reversible neural network to perform first sampling processing on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling processing with a reversible neural network improves the compression ratio of the image. The decoding end decodes the acquired target code stream to obtain a third image and then performs a first operation on the third image. The first operation includes image enhancement processing, which improves image quality, so that image quality and compression ratio are both improved.
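By way of illustration only, the property that makes a neural network reversible can be sketched with an additive coupling step, a common building block of invertible networks. Whether the embodiments use this particular block is not stated; the functions `coupling_forward`, `coupling_inverse` and `toy_subnet` are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def coupling_forward(
    x1: Vector, x2: Vector, f: Callable[[Vector], Vector]
) -> Tuple[Vector, Vector]:
    """Forward pass: keep x1 unchanged and shift x2 by f(x1)."""
    shift = f(x1)
    return list(x1), [b + s for b, s in zip(x2, shift)]


def coupling_inverse(
    y1: Vector, y2: Vector, f: Callable[[Vector], Vector]
) -> Tuple[Vector, Vector]:
    """Inverse pass: recompute the same shift from y1 and subtract it."""
    shift = f(y1)
    return list(y1), [b - s for b, s in zip(y2, shift)]


def toy_subnet(v: Vector) -> Vector:
    """Arbitrary, non-invertible function; the coupling step stays invertible."""
    return [0.5 * a * a + 1.0 for a in v]
```

Because the inverse reuses the same function `f`, the same parameters can serve the first sampling processing at the encoding end and the second sampling processing at the decoding end.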
The image processing device applied to the encoding end provided by the embodiments of this application can implement each process of the method embodiment of Figure 1 and achieve the same technical effects. To avoid repetition, details are not described here again.
The image processing device applied to the decoding end provided by the embodiments of this application can implement each process of the method embodiment of Figure 4 and achieve the same technical effects. To avoid repetition, details are not described here again.
The image processing device in the embodiments of this application may be an electronic device, for example an electronic device with an operating system, or a component of an electronic device, for example an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the terminal may include, but is not limited to, the types of terminal listed above, and the other device may be a server, network attached storage (Network Attached Storage, NAS), or the like, which is not specifically limited in the embodiments of this application.
Optionally, as shown in Figure 8, an embodiment of this application further provides a communication device 800, including a processor 801 and a memory 802. The memory 802 stores programs or instructions executable on the processor 801. For example, when the communication device 800 is a terminal, the programs or instructions, when executed by the processor 801, implement the steps of the above image processing method embodiments and achieve the same technical effects.
An embodiment of this application further provides a terminal, including a processor 801 and a communication interface. The processor 801 is configured to perform the following operations:
performing first sampling processing on the acquired first image using a reversible neural network to obtain a second image, where the resolution of the second image is lower than the resolution of the first image;
encoding and compressing the second image to generate a target code stream.
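By way of illustration only, the resolution reduction at the encoding end can be sketched with a single-level Haar discrete wavelet transform. Using the Haar wavelet, and treating the low-frequency band `ll` as the half-resolution image that is then compressed, are simplifying assumptions of this sketch. The input is assumed to have even height and width.

```python
from typing import List, Tuple

Band = List[List[float]]


def forward_haar_2d(img: Band) -> Tuple[Band, Band, Band, Band]:
    """Split a 2H x 2W image into four H x W orthonormal Haar sub-bands."""
    height, width = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * width for _ in range(height)]
    lh = [[0.0] * width for _ in range(height)]
    hl = [[0.0] * width for _ in range(height)]
    hh = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # low-frequency average
            lh[i][j] = (a - b + c - d) / 2  # left-right difference
            hl[i][j] = (a + b - c - d) / 2  # top-bottom difference
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh
```

Each sub-band has half the width and half the height of the input, so coding only the low-frequency band reduces the number of samples passed to the compression stage by a factor of four.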
Alternatively, the processor 801 is configured to perform the following operations:
decoding the acquired target code stream to obtain a third image;
performing a first operation on the third image to obtain a fourth image, where the resolution of the third image is lower than the resolution of the fourth image;
wherein the first operation includes any one of the following:
performing second sampling processing on the third image using a reversible neural network to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
performing image enhancement processing on the third image to obtain a sixth image, and performing second sampling processing on the sixth image using a reversible neural network to obtain the fourth image.
This terminal embodiment corresponds to the terminal-side method embodiments above; each implementation process and implementation manner of those method embodiments applies to this terminal embodiment and achieves the same technical effects. Specifically, Figure 9 is a schematic diagram of the hardware structure of a terminal that implements an embodiment of this application.
The terminal 900 includes, but is not limited to, components such as a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, and a processor 910.
Those skilled in the art will understand that the terminal 900 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 910 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The terminal structure shown in Figure 9 does not limit the terminal; the terminal may include more or fewer components than shown, combine certain components, or arrange the components differently, which is not described here again.
It should be understood that, in the embodiments of this application, the input unit 904 may include a graphics processing unit (Graphics Processing Unit, GPU) 9041 and a microphone 9042. The graphics processing unit 9041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 907 includes at least one of a touch panel 9071 and other input devices 9072. The touch panel 9071 is also called a touch screen and may include two parts: a touch detection device and a touch controller. The other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described here again.
In the embodiments of this application, after receiving downlink data from a network-side device, the radio frequency unit 901 may transmit the data to the processor 910 for processing; the radio frequency unit 901 may also send uplink data to the network-side device. Generally, the radio frequency unit 901 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
The memory 909 may be used to store software programs or instructions and various data. The memory 909 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playback function or an image playback function). In addition, the memory 909 may include volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synch link DRAM, SLDRAM), or direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 909 in the embodiments of this application includes, but is not limited to, these and any other suitable types of memory.
The processor 910 may include one or more processing units. Optionally, the processor 910 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 910.
The processor 910 is configured to perform the following operations:
performing first sampling processing on the acquired first image using a reversible neural network to obtain a second image, where the resolution of the second image is lower than the resolution of the first image;
encoding and compressing the second image to generate a target code stream.
Alternatively, the processor 910 is configured to perform the following operations:
decoding the acquired target code stream to obtain a third image;
performing a first operation on the third image to obtain a fourth image, where the resolution of the third image is lower than the resolution of the fourth image;
wherein the first operation includes any one of the following:
performing second sampling processing on the third image using a reversible neural network to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
performing image enhancement processing on the third image to obtain a sixth image, and performing second sampling processing on the sixth image using a reversible neural network to obtain the fourth image.
An embodiment of this application further provides a readable storage medium storing programs or instructions. When executed by a processor, the programs or instructions implement each process of the above image processing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
The processor is the processor in the terminal described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of this application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run programs or instructions to implement each process of the above image processing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of this application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of this application further provides a computer program/program product stored in a storage medium. The computer program/program product is executed by at least one processor to implement each process of the above image processing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
An embodiment of this application further provides a system including an encoding end and a decoding end. The encoding end performs each process of the above image processing method embodiments applied to the encoding end, and the decoding end performs each process of the above image processing method embodiments applied to the decoding end, achieving the same technical effects. To avoid repetition, details are not described here again.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, the scope of the methods and devices in the embodiments of this application is not limited to performing functions in the order shown or discussed; it may also include performing functions in a substantially simultaneous manner or in reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the related art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the specific implementations described, which are illustrative rather than restrictive. Inspired by this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims, all of which fall within the protection of this application.

Claims (20)

  1. An image processing method, comprising:
    performing, by an encoding end, first sampling processing on an acquired first image using a reversible neural network to obtain a second image, wherein the resolution of the second image is lower than the resolution of the first image;
    encoding and compressing, by the encoding end, the second image to generate a target code stream.
  2. The method according to claim 1, wherein performing first sampling processing on the acquired first image using a reversible neural network to obtain the second image comprises:
    performing, by the encoding end, a discrete wavelet transform on the acquired first image to obtain a first component and a second component corresponding to the first image;
    performing, by the encoding end, downsampling processing on the first component and the second component to obtain the second image.
  3. The method according to claim 1 or 2, wherein encoding and compressing the second image to generate the target code stream comprises:
    compressing, by the encoding end, the second image using a neural network encoder to obtain a first feature variable corresponding to the second image;
    performing, by the encoding end, quantization and arithmetic coding on the first feature variable to generate the target code stream.
  4. The method according to claim 1, wherein the method further comprises:
    performing, by the encoding end, first sampling processing on at least part of the original feature maps corresponding to the first image using a reversible neural network to obtain a first feature map;
    in a case where the first sampling processing is performed on part of the original feature maps corresponding to the first image, encoding and compressing, by the encoding end, the first feature map and a second feature map to generate a feature map code stream, wherein the second feature map is an original feature map corresponding to the first image on which the first sampling processing has not been performed;
    in a case where the first sampling processing is performed on all of the original feature maps corresponding to the first image, encoding and compressing, by the encoding end, the first feature map to generate a feature map code stream.
  5. An image processing method, comprising:
    decoding, by a decoding end, an acquired target code stream to obtain a third image;
    performing, by the decoding end, a first operation on the third image to obtain a fourth image, wherein the resolution of the third image is lower than the resolution of the fourth image;
    wherein the first operation comprises any one of the following:
    performing second sampling processing on the third image using a reversible neural network to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
    performing image enhancement processing on the third image to obtain a sixth image, and performing second sampling processing on the sixth image using a reversible neural network to obtain the fourth image.
  6. The method according to claim 5, wherein performing, by the decoding end, the first operation on the third image comprises:
    performing, by the decoding end, upsampling processing on a target image to obtain a third component, and determining a fourth component corresponding to the target image according to component information, wherein the target image is the third image or the sixth image;
    performing, by the decoding end, an inverse discrete wavelet transform on the third component and the fourth component.
  7. The method according to claim 5 or 6, wherein decoding the acquired target code stream to obtain the third image comprises:
    performing, by the decoding end, arithmetic decoding and inverse quantization on the acquired target code stream to obtain a second feature variable;
    decompressing, by the decoding end, the second feature variable using a neural network decoder to obtain the third image.
  8. The method according to claim 5, wherein the method further comprises:
    in a case where the decoding end decodes an acquired feature map code stream to obtain a third feature map, performing, by the decoding end, a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;
    in a case where the decoding end decodes an acquired feature map code stream to obtain a third feature map and partially reconstructed feature maps, determining, by the decoding end, all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature maps and a fourth feature map, wherein the fourth feature map is determined by performing the first operation on the third feature map;
    wherein the first operation comprises any one of the following:
    performing second sampling processing on the third feature map using a reversible neural network, and performing image enhancement processing on the third feature map after the second sampling processing;
    performing image enhancement processing on the third feature map, and performing second sampling processing on the third feature map after the image enhancement processing.
  9. An image processing device, applied to an encoding end, the device comprising:
    a first processing module, configured to perform first sampling processing on an acquired first image using a reversible neural network to obtain a second image, wherein the resolution of the second image is lower than the resolution of the first image;
    a first generation module, configured to encode and compress the second image to generate a target code stream.
  10. The device according to claim 9, wherein the first processing module is specifically configured to:
    perform a discrete wavelet transform on the acquired first image to obtain a first component and a second component corresponding to the first image;
    perform downsampling processing on the first component and the second component to obtain the second image.
  11. The device according to claim 9 or 10, wherein the first generation module is specifically configured to:
    compress the second image using a neural network encoder to obtain a first feature variable corresponding to the second image;
    perform quantization and arithmetic coding on the first feature variable to generate the target code stream.
  12. The device according to claim 9, wherein the device further comprises:
    a second processing module, configured to perform first sampling processing on at least part of the original feature maps corresponding to the first image using a reversible neural network to obtain a first feature map;
    a second generation module, configured to, in a case where the first sampling processing is performed on part of the original feature maps corresponding to the first image, encode and compress the first feature map and a second feature map to generate a feature map code stream, wherein the second feature map is an original feature map corresponding to the first image on which the first sampling processing has not been performed;
    a third generation module, configured to, in a case where the first sampling processing is performed on all of the original feature maps corresponding to the first image, encode and compress the first feature map to generate a feature map code stream.
  13. 一种图像处理装置,所述装置应用于解码端,所述装置包括:An image processing device, the device is applied to the decoding end, and the device includes:
    解码模块,用于解码获取到的目标码流,得到第三图像;The decoding module is used to decode the obtained target code stream and obtain the third image;
    操作模块,用于对所述第三图像进行第一操作,得到第四图像;所述第三图像的分辨率低于所述第四图像的分辨率;An operation module, configured to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;
    其中,所述第一操作包括以下任意一项:Wherein, the first operation includes any one of the following:
    利用可逆神经网络对所述第三图像进行第二采样处理后得到第五图像,并对所述第五图像进行图像增强处理得到所述第四图像;Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;
    对所述第三图像进行图像增强处理后,得到第六图像,并利用可逆神经网络对所述第六图像进行第二采样处理后得到所述第四图像。After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
  14. 根据权利要求13所述的装置,其中,所述操作模块,具体用于:The device according to claim 13, wherein the operation module is specifically used for:
    对目标图像进行上采样处理得到第三分量,根据分量信息确定所述目标图像对应的第四分量;所述目标图像为所述第三图像或所述第六图像;Perform an upsampling process on the target image to obtain a third component, and determine the fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;
    对所述第三分量和所述第四分量进行离散小波反变换。The third component and the fourth component are subjected to an inverse discrete wavelet transform.
  15. 根据权利要求13或14所述的装置,其中,所述解码模块,具体用于:The device according to claim 13 or 14, wherein the decoding module is specifically used for:
    对获取到的所述目标码流进行算术解码和反量化,得到第二特征变量;Perform arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;
    使用神经网络解码器对所述第二特征变量进行解压缩,得到所述第三图像。Use a neural network decoder to decompress the second feature variable to obtain the third image.
  16. The apparatus according to claim 13, wherein the apparatus further comprises:
    a first determination module, configured for the decoding end to, in a case where a third feature map is obtained by decoding an obtained feature map code stream, perform a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;
    a second determination module, configured for the decoding end to, in a case where a third feature map and partial reconstructed feature maps are obtained by decoding the obtained feature map code stream, determine, based on the partial reconstructed feature maps and a fourth feature map, all reconstructed feature maps corresponding to the fourth image, wherein the fourth feature map is determined by performing the first operation on the third feature map;
    wherein the first operation includes any one of the following:
    performing second sampling processing on the third feature map by using a reversible neural network, and performing image enhancement processing on the third feature map after the second sampling processing;
    performing image enhancement processing on the third feature map, and performing second sampling processing on the third feature map after the image enhancement processing.
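The property that makes a reversible neural network suitable for the sampling step is that its forward mapping can be undone exactly. The claim does not describe the network's internal structure, so the additive coupling layer below is an assumption: it is one common building block of invertible networks, shown here only to illustrate exact invertibility. The names `coupling_forward`, `coupling_inverse`, and the transform `f` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Transform = Callable[[Sequence[float]], List[float]]


def coupling_forward(
    x1: Sequence[float], x2: Sequence[float], f: Transform
) -> Tuple[List[float], List[float]]:
    """Additive coupling: pass x1 through unchanged and shift x2 by f(x1)."""
    y1 = list(x1)
    y2 = [b + t for b, t in zip(x2, f(x1))]
    return y1, y2


def coupling_inverse(
    y1: Sequence[float], y2: Sequence[float], f: Transform
) -> Tuple[List[float], List[float]]:
    """Exact inverse: f(y1) equals f(x1), so subtracting it recovers x2."""
    x1 = list(y1)
    x2 = [b - t for b, t in zip(y2, f(y1))]
    return x1, x2
```

The transform `f` may be any function, including a learned sub-network, and does not itself need to be invertible, because the inverse pass only re-evaluates `f` on the unchanged half.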
  17. A terminal, comprising a processor and a memory, wherein the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the image processing method according to any one of claims 1-4, or implement the steps of the image processing method according to any one of claims 5-8.
  18. A readable storage medium, wherein the readable storage medium stores a program or instructions, and the program or instructions, when executed by a processor, implement the steps of the image processing method according to any one of claims 1-4, or implement the steps of the image processing method according to any one of claims 5-8.
  19. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the steps of the image processing method according to any one of claims 1-4, or to implement the steps of the image processing method according to any one of claims 5-8.
  20. A computer program/program product, wherein the computer program/program product is stored in a storage medium and is executed by at least one processor to implement the steps of the image processing method according to any one of claims 1-4, or to implement the steps of the image processing method according to any one of claims 5-8.
PCT/CN2023/104420 2022-07-07 2023-06-30 Image processing method and apparatus, and device WO2024007977A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210803766 2022-07-07
CN202210803766.8 2022-07-07
CN202211256194.2A CN117395418A (en) 2022-07-07 2022-10-13 Image processing method, device and equipment
CN202211256194.2 2022-10-13

Publications (1)

Publication Number Publication Date
WO2024007977A1 true WO2024007977A1 (en) 2024-01-11

Family

ID=89454363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104420 WO2024007977A1 (en) 2022-07-07 2023-06-30 Image processing method and apparatus, and device

Country Status (1)

Country Link
WO (1) WO2024007977A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782393A (en) * 2019-10-10 2020-02-11 江南大学 Image resolution compression and reconstruction method based on reversible network
CN111355965A (en) * 2020-02-28 2020-06-30 中国工商银行股份有限公司 Image compression and restoration method and device based on deep learning
CN111698508A (en) * 2020-06-08 2020-09-22 北京大学深圳研究生院 Super-resolution-based image compression method, device and storage medium
CN111970513A (en) * 2020-08-14 2020-11-20 成都数字天空科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112714313A (en) * 2020-12-25 2021-04-27 创新奇智(合肥)科技有限公司 Image processing method, device, equipment and storage medium
CN113870104A (en) * 2020-06-30 2021-12-31 微软技术许可有限责任公司 Super-resolution image reconstruction

Similar Documents

Publication Publication Date Title
JP6717385B2 (en) System and method for quantization parameter based video processing
US8731315B2 (en) Image compression and decompression for image matting
WO2023016155A1 (en) Image processing method and apparatus, medium, and electronic device
Patwa et al. Semantic-preserving image compression
Jia et al. Layered image compression using scalable auto-encoder
Fu et al. Learned image compression with discretized gaussian-laplacian-logistic mixture model and concatenated residual modules
EP4018410A1 (en) Watermark-based image reconstruction
CN111432213A (en) Adaptive tile data size coding for video and image compression
Zhang et al. Globally variance-constrained sparse representation and its application in image set coding
WO2024007977A1 (en) Image processing method and apparatus, and device
WO2020053688A1 (en) Rate distortion optimization for adaptive subband coding of regional adaptive haar transform (raht)
WO2023124461A1 (en) Video coding/decoding method and apparatus for machine vision task, device, and medium
CN107027027B (en) Image encoding method, image decoding method, image encoding device, image decoding device, and image encoding/decoding system
WO2024078404A1 (en) Feature map processing method and apparatus, and device
CN117395418A (en) Image processing method, device and equipment
WO2024131692A1 (en) Image processing method, apparatus and device
WO2024078403A1 (en) Image processing method and apparatus, and device
Wang et al. A survey of image compression algorithms based on deep learning
CN108805943B (en) Image transcoding method and device
CN111491166A (en) Dynamic compression system and method based on content analysis
Lei et al. An end-to-end face compression and recognition framework based on entropy coding model
Boddu et al. VLSI implementation of image compressor using probabilistic run length coding
CN117676149B (en) Image compression method based on frequency domain decomposition
WO2022258055A1 (en) Point cloud attribute information encoding and decoding method and apparatus, and related device
US12022078B2 (en) Picture processing method and apparatus

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23834744

Country of ref document: EP

Kind code of ref document: A1