WO2024007977A1

WO2024007977A1 - Image processing method and apparatus, and device

Info

Publication number: WO2024007977A1
Application number: PCT/CN2023/104420
Authority: WO
Inventors: 邓欣; 景俊鹏; 高方远; 李胜曦; 徐迈; 吕卓逸
Original assignee: 维沃移动通信有限公司
Priority date: 2022-07-07
Filing date: 2023-06-30
Publication date: 2024-01-11

Abstract

The present application relates to the technical field of image compression, and discloses an image processing method and apparatus, and a device. The image processing method in the embodiments of the present application comprises: an encoding end performs first sampling processing on an obtained first image by using an invertible neural network, so as to obtain a second image, the resolution of the second image being lower than that of the first image; and the encoding end encodes and compresses the second image to generate a target code stream.

Description

Image processing methods, devices and equipment

Cross-references to related applications

This application claims priority to Chinese Patent Application No. 202210803766.8, filed in China on July 7, 2022, and to Chinese Patent Application No. 202211256194.2, filed in China on October 13, 2022, the entire contents of which are incorporated herein by reference.

Technical field

The present application belongs to the field of image compression technology, and specifically relates to an image processing method, device and equipment.

Background

In the field of image compression technology, traditional image compression standards such as the Joint Photographic Experts Group (JPEG) standard are widely used in common application scenarios, but have the defect that the image compression ratio is not high; image compression technology based on deep learning has the defect that the image quality of the reconstructed image is not high when compressing at a low bit rate. How to improve the compression ratio of images while ensuring image quality is a technical problem to be solved in this field.

Summary of the invention

The embodiments of the present application provide an image processing method, device and equipment, which can solve the problem in related technologies that the image compression ratio cannot be improved while ensuring the image quality.

The first aspect provides an image processing method, including:

The encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

The encoding end encodes and compresses the second image to generate a target code stream.

The second aspect provides an image processing method, including:

The decoding end decodes the obtained target code stream and obtains the third image;

The decoding end performs a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.

In a third aspect, an image processing device is provided, including:

A processing module, configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

A generating module, configured to encode and compress the second image and generate a target code stream.

In a fourth aspect, an image processing device is provided, including:

The decoding module is used to decode the obtained target code stream and obtain the third image;

An operation module, configured to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

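Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.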

In a fifth aspect, a terminal is provided. The terminal includes a processor and a memory. The memory stores programs or instructions that can be run on the processor. When the programs or instructions are executed by the processor, the steps of the method described in the first aspect or the steps of the method described in the second aspect are implemented.

In a sixth aspect, a readable storage medium is provided. Programs or instructions are stored on the readable storage medium. When the programs or instructions are executed by a processor, the steps of the method described in the first aspect or the steps of the method described in the second aspect are implemented.

In a seventh aspect, a chip is provided. The chip includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement the method described in the first aspect or the method described in the second aspect.

In an eighth aspect, a computer program/program product is provided. The computer program/program product is stored in a storage medium, and is executed by at least one processor to implement the steps of the method described in the first aspect or the steps of the method described in the second aspect.

In a ninth aspect, a system is provided. The system includes an encoding end and a decoding end. The encoding end performs the steps of the method described in the first aspect, and the decoding end performs the steps of the method described in the second aspect.

In the embodiments of this application, the encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image, and then encodes and compresses the second image to generate a target code stream; performing the first sampling process with the reversible neural network increases the compression ratio of the image. The decoding end decodes the obtained target code stream to obtain a third image, and then performs a first operation on the third image. The first operation includes image enhancement processing, which improves the image quality. The compression ratio of the image is thereby improved while the image quality is ensured.

Description of the drawings

Figure 1 is one of the flow diagrams of the image processing method provided by the embodiment of the present application;

Figure 2 is a schematic diagram of a rescaling module provided by an embodiment of the present application;

Figure 3 is a schematic diagram of a learnable compression coding and decoding module provided by an embodiment of the present application;

Figure 4 is the second schematic flowchart of the image processing method provided by the embodiment of the present application;

Figure 5 is a schematic diagram of the application framework of the image processing method provided by the embodiment of the present application;

Figure 6 is one of the structural diagrams of the image processing device provided by the embodiment of the present application;

Figure 7 is the second structural diagram of the image processing device provided by the embodiment of the present application;

Figure 8 is a structural diagram of a communication device provided by an embodiment of the present application;

Figure 9 is a schematic diagram of the hardware structure of a terminal provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art fall within the scope of protection of this application.

The terms "first", "second", etc. in the description and claims of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein, and that the objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited. For example, there may be one first object or multiple first objects. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the related objects are in an "or" relationship.

The image processing method applied to the encoding end provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings through some embodiments and their application scenarios.

Please refer to Figure 1, which is one of the flow charts of the image processing method provided by the embodiment of the present application. The image processing method provided in this embodiment includes the following steps:

S101. The encoding end uses a reversible neural network to perform first sampling processing on the acquired first image to obtain a second image.

In this step, the encoding end may use the rescaling module to perform first sampling processing on the acquired first image to obtain the second image. Among them, the rescaling module is implemented based on the structure of the reversible neural network.

For ease of understanding, please refer to Figure 2. Figure 2 is a schematic diagram of a rescaling module provided by an embodiment of the present application. The affine coupling layer in Figure 2 is the neural network layer in the rescaling module.

It should be understood that the resolution of the second image is lower than the resolution of the first image. The first image is also called the original image, and the second image is also called the low-resolution image.

S102: The encoding end encodes and compresses the second image to generate a target code stream.

In this step, an optional implementation method is that the encoding end inputs the second image to the neural network encoder, and the neural network encoder encodes and compresses the second image to generate a target code stream.

For ease of understanding, please refer to Figure 3. Figure 3 is a schematic diagram of a learnable compression encoding and decoding module provided by an embodiment of the present application. The encoder in the learnable compression encoding and decoding module shown in Figure 3 is the above-mentioned neural network encoder.

Another optional implementation is that the encoding end inputs the second image to an image encoder in the related art, and the image encoder encodes and compresses the second image to generate a target code stream.

Optionally, using a reversible neural network to perform a first sampling process on the acquired first image to obtain the second image includes:

The encoding end performs discrete wavelet transformation on the acquired first image to obtain the first component and the second component corresponding to the first image;

The encoding end performs downsampling processing on the first component and the second component to obtain a second image.

This embodiment specifically explains how to use the rescaling module to perform the first sampling process on the acquired first image.

In this embodiment, the rescaling module performs discrete wavelet transform on the first image, and decomposes the first image into a first component and a second component according to the frequency corresponding to each pixel of the first image, where the first component is also called the low-frequency component and the second component is also called the high-frequency component. The first component and the second component are input to the affine coupling layer shown in Figure 2, and after processing by a series of affine coupling layers, the second image obtained after downsampling processing is output. It should be understood that the reversible neural network includes multiple substructures, that is, the reversible neural network includes multiple affine coupling layers.
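As an illustration of the decomposition into a low-frequency first component and a high-frequency second component, the following sketch applies a one-level 2D Haar transform to a small grayscale image. The source does not state which wavelet is used; the Haar wavelet, the averaging normalization, and the function name `haar_dwt2` are assumptions made for illustration only.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width.

    Returns (low, high): `low` is the low-frequency band and `high` is a
    tuple of three high-frequency bands, each at half the input resolution.
    The averaging normalization (division by 4) is an assumption.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + d + e) / 4)  # local average
            row_lh.append((a - b + d - e) / 4)  # left-minus-right difference
            row_hl.append((a + b - d - e) / 4)  # top-minus-bottom difference
            row_hh.append((a - b - d + e) / 4)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, (lh, hl, hh)


low, high = haar_dwt2([[1, 2], [3, 4]])
print(low)   # low-frequency (first) component
print(high)  # high-frequency (second) component
```

For a constant image all three high-frequency bands are zero, which is why most of the image energy ends up in the low-frequency component.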

Specifically, the encoding end can perform downsampling processing on the first component and the second component according to a first formula to obtain the second image. The terms of the first formula respectively characterize the output of the i-th substructure in the reversible neural network, the first component, the second component, and the result of processing the second component by the reversible neural network; the second image is the output of the last substructure in the reversible neural network.
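The first formula itself is not reproduced in this text. A minimal sketch of one form that is consistent with the listed terms is an additive coupling step, in which each substructure adds a processed version of the second component to the first component. The additive form, the function `phi` (a hypothetical stand-in for the learned processing of the second component), and the number of substructures are all assumptions, and only the update of the first component is shown.

```python
def phi(high):
    """Hypothetical stand-in for the learned processing of the second component."""
    return [0.5 * v for v in high]


def coupling_forward(low, high):
    """One assumed substructure: first component plus phi(second component)."""
    return [l + p for l, p in zip(low, phi(high))]


def run_substructures(low, high, num_layers=3):
    """Chain several substructures; the last output plays the role of the second image."""
    for _ in range(num_layers):
        low = coupling_forward(low, high)
    return low


print(run_substructures([0.0, 1.0], [2.0, 4.0]))
```

Because each step only adds a quantity that can be recomputed from the second component, the step can be undone exactly by subtraction, which is the property that makes this kind of layer invertible.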

It should be understood that the distortion loss function corresponding to the rescaling module shown in Figure 2 can be expressed by the following formula:

Here, D(θ) represents the distortion loss function, θ represents the network parameters, N represents the number of training samples, and l₂ represents the difference between the first image and the fourth image; the remaining two terms of the formula characterize the fourth image and the first image, respectively.
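The formula is not reproduced in this text. One form consistent with the listed symbols, assuming that the loss averages the difference over the N training samples, is sketched below; the symbols \hat{x}_n (fourth image) and x_n (first image) are hypothetical names introduced here.

```latex
D(\theta) = \frac{1}{N} \sum_{n=1}^{N} l_2\bigl(\hat{x}_n(\theta),\, x_n\bigr)
```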

The difference between the first image and the fourth image can be calculated in several ways, which are not limited here. Optionally, the difference between the first image and the fourth image may be the mean square error (MSE) of each pixel between the first image and the fourth image, the sum of absolute differences (SAD) of each pixel between the first image and the fourth image, the structural similarity (SSIM) between the first image and the fourth image, or the multi-scale structural similarity (MS-SSIM) between the first image and the fourth image.
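Two of the difference measures named above, MSE and SAD, can be sketched in a few lines. The function names and the representation of an image as a list of rows are illustrative choices; SSIM and MS-SSIM are omitted because they need windowed statistics.

```python
def mse(img_a, img_b):
    """Mean square error over all pixels of two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def sad(img_a, img_b):
    """Sum of absolute differences over all pixels of two equally sized images."""
    return sum(abs(a - b)
               for row_a, row_b in zip(img_a, img_b)
               for a, b in zip(row_a, row_b))


first_image = [[1, 2], [3, 4]]
fourth_image = [[1, 2], [3, 6]]
print(mse(first_image, fourth_image), sad(first_image, fourth_image))
```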

In this embodiment, a reversible neural network is used to perform a first sampling process on the acquired first image, which reduces the bit rate compared with directly compressing the first image and thereby improves the compression ratio of the image.

Optionally, encoding and compressing the second image to generate the target code stream includes:

The encoding end uses a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;

The encoding end performs quantization and arithmetic coding on the first feature variable to generate a target code stream.

This embodiment specifically explains how to use a neural network encoder to encode and compress the second image.

Please refer to Figure 3. The "original image" in Figure 3 is the second image, and the "encoder" in Figure 3 is the neural network encoder. In this embodiment, the second image is processed by the neural network encoder to obtain the latent variable y, feature extraction is performed on the latent variable y to obtain the first feature variable, and the target code stream is generated by performing quantization and arithmetic coding on the first feature variable. Arithmetic coding is one kind of entropy coding, and other types of entropy coding may also be used in this embodiment.
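The quantization step and the cost of entropy coding can be illustrated with a small sketch. Rounding to the nearest integer is assumed as the quantizer, and instead of a real arithmetic coder the sketch computes the ideal code length that an entropy coder approaches, using empirical symbol frequencies as the probability model. The function names and the `step` parameter are hypothetical.

```python
import math
from collections import Counter


def quantize(values, step=1.0):
    """Uniform quantization by rounding to the nearest multiple of `step` (assumed)."""
    return [round(v / step) for v in values]


def dequantize(symbols, step=1.0):
    """Inverse quantization: map integer symbols back to reconstruction values."""
    return [s * step for s in symbols]


def ideal_code_length_bits(symbols):
    """Sum of -log2 p(symbol), with p estimated from the symbol frequencies.

    This is the length an ideal entropy coder would approach; it is not
    an implementation of arithmetic coding.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-math.log2(counts[s] / total) for s in symbols)


symbols = quantize([0.2, 1.7, -0.6, 1.9])
print(symbols, dequantize(symbols), ideal_code_length_bits(symbols))
```

Quantization is the lossy step: `dequantize(quantize(v))` returns the nearest grid point, not `v` itself, while the entropy coding that follows is lossless.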

It should be understood that the rate loss function corresponding to the learnable compression encoding and decoding module shown in Figure 3 can be expressed by the following formula:

Here, R(θ) represents the rate loss function, E represents the mathematical expectation, N represents the number of training samples, and y represents the latent variable; the remaining terms of the formula characterize the fit of the latent variable and the information entropy, respectively.
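The formula is not reproduced in this text. A hedged sketch of one common form in learned image compression, assuming the rate is the expected negative log-likelihood of the quantized latent variable under a fitted probability model, is given below; the symbols \hat{y}_n (quantized latent variable of sample n) and p_{\hat{y}} (fitted probability model) are hypothetical names.

```latex
R(\theta) = \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} -\log_2 p_{\hat{y}}\bigl(\hat{y}_n\bigr)\right]
```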

Optionally, the method also includes:

The encoding end uses a reversible neural network to perform a first sampling process on at least part of the original feature map corresponding to the first image to obtain a first feature map;

When performing the first sampling process on part of the original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map and the second feature map to generate a feature map code stream; the second feature map is an original feature map corresponding to the first image that has not been subjected to the first sampling process;

When performing the first sampling process on all original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map to generate a feature map code stream.

The image processing method provided by the embodiment of the present application can also be applied to processing feature maps.

Specifically, after acquiring the first image, the encoding end can use a neural network to extract the original feature map corresponding to the first image. The above-mentioned neural network includes but is not limited to Feature Pyramid Networks (FPN) and Fast Region-based Convolutional Neural Networks (Fast R-CNN).

The following is an example of using FPN to extract the original feature map corresponding to the first image:

The first image in this example is composed of 3 color channels and has a resolution of W×H, where W is the image width and H is the image height. FPN is used to extract features from the first image, yielding 4 original feature maps, namely P2, P3, P4 and P5. Each of P2, P3, P4 and P5 has its own resolution, and the number of channels of each original feature map is 256.
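The per-level resolutions are not reproduced in this text. As a rough sketch only, the following assumes the strides conventionally used for FPN levels P2 to P5 (4, 8, 16 and 32) and rounds up for sizes that do not divide evenly; both the strides and the rounding are assumptions and may differ from the values intended in this example.

```python
import math

# Assumed strides for the four pyramid levels (conventional FPN values).
FPN_STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def fpn_shapes(width, height, channels=256):
    """Return {level: (channels, feature_height, feature_width)} under the assumed strides."""
    return {
        name: (channels, math.ceil(height / stride), math.ceil(width / stride))
        for name, stride in FPN_STRIDES.items()
    }


for level, shape in fpn_shapes(640, 480).items():
    print(level, shape)
```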

An optional implementation is that the encoding end uses a reversible neural network to perform the first sampling process on part of the original feature maps. In this implementation, the encoding end encodes and compresses the second feature map (that is, the original feature map that has not been subjected to the first sampling process) and the first feature map obtained through the first sampling process to generate a feature map code stream. The specific implementation of performing the first sampling process on part of the original feature maps is consistent with the implementation of the first sampling process in the above embodiment, and is not repeated here.

Another optional implementation is that the encoding end uses a reversible neural network to perform a first sampling process on all original feature maps. In this implementation, the encoding end encodes and compresses the first feature map obtained through the first sampling process to generate a feature map code stream.
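The two implementations above differ only in which original feature maps pass through the first sampling process before encoding. The sketch below makes that selection explicit. The function `downsample2x` is a hypothetical stand-in (2×2 average pooling) for the reversible-network sampling, and the dictionary layout and names are illustrative assumptions.

```python
def downsample2x(fmap):
    """Hypothetical stand-in for the first sampling process: 2x2 average pooling."""
    return [
        [(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def prepare_feature_maps(original_maps, names_to_sample):
    """Split the original feature maps into sampled and untouched groups.

    Returns (first_maps, second_maps): first_maps have been through the
    first sampling process, second_maps are passed on unchanged. Both
    groups would then be encoded into the feature map code stream.
    """
    first_maps = {name: downsample2x(fmap)
                  for name, fmap in original_maps.items()
                  if name in names_to_sample}
    second_maps = {name: fmap
                   for name, fmap in original_maps.items()
                   if name not in names_to_sample}
    return first_maps, second_maps


maps = {"P2": [[1, 1], [1, 1]], "P3": [[2, 2], [2, 2]]}
print(prepare_feature_maps(maps, {"P2"}))        # partial sampling
print(prepare_feature_maps(maps, {"P2", "P3"}))  # all maps sampled
```

When every map is selected, the second group is empty and only the first feature maps are encoded, matching the second implementation.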

Optionally, the encoding end can normalize the feature map before performing encoding and compression.

Specifically, for each item of feature map data received, the neural network encoder determines the maximum value (norm_max) and the minimum value (norm_min) of the feature map data within the same batch of data, and normalizes the feature map using the following formula:
val_new = (val_ori - norm_min) / (norm_max - norm_min)

Here, val_new represents the value of a sample point after normalization, and val_ori represents the value of the sample point before normalization.

It should be understood that, during the encoding process, the encoding end also encodes the above-mentioned maximum value (norm_max) and minimum value (norm_min) and transmits them to the decoding end.
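The normalization formula can be exercised with a short sketch. The `normalize` function follows the formula as written; the `denormalize` function shows how a decoding end could use the transmitted norm_min and norm_max to undo it, which is an inference from the fact that both values are transmitted. The handling of a constant feature map (where the formula would divide by zero) is an assumption added here.

```python
def normalize(values):
    """Min-max normalization: val_new = (val_ori - norm_min) / (norm_max - norm_min)."""
    norm_min, norm_max = min(values), max(values)
    span = norm_max - norm_min
    if span == 0:
        # Assumption: the formula is undefined for constant data, so map it to zeros.
        return [0.0] * len(values), norm_min, norm_max
    return [(v - norm_min) / span for v in values], norm_min, norm_max


def denormalize(normalized, norm_min, norm_max):
    """Invert the normalization using the transmitted minimum and maximum."""
    return [v * (norm_max - norm_min) + norm_min for v in normalized]


normalized, lo, hi = normalize([2.0, 4.0, 6.0])
print(normalized, lo, hi)
print(denormalize(normalized, lo, hi))
```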

In this embodiment, normalizing the feature map spreads the data to be encoded as widely as possible in the subsequent process of encoding and compressing the feature map, which reduces the loss during the encoding process and thereby improves the encoding and compression effect.

In this embodiment, a reversible neural network is used to perform a first sampling process on at least part of the original feature maps corresponding to the first image, which reduces the size of the code stream compared with directly compressing the feature maps and thereby improves the compression ratio of the feature maps.

The image processing method provided by the embodiments of the present application and applied to the decoder will be described in detail below with reference to the accompanying drawings through some embodiments and their application scenarios.

Please refer to Figure 4, which is the second flow chart of the image processing method provided by the embodiment of the present application. The image processing method provided in this embodiment includes the following steps:

S401. The decoder decodes the obtained target code stream and obtains the third image.

In this step, an optional implementation method is that the decoding end inputs the obtained target code stream to the neural network decoder, and the neural network decoder decodes the target code stream to obtain the third image.

For ease of understanding, please refer to Figure 3. Figure 3 is a schematic diagram of a learnable compression coding and decoding module provided by an embodiment of the present application. The decoder in the learnable compression coding and decoding module shown in Figure 3 is the above-mentioned neural network decoder.

Another optional implementation is that the decoding end inputs the obtained target code stream to an image decoder in the related art, and the image decoder decodes the target code stream to obtain a third image.

S402: The decoder performs a first operation on the third image to obtain a fourth image.

In this embodiment, the above-mentioned first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain a fourth image;

Performing image enhancement processing on the third image to obtain a sixth image, and using the reversible neural network to perform a second sampling process on the sixth image to obtain a fourth image.

Wherein, the resolution of the third image is lower than the resolution of the fourth image.

An optional implementation manner is to first perform a second sampling process on the third image to obtain a fifth image, and then perform image enhancement processing on the fifth image to obtain a fourth image.

Specifically, the third image can be input into the rescaling module shown in Figure 2 to obtain the fifth image output by the rescaling module, and then the enhancement module is used to perform image enhancement processing on the fifth image to obtain the fourth image. For a specific implementation of how to use the rescaling module to perform second sampling processing on the third image, please refer to subsequent embodiments.

Another optional implementation is to first perform image enhancement processing on the third image to obtain a sixth image, and then perform second sampling processing on the sixth image to obtain a fourth image.

Specifically, the enhancement module can be used to perform image enhancement processing on the third image to obtain a sixth image, and then the sixth image is input into the rescaling module shown in FIG. 2 to obtain a fourth image output by the rescaling module. For a specific implementation of how to use the rescaling module to perform the second sampling process on the sixth image, please refer to subsequent embodiments.
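The two orderings of the first operation can be sketched as a single function with a flag. Both `upsample2x` (nearest-neighbor upsampling standing in for the second sampling process) and `enhance` (clamping to the 0 to 255 range standing in for the enhancement module) are hypothetical placeholders, not the networks described in this application.

```python
def upsample2x(img):
    """Hypothetical stand-in for the second sampling process: nearest-neighbor 2x."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def enhance(img):
    """Hypothetical stand-in for the enhancement module: clamp values to [0, 255]."""
    return [[min(255, max(0, v)) for v in row] for row in img]


def first_operation(third_image, enhance_first):
    """Return the fourth image using one of the two orderings."""
    if enhance_first:
        sixth_image = enhance(third_image)   # enhance, then second sampling
        return upsample2x(sixth_image)
    fifth_image = upsample2x(third_image)    # second sampling, then enhance
    return enhance(fifth_image)


third_image = [[300, -5]]
print(first_operation(third_image, enhance_first=True))
print(first_operation(third_image, enhance_first=False))
```

With these simple placeholders both orderings give the same result; with learned modules they generally would not, since the enhancement network then operates at a different resolution in each case.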

It should be understood that the above enhancement module is based on the Residual Channel Attention Network (RCAN). Optionally, the number of residual connection groups in the RCAN is set to 5, and the number of residual channel attention blocks (RCABs) in the RCAN is set to 10. In other embodiments, the enhancement module can also be constructed based on other neural networks.

It should be understood that the enhancement loss function corresponding to the enhancement module can be expressed by the following formula:

Here, E(θ) represents the enhancement loss function, N represents the number of training samples, and l₂ represents the difference between the image before enhancement processing and the image after enhancement processing; the remaining two terms of the formula characterize the image before enhancement processing (that is, the third image or the fifth image) and the image after enhancement processing (that is, the fourth image or the sixth image), respectively.

The difference between the image before enhancement processing and the image after enhancement processing can be calculated in several ways, which are not limited here. Optionally, the difference may be the mean square error (MSE) of each pixel between the two images, the sum of absolute differences (SAD) of each pixel between the two images, the structural similarity (SSIM) between the two images, or the multi-scale structural similarity (MS-SSIM) between the two images.

In this step, the image quality is improved by using the enhancement module for image enhancement processing.

Optionally, the decoder performing the first operation on the third image includes:

The decoding end performs an upsampling process on the target image to obtain a third component, and determines a fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;

The decoding end performs inverse discrete wavelet transform on the third component and the fourth component.

This embodiment specifically explains how to use the rescaling module to perform the second sampling process on the target image.

In this embodiment, the target image is input to the affine coupling layer shown in Figure 2. After processing by a series of affine coupling layers, the third component and the fourth component corresponding to the target image are output. The third component and the fourth component are then subjected to inverse discrete wavelet transform to obtain the fourth image. The enhanced image in Figure 2 is the target image, and the reconstructed image in Figure 2 is the fourth image. The target image is the third image or the sixth image, the third component is the low-frequency component corresponding to the target image, and the fourth component is the high-frequency component corresponding to the target image.
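The final step, recombining a low-frequency component and a high-frequency component into a higher-resolution image, can be sketched with an inverse one-level 2D Haar transform. The wavelet is not specified in the source; the Haar wavelet, the band layout (three high-frequency bands holding left-minus-right, top-minus-bottom and diagonal differences, each scaled by 1/4), and the function name `haar_idwt2` are assumptions.

```python
def haar_idwt2(low, high):
    """Inverse one-level 2D Haar transform (assumed averaging normalization).

    `low` is the low-frequency band; `high` is a tuple (lh, hl, hh) of the
    left-minus-right, top-minus-bottom and diagonal difference bands.
    Returns an image with twice the height and width of each band.
    """
    lh, hl, hh = high
    img = []
    for r in range(len(low)):
        top, bottom = [], []
        for c in range(len(low[0])):
            s, x, y, z = low[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([s + x + y + z, s - x + y - z])     # top-left, top-right
            bottom.extend([s + x - y - z, s - x - y + z])  # bottom-left, bottom-right
        img.append(top)
        img.append(bottom)
    return img


third_component = [[2.5]]
fourth_component = ([[-0.5]], [[-1.0]], [[0.0]])
print(haar_idwt2(third_component, fourth_component))
```

If the high-frequency bands are all zero, each low-frequency value is simply replicated into a 2×2 block, so the quality of the reconstruction depends on how well the fourth component is estimated.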

Specifically, the third component can be obtained by upsampling the target image according to a second formula. The terms of the second formula respectively represent the third component, the output of the (i+1)-th substructure of the reversible neural network, and the result of processing the fourth component by the reversible neural network; the third image is the output of the last substructure of the reversible neural network.

The fourth component corresponding to the target image can be determined according to a third formula. The terms of the third formula respectively represent the fourth component, the component information corresponding to the fourth component, and the results of processing the component information by the reversible neural network; α is a preset parameter.
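The second and third formulas are not reproduced in this text. The sketch below illustrates only the general idea behind recovering the third component: if a forward substructure adds a processed version of the high-frequency component to the low-frequency component, the decoding end can subtract the same quantity to undo it exactly. The additive form and the function `phi` are assumptions, and the fourth component is simply supplied as an input rather than derived from component information and α.

```python
def phi(high):
    """Hypothetical stand-in for the learned processing of the high-frequency component."""
    return [0.5 * v for v in high]


def coupling_forward(low, high):
    """Assumed forward step: output = low + phi(high)."""
    return [l + p for l, p in zip(low, phi(high))]


def coupling_inverse(low_next, high):
    """Assumed inverse step: low = output - phi(high)."""
    return [l - p for l, p in zip(low_next, phi(high))]


low, high = [1.0, 2.0], [2.0, 4.0]
encoded = coupling_forward(low, high)
recovered = coupling_inverse(encoded, high)
print(encoded, recovered)
```

The inverse is exact only when the same high-frequency input is available on both sides, which is why the decoding end needs a way to determine the fourth component.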

Optionally, decoding the acquired target code stream to obtain the third image includes:

The decoding end performs arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;

The decoding end uses a neural network decoder to decompress the second feature variable to obtain the third image.

This embodiment specifically explains how to use a neural network decoder to decode the obtained target code stream.

Please refer to Figure 3. The "compressed image" in Figure 3 is the third image, and the "decoder" in Figure 3 is the neural network decoder. In this embodiment, the decoder performs arithmetic decoding and inverse quantization on the acquired target code stream to obtain the decoded latent variable; feature extraction is performed on the decoded latent variable to obtain the second feature variable, and a neural network decoder is used to decompress the second feature variable to obtain the third image. Arithmetic decoding is one kind of entropy decoding, and other types of entropy decoding may also be used in this embodiment.

To facilitate understanding of the overall technical solution, please refer to Figure 5. In Figure 5, X_H represents the first image, X_L represents the second image, and X_E represents the image after image enhancement processing; the third image and the fourth image are each represented by their own symbols. The distortion loss represents the image loss between the first image and the fourth image, the enhancement loss represents the image loss between the second image and the image after image enhancement processing, the quality gap represents the image quality gap between the second image and the third image, and the rate loss represents the loss after lossy encoding by the learnable compression encoder.

It should be understood that the total loss function corresponding to the application framework shown in Figure 5 is the weighted sum of the above-mentioned rate loss function and distortion loss function, where the enhancement loss function is only used in the training phase of the enhancement module and is not included in the total loss function.
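The weighted sum can be written out as follows. The symbol L(θ) for the total loss and the weights λ_R and λ_D are hypothetical names, since the source states only that the total loss is a weighted sum of the rate loss and the distortion loss and does not give the weights.

```latex
L(\theta) = \lambda_R \, R(\theta) + \lambda_D \, D(\theta)
```

The ratio of the two weights sets the trade-off: a larger λ_D favors image quality, while a larger λ_R favors a smaller code stream.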

In the actual training process of each neural network module included in the application framework shown in Figure 5, one possible implementation is to first train the rescaling module and the learnable compression codec together, and then train the enhancement module separately. After the training of the above three modules is completed, the neural network parameters in the enhancement module are adjusted under the joint action of the rescaling module and the learnable compression codec. Finally, the neural network parameters in the above three modules are adjusted in an end-to-end manner.

Another possible implementation is to train the rescaling module, learnable compression codec and enhancement module independently.

In the application scenario shown in Figure 5, the encoding end uses the rescaling module to perform the first sampling process on the first image to obtain the second image; the encoding end uses the learnable compression codec to encode the second image to obtain the target code stream. The decoding end uses the learnable compression codec to decode the target code stream to obtain the third image; the decoding end uses the enhancement module to perform image enhancement processing on the third image to obtain the enhanced image; the decoding end uses the rescaling module to perform a second sampling process on the enhanced image to obtain a fourth image.

Optionally, after obtaining the third image, the decoder may first use the rescaling module to perform the second sampling process on the third image, and then use the enhancement module to perform image enhancement processing on the image after the second sampling process.
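The two decoder-side orderings described above (enhance then upsample, or upsample then enhance) can be sketched with plain callables. The stand-in stages below are toy assumptions for illustration only: nearest-neighbour duplication replaces the rescaling module, and clamping to [0, 1] replaces the enhancement module.

```python
from typing import Callable, List

Image = List[List[float]]
Stage = Callable[[Image], Image]


def decode_side(third_image: Image, enhance: Stage, upscale: Stage,
                enhance_first: bool = True) -> Image:
    """Produce the fourth image from the third image in either order."""
    if enhance_first:
        return upscale(enhance(third_image))   # enhancement, then second sampling
    return enhance(upscale(third_image))       # second sampling, then enhancement


def nearest_upscale(image: Image) -> Image:
    """Toy stand-in for the rescaling module: duplicate every pixel 2x2."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def clamp_enhance(image: Image) -> Image:
    """Toy stand-in for the enhancement module: clamp values to [0, 1]."""
    return [[min(1.0, max(0.0, value)) for value in row] for row in image]


if __name__ == "__main__":
    third = [[-0.5, 1.5]]
    print(decode_side(third, clamp_enhance, nearest_upscale, enhance_first=True))
    print(decode_side(third, clamp_enhance, nearest_upscale, enhance_first=False))
```

With these toy stages both orderings give the same result. With learned modules the two orders generally differ, which is why both are listed as options.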

Optionally, the learnable compression codec includes the rescaling module; that is, the rescaling module is included as part of the learnable compression codec.

Optionally, the above-mentioned learnable compression codec may be replaced with a codec in related art.

Optionally, the method also includes:

When the decoder decodes the obtained feature map code stream to obtain a third feature map, the decoder performs a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;

When the decoder decodes the obtained feature map code stream to obtain a third feature map and a partially reconstructed feature map, the decoder determines all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and the fourth feature map.

An optional implementation manner is that the decoding end decodes the obtained feature map code stream to obtain the third feature map. In this implementation manner, a first operation is performed on the third feature map to determine all reconstructed feature maps corresponding to the fourth image.

Another optional implementation is that the decoder decodes the obtained feature map code stream to obtain the third feature map and the partially reconstructed feature map. In this implementation manner, a first operation is performed on the third feature map to determine the fourth feature map, and the partially reconstructed feature map and the fourth feature map are determined as all reconstructed feature maps corresponding to the fourth image.

It should be understood that the first operation includes any one of the following: using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process; or performing image enhancement processing on the third feature map, and performing a second sampling process on the third feature map after the image enhancement processing. For the specific implementation of the above-mentioned first operation, please refer to the contents of the above-mentioned embodiments, which will not be repeated here.
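The two decoder-side cases for feature maps can be sketched as a single function. The function and parameter names are hypothetical, and placing the partially reconstructed feature maps before the fourth feature maps in the result is an assumption, since no order is specified.

```python
from typing import Callable, List, Optional, Sequence

FeatureMap = List[float]


def reconstruct_feature_maps(
    third_maps: Sequence[FeatureMap],
    first_operation: Callable[[FeatureMap], FeatureMap],
    partial_reconstructed: Optional[Sequence[FeatureMap]] = None,
) -> List[FeatureMap]:
    """Return all reconstructed feature maps for the fourth image.

    Case 1: only third feature maps were decoded, so the first operation
            on them yields every reconstructed feature map.
    Case 2: partially reconstructed feature maps were also decoded, so they
            are combined with the fourth feature maps (first operation output).
    """
    fourth_maps = [first_operation(fmap) for fmap in third_maps]
    if partial_reconstructed is None:
        return fourth_maps
    return list(partial_reconstructed) + fourth_maps  # order is an assumption


if __name__ == "__main__":
    # Toy first operation: repeat each value twice (stands in for upsampling).
    def toy_first_operation(fmap: FeatureMap) -> FeatureMap:
        return [value for value in fmap for _ in range(2)]

    print(reconstruct_feature_maps([[1.0, 2.0]], toy_first_operation))
    print(reconstruct_feature_maps([[1.0, 2.0]], toy_first_operation, [[9.0, 9.0, 9.0, 9.0]]))
```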

For the image processing method provided by the embodiments of the present application, the execution subject may be an image processing device. In the embodiment of the present application, the image processing device is used at the encoding end to perform the image processing method as an example to illustrate the image processing device provided by the embodiment of the present application.

As shown in Figure 6, this embodiment of the present application also provides an image processing device 600, including:

The first processing module 601 is configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

The first generation module 602 is used to encode and compress the second image to generate a target code stream.

Optionally, the first processing module 601 is specifically used to:

Perform discrete wavelet transform on the acquired first image to obtain the first component and the second component corresponding to the first image;

Perform downsampling processing on the first component and the second component to obtain a second image.
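As an illustration of the discrete wavelet transform step, the following is a minimal single-level 2D Haar transform in plain Python. It is a sketch under assumptions: the choice of the Haar wavelet, and the reading of the first component as the low-frequency sub-band and the second component as the three detail sub-bands, are not confirmed by the text. The learned reversible network that follows the transform is omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Tuple[Matrix, Matrix, Matrix]]:
    """Single-level orthonormal 2D Haar transform.

    Returns (low, (d1, d2, d3)); each output has half the height and width
    of the input, so the low-frequency band is already a lower-resolution image.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    low, d1, d2, d3 = [], [], [], []
    for i in range(0, height, 2):
        low_row, d1_row, d2_row, d3_row = [], [], [], []
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            low_row.append((a + b + c + d) / 2)   # local average (scaled)
            d1_row.append((a - b + c - d) / 2)    # difference across columns
            d2_row.append((a + b - c - d) / 2)    # difference across rows
            d3_row.append((a - b - c + d) / 2)    # diagonal difference
        low.append(low_row)
        d1.append(d1_row)
        d2.append(d2_row)
        d3.append(d3_row)
    return low, (d1, d2, d3)


if __name__ == "__main__":
    low, details = haar_dwt2([[1, 2], [3, 4]])
    print(low, details)
```

Because the transform is orthonormal, it loses no information. This is the property that lets the decoding end invert the sampling when the detail components are available or can be estimated.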

Optionally, the first generation module 602 is specifically used to:

Use a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;

The first feature variable is quantized and arithmetic encoded to generate a target code stream.
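The quantization and entropy coding step can be illustrated as follows. This sketch does not implement an arithmetic coder. Instead it computes the ideal code length under the empirical symbol distribution, which is the length an arithmetic coder approaches. Rounding to the nearest integer as the quantizer is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def quantize(features: Sequence[float]) -> List[int]:
    """Round each feature value to an integer symbol.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [round(value) for value in features]


def ideal_code_length_bits(symbols: Sequence[int]) -> float:
    """Sum of -log2 p(s) over all symbols, using empirical probabilities.

    An arithmetic coder driven by the same probabilities produces a code
    stream whose length is close to this value.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(count * math.log2(count / total) for count in counts.values())


if __name__ == "__main__":
    symbols = quantize([0.2, 0.9, 1.1, -0.4])
    print(symbols, ideal_code_length_bits(symbols))
```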

Optionally, the image processing device 600 also includes:

A second processing module configured to use a reversible neural network to perform first sampling processing on at least part of the original feature map corresponding to the first image to obtain a first feature map;

A second generation module, configured to encode and compress the first feature map and the second feature map to generate a feature map code stream when performing the first sampling process on part of the original feature map corresponding to the first image;

The third generation module is configured to encode and compress the first feature map to generate a feature map code stream when performing the first sampling process on all original feature maps corresponding to the first image.

For the image processing method provided by the embodiments of the present application, the execution subject may be an image processing device. In the embodiment of the present application, the image processing device is used in the decoder to perform the image processing method as an example to illustrate the image processing device provided by the embodiment of the present application.

As shown in Figure 7, this embodiment of the present application also provides an image processing device 700, including:

The decoding module 701 is used to decode the obtained target code stream to obtain the third image;

The operation module 702 is used to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

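Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.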

Optionally, the operation module 702 is specifically used to:

Perform an upsampling process on the target image to obtain a third component, and determine the fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;

The third component and the fourth component are subjected to an inverse discrete wavelet transform.
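The inverse discrete wavelet transform step can be illustrated with a single-level inverse 2D Haar transform. As before this is a sketch under assumptions: the Haar wavelet is assumed, the third component is taken to be a low-frequency band supplied directly (the learned upsampling is omitted), and the fourth component is taken to be three detail bands that default to zero when no component information is available.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def haar_idwt2(low: Matrix,
               details: Optional[Tuple[Matrix, Matrix, Matrix]] = None) -> Matrix:
    """Inverse of a single-level orthonormal 2D Haar transform.

    `low` is the low-frequency band; `details` holds the three detail bands
    (column difference, row difference, diagonal difference). If `details`
    is None, zeros are used, which is an assumption for this sketch.
    """
    rows, cols = len(low), len(low[0])
    if details is None:
        zeros = [[0.0] * cols for _ in range(rows)]
        details = (zeros, zeros, zeros)
    d1, d2, d3 = details
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            l, x, y, z = low[i][j], d1[i][j], d2[i][j], d3[i][j]
            image[2 * i][2 * j] = (l + x + y + z) / 2
            image[2 * i][2 * j + 1] = (l - x + y - z) / 2
            image[2 * i + 1][2 * j] = (l + x - y - z) / 2
            image[2 * i + 1][2 * j + 1] = (l - x - y + z) / 2
    return image


if __name__ == "__main__":
    print(haar_idwt2([[5.0]], ([[-1.0]], [[-2.0]], [[0.0]])))
    print(haar_idwt2([[5.0]]))  # no detail information: a flat 2x2 block
```

With exact detail bands the original pixels are recovered. With zero detail bands the output is a smooth block, and the quality of the fourth image then depends on how well the fourth component is determined.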

Optionally, the decoding module 701 is specifically used to:

Perform arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;

Use a neural network decoder to decompress the second feature variable to obtain the third image.

Optionally, the image processing device 700 also includes:

A first determination module, configured to, when the decoding end decodes the obtained feature map code stream to obtain a third feature map, perform a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;

A second determination module, configured to, when the decoding end decodes the obtained feature map code stream to obtain a third feature map and a partially reconstructed feature map, determine all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and a fourth feature map; the fourth feature map is determined based on the first operation on the third feature map;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process;

Perform image enhancement processing on the third feature map, and perform second sampling processing on the third feature map after the image enhancement processing.

The image processing device applied to the encoding end provided by the embodiment of the present application can implement each process implemented by the method embodiment in Figure 1 and achieve the same technical effect. To avoid duplication, the details will not be described here.

The image processing device applied to the decoding end provided by the embodiment of the present application can implement each process implemented by the method embodiment in Figure 4 and achieve the same technical effect. To avoid duplication, the details will not be described here.

The image processing device in the embodiment of the present application may be an electronic device, such as an electronic device with an operating system, or may be a component in the electronic device, such as an integrated circuit or chip. The electronic device may be a terminal or other devices other than the terminal. For example, terminals may include but are not limited to the types of terminals listed above, and other devices may be servers, network attached storage (Network Attached Storage, NAS), etc., which are not specifically limited in the embodiments of this application.

Optionally, as shown in Figure 8, this embodiment of the present application also provides a communication device 800, which includes a processor 801 and a memory 802. The memory 802 stores programs or instructions that can be run on the processor 801. For example, when the communication device 800 is a terminal and the program or instruction is executed by the processor 801, each step of the above image processing method embodiment is implemented, and the same technical effect can be achieved.

An embodiment of the present application also provides a terminal, including a processor 801 and a communication interface. The processor 801 is configured to perform the following operations:

Using a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

Encoding and compressing the second image to generate a target code stream.

Alternatively, processor 801 is configured to perform the following operations:

Decode the obtained target code stream to obtain the third image;

Perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

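Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.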

This terminal embodiment corresponds to the above-mentioned terminal-side method embodiment. Each implementation process and implementation manner of the above-mentioned method embodiment can be applied to this terminal embodiment, and can achieve the same technical effect. Specifically, Figure 9 is a schematic diagram of the hardware structure of a terminal that implements an embodiment of this application.

The terminal 900 includes but is not limited to: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, a processor 910 and other components.

Those skilled in the art can understand that the terminal 900 may also include a power supply (such as a battery) that supplies power to various components. The power supply may be logically connected to the processor 910 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The terminal structure shown in FIG. 9 does not constitute a limitation on the terminal. The terminal may include more or fewer components than shown in the figure, or may combine certain components, or arrange components differently, which will not be described again here.

It should be understood that in the embodiment of the present application, the input unit 904 may include a graphics processor (Graphics Processing Unit, GPU) 9041 and a microphone 9042. The graphics processor 9041 processes the image data of still pictures or videos obtained by an image capture device (such as a camera) in the video capture mode or the image capture mode. The display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 907 includes at least one of a touch panel 9071 and other input devices 9072. The touch panel 9071 is also known as a touch screen. The touch panel 9071 may include two parts: a touch detection device and a touch controller. Other input devices 9072 may include but are not limited to physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be described again here.

In this embodiment of the present application, after receiving downlink data from the network side device, the radio frequency unit 901 can transmit it to the processor 910 for processing; the radio frequency unit 901 can send uplink data to the network side device. Generally, the radio frequency unit 901 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, etc.

Memory 909 may be used to store software programs or instructions as well as various data. The memory 909 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system and an application program or instructions required for at least one function (such as a sound playback function, an image playback function, etc.). Additionally, memory 909 may include volatile memory or nonvolatile memory, or memory 909 may include both volatile and nonvolatile memory. The non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. The volatile memory can be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synch link DRAM, SLDRAM) and direct Rambus random access memory (Direct Rambus RAM, DRRAM). Memory 909 in embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

The processor 910 may include one or more processing units; optionally, the processor 910 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, application programs, etc., and the modem processor, such as a baseband processor, mainly processes wireless communication signals. It can be understood that the above modem processor may not be integrated into the processor 910.

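Among them, the processor 910 is used to perform the following operations:

Using a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

Encoding and compressing the second image to generate a target code stream.

Alternatively, processor 910 is configured to perform the following operations:

Decode the obtained target code stream to obtain the third image;

Perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.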

Embodiments of the present application also provide a readable storage medium. Programs or instructions are stored on the readable storage medium. When the programs or instructions are executed by a processor, each process of the above image processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, the details will not be described again here.

Wherein, the processor is the processor in the terminal described in the above embodiment. The readable storage medium includes computer readable storage media, such as computer read-only memory ROM, random access memory RAM, magnetic disk or optical disk, etc.

An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement each process of the above image processing method embodiments, and can achieve the same technical effect. To avoid repetition, the details will not be described again here.

It should be understood that the chips mentioned in the embodiments of this application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip (SoC), etc.

Embodiments of the present application further provide a computer program/program product. The computer program/program product is stored in a storage medium. The computer program/program product is executed by at least one processor to implement each process of the above image processing method embodiment, and can achieve the same technical effect. To avoid repetition, the details will not be described again here.

Embodiments of the present application further provide a system. The system includes an encoding end and a decoding end. The encoding end performs each process of the above image processing method embodiment applied to the encoding end, and the decoding end performs each process of the above image processing method embodiment applied to the decoding end, and the same technical effect can be achieved. To avoid repetition, the details will not be described again here.

It should be noted that, in this document, the terms "comprising", "comprises" or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article or device. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in a process, method, article or device that includes that element. In addition, it should be pointed out that the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed, but may also include performing functions in a substantially simultaneous manner or in reverse order according to the functions involved. For example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the related art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and includes several instructions to cause a terminal (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in various embodiments of this application.

The embodiments of the present application have been described above in conjunction with the accompanying drawings. However, the present application is not limited to the above-mentioned specific implementations. The above-mentioned specific implementations are only illustrative and not restrictive. Inspired by this application, those of ordinary skill in the art can make many other forms without departing from the purpose of this application and the scope protected by the claims, all of which fall within the protection of this application.

Claims

An image processing method including:

The encoding end uses a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

The encoding end codes and compresses the second image to generate a target code stream.
The method according to claim 1, wherein said using a reversible neural network to perform a first sampling process on the acquired first image to obtain the second image includes:

The encoding end performs discrete wavelet transformation on the acquired first image to obtain the first component and the second component corresponding to the first image;

The encoding end performs downsampling processing on the first component and the second component to obtain a second image.
The method according to claim 1 or 2, wherein encoding and compressing the second image to generate the target code stream includes:

The encoding end uses a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;

The encoding end performs quantization and arithmetic coding on the first feature variable to generate a target code stream.
The method of claim 1, further comprising:

The encoding end uses a reversible neural network to perform a first sampling process on at least part of the original feature map corresponding to the first image to obtain a first feature map;

When performing the first sampling process on part of the original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map and the second feature map to generate a feature map code stream; the second feature map is the original feature map corresponding to the first image that has not been subjected to the first sampling process;

When performing the first sampling process on all original feature maps corresponding to the first image, the encoding end encodes and compresses the first feature map to generate a feature map code stream.
An image processing method including:

The decoding end decodes the obtained target code stream and obtains the third image;

The decoding end performs a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
The method according to claim 5, wherein the decoding end performs the first operation on the third image including:

The decoder performs an upsampling process on the target image to obtain the third component, and determines the fourth component corresponding to the target image based on the component information; the target image is the third image or the sixth image;

The decoding end performs inverse discrete wavelet transform on the third component and the fourth component.
The method according to claim 5 or 6, wherein said decoding the acquired target code stream to obtain the third image includes:

The decoding end performs arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;

The decoding end uses a neural network decoder to decompress the second feature variable to obtain the third image.
The method of claim 5, further comprising:

When the decoder decodes the obtained feature map code stream to obtain a third feature map, the decoder performs a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;

When the decoder decodes the obtained feature map code stream to obtain a third feature map and a partially reconstructed feature map, the decoder determines all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and the fourth feature map; the fourth feature map is determined based on the first operation on the third feature map;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process;

Perform image enhancement processing on the third feature map, and perform second sampling processing on the third feature map after the image enhancement processing.
An image processing device, the device is applied to the encoding end, and the device includes:

A first processing module configured to use a reversible neural network to perform a first sampling process on the acquired first image to obtain a second image; the resolution of the second image is lower than the resolution of the first image;

The first generation module is used to encode and compress the second image to generate a target code stream.
The device according to claim 9, wherein the first processing module is specifically used for:

Perform discrete wavelet transform on the acquired first image to obtain the first component and the second component corresponding to the first image;

Perform downsampling processing on the first component and the second component to obtain a second image.
The device according to claim 9 or 10, wherein the first generation module is specifically used for:

Use a neural network encoder to compress the second image to obtain the first feature variable corresponding to the second image;

The first feature variable is quantized and arithmetic encoded to generate a target code stream.
The device of claim 9, further comprising:

A second processing module configured to use a reversible neural network to perform first sampling processing on at least part of the original feature map corresponding to the first image to obtain a first feature map;

The second generation module is used to encode and compress the first feature map and the second feature map when performing the first sampling process on the partial original feature map corresponding to the first image, and generate a feature map code stream; The second feature map is the original feature map corresponding to the first image that has not been subjected to the first sampling process;

The third generation module is configured to encode and compress the first feature map to generate a feature map code stream when performing the first sampling process on all original feature maps corresponding to the first image.
An image processing device, the device is applied to the decoding end, and the device includes:

The decoding module is used to decode the obtained target code stream and obtain the third image;

An operation module, configured to perform a first operation on the third image to obtain a fourth image; the resolution of the third image is lower than the resolution of the fourth image;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform second sampling processing on the third image to obtain a fifth image, and performing image enhancement processing on the fifth image to obtain the fourth image;

After performing image enhancement processing on the third image, a sixth image is obtained, and a reversible neural network is used to perform a second sampling process on the sixth image to obtain the fourth image.
The device according to claim 13, wherein the operation module is specifically used for:

Perform an upsampling process on the target image to obtain a third component, and determine the fourth component corresponding to the target image according to the component information; the target image is the third image or the sixth image;

The third component and the fourth component are subjected to an inverse discrete wavelet transform.
The device according to claim 13 or 14, wherein the decoding module is specifically used for:

Perform arithmetic decoding and inverse quantization on the obtained target code stream to obtain the second feature variable;

Use a neural network decoder to decompress the second feature variable to obtain the third image.
The device of claim 13, wherein the device further comprises:

A first determination module, configured to, when the decoding end decodes the obtained feature map code stream to obtain a third feature map, perform a first operation on the third feature map to determine all reconstructed feature maps corresponding to the fourth image;

A second determination module, configured to, when the decoding end decodes the obtained feature map code stream to obtain a third feature map and a partially reconstructed feature map, determine all reconstructed feature maps corresponding to the fourth image based on the partially reconstructed feature map and a fourth feature map; the fourth feature map is determined based on the first operation on the third feature map;

Wherein, the first operation includes any one of the following:

Using a reversible neural network to perform a second sampling process on the third feature map, and performing image enhancement processing on the third feature map after the second sampling process;

Perform image enhancement processing on the third feature map, and perform second sampling processing on the third feature map after the image enhancement processing.
A terminal, including a processor and a memory, the memory storing programs or instructions that can be run on the processor, wherein when the programs or instructions are executed by the processor, the steps of the image processing method according to any one of claims 1-4 are implemented, or the steps of the image processing method according to any one of claims 5-8 are implemented.
A readable storage medium that stores programs or instructions that, when executed by a processor, implement the steps of the image processing method according to any one of claims 1-4, or implement the steps of the image processing method according to any one of claims 5-8.
A chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being used to run programs or instructions to implement the steps of the image processing method according to any one of claims 1-4, or to implement the steps of the image processing method according to any one of claims 5-8.
A computer program/program product, the computer program/program product being stored in a storage medium, and the computer program/program product being executed by at least one processor to implement the steps of the image processing method according to any one of claims 1-4, or to implement the steps of the image processing method according to any one of claims 5-8.