CN114286113A - Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder - Google Patents

Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Info

Publication number
CN114286113A
CN114286113A (application number CN202111605004.9A)
Authority
CN
China
Prior art keywords
image
heterogeneous
original image
original
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111605004.9A
Other languages
Chinese (zh)
Other versions
CN114286113B (en)
Inventor
吴靖
刘超
陈爽
白朝晖
魏江
王浩
张艳
王幸同
常宏周
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yanfu Technology Co ltd
State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co
Global Energy Interconnection Research Institute
Original Assignee
Beijing Yanfu Technology Co ltd
State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co
Global Energy Interconnection Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yanfu Technology Co ltd, State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co, Global Energy Interconnection Research Institute filed Critical Beijing Yanfu Technology Co ltd
Priority to CN202111605004.9A priority Critical patent/CN114286113B/en
Publication of CN114286113A publication Critical patent/CN114286113A/en
Application granted granted Critical
Publication of CN114286113B publication Critical patent/CN114286113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an image compression recovery method and system based on a multi-head heterogeneous convolutional self-encoder, comprising the following steps: processing an input original image with heterogeneous transformation methods to obtain heterogeneous images; encoding the original image and the heterogeneous images with a deep learning method based on convolutional self-encoders to obtain the original image encoding and the heterogeneous image encodings; fusing and quantizing the original image encoding and the heterogeneous image encodings based on an attention mechanism to obtain a compressed image; decoding the compressed image with a decoder-based deep learning method to obtain a restored image; and constructing a loss function from the difference between the restored image and the original image, and training with the loss function, iterating until convergence, to obtain the optimal restored image. By applying heterogeneous transformations to the image and processing it with an attention mechanism, the method improves image compression quality and has high application value for image transmission.

Description

Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
Technical Field
The invention belongs to the field of image processing, and relates to an image compression recovery method and system based on a multi-head heterogeneous convolution self-encoder.
Background
Image compression algorithms are mainly divided into lossy and lossless compression. Because lossless compression ratios are generally very small, lossy compression algorithms are mainly used. Image compression is important for fast image transmission: the higher the compression ratio, the faster the transmission, but a trade-off is usually required between compression ratio and image fidelity. In recent years, deep learning has been increasingly applied to the field of image compression, but how to make the compressed content restore the image itself well at a fixed compression ratio remains a problem to be solved.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an image compression recovery method and system based on a multi-head heterogeneous convolution self-encoder.
To achieve this purpose, the invention adopts the following technical scheme:
the image compression recovery method based on the multi-head heterogeneous convolution self-encoder comprises the following steps:
processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
encoding the original image and the heterogeneous images, each with an independent convolutional self-encoder, based on a deep learning method of convolutional self-encoders, to obtain the original image encoding and the heterogeneous image encodings;
fusing and quantizing the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
decoding the compressed image based on a deep learning method of a decoder to obtain a restored image;
and constructing a loss function based on the difference between the restored image and the original image, and training with the loss function, iterating continuously until convergence, to obtain the optimal restored image.
The invention is further improved in that:
processing an input original image based on heterogeneous transformation methods, specifically:
an original image I0 with dimensions [H0, W0, 3] is input, and different heterogeneous transformation methods are applied to it respectively to obtain different heterogeneous images;
the heterogeneous transformation methods comprise random brightness adjustment, random hue adjustment and random contrast adjustment, and three different heterogeneous images I1, I2, I3 are obtained based on these three methods.
Encoding the original image and the heterogeneous images with independent convolutional self-encoders, based on a deep learning method of convolutional self-encoders, to obtain the original image encoding and the heterogeneous image encodings, specifically:
the original image I0 and the heterogeneous images I1, I2, I3 are each loaded in Python to obtain their respective mean and variance var; the original image I0 and the heterogeneous images I1, I2, I3 are each normalized and then processed by independent convolutional self-encoders that perform convolution, downsampling and feature extraction, yielding the original image encoding f0 and the heterogeneous image encodings f1, f2, f3.
The normalization operation is shown in equation (1):
I_i' = (I_i − mean_i) / sqrt(var_i)   (1)
where i takes the values 0, 1, 2 and 3, mean_i and var_i are the mean and variance of I_i, and I_i' is the normalized image.
Fusing and quantizing the original image encoding and the heterogeneous image encodings based on an attention mechanism to obtain a compressed image, specifically:
spatial attention is added to the original image encoding f0 and the heterogeneous image encodings f1, f2, f3: global pooling and average pooling are applied to the original image encoding f0 and the heterogeneous image encodings f1, f2, f3 respectively, and the pooled results are concatenated with the corresponding encoding matrix; a convolution operation is then applied to each to obtain a feature map of dimension [H, W, 1], from which a sigmoid generates the spatial attention weight w0; the original image encoding f0 and the heterogeneous image encodings f1, f2, f3 are each multiplied (matrix multiplication) by the spatial attention weight w0 to obtain the spatially attended features.
Channel attention is then added to the spatially attended features: each of them is globally pooled over the [H, W, C] dimensions to obtain a pooled feature vector, which is passed through a fully connected layer and a sigmoid to generate the weights z0; multiplying the spatially attended features by z0 and summing them gives the feature f with channel attention added, and quantizing f yields the compressed image fq with dimensions [H, W, C].
The matrix dimensions of the original image encoding and the heterogeneous image encodings are [H, W, C].
Decoding the compressed image with a decoder-based deep learning method to obtain the restored image, specifically: the compressed image fq is inverse-quantized, the decoder upsamples it with deconvolution operations, enlarging the H and W dimensions of the inverse-quantized image, and finally outputs the restored image, whose dimensions [H0, W0, 3] are the same as those of the original image I0.
The loss function is shown in equation (2):
loss = (1/m) Σ (x1 − x2)²   (2)
where m is the number of points in the original image (the original image and the restored image have the same number of points), and x1 and x2 are the values of the original image and the restored image at the same point.
The image compression recovery system based on the multi-head heterogeneous convolution self-encoder comprises:
the image processing module is used for processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the encoding module is used for respectively encoding the original image and the heterogeneous image by using independent convolution self-encoders based on a deep learning method of the convolution self-encoders to acquire the original image encoding and the heterogeneous image encoding;
the fusion quantization module fuses and quantizes the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
a decoding module for decoding the compressed image with a decoder-based deep learning method to obtain a restored image;
and the loss function optimization module, which constructs a loss function based on the difference between the restored image and the original image and obtains the optimal restored image by training with the loss function and iterating continuously until convergence.
A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the above method when executing said computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following beneficial effects:
the invention makes the convolution self-encoder focus on the characteristics of different aspects of the image by carrying out isomerism on the image. Meanwhile, an attention mechanism is used for processing the image, the image compression quality is improved, meanwhile, the method can be more suitable for image compression under different shooting conditions, the image fidelity can be improved under a certain compression ratio, and the method has a high application value in the aspect of image transmission.
Drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a general flowchart of an image compression recovery method based on a multi-head heterogeneous convolutional auto-encoder according to an embodiment of the present invention;
FIG. 2 is another flowchart of an image compression recovery method based on a multi-head heterogeneous convolutional auto-encoder according to an embodiment of the present invention;
fig. 3 is a block diagram of an image compression recovery system based on a multi-head heterogeneous convolutional auto-encoder according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that terms such as "upper", "lower", "horizontal" and "inner", if used, indicate orientations or positional relationships based on those shown in the drawings or those usually adopted when the product of the invention is used; they are used merely for convenience and simplicity of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present invention. Furthermore, the terms "first", "second" and the like are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that, unless otherwise explicitly stated or limited, the terms "disposed", "mounted", "connected" and "coupled" should be interpreted broadly: they may indicate, for example, a fixed connection, a detachable connection or an integral connection; a mechanical or electrical connection; a direct connection or an indirect connection through an intermediate medium; or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1 and fig. 2, the invention discloses an image compression recovery method based on a multi-head heterogeneous convolutional self-encoder, comprising the following steps:
step 1, processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image.
The input image is denoted I0, with dimensions [H0, W0, 3]. The heterogeneous transformation methods r1, r2, r3 used in this embodiment randomly adjust the brightness, hue and contrast respectively, and the images obtained by these transformations are I1, I2, I3.
The input image is assumed to be in RGB format. The random brightness adjustment is
I1 = floor(min(rand(0.8, 1.2) × I0, 255))
where rand(0.8, 1.2) is a random number generated between 0.8 and 1.2 each time the brightness transformation is performed, min(rand(0.8, 1.2) × I0, 255) caps each value of rand(0.8, 1.2) × I0 at an upper limit of 255, and floor(·) rounds the values down;
for the random hue adjustment, the image first needs to be converted from RGB format to HSV format. The maximum and minimum of the three RGB channels are obtained, MAX = max(R, G, B) and MIN = min(R, G, B), where R, G and B are the matrices of the three channels of I0; the hue H is then computed from MAX, MIN and the channel values. If the channel containing the minimum value is R, H becomes H + 120; if the channel containing the minimum value is G, H becomes H + 120. The hue is then randomly transformed by r2: H = (H + rand(0, 30)) % 360;
For the random contrast transformation, the average of the maximum and minimum values of image I0 is first obtained, avg = 0.5 × (max(I0) + min(I0)), together with half of their difference, diff = 0.5 × (max(I0) − min(I0)). The new value range is then (max(avg − rand(0.8, 1.2) × diff, 0), min(avg + rand(0.8, 1.2) × diff, 255)), and I0 is rescaled into this new range using the generated random numbers to obtain I3.
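To make step 1 concrete, the following Python sketch implements the three heterogeneous transformations roughly as described above. It is a minimal illustration, assuming NumPy arrays in RGB uint8 format; the function names, the use of OpenCV for the RGB/HSV conversion, and the linear rescaling used for the contrast transform are assumptions, not the patent's exact implementation.

```python
import numpy as np
import cv2  # assumed helper for the RGB <-> HSV conversion; not prescribed by the patent

def random_brightness(img: np.ndarray) -> np.ndarray:
    # I1 = floor(min(rand(0.8, 1.2) * I0, 255))
    r = np.random.uniform(0.8, 1.2)
    return np.floor(np.minimum(r * img.astype(np.float32), 255)).astype(np.uint8)

def random_hue(img: np.ndarray) -> np.ndarray:
    # Convert RGB -> HSV, shift the hue by rand(0, 30) modulo 360 degrees, convert back.
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)
    h = hsv[..., 0] * 2.0                          # OpenCV stores hue in [0, 180)
    hsv[..., 0] = ((h + np.random.uniform(0, 30)) % 360.0) / 2.0
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def random_contrast(img: np.ndarray) -> np.ndarray:
    # avg = 0.5*(max + min), diff = 0.5*(max - min); rescale values into the randomized range.
    x = img.astype(np.float32)
    lo, hi = x.min(), x.max()
    avg, diff = 0.5 * (hi + lo), 0.5 * (hi - lo)
    r = np.random.uniform(0.8, 1.2)
    new_lo, new_hi = max(avg - r * diff, 0.0), min(avg + r * diff, 255.0)
    scaled = (x - lo) / max(hi - lo, 1e-6) * (new_hi - new_lo) + new_lo
    return scaled.astype(np.uint8)

if __name__ == "__main__":
    I0 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in for the original image
    I1, I2, I3 = random_brightness(I0), random_hue(I0), random_contrast(I0)
```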
Step 2, encoding the original image and the heterogeneous images respectively with independent convolutional self-encoders, based on a deep learning method of convolutional self-encoders, to obtain the original image encoding and the heterogeneous image encodings.
The original image I0 and the heterogeneous images I1, I2, I3 are each loaded in Python to obtain their respective mean and variance var. The original image I0 and the heterogeneous images I1, I2, I3 are each normalized and then processed by independent convolutional self-encoders that perform convolution, downsampling and feature extraction, yielding the original image encoding f0 and the heterogeneous image encodings f1, f2, f3.
The normalization operation is shown in equation (1):
I_i' = (I_i − mean_i) / sqrt(var_i)   (1)
where i takes the values 0, 1, 2 and 3, mean_i and var_i are the mean and variance of I_i, and I_i' is the normalized image.
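A minimal PyTorch sketch of step 2 is given below, assuming a small stride-2 convolutional encoder. The layer counts, channel widths and kernel sizes are illustrative assumptions; the patent only specifies per-image normalization (equation (1)) followed by convolution, downsampling and feature extraction with an independent convolutional self-encoder per image.

```python
import torch
import torch.nn as nn

def normalize(img: torch.Tensor) -> torch.Tensor:
    # Equation (1): subtract the image's own mean and divide by the square root of its variance.
    mean, var = img.mean(), img.var()
    return (img - mean) / torch.sqrt(var + 1e-8)

class ConvEncoder(nn.Module):
    """One independent convolutional encoder; four copies are used, one per input image."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # downsample H and W by 2
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # downsample again
            nn.Conv2d(64, out_channels, kernel_size=3, stride=1, padding=1),   # feature extraction
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One encoder per image: f0 for I0 and f1, f2, f3 for the heterogeneous images.
encoders = nn.ModuleList([ConvEncoder() for _ in range(4)])
images = [torch.rand(1, 3, 128, 128) for _ in range(4)]           # stand-ins for I0, I1, I2, I3
f0, f1, f2, f3 = [enc(normalize(img)) for enc, img in zip(encoders, images)]
```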
Step 3, fusing and quantizing the original image encoding and the heterogeneous image encodings based on an attention mechanism to obtain a compressed image.
Taking the original image encoding f0 as an example, let the matrix dimension of each image encoding be [H, W, C]. Spatial attention is added first, giving different weights to different spatial locations of each feature. Global pooling and average pooling are applied to f0, and the pooled results are concatenated with the original encoding to obtain a feature of dimension [H, W, C+2]; a 5×5 convolution then produces a feature map of dimension [H, W, 1], which is passed through a sigmoid to generate the spatial attention weight w0. Multiplying f0 by w0 (matrix multiplication) yields the spatially attended feature, and the same operation on the heterogeneous encodings yields their spatially attended features.
Channel attention is then added so that more attention is paid to the content of the image itself. There are three heterogeneous transformations in total, so together with the original encoding there are four spatially attended features. These are globally pooled over the [H, W, C] dimensions to obtain a feature vector of dimension [4], which is passed through a fully connected layer and a sigmoid to generate the weight z0 corresponding to each feature, also of dimension [4]. Multiplying the spatially attended features by z0 and summing them gives the feature f with channel attention added; quantizing f yields the image compression result fq, whose dimensions are likewise [H, W, C].
The image compression at this point consists of two parts: the reduction of the spatial dimensions due to downsampling, and the compression due to quantization. The image compression ratio is therefore determined by the change from the original size [H0, W0, 3] at q0 bits per value to the compressed size [H, W, C] at q1 bits per value, where q1 is the number of bits used when quantizing f and q0 is the number of bits of the original image I0 itself, generally 8.
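The PyTorch sketch below illustrates step 3 as interpreted above: spatial attention, channel attention, fusion and a simple uniform quantizer. It is a sketch under stated assumptions; in particular, a single shared 5×5 convolution is used for all four encodings for brevity, and the quantize function is an assumed placeholder for the quantization step, which the patent does not specify in detail.

```python
import torch
import torch.nn as nn

class SpatialChannelFusion(nn.Module):
    """Fuse four encodings [f0, f1, f2, f3], each of shape [N, C, H, W], with spatial then channel attention."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5 convolution mapping the [C+2]-channel concatenation to a single-channel spatial map.
        self.spatial_conv = nn.Conv2d(channels + 2, 1, kernel_size=5, padding=2)
        self.fc = nn.Linear(4, 4)  # fully connected layer producing one channel-attention weight per encoding

    def spatial_attention(self, f: torch.Tensor) -> torch.Tensor:
        max_pool = f.max(dim=1, keepdim=True).values            # global (max) pooling over channels
        avg_pool = f.mean(dim=1, keepdim=True)                   # average pooling over channels
        cat = torch.cat([f, max_pool, avg_pool], dim=1)          # [N, C+2, H, W]
        w = torch.sigmoid(self.spatial_conv(cat))                # spatial attention weight, [N, 1, H, W]
        return f * w                                             # spatially attended feature

    def forward(self, feats):
        attended = [self.spatial_attention(f) for f in feats]
        pooled = torch.stack([f.mean(dim=(1, 2, 3)) for f in attended], dim=-1)  # [N, 4]
        z = torch.sigmoid(self.fc(pooled))                                       # channel weights z0, [N, 4]
        stacked = torch.stack(attended, dim=-1)                                  # [N, C, H, W, 4]
        return (stacked * z[:, None, None, None, :]).sum(dim=-1)                 # weighted sum -> feature f

def quantize(f: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Assumed uniform quantizer: squash to (0, 1) and round to 2**bits - 1 levels.
    levels = 2 ** bits - 1
    return torch.round(torch.sigmoid(f) * levels)

fusion = SpatialChannelFusion(channels=64)
feats = [torch.rand(1, 64, 32, 32) for _ in range(4)]   # stand-ins for f0, f1, f2, f3
fq = quantize(fusion(feats))                            # compressed representation with dimensions [H, W, C]
```

In this illustrative configuration the two stride-2 downsampling stages and the 4-bit codes together reduce both the spatial size and the bit depth, which are the two components of the compression ratio discussed above.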
Step 4, decoding the compressed image with a decoder-based deep learning method to obtain the restored image.
fq is first converted back into floating-point numbers by inverse quantization; the decoder then uses multiple deconvolution operations to upsample, gradually enlarging the H and W dimensions, and finally outputs the restored image, whose dimensions [H0, W0, 3] are the same as those of the original image. During neural network training, the loss function is computed between the restored image output by the network and the original image I0. The upsampling uses bilinear interpolation: for example, given two points (x0, y0) and (x1, y1) in the image with values A and B respectively, if a new point is to be inserted at an intermediate position (x2, y2) with x1 > x2 > x0, its value is
A + (B − A) × (x2 − x0) / (x1 − x0).
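A matching PyTorch sketch of step 4: inverse quantization back to floating point, followed by a deconvolution-based decoder that restores the [H0, W0, 3] resolution. The layer configuration mirrors the assumed encoder above and is illustrative only; bilinear_1d simply restates the interpolation formula given in the text.

```python
import torch
import torch.nn as nn

def dequantize(fq: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Map the integer codes back to floating-point values in [0, 1] (inverse of the assumed quantizer).
    return fq / (2 ** bits - 1)

class DeconvDecoder(nn.Module):
    """Decoder that upsamples the compressed feature back toward the original [H0, W0, 3] size."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_channels, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # H, W x2
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),           # H, W x2
            nn.Conv2d(32, 3, kernel_size=3, padding=1),   # final 3-channel restored image
        )

    def forward(self, fq: torch.Tensor) -> torch.Tensor:
        return self.net(dequantize(fq))

def bilinear_1d(a: float, b: float, x0: float, x1: float, x2: float) -> float:
    # Value inserted at x2 between (x0, A) and (x1, B): A + (B - A) * (x2 - x0) / (x1 - x0).
    return a + (b - a) * (x2 - x0) / (x1 - x0)

decoder = DeconvDecoder()
restored = decoder(torch.rand(1, 64, 32, 32) * 15)   # stand-in for fq; output has shape [1, 3, 128, 128]
```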
Step 5, constructing a loss function based on the difference between the restored image and the original image, and training with the loss function, iterating continuously until convergence, to obtain the optimal restored image.
The loss function is shown in equation (2):
loss = (1/m) Σ (x1 − x2)²   (2)
where m is the number of points in the original image (the original image and the restored image have the same number of points), and x1 and x2 are the values of the original image and the restored image at the same point.
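Equation (2) is a mean-squared-error objective over corresponding points. The short sketch below shows how it would drive one training step, assuming the encoder, fusion and decoder modules from the earlier sketches; the optimizer choice and the decision to skip the non-differentiable rounding during backpropagation are assumptions made for illustration.

```python
import torch

def compression_loss(restored: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
    # Equation (2): mean of the squared differences over the m points of the image.
    return ((restored - original) ** 2).mean()

def train_step(images, encoders, fusion, decoder, normalize, optimizer):
    # `images` holds I0 followed by its three heterogeneous versions, all as [N, 3, H, W] tensors.
    I0 = images[0]
    feats = [enc(normalize(img)) for enc, img in zip(encoders, images)]
    fused = fusion(feats)              # rounding is skipped here because it is not differentiable
    restored = decoder.net(fused)      # decode the fused feature directly in this simplified step
    loss = compression_loss(restored, I0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example wiring (assumed): optimizer = torch.optim.Adam(
#     list(encoders.parameters()) + list(fusion.parameters()) + list(decoder.parameters()), lr=1e-4)
```

Training repeats such steps until the loss converges, at which point decoding fq gives the optimal restored image.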
Referring to fig. 3, the invention discloses an image compression recovery system based on a multi-head heterogeneous convolutional auto-encoder, comprising:
the image processing module is used for processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the encoding module is used for respectively encoding the original image and the heterogeneous image by using independent convolution self-encoders based on a deep learning method of the convolution self-encoders to acquire the original image encoding and the heterogeneous image encoding;
the fusion quantization module fuses and quantizes the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
a decoding module for decoding the compressed image with a decoder-based deep learning method to obtain a restored image;
and the loss function optimization module, which constructs a loss function based on the difference between the restored image and the original image and obtains the optimal restored image by training with the loss function and iterating continuously until convergence.
An embodiment of the invention further provides a terminal device. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor implements the steps of the above method embodiments when executing the computer program; alternatively, the processor implements the functions of the modules/units in the above device embodiments when executing the computer program.
The computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc.
The memory may be used for storing the computer programs and/or modules, and the processor may implement various functions of the terminal device by executing or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
The terminal device integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer memory, Read-only memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder is characterized by comprising the following steps:
processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
encoding the original image and the heterogeneous images, each with an independent convolutional self-encoder, based on a deep learning method of convolutional self-encoders, to obtain the original image encoding and the heterogeneous image encodings;
fusing and quantizing the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
decoding the compressed image based on a deep learning method of a decoder to obtain a restored image;
and constructing a loss function based on the difference between the restored image and the original image, and training with the loss function, iterating continuously until convergence, to obtain the optimal restored image.
2. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 1, wherein the heterogeneous transformation method is used for processing an input original image, and specifically comprises:
an original image I0 with dimensions [H0, W0, 3] is input, and different heterogeneous transformation methods are applied to it respectively to obtain different heterogeneous images;
the heterogeneous transformation methods comprise random brightness adjustment, random hue adjustment and random contrast adjustment, and three different heterogeneous images I1, I2, I3 are obtained based on these three methods.
3. The image compression recovery method based on the multi-head heterogeneous convolutional self-encoder as claimed in claim 2, wherein encoding the original image and the heterogeneous images with independent convolutional self-encoders, based on a deep learning method of convolutional self-encoders, to obtain the original image encoding and the heterogeneous image encodings specifically comprises:
the original image I0 and the heterogeneous images I1, I2, I3 are each loaded in Python to obtain their respective mean and variance var; the original image I0 and the heterogeneous images I1, I2, I3 are each normalized and then processed by independent convolutional self-encoders that perform convolution, downsampling and feature extraction, yielding the original image encoding f0 and the heterogeneous image encodings f1, f2, f3;
the normalization operation is shown in equation (1):
I_i' = (I_i − mean_i) / sqrt(var_i)   (1)
where i takes the values 0, 1, 2 and 3, mean_i and var_i are the mean and variance of I_i, and I_i' is the normalized image.
4. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 3, wherein the original image encoding and the heterogeneous image encoding are fused and quantized based on an attention mechanism to obtain a compressed image, specifically:
spatial attention is added to the original image encoding f0 and the heterogeneous image encodings f1, f2, f3: global pooling and average pooling are applied to the original image encoding f0 and the heterogeneous image encodings f1, f2, f3 respectively, and the pooled results are concatenated with the corresponding encoding matrix; a convolution operation is then applied to each to obtain a feature map of dimension [H, W, 1], from which a sigmoid generates the spatial attention weight w0; the original image encoding f0 and the heterogeneous image encodings f1, f2, f3 are each multiplied (matrix multiplication) by the spatial attention weight w0 to obtain the spatially attended features;
channel attention is then added to the spatially attended features: each of them is globally pooled over the [H, W, C] dimensions to obtain a pooled feature vector, which is passed through a fully connected layer and a sigmoid to generate the weights z0; multiplying the spatially attended features by z0 and summing them gives the feature f with channel attention added, and quantizing f yields the compressed image fq with dimensions [H, W, C];
and the matrix dimensions of the original image encoding and the heterogeneous image encodings are [H, W, C].
5. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 4, wherein decoding the compressed image with a decoder-based deep learning method to obtain the restored image specifically comprises: the compressed image fq is inverse-quantized, the decoder upsamples it with deconvolution operations, enlarging the H and W dimensions of the inverse-quantized image, and finally outputs the restored image, whose dimensions [H0, W0, 3] are the same as those of the original image I0.
6. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 5, wherein the loss function is shown in equation (2):
loss = (1/m) Σ (x1 − x2)²   (2)
where m is the number of points in the original image (the original image and the restored image have the same number of points), and x1 and x2 are the values of the original image and the restored image at the same point.
7. An image compression recovery system based on a multi-head heterogeneous convolutional auto-encoder, comprising:
the image processing module is used for processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the encoding module is used for respectively encoding the original image and the heterogeneous image by using independent convolution self-encoders based on a deep learning method of the convolution self-encoders to acquire the original image encoding and the heterogeneous image encoding;
the fusion quantization module fuses and quantizes the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
a decoding module for decoding the compressed image with a decoder-based deep learning method to obtain a restored image;
and the loss function optimization module, which constructs a loss function based on the difference between the restored image and the original image and obtains the optimal restored image by training with the loss function and iterating continuously until convergence.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111605004.9A 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder Active CN114286113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111605004.9A CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111605004.9A CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Publications (2)

Publication Number Publication Date
CN114286113A (en) 2022-04-05
CN114286113B (en) 2023-05-30

Family

ID=80875568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111605004.9A Active CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Country Status (1)

Country Link
CN (1) CN114286113B (en)

Citations (4)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
US10593021B1 (en) * 2019-09-11 2020-03-17 Inception Institute of Artificial Intelligence, Ltd. Motion deblurring using neural network architectures
CN113240589A (en) * 2021-04-01 2021-08-10 重庆兆光科技股份有限公司 Image defogging method and system based on multi-scale feature fusion
CN113095439A (en) * 2021-04-30 2021-07-09 东南大学 Heterogeneous graph embedding learning method based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
苗壮 (Miao Zhuang) et al.: "Compressed sensing image restoration algorithm using GPU acceleration", 微电子学与计算机 (Microelectronics & Computer) *

Also Published As

Publication number Publication date
CN114286113B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
EP3275190B1 (en) Chroma subsampling and gamut reshaping
CN108022212B (en) High-resolution picture generation method, generation device and storage medium
KR102165155B1 (en) Adaptive interpolation for spatially scalable video coding
US10909728B1 (en) Learned lossy image compression codec
CN114581544A (en) Image compression method, computer device and computer storage medium
CN113781320A (en) Image processing method and device, terminal equipment and storage medium
CN116636217A (en) Method and apparatus for encoding image and decoding code stream using neural network
JP2014521275A (en) Adaptive upsampling method, program and computer system for spatially scalable video coding
US20200366938A1 (en) Signal encoding
US20240048738A1 (en) Methods, apparatuses, computer programs and computer-readable media for processing configuration data
Xing et al. Scale-arbitrary invertible image downscaling
TWI805085B (en) Handling method of chroma subsampled formats in machine-learning-based video coding
CN114943643A (en) Image reconstruction method, image coding and decoding method and related equipment
TW202228439A (en) Method for chroma subsampled formats handling in machine-learning-based picture coding
US20200169742A1 (en) Single-channel inverse mapping for image/video processing
KR102312338B1 (en) AI encoding apparatus and operating method for the same, and AI decoding apparatus and operating method for the same
CN114286113A (en) Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
Hasnat et al. Luminance approximated vector quantization algorithm to retain better image quality of the decompressed image
US11490102B2 (en) Resilient image compression and decompression
WO2019023202A1 (en) Single-channel inverse mapping for image/video processing
US8582906B2 (en) Image data compression and decompression
EP4252423A1 (en) Video encoding using pre-processing
CN114450692A (en) Neural network model compression using block partitioning
CN113068033B (en) Multimedia inverse quantization processing method, device, equipment and storage medium
US20240013446A1 (en) Method and apparatus for encoding or decoding a picture using a neural network comprising sub-networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant