CN114286113B - Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder - Google Patents

Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Info

Publication number
CN114286113B
CN114286113B
Authority
CN
China
Prior art keywords
image
heterogeneous
original image
encoder
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111605004.9A
Other languages
Chinese (zh)
Other versions
CN114286113A (en)
Inventor
吴靖
刘超
陈爽
白朝晖
魏江
王浩
张艳
王幸同
常宏周
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yanfu Technology Co ltd
State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co
Global Energy Interconnection Research Institute
Original Assignee
Beijing Yanfu Technology Co ltd
State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co
Global Energy Interconnection Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yanfu Technology Co ltd, State Grid Shaanxi Electric Power Co Ltd Xixian New Area Power Supply Co, Global Energy Interconnection Research Institute
Priority to CN202111605004.9A
Publication of CN114286113A
Application granted
Publication of CN114286113B
Legal status: Active (current)
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses an image compression recovery method and system based on a multi-head heterogeneous convolution self-encoder, comprising the following steps: processing an input original image based on a heterogeneous transformation method to obtain heterogeneous images; encoding the original image and the heterogeneous images based on a deep learning method of convolution self-encoders to obtain an original image code and heterogeneous image codes; fusing and quantizing the original image code and the heterogeneous image codes based on an attention mechanism to obtain a compressed image; decoding the compressed image based on a deep learning method of a decoder to obtain a restored image; and constructing a loss function based on the difference between the restored image and the original image, and training the loss function iteratively until convergence to obtain an optimal restored image. By applying heterogeneous processing to the image and processing it with an attention mechanism, the invention improves image compression quality and has high application value for image transmission.

Description

Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
Technical Field
The invention belongs to the field of image processing, and relates to an image compression recovery method and system based on a multi-head heterogeneous convolution self-encoder.
Background
Image compression algorithms are mainly classified into lossy and lossless compression. Because lossless compression generally achieves only a small compression ratio, lossy compression algorithms are mainly used. Image compression is important for fast image transmission: the higher the compression ratio, the faster the image is transmitted, but a trade-off must often be made between the compression ratio and image fidelity. In recent years, deep learning has been increasingly applied in the field of image compression, but how to make the compressed content restore the image itself well at a given compression ratio remains a problem to be solved.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an image compression recovery method and system based on a multi-head heterogeneous convolution self-encoder. The images are transformed by a plurality of heterogeneous encoders and processed in combination with an attention mechanism, so that image fidelity is improved at a given compression ratio; the method and system therefore have high application value for image compression in image transmission.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
an image compression recovery method based on a multi-head heterogeneous convolution self-encoder comprises the following steps:
processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the method comprises the steps of respectively encoding an original image and a heterogeneous image by using independent convolution self-encoders based on a deep learning method of the convolution self-encoders to obtain an original image code and a heterogeneous image code;
fusing and quantizing the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
decoding the compressed image based on a deep learning method of a decoder to obtain a restored image;
based on the difference between the recovered image and the original image, constructing a loss function, and continuously iterating to be converged through training the loss function to obtain an optimal recovered image.
The invention is further improved in that:
the input original image is processed based on the heterogeneous transformation method, specifically:
an original image I_0 with dimensions [H_0, W_0, 3] is input, and heterogeneous transformation is carried out on the original image with different heterogeneous transformation methods respectively to obtain different heterogeneous images;
the heterogeneous transformation methods comprise a random brightness increase/decrease method, a random hue increase/decrease method and a random contrast increase/decrease method, and three different heterogeneous images I_1, I_2, I_3 are obtained based on these three methods.
The deep learning method based on the convolution self-encoder encodes the original image and the heterogeneous images respectively with independent convolution self-encoders to obtain an original image code and heterogeneous image codes, specifically:
the original image I_0 and the heterogeneous images I_1, I_2, I_3 are respectively input into Python software to obtain the mean and the variance var of the original image I_0 and of the heterogeneous images I_1, I_2, I_3; normalization is carried out on the original image I_0 and the heterogeneous images I_1, I_2, I_3 respectively, and the normalized original image I_0 and heterogeneous images I_1, I_2, I_3 are each passed through an independent convolution self-encoder for convolution, down-sampling and feature extraction, yielding the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3.
The normalization operation is as shown in formula (1):

$$\hat{I}_i = \frac{I_i - \mathrm{mean}(I_i)}{\mathrm{var}(I_i)} \tag{1}$$

where i takes the values 0, 1, 2 and 3.
The original image code and the heterogeneous image codes are fused and quantized based on an attention mechanism to obtain a compressed image, specifically:
spatial attention is added to the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3: global pooling and average pooling are respectively applied to each code, the pooled results are respectively concatenated with the original image code, and a convolution operation on each concatenated result yields a feature of dimension [H, W, 1]; a spatial attention weight w_0 is generated from this feature by a sigmoid, and the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3 are each multiplied with the spatial attention weight w_0 to obtain the spatially attended features f̂_0, f̂_1, f̂_2, f̂_3;
channel attention is added to the acquired spatially attended features f̂_0, f̂_1, f̂_2, f̂_3: each is globally pooled over its [H, W, C] dimensions, the pooled features are passed through a fully connected layer and a sigmoid to generate a weight z_0, and z_0 and the spatially attended features are multiplied and summed to obtain the channel-attended feature f; f is quantized to obtain the compressed image f_q, whose dimension is [H, W, C];
the matrix dimensions of the original image code and of the heterogeneous image codes are [H, W, C].
The compressed image is decoded based on a deep learning method of a decoder to obtain a restored image, specifically: inverse quantization is performed on the compressed image f_q, the decoder up-samples the inverse-quantized image with deconvolution operations, enlarging its H and W dimensions, and the dimensions of the output restored image are finally [H_0, W_0, 3], the same as those of the original image I_0.
The loss function is as shown in formula (2):

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(x_1^{(j)} - x_2^{(j)}\right)^2 \tag{2}$$

where m is the number of pixels in the original image (the original image and the restored image have the same number of pixels), and x_1 and x_2 are the values of the original image and the restored image at the same pixel.
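Taken together, the five steps above form a single model that is trained end to end with the loss just described. The outline below is a minimal, non-authoritative sketch in Python/PyTorch (the patent only refers to "Python software", so the PyTorch API and the names MultiHeadCompressor, transforms, encoders, fusion and decoder are assumptions made for illustration); concrete sketches of the individual components are given alongside the corresponding steps of the detailed description.

```python
import torch
import torch.nn as nn


class MultiHeadCompressor(nn.Module):
    """Wires the five steps together: heterogeneous transforms, four
    independent convolutional encoders, attention fusion with quantization,
    and a decoder that restores the image."""

    def __init__(self, transforms, encoders, fusion, decoder):
        super().__init__()
        self.transforms = transforms              # step 1: produce I1, I2, I3
        self.encoders = nn.ModuleList(encoders)   # step 2: codes f0 .. f3
        self.fusion = fusion                      # step 3: compressed code f_q
        self.decoder = decoder                    # step 4: restored image

    def forward(self, original: torch.Tensor) -> torch.Tensor:
        images = [original] + [t(original) for t in self.transforms]
        codes = [enc(img) for enc, img in zip(self.encoders, images)]
        return self.decoder(self.fusion(codes))   # step 5 compares this to `original`
```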
An image compression recovery system based on a multi-headed heterogeneous convolution self-encoder, comprising:
the image processing module is used for processing the input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the coding module is used for respectively coding the original image and the heterogeneous image by using an independent convolution self-encoder based on a deep learning method of the convolution self-encoder to obtain an original image code and a heterogeneous image code;
the fusion quantization module is used for fusing and quantizing the original image codes and the heterogeneous image codes based on an attention mechanism to obtain compressed images;
a decoding module for decoding the compressed image based on a deep learning method of the decoder to obtain a restored image,
and the loss function optimization module is used for constructing a loss function based on the difference between the restored image and the original image, and obtaining the optimal restored image by continuously iterating to be converged through training the loss function.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when the computer program is executed.
A computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method described above.
Compared with the prior art, the invention has the following beneficial effects:
the present invention focuses the convolution self-encoder on the features of different aspects of the image by isomerising the image. Meanwhile, the attention mechanism is used for processing the image, so that the image compression quality is improved, the image compression method is more suitable for image compression under different shooting conditions, the image fidelity can be improved under a certain compression ratio, and the image compression method has higher application value in the aspect of image transmission.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a general flow chart of a multi-head heterogeneous convolution self-encoder based image compression recovery method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for image compression recovery based on a multi-head heterogeneous convolutional self-encoder according to an embodiment of the present invention;
fig. 3 is a block diagram of an image compression recovery system based on a multi-head heterogeneous convolution self-encoder according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the embodiments of the present invention, it should be noted that, if the terms "upper," "lower," "horizontal," "inner," and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Furthermore, the term "horizontal" if present does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" should be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1 and 2, the invention discloses an image compression recovery method based on a multi-head heterogeneous convolution self-encoder, which comprises the following steps:
and step 1, processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image.
Specifically, let the input image be I_0 with dimensions [H_0, W_0, 3]. The heterogeneous transformation methods r_1, r_2, r_3 used in this embodiment randomly increase or decrease the brightness, hue and contrast respectively, and the images obtained by the heterogeneous transformation methods are I_1, I_2, I_3.
Let the input image be in RGB format. Then

$$I_1 = \left\lfloor \min\!\big(\mathrm{rand}(0.8,1.2) \times I_0,\ 255\big) \right\rfloor$$

where rand(0.8, 1.2) generates a random number between 0.8 and 1.2 each time the brightness transform is performed, min(rand(0.8, 1.2) × I_0, 255) caps each value of rand(0.8, 1.2) × I_0 at 255, and the outer brackets denote rounding down.
For the random hue change, the image is first converted from RGB format to HSV format. The maximum and minimum of the three channels are obtained, MAX = max(R, G, B) and MIN = min(R, G, B), where R, G, B denote the three channels of the matrix corresponding to I_0. The hue H is then computed from MAX and MIN, with an offset added according to the channel in which the minimum lies (for example H = H + 120 when the minimum lies in the R channel, with corresponding offsets for the other channels), and the random hue transform r_2: H = (H + rand(0, 30)) % 360 is applied to H to obtain I_2.
For contrast random transformation, first find image I 0 Average avg=of the maximum and minimum values of (a)
Figure BDA0003433402000000065
And one of the difference value of the twoHalf=0.5 x (max (I 0 )-min(I 0 ) Then the new range is (max (avg-rand (0.8,1.2) ×diff, 0), min (avg+rand (0.8,1.2) ×diff, 255)), and the generated random number is used to determine I 0 Mapping to a new range, then
Figure BDA0003433402000000064
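By way of illustration, the three random transforms described above can be sketched in Python/NumPy roughly as follows. The function names are hypothetical, the hue shift is performed here with matplotlib's RGB/HSV conversion, and the exact hue and contrast formulas of the embodiment are only approximated, so this is a sketch under stated assumptions rather than the patented implementation.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb, rgb_to_hsv


def random_brightness(img: np.ndarray) -> np.ndarray:
    """I1: scale by a random factor in [0.8, 1.2], cap at 255, round down."""
    factor = np.random.uniform(0.8, 1.2)
    return np.floor(np.minimum(factor * img.astype(np.float64), 255.0))


def random_hue(img: np.ndarray) -> np.ndarray:
    """I2: convert RGB -> HSV, shift the hue by rand(0, 30) degrees mod 360."""
    hsv = rgb_to_hsv(img.astype(np.float64) / 255.0)          # H scaled to [0, 1]
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(0, 30) / 360.0) % 1.0
    return np.floor(hsv_to_rgb(hsv) * 255.0)


def random_contrast(img: np.ndarray) -> np.ndarray:
    """I3: linearly remap pixel values onto a randomly scaled range centred
    on the average of the image maximum and minimum."""
    lo, hi = float(img.min()), float(img.max())
    avg, half = 0.5 * (hi + lo), 0.5 * (hi - lo)
    scale = np.random.uniform(0.8, 1.2)
    new_lo, new_hi = max(avg - scale * half, 0.0), min(avg + scale * half, 255.0)
    return new_lo + (img.astype(np.float64) - lo) / (hi - lo + 1e-8) * (new_hi - new_lo)
```

The inputs are assumed to be RGB arrays of shape [H0, W0, 3] with values in 0-255, matching the value caps used in the embodiment.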
Step 2: the original image and the heterogeneous images are respectively encoded with independent convolution self-encoders, based on a deep learning method of the convolution self-encoder, to obtain an original image code and heterogeneous image codes.
The original image I_0 and the heterogeneous images I_1, I_2, I_3 are respectively input into Python software to obtain the mean and the variance var of the original image I_0 and of the heterogeneous images I_1, I_2, I_3; normalization is carried out on the original image I_0 and the heterogeneous images I_1, I_2, I_3 respectively, and the normalized original image I_0 and heterogeneous images I_1, I_2, I_3 are each passed through an independent convolution self-encoder for convolution, down-sampling and feature extraction, yielding the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3.
The normalization operation is as shown in formula (1):

$$\hat{I}_i = \frac{I_i - \mathrm{mean}(I_i)}{\mathrm{var}(I_i)} \tag{1}$$

where i takes the values 0, 1, 2 and 3.
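A minimal sketch of the normalization of formula (1) and of one independent convolutional encoder head is given below in PyTorch. The patent does not specify the number of layers, the channel width or the stride schedule, so the three stride-2 convolutions and 64 channels here are illustrative assumptions.

```python
import torch
import torch.nn as nn


def normalize(img: torch.Tensor) -> torch.Tensor:
    """Formula (1): normalize one image by its own mean and variance."""
    return (img - img.mean()) / (img.var() + 1e-8)


class ConvEncoderHead(nn.Module):
    """One independent convolutional self-encoder head: convolution and
    down-sampling that map a [3, H0, W0] image to a [C, H, W] feature code."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(normalize(img))


# Four independent heads: one for I0 and one for each heterogeneous image.
encoders = nn.ModuleList([ConvEncoderHead() for _ in range(4)])
```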
Step 3: the original image code and the heterogeneous image codes are fused and quantized based on an attention mechanism to obtain a compressed image.
Take the feature f_0 obtained by encoding the original image as an example, and let the matrix dimension of an image code be [H, W, C]. Spatial attention is added first, giving different weights to different spatial positions of each feature. Global pooling and average pooling are applied to f_0, the pooled results are concatenated with the original image code, giving a feature of dimension [H, W, C+2], and a 5×5 convolution operation then produces a feature of dimension [H, W, 1], from which the spatial attention weight w_0 is generated by a sigmoid. f_0 and w_0 are then multiplied to obtain the spatially attended feature f̂_0. The same procedure is carried out on the heterogeneous features to obtain f̂_1, f̂_2, f̂_3.
Channel attention is then added, so that the content of the image itself receives more attention. Since there are 3 heterogeneous operations, the four features f̂_0, f̂_1, f̂_2, f̂_3 are each globally pooled over their [H, W, C] dimensions to obtain a feature of dimension [4]; this feature is passed through a fully connected layer and a sigmoid to generate a weight z_0 of dimension [4], one component for each heterogeneous feature. z_0 and the features f̂_0, f̂_1, f̂_2, f̂_3 are multiplied and summed to obtain the channel-attended feature f, and f is quantized to obtain the image compression result f_q, whose dimension is also [H, W, C].
The image compression at this point consists of two parts: one part is the spatial dimension reduction caused by down-sampling, and the other part is the compression caused by quantization. The image compression rate is

$$\mathrm{rate} = \frac{H \times W \times C \times q_1}{H_0 \times W_0 \times 3 \times q_0}$$

where q_1 is the number of bits used when quantizing f and q_0 is the number of bits of the original image I_0 itself, generally 8.
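The spatial-attention and channel-attention fusion together with the quantization can be sketched as a single PyTorch module as below. This is a hedged reconstruction: the patent does not state which pooling pair feeds the spatial attention (channel-wise max pooling is assumed here for the "global pooling"), nor how f is quantized, so the uniform straight-through quantizer and the layer sizes are illustrative assumptions consistent with the description.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuses the four codes f0..f3: per-head spatial attention (5x5 conv over
    the code concatenated with its pooled maps), head-wise channel attention,
    then uniform quantization of the fused feature."""

    def __init__(self, channels: int = 64, heads: int = 4, q_bits: int = 4):
        super().__init__()
        self.spatial = nn.ModuleList(
            [nn.Conv2d(channels + 2, 1, kernel_size=5, padding=2) for _ in range(heads)]
        )
        self.fc = nn.Linear(heads, heads)   # channel attention: one weight per head
        self.q_levels = 2 ** q_bits

    def forward(self, codes):                            # list of 4 tensors [B, C, H, W]
        attended = []
        for f, conv in zip(codes, self.spatial):
            avg = f.mean(dim=1, keepdim=True)            # average pooling over channels
            mx = f.amax(dim=1, keepdim=True)             # "global" (max) pooling over channels
            w = torch.sigmoid(conv(torch.cat([f, avg, mx], dim=1)))   # [B, 1, H, W]
            attended.append(f * w)                       # spatially attended feature
        pooled = torch.stack([a.mean(dim=(1, 2, 3)) for a in attended], dim=1)  # [B, 4]
        z = torch.sigmoid(self.fc(pooled))                                      # [B, 4]
        fused = sum(z[:, i, None, None, None] * a for i, a in enumerate(attended))
        # Uniform quantization with a straight-through estimator so training can proceed.
        soft = torch.sigmoid(fused)                      # squash to (0, 1)
        hard = torch.round(soft * (self.q_levels - 1)) / (self.q_levels - 1)
        return soft + (hard - soft).detach()             # compressed code f_q
```

As an illustrative calculation under the rate expression above: with three stride-2 down-sampling stages (H = H_0/8, W = W_0/8), C = 64 and q_1 = 4 bits against q_0 = 8, the rate evaluates to (1/64) × (64/3) × (1/2) = 1/6; the figures are examples only.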
Step 4: the compressed image is decoded based on a deep learning method of a decoder to obtain a restored image.
The quantized f_q is first converted back to floating-point numbers by inverse quantization; the decoder then up-samples with several deconvolution operations, gradually enlarging the H and W dimensions, and finally outputs the restored image, whose dimensions [H_0, W_0, 3] are the same as those of the original image. During training of the neural network, the difference between the restored image and the original image I_0 serves as the loss function. The up-sampling uses bilinear interpolation: for example, for two points (x_0, y_0) and (x_1, y_1) with values A and B respectively, if a new point (x_2, y_2) with x_1 > x_2 > x_0 is inserted during up-sampling, its value is

$$A + (B - A)\,\frac{x_2 - x_0}{x_1 - x_0}.$$
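A matching decoder sketch in PyTorch is shown below; the three transposed convolutions mirror the three down-sampling stages assumed in the encoder sketch, and torch.nn.functional.interpolate(..., mode="bilinear") could equally be used for the bilinear up-sampling step described above. Layer counts and widths are again illustrative assumptions.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Maps the de-quantized code [C, H, W] back to an image [3, H0, W0] by
    gradually enlarging the H and W dimensions with transposed convolutions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, f_q: torch.Tensor) -> torch.Tensor:
        # f_q is the de-quantized, floating-point compressed code [B, C, H, W].
        return self.net(f_q)
```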
Step 5: based on the difference between the recovered image and the original image, constructing a loss function, and continuously iterating to be converged through training the loss function to obtain an optimal recovered image.
The loss function is as shown in formula (2):

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(x_1^{(j)} - x_2^{(j)}\right)^2 \tag{2}$$

where m is the number of pixels in the original image (the original image and the restored image have the same number of pixels), and x_1 and x_2 are the values of the original image and the restored image at the same pixel.
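For completeness, a minimal training loop that iterates the formula (2) loss to convergence might look as follows; the optimizer, learning rate and epoch count are not fixed by the patent and are assumptions, and `model` stands for the end-to-end compressor sketched earlier.

```python
import torch


def train(model, images, epochs: int = 100, lr: float = 1e-4):
    """Minimize the formula (2) loss (mean squared error between the restored
    and the original image) by iterating over the training images."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for img in images:                         # img: [1, 3, H0, W0] in [0, 1]
            restored = model(img)
            loss = ((img - restored) ** 2).mean()  # formula (2)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```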
Referring to fig. 3, the invention discloses an image compression recovery system based on a multi-head heterogeneous convolution self-encoder, which comprises:
the image processing module is used for processing the input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the coding module is used for respectively coding the original image and the heterogeneous image by using an independent convolution self-encoder based on a deep learning method of the convolution self-encoder to obtain an original image code and a heterogeneous image code;
the fusion quantization module is used for fusing and quantizing the original image codes and the heterogeneous image codes based on an attention mechanism to obtain compressed images;
a decoding module for decoding the compressed image based on a deep learning method of the decoder to obtain a restored image,
and the loss function optimization module is used for constructing a loss function based on the difference between the restored image and the original image, and obtaining the optimal restored image by continuously iterating to be converged through training the loss function.
The embodiment of the invention provides terminal equipment. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The steps of the various method embodiments described above are implemented when the processor executes the computer program. Alternatively, the processor may implement the functions of the modules/units in the above-described device embodiments when executing the computer program.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory.
The modules/units integrated in the terminal device may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder is characterized by comprising the following steps of:
processing an input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the method comprises the steps of respectively encoding an original image and a heterogeneous image by using independent convolution self-encoders based on a deep learning method of the convolution self-encoders to obtain an original image code and a heterogeneous image code;
fusing and quantizing the original image code and the heterogeneous image code based on an attention mechanism to obtain a compressed image;
decoding the compressed image based on a deep learning method of a decoder to obtain a restored image;
based on the difference between the recovered image and the original image, constructing a loss function, and continuously iterating to be converged through training the loss function to obtain an optimal recovered image.
2. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 1, wherein the processing of the input original image based on the heterogeneous transformation method is specifically:
an original image I_0 with dimensions [H_0, W_0, 3] is input, and heterogeneous transformation is carried out on the original image with different heterogeneous transformation methods respectively to obtain different heterogeneous images;
the heterogeneous transformation methods comprise a random brightness increase/decrease method, a random hue increase/decrease method and a random contrast increase/decrease method, and three different heterogeneous images I_1, I_2, I_3 are obtained based on these three methods.
3. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 2, wherein the deep learning method based on the convolution self-encoder encodes the original image and the heterogeneous images respectively with independent convolution self-encoders to obtain an original image code and heterogeneous image codes, specifically:
the original image I_0 and the heterogeneous images I_1, I_2, I_3 are respectively input into Python software to obtain the mean and the variance var of the original image I_0 and of the heterogeneous images I_1, I_2, I_3; normalization is carried out on the original image I_0 and the heterogeneous images I_1, I_2, I_3 respectively, and the normalized original image I_0 and heterogeneous images I_1, I_2, I_3 are each passed through an independent convolution self-encoder for convolution, down-sampling and feature extraction, yielding the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3;
the normalization operation is as shown in formula (1):

$$\hat{I}_i = \frac{I_i - \mathrm{mean}(I_i)}{\mathrm{var}(I_i)} \tag{1}$$

where i takes the values 0, 1, 2 and 3.
4. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 3, wherein the fusing and quantizing of the original image code and the heterogeneous image codes based on the attention mechanism to obtain a compressed image is specifically:
global pooling and average pooling are respectively carried out on the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3, and the pooled results are respectively concatenated with the original image code, giving a feature of dimension [H, W, C+2]; a 5×5 convolution operation is then used to obtain a feature of dimension [H, W, 1], from which a spatial attention weight w_0 is generated by a sigmoid; the original image code f_0 and the heterogeneous image codes f_1, f_2, f_3 are respectively multiplied with the spatial attention weight w_0 to obtain the spatially attended features f̂_0, f̂_1, f̂_2, f̂_3;
channel attention is added to the acquired spatially attended features f̂_0, f̂_1, f̂_2, f̂_3: each is globally pooled over its [H, W, C] dimensions, the pooled features are passed through a fully connected layer and a sigmoid to generate a weight z_0, and z_0 and the spatially attended features are multiplied and summed to obtain the channel-attended feature f; f is quantized to obtain the compressed image f_q, whose dimension is [H, W, C];
the matrix dimensions of the original image code and of the heterogeneous image codes are [H, W, C].
5. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 4, wherein the decoding of the compressed image based on the deep learning method of the decoder to obtain the restored image is specifically: inverse quantization is carried out on the compressed image f_q, the decoder up-samples the inverse-quantized image with deconvolution operations, enlarging its H and W dimensions, and the dimensions of the output restored image are finally [H_0, W_0, 3], the same as those of the original image I_0.
6. The image compression recovery method based on the multi-head heterogeneous convolution self-encoder according to claim 5, wherein the loss function is as shown in formula (2):

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(x_1^{(j)} - x_2^{(j)}\right)^2 \tag{2}$$

where m is the number of pixels in the original image, the number of pixels in the original image and in the restored image being the same, and x_1 and x_2 are the values of the original image and the restored image at the same pixel.
7. An image compression recovery system based on a multi-head heterogeneous convolution self-encoder, comprising:
the image processing module is used for processing the input original image based on a heterogeneous transformation method to obtain a heterogeneous image;
the coding module is used for respectively coding the original image and the heterogeneous image by using an independent convolution self-encoder based on a deep learning method of the convolution self-encoder to obtain an original image code and a heterogeneous image code;
the fusion quantization module is used for fusing and quantizing the original image codes and the heterogeneous image codes based on an attention mechanism to obtain compressed images;
a decoding module for decoding the compressed image based on a deep learning method of the decoder to obtain a restored image,
and the loss function optimization module is used for constructing a loss function based on the difference between the restored image and the original image, and obtaining the optimal restored image by continuously iterating to be converged through training the loss function.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any of claims 1-6.
CN202111605004.9A 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder Active CN114286113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111605004.9A CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111605004.9A CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Publications (2)

Publication Number Publication Date
CN114286113A CN114286113A (en) 2022-04-05
CN114286113B true CN114286113B (en) 2023-05-30

Family

ID=80875568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111605004.9A Active CN114286113B (en) 2021-12-24 2021-12-24 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Country Status (1)

Country Link
CN (1) CN114286113B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
US10593021B1 (en) * 2019-09-11 2020-03-17 Inception Institute of Artificial Intelligence, Ltd. Motion deblurring using neural network architectures
CN113095439A (en) * 2021-04-30 2021-07-09 东南大学 Heterogeneous graph embedding learning method based on attention mechanism
CN113240589A (en) * 2021-04-01 2021-08-10 重庆兆光科技股份有限公司 Image defogging method and system based on multi-scale feature fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
US10593021B1 (en) * 2019-09-11 2020-03-17 Inception Institute of Artificial Intelligence, Ltd. Motion deblurring using neural network architectures
CN113240589A (en) * 2021-04-01 2021-08-10 重庆兆光科技股份有限公司 Image defogging method and system based on multi-scale feature fusion
CN113095439A (en) * 2021-04-30 2021-07-09 东南大学 Heterogeneous graph embedding learning method based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A compressed sensing image recovery algorithm accelerated by GPU (采用GPU加速的压缩感知图像恢复算法); Miao Zhuang et al.; Microelectronics & Computer, (12), pp. 125-129 *

Also Published As

Publication number Publication date
CN114286113A (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN108022212B (en) High-resolution picture generation method, generation device and storage medium
EP3275190B1 (en) Chroma subsampling and gamut reshaping
US20200145692A1 (en) Video processing method and apparatus
US10909728B1 (en) Learned lossy image compression codec
KR20160021417A (en) Adaptive interpolation for spatially scalable video coding
CN114581544A (en) Image compression method, computer device and computer storage medium
CN108921801B (en) Method and apparatus for generating image
CN113888410A (en) Image super-resolution method, apparatus, device, storage medium, and program product
CN116636217A (en) Method and apparatus for encoding image and decoding code stream using neural network
CN107220934B (en) Image reconstruction method and device
TWI807491B (en) Method for chroma subsampled formats handling in machine-learning-based picture coding
Xing et al. Scale-arbitrary invertible image downscaling
CN114125454A (en) Video image coding system and method
CN110717864A (en) Image enhancement method and device, terminal equipment and computer readable medium
CN113628115A (en) Image reconstruction processing method and device, electronic equipment and storage medium
CN114286113B (en) Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
CN112399069B (en) Image encoding method and apparatus, storage medium, and electronic device
CN115866253B (en) Inter-channel conversion method, device, terminal and medium based on self-modulation
CN116547969A (en) Processing method of chroma subsampling format in image decoding based on machine learning
CN110267038A (en) Coding method and device, coding/decoding method and device
KR20200044668A (en) AI encoding apparatus and operating method for the same, and AI decoding apparatus and operating method for the same
Wang et al. A customized deep network based encryption-then-lossy-compression scheme of color images achieving arbitrary compression ratios
CN113096019B (en) Image reconstruction method, image reconstruction device, image processing equipment and storage medium
CN112637609B (en) Image real-time transmission method, sending end and receiving end
Ayyoubzadeh et al. Lossless compression of mosaic images with convolutional neural network prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant