CN116258782A - Image compression method, image encoding method, image decoding method and device


Info

Publication number
CN116258782A
CN116258782A (application CN202310105291.XA)
Authority
CN
China
Prior art keywords
image
features
processed
compressed
feature
Prior art date
Legal status
Pending
Application number
CN202310105291.XA
Other languages
Chinese (zh)
Inventor
粘春湄
戴亮
江东
林聚财
金恒
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202310105291.XA
Publication of CN116258782A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00: Image coding
    • G06T9/002: Image coding using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an image compression method, an image encoding method, an image decoding method, and an image decoding device. The image compression method comprises the following steps: performing convolution processing on image features to be processed based on multiscale receptive fields to obtain multiple context features of the image features to be processed, wherein the image features to be processed are determined based on image features of an image to be compressed; fusing the multiple context features to obtain sample features of the image features to be processed; and obtaining a compression result of the image to be compressed based on the sample features of the image features to be processed. The method and the device can effectively utilize neighborhood information over various ranges and can effectively eliminate encoding/decoding redundancy.

Description

Image compression method, image encoding method, image decoding method and device
Technical Field
The present invention relates to the field of image encoding and decoding technologies, and in particular, to an end-to-end image compression method, an image encoding method, an image decoding method, and an apparatus.
Background
An image encoding method and an image decoding method may include a context processing step of determining a sample feature of an image feature to be processed based on context information of that image feature. However, the related art extracts the context information of the sample to be processed using a receptive-field convolution of only one size, i.e., the context model uses neighborhood information within only one range, and does not effectively eliminate encoding/decoding redundancy.
Disclosure of Invention
The application provides an image compression method, an image encoding method, an image decoding method and an image decoding device, which can effectively utilize neighborhood information in various ranges and can effectively eliminate encoding/decoding redundancy.
To achieve the above object, the present application provides an end-to-end image compression method, which includes:
based on the multiscale receptive field, carrying out convolution processing on the image features to be processed to obtain multiple context features of the image features to be processed; the image characteristics to be processed are determined based on the image characteristics of the image to be compressed;
fusing the multiple context features to obtain sample features of the image features to be processed;
and obtaining a compression result of the image to be compressed based on the sample characteristics of the image characteristics to be processed.
In order to achieve the above object, the present application further provides an end-to-end image encoding method, which includes:
obtaining image characteristics to be processed based on the image characteristics of the image to be compressed;
processing the image characteristics to be processed by the image compression method to obtain a compression result of the image to be compressed;
and obtaining the coding code stream of the image to be compressed based on the compression result.
In order to achieve the above object, the present application further provides an end-to-end image decoding method, which includes:
decoding a code stream of an image to be compressed to obtain image characteristics of the image to be compressed;
processing the image characteristics to be processed in the image characteristics of the image to be compressed by the image compression method to obtain a compression result of the image to be compressed;
and obtaining a decoded image of the code stream based on the compression result.
To achieve the above object, the present application also provides an encoder including a processor; the processor is configured to execute instructions to implement the steps of the above-described method.
To achieve the above object, the present application also provides a decoder including a processor; the processor is configured to execute instructions to implement the steps of the above-described method.
To achieve the above object, the present application also provides a computer readable storage medium storing instructions/program data capable of being executed to implement the above method.
In the image compression method, convolution processing is carried out on the image features to be processed based on multiscale receptive fields to obtain multiple context features of the image features to be processed, and the multiple context features are then fused to obtain sample features of the image features to be processed. In this way, neighborhood information over various ranges can be effectively utilized and encoding/decoding redundancy effectively eliminated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of an embodiment of an image codec network of the present application;
FIG. 2 is a flow chart of an embodiment of an image compression method of the present application;
FIG. 3 is a schematic diagram illustrating one embodiment of an image compression method of the present application;
FIG. 4 is a schematic diagram of another embodiment of an image compression method of the present application;
FIG. 5 is a schematic diagram of yet another embodiment of an image compression method of the present application;
FIG. 6 is a flow chart of another embodiment of an image compression method of the present application;
FIG. 7 is a flow chart of an embodiment of an image encoding method of the present application;
FIG. 8 is a flow chart of an embodiment of an image decoding method of the present application;
FIG. 9 is a schematic diagram of a further embodiment of an image compression method of the present application;
FIG. 10 is a schematic diagram of an embodiment of an encoder of the present application;
FIG. 11 is a schematic diagram of an embodiment of a decoder of the present application;
FIG. 12 is a schematic diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure. In addition, the term "or" as used herein refers to a non-exclusive "or" (i.e., "and/or") unless otherwise indicated (e.g., "or otherwise" or "in the alternative"). Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments may be combined with one or more other embodiments to form new embodiments.
An image compression method such as an image encoding method/an image decoding method can be performed by a model. For example, the image compression method may be performed using an end-to-end image codec that is entirely composed of a neural network.
Wherein, as shown in fig. 1, the codec model may include a main coding network and an entropy model network.
The primary encoding network may include transform and inverse transform models, quantization, inverse quantization, entropy encoding, and/or entropy decoding.
In the transform/inverse transform network, a non-local attention module may be used, in which each element of a feature equals a weighted sum taken over all feature positions. The transform then progressively reduces dimensionality to cut the data volume, i.e., it expresses the main features of the original image in a more compact representation, reducing both the dimensionality and the data volume of the image. The inverse transform network progressively restores dimensionality, recovering the original data volume.
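For reference, a non-local operation of the kind referenced above is commonly written as a weighted sum over all feature positions (this generic form is an assumption; the text does not fix an exact formula):

$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$

where $f(x_i, x_j)$ scores the affinity between positions $i$ and $j$, $g$ is a learned embedding, and $\mathcal{C}(x)$ is a normalization factor.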
Quantization is one of the lossy-coding steps; it regularizes the data and improves the compression rate. Inverse quantization (optional) is the opposite operation, but it may be omitted, since the strong nonlinear capability of the neural network can subsume its effect.
Entropy coding is a lossless process: the probability of each symbol in the features is computed through a constructed probability model, and the symbol is coded into a binary representation written into the code stream. Entropy decoding is the inverse of entropy encoding.
The entropy model network may include an auxiliary transform, an auxiliary inverse transform, quantization, inverse quantization, entropy encoding, entropy decoding, probability model construction, a prediction module, and/or a context module. Apart from probability model construction, the prediction module, and the context module, the remaining modules function similarly to their counterparts in the primary coding network.
The probability model is built mainly by learning model parameters through a neural network and is used to compute the probabilities of the features to be coded in the main coding network.
In the auxiliary transform/auxiliary inverse transform, the features are divided evenly by channel in a grouped-convolution manner: the channels are grouped in fixed counts, each group is convolved, and the groups are finally merged.
The prediction module can be used for differential coding: the output of the auxiliary inverse transform is fed into the prediction module, the difference between the prediction output and the quantized transform features is computed, and this difference is sent for encoding. Optionally, a uniform division by channel may also be performed in the prediction module.
Since each sample to be processed in a feature depends on previously processed samples, the context model can learn this correlation to reduce redundancy.
The related art does consider context features for image compression. However, it extracts the context features of an image using a convolution with a receptive field of only one size, typically 5x5, 7x7, 11x11, etc. A larger convolution kernel references a larger neighborhood, but more distant information is less correlated with the current point and more easily biases the prediction; a smaller convolution kernel references a smaller neighborhood and may miss some strongly correlated neighbors. Thus, with only one range of neighborhood information exploited in the context model, coding/decoding redundancy is not effectively eliminated and image processing efficiency is low.
Based on the above, the present application provides a deep-learning image compression method that extracts the context features of an image using receptive-field convolutions of multiple sizes, so that neighborhood information over various ranges can be effectively utilized in the context model, encoding/decoding redundancy can be effectively eliminated, and image processing efficiency can be further improved.
Specifically, as shown in fig. 2, an image compression method proposed in the present application may include the following steps. It should be noted that the following step numbers are only for simplifying the description, and are not intended to limit the execution order of the steps, and the execution order of the steps of the present embodiment may be arbitrarily changed without departing from the technical idea of the present application.
S101: and carrying out convolution processing on the image features to be processed based on the multiscale receptive field to obtain multiple context features of the image features to be processed.
Optionally, in the image compression method, convolution processing can be performed on the image features to be processed based on the multiscale receptive field, so that multiple context features of the image features to be processed are obtained, so that the multiple context features are fused later to obtain sample features of the image features to be processed.
The image characteristics to be processed are determined based on the image characteristics of the image to be compressed. Alternatively, the image feature to be processed may be a pixel point or a pixel block in the image feature of the image to be compressed.
In a first possible manner, as shown in fig. 3, in step S101, the convolution kernel corresponding to the receptive field of each scale may be used to convolve the image feature to be processed, obtaining multiple context features of the image feature to be processed; the convolution kernels corresponding to receptive fields of different scales have different sizes. For example, convolution kernels of three sizes, 5x5, 7x7, and 11x11, may be trained in advance. In step S101, the 5x5 kernel yields one context feature of the image feature to be processed, the 7x7 kernel yields another, and the 11x11 kernel yields yet another. By combining receptive fields of different scales in this way, neighborhood information over different ranges can be effectively utilized, encoding/decoding redundancy can be effectively eliminated, and image processing efficiency improved.
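A minimal sketch of this first approach in PyTorch style (the module and helper names, and the causal left-to-right/top-to-bottom mask, are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_mask(k: int) -> torch.Tensor:
    """Keep only already-decoded positions for a left-to-right,
    top-to-bottom scan: rows above the centre, plus points to the
    left of the centre in the centre row."""
    m = torch.zeros(k, k)
    m[: k // 2, :] = 1.0
    m[k // 2, : k // 2] = 1.0
    return m

class MultiScaleContext(nn.Module):
    """One masked convolution per receptive-field scale (e.g. 5x5/7x7/11x11)."""
    def __init__(self, channels: int, sizes=(5, 7, 11)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
            for k in sizes
        )
        for i, k in enumerate(sizes):
            self.register_buffer(f"mask{i}", causal_mask(k))

    def forward(self, y):
        # Zero out not-yet-decoded taps before convolving, so each output
        # position only sees its decoded neighbourhood.
        return [
            F.conv2d(y, conv.weight * getattr(self, f"mask{i}"),
                     padding=conv.kernel_size[0] // 2)
            for i, conv in enumerate(self.convs)
        ]
```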
In this implementation, the step of convolving the image feature to be processed with a convolution kernel may include: determining the context information of the image features to be processed under the corresponding receptive fields by using a mask with the same size as the convolution kernel; and operating the context information of the image feature to be processed in the corresponding receptive field by using the convolution kernel to obtain the context feature corresponding to the convolution kernel.
Determining the context information of the image feature to be processed under the corresponding receptive field using a mask of the same size as the convolution kernel can be understood as using the mask to extract, from the image features of the image to be compressed, the information in the region centered on the image feature to be processed. The receptive field corresponding to a convolution kernel is thus tied to the values of the points in the kernel's mask. Specifically, it refers to the range covered by the non-zero region of the mask (hereinafter called the effective area for convenience of description) within the region centered on the image feature to be processed, and the scale of the receptive field corresponding to a mask may equal the scale of the smallest circumscribed rectangle of the mask's effective area.
During image decoding with this image compression method, the image features of the image to be compressed include samples still to be decoded, whose data are invalid. To prevent this invalid data from corrupting the extracted context features, when computing a context feature in step S101 the convolution kernel may operate only on the decoded samples in the neighborhood of the image feature to be processed (i.e., the current sample to be decoded), so that its context feature is predicted relatively accurately. Specifically, to operate only on decoded samples, the parameters in the mask corresponding to the convolution kernel may be set based on the prediction direction, so that the context information taken through the mask within the respective receptive field contains only decoded samples.
For example, as shown in fig. 4, a mask is placed on the image feature Y of the image to be compressed; if the center of the mask lies on the image feature to be processed, a mask region centered on that feature is determined, containing the current sample to be decoded (i.e., the image feature to be processed; the horizontal-line-filled cell in fig. 4), the decoded samples (the diagonal-line-filled cells in fig. 4), and the remaining samples to be decoded (the small-grid-filled cells in fig. 4). To ensure that only decoded samples in the neighborhood enter the convolution, the mask values over decoded samples may be set to a non-zero value (e.g., 1) and the values over samples still to be decoded set to 0. Specifically, as shown in fig. 4, assuming the prediction direction is from left to right and from top to bottom, the points in the mask strictly left of and above the center point may be set to non-zero values and the remaining points set to 0. In other embodiments, if the prediction direction is from right to left and from bottom to top, the points strictly right of and below the center point may be set to non-zero values and the remaining points to 0. It is understood that the image features of the image to be compressed may be image features decoded from a bitstream.
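As a small hedged illustration of this direction dependence, the `causal_mask` helper from the sketch above can simply be flipped for the opposite scan order:

```python
mask_fwd = causal_mask(5)        # left-to-right, top-to-bottom prediction
mask_bwd = mask_fwd.flip(0, 1)   # right-to-left, bottom-to-top prediction
```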
Accordingly, in the image encoding process of the image compression method, a mask of each size is placed on the image feature of the image to be compressed, if the center of the mask is the image feature to be processed, a mask region with the image feature to be processed as the center in the image feature of the image to be compressed can be determined, the mask region includes the current sample to be encoded (i.e., the image feature to be processed), the encoded sample and the rest of the samples to be encoded, and the context information of the image feature to be processed extracted based on the mask can include only the encoded sample in the mask region. The image feature of the image to be compressed may be a feature obtained by processing the image, or may be the image itself.
In addition, the step of operating the convolution kernel on the context information of the image feature to be processed in the corresponding receptive field to obtain the corresponding context feature may be: multiplying each point value in the context information by the parameter at the corresponding point of the convolution kernel to obtain a per-point product, and obtaining the context feature corresponding to the convolution kernel from the products over all points (e.g., by summing them).
In a second implementation manner, the image feature to be processed is convolved by using at least one convolution kernel and at least two masks, so as to obtain multiple context features.
At least one convolution kernel corresponds to at least two masks. For a convolution kernel corresponding to at least two masks (hereinafter called the first convolution kernel for convenience of description), the at least two masks have the same size as the first convolution kernel and differ only in the setting of their effective areas, where the effective area of a mask is its non-zero region. Different masks therefore extract different neighborhood information of the image feature to be processed, giving multiple receptive fields within the convolution processing; different receptive fields can thus be combined to effectively exploit neighborhood information over different ranges, effectively eliminating coding/decoding redundancy and improving image compression efficiency. In addition, this implementation reduces the number of convolution kernels to be trained, so that, compared with the scheme of the first implementation, it can effectively learn the correlations between different neighborhood ranges and the point to be decoded, and reduce redundancy, without unduly increasing model complexity.
The scale of the receptive field corresponding to a mask may equal the scale of the smallest circumscribed rectangle of the mask's effective area. Preferably, the at least two masks corresponding to at least one first convolution kernel have different scales, so that combining receptive fields of different scales effectively exploits neighborhood information over different ranges and effectively eliminates encoding/decoding redundancy. For example, a first convolution kernel of size 11x11 may correspond to three masks whose receptive-field scales are 5x5, 7x7, and 11x11, respectively.
As described in the first possible manner, so that the convolution kernel operates only on the already-processed image features in the neighborhood when predicting an image feature, the mask values over processed image features may be set to a non-zero value (e.g., 1), and the values over the image feature to be processed and unprocessed image features set to 0.
In this implementation manner, the step of performing convolution processing on the image feature to be processed by using at least one convolution kernel and at least two masks to obtain multiple context features may include: determining the context information of the image characteristics to be processed under the corresponding receptive fields by utilizing each mask corresponding to each convolution kernel; and carrying out operation on the context information corresponding to each mask by using the convolution kernel to obtain the context characteristics corresponding to each mask by using the convolution kernel.
In a specific example, as shown in fig. 5, three 11x11 masks are preset with receptive-field scales of 5x5, 7x7, and 11x11, respectively, and a single 11x11 convolution kernel is trained. In step S101, the 11x11 kernel operates on the context information extracted through the mask of receptive-field scale 5x5 to obtain the context feature Y3; on the context information extracted through the mask of scale 7x7 to obtain the context feature Y2; and on the context information extracted through the mask of scale 11x11 to obtain the context feature Y1.
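Continuing the earlier sketch (same imports, reusing `causal_mask`), this second approach might look as follows; `nested_causal_mask` and `SharedKernelContext` are hypothetical names:

```python
def nested_causal_mask(k: int, field: int) -> torch.Tensor:
    """k x k mask whose effective (non-zero) area is the causal part
    of a field x field window centred on the current sample."""
    m = causal_mask(k)
    pad = (k - field) // 2
    if pad > 0:
        m[:pad, :] = 0.0   # zero the border so only the inner
        m[-pad:, :] = 0.0  # field x field window stays active
        m[:, :pad] = 0.0
        m[:, -pad:] = 0.0
    return m

class SharedKernelContext(nn.Module):
    """One trained 11x11 kernel shared by masks of receptive-field scale 11/7/5."""
    def __init__(self, channels: int, k: int = 11, fields=(11, 7, 5)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
        for i, f in enumerate(fields):
            self.register_buffer(f"mask{i}", nested_causal_mask(k, f))
        self.n = len(fields)

    def forward(self, y):
        # Y1, Y2, Y3: the same kernel with progressively smaller effective areas.
        return [
            F.conv2d(y, self.conv.weight * getattr(self, f"mask{i}"),
                     padding=self.conv.kernel_size[0] // 2)
            for i in range(self.n)
        ]
```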
When the convolution kernel operates on the context information corresponding to each mask, the per-point products of the context information can be computed once and cached; when determining the context feature corresponding to each mask, the cached products of the relevant points are fetched directly. In this way, for points shared by the context information of at least two masks of the first convolution kernel, the products need not be recomputed, which limits the impact of masks with multiple receptive-field sizes on model and computational complexity; with only one convolution kernel, the computation is almost the same as in the single-scale case.
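A toy sketch of this reuse trick at a single spatial position, reusing `nested_causal_mask` from above (all names hypothetical): the per-point products are formed once, and each mask only selects which of them enter the sum.

```python
patch  = torch.randn(11, 11)   # decoded neighbourhood of one sample
weight = torch.randn(11, 11)   # one 11x11 kernel slice
prod = weight * patch          # per-point products, computed once and cached
y1 = (prod * nested_causal_mask(11, 11)).sum()  # 11x11 receptive field
y2 = (prod * nested_causal_mask(11, 7)).sum()   # 7x7 receptive field
y3 = (prod * nested_causal_mask(11, 5)).sum()   # 5x5 receptive field
```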
In addition, since the data of the samples to be decoded (i.e., samples to be processed) in the mask area are all invalid during image decoding with this image compression method, the parameters of the convolution kernel corresponding to the samples to be processed may be set to 0, so that the invalid data do not affect the context features during the convolution processing of step S101.
Optionally, in the above implementations, the width and height of a convolution kernel may be equal or unequal, and the number of kinds of convolution kernels is not limited.
S102: and fusing the multiple context features to obtain sample features of the image features to be processed.
After obtaining the multiple context features, the multiple context features can be fused to obtain sample features of the image features to be processed.
The manner of fusion is not limited, and for example, a plurality of context features may be fused by any of the following methods.
In one possible approach, multiple contextual features may be weighted to obtain sample features of the image features to be processed. The weighting coefficient of each context feature is not limited, and can be preset or can be obtained through training. The sum of the weighting coefficients of all the contextual features may be equal to 1.
In another possible method, the multiple context features may be concatenated to obtain a spliced feature, and the spliced feature is then convolved to obtain the sample feature of the image feature to be processed; that is, the multiple context features are fused by convolution fusion.
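Two hedged sketches matching the fusion alternatives above; the learnable weights and the 1x1 fusion convolution are assumptions, since the text fixes neither choice:

```python
class WeightedFusion(nn.Module):
    """Weighted sum of the context features; weights may be preset or trained."""
    def __init__(self, n: int):
        super().__init__()
        self.w = nn.Parameter(torch.full((n,), 1.0 / n))  # sums to 1 at init

    def forward(self, feats):
        return sum(w * f for w, f in zip(self.w, feats))

class ConvFusion(nn.Module):
    """Concatenate along channels, then fuse with a 1x1 convolution."""
    def __init__(self, channels: int, n: int):
        super().__init__()
        self.fuse = nn.Conv2d(n * channels, channels, kernel_size=1)

    def forward(self, feats):
        return self.fuse(torch.cat(feats, dim=1))
```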
S103: and obtaining a compression result of the image to be compressed based on the sample characteristics of the image characteristics to be processed.
After the sample characteristics of the image characteristics to be processed are obtained, a compression result of the image to be compressed can be obtained based on the sample characteristics of the image characteristics to be processed.
Optionally, when the image feature to be processed is not the last feature among the image features of the image to be compressed, then after its sample feature is obtained it becomes a processed image feature; the next feature may be taken as the image feature to be processed and execution returns to step S101, so that steps S101 and S102 determine that feature's sample feature. The steps of taking the next feature as the image feature to be processed and executing S101 and S102 are repeated until the image feature to be processed is the last feature of the image features of the image to be compressed, so that the sample features of all features in the image to be compressed are determined and the context processing result of the image to be compressed is obtained.
Alternatively, the context processing result of the image to be compressed may be output as the compression result of the image to be compressed. Or the context processing result of the image to be compressed can be processed to obtain the compression result of the image to be compressed.
In this embodiment, convolution processing may be performed on the image feature to be processed based on multiscale receptive fields to obtain its multiple context features, which are then fused to obtain its sample feature; neighborhood information over various ranges is thereby effectively utilized and encoding/decoding redundancy effectively eliminated.
Consider that in an end-to-end image codec solution, the image features of the image to be compressed have channel dimensions in addition to spatial dimensions. In the context model of the related art, the same convolution kernel is adopted for processing all channels, so that the image processing efficiency is low.
Based on this, as shown in fig. 6, the present application proposes another image compression method to fully learn channel information and process image features to be processed using spatial information at the same time, which may include the following steps. It should be noted that the following step numbers are only for simplifying the description, and are not intended to limit the execution order of the steps, and the execution order of the steps of the present embodiment may be arbitrarily changed without departing from the technical idea of the present application.
S201: the image features of the image to be compressed are divided into a plurality of sub-features by channels.
In an optional implementation manner, based on the information distribution situation among channels in the image features of the image to be compressed, the image features of the image to be compressed are divided into a plurality of sub-features according to the channels, so that the plurality of sub-features are convolved respectively, and the convolution results of the plurality of sub-features are spliced to obtain target features of the image to be compressed.
Alternatively, the inter-channel information distribution condition refers to a distribution condition of image information among different channels in the feature, that is, the inter-channel information distribution condition may refer to a distribution condition of information amounts of image information among different channels in the feature. The amount of image information for at least one channel in the image characteristics of the image to be compressed may refer to the proportion of the restored original image in accordance with the at least one channel. For example, if the number of channels of the feature is 4, the first channel based on the feature can restore 40% of the image, the second channel based on the feature can restore 30% of the image, the third channel based on the feature can restore 20% of the image, and the fourth channel based on the feature can restore 10% of the image, the amount of image information in the four channels of the feature is 40%,30%,20%,10% in this order, and the inter-channel information distribution in the feature is (40%, 30%,20%, 10%).
In this optional implementation, the image features of the image to be compressed are divided into a plurality of sub-features by channel, based on the inter-channel information distribution, so as to balance the information content of the different sub-features. The convolution module corresponding to each sub-feature can then learn its sub-feature adequately and extract the information it contains, improving the feature-expression effect and thus the image-processing effect. Illustratively, assuming the image feature is divided into four sub-features according to step S201, the information content of the different sub-features may be balanced so that the image-information share of each of the four sub-features differs from 25% by no more than a difference threshold. The difference threshold may be set according to the actual situation and is not limited here; it may be, for example, 1%, 5%, or 7%.
In a specific example, an engineer may manually determine the inter-channel information distribution in the image features of the image to be compressed from domain knowledge, and divide the image features accordingly.
In another specific example, the image features of the image to be compressed may be processed by using a deep learning method, to determine the number of channels of each of the plurality of sub-features; the image feature of the image to be compressed is divided into a plurality of sub-features according to the number of channels of each of the plurality of sub-features. It will be appreciated that the deep learning method is also based on the number of channels per sub-feature determined by the inter-channel information distribution in the image features of the image to be compressed.
In yet another specific example, a channel number base value of each of the plurality of sub-features may be preset; processing the image features of the image to be compressed by using a deep learning method to determine the channel quantity offset value of each of the plurality of sub-features; adding the channel number basic value and the channel number offset value of each sub-feature to obtain the channel number of each sub-feature; dividing the image feature of the image to be compressed into a plurality of sub-features according to the number of channels of each of the plurality of sub-features.
In another optional implementation manner, the number of channels of each of the plurality of sub-features may be preset, and in step S201, the image feature of the image to be compressed may be divided into the plurality of sub-features according to the preset number of channels of each of the plurality of sub-features.
The number of channels of different sub-features may be equal or unequal, and is not limited herein.
The number of feature groups may be set according to the actual situation and is not limited here; for example, 5, 8, or 10.
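For instance, splitting a 128-channel feature into the 16/16/32/64 grouping used later in Example 3 is a one-liner (the sizes are the example's, not mandated):

```python
import torch

y = torch.randn(1, 128, 512, 512)                 # N, C, H, W
y1, y2, y3, y4 = torch.split(y, [16, 16, 32, 64], dim=1)
```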
S202: and carrying out convolution processing on the image features to be processed in each sub-feature based on the multiscale receptive field to obtain multiple context features of the image features to be processed in each sub-feature.
See step S101, which is not described herein.
Alternatively, the coefficients of the same size convolution kernels for different sub-features may or may not be the same.
In one implementation, a corresponding convolution kernel may be trained in advance for each sub-feature. In step S202, the convolution kernel corresponding to each sub-feature is used to convolve the various context information of that sub-feature, obtaining the multiple context features of the image feature to be processed in each sub-feature; for the specific operation steps, refer to S101.
In another implementation, at least one total convolution kernel may be trained for the image feature to be processed; the convolution kernel corresponding to each sub-feature's position is then extracted from each total kernel, yielding at least one convolution kernel per sub-feature, and the kernel corresponding to each sub-feature is used to convolve that sub-feature's various context information, obtaining the multiple context features of the image feature to be processed in each sub-feature; for the specific operation steps, refer to S101.
S203: and fusing the multiple contextual characteristics of each sub-characteristic to obtain the sample characteristics of the image characteristics to be processed in each sub-characteristic.
In particular, refer to step S102, which is not described herein.
S204: and obtaining a compression result of the image to be compressed based on the sample characteristics of the image characteristics to be processed in each sub-characteristic.
When the image feature to be processed is not the last feature among the image features of the image to be compressed, then after the sample feature of the image feature to be processed in the sub-features is obtained, the next feature may be taken as the image feature to be processed and execution returns to step S202, so that steps S202 and S203 determine that feature's sample feature. These steps are repeated until the image feature to be processed is the last feature in the sub-features, so that the sample features of all features in each sub-feature are determined and the context processing result of each sub-feature is obtained; the context processing results of the sub-features can then be spliced to obtain the context processing result of the image to be compressed.
After the context processing result of the image to be compressed is obtained, the context processing result of the image to be compressed can be used as the compression result of the image to be compressed, or the context processing result of the image to be compressed is processed to obtain the compression result of the image to be compressed.
With continued reference to fig. 7, fig. 7 is a flowchart of an embodiment of an image encoding method provided in the present application.
As shown in fig. 7, the image encoding method of the present embodiment includes the steps of:
S301: And obtaining the image characteristics to be processed based on the image characteristics of the image to be compressed.
S302: and processing the image characteristics to be processed by the image compression method to obtain a compression result of the image to be compressed.
S303: and obtaining the coded code stream of the image to be compressed based on the compression result.
With continued reference to fig. 8, fig. 8 is a flowchart illustrating an embodiment of an image decoding method provided in the present application.
As shown in fig. 8, the image decoding method of the present embodiment includes the steps of:
S401: And decoding the code stream of the image to be compressed to obtain the image characteristics of the image to be compressed.
S402: by the image compression method, the image characteristics to be processed in the image characteristics of the image to be compressed are processed, and the compression result of the image to be compressed is obtained.
On the image decoding side, step S402 may include the following. Based on multiscale receptive fields, convolution processing is performed on at least some samples of the image to be compressed that include the image feature to be processed, obtaining multiple intermediate features of the image to be compressed; in the intermediate feature corresponding to each receptive field, the data at the image feature to be processed is the context feature for that receptive field, while the data at already-processed image features is equivalent to the data in the image features of the image to be compressed. The multiple intermediate features are fused to obtain an updated feature of the image to be compressed, in which the data at the image feature to be processed is the sample feature of that image feature. If the image feature to be processed is not the last feature among the image features of the image to be compressed, the updated feature is taken as the image feature of the image to be compressed, the next feature is taken as the image feature to be processed, and execution returns to the multiscale-receptive-field convolution step, until the image feature to be processed is the last feature, at which point the context processing result of the image to be compressed is obtained. The context processing result may be used directly as the compression result of the image to be compressed, or it may be further processed to obtain the compression result.
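A deliberately simplified, inference-only sketch of this per-sample loop (scan order and tensor layout are assumptions; in a real decoder the sample feature at (i, j) would feed the entropy decoder before the decoded value is written back):

```python
def decode_pass(y, ctx, fuse):
    """ctx returns the per-scale intermediate features; fuse merges them."""
    _, _, h, w = y.shape
    for i in range(h):                 # top to bottom
        for j in range(w):             # left to right
            y_upd = fuse(ctx(y))       # fused update of the whole feature map
            y[:, :, i, j] = y_upd[:, :, i, j]  # keep only the current sample
    return y
```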
In addition, since the data of the already-processed image features in the intermediate features of the image to be compressed do not change, during the multiscale-receptive-field convolution over the samples that include the image feature to be processed, only the sample features of the not-yet-processed image features need be calculated, i.e., the sample feature of the image feature to be processed and the sample features of the unprocessed image features. The method for calculating the sample features of unprocessed image features follows the method for the image feature to be processed and is not repeated here.
In addition, when the compression result of the image to be compressed is determined using the image compression method of the second embodiment, it follows from steps S102 and S203 that the updated feature of each sub-feature can be obtained based on step S402. In this step, the updated features of the plurality of sub-features can be spliced to obtain the updated feature of the image to be compressed; the updated feature is then taken as the image feature of the image to be compressed, the next feature is taken as the image feature to be processed, and execution returns to step S201, dividing the image features of the image to be compressed into sub-features by channel and executing steps S201, S202, S203, and S204 in turn. These steps are repeated so as to obtain, in sequence, the sample features of all the features of the image to be compressed, and thereby its context processing result. Of course, in other embodiments, the sample features of all features in each sub-feature may be determined first to obtain the context processing result of each sub-feature, and the context processing results of the plurality of sub-features are then spliced to obtain the context processing result of the features to be compressed.
In addition, as described in step S102, the plurality of intermediate features may be fused by weighting or convolution fusion or the like. For example, the plurality of intermediate features may be weighted directly to obtain updated features of the image to be compressed. Or, the intermediate features can be directly spliced and convolved in sequence to obtain the updated features of the image to be compressed.
S403: and obtaining a decoded image of the code stream based on the compression result.
In order to better explain the image decoding method of the present application, the following specific embodiments of image decoding are provided for exemplary illustration:
example 1
a. As shown in fig. 3, receptive fields of three scales are set and convolved with convolution kernels of 5x5, 7x7, and 11x11, respectively. The masks are marked analogously, matched to the convolution-kernel sizes, identifying the decoded samples (diagonal-filled portions), the image feature to be processed (horizontal-filled portion), and the remaining samples to be decoded (small-grid-filled regions).
b. A feature Y of 512x512x128 is input, in which the (x, y) position is the image feature to be processed, x=y=128. Feature Y is copied in triplicate, and 5x5, 7x7, and 11x11 mask convolutions are performed in turn.
c. The convolution fusion (according to the channel dimension) operation is carried out on the output features Y1, Y2 and Y3 of the three scales, and finally the updated feature Y' of 512x512x128 is obtained.
d. As shown in fig. 4 and fig. 1, the updated feature Y' is taken as the image feature of the image to be compressed, the next sample to be decoded is taken as the image feature to be processed, and execution returns to step a, so that the context processing result of each sample in the image feature is determined in turn; the decoded image is then obtained based on the context processing result of the image.
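Under the same illustrative assumptions, one step of Example 1 could look like this, reusing `MultiScaleContext` and `ConvFusion` from the sketches above (512x512x128 in the text corresponds to an NCHW tensor of shape (1, 128, 512, 512)):

```python
y = torch.randn(1, 128, 512, 512)               # feature Y
ctx = MultiScaleContext(128, sizes=(5, 7, 11))  # three masked convolutions
fuse = ConvFusion(channels=128, n=3)            # channel-dimension convolution fusion
y_prime = fuse(ctx(y))                          # updated feature Y'
```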
Example 2
1. As shown in fig. 5, receptive fields of three scales are set using only an 11x11 convolution kernel, while the mask design relates to the sizes 5x5, 7x7, and 11x11, identifying the decoded samples (diagonal-filled portions), the image feature to be processed (horizontal-filled portion), and the remaining samples to be decoded (small-grid-filled regions).
2. A feature Y of 512x512x128 is input, in which the (x, y) position is the image feature to be processed, x=y=128. Feature Y is copied in triplicate, and 5x5, 7x7, and 11x11 mask convolutions are performed in turn.
3. The three scale output features Y1, Y2, Y3 are convolutionally fused (in the channel dimension) to obtain the 512x512x128 feature Y'.
4. As shown in fig. 4 and fig. 1, the updated feature Y' is taken as the image feature of the image to be compressed, the next sample to be decoded is taken as the image feature to be processed, and execution returns to step 1, so that the context processing result of each sample in the image feature is determined in turn; the decoded image is then obtained based on the context processing result of the image.
Example 3
The features, whose channel count is N, are grouped along the channel dimension; the sub-features of each group are independent features, and each independent feature undergoes multi-scale convolution. The individual features are recombined after convolution. Grouping by channel information content implies different grouping strategies: since high-information channels generally cluster at the front and low-information channels at the back, the high-information channel region can be grouped finely and the low-information region coarsely. As shown in fig. 9, the steps may be as follows:
1. A feature Y of 512x512x128 is input and grouped by channel counts 16, 16, 32, 64 from front to back to obtain sub-features Y1, Y2, Y3, Y4.
2. Receptive fields of three scales are set and convolved with convolution kernels of 3x3, 5x5, and 11x11, respectively. The masks are marked analogously, matched to the convolution-kernel sizes, identifying the decoded samples (diagonal-filled portions), the image feature to be processed (horizontal-filled portion), and the remaining samples to be decoded (small-grid-filled regions).
3. Taking the sub-feature Y1 as an example, the (x, y) position in the sub-feature is the image feature to be processed, x=y=128. Sub-feature Y1 is copied in triplicate, and 3x3, 5x5, and 11x11 mask convolutions are performed in turn.
4. The three scale output features Y1_1, Y1_2, Y1_3 are convolutionally fused (in the channel dimension) to obtain the 512x512x16 sub-feature Y1'.
5. Similarly, the sub-features Y2, Y3, and Y4 undergo steps 2, 3, and 4 to obtain sub-features Y2', Y3', and Y4', which are then recombined to output the final Y'.
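Putting the pieces together, a hedged sketch of Example 3 built from the hypothetical modules above (the 16/16/32/64 group sizes and 3x3/5x5/11x11 kernels follow the text; everything else is an assumption):

```python
class GroupedContext(nn.Module):
    def __init__(self, groups=(16, 16, 32, 64), sizes=(3, 5, 11)):
        super().__init__()
        self.groups = list(groups)
        self.ctxs = nn.ModuleList(MultiScaleContext(c, sizes) for c in groups)
        self.fusers = nn.ModuleList(ConvFusion(c, len(sizes)) for c in groups)

    def forward(self, y):
        subs = torch.split(y, self.groups, dim=1)  # Y1..Y4
        outs = [fuse(ctx(sub))                     # per-group multi-scale context
                for sub, ctx, fuse in zip(subs, self.ctxs, self.fusers)]
        return torch.cat(outs, dim=1)              # recombined output Y'
```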
Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an encoder of the present application. The encoder 10 includes a processor 12 configured to execute instructions to implement the image compression method and the image encoding method described above. The specific implementation process is described in the above embodiments and is not repeated here.
The processor 12 may also be referred to as a CPU (Central Processing Unit ). The processor 12 may be an integrated circuit chip having signal processing capabilities. Processor 12 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 12 may be any conventional processor or the like.
Encoder 10 may further include a memory 11 for storing instructions and data necessary for processor 12 to operate.
The processor 12 is configured to execute instructions to implement the methods provided by any of the embodiments of the image compression method and the image encoding method of the present application, and any non-conflicting combination thereof.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a decoder of the present application. The decoder 20 includes a processor 22 configured to execute instructions to implement the image compression method and the image decoding method described above. The specific implementation process is described in the above embodiments and is not repeated here.
The processor 22 may also be referred to as a CPU (Central Processing Unit ). The processor 22 may be an integrated circuit chip having signal processing capabilities. Processor 22 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The general purpose processor may be a microprocessor or the processor 22 may be any conventional processor or the like.
Decoder 20 may further include a memory 21 for storing instructions and data required for processor 22 to operate.
The processor 22 is configured to execute instructions to implement the methods provided by any of the embodiments of the image compression method and the image decoding method of the present application, and any non-conflicting combination thereof.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 30 of the embodiments of the present application stores instruction/program data 31 which, when executed, implement the methods provided by any of the embodiments of the image compression method and the image encoding method of the present application, as well as any non-conflicting combination. The instructions/program data 31 may be stored in the storage medium 30 as a software product to enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium 30 includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code, or a terminal device such as a computer, server, mobile phone, or tablet.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
The foregoing describes only embodiments of the present application and does not thereby limit the patent scope of the present application; any equivalent structure or equivalent process derived from the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (13)

1. A method of end-to-end image compression, the method comprising:
performing convolution processing on image features to be processed based on a multiscale receptive field to obtain multiple context features of the image features to be processed, wherein the image features to be processed are determined based on image features of an image to be compressed;
fusing the multiple context features to obtain sample features of the image features to be processed; and
obtaining a compression result of the image to be compressed based on the sample features of the image features to be processed.
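For illustration only (this paragraph and the sketch are not part of the claims): claims 1, 2, and 5 read as parallel convolutions whose kernel sizes realize different receptive-field scales, followed by convolutional fusion of the resulting context features. Below is a minimal PyTorch sketch of that structure; the channel count, the kernel sizes, and the 1x1 fusion layer are illustrative assumptions, not values fixed by the patent.

    import torch
    import torch.nn as nn

    class MultiScaleContext(nn.Module):
        """Sketch of a multiscale context model: one branch per receptive-field scale."""
        def __init__(self, channels=192, kernel_sizes=(3, 5, 7)):
            super().__init__()
            # One convolution per scale; different kernel sizes give different receptive fields.
            self.branches = nn.ModuleList(
                [nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes]
            )
            # Convolutional fusion of the concatenated context features.
            self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, kernel_size=1)

        def forward(self, feats):
            contexts = [branch(feats) for branch in self.branches]  # multiple context features
            return self.fuse(torch.cat(contexts, dim=1))            # sample features

    # Usage: a 192-channel feature map yields sample features of the same shape.
    sample = MultiScaleContext()(torch.randn(1, 192, 16, 16))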
2. The image compression method according to claim 1, wherein the performing convolution processing on the image features to be processed based on the multiscale receptive field to obtain the multiple context features of the image features to be processed comprises:
performing convolution processing on the image features to be processed using convolution kernels corresponding to the receptive fields of the respective scales to obtain the multiple context features, wherein the convolution kernels corresponding to receptive fields of different scales differ in size.
3. The image compression method according to claim 1, wherein the performing convolution processing on the image features to be processed based on the multiscale receptive field to obtain the multiple context features of the image features to be processed comprises:
performing convolution processing on the image features to be processed using at least one convolution kernel and at least two masks to obtain the multiple context features, wherein the at least two masks corresponding to each convolution kernel have different effective areas, the effective area of a mask being the region of the mask whose values are not 0.
4. The image compression method according to claim 3, wherein the compression direction of the image is from a first direction to a second direction and from a third direction to a fourth direction; and
the points in the mask whose values are not 0 are located in the third direction and in the positive first direction of the center point of the mask.
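Again for illustration only: claims 3 and 4 can be read as masked convolution, in which a single kernel is reused under several masks whose non-zero (effective) regions differ, and the effective region covers only positions that precede the center point in the scan order. The sketch below assumes a left-to-right, top-to-bottom scan standing in for the abstract first-to-fourth directions of claim 4; the kernel size and the two mask shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def causal_mask(k, radius=None):
        """Mask whose non-zero points lie above the center row, or left of the
        center within its row (the "effective area" of claims 3 and 4)."""
        m = torch.zeros(k, k)
        m[: k // 2, :] = 1.0          # rows before the center point
        m[k // 2, : k // 2] = 1.0     # points before the center in its own row
        if radius is not None:        # optionally shrink the effective area
            c = k // 2
            box = torch.zeros_like(m)
            box[c - radius : c + radius + 1, c - radius : c + radius + 1] = 1.0
            m = m * box
        return m

    def masked_conv(feats, weight, mask):
        # The same kernel under different masks yields different context features.
        return F.conv2d(feats, weight * mask, padding=weight.shape[-1] // 2)

    weight = torch.randn(192, 192, 5, 5)
    feats = torch.randn(1, 192, 16, 16)
    ctx_wide = masked_conv(feats, weight, causal_mask(5))             # full causal area
    ctx_tight = masked_conv(feats, weight, causal_mask(5, radius=1))  # smaller effective area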
5. The image compression method according to claim 1, wherein the fusing the multiple context features comprises:
performing convolution fusion on the multiple context features to obtain the sample features of the image features to be processed.
6. The image compression method according to claim 1, wherein the performing convolution processing on the image features to be processed based on the multiscale receptive field to obtain the multiple context features of the image features to be processed comprises:
dividing the image features of the image to be compressed into a plurality of sub-features by channel, and performing convolution processing on the image features to be processed in each sub-feature based on the multiscale receptive field to obtain multiple context features of the image features to be processed in each sub-feature;
the fusing the multiple context features to obtain the sample features of the image features to be processed comprises: fusing the multiple context features of the image features to be processed of each sub-feature to obtain sample features of the image features to be processed of each sub-feature; and
the obtaining the compression result of the image to be compressed based on the sample features of the image features to be processed comprises: obtaining the compression result of the image to be compressed based on the sample features of the image features to be processed of each sub-feature.
7. The image compression method according to claim 6, wherein the dividing the image features of the image to be compressed into a plurality of sub-features by channel comprises: dividing the image features of the image to be compressed into the plurality of sub-features by channel based on the inter-channel information distribution within the image features of the image to be compressed.
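A schematic reading of claims 6 and 7, assuming an even channel split; a split driven by the inter-channel information distribution (claim 7) would replace torch.chunk with a data-dependent grouping. MultiScaleContext is the illustrative module sketched after claim 1, and the group count of 4 is an assumption.

    import torch

    def sample_features_by_channel_group(image_feats, group_models):
        # Divide the image features into sub-features along the channel dimension.
        sub_feats = torch.chunk(image_feats, len(group_models), dim=1)
        # Per group: multiscale context extraction and fusion, as in claims 1 and 5.
        samples = [m(s) for m, s in zip(group_models, sub_feats)]
        # The concatenated per-group sample features drive the compression result.
        return torch.cat(samples, dim=1)

    groups = [MultiScaleContext(channels=48) for _ in range(4)]  # 4 x 48 = 192 channels
    out = sample_features_by_channel_group(torch.randn(1, 192, 8, 8), groups)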
8. A method of end-to-end image coding, the method comprising:
obtaining image features to be processed based on image features of an image to be compressed;
processing the image features to be processed by the image compression method according to any one of claims 1 to 7 to obtain a compression result of the image to be compressed; and
obtaining an encoded code stream of the image to be compressed based on the compression result.
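As a sketch of the encoding flow in claim 8: an analysis transform produces the image features, the claimed compression method produces the compression result, and an entropy coder turns it into the code stream. Every name below (analysis, entropy_coder, and its encode method) is a hypothetical stand-in, and quantization is omitted; the claim does not fix any of these components.

    def encode_image(image, analysis, context_model, entropy_coder):
        feats = analysis(image)        # image features of the image to be compressed
        sample = context_model(feats)  # compression result via the method of claims 1-7
        # Hypothetical entropy-coding API: sample features parameterize the symbol model.
        return entropy_coder.encode(feats, params=sample)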
9. A method of end-to-end image decoding, the method comprising:
decoding a code stream of an image to be compressed to obtain image features of the image to be compressed;
processing image features to be processed among the image features of the image to be compressed by the image compression method according to claim 1 to obtain a compression result of the image to be compressed; and
obtaining a decoded image of the code stream based on the compression result.
10. The image decoding method according to claim 9, wherein the processing the image features to be processed among the image features of the image to be compressed by the image compression method according to claim 1 to obtain the compression result of the image to be compressed comprises:
performing convolution processing on the image features to be processed based on the multiscale receptive field to obtain a plurality of intermediate features of the image to be compressed, wherein in the intermediate feature corresponding to each receptive field, the data of the image features to be processed are the context features corresponding to that receptive field, and the data of the already-processed image features are identical to the corresponding data in the image features of the image to be compressed;
fusing the plurality of intermediate features to obtain updated features of the image to be compressed;
if the image features to be processed are not the last features among the image features of the image to be compressed, taking the updated features of the image to be compressed as the image features of the image to be compressed and taking the features following the image features to be processed as the image features to be processed; and
returning to the step of performing convolution processing on the image features to be processed based on the multiscale receptive field to obtain the plurality of intermediate features of the image to be compressed, until the image features to be processed are the last features among the image features of the image to be compressed, so as to obtain the compression result.
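The loop in claim 10 amounts to sequential (autoregressive) decoding: convolve, fuse, write the update back, and move to the next feature until the last one. Below is a schematic of that control flow, reusing the MultiScaleContext sketch; the raster-scan order and the per-position write-back granularity are assumptions, not fixed by the claim.

    import torch

    def sequential_decode(image_feats, context_model, scan):
        for y, x in scan:
            # One intermediate feature per receptive field, fused inside the model;
            # already-decoded positions carry their true values, the rest placeholders.
            updated = context_model(image_feats)
            # Write back the current position; the updated feature map becomes the
            # input for the next iteration, as in claim 10.
            image_feats = image_feats.clone()
            image_feats[..., y, x] = updated[..., y, x]
        return image_feats  # the compression result after the last feature

    scan = [(y, x) for y in range(16) for x in range(16)]
    decoded = sequential_decode(torch.zeros(1, 192, 16, 16), MultiScaleContext(), scan)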
11. An encoder, comprising a processor, wherein the processor is configured to execute instructions to implement the steps of the method of claim 8.
12. A decoder, comprising a processor, wherein the processor is configured to execute instructions to implement the steps of the method of claim 9 or 10.
13. A computer readable storage medium having instruction/program data stored thereon, wherein the instruction/program data, when executed, implement the steps of the method of any one of claims 1 to 10.
CN202310105291.XA 2023-02-07 2023-02-07 Image compression method, image encoding method, image decoding method and device Pending CN116258782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310105291.XA CN116258782A (en) 2023-02-07 2023-02-07 Image compression method, image encoding method, image decoding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310105291.XA CN116258782A (en) 2023-02-07 2023-02-07 Image compression method, image encoding method, image decoding method and device

Publications (1)

Publication Number Publication Date
CN116258782A true CN116258782A (en) 2023-06-13

Family

ID=86678803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310105291.XA Pending CN116258782A (en) 2023-02-07 2023-02-07 Image compression method, image encoding method, image decoding method and device

Country Status (1)

Country Link
CN (1) CN116258782A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117915107A (en) * 2024-03-20 2024-04-19 北京智芯微电子科技有限公司 Image compression system, image compression method, storage medium and chip
CN117915107B (en) * 2024-03-20 2024-05-17 北京智芯微电子科技有限公司 Image compression system, image compression method, storage medium and chip

Similar Documents

Publication Publication Date Title
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
US20210120248A1 (en) Bandwidth compression for neural network systems
CN111986278B (en) Image encoding device, probability model generating device, and image compression system
CN116016917A (en) Point cloud compression method, encoder, decoder and storage medium
CN110677651A (en) Video compression method
CN111727445A (en) Data compression for partial entropy coding
CN110753225A (en) Video compression method and device and terminal equipment
Zhang et al. Lossless image compression using a multi-scale progressive statistical model
WO2022028197A1 (en) Image processing method and device thereof
US20180048917A1 (en) Systems, apparatus, and methods for bit level representation for data processing and analytics
CN116258782A (en) Image compression method, image encoding method, image decoding method and device
CN111553471A (en) Data analysis processing method and device
CN115310611A (en) Figure intention reasoning method and related device
CN115223662A (en) Data processing method, device, equipment and storage medium
CN110913230A (en) Video frame prediction method and device and terminal equipment
CN113971732A (en) Small target detection method and device, readable storage medium and electronic equipment
JP6748022B2 (en) Division shape determining apparatus, learning apparatus, division shape determining method, and division shape determining program
CN113256744B (en) Image coding and decoding method and system
CN116033159A (en) Feature processing method, image coding method and device
US10559093B2 (en) Selecting encoding options
KR20210035678A (en) Compressing apparatus and method of trained deep artificial neural networks for video coding
CN115187775A (en) Semantic segmentation method and device for remote sensing image
CN113205503B (en) Satellite coastal zone image quality evaluation method
CN115375715A (en) Target extraction method and device, electronic equipment and storage medium
CN113344181B (en) Neural network structure searching method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination