CN117422779A - Image feature transformation processing method, image encoding method, and image decoding method - Google Patents

Image feature transformation processing method, image encoding method, and image decoding method

Info

Publication number
CN117422779A
CN117422779A (application CN202311149058.8A)
Authority
CN
China
Prior art keywords
image
image feature
feature
features
fusion
Prior art date
Legal status
Pending
Application number
CN202311149058.8A
Other languages
Chinese (zh)
Inventor
粘春湄
施晓迪
江东
林聚财
殷俊
戴亮
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202311149058.8A
Publication of CN117422779A


Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
                    • H04N 19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
                        • H04N 19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/04 Architecture, e.g. interconnection topology
                            • G06N 3/045 Combinations of networks
                                • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
                            • G06N 3/047 Probabilistic or stochastic networks
                        • G06N 3/08 Learning methods
                            • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 3/00 Geometric image transformation in the plane of the image
                    • G06T 3/40 Scaling the whole image or part thereof
                        • G06T 3/4046 Scaling the whole image or part thereof using neural networks
                • G06T 9/00 Image coding
                    • G06T 9/002 Image coding using neural networks

Abstract

The application discloses a method for image feature transformation processing, an image encoding method, an image decoding device, and a computer storage medium. The method for image feature transformation processing comprises the following steps: acquiring original image features to be processed; sequentially downsampling the original image features multiple times through a transformation network of a main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature; performing feature fusion on the first image feature, the second image feature and the third image feature to obtain a fourth image feature; and performing weighted fusion on the fourth image feature and the third image feature to obtain the to-be-processed transformation image feature. In this way, a long-range dependency is constructed by fusing shallow texture information with deep semantic information, which improves the feature representation of the image features to be processed and thereby improves coding and decoding performance.

Description

Image feature transformation processing method, image encoding method, and image decoding method
Technical Field
The present invention relates to the technical field of feature processing, and in particular, to a method for image feature transformation processing, an image encoding method, an image decoding device, and a computer storage medium.
Background
Traditional image coding and decoding technologies are designed around human visual characteristics. With deep neural networks delivering superior performance in various machine vision tasks, such as image classification, object detection and semantic segmentation, a large number of machine-vision-based artificial intelligence applications have emerged. To ensure that the performance of machine vision tasks is not degraded by the image coding process, an analyze-before-coding approach is adopted to meet machine vision requirements: a lossless image is first passed through a neural network for feature extraction at the image acquisition end, the extracted features are then encoded and transmitted, and the decoding end feeds the decoded features directly into the subsequent network structure to complete different machine vision tasks. Therefore, to save transmission bandwidth resources, it is necessary to study image encoding methods oriented toward machine vision.
However, current feature processing algorithms in the image encoding and decoding process do not fully consider the interdependence between deep features and shallow features, so the optimal feature representation cannot be extracted.
Disclosure of Invention
The application provides a method for image feature transformation processing, an image encoding method, an image decoding device and a computer storage medium.
The technical scheme adopted by the application is to provide a method for image feature transformation processing, which comprises the following steps:
acquiring original image characteristics to be processed;
sequentially downsampling the original image features for multiple times through a transformation network of a main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature;
performing feature fusion on the first image feature, the second image feature and the third image feature to obtain a fourth image feature;
and carrying out weighted fusion on the fourth image feature and the third image feature to obtain a to-be-processed transformation image feature.
The feature fusion of the first image feature, the second image feature and the third image feature to obtain a fourth image feature includes:
performing up-sampling processing on the third image feature, and then performing feature fusion with the second image feature to obtain a fifth image feature;
performing up-sampling processing on the fifth image feature, and performing feature fusion with the first image feature to obtain a sixth image feature;
performing downsampling on the sixth image feature, and then performing feature fusion with the upsampling result of the fifth image feature to obtain a seventh image feature;
performing downsampling on the seventh image feature, and then performing feature fusion with the upsampling result of the third image feature to obtain an eighth image feature;
and carrying out downsampling on the eighth image feature to obtain the fourth image feature.
The fusion mode of the feature fusion is channel splicing.
The step of performing weighted fusion on the fourth image feature and the third image feature to obtain a to-be-processed transformed image feature includes:
inputting the fourth image feature and the third image feature into an adaptive fusion module to obtain fusion weights;
and fusing the third image feature and the fourth image feature by using the fusion weight to obtain a to-be-processed transformation image feature.
The inputting the fourth image feature and the third image feature into the adaptive fusion module to obtain a fusion weight includes:
inputting the fourth image feature and the third image feature into an adaptive fusion module, extracting a first activated image feature of the third image feature through a first convolution activation module, and extracting a second activated image feature of the fourth image feature through a second convolution activation module;
fusing the first activated image features and the second activated image features, and extracting third activated image features through a third convolution activation module;
and normalizing the third activated image feature by using a normalization function of the adaptive fusion module to obtain the fusion weight.
Another technical solution adopted in the present application is to provide an image encoding method, which includes:
obtaining the transformed image features of the image to be encoded by the image feature transformation processing method;
and encoding the transformed image features through a coding module of a main coding network to obtain a feature code stream of the image to be encoded.
The encoding of the transformed image features through the coding module of the main coding network to obtain the feature code stream of the image to be encoded comprises the following steps:
inputting the transformed image features and the auxiliary inverse transformation result of the transformed image features produced by the entropy model network into a context predictor of the entropy model network, and obtaining context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
encoding the transformed image features according to the distribution information by using the coding module of the main coding network, so as to obtain a feature code stream of the image to be encoded;
the auxiliary inverse transformation result is a first auxiliary inverse transformation result or a fusion result of the first auxiliary inverse transformation result and a second auxiliary inverse transformation result; the first auxiliary inverse transformation result is output by a first entropy model network, the second auxiliary inverse transformation result is output by a second entropy model network, and the input of the second entropy model network is the image characteristic output by an auxiliary transformation network of the first entropy model network.
Wherein, the image coding method further comprises:
sequentially carrying out downsampling on the transformed image features for multiple times through an auxiliary transformation network of an entropy model network to obtain a first auxiliary image feature, a second auxiliary image feature and a third auxiliary image feature, wherein the scale of the first auxiliary image feature is larger than that of the second auxiliary image feature, and the scale of the second auxiliary image feature is larger than that of the third auxiliary image feature;
performing feature fusion on the first auxiliary image feature, the second auxiliary image feature and the third auxiliary image feature to obtain a fourth auxiliary image feature;
weighting and fusing the fourth auxiliary image feature and the third auxiliary image feature to obtain a to-be-processed transformation auxiliary image feature;
and sequentially passing the transformed auxiliary image features through an encoding module and a decoding module of the entropy model network to obtain auxiliary inverse transformation results of the transformed image features.
Another technical solution adopted in the present application is to provide another image encoding method, where the image encoding method includes:
obtaining the transformed image features of an image to be encoded;
inputting the transformed image features into a context model of an entropy model network, and extracting the context features of different scales of the transformed image features by using mask convolution of different scales;
weighting and fusing the context features of different scales to obtain the context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
and encoding the transformed image features according to the distribution information by using an encoding module of a main encoding network, so as to obtain a feature code stream of the image to be encoded.
The step of carrying out weighted fusion on the context features of different scales to obtain the context information of the transformed image features includes:
extracting the activated image features from the context features with different scales through a convolution activation module, and then splicing and fusing the activated image features to obtain context fusion features;
normalizing the context fusion characteristics by using a normalization function to obtain fusion weights;
and performing weighted fusion on the context features of different scales by using the fusion weights to obtain the context information of the transformed image features.
Another technical solution adopted in the present application is to provide an image decoding method, which includes:
decoding the feature code stream through a decoding module of a main decoding network to obtain the decoded image features of the feature code stream;
acquiring the inverse transformation image features of the decoded image features through the inverse process of the image feature transformation processing method;
and obtaining a decoded image corresponding to the feature code stream according to the inverse transformation image features.
Another technical solution adopted in the present application is to provide another image decoding method, where the image decoding method includes:
acquiring a feature code stream and transformed image features of an image to be decoded;
inputting the transformed image features into a context model of an entropy model network, and extracting the context features of different scales of the transformed image features by using mask convolution of different scales;
weighting and fusing the context features of different scales to obtain the context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
and decoding the feature code stream according to the distribution information through a decoding module of a main decoding network to obtain the image to be decoded.
Another technical solution adopted by the present application is to provide an image encoding device, which includes a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the method of image feature transformation processing and/or the image encoding method as described above.
Another technical solution adopted by the present application is to provide an image decoding device, which includes a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the method of image feature transformation processing and/or the image decoding method as described above.
Another technical solution adopted by the present application is to provide a computer storage medium for storing program data which, when executed by a computer, is used to implement the method of image feature transformation processing, the image encoding method and/or the image decoding method as described above.
The beneficial effects of this application are as follows: the image feature transformation processing device sequentially performs downsampling processing on the original image features multiple times through a transformation network of a main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature; feature fusion is performed on the first image feature, the second image feature and the third image feature to obtain a fourth image feature; and weighted fusion is performed on the fourth image feature and the third image feature to obtain the to-be-processed transformation image feature. In this way, a long-range dependency is constructed by fusing shallow texture information with deep semantic information, which improves the feature representation of the image features to be processed and thereby improves coding and decoding performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an image end-to-end codec provided in the present application;
FIG. 2 is a schematic diagram of an embodiment of an image end-to-end codec provided in the present application;
FIG. 3 is a schematic block diagram of an efficient image end-to-end codec provided herein;
FIG. 4 is a flow chart of an embodiment of a method for image feature transformation processing provided herein;
FIG. 5 is a schematic diagram of an embodiment of a multi-scale feature pyramid and adaptive fusion module in a transformation network provided herein;
FIG. 6 is a schematic structural diagram of an embodiment of a multi-scale feature pyramid and adaptive fusion module in an inverse transformation network provided herein;
FIG. 7 is a schematic structural diagram of an embodiment of an adaptive fusion module provided herein;
FIG. 8 is a schematic structural view of a particular embodiment of a multi-scale feature pyramid provided herein;
FIG. 9 is a schematic structural diagram of an embodiment of an adaptive fusion module provided in the present application;
FIG. 10 is a flowchart illustrating an embodiment of an image encoding method provided herein;
FIG. 11 is a flowchart illustrating another embodiment of an image encoding method provided herein;
FIG. 12 is a schematic diagram of an embodiment of a multi-scale context feature adaptive fusion module provided herein;
FIG. 13 is a schematic structural diagram of an embodiment of a multi-scale context feature adaptive fusion module provided herein;
FIG. 14 is a flowchart of an embodiment of an image decoding method provided in the present application;
FIG. 15 is a flowchart of another embodiment of an image decoding method provided herein;
FIG. 16 is a schematic view of an embodiment of an image encoding device provided in the present application;
FIG. 17 is a schematic diagram of an embodiment of an image decoding apparatus provided in the present application;
FIG. 18 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring specifically to fig. 1 and fig. 2, fig. 1 is a schematic structural diagram of an image end-to-end codec provided in the present application, and fig. 2 is a schematic structural diagram of an embodiment of the image end-to-end codec provided in the present application.
As shown in fig. 1 and fig. 2, the image end-to-end codec provided in the present application mainly includes the following structures: a master codec network and an entropy model network. Under the main codec network, transformation and inverse transformation, quantization and inverse quantization, entropy encoding and entropy decoding are included. Wherein,
(1) The transformation mainly adopts a convolutional neural network to perform nonlinear downsampling; its effect is to express the main features of the original image with a more compact representation and to reduce the dimension and data volume of the image, while the inverse transformation is used to recover the original image from this compact representation. A minimal sketch of the transformation and quantization steps is given after this list.
(2) Quantization is one of the lossy coding steps: it quantizes the data to integers to increase the compression rate. Inverse quantization (optional) is the opposite operation, but may be omitted, because its effect can be absorbed by the strong nonlinear capability of the neural network.
(3) Entropy coding is a lossless process: by means of the constructed probability model, the probability of each symbol in the features is calculated and encoded into a binary representation, which is written into the code stream; entropy decoding is the inverse process.
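For illustration, the following is a minimal PyTorch sketch of the transform and quantization steps just described: a convolutional analysis transform performing nonlinear downsampling, a synthesis (inverse) transform recovering the image from the compact representation, and a quantization step. The channel width, layer count, kernel sizes and ReLU activations are assumptions made for the sketch rather than the exact configuration of the application, and entropy coding of the quantized features is omitted.

```python
import torch
import torch.nn as nn

class AnalysisTransform(nn.Module):
    """Nonlinear downsampling: express the image as a more compact latent feature."""
    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )

    def forward(self, x):
        return self.net(x)

class SynthesisTransform(nn.Module):
    """Inverse transform: recover the image from the compact representation."""
    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, y_hat):
        return self.net(y_hat)

def quantize(y, training=True):
    # Lossy step: additive uniform noise is a common differentiable proxy during
    # training, while plain rounding is used at inference time.
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return torch.round(y)
```

With a [1, 3, 256, 256] input, the analysis transform above produces a [1, 192, 16, 16] latent, which matches the feature sizes used in the later examples.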
In the entropy model network, auxiliary transformation and auxiliary inverse transformation, quantization and inverse quantization, and entropy coding and entropy decoding are included, and a probability model is built. Except for the construction of the probability model, the functions of the other modules are the same as those of the main codec network. The probability model is built mainly by learning model parameters through a neural network, and is used for calculating the probability of the features to be coded in the main coding network.
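As a concrete illustration of how such a probability model can be used for rate estimation, the sketch below assumes a conditional Gaussian model whose mean and scale are predicted by the entropy model network; the probability of each quantized symbol is the Gaussian mass over its quantization bin. The Gaussian form is an assumption made for the sketch, not a statement of the application's exact model.

```python
import torch

def gaussian_bits(y_hat, mean, scale):
    """Estimated bits for quantized features y_hat under a conditional Gaussian model.

    mean and scale are tensors predicted by the entropy model network; each symbol's
    probability is the Gaussian mass over its quantization bin [y_hat-0.5, y_hat+0.5].
    """
    dist = torch.distributions.Normal(mean, scale.clamp(min=1e-6))
    p = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return (-torch.log2(p.clamp(min=1e-9))).sum()
```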
Further, as shown in fig. 2, the image end-to-end codec of the embodiment of the present application may further include: multi-entropy model network, context model, pre-processing, entropy coding acceleration, post-processing enhancement, etc. Wherein,
(1) Preprocessing: the image is divided into 512 x 512 blocks, and the input image is vertically flipped and rotated to generate 8 copies; each copy is input into the network framework, the rate-distortion (RD) cost is calculated, and the index with the minimum RD cost is recorded (see the sketch after this list).
(2) Entropy coding acceleration: only the effective channels, i.e., the feature channels that are not all zeros, are coded, and their indices are coded as well.
(3) Multi-entropy model network: an entropy model network 2 is added to fit a probability model for entropy model network 1.
(4) Context predictor. The section contains a context model and prediction network:
a) Context model. Since each feature point to be coded depends on previously coded feature points, the context model can learn their correlation and reduce redundancy.
b) Prediction network. The output of the transformation network in the main codec network is the main latent feature representation; the prediction network aims to estimate a predicted value of this latent representation and take the difference from it, so that the residual is encoded.
(5) Predictor. Only the prediction network is included, in order to estimate the latent feature representation of the auxiliary transformation network for residual coding; no context model is included.
(6) Post-processing: blocking effects, artifacts, ringing and the like in the reconstruction are eliminated through a pre-trained neural-network-based enhancement module.
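The sketch below illustrates items (1) and (2) above under stated assumptions: the eight flip/rotation copies of a 512 x 512 block are scored and the index of the copy with the lowest rate-distortion cost is recorded, and only the feature channels that are not entirely zero are selected for entropy coding. codec() and rd_cost() are hypothetical helpers standing in for the full network framework and the RD computation.

```python
import torch

def eight_copies(block):
    # 4 rotations x {no flip, vertical flip} = 8 variants of a 512x512 block.
    variants = []
    for flip in (False, True):
        b = torch.flip(block, dims=[-2]) if flip else block
        variants += [torch.rot90(b, k, dims=[-2, -1]) for k in range(4)]
    return variants

def best_variant_index(block, codec, rd_cost):
    # Run each copy through the codec and record the index with the minimum RD cost.
    costs = [rd_cost(codec(v)) for v in eight_copies(block)]
    return int(torch.tensor(costs).argmin())

def effective_channels(latent):
    # Entropy-coding acceleration: keep only channels of the quantized latent
    # [C, H, W] that are not all zero, together with their indices.
    mask = latent.abs().amax(dim=(-2, -1)) > 0
    return mask.nonzero(as_tuple=False).flatten().tolist(), latent[mask]
```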
The loss function adopted for joint optimization of the NIC (neural image codec) takes the following form:
L = R(ŷ) + R(ẑ1) + R(ẑ2) + λ·D(x, x̂)
where R(ŷ), R(ẑ1) and R(ẑ2) are the code-rate estimates of the features to be coded by the main coding network, entropy model network 1 and entropy model network 2, respectively; D(x, x̂) measures the distortion between the original image x and the reconstructed image x̂, generally using the mean squared error or structural similarity; and λ is used to balance code rate and distortion.
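A minimal sketch of this joint loss follows, assuming the R + λ·D form given above; rate() stands for any bit-rate estimator built on the corresponding probability model (for example, the Gaussian estimator sketched earlier), and ssim() is a hypothetical structural-similarity helper.

```python
import torch.nn.functional as F

def nic_loss(x, x_hat, y_hat, z1_hat, z2_hat, rate, lam, use_mse=True):
    # Rate terms: features to be coded by the main coding network (y_hat) and by
    # entropy model networks 1 and 2 (z1_hat, z2_hat).
    R = rate(y_hat) + rate(z1_hat) + rate(z2_hat)
    # Distortion between the original image x and the reconstruction x_hat.
    D = F.mse_loss(x_hat, x) if use_mse else 1.0 - ssim(x_hat, x)  # ssim(): hypothetical helper
    return R + lam * D
```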
In the image end-to-end codec shown in fig. 1 and fig. 2, the transformation, inverse transformation, auxiliary transformation and/or auxiliary inverse transformation can fuse shallow detail texture information and deep semantic information of the image features based on a multi-scale feature pyramid structure, thereby establishing a long-range dependency. Corresponding weights are then adaptively assigned to the outputs of the transformation network and the feature pyramid network through neural network learning, and weighted fusion is performed. This gradually refines the feature information, reduces the information loss caused by sampling operations, and yields an optimal feature representation; at the same time, a better image can be reconstructed from this feature representation, so coding and decoding performance is improved.
Based on the image end-to-end codec shown in fig. 1 and fig. 2, the present application proposes a high-efficiency image end-to-end codec; specifically, please refer to fig. 3, which is a schematic block diagram of the high-efficiency image end-to-end codec provided in the present application. Relative to the NIC shown in fig. 1 and fig. 2, the main improvement point of the codec of the present application lies in the transform/inverse transform.
Specifically, regarding the improved transformation/inverse transformation, the application adds an adaptive multi-scale feature fusion module to the transformation and the inverse transformation: shallow and deep information in the transformation network is extracted by means of a feature pyramid structure to establish a long-range dependency, and weights are then assigned by the adaptive fusion module for weighted fusion.
Based on the image end-to-end codec shown in fig. 3, the present application proposes a method for image feature transformation processing, and specifically referring to fig. 4, fig. 4 is a flowchart of an embodiment of the method for image feature transformation processing provided in the present application.
The main improvement points of the method for image feature transformation processing shown in fig. 4 are based on improvement of a transformation module in at least one of transformation, inverse transformation, auxiliary transformation and auxiliary inverse transformation. In the implementation of the present application, the image feature transformation processing of the embodiments of the present application is applied to any one of the transformation network, the inverse transformation network, the auxiliary transformation network, and the auxiliary inverse transformation network, which may be shown in fig. 3.
In particular, in the encoding scenario, one or more of the transformation network, the secondary transformation network and the secondary inverse transformation network may employ the network structure and the network logic mentioned in the image feature transformation processing method, and the respective combinations of possibilities are not listed here.
In the decoding scenario, one or more of the inverse transform network, the auxiliary transform network, and the auxiliary inverse transform network may employ the network structure and network logic mentioned in the image feature transform processing method, and the respective combinations of the possibilities are not listed here.
The following describes the main improvement point in detail with reference to an embodiment of the method for image feature transformation processing shown in fig. 4:
as shown in fig. 4, the method for image feature transformation processing according to the embodiment of the present application includes the following steps:
step S11: and acquiring the characteristics of the original image to be processed.
Step S12: and sequentially carrying out downsampling processing on the original image features for a plurality of times through a transformation network of the main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature.
In this embodiment of the present application, please refer specifically to fig. 5, which shows the multi-scale feature pyramid and adaptive fusion module in the transformation network of the present application. Based on the structure of fig. 5, the flow of multi-scale feature adaptive fusion in the transformation network is introduced as follows:
The transformation network in the main image processing network acquires the input of the multi-scale feature pyramid, where the input consists of the downsampled features at each stage of the transformation network, each with a different scale. The shallow feature x has a larger scale and contains more detail information, while the deep feature z has a smaller scale and contains more semantic information. As shown in fig. 5, x is the first image feature, y is the second image feature, and z is the third image feature.
It should be noted that, correspondingly, fig. 6 shows the multi-scale feature pyramid and adaptive fusion module in the inverse transformation network; their workflow is the inverse process, or inverse operation, of that of the multi-scale feature pyramid and adaptive fusion module in the transformation network.
Step S13: and carrying out feature fusion on the first image feature, the second image feature and the third image feature to obtain a fourth image feature.
In the embodiment of the present application, please continue to refer to fig. 5, where a dashed box a in fig. 5 is an end-to-end codec transformation network, and a dashed box B is a multi-scale feature pyramid and feature adaptive fusion module.
The multi-scale feature pyramid specifically includes deep-to-shallow fusion paths and shallow-to-deep fusion paths.
Specifically, in a deep-to-shallow fusion path, the deep feature z is upsampled and then fused with the middle layer feature y, the fused feature is upsampled and then fused with the shallow layer feature x, and finally the upsampled output x1 is obtained.
In the shallow-to-deep fusion path, the output feature x1 of the deep-to-shallow path is downsampled and then fused with y1, the fused feature is downsampled and then fused with z1, and the result is finally downsampled to output z2, i.e., the fourth image feature.
The upsampling methods in the two fusion paths include, but are not limited to: transposed convolution, linear interpolation, etc.; the downsampling methods include, but are not limited to: convolution with a step size of 2, pooling, etc.; and the fusion methods include, but are not limited to: channel addition, channel splicing, etc.
Step S14: and carrying out weighted fusion on the fourth image feature and the third image feature to obtain the to-be-processed transformation image feature.
In the embodiment of the present application, the process of feature adaptive fusion of the present application is as follows:
the feature self-adaptive fusion module in fig. 5 performs weighted fusion on the output z of the NIC original transformation network and the output feature z2 processed by the feature pyramid according to the importance degree. Wherein, the importance degree is adaptively judged by the neural network through learning.
In a specific embodiment, referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the adaptive fusion module provided in the present application.
As shown in fig. 7, input feature 1 (i.e., the output z of the original transformation network) and input feature 2 (i.e., the output feature z2 processed by the feature pyramid) are passed through an L-layer and an M-layer neural network with activation functions, respectively, and then fused; the fused feature is passed through an N-layer neural network with activation functions, and a weight distribution is obtained through the normalization function; finally, each weight is multiplied by the corresponding input and the products are fused to obtain the final output feature. L, M and N are integers and may be the same or different.
It should be noted that the neural network in the adaptive fusion module of the present application includes, but is not limited to: linear networks, convolutional networks, etc.; activation functions include, but are not limited to: reLU, sigmoid, etc.; fusion means include, but are not limited to: channel addition, channel splicing and the like; normalization functions include, but are not limited to: softmax, etc.
The following describes a specific flow of a method for image feature transformation processing of the present application in a specific embodiment:
For example, the dimension of the input picture is [3, 256, 256], and the third, fourth and fifth downsampled features x, y, z in the NIC transform network are taken as the inputs to the feature pyramid structure, with dimensions [192, 64, 64], [192, 32, 32] and [192, 16, 16], respectively. Upsampling in the feature pyramid structure is realized with a 5×5 transposed convolution with a step size of 2, downsampling is realized with a 5×5 regular convolution with a step size of 2, and the fusion mode is channel splicing, as shown in fig. 8.
First, the deep-to-shallow fusion path: the deep feature z is upsampled to obtain z1, with dimension [192, 32, 32]; z1 and the middle-layer feature y are spliced along the channel dimension, then upsampled and channel-reduced to obtain y1, with dimension [192, 64, 64]; y1 and the shallow feature x are spliced along the channel dimension, then upsampled and channel-reduced to obtain x1, with dimension [192, 128, 128].
Next is the shallow-to-deep fusion path: x1 is downsampled to obtain x2, with dimension [192, 64, 64]; x2 and y1 are spliced along the channel dimension, then downsampled and channel-reduced to obtain y2, with dimension [192, 32, 32]; y2 and z1 are then spliced along the channel dimension, and z2 is obtained after downsampling and channel reduction, with dimension [192, 16, 16].
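The following is a minimal PyTorch sketch of the pyramid just described. It assumes that the channel reduction is folded into each 5×5 convolution and that padding and output padding are chosen to reproduce the stated dimensions; it is an illustrative reading of this example rather than the application's exact implementation.

```python
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Deep-to-shallow then shallow-to-deep fusion of x, y, z via channel splicing."""
    def __init__(self, c=192):
        super().__init__()
        def up(c_in):    # 5x5 transposed convolution, stride 2 (upsample + reduce channels)
            return nn.ConvTranspose2d(c_in, c, 5, stride=2, padding=2, output_padding=1)
        def down(c_in):  # 5x5 regular convolution, stride 2 (downsample + reduce channels)
            return nn.Conv2d(c_in, c, 5, stride=2, padding=2)
        self.up_z = up(c)            # z  [c,16,16] -> z1 [c,32,32]
        self.up_zy = up(2 * c)       # cat(z1, y)   -> y1 [c,64,64]
        self.up_yx = up(2 * c)       # cat(y1, x)   -> x1 [c,128,128]
        self.down_x1 = down(c)       # x1           -> x2 [c,64,64]
        self.down_xy = down(2 * c)   # cat(x2, y1)  -> y2 [c,32,32]
        self.down_yz = down(2 * c)   # cat(y2, z1)  -> z2 [c,16,16]

    def forward(self, x, y, z):
        z1 = self.up_z(z)
        y1 = self.up_zy(torch.cat([z1, y], dim=1))
        x1 = self.up_yx(torch.cat([y1, x], dim=1))
        x2 = self.down_x1(x1)
        y2 = self.down_xy(torch.cat([x2, y1], dim=1))
        return self.down_yz(torch.cat([y2, z1], dim=1))  # fourth image feature z2
```

With x, y and z of shapes [1, 192, 64, 64], [1, 192, 32, 32] and [1, 192, 16, 16], the returned z2 has shape [1, 192, 16, 16], matching the dimensions listed above.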
And finally, performing feature self-adaptive fusion, wherein z and z2 are used as the input of a self-adaptive fusion module to perform feature fusion.
The adaptive feature fusion in this example is shown in fig. 9. The input features are z and z2; each is sent through a regular convolution with a step size of 1 followed by a ReLU activation function, and the two results are concatenated along the channel dimension. The concatenated feature then passes through 3 groups of 1×1 convolutions and activation functions, and a weight matrix is calculated by a softmax function. Finally, the weights are multiplied by the corresponding input features z and z2 and the products are added to obtain the final output, which is fed into the prior for subsequent computation.
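A hedged sketch of the adaptive fusion of fig. 9 follows. The kernel size of the first convolutions, the channel widths of the three 1×1 groups, and the shape of the softmax weights (here, one spatial weight map per input branch) are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Weighted fusion of z (transform output) and z2 (feature pyramid output)."""
    def __init__(self, c=192):
        super().__init__()
        self.branch_z = nn.Sequential(nn.Conv2d(c, c, 3, stride=1, padding=1), nn.ReLU())
        self.branch_z2 = nn.Sequential(nn.Conv2d(c, c, 3, stride=1, padding=1), nn.ReLU())
        self.weight_net = nn.Sequential(           # 3 groups of 1x1 convolution + activation
            nn.Conv2d(2 * c, c, 1), nn.ReLU(),
            nn.Conv2d(c, c, 1), nn.ReLU(),
            nn.Conv2d(c, 2, 1),                     # one weight map per input branch
        )

    def forward(self, z, z2):
        fused = torch.cat([self.branch_z(z), self.branch_z2(z2)], dim=1)
        w = torch.softmax(self.weight_net(fused), dim=1)   # normalize across the 2 branches
        return w[:, 0:1] * z + w[:, 1:2] * z2               # weighted sum, fed to the prior
```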
In the embodiment of the application, the image feature transformation processing device sequentially performs downsampling processing on the original image features multiple times through the transformation network of the main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature; feature fusion is performed on the first image feature, the second image feature and the third image feature to obtain a fourth image feature; and weighted fusion is performed on the fourth image feature and the third image feature to obtain the to-be-processed transformation image feature. In this way, a long-range dependency is constructed by fusing shallow texture information with deep semantic information, which improves the feature representation of the image features to be processed and thereby improves coding and decoding performance.
Referring to fig. 10, fig. 10 is a schematic flow chart of an embodiment of an image encoding method according to the present application.
As shown in fig. 10, the image encoding method of the embodiment of the present application includes the steps of:
step S21: and obtaining the transformed image characteristics of the image to be encoded by an image characteristic transformation processing method.
In this embodiment, the image feature transformation processing method is described in detail in the embodiment shown in fig. 4 and will not be repeated here.
Step S22: and coding the characteristics of the transformed image through a coding module of the main coding network to obtain a characteristic code stream of the image coding to be coded.
The image coding device uses the coding module in the main coding network to encode the transformed image features extracted by the transformation network shown in fig. 5, thereby obtaining the feature code stream of the image to be encoded.
On this basis, the application may also input the transformed image features, together with the auxiliary inverse transformation result of the transformed image features produced by the entropy model network, into a context predictor of the entropy model network to acquire the context information of the transformed image features; input the context information into a probability model of the entropy model network, and acquire the distribution information output by the probability model; and use the coding module of the main coding network to encode the transformed image features according to the distribution information, so as to obtain the feature code stream of the image to be encoded.
The auxiliary inverse transformation result is a first auxiliary inverse transformation result or a fusion result of the first auxiliary inverse transformation result and a second auxiliary inverse transformation result; the first auxiliary inverse transformation result is output by a first entropy model network, the second auxiliary inverse transformation result is output by a second entropy model network, and the input of the second entropy model network is the image characteristic output by an auxiliary transformation network of the first entropy model network.
In this process, the image encoding needs to use a transformation network, an auxiliary transformation network and an auxiliary inverse transformation network. The workflow of the transformation network is described in the above embodiment, and the workflow of the auxiliary transformation network is the same as that of the transformation network, except that the input and output are different. The auxiliary inverse transformation network is the inverse process, or inverse operation, of the auxiliary transformation network, and will not be described here again.
Specifically, the input and output of the auxiliary transformation network are described as follows: the auxiliary transformation network sequentially performs downsampling on the transformed image features for multiple times to obtain a first auxiliary image feature, a second auxiliary image feature and a third auxiliary image feature, wherein the scale of the first auxiliary image feature is larger than that of the second auxiliary image feature, and the scale of the second auxiliary image feature is larger than that of the third auxiliary image feature; performing feature fusion on the first auxiliary image feature, the second auxiliary image feature and the third auxiliary image feature to obtain a fourth auxiliary image feature; weighting and fusing the fourth auxiliary image feature and the third auxiliary image feature to obtain a to-be-processed transformation auxiliary image feature; and sequentially passing the transformed auxiliary image features through an encoding module and a decoding module of the entropy model network to obtain auxiliary inverse transformation results of the transformed image features.
Further, based on the image end-to-end codec shown in fig. 1 and fig. 2, the present application proposes a high-efficiency image end-to-end codec; specifically, please continue to refer to fig. 3, which is a schematic block diagram of the high-efficiency image end-to-end codec provided in the present application. Relative to the NIC shown in fig. 1 and fig. 2, another main improvement point of the codec of the present application is the context model.
Specifically, regarding the improved context model, the application adopts multi-scale mask convolutions to construct the context model so as to acquire neighborhood information over different ranges, and adaptively assigns weights for weighted fusion through learning.
Based on the image end-to-end codec shown in fig. 3, another image encoding method is proposed in the present application, and referring specifically to fig. 11, fig. 11 is a flow chart of another embodiment of the image encoding method provided in the present application.
As shown in fig. 11, the image encoding method of the embodiment of the present application includes the steps of:
step S31: and obtaining the transformation image characteristics of the image to be encoded.
In the embodiment of the present application, the transformed image features of the image to be encoded may be obtained by the image feature transformation processing method described above, or by the transformation network shown in fig. 1 and fig. 2; this is not limited here.
Step S32: and inputting the transformed image features into a context model of the entropy model network, and extracting the context features of different scales of the transformed image features by using mask convolution of different scales.
In the embodiment of the application, the context model of the NIC is formed by mask convolution, and the probability of an undecoded point is predicted by surrounding decoded points. The receptive field of the single mask convolution is fixed, and only the information of the decoded points in the fixed neighborhood range of the current predicted point can be integrated.
The application provides a multi-scale receptive field context information adaptive fusion method, as shown in fig. 12; fig. 12 is a schematic structural diagram of an embodiment of the multi-scale context feature adaptive fusion module provided by the application. The context model is formed by mask convolutions with a plurality of different convolution kernels, obtaining context information over N×N, M×M, …, and L×L receptive fields.
Step S33: and carrying out weighted fusion on the context features of different scales to obtain the context information of the transformed image features.
In the embodiment of the application, the image coding device obtains multi-scale context information x1, x2, …, xn after the input features pass through the multi-scale context model, then obtains adaptive weights through an adaptive weight learning network, and finally multiplies the weights by the multi-scale context information and fuses the products.
Specifically, mask convolution in this application includes, but is not limited to: 2D convolution and 3D convolution; activation functions include, but are not limited to: reLU, sigmoid, etc.; fusion means include, but are not limited to: channel addition, channel splicing and the like; normalization functions include, but are not limited to: softmax, etc.
The following describes a specific embodiment of a method for adaptive fusion of multiscale receptive field context information according to the application:
when the input picture dimension is [3, 256, 256], the dimension of the input x of the context model is [192, 16, 16]. The multi-scale context information adaptive fusion is shown in fig. 13. Wherein the context model is composed of 3D mask convolutions with convolution kernel sizes of 3 x3, 5 x 5 and 7 x 7, respectively, and output channel of 24. The input features x respectively pass through context models of 3 different receptive fields to obtain output features x1, x2 and x3, and then the three features are respectively sent into a 3D convolution with a convolution kernel of 1 multiplied by 1 and a ReLU activation function with a step length of 1, and are spliced according to channels; the spliced features are subjected to 3-D convolution with 3 groups of convolution kernels of 1 multiplied by 1 and step length of 1 and a ReLu activation function, and a weight matrix is calculated by softmax. And finally, correspondingly multiplying the weight with the outputs x1, x2 and x3 of the context model, and adding and fusing to obtain the output characteristics.
Step S34: and inputting the context information into a probability model of the entropy model network, and acquiring the distribution information output by the probability model.
Step S35: and the coding module of the main coding network is used for coding the characteristics of the transformed image according to the distribution information to obtain a characteristic code stream of the image coding to be coded.
According to the image coding method of the application, shallow texture and deep semantic information are fused through the feature pyramid network in the transformation network to construct a long-range dependency, and adaptive weighted fusion is performed on the output of the transformation network and the output of the feature pyramid network, which improves image compression performance.
The image coding method of the application further provides a multi-scale context adaptive fusion method: context information of different receptive fields is extracted through multi-scale mask convolutions, and different adaptive weights are assigned to the multi-scale context information through neural network learning for weighted fusion, which improves the accuracy of the context model.
Correspondingly, referring to fig. 14, fig. 14 is a flowchart illustrating an embodiment of an image decoding method provided in the present application.
As shown in fig. 14, the image decoding method of the embodiment of the present application includes the steps of:
step S41: and decoding the characteristic code stream through a decoding module of the main decoding network to obtain the decoded image characteristics of the characteristic code stream.
Step S42: and obtaining the inverse transformation image characteristics of the decoded image characteristics through the inverse process of the image characteristic transformation processing method.
Step S43: and obtaining a decoded image corresponding to the characteristic code stream according to the inverse transformation image characteristics.
It should be noted that the image decoding method of this embodiment is substantially the inverse process of the image encoding method of the embodiment shown in fig. 10, so all the technical solutions of that image encoding method can likewise be applied to the image decoding method of this embodiment; the related technical solutions can be deduced by simple reverse derivation and are not described here again.
Correspondingly, referring to fig. 15, fig. 15 is a flowchart illustrating another embodiment of an image decoding method provided in the present application.
As shown in fig. 15, the image decoding method of the embodiment of the present application includes the steps of:
step S51: and acquiring a characteristic code stream of the image to be decoded and transforming the image characteristics.
Step S52: and inputting the transformed image features into a context model of the entropy model network, and extracting the context features of different scales of the transformed image features by using mask convolution of different scales.
Step S53: and carrying out weighted fusion on the context features of different scales to obtain the context information of the transformed image features.
Step S54: and inputting the context information into a probability model of the entropy model network, and acquiring the distribution information output by the probability model.
Step S55: and decoding the characteristic code stream according to the distribution information by a decoding module of the main decoding network to obtain an image to be decoded.
It should be noted that the image decoding method of this embodiment is substantially the inverse process of the image encoding method of the embodiment shown in fig. 11, so all the technical solutions of that image encoding method can likewise be applied to the image decoding method of this embodiment; the related technical solutions can be deduced by simple reverse derivation and are not described here again.
The above embodiments are only one common case of the present application, and do not limit the technical scope of the present application, so any minor modifications, equivalent changes or modifications made to the above matters according to the scheme of the present application still fall within the scope of the technical scheme of the present application.
With continued reference to fig. 16, fig. 16 is a schematic structural diagram of an embodiment of an image encoding device provided in the present application. The image encoding apparatus 600 of the embodiment of the present application includes a processor 61, a memory 62, an input-output device 63, and a bus 64.
The processor 61, the memory 62, and the input-output device 63 are respectively connected to the bus 64, and the memory 62 stores program data, and the processor 61 is configured to execute the program data to implement the image feature transformation processing method and/or the image encoding method described in the above embodiments.
In the present embodiment, the processor 61 may also be referred to as a CPU (Central Processing Unit). The processor 61 may be an integrated circuit chip with signal processing capabilities. The processor 61 may also be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), an application-specific integrated circuit (ASIC, Application Specific Integrated Circuit), a field-programmable gate array (FPGA, Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 61 may be any conventional processor or the like.
With continued reference to fig. 17, fig. 17 is a schematic structural diagram of an embodiment of an image decoding apparatus provided in the present application. The image decoding apparatus 700 of the embodiment of the present application includes a processor 71, a memory 72, an input-output device 73, and a bus 74.
The processor 71, the memory 72, and the input-output device 73 are respectively connected to the bus 74, and the memory 72 stores program data, and the processor 71 is configured to execute the program data to implement the image feature transformation processing method and/or the image decoding method described in the above embodiments.
Still further, referring to fig. 18, fig. 18 is a schematic structural diagram of an embodiment of the computer storage medium provided in the present application, where the computer storage medium 800 stores program data 81, and the program data 81, when executed by a processor, is used to implement the method for transforming image features, the image encoding method and/or the image decoding method of the above embodiment.
If the embodiments of the present application are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored on a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, and the patent scope of the present application is not limited thereto, but the equivalent structures or equivalent flow changes made in the present application and the contents of the drawings are utilized, or directly or indirectly applied to other related technical fields, which are all included in the patent protection scope of the present application.

Claims (15)

1. A method of image feature transformation processing, comprising:
acquiring original image characteristics to be processed;
sequentially downsampling the original image features for multiple times through a transformation network of a main image processing network to obtain a first image feature, a second image feature and a third image feature, wherein the scale of the first image feature is larger than that of the second image feature, and the scale of the second image feature is larger than that of the third image feature;
performing feature fusion on the first image feature, the second image feature and the third image feature to obtain a fourth image feature;
and carrying out weighted fusion on the fourth image feature and the third image feature to obtain a to-be-processed transformation image feature.
2. The method of image feature transformation processing according to claim 1, wherein,
the step of performing feature fusion on the first image feature, the second image feature and the third image feature to obtain a fourth image feature includes:
performing up-sampling processing on the third image feature, and then performing feature fusion with the second image feature to obtain a fifth image feature;
performing up-sampling processing on the fifth image feature, and performing feature fusion with the first image feature to obtain a sixth image feature;
performing downsampling on the sixth image feature, and then performing feature fusion with the upsampling result of the fifth image feature to obtain a seventh image feature;
performing downsampling on the seventh image feature, and then performing feature fusion with the upsampling result of the third image feature to obtain an eighth image feature;
and carrying out downsampling on the eighth image feature to obtain the fourth image feature.
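A runnable sketch of the fusion order recited in claim 2, using channel concatenation (claim 3) followed by a 1x1 convolution as the fusion operator, is given below. The bilinear resampling, the 2x scale factors, and the interpolation used to match spatial sizes before each concatenation are assumptions made so the example runs end to end.

import torch
import torch.nn as nn
import torch.nn.functional as F

def _fuse(conv: nn.Conv2d, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Channel-concatenate b (resized to a's spatial size) with a, then project."""
    b = F.interpolate(b, size=a.shape[-2:], mode="bilinear", align_corners=False)
    return conv(torch.cat([a, b], dim=1))

class PyramidFusionSketch(nn.Module):
    def __init__(self, c: int = 192):
        super().__init__()
        self.fuse5 = nn.Conv2d(2 * c, c, 1)
        self.fuse6 = nn.Conv2d(2 * c, c, 1)
        self.fuse7 = nn.Conv2d(2 * c, c, 1)
        self.fuse8 = nn.Conv2d(2 * c, c, 1)
        self.down = nn.Conv2d(c, c, 3, stride=2, padding=1)

    def forward(self, f1, f2, f3):
        up3 = F.interpolate(f3, scale_factor=2, mode="bilinear", align_corners=False)
        f5 = _fuse(self.fuse5, f2, up3)             # fifth image feature
        up5 = F.interpolate(f5, scale_factor=2, mode="bilinear", align_corners=False)
        f6 = _fuse(self.fuse6, f1, up5)             # sixth image feature
        f7 = _fuse(self.fuse7, self.down(f6), up5)  # seventh image feature
        f8 = _fuse(self.fuse8, self.down(f7), up3)  # eighth image feature
        return self.down(f8)                        # fourth image feature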
3. The method of image feature transformation processing according to claim 2, wherein the fusion mode of the feature fusion is channel concatenation.
4. The method of image feature transformation processing according to claim 1, wherein the step of performing weighted fusion on the fourth image feature and the third image feature to obtain the transformed image feature to be processed includes:
inputting the fourth image feature and the third image feature into an adaptive fusion module to obtain a fusion weight;
and fusing the third image feature and the fourth image feature by using the fusion weight to obtain the transformed image feature to be processed.
5. The method of image feature transformation processing according to claim 4, wherein the step of inputting the fourth image feature and the third image feature into the adaptive fusion module to obtain the fusion weight includes:
inputting the fourth image feature and the third image feature into the adaptive fusion module, extracting a first activated image feature from the third image feature through a first convolution activation module, and extracting a second activated image feature from the fourth image feature through a second convolution activation module;
fusing the first activated image feature and the second activated image feature, and extracting a third activated image feature through a third convolution activation module;
and normalizing the third activated image feature by using a normalization function of the adaptive fusion module to obtain the fusion weight.
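The adaptive fusion module of claims 4 and 5 can be pictured with the following sketch: each input feature passes through its own convolution-plus-activation branch, the two activated features are fused, a third convolution-activation block refines the result, and a normalization function turns it into fusion weights. The ReLU and sigmoid choices, the additive fusion of the two branches, and the complementary (w, 1 - w) weighting are assumptions.

import torch
import torch.nn as nn

class AdaptiveFusionSketch(nn.Module):
    def __init__(self, c: int = 192):
        super().__init__()
        self.branch3 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())  # first convolution activation module
        self.branch4 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())  # second convolution activation module
        self.refine = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())   # third convolution activation module
        self.norm = nn.Sigmoid()  # normalization function producing weights in (0, 1)

    def forward(self, f4: torch.Tensor, f3: torch.Tensor) -> torch.Tensor:
        a3 = self.branch3(f3)           # first activated image feature
        a4 = self.branch4(f4)           # second activated image feature
        a = self.refine(a3 + a4)        # third activated image feature (additive fusion assumed)
        w = self.norm(a)                # fusion weight
        return w * f3 + (1.0 - w) * f4  # transformed image feature to be processed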
6. An image encoding method, characterized in that the image encoding method comprises:
obtaining transformed image features of an image to be encoded by the method of image feature transformation processing according to any one of claims 1-5;
and encoding the transformed image features through an encoding module of a main encoding network to obtain a feature code stream of the image to be encoded.
7. The image encoding method according to claim 6, wherein the step of encoding the transformed image features through the encoding module of the main encoding network to obtain the feature code stream of the image to be encoded includes:
inputting the transformed image features and an auxiliary inverse transformation result of the transformed image features output by an entropy model network into a context predictor of the entropy model network to obtain context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
and encoding, by the encoding module of the main encoding network, the transformed image features according to the distribution information to obtain the feature code stream of the image to be encoded;
wherein the auxiliary inverse transformation result is a first auxiliary inverse transformation result or a fusion result of the first auxiliary inverse transformation result and a second auxiliary inverse transformation result; the first auxiliary inverse transformation result is output by a first entropy model network, the second auxiliary inverse transformation result is output by a second entropy model network, and the input of the second entropy model network is the image feature output by an auxiliary transformation network of the first entropy model network.
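At a high level, the encoding flow of claim 7 can be sketched as follows: a context predictor combines the transformed image features with the auxiliary inverse transformation result, a probability model maps the resulting context information to distribution parameters, and the encoding module entropy-codes the features under that distribution. The Gaussian mean/scale parameterization, the rounding used as quantization, and the stub entropy coder are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EntropyEncodeSketch(nn.Module):
    def __init__(self, c: int = 192):
        super().__init__()
        # Context predictor of the entropy model network: consumes the transformed image
        # features concatenated with the auxiliary inverse transformation result.
        self.context_predictor = nn.Conv2d(2 * c, c, 3, padding=1)
        # Probability model: predicts a mean and a scale per latent channel.
        self.probability_model = nn.Conv2d(c, 2 * c, 1)

    def forward(self, y, aux_inverse):
        ctx = self.context_predictor(torch.cat([y, aux_inverse], dim=1))  # context information
        mean, scale = self.probability_model(ctx).chunk(2, dim=1)         # distribution information
        return entropy_encode(torch.round(y), mean, F.softplus(scale))    # feature code stream

def entropy_encode(symbols, mean, scale):
    # Stub for an arithmetic/range coder driven by the predicted Gaussian
    # parameters; a real encoding module would serialize `symbols` here.
    return b""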
8. The image encoding method according to claim 7, wherein the image encoding method further comprises:
sequentially downsampling the transformed image features multiple times through an auxiliary transformation network of an entropy model network to obtain a first auxiliary image feature, a second auxiliary image feature and a third auxiliary image feature, wherein the scale of the first auxiliary image feature is larger than that of the second auxiliary image feature, and the scale of the second auxiliary image feature is larger than that of the third auxiliary image feature;
performing feature fusion on the first auxiliary image feature, the second auxiliary image feature and the third auxiliary image feature to obtain a fourth auxiliary image feature;
performing weighted fusion on the fourth auxiliary image feature and the third auxiliary image feature to obtain a transformed auxiliary image feature to be processed;
and sequentially passing the transformed auxiliary image feature through an encoding module and a decoding module of the entropy model network to obtain the auxiliary inverse transformation result of the transformed image features.
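The auxiliary branch of claim 8 mirrors the main transform: the transformed image features pass through an auxiliary transformation network (structured like claims 1-2) and then through the entropy model network's encoding and decoding modules to produce the auxiliary inverse transformation result. A minimal sketch, in which the rounding step and the modules passed in as arguments are assumptions, is:

import torch

def auxiliary_inverse(y, aux_transform, aux_encoder, aux_decoder):
    z = aux_transform(y)                    # transformed auxiliary image feature to be processed
    z_hat = torch.round(z)                  # quantization assumed before entropy coding
    return aux_decoder(aux_encoder(z_hat))  # auxiliary inverse transformation result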
9. An image encoding method, characterized in that the image encoding method comprises:
obtaining transformed image features of an image to be encoded;
inputting the transformed image features into a context model of an entropy model network, and extracting context features of different scales from the transformed image features by using masked convolutions of different scales;
performing weighted fusion on the context features of different scales to obtain context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
and encoding, by an encoding module of a main encoding network, the transformed image features according to the distribution information to obtain a feature code stream of the image to be encoded.
10. The image encoding method according to claim 9, wherein the step of performing weighted fusion on the context features of different scales to obtain the context information of the transformed image features includes:
extracting activated image features from the context features of different scales through a convolution activation module, and then concatenating and fusing the activated image features to obtain a context fusion feature;
normalizing the context fusion feature by using a normalization function to obtain fusion weights;
and performing weighted fusion on the context features of different scales by using the fusion weights to obtain the context information of the transformed image features.
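Claims 9 and 10 describe a context model built from masked convolutions of different scales whose outputs are adaptively weighted. The sketch below interprets "different scales" as different masked-convolution kernel sizes and uses a convolution-activation block plus a softmax over the two branches to form the fusion weights; those interpretations, and the type-A causal mask, are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Causal (type-A) masked convolution: only previously decoded positions
    in raster-scan order contribute to the output at each location."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_buffer("mask", torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2, kw // 2:] = 0
        self.mask[:, :, kh // 2 + 1:, :] = 0

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

class MultiScaleContextSketch(nn.Module):
    def __init__(self, c: int = 192):
        super().__init__()
        self.ctx3 = MaskedConv2d(c, c, 3, padding=1)  # small-scale context
        self.ctx5 = MaskedConv2d(c, c, 5, padding=2)  # larger-scale context
        self.act = nn.Sequential(nn.Conv2d(2 * c, 2 * c, 3, padding=1), nn.ReLU())  # convolution activation module
        self.norm = nn.Softmax(dim=1)  # normalization over the two branches

    def forward(self, y_hat):
        c3, c5 = self.ctx3(y_hat), self.ctx5(y_hat)  # context features of two scales
        fused = self.act(torch.cat([c3, c5], dim=1))  # context fusion feature
        w = self.norm(fused.view(y_hat.size(0), 2, -1, *y_hat.shape[-2:]))  # fusion weights
        return w[:, 0] * c3 + w[:, 1] * c5            # context information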
11. An image decoding method, characterized in that the image decoding method comprises:
decoding a feature code stream through a decoding module of a main decoding network to obtain decoded image features of the feature code stream;
obtaining inverse transformed image features of the decoded image features through an inverse process of the method of image feature transformation processing according to any one of claims 1-5;
and obtaining a decoded image corresponding to the feature code stream according to the inverse transformed image features.
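The decoding path of claim 11 runs the transform of claims 1-5 in reverse. A compact sketch of such an inverse transform, assuming transposed convolutions for upsampling and a three-channel image output, is:

import torch.nn as nn

class InverseTransformSketch(nn.Module):
    def __init__(self, c: int = 192, out_ch: int = 3):
        super().__init__()
        # Three 2x upsampling stages mirroring the three downsampling stages (assumed).
        self.up = nn.Sequential(
            nn.ConvTranspose2d(c, c, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(c, c, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(c, out_ch, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, decoded_features):
        # decoded image features -> inverse transformed image features -> decoded image
        return self.up(decoded_features)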
12. An image decoding method, characterized in that the image decoding method comprises:
acquiring a feature code stream of an image to be decoded and transformed image features;
inputting the transformed image features into a context model of an entropy model network, and extracting context features of different scales from the transformed image features by using masked convolutions of different scales;
performing weighted fusion on the context features of different scales to obtain context information of the transformed image features;
inputting the context information into a probability model of the entropy model network, and acquiring distribution information output by the probability model;
and decoding, by a decoding module of a main decoding network, the feature code stream according to the distribution information to obtain the image to be decoded.
13. An image encoding device, comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the method of image feature transformation processing according to any one of claims 1 to 5, and/or the image encoding method according to any one of claims 6 to 10.
14. An image decoding device, comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the method of image feature transformation processing according to any one of claims 1 to 5, and/or the image decoding method according to any one of claims 11 to 12.
15. A computer storage medium storing program data which, when executed by a computer, is adapted to carry out the method of image feature transformation processing according to any one of claims 1 to 5, the image encoding method according to any one of claims 6 to 10, and/or the image decoding method according to any one of claims 11 to 12.
CN202311149058.8A 2023-09-06 2023-09-06 Image feature transformation processing method, image encoding method, and image decoding method Pending CN117422779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311149058.8A CN117422779A (en) 2023-09-06 2023-09-06 Image feature transformation processing method, image encoding method, and image decoding method


Publications (1)

Publication Number Publication Date
CN117422779A 2024-01-19

Family

ID=89525373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311149058.8A Pending CN117422779A (en) 2023-09-06 2023-09-06 Image feature transformation processing method, image encoding method, and image decoding method

Country Status (1)

Country Link
CN (1) CN117422779A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination