CN113965755A - Image coding method, storage medium and terminal equipment

Info

Publication number
CN113965755A
Authority
CN
China
Prior art keywords
image
feature map
channel
saliency
coding
Prior art date
Legal status
Pending
Application number
CN202010706682.3A
Other languages
Chinese (zh)
Inventor
陈巍
刘阳兴
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202010706682.3A
Publication of CN113965755A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/182: using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
    • H04N19/124: using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding; Quantisation
    • H04N19/132: using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding; Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/136: using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding; Incoming video signal characteristics or properties
    • H04N19/184: using adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/423: characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation, characterised by memory arrangements
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an image coding method, a storage medium and a terminal device. The method comprises: obtaining an image feature map and a saliency feature map corresponding to an image to be coded; determining a mask feature map corresponding to the image to be coded based on the saliency feature map; generating a coding feature map according to the image feature map and the mask feature map; and obtaining a coding file according to the coding feature map. Because the mask feature map is determined from the saliency feature map and controls the image information carried by each channel of the image feature map, each channel of the coding feature map carries a different amount of saliency image information. During coding, different numbers of bits can therefore be allocated to different channels, so that channels carrying more saliency image information occupy more bits. This increases the amount of salient image content preserved in the coded file and improves the quality of the image reconstructed from it.

Description

Image coding method, storage medium and terminal equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image encoding method, a storage medium, and a terminal device.
Background
Image coding refers to techniques for representing an original pixel matrix with a small number of bytes. An image generally needs to be encoded to save storage space when it is stored, or to increase transmission speed when it is transmitted. In current image coding, entropy coding allocates the same number of bits to salient image content (e.g., foreground content) and non-salient image content (e.g., background content), so the coded file carries the same amount of image information for important content as for unimportant content. However, the human visual system is more sensitive to foreground content, so a coded file produced with such equal bit allocation yields a reconstructed image of poor quality.
Disclosure of Invention
The technical problem to be solved by the present application is to provide an image encoding method, a storage medium and a terminal device, aiming at the defects of the prior art.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides an image coding method applied to an image coding model, the method including:
acquiring an image feature map and a saliency feature map corresponding to an image to be coded;
determining a mask feature map corresponding to the image to be coded based on the saliency feature map;
generating a coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map;
and obtaining a coding file corresponding to the image to be coded based on the coding feature map.
In one embodiment, the saliency feature map comprises foreground feature information of the image to be encoded.
In one embodiment, the channels of the mask feature map correspond one-to-one to the channels of the image feature map, and the mask feature map contains at least two channels with different amounts of image information.
In one embodiment, the image coding method is applied to an image coding model; the image coding model comprises a feature extraction module and a saliency extraction module, and the acquiring of the image feature map and the saliency feature map corresponding to the image to be coded specifically includes:
the feature extraction module acquires an image feature map of the image to be coded;
the saliency extraction module acquires a saliency feature map of the image to be coded.
In one embodiment, the image coding method is applied to an image coding model; the image coding model comprises a mask module, and the saliency feature map is a single-channel feature map; determining the mask feature map corresponding to the image to be coded based on the saliency feature map specifically includes:
the mask module determines a multi-channel feature map, wherein the image size of the multi-channel feature map is the same as the image size of the saliency feature map;
for each channel in the multi-channel feature map, the mask module adjusts the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map;
and taking the adjusted multi-channel feature map as the mask feature map.
In one embodiment, the adjusting, by the mask module, of the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map specifically includes:
for each pixel point in the channel, the mask module determines a target pixel value corresponding to the pixel point, wherein the target pixel value is the pixel value of the target pixel point, and the pixel position of the target pixel point in the saliency characteristic map corresponds to the pixel position of the pixel point in the channel;
and the mask module determines the pixel value of the pixel point according to the target pixel value and the channel number of the channel.
In one embodiment, before the mask module determines the multi-channel feature map, the method further includes:
the mask module adjusts the image size of the saliency feature map and takes the adjusted saliency feature map as the saliency feature map, wherein the image size of the adjusted saliency feature map is the same as the image size of the image feature map.
In one embodiment, the image coding method is applied to an image coding model; the image coding model comprises a quantization module; after generating the coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map, the method includes:
and the quantization module generates a quantization feature map of the image to be coded based on the coding feature map, and takes the quantization feature map as the coding feature map.
A second aspect of embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement steps in an image encoding method as described in any one of the above.
A third aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the image encoding method as described in any of the above.
Beneficial effects: compared with the prior art, the embodiments of the present application provide an image coding method, a storage medium and a terminal device. The image coding method obtains, through an image coding model, an image feature map and a saliency feature map corresponding to an image to be coded; determines a mask feature map corresponding to the image to be coded based on the saliency feature map; generates a coding feature map corresponding to the image to be coded according to the image feature map and the mask feature map; and finally obtains a coding file corresponding to the image to be coded according to the coding feature map. Because the mask feature map is determined from the saliency feature map and controls the image information carried by each channel of the image feature map, each channel of the coding feature map carries a different amount of saliency image information. During coding, different numbers of bits can therefore be allocated to different channels, so that channels carrying more saliency image information occupy more bits, increasing the amount of salient image content preserved in the coded file and improving the quality of the image reconstructed from it.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without any inventive work.
Fig. 1 is a flowchart of an image encoding method provided in the present application.
Fig. 2 is a schematic diagram illustrating an image encoding method according to the present application.
Fig. 3 is a schematic diagram illustrating a principle of a feature extraction module in the image encoding method provided in the present application.
Fig. 4 is a schematic diagram illustrating a principle of generating a reconstructed image from an encoded file obtained by the image encoding method provided by the present application.
Fig. 5 is a schematic structural diagram of a decoding module provided in the present application.
Fig. 6 is a schematic diagram of a structure of a residual block in a decoding module provided in the present application.
Fig. 7 is a schematic diagram of a structure of an upsampling module in the decoding module provided by the present application.
Fig. 8 is a schematic diagram of a reconstructed image generated from a coding file obtained by the image coding method provided in the present application.
Fig. 9 is a schematic diagram of a reconstructed image generated from a coding file obtained by directly coding the image feature map.
Fig. 10 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
The present application provides an image encoding method, a storage medium, and a terminal device, and in order to make the purpose, technical solution, and effect of the present application clearer and clearer, the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventor has found through research that, with the continuous development of deep learning technology, deep convolutional networks have been widely applied to image compression. However, in the currently common self-encoder-based network models, after the coding feature map is obtained, it is usually quantized, and the quantized coding feature map is entropy coded. In entropy coding, the same number of bits is allocated to salient image content (e.g., foreground content) and non-salient image content (e.g., background content) in the quantized coding feature map, so the coded file carries the same amount of image information for salient content as for non-salient content. However, the human visual system is more sensitive to the content of salient image regions, so a coded file produced with such equal bit allocation yields a reconstructed image of poor quality.
In order to solve the above problem, in the embodiments of the present application, an image feature map and a saliency feature map corresponding to an image to be coded are obtained through an image coding model; a mask feature map corresponding to the image to be coded is determined based on the saliency feature map; a coding feature map corresponding to the image to be coded is generated according to the image feature map and the mask feature map; and finally a coding file corresponding to the image to be coded is obtained according to the coding feature map. Because the mask feature map is determined from the saliency feature map and controls the image information carried by each channel of the image feature map, each channel of the coding feature map carries a different amount of saliency image information. During coding, different numbers of bits can therefore be allocated to different channels, so that channels carrying more saliency image information occupy more bits, increasing the amount of salient image content preserved in the coded file and improving the quality of the image reconstructed from it.
For example, the embodiments of the present application may be applied to a scenario with an electronic device configured with an image coding model. In this scenario, the electronic device may first collect an image to be coded and, through the image coding model, acquire an image feature map and a saliency feature map corresponding to the image to be coded; determine a mask feature map corresponding to the image to be coded based on the saliency feature map; generate a coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map; and obtain a coding file corresponding to the image to be coded based on the coding feature map.
It should be noted that the above application scenarios are only shown for the convenience of understanding the present application, and the embodiments of the present application are not limited in any way in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
The present embodiment provides an image encoding method, as shown in fig. 1 and 2, including:
and S10, the image coding model acquires an image feature map and a saliency feature map corresponding to the image to be coded.
Specifically, the image to be coded may be an image acquired by the electronic device (e.g., a server) running the image coding method. It may be an original image captured by an image acquisition device, which may be configured in the electronic device running the image coding method or in another external device; in the latter case, the external device sends the captured original image to the electronic device running the image coding method. In a possible implementation of this embodiment, the image acquisition device is configured in the electronic device running the image coding method, so that the electronic device can encode an image directly after capturing it. This reduces the storage space occupied by the image and, when the image needs to be sent to another external device, the encoded image can be sent to increase the transmission speed.
The image feature map is a global feature map of the image to be coded and contains all of its image features; each channel of the image feature map is a local feature map containing local features of the image to be coded. The local features carried by different channels therefore differ in importance. For example, in the human visual system, human eyes are more sensitive to the content of salient regions (e.g., foreground regions), so the content of salient regions in the image to be coded is more important than the content of non-salient regions. The importance of the image content can thus be divided into level 1 and level 2 according to salient and non-salient regions, where level 1 corresponds to salient regions, level 2 corresponds to non-salient regions, and level 1 is more important than level 2; a channel carrying features of salient-region content carries more image information than a channel carrying features of non-salient-region content.
Further, the saliency feature map reflects foreground feature information of the image to be coded. The saliency feature map is a single-channel feature map whose pixel values range over [0,1]: a pixel with value 1 represents the first part of the foreground region of the image to be coded, and a pixel with value 0 represents the first part of the background region. Among the pixels with values between 0 and 1, some represent a second part of the foreground region and the others a second part of the background region. The first and second parts of the foreground form the foreground region of the image to be coded, and the first and second parts of the background form the background region, where the second part of the foreground is the boundary area of the foreground region, the second part of the background is the boundary area of the background region, and the two boundary areas adjoin each other.
Further, in an implementation of this embodiment, the image coding method is applied to an image coding model. As shown in fig. 2, the image coding model includes a feature extraction module and a saliency extraction module, and the acquiring, by the image coding model, of the image feature map and the saliency feature map corresponding to the image to be coded specifically includes:
s11, the feature extraction module acquires an image feature map of the image to be coded;
s12, the saliency extraction module acquires a saliency feature map of the image to be coded.
Specifically, in step S11, the feature extraction module is configured to obtain the image feature map of the image to be coded, and the mask module is configured to obtain the mask feature map corresponding to the image to be coded. The input of the feature extraction module is the image to be coded and its output is the image feature map; the input of the mask module is the image feature map and its output is the mask feature map. It can be understood that the input of the mask module is the output of the feature extraction module: after the feature extraction module obtains the image feature map corresponding to the image to be coded, the image feature map is input to the mask module, which outputs the corresponding mask feature map.
Further, the feature extraction module may include a plurality of cascaded convolution modules, which generate the image feature map corresponding to the image to be coded. In a specific implementation of this embodiment, as shown in fig. 3, the feature extraction module includes six convolution modules, recorded as the first to sixth convolution modules:
    • First convolution module: 7 × 7 convolution kernels, 60 kernels, stride 1, padding 3; InstanceNorm normalization; ReLU nonlinearity.
    • Second convolution module: 3 × 3 convolution kernels, 120 kernels, stride 1, padding 3; InstanceNorm normalization; ReLU nonlinearity.
    • Third convolution module: 3 × 3 convolution kernels, 240 kernels, stride 2, padding 1; InstanceNorm normalization; ReLU nonlinearity.
    • Fourth convolution module: 3 × 3 convolution kernels, 480 kernels, stride 2, padding 1; InstanceNorm normalization; ReLU nonlinearity.
    • Fifth convolution module: 3 × 3 convolution kernels, 960 kernels, stride 2, padding 1; InstanceNorm normalization; ReLU nonlinearity.
    • Sixth convolution module: 3 × 3 convolution kernels, C kernels, stride 1, padding 1; InstanceNorm normalization.
C may be determined according to actual requirements; in one possible implementation of this embodiment, C is set to 16.
Based on this, the number of channels of the image feature map is 16, i.e., the image feature map is a 16-channel feature map. The acquisition process may be: input the image to be coded into the first convolution module, which outputs a 60-channel feature map A; input feature map A into the second convolution module, which outputs a 120-channel feature map B; input feature map B into the third convolution module, which outputs a 240-channel feature map C; input feature map C into the fourth convolution module, which outputs a 480-channel feature map D; input feature map D into the fifth convolution module, which outputs a 960-channel feature map E; and input feature map E into the sixth convolution module, which outputs the 16-channel image feature map.
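To make the module concrete, below is a minimal PyTorch sketch of such a feature extraction module. It is an illustrative reconstruction under stated assumptions, not the patent's own code: the 3-channel RGB input, the `conv_block` helper and all names are assumptions, and the second module keeps the padding of 3 stated in the text (padding 1 would preserve the spatial size of a 3 × 3 convolution).

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, stride, pad, relu=True):
    # Conv + InstanceNorm (+ ReLU), matching the per-module settings above
    layers = [nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=pad),
              nn.InstanceNorm2d(out_ch)]
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

class FeatureExtractor(nn.Module):
    def __init__(self, c=16):                     # C = 16 output channels
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3,    60, 7, 1, 3),        # first module
            conv_block(60,  120, 3, 1, 3),        # second module (padding 3 as stated)
            conv_block(120, 240, 3, 2, 1),        # third module, stride 2
            conv_block(240, 480, 3, 2, 1),        # fourth module, stride 2
            conv_block(480, 960, 3, 2, 1),        # fifth module, stride 2
            conv_block(960, c, 3, 1, 1, relu=False),  # sixth module, no ReLU
        )

    def forward(self, x):                         # x: N x 3 x H x W image
        return self.net(x)                        # 16-channel image feature map
```

With the three stride-2 modules, the spatial size is reduced by a factor of 8, which is consistent with the 256 × 256 to 32 × 32 example given later.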
Further, in step S12, the saliency extraction module is configured to extract saliency features of the image to be coded and generate the saliency feature map based on the extracted features. The saliency extraction module may be a traditional machine learning model or a deep learning model. In an implementation of this embodiment, a deep learning model is adopted, since deep learning models can simulate the neural structure of the human brain and have strong image processing capability, producing results closest to human perception.
Further, the saliency extraction module may employ a saliency detection network model, which may include an encoding-decoding unit and a residual unit. The encoding-decoding unit is configured to extract an initial saliency feature map corresponding to the image to be coded, in which the degree of alignment between the boundary of the foreground region and the true foreground boundary of the image to be coded is smaller than a preset threshold (e.g., 95%). The residual unit is configured to refine the initial saliency feature map to obtain the saliency feature map, in which the degree of alignment between the foreground boundary and the true foreground boundary is greater than or equal to the preset threshold. Of course, in practical applications, the saliency extraction module may also use an existing saliency detection network model, such as a BasNet network.
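As a usage sketch only, the saliency extraction step can be pictured as follows; the wrapper function, the single-logit output convention and the `model` argument are assumptions, since the concrete network (e.g., a BasNet-style model) is not fixed by the text:

```python
import torch

def extract_saliency(model, image):
    # model: any saliency detection network returning one logit per pixel
    # image: N x 3 x H x W tensor of the image to be coded
    with torch.no_grad():
        logits = model(image)          # N x 1 x H x W
    return torch.sigmoid(logits)       # single-channel map with values in [0, 1]
```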
And S20, determining the mask feature map corresponding to the image to be coded based on the saliency feature map.
Specifically, the mask feature map reflects the saliency image information of each channel in the image feature map, where saliency image information refers to the image information each channel needs to retain during coding (for example, image information of the foreground region) and, correspondingly, non-saliency image information refers to the redundant image information of each channel during coding (for example, image information of the background region). Each channel in the mask feature map reflects the saliency image information of a target channel corresponding to it, where the target channel is a channel in the image feature map whose channel number equals that channel's number in the mask feature map. Therefore, the number of channels of the mask feature map is the same as the number of channels of the image feature map; for example, if the image feature map has 16 channels, the mask feature map has 16 channels, and channel 10 of the mask feature map reflects the saliency image information carried by channel 10 of the image feature map.
Further, in one implementation of this embodiment, as shown in fig. 2, the image coding method is applied to an image coding model; the image coding model comprises a mask module, and the saliency feature map is a single-channel feature map. Determining the mask feature map corresponding to the image to be coded based on the saliency feature map specifically includes:
A10, the mask module determines a multi-channel feature map, wherein the image size of the multi-channel feature map is the same as the image size of the saliency feature map;
A20, for each channel in the multi-channel feature map, the mask module adjusts the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map;
and A30, taking the adjusted multi-channel feature map as the mask feature map.
Specifically, in step A10, the image size of the multi-channel feature map is the same as the image size of the saliency feature map, and its number of channels is the same as the number of channels of the image feature map. For example, if the image size of the saliency feature map is 128 × 128 and the image feature map has 16 channels, then the multi-channel feature map has image size 128 × 128 and 16 channels. In addition, since the mask feature map is used to adjust the amount of image information carried by each channel of the image feature map, the image size of the mask feature map needs to be the same as that of the image feature map. Meanwhile, the saliency feature map is determined from the image to be coded and reflects its salient and non-salient content regions, so the image size of the saliency feature map equals the image size of the image to be coded.
Based on this, before determining the multi-channel feature map based on the saliency feature map, the image size of the saliency feature map may be adjusted so that the image size of the saliency feature map is the same as the image size of the image feature map. Accordingly, in an implementation manner of this embodiment, before the mask module determines a multi-channel feature map, the method includes:
the mask module adjusts the image size of the saliency feature map and takes the adjusted saliency feature map as the saliency feature map, wherein the image size of the adjusted saliency feature map is the same as the image size of the image feature map.
Specifically, before adjusting the image size of the saliency feature map, a first image size of the saliency feature map and a second image size of the image feature map may be obtained, an adjustment ratio determined from them, and the saliency feature map resized based on this ratio. For example, if the first image size of the saliency feature map is 256 × 256 and the second image size of the image feature map is 32 × 32, the adjustment ratio is 256/32 = 8. The adjustment ratio is the ratio of the width of the first image size to the width of the second image size, or equivalently the ratio of their heights, since the width-to-height ratio of the first image size is the same as that of the second. After the adjustment ratio is determined, the saliency feature map is downsampled based on it to obtain the adjusted saliency feature map, where the downsampling step size is the adjustment ratio; for example, if the adjustment ratio is 8, the downsampling step size is 8.
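A minimal sketch of this size adjustment, assuming tensors in N x C x H x W layout (the function name and the strided-slicing implementation are assumptions; the text only fixes that the downsampling step equals the adjustment ratio):

```python
def resize_saliency(saliency, feat_h, feat_w):
    # saliency: N x 1 x H x W map; target size taken from the image feature map
    _, _, h, w = saliency.shape
    ratio = h // feat_h                       # adjustment ratio, e.g. 256 // 32 = 8
    assert ratio == w // feat_w               # width and height ratios must agree
    return saliency[:, :, ::ratio, ::ratio]   # downsample with step = ratio
```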
Further, in step A20, the pixel values of the pixel points in the multi-channel feature map may all be initialized to a preset value, for example 1 or 0. The channel number is the number of each channel in the multi-channel feature map; channel numbers are natural numbers starting from 0, and adjacent channels have consecutive numbers. It can be understood that, along the channel direction of the multi-channel feature map, the first channel has number 0, the second channel has number 1, and so on, and the last channel has number C−1, where C is the number of channels of the multi-channel feature map. In other words, when the multi-channel feature map has C channels, its channel numbers are 0, 1, ..., C−1. For example, if the multi-channel feature map has C = 4 channels, the channel numbers are 0, 1, 2 and 3.
Further, in an implementation of this embodiment, the adjusting, by the mask module, of the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map specifically includes:
for each pixel point in the channel, the mask module determines a target pixel value corresponding to the pixel point;
and the mask module determines the pixel value of the pixel point according to the target pixel value and the channel number of the channel.
Specifically, the target pixel value is the pixel value of a target pixel point in the saliency feature map whose pixel position corresponds to the pixel position of the pixel point in the channel. It can be understood that, for each pixel point in the channel, its pixel position is obtained, where the pixel position refers to the coordinates of the pixel point within the channel; for example, a pixel point at coordinates (0,0) in channel 0 has pixel position (0,0). After the pixel position is obtained, the pixel point at the corresponding position is selected from the saliency feature map, and its pixel value is taken as the target pixel value for the pixel point. For example, for pixel point a in the channel with pixel position (50,50), the pixel point carrying the target pixel value in the saliency feature map is the one at (50,50).
Further, the pixel values of the pixels in the saliency feature map range over [0,1], the number of channels of the multi-channel feature map is a preset number, and the pixel values of the pixels in each channel of the multi-channel feature map are determined based on the target pixel values and the channel numbers. In order to strengthen the correlation between the pixel values of the saliency feature map and the channel numbers, before adjusting a channel based on its channel number and the saliency feature map, the pixel value of each pixel in the saliency feature map needs to be adjusted to a preset interval. The preset interval is the value range of the adjusted single-channel feature map: its upper limit is determined by the number of channels of the multi-channel feature map, and its lower limit is 0. For example, if the multi-channel feature map has C channels, the preset interval is [0, C], and all pixel values of the adjusted saliency feature map lie in [0, C]. In a specific implementation of this embodiment, when the saliency feature map is a normalized single-channel feature map, the adjustment can be performed by multiplying each pixel value of the single-channel feature map by the number of channels C of the multi-channel feature map: for each pixel point of the saliency feature map, its pixel value is multiplied by C and the product is taken as its new value. In this way, each pixel value of the single-channel feature map is mapped into the preset interval, so the importance of each pixel for each channel of the multi-channel feature map can be determined, and hence the useful information in each channel.
Further, after the target pixel value corresponding to the pixel point is obtained, the pixel value of the pixel point is computed from the target pixel value and the channel number of the channel in which the pixel point is located, and the computed value is taken as the pixel value of the pixel point. In this way, the pixel values of all pixel points in all channels of the multi-channel feature map can be adjusted, and the adjusted multi-channel feature map is taken as the mask feature map. In a specific implementation of this embodiment, the pixel values of the pixel points in each channel of the multi-channel feature map may be calculated as:
$$m_{i,j,k} = \min\left(\max\left(y_{i,j} - k,\ 0\right),\ 1\right), \quad k = 0, 1, \ldots, C-1$$
where k is the channel number of a channel in the multi-channel feature map, k = 0, 1, 2, ..., C−1, and C is the number of channels of the multi-channel feature map; (i, j) is the position of the pixel point within the channel; m_{i,j,k} is the pixel value of the pixel point; and y_{i,j} is the pixel value of the target pixel at position (i, j) in the single-channel feature map (after mapping to [0, C]).
In this embodiment, the value range of the pixel values of the saliency feature map is mapped from [0,1] to [0,C]. Then, according to the above formula, the pixel value of each pixel point in each channel k is determined from the single-channel feature map value y_{i,j} and the channel number k, so that each channel carries different saliency image information. As a result, the channels of the resulting coding feature map also carry different saliency image information, and adaptive coding can be performed according to the saliency image information carried by each channel: channels carrying a large amount of useful information can be allocated a large number of bits, so a large amount of saliency image information is retained in the coded file.
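The per-channel mask computation can be sketched as follows; this follows the clamp-form reconstruction of the formula above, so the exact boundary behavior is an assumption:

```python
import torch

def build_mask(saliency, num_channels):
    # saliency: N x 1 x h x w, values in [0, 1]; first map them to [0, C]
    y = saliency * num_channels                      # y_{i,j} in [0, C]
    k = torch.arange(num_channels,                   # channel numbers 0 .. C-1
                     device=saliency.device).view(1, -1, 1, 1)
    # channel k keeps full information where y - k >= 1, a fractional amount
    # near the boundary, and none where y <= k, so channels with larger k
    # carry progressively less (only the most salient) image information
    return torch.clamp(y - k, 0.0, 1.0)              # N x C x h x w mask feature map
```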
And S30, generating a coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map.
Specifically, the coding feature map is the feature map used for coding; after it is obtained, it can be encoded to obtain the coding file corresponding to the image to be coded. The mask feature map reflects the saliency image information of each channel in the image feature map. Therefore, when the coding feature map is determined from the image feature map and the mask feature map, each channel of the image feature map can be information-filtered through the mask feature map, so that the non-saliency image information carried by each channel of the image feature map is removed.
In addition, information-filtering each channel of the image feature map through the mask feature map means that, for each target channel in the image feature map, a corresponding reference channel is determined, each channel of the image feature map is multiplied element-wise with its reference channel, and the resulting image is taken as the coding feature map. The channel number of the reference channel in the mask feature map is the same as the channel number of the target channel in the image feature map; for each target channel of the image feature map there is exactly one corresponding reference channel in the mask feature map, because the two maps have the same number of channels and the same channel numbering. For example, if the image feature map has C channels numbered 0, 1, ..., C−1, then the mask feature map has C channels numbered 0, 1, ..., C−1; the target channel with number 5 in the image feature map then corresponds to the reference channel with number 5 in the mask feature map. In this embodiment, screening the image feature map with the mask feature map boosts the salient-region information in the coding feature map and filters out non-salient-region information, increasing the amount of salient-region information in the coding feature map.
Further, after the mask feature map is obtained, it can be adjusted so that it retains part of the non-salient-region information while still carrying the salient-region information. The coding feature map then contains the salient-region information together with part of the non-salient-region information, so that a reconstructed image obtained from the corresponding coded file can carry the image content of the non-salient region while the image details of the salient region are still improved. Based on this, in an implementation of this embodiment, the adjustment of the mask feature map may specifically include: for each channel of the mask feature map, add a first preset value to the pixel value of each pixel point, then divide the sum by a second preset value, and take the quotient as the new pixel value, yielding the adjusted mask feature map. In the adjusted mask feature map, every first pixel value (representing the salient region) is larger than all second pixel values (representing the non-salient region), and the second pixel values are non-zero. Thus, in the coding feature map determined from the mask feature map and the image feature map, the salient-region content is highlighted (more information is retained, that is, more bits are allocated) while the non-salient-region content is preserved (less information is retained, that is, fewer bits are allocated), so the decoded image is complete and the details and textures of the salient region are enhanced. In an implementation of this embodiment, the first preset value may be 1 and the second preset value may be 2.
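A sketch of generating the coding feature map with this optional adjustment (the function and flag names are assumptions; the first preset value 1 and second preset value 2 are taken from the text):

```python
def encode_features(image_features, mask, keep_background=True):
    # image_features and mask: N x C x h x w with the same channel numbering
    if keep_background:
        # (m + 1) / 2: salient pixels stay dominant while non-salient pixels
        # keep a non-zero weight, so background content survives coding
        mask = (mask + 1.0) / 2.0
    return image_features * mask       # element-wise, channel-by-channel filtering
```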
Further, in order to reduce the data amount of the coding feature map, after the image coding model obtains the image feature map and the mask feature map, the coding feature map may be quantized, and the final coding feature map obtained from the quantized feature map. Accordingly, in one implementation of this embodiment, as shown in fig. 2, the image coding method is applied to an image coding model; the image coding model comprises a quantization module, and after the coding feature map corresponding to the image to be coded is generated based on the image feature map and the mask feature map, the method includes:
and the quantization module generates a quantization feature map of the image to be coded based on the coding feature map, and takes the quantization feature map as the coding feature map.
Specifically, quantization divides the value range of the pixel points of the coding feature map into several intervals and sets all values within an interval to the same value. Any existing image quantization method can be used. In an implementation of this embodiment, the coding feature map is quantized by cluster quantization, which may proceed as follows: given the cluster quantization centers, the distance between each pixel point of the coding feature map and each quantization center is computed, and the center with the minimum distance among all the obtained distances is taken as the quantized value, where the assignment of a pixel point to a quantization center can be written as:
$$Q(\mathrm{input\_x}_i) := \arg\min_j \lVert \mathrm{input\_x}_i - c_j \rVert$$
where input_x_i denotes the i-th element of the input coding feature map, c_j denotes the j-th component of the set of cluster quantization centers C = {c_1, c_2, ..., c_L}, j ∈ [1, L], and L is a positive integer.
Further, in order to keep the error backpropagation well-defined, the distances between the pixel points and the quantization centers are first subjected to soft quantization processing and then to hard quantization processing. The soft quantization processing is:
$$\mathrm{soft\_Q}(\mathrm{input\_x}_i) = \sum_{j=1}^{L} \frac{\exp\left(-\sigma \lVert \mathrm{input\_x}_i - c_j \rVert\right)}{\sum_{l=1}^{L} \exp\left(-\sigma \lVert \mathrm{input\_x}_i - c_l \rVert\right)} \, c_j$$

where σ is a softness (temperature) coefficient.
the processing procedure of the hard quantization processing is as follows:
$$\mathrm{stop\_gradient}\left( Q(\mathrm{input\_x}_i) - \mathrm{soft\_Q}(\mathrm{input\_x}_i) \right) + \mathrm{soft\_Q}(\mathrm{input\_x}_i)$$
where stop_gradient(·) stops the gradient computation, so that the hard-quantized value is used in the forward pass while gradients flow through the soft quantization in the backward pass.
In addition, after the quantization processing, each distance obtained by quantization is rounded, a quantization value is determined according to each rounded distance, and finally the coding feature map is quantized according to the quantization values to obtain the quantized coding feature map.
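A compact sketch of this cluster quantization with the soft/hard (stop-gradient) combination; the squared-distance form, the `sigma` parameter and all names are assumptions following common soft-quantization practice:

```python
import torch

def quantize(x, centers, sigma=1.0):
    # x: tensor of coding-feature values; centers: 1-D tensor {c_1, ..., c_L}
    d = (x.unsqueeze(-1) - centers) ** 2        # squared distance to each center
    hard = centers[d.argmin(dim=-1)]            # Q(x): nearest quantization center
    w = torch.softmax(-sigma * d, dim=-1)       # soft assignment weights
    soft = (w * centers).sum(dim=-1)            # soft_Q(x)
    # straight-through: forward pass uses hard values, gradients flow via soft_Q
    return (hard - soft).detach() + soft
```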
And S40, obtaining the coding file corresponding to the image to be coded based on the coding feature map.
Specifically, the coded file is obtained by encoding the coding feature map. Entropy coding may be used, which losslessly compresses the coding feature map to obtain the coded file; various existing entropy coding methods can be used, such as Huffman coding or arithmetic coding. It should be noted that when entropy coding the coding feature map, an adaptive coding strategy based on the amount of saliency image information carried by the channels is adopted: the bits allocated to a channel are determined during coding by the amount of saliency image information that channel carries. The bits allocated to a channel are positively correlated with this amount, i.e., the more saliency image information a channel carries, the more bits it is allocated; conversely, the less saliency image information it carries, the fewer bits it is allocated.
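To illustrate the Huffman option named above, here is a self-contained sketch that builds a codebook for one channel's quantized symbols; it is a generic Huffman implementation, not the patent's coder, and the per-channel usage is an assumption:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    # build a Huffman codebook for one channel's quantized symbols
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate single-symbol channel
        return {next(iter(freq)): "0"}
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)                           # tiebreaker so dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, uid, merged))
        uid += 1
    return heap[0][2]

# usage sketch: a channel dominated by one symbol compresses to few bits,
# while a channel with richer (more salient) content occupies more bits
channel = [0, 0, 1, 0, 2, 0, 0, 1]
codebook = huffman_code(channel)
bits = "".join(codebook[s] for s in channel)
```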
Based on the above image encoding method, this embodiment may further provide an image decoding method, where the image decoding method is applied to an image decoding model, as shown in fig. 4, and the image decoding method includes:
the image decoding module determines a reconstructed image corresponding to the coding file based on the coding file, wherein the coding file is obtained by coding based on the coding method.
Specifically, as shown in fig. 5, the image decoding model includes a first convolution module a, a second convolution module a, and a reconstruction module. The first convolution module a includes a first, a second and a third convolution unit:
    • First convolution unit: 3 × 3 convolution kernels, 480 kernels, stride 1, padding 1; InstanceNorm normalization; ReLU nonlinearity.
    • Second convolution unit: 3 × 3 convolution kernels, 960 kernels, stride 1, padding 1; InstanceNorm normalization; ReLU nonlinearity.
    • Third convolution unit: 3 × 3 convolution kernels, 960 kernels, stride 1, padding 1; InstanceNorm normalization; ReLU nonlinearity.
The second convolution module a includes 9 residual blocks. As shown in fig. 6, each residual block includes a fourth and a fifth convolution unit. In the fourth convolution unit, the convolution kernels are 3 × 3, the number of kernels is 960, the stride is 1 and the padding is 1; the normalization operation is InstanceNorm and the nonlinearity is ReLU. In the fifth convolution unit, the convolution kernels are 3 × 3, the number of kernels is 960, the stride is 1 and the padding is 1; the normalization operation is InstanceNorm. After the fifth convolution unit outputs its feature map, the input of the fourth convolution unit and the output of the fifth convolution unit are added through a short-circuit (skip) connection to obtain the output of the residual block.
The reconstruction module comprises four cascaded upsampling modules and a sixth convolution unit. As shown in fig. 7, each upsampling module includes an upsampling unit, in which 2× bilinear interpolation upsampling is performed (the number of convolution kernels corresponding to the upsampling is 480), and a seventh convolution unit. In the seventh convolution unit, the convolution kernel is 3 × 3, the number of convolution kernels is 480, the step size is 1, the normalization operation is an InstanceNorm operation, and the nonlinear transformation function is a ReLU function. In the sixth convolution unit, the convolution kernel is 7 × 7, the number of convolution kernels is 3, the step size is 1, and the padding is 3 pixels.
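The reconstruction module could then be sketched as below, again assuming PyTorch; the channel widths between stages (960 into the first upsampling module, 480 thereafter) and the 1-pixel padding of the seventh unit are inferences from the unit descriptions rather than explicit statements:

```python
import torch.nn as nn

def upsample_module(in_ch, out_ch=480):
    # 2x bilinear interpolation upsampling followed by the seventh
    # convolution unit (3x3, 480 kernels, stride 1, InstanceNorm, ReLU;
    # 1-pixel padding assumed to preserve spatial size).
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

reconstruction = nn.Sequential(
    upsample_module(960),
    upsample_module(480),
    upsample_module(480),
    upsample_module(480),
    # Sixth convolution unit: 7x7, 3 kernels, stride 1, 3-pixel padding,
    # mapping back to a 3-channel (RGB) image.
    nn.Conv2d(480, 3, kernel_size=7, stride=1, padding=3),
)
```

Chaining first_conv_module_a, second_conv_module_a, and reconstruction would map a decoded feature map back to a full-resolution image; the four 2× stages give a 16× upscale, which matches a 16×-downsampling encoder under these assumptions.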
In summary, the present embodiment provides an image encoding method that obtains, through an image encoding model, an image feature map and a saliency feature map corresponding to the image to be encoded; determines a mask feature map for the image based on the saliency feature map; generates a coding feature map from the image feature map and the mask feature map; and finally obtains the encoded file from the coding feature map. Because the mask feature map is derived from the saliency feature map, and the image information each channel of the coding feature map retains is determined by the mask feature map, the channels of the coding feature map carry different amounts of saliency image information. Different bits can therefore be allocated to different channels during encoding, so that channels containing more saliency information occupy more bits, which preserves more of the salient image content in the encoded file and improves the quality of the image reconstructed from it. For example, comparing the first reconstructed image in fig. 8, generated from an encoded file produced by the encoding method of this embodiment, with the second reconstructed image in fig. 9, generated from an encoded file produced by equal-bit-width encoding of the image feature map directly, the image details of the parrot region (the salient region) in the first reconstructed image are better than those in the second.
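To make the masking step of this summary concrete, the sketch below shows one hypothetical way the mask feature map could gate the image feature map channel by channel. The threshold rule (channel c keeps a pixel when the normalized saliency there is at least c/C) is purely illustrative: the patent states only that each mask pixel value is determined from the channel number and the corresponding saliency value.

```python
import torch

def build_mask_feature_map(saliency, num_channels):
    # Hypothetical rule: normalize the single-channel saliency map to
    # [0, 1], then let channel c pass pixels whose saliency >= c / C.
    s = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    thresholds = torch.arange(num_channels).view(-1, 1, 1) / num_channels
    return (s.unsqueeze(0) >= thresholds).float()        # shape (C, H, W)

def encode_feature_map(image_fm, saliency):
    # Coding feature map = image feature map gated channel-wise by the
    # mask, so salient content survives in more channels and can later
    # receive more bits during entropy coding.
    mask = build_mask_feature_map(saliency, image_fm.shape[0])
    return image_fm * mask

image_fm = torch.randn(960, 16, 16)   # toy image feature map
saliency = torch.rand(16, 16)         # single-channel saliency map, resized to match
coding_fm = encode_feature_map(image_fm, saliency)
```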
Based on the above-described image encoding method, the present embodiment provides a computer-readable storage medium storing one or more programs, executable by one or more processors, to implement the steps of the image encoding method described in the above embodiment.
Based on the above image encoding method, the present application also provides a terminal device, as shown in fig. 10, which includes at least one processor 20, a display screen 21, and a memory 22, and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22, and the communication interface 23 communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product.
As a computer-readable storage medium, the memory 22 may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs functional applications and data processing, i.e., implements the methods in the above-described embodiments, by running the software programs, instructions, or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example any medium that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a transient storage medium.
In addition, the specific processes loaded and executed by the storage medium and by the instruction processors in the terminal device are described in detail in the method above and are not restated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An image encoding method, characterized in that the method comprises:
acquiring an image feature map and a saliency feature map corresponding to an image to be coded;
determining a mask feature map corresponding to the image to be coded based on the saliency feature map;
generating a coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map;
and obtaining a coding file corresponding to the image to be coded based on the coding feature map.
2. The image encoding method according to claim 1, wherein the saliency feature map includes foreground feature information of the image to be encoded.
3. The image encoding method according to claim 1, wherein each channel in the mask feature map corresponds to each channel in the saliency feature map in a one-to-one manner, and at least two channels having different amounts of image information exist in the mask feature map.
4. The image encoding method according to any one of claims 1 to 3, wherein the image encoding method is applied to an image encoding model; the image encoding model comprises a feature extraction module and a saliency extraction module, and acquiring the image feature map and the saliency feature map corresponding to the image to be coded specifically comprises:
the feature extraction module acquires an image feature map of the image to be coded;
the saliency extraction module acquires a saliency feature map of the image to be coded.
5. The image encoding method according to any one of claims 1 to 3, wherein the image encoding method is applied to an image encoding model; the image encoding model comprises a mask module, and the saliency feature map is a single-channel feature map; determining the mask feature map corresponding to the image to be coded based on the saliency feature map specifically includes:
the mask module determines a multi-channel feature map, wherein the image size of the multi-channel feature map is the same as the image size of the saliency feature map;
for each channel in the multi-channel feature map, the mask module adjusts the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map;
and taking the adjusted multichannel feature map as the mask feature map.
6. The image encoding method of claim 5, wherein the adjusting, by the mask module, of the pixel value of each pixel point in the channel based on the channel number of the channel and the saliency feature map specifically comprises:
for each pixel point in the channel, the mask module determines a target pixel value corresponding to the pixel point, wherein the target pixel value is the pixel value of a target pixel point whose pixel position in the saliency feature map corresponds to the pixel position of the pixel point in the channel;
and the mask module determines the pixel value of the pixel point according to the target pixel value and the channel number of the channel.
7. The image encoding method of claim 6, wherein before the mask module determines a multi-channel feature map, the method further comprises:
the mask module adjusts the image size of the saliency feature map and takes the adjusted saliency feature map as the saliency feature map, wherein the image size of the adjusted saliency feature map is the same as the image size of the image feature map.
8. The image encoding method according to any one of claims 1 to 3, wherein the image encoding method is applied to an image encoding model; the image encoding model comprises a quantization module; after generating the coding feature map corresponding to the image to be coded based on the image feature map and the mask feature map, the method further includes:
and the quantization module generates a quantization feature map of the image to be coded based on the coding feature map, and takes the quantization feature map as the coding feature map.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps in the image encoding method according to any one of claims 1 to 8.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the image encoding method of any of claims 1-8.
CN202010706682.3A 2020-07-21 2020-07-21 Image coding method, storage medium and terminal equipment Pending CN113965755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010706682.3A CN113965755A (en) 2020-07-21 2020-07-21 Image coding method, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN113965755A (en) 2022-01-21

Family

ID=79459867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010706682.3A Pending CN113965755A (en) 2020-07-21 2020-07-21 Image coding method, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN113965755A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination