CN114501012A - Image filtering, coding and decoding method and related equipment


Info

Publication number: CN114501012A
Authority: CN (China)
Prior art keywords: block, filtering, image, reconstruction, features
Legal status: Granted
Application number: CN202111664167.4A
Other languages: Chinese (zh)
Other versions: CN114501012B
Inventors: 张雪, 江东, 林聚财, 殷俊
Current assignee: Zhejiang Dahua Technology Co Ltd
Original assignee: Zhejiang Dahua Technology Co Ltd

Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202111664167.4A
Publication of CN114501012A; application granted; publication of CN114501012B
Legal status: Active

Classifications

    • H04N19/117: Filters, e.g. for pre-processing or post-processing (coding/decoding of digital video signals using adaptive coding)
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods


Abstract

The invention discloses an image filtering method, encoding and decoding methods, and related devices. The method comprises the following steps: performing multi-level down-sampling on a reconstruction block to obtain condensed features of the reconstruction block, where each down-sampling level separately down-samples a main feature and an auxiliary feature of the same size, and the main feature being sampled is the fusion feature of the main feature and the auxiliary feature obtained by the previous down-sampling level; and performing multi-level up-sampling on the condensed features to obtain a filtering block of the reconstruction block, where each up-sampling level separately up-samples a main feature and an auxiliary feature of the same size, the main feature being sampled is the fusion feature of the main feature and the auxiliary feature obtained by the previous up-sampling level together with the down-sampled main feature of corresponding size, and the filtering block is the main feature obtained by the last up-sampling level. In this way, the method optimizes the feature extraction capability and helps improve the filtering effect.

Description

Image filtering, coding and decoding method and related equipment
Technical Field
The present invention relates to the field of video coding technologies, and in particular, to an image filtering method, an encoding method, a decoding method, an encoder, a decoder, and a computer-readable storage medium.
Background
Video image data is voluminous, so video pixel data usually needs to be compressed into a video bit stream, which is transmitted to a user terminal over a wired or wireless network and then decoded for viewing. The overall video coding process comprises block division, prediction, transformation, quantization, entropy coding, and other steps, and may also filter the video pixel data to make the image look more natural. However, existing image filtering methods are relatively simple: the image input is typically just convolved and activated, the filtering effect is poor, and there remains considerable room for improvement in feature extraction capability.
Disclosure of Invention
In view of the above, the technical problem mainly solved by the present invention is to provide an image filtering method, an encoding method, a decoding method, an encoder, a decoder, and a computer-readable storage medium that can optimize the feature extraction capability and help improve the filtering effect.
To solve the technical problem, the invention adopts a technical scheme of providing an image filtering method. The image filtering method includes: acquiring a reconstruction block of a current coding block; performing multi-level down-sampling on the reconstruction block to obtain condensed features of the reconstruction block, where each down-sampling level separately down-samples a main feature and an auxiliary feature of the same size, the main feature being sampled is the fusion feature of the main feature and the auxiliary feature obtained by the previous down-sampling level, and the auxiliary feature being sampled is the auxiliary feature obtained by the previous down-sampling level; and performing multi-level up-sampling on the condensed features to obtain a filtering block of the reconstruction block, where each up-sampling level separately up-samples a main feature and an auxiliary feature of the same size, the main feature being sampled is the fusion feature of the main feature and auxiliary feature obtained by the previous up-sampling level and the down-sampled main feature of corresponding size, the auxiliary feature being sampled is the auxiliary feature obtained by the previous up-sampling level, and the filtering block is the main feature obtained by the last up-sampling level.
In an embodiment of the present invention, the reconstruction block is input into a neural network that performs the sampling processing on it, and the neural network includes a main branch network, an auxiliary branch network, and connection units. The main branch network comprises a plurality of connected main down-sampling units and main up-sampling units, which perform the multi-level down-sampling and multi-level up-sampling of the main features. The auxiliary branch network comprises a plurality of connected auxiliary down-sampling units and/or auxiliary up-sampling units, which perform the multi-level down-sampling and/or multi-level up-sampling of the auxiliary features. The connection units comprise a plurality of first connection units and a plurality of second connection units; the first connection units connect corresponding auxiliary down-sampling units with main down-sampling units and corresponding auxiliary up-sampling units with main up-sampling units, while the second connection units connect corresponding main down-sampling units with main up-sampling units.
In an embodiment of the present invention, a plurality of neural networks each filter the reconstruction block; each neural network includes a main branch network and an auxiliary branch network, and the main branch networks/auxiliary branch networks of different neural networks have different structures. The method includes: obtaining a plurality of basic filtering blocks of the reconstruction block using the plurality of neural networks respectively; and performing weighted fusion on the plurality of basic filtering blocks to obtain the filtering block of the current coding block.
In an embodiment of the invention, an image mask of the current coding block is obtained, where the image mask identifies the degree to which each pixel in the current coding block is to be filtered; the filtering block and the image mask are then fused to obtain a fused filtering block of the current coding block.
In an embodiment of the invention, the size of the image mask is the same as the size of the filtering block; the mask value corresponding to each pixel in the image mask is either an identification value indicating whether filtering is required, or a degree value indicating the degree to which filtering is to be applied.
In one embodiment of the invention, a pre-constructed mask label is used to guide the neural network in generating the image mask. Constructing the mask label comprises: acquiring the pixel values of the image block in the original image that corresponds to the current coding block; comparing those pixel values with the pixel values of the basic reconstruction block of the current coding block; and assigning the mask label according to the pixel-value difference between the two.
In an embodiment of the present invention, the reconstruction block includes a base reconstruction block or a fused reconstruction block, the fused reconstruction block is a reconstruction block obtained by fusing the base reconstruction block and side information, and the side information includes intermediate information obtained in a process of encoding and decoding a current coding block.
In an embodiment of the present invention, the side information includes one or more of prediction information, residual information, and partition information of the current coding block, the partition information includes a partition boundary of sub-blocks obtained by partitioning, and the sub-blocks are obtained by partitioning a reconstructed block, a prediction block, or a residual block of the current coding block.
In an embodiment of the invention, the pixels of a sub-block are assigned using a pixel statistic, where the pixel statistic is any one of the mean, maximum, minimum, most frequently occurring pixel value (mode), or median of the pixel values in the sub-block; or the pixels of a sub-block are assigned according to their positions, so that the pixels on the sub-block boundary share one pixel value and the pixels at the remaining positions share another; or the pixels of a sub-block are assigned according to their distance from the sub-block boundary.
In an embodiment of the present invention, the reconstructed pixels of the current coding block include luminance component pixels, first chrominance component pixels, and second chrominance component pixels. The fused reconstruction block of the luminance component is obtained by fusing the basic reconstruction block of the luminance component with the side information of the luminance component; or the fused reconstruction block of the first chrominance component is obtained by fusing the basic reconstruction block of the first chrominance component with the side information of the first chrominance component; or the fused reconstruction block of the second chrominance component is obtained by fusing the basic reconstruction block of the second chrominance component with the side information of the second chrominance component; or at least one of the fused reconstruction blocks of the first and second chrominance components is fused with the fused reconstruction block of the luminance component to obtain a comprehensive reconstruction block, which then serves as the fused reconstruction block of the luminance component/first chrominance component/second chrominance component.
In order to solve the technical problem, the invention adopts another technical scheme that: a decoding method is provided. The decoding method comprises the following steps: acquiring coded data; decoding the encoded data to obtain a first decoded image; the first decoded image is filtered by using the image filtering method as set forth in any of the above embodiments.
In order to solve the technical problem, the invention adopts another technical scheme that: an encoding method is provided. The encoding method comprises the following steps: acquiring a reconstructed pixel value of a current coding block; carrying out filtering processing on the reconstructed pixel value by using an image filtering method as set forth in any one of the above embodiments; and coding the current coding block based on the reconstructed pixel value after filtering processing to generate coded data.
In order to solve the technical problem, the invention adopts another technical scheme that: a decoder is provided. The decoder comprises a processor for executing instructions to implement an image filtering method as set forth in any of the above embodiments, or a decoding method as set forth in the above embodiments.
In order to solve the technical problem, the invention adopts another technical scheme that: an encoder is provided. The encoder comprises a processor for executing instructions to implement an image filtering method as set forth in any of the above embodiments, or an encoding method as set forth in the above embodiments.
In order to solve the technical problem, the invention adopts another technical scheme that: a computer-readable storage medium is provided. The computer readable storage medium is used for storing instructions/program data that can be executed to implement an image filtering method as set forth in any one of the above embodiments, or a decoding method as set forth in the above embodiments, or an encoding method as set forth in the above embodiments.
The beneficial effects of the invention are as follows. Unlike the prior art, the image filtering method performs multi-level down-sampling on the reconstruction block of the current coding block to obtain its semantic information. At each down-sampling level, a main feature and an auxiliary feature of the same size are down-sampled separately, and the main feature being sampled is the fusion feature of the main feature and the auxiliary feature obtained by the previous level, so more features and related information are fused during down-sampling, yielding the condensed features of the reconstruction block. The condensed features are then up-sampled over multiple levels; at each level, size-matched main and auxiliary features are up-sampled, and the main feature being sampled is the fusion of the main feature and auxiliary feature obtained by the previous up-sampling level with the down-sampled main feature of corresponding size. In other words, size-matched features are fused before sampling, which optimizes the feature extraction capability, obtains sufficient semantic information, and helps improve the filtering effect.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for those skilled in the art with reference to specific embodiments.
FIG. 1 is a flow chart illustrating an embodiment of an image filtering method according to the present invention;
FIG. 2 is a flow chart illustrating an embodiment of filtering performed by the neural network of the present invention;
FIG. 3 is a schematic diagram of a neural network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another embodiment of a neural network of the present invention;
FIGS. 5a-5b are schematic flow charts illustrating assignment of filter blocks by image masks according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating filtering performed by multiple neural networks according to an embodiment of the present invention;
FIGS. 7a-7b are schematic structural diagrams illustrating an embodiment of partition information block assignment according to the present invention;
FIGS. 8a-8f are schematic flow charts illustrating one embodiment of filtering component pixel values according to the present invention;
FIG. 9 is a flowchart illustrating an embodiment of the encoding method of the present invention;
FIG. 10 is a flow chart of another embodiment of the encoding method of the present invention;
FIG. 11 is a flowchart illustrating an embodiment of a decoding method according to the present invention;
FIG. 12 is a schematic diagram of an embodiment of an encoder of the present invention;
FIG. 13 is a block diagram of an embodiment of a decoder according to the present invention;
FIG. 14 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
To solve the technical problem of the poor image filtering effect in the prior art, the invention provides an image filtering method, an encoding method, a decoding method, a codec, and a computer-readable storage medium. The image filtering method improves the image filtering effect and comprises: acquiring a reconstruction block of a current coding block; performing multi-level down-sampling on the reconstruction block to obtain condensed features of the reconstruction block, where each down-sampling level separately down-samples a main feature and an auxiliary feature of the same size, the main feature being sampled is the fusion feature of the main feature and the auxiliary feature obtained by the previous level, and the auxiliary feature being sampled is the auxiliary feature obtained by the previous level; and performing multi-level up-sampling on the condensed features to obtain a filtering block of the reconstruction block, where each up-sampling level separately up-samples a main feature and an auxiliary feature of the same size, the main feature being sampled is the fusion feature of the main feature and auxiliary feature obtained by the previous up-sampling level and the down-sampled main feature of corresponding size, the auxiliary feature being sampled is the auxiliary feature obtained by the previous up-sampling level, and the filtering block is the main feature obtained by the last up-sampling level. The image filtering method, encoding method, decoding method, codec, and computer-readable storage medium of the present invention are explained in detail below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an image filtering method according to an embodiment of the present invention.
It should be noted that the image filtering method set forth in this embodiment is not limited to the following steps:
S101: Acquire a reconstruction block of the current coding block.
In this embodiment, the reconstructed block of the current coding block is obtained through intra-frame prediction/inter-frame prediction, transformation, quantization, inverse transformation, inverse quantization, and the like.
S102: and performing multi-stage down-sampling on the reconstruction block to obtain the abbreviated features of the reconstruction block, wherein the down-sampling of each stage is performed on the main features and the auxiliary features with the same size, the sampled main features are the fusion features of the main features and the auxiliary features obtained by the up-stage down-sampling, and the sampled auxiliary features are the auxiliary features obtained by the up-stage down-sampling.
In this embodiment, when the reconstruction block is down-sampled, the auxiliary feature and the main feature of the same size are fused into a fusion feature, and the fusion feature is down-sampled to obtain the next-level main feature. This enriches the semantic information of the main feature: the down-sampled main feature contains the semantic information of both the previous-level main feature and the auxiliary feature, which helps optimize the feature extraction capability. Meanwhile, the auxiliary feature is down-sampled to obtain the next-level auxiliary feature, which will be fused with the main feature of the same size to further enrich the semantic information of the main feature. The resulting condensed features therefore contain rich semantic information, which optimizes the feature extraction capability and helps improve the filtering effect.
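For illustration only, one down-sampling level of this scheme can be sketched as follows in PyTorch, assuming that the fusion is a channel concatenation and that the 2:1 down-sampling is a stride-2 convolution (both are options named later in this description); the class name and channel counts are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DownLevel(nn.Module):
    """One down-sampling level: fuse same-size main/auxiliary features,
    then down-sample the fusion feature and the auxiliary feature."""
    def __init__(self, main_ch, aux_ch):
        super().__init__()
        # stride-2 convolutions halve the spatial size (2:1 down-sampling)
        self.main_down = nn.Conv2d(main_ch + aux_ch, main_ch, 3, stride=2, padding=1)
        self.aux_down = nn.Conv2d(aux_ch, aux_ch, 3, stride=2, padding=1)

    def forward(self, main_feat, aux_feat):
        fused = torch.cat([main_feat, aux_feat], dim=1)  # fusion feature
        return self.main_down(fused), self.aux_down(aux_feat)
```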
S103: and performing multi-level upsampling on the abbreviated features to obtain a filtering block of a reconstruction block, wherein the upsampling is performed on the main feature and the auxiliary feature which have the same size in each level of upsampling, the sampled main feature is the main feature and the auxiliary feature obtained by the upsampling of the previous level and the fusion feature of the downsampling main feature with the corresponding size, the sampled auxiliary feature is the auxiliary feature obtained by the upsampling of the previous level, and the filtering block is the main feature obtained by the upsampling of the last level.
In this embodiment, after the reconstruction block is down-sampled over multiple levels to obtain its condensed features, the condensed features are up-sampled over multiple levels to obtain the filtering block of the reconstruction block, whose size is the same as that of the reconstruction block. When the main feature is up-sampled, the main feature obtained by the previous up-sampling level, the auxiliary feature of the same size, and the same-size main feature from the down-sampling process are fused into a fusion feature, and the fusion feature is up-sampled to obtain the next-level main feature. For the first up-sampling of the main feature, the fused previous-level main feature is the main feature produced by the last down-sampling level. Meanwhile, the auxiliary feature is up-sampled to obtain the next-level auxiliary feature, which is fused with the main feature of the same size to enrich the information contained in the main feature, thereby optimizing the feature extraction capability and helping improve the filtering effect.
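A companion sketch of one up-sampling level under the same assumptions: the previous-level main feature, the same-size auxiliary feature, and the same-size main feature saved from the down-sampling pass (the skip connection) are concatenated and up-sampled with a stride-2 transposed convolution. Again, all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class UpLevel(nn.Module):
    """One up-sampling level: fuse the previous main feature, the same-size
    auxiliary feature, and the same-size down-sampling skip feature."""
    def __init__(self, main_ch, aux_ch, skip_ch):
        super().__init__()
        # stride-2 transposed convolutions double the spatial size
        self.main_up = nn.ConvTranspose2d(main_ch + aux_ch + skip_ch, main_ch,
                                          4, stride=2, padding=1)
        self.aux_up = nn.ConvTranspose2d(aux_ch, aux_ch, 4, stride=2, padding=1)

    def forward(self, main_feat, aux_feat, skip_feat):
        fused = torch.cat([main_feat, aux_feat, skip_feat], dim=1)
        return self.main_up(fused), self.aux_up(aux_feat)
```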
In this embodiment, the number of the auxiliary features used for fusing with the main feature may be one or more, and is not limited herein.
In this way, multi-level down-sampling of the reconstruction block of the current coding block extracts the semantic information of the reconstruction block, and fusing auxiliary features of the same size during down-sampling enriches the information contained in the condensed features. During the multi-level up-sampling of the condensed features, the main feature obtained by the previous up-sampling level, the auxiliary feature of the same size, and the same-size main feature from the down-sampling process are combined, which removes quantization loss and combines low-level image texture details with high-level semantic information. The image can thus be recovered better and its quality improved: the information contained in the combined features is enriched, the feature extraction capability is optimized, and the filtering effect is improved.
Referring to fig. 2 and fig. 3 in combination, fig. 2 is a schematic flow chart of an embodiment of filtering performed by the neural network of the present invention, and fig. 3 is a schematic structural diagram of an embodiment of the neural network of the present invention.
In an embodiment, the reconstruction block may be input into a neural network, which performs the sampling processing on it; the sampling processing includes the up-sampling and down-sampling explained in the above embodiments.
Specifically, the neural network includes a main branch network, an auxiliary branch network, and connection units. The main branch network comprises a plurality of connected main down-sampling units and main up-sampling units: each main down-sampling unit down-samples the fusion feature of the previous-level main feature and the auxiliary feature of the same size, and each main up-sampling unit up-samples the fusion feature of the previous-level main feature, the auxiliary feature of the same size, and the same-size main feature from the down-sampling process. That is, the main down-sampling units down-sample the main features over multiple levels, and the main up-sampling units up-sample the main features over multiple levels, so as to filter the main features.
The auxiliary branch network comprises a plurality of connected auxiliary down-sampling units and auxiliary up-sampling units. The auxiliary down-sampling units perform multi-level down-sampling on the auxiliary features to obtain auxiliary features of the same size as the main features, which are fused with the same-size main features to enrich the semantic information of the main features.
The network structure of the auxiliary branch network may be the same as or different from that of the main branch network; the difference is that the output of the auxiliary branch network is fed into the main branch network rather than output from the neural network independently. The input of the auxiliary branch network may be the same as the input of the main branch network, i.e., both receive the reconstruction block; alternatively, the output of a particular main down-sampling unit/convolution layer in the main branch network may serve as the input of the auxiliary branch network, which is not limited herein.
The connection units comprise a plurality of first connection units and a plurality of second connection units. The first connection units connect corresponding auxiliary down-sampling units with main down-sampling units and corresponding auxiliary up-sampling units with main up-sampling units, and the second connection units connect corresponding main down-sampling units with main up-sampling units, so that auxiliary features can be fused with main features of the same size through the connection units. Taking an auxiliary down-sampling unit and a main down-sampling unit as an example, "corresponding" means that the main feature output by the main down-sampling unit has the same size as the auxiliary feature output by the auxiliary down-sampling unit.
The connection unit may also be a module that does not physically exist, merely denoting the operation that connects the main branch network with the auxiliary branch network, or the main down-sampling units with the main up-sampling units of the main branch network.
For example, when the number of feature-map channels, the width, and the height of an auxiliary feature in the auxiliary branch network are the same as those of a main feature in the main branch network, the connection unit may add or multiply the two feature maps point by point and use the result as the input of the next operation of the main branch network (which may be up-sampling or down-sampling). Alternatively, when only the width and height of an auxiliary feature match those of a main feature, the channels of the main feature and the auxiliary feature may be spliced, i.e., the number of feature channels of the fusion feature is the number of main-branch feature channels plus the number of auxiliary-branch feature channels, and the resulting fusion feature is used as the input for the next processing step of the main branch network.
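The three fusion options just listed can be written compactly as follows; this is a sketch for (N, C, H, W) feature maps, and the mode names are hypothetical.

```python
import torch

def fuse(main_feat, aux_feat, mode="concat"):
    if mode == "add":   # same C, H, W: point-by-point addition
        return main_feat + aux_feat
    if mode == "mul":   # same C, H, W: point-by-point multiplication
        return main_feat * aux_feat
    # same H, W only: channel splicing; the fused channel count equals the
    # main-branch channel count plus the auxiliary-branch channel count
    return torch.cat([main_feat, aux_feat], dim=1)
```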
As shown in fig. 2, taking the down-sampling of the main feature and the auxiliary feature as an example, 2:1 down-sampling may be implemented by a convolution layer whose stride (step size) is 2, by a conventional interpolation filter, or in other ways, which are not limited herein.
It should be noted that the auxiliary down-sampling units and auxiliary up-sampling units of the auxiliary branch network need not be identical to the main down-sampling units and main up-sampling units of the main branch network. For example, an auxiliary down-sampling unit of the auxiliary branch network may down-sample at a 4:1 ratio to obtain an auxiliary feature, while a main down-sampling unit of the main branch network down-samples at a 2:1 ratio to obtain a main feature, and an auxiliary feature is fused with the main feature of the same size. Put plainly, if the current-level main feature has the same size as an auxiliary feature, then after two levels of main down-sampling the main feature again has the same size as the once-down-sampled auxiliary feature, and the two are fused.
In an alternative embodiment, as shown in fig. 3, the auxiliary branch network may contain no auxiliary up-sampling unit, i.e., the auxiliary branch network consists of several convolution layers and auxiliary down-sampling units. The auxiliary features are then fused with the main features only during the down-sampling of the main features, which still helps optimize the feature extraction capability and improve the filtering effect.
In another alternative embodiment, not shown in the figures, the auxiliary branch network may contain no auxiliary down-sampling unit, i.e., the auxiliary branch network consists of several convolution layers and auxiliary up-sampling units, so that auxiliary features obtained by up-sampling the input of the auxiliary branch network can be fused with the main features during up-sampling. The input of the auxiliary branch network may be the main feature output after all the down-sampling levels in the main branch network, the main feature output after part of the down-sampling levels, or an input obtained in another way, which is not limited herein.
Referring to fig. 4 and fig. 5a-5b in combination, fig. 4 is a schematic structural diagram of another embodiment of the neural network of the present invention, and fig. 5a-5b are schematic flow charts of an embodiment of assigning values to filter blocks by using an image mask according to the present invention.
In an embodiment, besides the main branch network and the auxiliary branch network, the neural network further includes an attention branch network. The attention branch network includes an attention layer and a plurality of convolution layers; the convolution layers connect the input of the attention branch network with the attention layer and are used to generate an image mask (Mask), which predicts whether each pixel in the output filtering block needs to be filtered, further improving the visual effect of the image. In alternative embodiments, the image mask may be obtained in other ways, which are not limited herein.
Specifically, an image mask of the current coding block is obtained, where the image mask identifies the degree to which each pixel in the current coding block is to be filtered, and the filtering block and the image mask are fused to obtain a fused filtering block of the current coding block. That is, an image mask of the same size as the filtering block is output along with the filtering block, and the value corresponding to each pixel in the image mask indicates whether that pixel needs to be filtered and to what degree, so the filtering block can be fused with the image mask to obtain a fused filtering block, which serves as the output of the neural network.
The image mask of the current coding block may be generated by guiding the neural network with a pre-constructed mask label. Optionally, the mask label may be obtained as follows: acquire the pixel values of the image block in the original image that corresponds to the current coding block, where the original image is the image before encoding rather than a reconstructed image obtained through intra-frame/inter-frame prediction, so that the pixel values of the fused filtering block can be brought close to those of the original image; compare the pixel values of the image block in the original image with the pixel values of the basic reconstruction block of the current coding block; and assign the mask label according to the pixel-value difference between the two. The mask label is used to train the neural network and guide it to learn to generate the image mask, i.e., the mask label is the optimization target of the image-mask output.
For example, the mask label of the image mask may be constructed from the original image, using the corresponding block of the original image before encoding as the label of the reconstruction block. As shown in fig. 5a, for a pixel whose value in the original image block matches its value in the reconstruction block input to the neural network, the mask label at that pixel position is assigned 0, indicating that the pixel value in the reconstruction block matches, or is even identical to, the pixel value in the original image; that pixel need not be filtered, and its value in the fused filtering block is taken from the reconstruction block. For a pixel whose values do not match, the mask label at that position is assigned 1, indicating that the pixel needs to be filtered, and its value in the fused filtering block is the value of the corresponding pixel in the filtering block after neural-network filtering.
Alternatively, as shown in fig. 5b, the corresponding block of the original image before encoding may likewise serve as the label of the reconstruction block, and the original image is used to construct the mask label of the image mask. For a pixel whose value in the original image block matches its value in the reconstruction block input to the neural network, the mask label at that pixel position is assigned 0: the pixel need not be filtered, and its value in the fused filtering block is taken from the reconstruction block. During training, the mask label at every pixel position whose value is non-zero in the output image mask is first assigned 1, indicating that the pixel value in the reconstruction block does not match the pixel value in the original image and the pixel needs to be filtered; the loss is then computed against the mask label to learn the required filtering degree, i.e., the values of the image mask. For these pixels, the value in the fused filtering block is the product of the corresponding filtering-block pixel value after neural-network filtering and the image mask. The fused filtering block thus obtained serves as the output of the neural network, indicating that the filtering of the reconstruction block of the current coding block is complete, so that the filtered result can approach, or even reproduce, the visual effect of the original image.
The mask value corresponding to each pixel in the image mask is either an identification value indicating whether filtering is required (for example, 1 for pixels to be filtered and 0 for pixels not to be filtered), or a degree value indicating the degree to which filtering is to be applied; the degree values may be different integer values or float (floating-point) values, so that different values represent different filtering degrees, which is not limited herein.
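A hedged sketch of the mask fusion described above: where the mask value is 0 the reconstructed pixel is kept unfiltered, and elsewhere the filtered pixel is used, weighted by the mask value (a 0/1 flag or a degree value). The function name is illustrative.

```python
import torch

def fuse_with_mask(recon_block, filter_block, mask):
    # mask == 0: keep the reconstructed pixel; otherwise take the filtered
    # pixel scaled by the mask value (the filtering degree)
    return torch.where(mask == 0, recon_block, filter_block * mask)
```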
Referring to fig. 6, fig. 6 is a schematic flow chart illustrating filtering performed by the multi-neural network according to an embodiment of the present invention.
In an embodiment, a plurality of neural networks filter the reconstruction block separately. Each neural network includes a main branch network and an auxiliary branch network, and the main branch networks/auxiliary branch networks of different neural networks have different structures. The plurality of neural networks output a plurality of basic filtering blocks of the reconstruction block, and the basic filtering blocks are weighted and fused, as shown in formula (1-1) below, to obtain the filtering block of the current coding block, i.e., the filtering block output after filtering is complete. The filtering block obtained by weighted fusion contains the information of the basic filtering blocks output by the several neural networks, which enriches the feature expression of the filtering block of the current coding block; that is, the multi-network output structure optimizes the feature extraction capability, improves the reconstruction quality, and improves the filtering effect on the current coding block.
X = a1·X1 + a2·X2 + … + an·Xn (formula 1-1)
where X denotes the filtering block of the current coding block output by the neural networks; ai (i = 1, 2, …, n) denotes the weight of the i-th neural network, with a1 + a2 + … + an = 1; and Xi (i = 1, 2, …, n) denotes the basic filtering block output by the i-th neural network.
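Formula (1-1) is a plain weighted sum, as in the following sketch; it also covers the two-network case of formula (1-2) below with weights [b, 1-b].

```python
def weighted_fusion(base_blocks, weights):
    """Weighted fusion of the basic filtering blocks X1..Xn (formula 1-1)."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights a1..an must sum to 1"
    return sum(a * x for a, x in zip(weights, base_blocks))
```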
Optionally, when the basic filtering blocks output by the several neural networks are weighted and fused, each basic filtering block may be assigned the same weight; the weights may also be adjusted according to the difference between the filtering block of the current coding block output by the neural networks and the original image block, which is not limited herein.
In an alternative embodiment, the reconstruction block is filtered by other network models together with at least one neural network as described above in this application. The other network model may be one or more of a ResNet model, a U-net model, and a DenseNet model; the neural network includes a main branch network and an auxiliary branch network, and the main branch network/auxiliary branch network structures of different neural networks may differ, so that the plurality of basic filtering blocks of the reconstruction block output by the several networks can be weighted and fused.
Alternatively, the reconstruction block may be filtered by different network models separately and the resulting filtering blocks weighted and fused, where the network model may be one of, or a combination of, the ResNet, U-net, and DenseNet models. Taking the ResNet and U-net models as an example, the two models filter the reconstruction block separately to obtain two basic filtering blocks. Because the structures and working principles of the ResNet and U-net models differ, the two basic filtering blocks may differ; weighting and fusing them yields the filtering block of the current coding block, which contains the information of the basic filtering blocks output by both networks, thereby optimizing the feature extraction capability and improving the filtering effect.
Optionally, when the two basic filtering blocks are weighted and fused, each may be given a weight of 50%, as shown in formula (1-2) below; the weights may also be adjusted according to the difference between the filtering block of the current coding block and the original image block, which is not limited herein.
Y = b·Y1 + (1-b)·Y2 (formula 1-2)
where Y denotes the filtering block of the current coding block; b denotes a weight, e.g., 50%; Y1 denotes the filtering block output by the ResNet model; and Y2 denotes the filtering block output by the U-net model.
Please continue to refer to fig. 2. In an embodiment, the reconstruction block may be a basic reconstruction block or a fused reconstruction block. The basic reconstruction block is the reconstruction block obtained from the current coding block through inter-frame/intra-frame prediction, transform and quantization/inverse quantization and inverse transform, entropy coding, and residual processing; the fused reconstruction block is obtained by fusing the basic reconstruction block with side information. Encoding with the assistance of side information can shorten the coding length of the basic reconstruction block and reduce its redundancy, thereby improving the efficiency of filtering the current coding block. The side information comprises intermediate information obtained while encoding and decoding the current coding block, so the neural network can better filter the specific pixels of the current sequence, its generalization improves during learning, and the post-filtering reconstruction quality of the neural network is further enhanced.
Further, the side information comprises one or more of the prediction information, residual information, and partition information of the current coding block in its prediction and reconstruction process. The prediction information comprises the predicted pixel values of the current coding block obtained by intra-frame prediction, inter-frame prediction, or other prediction processes. The residual information comprises the reconstructed residual pixel values obtained after transformation, quantization, inverse quantization, and inverse transformation. The partition information comprises the partition boundaries of the sub-blocks obtained by partitioning, where the sub-blocks result from partitioning the reconstruction block, prediction block, or residual block of the current coding block; that is, the partition information includes one or more of the coding-unit partition, the prediction-block partition, and the transform-block partition. The sub-blocks may be obtained by block division of the image: when a frame is encoded, the whole frame is input, but it must first be divided into a number of LCUs (largest coding units), each of which is then recursively divided into CUs (coding units) of different sizes, and video encoding proceeds in units of CUs.
With this design, when the current coding block is filtered, its prediction and reconstruction process can be inferred in combination with the side information, which improves the filtering efficiency and the filtering effect on the current coding block.
Taking side information that includes partition information as an example, the partition information block may be assigned values in the following ways:
Calculate the mean, maximum, minimum, median, or most frequently occurring value of the pixel values of the pixels in the sub-block and take the result as the pixel statistic; that is, the pixel statistic is any one of the mean, maximum, minimum, mode, and median of the pixel values of the pixels in the sub-block. Assign this statistic to the pixels of the sub-block to obtain the partition information block.
Alternatively, assign the pixels of the sub-block according to their positions, so that the pixels on the sub-block boundary share one pixel value and the pixels at the remaining positions share another; as illustrated in fig. 7a, the pixels on the partition boundary may be assigned 1 and the pixels at other positions assigned 0.
Alternatively, assign the pixels of the sub-block according to their distance from the sub-block boundary, following a preset pixel-value distribution based on each pixel's distance from, for example, the sub-block boundary; as illustrated in fig. 7b, the closer a pixel is to the sub-block boundary, the larger its assigned value, and the farther away, the smaller. In an alternative embodiment, pixels farther from the sub-block boundary may instead be assigned larger values, which is not limited herein.
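The three assignment schemes above can be sketched in NumPy as follows; `boundary` is a hypothetical boolean map marking the sub-block boundary pixels, and the exact value distributions are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_with_statistic(sub_block, stat="mean"):
    # Option 1: fill the sub-block with a single pixel statistic.
    if stat == "mode":  # most frequently occurring pixel value
        values, counts = np.unique(sub_block, return_counts=True)
        value = values[counts.argmax()]
    else:
        value = {"mean": np.mean, "max": np.max,
                 "min": np.min, "median": np.median}[stat](sub_block)
    return np.full(sub_block.shape, value, dtype=np.float32)

def fill_by_position(boundary):
    # Option 2: one value on the sub-block boundary, another elsewhere
    # (fig. 7a: boundary pixels -> 1, other pixels -> 0).
    return boundary.astype(np.float32)

def fill_by_distance(boundary):
    # Option 3: larger values closer to the boundary (fig. 7b).
    dist = distance_transform_edt(~boundary)  # distance to nearest boundary
    return (1.0 / (1.0 + dist)).astype(np.float32)
```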
In video coding, the most common color representations are YUV, RGB, and the like. Taking YUV as an example hereinafter: Y represents luminance, i.e., the gray value of the image, while U and V (i.e., Cb and Cr) represent chrominance, which describes the color and saturation of the image. Each Y luma block corresponds to one Cb and one Cr chroma block, and each chroma block corresponds to exactly one luma block. Taking the 4:2:0 sampling format as an example, an N × M luma block corresponds to two chroma blocks of (N/2) × (M/2) each, so a chroma block is 1/4 the size of the luma block.
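As a worked example of the 4:2:0 size relation (function name illustrative):

```python
def chroma_block_size(n, m):
    # Under 4:2:0 sampling each chroma dimension is half the luma dimension,
    # so a chroma block has 1/4 as many samples as its luma block.
    return n // 2, m // 2

assert chroma_block_size(64, 32) == (32, 16)  # 32*16 = (64*32) / 4
```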
Take the example that the reconstructed pixel of the current coding block includes a luma (Y) component pixel, a first chroma (U) component pixel, and a second chroma (V) component pixel.
In one embodiment, the basic reconstruction block of the component to be filtered is input into the neural network; meanwhile, the prediction information and residual information at the corresponding positions are taken to construct the side information for the neural network, the side information is fused with the component's basic reconstruction block to obtain the component's fused reconstruction block, and the fused reconstruction block is input into the neural network for filtering. The basic reconstruction block is the reconstruction block obtained from the current coding block through inter-frame/intra-frame prediction, transform and quantization/inverse quantization and inverse transform, entropy coding, and residual processing; the specific filtering method may be as set forth in the above embodiments and is not repeated here.
As an example in which the side information includes prediction information and residual information, please refer to figs. 8a-8f, which are schematic flow charts of an embodiment of filtering component pixel values according to the present invention.
Fig. 8a is a schematic flow chart of an embodiment of filtering luminance component pixels according to the present invention. As shown in fig. 8a, the luminance-component basic reconstruction block, prediction information block, and residual information block are fused to obtain the luminance-component fused reconstruction block, i.e., the reconstruction block obtained by fusing the luminance-component basic reconstruction block with the luminance-component side information. The fused reconstruction block contains the reconstructed pixel values, prediction information, residual information, and the like of the luminance component, and is input into the neural network for filtering to obtain the luminance-component filtering block.
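A minimal sketch of assembling the luminance fused reconstruction block of fig. 8a, assuming the fusion stacks the three blocks as input channels (the patent says only "fuse", so channel stacking is an assumption); the same pattern applies to the chroma components of figs. 8b and 8c below.

```python
import torch

def fuse_luma(recon_y, pred_y, resid_y):
    # each input is an (N, 1, H, W) luma plane; the result is the
    # (N, 3, H, W) network input: reconstruction + prediction + residual
    return torch.cat([recon_y, pred_y, resid_y], dim=1)
```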
Fig. 8b is a schematic flow chart of an embodiment of filtering the first chrominance component according to the present invention. As shown in fig. 8b, the first-chrominance-component basic reconstruction block, prediction information block, and residual information block are fused to obtain the first-chrominance-component fused reconstruction block, i.e., the reconstruction block obtained by fusing the basic reconstruction block of the first chrominance component with its side information. The fused reconstruction block contains the reconstructed pixel values, prediction information, residual information, and the like of the first chrominance component, and is input into the neural network for filtering to obtain the first-chrominance-component filtering block, which improves the filtering efficiency and the filtering effect.
Fig. 8c is a schematic flow chart of an embodiment of filtering the second chrominance component according to the present invention. As shown in fig. 8c, the second-chrominance-component basic reconstruction block, prediction information block, and residual information block are fused to obtain the second-chrominance-component fused reconstruction block, which contains the reconstructed pixel values, prediction information, residual information, and the like of the second chrominance component. It is input into the neural network for filtering to obtain the second-chrominance-component filtering block, which improves the filtering efficiency and the filtering effect.
Furthermore, at least one of the fusion reconstruction block of the first chrominance component and the fusion reconstruction block of the second chrominance component may be fused with the fusion reconstruction block of the luminance component to obtain a comprehensive reconstruction block, which then serves as the fusion reconstruction block of the luminance component, the first chrominance component, or the second chrominance component. Filtering a component reconstruction block with the comprehensive reconstruction block therefore combines cross-component information, improving how the luminance and chrominance components complement each other and ultimately improving the filtering effect.
FIG. 8d is a schematic flow chart of another embodiment of filtering the luminance component according to the present invention. As shown in FIG. 8d, the fusion reconstruction block of the luminance component (which itself comprises the luminance component basic reconstruction block, prediction information block, and residual information block) is fused with the fusion reconstruction blocks of the first and second chrominance components to obtain a comprehensive reconstruction block, which therefore also contains information of the first and second chrominance components. Inputting the comprehensive reconstruction block into the neural network for filtering yields a luminance component filtering block that incorporates first and second chrominance component information, so the luminance and chrominance information combine well.
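Such a comprehensive reconstruction block can be sketched as a further channel concatenation (assuming, for illustration only, that the component blocks share one spatial resolution; with 4:2:0 subsampled chroma, the chrominance blocks would first be upsampled to the luminance size, which is not shown):

    import numpy as np

    def build_comprehensive_block(luma_fused, chroma1_fused, chroma2_fused):
        # Concatenate the per-component fusion reconstruction blocks along
        # the channel axis so that the network filtering one component
        # also sees cross-component information.
        return np.concatenate([luma_fused, chroma1_fused, chroma2_fused], axis=0)

    # Three fusion reconstruction blocks of shape (3, 64, 64) yield a
    # (9, 64, 64) comprehensive reconstruction block.
    blocks = [np.zeros((3, 64, 64), dtype=np.float32) for _ in range(3)]
    print(build_comprehensive_block(*blocks).shape)  # (9, 64, 64)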
In an alternative embodiment, when filtering the luminance component, the fusion reconstruction block of the luminance component may instead be fused only with the fusion reconstruction block of the first chrominance component, or only with the fusion reconstruction block of the second chrominance component; this is not limited here.
FIG. 8e is a schematic flow chart of another embodiment of filtering the first chrominance component according to the present invention. As shown in FIG. 8e, the fusion reconstruction block of the first chrominance component is fused with the fusion reconstruction block of the luminance component to obtain a comprehensive reconstruction block, which therefore also contains luminance component information. Inputting the comprehensive reconstruction block into the neural network for filtering yields a first chrominance component filtering block that incorporates luminance component information, so the luminance and chrominance information combine well.
In an alternative embodiment, when filtering the first chrominance component, the fusion reconstruction block of the first chrominance component may be fused with both the fusion reconstruction block of the luminance component and the fusion reconstruction block of the second chrominance component; this is not limited here.
FIG. 8f is a schematic flow chart of another embodiment of filtering the second chrominance component according to the present invention. As shown in FIG. 8f, the fusion reconstruction block of the second chrominance component is fused with the fusion reconstruction block of the luminance component to obtain a comprehensive reconstruction block, which therefore also contains luminance component information. Inputting the comprehensive reconstruction block into the neural network for filtering yields a second chrominance component filtering block that incorporates luminance component information, so the luminance and chrominance information combine well.
In an alternative embodiment, when filtering the second chrominance component, the fusion reconstruction block of the second chrominance component may be fused with both the fusion reconstruction block of the luminance component and the fusion reconstruction block of the first chrominance component; this is not limited here.
Referring to fig. 9 and fig. 10 in combination, fig. 9 is a schematic flowchart of an embodiment of the encoding method of the present invention, and fig. 10 is a schematic flowchart of another embodiment of the encoding method of the present invention. It should be noted that the encoding method set forth in this embodiment is not limited to the following steps:
S201: acquiring a reconstructed pixel value of the current coding block.
In this embodiment, image frames are input for video encoding. To encode one frame, the frame is first divided into a plurality of LCUs (largest coding units), each of which is then recursively divided into CUs (coding units) of different sizes; video encoding proceeds in units of CUs. The current coding block in this embodiment may correspond to an LCU obtained by dividing an image frame.
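The LCU division can be illustrated with a short sketch (the 128x128 LCU size and the clipping of edge units are illustrative assumptions; the recursive CU division inside each LCU is not shown):

    def split_into_lcus(frame_h, frame_w, lcu=128):
        # Enumerate (top, left, height, width) for each largest coding
        # unit covering the frame; units at the right/bottom edges may
        # be smaller than the nominal LCU size.
        for top in range(0, frame_h, lcu):
            for left in range(0, frame_w, lcu):
                yield (top, left, min(lcu, frame_h - top), min(lcu, frame_w - left))

    # A 1920x1080 frame is covered by 15 x 9 = 135 LCUs.
    print(sum(1 for _ in split_into_lcus(1080, 1920)))  # 135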
The reconstructed pixel value is a pixel value obtained through intra-frame/inter-frame prediction, and the image so obtained can be filtered to optimize its visual effect.
S202: performing filtering processing on the reconstructed pixel value by using the image filtering method.
In this embodiment, after the reconstructed pixel value of the current coding block is obtained, it may be filtered, optionally through a neural network. Specifically, the reconstructed pixel value may be filtered by the image filtering method described above to optimize the visual effect of the image obtained through intra/inter prediction, transformation, quantization, inverse transformation, inverse quantization, and the like.
As shown in fig. 10, the side information used in the image filtering method above may be obtained from the encoding process: one or more of prediction information, residual information, and partition information obtained during encoding participate, as side information, in filtering the reconstruction block of the current coding block.
S203: encoding the current coding block based on the filtered reconstructed pixel value to generate encoded data.
In this embodiment, the current coding block is encoded based on the reconstructed pixel value filtered in step S202; encoding converts the data into a computer-readable form. The current coding block may be encoded by arithmetic coding, variable-length coding, or other coding modes to generate the encoded data; the coding mode is not limited here and is not described again.
Filtering an image generally also includes loop filtering, which adjusts pixel values after the entire frame has been reconstructed. In processing order, the loop filters are: deblocking filtering (DBF), sample adaptive offset (SAO), adaptive loop filtering (ALF), and cross-component adaptive loop filtering (CCALF). Deblocking filtering comes first: DBF filters the block boundaries produced by block-based coding to remove blocking artifacts, greatly improving the subjective quality of the image. SAO follows: it classifies pixels and adds a specific offset to each class, further improving image quality and mitigating problems such as color shift and loss of high-frequency image detail. ALF is performed next: the encoding end uses a diamond-shaped filter whose coefficients are derived by Wiener filtering (WF) to filter the luminance and chrominance components, reducing image distortion. Finally, CCALF further adjusts the chrominance components after ALF, using the Wiener-filtered luminance component as the adjustment value.
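The ordering described above can be summarized in a sketch in which each stage is an identity stub standing in for the real filter (only the sequence reflects the text; none of the actual filter logic is implemented here):

    def deblocking_filter(frame):      return frame  # DBF: filter block boundaries
    def sample_adaptive_offset(frame): return frame  # SAO: per-class pixel offsets
    def adaptive_loop_filter(frame):   return frame  # ALF: Wiener-derived diamond filter
    def cc_alf(frame):                 return frame  # CCALF: adjust chroma from filtered luma

    def loop_filter(reconstructed_frame):
        # Conventional in-loop order: DBF -> SAO -> ALF -> CCALF.
        frame = deblocking_filter(reconstructed_frame)
        frame = sample_adaptive_offset(frame)
        frame = adaptive_loop_filter(frame)
        return cc_alf(frame)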
In this embodiment, the image filtering method of the present invention may be used for post-processing while filtering within the encoding/decoding loop is still performed by loop filtering and the like. Taking loop filtering as an example, the reconstructed frame formed by splicing the loop-filtered blocks serves as a reference frame for inter-frame prediction during encoding, and the image filtering method of the present invention may additionally filter the loop-filtered blocks and output the result as the final filtered blocks.
Alternatively, the image filtering method of the present invention can replace part of the loop filtering process; for example, filtering with the neural network can replace SAO filtering and ALF filtering. Taking decoding as an example, the existing process sequentially performs code stream decoding, image reconstruction, deblocking filtering, SAO filtering, ALF filtering, and so on; in this embodiment, the neural-network filtering can thus replace several existing filtering modules. The encoding method in this embodiment also helps reduce the complexity of the encoding and decoding processes.
Alternatively, the image filtering method of the present invention may be inserted into the filtering process as an additional, optional filtering module, for example as a candidate alongside SAO filtering in the loop filtering process. Taking decoding as an example, instead of sequentially performing code stream decoding, image reconstruction, deblocking filtering, SAO filtering, and ALF filtering, the decoder performs code stream decoding, image reconstruction, and deblocking filtering, then selects either the neural network or SAO for the next filtering step, and finally performs ALF filtering. This makes the encoding/decoding process more flexible and selective.
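This optional-module variant can be sketched as follows (all helpers are illustrative identity stubs; in a real decoder the choice would be driven by signalled syntax such as the flags described below):

    def neural_network_filter(frame):  return frame  # stand-in for the filter of this application
    def sample_adaptive_offset(frame): return frame  # stand-in for SAO
    def adaptive_loop_filter(frame):   return frame  # stand-in for ALF

    def filter_after_deblocking(deblocked_frame, use_nn_filter):
        # After deblocking, apply either the neural-network filter or
        # SAO according to the selection flag, then always apply ALF.
        if use_nn_filter:
            frame = neural_network_filter(deblocked_frame)
        else:
            frame = sample_adaptive_offset(deblocked_frame)
        return adaptive_loop_filter(frame)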
Referring to fig. 11, fig. 11 is a flowchart illustrating a decoding method according to an embodiment of the invention.
S301: acquiring encoded data.
In this embodiment, encoded data generated by an encoding method is acquired. The encoded data may be obtained by the encoding method set forth in the above embodiment or by a conventional encoding method; this is not limited here.
S302: decoding the encoded data to obtain a first decoded image.
In this embodiment, the encoded data is parsed: the coding mode indicated in the encoded data is read, and the corresponding inverse operations are performed to decode it, yielding division information, residual information, prediction information, and the like, from which the first decoded image is obtained; details are not repeated here.
S303: performing filtering processing on the first decoded image by using the image filtering method.
In this embodiment, after the first decoded image is acquired, it may be filtered, optionally through a neural network. Specifically, the first decoded image may be filtered by the image filtering method described above to optimize its visual effect. As with the encoding method, the image filtering method may replace the existing filtering process, be inserted into it as an additional optional filtering module, or be used in post-processing; this is not limited here.
The following illustrates the relevant syntax used in the present invention:
(1) neural network tool switch syntax
Whether the neural network is applied to filtering can be controlled by a switch syntax: the neural network tool switch syntax indicates whether neural-network filtering is adopted in the current encoding/decoding process. If the neural network is not used for filtering, no further related syntax needs to be transmitted.
For example, the syntax "nn_filter_flag = 1" may indicate that the neural network tool is turned on and the neural network is applied for filtering, while "nn_filter_flag = 0" indicates that the neural network tool is turned off and no neural-network filtering is needed. The encoding end transmits the "nn_filter_flag" syntax to the decoding end, and the decoding end chooses whether to filter with the neural network according to the value assigned to nn_filter_flag.
(2) Frame level switch syntax
Whether each video frame needs to be filtered by the neural network is controlled by a frame-level switch syntax. For the components of a video frame (such as YUV or RGB), three switch syntaxes may be transmitted to indicate, per component, whether the neural network is used for filtering; alternatively, the three components may share one switch syntax, so that either all three components are filtered by the neural network or none of them is.
For example, taking YUV encoding, when the Y, U and V components share one frame-level switch syntax, each video frame transmits the syntax "nn_filter_frame", where "nn_filter_frame = 1" may indicate that all three Y, U and V components of the video frame are filtered using the neural network, and "nn_filter_frame = 0" that none of them is.
When the Y, U and V components use three separate frame-level switch syntaxes, each video frame transmits the syntax "nn_filter_frame[i], i = 0, 1, 2" (for example, 0 denotes the Y component, 1 the U component, and 2 the V component), where "nn_filter_frame[i] = 1" indicates that the ith component of the frame is filtered using the neural network and "nn_filter_frame[i] = 0" that it is not.
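A decoder-side reading of this syntax might look like the following sketch (the flag reader is a toy stand-in for the codec's entropy decoder, and the per-component layout follows the second variant above):

    class BitReader:
        # Toy reader over a list of already-decoded flag values.
        def __init__(self, flags):
            self._flags = iter(flags)
        def read_flag(self):
            return next(self._flags)

    def parse_nn_filter_syntax(reader, num_components=3):
        if not reader.read_flag():   # nn_filter_flag == 0: tool off,
            return None              # so no further NN syntax is read
        # nn_filter_frame[i], i = 0 (Y), 1 (U), 2 (V)
        return [reader.read_flag() for _ in range(num_components)]

    print(parse_nn_filter_syntax(BitReader([1, 1, 0, 1])))  # [1, 0, 1]
    print(parse_nn_filter_syntax(BitReader([0])))           # None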
Referring to fig. 12, fig. 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
In an embodiment, the encoder 30 includes a processor 31, and the processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip having signal processing capabilities. The processor 31 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 31 may be any conventional processor or the like.
The encoder 30 may further include a memory (not shown) for storing instructions and data required for the processor 31 to operate.
The processor 31 is configured to execute instructions to implement the filtering method as set forth in any of the above embodiments, or the encoding method as set forth in the above embodiments.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a decoder according to an embodiment of the present invention.
In one embodiment, decoder 40 includes a processor 41, and processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip having signal processing capabilities. The processor 41 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 41 may be any conventional processor or the like.
Decoder 40 may further include a memory (not shown) for storing instructions and data necessary for processor 41 to operate.
The processor 41 is configured to execute instructions to implement a filtering method as set forth in any of the above embodiments, or a decoding method as set forth in the above embodiments.
Referring to fig. 14, fig. 14 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
In an embodiment, the computer readable storage medium 50 is used for storing instructions/program data 51, and the instructions/program data 51 can be executed to implement the filtering method as described in any of the above embodiments, or the encoding method as described in the above embodiments, or the decoding method as described in the above embodiments, which will not be described herein again.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and an actual implementation may adopt another division; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as an independent product. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium 50 and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods set forth in the embodiments of the present invention. The aforementioned computer-readable storage medium 50 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or a server.
In addition, in the present invention, unless otherwise expressly specified or limited, terms such as "connected" and "stacked" are to be construed broadly, e.g., as a fixed connection, a detachable connection, or an integral formation; as a direct connection or an indirect connection through an intervening medium; or as internal communication between two elements or an interactive relationship between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. An image filtering method, comprising:
acquiring a reconstruction block of a current coding block;
performing multi-level down-sampling on the reconstruction block to obtain abbreviated features of the reconstruction block, wherein each level of down-sampling is performed respectively on main features and auxiliary features of the same size, the main features to be sampled are fusion features of the main features and the auxiliary features obtained by the previous level of down-sampling, and the auxiliary features to be sampled are the auxiliary features obtained by the previous level of down-sampling;
performing multi-level up-sampling on the abbreviated features to obtain a filtering block of the reconstruction block, wherein each level of up-sampling is performed respectively on main features and auxiliary features of the same size, the main features to be sampled are fusion features of the main features and the auxiliary features obtained by the previous level of up-sampling together with the down-sampled main features of the corresponding size, the auxiliary features to be sampled are the auxiliary features obtained by the previous level of up-sampling, and the filtering block is the main features obtained by the last level of up-sampling.
2. The image filtering method according to claim 1, characterized in that the method comprises:
inputting the reconstruction block into a neural network to perform sampling processing on the reconstruction block, wherein the neural network comprises a main branch network, an auxiliary branch network and a connection unit;
the main branch network comprises a plurality of main down-sampling units and a plurality of main up-sampling units which are connected so as to carry out multi-stage down-sampling and multi-stage up-sampling on the main features;
the auxiliary branch network comprises a plurality of auxiliary down-sampling units and/or a plurality of auxiliary up-sampling units which are connected so as to carry out multi-level down-sampling and/or multi-level up-sampling on auxiliary features;
the connection unit comprises a plurality of first connection units and a plurality of second connection units, the plurality of first connection units being used for respectively connecting corresponding auxiliary down-sampling units with main down-sampling units and corresponding auxiliary up-sampling units with main up-sampling units, and the plurality of second connection units being used for respectively connecting corresponding main down-sampling units with main up-sampling units.
3. The image filtering method according to claim 2, wherein the reconstruction block is filtered by a plurality of neural networks, each of the neural networks comprising a main branch network and an auxiliary branch network, and the main/auxiliary branch network structures differing among the neural networks, the method comprising:
obtaining a plurality of basic filtering blocks of the reconstruction block by respectively using the plurality of neural networks;
and performing weighted fusion on the plurality of basic filtering blocks to obtain the filtering block of the current coding block.
4. The image filtering method according to claim 1, further comprising:
acquiring an image mask of the current coding block, wherein the image mask is used for identifying the degree to which each pixel point in the current coding block is to be filtered;
and fusing the filtering block and the image mask to obtain a fused filtering block of the current coding block.
5. The image filtering method according to claim 4,
the size of the image mask is the same as that of the filtering block;
the mask value corresponding to each pixel point in the image mask comprises an identification value identifying whether filtering is required, or a degree value identifying the degree to which filtering is required.
6. The image filtering method according to claim 4 or 5, wherein a pre-constructed mask label is used to guide a neural network to generate the image mask;
constructing the mask label comprises:
acquiring a pixel value of an image block corresponding to the current coding block in an original image;
comparing the pixel value of the image block in the original image with the pixel value of the basic reconstruction block of the current coding block;
and assigning the mask label according to the pixel value difference between the pixel value of the image block in the original image and the pixel value of the basic reconstruction block of the current coding block.
7. The image filtering method according to claim 1,
the reconstruction block comprises a basic reconstruction block or a fusion reconstruction block, the fusion reconstruction block is obtained by fusing the basic reconstruction block and side information, and the side information comprises intermediate information obtained in the process of encoding and decoding the current coding block.
8. The image filtering method according to claim 7,
the side information comprises one or more of prediction information, residual error information and division information of the current coding block, the division information comprises division boundaries of sub blocks obtained by division, and the sub blocks are obtained by dividing a reconstruction block, a prediction block or a residual error block of the current coding block.
9. The image filtering method according to claim 8,
assigning values to the pixels of the sub-block by using a pixel statistical value, wherein the pixel statistical value is any one of the average value, the maximum value, the minimum value, the most frequently occurring pixel value, and the median of the pixel values of the pixels in the sub-block; or
assigning values to the pixels of the sub-block according to their positions, such that the pixels on the sub-block boundary share one pixel value and the pixels at the remaining positions share another; or
assigning values to the pixels of the sub-block according to their distance from the sub-block boundary.
10. The image filtering method according to claim 7,
the reconstructed pixels of the current coding block comprise luminance component pixels, first chrominance component pixels and second chrominance component pixels;
the fusion reconstruction block of the brightness component is a reconstruction block obtained by fusing a basic reconstruction block of the brightness component and the side information of the brightness component; or
The fused reconstruction block of the first chrominance component is obtained by fusing the basic reconstruction block of the first chrominance component and the side information of the first chrominance component; or
The fused reconstruction block of the second chrominance component is a reconstruction block obtained by fusing the basic reconstruction block of the second chrominance component and the side information of the second chrominance component; or
And fusing at least one of the fused reconstruction block of the first chrominance component and the fused reconstruction block of the second chrominance component with the fused reconstruction block of the luminance component to obtain a comprehensive reconstruction block, and taking the comprehensive reconstruction block as the fused reconstruction block of the luminance component/the fused reconstruction block of the first chrominance component/the fused reconstruction block of the second chrominance component.
11. A method of decoding, comprising:
acquiring coded data;
decoding the coded data to obtain a first decoded image;
the first decoded image is subjected to a filtering process using the image filtering method according to any one of claims 1 to 10.
12. A method of encoding, comprising:
acquiring a reconstructed pixel value of a current coding block;
filtering the reconstructed pixel value by using the image filtering method according to any one of claims 1 to 10;
and coding the current coding block based on the reconstructed pixel value after filtering processing to generate coded data.
13. A decoder, characterized in that it comprises a processor for executing instructions to implement the filtering method of any one of claims 1 to 10, or the decoding method of claim 11.
14. An encoder, characterized in that it comprises a processor for executing instructions to implement the filtering method of any one of claims 1 to 10, or the encoding method of claim 12.
15. A computer-readable storage medium for storing instructions/program data executable to implement the filtering method of any one of claims 1 to 10, or the decoding method of claim 11, or the encoding method of claim 12.
CN202111664167.4A 2021-12-31 2021-12-31 Image filtering, encoding and decoding methods and related equipment Active CN114501012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111664167.4A CN114501012B (en) 2021-12-31 2021-12-31 Image filtering, encoding and decoding methods and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111664167.4A CN114501012B (en) 2021-12-31 2021-12-31 Image filtering, encoding and decoding methods and related equipment

Publications (2)

Publication Number Publication Date
CN114501012A true CN114501012A (en) 2022-05-13
CN114501012B CN114501012B (en) 2024-06-11

Family

ID=81497129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111664167.4A Active CN114501012B (en) 2021-12-31 2021-12-31 Image filtering, encoding and decoding methods and related equipment

Country Status (1)

Country Link
CN (1) CN114501012B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023221599A1 (en) * 2022-05-18 2023-11-23 腾讯科技(深圳)有限公司 Image filtering method and apparatus and device
WO2023246655A1 (en) * 2022-06-20 2023-12-28 华为技术有限公司 Image encoding method and apparatus, and image decoding method and apparatus
WO2024145988A1 (en) * 2023-01-03 2024-07-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Neural network-based in-loop filter

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1589524A (en) * 2001-11-19 2005-03-02 皇家飞利浦电子股份有限公司 Time discrete filter comprising upsampling, sampling rate conversion and downsampling stages
CN101312529A (en) * 2007-05-24 2008-11-26 华为技术有限公司 Method, system and apparatus generating up and down sampling filter
WO2020187029A1 (en) * 2019-03-19 2020-09-24 京东方科技集团股份有限公司 Image processing method and device, neural network training method, and storage medium
CN111598804A (en) * 2020-05-12 2020-08-28 西安电子科技大学 Deep learning-based image multi-level denoising method
WO2021228513A1 (en) * 2020-05-15 2021-11-18 Huawei Technologies Co., Ltd. Learned downsampling based cnn filter for image and video coding using learned downsampling feature
CN113766249A (en) * 2020-06-01 2021-12-07 腾讯科技(深圳)有限公司 Loop filtering method, device and equipment in video coding and decoding and storage medium
CN111885280A (en) * 2020-07-17 2020-11-03 电子科技大学 Hybrid convolutional neural network video coding loop filtering method
CN113068031A (en) * 2021-03-12 2021-07-02 天津大学 Loop filtering method based on deep learning
CN113489974A (en) * 2021-07-02 2021-10-08 浙江大华技术股份有限公司 Intra-frame prediction method, video/image coding and decoding method and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUE ZHANG, ET AL: "AHG11: A Deep In-Loop Filter Method", JVET-W0059-V5, 12 July 2021 (2021-07-12) *
NING JINGYAN: "Research on Cell Nucleus Image Segmentation Methods Based on Deep Learning", China Masters' Theses Full-text Database (Information Science and Technology), 15 June 2020 (2020-06-15) *

Also Published As

Publication number Publication date
CN114501012B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN111066326B (en) Machine learning video processing system and method
CN114501012B (en) Image filtering, encoding and decoding methods and related equipment
CN106899861B (en) A kind of photograph document handling method and its equipment, system
CN111226442B (en) Method of configuring transforms for video compression and computer-readable storage medium
CN107925762B (en) Video coding and decoding processing method and device based on neural network
CN100568973C (en) The filter method of digital picture and filter plant
CN103202018B (en) The Video coding that uses the data based on sample to prune
TW202019183A (en) Affine linear weighted intra predictions
CN107251557A (en) The coding/decoding of chrominance resolution details
CN110999290B (en) Method and apparatus for intra prediction using cross-component linear model
CN113497941A (en) Image filtering method, encoding method and related equipment
CN115606179A (en) CNN filter for learning-based downsampling for image and video coding using learned downsampling features
CN113068032A (en) Image encoding and decoding method, encoder, decoder, and storage medium
CN104754362B (en) Image compression method using fine-divided block matching
CN118020297A (en) End-to-end image and video coding method based on hybrid neural network
CN116235496A (en) Encoding method, decoding method, encoder, decoder, and encoding system
CN113766247B (en) Loop filtering method and device
CN113544705A (en) Method and apparatus for picture encoding and decoding
CN116438796A (en) Image prediction method, encoder, decoder, and computer storage medium
CN109151503B (en) Picture file processing method and equipment
CN115552905A (en) Global skip connection based CNN filter for image and video coding
CN104935945B (en) The image of extended reference pixel sample value collection encodes or coding/decoding method
CN113489977B (en) Loop filtering method, video/image coding and decoding method and related device
CN112887722B (en) Lossless image compression method
US7197078B2 (en) Video coding/decoding buffering apparatus and buffering method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant