CN114157869A - Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame


Info

Publication number
CN114157869A
Authority
CN
China
Prior art keywords
filtering
frame
target image
image frame
neural network
Prior art date
Legal status
Pending
Application number
CN202111162686.0A
Other languages
Chinese (zh)
Inventor
张雪
江东
林聚财
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111162686.0A
Publication of CN114157869A

Classifications

    • H04N19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding, characterised by the coding unit, the unit being an image region that is a picture, frame or field
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/80: Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a filtering method for video frames, together with corresponding encoding and decoding methods, devices, and a storage medium. The filtering method includes: acquiring a target image frame; determining the filtering processing mode corresponding to the frame type of the target image frame; and filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode. The filtering processing mode comprises a plurality of filtering sub-steps performed in sequence, the sub-steps include neural network filtering, and the filtering processing modes corresponding to at least two different frame types differ. By this method, the filtering effect can be improved.

Description

Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame
Technical Field
The present application relates to the field of video compression coding technology, and in particular, to a filtering method, an encoding and decoding method, an encoder and decoder, and a storage medium for video frames.
Background
Video is a continuous sequence of images composed of successive video frames. Because the similarity between consecutive video frames is extremely high, the original video must be encoded and compressed for storage and transmission, removing redundancy in the spatial and temporal dimensions so as to reduce the network bandwidth and storage space occupied during transmission. However, encoding and compression degrade the image quality of a video frame to some extent: an image frame reconstructed from a residual is often inferior in quality to the video frame before compression.
Currently, filtering the reconstructed image frame is one means of improving image quality. Examples of such filtering include neural network filtering (filtering the image frame with a neural network), deblocking filtering, sample adaptive compensation filtering, and adaptive loop filtering. However, existing schemes apply a single fixed filtering method, which limits the achievable filtering effect.
Disclosure of Invention
The application provides a filtering method for video frames, together with encoding and decoding methods, an encoder and decoder, and a storage medium.
A first aspect of the present application provides a method for filtering a video frame, the method including: acquiring a target image frame; determining the filtering processing mode corresponding to the frame type of the target image frame; and filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode, where the filtering processing mode comprises a plurality of filtering sub-steps performed in sequence, the sub-steps include neural network filtering, and the filtering processing modes corresponding to at least two different frame types differ.
Thus, by determining the filtering processing mode according to the frame type of each target image frame, and requiring that at least two different frame types use different modes, the pixel values of the target image frame can be filtered with a mode better matched to its frame type. This makes the filtering more targeted, improves the filtering effect, and improves the image quality of the target image frame. In addition, selecting the filtering processing mode by frame type can reduce the complexity of the neural network used to filter each target image frame.
A second aspect of the present application provides an encoding method, including: obtaining a target image frame of the current frame by using a residual error corresponding to the current frame; performing the filtering method for the video frame described in the first aspect on the target image frame to obtain a filtered target image frame; and encoding the current frame based on the filtered target image frame.
A third aspect of the present application provides a decoding method, including: obtaining a target image frame of the current frame by using a residual error corresponding to the current frame; the filtering method of the video frame described in the above first aspect is performed on the target image frame to obtain a filtered target image frame.
A fourth aspect of the present application provides an encoder comprising a processor and a memory coupled to each other, wherein the processor is configured to perform the encoding method described in the second aspect above.
A fifth aspect of the present application provides a decoder comprising a processor and a memory coupled to each other, wherein the processor is configured to perform the decoding method described in the third aspect.
A sixth aspect of the present application provides a computer-readable storage medium storing a computer program executable by a processor, the computer program implementing the filtering method of the first aspect, the encoding method of the second aspect, or the decoding method of the third aspect.
According to the above scheme, determining the filtering processing mode by the frame type of each target image frame, with the modes for at least two different frame types required to differ, allows the pixel values of the target image frame to be filtered with a mode better matched to its frame type, making the filtering more targeted, improving the filtering effect, and improving the image quality of the target image frame.
Drawings
FIG. 1 is a flow chart of a first embodiment of a method for filtering video frames according to the present application;
FIG. 2 is another flow chart of an embodiment of a method for filtering video frames according to the present application;
FIG. 3 is a schematic diagram of a filtering processing manner of a video frame according to an embodiment of the filtering method of a video frame of the present application;
FIG. 4 is a flowchart illustrating a second embodiment of a method for filtering video frames according to the present application;
FIG. 5 is a schematic diagram of a target image frame and a corresponding reference frame as input in an embodiment of a filtering method for a video frame of the present application;
FIG. 6 is a flowchart illustrating a third embodiment of the filtering method for video frames according to the present application;
FIG. 7 is a schematic diagram illustrating filtering of a luminance component and two chrominance components of a pixel of a target image frame according to an embodiment of the filtering method for a video frame of the present application;
FIG. 8 is a flowchart illustrating a fourth embodiment of the filtering method for video frames according to the present application;
FIG. 9 is a flowchart illustrating a fifth embodiment of the filtering method for video frames according to the present application;
FIG. 10 is a flowchart illustrating an embodiment of a method for encoding video frames according to the present application;
FIG. 11 is a flowchart illustrating an embodiment of a method for decoding video frames according to the present application;
FIG. 12 is a block diagram of an embodiment of an encoder of the present application;
FIG. 13 is a block diagram of an embodiment of a decoder of the present application;
FIG. 14 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, the term "plurality" herein means two or more.
In this application, a video frame is synonymous with an image frame.
Referring to fig. 1, fig. 1 is a flowchart illustrating a filtering method for video frames according to a first embodiment of the present application. Specifically, the method may include the following steps.
Step S11: acquiring a target image frame.
The current frame is predicted to obtain a predicted value and a residual corresponding to the current frame. After the residual is transformed and quantized, an image frame corresponding to the current frame can be obtained by combining it with the predicted value; this image frame is the target image frame of the present application. The target image frame may be obtained at the encoding side or the decoding side; that is, the filtering method for video frames of the present application can be applied on both sides.
Step S12: determining the filtering processing mode corresponding to the frame type of the target image frame.
The frame type of the target image frame is the same as that of its corresponding current frame. The frame types include, for example, intra-coded image frames (hereinafter I frames), forward-predictive-coded image frames (hereinafter P frames), and bidirectionally predictive-coded image frames (hereinafter B frames).
In a specific embodiment, B frames may be further subdivided according to their temporal layers; for example, B1 frames in the first temporal layer, B2 frames in the second temporal layer, and B3 frames in the third temporal layer.
After the frame type of the target image frame is determined, the filtering processing mode corresponding to that frame type can be determined. For example, if the frame type of the target image frame is P frame, the corresponding filtering processing mode is the one associated with P frames; if the frame type is B frame, it is the one associated with B frames.
Step S13: filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode.
In the present application, the filtering processing mode includes a plurality of filtering sub-steps performed in sequence, among them neural network filtering. The filtering sub-steps include, for example: deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering. The filtering processing modes corresponding to at least two different frame types differ; that is, at least two distinct filtering processing modes exist. For example, the mode for each frame type may differ from those of all other frame types, or only the modes for some frame types may differ from those of the others. Filtering the pixel values of the pixels of the target image frame with the corresponding mode realizes the filtering of the video frame.
Filtering the pixel values of the pixels of the target image frame can change those pixel values. For example, a pixel of the target image frame may have value A under YUV coding and value B after filtering; specifically, the luminance component value and/or a chrominance component value of A may be modified to obtain B.
In one embodiment, the neural network filtering is performed using a neural network that contains skip connections, for example a residual network. Because a network with skip connections has better feature-learning capability, using it for the neural network filtering can improve the filtering effect.
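As a minimal sketch only, assuming a PyTorch-style setup and illustrative layer sizes (the patent does not specify the network architecture), a residual filtering network with skip connections could look like this:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolutional block whose input is added back to its output (skip connection)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection lets the block learn only a residual correction.
        return x + self.conv2(self.relu(self.conv1(x)))

class FilterNet(nn.Module):
    """Head conv, a few residual blocks, tail conv; a global skip connection
    makes the whole network predict a correction to the reconstructed frame."""
    def __init__(self, in_channels: int = 3, channels: int = 64, num_blocks: int = 4):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, in_channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.tail(self.body(self.head(x)))
```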
Thus, by determining the filtering processing mode according to the frame type of each target image frame, and requiring that the modes for at least two different frame types differ, the pixel values of the target image frame can be filtered with a mode better matched to its frame type, making the filtering more targeted, improving the filtering effect, and improving the image quality of the target image frame. In addition, selecting the filtering processing mode by frame type can reduce the complexity of the neural network used to filter each target image frame.
In one embodiment, among the filtering sub-steps, the sub-steps performed before the filtered image frame is stored as a reference frame may be regarded as loop filtering. For example, if the filtering sub-steps are deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering performed in sequence, and the image frame produced by the adaptive loop filtering is used as the reference frame, then the deblocking filtering, sample adaptive compensation filtering, and adaptive loop filtering together constitute the loop filtering.
In one embodiment, when the frame type of the target image frame is determined to be I frame, the loop filtering may be configured to include the neural network filtering; when the frame type is determined to be B frame, the loop filtering is configured to exclude it. In the latter case the neural network filtering is applied after the B frame has been stored as a reference frame, so it can be regarded as post-processing.
In another embodiment, when the frame type of the target image frame is determined to be I frame, the loop filtering may be configured to exclude the neural network filtering; the neural network filtering is then applied to the I frame after it serves as a reference frame, i.e., it is neural network filtering used for post-processing. When the frame type is determined to be B frame, the loop filtering includes the neural network filtering.
In a specific embodiment, when the loop filtering includes the neural network filtering, it comprises the neural network filtering plus at most two of deblocking filtering, sample adaptive compensation filtering, and adaptive loop filtering. For example, the loop filtering may include neural network filtering, deblocking filtering, and sample adaptive compensation filtering; as another example, it may include neural network filtering and deblocking filtering.
Thus, configuring specific filtering processing for image frames of different frame types makes the filtering more targeted.
In one embodiment, when the frame type of the target image frame is determined to be I frame, the filtering processing mode corresponding to I frames includes: sequentially performing sample adaptive compensation filtering, neural network filtering, and adaptive loop filtering on the pixel values of the pixels of the target image frame. In one embodiment, the image obtained after the adaptive loop filtering may be used as a reference frame image for subsequent video frame encoding, improving the image quality of the reference frame and thereby of the video frames that reference it.
In one embodiment, when the frame type of the target image frame is determined to be B frame, the filtering processing mode corresponding to B frames includes: sequentially performing deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering on the pixel values of the pixels of the target image frame. In a specific embodiment, the image obtained after the adaptive loop filtering may be used as the reference frame image, and the image obtained after the neural network filtering may be used as the output image.
Thus, by setting specific filtering processing modes for I frames and B frames, the filtering of these two frame types becomes more targeted, improving the filtering effect and the image quality of the target image frame.
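For illustration only, the frame-type-dependent pipelines above can be organized as a simple dispatch table. This is a sketch, not the patent's normative procedure; the sub-step functions are identity placeholders standing in for real filters:

```python
import numpy as np

# Placeholder sub-steps; each would modify pixel values in a real codec.
def dbf(frame: np.ndarray) -> np.ndarray:        # deblocking filtering
    return frame

def sao(frame: np.ndarray) -> np.ndarray:        # sample adaptive compensation filtering
    return frame

def alf(frame: np.ndarray) -> np.ndarray:        # adaptive loop filtering
    return frame

def nn_filter(frame: np.ndarray) -> np.ndarray:  # neural network filtering
    return frame

# Pipelines from the embodiments above:
# I frame: SAO -> NN -> ALF;  B frame: DBF -> SAO -> ALF -> NN.
PIPELINES = {
    "I": (sao, nn_filter, alf),
    "B": (dbf, sao, alf, nn_filter),
}

def filter_target_frame(frame: np.ndarray, frame_type: str) -> np.ndarray:
    for step in PIPELINES[frame_type]:
        frame = step(frame)
    return frame
```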
Referring to fig. 2, fig. 2 is another flowchart of an embodiment of the filtering method for video frames according to the present application. In this embodiment, the step of determining the filtering processing mode corresponding to the frame type of the target image frame may specifically include steps S121 and S122.
Step S121: in the case that the frame type of the target image frame is determined to be B frame, determining the temporal layer of the B frame target image frame.
Since a B frame target image frame can be further subdivided according to the temporal layer in which it is located, once the frame type is determined to be B frame, the temporal layer of the B frame target image frame can be further determined.
Step S122: determining the filtering processing corresponding to the temporal layer of the B frame target image frame.
After the temporal layer of the B frame target image frame is determined, the filtering processing corresponding to that temporal layer can be determined accordingly, so that B frames of different temporal layers are filtered in a targeted manner, improving the filtering effect.
In one embodiment, when the target image frame is determined to be a B frame of a first preset temporal layer, the filtering processing corresponding to its temporal layer includes: deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering, with the image frame output by the neural network filtering used as the reference frame. The first preset temporal layer is, for example, the first temporal layer (B1 target image frames); in other embodiments it may be another temporal layer. Using the image frame output by the neural network filtering as the reference frame means that once the filtering for such a B frame finishes, the final filtered image frame is stored as the reference frame.
In one embodiment, when the target image frame is determined to be a B frame of a second preset temporal layer, the filtering processing corresponding to its temporal layer includes: deblocking filtering, neural network filtering, sample adaptive compensation filtering, and adaptive loop filtering, with the image frame output by the adaptive loop filtering used as the reference frame. The second preset temporal layer is, for example, the second temporal layer (B2 target image frames); in other embodiments it may be another temporal layer. Here, once the filtering for such a B frame finishes, the image frame output by the last sub-step, the adaptive loop filtering, is stored as the reference frame.
In one embodiment, when the target image frame is determined to be a B frame of a third preset temporal layer, the filtering processing corresponding to its temporal layer includes: deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering, with the image frame output by the adaptive loop filtering used as the reference frame. The third preset temporal layer is, for example, the third temporal layer (B3 target image frames); in other embodiments it may be another temporal layer.
In this embodiment, storing the filtered image frame as the reference frame can itself be regarded as one of the steps of the filtering processing mode.
Thus, by determining a specific filtering processing mode for B frames of different temporal layers, the B frame target image frames of each temporal layer can be filtered in a targeted manner, improving the filtering effect.
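Continuing the sketch above (same placeholder sub-steps), the temporal-layer-specific pipelines and their reference-frame tap points could be tabulated as follows; mapping layers 1 to 3 to the three preset layers is an assumption taken from the examples in the text:

```python
# For each temporal layer: the ordered sub-steps and the name of the sub-step
# whose output is stored as the reference frame.
B_LAYER_PIPELINES = {
    1: ([("dbf", dbf), ("sao", sao), ("alf", alf), ("nn", nn_filter)], "nn"),
    2: ([("dbf", dbf), ("nn", nn_filter), ("sao", sao), ("alf", alf)], "alf"),
    3: ([("dbf", dbf), ("sao", sao), ("alf", alf), ("nn", nn_filter)], "alf"),
}

def filter_b_frame(frame, temporal_layer):
    steps, ref_after = B_LAYER_PIPELINES[temporal_layer]
    reference = None
    for name, step in steps:
        frame = step(frame)
        if name == ref_after:
            reference = frame  # snapshot kept for the reference picture buffer
    # For layer 1 the final NN output doubles as the reference; for layer 3
    # the ALF output is the reference while the NN output is the output image.
    return frame, reference
```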
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a filtering processing method of a video frame according to an embodiment of the present invention. In fig. 3, I is an I frame, B is a B frame, where B1 is a B frame of the first temporal layer, B2 is a B frame of the second temporal layer, and B3 is a B frame of the third temporal layer. The video frame in the solid frame is a target image frame, and the video frame in the dashed frame is a filtered video frame serving as a reference frame. Specifically, in fig. 3, SAO is sample adaptive compensation, NN is neural network filtering, ALF is adaptive loop filtering, and DBF is deblocking filtering. In addition, in fig. 3, a broken line indicates a reference relationship between frames, and a broken-line arrow points to a reference frame.
In an embodiment, the step of filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode specifically includes: performing neural network filtering with the target image frame and its reference frame as input, so that the reference frame is used in filtering the pixel values of the target image frame. That is, both the target image frame and its reference frame are fed into the neural network; the network extracts feature information from both and outputs the filtered target image frame, thereby filtering the pixel values of the target image frame with the help of the reference frame.
Specifically, the pixel values of the target image frame and of its reference frame may be arranged as feature-channel information, which is then fed into the neural network for neural network filtering.
In one embodiment, a P frame target image frame and its reference frame may be used as the input of the neural network, which extracts feature information from both and outputs the filtered P frame target image frame. In another embodiment, a B frame target image frame and its reference frames may be used as the input, with the network likewise extracting their feature information and outputting the filtered B frame target image frame.
Thus, using the target image frame and its reference frame as input allows the temporal information contained in the reference frame to be exploited during neural network filtering, improving the filtering effect.
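A minimal sketch of this channel stacking, assuming YUV planes already at a common resolution (subsampled chroma would need upsampling first):

```python
import torch

def make_nn_input(target: torch.Tensor, refs: list[torch.Tensor]) -> torch.Tensor:
    """Stack the target frame and its reference frames as input channels.

    Each tensor is (3, H, W): one luminance and two chrominance planes.
    The result has 3 * (1 + len(refs)) channels, letting the network exploit
    the temporal information carried by the reference frames.
    """
    return torch.cat([target] + refs, dim=0).unsqueeze(0)  # (1, C, H, W)
```

The naive ordering here (target first) is refined by the channel-ordering rules of the following embodiment.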
Referring to fig. 4, fig. 4 is a schematic flowchart of a second embodiment of the filtering method for video frames according to the present application. In this embodiment, performing neural network filtering with the target image frame and its reference frame as input specifically includes steps S21 and S22.
Step S21: determining the number of I frames among the reference frames of a B frame target image frame, i.e., a target image frame whose frame type is B frame.
A B frame target image frame has two reference frames. The number of I frames among them can therefore be determined first, so that targeted neural network filtering can be applied: the reference frames of a B frame target image frame may contain two I frames, one I frame, or no I frame.
Step S22: taking B frame target image frames containing different numbers of I frame reference frames, together with their reference frames, as input, and filtering each B frame target image frame with the neural network filtering processing mode corresponding to the number of I frames among its reference frames.
In this embodiment, different neural network filtering processing can be applied depending on the number of I frames among the reference frames of the B frame target image frame; that is, B frame target image frames whose references contain different numbers of I frames are filtered by different neural networks. In the present application, different neural networks may differ in structure; share the same structure but differ in network parameters; or differ in both structure and parameters.
Specifically, one neural network filtering processing mode may be determined for B frame target image frames with one I frame among the references, another for those with two, and another for those with none, so that B frames with different numbers of I frame references are filtered in a targeted manner.
In one embodiment, when the reference frames are determined to include an I frame, the pixel values of that I frame may be used as the channel values of a first preset channel for the neural network filtering processing. The number of channels in the first preset channel corresponds to the number of pixel-value types of the I frame: if the pixel values of the I frame comprise a luminance component and two chrominance components, the first preset channel has 3 channels; if only a luminance component, it has 1 channel. The first preset channel may be any designated set of channels, e.g., the first, second, and third channels, or the last three channels. In one example, after the pixel values of the I frame are used as the values of the first three channels, the pixel values of the target image frame follow as the next channel values, and the pixel values of the other reference frame fill the remaining channels. For example, suppose the reference frames of a B frame target image frame are one I frame and one B frame, and the pixel values of each frame consist of one luminance component value and two chrominance component values. Then the luminance and two chrominance values of the I frame reference may occupy channels one to three, those of the B frame target image frame channels four to six, and those of the B frame reference channels seven to nine.
As another example, if the pixel values of the I frame reference are a luminance component value and two chrominance component values, these become the first, second, and third channel values respectively; if only the luminance component of the reference frame is needed, its luminance component value becomes the first channel value.
The reference frames include one or two I frames. When there is one I frame, its pixel values serve as the channel values of the first preset channel of the channel information input into the neural network. When there are two, the pixel values of the earlier I frame (in time order) serve as the first preset channel values, and the pixel values of the other I frame serve as the last channel values.
In a specific embodiment, when the reference frames are determined to all be B frames, the pixel values of the B frame reference in the lowest temporal layer are used as the channel values of a second preset channel for the neural network filtering processing. The number of channels in the second preset channel corresponds to the number of pixel-value types of the B frame; its placement follows the same conventions as the first preset channel described above and is not restated here. If the B frame references lie in the same temporal layer, the earlier B frame may be used for the second preset channel. The second preset channel is not specifically limited in this application.
In one embodiment, when the reference frames are determined to be a B frame and a P frame, the pixel values of the P frame reference may be used as the leading channel values for the neural network filtering processing.
In one embodiment, for a target image frame whose frame type is P frame, the pixel values of its reference frame may be used as the leading channel values.
Thus, when the target image frame and its reference frame are used as input for neural network filtering, the channel information fed to the network is determined by these specific rules, improving the neural network's filtering effect.
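The ordering rules above could be sketched as follows; the record fields and the comment about per-I-count networks are illustrative assumptions:

```python
def order_nn_input_frames(target, refs):
    """Order frames for channel stacking per the rules above.

    `refs` is a list of dicts {"frame": ..., "type": "I"/"P"/"B",
    "layer": temporal layer, "poc": time order}. Returns the frame whose
    planes lead, then the target, then the remaining references (with two
    I references, the later I frame ends up last, as described)."""
    i_refs = sorted((r for r in refs if r["type"] == "I"), key=lambda r: r["poc"])
    p_refs = [r for r in refs if r["type"] == "P"]
    if i_refs:        # at least one I reference: the earliest I frame leads
        lead = i_refs[0]
        rest = i_refs[1:] + [r for r in refs if r["type"] != "I"]
    elif p_refs:      # B + P references: the P frame leads
        lead = p_refs[0]
        rest = [r for r in refs if r is not lead]
    else:             # all-B references: lowest temporal layer, earliest first
        ordered = sorted(refs, key=lambda r: (r["layer"], r["poc"]))
        lead, rest = ordered[0], ordered[1:]
    return [lead["frame"], target] + [r["frame"] for r in rest]

# Step S22 would additionally pick the network by I-frame count, e.g.:
# net = NETS_BY_I_COUNT[len([r for r in refs if r["type"] == "I"])]
```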
Referring to fig. 5, fig. 5 is a schematic diagram of a target image frame and its corresponding reference frames used as input in an embodiment of the filtering method for video frames of the present application. In fig. 5, the video frames in the solid boxes are target image frames, where I is an I frame and B is a B frame; B1 is a B frame of the first temporal layer, B2 of the second, and B3 of the third, and the number in parentheses on each image frame indicates its coding order. The video frames in the dashed boxes are frames that serve as reference frames after filtering. NN1, NN2, and NN3 are three different neural network filtering processes. In fig. 5, for the B3(5) target image frame, the corresponding neural network filtering process is NN2, whose input includes the I(1) target image frame, the B3(5) target image frame, and the B2(4) target image frame, where the I(1) and B2(4) target image frames are the reference frames of the B3(5) target image frame.
In an embodiment, filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode may specifically include: filtering the luminance component and the chrominance components of the pixels of the target image frame in the same neural network filtering process; or filtering them in different neural network filtering processes.
In this embodiment, filtering the luminance and chrominance components in the same neural network filtering process means filtering them with one neural network. If the input to the network also includes the reference frame, that single network filters the luminance and chrominance components of the pixels of both the target image frame and its reference frame. Filtering all components in one process thus realizes filtering of the target image frame as a whole.
Filtering the luminance and chrominance components in different neural network filtering processes means filtering them with different neural networks; again, if the reference frame is part of the input, the different networks filter the components of both the target image frame and its reference frame. Filtering the components separately allows targeted filtering of the luminance and chrominance components, improving the filtering effect.
In an embodiment, filtering the luminance and chrominance components in different neural network filtering processes specifically includes: filtering the luminance component and the two chrominance components of the pixels of the target image frame in three separate neural network filtering processes. This realizes individual filtering of each of the three components, increasing the pertinence of the filtering and helping improve the filtering effect.
Referring to fig. 6, fig. 6 is a flowchart illustrating a third embodiment of the filtering method for video frames according to the present application. In an embodiment, filtering the luminance component and the two chrominance components of the pixels of the target image frame in three different neural network filtering processes specifically includes steps S31 to S33.
Step S31: performing a first neural network filtering process on the luminance component of the pixels of the target image frame, with a chrominance component also taken as input, so that the luminance component is filtered with the help of the chrominance component.
In this embodiment, when the first neural network filtering process filters the luminance component, the luminance and chrominance components are input to the network together, so the network can use the feature information of the chrominance component when filtering the luminance component, improving the filtering effect. Both chrominance components may be used as input simultaneously, or only one of them.
Step S32: performing a second neural network filtering process on one chrominance component of the pixels of the target image frame, with the luminance component also taken as input, so that this chrominance component is filtered with the help of the luminance component.
Here the luminance component is input to the network together with the chrominance component, so the network can use the feature information of the luminance component when filtering that chrominance component, improving the filtering effect.
Step S33: performing a third neural network filtering process on the other chrominance component of the pixels of the target image frame, with the luminance component also taken as input, so that this chrominance component is likewise filtered with the help of the luminance component.
As in step S32, inputting the luminance component alongside the other chrominance component lets the network exploit the luminance feature information when filtering it, improving the filtering effect.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating filtering of the luminance component and the two chrominance components of the pixels of a target image frame according to an embodiment of the filtering method for video frames of the present application. In fig. 7, Y denotes the luminance component of the target image frame, and U and V denote its two chrominance components. In part (a) of fig. 7, the U and V components are upsampled by deconvolution so that their images match the size of the luminance-component image; Y, U, and V are then input to the residual network ResNet to obtain the filtered Y component, so the Y component is filtered with the help of the U and V components. Parts (b) and (c) of fig. 7 are analogous and not described again.
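A sketch of the part (a) arrangement, assuming 4:2:0 input and approximating the figure's deconvolution-based upsampling with interpolation; `luma_net` is a placeholder residual network mapping 3 input channels to 1 filtered Y plane:

```python
import torch
import torch.nn.functional as F

def filter_luma_with_chroma(y, u, v, luma_net):
    """Filter Y using U and V as side information (fig. 7, part (a)).

    y: (1, 1, H, W); u, v: (1, 1, H/2, W/2). The chroma planes are upsampled
    to luma resolution, concatenated with Y, and passed to the network."""
    u_up = F.interpolate(u, size=y.shape[-2:], mode="bilinear", align_corners=False)
    v_up = F.interpolate(v, size=y.shape[-2:], mode="bilinear", align_corners=False)
    return luma_net(torch.cat([y, u_up, v_up], dim=1))
```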
Referring to fig. 8, fig. 8 is a flowchart illustrating a fourth embodiment of the filtering method for video frames according to the present application. In this embodiment, filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode includes, for each target image frame, steps S41 and S42.
Step S41: acquiring the quantization parameters of the plurality of quantization blocks contained in the current frame corresponding to the target image frame.
In this embodiment, when the current frame is quantized it is first segmented, yielding a plurality of blocks used for quantization, called quantization blocks. To further improve the filtering effect, the quantization parameters of the quantization blocks contained in the current frame corresponding to the target image frame may be acquired, and targeted filtering then performed according to the differences between those quantization parameters.
Step S42: performing, on the blocks of the target image frame corresponding to quantization blocks whose quantization parameters fall in different preset ranges, the neural network filtering corresponding to each preset range.
The block of the target image frame corresponding to a quantization block is the block at the same position as that quantization block. The preset ranges may be set as needed. For example, with quantization parameters spanning [0,51], the range may be divided into preset sub-ranges such as [0,24], [25,29], [30,34], [35,39], and [40,51]. Based on the quantization parameter of each quantization block, the corresponding block in the target image frame is then filtered with the neural network filtering associated with the matching preset range. For instance, if the quantization parameter of a certain quantization block is 45, it belongs to the preset range [40,51]; the corresponding block in the target image frame is then identified and filtered with the neural network filtering corresponding to [40,51].
Specifically, one corresponding neural network filtering processing mode may be set for each preset range, or for only some of the preset ranges, as needed; this is not limited herein.
Thus, acquiring the quantization parameters of the quantization blocks contained in the current frame corresponding to the target image frame, and then filtering the corresponding blocks of the target image frame in a targeted manner according to the differences between those parameters, improves the filtering effect.
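A sketch of this per-block network selection, using the example ranges above; the identity functions are placeholders for trained networks:

```python
QP_RANGES = [(0, 24), (25, 29), (30, 34), (35, 39), (40, 51)]
NETS_BY_RANGE = {r: (lambda block: block) for r in QP_RANGES}  # placeholders

def net_for_qp(qp: int):
    for lo, hi in QP_RANGES:
        if lo <= qp <= hi:
            return NETS_BY_RANGE[(lo, hi)]
    raise ValueError(f"QP {qp} outside [0, 51]")

def filter_blocks_by_qp(blocks, qps):
    """Steps S41-S42: filter each co-located block of the target image frame
    with the network matching its quantization block's quantization parameter."""
    return [net_for_qp(qp)(block) for block, qp in zip(blocks, qps)]
```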
In an embodiment, for each quantization block, performing the neural network filtering corresponding to the preset range on the corresponding block of the target image frame may specifically include: performing neural network filtering separately on the luminance component and the chrominance components of the pixels of that block. For example, the luminance component and the two chrominance components of the block corresponding to a certain quantization block may each be neural-network filtered separately.
In a specific embodiment, the luminance components of the blocks corresponding to quantization blocks in different preset ranges may each receive the corresponding neural network filtering, while the chrominance components of the blocks corresponding to all quantization blocks are filtered with one common neural network filtering processing mode; alternatively, the two chrominance components of the blocks corresponding to all quantization blocks may each be filtered with their own neural network filtering processing mode.
Therefore, on the basis of carrying out targeted filtering on the corresponding block of the quantization block in the target image frame, different components of the corresponding block of the quantization block in the target image frame can be further filtered, so as to improve the filtering effect.
Referring to fig. 9, fig. 9 is a flowchart illustrating a fifth embodiment of the filtering method for video frames according to the present application. In this embodiment, the step of filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode specifically includes steps S51 and S52.
Step S51: segmenting the target image frame into a plurality of target blocks.
The segmentation of the target image frame may use any method common in the art; a target block is, for example, a coding unit (CU).
In one embodiment, the target image frame may be segmented into a number of target blocks each containing a splicing region, the splicing region being, for example, a coding unit. The size of the target block is not smaller than that of its splicing region; for example, it is larger. When the target block is larger than its splicing region, the subsequent neural network filtering can use pixel-value information from pixels outside the splicing region, improving the filtering effect. In one embodiment, if the splicing region lies at the edge of the target image frame, the target block may be obtained by padding around the splicing region.
Step S52: filtering the pixel values of the pixels of the plurality of target blocks of the target image frame using the corresponding filtering processing mode to obtain filtered target blocks.
The corresponding filtering processing mode is the one matching the frame type of the target image frame. In this embodiment, filtering the pixel values of the pixels of the individual target blocks yields the filtered target blocks and thereby realizes the filtering of the whole target image frame.
In one embodiment, after this step, the filtering method for video frames of the present application further includes: obtaining the filtered target image frame from the filtered target blocks. Since the filtered target blocks result from filtering the blocks into which the target image frame was segmented, they can be spliced back together correspondingly to obtain the filtered target image frame.
In a specific embodiment, when the target image frame was segmented into target blocks containing splicing regions, the splicing regions of the filtered target blocks are used for the splicing, yielding the filtered target image frame.
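A sketch of this split-filter-stitch procedure on a single plane, with assumed block and margin sizes; `nn` is any shape-preserving filter:

```python
import numpy as np

def filter_in_blocks(plane: np.ndarray, nn, core: int = 128, margin: int = 8) -> np.ndarray:
    """Steps S51-S52: each `core x core` splicing region is filtered together
    with `margin` surrounding pixels so the network sees context beyond the
    region; frame borders are padded by edge replication, and only each
    block's splicing region is kept when stitching the output together."""
    h, w = plane.shape
    padded = np.pad(plane, margin, mode="edge")
    out = np.empty_like(plane)
    for y in range(0, h, core):
        for x in range(0, w, core):
            ch, cw = min(core, h - y), min(core, w - x)
            block = padded[y:y + ch + 2 * margin, x:x + cw + 2 * margin]
            filtered = nn(block)
            out[y:y + ch, x:x + cw] = filtered[margin:margin + ch, margin:margin + cw]
    return out
```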
It should be understood that the filtering methods described in the foregoing embodiments are only exemplary, and the methods of the different embodiments may be combined with one another to obtain new filtering schemes; the application is not limited in this respect.
In one embodiment, the filtering method for video frames of the present application may further include the following steps 1 to 3, which train the neural network used for the neural network filtering.
Step 1: acquiring a target image frame and determining the filtering processing mode corresponding to its frame type.
For details of obtaining the target image frame, refer to the related description of the above embodiments, which is not repeated here.
In this embodiment, the filtering processing mode includes a plurality of filtering sub-steps performed in sequence, the sub-steps include neural network filtering, and the filtering processing modes corresponding to at least two different frame types differ. For the specific determination of the filtering processing mode corresponding to the frame type, refer to the related description of the above embodiments, which is not repeated here.
Step 2: filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing mode to obtain a filtered target image frame.
Please refer to the related description of the above embodiments, which is not repeated herein.
Step 3: adjusting the network parameters of the neural network used for the neural network filtering based on the difference between the current frame corresponding to the target image frame and the filtered target image frame.
In this embodiment, the current frame corresponding to the target image frame is the original, un-encoded video frame. Specifically, the neural network used for neural network filtering in the filtering processing mode corresponding to I frames may be trained on a training set composed of I frames, and the networks used in the modes corresponding to P frames or B frames may be trained on training sets composed of P frames or B frames, respectively.
In one embodiment, the neural networks for filtering target image frames of different frame types may be trained in an order that follows the reference relationships between frame types. Taking the filtering scheme of fig. 3 as an example: since the I frame is an intra-predicted frame, the network for neural-network filtering of I frame target image frames is trained first, with the image obtained after ALF filtering used as the reference frame. Next, the network for B frame target image frames of temporal layer 1 is trained, with the image obtained after NN filtering used as the reference frame. Then the network for B frame target image frames of temporal layer 2 is trained, with the image obtained after ALF filtering used as the reference frame. Finally, the network for B frame target image frames of temporal layer 3 is trained. Training the I frame network first gives subsequent target image frames that reference the I frame a higher-quality reference, reducing the error introduced when the I frame serves as a reference and improving the training effect. Likewise, training the layer-1 network second and the layer-2 network third gives target image frames that reference B1 or B2 frames higher-quality references, again reducing reference-induced error and improving the training effect.
In one embodiment, the network for neural-network filtering of P frame target image frames may be trained after the network for I frame target image frames. This gives target image frames that reference the P frame higher picture quality, reducing the error introduced when the P frame serves as a reference and improving the training effect.
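A single update of step 3 might look like the following sketch; the L2 loss is an assumption, since the text only specifies training on the difference between the filtered frame and the original:

```python
import torch.nn.functional as F

def training_step(nn_filter, optimizer, target_frame, original_frame):
    """Shrink the difference between the filtered target image frame and the
    un-encoded original frame by one gradient step."""
    optimizer.zero_grad()
    loss = F.mse_loss(nn_filter(target_frame), original_frame)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Per the staged order described above, this step would be run to convergence first for the I-frame network, then for the layer-1, layer-2, and layer-3 B-frame networks in turn.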
Referring to fig. 10, fig. 10 is a schematic flowchart of an embodiment of a method for encoding a video frame according to the present application. In this embodiment, the method for encoding a video frame specifically includes:
Step S61: obtaining the target image frame of the current frame by using the residual corresponding to the current frame.
The current frame may be an original video frame to be encoded, and the residual corresponding to the current frame may be obtained using a video frame prediction method commonly used in the art. For example, after the current frame is predicted using an intra-frame prediction method, a predicted value is obtained, and the residual corresponding to the current frame is obtained by subtracting the predicted value from the true value of the current frame. A quantized residual is obtained by transforming and quantizing the residual, and the target image frame is obtained based on the quantized residual and the predicted value of the current frame.
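As a rough illustration of this reconstruction path, the following sketch replaces the codec's block transform and quantization with a scalar quantizer; the function name and quantization step are assumptions:

```python
import numpy as np

def reconstruct_target_frame(current, predicted, qstep=16.0):
    """Obtain the target image frame from the quantized residual and the
    predicted value (scalar quantizer as an illustrative stand-in for
    transform + quantization)."""
    residual = current.astype(np.float32) - predicted.astype(np.float32)
    quantized = np.round(residual / qstep)   # "transform" + quantize
    dequantized = quantized * qstep          # inverse quantization
    target = np.clip(predicted.astype(np.float32) + dequantized, 0, 255)
    return target.astype(np.uint8)
```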
Step S62: executing the video frame filtering method on the target image frame to obtain a filtered target image frame.
In this embodiment, the filtering method of the video frame may be the method mentioned in the above embodiment of the filtering method of the video frame, and the specific reconstruction process is described with reference to the above embodiment, which is not described herein again.
In one embodiment, the filtering method of the video frame may be performed on the luminance component and the two chrominance components of the target image frame respectively, and the filtered components may then be combined to obtain the filtered target image frame.
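A minimal sketch of this per-component processing, assuming YUV planes and one filter callable per component; all names are illustrative:

```python
def filter_by_component(y, u, v, filter_y, filter_u, filter_v):
    """Filter the luma plane and the two chroma planes separately, then
    return the recombined (Y, U, V) frame."""
    return filter_y(y), filter_u(u), filter_v(v)
```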
Step S63: encoding the current frame based on the filtered target image frame.
After the filtered target image frame is obtained, the current frame can be encoded using an encoding method commonly used in the field. For example, a syntax switch related to the filtering method of the video frame may be encoded, so that the decoding side can determine, based on this syntax switch, whether the filtering method of the video frame needs to be performed.
In one embodiment, a first switch for neural network filtering processing may be set for the current sequence to which the current frame belongs. When the first switch corresponding to the current sequence is on, the video frame sequence may execute the video frame filtering method mentioned in the above embodiments; when it is off, the video frame sequence does not execute that filtering method. Accordingly, in one embodiment, step S62 is executed when the first switch is on. By setting the neural network filtering processing switch for the current sequence to which the current frame belongs, whether the filtering method of the video frame is performed for the current sequence can be controlled.
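As an illustration of this sequence-level gating, a minimal sketch follows; the function and parameter names are placeholders, not identifiers from this disclosure:

```python
def maybe_filter_frame(target_frame, first_switch_on, run_filtering):
    """Step S62 runs only when the sequence-level first switch is on."""
    if not first_switch_on:
        return target_frame          # skip neural network filtering
    return run_filtering(target_frame)
```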
In one embodiment, encoding the current frame based on the filtered target image frame includes steps S631 to S633.
Step S631: comparing the image quality of the target image frame after filtering with that of the target image frame before filtering.
The method for comparing the image quality of the target image frame after filtering with that of the target image frame before filtering may be an image quality comparison method commonly used in the art, for example, calculating the rate-distortion costs corresponding to the target image frame after filtering and the target image frame before filtering and then comparing them.
In the case where the filtering method of the video frame is performed on the luminance component and the two chrominance components of the target image frame, the quality of each component of the target image frame after filtering and the quality of the corresponding component of the target image frame before filtering can be compared.
Step S632 may be performed if the filtered target image frame is better than the target image frame before filtering, and step S633 may be performed if the filtered target image frame is worse than the target image frame before filtering.
Step S632: setting the neural network filtering processing switch of the current frame to on.
That the filtered target image frame is better than the target image frame before filtering means that filtering the target image frame using the filtering method of the video frame improves image quality; the neural network filtering processing switch of the current frame can therefore be set to on, so that the decoding side performs the filtering method of the video frame on the current frame based on this switch.
In a case where the filtering method of the video frame is performed on the luminance component and the two chrominance components of the target image frame, if any component of the filtered target image frame is better than the corresponding component of the target image frame before filtering, the neural network filtering processing switch corresponding to that component of the current frame may be set to on. For example, if the luminance component of the filtered target image frame is better than the luminance component of the target image frame before filtering, the neural network filter processing switch corresponding to the luminance component of the current frame may be set to on.
Step S633: setting the neural network filtering processing switch of the current frame to off.
That the filtered target image frame is inferior to the target image frame before filtering means that it is not necessary to filter the target image frame using the filtering method of the video frame. Accordingly, the neural network filtering processing switch of the current frame may be set to off, so that the decoding side does not perform the filtering method of the video frame on the current frame based on this switch.
In a case where the filtering method of the video frame is performed on the luminance component and the two chrominance components of the target image frame, if any component of the filtered target image frame is inferior to the corresponding component of the target image frame before filtering, the neural network filtering processing switch corresponding to that component of the current frame may be set to off. For example, if the luminance component of the filtered target image frame is inferior to the luminance component of the target image frame before filtering, the neural network filter processing switch corresponding to the luminance component of the current frame may be set to off.
In this way, by comparing the image quality of the filtered target image frame with that of the target image frame before filtering, whether the filtering method of the video frame needs to be executed on the target image frame can be determined and controlled.
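For illustration, the frame-level decision of steps S631 to S633 might be sketched as follows, using a simple mean-squared-error rate-distortion cost; the scalar cost model and the lambda value are assumptions rather than the codec's actual measure:

```python
import numpy as np

def rd_cost(recon, original, bits, lam=0.1):
    """Distortion (MSE against the original frame) plus weighted rate."""
    distortion = float(np.mean((recon.astype(np.float64)
                                - original.astype(np.float64)) ** 2))
    return distortion + lam * bits

def frame_switch_on(filtered, unfiltered, original,
                    bits_filtered, bits_unfiltered):
    # Set the frame-level switch to on only when filtering lowers the
    # rate-distortion cost (steps S632 / S633).
    return (rd_cost(filtered, original, bits_filtered)
            < rd_cost(unfiltered, original, bits_unfiltered))
```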
In an embodiment, the step of performing the video frame filtering method on the target image frame to obtain a filtered target image frame specifically includes: performing the video frame filtering method on a number of target blocks obtained by partitioning the target image frame to obtain the filtered target image frame. In this embodiment, the target block is, for example, a coding unit; in other embodiments, the target block may be set as needed. The video frame filtering method is performed on each target block, and the filtered target blocks are spliced to obtain the filtered target image frame.
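A sketch of the block partitioning and splicing, assuming square blocks and single-plane frames; the block size of 64 is an assumption, with coding units mentioned above as one concrete choice:

```python
import numpy as np

def split_into_blocks(frame, bs=64):
    """Partition a (H, W) frame into row-major blocks of at most bs x bs."""
    h, w = frame.shape[:2]
    return [frame[r:r + bs, c:c + bs]
            for r in range(0, h, bs) for c in range(0, w, bs)]

def splice_blocks(blocks, h, w, bs=64):
    """Reassemble blocks produced by split_into_blocks into one frame."""
    out = np.zeros((h, w), dtype=blocks[0].dtype)
    i = 0
    for r in range(0, h, bs):
        for c in range(0, w, bs):
            blk = blocks[i]
            out[r:r + blk.shape[0], c:c + blk.shape[1]] = blk
            i += 1
    return out
```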
In this case, the "encoding the current frame based on the filtered target image frame" mentioned in the above embodiments includes:
step S634: and comparing the image quality of each filtered target block with that of the target block before filtering.
The specific comparison method may refer to the description of step S631, and is not described herein again.
Step S635 may be performed if the filtered target block is better than the target block before filtering, and step S636 may be performed if the filtered target block is worse than the target block before filtering.
Step S635: setting the neural network filtering processing switch corresponding to the target block to on.
The target block after filtering is better than the target block before filtering, which means that the target block can be filtered using a filtering method of a video frame to improve image quality. Therefore, the neural network filter processing switch corresponding to the target block can be set to be on, so that the decoding side can execute the filtering method of the video frame on the target block based on the neural network filter processing switch of the target block.
Step S636: setting the neural network filtering processing switch corresponding to the target block to off.
The target block after filtering is inferior to the target block before filtering, meaning that the target block does not have to be filtered using the filtering method of the video frame. Therefore, the neural network filter processing switch corresponding to the target block can be set to be off, so that the decoding side can not perform the filtering method of the video frame on the target block based on the neural network filter processing switch of the target block.
Step S637: determining whether there is at least one target block for which the corresponding neural network filter processing switch is set to on.
For the target image frame, if there is at least one target block whose corresponding neural network filter processing switch is set to on, it means that the filtering method of the video frame can be performed on the target block whose corresponding neural network filter processing switch is set to on in the target image frame. Therefore, it is possible to determine in advance whether or not there is at least one target block whose corresponding neural network filter processing switch is set to on.
If there is at least one target block whose corresponding neural network filter processing switch is set to on, step S638 is performed; if there is no such target block, step S639 is performed.
Step S638: setting the neural network filtering processing switch of the current frame to on.
Since there is at least one target block whose corresponding neural network filter processing switch is set to on, it indicates that the filtering method of the video frame can be performed on the target block whose corresponding neural network filter processing switch is set to on in the target image frame, so as to improve the image quality. Therefore, the neural network filter processing switch of the current frame can be correspondingly set to be on, so that the decoding side can execute the filtering method of the video frame on the target block based on the neural network filter processing switch of the current frame and the neural network filter processing switch corresponding to the target block.
Step S639: setting the neural network filtering processing switch of the current frame to off.
Since no target block has its corresponding neural network filter processing switch set to on, it is not necessary to perform the filtering method of the video frame on any target block in the target image frame; the neural network filter processing switch of the current frame may therefore be set to off, so that the decoding side does not perform the filtering method of the video frame on the current frame.
In this way, by determining the neural network filtering processing switch corresponding to each target block, whether each target block needs to execute the filtering method of the video frame can be determined and controlled.
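A compact sketch of the block-level logic of steps S634 to S639, with the quality comparison of step S634 injected as a callback; all names are illustrative:

```python
def decide_block_switches(filtered_blocks, unfiltered_blocks, better):
    """better(a, b) -> True when block a has higher quality than block b.
    Returns the frame-level switch and the per-block switches."""
    block_switches = [better(f, u)
                      for f, u in zip(filtered_blocks, unfiltered_blocks)]
    # Steps S637-S639: the frame switch is on if at least one block
    # switch is on, and off otherwise.
    frame_switch = any(block_switches)
    return frame_switch, block_switches
```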
Referring to fig. 11, fig. 11 is a schematic flowchart illustrating an embodiment of a method for decoding a video frame according to the present application. In this embodiment, the method for decoding a video frame specifically includes:
Step S71: obtaining the target image frame of the current frame by using the residual corresponding to the current frame.
The residual corresponding to the current frame is obtained, for example, by decoding the code stream transmitted from the encoding side. After inverse quantization and inverse transformation of the residual corresponding to the current frame, the target image frame corresponding to the current frame is obtained by combining the result with the predicted value of the current frame.
Step S72: executing the video frame filtering method on the target image frame to obtain a filtered target image frame.
In this embodiment, the filtering method of the video frame may be the method mentioned in the above embodiment of the filtering method of the video frame, and the specific reconstruction process is described with reference to the above embodiment, which is not described herein again.
In a specific embodiment, after the code stream transmitted from the encoding side is decoded, the state of the neural network filtering processing switch of the current frame can be obtained, and when the neural network filtering processing switch of the current frame is on, the filtering method of the video frame is performed on the target image frame to obtain the filtered target image frame.
In a specific embodiment, after the code stream transmitted from the encoding side is decoded, the neural network filter processing switch of the current frame and the neural network filter processing switches corresponding to the target blocks obtained by partitioning the target image frame corresponding to the current frame can be obtained. When the neural network filter processing switch of the current frame is on, the filtering method of the video frame is performed on each target block whose corresponding switch is on. Finally, all target blocks are spliced to obtain the filtered target image frame.
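The decoder-side behaviour just described might be sketched as follows, where nn_filter and splice are hypothetical helpers standing in for the neural network filtering and the block splicing:

```python
def decode_side_filter(blocks, frame_switch, block_switches,
                       nn_filter, splice):
    """Apply neural network filtering only to blocks whose switch is on,
    then splice the blocks into the filtered target image frame."""
    if not frame_switch:
        return splice(blocks)    # frame-level switch off: no filtering
    filtered = [nn_filter(b) if on else b
                for b, on in zip(blocks, block_switches)]
    return splice(filtered)
```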
Therefore, through the above-described decoding method, the filtering method of the video frame can be performed on the decoding side.
Referring to fig. 12, fig. 12 is a block diagram of an embodiment of an encoder of the present application. The encoder 120 includes a memory 121 and a processor 122 coupled to each other, and the processor 122 is configured to execute program instructions stored in the memory 121 to implement the steps of any of the above-described embodiments of the video frame encoding method. In one particular implementation scenario, the encoder 120 may include, but is not limited to, a mobile device such as a notebook computer or a tablet computer, which is not limited herein. In particular, the processor 122 is configured to control itself and the memory 121 to implement the steps of any of the encoding method embodiments.
Referring to fig. 13, fig. 13 is a block diagram of an embodiment of a decoder according to the present application. The decoder 130 comprises a memory 131 and a processor 132 coupled to each other, and the processor 132 is configured to execute program instructions stored in the memory 131 to implement the steps of any of the embodiments of the method for decoding video frames described above. In one particular implementation scenario, the decoder 130 may include, but is not limited to, a mobile device such as a notebook computer or a tablet computer, which is not limited herein. In particular, the processor 132 is configured to control itself and the memory 131 to implement the steps of any of the decoding method embodiments.
The processor described above may also be referred to as a CPU (Central Processing Unit). The processor may be an integrated circuit chip having signal processing capabilities. The Processor may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processors may be collectively implemented by an integrated circuit chip.
Referring to fig. 14, fig. 14 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 140 stores program instructions 141 capable of being executed by a processor, the program instructions 141 for implementing the steps of any of the above-described embodiments of the filtering method for video frames, the steps of any of the embodiments of the encoding method, or the steps of any of the embodiments of the decoding method.
According to the above scheme, the filtering processing mode corresponding to the frame type of each target image frame is determined, and the filtering processing modes corresponding to at least two different frame types are constrained to be different, so that the pixel values of the pixel points of the target image frame are filtered with a filtering processing mode that better matches the frame type, which improves the pertinence of the filtering, the filtering effect, and the image quality of the target image frame. In addition, determining the filtering processing mode according to the frame type of the target image frame can reduce the complexity of the neural network used for neural network filtering of each target image frame.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and other divisions may be realized in practice, for example, the unit or component may be combined or integrated with another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.

Claims (25)

1. A method for filtering a video frame, comprising:
acquiring a target image frame;
determining a filtering processing mode corresponding to the frame type of the target image frame;
and filtering the pixel values of the pixel points of the target image frame by using the corresponding filtering processing modes, wherein the filtering processing modes comprise a plurality of filtering sub-steps which are sequentially performed, the plurality of filtering sub-steps comprise neural network filtering, and the filtering processing modes corresponding to at least two different frame types are different.
2. The method according to claim 1, wherein the determining a filtering processing manner corresponding to the frame type of the target image frame comprises:
among the filtering sub-steps, taking the filtering sub-steps performed before the filtered image frame is used as a reference frame as loop filtering;
determining that the loop filtering comprises neural network filtering when the frame type of the target image frame is an I frame, and determining that the loop filtering does not comprise neural network filtering when the frame type of the target image frame is a B frame; or,
determining that the loop filtering does not comprise neural network filtering when the frame type of the target image frame is an I frame, and determining that the loop filtering comprises neural network filtering when the frame type of the target image frame is a B frame.
3. The method according to claim 2, characterized in that, in the case where the loop filtering comprises neural network filtering, the loop filtering comprises in particular: neural network filtering, and at most two of deblocking filtering, sample adaptive compensation filtering, and adaptive loop filtering.
4. The method of claim 1,
when the frame type of the target image frame is determined to be an I frame, the corresponding filtering processing mode includes: sequentially carrying out sample adaptive compensation filtering, neural network filtering and adaptive loop filtering on pixel values of pixel points of the target image frame;
when the frame type of the target image frame is determined to be a B frame, the corresponding filtering processing mode includes: and sequentially carrying out deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering and neural network filtering on the pixel values of the pixel points of the target image frame.
5. The method according to claim 1, wherein the determining a filtering processing manner corresponding to the frame type of the target image frame comprises:
under the condition that the frame type of the target image frame is determined to be a B frame, determining a time domain layer of the B frame target image frame;
and determining the filtering processing corresponding to the time domain layer of the B frame target image frame.
6. The method of claim 5, wherein said determining a filtering process corresponding to a temporal layer of said B frame target image frame comprises:
in a case where the target image frame is determined to be a B frame of a first preset temporal layer, the filtering process corresponding to the temporal layer of the B frame target image frame includes: deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering and neural network filtering, and taking an image frame output by the neural network filtering as a reference frame;
in a case where the target image frame is determined to be a B frame of a second preset time domain layer, the filtering process corresponding to the time domain layer of the B frame target image frame includes: deblocking filtering, neural network filtering, sample adaptive compensation filtering and adaptive loop filtering, and taking an image frame output by the adaptive loop filtering as a reference frame;
in a case where the target image frame is determined to be a B frame of a third preset time domain layer, the filtering process corresponding to the time domain layer of the B frame target image frame includes: deblocking filtering, sample adaptive compensation filtering, adaptive loop filtering, and neural network filtering, and taking an image frame output by the adaptive loop filtering as a reference frame.
7. The method according to any one of claims 1 to 6, wherein the filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing manner includes:
dividing the target image frame into a plurality of target blocks;
filtering pixel values of pixel points of a plurality of target blocks of the target image frame by using the corresponding filtering processing mode to obtain a filtered target block;
after the filtering the pixel values of the pixel points of the plurality of target blocks of the target image frame by using the corresponding filtering processing mode to obtain the filtered target block, the method further comprises: and obtaining a filtered target image frame by using the filtered target block.
8. The method of claim 7,
the segmenting the target image frame into a plurality of target blocks comprises: dividing the target image frame into a plurality of target blocks comprising a splicing area;
the obtaining a filtered target image frame by using the filtered target block includes: and obtaining the filtered target image frame by using the splicing area in the filtered target block.
9. The method according to any one of claims 1 to 6, wherein the filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing manner includes:
and taking the target image frame and a reference frame of the target image frame as input to carry out neural network filtering so as to carry out neural network filtering on pixel values of the target image frame by utilizing the reference frame.
10. The method of claim 9, wherein the neural network filtering the target image frame and the reference frame of the target image frame as inputs comprises:
determining the number of I frames in a reference frame of a B frame target image frame with a frame type of a B frame;
the method comprises the steps of taking B frame target image frames containing different numbers of I frame reference frames and corresponding reference frames as input, and filtering the B frame target image frames by using a neural network filtering processing mode corresponding to the number of I frames in the reference frames of the B frame target image frames.
11. The method of claim 10, wherein said filtering the B frame target image frame using a neural network filtering process corresponding to the number of I frames in the reference frame of the B frame target image frame comprises:
under the condition that the reference frame comprises an I frame, taking the pixel value of one frame I frame as the channel value of a first preset channel to carry out neural network filtering processing;
and under the condition that the reference frames are all B frames, taking the pixel value of the B frame reference frame in the lowest time domain layer as the channel value of a second preset channel to carry out neural network filtering processing.
12. The method according to any one of claims 1 to 6, wherein the filtering the pixel values of the pixels of the target image frame using the corresponding filtering processing manner includes:
filtering the brightness component and the chrominance component of the pixel points of the target image frame in the same neural network filtering process; or,
and filtering the brightness component and the chrominance component of the pixel point of the target image frame in different neural network filtering processes.
13. The method of claim 12, wherein the filtering the luminance component and the chrominance components of the pixel points of the target image frame in different neural network filtering processes comprises: respectively filtering the luminance component and the two chrominance components of the pixel points of the target image frame in three different neural network filtering processes.
14. The method of claim 13, wherein the filtering the luminance component and the two chrominance components of the pixel points of the target image frame in three different neural network filtering processes, respectively, comprises:
performing a first neural network filtering process on a luminance component of a pixel point of the target image frame for filtering, and taking the chrominance component as an input of the first neural network filtering process to filter the luminance component with the chrominance component;
performing a second neural network filtering process on one chrominance component of the pixel points of the target image frame for filtering, and taking the luminance component and the other chrominance component as inputs of the second neural network filtering process to filter the one chrominance component with the luminance component and the other chrominance component;
performing a third neural network filtering process on the other chrominance component of the pixel points of the target image frame for filtering, and taking the luminance component and the one chrominance component as inputs of the third neural network filtering process to filter the other chrominance component with the luminance component and the one chrominance component.
15. The method according to any one of claims 1 to 6, wherein for each frame of the target image frame, the filtering the pixel values of the pixel points of the target image frame using the corresponding filtering processing manner includes:
obtaining quantization parameters of a plurality of quantization blocks contained in a current frame corresponding to the target image frame;
and performing, on the blocks in the target image frame that correspond to quantization blocks whose quantization parameters fall within different preset ranges, the neural network filtering corresponding to the respective preset range.
16. The method according to claim 15, wherein for each of the quantization blocks, the performing neural network filtering corresponding to a preset range on a corresponding block in the target image frame of the quantization block with a quantization parameter in a different preset range includes:
and respectively carrying out neural network filtering on the brightness component and the chrominance component of the pixel point of the block corresponding to the quantization block in the target image frame.
17. The method of claim 1, wherein the neural network filtering is performed using a neural network comprising skip connections.
18. A method for encoding a video frame, comprising:
obtaining a target image frame of the current frame by using a residual error corresponding to the current frame;
performing a video frame filtering method on the target image frame to obtain a filtered target image frame, wherein the video frame filtering method is the method according to any one of claims 1 to 15;
encoding the current frame based on the filtered target image frame.
19. The method of claim 18, further comprising: setting a first switch related to a neural network filtering processing switch for a current sequence to which the current frame belongs;
the method of filtering a video frame to obtain a filtered target image frame is performed with the first switch on,
the encoding the current frame based on the filtered target image frame includes:
comparing the image quality of the target image frame after filtering with that of the target image frame before filtering;
if the target image frame after filtering is better than the target image frame before filtering, setting a neural network filtering processing switch of the current frame to be on;
and if the target image frame after filtering is inferior to the target image frame before filtering, setting a neural network filtering processing switch of the current frame to be off.
20. The method of claim 19,
the method for performing filtering of a video frame on the target image frame to obtain a filtered target image frame includes: executing the filtering method of the video frame on the brightness component and the two chrominance components of the target image frame respectively to obtain the filtered target image frame;
the comparing the image quality of the filtered target image frame with the image quality of the target image frame before filtering comprises the following steps: comparing the quality of each component of the target image frame after filtering with the quality of the corresponding component of the target image frame before filtering;
if the target image frame after filtering is better than the target image frame before filtering, setting a neural network filtering processing switch of the current frame to be on, comprising: if any component of the target image frame after filtering is superior to the corresponding component of the target image frame before filtering, setting a neural network filtering processing switch corresponding to any component of the current frame to be on;
if the target image frame after filtering is inferior to the target image frame before filtering, setting a neural network filtering processing switch of the current frame to be off, including: and if any component of the target image frame after filtering is inferior to the corresponding component of the target image frame before filtering, setting a neural network filtering processing switch corresponding to any component of the current frame to be off.
21. The method according to claim 18, wherein said performing the video frame filtering method according to any one of claims 1-17 on the target image frame to obtain a filtered target image frame comprises:
performing the filtering method of the video frame according to any one of claims 1 to 17 on a number of target blocks obtained by slicing the target image frame to obtain a filtered target image frame,
the encoding the current frame based on the filtered target image frame includes:
comparing the image quality of each filtered target block with that of each target block before filtering;
if the target block after filtering is superior to the target block before filtering, setting a neural network filtering processing switch corresponding to the target block to be on;
if the target block after filtering is inferior to the target block before filtering, setting a neural network filtering processing switch corresponding to the target block to be off;
determining whether there is at least one target block whose corresponding neural network filter processing switch is set to on;
if yes, setting a neural network filtering processing switch of the current frame to be on;
and if not, setting the neural network filtering processing switch of the current frame to be off.
22. A method for decoding video frames, comprising:
obtaining a target image frame of the current frame by using a residual error corresponding to the current frame;
the method of filtering a video frame according to any one of claims 1-17 is performed on the target image frame to obtain a filtered target image frame.
23. An encoder, characterized in that the encoder comprises a processor and a memory coupled to each other, wherein the processor is configured to execute a computer program stored by the memory to implement the encoding method of any one of claims 18-21.
24. A decoder, comprising a processor and a memory coupled to each other, wherein the processor is configured to execute a computer program stored by the memory to implement the decoding method of claim 22.
25. A computer-readable storage medium, characterized in that a computer program is stored which can be run by a processor for implementing a method for filtering a video frame as claimed in any one of claims 1 to 17, or a method for encoding as claimed in any one of claims 18 to 21, or a method for decoding as claimed in claim 22.
CN202111162686.0A 2021-09-30 2021-09-30 Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame Pending CN114157869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162686.0A CN114157869A (en) 2021-09-30 2021-09-30 Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111162686.0A CN114157869A (en) 2021-09-30 2021-09-30 Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame

Publications (1)

Publication Number Publication Date
CN114157869A true CN114157869A (en) 2022-03-08

Family

ID=80462645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162686.0A Pending CN114157869A (en) 2021-09-30 2021-09-30 Filtering method, encoding and decoding method, encoder and decoder and storage medium for video frame

Country Status (1)

Country Link
CN (1) CN114157869A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024060791A1 (en) * 2022-09-19 2024-03-28 腾讯科技(深圳)有限公司 Multimedia data processing method and apparatus, and device, storage medium and program product
WO2024077573A1 (en) * 2022-10-13 2024-04-18 Oppo广东移动通信有限公司 Encoding and decoding methods, encoder, decoder, code stream, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination