CN113038126B - Multi-description video coding method and decoding method based on frame prediction neural network - Google Patents
- Publication number
- CN113038126B (application CN202110261181.3A)
- Authority
- CN
- China
- Prior art keywords
- odd
- sequence
- frame
- frame sequence
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/122—Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/184—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0117—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
- H04N7/012—Conversion between an interlaced and a progressive signal
Abstract
The invention discloses a multi-description video coding method and decoding method based on a frame prediction neural network. The input video is split by temporal down-sampling into an odd frame sequence and an even frame sequence, each of which is encoded by an HEVC encoder. To address the frame loss introduced by temporal down-sampling, a frame prediction neural network predicts the missing frames of each sequence. The predicted frames are subtracted from the encoded video frames of the corresponding sequence to obtain residual information, and this residual information together with the coded information of the current sequence forms one description. The code streams of the two descriptions are packed and transmitted to the decoding end over different channels. The multi-description video coding formed by this method gives the code stream a degree of error recovery capability, and the decoding end can make full use of the correlated information between the descriptions to guarantee high-quality video reconstruction under unreliable network transmission.
Description
Technical Field
The invention relates to the field of error compensation, in particular to a multi-description video coding method and a decoding method based on a frame prediction neural network.
Background
In recent years, with the rapid development of multimedia and internet technology, video applications such as telemedicine, video conferencing and distance learning have become widespread, greatly enriching people's lives. At the same time, the demand for ultra-high-resolution (3840 × 2160, 7680 × 4320) and high-frame-rate (120 fps) video keeps growing. As video resolution and frame rate increase, the amount of multimedia data transmitted over the internet has grown explosively. Against this background of high-definition and ultra-high-definition video transmission, the new-generation video coding standard HEVC emerged.
HEVC offers a high compression rate, but its error resilience is still weak. In practical applications, unreliable channels are ubiquitous: channel interference, network congestion, burst errors on wireless channels, and so on. When a video code stream is transmitted over an unreliable channel, packet loss and bit errors occur easily, severely degrading the quality of the video at the receiving end.
Multiple description coding is therefore an effective technique for transmitting video over unreliable networks. It divides a source video into two or more sub-videos, each carrying its own specific information together with protection information for the other descriptions; this protection information provides effective error recovery capability and improves the video quality at the decoding end. As the number of received descriptions increases, the decoded video quality improves. It is therefore worthwhile to study an HEVC multi-description video coding method with fault tolerance.
Disclosure of Invention
The main object of the invention is to design a fault-tolerant multi-description video coding method by combining deep learning, and accordingly to provide a multi-description video coding method based on a frame prediction neural network.
The invention adopts the following technical scheme:
a multi-description video coding method based on a frame prediction neural network is characterized by comprising the following steps:
A1) Divide the input video into an odd frame sequence F_O and an even frame sequence F_E, and encode each with an HEVC encoder to obtain a reconstructed odd frame sequence F'_O and a reconstructed even frame sequence F'_E;
A2) Input the reconstructed odd frame sequence F'_O and even frame sequence F'_E separately into the frame prediction neural network FP-CNN to obtain a predicted even frame sequence F'_EI and a predicted odd frame sequence F'_OI;
A3) Subtract the predicted even frame sequence F'_EI from the reconstructed even frame sequence F'_E to obtain the even residual F_EIR, and subtract the predicted odd frame sequence F'_OI from the reconstructed odd frame sequence F'_O to obtain the odd residual F_OIR;
A4) Residual-code the even residual F_EIR and the odd residual F_OIR to obtain the even residual code stream F_ESI and the odd residual code stream F_OSI respectively;
A5) Pack the reconstructed odd frame sequence F'_O and the even residual code stream F_ESI into description 1, pack the reconstructed even frame sequence F'_E and the odd residual code stream F_OSI into description 2, and transmit the two descriptions to the decoding end over different channels.
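As an illustration of steps A1) and A5), the sketch below shows the temporal split and the cross-packing of the two descriptions in Python. The HEVC encoding, FP-CNN prediction and residual coding between those steps are left out, and all names here are illustrative rather than taken from the patent:

```python
def temporal_downsample(frames):
    """Step A1: split the input video into odd and even frame sequences.

    Frames are numbered from 1, so frames 1, 3, 5, ... form the odd
    sequence F_O and frames 2, 4, 6, ... form the even sequence F_E.
    """
    return frames[0::2], frames[1::2]

def pack_descriptions(recon_odd, even_residual_stream, recon_even, odd_residual_stream):
    """Step A5: description 1 carries the reconstructed odd sequence plus the
    even residual code stream; description 2 carries the reconstructed even
    sequence plus the odd residual code stream."""
    description1 = {"coded_frames": recon_odd, "residual_stream": even_residual_stream}
    description2 = {"coded_frames": recon_even, "residual_stream": odd_residual_stream}
    return description1, description2
```

Each description thus protects the other: if only description 1 arrives, its even residual code stream, combined with the FP-CNN prediction made from the odd frames, recovers the missing even frames.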
Preferably, videos of various scenes are selected and the source video is divided into odd frames and even frames; at several QP settings the sub-videos are encoded and decoded by an unmodified HEVC codec, the coded-then-decoded videos serve as training data, and the original odd or even frames serve as training labels, forming a data set for training the frame prediction neural network FP-CNN.
Preferably, the frame prediction neural network FP-CNN comprises an encoder-decoder, and the output features of the encoder and the decoder at the same scale are linked by skip connections.
Preferably, the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel 1-D kernels, and the estimated pixel kernels are locally convolved with two consecutive video frames of the odd frame sequence F'_O or the even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
Preferably, the encoder and the decoder are provided with convolutional layers, average pooling layers and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
A multi-description video decoding method based on a frame prediction neural network is characterized by comprising the following steps:
B1) The decoding end receives description 1 and description 2 and judges whether any video frame has been lost; if not, the odd and even frames of description 1 and description 2 are interleaved in parity order to obtain the decoded, reconstructed video sequence at full frame rate; if so, go to step B2);
B2) Description 1 and description 2 are decoded by a standard HEVC decoder to obtain the reconstructed odd frame sequence F'_O and even frame sequence F'_E, which are input separately into the frame prediction neural network FP-CNN to obtain the predicted even frame sequence F'_EI and the predicted odd frame sequence F'_OI;
B3) The even residual code stream F_ESI and the odd residual code stream F_OSI are residual-decoded to obtain the decoded even residual F'_ESI and the decoded odd residual F'_OSI;
B4) The missing even frames are obtained from the predicted even frame sequence F'_EI and the decoded even residual F'_ESI, and the missing odd frames from the predicted odd frame sequence F'_OI and the decoded odd residual F'_OSI; the recovered even frames together with the even frame sequence F'_E, and the recovered odd frames together with the odd frame sequence F'_O, are then upsampled in parity order into the reconstructed video sequence.
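A minimal sketch of the recovery path in step B4), treating frames as numbers for clarity: the addition inverts the encoder-side subtraction of step A3), and the interleaving restores the full frame rate. Names are illustrative:

```python
def reconstruct_missing(predicted, decoded_residual):
    """Step B4, first half: missing frame = FP-CNN prediction + decoded
    residual (the inverse of the encoder-side subtraction in step A3)."""
    return [p + r for p, r in zip(predicted, decoded_residual)]

def parity_upsample(odd_frames, even_frames):
    """Step B4, second half: interleave the odd and even sub-sequences back
    into a full-frame-rate sequence (odd frame first)."""
    merged = []
    for o, e in zip(odd_frames, even_frames):
        merged.extend([o, e])
    merged.extend(odd_frames[len(even_frames):])  # trailing odd frame, if any
    return merged
```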
Preferably, videos of various scenes are selected and the source video is divided into odd frames and even frames; at several QP settings the sub-videos are encoded and decoded by an unmodified HEVC codec, the coded-then-decoded videos serve as training data, and the original odd or even frames serve as training labels, forming a data set for training the frame prediction neural network FP-CNN.
Preferably, the frame prediction neural network FP-CNN comprises an encoder-decoder, and the output features of the encoder and the decoder at the same scale are linked by skip connections.
Preferably, the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel 1-D kernels, and the estimated pixel kernels are locally convolved with two consecutive video frames of the odd frame sequence F'_O or the even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
Preferably, the encoder and the decoder are provided with convolutional layers, average pooling layers and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
1. At the coding end, the method divides the source video by temporal down-sampling into odd frames and even frames, forms two new sequences from them, and encodes each with an HEVC encoder. For the frame loss caused by temporal down-sampling, a Frame Prediction Convolutional Neural Network (FP-CNN) is used to predict the missing frames of each sequence. The predicted frames are subtracted from the encoded video frames of the corresponding sequence to obtain residual information, which together with the coded information of the current sequence forms one description. The code streams of the two descriptions are packed and transmitted to the decoding end over different channels, and the multi-description video coding formed in this way gives the code stream a degree of error recovery capability.
2. When only one description is received, the received frames are reconstructed with an HEVC decoder, the prediction of each lost frame is obtained through the FP-CNN frame prediction neural network and added to the decoded residual information to reconstruct the lost frame, and the frames are restored to the original video frame rate in parity order. When the decoding end receives both descriptions, each is decoded with an HEVC decoder to obtain the corresponding reconstructed frames, and the source frame rate is restored in parity order. The decoding end can thus make full use of the correlated information between the descriptions to guarantee high-quality video reconstruction under unreliable network transmission.
Drawings
FIG. 1 is a flow chart of an encoding method according to the present invention;
FIG. 2 is a flow chart of a decoding method according to the present invention;
FIG. 3 is a diagram of a frame prediction neural network FP-CNN according to the present invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
The invention is further described below by means of specific embodiments.
The terms "first," "second," "third," and the like in this disclosure are used solely to distinguish between similar items and not necessarily to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. In the description, the directions or positional relationships indicated by "up", "down", "left", "right", "front" and "rear" are used based on the directions or positional relationships shown in the drawings, and are only for convenience of describing the present invention, and do not indicate or imply that the device referred to must have a specific direction, be constructed and operated in a specific direction, and thus, should not be construed as limiting the scope of the present invention. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
In addition, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Referring to fig. 1, a multi-description video coding method based on a frame prediction neural network includes the following steps:
A1) Divide the input video into an odd frame sequence F_O and an even frame sequence F_E, and encode each with an HEVC encoder to obtain a reconstructed odd frame sequence F'_O and a reconstructed even frame sequence F'_E.
A2) Input the reconstructed odd frame sequence F'_O and even frame sequence F'_E separately into the frame prediction neural network FP-CNN to obtain a predicted even frame sequence F'_EI and a predicted odd frame sequence F'_OI.
Half of the video frames in each sub-sequence are missing due to the temporal down-sampling, and a frame prediction module FP-CNN is used to predict the missing video sequence in each sub-sequence.
The structure of the frame prediction neural network FP-CNN is shown in FIG. 3. It comprises an encoder and a decoder, and the output features of the encoder and the decoder at the same scale are linked by skip connections. The encoder and decoder are built from convolutional layers, average pooling layers and bilinear upsampling layers.
The input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks. Each sub-network densely estimates one quarter of the per-pixel 1-D kernels; the estimated pixel kernels are then locally convolved with two consecutive video frames of the odd frame sequence F'_O or the even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI. Each sub-network comprises one bilinear upsampling layer and three convolutional layers.
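To make the per-pixel local convolution concrete, the following NumPy sketch shows the usual separable formulation assumed here (the patent does not spell it out): for every output pixel, a vertical and a horizontal 1-D kernel are estimated per input frame, their outer product forms the 2-D kernel applied to the local patch, and the two per-frame results are summed. All array shapes and the normalisation are illustrative.

```python
import numpy as np

def local_separable_conv(frame_a, frame_b, kv_a, kh_a, kv_b, kh_b):
    """Synthesize one frame from two consecutive frames.

    kv_*/kh_* have shape (H, W, K): one vertical and one horizontal 1-D
    kernel per output pixel and per input frame.  The separable 2-D kernel
    for a pixel is the outer product of kv and kh, applied to the local
    K x K patch of that frame; the two frame contributions are summed.
    """
    H, W = frame_a.shape
    K = kv_a.shape[-1]
    pad = K // 2
    a = np.pad(frame_a, pad, mode="edge")  # replicate borders for full patches
    b = np.pad(frame_b, pad, mode="edge")
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            patch_a = a[y:y + K, x:x + K]
            patch_b = b[y:y + K, x:x + K]
            out[y, x] = (kv_a[y, x] @ patch_a @ kh_a[y, x]
                         + kv_b[y, x] @ patch_b @ kh_b[y, x])
    return out
```

As a sanity check, uniform kernels whose weights sum to one half per frame reduce the operation to averaging the two frames; the trained sub-networks of course produce spatially varying kernels instead.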
A3) Subtract the predicted even frame sequence F'_EI from the reconstructed even frame sequence F'_E to obtain the even residual F_EIR, and subtract the predicted odd frame sequence F'_OI from the reconstructed odd frame sequence F'_O to obtain the odd residual F_OIR.
A4) Residual-code the even residual F_EIR and the odd residual F_OIR to obtain the even residual code stream F_ESI and the odd residual code stream F_OSI respectively. In this step, residual coding divides the input F_EIR and F_OIR into 8 × 8 blocks and applies the DCT transform, quantization and entropy coding to obtain the even residual code stream F_ESI and the odd residual code stream F_OSI.
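The 8 × 8 transform-and-quantize part of the residual coding can be sketched as below. Entropy coding is omitted and a plain uniform quantizer stands in for the real one, so this illustrates the principle only; the function names and the quantization step are assumptions, not the patent's codec.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix used for the 8x8 block transform."""
    m = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            m[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m *= np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)  # DC row scaling for orthonormality
    return m

def code_residual_block(block, q_step, dct=dct_matrix()):
    # Forward path: 2-D DCT of one 8x8 residual block, then uniform scalar
    # quantization (entropy coding of the indices is omitted in this sketch).
    coeffs = dct @ block @ dct.T
    return np.round(coeffs / q_step).astype(int)

def decode_residual_block(indices, q_step, dct=dct_matrix()):
    # Inverse path: dequantize, then inverse 2-D DCT.
    coeffs = indices * q_step
    return dct.T @ coeffs @ dct
```

With a small quantization step the round trip reproduces the residual block up to the quantization error, which is what lets the decoder recover the missing frame from prediction plus residual.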
A5) Pack the reconstructed odd frame sequence F'_O and the even residual code stream F_ESI into description 1, pack the reconstructed even frame sequence F'_E and the odd residual code stream F_OSI into description 2, and transmit the two descriptions to the decoding end over different channels.
In the invention, a data set is used to train and test the frame prediction neural network FP-CNN in advance. Specifically, videos of various scenes are selected and the source video is divided into odd frames and even frames; at several QP settings the sub-videos are encoded and decoded by an unmodified HEVC codec. The coded-then-decoded videos serve as training data and the original odd or even frames serve as training labels, forming a data set for training the frame prediction neural network FP-CNN; the trained network is then used to predict the missing frames.
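The data-set construction just described can be outlined as follows; `hevc_roundtrip` is a hypothetical placeholder for an actual HEVC encode-then-decode pass at quantization parameter `qp`.

```python
def build_training_pairs(video_frames, hevc_roundtrip, qp_values):
    """Build (input, label) pairs for FP-CNN training: the input is a
    coded-then-decoded parity sub-sequence, the label is the original
    frames of the opposite parity."""
    odd, even = video_frames[0::2], video_frames[1::2]
    pairs = []
    for qp in qp_values:
        pairs.append((hevc_roundtrip(odd, qp), even))   # predict even from odd
        pairs.append((hevc_roundtrip(even, qp), odd))   # predict odd from even
    return pairs
```

Training at several QP values exposes the network to the range of compression artifacts it will see at inference time.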
Referring to fig. 2, the present invention further provides a multi-description video decoding method based on a frame prediction neural network, including the following steps:
B1) The decoding end receives description 1 and description 2 and judges whether any video frame has been lost. If not, the odd and even frames of description 1 and description 2 are interleaved in parity order to obtain the decoded, reconstructed video sequence at full frame rate; if so, go to step B2). In this step, description 1 and description 2 are those packed by the above multi-description video coding method based on the frame prediction neural network.
B2) Description 1 and description 2 are decoded by a standard HEVC decoder to obtain the reconstructed odd frame sequence F'_O and even frame sequence F'_E, which are input separately into the frame prediction neural network FP-CNN to obtain the predicted even frame sequence F'_EI and the predicted odd frame sequence F'_OI.
B3) The even residual code stream F_ESI and the odd residual code stream F_OSI are residual-decoded to obtain the decoded even residual F'_ESI and the decoded odd residual F'_OSI.
B4) The missing even frames are obtained from the predicted even frame sequence F'_EI and the decoded even residual F'_ESI, and the missing odd frames from the predicted odd frame sequence F'_OI and the decoded odd residual F'_OSI; the recovered even frames together with the even frame sequence F'_E, and the recovered odd frames together with the odd frame sequence F'_O, are then upsampled in parity order into the reconstructed video sequence.
Videos of various scenes are selected and the source video is divided into odd frames and even frames; at several QP settings the sub-videos are encoded and decoded by an unmodified HEVC codec. The coded-then-decoded videos serve as training data and the original odd or even frames serve as training labels, forming a data set for training the frame prediction neural network FP-CNN; the trained frame prediction neural network FP-CNN is used to predict the lost frames.
The frame prediction neural network FP-CNN here is the same network as above: it comprises an encoder-decoder, and the output features of the encoder and the decoder at the same scale are linked by skip connections.
The input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks. Each sub-network densely estimates one quarter of the per-pixel 1-D kernels; the estimated pixel kernels are then locally convolved with two consecutive video frames of the odd frame sequence F'_O or the even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
The encoder and the decoder are provided with convolutional layers, average pooling layers and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
The multi-description video coding formed by the method of the invention gives the code stream a degree of error recovery capability, and the decoding end can make full use of the correlated information between the descriptions to guarantee high-quality video reconstruction under unreliable network transmission.
The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made using this design concept shall fall within the protection scope of the present invention.
Claims (2)
1. A multi-description video coding method based on a frame prediction neural network is characterized by comprising the following steps:
A1) Divide the input video into an odd frame sequence F_O and an even frame sequence F_E, and encode each with an HEVC encoder to obtain a reconstructed odd frame sequence F'_O and a reconstructed even frame sequence F'_E;
A2) Input the reconstructed odd frame sequence F'_O and even frame sequence F'_E separately into the frame prediction neural network FP-CNN to obtain a predicted even frame sequence F'_EI and a predicted odd frame sequence F'_OI;
A3) Subtract the predicted even frame sequence F'_EI from the reconstructed even frame sequence F'_E to obtain the even residual F_EIR, and subtract the predicted odd frame sequence F'_OI from the reconstructed odd frame sequence F'_O to obtain the odd residual F_OIR;
A4) Residual-code the even residual F_EIR and the odd residual F_OIR to obtain the even residual code stream F_ESI and the odd residual code stream F_OSI respectively;
A5) Pack the reconstructed odd frame sequence F'_O and the even residual code stream F_ESI into description 1, pack the reconstructed even frame sequence F'_E and the odd residual code stream F_OSI into description 2, and transmit the two descriptions to the decoding end over different channels;
videos of various scenes are selected and the source video is divided into odd frames and even frames; at several QP settings the sub-videos are encoded and decoded by an unmodified HEVC codec, the coded-then-decoded videos serve as training data, and the original odd or even frames serve as training labels, forming a data set for training the frame prediction neural network FP-CNN; the frame prediction neural network FP-CNN comprises an encoder-decoder, and the output features of the encoder and the decoder at the same scale are linked by skip connections;
the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel 1-D kernels, and the estimated pixel kernels are locally convolved with two consecutive video frames of the odd frame sequence F'_O or the even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI;
the encoder and the decoder are provided with convolutional layers, average pooling layers and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
2. A multi-description video decoding method based on a frame prediction neural network is characterized by comprising the following steps:
b1 The decoder receives the description 1 and the description 2, judges whether a lost video frame is generated, and if not, samples the odd frame and the even frame of the description 1 and the description 2 according to the sequence of the odd frame and the even frame to obtain a decoding reconstruction video sequence of the full frame rate; if yes, entering a step B2);
b2 Description 1 and description 2 are decoded by an HEVC standard decoder to obtain a reconstructed odd frame sequence F'OAnd even frame sequence F'ESequence F 'of reconstructed odd frames'OAnd even frame sequence F'ERespectively inputting a frame to predict a neural network FP-CNN to obtain a predicted even frame sequence F'EIAnd predictedOdd frame sequence F'OI;
B3 Even residual F)ESISum odd residual FOSIRespectively obtaining even residual errors F 'after residual errors are decoded'ESIAnd decoded odd residual F'OSI;
B4 Utilizing predicted even frame sequence F'EIAnd even residual F'ESIObtaining missing even frames, utilizing a sequence of predicted odd frames F'OIAnd even residual F'OSIObtaining missing odd frames, and reusing even frames and even frame sequence F'EOdd and odd frame sequence F'OThe video sequence is up-sampled and reconstructed according to the parity frame sequence;
selecting videos of various scenes, dividing a source video into odd frames and even frames, coding and decoding the videos through an original HEVC (high efficiency video coding) coder when different QP (quantization parameter) values are set, taking the coded and decoded videos as training data, and taking the original odd frames or even frames as training labels to form a data set for training the frame prediction neural network FP-CNN;
the frame prediction neural network FP-CNN comprises an encoder-decoder, and the output characteristics of the encoder and the output characteristics of the decoder with the same scale adopt a skip connection mode; the input odd frame sequence F'OAnd even frame sequence F'EExtracting features through an encoder and a decoder, wherein the extracted features are provided for four sub-networks;
estimating 1/4 of output pixels by using a 1-dimensional kernel in a dense pixel mode for each sub-network, and then enabling the estimated pixel kernel to be connected with an odd frame sequence F'OOr even frame sequence F'ELocally convolving consecutive two-frame video frames to generate the predicted even frame sequence F'EIOr predicted sequence of odd frames F'OI;
The encoder and the decoder are built from convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
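As one concrete (assumed) reading of the kernel-prediction step, each sub-network outputs, per output pixel, a pair of 1-D kernels for each of the two input frames; the output pixel is the sum of the two patches filtered by the separable (outer-product) kernel, in the style of adaptive separable convolution. A minimal numpy sketch, with all shapes and names assumed rather than taken from the patent:

```python
import numpy as np

def local_separable_conv(frame_a, frame_b, kv_a, kh_a, kv_b, kh_b):
    """Synthesize one predicted frame from two consecutive input frames.

    frame_a, frame_b : (H, W) grayscale frames.
    kv_*, kh_*       : (H, W, K) per-pixel vertical / horizontal 1-D kernels
                       estimated by a sub-network for the matching frame.
    The separable 2-D kernel at pixel (y, x) is the outer product
    kv[y, x] * kh[y, x]^T, applied locally to the K-by-K patch around (y, x).
    """
    H, W = frame_a.shape
    K = kv_a.shape[-1]
    pad = K // 2
    out = np.zeros((H, W))
    for frame, kv, kh in ((frame_a, kv_a, kh_a), (frame_b, kv_b, kh_b)):
        padded = np.pad(frame, pad, mode="edge")
        for y in range(H):
            for x in range(W):
                patch = padded[y:y + K, x:x + K]
                # kv @ patch @ kh == sum over the patch weighted by the
                # outer product of the two 1-D kernels
                out[y, x] += kv[y, x] @ patch @ kh[y, x]
    return out
```

Predicting two 1-D kernels per pixel instead of a full K-by-K kernel is what keeps the per-pixel estimation dense yet cheap (2K values rather than K^2).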
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110261181.3A CN113038126B (en) | 2021-03-10 | 2021-03-10 | Multi-description video coding method and decoding method based on frame prediction neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113038126A CN113038126A (en) | 2021-06-25 |
CN113038126B true CN113038126B (en) | 2022-11-01 |
Family
ID=76469255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110261181.3A Active CN113038126B (en) | 2021-03-10 | 2021-03-10 | Multi-description video coding method and decoding method based on frame prediction neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113038126B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113452944B (en) * | 2021-08-31 | 2021-11-02 | 江苏北弓智能科技有限公司 | Picture display method of cloud mobile phone |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8391370B1 (en) * | 2009-03-11 | 2013-03-05 | Hewlett-Packard Development Company, L.P. | Decoding video data |
CN103501441A (en) * | 2013-09-11 | 2014-01-08 | 北京交通大学长三角研究院 | Multiple-description video coding method based on human visual system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102630012B (en) * | 2012-03-30 | 2014-09-03 | 北京交通大学 | Coding and decoding method, device and system based on multiple description videos |
Non-Patent Citations (2)
Title |
---|
Multiple description coding for multi-view video with adaptive redundancy allocation; Jing Chen et al.; 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS); 2018-01-22; full text * |
Frame-based multiple description video coding and its error concealment; Li Jinxiang et al.; Acta Photonica Sinica; 2010-05-15 (No. 05); full text * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101189882B (en) | Method and apparatus for encoder assisted-frame rate up conversion (EA-FRUC) for video compression | |
CN103139559B (en) | Multi-media signal transmission method and device | |
KR101425602B1 (en) | Method and apparatus for encoding/decoding image | |
KR100714689B1 (en) | Method for multi-layer based scalable video coding and decoding, and apparatus for the same | |
JP5014989B2 (en) | Frame compression method, video coding method, frame restoration method, video decoding method, video encoder, video decoder, and recording medium using base layer | |
CN103220508B (en) | Coding and decoding method and device | |
CN100512446C (en) | A multi-description video encoding and decoding method based on self-adapted time domain sub-sampling | |
JP2008527902A (en) | Adaptive entropy coding and decoding method and apparatus for stretchable coding | |
CN101573883A (en) | Systems and methods for signaling and performing temporal level switching in scalable video coding | |
CN107995493B (en) | Multi-description video coding method of panoramic video | |
KR20060063613A (en) | Method for scalably encoding and decoding video signal | |
CN100455020C (en) | Screen coding method under low code rate | |
JP2007520149A (en) | Scalable video coding apparatus and method for providing scalability from an encoder unit | |
CN103098471A (en) | Method and apparatus of layered encoding/decoding a picture | |
CN102630012B (en) | Coding and decoding method, device and system based on multiple description videos | |
KR20060063605A (en) | Method and apparatus for encoding video signal, and transmitting and decoding the encoded data | |
CN102438152B (en) | Scalable video coding (SVC) fault-tolerant transmission method, coder, device and system | |
CN113132735A (en) | Video coding method based on video frame generation | |
CN113038126B (en) | Multi-description video coding method and decoding method based on frame prediction neural network | |
CN103139571A (en) | Video fault-tolerant error-resisting method based on combination of forward error correction (FEC) and WZ encoding and decoding | |
CN112532908B (en) | Video image transmission method, sending equipment, video call method and equipment | |
CN1672421A (en) | Method and apparatus for performing multiple description motion compensation using hybrid predictive codes | |
CN104363454A (en) | Method and system for video coding and decoding of high-bit-rate images | |
CN111510721B (en) | Multi-description coding high-quality edge reconstruction method based on spatial downsampling | |
US20130223529A1 (en) | Scalable Video Encoding Using a Hierarchical Epitome |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||