CN113038126B - Multi-description video coding method and decoding method based on frame prediction neural network

Multi-description video coding method and decoding method based on frame prediction neural network

Info

Publication number
CN113038126B
CN113038126B
Authority
CN
China
Prior art keywords
odd
sequence
frame
frame sequence
frames
Prior art date
Legal status
Active
Application number
CN202110261181.3A
Other languages
Chinese (zh)
Other versions
CN113038126A (en)
Inventor
陈婧
林琦
曾焕强
朱建清
蔡灿辉
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University
Priority to CN202110261181.3A
Publication of CN113038126A
Application granted
Publication of CN113038126B
Legal status: Active

Classifications

    • H04N19/42 — Coding/decoding of digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • H04N19/122 — Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/124 — Quantisation
    • H04N19/13 — Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/184 — Adaptive coding where the coding unit is bits, e.g. of the compressed video stream
    • H04N19/625 — Transform coding using the discrete cosine transform [DCT]
    • H04N7/012 — Conversion between an interlaced and a progressive signal

Abstract

The invention discloses a multi-description video coding method and a decoding method based on a frame prediction neural network. To address the frame loss caused by temporal down-sampling, a frame prediction neural network is used to predict the missing frames of each sub-sequence. Each predicted frame is subtracted from the corresponding encoded video frame to obtain residual information, which, together with the coded information of the current sequence, forms one description. The two description bitstreams are packed and transmitted to the decoder over different channels. The multi-description video coding produced by the method gives the bitstream a degree of error-recovery capability, and the decoder can fully exploit the correlation between descriptions to guarantee high-quality video reconstruction under unreliable network transmission.

Description

Multi-description video coding method and decoding method based on frame prediction neural network
Technical Field
The invention relates to the field of error compensation, in particular to a multi-description video coding method and a decoding method based on a frame prediction neural network.
Background
In recent years, with the rapid development of multimedia and internet technology, video communication and video applications (such as telemedicine, video conferencing, and distance learning) have become widespread and have greatly enriched people's lives. Meanwhile, demand for ultra-high-resolution (3840 × 2160, 7680 × 4320) and high-frame-rate (120 fps) video keeps growing. As video resolution and frame rate increase, the amount of multimedia data transmitted over the internet has grown explosively. Against this background of high-definition and ultra-high-definition video transmission, the new-generation video coding standard HEVC was introduced.
The new-generation video coding standard HEVC offers a high compression rate but still suffers from poor error resilience. In practice, unreliable channels are ubiquitous: channel interference, network congestion, burst errors on wireless channels, and so on. When a video bitstream is transmitted over an unreliable channel, packet loss and bit errors occur easily, severely degrading the quality of the video received at the receiving end.
Multiple description coding is therefore an effective technique for transmitting video over unreliable networks. It divides the source video into two or more sub-videos, each carrying its own specific information plus protection information for the other descriptions. This protection information provides effective error-recovery capability and improves video quality at the decoder; as the number of received descriptions increases, the decoded video quality improves. It is therefore worthwhile to study an HEVC multi-description video coding method with error-resilience capability.
Disclosure of Invention
The main aim of the invention is to design a fault-tolerant multi-description video coding method by combining deep learning; to this end, a multi-description video coding method based on a frame prediction neural network is provided.
The invention adopts the following technical scheme:
a multi-description video coding method based on a frame prediction neural network is characterized by comprising the following steps:
a1 Divide the input video into a sequence of odd frames FOAnd an even frame sequence FERespectively encoded by an HEVC encoder to obtain a reconstructed odd frame sequence F'OAnd even frame sequence F'E
A2 F 'from a sequence of odd frames'OAnd even frame sequence F'ERespectively inputting a frame to predict a neural network FP-CNN to obtain a predicted even frame sequence F'EIAnd predicted odd frame sequence F'OI
A3 Predicted even frame sequence F'EIAnd a reconstructed even frame sequence F'ESubtracting to obtain an even residual FEIRThe predicted odd frame sequence and the reconstructed odd frame sequence F'OSubtracting to obtain odd residual FOIR
A4 Even residual F)EIRSum odd residual FOIRRespectively obtaining even residual error code streams F through residual error codingESISum odd residual code stream FOSI
A5 Sequence F 'of reconstructed odd frames'OSum even residual error code stream FESIPackaging into description 1, and carrying out reconstruction on even frame sequence F'ESum odd residual code stream FOSIPacked into description 2, and transmitted to a decoding end through different channels, respectively.
Preferably, videos of various scenes are selected; each source video is divided into odd and even frames, the videos are encoded and decoded by an unmodified HEVC codec under different QP (quantization parameter) settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN.
Preferably, the frame prediction neural network FP-CNN comprises an encoder-decoder, with skip connections between the encoder output features and the decoder output features at the same scale.
Preferably, the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels, and the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
Preferably, the encoder and the decoder comprise convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
A multi-description video decoding method based on a frame prediction neural network is characterized by comprising the following steps:
B1) The decoder receives description 1 and description 2 and determines whether any video frames were lost; if not, the odd and even frames of description 1 and description 2 are interleaved in odd-even order to obtain the full-frame-rate decoded and reconstructed video sequence; if so, proceed to step B2);
B2) Decode description 1 and description 2 with a standard HEVC decoder to obtain the reconstructed odd frame sequence F'_O and even frame sequence F'_E, and input them separately into the frame prediction neural network FP-CNN to obtain the predicted even frame sequence F'_EI and the predicted odd frame sequence F'_OI;
B3) Decode the even residual bitstream F_ESI and the odd residual bitstream F_OSI to obtain the decoded even residual F'_ESI and the decoded odd residual F'_OSI;
B4) Use the predicted even frame sequence F'_EI and the decoded even residual F'_ESI to recover the missing even frames, and the predicted odd frame sequence F'_OI and the decoded odd residual F'_OSI to recover the missing odd frames; then combine the recovered even frames with the even frame sequence F'_E and the recovered odd frames with the odd frame sequence F'_O, up-sampling the reconstructed video sequence in odd-even frame order.
Preferably, videos of various scenes are selected; each source video is divided into odd and even frames, the videos are encoded and decoded by an unmodified HEVC codec under different QP (quantization parameter) settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN.
Preferably, the frame prediction neural network FP-CNN comprises an encoder-decoder, with skip connections between the encoder output features and the decoder output features at the same scale.
Preferably, the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels, and the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
Preferably, the encoder and the decoder comprise convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
1. At the encoder, the method temporally down-samples the source video into odd and even frames, forms two new sequences, and encodes them with an HEVC encoder. For the frame-loss problem caused by temporal down-sampling, a frame prediction convolutional neural network (FP-CNN) predicts the missing frames of each sequence. Each predicted frame is subtracted from the corresponding encoded video frame to obtain residual information, which, together with the coded information of the current sequence, forms one description. The two description bitstreams are packed and transmitted to the decoder over different channels; the resulting multi-description video coding gives the bitstream a degree of error-recovery capability.
2. When only one description is received, an HEVC decoder reconstructs the received frames, the FP-CNN frame prediction neural network produces prediction information for the lost frames, which is added to the decoded residual information to reconstruct them, and the reconstructed frames are restored to the original video frame rate in odd-even order. When the decoder receives both descriptions, each is decoded by an HEVC decoder and the source frame rate is restored in odd-even order. The decoder can thus fully exploit the correlation between descriptions to guarantee high-quality video reconstruction under unreliable network transmission.
Drawings
FIG. 1 is a flow chart of an encoding method according to the present invention;
FIG. 2 is a flow chart of a decoding method according to the present invention;
FIG. 3 is a diagram of a frame prediction neural network FP-CNN according to the present invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
The invention is further described below by means of specific embodiments.
The terms "first," "second," "third," and the like in this disclosure are used solely to distinguish between similar items and not necessarily to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. In the description, the directions or positional relationships indicated by "up", "down", "left", "right", "front" and "rear" are used based on the directions or positional relationships shown in the drawings, and are only for convenience of describing the present invention, and do not indicate or imply that the device referred to must have a specific direction, be constructed and operated in a specific direction, and thus, should not be construed as limiting the scope of the present invention. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
In addition, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Referring to fig. 1, a multi-description video coding method based on a frame prediction neural network includes the following steps:
A1) Divide the input video into an odd frame sequence F_O and an even frame sequence F_E, and encode each with an HEVC encoder to obtain a reconstructed odd frame sequence F'_O and a reconstructed even frame sequence F'_E.
A2) Input the reconstructed odd frame sequence F'_O and the reconstructed even frame sequence F'_E separately into the frame prediction neural network FP-CNN to obtain a predicted even frame sequence F'_EI and a predicted odd frame sequence F'_OI.
Because of the temporal down-sampling, half of the video frames are missing from each sub-sequence; the frame prediction module FP-CNN is used to predict the missing frames of each sub-sequence.
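As a minimal illustration of the temporal down-sampling in step A1), the following NumPy sketch splits a video into odd and even sub-sequences; the array shape and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Stand-in for an input video: 16 frames of 64x64 luma samples.
video = np.random.randint(0, 256, size=(16, 64, 64), dtype=np.uint8)

# Temporal down-sampling (step A1): frames 1, 3, 5, ... form the odd
# sequence F_O and frames 2, 4, 6, ... the even sequence F_E (1-based
# numbering), so each sub-sequence misses every other source frame.
F_O = video[0::2]  # odd frames  (zero-based indices 0, 2, 4, ...)
F_E = video[1::2]  # even frames (zero-based indices 1, 3, 5, ...)

assert F_O.shape[0] + F_E.shape[0] == video.shape[0]
```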
The structure of the frame prediction neural network FP-CNN is shown in fig. 3. It comprises an encoder and a decoder, with skip connections between the encoder output features and the decoder output features at the same scale. The encoder and decoder are built from convolutional layers, average pooling layers, and bilinear upsampling layers.
The input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks. Each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels; the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI. Each sub-network comprises one bilinear upsampling layer and three convolutional layers.
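The patent does not disclose channel widths, the number of layers per scale, or the kernel length, so the following PyTorch sketch is only one plausible reading of fig. 3: a small encoder-decoder with a same-scale skip connection whose features feed four sub-networks (one bilinear upsampling layer plus three convolutional layers each); the sub-networks estimate dense per-pixel one-dimensional kernels that are locally convolved with two consecutive input frames, in the style of separable adaptive convolution for frame interpolation. The channel widths (32/64/128) and the kernel length (k = 25) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True))

class SubNet(nn.Module):
    """Kernel-estimation sub-network: one bilinear upsampling layer
    followed by three convolutional layers, emitting k coefficients
    of a 1-D kernel per output pixel."""
    def __init__(self, c_in, k):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            conv_block(c_in, c_in),
            conv_block(c_in, c_in),
            nn.Conv2d(c_in, k, 3, padding=1))

    def forward(self, x):
        return self.body(x)

def sep_local_conv(frame, kv, kh):
    """Locally convolve a frame with per-pixel separable kernels: the
    vertical kernel kv and horizontal kernel kh form a k x k kernel
    per pixel via their outer product."""
    b, c, h, w = frame.shape
    k = kv.shape[1]
    patches = F.unfold(frame, k, padding=k // 2).view(b, c, k, k, h, w)
    kern = kv.unsqueeze(2) * kh.unsqueeze(1)          # (b, k, k, h, w)
    return (patches * kern.unsqueeze(1)).sum(dim=(2, 3))

class FPCNN(nn.Module):
    """Encoder-decoder with a same-scale skip connection, built from
    convolution, average pooling, and bilinear upsampling layers.
    Input height/width are assumed divisible by 4."""
    def __init__(self, k=25):
        super().__init__()
        self.enc1 = conv_block(6, 32)                 # full resolution
        self.enc2 = conv_block(32, 64)                # 1/2 resolution
        self.mid = conv_block(64, 128)                # 1/4 resolution
        self.pool = nn.AvgPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.dec = conv_block(128 + 64, 64)           # skip from enc2
        self.subnets = nn.ModuleList(SubNet(64, k) for _ in range(4))

    def forward(self, f1, f2):
        x = torch.cat([f1, f2], dim=1)                # two consecutive frames
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        m = self.mid(self.pool(e2))
        d = self.dec(torch.cat([self.up(m), e2], dim=1))  # skip connection
        kv1, kh1, kv2, kh2 = (net(d) for net in self.subnets)
        # Each input frame is locally convolved with its estimated kernels
        # and the results are summed to form the predicted frame.
        return sep_local_conv(f1, kv1, kh1) + sep_local_conv(f2, kv2, kh2)

# Usage: predict the frame lying between two consecutive frames of one
# sub-sequence, i.e. a missing frame of the other sub-sequence.
net = FPCNN()
f1, f2 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
predicted = net(f1, f2)                               # shape (1, 3, 64, 64)
```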
A3) Take the difference between the predicted even frame sequence F'_EI and the reconstructed even frame sequence F'_E to obtain the even residual F_EIR, and between the predicted odd frame sequence F'_OI and the reconstructed odd frame sequence F'_O to obtain the odd residual F_OIR.
A4) Apply residual coding to the even residual F_EIR and the odd residual F_OIR to obtain the even residual bitstream F_ESI and the odd residual bitstream F_OSI. In this step, residual coding divides the input F_EIR and F_OIR into 8×8 blocks and applies DCT transformation, quantization, and entropy coding to produce F_ESI and F_OSI; a sketch of this transform-and-quantization step is given after step A5).
A5) Pack the reconstructed odd frame sequence F'_O and the even residual bitstream F_ESI into description 1, pack the reconstructed even frame sequence F'_E and the odd residual bitstream F_OSI into description 2, and transmit them to the decoder over different channels.
In the invention, the frame prediction neural network FP-CNN is trained and tested on a data set prepared in advance. Specifically, videos of various scenes are selected and each source video is divided into odd and even frames; the videos are encoded and decoded by an unmodified HEVC codec under different QP settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN. The trained FP-CNN can then be used to predict lost frames.
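A sketch of this data-set preparation follows; it uses ffmpeg's libx265 encoder as a stand-in for the HEVC encoder (the patent does not name a particular implementation), and the file names and the particular QP values are assumptions.

```python
import subprocess

SOURCES = ["scene1.y4m", "scene2.y4m"]   # hypothetical source clips

def hevc_roundtrip(src, dst, qp):
    """Encode src with HEVC (libx265) at a fixed QP; reading dst back
    later yields the coded-decoded frames used as training inputs,
    with the corresponding original frames as labels."""
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-c:v", "libx265", "-x265-params", f"qp={qp}", dst],
                   check=True)

for src in SOURCES:
    for qp in (22, 27, 32, 37):          # an assumed spread of QP settings
        hevc_roundtrip(src, src.replace(".y4m", f"_qp{qp}.mp4"), qp)
```

The odd/even split of each round-tripped clip can then reuse the slicing shown earlier, pairing every coded-decoded sub-sequence with its original frames as labels.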
Referring to fig. 2, the present invention further provides a multi-description video decoding method based on a frame prediction neural network, comprising the following steps:
B1) The decoder receives description 1 and description 2 and determines whether any video frames were lost. If not, the odd and even frames of description 1 and description 2 are interleaved in odd-even order to obtain the full-frame-rate decoded and reconstructed video sequence; if so, proceed to step B2). In this step, description 1 and description 2 are the descriptions packed by the multi-description video coding method based on the frame prediction neural network described above.
B2) Decode description 1 and description 2 with a standard HEVC decoder to obtain the reconstructed odd frame sequence F'_O and even frame sequence F'_E, and input them separately into the frame prediction neural network FP-CNN to obtain the predicted even frame sequence F'_EI and the predicted odd frame sequence F'_OI.
B3) Decode the even residual bitstream F_ESI and the odd residual bitstream F_OSI to obtain the decoded even residual F'_ESI and the decoded odd residual F'_OSI.
B4) Use the predicted even frame sequence F'_EI and the decoded even residual F'_ESI to recover the missing even frames, and the predicted odd frame sequence F'_OI and the decoded odd residual F'_OSI to recover the missing odd frames; then combine the recovered even frames with the even frame sequence F'_E and the recovered odd frames with the odd frame sequence F'_O, up-sampling the reconstructed video sequence in odd-even frame order.
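A minimal sketch of the single-description case of steps B1)-B4) follows, assuming description 1 (the odd frames plus the even residual) was received; frames are modelled as NumPy arrays and all helper names are hypothetical.

```python
import numpy as np

def interleave(odd_frames, even_frames):
    """Temporal up-sampling in odd-even order: odd frames occupy
    positions 1, 3, 5, ... and even frames 2, 4, 6, ... (1-based)."""
    out = [None] * (len(odd_frames) + len(even_frames))
    out[0::2] = list(odd_frames)
    out[1::2] = list(even_frames)
    return out

def reconstruct_from_description1(odd_rec, even_pred, even_residual):
    """Step B4 when only description 1 arrives: each missing even frame
    is the FP-CNN prediction plus the decoded even residual, after
    which the full frame rate is restored by parity interleaving."""
    even_rec = [np.clip(p.astype(np.int32) + r, 0, 255).astype(np.uint8)
                for p, r in zip(even_pred, even_residual)]
    return interleave(odd_rec, even_rec)
```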
Videos of various scenes are selected and each source video is divided into odd and even frames; the videos are encoded and decoded by an unmodified HEVC codec under different QP settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN; the trained FP-CNN is used to predict lost frames.
The frame prediction neural network FP-CNN is the same network as above: it comprises an encoder-decoder, with skip connections between the encoder output features and the decoder output features at the same scale.
The input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks. Each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels; the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI.
The encoder and decoder are built from convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
The multi-description video coding produced by the method of the invention gives the bitstream a degree of error-recovery capability, and the decoder can fully exploit the correlation between descriptions to guarantee high-quality video reconstruction under unreliable network transmission.
The above is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made to the present invention using this design concept constitutes an infringement of the protection scope of the present invention.

Claims (2)

1. A multi-description video coding method based on a frame prediction neural network, characterized by comprising the following steps:
A1) dividing the input video into an odd frame sequence F_O and an even frame sequence F_E, and encoding each with an HEVC encoder to obtain a reconstructed odd frame sequence F'_O and a reconstructed even frame sequence F'_E;
A2) inputting the reconstructed odd frame sequence F'_O and the reconstructed even frame sequence F'_E separately into the frame prediction neural network FP-CNN to obtain a predicted even frame sequence F'_EI and a predicted odd frame sequence F'_OI;
A3) taking the difference between the predicted even frame sequence F'_EI and the reconstructed even frame sequence F'_E to obtain the even residual F_EIR, and between the predicted odd frame sequence F'_OI and the reconstructed odd frame sequence F'_O to obtain the odd residual F_OIR;
A4) applying residual coding to the even residual F_EIR and the odd residual F_OIR to obtain the even residual bitstream F_ESI and the odd residual bitstream F_OSI;
A5) packing the reconstructed odd frame sequence F'_O and the even residual bitstream F_ESI into description 1, packing the reconstructed even frame sequence F'_E and the odd residual bitstream F_OSI into description 2, and transmitting them to the decoder over different channels;
wherein videos of various scenes are selected, each source video is divided into odd and even frames, the videos are encoded and decoded by an unmodified HEVC codec under different QP (quantization parameter) settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN; the frame prediction neural network FP-CNN comprises an encoder-decoder, with skip connections between the encoder output features and the decoder output features at the same scale;
the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks; each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels, and the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI;
the encoder and the decoder comprise convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
2. A multi-description video decoding method based on a frame prediction neural network, characterized by comprising the following steps:
B1) the decoder receives description 1 and description 2 and determines whether any video frames were lost; if not, the odd and even frames of description 1 and description 2 are interleaved in odd-even order to obtain the full-frame-rate decoded and reconstructed video sequence; if so, proceed to step B2);
B2) decode description 1 and description 2 with a standard HEVC decoder to obtain the reconstructed odd frame sequence F'_O and even frame sequence F'_E, and input them separately into the frame prediction neural network FP-CNN to obtain the predicted even frame sequence F'_EI and the predicted odd frame sequence F'_OI;
B3) decode the even residual bitstream F_ESI and the odd residual bitstream F_OSI to obtain the decoded even residual F'_ESI and the decoded odd residual F'_OSI;
B4) use the predicted even frame sequence F'_EI and the decoded even residual F'_ESI to recover the missing even frames, and the predicted odd frame sequence F'_OI and the decoded odd residual F'_OSI to recover the missing odd frames; then combine the recovered even frames with the even frame sequence F'_E and the recovered odd frames with the odd frame sequence F'_O, up-sampling the reconstructed video sequence in odd-even frame order;
wherein videos of various scenes are selected, each source video is divided into odd and even frames, the videos are encoded and decoded by an unmodified HEVC codec under different QP (quantization parameter) settings, the coded-decoded videos are used as training data, and the original odd or even frames are used as training labels, forming a data set for training the frame prediction neural network FP-CNN;
the frame prediction neural network FP-CNN comprises an encoder-decoder, with skip connections between the encoder output features and the decoder output features at the same scale; the input odd frame sequence F'_O and even frame sequence F'_E pass through the encoder and decoder for feature extraction, and the extracted features are fed to four sub-networks;
each sub-network densely estimates one quarter of the per-pixel one-dimensional kernels, and the estimated pixel kernels are then locally convolved with two consecutive frames of the odd frame sequence F'_O or even frame sequence F'_E to generate the predicted even frame sequence F'_EI or the predicted odd frame sequence F'_OI;
the encoder and the decoder comprise convolutional layers, average pooling layers, and bilinear upsampling layers; each sub-network comprises one bilinear upsampling layer and three convolutional layers.
CN202110261181.3A 2021-03-10 2021-03-10 Multi-description video coding method and decoding method based on frame prediction neural network Active CN113038126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110261181.3A CN113038126B (en) 2021-03-10 2021-03-10 Multi-description video coding method and decoding method based on frame prediction neural network

Publications (2)

Publication Number Publication Date
CN113038126A CN113038126A (en) 2021-06-25
CN113038126B (en) 2022-11-01

Family

ID=76469255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110261181.3A Active CN113038126B (en) 2021-03-10 2021-03-10 Multi-description video coding method and decoding method based on frame prediction neural network

Country Status (1)

Country Link
CN (1) CN113038126B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452944B (en) * 2021-08-31 2021-11-02 江苏北弓智能科技有限公司 Picture display method of cloud mobile phone

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102630012B (en) * 2012-03-30 2014-09-03 Beijing Jiaotong University Coding and decoding method, device and system based on multiple description videos

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8391370B1 (en) * 2009-03-11 2013-03-05 Hewlett-Packard Development Company, L.P. Decoding video data
CN103501441A (en) * 2013-09-11 2014-01-08 Yangtze River Delta Research Institute of Beijing Jiaotong University Multiple-description video coding method based on human visual system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jing Chen et al., "Multiple description coding for multi-view video with adaptive redundancy allocation," 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), published 2018-01-22, full text. *
Li Jinxiang et al., "Frame-based multiple description video coding and its error concealment," Acta Photonica Sinica, 2010-05-15, No. 05, full text. *

Also Published As

Publication number Publication date
CN113038126A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN101189882B (en) Method and apparatus for encoder assisted-frame rate up conversion (EA-FRUC) for video compression
CN103139559B (en) Multi-media signal transmission method and device
KR101425602B1 (en) Method and apparatus for encoding/decoding image
KR100714689B1 (en) Method for multi-layer based scalable video coding and decoding, and apparatus for the same
JP5014989B2 (en) Frame compression method, video coding method, frame restoration method, video decoding method, video encoder, video decoder, and recording medium using base layer
CN103220508B (en) Coding and decoding method and device
CN100512446C (en) A multi-description video encoding and decoding method based on self-adapted time domain sub-sampling
JP2008527902A (en) Adaptive entropy coding and decoding method and apparatus for stretchable coding
CN101573883A (en) Systems and methods for signaling and performing temporal level switching in scalable video coding
CN107995493B (en) Multi-description video coding method of panoramic video
KR20060063613A (en) Method for scalably encoding and decoding video signal
CN100455020C (en) Screen coding method under low code rate
JP2007520149A (en) Scalable video coding apparatus and method for providing scalability from an encoder unit
CN103098471A (en) Method and apparatus of layered encoding/decoding a picture
CN102630012B (en) Coding and decoding method, device and system based on multiple description videos
KR20060063605A (en) Method and apparatus for encoding video signal, and transmitting and decoding the encoded data
CN102438152B (en) Scalable video coding (SVC) fault-tolerant transmission method, coder, device and system
CN113132735A (en) Video coding method based on video frame generation
CN113038126B (en) Multi-description video coding method and decoding method based on frame prediction neural network
CN103139571A (en) Video fault-tolerant error-resisting method based on combination of forward error correction (FEC) and WZ encoding and decoding
CN112532908B (en) Video image transmission method, sending equipment, video call method and equipment
CN1672421A (en) Method and apparatus for performing multiple description motion compensation using hybrid predictive codes
CN104363454A (en) Method and system for video coding and decoding of high-bit-rate images
CN111510721B (en) Multi-description coding high-quality edge reconstruction method based on spatial downsampling
US20130223529A1 (en) Scalable Video Encoding Using a Hierarchical Epitome

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant