US20160065967A1 - Independent temporally concurrent video stream coding - Google Patents

Publication number
US20160065967A1
Authority
US
United States
Prior art keywords
frame
input
frames
encoded
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/834,624
Other versions
US10291917B2 (en
Inventor
Ermin Kozica
Dave Zachariah
Willem Bastiaan Kleijn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to US14/834,624 priority Critical patent/US10291917B2/en
Assigned to GLOBAL IP SOLUTIONS (GIPS) AB, GLOBAL IP SOLUTIONS, INC. reassignment GLOBAL IP SOLUTIONS (GIPS) AB ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ZACHARIAH, DAVE, KLEIJN, WILLEM BASTIAAN, KOZICA, ERMIN
Assigned to GOOGLE INC. reassignment GOOGLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GLOBAL IP SOLUTIONS, INC., GLOBAL IP SOLUTIONS (GIPS) AB
Publication of US20160065967A1 publication Critical patent/US20160065967A1/en
Assigned to GOOGLE LLC reassignment GOOGLE LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: GOOGLE INC.
Application granted granted Critical
Publication of US10291917B2 publication Critical patent/US10291917B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12 Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/23439 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26275 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for distributing content or additional data in a staggered manner, e.g. repeating movies on different channels in a time-staggered manner in a near video on demand system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets

Definitions

  • the present invention relates to a method and apparatus for encoding of a video sequence.
  • A video sequence consists of a number of still images called frames. Coding of a video sequence, video coding, is done by describing the frames as bit-efficiently as possible. To do this, redundancy in the video sequence is exploited. Three types of redundancy can be exploited: temporal, spatial and spectral redundancy. Temporal redundancy is the redundancy between two frames, while spatial redundancy is the redundancy within a frame. Spectral redundancy is the redundancy between different colour components in the video. In the following we will not consider spectral redundancy.
  • Video coding standards define a number of frame types, out of which the I-frame and the P-frame are common to most standards.
  • The I-frame is coded by exploiting spatial redundancy only, resulting in a representation that is independent of all other frames.
  • P-frames are coded by exploiting both temporal and spatial redundancies. This leads to a more compact representation of the frame, while at the same time making this representation dependent on another frame (in most cases the previous one).
  • the applications have mainly included videoconferencing and videotelephony over circuit-switched networks, but also storing video material for later retrieval, e.g., the DVD.
  • Newer standards, e.g., MPEG-4 and H.264, perform significantly better than their predecessors and achieve low bit-rates for a given video quality.
  • The main ideas of using different frame types have been preserved, and the performance improvement results from refinement of the methods used in older standards.
  • One such refinement is that a frame can be segmented into smaller regions called slices, and the method of using I frames and P frames can be applied on individual slices.
  • An object of the present invention is to provide encoding and decoding of a video sequence which improves the perceptual video quality with only a moderate increase of the bit-rate for transferring the encoded video sequence.
  • a method and an apparatus for encoding a video sequence, and a method and an apparatus for decoding a video sequence, in accordance with the present invention are defined in the appended independent claims.
  • the invention is based on the idea of using two or more coding units for encoding two or more descriptions of the same video sequence, wherein the encoding units perform their encoding operations displaced in time in relation to each other.
  • the invention also includes the use of two or more decoding units for decoding two or more descriptions of the same video sequence, wherein the decoding units perform their decoding operations displaced in time in relation to each other.
  • the use of more than one encoder for encoding the same video sequence has the advantage of increasing the possibility that one or more encoded descriptions of a video sequence frame are received without error, even though one or more encoded descriptions of the same frame are non-existent due to an error or delay when transferring the encoded video sequence over a network from a transmitting end to a receiving end.
  • By displacing the encoding operations of the encoders in time, the probability that the received encoded sequences include propagated errors at the same time is reduced. This is because the different encoded sequences will have some kind of zero state occurring at different points in time. The longer the time since the last zero state of an encoded sequence, the higher the probability of a propagated error in that encoded sequence.
  • Another advantage of displacing the encoding operations of the encoders in time is achieved in case of a disruption in the network that transfers all the encoded video sequences and affects all the sequences at the same time.
  • The time until one of the video sequences includes a zero state after the disruption will in most cases be shorter than in the case with no displacement of the zero state.
  • In the latter case, i.e. without displacement, the time to the next zero state is the same for all the encoded sequences, and thus the same as when only a single encoded sequence is used for transferring the video.
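  • The effect of displacement on the time to the next zero state can be illustrated with a small calculation. The following is a minimal sketch, not part of the disclosure: the function name `mean_wait_to_next_intra`, the period of six frames and the offsets are assumptions chosen for illustration, and frame k is taken to be the first undisturbed frame after a disruption that affected all sequences.

```python
def mean_wait_to_next_intra(period, offsets):
    """Average number of frames until any sequence reaches a zero state.

    `offsets[i]` is the (hypothetical) frame index within the period at
    which sequence i is intra-encoded. The average is taken over every
    possible first undisturbed frame k in one period.
    """
    waits = []
    for k in range(period):
        # (offset - k) % period is the wait for one sequence; 0 means
        # that sequence is intra-encoded at frame k itself.
        waits.append(min((offset - k) % period for offset in offsets))
    return sum(waits) / period


# Three sequences, zero states every six frames.
no_displacement = mean_wait_to_next_intra(6, [0, 0, 0])
with_displacement = mean_wait_to_next_intra(6, [0, 2, 4])
print(no_displacement, with_displacement)
```

With identical offsets the result equals that of a single sequence, which matches the statement above that non-displaced sequences gain nothing in recovery time.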
  • The present invention has jitter buffers arranged at the receiving end, preferably one jitter buffer for each description received and, thus, for each decoder.
  • the decoders will be provided with data to be decoded from respective jitter buffers. According to the invention, the decoding operations of one decoder are then displaced in time with regard to decoding operations of another decoder.
  • a zero state as discussed above corresponds to an intra-encoding operation, i.e. an encoding operation exploiting spatial redundancy only, and the encoding operations between two zero states of the same encoded video sequence correspond to inter-encoding operations, i.e. encoding operations exploiting temporal redundancy between successive points of time of encoding.
  • the intra-encoding and inter-encoding may be used on a frame-by-frame basis of the video sequence, or on a slice-by-slice basis, wherein a slice corresponds to a segment of a frame.
  • the intra-encoding and inter-encoding correspond to I type and P type encoding, respectively.
  • the invention is applicable both for video coding standards in which the encoding uses I/P frames and video coding standards using I/P slices. Consequently, as the invention does not depend on whether successive full frames or successive slices of frames are encoded using the I/P concept, the following description will use the term I/P frame as general notation for both I/P frame and I/P slice. Thus, whenever I and P frames are discussed and described, the same description applies for I and P slices.
  • The inter-encoded frames/slices of the present invention can be implemented with different kinds of predictive frames/slices, e.g. B type (bi-predictive encoding); the reference to P type encoding merely discloses an exemplifying embodiment.
  • The present invention provides video sequence encoding using two or more encoders such that error propagation is, on average, shorter, which results in perceptually improved quality of the displayed video at a receiving end after decoding of the video sequences.
  • displacing the encoding operations for different encoders in time does not increase the bit-rate for transferring the different encoded video sequences, as compared to transferring the same number of encoded video sequences without any displacement of the encoding operations.
  • the present invention improves the video quality by ensuring robustness against transmission errors.
  • FIG. 1 schematically shows an exemplifying overall system environment in which various embodiments of the invention may be included and arranged to operate;
  • FIG. 2 schematically shows how to obtain several different descriptions of a video frame (or slice of a video frame) for encoding of each of the description by a separate encoder;
  • FIG. 3 shows an embodiment of the invention where intra-encoding operations of each encoded video sequence among three encoded video sequences are displaced in relation to the intra-encoding operations of the other encoded video sequences;
  • FIG. 4 shows an embodiment of the invention where the intra-encoding operations of one encoded video sequence are displaced in relation to the intra-encoding operations of another encoded video sequence.
  • FIG. 1 schematically shows an exemplifying overall system environment in which the different embodiments of the invention may be included and arranged to operate.
  • In FIG. 1, a digitized video signal 101, divided into frames, is input, each frame representing a still image in time.
  • a video signal can be divided into multiple descriptions. Each description is then encoded in a separate coding unit which is an implementation of an existing standard coder. This implies that there are I-frames and P-frames for each description. In case all descriptions are received at the receiver end, the best quality of video is obtained. In case there are errors in the transmission, affecting a number of descriptions, these descriptions are disregarded until they have been updated by an I-frame. Of course, this has the effect that the quality of the video is reduced temporarily.
  • the descriptions in a multiple description video encoding setup can relate to each other in a number of ways. First of all, they can be either equivalent or non-equivalent, i.e., each description results in the same quality or a differing quality compared to another description. Whether the descriptions are equivalent or not, they can (i) be fully redundant, i.e., several descriptions are replications of one another, (ii) have zero redundancy, i.e., the descriptions have no mutual information and (iii) be redundant to some extent, i.e. there is some mutual information between the descriptions. How the descriptions relate can affect the overall performance on different networks.
  • the transmitting end includes three encoders 121 , 122 and 123 .
  • These three encoders are preferably standard encoders operating in accordance with the H.263, MPEG-2, H.264, or MPEG-4 video coding standards.
  • the three encoders all handle their respective description in a similar manner, i.e. encode the received description using I-frames and P-frames (or when applicable, I-slices and P-slices) in accordance with the video coding standard used.
  • the difference between the three encoders themselves is the time during which they perform intra-encoding operations.
  • The receiving end includes three decoders 151, 152 and 153, also preferably being standard decoders operating in accordance with the H.263, MPEG-2, H.264, or MPEG-4 video coding standards.
  • Each decoder 151 , 152 , 153 decodes a respective description 111 , 112 , 113 of the video signal.
  • the three decoders all handle their respective description in a similar manner, i.e. decode the received encoded description consisting of I-frames and P-frames (or when applicable, I-slices and P-slices) in accordance with the video coding standard used.
  • the difference between the three decoders themselves is the time during which they perform intra-decoding operations.
  • The sequence of decoded I-frames and P-frames differs between the three decoders.
  • the video signal 101 is input to a sub-sampling unit 110 .
  • the sub-sampling unit sub-samples (in time or space, i.e. performs temporal or spatial sub-sampling) the input video sequence signal 101 into multiple, differing descriptions 111 , 112 and 113 of the video signal 101 .
  • the receiving end includes an up-sampling unit 170 that performs the inverse procedure of the sub-sampling procedure, i.e. rearranges the decoded descriptions, decoded by decoders 151 , 152 and 153 , into one set of successive video frames.
  • In an alternative embodiment, the descriptions 111, 112 and 113 are identical, in which case the unit referenced as 110 is a replication unit replicating the input video signal 101 into three identical descriptions 111, 112 and 113. Consequently, in this alternative embodiment, the up-sampling unit 170 may simply be a unit responsible for discarding redundant decoded descriptions (or for merging decoded descriptions if these are not fully redundant). That is, if two or more descriptions 161, 162, 163 are decoded by respective decoders 151, 152 and 153 at the receiving end without errors, and if the descriptions are fully redundant, all but one of the decoded descriptions may simply be discarded by the unit 170.
  • This exemplified sub-sampling procedure assigns pixels from the input video still images to the three descriptions 111 , 112 and 113 .
  • An input video image, or frame, 201 is here five pixels high and nine pixels wide.
  • the pixels are assigned to descriptions column-wise: columns one, four and seven are assigned to description one, denoted 202 , columns two, five and eight are assigned to description two, denoted 203 , and columns three, six and nine are assigned to description three, denoted 204 .
  • Each pixel is named in the figure and can be located in its description.
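  • The column-wise assignment described above, and its inverse as performed by an up-sampling unit, can be sketched as follows. This is an illustrative sketch only: the function names `split_columns` and `merge_columns` are hypothetical, columns are indexed from zero, and pixels are labelled by their (row, column) position rather than by the names used in the figure.

```python
def split_columns(frame, n_descriptions=3):
    """Assign column j (0-based) of `frame` to description j % n_descriptions."""
    return [[row[d::n_descriptions] for row in frame]
            for d in range(n_descriptions)]


def merge_columns(descriptions):
    """Inverse of split_columns: interleave the columns back into one frame."""
    n = len(descriptions)
    height = len(descriptions[0])
    width = sum(len(desc[0]) for desc in descriptions)
    frame = [[None] * width for _ in range(height)]
    for d, desc in enumerate(descriptions):
        for r in range(height):
            # Extended slice assignment puts description d back into
            # columns d, d + n, d + 2n, ...
            frame[r][d::n] = desc[r]
    return frame


# A frame five pixels high and nine pixels wide, as in the example above.
frame = [[(r, c) for c in range(9)] for r in range(5)]
desc1, desc2, desc3 = split_columns(frame)
restored = merge_columns([desc1, desc2, desc3])
```

Each description here is five pixels high and three pixels wide, and merging the three descriptions reproduces the input frame exactly.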
  • The sub-sampling procedure of FIG. 2 is not the only one that can be used. Other sub-sampling procedures can also be incorporated with the invention. Depending on the number of descriptions in the multiple description coding setup, so-called quincunx sub-sampling, temporal sub-sampling and poly-phase sub-sampling can be used. In quincunx sub-sampling, two descriptions are assigned the pixels in a checker-board fashion: the (odd-row, odd-column) pixels and the (even-row, even-column) pixels are assigned to one description, while the (odd-row, even-column) pixels and the (even-row, odd-column) pixels are assigned to the other description.
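  • The checker-board assignment of quincunx sub-sampling can be sketched as below. The function name `quincunx_split` is hypothetical. Rows and columns are indexed from zero in the code, which does not change the split: a pixel belongs to the first description exactly when its row and column indices have the same parity.

```python
def quincunx_split(frame):
    """Split `frame` into two descriptions in a checker-board pattern."""
    first, second = [], []
    for r, row in enumerate(frame):
        # Same parity of row and column index: first description.
        first.append([p for c, p in enumerate(row) if (r + c) % 2 == 0])
        # Different parity: second description.
        second.append([p for c, p in enumerate(row) if (r + c) % 2 == 1])
    return first, second


# Toy 4 x 4 frame whose pixel values are simply r * 4 + c.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
desc_a, desc_b = quincunx_split(toy)
```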
  • In temporal sub-sampling, the number of descriptions is arbitrary. For example, assigning every third frame starting from frame one to description one, every third frame starting from frame two to description two and every third frame starting from frame three to description three yields three descriptions.
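  • Temporal sub-sampling amounts to dealing the frames out to the descriptions in round-robin order. A minimal sketch, in which the function name `temporal_split` is hypothetical and frames are represented only by their frame numbers:

```python
def temporal_split(frames, n_descriptions=3):
    """Assign every n-th frame, starting from frame d + 1, to description d + 1."""
    return [frames[d::n_descriptions] for d in range(n_descriptions)]


# Frames numbered one to nine, split into three descriptions.
frame_numbers = list(range(1, 10))
temporal_descriptions = temporal_split(frame_numbers)
```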
  • Poly-phase sub-sampling is performed by sub-sampling the original frame along rows (by factor R), producing R temporary descriptions. These R temporary descriptions are then sub-sampled along columns (by factor C), each producing C descriptions, for a total of R*C descriptions.
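  • A two-stage poly-phase split can be sketched as follows. The function name `polyphase_split` is hypothetical, and the sketch assumes that the second stage (factor C) operates along columns, which is the usual reading but is an assumption here.

```python
def polyphase_split(frame, R, C):
    """Return R * C descriptions of `frame`.

    Stage one keeps every R-th row starting at row r (R temporary
    descriptions). Stage two, assumed to act along columns, keeps every
    C-th column starting at column c of each temporary description.
    """
    return [[row[c::C] for row in frame[r::R]]
            for r in range(R)
            for c in range(C)]


# Toy 4 x 6 frame whose pixel values are r * 6 + c.
toy_frame = [[r * 6 + c for c in range(6)] for r in range(4)]
polyphase_descriptions = polyphase_split(toy_frame, R=2, C=3)
```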
  • each description is independently encoded by its respective encoder 121 , 122 and 123 .
  • each encoder encodes its input description so as to output the encoded video description as a series of I frames and P frames.
  • the intra-encoding operations applied to each video sequence description among three different video sequence descriptions are displaced in relation to the intra-encoding operations applied to the other video sequence descriptions.
  • The independence of the three encoding units is exploited by interlacing the I-frames in time such that the temporal distance between two consecutive I-frames (in different encoded descriptions) is always equal.
  • The group of pictures (GOP) length for each encoded description is six frames, while the distance between two consecutive I-frames in different descriptions is two frames.
  • the first frame of each description is coded as an I-frame. This is done to get a good reference for prediction in all the encoded descriptions.
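  • The interlaced I-frame placement can be illustrated by generating the frame-type pattern for each description. This is a sketch under assumptions: the function name `frame_types` is hypothetical, the offset between descriptions is taken as the GOP length divided by the number of descriptions, and the first frame is forced to be an I-frame in every description as described above. The exact pattern of FIG. 3 is not reproduced here.

```python
def frame_types(n_frames, gop_length=6, n_descriptions=3):
    """Return one string of 'I'/'P' frame types per description."""
    offset = gop_length // n_descriptions  # spacing between I-frames
    schedule = []
    for d in range(n_descriptions):
        types = []
        for k in range(n_frames):
            # Frame 0 is intra-coded everywhere; afterwards description d
            # is intra-coded once per GOP, shifted by d * offset frames.
            is_intra = k == 0 or (k - d * offset) % gop_length == 0
            types.append("I" if is_intra else "P")
        schedule.append("".join(types))
    return schedule


for line in frame_types(12):
    print(line)
```

After the common first frame, an I-frame occurs in exactly one description every second frame, while each description keeps its own GOP length of six frames.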
  • intra-decoding operations applied to each received video sequence description among three different video sequence descriptions are displaced in relation to the intra-decoding operations applied to the other video sequence descriptions.
  • the displacement of the intra-decoding operations of two decoders corresponds to the temporal distance between two I-frames of respective encoded descriptions that are to be decoded.
  • each coded description 131 - 133 is sent over the network 140 .
  • the network is such that some of the encoded description frames may be transferred with errors or be delayed, which in a packet switched network results in missing video data for the frames in question. This behaviour is typical of packet-switched networks.
  • the encoded descriptions 141 - 143 that arrive at the receiving end are decoded in respective decoders 151 - 153 .
  • the final result is obtained by up-sampling of the decoded descriptions 161 - 163 in up-sampling unit 170 .
  • the up-sampling procedure is the inverse of the sub-sampling procedure, i.e. the pixels are rearranged from the three descriptions into one final frame.
  • The final result 171 is a digital representation of the video that was input at the transmitting end, and it is sent for further processing, e.g., displaying on a monitor.
  • Some of the descriptions of a current frame may be lost, delayed or corrupted, in which case they are treated as non-existent. This results in a propagated error in the decoded representation of the description.
  • The propagated error arises from the dependence between frames: every inter-coded frame following an erroneous frame is itself erroneous.
  • a non-existent or corrupted description is disregarded by up-sampling unit 170 and its pixels are instead estimated from the pixels of the other descriptions. This can be done in an interpolating manner, e.g., pixel b 1 in FIG. 2 is estimated as the mean of a 1 and c 1 .
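  • The interpolation of a disregarded description can be sketched for the column-wise sub-sampling of FIG. 2. The function name `reconstruct` and the `valid` list are hypothetical, all descriptions are assumed to have the same size, and each missing pixel is estimated as the mean of its horizontally adjacent pixels that come from valid descriptions. Other interpolation schemes are equally possible.

```python
def reconstruct(descriptions, valid):
    """Rebuild a frame from column-wise descriptions.

    `valid[d]` tells whether description d may be used. Pixels of an
    invalid description are estimated from valid horizontal neighbours.
    At least one description must be valid.
    """
    n = len(descriptions)
    reference = next(d for d, ok in zip(descriptions, valid) if ok)
    height, width = len(reference), len(reference[0]) * n
    frame = [[None] * width for _ in range(height)]
    # Place the pixels of every valid description in its own columns.
    for d in range(n):
        if valid[d]:
            for r in range(height):
                frame[r][d::n] = descriptions[d][r]
    # Estimate the remaining pixels from valid left/right neighbours.
    for r in range(height):
        for c in range(width):
            if frame[r][c] is None:
                neighbours = [frame[r][j] for j in (c - 1, c + 1)
                              if 0 <= j < width and valid[j % n]]
                if neighbours:  # stays None if no valid neighbour exists
                    frame[r][c] = sum(neighbours) / len(neighbours)
    return frame


# One-row example: description two is lost and is interpolated.
a, b, c = [[10, 40]], [[0, 0]], [[30, 60]]
row = reconstruct([a, b, c], [True, False, True])
```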
  • A description is disregarded as long as it is corrupt. Hence, it will be taken into use again only when an I-frame of that description arrives at the receiver end. Having access to as many non-corrupt descriptions as possible results in the best quality, which is why one wants to maximize the number of non-corrupt descriptions at all times.
  • The expected number of descriptions available at any time will be greater than if the same frame had been encoded as an I-frame in every description. This follows from the fact that the interval between I-frames is smaller and that the probability of a propagated error will at any time be different for the three different descriptions.
  • In order for the up-sampling unit 170 to be able to decide how to arrange the received descriptions, i.e. the output of the decoders, into one set of successive video frames, it needs to keep track of the validity of the received descriptions. This is preferably done by including output validity flags in the up-sampling unit, one output validity flag for each decoder connected to the up-sampling unit.
  • a decoder's output validity flag indicates whether the description from that decoder is corrupted or non-corrupted, and, thus, whether that description should be used when arranging the received descriptions into one set of successive video frames.
  • When a decoder determines a description to be lost, delayed or corrupted, it signals to the up-sampling unit that the corresponding output validity flag should be set to corrupted. When a decoder decodes an I frame, it signals to the up-sampling unit that the corresponding output validity flag should be set to non-corrupted. Thus, the up-sampling unit 170 will at every time instance be able to keep track of the validity of each one of the descriptions received from the decoders.
  • The above design of separate signalling for each decoder with regard to setting output validity as non-corrupted is due to the fact that the I frames of the different descriptions are displaced in time. In comparison, in a design in which the I frames of the different descriptions are not displaced in time, a single signal for all descriptions suffices when the I frames are decoded.
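  • The bookkeeping of output validity flags can be sketched as a small class. The class and method names below are hypothetical and not taken from the disclosure. The sketch assumes that all flags start as non-corrupted, which fits the case where the first frame of every description is an I-frame.

```python
class UpSamplingUnit:
    """Tracks one output validity flag per connected decoder."""

    def __init__(self, n_decoders):
        # Assumption: every description starts out non-corrupted.
        self.valid = [True] * n_decoders

    def report_error(self, decoder_index):
        """A decoder found its description lost, delayed or corrupted."""
        self.valid[decoder_index] = False

    def report_intra_frame(self, decoder_index):
        """A decoder decoded an I frame, so its output is valid again."""
        self.valid[decoder_index] = True

    def usable_descriptions(self):
        """Indices of the descriptions to use for the current frame."""
        return [i for i, ok in enumerate(self.valid) if ok]


unit = UpSamplingUnit(3)
unit.report_error(1)             # description two becomes corrupted
after_error = unit.usable_descriptions()
unit.report_intra_frame(1)       # its next I frame restores it
after_intra = unit.usable_descriptions()
```

Because the I frames are displaced in time, each decoder calls `report_intra_frame` at a different frame, which is why the flags are kept per decoder.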
  • With reference to FIG. 4, another embodiment of the invention is described.
  • the independence of the coding units is in this embodiment exploited by placing the I-frames in the multiple descriptions such that the expected distortion at the receiver end is minimized.
  • the I frames of the different descriptions are placed based on calculations that utilize known transmission error probabilities, i.e. known network characteristics.
  • FIG. 4 shows an example with two descriptions in which the probability of transmission error for the upper (in FIG. 4 ) description is assumed or known to be lower than the probability of transmission error for the lower (in FIG. 4 ) description. In this way the I-frames are interlaced such that the expected distortion at the receiver end is minimized.
  • the sender can choose to use the information regarding different transmission error probabilities for the two transferred encoded descriptions to improve the performance, not only in comparison to placement of the I frames at the same time for both descriptions, but also in comparison to the placement of I frames described in the previous embodiment.
  • the displacement of the decoding operations at the receiving end corresponds to the placement of the I-frames shown in FIG. 4 .
  • If the probability of error for the upper description is lower than the probability of error for the lower description, then it is advantageous to move the relative placement of the I frames of the encoded descriptions in accordance with what is shown in FIG. 4.
  • The lower description can be seen as complementary, i.e., it is used to decrease the probability of error when the upper description is no longer reliable. Since the upper description has a lower probability of error, the I frame of the lower description can be moved to the right (to occur later in time), and the upper description is trusted with a greater number of P frames before the lower description is used to decrease the overall probability of error. For example, with time running from left to right in FIG. 4, the first P frame after the I frame of the lower description occurs at the same time as the fifth P frame of the upper description, thereby providing a decreased overall probability of error. The situation for the lower description is the opposite.
  • For given probabilities of error and expected distortion, the optimal placement of the I-frames for descriptions one and two can be calculated by solving a minimization problem.
  • the expected value of the total distortion is minimized with respect to the relative placement of the I-frames.
  • The expression for the expected distortion is shown to be periodic, which is why it is sufficient to solve the minimization problem only for an interval between two I-frames in either description.
  • The expression for the expected distortion in this interval is differentiated with respect to the length of the I-frame displacement, giving an extremum. Since the problem now lies in an interval, the minimum is found by evaluating the expected distortion at the extremum and at the boundaries of the interval. This will be described further in the following.
  • The optimization problem is to minimize the expectation of the distortion $D$ over all frames $k$ with respect to the discrete displacement variable, which takes values from $0$ to $K-1$, where $K$ denotes the I-frame period length.
  • The brackets $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$ denote the floor and ceiling operations, respectively.
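  • The minimization over the displacement can be illustrated numerically. The sketch below does not reproduce the expression for the expected distortion, which is not given in this text. Instead it assumes a simple hypothetical model: each description loses a frame independently with probability p1 or p2, a description is usable only if no frame has been lost since its last I-frame, and the distortion is 0 when both descriptions are usable, `d_one` when exactly one is usable and `d_none` when neither is. It also replaces the differentiation step described above with an exhaustive search over the K possible displacements. All names and default values are assumptions.

```python
def expected_distortion(displacement, period, p1, p2, d_one=1.0, d_none=4.0):
    """Mean expected distortion over one I-frame period (toy model)."""
    total = 0.0
    for k in range(period):
        age1 = k                            # frames since I-frame, description one
        age2 = (k - displacement) % period  # same for description two
        ok1 = (1 - p1) ** (age1 + 1)        # P(description one usable)
        ok2 = (1 - p2) ** (age2 + 1)        # P(description two usable)
        total += d_one * (ok1 * (1 - ok2) + ok2 * (1 - ok1))
        total += d_none * (1 - ok1) * (1 - ok2)
    return total / period


def best_displacement(period, p1, p2, **distortions):
    """Exhaustive search over displacements 0 .. period - 1."""
    return min(range(period),
               key=lambda d: expected_distortion(d, period, p1, p2,
                                                 **distortions))


# Equal error probabilities: the model places the I-frames half a
# period apart. Unequal probabilities generally shift the optimum.
symmetric = best_displacement(6, 0.05, 0.05)
asymmetric = best_displacement(6, 0.01, 0.10)
```

Under this toy model the expected distortion depends on the displacement only through the probability that both descriptions are usable at the same time, so the search effectively spreads the risk of simultaneous errors as evenly as the error probabilities allow.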

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Implementations of independent temporally concurrent video stream coding may include encoding a plurality of input frames from an input video sequence, wherein the plurality of input frames includes a first input frame. Encoding the plurality of input frames may include generating a first plurality of encoded frames based on the plurality of input frames such that the first plurality of encoded frames includes a first encoded I-frame corresponding to the first input frame, and generating a second plurality of encoded frames based on the plurality of input frames such that the second plurality of encoded frames includes a first encoded P-frame corresponding to the first input frame. Implementations of independent temporally concurrent video stream coding may include including the first plurality of encoded frames and the second plurality of encoded frames in an output, and transmitting the output to a decoder.

Description

    TECHNICAL FIELD OF THE INVENTION
  • The present invention relates to a method and apparatus for encoding of a video sequence.
  • BACKGROUND OF THE INVENTION
  • A video sequence consists of a number of still images called frames. Coding of a video sequence, video coding, is done by describing the frames as bit-efficiently as possible. To do this, redundancy in the video sequence is exploited. There are three types of redundancies that can be exploited, temporal redundancy, spatial redundancy and spectral redundancy. Temporal redundancy is the redundancy between two frames, while spatial redundancy is the redundancy within a frame. Spectral redundancy is the redundancy between different colour components in the video. In the following we will not consider the spectral redundancy.
  • Video coding standards define a number of frame types, out of which the I-frame and the P-frame are common to most standards. The I-frame is coded by exploiting spatial redundancy solely, resulting in a representation that is independent of all other frames. P-frames, on the other hand, are coded by exploiting both temporal and spatial redundancies. This leads to a more compact representation of the frame, while at the same time making this representation dependent on another frame (in most cases the previous one).
  • Video coding standards from about 1995, e.g., H.263 and MPEG-2, have been developed for the purpose of bit-efficient video coding and make use of the I-frame/P-frame setup. The applications have mainly included videoconferencing and videotelephony over circuit-switched networks, but also storing video material for later retrieval, e.g., the DVD. Newer standards, e.g., MPEG-4 and H.264, have a performance that is significantly improved over their predecessors and achieve low bit-rates for given video quality. The main ideas of using different frame types have been preserved and the performance improvement is a result of refinement of the methods used in older standards. One such refinement is that a frame can be segmented into smaller regions called slices, and the method of using I frames and P frames can be applied to individual slices.
  • With the arrival of new technology, where greater processing power and packet-switched networks (WLAN and Internet) have had the leading role, new applications have become of interest. These applications include streaming video and live video communication over IP networks. The requirements that live video communication applications pose on the underlying technique are quite different from those of storage applications and even streaming applications. In addition to the requirements that are present in storage and streaming applications, live video communication poses a strict requirement on the delay between sending and displaying video. This strict requirement makes the overall number of errors in transmission increase, since delayed packets are handled equivalently to lost packets.
  • Existing video coding techniques, using the mentioned setup with different frame types, are not suitable for live video communication due to the strict delay restriction. Introducing high dependency between frames to achieve a bit-efficient representation of the signal results in display of erroneous video in environments where the probability of transmission error is significant. Not only is it impossible to render frames that are not received in time, but the frame dependency makes the error propagate throughout the video sequence, which is annoying to the viewer. Current standards handle this problem, more or less efficiently, by sending a frame that is independent of other frames, an I-frame. In this manner, the propagated error is reset to zero. However, the choice of how frequently I-frames should be sent is not trivial. Increasing the I-frame frequency results in better video quality when there is a possibility of transmission errors, while at the same time increasing the bit-rate. Hence, there is a trade-off between video quality and bit-efficient representation of the video.
  • Therefore, it is desirable to be able to increase the video quality without having to increase the bit-rate too much, thereby still providing a bit-efficient representation of the video.
  • SUMMARY OF THE INVENTION
  • An object of the present invention is to provide encoding and decoding of a video sequence which improves the perceptual video quality with only a moderate increase of the bit-rate for transferring the encoded video sequence.
  • A method and an apparatus for encoding a video sequence, and a method and an apparatus for decoding a video sequence, in accordance with the present invention are defined in the appended independent claims.
  • The invention is based on the idea of using two or more coding units for encoding two or more descriptions of the same video sequence, wherein the encoding units perform their encoding operations displaced in time in relation to each other. The invention also includes the use of two or more decoding units for decoding two or more descriptions of the same video sequence, wherein the decoding units perform their decoding operations displaced in time in relation to each other.
  • The use of more than one encoder for encoding the same video sequence has the advantage of increasing the possibility that one or more encoded descriptions of a video sequence frame are received without error, even though one or more encoded descriptions of the same frame are non-existent due to an error or delay when transferring the encoded video sequence over a network from a transmitting end to a receiving end. By displacing the encoding operations of the encoders in time, the probability that the received encoded sequences include propagated errors at the same time will be reduced. This is because the different encoded sequences will have some kind of zero states occurring at different points in time. The longer the time since the last zero state for an encoded sequence, the higher the probability of a propagated error for that encoded sequence. By displacing the zero states for the different encoded sequences, there will always be a lower probability of a propagated error for one or more of the sequences than for the other(s). In comparison, with no displacement of the zero states for the encoded sequences, all encoded sequences will simultaneously increase their probability of including a propagated error up to the point when, at the same time for all sequences, new encoded zero states occur.
  • Another advantage of displacing the encoding operations of the encoders in time is achieved in case of a disruption in the network transferring all the encoded video sequences and affecting all the sequences at the same time. In such a case, the time until one of the video sequences includes a zero state after the disruption will in most cases be smaller, as compared to the case with no displacement of the zero state. In the latter case, since the zero states for the multiple encoded sequences occur simultaneously, the time to the next zero state for all the multiple encoded sequences will be the same as in the case when only one single encoded sequence is used for transferring the video.
  • It will be appreciated that transfer of two or more descriptions over a network may result in the different descriptions not being received in synchronism due to varying network jitter for the different descriptions. As known by the skilled person, jitter buffers at the receiving end are used for dealing with network jitter. Thus, with multiple descriptions, multiple jitter buffers are needed. Frames of different descriptions may then be output in synchronism from the respective jitter buffers. Advantageously, the present invention has jitter buffers arranged at the receiving end, preferably one jitter buffer for each description received, and, thus, for each decoder. Thus, using jitter buffers, the decoders will be provided with data to be decoded from respective jitter buffers. According to the invention, the decoding operations of one decoder are then displaced in time with regard to decoding operations of another decoder.
  • Typically, a zero state as discussed above corresponds to an intra-encoding operation, i.e. an encoding operation exploiting spatial redundancy only, and the encoding operations between two zero states of the same encoded video sequence correspond to inter-encoding operations, i.e. encoding operations exploiting temporal redundancy between successive points of time of encoding. Further, the intra-encoding and inter-encoding may be used on a frame-by-frame basis of the video sequence, or on a slice-by-slice basis, wherein a slice corresponds to a segment of a frame.
  • According to an embodiment of the invention, the intra-encoding and inter-encoding correspond to I type and P type encoding, respectively. Thus, the invention is applicable both for video coding standards in which the encoding uses I/P frames and video coding standards using I/P slices. Consequently, as the invention does not depend on whether successive full frames or successive slices of frames are encoded using the I/P concept, the following description will use the term I/P frame as general notation for both I/P frame and I/P slice. Thus, whenever I and P frames are discussed and described, the same description applies for I and P slices. Further, it will be appreciated that the inter-encoded frames/slices of the present invention can be implemented with different kinds of predictive frames/slices, e.g. B type (Bi-predictive encoding), and that the reference to P type encoding merely discloses an exemplifying embodiment.
  • Thus, it will be appreciated that the present invention provides video sequence encoding using two or more encoders such that shorter error propagation on average is provided, which results in perceptually improved quality of the displayed video at a receiving end after decoding of the video sequences. In addition, it will be appreciated that displacing the encoding operations for different encoders in time does not increase the bit-rate for transferring the different encoded video sequences, as compared to transferring the same number of encoded video sequences without any displacement of the encoding operations. Thus, the present invention improves the video quality by ensuring robustness against transmission errors.
  • It will also be appreciated that the discussion above, and the following description, of encoding operations in accordance with the invention, apply correspondingly to decoding operations as defined by the invention.
  • Further features of the invention, as well as advantages thereof, will become more readily apparent from the following detailed description of a number of exemplifying embodiments of the invention. As is understood, various modifications, alterations and different combinations of features coming within the scope of the invention as defined by the appended claims will become apparent to those skilled in the art when studying the general teaching set forth herein and the following detailed description.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Exemplifying embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
  • FIG. 1 schematically shows an exemplifying overall system environment in which various embodiments of the invention may be included and arranged to operate;
  • FIG. 2 schematically shows how to obtain several different descriptions of a video frame (or slice of a video frame) for encoding of each of the descriptions by a separate encoder;
  • FIG. 3 shows an embodiment of the invention where intra-encoding operations of each encoded video sequence among three encoded video sequences are displaced in relation to the intra-encoding operations of the other encoded video sequences; and
  • FIG. 4 shows an embodiment of the invention where intra-encoding operations of one encoded video sequence are displaced in relation to the intra-encoding operations of another encoded video sequence.
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 1 schematically shows an exemplifying overall system environment in which the different embodiments of the invention may be included and arranged to operate.
  • In FIG. 1 a digitized video signal 101, divided into frames, is input, each frame representing a still image in time.
  • In general, to obtain robustness to transmission errors, a video signal can be divided into multiple descriptions. Each description is then encoded in a separate coding unit which is an implementation of an existing standard coder. This implies that there are I-frames and P-frames for each description. In case all descriptions are received at the receiver end, the best quality of video is obtained. In case there are errors in the transmission, affecting a number of descriptions, these descriptions are disregarded until they have been updated by an I-frame. Of course, this has the effect that the quality of the video is reduced temporarily.
  • The descriptions in a multiple description video encoding setup can relate to each other in a number of ways. First of all, they can be either equivalent or non-equivalent, i.e., each description results in the same quality or a differing quality compared to another description. Whether the descriptions are equivalent or not, they can (i) be fully redundant, i.e., several descriptions are replications of one another, (ii) have zero redundancy, i.e., the descriptions have no mutual information and (iii) be redundant to some extent, i.e. there is some mutual information between the descriptions. How the descriptions relate can affect the overall performance on different networks.
  • One important property of coders using the multiple description coding setup which makes improvement of the performance possible is the following. When sending multiple descriptions of a video signal, where the descriptions are coded in separate coding units, there is the possibility of utilizing the fact that the coding units of each description are independent. That is, the coding procedure of description one does not depend on the coding procedure of description two. The present invention provides simple and yet effective techniques for utilizing this property of the coding setup.
  • Referring to FIG. 1, the transmitting end includes three encoders 121, 122 and 123. These three encoders are preferably standard encoders operating in accordance with the H.263, MPEG-2, H.264, or MPEG-4 video coding standards. To each encoder 121, 122, 123 a respective description 111, 112, 113 of the video signal is input. The three encoders all handle their respective description in a similar manner, i.e. encode the received description using I-frames and P-frames (or when applicable, I-slices and P-slices) in accordance with the video coding standard used. The difference between the three encoders themselves is the time during which they perform intra-encoding operations. Thus, the sequence of output I-frames and P-frames differs between the three encoders. The receiving end includes three decoders 151, 152 and 153, also preferably being standard decoders operating in accordance with the H.263, MPEG-2, H.264, or MPEG-4 video coding standards. Each decoder 151, 152, 153 decodes a respective description 111, 112, 113 of the video signal. The three decoders all handle their respective description in a similar manner, i.e. decode the received encoded description consisting of I-frames and P-frames (or when applicable, I-slices and P-slices) in accordance with the video coding standard used. The difference between the three decoders themselves is the time during which they perform intra-decoding operations. Thus, the sequence of decoded I-frames and P-frames differs between the three decoders.
  • In accordance with one embodiment, the video signal 101 is input to a sub-sampling unit 110. The sub-sampling unit sub-samples (in time or space, i.e. performs temporal or spatial sub-sampling) the input video sequence signal 101 into multiple, differing descriptions 111, 112 and 113 of the video signal 101. The receiving end includes an up-sampling unit 170 that performs the inverse procedure of the sub-sampling procedure, i.e. rearranges the decoded descriptions, decoded by decoders 151, 152 and 153, into one set of successive video frames.
  • According to an alternative embodiment, the descriptions 111, 112 and 113 are identical, in which case the unit referenced as 110 is a replication unit replicating the input video signal 101 into three identical descriptions 111, 112 and 113. Consequently, in this alternative embodiment, the up-sampling unit 170 may simply be a unit responsible for discarding redundant decoded descriptions (or for merging decoded descriptions if these are not fully redundant). That is, if two or more descriptions 161, 162, 163 are decoded by respective decoders 151, 152 and 153 at the receiving end without errors, and if the descriptions are fully redundant, all but one of the decoded descriptions may simply be discarded by the unit 170.
  • An exemplifying sub-sampling procedure is described with reference to FIG. 2. This exemplified sub-sampling procedure assigns pixels from the input video still images to the three descriptions 111, 112 and 113.
  • An input video image, or frame, 201 is here five pixels high and nine pixels wide. The pixels are assigned to descriptions column-wise: columns one, four and seven are assigned to description one, denoted 202, columns two, five and eight are assigned to description two, denoted 203, and columns three, six and nine are assigned to description three, denoted 204. Each pixel is named in the figure and can be located in its description.
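  • The column-wise assignment described above can be sketched in Python as follows. This is a minimal illustration only: the function names `split_columns` and `merge_columns` are hypothetical, frames are modelled as plain lists of rows, and column indices are 0-based (so the text's columns one, four and seven are indices 0, 3 and 6).

```python
def split_columns(frame, n_descriptions=3):
    """Assign column c (0-based) of `frame` to description c % n_descriptions."""
    return [[row[d::n_descriptions] for row in frame]
            for d in range(n_descriptions)]


def merge_columns(descriptions):
    """Inverse of split_columns: interleave the columns back into one frame."""
    n = len(descriptions)
    height = len(descriptions[0])
    width = sum(len(desc[0]) for desc in descriptions)
    frame = [[None] * width for _ in range(height)]
    for d, desc in enumerate(descriptions):
        for r, row in enumerate(desc):
            frame[r][d::n] = row  # description d owns columns d, d+n, d+2n, ...
    return frame


# A 5x9 frame whose pixels are labelled by (row, column), as in the example above.
frame = [[(r, c) for c in range(9)] for r in range(5)]
description_1, description_2, description_3 = split_columns(frame)
```

  • Each description here is a 5x3 image, and `merge_columns` corresponds to the rearrangement performed at the receiving end when all three descriptions are available.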
  • The sub-sampling procedure of FIG. 2 is not the only one that can be used. There are other possible sub-sampling procedures, which can also be incorporated with the invention. Depending on the number of descriptions in the multiple description coding setup, so-called quincunx sub-sampling, temporal sub-sampling and poly-phase sub-sampling can be used. In quincunx sub-sampling, two descriptions are assigned the pixels in a checker-board fashion: the (odd-row, odd-column) pixels and the (even-row, even-column) pixels are assigned to one description, while the (odd-row, even-column) pixels and the (even-row, odd-column) pixels are assigned to the other description. In temporal sub-sampling the number of descriptions is arbitrary. For example, assigning every third frame starting from frame one to description one, every third frame starting from frame two to description two and every third frame starting from frame three to description three yields three descriptions. Poly-phase sub-sampling is performed by sub-sampling the original frame along rows (by factor R), producing R temporary descriptions. These R temporary descriptions are then sub-sampled along columns (by factor C), each producing C descriptions, for a total of R*C descriptions.
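  • The three alternative procedures can be sketched in the same style. The function names are hypothetical, indices are 0-based, and the poly-phase variant assumes that "sub-sampling along rows by factor R" means keeping every R-th row; the text does not fix these details.

```python
def quincunx_split(frame):
    """Checker-board split: pixels with equal row/column parity go to one
    description, pixels with differing parity go to the other."""
    same = [[p for c, p in enumerate(row) if (r + c) % 2 == 0]
            for r, row in enumerate(frame)]
    diff = [[p for c, p in enumerate(row) if (r + c) % 2 == 1]
            for r, row in enumerate(frame)]
    return same, diff


def temporal_split(frames, n_descriptions):
    """Every n-th frame, starting from frame d, goes to description d."""
    return [frames[d::n_descriptions] for d in range(n_descriptions)]


def polyphase_split(frame, R, C):
    """Keep every R-th row (R phases), then every C-th column (C phases),
    giving R*C descriptions in total."""
    return [[row[j::C] for row in frame[i::R]]
            for i in range(R) for j in range(C)]


# 4x4 test frame with pixel values 0..15 in raster order.
small = [[4 * r + c for c in range(4)] for r in range(4)]
same_parity, other_parity = quincunx_split(small)
phases = polyphase_split(small, 2, 2)
```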
  • Referring again to FIG. 1, regardless of whether the three descriptions 111, 112 and 113 are identical (equivalent) or different (non-equivalent), each description is independently encoded by its respective encoder 121, 122 and 123. Typically, each encoder encodes its input description so as to output the encoded video description as a series of I frames and P frames.
  • In the embodiment of FIG. 3 the intra-encoding operations applied to each video sequence description among three different video sequence descriptions are displaced in relation to the intra-encoding operations applied to the other video sequence descriptions. The independence of the corresponding three encoding units is exploited by displacing the I frames to be interlaced in time such that the temporal distance between two consecutive I-frames (which belong to different encoded descriptions) is always equal. The group of pictures (GOP) length for each encoded description is six frames, while the distance between two I-frames is two frames. However, one exception to the setup in FIG. 3 has to be made, namely, the first frame of each description is coded as an I-frame. This is done to get a good reference for prediction in all the encoded descriptions.
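  • The interlaced placement can be illustrated with a short Python sketch that prints the frame type of every frame in each description. The function name `frame_types` and the concrete offsets 0, 2 and 4 are assumptions chosen to match the stated GOP length of six and I-frame distance of two; FIG. 3 itself is not reproduced here.

```python
def frame_types(n_frames, gop_length, offset):
    """Return a string of 'I'/'P' frame types for one description.

    An I-frame is placed wherever k % gop_length == offset, and frame 0 is
    always an I-frame so that every description starts from a good reference.
    """
    return "".join(
        "I" if k == 0 or k % gop_length == offset else "P"
        for k in range(n_frames)
    )


GOP_LENGTH = 6
N_DESCRIPTIONS = 3
SPACING = GOP_LENGTH // N_DESCRIPTIONS  # two frames between consecutive I-frames

schedule = [frame_types(12, GOP_LENGTH, SPACING * d) for d in range(N_DESCRIPTIONS)]
for d, types in enumerate(schedule, start=1):
    print(f"description {d}: {types}")
```

  • After the common initial I-frame, some description receives an I-frame every second frame, whereas without displacement all three descriptions would be refreshed together only every sixth frame.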
  • Correspondingly, at a receiver end, intra-decoding operations applied to each received video sequence description among three different video sequence descriptions are displaced in relation to the intra-decoding operations applied to the other video sequence descriptions. The displacement of the intra-decoding operations of two decoders corresponds to the temporal distance between two I-frames of respective encoded descriptions that are to be decoded.
  • Referring to FIG. 1, each coded description 131-133 is sent over the network 140. The network is such that some of the encoded description frames may be transferred with errors or be delayed, which in a packet-switched network results in missing video data for the frames in question. This behaviour is typical of packet-switched networks. The encoded descriptions 141-143 that arrive at the receiving end are decoded in respective decoders 151-153. After decoding, the final result is obtained by up-sampling of the decoded descriptions 161-163 in up-sampling unit 170. As described above, the up-sampling procedure is the inverse of the sub-sampling procedure, i.e. the pixels are rearranged from the three descriptions into one final frame. The final result 171 is a digital representation of the video that was input at the transmitting end and is sent for further processing, e.g., displaying on a monitor.
  • Some of the descriptions of a current frame may be lost, delayed or corrupted, in which case they are treated as non-existent. This will result in a propagated error in the decoded representation of the description. The propagated error is caused by the dependence between frames, which causes all inter-coded frames following an erroneous frame to be erroneous.
  • In one possible embodiment, a non-existent or corrupted description is disregarded by up-sampling unit 170 and its pixels are instead estimated from the pixels of the other descriptions. This can be done in an interpolating manner, e.g., pixel b1 in FIG. 2 is estimated as the mean of a1 and c1. A description is disregarded as long as it is corrupt. Hence, it will be taken into use again only when an I-frame of that description arrives at the receiver end. Having access to as many non-corrupt descriptions as possible results in the best quality, which is why one wants to maximize the number of non-corrupt descriptions at all times. By placing the I frames as illustrated in FIG. 3, the expected number of descriptions available at any time will be greater than if the same frame had been encoded as an I-frame for every description. This follows from the fact that the interval between I-frames is smaller and the probability of a propagated error will at any time be different for the three different descriptions.
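  • A minimal Python sketch of such interpolation for the column-wise setup of FIG. 2 is given below. The function name `conceal_columns` is hypothetical, pixels are plain numbers, exactly one description is assumed missing, and at the frame border the single available neighbour is copied; the text only specifies the mean of the two neighbours.

```python
def conceal_columns(frame, n_descriptions, missing):
    """Estimate the pixels of one missing description from its neighbours.

    `frame` is the re-assembled frame (list of rows of numbers) in which the
    columns c with c % n_descriptions == missing carry no usable data.
    Each such pixel is replaced by the mean of its horizontal neighbours,
    which belong to other descriptions.
    """
    width = len(frame[0])
    out = [list(row) for row in frame]  # do not modify the caller's frame
    for row in out:
        for c in range(missing, width, n_descriptions):
            neighbours = [row[j] for j in (c - 1, c + 1) if 0 <= j < width]
            row[c] = sum(neighbours) / len(neighbours)
    return out


# Description two (0-based index 1) is missing; its columns hold placeholder zeros.
damaged = [[10, 0, 30, 40, 0, 60]]
repaired = conceal_columns(damaged, n_descriptions=3, missing=1)
```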
  • In order for the up-sampling unit 170 to be able to decide how to arrange the received descriptions, i.e. the output of the decoders, into one set of successive video frames, it needs to keep track of the validity of the received descriptions. This is preferably done by including output validity flags in the up-sampling unit, one output validity flag for each decoder connected to the up-sampling unit. A decoder's output validity flag indicates whether the description from that decoder is corrupted or non-corrupted, and, thus, whether that description should be used when arranging the received descriptions into one set of successive video frames. When a decoder determines a description to be lost, delayed or corrupted, it signals to the up-sampling unit that the corresponding output validity flag should be set to corrupted. When a decoder decodes an I frame, it signals to the up-sampling unit that the corresponding output validity flag should be set to non-corrupted. Thus, the up-sampling unit 170 will at every time instance be able to keep track of the validity of each one of the descriptions received from the decoders. The above design of separate signalling for each decoder with regard to setting output validity as non-corrupted is due to the fact that the I frames of the different descriptions are displaced in time. In comparison, in a design in which the I frames of the different descriptions are not displaced in time, a single signal for all descriptions suffices when the I frames are decoded.
  • By maximizing the number of descriptions that are available at any given time instance, the perceptual quality of the video is improved. Also, in the case that all descriptions are corrupted, the time until an update (zero state or I frame for any description) is received is minimized. It will be appreciated that the above described structure and operation made with reference to FIGS. 1-3 is applicable to any number of descriptions of the video sequence. Thus, even though FIGS. 1-3 relate to three descriptions, the corresponding disclosure is applicable also with regard to two or four, or any greater number, of utilized descriptions.
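  • The flag handling described above can be sketched as a small Python class. The class and method names are hypothetical, and the sketch assumes that every description starts out as invalid until its first I-frame has been decoded, which the text does not state explicitly.

```python
class ValidityTracker:
    """Per-decoder output validity flags kept by the up-sampling unit."""

    def __init__(self, n_decoders):
        # Assumption: nothing is usable before the first I-frame arrives.
        self.valid = [False] * n_decoders

    def on_frame(self, decoder, frame_type, received_ok):
        """Update the flag of `decoder` for one frame and return it.

        A lost, delayed or corrupted frame marks the description as corrupted.
        Only a correctly decoded I-frame marks it as non-corrupted again;
        a correctly received P-frame leaves the flag unchanged.
        """
        if not received_ok:
            self.valid[decoder] = False
        elif frame_type == "I":
            self.valid[decoder] = True
        return self.valid[decoder]

    def usable(self):
        """Indices of the descriptions that may be used for up-sampling."""
        return [i for i, ok in enumerate(self.valid) if ok]


tracker = ValidityTracker(3)
tracker.on_frame(0, "I", True)    # description 0 becomes usable
tracker.on_frame(0, "P", False)   # a lost P-frame corrupts it
tracker.on_frame(0, "P", True)    # later P-frames cannot repair it
```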
  • Referring to FIG. 4 another embodiment of the invention is described. The independence of the coding units is in this embodiment exploited by placing the I-frames in the multiple descriptions such that the expected distortion at the receiver end is minimized. The I frames of the different descriptions are placed based on calculations that utilize known transmission error probabilities, i.e. known network characteristics. FIG. 4 shows an example with two descriptions in which the probability of transmission error for the upper (in FIG. 4) description is assumed or known to be lower than the probability of transmission error for the lower (in FIG. 4) description. In this way the I-frames are interlaced such that the expected distortion at the receiver end is minimized. The sender can choose to use the information regarding different transmission error probabilities for the two transferred encoded descriptions to improve the performance, not only in comparison to placement of the I frames at the same time for both descriptions, but also in comparison to the placement of I frames described in the previous embodiment. The displacement of the decoding operations at the receiving end corresponds to the placement of the I-frames shown in FIG. 4.
  • With the assumption above that the probability of error for the upper description is lower than the probability of error for the lower description, it is advantageous to move the relative placement of the I frames of the encoded descriptions in accordance with what is shown in FIG. 4. With such a placement of the I-frames, the lower probability of error in the upper encoded description is recognized. The lower description can be seen as complementary, i.e., it is used to decrease the probability of error when the upper description is no longer reliable. Since the upper description has a lower probability of error, the I frame of the lower description can be moved to the right (to occur later in time) and the upper description is trusted with a greater number of P frames before the lower description is used to decrease the overall probability of error. For example, with the time from left to right in FIG. 4, the first P frame after the I frame of the lower description occurs at the same time as the fifth P frame of the upper description, thereby providing a decreased overall probability of error. The situation for the lower description is the opposite.
  • The optimal placement of the I-frames for descriptions one and two can, with given probabilities of error and expected distortion, be calculated as a minimization problem. The expected value of the total distortion is minimized with respect to the relative placement of the I-frames. In brief, the expression for the expected distortion is shown to be periodic, which is why it is sufficient to solve the minimization problem only for an interval between two I-frames in either description. Next, the expression for the expected distortion in this interval is differentiated with respect to the length of I-frame displacement, giving an extremum. Since the problem now lies in an interval, the minimum is found by evaluating the expected distortion at the extremum and at the boundaries of the interval. This will be described further in the following.
  • Let us assume that the network is modelled by two independent Gilbert channel models A and B, where state 0 denotes error free transmission and state 1 denotes erroneous transmission. The following table defines the properties that are assumed to be known about the two Gilbert channels. Also, the expected average distortions for different channel realizations are defined.
  • Variable   Meaning
    $p_X$      Probability that channel X is in state 1 if it previously was in state 0.
    $q_X$      Probability that channel X is in state 0 if it previously was in state 1.
    $D_0$      Distortion of the output video if both descriptions are received.
    $D_X$      Distortion of the output video if only description X is received.
    $D_T$      Distortion of the output video if no descriptions are received.

    Let us define the following variables to simplify notation.
    $$\pi_A = \frac{q_A}{p_A + q_A}, \qquad \pi_B = \frac{q_B}{p_B + q_B}, \qquad r_A = 1 - p_A, \qquad r_B = 1 - p_B$$
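  • As a small numerical illustration of these quantities, the following Python sketch computes π and r for one Gilbert channel. The function names are hypothetical. The interpretation used in the second function, namely that π·r^m is the probability that a description is still error free m frames after its last I-frame (the channel is in state 0 with its stationary probability π and then stays there for m frames), is a reading of how the terms are used in the distortion expression, not a statement taken from the text.

```python
def gilbert_parameters(p, q):
    """Return (pi, r) for a two-state Gilbert channel.

    p: probability of moving from state 0 (error free) to state 1 (erroneous)
    q: probability of moving from state 1 back to state 0
    pi = q / (p + q) is the stationary probability of state 0,
    r = 1 - p is the probability of remaining in state 0.
    """
    return q / (p + q), 1.0 - p


def prob_description_intact(pi, r, frames_since_i_frame):
    """Assumed reading: probability of an error-free run since the I-frame."""
    return pi * r ** frames_since_i_frame


pi_a, r_a = gilbert_parameters(p=0.1, q=0.4)
intact_after_two = prob_description_intact(pi_a, r_a, 2)
```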
  • The optimization problem is to minimize the expectation of the distortion $D$ over all frames $k \in \{-\kappa, \ldots, \kappa\}$ for the discrete displacement variable $\Delta \in \{0, \ldots, K-1\}$, where $K$ denotes the I-frame period length.
    $$\min_{\Delta \in \{0,\ldots,K-1\}} E[D] = \min_{\Delta \in \{0,\ldots,K-1\}} \frac{1}{2\kappa} \sum_{k=-\kappa}^{\kappa} \Big( D_0\, \pi_A r_A^{k \bmod K}\, \pi_B r_B^{(k-\Delta) \bmod K} + D_A\, \pi_A r_A^{k \bmod K} \big(1 - \pi_B r_B^{(k-\Delta) \bmod K}\big) + D_B\, \pi_B r_B^{(k-\Delta) \bmod K} \big(1 - \pi_A r_A^{k \bmod K}\big) + D_T \big(1 - \pi_A r_A^{k \bmod K}\big)\big(1 - \pi_B r_B^{(k-\Delta) \bmod K}\big) \Big),$$
  • where $a \bmod b$ denotes the modulo $b$ division of $a$. Let us approximate the distortion summation by the following integral, in which the frame number $k \in [-\kappa, \kappa]$ and the displacement variable $\Delta \in [0, K)$ are continuous.
  • min Δ [ 0 , K ) D = min Δ [ 0 , K ) 1 2 κ - κ κ ( D 0 π A r A mod k K π B r B mod k - Δ K + D A π A r A mod k K ( 1 - π B r B mod k - Δ K ) + D B π B r B mod k - Δ K ( 1 - π A r A mod k K ) + D T ( 1 - π A r A mod k K ) ( 1 - π B r B mod k - Δ K ) ) k = min Δ [ 0 , K ) 0 Δ ( D 0 π A r A k π B r B K + k - Δ + D A π A r A k ( 1 - π B r B K + k - Δ ) + D B π B r B K + k - Δ ( 1 - π A r A k ) + D T ( 1 - π A r A k ) ( 1 - π B r B K + k - Δ ) ) k + Δ K ( D 0 π A r A k π B r B k - Δ + D A π A r A k ( 1 - π B r B k - Δ ) + D B π B r B k - Δ ( 1 - π A r A k ) + D T ( 1 - π A r A k ) ( 1 - π B r B k - Δ ) ) k = min Δ [ 0 , K ) 0 K ( π A r A k ( D A - D T ) + D T ) k + π B r B K - Δ 0 Δ ( π A ( D 0 - D A - D B + D T ) r A k r B k + ( D B - D T ) r B k ) k + π B r B - Δ Δ K ( π A ( D 0 - D A - D B + D T ) r A k r B k + ( D B - D T ) r B k ) k = min Δ [ 0 , K ) 0 K ( π A r A k ( D A - D T ) + D T ) k + π B r B K - Δ 0 Δ ( π A D 1 r A k r B k + D 2 r B k ) k + π B r B - Δ Δ K ( π A D 1 r A k r B k + D 2 r B k ) k
  • where D1=D0−DA−DB+DT and D2=DB−DT.
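The discrete objective stated above can also be evaluated directly and minimized by exhaustive search, which is a useful cross-check on the continuous approximation. The following Python sketch is illustrative only: the function names, the parameter names, and the numeric values are assumptions rather than part of the disclosure, and the average is taken over a single I-frame period on the assumption that the summand repeats with period K.

```python
def expected_distortion(delta, K, pi_a, pi_b, r_a, r_b, d0, d_a, d_b, d_t):
    """Average the per-frame expected distortion over one I-frame period.

    delta is the displacement of description B's I-frames relative to A's.
    """
    total = 0.0
    for k in range(K):
        # Probability that each description is still error free at frame k.
        # For positive K, Python's % yields a value in [0, K), so the shifted
        # index for description B wraps around the period.
        ok_a = pi_a * r_a ** (k % K)
        ok_b = pi_b * r_b ** ((k - delta) % K)
        total += (d0 * ok_a * ok_b
                  + d_a * ok_a * (1.0 - ok_b)
                  + d_b * ok_b * (1.0 - ok_a)
                  + d_t * (1.0 - ok_a) * (1.0 - ok_b))
    return total / K


def best_displacement(K, **params):
    """Exhaustive search over the discrete displacements 0 .. K-1."""
    return min(range(K), key=lambda delta: expected_distortion(delta, K, **params))


# Hypothetical channel and distortion values, chosen only for illustration.
example = dict(pi_a=0.95, pi_b=0.95, r_a=0.9, r_b=0.9,
               d0=1.0, d_a=2.0, d_b=2.0, d_t=5.0)
print(best_displacement(10, **example))
```

For this symmetric toy setting (identical channels, D1 > 0) the search selects a displacement of half the period, i.e. the I-frames of the two descriptions are maximally staggered. The search costs O(K²) evaluations, which is negligible for typical I-frame periods.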
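  • Differentiate D with respect to Δ and set the derivative equal to zero to find an extremum.
  • $$\begin{aligned}
\frac{\partial D}{\partial \Delta}
&= \frac{\partial}{\partial \Delta} \Big\{ \pi_B r_B^{K-\Delta} \int_{0}^{\Delta} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk \Big\}
 + \frac{\partial}{\partial \Delta} \Big\{ \pi_B r_B^{-\Delta} \int_{\Delta}^{K} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk \Big\} \\
&= -\pi_B r_B^{K-\Delta} \ln(r_B) \int_{0}^{\Delta} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk
 + \pi_B r_B^{K-\Delta} \big( \pi_A D_1 r_A^{\Delta} r_B^{\Delta} + D_2 r_B^{\Delta} \big) \\
&\qquad - \pi_B r_B^{-\Delta} \ln(r_B) \int_{\Delta}^{K} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk
 - \pi_B r_B^{-\Delta} \big( \pi_A D_1 r_A^{\Delta} r_B^{\Delta} + D_2 r_B^{\Delta} \big) \\
&= \pi_B r_B^{-\Delta} \Big( \big( \pi_A D_1 r_A^{\Delta} r_B^{\Delta} + D_2 r_B^{\Delta} \big) \big( r_B^{K} - 1 \big)
 - \ln(r_B) \Big( r_B^{K} \int_{0}^{\Delta} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk + \int_{\Delta}^{K} \big( \pi_A D_1 r_A^{k} r_B^{k} + D_2 r_B^{k} \big)\,dk \Big) \Big) = 0
\end{aligned}$$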
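  • Using that $\int a^x\,dx = a^x / \ln(a) + C$ and the notation
  • $$R = \frac{\ln(r_B)}{\ln(r_A) + \ln(r_B)},$$
  • the terms in $D_2$ cancel and the extremum satisfies the following equation, where the second line follows on dividing by $\pi_A D_1$ (assumed nonzero).
  • $$\begin{aligned}
0 &= r_A^{\Delta} r_B^{\Delta} \big\{ \pi_A D_1 \big( r_B^{K} - 1 - R\,r_B^{K} + R \big) \big\} + D_1 \pi_A R \big( 1 - r_A^{K} \big) r_B^{K} \\
\Longrightarrow\quad 0 &= r_A^{\Delta} r_B^{\Delta} \big\{ r_B^{K} - 1 - R\,r_B^{K} + R \big\} + R \big( 1 - r_A^{K} \big) r_B^{K} = \gamma\, r_A^{\Delta} r_B^{\Delta} + \alpha,
\end{aligned}$$
  • where $\gamma \triangleq r_B^{K} - 1 - R\,r_B^{K} + R$ and $\alpha \triangleq R \big( 1 - r_A^{K} \big) r_B^{K}$.
  • Hence, the displacement is given by
  • $$\Delta = \frac{\ln(-\alpha/\gamma)}{\ln(r_A) + \ln(r_B)}$$
  • and, for a given I-frame period K, is dependent only on $r_A$ and $r_B$, i.e., the probabilities that the transmission in the channels will remain error free if the previous transmission was error free.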
  • Since the range of Δ is bounded, the minimum of D for Δ ∈ [0, K) is given by the minimum of D(Δ=0) and
  • $$D\left( \Delta = \frac{\ln(-\alpha/\gamma)}{\ln(r_A) + \ln(r_B)} \right).$$
  • The solution for the discrete problem that we started with is the displacement Δ that gives the minimum value of D(Δ=0),
  • $$D\left( \Delta = \left\lfloor \frac{\ln(-\alpha/\gamma)}{\ln(r_A) + \ln(r_B)} \right\rfloor \right), \qquad D\left( \Delta = \left\lceil \frac{\ln(-\alpha/\gamma)}{\ln(r_A) + \ln(r_B)} \right\rceil \right)$$
  • and D(Δ=K−1). The brackets ⌊·⌋ and ⌈·⌉ denote the floor and ceil operations, respectively.
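The closed-form displacement and the resulting candidate set for the discrete problem can be sketched in a few lines of Python. The sketch is a minimal illustration under stated assumptions: the function names are hypothetical, r_A and r_B are assumed to lie strictly between 0 and 1 so that all logarithms are defined, and clamping the rounded values to the range 0 .. K−1 is an added safeguard rather than something stated in the text.

```python
import math


def closed_form_displacement(r_a, r_b, K):
    """Continuous extremum: Delta = ln(-alpha/gamma) / (ln r_A + ln r_B)."""
    if not (0.0 < r_a < 1.0 and 0.0 < r_b < 1.0):
        raise ValueError("r_a and r_b must lie strictly between 0 and 1")
    log_sum = math.log(r_a) + math.log(r_b)
    R = math.log(r_b) / log_sum
    gamma = r_b ** K - 1.0 - R * r_b ** K + R
    alpha = R * (1.0 - r_a ** K) * r_b ** K
    return math.log(-alpha / gamma) / log_sum


def candidate_displacements(r_a, r_b, K):
    """Discrete candidates: 0, floor and ceil of the extremum, and K - 1."""
    delta = closed_form_displacement(r_a, r_b, K)

    def clamp(d):
        # Assumption: keep rounded candidates inside the valid range 0 .. K-1.
        return min(max(d, 0), K - 1)

    return sorted({0, clamp(math.floor(delta)), clamp(math.ceil(delta)), K - 1})


# Identical channels (hypothetical values): the extremum sits at half the period.
print(closed_form_displacement(0.9, 0.9, 10))
print(candidate_displacements(0.9, 0.9, 10))
```

The final displacement is then the candidate with the smallest expected distortion, so an encoder would evaluate D at each returned value and keep the minimizer. For identical channels the formula reduces to Δ = K/2, since R = 1/2 and −α/γ = r^K.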
  • It should be noted that the detailed description above of different embodiments of the invention has been given by way of illustration only and that these therefore are not intended to limit the scope of the invention, as it is defined by the appended claims. Furthermore, it will be appreciated that various alterations and modifications falling within the scope of the appended claims will become apparent to those skilled in the art when studying the claims and the detailed description.

Claims (20)

What is claimed is:
1. A method comprising:
encoding, by a processor of an encoder in response to instructions stored on a non-transitory computer readable medium, a plurality of input frames from an input video sequence, wherein the plurality of input frames includes a first input frame, wherein encoding the plurality of input frames includes:
generating a first plurality of encoded frames based on the plurality of input frames such that the first plurality of encoded frames includes a first encoded I-frame corresponding to the first input frame, and
generating a second plurality of encoded frames based on the plurality of input frames such that the second plurality of encoded frames includes a first encoded P-frame corresponding to the first input frame;
including the first plurality of encoded frames and the second plurality of encoded frames in an output; and
transmitting the output to a decoder.
2. The method of claim 1, wherein transmitting the output to the decoder includes transmitting the output to the decoder via a wireless electronic communication medium.
3. The method of claim 1, wherein encoding the plurality of input frames includes determining I-frame placement, wherein determining I-frame placement includes:
determining whether to generate the first plurality of encoded frames such that the first plurality of encoded frames includes the first encoded I-frame, and generate the second plurality of encoded frames such that the second plurality of encoded frames includes the first encoded P-frame.
4. The method of claim 3, wherein determining I-frame placement is based on minimizing expected distortion at the decoder.
5. The method of claim 4, wherein minimizing expected distortion at the decoder is based on an identified network characteristic for transmitting the output to the decoder.
6. The method of claim 5, wherein the identified network characteristic is an identified transmission error probability for transmitting the output to the decoder.
7. The method of claim 1, wherein the plurality of input frames includes a second input frame subsequent to the first input frame, and wherein encoding the plurality of input frames includes:
generating the first plurality of encoded frames such that the first plurality of encoded frames includes a second encoded P-frame corresponding to the second input frame; and
generating the second plurality of encoded frames such that the second plurality of encoded frames includes a second encoded I-frame corresponding to the first input frame.
8. The method of claim 7, wherein determining I-frame placement includes:
identifying the second input frame such that a temporal distance between the first input frame and the second input frame is an identified temporal distance, wherein the temporal distance indicates a cardinality of a set of frames from the plurality of input frames temporally between the first input frame and the second input frame in the input video sequence.
9. The method of claim 8, wherein determining I-frame placement includes:
determining, by the processor of the encoder, the identified temporal distance based on minimizing expected distortion at the decoder.
10. The method of claim 9, wherein minimizing expected distortion at the decoder is based on an identified network characteristic for transmitting the output to the decoder.
11. The method of claim 10, wherein the identified network characteristic is an identified transmission error probability for transmitting the output to the decoder.
12. The method of claim 1, wherein encoding the plurality of input frames includes:
concurrently generating the first encoded I-frame and the first encoded P-frame.
13. A method comprising:
encoding, by a processor of an encoder in response to instructions stored on a non-transitory computer readable medium, a plurality of input frames from an input video sequence, wherein the plurality of input frames includes a first input frame and a second input frame subsequent to the first input frame, wherein encoding the plurality of input frames includes:
generating a first plurality of encoded frames based on the plurality of input frames such that the first plurality of encoded frames includes a first encoded I-frame corresponding to the first input frame and a first encoded P-frame corresponding to the second input frame, and
generating a second plurality of encoded frames based on the plurality of input frames such that the second plurality of encoded frames includes a second encoded P-frame corresponding to the first input frame and a second encoded I-frame corresponding to the first input frame;
including the first plurality of encoded frames and the second plurality of encoded frames in an output; and
transmitting the output to a decoder.
14. The method of claim 13, wherein transmitting the output to the decoder includes transmitting the output to the decoder via a wireless electronic communication medium.
15. The method of claim 13, wherein encoding the plurality of input frames includes:
identifying the second input frame such that a temporal distance between the first input frame and the second input frame is an identified temporal distance, wherein the temporal distance indicates a cardinality of a set of frames from the plurality of input frames temporally between the first input frame and the second input frame in the input video sequence.
16. The method of claim 15, wherein encoding the plurality of input frames includes:
determining, by the processor of the encoder, the identified temporal distance based on minimizing expected distortion at the decoder.
17. The method of claim 16, wherein minimizing expected distortion at the decoder is based on an identified network characteristic for transmitting the output to the decoder.
18. The method of claim 17, wherein the identified network characteristic is an identified transmission error probability for transmitting the output to the decoder.
19. The method of claim 13, wherein encoding the plurality of input frames includes:
concurrently generating the first encoded I-frame and the second encoded P-frame; and
concurrently generating the second encoded I-frame and the first encoded P-frame.
20. A method comprising:
encoding, by a processor of an encoder in response to instructions stored on a non-transitory computer readable medium, a plurality of input frames from an input video sequence, wherein encoding the plurality of input frames includes:
for each input frame from the plurality of input frames:
in response to a determination to encode the input frame as a first I-frame and include the first I-frame in a first plurality of encoded frames:
encoding the input frame as the first I-frame,
including the first I-frame in the first plurality of encoded frames,
encoding the input frame as a first P-frame, wherein encoding the input frame as the first P-frame includes encoding the input frame as the first P-frame independently of and concurrently with encoding the input frame as the first I-frame, and
including the first P-frame in a second plurality of encoded frames; and
in response to a determination to encode the input frame as a second P-frame and include the second P-frame in the first plurality of encoded frames:
encoding the input frame as the second P-frame,
including the second P-frame in the first plurality of encoded frames,
in response to a determination to encode the input frame as a second I-frame and include the second I-frame in the second plurality of encoded frames:
 encoding the input frame as the second I-frame, wherein encoding the input frame as the second I-frame includes encoding the input frame as the second I-frame independently of and concurrently with encoding the input frame as the second P-frame; and
 including the second I-frame in the second plurality of encoded frames, and
in response to a determination to encode the input frame as a third P-frame and include the third P-frame in the second plurality of encoded frames:
 encoding the input frame as the third P-frame, wherein encoding the input frame as the third P-frame includes encoding the input frame as the third P-frame independently of and concurrently with encoding the input frame as the second P-frame; and
 including the third P-frame in the second plurality of encoded frames, and
including the first plurality of encoded frames and the second plurality of encoded frames in an output; and
transmitting the output to a decoder.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/834,624 US10291917B2 (en) 2007-02-01 2015-08-25 Independent temporally concurrent Video stream coding

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US89871807P 2007-02-01 2007-02-01
US12/068,025 US8073049B2 (en) 2007-02-01 2008-01-31 Method of coding a video signal
US13/281,087 US8582662B2 (en) 2007-02-01 2011-10-25 Method of coding a video signal
US14/077,304 US9137561B2 (en) 2007-02-01 2013-11-12 Independent temporally concurrent video stream coding
US14/834,624 US10291917B2 (en) 2007-02-01 2015-08-25 Independent temporally concurrent Video stream coding

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/077,304 Continuation US9137561B2 (en) 2007-02-01 2013-11-12 Independent temporally concurrent video stream coding

Publications (2)

Publication Number Publication Date
US20160065967A1 true US20160065967A1 (en) 2016-03-03
US10291917B2 US10291917B2 (en) 2019-05-14

Family

ID=39715875

Family Applications (4)

Application Number Title Priority Date Filing Date
US12/068,025 Expired - Fee Related US8073049B2 (en) 2007-02-01 2008-01-31 Method of coding a video signal
US13/281,087 Expired - Fee Related US8582662B2 (en) 2007-02-01 2011-10-25 Method of coding a video signal
US14/077,304 Expired - Fee Related US9137561B2 (en) 2007-02-01 2013-11-12 Independent temporally concurrent video stream coding
US14/834,624 Active 2030-02-26 US10291917B2 (en) 2007-02-01 2015-08-25 Independent temporally concurrent Video stream coding

Country Status (1)

Country Link
US (4) US8073049B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9191671B2 (en) 2012-04-19 2015-11-17 Vid Scale, Inc. System and method for error-resilient video coding
GB2505912B (en) * 2012-09-14 2015-10-07 Canon Kk Method and device for generating a description file, and corresponding streaming method
AU2019260722A1 (en) * 2018-04-26 2020-10-08 Phenix Real Time Solutions, Inc. Adaptive bit-rate methods for live broadcasting

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6357045B1 (en) * 1997-03-31 2002-03-12 Matsushita Electric Industrial Co., Ltd. Apparatus and method for generating a time-multiplexed channel surfing signal at television head-end sites
US20030012278A1 (en) * 2001-07-10 2003-01-16 Ashish Banerji System and methodology for video compression
US20030031251A1 (en) * 2001-06-29 2003-02-13 Shinichiro Koto Video encoding method and apparatus
US20060114995A1 (en) * 2004-12-01 2006-06-01 Joshua Robey Method and system for high speed video encoding using parallel encoders
US20060245735A1 (en) * 2005-04-28 2006-11-02 Masakazu Kanda Image recording device and method for driving image recording device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19531847A1 (en) 1995-08-29 1997-03-06 Sel Alcatel Ag Device for storing video image data
JP3976942B2 (en) * 1998-12-18 2007-09-19 キヤノン株式会社 Image processing apparatus and method, and computer-readable recording medium on which an image processing program is recorded
US20030185455A1 (en) * 1999-02-04 2003-10-02 Goertzen Kenbe D. Digital image processor
US7072393B2 (en) * 2001-06-25 2006-07-04 International Business Machines Corporation Multiple parallel encoders and statistical analysis thereof for encoding a video sequence
US20030007515A1 (en) * 2001-07-03 2003-01-09 Apostolopoulos John G. System and method for receiving mutiple description media streams in fixed and mobile streaming media systems
US6804301B2 (en) * 2001-08-15 2004-10-12 General Instrument Corporation First pass encoding of I and P-frame complexity for compressed digital video
DE10149544B4 (en) * 2001-10-08 2004-11-11 Rohde & Schwarz Gmbh & Co. Kg Method for determining the time offset of a CDMA signal and computer program for carrying out the method
MXPA04010318A (en) * 2002-04-23 2005-02-03 Nokia Corp Method and device for indicating quantizer parameters in a video coding system.
US7379496B2 (en) * 2002-09-04 2008-05-27 Microsoft Corporation Multi-resolution video coding and decoding
US6788225B2 (en) 2002-12-19 2004-09-07 Sony Corporation System and method for intraframe timing in multiplexed channel
CN100387043C (en) 2003-01-28 2008-05-07 汤姆森特许公司 Robust mode staggercasting
EP1578131A1 (en) 2004-03-18 2005-09-21 STMicroelectronics S.r.l. Encoding/decoding methods and systems, computer program products therefor
US7668712B2 (en) * 2004-03-31 2010-02-23 Microsoft Corporation Audio encoding and decoding with intra frames and adaptive forward error correction
WO2005099274A1 (en) * 2004-04-08 2005-10-20 Koninklijke Philips Electronics N.V. Coding method applied to multimedia data
EP1615441A1 (en) 2004-07-06 2006-01-11 STMicroelectronics S.r.l. Multiple description coding combined with channel encoding
JP2006093843A (en) 2004-09-21 2006-04-06 Denon Ltd Video recording and reproducing apparatus
US8139642B2 (en) * 2005-08-29 2012-03-20 Stmicroelectronics S.R.L. Method for encoding signals, related systems and program product therefor
US7903743B2 (en) * 2005-10-26 2011-03-08 Mediatek Inc. Memory sharing in video transcoding and displaying
US8340183B2 (en) * 2007-05-04 2012-12-25 Qualcomm Incorporated Digital multimedia channel switching

Also Published As

Publication number Publication date
US20120039392A1 (en) 2012-02-16
US8582662B2 (en) 2013-11-12
US10291917B2 (en) 2019-05-14
US9137561B2 (en) 2015-09-15
US20140079123A1 (en) 2014-03-20
US8073049B2 (en) 2011-12-06
US20080205520A1 (en) 2008-08-28

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GLOBAL IP SOLUTIONS (GIPS) AB;GLOBAL IP SOLUTIONS, INC.;SIGNING DATES FROM 20110818 TO 20110819;REEL/FRAME:036601/0230

Owner name: GLOBAL IP SOLUTIONS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOZICA, ERMIN;ZACHARIAH, DAVE;KLEIJN, WILLEM BASTIAAN;SIGNING DATES FROM 20080308 TO 20080319;REEL/FRAME:036601/0126

Owner name: GLOBAL IP SOLUTIONS (GIPS) AB, SWEDEN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOZICA, ERMIN;ZACHARIAH, DAVE;KLEIJN, WILLEM BASTIAAN;SIGNING DATES FROM 20080308 TO 20080319;REEL/FRAME:036601/0126

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044129/0001

Effective date: 20170929

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4