EP1719343A1 - Envoi d'informations video - Google Patents

Envoi d'informations video

Info

Publication number
EP1719343A1
Authority
EP
European Patent Office
Prior art keywords
macroblocks
group
frame
switching
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04713602A
Other languages
German (de)
English (en)
Other versions
EP1719343A4 (fr)
Inventor
Ru-Shang Wang
Ragip Kurceren
Viktor Varsa
Keith Miller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Oyj
Original Assignee
Nokia Oyj
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Oyj
Publication of EP1719343A1
Publication of EP1719343A4

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/127 Prioritisation of hardware or computational resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/156 Availability of hardware or computational resources, e.g. encoding based on power-saving criteria

Definitions

  • the present invention relates to a method for transmitting video information, in which at least one bitstream is formed from the video information comprising a set of frames.
  • the invention also relates to an encoder, a decoder, a transmission system, a signal, and a computer program product.
  • a typical video stream comprises a sequence of pictures, often referred to as frames.
  • the frames comprise pixels arranged into a rectangular form.
  • Intra frames (I-frames)
  • Predictive frames (P-frames)
  • Bidirectional frames (B-frames)
  • Each picture type exploits a different type of redundancy in a sequence of images and consequently results in a different level of compression efficiency and, as explained in the following, provides different functionality within the encoded video sequence.
  • An intra frame is a frame of video data that is coded by exploiting only the spatial correlation of the pixels within the frame itself without using any information from the past or the future frames. Intra frames are used as the basis for decoding/decompression of other frames and provide access points to the coded sequence where decoding can begin.
  • a predictive frame is a frame that is encoded/compressed using motion compensated prediction from a so-called reference frame, i.e. one or more previous/subsequent Intra frames or Predictive frames available in an encoder or in a decoder.
  • a bi-directional frame is a frame that is encoded/compressed by prediction from a previous Intra frame or Predictive frame and/or a subsequent Intra frame or Predictive frame. Since adjacent frames in a typical video sequence are highly correlated, higher compression can be achieved when using Bidirectional or Predictive frames instead of Intra frames.
  • Figs. 1A-1C illustrate the types of encoded/compressed video frames used in a typical video encoding/decoding system.
  • the pictures of the video sequence are represented by three matrices of multiple-bit numbers, one representing the luminance (brightness) of the image pixels, and the other two each representing a respective one of two chrominance (color) components.
  • Fig. 1A depicts the way in which an Intra frame 200 is encoded using only image information present in the frame itself.
  • Fig. 1B illustrates construction of a Predictive frame 210.
  • Arrow 205a represents the use of motion compensated prediction to create the P-frame 210.
  • Fig. 1C depicts construction of Bi-directional frames 220.
  • B-frames are usually inserted between I-frames or P-frames.
  • Fig. 2 represents a group of pictures in display order and illustrates how B-frames are inserted between I- and P-frames, as well as showing the direction in which motion compensation information flows.
  • arrows 205a depict forward motion compensation prediction information necessary to reconstruct P-frames 210
  • arrows 215a and 215b depict motion compensation information used in reconstructing B-frames 220 in forward direction (215a) and backward direction (215b).
  • the arrows 205a and 215a show the flow of information when predictive frames are predicted from frames that are earlier in display order than the frame being reconstructed
  • arrows 215b show the flow of information when predictive frames are predicted from frames that are later in display order than the frame being reconstructed.
  • motion vectors are used to describe the way in which pixels or regions of pixels move between successive frames of the sequence.
  • the motion vectors provide offset values and error data that refer to a past or a future frame of video data having decoded pixel values that may be used with the error data to compress/encode or decompress/decode a given frame of video data.
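
To make the role of the motion vectors and error data concrete, the following sketch reconstructs one block of a predictive frame from a reference frame, a motion-vector offset and a residual. It is an illustration only: the block size, the toy frame content and the clipping to an 8-bit pixel range are assumptions, not details taken from the patent.

```python
# Minimal sketch of motion-compensated block reconstruction (illustrative only).
# A block of a P-frame is rebuilt by copying the reference-frame region pointed
# to by the motion vector and adding the transmitted residual (error data).

def reconstruct_block(reference, block_y, block_x, motion_vector, residual, block_size=8):
    """reference: 2-D list of pixel values from an already decoded frame.
    (block_y, block_x): top-left corner of the block being reconstructed.
    motion_vector: (dy, dx) offset into the reference frame.
    residual: 2-D list of prediction-error values of the same block size."""
    dy, dx = motion_vector
    block = []
    for r in range(block_size):
        row = []
        for c in range(block_size):
            predicted = reference[block_y + dy + r][block_x + dx + c]
            # Add the prediction error and clip to the 8-bit pixel range.
            row.append(max(0, min(255, predicted + residual[r][c])))
        block.append(row)
    return block

if __name__ == "__main__":
    ref = [[(r + c) % 256 for c in range(16)] for r in range(16)]   # toy reference frame
    res = [[1 for _ in range(8)] for _ in range(8)]                  # toy residual
    out = reconstruct_block(ref, 4, 4, (-2, 3), res)
    print(out[0][:4])
```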
  • the decoding order differs from the display order because the B- frames require future I- or P-frames for their decoding.
  • Fig. 2 displays the beginning of the above frame sequence and can be referred to in order to understand the dependencies of the frames, as described earlier.
  • P-frames require that the previous I- or P-reference frame be available. For example, P4 requires that the preceding I- or P-frame be decoded before P4 itself can be decoded.
  • frame P6 requires that P4 be available in order to decode/decompress frame P6.
  • B-frames, such as frame B3, require a past and/or a future I- or P-reference frame, such as P4 and the preceding I-frame, in order to be decoded.
  • B-frames are frames inserted between I- or P-frames during encoding.
  • Video streaming has emerged as an important application in the fixed Internet. It is further anticipated that video streaming will also be important in the future of 3G wireless networks.
  • the transmitting server starts transmitting a pre-encoded video bit stream via a transmission network to a receiver upon a request from the receiver. The receiver plays the video stream back while receiving it.
  • the best-effort nature of present networks causes variations in the effective bandwidth available to a user due to the changing network conditions.
  • the transmitting server can scale the bit rate of the compressed video. In the case of a conversational service characterized by real-time encoding and point-to-point delivery, this can be achieved by adjusting the source encoding parameters on the fly.
  • Such adjustable parameters can be, for example, a quantisation parameter or a frame rate. The adjustment is advantageously based on feedback from the transmission network. In typical streaming scenarios, where a previously encoded video bit stream is to be transmitted to the receiver, the above solution cannot be applied.
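
As a rough illustration of the feedback-driven adjustment available in the conversational (real-time encoding) case, the sketch below adapts a quantisation parameter to the bandwidth reported by the network. The thresholds, the QP limits and the function name are assumptions made for this example, not values from the patent.

```python
# Illustrative sketch of adjusting a quantisation parameter (QP) from network
# feedback in a real-time (conversational) encoding scenario. The thresholds
# and QP limits are arbitrary assumptions, not values from the patent.

def adjust_qp(current_qp, encoded_bitrate_kbps, available_bandwidth_kbps,
              qp_min=10, qp_max=51):
    """Raise QP (coarser quantisation, fewer bits) when the encoded bit rate
    exceeds the bandwidth reported by the network, lower it when there is
    clear headroom, otherwise leave it unchanged."""
    if encoded_bitrate_kbps > available_bandwidth_kbps:
        current_qp += 1
    elif encoded_bitrate_kbps < 0.8 * available_bandwidth_kbps:
        current_qp -= 1
    return max(qp_min, min(qp_max, current_qp))

if __name__ == "__main__":
    qp = 30
    for bitrate, bandwidth in [(400, 350), (380, 350), (300, 500)]:
        qp = adjust_qp(qp, bitrate, bandwidth)
        print(f"bitrate={bitrate} kbps, bandwidth={bandwidth} kbps -> QP={qp}")
```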
  • One solution to achieve bandwidth scalability in case of pre-encoded sequences is to produce multiple and independent streams having different bit-rates and quality.
  • the transmitting server then dynamically switches between the streams to accommodate variations in the available bandwidth.
  • the following example illustrates this principle. Let us assume that multiple bit streams are generated independently with different encoding parameters, such as the quantisation parameter, corresponding to the same video sequence. Let {P1,n-1, P1,n, P1,n+1} and {P2,n-1, P2,n, P2,n+1} denote the sequences of decoded frames from bit streams 1 and 2, respectively.
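
The multi-stream principle can be pictured with the sketch below, in which a server picks the highest-rate pre-encoded version of the sequence that fits the currently available bandwidth. The bit rates and the selection rule are illustrative assumptions.

```python
# Sketch of a streaming server choosing between independently pre-encoded
# versions of the same sequence (bit streams 1 and 2 in the example above)
# according to the currently available bandwidth. Bit rates are illustrative.

STREAMS = {                      # hypothetical pre-encoded versions
    "stream_1": 600,             # kbps
    "stream_2": 250,             # kbps
}

def pick_stream(available_bandwidth_kbps):
    """Return the highest-rate pre-encoded stream that fits the reported bandwidth,
    falling back to the lowest-rate stream if none fits."""
    fitting = [(rate, name) for name, rate in STREAMS.items()
               if rate <= available_bandwidth_kbps]
    if fitting:
        return max(fitting)[1]
    return min((rate, name) for name, rate in STREAMS.items())[1]

if __name__ == "__main__":
    for bw in (700, 400, 100):
        print(bw, "kbps ->", pick_stream(bw))
```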
  • a video streaming/delivery system inevitably suffers from video quality degradation due to transmission errors.
  • the transmission errors can be roughly classified into random bit errors and erasure errors (packet loss).
  • Many error control and concealment techniques try to avoid this problem by forward error concealment, post-processing and interactive error concealment.
  • the predictive video coding mechanism has a low tolerance to packet loss: the error caused by a missing block will propagate and thus create objectionable visual distortion.
  • intra macroblock insertion, which is based on forward error concealment, can stop the error propagation by introducing a self-contained intra macroblock and concealing the erroneous block.
  • the problem with the introduced intra macroblock is that the coding of such a macroblock increases the amount of information in the bit stream, thus reducing coding efficiency, and that it is not scalable.
  • An Adaptive Intra Refresh (AIR) system is described in the MPEG-4 standard (Worral, "Motion Adaptive Intra Refresh for MPEG-4", Electronics Letters, November 2000). Worral describes inserting intra macroblocks at later and later positions in succeeding frames as part of a motion-adaptive scheme. Deciding when to insert the macroblocks (when bandwidth is available for that frame) is shown to benefit from identifying image areas with high motion. Worral notes that this approach is backward-compatible with the standard (it does not require a standard change).
  • the encoder moves down the frame encoding intra macroblocks until the preset number of macroblocks has been encoded. For the next frame the encoder starts in the same position and again encodes intra macroblocks.
  • the purpose of the insertion of intra macroblocks is to limit and stop the propagation of artefacts caused by an erroneous macroblock.
  • Another alternative is the Random Intra Refresh (RIR) used in the JM61e H.264 reference software where intra macroblocks are randomly inserted.
  • RIR Random Intra Refresh
  • the coding efficiency is fixed for systems based on the Adaptive Intra Refresh or the Random Intra Refresh.
  • the packet loss rate varies from time to time, and schemes such as AIR cannot adapt to the packet loss rate to optimise performance.
  • the error protection of AIR is non-scalable. In good connection conditions the quality is not optimized due to the inserted intra blocks.
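
For comparison, the sketch below shows two simple ways of selecting which macroblocks of a frame are forced to intra coding: a cyclic pattern in the spirit of the AIR-style scheme discussed above, and a random selection in the spirit of RIR. The frame geometry and the number of refreshed macroblocks per frame are assumptions made for illustration.

```python
import random

# Sketch of two ways of choosing which macroblocks of a frame are forced to be
# intra coded for error resilience. The cyclic variant is a simplified reading
# of the AIR-style scheme above; the random variant follows the RIR idea used
# in the JM reference software. Counts and frame geometry are assumptions.

TOTAL_MACROBLOCKS = 99          # e.g. a QCIF frame: 9 rows of 11 macroblocks

def cyclic_refresh(frame_index, macroblocks_per_frame=11):
    """Force a consecutive run of macroblocks to intra, moving further down the
    frame in each succeeding frame, so the whole picture is eventually refreshed."""
    start = (frame_index * macroblocks_per_frame) % TOTAL_MACROBLOCKS
    return [(start + i) % TOTAL_MACROBLOCKS for i in range(macroblocks_per_frame)]

def random_refresh(macroblocks_per_frame=11, rng=random):
    """Force a random selection of macroblocks to intra in every frame."""
    return sorted(rng.sample(range(TOTAL_MACROBLOCKS), macroblocks_per_frame))

if __name__ == "__main__":
    print("cyclic, frame 0:", cyclic_refresh(0))
    print("cyclic, frame 1:", cyclic_refresh(1))
    print("random:", random_refresh())
```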
  • It is important for a video streaming server to be able to adapt to different connection conditions and different network types, such as wired and wireless networks.
  • A bitstream switching scheme, in which multiple bitstreams are used, provides a low-complexity way for a server to adapt to varying connection conditions without re-encoding the video content, which would require high computational power.
  • switching from one bitstream to another produces a pixel drift problem if the switching takes place at a predicted frame: since the reference frame is taken from another bitstream, the mismatch propagates and degrades the video quality.
  • the problem with bitstream switching is that the switching point must be an intra frame (key frame), otherwise a pixel mismatch which degrades the video quality will occur until the next intra frame. During a video streaming session it is desirable that the switching can take place at any frame. However, it is not easy to implement such a system without a significant reduction in coding efficiency.
  • Regular intra frames can be used to provide switching points, but the more frequent the intra frames, the more bits are required, which lowers the video quality.
  • One scheme provides an extra bitstream consisting entirely of intra frames at a certain period of, say, one second; during switching an intra frame from this bitstream is used, which minimises the prediction error.
  • Another simple technique is just to switch at any frame, which in general suffers quite significantly from pixel drift.
  • a correct (mismatch-free) switching between video streams can be enabled by forming a special type of a compressed video frame and inserting frames of the special type into video bit-streams at locations where switching from one bit-stream to another is to be allowed.
  • the patent application WO02054776 describes switching frames which are used for enabling the system to perform the switching from one bit stream to another without the need to insert Intra frames into the bit stream for switching locations.
  • the special type of compressed video frame will be referred to generally as an S-frame (Switching).
  • S-frames may be classified as SP-frames (Switching Predictive), which are formed at the decoder using motion compensated prediction from already decoded frames using motion vector information, and SI-frames, which are formed at the decoder using spatial (intra) prediction from already decoded neighbouring pixels within a frame being decoded.
  • SP-frames Switching Predictive
  • an S-frame is formed on a block-by-block basis and may comprise both inter-coded (SP) blocks as well as intra-coded (SI) blocks (Switching Intra).
  • the special type of frame allows switching between bit streams to occur not only at the locations of I-frames but also at the locations of SP-frames.
  • the coding efficiency of an SP-frame is much better than the coding efficiency of a typical I-frame, so less bandwidth is needed to transmit bit streams that have SP-frames in locations where I-frames would otherwise be used.
  • the switching of one bit stream into another can be performed at locations in which an SP-frame is placed in the encoded bit stream.
  • the invention is based on the idea that some of the macroblocks of SP-frames are replaced with Intra macroblocks or SI-macroblocks. This procedure is repeated for successive frames so that, after a certain number of successive SP-frames have been transmitted and decoded, substantially all macroblocks of the frame area (image) have been replaced with intra macroblocks. This means that substantially the whole image area is refreshed by the Intra macroblocks or SI-macroblocks.
  • the replacement procedure proceeds slice-by-slice until a sufficient number of frames have been modified.
  • the replacement order can be different in different implementations. It is also possible to apply the invention so that the replacement order is not fixed but it is variable. Further, it is also possible that the number of replaced macroblocks need not be more than one, i.e. in some situations one macroblock is replaced by another macroblock. For example, this kind of replacement may be used in a situation in which a slice contains only one macroblock and that macroblock is replaced by another type of macroblock.
  • SIR Systematic Intra Refresh
  • a method for transmitting video information in which at least one bitstream is formed from the video information comprising a set of frames, the frames comprising macroblocks, wherein the method comprises: forming at least one switching frame into said bitstream; arranging macroblocks of said switching frame into a first and a second group of macroblocks; encoding each macroblock of said first group of macroblocks by a first encoding method to provide a switching point for continuing the transmission of video information with another bitstream formed from the video information; and encoding macroblocks of said second group of macroblocks by another encoding method.
  • an encoder for encoding video information into at least one bitstream, the video information comprising a set of frames comprising macroblocks
  • the encoder comprising: means for forming at least one switching frame into said bitstream; grouping means for arranging macroblocks of said switching frame into a first and a second group of macroblocks; first encoding means for encoding each macroblock of said first group of macroblocks by a first encoding method to provide a switching point for continuing the transmission of video information with another bitstream formed from the video information; and second encoding means for encoding macroblocks of said second group of macroblocks by another encoding method.
  • a transmission system for transmitting video information comprising an encoder for encoding video information into at least one bitstream, a transmitter for transmitting the bit stream to a receiver, and a decoder for decoding the bitstream transmitted to the receiver, the video information comprising a set of frames comprising macroblocks
  • the encoder comprising: means for forming at least one switching frame into said bitstream; grouping means for arranging macroblocks of said switching frame into a first and a second group of macroblocks; first encoding means for encoding each macroblock of said first group of macroblocks by a first encoding method to provide a switching point for continuing the transmission of video information with another bitstream formed from the video information; and second encoding means for encoding macroblocks of said second group of macroblocks by another encoding method; the decoder comprising first decoding means for decoding each macroblock of said first group of macroblocks by a first decoding method corresponding to the first encoding method, and second decoding means for decoding macroblocks of said second group of macroblocks by a decoding method corresponding to said another encoding method.
  • a computer program product comprising machine executable steps for transmitting video information, in which at least one bitstream is formed from the video information comprising a set of frames, the frames comprising macroblocks
  • the computer program product further comprises machine executable steps for: forming at least one switching frame into said bitstream; arranging macroblocks of said switching frame into a first and a second group of macroblocks; encoding each macroblock of said first group of macroblocks by a first encoding method to provide a switching point for continuing the transmission of video information with another bitstream formed from the video information; and encoding macroblocks of said second group of macroblocks by another encoding method.
  • a method for reducing effects of transmission errors in transmission of video information in which at least one bitstream is formed from the video information comprising a set of frames, the frames comprising macroblocks, wherein the method comprises: forming at least one SP-encoded frame into said bitstream by predictively encoding the macroblocks of the frame; replacing part of the SP-encoded macroblocks with macroblocks encoded by an intra encoding method; and transmitting the encoded frame containing both predictively encoded macroblocks and intra encoded macroblocks instead of said SP-encoded frame.
  • a computer program product comprising machine executable steps for reducing effects of transmission errors in transmission of video information, in which at least one bitstream is formed from the video information comprising a set of frames, the frames comprising macroblocks, wherein the computer program product further comprises machine executable steps for: forming at least one SP-encoded frame into said bitstream by predictively encoding the macroblocks of the frame; replacing part of the SP-encoded macroblocks with macroblocks encoded by an intra encoding method; and transmitting the encoded frame containing both predictively encoded macroblocks and intra encoded macroblocks instead of said SP- encoded frame.
  • a signal for transmitting video information in which at least one bitstream is formed from the video information comprising a set of frames, the frames comprising macroblocks
  • the signal comprises: at least one switching frame; macroblocks of said switching frame being arranged into a first and a second group of macroblocks; each macroblock of said first group of macroblocks being encoded by a first encoding method to provide a switching point for continuing the transmission of video information with another bitstream formed from the video information; and macroblocks of said second group of macroblocks being encoded by another encoding method.
  • the coding efficiency of the method according to the invention is typically better than with the prior art AIR scheme, because the coding efficiency of an SP macroblock is typically better than that of an intra macroblock. It has also been measured that with the method according to the invention, recovery from packet loss is typically faster than with AIR. The method according to the invention can also be used for bitstream switching, while AIR is not very well suited for this purpose.
  • each intra frame is large in size, while an SP-frame with one SI slice is smaller; during switching an intra frame therefore requires an increase in the transmission rate, whereas SI slices spread the bandwidth requirement more evenly.
  • the invention can provide scalable error protection for the bitstream, which typically improves the quality of the video during transmission under any packet loss conditions. The invention also provides means for bitstream switching at any frame with little pixel drift.
  • FIGs. 1A-1C and 2 are diagrams showing the prior art encoding/compression of video frames
  • Fig. 3a is an illustration showing examples of frames encoded using a method according to the invention
  • Fig. 3b is an illustration showing example of a sequence of frames comprising frames encoded using a method according to the invention
  • Fig. 4 is an illustration showing another example of a sequence of frames encoded using a method according to the invention.
  • Fig. 5 is an illustration showing switching between two different bit streams using SP/SI-frames according to the invention
  • Fig. 6 is a block diagram of an encoder in accordance with an example embodiment of the invention.
  • Fig. 7 is a block diagram of a decoder in accordance with an example embodiment of the invention.
  • Fig. 8 is a block diagram of a system in accordance with an example embodiment of the invention.

Detailed Description of the Invention

  • bit streams are formed from a video signal of a video source 2.
  • the video signal can be any digital video signal comprising multiple images, i.e. an image sequence. If multiple bit streams are formed, each of them is encoded from the same video signal using at least partly different encoding parameters.
  • the bit rate can be altered by selecting the encoding parameters differently, and in this way bit streams with different bit rates can be formed.
  • the encoding parameters can be, for example, frame rate, quantisation parameter, spatial resolution, or another factor affecting the image size.
  • the encoder 3 also inserts at least one Intra frame 10 into each bit stream. Typically, at least the first frame of each bit stream is an Intra frame. This enables the decoder 8 to start reconstruction of the video signal.
  • the encoder 3 encodes the I-frames, P-frames, B-frames, SP-frames and SI-frames from the video signal.
  • the encoder 3 also inserts frames encoded using motion compensated predictive coding (P-frames and optionally B-frames) into the bit streams.
  • the encoder also inserts SP-frames 11 — 19 into each bit stream at locations where switching between different bit streams will be allowed.
  • the SP-frames may be used at locations where in prior art methods an Intra coded frame would be inserted, or the SP-frames may be used in addition to using Intra coded frames in the video sequence.
  • the different bit streams are, for example, transmitted by the transmitter 4 to a streaming server 5. In the streaming server 5 the bit streams can be stored into memory 6 for later use.
  • transmission to the receiver 7 may take place substantially immediately after encoding wherein it is not necessary to store complete video sequences, but storing the necessary reference frames suffices.
  • Transmission of the encoded video stream may be performed e.g. by a streaming server 5.
  • the transmitting server 5 can also have means for transmitting the bit stream to the transmission network (not shown) and/or directly to the receiver 7.
  • QCIF images are used as an example of encoded images.
  • the size of a QCIF image is 176 by 144 pixels, arranged into 9 rows of 11 macroblocks.
  • the rows can also be called slices or groups of macroblocks (GOB).
  • Each macroblock consists of 16x16 pixels in this example.
  • the frames 11 — 19 of Fig. 3a can be formed in the encoder 3 in the following way.
  • the encoder 3 encodes the macroblocks of the first slice 11.1 of the first P-frame 11 by using intra encoding wherein the first slice 11.1 contains intra blocks (in this case 11 macroblocks out of 99 macroblocks).
  • the encoder 3 encodes the macroblocks of the other slices 11.2 — 11.9 by using some predictive coding to form predicted blocks such as P- or B-blocks.
  • the second frame 12 is encoded so that another slice, for example the second slice 12.2, is intra encoded and the other slices are encoded by the predictive encoding method.
  • the third frame 13 is encoded so that yet another slice of the third frame 13 (i.e. not the first slice 13.1 and not the second slice 13.2) is intra encoded and all the other slices are encoded by the predictive encoding method.
  • the procedure is repeated until substantially all the slices of the image are intra encoded at least once. In the QCIF image example this requires 9 repetitions, i.e. 9 frames 11 — 19 are formed, in each of which one slice is intra encoded and the other slices are encoded by the predictive encoding method. By doing so, the whole image can be refreshed within 9 frames for a QCIF image.
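
The slice-by-slice refresh just described can be sketched as a simple schedule for the QCIF example. The optional cycle offset models the behaviour described further below, where the next set of modified frames starts the refresh one slice lower; the function and variable names are illustrative.

```python
# Sketch of the slice-by-slice refresh schedule described above for a QCIF
# image: 176x144 pixels, 16x16 macroblocks, i.e. 9 slices of 11 macroblocks.
# One slice per frame is intra (or SI) coded, so the whole picture is refreshed
# after 9 frames. Function and variable names are illustrative.

FRAME_WIDTH, FRAME_HEIGHT, MB_SIZE = 176, 144, 16
SLICES_PER_FRAME = FRAME_HEIGHT // MB_SIZE          # 9
MACROBLOCKS_PER_SLICE = FRAME_WIDTH // MB_SIZE      # 11

def refreshed_slice(frame_index, cycle_offset=0):
    """Return the index of the slice whose macroblocks are intra/SI coded in the
    given frame. cycle_offset models the behaviour described later, where the
    second refresh cycle starts one slice lower than the first."""
    return (frame_index + cycle_offset) % SLICES_PER_FRAME

def encode_frame(frame_index, cycle_offset=0):
    """Label each slice of the frame with the coding type used for its macroblocks."""
    target = refreshed_slice(frame_index, cycle_offset)
    return ["intra/SI" if s == target else "predicted/SP"
            for s in range(SLICES_PER_FRAME)]

if __name__ == "__main__":
    print(SLICES_PER_FRAME, "slices of", MACROBLOCKS_PER_SLICE, "macroblocks")
    for f in range(3):
        print("frame", f, encode_frame(f))
```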
  • in the method according to the present invention only a minor part of the frames is intra encoded, requiring a higher bit rate, while the majority of the frames is predictively encoded. In practice this means that the invention does not significantly increase the size of the encoded frames in the bit stream, and error recovery can still be performed faster than with prior art methods.
  • the encoder 3 also encodes at least one intra frame 10 and inserts it into the bit stream so that the bit stream can be decoded and the images can be reconstructed at the receiving end.
  • the encoder 3 can further add P-frames, B-frames, SP-frames and SI-frames to the bit stream as in prior art systems.
  • Fig. 3b illustrates an example of a sequence of encoded frames containing frames which are encoded according to the present invention.
  • the sequence contains one or more Intra frames 10 after which there are a number of predicted frames 11 — 19 which have been encoded so that all the macroblocks of one slice of the frames are Intra encoded macroblocks.
  • the Intra frames 10 can be used as switching points, for example, to change the bit rate, to provide a proper place for a scene change, etc.
  • the modification can be performed, for example, if the network, the streaming server 5, the decoder 8 or some other element of the system notices that possibly one or more transmitted packets are lost or corrupted so that the decoder 8 can not properly decode the bit stream.
  • the element that notices the error informs, for example, the streaming server 5, which then begins to transmit the modified predicted frames 11 — 19 containing slices of Intra encoded macroblocks. If such frames are not present in the memory 6 (for example, if the encoder 3 has not encoded such frames), the streaming server 5 informs the encoder 3 and asks it to modify the predicted frames according to the invention. When all the slices have been refreshed, i.e. each slice has been Intra encoded at least once, substantially the whole image area has been recovered.
  • the order in which the slices of the frames are Intra encoded is not necessarily from top (the first slice) to bottom (the last slice) of the frame as described above, but it can also be different from that. In some implementations the order can even be random or virtually random, for example following an arbitrary shape using the Flexible Macroblock Ordering (FMO) described in the H.264 standard. The order can also vary during the encoding process.
  • FMO Flexible Macroblock Ordering
  • the order is from top to bottom
  • the order is such that in the first frame of the second set of modified frames the second slice contains Intra encoded macroblocks, in the second frame the third slice is Intra encoded, and so on to the frame before the last frame of the second set of modified frames in which the last slice is Intra encoded, and in the last frame of this second set of modified frames the first slice is Intra encoded.
  • the invention can also be implemented in connection with switching from one bit stream into another.
  • the invention also enables the transmission system to adjust the intra refresh rate adaptively.
  • SP-pictures and SI-pictures according to the H.264 standard are specially encoded frames that can be perfectly reconstructed from other SP or SI frames. This property enables the invention to adjust the intra refresh rate adaptively.
  • this invention uses the systematic intra refresh scheme described above. With reference to Figures 4 and 5, two bitstreams 410, 420 are encoded: one encoded with SP slices throughout the whole sequence and the other encoded with SI slices, which are exact replicas of all the SP slices. In the example situation mentioned above where QCIF images are used, one QCIF image contains 176x144 pixels arranged into macroblocks of 16x16 pixels.
  • the QCIF image comprises 9 slices and only one or some of them is/are encoded with SP/SI macroblocks according to the invention.
  • an intra macroblock, including an SI macroblock, requires more bits to encode than a predicted macroblock, including an SP macroblock.
  • the SP encoded slices are therefore much smaller in size. Since every SP macroblock can be replaced by an SP or SI macroblock without causing any pixel drift problem, the bitstream encoded with SP slices can be used during the streaming session to stream to the client (receiver 7), and when the streaming server 5 detects packet loss, SI slices can replace the SP slices to conceal the error. Normally any damage to the image can be recovered by the SI slices after 9 frames in the QCIF case. It is also possible to deploy the SI slices randomly, depending on the rate of lost packets.
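
The server-side behaviour described above can be sketched as follows. The loss-detection callback, the per-frame data layout and the fixed nine-frame refresh cycle (the QCIF case) are simplifying assumptions for the example.

```python
# Sketch of the server-side behaviour described above: normally the bitstream
# encoded with SP slices is streamed; when packet loss is detected, the
# corresponding SI slices are sent instead for the next full refresh cycle
# (9 frames for QCIF) to stop the error from propagating. The loss-detection
# callback and data layout are assumptions for illustration.

REFRESH_CYCLE = 9   # frames needed to refresh a whole QCIF picture, one slice per frame

def stream(frames_sp, frames_si, loss_detected):
    """frames_sp / frames_si: per-frame payloads of the SP-coded and SI-coded
    versions of the refresh slice. loss_detected(i): True if packet loss was
    reported while frame i was being streamed."""
    sent = []
    si_frames_left = 0
    for i, (sp, si) in enumerate(zip(frames_sp, frames_si)):
        if loss_detected(i):
            si_frames_left = REFRESH_CYCLE          # start a full SI refresh cycle
        if si_frames_left > 0:
            sent.append(("SI", si))
            si_frames_left -= 1
        else:
            sent.append(("SP", sp))
    return sent

if __name__ == "__main__":
    sp = [f"sp_slice_{i}" for i in range(20)]
    si = [f"si_slice_{i}" for i in range(20)]
    out = stream(sp, si, loss_detected=lambda i: i == 3)  # pretend loss at frame 3
    print([kind for kind, _ in out])
```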
  • the advantage of the invention over the AIR is that the coding efficiency is typically better for SP slices during good network conditions and for bad network conditions the systematic intra refresh scheme can typically recover the error faster.
  • the encoder 3 forms two different encoded frames 411 — 415, 421 — 425 (in Fig. 4 only some of the frames are shown) from the same picture information.
  • the first set 410 of frames is encoded using SP encoding, i.e. the slices depicted in Figure 4 of the frames 411 — 415 are SP-encoded slices, in the figure one slice per frame.
  • the second set 420 of frames is encoded so that, for example, one slice of each frame 421 — 425 is SI-encoded while the other slices of the frames are P-encoded.
  • the two sets 410, 420 of frames can, for example, be stored to the memory 6 of the streaming server 5 for delivery to clients (receivers) either substantially immediately or at a later stage, for example, upon a request by a client device (a receiver).
  • the encoder 3 has also encoded one or more Intra frames and, possibly, P- and/or B-frames into the bit stream.
  • the SP-encoded frames 411 — 415 are transmitted, and if the streaming server 5 detects that an error has occurred during the transmission of the frames, it begins to transmit the frames of the second set 420 of frames (i.e. the encoded frames 421 — 425 containing one or a few SI-encoded slices) instead of the frames of the first set of frames.
  • after that, the streaming server 5 can switch back to transmitting the frames of the first set 410 of frames.
  • the problem of SIR encoded with intra slices can be that the viewer may perceive a disturbing effect that a scrolling slice rolling from top of the image to the bottom over and over again.
  • This problem can also exist for the SP-encoded frames containing Sl-encoded slices, however the effect is less visible and it only happens for the first SP/SI frame.
  • the first 9 frames for QCIF size image will show similar effect as in SIR case, but it will typically not show any more visual artifacts after that.
  • One method to solve this problem is to encode one SP frame right after an intra frame (generally a scene change frame).
  • Encoding a bitstream for video streaming requires many key frames (in general intra frames) to allow fast forward/backward operation as well as indexing.
  • a scene change could be encoded with an intra frame 511, 519, and between these two intra frames multiple SP frames 514, 517 could be inserted for fast playback, searching, bitstream switching and error concealment, since the SP frames can be replaced with SI frames when necessary.
  • SP/SI-frames 512, 513, 515, 516, 518 could be placed for error concealment and emergency switching.
  • the SP slices and SP frames are encoded first and then SI slices and SI frames.
  • the extra bitstream containing SI slices and frames can be stored along with the main SP bitstream.
  • Each set of bitstreams contains a main bitstream and a SI bitstream and all the main bitstreams of each set are encoded at different bitrates to be used for different connection speeds.
  • Fig. 5 depicts a part of a first bit stream 510 and a part of a second bit stream 520, which are formed in the encoder 3. Only a few frames of the respective bit streams are shown. Specifically, the first bit stream 510 is shown to comprise I-frames 511, 519, SP-frames 514, 517 and SP/SI-frames 512, 513, 515, 516, 518, while the second bit stream 520 comprises corresponding I-frames 521, 529, SP-frames 524, 527 and SP/SI-frames 522, 523, 525, 526, 528. It should be noted here that not all the SP/SI frames between SP-frames are shown for clarity.
  • the two bit streams 510 and 520 correspond to the same sequence encoded at different bit rates, for example, by using different frame rates, different spatial resolutions or different quantisation parameters. It is further assumed that the first bit stream 510 is being transmitted from the transmitting server 5 to a decoder 8 (Fig. 7) via a transmission network (not shown), and that the transmitting server 5 receives a request from the transmission network to change the bit rate of the video stream being transmitted. SP-frames are placed in the bit stream during the encoding process at those locations within the video sequences where switching from one bit stream to another is allowed.
  • When the transmitting server 5 reaches the frame of the video sequence encoded as SP-frame 514 in the first bit stream 510, it can begin the necessary operations to continue transmission of the video stream using the encoded frames of the second bit stream 520. At that point the transmitting server 5 has already transmitted the frames preceding the SP-frame 514 of the first bit stream 510 and the decoder 8 has received and decoded the respective frames. Thus, those frames have already been stored in the frame memory 750 of the decoder 8.
  • the frame memory 750 comprises sufficient memory to store all those frames, which are needed to reconstruct a P-frame or a B-frame, i.e. the necessary information of all the reference frames required by the current frame to be reconstructed.
  • the transmitting server 5 performs the following operations to continue the transmission of the video stream using the encoded frames of the second bit stream 520.
  • the transmitting server 5 notices, for example, by examining the type information of the frame, that the current frame to be transmitted is an SP-frame, so it is possible to perform switching between the bit streams. Of course, switching is only performed if a request to do so has been received or there is for some other reason a need to perform the switching.
  • the transmitting server 5 inputs the corresponding SP-frame 524SP of the second bit stream, and transmits the SP-frame 524SP to the decoder 8.
  • SP-frame 524SP is a predicted frame using frame 513 as a reference frame to reconstruct SP-frame 524.
  • the transmitting server 5 continues to transmit the encoded frames of the second bit stream 520, i.e., SP/SI-frames 525, 526 following the SP- frame 524SP, other SP-frames 527 and so on.
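
The switching procedure can be summarised with the sketch below: the server sends frames of the first bit stream until a switch has been requested and an SP-frame location is reached, then sends the special switching SP-frame of the second bit stream (524SP in Fig. 5) and continues with that bit stream. The frame representation and names are assumptions made for illustration.

```python
# Sketch of the switching procedure described above: the server keeps sending
# frames of the current bitstream and, when it reaches a frame marked as an
# SP-frame and a switch has been requested, it sends the switching SP-frame of
# the target bitstream (e.g. 524SP) and continues from that bitstream.
# The frame representation and names are illustrative assumptions.

def transmit(stream_a, stream_b, switch_frames_b, switch_requested_at):
    """stream_a / stream_b: lists of (frame_type, frame_id) in transmission order.
    switch_frames_b: switching SP-frames of stream B indexed by position,
    predicted from the already transmitted frames of stream A.
    switch_requested_at: position at which a bit-rate change was requested."""
    sent, current, switched = [], stream_a, False
    for i in range(len(stream_a)):
        frame_type, frame_id = current[i]
        if (not switched and i >= switch_requested_at and frame_type == "SP"
                and i in switch_frames_b):
            sent.append(switch_frames_b[i])   # special SP-frame of the target stream
            current, switched = stream_b, True
        else:
            sent.append((frame_type, frame_id))
    return sent

if __name__ == "__main__":
    a = [("I", "511"), ("SP/SI", "512"), ("SP/SI", "513"), ("SP", "514"), ("SP/SI", "515")]
    b = [("I", "521"), ("SP/SI", "522"), ("SP/SI", "523"), ("SP", "524"), ("SP/SI", "525")]
    bridge = {3: ("SP", "524SP")}             # switching frame predicted from stream A
    print(transmit(a, b, bridge, switch_requested_at=1))
```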
  • an SP/SI-frame according to the invention such as frames 512, 522, 513, 523 in Figure 5 is constructed on a block-by-block basis.
  • a group of blocks, e.g. a slice, is coded in such a way as to take advantage of the spatial correlations among pixels of the image being coded (intra or SI-blocks).
  • Other blocks are coded in such a way as to take advantage of the temporal correlation between blocks of pixels in successive frames of a video sequence (inter or SP-blocks).
  • FIG. 6 is a block diagram of a frame encoder 3 according to a first embodiment of the invention.
  • a video frame to be encoded is first partitioned into blocks and each block is then encoded as either an SP-block, an SI-block, or an intra-block.
  • Switch 690 is operated as appropriate to switch between the SI and SP encoding modes, i.e., the switch 690 is a construction used in the description of the invention, not necessarily a physical device.
  • in SP-encoding mode, switch 690 is operated to obtain a motion compensated prediction for the current block from motion compensated prediction block 670.
  • Motion compensated prediction block 670 forms a prediction P(x,y) for the current block of the frame being encoded in a manner analogous to that used in motion compensated prediction known from prior art.
  • motion compensated prediction block 670 forms the prediction P(x,y) for the current block of the frame being encoded by determining a motion vector describing the relationship between the pixels in the current block and pixel values of a reconstructed reference frame held in frame memory 646.
  • in SI-encoding mode, switch 690 is operated to obtain a prediction for the current block of the frame being coded from intra prediction block 680.
  • Intra prediction block 680 forms the prediction P(x,y) for the current block of the frame being encoded in a manner analogous to that used in intra prediction known from prior art. More specifically, intra prediction block 680 forms the prediction P(x,y) for the current block of the frame being encoded using spatial prediction from already encoded neighbouring pixels within the frame being encoded.
  • the prediction P(x,y) takes the form of a block of pixel values.
  • a forward transform, for example a Discrete Cosine Transform (DCT), is applied to the prediction P(x,y) to produce prediction transform coefficients c_pred, which are quantised in quantisation block 650 to form quantised prediction transform coefficients I_pred.
  • DCT Discrete Cosine Transform
  • the transform coefficients of the original image block are passed to quantisation block 620, where they are quantised to form quantised transform coefficients I_orig.
  • the summing element 630 receives both sets of quantised transform coefficients I_pred and I_orig from the respective quantisation blocks 650 and 620 and generates a set of quantised prediction error coefficients I_err according to the relationship I_err = I_orig - I_pred.
  • the quantised prediction error coefficients I_err are passed to multiplexer 635. If the current block is encoded in SP-format/mode, multiplexer 635 also receives the motion vectors for the SP-coded block. If the current block is encoded in SI-format/mode, information concerning the intra prediction mode used to form the prediction for the SI-coded block in intra prediction block 680 is passed to the multiplexer.
  • variable length coding is applied to the quantised prediction error coefficients I_err and to the motion vector or intra prediction mode information in the multiplexer 635; a bit-stream is formed by multiplexing together the various forms of information and the bit-stream thus formed is transmitted to a corresponding decoder 8 (see Fig. 7).
  • the S-frame encoder 3 also comprises local decoding functionality.
  • the quantised prediction transform coefficients I_pred formed in quantisation block 650 are supplied to the summing element 640, which also receives the quantised prediction error coefficients I_err.
  • the summing element 640 recombines the quantised prediction transform coefficients I_pred and the quantised prediction error coefficients I_err to form a set of reconstructed quantised transform coefficients I_rec according to the relationship I_rec = I_pred + I_err.
  • the reconstructed quantised transform coefficients are passed to inverse quantisation block 642, which inverse quantises the reconstructed quantised transform coefficients to form inverse quantised reconstructed transform coefficients d_rec.
  • the inverse quantised reconstructed transform coefficients are further passed to inverse transform block 644 where they are subjected to e.g. an Inverse Discrete Cosine Transform (IDCT), or any other inverse transform corresponding to the transform performed in block 660.
  • IDCT Inverse Discrete Cosine Transform
  • a block of reconstructed pixel values is formed for the image block in question and is stored in frame memory 646.
  • a decoded version of the current frame is progressively assembled in the frame memory from where it can be accessed and used in intra prediction of subsequent blocks of the same frame or in inter (motion compensated) prediction of subsequent frames in the video sequence.
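
The quantised-domain relationships of the encoder can be checked with the small sketch below. The identity 'transform' and the single uniform quantiser are deliberate simplifications; only the relationships I_err = I_orig - I_pred and I_rec = I_pred + I_err, and the resulting independence of the reconstruction from the chosen prediction, follow the description above.

```python
# Sketch of the quantised-domain relationships in the S-frame encoder above:
#   I_err = I_orig - I_pred        (summing element 630)
#   I_rec = I_pred + I_err         (summing element 640)
# Because both operations happen on quantised coefficients, I_rec always equals
# I_orig, whichever prediction produced I_pred -- this is what allows SP/SI
# blocks to reconstruct identically. The identity "transform" and single
# uniform quantiser below are simplifications for illustration only.

QSTEP = 8

def quantise(coeffs):
    return [round(c / QSTEP) for c in coeffs]

def dequantise(levels):
    return [l * QSTEP for l in levels]

def encode_block(c_orig, c_pred):
    """Return the quantised prediction error levels sent in the bitstream."""
    i_orig, i_pred = quantise(c_orig), quantise(c_pred)
    return [o - p for o, p in zip(i_orig, i_pred)]           # I_err = I_orig - I_pred

def reconstruct_block(c_pred, i_err):
    """Local decoding loop: recombine, dequantise, (inverse) transform."""
    i_pred = quantise(c_pred)
    i_rec = [p + e for p, e in zip(i_pred, i_err)]            # I_rec = I_pred + I_err
    return dequantise(i_rec)                                  # d_rec -> pixel block

if __name__ == "__main__":
    c_orig = [100, -40, 12, 0]
    pred_a = [96, -35, 10, 2]     # e.g. motion compensated prediction (SP)
    pred_b = [80, -20, 0, 0]      # e.g. spatial prediction (SI)
    rec_a = reconstruct_block(pred_a, encode_block(c_orig, pred_a))
    rec_b = reconstruct_block(pred_b, encode_block(c_orig, pred_b))
    print(rec_a == rec_b)          # True: identical reconstruction from either prediction
```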
  • the bit-stream generated by the frame encoder previously described in connection with Figure 6 is received by decoder 8 and is demultiplexed into its constituent parts by demultiplexer 710.
  • the decoder reconstructs a decoded version of the SP/SI-frame on a block-by-block basis.
  • an SP/SI-frame comprises SP-coded and SI-coded image blocks.
  • in the case of an SP-coded block, the information in the received bit-stream comprises VLC encoded motion coefficient information and VLC encoded quantised prediction error coefficients I_err.
  • in the case of an SI-coded block, the information in the received bit-stream comprises VLC coded information relating to the intra prediction mode used to form the intra prediction for the SI-coded block together with VLC coded quantised prediction error coefficients I_err.
  • When decoding an SP-coded block, demultiplexer 710 first applies appropriate variable length decoding (VLD) to the received bit-stream to recover the motion vector information and the quantised prediction error coefficients I_err. It then separates the motion vector information from the quantised prediction error coefficients I_err. The motion vector information is supplied to motion compensated prediction block 760 and the quantised prediction error coefficients recovered from the bit-stream are applied to one input of summing element 720. The motion vector information is used in motion compensated prediction block 760 together with pixel values of a previously reconstructed frame held in frame memory 750 to form a prediction P(x,y) in a manner analogous to that employed in the encoder 3.
  • VLD variable length decoding
  • When decoding an SI-coded block, demultiplexer 710 applies appropriate variable length decoding to the received intra prediction mode information and the quantised prediction error coefficients I_err.
  • the intra prediction mode information is then separated from the quantised prediction error coefficients and supplied to intra prediction block 770.
  • the quantised prediction error coefficients I_err are supplied to one input of the summing element 720.
  • the intra prediction mode information is used in intra prediction block 770 in conjunction with previously decoded pixel values of the current frame held in frame memory 750 to form a prediction P(x,y) for the current block being decoded.
  • the intra prediction process performed in decoder 8 is analogous to that performed in encoder 3 and previously described.
  • switch 780 is operated so that the prediction P(x,y) which comprises predicted pixel values is supplied to transform block 790.
  • switch 780 is an abstract construction used in the description of the invention, not necessarily a physical device. In the case of an SP-coded block, switch 780 is operated to connect motion compensated prediction block 760 to transform block 790, while in the case of an SI-coded block it is operated to connect intra prediction block 770 to transform block 790.
  • a forward transform, e.g. a Discrete Cosine Transform (DCT), is applied to the prediction P(x,y) in transform block 790; the resulting coefficients are quantised and supplied to summing element 720, which combines them with the received quantised prediction error coefficients I_err to form the reconstructed quantised transform coefficients I_rec.
  • DCT Discrete Cosine Transform
  • the reconstructed quantised transform coefficients I_rec are further supplied to inverse quantisation block 730 where they are inverse quantised to form inverse quantised reconstructed transform coefficients d_rec.
  • the inverse quantised transform coefficients d_rec are then passed to inverse transform block 740 where they are subjected to e.g. an Inverse Discrete Cosine Transform (IDCT), or any other inverse transform corresponding to the transform performed in block 790.
  • IDCT Inverse Discrete Cosine Transform
  • the reconstructed pixel values are supplied to the video output and to frame memory 750.
  • a decoded version of the current frame is progressively assembled in frame memory 750, from where it can be accessed and used in intra prediction of subsequent blocks of the same frame or in inter (motion compensated) prediction of subsequent frames of the video sequence.
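
Finally, the block-by-block dispatch of the decoder (switch 780) can be sketched as follows. The prediction helpers are placeholders returning descriptive strings; only the control flow mirrors the description above.

```python
# Sketch of the block-by-block dispatch in the SP/SI decoder above (switch 780):
# SP-coded blocks take their prediction from motion compensation against a frame
# in the frame memory, SI-coded blocks from intra prediction over already decoded
# pixels of the current frame. The prediction helpers are placeholders; only the
# control flow mirrors the description.

def motion_compensated_prediction(frame_memory, motion_vector):
    # Placeholder: a real decoder would fetch the referenced pixel block.
    return f"MC prediction from {frame_memory[-1]} with MV {motion_vector}"

def intra_prediction(current_frame_pixels, intra_mode):
    # Placeholder: a real decoder would extrapolate from neighbouring pixels.
    return f"intra prediction mode {intra_mode} from decoded neighbours"

def predict_block(block, frame_memory, current_frame_pixels):
    """block: dict with 'type' ('SP' or 'SI') plus the side information that the
    demultiplexer recovered for it (motion vector or intra prediction mode)."""
    if block["type"] == "SP":
        return motion_compensated_prediction(frame_memory, block["motion_vector"])
    if block["type"] == "SI":
        return intra_prediction(current_frame_pixels, block["intra_mode"])
    raise ValueError("unexpected block type in an SP/SI frame")

if __name__ == "__main__":
    memory = ["frame_n-1"]
    print(predict_block({"type": "SP", "motion_vector": (1, -2)}, memory, []))
    print(predict_block({"type": "SI", "intra_mode": 2}, memory, []))
```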
  • the request for the change of the bit stream transmission properties may also be originated by other parts of the transmission system.
  • the receiver may request the transmitting server to change the parameters for some reason. This request is delivered to the transmitting server e.g. via the transmission network.
  • Bit stream switching is not the only application in which the present invention can be applied. If one of the bit streams has a lower temporal resolution, e.g. 1 frame/sec, this bit stream can be used to provide fast-forward functionality. Specifically, decoding from the bit stream with a lower temporal resolution and then switching to the bit stream with a normal frame rate would provide such functionality.
  • Fig. 8 depicts two bit streams, the second of which comprises only S-frames predicted from each other at intervals greater than the frame repetition interval of the first bit stream. Furthermore, "Fast Forward" can start and stop at any location in the bit stream. In the following, some other applications of the present invention are described.
  • the bit stream switching example discussed earlier considered bit streams belonging to the same sequence of images. However, this is not necessarily the case in all situations where bit stream switching is needed. Examples include: switching between bit streams arriving from different cameras capturing the same event from different perspectives, or from cameras placed around a building for surveillance; switching to local/national programming or insertion of commercials in a television broadcast; video bridging; etc.
  • the general term for the process of concatenating encoded bit streams is splicing.
  • the invention described above provides an adaptive error resilience tool using SP/SI coding mode as well as a bitstream switching scheme. It is obvious that the present invention is not limited to the above described embodiments but it can be modified within the scope of the appended claims. For example, more than one group of blocks of the SP-frames can be replaced with Sl-encoded macroblocks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method for sending video information (Fig. 5), in which a bit stream (510, 520) is formed, this bit stream comprising a set of frames (512, 513, 515, 516, 518 and 522, 523, 525 and 528) made up of macroblocks. At least one switching frame (524) is formed in the bit stream, and the macroblocks of the switching frame are arranged into first and second groups of macroblocks, each macroblock of the first group being encoded by means of a first encoding method (Fig. 5, intra) to form a switching point for continuing the sending of the video information with another bit stream formed from the video information.
EP04713602A 2004-02-23 2004-02-23 Envoi d'informations video Withdrawn EP1719343A4 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2004/000454 WO2005091632A1 (fr) 2004-02-23 2004-02-23 Envoi d'informations video

Publications (2)

Publication Number Publication Date
EP1719343A1 true EP1719343A1 (fr) 2006-11-08
EP1719343A4 EP1719343A4 (fr) 2011-10-26

Family

ID=34994077

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04713602A Withdrawn EP1719343A4 (fr) 2004-02-23 2004-02-23 Envoi d'informations video

Country Status (3)

Country Link
EP (1) EP1719343A4 (fr)
CN (1) CN1926862A (fr)
WO (1) WO2005091632A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101677400B (zh) * 2008-09-19 2012-08-15 华为技术有限公司 编码、解码方法和编码器、解码器及编解码系统
EP2643977B1 (fr) * 2010-12-29 2019-05-15 Skype Procédé et appareil pour traiter un signal vidéo
CN111479114B (zh) * 2019-01-23 2022-07-22 华为技术有限公司 点云的编解码方法及装置
US20210136378A1 (en) * 2020-12-14 2021-05-06 Intel Corporation Adaptive quality boosting for low latency video coding
CN112911295A (zh) * 2021-04-16 2021-06-04 北京杰瑞创通科技有限公司 自适应动态抗网络丢包智能信源编码装置及方法
CN114513658B (zh) * 2022-01-04 2024-04-02 聚好看科技股份有限公司 一种视频加载方法、装置、设备及介质
CN116248895B (zh) * 2023-05-06 2023-07-21 上海扬谷网络科技有限公司 虚拟现实全景漫游的视频云转码方法及系统

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002054776A1 (fr) * 2001-01-03 2002-07-11 Nokia Corporation Commutation entre des trains de bits en transmission video

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611624B1 (en) * 1998-03-13 2003-08-26 Cisco Systems, Inc. System and method for frame accurate splicing of compressed bitstreams
US6996173B2 (en) * 2002-01-25 2006-02-07 Microsoft Corporation Seamless switching of scalable video bitstreams
CN101232618B (zh) * 2002-04-23 2013-03-27 诺基亚有限公司 用于在视频编码系统中指示量化器参数的方法与设备

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002054776A1 (fr) * 2001-01-03 2002-07-11 Nokia Corporation Commutation entre des trains de bits en transmission video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KARCZEWICZ M ET AL: "The SP- and SI-frames design for H.264/AVC", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 13, no. 7, 1 July 2003 (2003-07-01), pages 637-644, XP011099256, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2003.814969 *
See also references of WO2005091632A1 *

Also Published As

Publication number Publication date
EP1719343A4 (fr) 2011-10-26
WO2005091632A1 (fr) 2005-09-29
CN1926862A (zh) 2007-03-07

Similar Documents

Publication Publication Date Title
US7693220B2 (en) Transmission of video information
JP4109113B2 (ja) ビデオ伝送におけるビットストリーム間の切換
US7706447B2 (en) Switching between bit-streams in video transmission
US6765963B2 (en) Video decoder architecture and method for using same
Karczewicz et al. The SP-and SI-frames design for H. 264/AVC
KR100960282B1 (ko) 비디오 부호화
KR100495820B1 (ko) 비디오 코딩
KR100929558B1 (ko) 비디오 부호화 방법, 복호화 방법, 부호화기, 복호기, 무선 통신 장치 및 멀티미디어 터미널 장치
KR100878057B1 (ko) 비디오 부호화
US20070009039A1 (en) Video encoding and decoding methods and apparatuses
US6744924B1 (en) Error concealment in a video signal
US20060233235A1 (en) Video encoding/decoding apparatus and method capable of minimizing random access delay
EP1719343A1 (fr) Envoi d'informations video
Aho et al. Error resilience techniques for MPEG-2 compressed video signal
KR100626419B1 (ko) 비디오 전송에서 비트 스트림들간의 교환
Tian et al. Improved H. 264/AVC video broadcast/multicast
Yang et al. Error resilient GOP structures on video streaming
Aladrovic et al. An error resilience scheme for layered video coding
Tian et al. Error resilient video coding techniques using spare pictures
KR100669621B1 (ko) 동영상 디코더의 참조 영상 변경 제어방법
Cai et al. Joint mode selection and unequal error protection for bitplane coded video transmission over wireless channels

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060824

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20110928

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 7/26 20060101ALI20110922BHEP

Ipc: H04N 7/24 20110101AFI20110922BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110901