WO2013141872A1 - Method and system to process a video frame using prior processing decisions - Google Patents

Method and system to process a video frame using prior processing decisions

Info

Publication number
WO2013141872A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
quality measure
information
decision
Prior art date
Application number
PCT/US2012/030219
Other languages
French (fr)
Inventor
Mina Makar
Wai-Tian Tan
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2012/030219 priority Critical patent/WO2013141872A1/en
Publication of WO2013141872A1 publication Critical patent/WO2013141872A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454Content or additional data filtering, e.g. blocking advertisements
    • H04N21/4542Blocking scenes or portions of the received content, e.g. censoring scenes

Definitions

  • We propose to change the threshold T for each video frame based on the total quality degradation, as in Equation 6, where the summation is over all freeze episodes, each with duration ti, in the last 10 seconds, and c1 and c2 are positive constants: c1 represents the threshold used to judge each frame independently, and c2 controls how much we increase the threshold to freeze fewer frames in case of burst errors.
  • c1 and c2 are empirically chosen to be 10 and 1, respectively.
  • We call Equations 5 and 6 the History-based Perception model.
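This history-based threshold can be sketched as follows. We read Equation 6 as T = c1 + c2 · Σ dk over freeze episodes in the trailing 10-second window, using the empirical values c1 = 10 and c2 = 1 given above; the exact functional form should be checked against Figure 4, and the episode bookkeeping here is our own.

```python
# Sketch of Equation 6 as we read it: T = c1 + c2 * (total freeze duration
# in the trailing 10-second window), with the empirical c1=10, c2=1.

def adaptive_threshold(freeze_episodes, now, c1=10.0, c2=1.0, window=10.0):
    """freeze_episodes: list of (start_time, duration) pairs, in seconds."""
    recent = sum(duration for start, duration in freeze_episodes
                 if start + duration > now - window)  # touches the window
    return c1 + c2 * recent
```

Raising T after recent freezes means fewer additional frames are judged bad, which matches the stated goal of freezing fewer frames during burst errors.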
  • Two test sequences, Shields and Conference, have distinctly different characteristics. Shields exhibits slow and predominantly panning motion, yielding concealment artifacts that fit our shift-invariant assumption. In contrast, Conference contains a stationary background and human subjects with complex motion. This yields many concealment artifacts that differ from small shifts. Conference is cropped from a 1080p source from the Federal University of Rio de Janeiro. Both sequences contain 300 frames, and are repeated in a loop 50 times to obtain the results in this section.
  • Figure 5 shows a segment of the PSNR trace when Shields, encoded using H.264, is subjected to a simulated packet loss ratio of 0.5% with an average burst length of 3, and a round-trip time (RTT) of 200 ms.
  • The PSNR trace for Shields shows extreme preference reversal between pictures A and B by PSNR and our target shift-invariant metric MC-PSNR. Losses are generally corrected within one RTT using reference picture selection. All loss-impaired pictures are marked in cyan, with pictures A and B having the highest and lowest PSNR, respectively. This means freezing decisions made using PSNR would likely display A but not B.
  • The precision of a hint is the percentage of pictures in its bad set that are also in the bad set of MC-PSNR, and recall is the detection percentage of the bad set of MC-PSNR.
  • The temporal threshold adjustment is not applied here, since we are only interested in how well the different hints approximate our target MC-PSNR metric.
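These definitions reduce to set overlap between a hint's "bad set" (pictures it would freeze) and the bad set of MC-PSNR; a direct Python reading, with picture indices as set elements (the function name is ours):

```python
# Precision/recall of a hint's "bad set" against the MC-PSNR bad set,
# read directly from the definitions above.

def precision_recall(hint_bad, target_bad):
    hint_bad, target_bad = set(hint_bad), set(target_bad)
    true_bad = len(hint_bad & target_bad)          # agreed-upon bad pictures
    precision = true_bad / len(hint_bad) if hint_bad else 1.0
    recall = true_bad / len(target_bad) if target_bad else 1.0
    return precision, recall
```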
  • Figure 10 presents the results of the adaptive temporal adjustment of threshold T for frequent errors.
  • The adaptive temporal adjustment of T is illustrated by (a) a PSNR trace, (b) adaptive threshold values, and (c) frame freezing decisions with constant and adaptive thresholds (high: freeze, low: display).
  • Figure 10 shows a PSNR trace for Conference with loss-impaired pictures marked in bold. The temporal evolution in T is shown, where we see that rises in threshold last at least 10 seconds.
  • A shift-invariant metric such as MC-PSNR can be approximated in a reduced-reference framework by resizing the picture or dropping DCT coefficients.
  • These methods produce superior precision and recall compared to PSNR-based methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Irrecoverable data loss may be unavoidable for real-time video communication over common best-effort networks. Rather than always display impaired pictures or always "freeze" the last good picture, the method and system transmits additional hints to support selective freezing of heavily damaged pictures.

Description

METHOD AND SYSTEM TO PROCESS A VIDEO FRAME USING PRIOR PROCESSING DECISIONS
BACKGROUND OF THE DISCLOSURE
[0001] The present disclosure generally relates to processing video information. More particularly, the present disclosure provides a method and system for processing video information, frame by frame, using at least one prior processing decision of an earlier video frame. Merely by way of example, the video frame is processed by adapting a decision threshold on whether to hide loss-impaired pictures based on a history of past decisions of earlier video frames. Of course, there can be other variations, modifications, and alternatives.
[0002] Digital video has become widespread and common. Examples of digital video products include the PICS Animation Compiler from The Company of Science & Art in Providence, RI, digital MPEG-1, advancing versions of MPEG, QuickTime from Apple Computer, H.264, and VP8. Digital streaming also includes editing or correction features; that is, the two most common approaches for streaming are to (1) simply hide all frames that are loss-impaired regardless of quality, or (2) simply show all frames that are loss-impaired regardless of quality. Each of these approaches is limiting, since there are usually many impaired pictures that are acceptably good, and always a few that are bad enough to be objectionable.
[ 0003] Although highly successful, it is still desirable to improve digital video techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004 ] Figure 1 is a simplified diagram of a system for processing video information according to an example of the present disclosure;
[0005] Figure 2 is a simplified flow diagram of processing a stream of video information according to an example of the present disclosure;
[0006] Figure 3 is a simplified diagram of a video processing system according to an example of the present disclosure;
[0007] Figure 4 is a simplified diagram illustrating various relationships in equations according to examples of the present disclosure;
[0008 ] Figures 5 through 10 are simplified diagrams illustrating processing information for examples of the present method and system.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE DISCLOSURE
[0009] According to the present disclosure, techniques generally related to processing video information are provided. More particularly, the present disclosure provides a method and system for processing video information, frame by frame, using at least one prior processing decision of an earlier video frame. Merely by way of example, the video frame is processed by adapting a decision threshold on whether to freeze loss-impaired pictures based on a history of past decisions of earlier video frames. Of course, there can be other variations, modifications, and alternatives.
[0010] As an example, video picture freezes and breakups are two notorious yet distinct artifacts of streaming video. After loss recovery and concealment have been performed, a task for a receiver is to decide whether an impaired picture should be displayed. Many impaired pictures are preferable to freezes, yet some may be objectionable. It is therefore desirable to have methods to distinguish the acceptable pictures from the objectionable ones.
[0011] Determining visual quality without other hints is challenging according to this example. In contrast, there exist reduced-reference methods that use low-rate hints instead of source pictures to determine received video quality. In particular, [1] proposed a low-rate method based on spread-spectrum ideas that can accurately approximate the Peak Signal-to-Noise Ratio (PSNR) for video degraded with both compression and losses. Parts of the techniques in [1] are adopted into the ITU J.240 standard [2] for video quality monitoring. Nevertheless, it is well known that PSNR does not accurately capture visual quality. In particular, even well error-concealed pictures that are otherwise visually pleasing may contain slight perturbations that yield low PSNR, especially when contrast is large. Therefore, it is essential to develop low-rate hints that are invariant or insensitive to small perturbations or shifts to properly guide picture freezing decisions.
[0012] Shift-invariant metrics have been studied [3]. In particular, the SSIM scheme admits both a shift-invariant extension [3] and a reduced-reference implementation [4]. Nevertheless, as we will discuss later, this scheme is designed for spread errors and fails to recognize localized errors common in error-concealed pictures. Further details of the present method and system can be found throughout the present specification and more particularly below.
[0013] Figure 1 is a simplified diagram of a system for processing video information according to an example of the present disclosure. As shown, the system 100 includes a decoder module 110, quality module 120, processing module 130, compare module 140, and storage device 150, which tracks prior decisions. Streaming video 101 enters into the decoder module, which converts the compressed video stream into raw video 115. On a frame-by-frame basis, the quality module processes each frame, determines a quality measure, and decides whether to show or hide the frame based upon at least one decision based upon a past frame within the sequence of streaming video. As an example, the system includes the quality measure module, which is configured to produce a quality measure Qi for each frame i. The quality measure can be one of a reduced or no reference method or a binary indicator of whether a frame is loss impaired, among other characteristics.
[0014] The system includes the compare module coupled to the quality measure module. The compare module is configured to compare the quality measure Qi to threshold information Ti, frame by frame, and configured to decide whether to show a frame if Qi > Ti, or hide the frame. The decision is Fi, and Ti is adjusted based on past decisions F(i-1), F(i-2), ..., F(i-N). In an example, the threshold Ti is a function of measurable information. The measurable information is at least one of a number of frame freeze episodes within a past ten second interval or a duration of each of the freeze episodes. The decision whether to hide the frame is provided by freezing an earlier frame. Of course, there can be other variations, modifications, and alternatives.
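A minimal Python sketch of such a compare module follows; the linear adjustment step (0.5 per recent freeze) and the 30-frame history length are illustrative values of ours, not from the disclosure.

```python
# Sketch of the compare module: show frame i if Qi > Ti, where Ti is
# adjusted using the stored past decisions F(i-1) ... F(i-N).
from collections import deque

def make_compare_module(base_threshold: float, history_len: int = 30):
    past_decisions = deque(maxlen=history_len)  # F(i-1) ... F(i-N)

    def decide(q_i: float) -> bool:
        # Lower the threshold when many recent frames were hidden, so that
        # fewer additional freezes pile up during burst errors.
        recent_freezes = sum(1 for shown in past_decisions if not shown)
        t_i = base_threshold - 0.5 * recent_freezes  # illustrative step
        show = q_i > t_i
        past_decisions.append(show)  # becomes F(i-1) for the next frame
        return show

    return decide
```

Storing the binary decisions rather than raw quality values keeps the tracked state small while still letting the threshold react to freeze bursts.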
[0015 ] A method according to an example can be outlined below.
[0016] 1. Start;
[0017] 2. Stream first video information (i.e., compressed) comprising a plurality of video frames. The video information may include at least a portion of a video frame;
[0018] 3. Process the streaming video information, using a decoder device, frame by frame;
[0019] 4. Determine a quality measure for the portion of video frame;
[0020] 5. Store information associated with the quality measure;
[0021] 6. Determine a display decision for the portion of video frame using a function of the quality measure and a stored information associated with one or more quality measures of prior video frames;
[0022] 7. Store at least the display decision of the portion of video frame;
[0023] 8. Output the portion of video frame according to the display decision;
[0024] 9. Display the streaming video with improved video quality; and
[0025] 10. Perform other steps, as desired.
[0026] The above sequence of steps relates to a video processing method according to this example of the present disclosure. As shown, the video processing technique, which is provided within the apparatus, uses prior quality measure information and decisions for processing a current or present video frame. The decision is often to hide or show the frame, which may entail freezing an earlier frame. Depending upon the example, certain steps may be added, removed, or modified. Further details of the present method can be found by way of the examples below.
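The numbered steps above can be sketched end to end as follows. The decoder is elided and the quality metric is passed in as a function, since the disclosure leaves both open; the fixed threshold stands in for the adaptive rule described elsewhere in this document.

```python
# Steps 3-9 of the outline: decode (elided), measure quality, decide,
# store the decision, and output either the frame or a freeze.

def process_stream(frames, quality_of, threshold=30.0):
    """frames: iterable of decoded frames; quality_of: frame -> Qi."""
    decision_history = []          # steps 5 and 7: stored measures/decisions
    last_shown = None
    output = []
    for frame in frames:
        q_i = quality_of(frame)                    # step 4
        show = q_i > threshold                     # step 6 (simplified)
        decision_history.append((q_i, show))
        if show:
            last_shown = frame
        # step 8: hiding a frame freezes the last shown frame
        output.append(frame if show else last_shown)
    return output, decision_history
```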
[0027] Figure 2 is a simplified flow diagram of processing a stream of video information according to an example of the present disclosure. This diagram is merely an example, which should not unduly limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. As shown, the present method begins with start, although there may be variations.
[0028] In this example, the method transmits a stream of first video information (i.e., compressed) comprising a plurality of video frames. The stream of video information can be in various formats such as MPEG-1/2/4, H.264, and VP8, among others. In this example, the video information may include at least a portion of a video frame or an entire frame.
[0029] The method processes the streaming video information, using a decoder device, frame by frame. The decoder device decompresses the compressed video into a raw video form, which comprises a plurality of frames. In this example, the decoder device can be MPEG-1/2/4, H.264, or VP8, among others.
[0030] In this example, the method determines a quality measure for the portion of video frame or an entire video frame. The quality measure can be similarity between reduced-reference representations of sent and received frames. The method determines the reduced-reference representation by a variety of techniques, such as Resize-PSNR and Select-DCT, which will be introduced later. In this example, the quality information is stored in memory or in a database.
[0031] The method determines a display decision for the portion of video frame using a function of the quality measure and stored information associated with one or more quality measures of prior video frames. The function of the quality measure can be a perception model, such as the History-based Perception model that will be introduced later. Alternatively, the quality measure is obtained using a reduced-reference method. The reference method can be Resize-PSNR or Select-DCT. As an example, the display decision is a binary decision to show or hide the particular portion of a video frame.
[0032] In this example, the method stores at least the display decision of the portion of video frame or the full frame. The method can also store other information such as the quality measure or other information that is desirable.
[0033] The method outputs the portion of video frame according to the display decision and displays the streaming video with improved video quality. Of course, other steps can be performed. The above sequence of steps relates to a video processing method according to this example of the present disclosure. As shown, the video processing technique, which is provided within the apparatus, uses prior quality measure information and decisions for processing a current or present video frame. The decision is often to hide or show the frame, which may entail freezing an earlier frame. Depending upon the example, certain steps may be added, removed, or modified. Further details of the present method can be found by way of the examples below.
[0034 ] It should be understood that the description recited above is an example of the disclosure and that modifications and changes to the examples may be undertaken which are within the scope of the claimed disclosure. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements, including a full scope of equivalents.
[0035 ] Examples:
[0036] To prove the principles and operation of the present method and system, certain experiments and simulations were performed. These experiments were merely examples, which should not unduly limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. As will be explained, irrecoverable data loss may be unavoidable for real-time video communication over networks. Rather than always display impaired pictures or always "freeze" the last good picture, it is desirable to transmit additional hints to support selective hiding of heavily damaged pictures only. In particular, errors in impaired pictures tend to be localized, and often manifest themselves as small spatial shifts that are visually preferable to freezes. Therefore, it is desirable that the hints can identify localized error and do not penalize small shifts. We show two ways such "shift-invariant" hints for detecting concealment error can be constructed at less than 1% of video rate. Experiments using 720p sequences achieve recall and precision of 90% and 75%, respectively, with respect to a shift-invariant PSNR measure. We also present an adaptive decision rule to obtain shorter and less frequent freezes. In this disclosure, we develop and compare two shift-invariant hints that work well for localized errors, and characterize their performance against PSNR.
[0037] In this example, video frame samples included two error-concealed pictures from compressed H.264 video subjected to loss. It is visually obvious that a first picture suffered from significant breakup and probably should not be shown. In contrast, a second picture only suffered minor visual distortion, and should be preferentially displayed. The first picture, with a PSNR of 28.07 dB compared to the loss-free transmitted picture, was more objectionable than the second picture with a lower PSNR of 25.40 dB. One possible way to establish preference for the second picture is to employ a shift-invariant metric. This is because an acceptable concealed picture typically differs from its loss-free counterpart via minor perturbations that can be approximated by small shifts. We next define a shift-invariant metric to serve as a target for evaluating the effectiveness of our subsequent reduced-reference hints.
[0038] There are many possible ways to compute distortion between a received picture R and sent picture S that exhibit some degree of shift-invariance. We adopt a direct approach in this paper. Instead of computing PSNR between R and S, our chosen target metric achieves shift-invariance by performing motion compensation (MC) over a small range, then computing the PSNR between R and MC(S). We call this the motion compensated PSNR (MC-PSNR). In this example, MC-PSNR succeeds in capturing human visual preference where PSNR fails, with an MC-PSNR of 30.17 dB and 34.67 dB for the first and second pictures, respectively. In the results section, MC-PSNR is computed using 16×16 blocks with a search range of 4, or about 0.5% of picture height for the employed 720p content.
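A direct NumPy sketch of MC-PSNR as just defined: each 16×16 block of the sent picture is motion-compensated over a small search range before the PSNR against the received picture is computed. The disclosure fixes only the block size and search range; the edge-padding choice and parameter names are our assumptions.

```python
# Sketch of MC-PSNR: exhaustive block motion search over small shifts of
# the sent picture S, then PSNR of R against the best-matching blocks.
# Assumes picture dimensions are multiples of the block size.
import numpy as np

def mc_psnr(R, S, block=16, search=4, peak=255.0):
    H, W = R.shape
    S_pad = np.pad(S.astype(float), search, mode='edge')
    total_sse = 0.0
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            target = R[y:y + block, x:x + block].astype(float)
            # Best match over all shifts (dy, dx) within the search range.
            total_sse += min(
                np.sum((target - S_pad[y + search + dy:y + search + dy + block,
                                       x + search + dx:x + search + dx + block]) ** 2)
                for dy in range(-search, search + 1)
                for dx in range(-search, search + 1)
            )
    mse = total_sse / (H * W)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```

With `search=0` the function degenerates to plain block PSNR, which makes the shift-tolerance of the metric easy to compare directly.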
[0039] We choose this metric over the shift-invariant extension of SSIM [3] due to the latter's inability to handle localized errors. For example, with a sizable impaired region that is 10% of picture width and height, 99% of the image would have no distortion, giving a score over 0.99 out of 1, regardless of how distorted that region is. In contrast, squared-error based measures such as MC-PSNR can better represent large errors with small spatial support. Nevertheless, the purpose of the target metric is for benchmarking only. Other choices may be equally appropriate.
[0040] Given a sent picture S and a received picture R, our goal is to design a hint h(S) whose compressed version h*(S) can be used in place of S at the receiver for estimating a shift-invariant distortion D. A picture is displayed if the computed distortion D is smaller than a threshold T, referring to Equation 1 of Figure 4. In practice, it is common to compute h(R), and determine D simply as the mean square error between h*(S) and h(R). Thus, we have: display if Equation 2 holds, which is the approach we adopt in this disclosure. The procedures are outlined in Figure 3. We next describe two variants of h that can be used with (1) to achieve shift-invariance.
[0041] A well-established but shift-sensitive quality metric is PSNR. One way to reduce such sensitivity is through picture downscaling. Specifically, instead of the PSNR between S and R, we seek to determine the PSNR between their respective downscaled versions s and r. While a receiver can compute r, sending s as a hint represents an impractical overhead. Instead, we employ the technique of [1] as follows to achieve low-rate computation of the PSNR between r and s.
[0042] In this example, the method uses the relationship in Equation 3, where J240+(x) is a vector obtained by first performing a pixel-wise multiplication of image x with a pseudo-random sequence of ±1, followed by a Walsh-Hadamard transform (WHT), then pixel-wise multiplication with a second pseudo-random sequence of ±1, followed by an inverse WHT, and then sampling. This method in [1] is denoted by J240+ since it is an extension of the methods in ITU J.240 to handle localized errors due to losses.
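A rough sketch of this projection is shown below; the actual J.240 pseudo-random sequences and sampling pattern are standardized, so the seeded NumPy sequences and uniform subsampling used here are placeholders only:

```python
import numpy as np

def fwht(v):
    """Iterative fast Walsh-Hadamard transform (unnormalized).
    Applying it twice multiplies the input by len(v)."""
    a = v.copy()
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def j240_plus(x, n_coeffs, seed=0):
    """Sketch of a J240+-style hint: multiply the flattened image by a
    pseudo-random +/-1 sequence, apply the WHT, multiply by a second
    +/-1 sequence, apply the inverse WHT, then subsample n_coeffs values.
    The seed stands in for sequences shared by sender and receiver."""
    v = np.asarray(x, float).ravel()
    n = len(v)
    assert n & (n - 1) == 0, "length must be a power of two for the WHT"
    rng = np.random.default_rng(seed)
    p1 = rng.choice([-1.0, 1.0], n)
    p2 = rng.choice([-1.0, 1.0], n)
    t = fwht(p1 * v)
    u = fwht(p2 * t) / n        # the WHT is its own inverse up to scaling by n
    step = n // n_coeffs
    return u[::step][:n_coeffs]
```

Because each stage is orthogonal up to scaling, the full-length projection preserves signal energy, which is what lets the MSE between subsampled hints estimate the MSE between the pictures.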
[0043] For the 720p pictures in the results section, s and r are downscaled by 8 in both dimensions from S and R, respectively, and the hint employs 80 coefficients. We refer to this scheme as ResizePSNR. For comparison purposes, we also employ the similar scheme FullPSNR but without resizing, i.e., h(S) = J240+(S).
[0044] Another method to realize shift invariance is to explicitly discard high-frequency components of the signals as in Equation 4, see Figure 4, where DCT is the 2-D discrete cosine transform, and Select retains a small set of low-frequency coefficients. In the results section, 80 low-frequency DCT coefficients are used as the hint, and this scheme is denoted by SelectDCT. Both ResizePSNR and SelectDCT perform low-pass filtering. The key difference is that ResizePSNR captures more high frequencies but with less precision due to sampling.
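The SelectDCT hint of Equation 4 might be sketched as follows; the disclosure does not fix the exact low-frequency selection pattern, so the square corner block used here is an assumption:

```python
import numpy as np
from scipy.fft import dctn

def select_dct_hint(picture, n_coeffs=80):
    """SelectDCT-style hint: 2-D DCT of the picture, keeping only a small
    set of low-frequency coefficients. Here Select is realized as a
    row-major scan of the top-left (lowest-frequency) corner block."""
    c = dctn(np.asarray(picture, float), norm="ortho")
    k = int(np.ceil(np.sqrt(n_coeffs)))       # smallest corner covering n_coeffs
    return c[:k, :k].ravel()[:n_coeffs]
```

Since only low frequencies survive, a small spatial shift, which mostly perturbs high frequencies, changes this hint far less than it changes the raw pixels.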
[0045] So far, we have discussed how distortions can be computed for each frame independently. Nevertheless, we know that freezing 20 consecutive pictures is disruptive to viewing, while freezing the same number of pictures every other frame yields acceptable quality. Clearly we should adapt the threshold in (1) based on past frame freezing decisions.
[0046] It has been shown in [5] that viewer mean opinion score (MOS) can be accurately modeled by the number of freeze episodes and their durations in the last 10 seconds. Following their findings, we simplify their proposed model to avoid per-sequence training for fitting MOS. For a picture freeze of duration τ, we compute the degradation e(τ) as in Equation 5.
[0047] We propose to change the threshold T for each video frame based on the total quality degradation as in Equation 6, where the summation is over all freeze episodes, each with duration τi, in the last 10 seconds, and c1 and c2 are positive constants: c1 represents the threshold used to judge each frame independently, and c2 controls how much we increase the threshold to freeze fewer frames in case of burst errors. In the results section, c1 and c2 are empirically chosen to be 10 and 1, respectively. We call the rule given by Equations 5 and 6 the History-based Perception model.
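One plausible reading of Equations 5 and 6 is sketched below; the exact degradation function e(τ) from [5] is replaced by a placeholder, and the episode bookkeeping is our own:

```python
def adaptive_threshold(freeze_episodes, now, c1=10.0, c2=1.0, window=10.0,
                       degradation=lambda tau: tau):
    """History-based Perception model sketch: T = c1 + c2 * sum of e(tau_i)
    over all freeze episodes that ended within the last `window` seconds.
    Raising T lets more frames be displayed, freezing fewer frames during
    burst errors. `freeze_episodes` is a list of (end_time, duration) pairs;
    `degradation` stands in for the simplified e(tau) of Equation 5."""
    recent = [tau for (end_time, tau) in freeze_episodes
              if now - end_time <= window]
    return c1 + c2 * sum(degradation(tau) for tau in recent)
```

With no recent freezes the rule reduces to the constant per-frame threshold c1; each recent episode raises T for the next 10 seconds.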
[0048] We next present results for two 30 fps, 720p sequences, Shields and Conference, with distinctly different characteristics. Shields exhibits slow and predominantly panning motion, yielding concealment artifacts that fit our shift-invariant assumption. In contrast, Conference contains a stationary background and human subjects with complex motion. This yields many concealment artifacts that differ from small shifts. Conference is cropped from a 1080p source from the Federal University of Rio de Janeiro. Both sequences contain 300 frames, and are repeated in a loop 50 times to obtain the results in this section.
[0049] Figure 5 shows a segment of the PSNR trace when Shields, encoded using H.264, is subjected to a simulated packet loss ratio of 0.5% with an average burst length of 3, and a round-trip time (RTT) of 200 ms. As shown, the PSNR trace for Shields shows extreme preference reversal between pictures A and B by PSNR and our target shift-invariant metric MC-PSNR. Losses are generally corrected within one RTT using reference picture selection. All loss-impaired pictures are marked in cyan, with pictures A and B having the highest and lowest PSNR, respectively. This means freezing decisions made using PSNR would likely display A but not B. The freezing decisions achieved using our shift-invariant target measure are marked, where B is displayed but not A, indicating that extreme preference reversal is possible. In other words, using PSNR to guide freezing decisions is likely to unnecessarily omit good pictures while inadvertently displaying pictures with breakup.
[0050] We next characterize the decision quality of the proposed shift-invariant hints ResizePSNR and SelectDCT with respect to our target shift-invariant metric MC-PSNR, which has access to the sent picture. Using simulation with the same loss and delay as before, we first examine all loss-impaired pictures to find the set of bad pictures that should not be shown according to our target MC-PSNR metric. There is no optimal bad set, since it depends on various factors such as viewer preference and viewing distance. Instead, for the sake of comparison, we choose the worst 10% as the bad set. The bad set of MC-PSNR is generally different from the bad set of other metrics. We then examine the precision and recall achieved by different hints as the freeze decision threshold in (1) is varied. Precision of a hint is the percentage of pictures in its bad set that is also in the bad set of MC-PSNR, and recall is the detection percentage of the bad set of MC-PSNR. The temporal threshold adjustment is not applied, since we are only interested in how well the different hints approximate our target MC-PSNR metric.
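The precision/recall computation can be sketched as follows; unlike the evaluation above, which sweeps the threshold in (1), this simplified sketch fixes both bad sets at the worst 10%:

```python
import numpy as np

def precision_recall(hint_scores, target_scores, bad_fraction=0.10):
    """Compare a hint's bad set against the target metric's bad set.
    Lower score means worse quality; the worst `bad_fraction` of pictures
    under each metric forms its bad set. Precision is the fraction of the
    hint's bad set that is also bad under the target; recall is the
    fraction of the target's bad set that the hint detects."""
    n_bad = max(1, int(len(target_scores) * bad_fraction))
    hint_bad = set(np.argsort(hint_scores)[:n_bad])
    target_bad = set(np.argsort(target_scores)[:n_bad])
    hits = len(hint_bad & target_bad)
    return hits / len(hint_bad), hits / len(target_bad)
```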
[0051] The results for Shields are shown in Figure 6. Precision and recall for the frame freezing decisions have been generated by various hints with respect to our target metric MC-PSNR. We see that ResizePSNR performs marginally better than SelectDCT, with both shift-invariant hints significantly outperforming FullPSNR, which estimates full-frame PSNR. Since imperfections in the loss-impaired pictures of Shields are dominated by small shifts, this shows that both shift-invariant hints are effective in approximating MC-PSNR while the PSNR-based metric fails. Specifically, at 90% recall, i.e., when we are willing to accept a 10% non-detection rate of bad pictures, FullPSNR has a precision of 36%, meaning that for every bad frame detected, 1.77 good frames are inadvertently misclassified. In contrast, the corresponding misclassification rate for ResizePSNR is 0.32 frames, a reduction of 82%.

[0052] For Conference, we can see that ResizePSNR again outperforms SelectDCT, indicating that the latter is too aggressive in discounting higher frequencies. More interestingly, unlike in Shields, FullPSNR, which estimates full-frame PSNR, outperforms both ResizePSNR and SelectDCT at higher recalls above 85%. This is explained by the complex motions in Conference, which yield imperfections that are picture breakups rather than shifts in the worst impaired pictures. For those worst pictures, motion compensation is unlikely to help and MC-PSNR is essentially simple PSNR. In contrast, by deemphasizing higher frequencies, both ResizePSNR and SelectDCT introduce more deviation from MC-PSNR.
[0053] Our shift-invariant hints yield good agreement with MC-PSNR for translational motions, but PSNR yields better agreement under complex motions. It is natural to seek hybrid hints that combine shift-invariance and full-frame PSNR. There are many possible hybrids, and one simple choice is given by CombinePSNR, which spends half its hint bit-budget on a ResizePSNR hint, and the other half on a FullPSNR hint. The resulting estimated MSE for CombinePSNR is simply taken as the geometric mean of the two noisier MSEs according to ResizePSNR and FullPSNR. The results are shown in Figure 7. For Shields, its performance remains close to ResizePSNR, but generally lies between ResizePSNR and FullPSNR. For Conference, however, CombinePSNR significantly outperforms both ResizePSNR and FullPSNR. This suggests that a shift-invariant component in the hint can significantly improve agreement with the target MC-PSNR even for content without significant panning.
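The geometric-mean combination has a convenient property: in PSNR terms it is the arithmetic mean of the two constituent PSNR values, which halves independent ranking noise. A sketch (helper names are our own):

```python
import math

def combine_psnr_mse(mse_resize, mse_full):
    """CombinePSNR's MSE estimate: the geometric mean of the two noisy
    MSE estimates from the ResizePSNR and FullPSNR halves of the hint."""
    return math.sqrt(mse_resize * mse_full)

def mse_to_psnr(mse, peak=255.0):
    """Convert an MSE estimate to PSNR in dB."""
    return 10 * math.log10(peak * peak / mse)
```

Since log(sqrt(ab)) = (log a + log b) / 2, the PSNR of the combined estimate equals the average of the two constituent PSNRs exactly.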
[0054] It is perhaps surprising that the hybrid scheme CombinePSNR outperforms both its constituents ResizePSNR and FullPSNR for Conference. This phenomenon is best explained using Figure 9, where the PSNR between the sent and loss-impaired received pictures is shown sorted in descending MC-PSNR for various hints. A hint in perfect agreement with MC-PSNR would rank the loss-impaired pictures in the exact same order, yielding a monotonically decreasing curve. We already explained why FullPSNR is in agreement with MC-PSNR for the worst pictures in Conference. This is shown by the near-monotonic behavior of the FullPSNR curve in Figure 9 beyond rank 1400. Nevertheless, the significant variation between ranks 600 and 1400 causes a general inability to distinguish the good from the bad. Variations in the same range are also present for the ResizePSNR curve, but to a much lower degree. More importantly, the variations in ResizePSNR and FullPSNR are likely to be independent. Since the MSE of CombinePSNR is formed by the geometric mean of its constituents' MSEs, its PSNR is their average PSNR, which will show smaller variation. As a result, the CombinePSNR curve is more monotonic than either ResizePSNR or FullPSNR. The corresponding results for Shields are shown in Figure 8.
[0055] Figure 10 presents the results of the adaptive temporal adjustment of the threshold T for frequent errors. Shown is an adaptive temporal adjustment of T: (a) PSNR trace, (b) adaptive threshold values, and (c) frame freezing decisions with constant and adaptive thresholds (high: freeze, low: display). Figure 10 shows a PSNR trace for Conference with loss-impaired pictures marked in bold. The temporal evolution of T is shown, where we see that rises in the threshold last at least 10 seconds. The frame freezing decisions of ResizePSNR using a constant threshold (by setting c2 = 0 in Equation 6) and an adaptive threshold are shown in Figure 10. We see that the use of the adaptive threshold successfully suppresses close clusters of freeze episodes and reduces the duration of some freeze episodes.
[0056] Finally, we discuss how a negligible bit-rate overhead of 1% is achieved for sending the hints. We encode the video sequences at 2 Mbps, 1% of which is 667 bits per frame. With 80 coefficients per frame, it suffices to quantize each coefficient to 8 bits without further entropy coding. In this paper, we use only 7 bits per coefficient to leave room for headers for the J240+ based hints. For SelectDCT, different numbers of bits are used for different frequency components according to their dynamic ranges. The overhead can be further decreased by entropy coding, e.g., by using distributed source coding ideas as in [6] to exploit the correlation between h(R) and h*(S).
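The budget arithmetic above can be checked directly, using the numbers from this paragraph:

```python
# Hint overhead budget for 2 Mbps, 30 fps video.
bitrate_bps = 2_000_000
fps = 30
budget_bits_per_frame = 0.01 * bitrate_bps / fps   # 1% overhead, ~667 bits/frame

coeffs = 80
assert coeffs * 8 <= budget_bits_per_frame         # 8-bit quantization fits: 640 bits
bits_used = coeffs * 7                             # 7 bits/coeff leaves header room
header_room = budget_bits_per_frame - bits_used
print(round(budget_bits_per_frame), bits_used, round(header_room))
```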
[0057] For the purpose of determining frame freeze decisions, we show that a shift-invariant metric such as MC-PSNR can be approximated in a reduced-reference framework by resizing the picture or dropping DCT coefficients. For a panning sequence, these methods produce superior precision and recall compared to PSNR-based methods. For a sequence without significant panning, we show that these schemes can be combined with PSNR measures to improve precision and recall with respect to MC-PSNR. We also propose an adaptive thresholding technique to account for a viewer's increased dissatisfaction when freeze episodes are clustered.
[0058] By allowing selective freezing of only the visually unpleasant pictures, our technique significantly improves the visual quality of the resulting video at less than a 1% increase in bit-rate.
REFERENCES
[0059] [1] R. Kawada, O. Sugimoto, A. Koike, M. Wada, and S. Matsumoto, "Highly precise estimation scheme for remote video PSNR using spread spectrum and extraction of orthogonal transform coefficients," Electronics and Communications in Japan, Part 1, vol. 89, no. 6, 2006.
[0060] [2] Framework for remote monitoring of transmitted picture signal-to-noise ratio using spread-spectrum and orthogonal transform, ITU-T Recommendation J.240, 2004.
[0061] [3] M. P. Sampat, Z. Wang, S. Gupta, A. C. Bovik, and M. K. Markey, "Complex wavelet structural similarity: A new image similarity index," IEEE Trans. on Image Processing, vol. 18, no. 11, pp. 2385-2401, Nov. 2009.
[0062] [4] A. Rehman and Z. Wang, "Reduced-reference SSIM estimation," Proc. International Conference on Image Processing (ICIP), Hong Kong, Sept. 2010.
[0063] [5] R. R. Pastrana-Vidal, J. C. Gicquel, C. Colomes, and H. Cherifi, "Frame dropping effects on user quality perception," Proc. 5th International Workshop on Image Analysis for Multimedia Interactive Services, Lisbon, Portugal, April 2004.
[0064] [6] K. Chono, Y. C. Lin, D. Varodayan, Y. Miyamoto, and B. Girod, "Reduced-reference image quality estimation using distributed source coding," Proc. International Conference on Multimedia and Expo, Hannover, Germany, June 2008.
[0065 ] It should be understood that the description recited above is an example of the disclosure and that modifications and changes to the examples may be undertaken which are within the scope of the claimed disclosure. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements, including a full scope of equivalents.

Claims

In the Claims:
1. A method for processing streaming video information, the method comprising:
streaming first video information comprising a plurality of video frames, the video information including at least a portion of a video frame;
processing the streaming video information, using a decoder device, frame by frame;
determining a quality measure for the portion of video frame;
storing information associated with the quality measure;
determining a display decision for the portion of video frame using a function of the quality measure and stored information associated with one or more quality measures of prior video frames;
storing at least the display decision of the portion of video frame; and outputting the portion of video frame according to the display decision.
2. The method of claim 1 wherein the portion of a video frame is a video frame.
3. The method of claim 1 wherein the function is a history-based perception model.
4. The method of claim 1 wherein the quality measure is obtained using a reduced reference method, the reduced reference method being selected from at least one of Resize-PSNR or Select-DCT.
5. The method of claim 1 wherein the display decision is a binary decision to show or hide the particular portion of a video frame.
6. The method of claim 1 wherein the information associated with a quality measure is a corresponding display decision for a respective prior video frame.
7. A method for processing streaming video information, the method comprising:
streaming first video information, the video information including a plurality of video frames;
processing the streaming video information, frame by frame;
determining a quality measure of a first frame;
determining a threshold for a second frame using a function of the quality measure and at least one prior quality measure;
performing a binary decision to show or hide the second frame using the threshold information; and
outputting second video information.
8. The method of claim 7 wherein the threshold information is the binary decision to hide or show one or more of the earlier frames.
9. The method of claim 7 wherein the stream of video information is raw video; and wherein the function is one of a perception model.
10. The method of claim 7 wherein the determining of the quality measure is performed using a reduced reference method, the reduced reference method being from at least one of Resize-PSNR or Select-DCT.
11. The method of claim 7 wherein the second video information is provided in one of a plurality of selected applications, the selected applications including video conferencing, video outputting, and broadcasting of live events.
12. The method of claim 7 wherein the first frame is a fraction of a frame.
13. A digital video processing apparatus comprising: a quality measure module, the quality measure module configured to produce a quality measure Qi for each frame i, the quality measure being one of a reduced or no reference method or a binary indicator of whether a frame is loss impaired;
a compare module coupled to the quality measure module, the compare module being configured to compare the quality measure Qi to threshold information Ti, frame by frame, and configured to decide whether to show a frame if Qi > Ti, or hide the frame; and
whereupon the decision is Fi and Ti is adjusted based on past decisions F(i-1), F(i-2), ..., F(i-N).
14. The apparatus of claim 13 wherein the threshold Ti is a function of measurable information, the measurable information being at least one of a number of frame freeze episodes within a past ten second interval or a duration of each of the freeze episodes.
15. The apparatus of claim 13 wherein the decision whether to hide the frame is provided by freezing an earlier frame.

