WO2013159275A1 - Perceived video quality estimation considering visual attention - Google Patents

Perceived video quality estimation considering visual attention

Info

Publication number
WO2013159275A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
fixation point
video
quality
determining
Prior art date
Application number
PCT/CN2012/074527
Other languages
English (en)
Inventor
Xiaodong Gu
Debing Liu
Zhibo Chen
Original Assignee
Technicolor (China) Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technicolor (China) Technology Co., Ltd. filed Critical Technicolor (China) Technology Co., Ltd.
Priority to PCT/CN2012/074527 priority Critical patent/WO2013159275A1/fr
Publication of WO2013159275A1 publication Critical patent/WO2013159275A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion

Definitions

  • TECHNICAL FIELD This invention relates to video quality measurement, and more particularly, to a method and apparatus for estimating perceived video quality in response to visual attention.
  • Video quality losses may be caused by various events, for example, by lossy compression and transmission errors, and they may be perceived by human eyes as various types of visual artifacts. For example, blockiness, ringing, and blurriness are typical artifacts caused by lossy compression.
  • a decoder may apply error concealment in order to reduce the strength of channel artifacts.
  • the present principles provide a method for estimating video quality of a video sequence, comprising the steps of: accessing a block of a picture in the video sequence; determining a fixation point in the video sequence; determining a vision sensitivity factor for the block in response to a distance between the fixation point and the block; adjusting a quality metric for the block in response to the vision sensitivity factor; and determining the video quality for the video sequence in response to the adjusted quality metric as described below.
  • the present principles also provide an apparatus for performing these steps.
  • the present principles also provide a method for estimating video quality of a video sequence, comprising the steps of: accessing a block of a picture in the video sequence; determining a fixation point in the video sequence; determining a vision sensitivity factor for the block in response to a distance between the fixation point and the block, wherein the vision sensitivity factor for the block decreases if the distance between the fixation point and the block increases; weighting a quality metric for the block using the vision sensitivity factor to adjust the quality metric for the block, wherein the quality metric for the block measures strength of artifacts at the block, the artifacts being caused by at least one of compression, transmission errors, and error propagation; and determining the video quality for the video sequence in response to the adjusted quality metric as described below.
  • the present principles also provide an apparatus for performing these steps.
  • the present principles also provide a computer readable storage medium having stored thereon instructions for estimating video quality of a video sequence according to the methods described above.
  • FIGs. 1A and 1B are pictorial examples depicting pictures with channel artifacts.
  • FIGs. 2A and 2B are pictorial examples depicting pictures with channel artifacts at different locations.
  • FIG. 3A is a pictorial example depicting photoreceptor and ganglion cell distribution in the human retina.
  • FIG. 3B is a pictorial example depicting the relation between contrast sensitivity and distance.
  • FIG. 3C is a pictorial example depicting a possible approximation of the relation between vision sensitivity and distance, in accordance with an embodiment of the present principles.
  • FIG. 4 is a pictorial example depicting a picture with 99 macroblocks.
  • FIG. 5 is a flow diagram depicting an example for calculating a video quality metric, in accordance with an embodiment of the present principles.
  • FIG. 6 is a block diagram depicting an example of a video quality monitor.
  • FIG. 7 is a block diagram depicting an example of a video processing system that may be used with one or more implementations of the present principles.
  • a decoder may adopt error concealment techniques to conceal the lost portions.
  • the goal of error concealment is to estimate missing portions in order to minimize perceptual quality degradation.
  • the perceived strength of artifacts produced by transmission errors depends heavily on the employed error concealment techniques.
  • A spatial approach or a temporal approach may be used for error concealment; for example, a temporal approach may estimate motion vectors (MVs) for the lost blocks.
  • Visual artifacts may still be perceived after error concealment. We denote visual artifacts caused directly by transmission error or loss as initial visible artifacts. If a block having artifacts is used as a reference, for example, for intra prediction or inter prediction, the initial visible artifacts may propagate spatially or temporally to other macroblocks in the same or other pictures through prediction. Such propagated artifacts are denoted as propagated visible artifacts.
  • FIG. 1A illustrates an exemplary decoded picture with initial visible artifact.
  • the middle portion of the picture is lost and is reconstructed by a spatial error concealment technique.
  • the strength of the artifact is directly related to the error concealment algorithm.
  • FIG. 1B illustrates an exemplary decoded picture with propagated visible artifacts. Although the data representing this picture is correctly received, artifacts can be seen in various locations in the picture because they have propagated from earlier errors.
  • When viewing a picture, human eyes move between fixation points (also referred to as focusing points) through quick movements known as saccades, typically without conscious planning. The eye can move in other ways, such as smoothly pursuing a moving object. Using fixation points and saccades, the visual attention regions of the picture can be detected.
  • When a viewer watches a video, he/she may only be able to move between a few fixation points in a picture. Since a shot in a video is usually short, for example, less than 5 seconds, the viewer may only be able to move between a few fixation points in the shot.
  • The vision sensitivity diminishes quickly toward the periphery; for example, artifacts that are far away from the fixation point may not be identified, and content details far away from the fixation point may not be noticed. Note that a viewer may focus at a point of a picture because the content at the focusing point appears interesting or because the artifact at the focusing point is so prominent that it catches the viewer's attention immediately.
  • FIGs. 2A and 2B show two exemplary pictures with channel artifacts.
  • In FIG. 2A, channel artifacts occur at the bottom-left of the picture (around 210), while in FIG. 2B channel artifacts occur at the center of the picture (below the skier, around 220).
  • When the channel artifacts are viewed directly, i.e., when the fixation point is around 210 or 220, they have similar levels of artifact strength in both FIGs. 2A and 2B.
  • We denote the perceived artifact strength when the artifact is focused at as the objective artifact strength.
  • Thus, the objective artifact strengths around 210 and 220 are similar, while the perceived artifact strengths may differ significantly.
  • In an early vision system, the light first passes through the optics of the eye and is then sampled by the photoreceptors (cones and rods) on the retina.
  • the cone receptor distribution is highly non-uniform.
  • the photoreceptors deliver data to the bipolar cells, which in turn supply information to the ganglion cells, which also have a highly non-uniform distribution.
  • the variation of the densities of photoreceptors and ganglion cells with eccentricity is shown in FIG. 3A.
  • the densities of cones and ganglion cells play important roles in determining the ability of our eyes to resolve what we see.
  • Psycho-visual experiments have been conducted to measure the contrast sensitivity as a function of retinal eccentricity. By fitting experimental data, a visible contrast threshold function CT(f, c) can be formulated as:
  • CT(f, c) = T0 × exp(a · f · (c + c2) / c2), where f is the spatial frequency (cycles/degree), c is the retinal eccentricity, T0 is a constant, a is the spatial frequency decay constant, and c2 is the half-resolution eccentricity.
  • The contrast sensitivity is then defined as the reciprocal of the contrast threshold.
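As a rough illustration, the threshold function above can be sketched in Python. The parameter defaults below (T0, a, c2) are illustrative values from the foveation literature, not values stated in this application:

```python
import math

def contrast_threshold(f, c, T0=1.0 / 64, a=0.106, c2=2.3):
    """Visible contrast threshold CT(f, c) = T0 * exp(a * f * (c + c2) / c2).

    f  : spatial frequency (cycles/degree)
    c  : retinal eccentricity (degrees)
    The default constants are illustrative, not taken from this application.
    """
    return T0 * math.exp(a * f * (c + c2) / c2)

def contrast_sensitivity(f, c, **kw):
    # The contrast sensitivity is the reciprocal of the contrast threshold,
    # so it is maximal at the fixation point (eccentricity c = 0).
    return 1.0 / contrast_threshold(f, c, **kw)
```

This reproduces the shape of the curve in FIG. 3B: the sensitivity is maximal at zero eccentricity and decays as the eccentricity grows.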
  • FIG. 3B illustrates an exemplary curve for the contrast sensitivity, which reaches its maximum at the eye fixation point.
  • For a given spatial frequency, the contrast sensitivity is mainly determined by the retinal eccentricity, which depends on a distance between a fixation point and an object, especially when the viewing distance is fixed.
  • the present principles provide a method and apparatus for estimating perceived video quality.
  • Most existing video compression standards, for example, H.264 and MPEG-2, use a macroblock (MB) as the basic encoding unit.
  • Consequently, the following embodiments use a macroblock as the basic processing unit.
  • The principles may be adapted to use a block of a different size, for example, an 8x8 block, a 16x8 block, a 32x32 block, or a 64x64 block.
  • Objective artifact strength may be evaluated using some existing methods, for example, the methods described in a commonly owned PCT application, entitled “Method and device for estimating video quality on bitstream level" by N. Liao, X. Gu, Z. Chen, and K. Xie (PCT/CN2011/000832, Attorney Docket No. PA110009). Given the objective artifact strength, the perceived artifact strength may be estimated using the properties of the human visual system.
  • Within a picture, we denote by d_i,j the distance between macroblocks B_i and B_j, and by g(d_i,j) the vision sensitivity at macroblock B_i when the fixation point is at macroblock B_j.
  • g(d_i,j) may be described as an exponential function as depicted in FIG. 3B, or the function g(d_i,j) can be simplified to a linear function as depicted in FIG. 3C.
  • FIG. 4 illustrates an exemplary picture (with 99 macroblocks) to be evaluated, where the upper-left MB 410 is denoted as the first macroblock of the picture, and where a fixation point may be at the center macroblock 420.
  • the macroblocks in the picture can be indexed according to a raster scanning order. The distance between two macroblocks may be measured as the distance between centers of these two macroblocks.
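Assuming an 11x9 grid of 16x16 macroblocks (which yields the 99 macroblocks of FIG. 4), the raster-scan indexing and the center-to-center distance might be computed as follows; the function names and layout constants are our illustrative choices, not the application's:

```python
import math

MB_SIZE = 16          # macroblock width/height in pixels
MBS_PER_ROW = 11      # e.g. a 176x144 picture has 11x9 = 99 macroblocks

def mb_center(index, mbs_per_row=MBS_PER_ROW, mb_size=MB_SIZE):
    """Center pixel coordinates of the macroblock at a raster-scan index."""
    row, col = divmod(index, mbs_per_row)
    return (col * mb_size + mb_size / 2, row * mb_size + mb_size / 2)

def mb_distance(i, j, **kw):
    """Euclidean distance between the centers of macroblocks i and j."""
    (xi, yi), (xj, yj) = mb_center(i, **kw), mb_center(j, **kw)
    return math.hypot(xi - xj, yi - yj)
```

With this layout, the center macroblock 420 of FIG. 4 sits at raster index 49 (row 4, column 5 of the 11x9 grid).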
  • The vision sensitivity at macroblock B_i may be described as a function of p_j and g(d_i,j):
  • Embodiment 1 In the first exemplary embodiment, we assume that the center of the picture is the only fixation point. Mathematically, this can be described as:
  • where B_i is the center macroblock.
  • In practice, the fixation point may be around the center macroblock, but offset by several macroblocks.
  • Under this assumption, equation (4) becomes:
  • where a_1 is a constant number.
  • The value of a_1 can be derived from the contrast threshold equation and FIGs. 3B and 3C, or it can be determined by experiments or trained on subjective data.
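A minimal sketch of this embodiment, assuming the linear sensitivity function of FIG. 3C clamped at zero; the value of a1 below is an arbitrary placeholder, since the text says a1 is to be derived or trained:

```python
def perceived_strength(objective_strength, d_to_center, a1=0.01):
    """Weight a macroblock's objective artifact strength by a linear
    vision-sensitivity factor that decays with d_to_center, the distance
    (in pixels) from the macroblock to the fixation point at the picture
    center.  a1 = 0.01 is an illustrative placeholder, not a value from
    the application.
    """
    sensitivity = max(0.0, 1.0 - a1 * d_to_center)
    return objective_strength * sensitivity
```

At the fixation point the perceived strength equals the objective strength; far enough away, the sensitivity clamps to zero and the artifact is predicted to go unnoticed.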
  • Embodiment 2 In the second exemplary embodiment, the perceived artifact strength for macroblock B_i may be calculated as:
  • where d_i,attr is the distance between the most attractive point B_attr and macroblock B_i, d_max,attr is the maximal distance between B_attr and the four corners of the picture, and, similar to a_1 in equation (5), a_2 is a constant number.
  • Embodiment 3 In the third exemplary embodiment, we assume there are more than one fixation point in a picture (i.e., m > 1), and we define these fixation points as B_attr,1, ..., B_attr,m. Having multiple fixation points is more likely to occur when the viewer has more time to watch a picture, for example, when the viewer plays the video slower than real time or a shot is long and static.
  • Those m fixation points may be determined using existing methods; for example, they may correspond to the m macroblocks with the highest salience values according to the salience model described in Gu.
  • The perceived artifact strength for macroblock B_i may then be calculated as:
  • where the perceived artifact strength is controlled by all fixation points through a weighted average.
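The weighted average over m fixation points might be sketched as follows; the linear sensitivity and the assumption that the weights are normalized (e.g. normalized salience values) are ours, not details stated in the text:

```python
def perceived_strength_multi(objective_strength, distances, weights, a2=0.01):
    """Combine per-fixation-point sensitivities by a weighted average.

    distances : distance from the macroblock to each of the m fixation points
    weights   : per-fixation-point weights, assumed to sum to 1
    a2        : illustrative placeholder constant, as in the single-point case
    """
    assert len(distances) == len(weights) and abs(sum(weights) - 1.0) < 1e-9
    sensitivity = sum(w * max(0.0, 1.0 - a2 * d)
                      for d, w in zip(distances, weights))
    return objective_strength * sensitivity
```

With m = 1 and weight 1.0 this reduces to the single-fixation-point embodiments above.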
  • The overall video quality metric can be predicted using an existing pooling strategy, for example, the pooling strategy disclosed in a commonly owned PCT application, entitled “Video quality measurement considering multiple artifacts" by X. Gu, D. Liu, and Z. Chen (PCT/CN2011/083020, Attorney Docket No. PA110071).
  • FIG. 5 illustrates an exemplary method 500 for estimating a perceived video quality metric considering the property of the human visual system.
  • It determines one or more fixation points in a video that a viewer may focus on.
  • the objective artifact strength for an individual image block is estimated.
  • perceived artifact strength is determined for the image block at step 530, for example, using equation (5), (6), (7), or (8).
  • it checks whether more blocks need to be processed. If more blocks are to be processed, the control returns to step 520. Otherwise, an overall quality metric is estimated using a pooling strategy at step 550, based on the perceived artifact strength for individual blocks.
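Putting steps 510 to 550 together, a simplified end-to-end sketch, with plain averaging standing in for the pooling strategy the text cites and a nearest-fixation-point rule with a linear sensitivity standing in for equations (5) to (8):

```python
import math

def video_quality(pictures, fixation_points, a1=0.01):
    """Sketch of method 500.

    pictures: list of pictures, each a list of ((x, y), objective_strength)
              tuples giving a block's center and its objective artifact strength
    fixation_points: list of (x, y) fixation coordinates (step 510)
    Returns a pooled perceived artifact strength (step 550); a higher value
    means stronger perceived artifacts, i.e., lower quality.  a1 = 0.01 is an
    illustrative placeholder constant.
    """
    perceived = []
    for blocks in pictures:
        for (x, y), strength in blocks:          # steps 520-530, per block
            d = min(math.hypot(x - fx, y - fy) for fx, fy in fixation_points)
            perceived.append(strength * max(0.0, 1.0 - a1 * d))
    return sum(perceived) / len(perceived)       # step 550: simple pooling
```

Blocks far from every fixation point contribute little to the pooled metric, matching the intuition of FIGs. 2A and 2B.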
  • FIG. 6 depicts a block diagram of an exemplary video quality monitor 600.
  • the input of apparatus 600 may include a decoded video or a reconstructed video from an encoder.
  • Artifact detector 610 estimates objective artifact strength for individual blocks. For example, it may detect compression artifacts, channel artifacts, or other types of visual artifacts.
  • Fixation point detector 620 detects where a viewer will focus in the picture. For example, the fixation point may be estimated to be at the center of the picture, at the point with the highest salience value, or it may be tracked with an eye tracking system.
  • a quality predictor 630 estimates an overall perceived video quality metric, for example, using method 500.
  • a video transmission system or apparatus 700 is shown, to which the features and principles described above may be applied.
  • a processor 705 processes the video and the encoder 710 encodes the video.
  • the bitstream generated from the encoder is transmitted to a decoder 730 through a distribution network 720.
  • a video quality monitor or a video quality measurement apparatus, for example, the apparatus 600, may be used at different stages.
  • a video quality monitor 740 may be used by a content creator.
  • the estimated video quality may be used by an encoder in deciding encoding parameters, such as mode decision or bit rate allocation.
  • The content creator uses the video quality monitor to monitor the quality of encoded video. If the quality metric does not meet a pre-defined quality level, the content creator may choose to re-encode the video to improve the video quality. The content creator may also rank the encoded video based on the quality and charge for the content accordingly.
  • a video quality monitor 750 may be used by a content distributor.
  • a video quality monitor may be placed in the distribution network. The video quality monitor calculates the quality metrics and reports them to the content distributor. Based on the feedback from the video quality monitor, a content distributor may improve its service by adjusting bandwidth allocation and access control.
  • the content distributor may also send the feedback to the content creator to adjust encoding.
  • improving encoding quality at the encoder may not necessarily improve the quality at the decoder side since a high quality encoded video usually requires more bandwidth and leaves less bandwidth for transmission protection. Thus, to reach an optimal quality at the decoder, a balance between the encoding bitrate and the bandwidth for channel protection should be considered.
  • A video quality monitor 760 may be used by a user device. For example, when a user device searches for videos on the Internet, a search result may return many videos or many links to videos corresponding to the requested video content. The videos in the search results may have different quality levels. A video quality monitor can calculate quality metrics for these videos and decide which video to select and store.
  • the decoder estimates qualities of concealed videos with respect to different error concealment modes. Based on the estimation, an error concealment that provides a better concealment quality may be selected by the decoder.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • The appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
  • this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory. Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • Receiving is, as with “accessing”, intended to be a broad term.
  • Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry the bitstream of a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

Visual artifacts may exist in a decoded video, for example, due to lossy compression, transmission errors, and error propagation. To accurately estimate the perceived video quality of the decoded video, the properties of the human visual system are taken into account according to the invention. In particular, one or more fixation points of the viewer are estimated. Depending on the distance between a particular macroblock and the fixation points, a vision sensitivity factor is calculated for that macroblock. The objective artifact strength for the macroblock is then weighted by the vision sensitivity factor to estimate the perceived artifact strength. Using the estimated perceived artifact strengths for individual macroblocks, an overall perceived video quality metric can be estimated for the decoded video using a pooling strategy.
PCT/CN2012/074527 2012-04-23 2012-04-23 Estimation de qualité vidéo perçue en considérant l'attention visuelle WO2013159275A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/074527 WO2013159275A1 (fr) 2012-04-23 2012-04-23 Estimation de qualité vidéo perçue en considérant l'attention visuelle

Publications (1)

Publication Number Publication Date
WO2013159275A1 true WO2013159275A1 (fr) 2013-10-31

Family

ID=49482106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/074527 WO2013159275A1 (fr) 2012-04-23 2012-04-23 Estimation de qualité vidéo perçue en considérant l'attention visuelle

Country Status (1)

Country Link
WO (1) WO2013159275A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074585A (zh) * 2023-03-03 2023-05-05 乔品科技(深圳)有限公司 基于ai和注意力机制的超高清视频编解码方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1885954A (zh) * 2005-06-23 2006-12-27 华为技术有限公司 一种方块效应度量方法和视频质量评估方法
WO2007130389A2 (fr) * 2006-05-01 2007-11-15 Georgia Tech Research Corporation Procédé et système de mesure de la qualité vidéo automatique reposant sur des mesures de cohérence spatio-temporelles
CN101562758A (zh) * 2009-04-16 2009-10-21 浙江大学 基于区域权重和人眼视觉特性的图像质量客观评价方法
CN101601070A (zh) * 2006-10-10 2009-12-09 汤姆逊许可公司 用于生成画面显著度图的设备和方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARIO VRANJES ET AL.: "Influence of foveated vision on video quality perception.", 51ST INTERNATIONAL SYMPOSIUM ELMAR-2009., 30 September 2009 (2009-09-30), pages 29 - 32, XP031572262 *
ZHOU WANG ET AL.: "Wavelet-based foveated image quality measurement for region of interest image coding.", IMAGE PROCESSING, 2001. PROCEEDINGS. 2001 INTERNATIONAL CONFERENCE ON, vol. 2, 10 October 2001 (2001-10-10), pages 89 - 92, XP010563704 *

Similar Documents

Publication Publication Date Title
JP4733049B2 (ja) 映像品質客観評価装置、評価方法およびプログラム
EP2297963B1 (fr) Compression de vidéo sous de multiples contraintes de distorsion
AU2011381970B2 (en) Video quality measurement
WO2013078822A1 (fr) Masquage de texture pour la mesure d'une qualité vidéo
EP2783513A1 (fr) Évaluation de la qualité d'une vidéo en prenant en compte des artefacts consécutifs à une scène coupée
US9280813B2 (en) Blur measurement
EP2875640B1 (fr) Évaluation de la qualité d'une vidéo à l'échelle d'un train de bits
US9723301B2 (en) Method and apparatus for context-based video quality assessment
EP2954677B1 (fr) Procédé et appareil d'évaluation de qualité vidéo sur la base du contexte
WO2013159275A1 (fr) Estimation de qualité vidéo perçue en considérant l'attention visuelle
US20150170350A1 (en) Method And Apparatus For Estimating Motion Homogeneity For Video Quality Assessment
Yu et al. A perceptual quality metric based rate-quality optimization of H.265/HEVC
CN114202495A (zh) 应用于视频的图像质量损失确定方法、装置、设备及介质
CN104995914A (zh) 用于基于上下文的视频质量评估的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12875416

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12875416

Country of ref document: EP

Kind code of ref document: A1