MX2014006269A - Video quality assessment considering scene cut artifacts. - Google Patents

Video quality assessment considering scene cut artifacts.

Info

Publication number
MX2014006269A
Authority
MX
Mexico
Prior art keywords
image
scene
scene cut
candidate
images
Prior art date
Application number
MX2014006269A
Other languages
Spanish (es)
Other versions
MX339675B (en)
Inventor
Ning Liao
Fan Zhang
Zhibo Chen
Kai Xie
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing
Publication of MX2014006269A
Publication of MX339675B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/48 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
    • H04N19/895 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder in combination with error concealment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A particular implementation detects scene cut artifacts in a bitstream without reconstructing the video. A scene cut artifact is usually observed in the decoded video (1) when a scene cut picture in the original video is partially received or (2) when a picture refers to a lost scene cut picture in the original video. To detect scene cut artifacts, candidate scene cut pictures are first selected and scene cut artifact detection is then performed on the candidate pictures. When a block is determined to have a scene cut artifact, a lowest quality level is assigned to the block.

Description

VIDEO QUALITY ASSESSMENT CONSIDERING SCENE CUT ARTIFACTS Field of the Invention The present invention relates to video quality measurement, and more particularly, to a method and apparatus for determining an objective video quality metric.
Background of the Invention With the development of IP networks, video communication over wired and wireless IP networks (for example, IPTV service) has become popular. Unlike traditional video transmission over cable networks, the delivery of video over IP networks is less reliable. Consequently, in addition to the quality loss from video compression, video quality is further degraded when a video is transmitted over IP networks. A successful video quality modeling tool needs to assess the quality degradation caused by network transmission impairments (for example, packet loss, transmission delay and transmission jitter) in addition to the quality degradation produced by video compression.
Brief Description of the Invention According to a general aspect, a bitstream including encoded images is accessed, and a scene cut image in the bitstream is determined using bitstream information without decoding the bitstream to derive pixel information.
According to another general aspect, a bitstream including encoded images is accessed, and respective difference measures are determined in response to at least one of frame sizes, prediction residuals and motion vectors among a set of images of the bitstream, wherein the set of images includes at least one of a candidate scene cut image, an image that precedes the candidate scene cut image, and an image after the candidate scene cut image. The candidate scene cut image is determined to be the scene cut image if one or more of the difference measures exceeds its respective predetermined threshold.
According to another general aspect, a bitstream including encoded images is accessed. An intra image is selected as a candidate scene cut image if the compressed data for at least one block in the intra image is lost, or an image that refers to a lost image is selected as a candidate scene cut image. Respective difference measures are determined in response to at least one of frame sizes, prediction residuals and motion vectors among a set of images of the bitstream, wherein the set of images includes at least one of the candidate scene cut image, an image that precedes the candidate scene cut image, and an image after the candidate scene cut image. The candidate scene cut image is determined to be the scene cut image if one or more of the difference measures exceeds its respective predetermined threshold.
The details of one or more implementations are set forth in the accompanying drawings and in the description below. Even if described in one particular manner, it should be clear that implementations can be configured or embodied in various ways. For example, an implementation can be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations, or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Brief Description of the Drawings Figure 1A is an example image representing an image with scene cut artifacts in a scene cut frame, Figure 1B is an example image representing an image without scene cut artifacts, and Figure 1C is an example image representing an image with scene cut artifacts in a frame that is not a scene cut frame.
Figures 2A and 2B are pictorial examples depicting how scene cut artifacts relate to scene cuts, according to an embodiment of the present principles.
Figure 3 is a flowchart depicting an example of video quality modeling, according to an embodiment of the present principles.
Figure 4 is a flowchart depicting an example of scene cut artifact detection, according to an embodiment of the present principles.
Figure 5 is an example image that represents how to calculate the variable n_loss. Figures 6A and 6C are pictorial examples that represent how the variable pk_num varies with the frame index, and Figures 6B and 6D are pictorial examples that represent how the variable bytes_num varies with the frame index, according to an embodiment of the present principles.
Figure 7 is a flowchart depicting an example of determining candidate scene cut artifact locations, according to an embodiment of the present principles.
Figure 8 is an example image representing an image with 99 macroblocks.
Figures 9A and 9B are pictorial examples depicting how the surrounding frames are used for scene cut artifact detection, according to an embodiment of the present principles.
Figure 10 is a flowchart depicting an example of scene cut detection, according to an embodiment of the present principles.
Figures 11A and 11B are pictorial examples depicting how the surrounding I-frames are used for artifact detection, according to an embodiment of the present principles.
Figure 12 is a block diagram depicting an example of video quality monitoring, according to an embodiment of the present principles.
Figure 13 is a block diagram representing an example of a video processing system that can be used with one or more implementations.
Detailed Description of the Invention A video quality measurement tool can operate at different levels. In one embodiment, the tool can take the received bitstream and measure the video quality without reconstructing the video. Such a method is usually referred to as bitstream-level video quality measurement. When extra computational complexity is allowed, the video quality measurement can reconstruct some or all of the images from the bitstream and use the reconstructed images to estimate the video quality more accurately.
The present embodiments relate to objective video quality models that evaluate video quality (1) without reconstructing the videos, and (2) with partially reconstructed videos. In particular, the present principles consider a particular type of artifact that is observed around a scene cut, denoted as the scene cut artifact.
Most existing video compression standards, for example, H.264 and MPEG-2, use a macroblock (MB) as the basic encoding unit. Therefore, the following embodiments use a macroblock as the basic processing unit. However, the principles can be adapted to use blocks of different sizes, for example, an 8x8 block, a 16x8 block, a 32x32 block, and a 64x64 block.
When some portions of the encoded video stream are lost during network transmission, a decoder can adopt error concealment techniques to conceal the macroblocks that correspond to the lost portions. The objective of error concealment is to estimate the missing macroblocks in order to minimize the perceptual quality degradation. The perceived strength of the artifacts produced by transmission errors depends to a large extent on the error concealment techniques employed.
A spatial approach or a temporal approach can be used for error concealment. In a spatial approach, the spatial correlation between pixels is exploited, and the missing macroblocks are recovered by interpolation from the surrounding pixels. In a temporal approach, both the coherence of the motion field and the spatial smoothness of the pixels are exploited to estimate the motion vectors (MVs) of a lost macroblock or the MVs of its pixels, and the lost pixels are then concealed using the reference pixels in previous frames according to the estimated motion vectors.
Visual artifacts can still be perceived after error concealment. Figures 1A through 1C illustrate exemplary decoded images where some packets of the encoded bitstream are lost during transmission. In these examples, a temporal error concealment method is used to conceal the lost macroblocks at the decoder. In particular, the co-located macroblocks of a previous frame are copied to the lost macroblocks.
In Figure 1A, packet losses, for example, due to transmission errors, occur in a scene cut frame (that is, the first frame of a new scene). Because of the dramatic content change between the current frame and the previous frame (from another scene), the concealed image contains an area that comes from the previous scene. That is, this area has a texture very different from its surrounding macroblocks. Therefore, this area can easily be perceived as a visual artifact. For ease of notation, this type of artifact around a scene cut image is denoted as a scene cut artifact.
In contrast, Figure 1B illustrates another image located within a scene. Because the content lost in the current frame is similar to that of the co-located macroblocks in the previous frame, which are used to conceal the current frame, the temporal error concealment works well and visual artifacts can hardly be perceived in Figure 1B.
It should be noted that scene cut artifacts may not necessarily occur in the first frame of a scene. Instead, they can be observed in the scene cut frame or after a lost scene cut frame, as illustrated by the examples in Figures 2A and 2B.
In the example of Figure 2A, images 210 and 220 belong to different scenes. Image 210 is received correctly and image 220 is a partially received scene cut frame. The received parts of image 220 are decoded properly, while the lost parts are concealed with the co-located macroblocks of image 210. When there is a significant change between images 210 and 220, the concealed image 220 will have scene cut artifacts. Therefore, in this example, scene cut artifacts occur in the scene cut frame.
In the example of Figure 2B, images 250 and 260 belong to one scene, and images 270 and 280 belong to another scene. During compression, image 270 is used as a reference for image 280 for motion compensation. During transmission, the compressed data corresponding to images 260 and 270 are lost. To conceal the lost images at the decoder, the decoded image 250 can be copied to images 260 and 270.
The compressed data for image 280 is received correctly. However, because image 280 refers to image 270, which is now a copy of the decoded image 250 from another scene, the decoded image 280 may also have scene cut artifacts. Therefore, scene cut artifacts can occur after the scene cut frame (270), in this example, in the second frame of a scene. It should be noted that scene cut artifacts can also occur at other locations in a scene. An example image with scene cut artifacts occurring after a scene cut frame is shown in Figure 1C.
In fact, while the scene changes at image 270 in the original video, the scene may appear to change at image 280, with the scene cut artifacts, in the decoded video. Unless explicitly stated otherwise, the scene cuts in the present application refer to those observed in the original video.
In the example shown in Figure 1A, the co-located blocks (that is, MV = 0) in a previous frame are used to conceal the lost blocks in the current frame. Other temporal error concealment methods may use other motion vectors, and may operate on different processing units, for example, at an image level or at a pixel level. It should be noted that scene cut artifacts can occur around the scene cut for any temporal error concealment method.
It can be observed from the examples shown in Figures 1A and 1C that scene cut artifacts have a strong negative impact on the perceptual video quality. Therefore, in order to predict the objective video quality accurately, it is important to measure the effect of scene cut artifacts when modeling video quality.
To detect scene cut artifacts, it first needs to be detected whether a scene cut frame is not received correctly or whether a scene cut image was lost. This is a difficult problem, considering that only the bitstream can be analyzed (without reconstructing the images) when the artifacts are detected. It becomes more difficult when the compressed data corresponding to a scene cut frame are lost.
Obviously, the scene cut artifact detection problem for video quality modeling is different from the traditional scene cut detection problem, which usually works in the pixel domain and has access to the images.
An example video quality modeling method 300 that considers scene cut artifacts is shown in Figure 3. Artifacts resulting from the lost data, for example, those described in Figures 1A and 2A, are denoted as initial visible artifacts. In addition, we also classify the type of artifacts of the first received image in a scene, for example, those described in Figures 1C and 2B, as initial visible artifacts.
If a block having initial visible artifacts is used as a reference, for example, for intra prediction or inter prediction, the initial visible artifacts may be spatially or temporally propagated to other macroblocks in the same or other images through prediction. These propagated artifacts are denoted as propagated visible artifacts.
In method 300, a video bitstream is input in step 310, and the objective video quality corresponding to the bitstream is to be estimated. In step 320, an initial visible artifact level is calculated. The initial visible artifacts may include scene cut artifacts and other artifacts. The level of the initial visible artifacts can be estimated from the artifact type, the frame type, or other frame-level or MB-level features obtained from the bitstream. In one embodiment, if a scene cut artifact is detected in a macroblock, the initial visible artifact level for the macroblock is set to the highest artifact level (that is, the lowest quality level).
In step 330, a propagated artifact level is calculated. For example, if a macroblock is marked as having a scene cut artifact, the propagated artifact levels of all other pixels that refer to this macroblock may also be set to the highest artifact level. In step 340, a spatio-temporal artifact pooling algorithm can be used to convert the different types of artifacts into an objective MOS (Mean Opinion Score), which estimates the overall visual quality of the video corresponding to the input bitstream. In step 350, the estimated MOS is output.
Figure 4 illustrates an example method 400 for scene cut artifact detection. In step 410, the bitstream is parsed to determine the candidate locations for scene cut artifacts. After the candidate locations are determined, whether scene cut artifacts exist at a candidate location is determined in step 420.
It should be noted that step 420 alone can be used for bitstream-level scene cut detection, for example, in the case of no packet loss. This can be used to obtain the scene boundaries, which are needed when scene-level features are to be determined. When step 420 is used separately, each frame can be regarded as a candidate scene cut image, or it can be specified which frames are to be considered candidate locations. In the following, determining the candidate scene cut artifact locations and detecting scene cut artifacts at those locations are discussed in more detail.
Determining Candidate Scene Cut Artifact Locations As set forth in Figures 2A and 2B, scene cut artifacts occur in partially received scene cut frames or in frames referring to lost scene cut frames. Therefore, the frames with or surrounding packet losses can be regarded as potential scene cut artifact locations.
In one embodiment, when the bitstream is parsed, the numbers of received packets, the numbers of lost packets and the numbers of received bytes for each frame are obtained based on timestamps, for example, the RTP timestamps and the MPEG-2 PES timestamps, or the syntax element "frame_num" in the compressed bitstream, and the frame types of the encoded frames are also recorded. The obtained packet numbers, byte numbers and frame types can be used to refine the candidate artifact location determination.
Next, using RFC3984 for H.264 over RTP as an example transport protocol, it is illustrated how to determine candidate scene cut artifact locations.
For each received RTP packet, the video frame it belongs to can be determined based on the timestamp. That is, video packets that have the same timestamp are regarded as belonging to the same video frame. For a video frame i that is received partially or completely, the following variables are recorded: (1) the sequence number of the first received RTP packet belonging to frame i, denoted as sns(i); (2) the sequence number of the last received RTP packet for frame i, denoted as sne(i); (3) the number of RTP packets lost between the first and last received RTP packets for frame i, denoted as n_loss(i).
The sequence number is defined in the RTP header and is incremented by one for each RTP packet. Therefore, n_loss(i) is calculated by counting the number of lost RTP packets whose sequence numbers lie between sns(i) and sne(i), based on the discontinuity of the sequence numbers. An example of calculating n_loss(i) is illustrated in Figure 5. In this example, sns(i) = 105 and sne(i) = 110. Between the starting packet (with sequence number 105) and the ending packet (with sequence number 110) for frame i, the packets with sequence numbers 107 and 109 are lost. Therefore, n_loss(i) = 2 in this example.
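As an illustration only, the following Python sketch counts the lost packets of a frame from the gaps in its received RTP sequence numbers; it assumes the packets have already been grouped by timestamp and that sequence numbers do not wrap around (the function name is ours, not from the text).

```python
def count_lost_packets(received_seq_nums):
    """n_loss(i): lost RTP packets between the first and last received
    packets of a frame, inferred from sequence-number gaps."""
    seq = sorted(received_seq_nums)
    sns, sne = seq[0], seq[-1]        # first and last received packets of frame i
    expected = sne - sns + 1          # packets that should lie in that range
    return expected - len(seq)        # the missing ones were lost

# Example from Figure 5: packets 105..110 sent, 107 and 109 lost.
print(count_lost_packets([105, 106, 108, 110]))   # -> 2
```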
A parameter, pk_num(i), is defined to estimate the number of packets transmitted for frame i and can be calculated as pk_num(i) = [sne(i) - sne(i-k)] / k, (1) where frame i-k is the frame immediately before frame i that is not completely lost (that is, the other frames between frames i-k and i are lost). For a frame i that has lost packets or whose immediately preceding frame(s) are lost, a parameter pk_num_avg(i) is calculated by averaging pk_num over the previous (non-I) frames in a sliding window of length N (for example, N = 6); that is, pk_num_avg(i) is defined as the (estimated) average number of transmitted packets of the frames that precede the current frame: pk_num_avg(i) = (1/N) * sum of pk_num(j) over the frames j in the sliding window. (2) In addition, the average number of bytes per packet, bytes_num_packet(i), can be calculated by averaging the number of bytes in the packets received for the immediately preceding frames in a sliding window of N frames. A parameter, bytes_num(i), is defined to estimate the number of bytes transmitted for frame i and can be calculated as bytes_num(i) = bytes_recvd(i) + [n_loss(i) + sns(i) - sne(i-k) - 1] * bytes_num_packet(i) / k, (3) where bytes_recvd(i) is the number of bytes received for frame i, and [n_loss(i) + sns(i) - sne(i-k) - 1] * bytes_num_packet(i) / k is the estimated number of bytes lost for frame i. It should be noted that equation (3) is designed particularly for the RTP protocol. When other transport protocols are used, equation (3) should be adjusted, for example, by adjusting the estimated number of lost packets.
A parameter, bytes_num_avg(i), is defined as the (estimated) average number of transmitted bytes of the frames that precede the current frame, and it can be calculated by averaging bytes_num over the previous (non-I) frames in a sliding window; that is, bytes_num_avg(i) = (1/N) * sum of bytes_num(j) over the frames j in the sliding window. (4) As stated previously, when a sliding window is used for the calculation, the frames contained in the sliding window are completely or partially received (that is, not completely lost). Since images in a video sequence generally have the same spatial resolution, pk_num for a frame depends largely on the image content and the frame type used for compression. For example, a P-frame of a QCIF video may correspond to one packet, and an I-frame may need more bits and therefore correspond to more packets, as illustrated in Figure 6A.
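A minimal sketch of equations (1) through (4) under the bookkeeping described above; the function names and the way the per-frame records are passed in are assumptions made for illustration.

```python
def pk_num(sne_i, sne_prev, k):
    """Equation (1): estimated packets transmitted for frame i, where the
    nearest earlier frame that is not completely lost is k frames back."""
    return (sne_i - sne_prev) / k

def bytes_num(bytes_recvd, n_loss, sns_i, sne_prev, k, bytes_num_packet):
    """Equation (3): received bytes plus an estimate of the lost bytes."""
    lost_packets = n_loss + sns_i - sne_prev - 1
    return bytes_recvd + lost_packets * bytes_num_packet / k

def sliding_avg(values, window=6):
    """Equations (2) and (4): average over the previous non-I frames kept in
    a sliding window of length N (here N = 6)."""
    recent = values[-window:]
    return sum(recent) / len(recent)
```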
As shown in Figure 2A, scene cut artifacts can occur in a partially received scene cut frame. Because a scene cut frame is usually encoded as an I-frame, a partially received I-frame can be marked as a candidate location for scene cut artifacts, and its frame index is recorded as idx(k), where k indicates that the frame is the k-th candidate location.
A scene cut frame can also be encoded as a non-intra frame (for example, a P-frame). Scene cut artifacts can also occur in such a frame when it is partially received. A frame may also contain scene cut artifacts if it refers to a lost scene cut frame, as discussed earlier for Figure 2B. In these scenarios, the parameters set forth above can be used to determine more accurately whether a frame should be a candidate location.
Figures 6A to 6D illustrate, through examples, how to use the parameters set forth above to identify the candidate scene cut artifact locations. The frames can be ordered in decoding order or in display order. In all the examples of Figures 6A to 6D, frames 60 and 120 are scene cut frames in the original video.
In the examples of Figures 6A and 6B, frames 47, 109, 137, 235 and 271 are completely lost, and frames 120 and 210 are partially received. For frames 49, 110, 138, 236, 272, 120, and 210, pk_num(i) can be compared with pk_num_avg(i). When pk_num(i) is much larger than pk_num_avg(i), for example, more than 3 times larger, frame i can be identified as a candidate scene cut frame in the decoded video. In the example of Figure 6A, frame 120 is identified as a candidate scene cut artifact location.
The comparison can also be made between bytes_num(i) and bytes_num_avg(i). If bytes_num(i) is much larger than bytes_num_avg(i), frame i can be identified as a candidate scene cut frame in the decoded video. In the example of Figure 6B, frame 120 is again identified as a candidate location.
In the examples of Figures 6C and 6D, the scene cut frame 120 is completely lost. For the next frame, frame 121, pk_num(i) can be compared with pk_num_avg(i). In the example of Figure 6C, pk_num(i) is not much larger than pk_num_avg(i) (the ratio does not exceed 3). Therefore, frame 120 is not identified as a candidate scene cut artifact location. In contrast, when bytes_num(i) is compared with bytes_num_avg(i), the ratio exceeds 3, and frame 120 is identified as a candidate location.
In general, the method that uses the estimated number of transmitted bytes is observed to have a better performance than the method that uses the estimated number of transmitted packets.
Figure 7 illustrates an example method 700 for determining the candidate scene cut artifact locations, which are recorded in a data set denoted by {idx(k)}. In step 710, the process is initialized by setting k = 0. The input bitstream is then parsed in step 720 to obtain the frame type and the variables sns, sne, n_loss, bytes_num_packet, and bytes_recvd for a current frame.
In step 730, it is determined whether there is a lost packet. When a frame is completely lost, its nearest following frame that is not completely lost is examined to determine whether it is a candidate scene cut artifact location. When a frame is partially received (that is, some, but not all, of the packets in the frame are lost), this frame is examined to determine whether it is a candidate scene cut artifact location.
If there is a lost packet, it is checked whether the current frame is an INTRA frame. If the current frame is an INTRA frame, the current frame is regarded as a candidate scene cut location and control goes to step 780. Otherwise, pk_num and pk_num_avg are calculated in step 740, for example, as described in equations (1) and (2). In step 750, it is checked whether pk_num > T1 * pk_num_avg. If the inequality holds, the current frame is regarded as a candidate frame for scene cut artifacts and control goes to step 780.
Otherwise, bytes_num and bytes_num_avg are calculated in step 760, for example, as described in equations (3) and (4). In step 770, it is checked whether bytes_num > T2 * bytes_num_avg. If the inequality holds, the current frame is regarded as a candidate frame for scene cut artifacts, and the current frame index is recorded as idx(k) and k is incremented by one in step 780. Otherwise, control proceeds to step 790, which checks whether the bitstream has been fully parsed. If the parsing is complete, control goes to a final step 799. Otherwise, control returns to step 720.
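The candidate selection of method 700 can be summarized by the following sketch; it assumes that each frame record already carries the parsed values (frame type, loss flag, pk_num, bytes_num and their sliding-window averages) and that completely lost frames have been mapped to their nearest following received frame. The thresholds T1 and T2 are kept symbolic.

```python
def candidate_locations(frames, T1=3.0, T2=3.0):
    """Sketch of method 700: return the frame indices {idx(k)} that may
    contain scene cut artifacts."""
    idx = []
    for f in frames:
        if not f["has_loss"]:                           # step 730: no lost packet
            continue
        if f["type"] == "I":                            # partially lost intra frame
            idx.append(f["index"])                      # step 780
        elif f["pk_num"] > T1 * f["pk_num_avg"]:        # steps 740-750
            idx.append(f["index"])
        elif f["bytes_num"] > T2 * f["bytes_num_avg"]:  # steps 760-770
            idx.append(f["index"])
    return idx
```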
In Figure 7, both the estimated number of transmitted packets and the estimated number of bytes transmitted are used to determine candidate locations. In another implementation, these two methods can be examined in another order or they can be applied separately.
Detecting Scene Cut Artifact Locations Scene cut artifacts can be detected after the candidate location set {idx(k)} is determined. The present embodiments use the packet-layer information (such as the frame sizes) and the bitstream information (such as prediction residuals and motion vectors) in the detection of scene cut artifacts. The scene cut artifact detection can be done without reconstructing the video, that is, without reconstructing the pixel information of the video. It should be noted that the bitstream may be partially decoded to obtain information about the video, for example, the prediction residuals and the motion vectors.
When the frame size is used to detect the scene cut artifact locations, a difference between the byte numbers of the P-frames received (partially or completely) before and after a candidate scene cut position is calculated. If the difference exceeds a threshold, for example, three times larger or smaller, the candidate scene cut frame is determined to be a scene cut frame.
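One possible reading of the frame-size test is sketched below: the average byte counts of the received P-frames before and after the candidate position are compared, and a ratio much larger (or much smaller) than one signals a scene cut. The ratio value 3 comes from the example in the text; everything else is an assumption.

```python
def large_frame_size_change(bytes_before, bytes_after, ratio=3.0):
    """Compare average P-frame sizes on both sides of a candidate location."""
    avg_before = sum(bytes_before) / len(bytes_before)
    avg_after = sum(bytes_after) / len(bytes_after)
    bigger, smaller = max(avg_before, avg_after), min(avg_before, avg_after)
    return bigger > ratio * smaller    # True -> likely a scene cut frame
```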
On the other hand, it is observed that the change in prediction residual energy is often larger when there is a scene change. In general, the prediction residual energies of P-frames and B-frames are not of the same order of magnitude, and the prediction residual energy of a B-frame is less reliable for indicating video content information than that of a P-frame. Therefore, it is preferred to use the residual energy of P-frames.
Referring to Figure 8, an example image 800 containing 11 * 9 = 99 macroblocks is illustrated. For each macroblock indicated by its location (m, n), a residual energy factor is calculated from the dequantized transform coefficients. In one embodiment, the residual energy factor is calculated as e_{m,n} = sum over (p, q) of X_{p,q}(m,n)^2, where X_{p,q}(m,n) is the dequantized transform coefficient at location (p, q) within macroblock (m, n). In another embodiment, only the AC coefficients are used to calculate the residual energy factor, that is, e_{m,n} is the sum of the squared AC coefficients. In another embodiment, when the 4x4 transform is used, the residual energy factor can be calculated as e_{m,n} = sum over the sixteen 4x4 blocks u of [w * X_{u,1}(m,n)^2 + sum over v = 2, ..., 16 of X_{u,v}(m,n)^2], where X_{u,1}(m,n) represents the DC coefficient and X_{u,v}(m,n) (v = 2, ..., 16) represent the AC coefficients of the u-th 4x4 block, and w is a weighting factor for the DC coefficients. It should be noted that there are sixteen 4x4 blocks in a 16x16 macroblock and sixteen transform coefficients in each 4x4 block. The prediction residual energy factors for an image can then be represented by a matrix E whose entry in row m and column n is e_{m,n}. When other coding units are used instead of a macroblock, the prediction residual energy calculation can easily be adapted.
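A sketch of the 4x4-transform variant of the residual energy factor, assuming the dequantized coefficients of one macroblock are available as a 16x16 array and that the energy is the sum of squared coefficients; the DC weighting value is illustrative.

```python
import numpy as np

def residual_energy(coeffs, dc_weight=1.0):
    """e_{m,n} for one 16x16 macroblock made of sixteen 4x4 transform blocks."""
    energy = 0.0
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            block = coeffs[by:by + 4, bx:bx + 4]
            dc = block[0, 0]
            energy += dc_weight * dc ** 2                   # weighted DC term
            energy += float(np.sum(block ** 2)) - dc ** 2   # AC terms
    return energy
```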
A difference measure matrix for the candidate frame location k can be represented by a matrix whose entry at (m, n) is Δe_{m,n,k}, where Δe_{m,n,k} is the difference measure calculated for candidate location k at macroblock (m, n). By summing the differences over all the macroblocks in a frame, a difference measure for the candidate frame location can be calculated as D_k = sum over (m, n) of Δe_{m,n,k}. A subset of the macroblocks can also be used to calculate D_k to speed up the calculation. For example, every other row of macroblocks or every other column of macroblocks can be used for the calculation.
In one embodiment, Δe_{m,n,k} can be calculated as the difference between the two P-frames closest to the candidate location: one immediately before the candidate location and the other immediately after it. Referring to Figures 9A and 9B, images 910 and 920, or images 950 and 960, can be used to calculate Δe_{m,n,k} by subtracting the prediction residual energy factors at macroblock (m, n) in the two images.
The parameter Δe_{m,n,k} can also be calculated by applying a difference-of-Gaussians (DoG) filter to more images; for example, a 1-D DoG filter can be used with the filter center located at the candidate scene cut artifact location. Referring again to Figures 9A and 9B, images 910 to 915 and 920 to 925 in Figure 9A, or images 950 to 955 and 960 to 965 in Figure 9B, may be used. For each macroblock location (m, n), a difference-of-Gaussians filter is applied to e_{m,n} over a window of frames to obtain the parameter Δe_{m,n,k}. When the difference calculated using the prediction residual energy exceeds a threshold, the candidate frame can be detected as having scene cut artifacts.
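The two ways of obtaining Δe_{m,n,k} and the frame-level measure D_k can be sketched as follows; the array layout, the absolute value, and the Gaussian widths are assumptions, not values given in the text.

```python
import numpy as np

def dog_kernel(length, sigma1=1.0, sigma2=2.0):
    """1-D difference-of-Gaussians kernel centred on the candidate location."""
    x = np.arange(length) - (length - 1) / 2.0
    g1 = np.exp(-x**2 / (2 * sigma1**2)) / (sigma1 * np.sqrt(2 * np.pi))
    g2 = np.exp(-x**2 / (2 * sigma2**2)) / (sigma2 * np.sqrt(2 * np.pi))
    return g1 - g2

def residual_energy_difference(energy_window):
    """`energy_window` has shape (frames, mb_rows, mb_cols) and holds e_{m,n}
    for the P-frames around candidate location k.  Returns D_k."""
    if energy_window.shape[0] == 2:
        # Simple variant: nearest P-frame after minus nearest P-frame before.
        delta = np.abs(energy_window[1] - energy_window[0])
    else:
        # DoG variant: filter each macroblock position across the frame window.
        k = dog_kernel(energy_window.shape[0])
        delta = np.abs(np.tensordot(k, energy_window, axes=(0, 0)))
    return float(delta.sum())          # sum over all macroblocks in the frame
```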
Motion vectors can also be used for scene cut artifact detection. For example, the average magnitude of the motion vectors, the variance of the motion vectors, and the histogram of the motion vectors within a frame window can be calculated to indicate the level of motion. The motion vectors of P-frames are preferred for scene cut artifact detection. If the difference in motion levels exceeds a threshold, the candidate scene cut position can be determined to be a scene cut frame.
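A small sketch of the motion-level statistics; the threshold value and the choice of the mean magnitude as the compared quantity are assumptions.

```python
import numpy as np

def motion_level(motion_vectors):
    """Motion statistics for one P-frame; `motion_vectors` is an (N, 2) array."""
    mags = np.linalg.norm(np.asarray(motion_vectors, dtype=float), axis=1)
    return {"mean": float(mags.mean()), "var": float(mags.var())}

def large_motion_change(mvs_before, mvs_after, threshold=4.0):
    """Compare motion levels of the P-frames before and after the candidate."""
    diff = abs(motion_level(mvs_before)["mean"] - motion_level(mvs_after)["mean"])
    return diff > threshold
```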
Using features such as frame size, prediction residual energy and motion vectors, a scene cut frame can be detected in the decoded video at a candidate location. If a scene change is detected in the decoded video, the candidate location is detected as having scene cut artifacts. More particularly, the lost macroblocks of the detected scene cut frame are marked as having scene cut artifacts if the candidate location corresponds to a partially lost scene cut frame, and the macroblocks that refer to a lost scene cut frame are marked as having scene cut artifacts if the candidate location corresponds to a P- or B-frame that refers to a lost scene cut frame.
It should be noted that the scene cuts in the original video may or may not overlap with those that are observed in the decoded video. As stated above, for the example shown in Figure 2B, a scene change is observed in the image 280 in the decoded video, while the scene changes in the image 270 in the original video.
The frames at and around the candidate locations can be used to calculate the change in frame size, the change in prediction residual energy and the change in motion, as illustrated in the examples of Figures 9A and 9B. When the candidate location corresponds to a partially received scene cut frame 905, the P-frames (910...915 and 920...925) surrounding the candidate location can be used. When a candidate location corresponds to a frame that refers to a lost scene cut frame 940, the P-frames (950...955 and 960...965) surrounding the lost frame can be used. When a candidate location corresponds to a P-frame, the candidate location itself (960) can be used to calculate the prediction residual energy difference. It should be noted that different numbers of images can be used to calculate the changes in frame sizes, prediction residuals and motion levels.
Figure 10 illustrates an example method 1000 for detecting scene cut frames among the candidate locations. In step 1005, the process is started by setting y = 0. The P-frames around a candidate location are selected, and the prediction residuals, frame sizes and motion vectors are parsed in step 1010.
In step 1020, a frame size difference measure for the candidate frame location is calculated. In step 1025, it is checked whether there is a large frame size change at the candidate location, for example, by comparing the difference with a threshold. If there is a large frame size change, control passes to step 1080; if the difference is less than the threshold, control passes to step 1030.
Then, for the P-frames selected in step 1010, a prediction residual energy factor is calculated for the individual macroblocks in step 1030. In step 1040, a difference measure is calculated for the individual macroblock locations to indicate the change in prediction residual energy, and a prediction residual energy difference measure for the candidate frame location can be calculated in step 1050. In step 1060, it is checked whether there is a large prediction residual energy change at the candidate location. In one embodiment, if D_k is large, for example, D_k > T3, where T3 is a threshold, then the candidate location is detected as a scene cut frame in the decoded video and control passes to step 1080.
Otherwise, a motion difference measure for the candidate location is calculated in step 1065. In step 1070, it is checked whether there is a large motion change at the candidate location. If there is a large difference, control passes to step 1080.
In step 1080, the corresponding frame index is recorded as idx'(y) and y is incremented by one, where y indicates that the frame is the y-th scene cut frame detected in the decoded video. In step 1090, it is determined whether all the candidate locations have been processed. If all the candidate locations have been processed, control goes to a final step 1099. Otherwise, control returns to step 1010.
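Putting the three feature tests together, the decision loop of method 1000 can be sketched as below; each candidate record is assumed to already carry the feature differences computed from the surrounding P-frames, and T3 is kept symbolic.

```python
def detect_scene_cut_frames(candidates, T3):
    """Sketch of method 1000: return the indices {idx'(y)} of the scene cut
    frames detected in the decoded video among the candidate locations."""
    detected = []
    for c in candidates:
        if (c["frame_size_change_large"]            # step 1025
                or c["residual_energy_diff"] > T3   # steps 1030-1060 (D_k > T3)
                or c["motion_change_large"]):       # steps 1065-1070
            detected.append(c["index"])             # step 1080
    return detected
```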
In another embodiment, when the candidate scene cut frame is an I-frame (735), the prediction residual energy difference between the image and a preceding I-frame is calculated. The prediction residual energy difference is calculated using the energy of the correctly received MBs in the image and the co-located MBs in the preceding I-frame. If the difference between the energy factors is larger than T4 times the larger energy factor (for example, T4 = 1/3), the candidate I-frame is detected as a scene cut frame in the decoded video. This is useful when the scene cut artifacts of the candidate scene cut frame need to be determined before the decoder proceeds to decode the next image, that is, when the information of the following images is not yet available at the time of artifact detection. It should be noted that the features can be considered in different orders. For example, the effectiveness of each feature can be learned by training on a large set of video sequences under different coding/transmission conditions. Based on the training results, the order of the features can be chosen based on the video content and the coding/transmission conditions. It can also be decided to test only the one or two most effective features to accelerate the scene cut artifact detection.
Different thresholds, for example, T1, T2, T3, and T4, are used in methods 700 and 1000. These thresholds can be adapted, for example, to image properties or other conditions.
In another embodiment, when additional computational complexity is allowed, some I-pictures may be reconstructed. Generally, the pixel information may reflect the texture content better than the parameters parsed from the bitstream (e.g., prediction residuals and motion vectors), and therefore, using the reconstructed I-pictures for scene cut detection can improve the detection accuracy. Because I-frame decoding is not as expensive in terms of computing resources as P- or B-frame decoding, this improved detection accuracy comes at the cost of only a small computational overhead.
Figure 11 illustrates, by way of example, how adjacent I-frames can be used for scene cut detection. For the example shown in Figure 11A, when the candidate scene cut frame (1120) is a partially received I-frame, the received part of the frame can be properly decoded in the pixel domain, because it does not refer to other frames. Similarly, the adjacent I-frames (1110, 1130) can also be decoded in the pixel domain (that is, the images are reconstructed) without incurring too much decoding complexity. After the I-frames are reconstructed, traditional scene cut detection methods can be applied, for example, by comparing the luminance histogram difference between the partially decoded pixels of the frame (1120) and the co-located pixels of the adjacent I-frames (1110, 1130).
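An illustrative luminance-histogram comparison for the reconstructed I-frames; the bin count, the normalization and the L1 distance are assumptions.

```python
import numpy as np

def luma_histogram_diff(pixels_a, pixels_b, bins=32):
    """Compare the decoded pixels of the candidate I-frame with the co-located
    pixels of an adjacent I-frame; a large value suggests a scene change."""
    h_a, _ = np.histogram(pixels_a, bins=bins, range=(0, 255), density=True)
    h_b, _ = np.histogram(pixels_b, bins=bins, range=(0, 255), density=True)
    return float(np.abs(h_a - h_b).sum())
```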
For the example shown in Figure 11B, the candidate scene cut frame (1160) may be completely lost. In this case, if the image feature difference (e.g., the histogram difference) between the adjacent I-frames (1150, 1170) is small, the candidate location can be identified as not being a scene cut location. This is especially true in the IPTV scenario, where the GOP length is typically 0.5 to 1 second, during which multiple scene changes are unlikely.
Using the reconstructed I-frames for the detection of scene cut artifacts can be of limited use when the distance between the I-frames is large. For example, in the mobile video streaming scenario, the GOP length can be up to 5 seconds, and the frame rate can be as low as 15 fps. Therefore, the distance between the candidate scene cut location and the previous I-frame can be too large to obtain robust detection performance.
The embodiment that decodes some I-pictures can be used in combination with the bitstream-level embodiment (e.g., method 1000), so that they complement each other. In one embodiment, whether they are to be deployed together can be decided from the coding configuration (for example, the resolution and frame rate).
The present principles can be used in a video quality monitor to measure the quality of the video. For example, the video quality monitor can detect and measure scene-cut artifacts and other types of artifacts, and can also consider artifacts produced by propagation to provide a general quality metric.
Figure 12 represents a block diagram of an example video quality monitor 1200. The input of the apparatus 1200 may include a transport stream containing a bit stream. The input can be in other formats that contain the bit stream.
The demultiplexer 1205 obtains the packet layer information, for example, the number of packets, the number of bytes, and the frame sizes, from the bitstream. The decoder 1210 parses the input stream to obtain more information, for example, the frame types, prediction residuals and motion vectors. The decoder 1210 may not reconstruct the images. In other embodiments, the decoder can perform the functions of the demultiplexer.
Using the decoded information, the candidate scene cut artifact locations are detected by a candidate scene cut artifact detector 1220, where method 700 can be used. For the detected candidate locations, a scene cut artifact detector 1230 determines whether there are scene cuts in the decoded video, and thereby determines whether the candidate locations contain scene cut artifacts. For example, when the detected scene cut frame is a partially lost I-frame, a lost macroblock in the frame is detected as having a scene cut artifact. In another example, when the detected scene cut frame refers to a lost scene cut frame, a macroblock that refers to the lost scene cut frame is detected as having a scene cut artifact. Method 1000 can be used by the scene cut artifact detector 1230.
After the scene cut artifacts are detected at a macroblock level, a quality predictor 1240 maps the artifacts into a quality score. The quality predictor 1240 may consider other types of artifacts, and may also consider artifacts produced by error propagation.
Referring to Figure 13, a video transmission system or apparatus 1300 is shown, to which the features and principles described above can be applied. A processor 1305 processes the video and an encoder 1310 encodes the video. The bitstream generated by the encoder is transmitted to a decoder 1330 through a distribution network 1320. A video quality monitor can be used at the different stages.
In one embodiment, a video quality monitor 1340 can be used by a content creator. For example, the estimated video quality can be used by an encoder when deciding the encoding parameters, such as the mode decision or the bit rate allocation. In another example, after the video is encoded, the content creator uses the video quality monitor to monitor the quality of the encoded video. If the quality metric does not meet a predefined quality level, the content creator may choose to re-encode the video to improve its quality. The content creator can also classify the encoded video based on the quality and upload the content accordingly.
In another embodiment, a video quality monitor 1350 can be used by a content distributor. A video quality monitor can be placed in the distribution network. The video quality monitor calculates the quality metrics and reports them to the content distributor. Based on feedback from the video quality monitor, a content distributor can improve its service by adjusting bandwidth allocation and access control.
The content distributor can also send the feedback to the content creator to adjust the encoding. It should be noted that improving the encoding quality at the encoder may not necessarily improve the quality on the decoder side, because a high-quality encoded video usually requires more bandwidth and leaves less bandwidth for transmission protection. Therefore, to achieve an optimal quality at the decoder, a balance between the encoding bit rate and the bandwidth for channel protection should be considered.
In another embodiment, a video quality monitor 1360 can be used by a user device. For example, when a user device searches for videos on the Internet, a search result can return many videos or many links to videos that correspond to the requested video content. The videos in the search results may have different levels of quality. A video quality monitor can calculate the quality metric for these videos and decide which videos to store. In another example, the user device may have access to several error concealment techniques. A video quality monitor can calculate the quality metrics for the different error concealment techniques and automatically choose which concealment technique to use based on the calculated quality metric.
The implementations described in the present description can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream or a signal. Even if only discussed in the context of a single form of implementation (for example, presented only as a method), the implementation of the features discussed can also be implemented in other forms (for example, an apparatus or a program). An apparatus can be implemented in, for example, appropriate hardware, software and firmware. The methods can be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate the communication of information between end users.
The implementations of the various processes and features described in the present description can be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, scene cut artifact detection, quality measurement and quality monitoring. Examples of such equipment include an encoder, a decoder, a downstream processor that processes the output from a decoder, a preprocessor that provides input to an encoder, a video encoder, a video decoder, a video codec, a web server, a set-top box, a personal computer, a laptop, a cell phone, a PDA, a game console and other communication devices. As should be clear, the equipment can be mobile and can even be installed in a mobile vehicle.
Additionally, the methods can be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) can be stored on a processor-readable medium, such as, for example, an integrated circuit, a software carrier or another storage device such as, for example, a hard disk, a compact disc ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital video disc), a random access memory ("RAM") or a read-only memory ("ROM"). The instructions can form an application program tangibly embodied on a processor-readable medium. The instructions can be, for example, in hardware, firmware, software or a combination. The instructions can be found in, for example, an operating system, a separate application or a combination of the two. A processor can therefore be characterized as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. In addition, a processor-readable medium can store, in addition to or instead of instructions, data values produced by an implementation.
As will be apparent to one skilled in the art, implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted. The information may include, for example, instructions for executing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax values written by a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a portion of the radio frequency spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.
A number of implementations have been described. However, it will be understood that various modifications may be made. For example, elements of different implementations can be combined, supplemented, modified or removed to produce other implementations. Additionally, one skilled in the art will understand that other structures and processes may be substituted for those described, and the resulting implementations will perform at least substantially the same functions, in at least substantially the same way, to achieve at least substantially the same results as the described implementations. Accordingly, these and other implementations are contemplated by the present application.

Claims (25)

1. A method comprising: accessing a bitstream that includes encoded images; and determining (1080) a scene cut image in the bitstream using bitstream information without decoding the bitstream to derive pixel information.
2. The method as described in claim 1, further characterized in that the determining comprises: determining (1020, 1050, 1065) respective difference measures in response to at least one of frame sizes, prediction residuals and motion vectors among a set of images of the bitstream, wherein the set of images includes at least one of a candidate scene cut image, an image that precedes the candidate scene cut image, and an image after the candidate scene cut image; and determining (1080) that the candidate scene cut image is the scene cut image if one or more of the difference measures exceeds its respective predetermined threshold (1025, 1060, 1070).
3. The method as described in claim 2, further characterized in that determining the respective difference measures additionally comprises: calculating (1030) prediction residual energy factors corresponding to a block location for images of the set of images; and calculating (1040) a difference measure for the block location using the prediction residual energy factors, wherein the difference measure for the block location is used to calculate the difference measure for the candidate scene cut image.
4. The method as described in claim 2, further characterized by additionally comprising: selecting (735, 780) an intra image as the candidate scene cut image if the compressed data for at least one block in the intra image is lost (730).
5. The method as described in claim 4, further characterized by additionally comprising: determining that at least one block in the scene cut image has a scene cut artifact.
6. The method as described in claim 5, further characterized by additionally comprising: assigning a lowest quality level to the at least one block that is determined to have the scene cut artifact.
7. The method as described in claim 2, further characterized by additionally comprising: selecting an image that refers to a lost image as the candidate scene cut image.
8. The method as described in claim 7, further characterized by additionally comprising: determining (740) an estimated number of transmitted packets of an image and an average number of transmitted packets of images preceding the image, wherein the image is selected as the candidate scene cut image when a ratio between the estimated number of transmitted packets of the image and the average number of transmitted packets of the images preceding the image exceeds a predetermined threshold (750, 780).
9. The method as described in claim 7, further characterized by additionally comprising: determining (760) an estimated number of transmitted bytes of an image and an average number of transmitted bytes of images preceding the image, wherein the image is selected as the candidate scene cut image when a ratio between the estimated number of transmitted bytes of the image and the average number of transmitted bytes of the images preceding the image exceeds a predetermined threshold (770, 780).
10. The method as described in claim 9, further characterized in that the estimated number of bytes transmitted is determined in response to a number of bytes received from the image and an estimated number of bytes lost.
11. The method as described in claim 7, further characterized in that it further comprises: determining that a block in the scene cut image has a scene cut artifact when the block refers to the lost image.
12. The method as described in claim 11, further characterized in that it further comprises: assigning a lowest quality level to the block, wherein the block is determined to have the scene cut artifact.
13. The method as described in claim 2, further characterized in that the images in the set of images are P-pictures (1010).
14. An apparatus, comprising: a decoder (1210) that accesses a bitstream that includes encoded images; and a scene cut artifact detector that determines (1230) a scene cut image in the bitstream using bitstream information without decoding the bitstream to derive pixel information.
15. The apparatus as described in claim 14, further characterized in that the decoder (1210) decodes at least one of frame sizes, prediction residuals and motion vectors for a set of images of the bitstream, wherein the set of images includes at least one of a candidate scene cut image, an image that precedes the candidate scene cut image, and an image that follows the candidate scene cut image, and wherein the scene cut artifact detector (1230) determines respective difference measures for the candidate scene cut image in response to at least one of the frame sizes, the prediction residuals, and the motion vectors, and determines that the candidate scene cut image is the scene cut image if one or more of the difference measures exceeds its respective predetermined threshold.
16. The apparatus as described in claim 15, further characterized by additionally comprising: a candidate scene cut artifact detector (1220) that selects an intra image as the candidate scene cut image if the compressed data for at least one block in the intra image is lost.
17. The apparatus as described in claim 16, further characterized in that the scene cut artifact detector (1230) determines that the at least one block in the scene cut image has a scene cut artifact.
18. The apparatus as described in claim 17, further characterized in that it further comprises: a quality predictor (1240) that assigns a lower quality level to the at least one block that is determined to have the scene cut artifact.
19. The apparatus as described in claim 15, further characterized in that it further comprises: a candidate scene cut artifact detector (1220) that selects an image that refers to a lost image as the candidate scene cut image.
20. The apparatus as described in claim 19, further characterized in that the candidate scene cut artifact detector (1220) determines an estimated number of transmitted packets of an image and an average number of transmitted packets of the images preceding the image, and selects the image as the candidate scene cut image when a ratio between the estimated number of transmitted packets of the image and the average number of transmitted packets of the images preceding the image exceeds a predetermined threshold.
21. The apparatus as described in claim 19, further characterized in that the candidate scene cut artifact detector (1220) determines an estimated number of transmitted bytes of an image and an average number of transmitted bytes of the images preceding the image, and selects the image as the candidate scene cut image when a ratio between the estimated number of transmitted bytes of the image and the average number of transmitted bytes of the images preceding the image exceeds a predetermined threshold.
22. The apparatus as described in claim 21, further characterized in that the candidate scene cut artifact detector (1220) determines the estimated number of transmitted bytes in response to a number of received bytes of the image and an estimated number of lost bytes.
23. The apparatus as described in claim 19, further characterized in that the scene cut artifact detector (1230) determines that a block in the scene cut image has a scene cut artifact when the block refers to the lost image.
24. The apparatus as described in claim 23, further characterized in that it further comprises: a quality predictor (1240) that assigns a lower quality level to the block, wherein it is determined that the block has the scene cut artifact.
25. A processor-readable medium having stored thereon instructions for causing one or more processors to collectively perform: accessing a bitstream that includes encoded images; and determining (1080) scene cut images in the bitstream using bitstream information without decoding the bitstream to derive pixel information.
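The claims above describe the detection in prose only; the following is a minimal illustrative sketch, in Python, of the candidate-selection test recited in claims 8-10 (and mirrored in apparatus claims 20-22). Function names, the data layout, and the default ratio value standing in for the predetermined threshold are assumptions for illustration, not part of the claimed method.

```python
def estimate_transmitted_bytes(received_bytes: int, estimated_lost_bytes: int) -> int:
    """Claim 10/22 style estimate: bytes actually received plus an estimate of the lost bytes."""
    return received_bytes + estimated_lost_bytes


def is_candidate_scene_cut(current_count: float,
                           preceding_counts: list[float],
                           ratio_threshold: float = 2.0) -> bool:
    """Flag an image as a candidate scene cut when its estimated transmitted packet
    (or byte) count exceeds the average of the preceding images by more than
    ratio_threshold."""
    if not preceding_counts:
        return False
    average = sum(preceding_counts) / len(preceding_counts)
    if average == 0:
        return False
    return current_count / average > ratio_threshold
```

Intuitively, an image that starts a new scene typically costs far more bits than the preceding predicted images, so the ratio test can fire on it even when some of its packets were lost and only an estimate of its size is available.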
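Claim 15 confirms a candidate by comparing difference measures, derived from frame sizes, prediction residuals, and motion vectors, against per-feature thresholds. The sketch below is a hedged reading of that step; the PictureFeatures fields, the absolute-difference measure, and the threshold dictionary are assumptions rather than the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class PictureFeatures:
    frame_size: float        # bits spent on the coded picture
    residual_energy: float   # e.g. summed absolute prediction residuals
    motion_magnitude: float  # e.g. mean motion-vector length


def confirm_scene_cut(candidate: PictureFeatures,
                      neighbour: PictureFeatures,
                      thresholds: dict[str, float]) -> bool:
    """Declare the candidate a scene cut image if one or more of the feature
    differences exceeds its respective predetermined threshold."""
    diffs = {
        "frame_size": abs(candidate.frame_size - neighbour.frame_size),
        "residual_energy": abs(candidate.residual_energy - neighbour.residual_energy),
        "motion_magnitude": abs(candidate.motion_magnitude - neighbour.motion_magnitude),
    }
    return any(value > thresholds.get(name, float("inf"))
               for name, value in diffs.items())
```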
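Finally, claims 5-6, 11-12, 17-18, and 23-24 attach the detection result to individual blocks and to the quality score. The sketch below assumes blocks are plain dictionaries carrying a hypothetical reference_id field and uses an arbitrary quality penalty; both are illustrative choices, not the claimed quality model.

```python
def mark_scene_cut_artifacts(blocks: list[dict], lost_picture_ids: set[int]) -> list[dict]:
    """Mark a block of a scene cut image as having a scene cut artifact when it
    references a lost image (claims 11 and 23)."""
    for block in blocks:
        block["has_scene_cut_artifact"] = block.get("reference_id") in lost_picture_ids
    return blocks


def assign_block_quality(blocks: list[dict], base_quality: float, penalty: float = 1.0) -> list[float]:
    """Assign a lower quality level to blocks flagged with a scene cut artifact
    (claims 12 and 24); expects mark_scene_cut_artifacts to have run first."""
    return [base_quality - penalty if block["has_scene_cut_artifact"] else base_quality
            for block in blocks]
```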
MX2014006269A 2011-11-25 2011-11-25 Video quality assessment considering scene cut artifacts. MX339675B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/082955 WO2013075335A1 (en) 2011-11-25 2011-11-25 Video quality assessment considering scene cut artifacts

Publications (2)

Publication Number Publication Date
MX2014006269A true MX2014006269A (en) 2014-07-09
MX339675B MX339675B (en) 2016-06-01

Family

ID=48469029

Family Applications (1)

Application Number Title Priority Date Filing Date
MX2014006269A MX339675B (en) 2011-11-25 2011-11-25 Video quality assessment considering scene cut artifacts.

Country Status (10)

Country Link
US (1) US20140301486A1 (en)
EP (1) EP2783513A4 (en)
JP (1) JP5981561B2 (en)
KR (1) KR20140110881A (en)
CN (1) CN103988501A (en)
CA (1) CA2855177A1 (en)
HK (1) HK1202739A1 (en)
MX (1) MX339675B (en)
RU (1) RU2597493C2 (en)
WO (1) WO2013075335A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103299618A (en) * 2011-01-21 2013-09-11 西门子公司 Method for processing a compressed video stream
US9565404B2 (en) * 2012-07-30 2017-02-07 Apple Inc. Encoding techniques for banding reduction
US9723309B2 (en) 2012-08-23 2017-08-01 Thomson Licensing Method and apparatus for detecting gradual transition picture in video bitstream
US10542063B2 (en) * 2014-10-16 2020-01-21 Samsung Electronics Co., Ltd. Method and device for processing encoded video data, and method and device for generating encoded video data
CN106713901B (en) * 2015-11-18 2018-10-19 华为技术有限公司 A kind of method for evaluating video quality and device
US10798387B2 (en) * 2016-12-12 2020-10-06 Netflix, Inc. Source-consistent techniques for predicting absolute perceptual video quality
RU2651206C1 (en) * 2016-12-21 2018-04-18 Общество с ограниченной ответственностью "СТРИМ Лабс" (ООО "СТРИМ Лабс") Method and system for detecting distortions in digital television systems
KR20180093441A (en) 2017-02-13 2018-08-22 주식회사 마크애니 Watermark embedding apparatus and method through image structure conversion
US10609440B1 (en) * 2018-06-08 2020-03-31 Amazon Technologies, Inc. Timing data anomaly detection and correction
US10970555B2 (en) * 2019-08-27 2021-04-06 At&T Intellectual Property I, L.P. Data-driven event detection for compressed video
US20240114144A1 (en) * 2022-09-29 2024-04-04 Nvidia Corporation Frame selection for streaming applications
CN115866347B (en) * 2023-02-22 2023-08-01 北京百度网讯科技有限公司 Video processing method and device and electronic equipment

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3315766B2 (en) * 1992-09-07 2002-08-19 富士通株式会社 Image data encoding method, image data encoding device using the method, image data restoring method, image data restoring device using the method, scene change detecting method, scene change detecting device using the method, scene change recording Device and image data scene change recording / reproducing device
JPH09322174A (en) * 1996-05-30 1997-12-12 Hitachi Ltd Reproducing method for moving image data
GB2366464A (en) * 2000-08-14 2002-03-06 Nokia Mobile Phones Ltd Video coding using intra and inter coding on the same data
US6996183B2 (en) * 2001-09-26 2006-02-07 Thomson Licensing Scene cut detection in a video bitstream
GB0228556D0 (en) * 2002-12-06 2003-01-15 British Telecomm Video quality measurement
US7499570B2 (en) * 2004-03-02 2009-03-03 Siemens Corporate Research, Inc. Illumination invariant change detection
EP1739974B1 (en) * 2005-06-30 2010-08-11 Samsung Electronics Co., Ltd. Error concealment method and apparatus
KR100736041B1 (en) * 2005-06-30 2007-07-06 삼성전자주식회사 Method and apparatus for concealing error of entire frame loss
CN101072342B (en) * 2006-07-01 2010-08-11 腾讯科技(深圳)有限公司 Situation switching detection method and its detection system
EP2041984A4 (en) * 2006-07-17 2009-08-05 Thomson Licensing Method and apparatus for adapting a default encoding of a digital video signal during a scene change period
RU2420022C2 (en) * 2006-10-19 2011-05-27 Телефонактиеболагет Лм Эрикссон (Пабл) Method to detect video quality
EP2129136A4 (en) * 2007-01-31 2016-04-13 Nec Corp Image quality evaluating method, image quality evaluating apparatus and image quality evaluating program
US8379734B2 (en) * 2007-03-23 2013-02-19 Qualcomm Incorporated Methods of performing error concealment for digital video
CN101355708B (en) * 2007-07-25 2011-03-16 中兴通讯股份有限公司 Self-adapting method for shielding error code
JP5191240B2 (en) * 2008-01-09 2013-05-08 オリンパス株式会社 Scene change detection apparatus and scene change detection program
US8973029B2 (en) * 2009-03-31 2015-03-03 Disney Enterprises, Inc. Backpropagating a virtual camera to prevent delayed virtual insertion
US8830339B2 (en) * 2009-04-15 2014-09-09 Qualcomm Incorporated Auto-triggered fast frame rate digital video recording
US20100309976A1 (en) * 2009-06-04 2010-12-09 Texas Instruments Incorporated Method and apparatus for enhancing reference frame selection
JP2011029987A (en) * 2009-07-27 2011-02-10 Toshiba Corp Compression distortion elimination apparatus
GB2475739A (en) * 2009-11-30 2011-06-01 Nokia Corp Video decoding with error concealment dependent upon video scene change.
JP5758405B2 (en) * 2010-01-11 2015-08-05 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Video quality estimation technology
JP5484140B2 (en) * 2010-03-17 2014-05-07 Kddi株式会社 Objective image quality evaluation device for video quality
WO2012089678A1 (en) * 2010-12-30 2012-07-05 Skype Concealment of data loss for video decoding
US8675989B2 (en) * 2011-04-13 2014-03-18 Raytheon Company Optimized orthonormal system and method for reducing dimensionality of hyperspectral images
WO2013053385A1 (en) * 2011-10-11 2013-04-18 Telefonaktiebolaget L M Ericsson (Publ) Scene change detection for perceptual quality evaluation in video sequences

Also Published As

Publication number Publication date
CN103988501A (en) 2014-08-13
WO2013075335A1 (en) 2013-05-30
MX339675B (en) 2016-06-01
CA2855177A1 (en) 2013-05-30
RU2597493C2 (en) 2016-09-10
RU2014125557A (en) 2015-12-27
JP5981561B2 (en) 2016-08-31
EP2783513A1 (en) 2014-10-01
JP2015502713A (en) 2015-01-22
US20140301486A1 (en) 2014-10-09
KR20140110881A (en) 2014-09-17
EP2783513A4 (en) 2015-08-05
HK1202739A1 (en) 2015-10-02

Similar Documents

Publication Publication Date Title
MX2014006269A (en) Video quality assessment considering scene cut artifacts.
JP6104301B2 (en) Video quality estimation technology
CN105409216B (en) Conditional concealment of lost video data
AU2011381970B2 (en) Video quality measurement
WO2014094313A1 (en) Video quality model, method for training a video quality model, and method for determining video quality using a video quality model
JP5911563B2 (en) Method and apparatus for estimating video quality at bitstream level
US9432694B2 (en) Signal shaping techniques for video data that is susceptible to banding artifacts
AU2012385919B2 (en) Video quality assessment at a bitstream level
US20120207212A1 (en) Visually masked metric for pixel block similarity
JP2009535881A (en) Method and apparatus for encoding / transcoding and decoding
Wu et al. A content-adaptive distortion–quantization model for H.264/AVC and its applications
CA2899756A1 (en) Method and apparatus for context-based video quality assessment
JP2015530807A (en) Method and apparatus for estimating content complexity for video quality assessment
Wang et al. Network-based model for video packet importance considering both compression artifacts and packet losses
Sankisa et al. A novel cumulative distortion metric and a no-reference sparse prediction model for packet prioritization in encoded video transmission
WO2014198062A1 (en) Method and apparatus for video quality measurement
Wang et al. Distortion estimation for compressed video transmission over mobile networks
MX2015002287A (en) Method and apparatus for estimating motion homogeneity for video quality assessment.

Legal Events

Date Code Title Description
FG Grant or registration