WO2001063937A2 - Compressed video analysis - Google Patents

Compressed video analysis Download PDF

Info

Publication number
WO2001063937A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
motion vector
inter
intra
frame
Prior art date
Application number
PCT/US2001/006094
Other languages
French (fr)
Other versions
WO2001063937A3 (en)
Inventor
Glenn Arthur Reitmeier
Original Assignee
Sarnoff Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sarnoff Corporation filed Critical Sarnoff Corporation
Priority to JP2001562030A priority Critical patent/JP2003533906A/en
Priority to EP01913054A priority patent/EP1258146A2/en
Publication of WO2001063937A2 publication Critical patent/WO2001063937A2/en
Publication of WO2001063937A3 publication Critical patent/WO2001063937A3/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142Detection of scene cut or scene change
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/48Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression

Abstract

A compressed digital video bitstream is partially decoded to extract certain data (e.g., inter/intra block field data, motion vector data, and/or DCT coefficient data). The extracted data is then analyzed (for a single frame or over multiple frames) to characterize picture content. For example, the inter/intra block field data can be analyzed to detect the existence of scene changes and camera switches. In addition to inter/intra block field data, patterns in the motion vector data can also be characterized to detect the existence of camera pans and zooms and the occurrence of persons/objects moving within or entering or leaving a scene. Intra/inter block field data and motion vector data can also be used to detect the occurrence of still text, slides, and pictures within a compressed bitstream. In addition, DCT coefficient data can be used to detect the existence of text and raster displays within the field of view.

Description

COMPRESSED VIDEO ANALYSIS
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to image processing, and, in particular, to the analysis of the content of a compressed video bitstream.
Description of the Related Art
Compressed digital video standards, such as H.261, MPEG-1, and MPEG-2, are on the verge of rapid deployment and proliferation in applications that include video teleconferencing, distributed multimedia systems, and television broadcasting. Unlike analog video signals, digital video signals employ many levels of data representation in order to effect their high compression rates. Typical operations that are performed in a video compression encoder include:
o Organization of original image pixels as blocks of pixels;
o Motion estimation, during which blocks of pixels from surrounding frames are correlated with each block of the current frame to find a "best prediction," which is then encoded as a motion vector;
o Motion compensation, during which the residual interframe differences are generated between each current block and the corresponding "best prediction" block;
o DCT transformation, during which a discrete cosine transformation is applied to each block of residual interframe differences;
o Quantization, during which the resulting DCT transform coefficients are quantized;
o Run-length encoding, during which the resulting quantized DCT coefficients are run-length/value encoded;
o Generation of a series of instructions that indicate the start of a block, the motion vector used to predict that block, the run-length/value data for the quantized DCT coefficients of the residual interframe differences, and the end of the block; and
o Variable-length encoding, during which the instructions are variable-length coded (VLC) according to tables defined in the corresponding standard (e.g., based on Huffman coding principles) to form a serial compressed video bitstream.
In virtually every compressed digital video standard, only the VLC and instruction syntaxes are rigidly defined. This approach allows encoders to employ algorithms of varying cost/performance and allows encoder performance to improve over time, while maintaining full backwards compatibility with the installed base of decoders. However, because of the syntax elements that are part of a given compression standard, any reasonable encoder built to work with those standards will inevitably have similarities in certain algorithmic approaches, such as motion compensation.
In certain applications, it is desirable to determine information about the content of an existing compressed video bitstream. For example, it may be desirable to identify and catalog the locations of different scenes in a video stream in order to enable subsequent searching for specific scenes. One way to achieve this goal is to fully decode the compressed video bitstream to the decoded pixel domain to generate a corresponding decoded video stream, which can then be analyzed "manually" by a human operator or "automatically" using conventional analysis tools that identify the locations of transitions between scenes in the video stream.
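Before turning to the invention itself, the block-level portion of this encoder chain (DCT, quantization, run-length coding of one block of residuals) can be made concrete with a toy sketch; the zig-zag approximation and the uniform quantization step below are simplifications and do not reproduce the tables or scan order of any particular standard.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so the 2-D DCT of a block X is C @ X @ C.T."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_block(residual_block, quant_step=16):
    """Toy version of the per-block chain: DCT -> quantize -> zig-zag run-length.

    residual_block: 8x8 array of motion-compensated residuals.
    Returns (runs, eob) where runs is a list of (zero_run, level) pairs.
    """
    c = dct_matrix(residual_block.shape[0])
    coeffs = c @ residual_block @ c.T                  # 2-D DCT of the residuals
    q = np.round(coeffs / quant_step).astype(int)      # uniform quantization (toy)

    # Crude zig-zag order by anti-diagonals (real codecs use a fixed scan table).
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda ij: (ij[0] + ij[1], ij[1]))
    runs, zero_run = [], 0
    for i, j in order:
        if q[i, j] == 0:
            zero_run += 1
        else:
            runs.append((zero_run, int(q[i, j])))
            zero_run = 0
    return runs, True                                  # True: an end-of-block code follows
```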
SUMMARY OF THE INVENTION
The present invention is directed to techniques for analyzing the content of compressed video bitstreams without first having to fully decode the bitstream to the decoded pixel domain. According to the present invention, a compressed video bitstream is only partially decoded (e.g., just enough to extract the motion vector data), and the partially decoded data is then analyzed to characterize the content of the bitstream. According to one embodiment, the present invention is a method for characterizing picture content of a compressed video bitstream, comprising the steps of: (a) partially decoding the compressed video bitstream to extract particular data for the compressed video bitstream; and (b) analyzing the extracted particular data to characterize the picture content of the compressed video bitstream.
BRIEF DESCRIPTION OF THE DRAWINGS
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings, in which Fig. 1 is a flow diagram of processing according to one embodiment of the present invention.
DETAILED DESCRIPTION
Compressed digital video signals can be analyzed to extract key high-level events that occur in their picture content, thus making it possible to monitor a compressed digital video bitstream for purposes such as cataloging, alerting, and/or key frame extraction. Processes can be developed to directly process the compressed bitstream to extract information such as (but not necessarily limited to):
(1) Scene changes;
(2) Camera switches;
(3) Camera pans;
(4) Camera zooms;
(5) Person/object moving within or entering/leaving a scene;
(6) Classifying occurrence of text, slide/picture, raster display, etc.; and
(7) Performing enhancement processing.
Development of such approaches can enable a wide range of systems and applications as digital video becomes increasingly integrated into the information infrastructure.
With the combined knowledge of compression syntax, common encoder compression processing practices, and video signal processing expertise, a compressed digital video bitstream (such as a teleconferencing bitstream) can be a rich source of information. The compressed syntax-level representation contains clues to key high-level events that occur in the picture content.
Fig. 1 shows a block diagram of the processing of a compressed digital video bitstream, according to one embodiment of the present invention. The compressed digital video bitstream is received (step 102 in Fig. 1) and partially decoded (step 104), for example, just enough to extract the motion vector data represented in the bitstream for each frame. Depending on the encoding scheme, this partial decoding may involve only the variable-length decoding of bitstream data and extraction of the motion vector data from the resulting variable-length decoded information. Most importantly, substantial computational advantage is gained by (a) avoiding the inverse DCT transform and (b) avoiding the recomputation of motion vectors or other data that is extractable from the compressed domain. Furthermore, storage requirements are significantly reduced by not having to store fully decoded pixel data.
After extracting data for one or more frames, the extracted data is analyzed to characterize the content of the compressed digital video bitstream (step 106). The type of analysis performed and the nature of the content characterized will vary from application to application. Some of these different applications are described below. Again, depending on the particular application, appropriate subsequent processing may be performed (step 108) based on the characterized bitstream content. This subsequent processing may include cataloging the various scenes in the bitstream or any other suitable processing.
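To make steps 102 through 108 concrete, the following is a minimal sketch of such a pipeline in Python. The parser interface (read_frames, extract_motion_data, and the coded_frame attributes) is a hypothetical stand-in for a real bitstream-level decoder, and the thresholds and labels are illustrative assumptions rather than values taken from this disclosure.

```python
import numpy as np

def read_frames(bitstream):
    """Step 102 (receive): yield coded frames from a compressed bitstream object."""
    yield from bitstream  # placeholder iteration over an already-demultiplexed stream

def extract_motion_data(coded_frame):
    """Step 104 (partial decode): return (motion_vectors, intra_mask) for one frame.

    motion_vectors: (rows, cols, 2) array of (dy, dx) per block
    intra_mask:     (rows, cols) boolean array, True for intra-coded blocks
    """
    return coded_frame.motion_vectors, coded_frame.intra_mask  # hypothetical attributes

def characterize(motion_vectors, intra_mask):
    """Step 106 (analysis): very coarse content label from compressed-domain data only."""
    if intra_mask.mean() > 0.5:                 # many intra blocks -> likely scene change
        return "scene_change"
    if np.abs(motion_vectors).max() < 0.5:      # essentially no motion anywhere
        return "still"
    return "motion"

def catalog(bitstream):
    """Step 108 (subsequent processing): build a frame-index -> label catalog."""
    return {i: characterize(*extract_motion_data(f))
            for i, f in enumerate(read_frames(bitstream))}
```

The analysis sketches in the following sections refine the characterize step for specific kinds of picture content.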
Many video compression standards rely on predictive (i.e., inter-frame) coding, where a frame is predicted from another (i.e., reference) frame in the video stream by computing a motion vector for each block of data that best predicts it from the reference frame. If there is no good predictor from the reference frame, a block of data may instead be intra-frame coded (i.e., encoded without reference to any other frame and therefore without a motion vector being assigned). According to embodiments of the present invention (as described below), information as to which blocks in the current frame have been encoded using intra-frame coding and which blocks have been encoded using inter-frame coding as well as the magnitudes and directions of the motion vectors used during the inter-frame coding may be analyzed to characterize the content of the compressed bitstream. In other embodiments, other information, such as the DCT coefficients, may be analyzed to characterize bitstream content.
Frequency of Intra-Frame Coded Blocks
As described above, during motion estimation processing, if a good predictor for a current block of pixel data cannot be found in the reference frame, the block may be encoded using intra-frame coding. A given frame may be encoded with both intra-frame coded blocks and inter-frame coded blocks. The relative frequencies of intra-frame and inter-frame coded blocks per frame can be used to indicate certain types of changes in the picture content. In particular, if the number of intra-frame coded blocks in a current frame exceeds a specified threshold, then this may be an indication of the occurrence of a scene change or camera switch in the compressed bitstream.
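A minimal sketch of this threshold test is shown below, assuming the partial decoder yields a per-frame boolean mask of intra-coded blocks; the mask layout and the 60% threshold are illustrative assumptions that would be tuned per application.

```python
import numpy as np

def is_scene_change(intra_mask, threshold_fraction=0.6):
    """Flag a likely scene change or camera switch when the fraction of
    intra-coded blocks in the frame exceeds a specified threshold.

    intra_mask: (rows, cols) boolean array, True where a block is intra-coded.
    threshold_fraction: illustrative value; would be tuned per application.
    """
    intra_count = int(np.count_nonzero(intra_mask))
    total_blocks = intra_mask.size
    return intra_count / total_blocks > threshold_fraction
```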
Motion Vector Pattern Analysis
The locations, relative magnitudes, and directions of the set of motion vectors for a given frame can be used as an indication of temporal changes in the picture content of the compressed bitstream, especially when these patterns continue over multiple consecutive frames. Such motion vector pattern analysis can be used to distinguish different types of changes in picture content.
For example, during a camera pan, most of the picture content in the current frame was present in the previous frame, but at a uniformly translated location, with new information added to the field of view along one picture boundary (i.e., corresponding to the direction of the pan). As a result, the motion vectors for most of the current frame will have relatively uniform magnitude and direction, with the new information being represented either as intra-frame coded blocks or as inter-frame coded blocks with possibly uncorrelated motion vectors. Such a pattern of motion vectors and inter/intra block types can be used to detect the occurrence of a camera pan.
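One hedged way to test for this pan signature is sketched below, using the same hypothetical block-grid layout of motion vectors and intra flags as before; the uniformity, speed, and intra-fraction thresholds are illustrative, not values from this disclosure.

```python
import numpy as np

def looks_like_pan(motion_vectors, intra_mask,
                   max_spread=1.5, min_speed=2.0, max_intra_fraction=0.2):
    """Heuristic pan detector for one frame (illustrative thresholds).

    motion_vectors: (rows, cols, 2) array of (dy, dx) per block.
    intra_mask:     (rows, cols) boolean array of intra-coded blocks.

    A pan shows up as motion vectors of relatively uniform magnitude and
    direction over most of the frame, with only a modest number of intra
    blocks (the new content entering along one picture boundary).
    """
    inter_mvs = motion_vectors[~intra_mask]          # vectors of inter-coded blocks only
    if inter_mvs.size == 0:
        return False
    mean_mv = inter_mvs.mean(axis=0)
    spread = np.linalg.norm(inter_mvs - mean_mv, axis=1).mean()
    return (np.linalg.norm(mean_mv) > min_speed          # the frame is actually moving
            and spread < max_spread                       # ...in a uniform way
            and intra_mask.mean() < max_intra_fraction)   # only boundary blocks are new
```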
A camera "zoom in" may be detected as a set of motion vectors forming a radial pattern with the motion vectors generally referencing towards the focal point of the zoom. Similarly, a camera "zoom ouf may be detected as a set of motion vectors forming a radial pattern with most of the motion vectors generally referencing away from the focal point of the zoom with a ring of intra-coded blocks and/or inter- coded blocks having uncorrelated motion vectors around the outer boundary of the frame corresponding to the new information added to the field of view during the camera zoom out. Patterns within the motion vector field can also be used to indicate the motion of a person/object within a scene, or the entrance or exit of a person/object to or from a scene. A person/object moving within the camera's field of view will be indicated by a region of similar (i.e., highly correlated) motion vectors that progress in a trajectory across several frames. A person/object entering or exiting from the edge of the field of view may be indicated by a growing or shrinking region of correlated motion vectors at the corresponding picture boundary. A person/object entering or exiting, e.g., from a doorway, within the field of view will likely be indicated by a growing or shrinking region of motion vectors forming an inward-pointing or outward-pointing radial pattern across a series of frames. Here, too, spatial and temporal patterns of motion vectors and inter/intra block type fields can be used to detect these different situations. A sequence of frames having a motion vector pattern in which almost all motion vectors are zero motion vectors (i.e., corresponding to essentially no motion across the image) can be used detect the occurrence of still text, slides, and pictures that occupy the entire video frame, or a general lack of moving objects in the scene.
Motion vector data could also be used to guide noise reduction and edge enhancement processing. If a block is stationary over several frames, the co-located blocks can be averaged together for temporal noise reduction. If motion exists, such averaging would result in unacceptable motion blur. On the other hand, noise reduction could still be achieved by averaging after taking the motion into consideration. In effect, the motion vector data substitutes for the motion detection in a motion-adaptive noise reduction algorithm. Similarly, motion vector data can be used to implement temporal edge enhancement techniques. In addition, knowledge of the coarseness of the actual quantization matrices can be used to constrain enhancement processing.
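As a rough illustration of letting the motion vector data stand in for explicit motion detection, the following sketch averages co-located blocks only when the bitstream's motion vectors report the block as stationary. The history format and motion threshold are assumptions, and a fuller version would average along the motion trajectory instead of skipping moving blocks.

```python
import numpy as np

def temporal_denoise_block(block_history, mv_history, motion_eps=0.5):
    """Motion-adaptive temporal averaging for one block position (illustrative).

    block_history: list of (N, N) pixel blocks for the same position over
                   several recent frames, most recent last.
    mv_history:    list of (dy, dx) motion vectors reported by the bitstream
                   for that block in the same frames.

    If the bitstream's motion vectors say the block has been stationary,
    average the co-located blocks to reduce temporal noise; otherwise return
    the current block unchanged to avoid motion blur.
    """
    stationary = all(np.hypot(dy, dx) < motion_eps for dy, dx in mv_history)
    if stationary:
        return np.mean(np.stack(block_history, axis=0), axis=0)
    return block_history[-1]
```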
One way to characterize the various patterns of motion vectors would be to have a set of canned motion vector patterns that are convolved over the decoded motion vector field (e.g., taking vector inner products along the way) to generate a correlation value (e.g., an average inner product) that could be compared to a threshold value to determine whether the motion vector field possesses a similar general pattern. Another technique would be to use statistical analysis (e.g., the mean and/or standard deviation of the motion vector data) over either the entire picture or specific regions to characterize the presence of high-level events in the scene. For example, a set of contiguous blocks having a large mean motion vector and a small standard deviation within an otherwise stationary picture suggests the presence of a moving object within the scene.
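These two approaches might look roughly like the sketch below: a normalized inner-product match against a canned template field, plus per-region mean/standard-deviation statistics. The array layout, normalization, and any thresholds applied to the resulting scores are illustrative assumptions.

```python
import numpy as np

def pattern_match_score(motion_vectors, template):
    """Average normalized inner product between the frame's motion-vector
    field and a canned template pattern with the same block layout.

    Both arrays have shape (rows, cols, 2). A score near 1.0 means the field
    closely follows the template's general pattern; the comparison threshold
    would be chosen per application.
    """
    mv = motion_vectors / (np.linalg.norm(motion_vectors, axis=-1, keepdims=True) + 1e-6)
    tp = template / (np.linalg.norm(template, axis=-1, keepdims=True) + 1e-6)
    return float((mv * tp).sum(axis=-1).mean())

def region_statistics(motion_vectors, region_mask):
    """Mean magnitude and standard deviation of the motion vectors inside a
    region of interest; a large mean with a small deviation in an otherwise
    stationary picture suggests a coherently moving object."""
    mags = np.linalg.norm(motion_vectors[region_mask], axis=-1)
    if mags.size == 0:
        return 0.0, 0.0
    return float(mags.mean()), float(mags.std())
```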
Quantized DCT Coefficients
In certain embodiments of the present invention, it may be useful to decode the compressed digital video bitstream sufficiently to extract the quantized DCT coefficients. Depending on the application, it may be further useful to dequantize the extracted quantized DCT coefficients, although even knowledge of the presence of non-zero quantized DCT coefficients (which emerges from the number and type of run-length/value joint codes and the location of the end-of-block (EOB) codes) may be enough to provide insight into picture content. In either case, the DCT coefficients can be used to characterize the spatial frequency within each frame as well as the temporal changes in spatial frequency between frames. This information can be used to characterize certain types of picture content. For example, if the DC coefficient of the (B-Y) component DCT block is large in most blocks in the upper third of a frame, it probably corresponds to sky. As another example, in an inter-frame encoded block, low-energy DCT coefficients indicate no substantial change, while high-energy DCT coefficients may indicate a change of shape or texture within the block.
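Two toy measures in this spirit are sketched below, assuming access to (de)quantized 8x8 DCT blocks and to a plane of chroma-blue DC coefficients; the "sky" threshold and the upper-third rule follow the example above loosely and are not calibrated values.

```python
import numpy as np

def block_ac_energy(dct_block):
    """Sum of squared AC coefficients of one 8x8 (de)quantized DCT block.

    In an inter-coded block, low values suggest little change and high values
    a change of shape or texture (illustrative measure)."""
    return float((dct_block ** 2).sum() - dct_block[0, 0] ** 2)

def probably_sky(cb_dc_plane, dc_threshold=100.0):
    """Rough 'sky' test: are the (B-Y) DC coefficients large in most blocks of
    the upper third of the frame?

    cb_dc_plane: (rows, cols) array of chroma-blue DC coefficients per block.
    dc_threshold: illustrative assumption, not a calibrated value."""
    upper = cb_dc_plane[: max(1, cb_dc_plane.shape[0] // 3)]
    return float((upper > dc_threshold).mean()) > 0.5
```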
For example, text appearing in an image (either over the entire field of view or just within a region of the image) usually exhibits the combination of having high contrast, being monochromatic, and having many edges. This unusual combination of characteristics may be indicated in the DCT coefficients as high-energy, high-frequency DCT coefficients in many orientations with few or even no non-zero quantized coefficients in the corresponding U and V blocks.
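A hedged per-block text heuristic along these lines might look as follows, assuming 8x8 (de)quantized DCT blocks for the Y, U (Cb), and V (Cr) components; the frequency cut-off and energy thresholds are illustrative assumptions.

```python
import numpy as np

def looks_like_text_block(y_dct, cb_dct, cr_dct, hf_energy_min=5000.0, chroma_max=2):
    """Heuristic text test for one block position (illustrative thresholds).

    Text tends to produce high-energy, high-frequency luma (Y) coefficients in
    many orientations while being nearly monochromatic, i.e. few or no
    non-zero quantized coefficients in the corresponding chroma (U/V) blocks.
    """
    u, v = np.mgrid[0:8, 0:8]
    high_freq = (u + v) >= 4                       # crude high-frequency region of the block
    hf_energy = float((y_dct[high_freq] ** 2).sum())
    chroma_nonzero = int(np.count_nonzero(cb_dct)) + int(np.count_nonzero(cr_dct))
    return hf_energy > hf_energy_min and chroma_nonzero <= chroma_max
```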
Similarly, the existence of a raster display in a video sequence (e.g., an image in which an active television or computer display is within the field of view) may be detected by the presence of a temporal beat frequency between the raster display frame rate and the frame rate of the video sequence. Both inter/intra block type fields and DCT level spatio-temporal frequency analysis can contribute to this detection.
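One way such a beat might be looked for is as a strong peak in the temporal spectrum of a region's DCT energy, as in the sketch below; the input (a per-frame energy series for a candidate region) and the peak-ratio threshold are assumptions rather than a method specified in this disclosure.

```python
import numpy as np

def has_temporal_beat(region_energy_series, frame_rate, min_peak_ratio=5.0):
    """Look for a strong periodic component (a beat between the raster's
    refresh rate and the capture frame rate) in a region's per-frame DCT
    energy.

    region_energy_series: 1-D array of that energy over consecutive frames.
    Returns (beat_detected, beat_frequency_hz)."""
    x = np.asarray(region_energy_series, dtype=float)
    x = x - x.mean()                                   # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    if spectrum[1:].sum() == 0:
        return False, 0.0
    peak = int(np.argmax(spectrum[1:])) + 1            # strongest non-DC bin
    freqs = np.fft.rfftfreq(x.size, d=1.0 / frame_rate)
    ratio = spectrum[peak] / (np.median(spectrum[1:]) + 1e-9)
    return ratio > min_peak_ratio, float(freqs[peak])
```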
The present invention may be implemented as circuit-based processes, including possible implementation on a single integrated circuit. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing steps in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims.

Claims

CLAIMS
What is claimed is:
1. A method for characterizing picture content of a compressed video bitstream, comprising the steps of: (a) partially decoding the compressed video bitstream to extract particular data for the compressed video bitstream; and
(b) analyzing the extracted particular data to characterize the picture content of the compressed video bitstream.
2. The invention of claim 1, wherein: the extracted particular data comprises inter/intra block type data; and step (b) comprises the steps of counting a number of intra-frame blocks per frame and comparing the number of intra-frame blocks to a specified threshold to detect changes in the picture content.
3. The invention of claim 1, wherein the extracted particular data comprises motion vector data.
4. The invention of claim 3, wherein the extracted particular data further comprises inter/intra block type data.
5. The invention of claim 3, wherein step (b) comprises the step of characterizing a pattern in the motion vector data to detect changes in the picture content.
6. The invention of claim 3, wherein step (b) comprises the step of using the motion vector data to guide noise reduction processing.
7. The invention of claim 3, wherein step (b) comprises the step of using the motion vector data to guide edge enhancement processing.
8. The invention of claim 1, wherein the extracted particular data comprises DCT coefficient data.
9. The invention of claim 8, wherein the extracted particular data further comprises inter/intra block type data.
10. The invention of claim 8, wherein step (b) comprises the step of characterizing the DCT coefficient data to characterize the picture content.
PCT/US2001/006094 2000-02-24 2001-02-26 Compressed video analysis WO2001063937A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2001562030A JP2003533906A (en) 2000-02-24 2001-02-26 Compressed video analysis
EP01913054A EP1258146A2 (en) 2000-02-24 2001-02-26 Compressed video analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US51240600A 2000-02-24 2000-02-24
US09/512,406 2000-02-24

Publications (2)

Publication Number Publication Date
WO2001063937A2 true WO2001063937A2 (en) 2001-08-30
WO2001063937A3 WO2001063937A3 (en) 2002-01-31

Family

ID=24038957

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/006094 WO2001063937A2 (en) 2000-02-24 2001-02-26 Compressed video analysis

Country Status (3)

Country Link
EP (1) EP1258146A2 (en)
JP (1) JP2003533906A (en)
WO (1) WO2001063937A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG140441A1 (en) * 2003-03-17 2008-03-28 St Microelectronics Asia Decoder and method of decoding using pseudo two pass decoding and one pass encoding
US8004563B2 (en) 2002-07-05 2011-08-23 Agent Vi Method and system for effectively performing event detection using feature streams of image sequences
US20130279882A1 (en) * 2012-04-23 2013-10-24 Apple Inc. Coding of Video and Audio with Initialization Fragments
US9330426B2 (en) 2010-09-30 2016-05-03 British Telecommunications Public Limited Company Digital video fingerprinting
US9369668B2 (en) 2014-03-14 2016-06-14 Cisco Technology, Inc. Elementary video bitstream analysis

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4747681B2 (en) * 2005-05-31 2011-08-17 パナソニック株式会社 Digital broadcast receiver
JP4743601B2 (en) * 2005-09-21 2011-08-10 Kddi株式会社 Moving image processing device
CN102611891B (en) * 2012-02-07 2014-05-07 中国电子科技集团公司第三研究所 Method for directly performing transform coding in transform domain

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998052356A1 (en) * 1997-05-16 1998-11-19 The Trustees Of Columbia University In The City Of New York Methods and architecture for indexing and editing compressed video over the world wide web
US5911008A (en) * 1996-04-30 1999-06-08 Nippon Telegraph And Telephone Corporation Scheme for detecting shot boundaries in compressed video data using inter-frame/inter-field prediction coding and intra-frame/intra-field coding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5911008A (en) * 1996-04-30 1999-06-08 Nippon Telegraph And Telephone Corporation Scheme for detecting shot boundaries in compressed video data using inter-frame/inter-field prediction coding and intra-frame/intra-field coding
WO1998052356A1 (en) * 1997-05-16 1998-11-19 The Trustees Of Columbia University In The City Of New York Methods and architecture for indexing and editing compressed video over the world wide web

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BOYCE J M: "NOISE REDUCTION OF IMAGE SEQUENCES USING ADAPTIVE MOTION COMPENSATED FRAME AVERAGING" MULTIDIMENSIONAL SIGNAL PROCESSING. SAN FRANCISCO, MAR. 23 - 26, 1992, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 3 CONF. 17, 23 March 1992 (1992-03-23), pages 461-464, XP000378968 ISBN: 0-7803-0532-9 *
HONGJIANG ZHANG ET AL: "VIDEO PARSING AND BROWSING USING COMPRESSED DATA" MULTIMEDIA TOOLS AND APPLICATIONS, KLUWER ACADEMIC PUBLISHERS, BOSTON, US, vol. 1, 1995, pages 89-111, XP000571810 ISSN: 1380-7501 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8004563B2 (en) 2002-07-05 2011-08-23 Agent Vi Method and system for effectively performing event detection using feature streams of image sequences
SG140441A1 (en) * 2003-03-17 2008-03-28 St Microelectronics Asia Decoder and method of decoding using pseudo two pass decoding and one pass encoding
US9330426B2 (en) 2010-09-30 2016-05-03 British Telecommunications Public Limited Company Digital video fingerprinting
US20130279882A1 (en) * 2012-04-23 2013-10-24 Apple Inc. Coding of Video and Audio with Initialization Fragments
US10264274B2 (en) 2012-04-23 2019-04-16 Apple Inc. Coding of video and audio with initialization fragments
US10992946B2 (en) 2012-04-23 2021-04-27 Apple Inc. Coding of video and audio with initialization fragments
US9369668B2 (en) 2014-03-14 2016-06-14 Cisco Technology, Inc. Elementary video bitstream analysis

Also Published As

Publication number Publication date
WO2001063937A3 (en) 2002-01-31
EP1258146A2 (en) 2002-11-20
JP2003533906A (en) 2003-11-11

Similar Documents

Publication Publication Date Title
EP1709801B1 (en) Video Decoding Method Using Adaptive Quantization Matrices
CA2316848C (en) Improved video coding using adaptive coding of block parameters for coded/uncoded blocks
CA2374067C (en) Method and apparatus for generating compact transcoding hints metadata
US6175593B1 (en) Method for estimating motion vector in moving picture
US7738550B2 (en) Method and apparatus for generating compact transcoding hints metadata
US5491523A (en) Image motion vector detecting method and motion vector coding method
US5508744A (en) Video signal compression with removal of non-correlated motion vectors
EP1021041B1 (en) Methods of scene fade detection for indexing of video sequences
US20100110303A1 (en) Look-Ahead System and Method for Pan and Zoom Detection in Video Sequences
EP1135934A1 (en) Efficient macroblock header coding for video compression
EP2536143B1 (en) Method and a digital video encoder system for encoding digital video data
JP2001197501A (en) Motion vector searching device and motion vector searching method, and moving picture coder
US6480543B1 (en) Detection of a change of scene in a motion estimator of a video encoder
US5699129A (en) Method and apparatus for motion vector determination range expansion
EP1446957A1 (en) Feature extraction and detection of events and temporal variations in activity in video sequences
Tsai et al. Block-matching motion estimation using correlation search algorithm
US6847684B1 (en) Zero-block encoding
KR20040060980A (en) Method and system for detecting intra-coded pictures and for extracting intra DCT precision and macroblock-level coding parameters from uncompressed digital video
EP1258146A2 (en) Compressed video analysis
US9654775B2 (en) Video encoder with weighted prediction and methods for use therewith
US8472523B2 (en) Method and apparatus for detecting high level white noise in a sequence of video frames
JP2002064823A (en) Apparatus and method for detecting scene change of compressed dynamic image as well as recording medium recording its program
WO2000027128A1 (en) Methods and apparatus for improved motion estimation for video encoding
Li et al. A robust, efficient, and fast global motion estimation method from MPEG compressed video
Fenimore et al. Test patterns and quality metrics for digital video compression

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): BR JP

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): BR JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWE Wipo information: entry into national phase

Ref document number: 2001913054

Country of ref document: EP

ENP Entry into the national phase in:

Ref country code: JP

Ref document number: 2001 562030

Kind code of ref document: A

Format of ref document f/p: F

WWP Wipo information: published in national office

Ref document number: 2001913054

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001913054

Country of ref document: EP