WO2001063937A2 - Compressed video analysis - Google Patents
- Publication number
- WO2001063937A2 (application PCT/US2001/006094)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- motion vector
- inter
- intra
- frame
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/142—Detection of scene cut or scene change
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/179—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/48—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/87—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
Definitions
- the present invention relates to image processing, and, in particular, to the analysis of the content of a compressed video bitstream.
- Compressed digital video standards such as H.261, MPEG-1, and MPEG-2, are on the verge of rapid deployment and proliferation in applications that include video teleconferencing, distributed multimedia systems, and television broadcasting. Unlike analog video signals, digital video signals employ many levels of data representation in order to effect their high compression rates.
- Typical operations that are performed in a video compression encoder include:
  - Organization of original image pixels as blocks of pixels;
  - Motion estimation during which blocks of pixels from surrounding frames are correlated with each block of the current frame to find a "best prediction," which is then encoded as a motion vector;
  - Motion compensation during which the residual interframe differences are generated between each current block and the corresponding "best prediction" block;
  - DCT transformation during which a discrete cosine transformation is applied to each block of residual interframe differences;
  - Quantization during which the resulting DCT transform coefficients are quantized;
  - Run-length encoding during which the resulting quantized DCT coefficients are run-length/value encoded;
  - Generation of a series of instructions that indicate the start of a block, the motion vector used to predict that block, the run/length value data for the quantized DCT coefficients of the residual interframe differences, and the end of the block; and
  - Variable-length encoding during which the instructions are variable-length coded (VLC) according to tables defined in the corresponding
- One way to achieve this goal is to fully decode the compressed video bitstream to the decoded pixel domain to generate a corresponding decoded video stream, which can then be analyzed "manually" by a human operator or "automatically" using conventional analysis tools that identify the locations of transitions between scenes in the video stream.
- the present invention is directed to techniques for analyzing the content of compressed video bitstreams without having first to fully decode the bitstream to the decoded pixel domain.
- a compressed video bitstream is only partially decoded (e.g., just enough to extract the motion vector data) and the partially decoded data is then analyzed to characterize the content of the bitstream.
- the present invention is a method for characterizing picture content of a compressed video bitstream, comprising the steps of: (a) partially decoding the compressed video bitstream to extract particular data for the compressed video bitstream; and (b) analyzing the extracted particular data to characterize the picture content of the compressed video bitstream.
- FIG. 1 is a flow diagram of processing, according to one embodiment of the present invention.
- Compressed digital video signals can be analyzed to extract key high-level events that occur in their picture content, thus making it possible to monitor a compressed digital video bitstream for purposes such as cataloging, alerting, and/or key frame extraction.
- Processes can be developed to directly process the compressed bitstream to extract information such as (but not necessarily limited to): (1) Scene changes;
- a compressed digital video bitstream (such as a teleconferencing bitstream) can be a rich source of information.
- the compressed syntax-level representation contains clues to key high-level events that occur in the picture content.
- Fig. 1 shows a block diagram of the processing of a compressed digital video bitstream, according to one embodiment of the present invention.
- the compressed digital video bitstream is received (step 102 in Fig. 1) and partially decoded (step 104), for example, just enough to extract the motion vector data represented in the bitstream for each frame.
- this partial decoding may involve only the variable-length decoding of bitstream data and extraction of the motion vector data from the resulting variable-length decoded information.
- substantial computational advantage is gained by (a) avoiding the inverse DCT transform and (b) avoiding the recomputation of motion vectors or other data that is extractable from the compressed domain.
- storage requirements are significantly reduced by not having to store fully decoded pixel data
- the extracted data is analyzed to characterize the content of the compressed digital video bitstream (step 106).
- the type of analysis performed and the nature of the content characterized will vary from application to application. Some of these different applications are described below.
- appropriate subsequent processing may be performed (step 108) based on the characterized bitstream content. This subsequent processing may include cataloging the various scenes in the bitstream or any other suitable processing.
- a frame is predicted from another (i.e., reference) frame in the video stream by computing a motion vector for each block of data that best predicts it from the reference frame. If there is no good predictor from the reference frame, a block of data may instead be intra-frame coded (i.e., encoded without reference to any other frame and therefore without a motion vector being assigned).
- information as to which blocks in the current frame have been encoded using intra-frame coding and which blocks have been encoded using inter-frame coding as well as the magnitudes and directions of the motion vectors used during the inter-frame coding may be analyzed to characterize the content of the compressed bitstream.
- other information such as the DCT coefficients, may be analyzed to characterize bitstream content.
- the block may be encoded using intra-frame coding.
- a given frame may be encoded with both intra-frame coded blocks and inter-frame coded blocks.
- the relative frequencies of intra-frame and inter-frame coded blocks per frame can be used to indicate certain types of changes in the picture content. In particular, if the number of intra-frame coded blocks in a current frame exceeds a specified threshold, then this may be an indication of the occurrence of a scene change or camera switch in the compressed bitstream.
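The intra-block threshold test above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the per-frame list of block type labels (`"I"` for intra, anything else for inter), the function names, and the use of a fraction with a default of 0.6 instead of a raw block count are all assumptions. Frames that are intra-coded by design (for example MPEG I-pictures) would have to be skipped before applying it.

```python
from typing import List, Sequence


def intra_fraction(block_types: Sequence[str]) -> float:
    """Fraction of the blocks in one frame that are intra-coded ("I")."""
    if not block_types:
        return 0.0
    return sum(1 for t in block_types if t == "I") / len(block_types)


def detect_scene_changes(frames: Sequence[Sequence[str]],
                         threshold: float = 0.6) -> List[int]:
    """Indices of frames whose intra fraction exceeds the (assumed) threshold."""
    return [i for i, blocks in enumerate(frames)
            if intra_fraction(blocks) > threshold]


if __name__ == "__main__":
    frames = [
        ["P"] * 10,              # ordinary predicted frame
        ["I"] * 8 + ["P"] * 2,   # mostly intra blocks: candidate scene change
        ["P"] * 9 + ["I"],       # one stray intra block
    ]
    print(detect_scene_changes(frames))
```

A fraction is used here so the same threshold applies at different picture sizes; a raw count, as the text describes, works equally well when the block count per frame is fixed.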
- the locations, relative magnitudes, and directions of the set of motion vectors for a given frame can be used as an indication of temporal changes in the picture content of the compressed bitstream, especially when these patterns continue over multiple consecutive frames.
- Such motion vector pattern analysis can be used to distinguish different types of changes in picture content.
- the motion vectors for most of the current frame will have relatively uniform magnitude and direction, with the new information being represented either as intra-frame coded blocks or as inter-frame coded blocks with possibly uncorrelated motion vectors.
- Such a pattern of motion vectors and inter/intra block types can be used to detect the occurrence of a camera pan.
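One possible way to test for the pan pattern described above is to check whether most blocks agree with a single dominant, non-zero motion vector. The sketch below is only an illustration under stated assumptions: intra-coded blocks are represented as `None`, the dominant vector is taken as the component-wise median, and the tolerance and agreement defaults are hypothetical values, none of which come from the source.

```python
from statistics import median
from typing import List, Optional, Tuple

Vector = Tuple[float, float]


def looks_like_pan(vectors: List[Optional[Vector]],
                   min_magnitude: float = 1.0,
                   tolerance: float = 1.0,
                   min_agreement: float = 0.8) -> bool:
    """True if most blocks share one dominant non-zero motion vector.

    `vectors` holds one entry per block; None marks an intra-coded block.
    """
    inter = [v for v in vectors if v is not None]
    if not inter:
        return False
    # Component-wise median as a robust estimate of the dominant motion.
    mx = median(v[0] for v in inter)
    my = median(v[1] for v in inter)
    if (mx * mx + my * my) ** 0.5 < min_magnitude:
        return False  # dominant motion is (near) zero: not a pan
    agreeing = sum(1 for dx, dy in inter
                   if abs(dx - mx) <= tolerance and abs(dy - my) <= tolerance)
    # Intra blocks count against agreement, so a frame of mostly new
    # content is not reported as a pan.
    return agreeing / len(vectors) >= min_agreement
```

In practice the result would be required to persist over several consecutive frames, in line with the earlier remark that patterns continuing over multiple frames are the more reliable indication.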
- a camera "zoom in" may be detected as a set of motion vectors forming a radial pattern with the motion vectors generally referencing towards the focal point of the zoom.
- a camera "zoom out" may be detected as a set of motion vectors forming a radial pattern with most of the motion vectors generally referencing away from the focal point of the zoom with a ring of intra-coded blocks and/or inter-coded blocks having uncorrelated motion vectors around the outer boundary of the frame corresponding to the new information added to the field of view during the camera zoom out.
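The radial patterns described for zoom in and zoom out can be scored by comparing each motion vector with the outward direction from a candidate focal point. The following is a rough sketch only: the dictionary layout (block position mapped to motion vector), the caller-supplied focal point, and the 0.7 threshold are assumptions, and the ring of intra-coded blocks mentioned for zoom out is not checked.

```python
import math
from typing import Dict, Tuple

Field = Dict[Tuple[int, int], Tuple[float, float]]


def radial_score(field: Field, center: Tuple[float, float]) -> float:
    """Mean cosine between each motion vector and the outward radial direction.

    Close to -1: vectors point towards `center`; close to +1: away from it.
    """
    total = 0.0
    count = 0
    for (bx, by), (dx, dy) in field.items():
        rx, ry = bx - center[0], by - center[1]
        r = math.hypot(rx, ry)
        m = math.hypot(dx, dy)
        if r == 0 or m == 0:
            continue  # direction undefined at the center or for zero vectors
        total += (dx * rx + dy * ry) / (r * m)
        count += 1
    return total / count if count else 0.0


def classify_zoom(field: Field, center: Tuple[float, float],
                  threshold: float = 0.7) -> str:
    """Label the field "zoom in", "zoom out", or "none" (assumed threshold)."""
    score = radial_score(field, center)
    if score <= -threshold:
        return "zoom in"   # vectors reference towards the focal point
    if score >= threshold:
        return "zoom out"  # vectors reference away from the focal point
    return "none"
```

A uniform field, such as the one produced by a pan, scores near zero because the inward and outward contributions cancel around the focal point.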
- Patterns within the motion vector field can also be used to indicate the motion of a person/object within a scene, or the entrance or exit of a person/object to or from a scene.
- a person/object moving within the camera's field of view will be indicated by a region of similar (i.e., highly correlated) motion vectors that progress in a trajectory across several frames.
- a person/object entering or exiting from the edge of the field of view may be indicated by a growing or shrinking region of correlated motion vectors at the corresponding picture boundary.
- a person/object entering or exiting, e.g., from a doorway, within the field of view will likely be indicated by a growing or shrinking region of motion vectors forming an inward-pointing or outward-pointing radial pattern across a series of frames.
- spatial and temporal patterns of motion vectors and inter/intra block type fields can be used to detect these different situations.
- a sequence of frames having a motion vector pattern in which almost all motion vectors are zero motion vectors can be used to detect the occurrence of still text, slides, and pictures that occupy the entire video frame, or a general lack of moving objects in the scene.
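The zero-motion-vector test above might be sketched as follows. The 95% zero fraction, the minimum run length, and the function names are hypothetical choices for illustration; the source only says "almost all" vectors are zero over "a sequence of frames".

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def is_still_frame(vectors: Sequence[Vector],
                   zero_fraction: float = 0.95) -> bool:
    """True if at least `zero_fraction` of the motion vectors are (0, 0)."""
    if not vectors:
        return False
    zeros = sum(1 for dx, dy in vectors if dx == 0 and dy == 0)
    return zeros / len(vectors) >= zero_fraction


def still_runs(frames: Sequence[Sequence[Vector]],
               min_length: int = 10,
               zero_fraction: float = 0.95) -> List[Tuple[int, int]]:
    """Inclusive (start, end) frame index pairs of sufficiently long still runs."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, vectors in enumerate(frames):
        if is_still_frame(vectors, zero_fraction):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frames) - start >= min_length:
        runs.append((start, len(frames) - 1))
    return runs
```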
- Motion vector data could also be used to guide noise reduction and edge enhancement processing. If a block is stationary over several frames, these blocks can be averaged together for temporal noise reduction. If motion exists, such averaging would result in unacceptable motion blur. On the other hand, noise reduction could be achieved by averaging after taking motion into consideration. In effect, the motion vector data substitutes for the motion detection in a motion-adaptive noise reduction algorithm. Similarly, motion vector data can be used to implement temporal edge enhancement techniques. In addition, the knowledge of the coarseness of the actual quantization matrices can be used to constrain enhancement processing.
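The stationary-block case of the noise reduction idea above can be illustrated with a short sketch. It assumes, hypothetically, that the co-located block from each of several consecutive frames is available as a flat list of decoded pixel values together with that block's motion vector per frame; the motion-compensated variant mentioned in the text is not shown.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, int]


def temporal_average(blocks: Sequence[Sequence[float]],
                     vectors: Sequence[Vector]) -> List[float]:
    """Average co-located blocks over time only when the block is stationary.

    `blocks[i]` is the decoded pixel data of the block in frame i and
    `vectors[i]` is its motion vector in that frame (assumed inputs).
    """
    if not blocks or len(blocks) != len(vectors):
        raise ValueError("need one motion vector per block")
    if any(v != (0, 0) for v in vectors):
        # Motion present: plain averaging would blur, so pass the
        # most recent block through unchanged.
        return list(blocks[-1])
    n = len(blocks)
    return [sum(pixels) / n for pixels in zip(*blocks)]
```

Here the motion vectors stand in for a separate motion detector, which is the substitution the text describes.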
- One way to characterize the various patterns of motion vectors would be to have a set of canned motion vector patterns that would be convolved over the decoded motion vector field (e.g., taking vector inner products along the way) to generate a correlation value (e.g., average inner product) that could be compared to a threshold value to determine whether the motion vector field possessed a similar general pattern.
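The template approach above can be sketched as a sliding average inner product. In this illustration the motion vector field and the canned pattern are assumed to be 2D lists of `(dx, dy)` tuples, the template is assumed to be no larger than the field, and the function names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, float]
Grid = List[List[Vector]]


def average_inner_product(window: Grid, template: Grid) -> float:
    """Mean of the vector inner products between two equally sized grids."""
    total = 0.0
    count = 0
    for wrow, trow in zip(window, template):
        for (wx, wy), (tx, ty) in zip(wrow, trow):
            total += wx * tx + wy * ty
            count += 1
    return total / count if count else 0.0


def best_match(field: Grid, template: Grid) -> Tuple[float, Tuple[int, int]]:
    """Slide the template over the field; return (best score, (row, col))."""
    fh, fw = len(field), len(field[0])
    th, tw = len(template), len(template[0])
    best = (float("-inf"), (0, 0))
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = [row[c:c + tw] for row in field[r:r + th]]
            score = average_inner_product(window, template)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

The returned score would then be compared with a threshold to decide whether the field contains the pattern; normalizing vectors to unit length first is one way to make that threshold independent of motion magnitude.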
- Another technique would be to use statistical analysis (e.g., mean and/or standard deviation of motion vector data) over either the entire picture or specific regions to characterize the presence of high-level events in the scene. For example, a set of contiguous blocks having a large mean motion vector and a small standard deviation within an otherwise stationary picture suggests the presence of a moving object within the scene.
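The statistical test in the example above could look like the following sketch. The split into a candidate region and a background, the use of the RMS deviation from the mean vector as the "standard deviation", and all three threshold defaults are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def region_stats(vectors: Sequence[Vector]) -> Tuple[float, float]:
    """Return (magnitude of the mean vector, RMS deviation from that mean)."""
    n = len(vectors)
    if n == 0:
        raise ValueError("region must contain at least one vector")
    mx = sum(dx for dx, _ in vectors) / n
    my = sum(dy for _, dy in vectors) / n
    variance = sum((dx - mx) ** 2 + (dy - my) ** 2 for dx, dy in vectors) / n
    return math.hypot(mx, my), math.sqrt(variance)


def suggests_moving_object(region: Sequence[Vector],
                           background: Sequence[Vector],
                           min_mean: float = 2.0,
                           max_std: float = 1.0,
                           max_background: float = 0.5) -> bool:
    """Large, coherent motion in the region within an otherwise still picture."""
    region_mean, region_std = region_stats(region)
    background_mean, _ = region_stats(background)
    return (region_mean >= min_mean
            and region_std <= max_std
            and background_mean <= max_background)
```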
- the DCT coefficients can be used to characterize the spatial frequency within each frame as well as the temporal changes in spatial frequency between frames. This information can be used to characterize certain types of picture content.
- if the DC coefficient of the (B-Y) component DCT block is large in most blocks in the upper third of a frame, that region probably corresponds to sky.
- low-energy DCT coefficients indicate no substantial change, while high-energy DCT coefficients may indicate a change of shape or texture within the block.
- text appearing in an image usually exhibits the combination of having high contrast, being monochromatic, and having many edges.
- This unusual combination of characteristics may be indicated in the DCT coefficients as high-energy, high-frequency DCT coefficients in many orientations with few or even no non-zero quantized coefficients in the corresponding U and V blocks.
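The DCT signature of text described above might be tested per block along the following lines. This is a simplified sketch under several assumptions: blocks are 8x8 arrays of quantized coefficients indexed `block[v][u]` (vertical, then horizontal frequency), "many orientations" is approximated by requiring high-frequency energy in both the horizontal and the vertical direction, the chroma DC term is ignored, and the cutoff and thresholds are hypothetical.

```python
from typing import List, Tuple

Block = List[List[float]]  # 8x8 quantized DCT coefficients, block[v][u]


def oriented_energy(block: Block, cutoff: int = 4) -> Tuple[float, float]:
    """High-frequency energy split into (horizontal, vertical) parts."""
    horizontal = 0.0
    vertical = 0.0
    for v in range(8):
        for u in range(8):
            if u + v < cutoff or u == v:
                continue  # skip low frequencies and the pure diagonal
            energy = block[v][u] ** 2
            if u > v:
                horizontal += energy
            else:
                vertical += energy
    return horizontal, vertical


def nonzero_ac_count(block: Block) -> int:
    """Number of non-zero coefficients, excluding the DC term."""
    return sum(1 for v in range(8) for u in range(8)
               if (u, v) != (0, 0) and block[v][u] != 0)


def looks_like_text(y_block: Block, u_block: Block, v_block: Block,
                    min_energy: float = 1000.0,
                    max_chroma_nonzero: int = 1) -> bool:
    """Busy luma in more than one orientation with nearly empty chroma."""
    horizontal, vertical = oriented_energy(y_block)
    return (horizontal >= min_energy
            and vertical >= min_energy
            and nonzero_ac_count(u_block) <= max_chroma_nonzero
            and nonzero_ac_count(v_block) <= max_chroma_nonzero)
```

A frame-level decision would aggregate this over many neighboring blocks, since a single block matching the signature is weak evidence.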
- a raster display in a video sequence may be detected by the presence of a temporal beat frequency between the raster display frame rate and the frame rate of the video sequence.
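As a rough aid for the beat-frequency idea above, the expected beat can be estimated with the standard aliasing relation: a display refreshing at `display_hz`, sampled at `video_fps`, appears at the distance between `display_hz` and the nearest integer multiple of `video_fps`. The source does not give a formula, so this relation, the function name, and its parameters are assumptions for illustration.

```python
import math


def beat_frequency(display_hz: float, video_fps: float) -> float:
    """Apparent (aliased) frequency of a display refresh sampled at video_fps."""
    if display_hz <= 0 or video_fps <= 0:
        raise ValueError("rates must be positive")
    # Nearest integer multiple of the sampling rate.
    n = math.floor(display_hz / video_fps + 0.5)
    return abs(display_hz - n * video_fps)


if __name__ == "__main__":
    # A 60 Hz display captured at 29.97 frames/s beats at about 0.06 Hz.
    print(beat_frequency(60.0, 29.97))
```

A detector would then look for periodic variation at roughly this rate in some per-region measure across frames, for example DC coefficient values.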
- the present invention may be implemented as circuit-based processes, including possible implementation on a single integrated circuit.
- various functions of circuit elements may also be implemented as processing steps in a software program.
- Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
- the present invention can be embodied in the form of methods and apparatuses for practicing those methods.
- the present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- the present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
- Television Signal Processing For Recording (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001562030A JP2003533906A (en) | 2000-02-24 | 2001-02-26 | Compressed video analysis |
EP01913054A EP1258146A2 (en) | 2000-02-24 | 2001-02-26 | Compressed video analysis |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US51240600A | 2000-02-24 | 2000-02-24 | |
US09/512,406 | 2000-02-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001063937A2 | 2001-08-30 |
WO2001063937A3 | 2002-01-31 |
Family
ID=24038957
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/006094 WO2001063937A2 (en) | 2000-02-24 | 2001-02-26 | Compressed video analysis |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1258146A2 (en) |
JP (1) | JP2003533906A (en) |
WO (1) | WO2001063937A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG140441A1 (en) * | 2003-03-17 | 2008-03-28 | St Microelectronics Asia | Decoder and method of decoding using pseudo two pass decoding and one pass encoding |
US8004563B2 (en) | 2002-07-05 | 2011-08-23 | Agent Vi | Method and system for effectively performing event detection using feature streams of image sequences |
US20130279882A1 (en) * | 2012-04-23 | 2013-10-24 | Apple Inc. | Coding of Video and Audio with Initialization Fragments |
US9330426B2 (en) | 2010-09-30 | 2016-05-03 | British Telecommunications Public Limited Company | Digital video fingerprinting |
US9369668B2 (en) | 2014-03-14 | 2016-06-14 | Cisco Technology, Inc. | Elementary video bitstream analysis |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4747681B2 (en) * | 2005-05-31 | 2011-08-17 | パナソニック株式会社 | Digital broadcast receiver |
JP4743601B2 (en) * | 2005-09-21 | 2011-08-10 | Kddi株式会社 | Moving image processing device |
CN102611891B (en) * | 2012-02-07 | 2014-05-07 | 中国电子科技集团公司第三研究所 | Method for directly performing transform coding in transform domain |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998052356A1 (en) * | 1997-05-16 | 1998-11-19 | The Trustees Of Columbia University In The City Of New York | Methods and architecture for indexing and editing compressed video over the world wide web |
US5911008A (en) * | 1996-04-30 | 1999-06-08 | Nippon Telegraph And Telephone Corporation | Scheme for detecting shot boundaries in compressed video data using inter-frame/inter-field prediction coding and intra-frame/intra-field coding |
-
2001
- 2001-02-26 JP JP2001562030A patent/JP2003533906A/en active Pending
- 2001-02-26 WO PCT/US2001/006094 patent/WO2001063937A2/en not_active Application Discontinuation
- 2001-02-26 EP EP01913054A patent/EP1258146A2/en not_active Withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5911008A (en) * | 1996-04-30 | 1999-06-08 | Nippon Telegraph And Telephone Corporation | Scheme for detecting shot boundaries in compressed video data using inter-frame/inter-field prediction coding and intra-frame/intra-field coding |
WO1998052356A1 (en) * | 1997-05-16 | 1998-11-19 | The Trustees Of Columbia University In The City Of New York | Methods and architecture for indexing and editing compressed video over the world wide web |
Non-Patent Citations (2)
Title |
---|
BOYCE J M: "NOISE REDUCTION OF IMAGE SEQUENCES USING ADAPTIVE MOTION COMPENSATED FRAME AVERAGING" MULTIDIMENSIONAL SIGNAL PROCESSING. SAN FRANCISCO, MAR. 23 - 26, 1992, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 3 CONF. 17, 23 March 1992 (1992-03-23), pages 461-464, XP000378968 ISBN: 0-7803-0532-9 * |
HONGJIANG ZHANG ET AL: "VIDEO PARSING AND BROWSING USING COMPRESSED DATA" MULTIMEDIA TOOLS AND APPLICATIONS, KLUWER ACADEMIC PUBLISHERS, BOSTON, US, vol. 1, 1995, pages 89-111, XP000571810 ISSN: 1380-7501 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8004563B2 (en) | 2002-07-05 | 2011-08-23 | Agent Vi | Method and system for effectively performing event detection using feature streams of image sequences |
SG140441A1 (en) * | 2003-03-17 | 2008-03-28 | St Microelectronics Asia | Decoder and method of decoding using pseudo two pass decoding and one pass encoding |
US9330426B2 (en) | 2010-09-30 | 2016-05-03 | British Telecommunications Public Limited Company | Digital video fingerprinting |
US20130279882A1 (en) * | 2012-04-23 | 2013-10-24 | Apple Inc. | Coding of Video and Audio with Initialization Fragments |
US10264274B2 (en) | 2012-04-23 | 2019-04-16 | Apple Inc. | Coding of video and audio with initialization fragments |
US10992946B2 (en) | 2012-04-23 | 2021-04-27 | Apple Inc. | Coding of video and audio with initialization fragments |
US9369668B2 (en) | 2014-03-14 | 2016-06-14 | Cisco Technology, Inc. | Elementary video bitstream analysis |
Also Published As
Publication number | Publication date |
---|---|
EP1258146A2 (en) | 2002-11-20 |
JP2003533906A (en) | 2003-11-11 |
WO2001063937A3 (en) | 2002-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1709801B1 (en) | Video Decoding Method Using Adaptive Quantization Matrices | |
CA2316848C (en) | Improved video coding using adaptive coding of block parameters for coded/uncoded blocks | |
CA2374067C (en) | Method and apparatus for generating compact transcoding hints metadata | |
US6175593B1 (en) | Method for estimating motion vector in moving picture | |
US7738550B2 (en) | Method and apparatus for generating compact transcoding hints metadata | |
US5491523A (en) | Image motion vector detecting method and motion vector coding method | |
US5508744A (en) | Video signal compression with removal of non-correlated motion vectors | |
US20100110303A1 (en) | Look-Ahead System and Method for Pan and Zoom Detection in Video Sequences | |
EP1135934A1 (en) | Efficient macroblock header coding for video compression | |
EP2536143B1 (en) | Method and a digital video encoder system for encoding digital video data | |
JP2000217121A (en) | Method for detecting scene change by processing video data in compressed form for digital image display | |
JP2001197501A (en) | Motion vector searching device and motion vector searching method, and moving picture coder | |
US6480543B1 (en) | Detection of a change of scene in a motion estimator of a video encoder | |
US5699129A (en) | Method and apparatus for motion vector determination range expansion | |
WO2003045070A1 (en) | Feature extraction and detection of events and temporal variations in activity in video sequences | |
Tsai et al. | Block-matching motion estimation using correlation search algorithm | |
US6847684B1 (en) | Zero-block encoding | |
KR20040060980A (en) | Method and system for detecting intra-coded pictures and for extracting intra DCT precision and macroblock-level coding parameters from uncompressed digital video | |
US9654775B2 (en) | Video encoder with weighted prediction and methods for use therewith | |
EP1258146A2 (en) | Compressed video analysis | |
US8472523B2 (en) | Method and apparatus for detecting high level white noise in a sequence of video frames | |
KR100367468B1 (en) | Method and device for estimating motion in a digitized image with pixels | |
JP2002064823A (en) | Apparatus and method for detecting scene change of compressed dynamic image as well as recording medium recording its program | |
EP1157559A1 (en) | Methods and apparatus for improved motion estimation for video encoding | |
Li et al. | A robust, efficient, and fast global motion estimation method from MPEG compressed video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2; Designated state(s): BR JP |
| | AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | AK | Designated states | Kind code of ref document: A3; Designated state(s): BR JP |
| | AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
| | WWE | Wipo information: entry into national phase | Ref document number: 2001913054; Country of ref document: EP |
| | ENP | Entry into the national phase in: | Ref country code: JP; Ref document number: 2001 562030; Kind code of ref document: A; Format of ref document f/p: F |
| | WWP | Wipo information: published in national office | Ref document number: 2001913054; Country of ref document: EP |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 2001913054; Country of ref document: EP |