EP2002650A1 - Präprozessorverfahren und -vorrichtung - Google Patents

Präprozessorverfahren und -vorrichtung

Info

Publication number
EP2002650A1
Authority
EP
European Patent Office
Prior art keywords
video
frame
information
metadata
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07758479A
Other languages
English (en)
French (fr)
Inventor
Tao Tian
Fang Liu
Fang Shi
Vijayalakshmi R. Raveendran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of EP2002650A1 publication Critical patent/EP2002650A1/de
Withdrawn legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0112Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level one of the standards corresponding to a cinematograph film standard
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/107Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/19Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding using optimisation based on Lagrange multipliers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/523Motion estimation or motion compensation with sub-pixel accuracy
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/144Movement detection
    • H04N5/145Movement estimation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/147Scene change detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0117Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
    • H04N7/012Conversion between an interlaced and a progressive signal

Definitions

  • the invention generally relates to multimedia data processing, and more particularly, to processing operations performed prior to or in conjunction with data compression processing.
  • a method of processing multimedia data comprises receiving interlaced video frames, converting the interlaced video frames to progressive video, generating metadata associated with the progressive video, and providing the progressive video and at least a portion of the metadata to an encoder for use in encoding the progressive video.
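The receive, convert, generate-metadata, and provide-to-encoder steps summarized above can be sketched as follows. This is a toy illustration, not the application's implementation: all names (`deinterlace`, `preprocess`, `Metadata`) are hypothetical, and the "deinterlacing" shown is a trivial field weave standing in for the spatio-temporal methods described later.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Metadata:
    """Hypothetical container for metadata handed to the encoder."""
    bandwidth_info: dict = field(default_factory=dict)
    complexity: dict = field(default_factory=dict)

def deinterlace(fields: List[list]) -> List[list]:
    # Trivial stand-in deinterlacer: weave each (top, bottom) field pair
    # into one progressive frame. Real deinterlacing is far more involved.
    return [top + bottom for top, bottom in zip(fields[0::2], fields[1::2])]

def preprocess(interlaced_fields: List[list]) -> Tuple[List[list], Metadata]:
    progressive = deinterlace(interlaced_fields)
    meta = Metadata(complexity={"n_frames": len(progressive)})
    return progressive, meta  # both are provided to the encoder

frames, meta = preprocess([[1, 3], [2, 4], [5, 7], [6, 8]])
```

The example weaves two field pairs into two progressive frames and records a single (assumed) complexity value as metadata.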
  • the method can further include encoding the progressive video using the metadata
  • the interlaced video frames comprise NTSC video.
  • Converting the video frames can include deinterlacing the interlaced video frames.
  • the metadata can include bandwidth information, bi-directional motion information, a bandwidth ratio, a complexity value (such as a temporal or a spatial complexity value, or both), and/or luminance information.
  • the spatial information can include luminance and/or chrominance information.
  • the method can also include generating spatial information and bi-directional motion information for the interlaced video frames and generating the progressive video based on the interlaced video frames using the spatial and bi-directional motion information.
  • converting the interlaced video frames comprises inverse telecining 3/2 pulldown video frames, and/or resizing the progressive video.
  • the method can further comprise partitioning the progressive video to determine group of picture information, where the partitioning can include shot detection of the progressive video.
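Shot detection for partitioning, as referenced above, can be illustrated with a minimal frame-difference metric. This sketch is hypothetical (the application does not specify this particular metric); it flags frames whose mean absolute difference from the previous frame exceeds a threshold, which is one simple way to locate candidate group-of-picture boundaries.

```python
def detect_shot_changes(frames, threshold=0.5):
    """Flag frame indices whose mean absolute difference (MAD) from the
    previous frame exceeds `threshold` -- a toy scene-change metric."""
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        mad = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mad > threshold:
            cuts.append(i)
    return cuts

# Frames are flat lists of pixel values; the jump at index 2 is a "cut".
cuts = detect_shot_changes([[0, 0, 0], [0, 0, 0], [9, 9, 9], [9, 9, 9]])
```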
  • the method also includes filtering the progressive video with a denoising filter.
  • an apparatus for processing multimedia data can include a receiver configured to receive interlaced video frames, a deinterlacer configured to convert the interlaced video frames to progressive video, and a partitioner configured to generate metadata associated with the progressive video and provide the progressive video and the metadata to an encoder for use in encoding the progressive video.
  • the apparatus can further include an encoder configured to receive the progressive video from the communications module and encode the progressive video using the provided metadata.
  • the deinterlacer can be configured to perform spatio-temporal deinterlacing and/or inverse telecining.
  • the partitioner can be configured to perform shot detection and generate compression information based on the shot detection. In some aspects the partitioner can be configured to generate bandwidth information.
  • the apparatus can also include a resampler configured to resize a progressive frame.
  • the metadata can include bandwidth information, bi-directional motion information, a bandwidth ratio, luminance information, a spatial complexity value related to content, and/or a temporal complexity value related to content.
  • the deinterlacer is configured to generate spatial information and bi-directional motion information for the interlaced video frames, and to generate progressive video based on the interlaced video frames using the spatial and bi-directional motion information.
  • an apparatus for processing multimedia data includes means for receiving interlaced video frames, means for converting the interlaced video frames to progressive video, means for generating metadata associated with the progressive video, and means for providing the progressive video and at least a portion of the metadata to an encoder for use in encoding the progressive video
  • the converting means comprises an inverse teleciner and/or a spatio-temporal deinterlacer.
  • the generating means is configured to perform shot detection and generate compression information based on the shot detection.
  • the generating means is configured to generate bandwidth information
  • the apparatus further includes means for resampling to resize a progressive frame.
  • Another aspect comprises a machine readable medium comprising instructions for processing multimedia data that upon execution cause a machine to receive interlaced video frames, convert the interlaced video frames to progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder for use in encoding the progressive video.
  • Another aspect includes a processor comprising a configuration to receive interlaced video, convert the interlaced video to progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder for use in encoding the progressive video.
  • the conversion of the interlaced video can include performing spatio-temporal deinterlacing.
  • the conversion of the interlaced video comprises performing inverse telecine
  • generation of metadata includes generating compression information based on detecting shot changes.
  • generation of metadata includes determining compression information of the progressive video.
  • the configuration includes a configuration to resample video to generate a resized progressive frame.
  • the metadata can include bandwidth information, bi-directional motion information, complexity information such as temporal or spatial complexity information based on content, and/or compression information.

BRIEF DESCRIPTION OF THE DRAWINGS
  • Figure 1 is a block diagram of a communications system for delivering streaming multimedia data.
  • Figure 2 is a block diagram of a digital transmission facility that includes a preprocessor.
  • Figure 3A is a block diagram of an illustrative aspect of a preprocessor.
  • Figure 3B is a flow diagram that illustrates a process for processing multimedia data.
  • Figure 3C is a block diagram illustrating means for processing multimedia data.
  • Figure 4 is a block diagram illustrating operations of an exemplary preprocessor.
  • Figure 5 is a diagram of phase decisions in an inverse telecine process.
  • Figure 6 is a flow diagram illustrating a process of inverting telecined video.
  • Figure 7 is an illustration of a trellis showing phase transitions.
  • Figure 8 is a guide to identify the respective frames that are used to create a plurality of metrics.
  • Figure 9 is a flow diagram illustrating how the metrics of Figure 8 are created.
  • Figure 10 is a flow diagram which shows the processing of the metrics to arrive at an estimated phase.
  • Figure 11 is a dataflow diagram illustrating a system for generating decision variables.
  • Figure 12 is a block diagram depicting variables that are used to evaluate the branch information.
  • Figure 14 is a flow diagram showing the operation of a consistency detector.
  • Figure 15 is a flow diagram showing a process of computing an offset to a decision variable that is used to compensate for inconsistency in phase decisions.
  • Figure 16 presents the operation of inverse telecine after the pull-down phase has been estimated.
  • Figure 17 is a block diagram of a deinterlacer device.
  • Figure 18 is a block diagram of another deinterlacer device.
  • Figure 19 is a drawing of a subsampling pattern of an interlaced picture.
  • Figure 20 is a block diagram of a deinterlacer device that uses Wmed filtering and motion estimation to generate a deinterlaced frame.
  • Figure 21 illustrates one aspect of an aperture for determining static areas of multimedia data.
  • Figure 22 is a diagram illustrating one aspect of an aperture for determining slow-motion areas of multimedia data.
  • Figure 23 is a diagram illustrating an aspect of motion estimation.
  • Figure 24 illustrates two motion vector maps used in determining motion compensation.
  • Figure 25 is a flow diagram illustrating a method of deinterlacing multimedia data.
  • Figure 26 is a flow diagram illustrating a method of generating a deinterlaced frame using spatio-temporal information.
  • Figure 27 is a flow diagram illustrating a method of performing motion compensation for deinterlacing.
  • Figure 28 is a block diagram of a preprocessor comprising a processor configured for shot detection and other preprocessing operations according to some aspects.
  • Figure 30 is a flow diagram that illustrates a process that operates on a group of pictures and can be used in some aspects to encode video based on shot detection in video frames.
  • Figure 31 is a flow diagram illustrating a process for shot detection.
  • Figure 32 is a flow diagram illustrating a process for determining different classifications of shots in video.
  • Figure 33 is a flow diagram illustrating a process for assigning frame compression schemes to video frames based on shot detection results.
  • Figure 34 is a flow diagram illustrating a process for determining abrupt scene changes.
  • Figure 35 is a flow diagram illustrating a process for determining slowly-changing scenes.
  • Figure 41 is a flow diagram illustrating the procedure where compression types are assigned to frames.
  • Figure 42 illustrates an example of 1-D polyphase resampling.
  • Figure 43 is a graphic illustrating a safe action area and a safe title area of a frame of data.
  • Such preprocessors can process metadata and video in preparation for encoding, including performing deinterlacing, inverse telecining, filtering, identifying shot types, processing and generating metadata, and generating bandwidth information.
  • References herein to "one aspect," "an aspect," "some aspects," or "certain aspects" mean that one or more of a particular feature, structure, or characteristic described in connection with the aspect can be included in at least one aspect of a preprocessor system.
  • the appearances of such phrases in various places in the specification are not necessarily all referring to the same aspect, nor are separate or alternative aspects mutually exclusive of other aspects.
  • various features are described which may be exhibited by some aspects and not by others.
  • various steps are described which may be steps for some aspects but not other aspects.
  • "Multimedia data" or "multimedia" as used herein is a broad term that includes video data (which can include audio data), audio data, or both video data and audio data.
  • "Video data" or "video" as used herein is a broad term, which refers to an image or one or more series or sequences of images containing text, image, and/or audio data, and can be used to refer to multimedia data, or the terms may be used interchangeably, unless otherwise specified.
  • FIG. 1 is a block diagram of a communications system 100 for delivering streaming multimedia. Such a system finds application in the transmission of digital compressed video to a multiplicity of terminals, as shown in Figure 1.
  • a digital video source can be, for example, a digital cable or satellite feed or an analog source that is digitized.
  • the video source is processed in a transmission facility 120 where it is encoded and modulated onto a carrier for transmission through a network 140 to one or more terminals 160.
  • the terminals 160 decode the received video and typically display at least a portion of the video.
  • the network 140 refers to any type of communication network, wired or wireless, suitable for the transmission of encoded data.
  • the network 140 can be a cell phone network, wired or wireless local area network (LAN) or a wide area network (WAN), or the Internet.
  • the terminals 160 can be any type of communication device capable of receiving and displaying data, including, but not limited to, cell phones, PDAs, in-home or commercial video display equipment, computers (portable, laptop, handheld, PCs, and larger server-based computer systems), and personal entertainment devices capable of using multimedia data.
  • FIGS. 2 and 3 illustrate sample aspects of a preprocessor 202
  • preprocessor 202 is in a digital transmission facility 120
  • a decoder 201 decodes encoded data from a digital video source and provides metadata 204 and video 205 to the preprocessor 202.
  • the preprocessor 202 is configured to perform certain types of processing on the video 205 and the metadata 204 and provide processed metadata 206 (e.g., base layer reference frames, enhancement layer reference frames, bandwidth information, content information) and video 207 to an encoder 203.
  • Such preprocessing of multimedia data can improve the visual clarity, anti-aliasing, and compression efficiency of the data.
  • the preprocessor 202 receives video sequences provided by the decoder 201 and converts the video sequences into progressive video sequences for further processing (e.g., encoding) by an encoder.
  • the preprocessor 202 can be configured for numerous operations, including inverse telecine, de-interlacing, filtering (e.g., artifact removal, de-ringing, de-blocking, and de-noising), resizing (e.g., spatial resolution down-sampling from standard definition to Quarter Video Graphics Array (QVGA)), and GOP structure generation (e.g., complexity map generation, scene change detection, and fade/flash detection).
  • FIG. 3A illustrates a preprocessor 202 that is configured with modules or components (collectively referred to here as "modules") to perform its preprocessing operations on received metadata 204 and video 205, and then provide processed metadata 206 and progressive video 207 for further processing (e.g., to an encoder).
  • the modules can be implemented in hardware, software, firmware, or a combination thereof.
  • the preprocessor 202 can include various modules, including one or more of the modules illustrated, which include inverse telecine 301, deinterlacer 302, denoiser 303, alias suppressor 304, resampler 305, deblocker/deringer 306, and a GOP partitioner 307, all described further below.
  • the preprocessor 202 can also include other appropriate modules that may be used to process the video and metadata, including memory 308 and a communications module 309.
  • a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an ASIC
  • the ASIC may reside in a user terminal
  • the processor and the storage medium may reside as discrete components in a user terminal
  • FIG. 3B is a flow diagram that illustrates a process 300 for processing multimedia data. Process 300 starts and proceeds to block 320, where interlaced video is received. Preprocessor 202, illustrated in Figure 2 and Figure 3A, can perform this step.
  • a decoder (e.g., decoder 201 of Figure 2) can receive the interlaced data and then provide it to the preprocessor 202.
  • a data receiving module 330, shown in Figure 3C, which is a portion of a preprocessor 202, can perform this step.
  • Process 300 then proceeds to block 322, where interlaced video is converted to progressive video. Preprocessor 202 in Figure 2 and Figure 3A, and module 332 of Figure 3C, can perform this step. If the interlaced video has been telecined, block 322 processing can include performing inverse telecining to generate progressive video. Process 300 then proceeds to block 324 to generate metadata associated with the progressive video. The GOP partitioner 307 can perform this step.
  • process 300 can end.
  • Figure 3C is a block diagram illustrating means for processing multimedia data. Shown here, such means are incorporated in a preprocessor 202.
  • the preprocessor 202 includes means for receiving video, such as module 330.
  • the preprocessor 202 also includes means for converting interlaced data to progressive video, such as module 332.
  • Such means can include, for example, modules of the preprocessor 202 described with respect to Figure 3A.
  • the preprocessor 202 also includes means for generating metadata associated with the progressive video, such as module 334. Such means can include the GOP partitioner 307 (Figure 3A), which can generate various types of metadata as described herein.
  • the preprocessor 202 can also include means for providing the progressive video and metadata to an encoder for encoding, as illustrated by module 336. Such means can include a communications module 309, illustrated in Figure 3A, in some aspects. As will be appreciated by one skilled in the art, such means can be implemented in many standard ways.
  • the preprocessor 202 can use obtained metadata (e.g., obtained from the decoder 201 or from another source) for one or more of the preprocessing operations.
  • Metadata can include information relating to, describing, or classifying the content of the multimedia data ("content information"). In particular, the metadata can include a content classification.
  • the metadata does not include content information desired for encoding operations
  • the preprocessor 202 can be configured to determine content information and use the content information for preprocessing operations and/or provide the content information to other components, e.g., the encoder 203.
  • the preprocessor 202 can use such content information to influence GOP partitioning, determine appropriate type of filtering, and/or determine encoding parameters that are communicated to an encoder.
  • FIG. 4 shows an illustrative example of process blocks that can be included in the preprocessor, and illustrates processing that can be performed by the preprocessor 202
  • the preprocessor 202 receives metadata and video 204, 205 and provides output data 206, 207 comprising (processed) metadata and video to the encoder 228
  • the received video can be progressive video and deinterlacing does not have to be performed.
  • the video data can be telecined video, i.e., interlaced video converted from 24 fps movie sequences.
  • the video can be non- telecined interlaced video.
  • Preprocessor 226 can process these types of video as described below.
  • the preprocessor 202 determines if the received video 204, 205 is progressive video. In some cases, this can be determined from the metadata if the metadata contains such information, or by processing of the video itself. For example, an inverse telecine process, described below, can determine if the received video 205 is progressive video. If it is, the process proceeds to block 407, where filtering operations are performed on the video to reduce noise, such as white Gaussian noise. If the video is not progressive video, the process proceeds from block 401 to block 404, a phase detector.
  • Phase detector 404 distinguishes between video that originated in a telecine and that which began in a standard broadcast format. If the decision is made that the video was telecined (the YES decision path exiting phase detector 404), the telecined video is returned to its original format in inverse telecine 406. Redundant fields are identified and eliminated, and fields derived from the same video frame are rewoven into a complete image. Since the sequence of reconstructed film images was photographically recorded at regular intervals of 1/24 of a second, the motion estimation process performed in a GOP partitioner 412 or a decoder is more accurate using the inverse telecined images rather than the telecined data, which has an irregular time base.
  • the phase detector 404 makes certain decisions after receipt of a video frame. These decisions include: (i) whether the present video is from a telecine output and whether the 3:2 pull-down phase is one of the five phases P0, P1, P2, P3, and P4 shown in Figure 5; and (ii) whether the video was generated as conventional NTSC, a decision denoted as phase P5. These decisions appear as outputs of phase detector 404 shown in Figure 4.
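A toy version of the pull-down phase decision: in 3:2 pulldown, one of every five output frames repeats the first field of its predecessor, so locating that repetition fixes the phase, while the absence of any repetition suggests conventional interlaced video. The function below is an illustrative sketch under that assumption, not the trellis-based detector described in the application.

```python
def estimate_phase(first_fields):
    """Return a pull-down phase 0-4 based on where a repeated first
    field occurs, or 5 (conventional interlaced video) if no first
    field repeats. `first_fields` holds one first-field per frame."""
    for i in range(1, len(first_fields)):
        if first_fields[i] == first_fields[i - 1]:
            return i % 5
    return 5

# A telecined stream repeats one first field in every run of five frames;
# an ordinary interlaced stream shows no such repetition.
telecined = estimate_phase(["A", "A", "B", "C", "D"])
plain_ntsc = estimate_phase(["A", "B", "C", "D", "E"])
```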
  • the path from phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that it has been provided with the correct pull down phase so that it can sort out the fields that were formed from the same photographic image and combine them.
  • The path from phase detector 404 labeled "NO" similarly actuates the deinterlacer 405 to separate an apparent NTSC frame into fields for optimal processing. Inverse telecine is further described in co-pending U.S. Patent Application [Attorney Docket No. QFDM.021A (050943)] entitled "INVERSE TELECINE ALGORITHM BASED ON STATE MACHINE," which is owned by the assignee hereof and incorporated by reference herein in its entirety.
  • the phase detector 404 can continuously analyze video frames because different types of video may be received at any time.
  • video conforming to the NTSC standard may be inserted into the video as a commercial.
  • the resulting progressive video is sent to a denoiser (filter) 407 which can be used to reduce white Gaussian noise.
  • Blocking artifacts occur because compression algorithms divide each frame into blocks (e.g., 8x8 blocks). Each block is reconstructed with some small errors, and the errors at the edges of a block often contrast with the errors at the edges of neighboring blocks, making block boundaries visible.
  • ringing artifacts appear as distortions around the edges of image features. Ringing artifacts occur because the encoder discards too much information in quantizing the high-frequency DCT coefficients.
  • both deblocking and deringing can use low-pass FIR (finite impulse response) filters to hide these visible artifacts.
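As a concrete (hypothetical) example of such low-pass FIR filtering, the sketch below smooths the two pixels straddling a block boundary in one row with a 3-tap [1, 2, 1]/4 kernel. The application does not specify this particular filter; it illustrates only the general idea of attenuating the step at a block edge.

```python
def smooth_boundary(row, boundary, taps=(0.25, 0.5, 0.25)):
    """Apply a 3-tap low-pass FIR kernel to the two pixels on either
    side of a block boundary at index `boundary` in `row`."""
    out = list(row)
    for i in (boundary - 1, boundary):  # pixels straddling the edge
        acc = 0.0
        for k, t in enumerate(taps):
            j = min(max(i + k - 1, 0), len(row) - 1)  # clamp at borders
            acc += t * row[j]  # read originals, write into `out`
        out[i] = acc
    return out

# A hard 10 -> 20 step at index 4 is softened near the boundary.
smoothed = smooth_boundary([10, 10, 10, 10, 20, 20, 20, 20], 4)
```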
  • GOP positioning can include detecting shot changes, generating complexity maps (e g , temporal, spatial bandwidth maps), and adaptive GOP partitioning Shot detection relates to determining when a frame in a group of pictures ((3OP) exhibits data that indicates a scene change has occurred
  • Scene change detection can be used for a video encoder to determine a proper GOP length and insert I-frames based on the GOP length, instead of inserting an I-frame at a fixed interval
  • the preprocessor 202 can also be configured to generate a bandwidth map which can be used for encoding the multimedia data
  • a content classification module located external to the preprocessor generates the bandwidth map instead. Adaptive GOP partitioning can adaptively change the composition of a group of pictures coded together
  • Illustrative examples of the operations shown in Figure 4 are described below. Inverse Telecine
  • Video compression gives best results when the properties of the source are known and used to select the ideally matching form of processing
  • Off-the-air video can originate in several ways. Broadcast video that is conventionally generated - in video cameras, broadcast studios, etc. - conforms in the United States to the NTSC standard. According to the standard, each frame is made up of two fields. One field consists of the odd lines, the other, the even lines. This may be referred to as an "interlaced" format. While the frames are generated at approximately 30 frames/sec, the fields are records of the television camera's image that are 1/60 sec apart.
  • Film, on the other hand, is shot at 24 frames/sec, each frame consisting of a complete image. This may be referred to as a "progressive" format.
  • "Progressive" video is converted into "interlaced" video format via a telecine process
  • the system advantageously determines when video has been telecined and performs an appropriate transform to regenerate the original progressive frames
  • Figure 5 shows the effect of telecining progressive frames that were converted to interlaced video: F1, F2, F3, and F4 are progressive images that are the input to a teleciner.
  • FIG. 5 also shows pull-down phases P0, P1, P2, P3, and P4
  • the phase P0 is marked by the first of two NTSC-compatible frames which have identical first fields. The following four frames correspond to phases P1, P2, P3, and P4. Note that the frames marked by P3 and P4 have identical second fields. Because film frame F1 is scanned three times, two identical successive output NTSC-compatible first fields are formed. All NTSC fields derived from film frame F1 are taken from the same film image and therefore are taken at the same instant of time.
  • phase detector 404 illustrated in Figure 4 makes certain decisions after receipt of a video frame. These decisions include: (i) whether the present video came from a telecine output and the 3:2 pull-down phase is one of the five phases P0, P1, P2, P3, and P4 shown in definition 512 of Figure 5; and (ii) whether the video was generated as conventional NTSC; that decision is denoted as phase P5.
  • These decisions appear as outputs of phase detector 404 shown in Figure 4.
  • the path from phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that it has been provided with the correct pull-down phase so that it can sort out the fields that were formed from the same photographic image and combine them.
  • the path from phase detector 404 labeled "NO" similarly actuates the deinterlacer block 405 to separate an apparent NTSC frame into fields for optimal processing
  • Figure 6 is a flowchart illustrating a process 600 of inverse telecining a video stream. In one aspect, the process 600 is performed by the inverse telecine 301 of Figure 3.
  • the inverse telecine 301 determines a plurality of metrics based upon the received video.
  • four metrics are formed which are sums of differences between fields drawn from the same frame or adjacent frames.
  • the four metrics are further assembled into a Euclidean measure of distance between the four metrics derived from the received data and the most likely values of these metrics for each of the six hypothesized phases.
  • the Euclidean sums are called branch information: for each received frame there are six such quantities.
  • Each hypothesized phase has a successor phase which, in the case of the possible pull down phases, changes with each received frame.
  • the applicable phase is either utilized as the current pull down phase, or as an indicator to command the de-interlace of a frame that has been estimated to have a valid NTSC format.
  • For every frame received from the video input, a new value for each of four metrics is computed. These are defined as:
  • SAD_FS refers to differences between field one of the current frame, labeled C1, and field one of the previous frame, labeled P1, which are spanned by a bracket labeled FS in the definition provided in Figure 8
  • SAD_SS refers to differences between field two of the current frame, labeled C2, and field two of the previous frame, labeled P2, which are both spanned by a bracket labeled SS
  • SAD_CO refers to differences between field two of the current frame, labeled C2, and field one of the current frame, labeled C1, which are spanned by a bracket labeled CO
  • SAD_PO refers to differences between field one of the current frame and field two of the previous frame, which are both spanned by a bracket labeled PO
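The four field-difference metrics above can be sketched in code. This is a minimal illustration assuming each frame has already been decomposed into two same-size field arrays; the function name `pulldown_metrics` and the dictionary keys are hypothetical conveniences, not identifiers from the patent.

```python
import numpy as np

def pulldown_metrics(cur_f1, cur_f2, prev_f1, prev_f2):
    """Compute the four sums of absolute differences (SADs) used for
    pull-down phase detection; the FS, SS, CO and PO labels follow the text.
    """
    def sad(a, b):
        return float(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    return {
        "SAD_FS": sad(cur_f1, prev_f1),  # field one of current vs. previous frame
        "SAD_SS": sad(cur_f2, prev_f2),  # field two of current vs. previous frame
        "SAD_CO": sad(cur_f2, cur_f1),   # the two fields of the current frame
        "SAD_PO": sad(cur_f1, prev_f2),  # current field one vs. previous field two
    }
```

A repeated field (as produced by 3:2 pull-down) drives the corresponding SAD toward its noise floor, which is what the phase detector exploits.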
  • the computational load to evaluate each SAD is described below.
  • the SAD calculator could be a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, or be implemented in microcode or software that is executed on the processor, or a combination thereof.
  • the program code or code segments that perform the calculation may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure.
  • in step 1030 the metrics defined in Figure 9 are evaluated. Continuing to step 1083, lower envelope values of the four metrics are found
  • a lower envelope of a SAD metric is a dynamically determined quantity: the highest numerical floor below which the SAD does not penetrate
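A minimal sketch of a lower envelope tracker consistent with that description: the floor drops immediately to any SAD sample that undercuts it, and otherwise creeps upward by a small step so that it stays a tight floor. The `creep` constant and this particular update rule are illustrative assumptions, not the patented tracker.

```python
def track_lower_envelope(sads, creep=1.0):
    """Track the 'lower envelope' of a SAD sequence: a floor that follows
    new minima immediately and drifts upward slowly otherwise."""
    floor = sads[0]
    env = []
    for s in sads:
        floor = min(floor + creep, s)  # never above the current sample
        env.append(floor)
    return env
```

The envelope then serves as the per-metric distance offset used in the branch information calculations.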
  • branch information quantities defined below in Equations 5-10 are determined, which can use the previously determined metrics, the lower envelope values, and an experimentally determined constant A. Since the successive values of the phase may be inconsistent, a quantity Δ is determined to reduce this apparent instability in step 1087. The phase is deemed consistent when the sequence of phase decisions is consistent with the model of the problem shown in Figure 7. Following that step, the process proceeds to step 1089 to calculate the decision variables using the current value of Δ
  • phase selector 1090 uses the applicable phase to either invert the telecined video or deinterlace it as shown. It is a more explicit statement of the operation of phase detector 404 in Figure 4. In one aspect, the processing of Figure 10 is performed by the phase detector 404 of Figure 4. Starting at step 1030, detector 404 determines a plurality of metrics by the process described above with reference to Figure 8, and continues through steps 1083, 1085, 1087, 1089, 1090, and 1091. Flowchart 1000 illustrates a process for estimating the current phase.
  • the flowchart at step 1083 describes the use of the determined metrics and lower envelope values to compute branch information.
  • the branch information may be recognized as the Euclidean distances discussed earlier. Exemplary equations that may be used to generate the branch information are Equations 5-10 below.
  • the Branch Info quantities are computed in block 1209 of Figure 12
  • the processed video data can be stored in a storage medium which can include, for example, a chip configured storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical) connected to a processor
  • the inverse telecine 406 and the deinterlacer 405 can each contain part or all of the storage medium
  • the branch information quantities are defined by the following equations.
  • Branch Info(4) = (SAD_FS − H_S)² + (SAD_SS − H_S)² + (SAD_PO − H_P)² + (SAD_CO − L_C)²   (9)
  • Branch Info(5) = (SAD_FS − L_S)² + (SAD_SS − L_S)² + (SAD_PO − L_P)² + (SAD_CO − L_C)²   (10)
  • branch information calculator 1209 uses the quantities L_S, the lower envelope value of SAD_FS and SAD_SS; L_P, the lower envelope value of SAD_PO; and L_C, the lower envelope value of SAD_CO. The lower envelopes are used as distance offsets in the branch information calculations, either alone or in conjunction with a predetermined constant A to create H_S, H_P and H_C. Their values are kept up to date in the lower envelope trackers described below. The H offsets are defined to be H_S = L_S + A, H_P = L_P + A, and H_C = L_C + A
  • FIG. 11 is a flowchart illustrating an exemplary process for performing step 1080 of Figure 10. Figure 11 generally shows a process for updating the decision variables. There the six decision variables (corresponding to the six possible decisions) are updated with new information derived from the metrics. The decision variables are found as follows:
  • is computed in block 1112
  • the quantity is chosen to reduce an inconsistency in the sequence of phases determined by this system.
  • the smallest decision variable is found in block 1120.
  • new information specific to each decision is added to the appropriate decision variable's previous value that has been multiplied by α, to get the current decision variable's value.
  • a new decision can be made when new metrics are in hand; therefore this technique is capable of making a new decision upon receipt of fields 1 and 2 of every frame
  • These decision variables are the sums of Euclidean distances referred to earlier.
  • the applicable phase is selected to be the one having the subscript of the smallest decision variable.
  • a decision based on the decision variables is made explicitly in block 1090 of Figure 10. Certain decisions are allowed in decision space. As described in block 1091, these decisions are:
  • if the applicable phase is not P5, inverse telecine the video; and
  • if the applicable phase is P5, deinterlace the video. There may be occasional errors in a coherent string of decisions, because the metrics are drawn from video, which is inherently variable.
  • Figure 16 shows how the inverse telecine process proceeds once the pull-down phase is determined
  • fields 1605 and 1605' are identified as representing the same field of video.
  • the two fields are averaged together, and combined with field 1606 to reconstruct frame 1620.
  • the reconstructed frame is 1620'.
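The reconstruction step just described (averaging the two copies of the repeated field and weaving the result with the complementary field) can be sketched as follows; `weave` and `reconstruct_from_duplicates` are hypothetical helper names introduced for this example.

```python
import numpy as np

def weave(field1, field2):
    """Interleave two fields (field1 on even rows) into a progressive frame."""
    h, w = field1.shape
    frame = np.empty((2 * h, w), dtype=field1.dtype)
    frame[0::2] = field1
    frame[1::2] = field2
    return frame

def reconstruct_from_duplicates(dup_a, dup_b, other_field, dup_is_first=True):
    """Rebuild a progressive film frame once the pull-down phase is known:
    the two copies of the repeated field are averaged (which also reduces
    noise) and woven with the complementary field."""
    avg = (dup_a.astype(np.float64) + dup_b.astype(np.float64)) / 2.0
    other = other_field.astype(np.float64)
    return weave(avg, other) if dup_is_first else weave(other, avg)
```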
  • a Viterbi decoder adds the metrics of the branches that make up the paths together to form the path metric.
  • the decision variables defined here are formed by a similar rule: each is the "leaky" sum of new information variables. (In a leaky summation the previous value of a decision variable is multiplied by a number less than unity before new information data is added to it.)
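The leaky summation and the phase selection can be sketched directly. The scale factor `alpha` (the "number less than unity") and the six-element layout are illustrative assumptions; the patent's actual constants are not reproduced here.

```python
def update_decisions(decisions, branch_info, alpha=0.9):
    """Leaky-sum update: each previous decision variable is scaled by a
    factor below unity before the new branch information is added, much
    like a leaky Viterbi path metric."""
    return [alpha * d + b for d, b in zip(decisions, branch_info)]

def select_phase(decisions):
    """The applicable phase carries the subscript of the smallest variable."""
    return min(range(len(decisions)), key=decisions.__getitem__)
```

With the selected phase in hand, the rule from the text applies: a phase other than P5 triggers inverse telecine, while P5 routes the frame to the deinterlacer.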
  • a Viterbi decoder structure could be modified to support the operation of this procedure.
  • The processing of video described in this patent can also be applied to video in the PAL format.
  • Deinterlacer as used herein is a broad term that can be used to describe a deinterlacing system, device, or process (including, for example, software, firmware, or hardware configured to perform a process) that processes, in whole or in significant part, interlaced multimedia data to form progressive multimedia data
  • the fields are records of the television camera's image that are 1/60 sec apart.
  • Each frame of an interlaced video signal shows every other horizontal line of the image.
  • the video signal alternates between showing even and odd lines.
  • the video image looks smooth to the human eye
  • Interlacing has been used for decades in analog television broadcasts that are based on the NTSC (U.S.) and PAL (Europe) formats. Because only half the image is sent with each frame, interlaced video uses roughly half the bandwidth of sending the entire picture
  • modern pixel-based displays (e.g., LCD, DLP, LCOS, plasma, etc.) are progressive scan and display progressively scanned video sources (whereas many older video devices use the older interlaced scan technology).
  • deinterlacing algorithms are described in "Scan rate up-conversion using adaptive weighted median filtering," P. Haavisto, J. Juhola, and Y. Neuvo, Signal Processing of HDTV II, pp. 703-710, 1990, and "Deinterlacing of HDTV images for Multimedia Applications," R. Simonetti, S. Carrato, G. Ramponi, and A.
  • the spatio-temporal filtering can use a weighted median filter ("Wmed") that can include a horizontal edge detector that prevents blurring horizontal or near-horizontal edges.
  • Spatio-temporal filtering of previous and subsequent neighboring fields to a "current" field produces an intensity motion-level map that categorizes portions of a selected frame into different motion levels, for example, static, slow-motion, and fast motion.
  • the intensity map is produced by Wmed filtering using a filtering aperture that includes pixels from five neighboring fields (two previous fields, the current field, and two next fields).
  • the Wmed filtering can determine forward, backward, and bidirectional static area detection which can effectively handle scene changes and objects appearing and disappearing
  • a Wmed filter can be utilized across one or more fields of the same parity in an inter-field filtering mode, and switched to an intra-field filtering mode by tweaking threshold criteria.
  • motion estimation and compensation uses luma (intensity or brightness of the pixels) and chroma data (color information of the pixels) to improve deinterlacing of regions of the selected frame where the brightness level is almost uniform but the color differs
  • a denoising filter can be used to increase the accuracy of motion estimation.
  • the denoising filter can be applied to Wmed deinterlaced provisional frames to remove alias artifacts generated by Wmed filtering.
  • the deinterlacing methods and systems described below produce good deinterlacing results and have a relatively low computational complexity that allows fast-running deinterlacing implementations, making such implementations suitable for a wide variety of deinterlacing applications, including systems that are used to provide data to cell phones, computers, and other types of electronic or communication devices utilizing a display
  • Figure 17 is a block diagram illustrating one aspect of a deinterlacer 1700 that can be used as the deinterlacer 405 in Figure 4.
  • the deinterlacer 1700 includes a spatial filter 1730 that spatially and temporally ("spatio-temporal") filters at least a portion of the interlaced data and generates spatio-temporal information. For example, a Wmed filter can be used as the spatial filter 1730.
  • the deinterlacer 1700 also includes a denoising filter (not shown), for example, a Wiener filter or a wavelet shrinkage filter
  • the deinterlacer 1700 also includes a motion estimator 1732 which provides motion estimates and compensation of a selected frame of interlaced data and generates motion information
  • a combiner 1734 receives and combines the spatio- temporal information and the motion information to form a progressive frame
  • Figure 18 is another block diagram of the deinterlacer 1700.
  • a processor 1836 in the deinterlacer 1700 includes a spatial filter module 1838, a motion estimation module 1840, and a combiner module 1842. Interlaced multimedia data from an external source 1848 can be provided to a communications module 1844 in the deinterlacer 1700.
  • the deinterlacer and components or steps thereof, can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof
  • a deinterlacer may be a standalone component, incorporated as hardware, firmware, middleware in a component of another device, or be implemented in microcode or software that is executed on the processor, or a combination thereof
  • the program code or code segments that perform the deinterlacer tasks may be stored in a machine readable medium such as a storage medium.
  • a code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements
  • a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
  • the received interlaced data can be stored in the deinterlacer 1700 in a storage medium 1846 which can include, for example, a chip configured storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical) connected to the processor 1836.
  • the processor 1836 can contain part or all of the storage medium.
  • the processor 1836 is configured to process the interlaced multimedia data to form progressive frames which are then provided to another device or process.
  • Equation 21 above is simplified as
  • FIG. 20 is a block diagram illustrating certain aspects of a deinterlacer that uses Wmed filtering and motion estimation to generate a progressive frame from interlaced multimedia data
  • the upper part of Figure 20 shows a motion intensity map 2052 that can be generated using information from a current field, two previous fields (PP Field and P Field), and two subsequent fields (Next Field and Next Next Field)
  • the motion intensity map 2052 categorizes, or partitions, the current frame into two or more different motion levels, and can be generated by spatio-temporal filtering, described in further detail hereinbelow
  • the motion intensity map 2052 is generated to identify static areas, slow-motion areas, and fast-motion areas, as described in reference to Equations 4-8 below
  • a spatio-temporal filter, e.g., the Wmed filter 2054, filters the interlaced multimedia data using criteria based on the motion intensity map, and produces a spatio-temporal provisional deinterlaced frame
  • the denoiser 2056 is configured to filter the spatio-temporal provisional deinterlaced frame generated by the Wmed filter 2054. Denoising the spatio-temporal provisional deinterlaced frame makes the subsequent motion search process more accurate, especially if the source interlaced multimedia data sequence is contaminated by white noise. It can also at least partially remove aliasing between even and odd rows in a Wmed picture
  • the denoiser 2056 can be implemented as a variety of filters, including a wavelet shrinkage and wavelet Wiener filter based denoiser, which are also described further hereinbelow
  • Figure 20 illustrates an aspect for determining motion information (e g , motion vector candidates, motion estimation, motion compensation) of interlaced multimedia data.
  • Figure 20 illustrates a motion estimation and motion compensation scheme that is used to generate a motion-compensated provisional progressive frame of the selected frame, which is then combined with the Wmed provisional frame to form a resulting "final" progressive frame, shown as de-interlaced current frame 2064
  • motion vector ("MV") candidates (or estimates) of the interlaced multimedia data are provided to the deinterlacer from external motion estimators and used to provide a starting point for the bi-directional motion estimator and compensator ("ME/MC") 2068
  • an MV candidate selector 2072 uses previously determined MVs for neighboring blocks as MV candidates for the blocks being processed, such as the MVs of previously processed blocks, for example blocks in a deinterlaced previous frame 2070.
  • the motion compensation can be done bi-directionally, based on the previous deinterlaced frame 2070 and a next (e.g., future) Wmed frame 2058.
  • a current Wmed frame 2060 and a motion compensated ("MC") current frame 2066 are merged, or combined, by a combiner 2062
  • a resulting deinterlaced current frame 2064, now a progressive frame, is provided back to the ME/MC 2068 to be used as the deinterlaced previous frame 2070 and is also communicated external to the deinterlacer for further processing, e.g., compression and transmission to a display terminal
  • the various aspects shown in Figure 20 are described in more detail below.
  • FIG. 25 illustrates a process 2500 for processing multimedia data to produce a sequence of progressive frames from a sequence of interlaced frames.
  • a progressive frame is produced by the deinterlacer 405 illustrated in Figure 4.
  • process 2500 (process "A") generates spatio-temporal information for a selected frame. Spatio-temporal information can include information used to categorize the motion levels of the multimedia data and generate a motion intensity map, and includes the Wmed provisional deinterlaced frame and information used to generate the frame (e.g., information used in Equations 26-33).
  • This process can be performed by the Wmed filter 2054, as illustrated in the upper portion of Figure 20, and its associated processing, which is described in further detail below.
  • process 2500 generates motion compensation information for a selected frame.
  • the bi-directional motion estimator/motion compensator 2068 illustrated in the lower portion of Figure 20, can perform this process.
  • the process 2500 then proceeds to block 2506 where it deinterlaces fields of the selected frame based on the spatio-temporal information and the motion compensation information to form a progressive frame associated with the selected frame. This can be performed by the combiner 2062 illustrated in the lower portion of Figure 20.
  • a motion intensity map 2052 can be determined by processing pixels in a current field to determine areas of different "motion."
  • An illustrative aspect of determining a three-category motion intensity map is described below with reference to Figures 21-24. The motion intensity map designates areas of each frame as static areas, slow-motion areas, and fast-motion areas based on comparing pixels in same-parity fields and different-parity fields
  • Determining static areas of the motion map can comprise processing pixels in a neighborhood of adjacent fields to determine if luminance differences of certain pixel (s) meet certain criteria.
  • determining static areas of the motion map comprises processing pixels in a neighborhood of five adjacent fields (a Current Field (C), two fields temporally before the Current Field, and two fields temporally after the Current Field) to determine if luminance differences of certain pixel(s) meet certain thresholds.
  • the five adjacent fields would typically be displayed in such a sequence with a Z⁻¹ time delay, Z⁻¹ representing a delay of one field.
  • Figure 21 illustrates an aperture identifying certain pixels of each of the five fields that can be used for the spatio-temporal filtering, according to some aspects.
  • the aperture includes, from left to right, 3x3 pixel groups of a Previous Previous Field (PP), a Previous Field (P), the Current Field (C), a Next Field (N), and a Next Next Field (NN).
  • an area of the Current Field is considered static in the motion map if it meets the criteria described in Equations 26-28, the pixel locations and corresponding fields being illustrated in Figure 21
  • L_P is the luminance of a pixel P located in the P Field
  • L_N is the luminance of a pixel N located in the N Field
  • L_B is the luminance of a pixel B located in the Current Field
  • L_E is the luminance of a pixel E located in the Current Field
  • L_BPP is the luminance of a pixel B_PP located in the PP Field
  • L_BNN is the luminance of a pixel B_NN located in the NN Field
  • L_ENN is the luminance of a pixel E_NN located in the NN Field
  • Threshold T1 can be predetermined and set at a particular value, determined by a process other than deinterlacing and provided (for example, as metadata for the video being deinterlaced), or it can be dynamically determined during deinterlacing
  • the five fields can be distributed symmetrically in the past and in the future relative to a pixel X in the Current Frame C, as shown in Figure 21.
  • The static area can be sub-divided into three categories: forward static (static relative to the previous frame), backward static (static relative to the next frame), or bi-directional (if both the forward and the backward criteria are satisfied). This finer categorization of the static areas can improve performance, especially at scene changes and object appearing/disappearing
  • An area of the motion-map can be considered a slow-motion area in the motion-map if the luminance values of certain pixels do not meet the criteria to be designated a static area but meet criteria to be designated a slow-motion area. Equation 29 below defines criteria that can be used to determine a slow-motion area. Referring to Figure 22, the locations of pixels Ia, Ic, Ja, Jc, Ka, Kc, La, Lc, P and N identified in Equation 29 are shown in an aperture centered around pixel X.
  • the aperture includes a 3x7 pixel neighborhood of the Current Field (C) and 3x5 neighborhoods of the Next Field (N) and the Previous Field (P). Pixel X is considered to be part of a slow-motion area if it does not meet the above-listed criteria for a static area and if pixels in the aperture meet the criteria shown in Equation 29, where T2 is a threshold and the remaining symbols are luminance values for the identified pixels
  • the threshold T2 can also be predetermined and set at a particular value, determined by a process other than deinterlacing and provided (for example, as metadata for the video being deinterlaced), or it can be dynamically determined during deinterlacing.
  • a filter can blur edges that are horizontal (e.g., more than 45° from vertically aligned) because of the angle of its edge detection capability.
  • the edge detection capability of the aperture (filter) illustrated in Figure 22 is affected by the angle formed by pixels "A" and "F", or "C" and "D". Any edges more horizontal than that angle will not be interpolated optimally, and hence staircase artifacts may appear at those edges
  • the slow-motion category can be divided into two sub-categories, "Horizontal Edge" and "Otherwise," to account for this edge detection effect.
  • the slow-motion pixel can be categorized as a Horizontal Edge if the criteria in Equation 30, shown below, are satisfied, and into a so-called "Otherwise" category if the criteria in Equation 30 are not satisfied
  • T3 is a threshold
  • LA, LB, LC, LD, LE, and LF are the luminance values of pixels A, B, C, D, E, and F
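The three-level categorization can be illustrated with a simplified stand-in: the real criteria (Equations 26-30) compare many pixels across five fields, but the thresholding idea is visible with a single forward/backward luminance difference per pixel. The thresholds `t1`, `t2` and this per-pixel rule are illustrative assumptions, not the patent's equations.

```python
import numpy as np

def motion_level(prev, cur_interp, nxt, t1=10.0, t2=30.0):
    """Classify each missing pixel of the current field as 'static',
    'slow', or 'fast' motion by thresholding the smaller of its forward
    and backward luminance differences (a toy version of Equations 26-29)."""
    fwd = np.abs(cur_interp - prev)   # difference against the previous field
    bwd = np.abs(cur_interp - nxt)    # difference against the next field
    diff = np.minimum(fwd, bwd)
    levels = np.full(diff.shape, "fast", dtype=object)
    levels[diff < t2] = "slow"
    levels[diff < t1] = "static"
    return levels
```

Taking the minimum of the forward and backward differences mirrors the forward/backward static sub-categories: an area that matches either neighbor can be treated as static in that direction.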
  • process A ( Figure 26) then proceeds to block 2604 and generates a provisional deinterlaced frame based upon the motion intensity map.
  • the Wmed filter 2054 (Figure 20) filters the selected field and the necessary adjacent field(s) to provide a candidate full-frame image F0, which can be defined as follows.
  • the Wmed filtered provisional deinterlaced frame is provided for further processing in conjunction with motion estimation and motion compensation processing, as illustrated in the lower portion of Figure 20
  • the static interpolation comprises inter-field interpolation and the slow-motion and fast-motion interpolation comprises intra-field interpolation
  • temporal interpolation can be "disabled" by setting the threshold T1 (Equations 4-6) to zero (T1 = 0). Processing the current field with temporal interpolation disabled results in categorizing no areas of the motion-level map as static, and the Wmed filter 2054 (Figure 20) then uses the three fields illustrated in the aperture in Figure 22, operating on the current field and the two adjacent non-parity fields
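The inter-field/intra-field split can be sketched as follows: static pixels take the co-located value from an adjacent field (temporal, inter-field interpolation), while moving pixels are interpolated within the current field. The two-tap vertical average here stands in for the weighted-median filter and is an assumption made for brevity.

```python
import numpy as np

def wmed_interpolate(cur_field, prev_field, levels):
    """Fill the missing rows of the current field: 'static' pixels come
    from the temporally adjacent field (inter-field interpolation); all
    others are averaged from the rows above and below in the current field
    (a two-tap stand-in for the intra-field weighted-median filter)."""
    up = cur_field.astype(np.float64)
    down = np.vstack([cur_field[1:], cur_field[-1:]]).astype(np.float64)
    intra = (up + down) / 2.0                    # intra-field vertical average
    return np.where(levels == "static", prev_field, intra)
```

Setting every level to a non-static value reproduces the "temporal interpolation disabled" behavior described above: the output then depends only on intra-field data.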
  • a denoiser can be used to remove noise from the candidate Wmed frame before it is further processed using motion compensation information.
  • a denoiser can remove noise that is present in the Wmed frame and retain the signal present regardless of the signal's frequency content.
  • various denoising filters can be used, including wavelet filters. Wavelets are a class of functions used to localize a given signal in both space and scaling domains. The fundamental idea behind wavelets is to analyze the signal at different scales or resolutions such that small changes in the wavelet representation produce a correspondingly small change in the original signal.
  • a denoising filter is based on an aspect of a (4, 2) bi-orthogonal cubic B-spline wavelet filter
  • One such filter can be defined by the following forward and inverse transforms.
  • Wavelet shrinkage or a wavelet Wiener filter can also be applied as the denoiser.
  • Wavelet shrinkage denoising can involve shrinking in the wavelet transform domain, and typically comprises three steps, a linear forward wavelet transform, a nonlinear shrinkage denoising, and a linear inverse wavelet transform.
  • the Wiener filter is an MSE-optimal linear filter which can be used to improve images degraded by additive noise and blurring.
  • Such filters are generally known in the art and are described, for example, in "Ideal spatial adaptation by wavelet shrinkage," referenced above, and by S. P. Ghael, A. M. Sayeed, and R. G. Baraniuk, "Improved Wavelet denoising via empirical Wiener filtering," Proceedings of SPIE, vol. 3169, pp. 389-399, San Diego, July 1997
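The three steps of wavelet-shrinkage denoising (linear forward transform, nonlinear shrinkage, linear inverse transform) can be sketched with a single-level Haar transform standing in for the (4, 2) bi-orthogonal cubic B-spline filter named in the text. The Haar filter, the soft-threshold rule, and the even-length input assumption are simplifications made for this example.

```python
import numpy as np

def haar_fwd(x):
    """One level of a Haar wavelet transform (x must have even length)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail band
    return a, d

def haar_inv(a, d):
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def wavelet_shrink(x, thresh):
    """Forward transform, soft-threshold the detail band, inverse transform."""
    a, d = haar_fwd(np.asarray(x, dtype=np.float64))
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)  # soft shrinkage
    return haar_inv(a, d)
```

Small detail coefficients (noise) are zeroed while large ones (signal structure) survive almost unchanged, which is the property that makes subsequent motion search more accurate.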
  • process B performs bi-directional motion estimation, and then at block 104 uses the motion estimates to perform motion compensation, which is further illustrated in Figure 20 and described in an illustrative aspect hereinbelow
  • motion compensation information for the "missing" data (the non-original rows of pixel data) of the Current Field "C" is being predicted from information in both the previous frame "P " and the next frame “N” as shown in Figure 23
  • solid lines represent rows where original pixel data exist and dashed lines represent rows where Wmed-interpoiated pixel data exist
  • motion compensation is performed in a 4-row by 8-column pixel neighborhood. However, this pixel neighborhood is an example for purposes of explanation, and it will be apparent to those skilled in the art that motion compensation may be performed in neighborhoods of other dimensions
  • the bi-directional ME/MC 2068 can use the sum of squared errors (SSE) to measure the similarity between a predicting block and a predicted block for the Wmed current frame 2060 relative to the Wmed next frame 2058 and the deinterlaced previous frame 2070. The generation of the motion-compensated current frame 2066 then uses pixel information from the most similar matching blocks to fill in the missing data between the original pixel lines
  • the bi-directional ME/MC 2068 biases, or gives more weight to, the pixel information from the deinterlaced previous frame 2070 because it was generated from motion compensation information and Wmed information, while the Wmed next frame 2058 is deinterlaced only by spatio-temporal filtering
  • a metric can be used that includes the contribution of pixel values of one or more luma groups of pixels when measuring the similarity between a predicting block and a predicted block for the Wmed current frame 2060 relative to the Wmed next frame 2058
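A sketch of biased bi-directional block matching with SSE: candidates drawn from the (already motion-compensated) previous frame receive a smaller weight than candidates from the Wmed next frame, mirroring the bias described above. The weight values and helper names are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

def sse(a, b):
    """Sum of squared errors between two same-size blocks."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float((d * d).sum())

def best_match(block, prev_cands, next_cands, w_prev=0.8, w_next=1.0):
    """Bi-directional matching: score every candidate block from both
    reference frames and return (frame_tag, index, weighted_score) of the
    best one.  w_prev < w_next favours the previous deinterlaced frame."""
    scored = [("prev", i, w_prev * sse(block, c))
              for i, c in enumerate(prev_cands)]
    scored += [("next", i, w_next * sse(block, c))
               for i, c in enumerate(next_cands)]
    return min(scored, key=lambda t: t[2])
```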
  • some filters that can be used to obtain half-pixel samples include a bilinear filter (1, 1), an interpolation filter recommended by H.264/AVC (1, -5, 20, 20, -5, 1), and a six-tap Hamming windowed sinc function filter (3, -21, 147, 147, -21, 3).
  • quarter-pixel samples can be generated from full and half pixel samples by applying a bilinear filter
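As a sketch of how the filters above might be applied, the following illustrates half-pixel interpolation with the six-tap H.263/AVC-style filter followed by bilinear quarter-pixel generation. The /32 normalization with rounding, the clipping to 8 bits, and the border clamping are assumptions of this sketch, not taken from the text.

```python
# Six-tap H.263/AVC-style half-pel filter coefficients named in the text.
SIX_TAP = (1, -5, 20, 20, -5, 1)

def half_pixel(row, i):
    """Half-pel sample between row[i] and row[i + 1] (borders clamped)."""
    taps = [row[min(max(i + k - 2, 0), len(row) - 1)] for k in range(6)]
    acc = sum(c * t for c, t in zip(SIX_TAP, taps))
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip

def quarter_pixel(row, i):
    """Quarter-pel sample from the full and half samples via bilinear filter."""
    return (row[i] + half_pixel(row, i) + 1) >> 1

print(half_pixel([10, 20, 30, 40, 50, 60], 2))  # 35
```

On a constant row the filter is transparent (the coefficients sum to 32), which is a quick sanity check on the normalization.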
  • motion compensation can use various types of searching processes to match data (e.g., depicting an object) at a certain location of a current frame to corresponding data at a different location in another frame (e.g., a next frame or a previous frame), the difference in location within the respective frames indicating the object's motion.
  • the searching processes use a full motion search which may cover a larger search area or a fast motion search which can use fewer pixels, and/or the selected pixels used in the search pattern can have a particular shape, e.g., a diamond shape.
  • the search areas can be centered around motion estimates, or motion candidates, which can be used as a starting point for searching the adjacent frames.
  • in some aspects, MV candidates can be generated from external motion estimators and provided to the deinterlacer. Motion vectors of a macroblock from a corresponding neighborhood in a previously motion compensated adjacent frame can also be used as a motion estimate. In some aspects, MV candidates can be generated from searching a neighborhood of macroblocks (e.g., a 3-macroblock by 3-macroblock neighborhood) of the corresponding previous and next frames
  • Figure 24 illustrates an example of two MV maps, MVP and MVN, that could be generated during motion estimation/compensation by searching a neighborhood of the previous frame and the next frame, as shown in Figure 23. In both MVP and MVN the block to be processed to determine motion information is the center block denoted by "X".
  • a combiner 2062 typically merges the Wmed Current Frame 2060 and the MC Current Frame 2066 by using at least a portion of the Wmed Current Frame 2060 and the MC Current Frame 2066 to generate a Current Deinterlaced Frame 2064.
  • the combiner 2062 may generate a Current Deinterlaced Frame using only one of the Wmed Current Frame 2060 or the MC Current Frame 2066. In one example, the combiner 2062 merges the Wmed Current Frame 2060 and the MC Current Frame 2066 to generate a deinterlaced output signal as shown in Equation 36:
  • the combiner 2062 can be configured to try to maintain the following equation to achieve a high PSNR and robust results:
  • Chroma handling can be consistent with the collocated luma handling.
  • the motion level of a chroma pixel is obtained by observing the motion level of its four collocated luma pixels. The operation can be based on voting (the chroma motion level borrows the dominant luma motion level).
  • if any one of the four luma pixels has a fast motion level, the chroma motion level shall be fast-motion; otherwise, if any one of the four luma pixels has a slow motion level, the chroma motion level shall be slow-motion; otherwise the chroma motion level is static
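The conservative chroma rule described above can be sketched as follows; the function name and the string encoding of the motion levels are hypothetical.

```python
def chroma_motion_level(luma_levels):
    """Conservative rule from the text: if any of the four collocated luma
    pixels is fast-motion, the chroma pixel is fast-motion; else if any is
    slow-motion, it is slow-motion; otherwise it is static."""
    if "fast" in luma_levels:
        return "fast"
    if "slow" in luma_levels:
        return "slow"
    return "static"

print(chroma_motion_level(["static", "slow", "static", "fast"]))  # fast
```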
  • the conservative approach may not achieve the highest PSNR, but it avoids the risk of using INTER prediction wherever there is ambiguity in chroma motion level. [0150] Multimedia data sequences were deinterlaced using the described Wmed algorithm alone and the combined Wmed and motion compensated algorithm described herein. The same multimedia data sequences were also deinterlaced using a pixel blending (or averaging) algorithm and a "no-deinterlacing" case where the fields were merely combined without any interpolation or blending. The resulting frames were analyzed to determine the PSNR, which is shown in the following table.
  • a poly-phase resampler is implemented for picture size resizing. In one example of downsampling,
  • the ratio between the original and the resized picture can be p/q, where p and q are relatively prime integers.
  • the total number of phases is p
  • the cutoff frequency of the poly-phase filter in some aspects is 0.6 for resizing factors around 0.5.
  • the cutoff frequency does not exactly match the resizing ratio in order to boost the high-frequency response of the resized sequence. This inevitably allows some aliasing.
  • Figure 42 illustrates an example of poly-phase resampling, showing the phases when the resizing ratio is ¾.
  • the cutoff frequency illustrated in Figure 42 is ¾ also.
  • Original pixels are illustrated in the above Figure 42 with vertical axes.
  • a sinc function is also drawn centered around the axes to represent the filter waveform. Because the cutoff frequency is chosen to be exactly the same as the resampling ratio, the zeros of the sinc function overlap the positions of the pixels after resizing, illustrated in Figure 42 with crosses. To find a pixel value after resizing, the contributions from the original pixels can be summed up as shown in the following equation:
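The summation described above (whose equation is not reproduced in this extraction) can be sketched with a toy 1-D resampler. It sums sinc-weighted contributions of the original pixels with the sinc cutoff set equal to the resampling ratio; the windowing and the phase tables a real poly-phase implementation would use are omitted, and all names are illustrative.

```python
import math

def resample_1d(pixels, p, q):
    """Resize a 1-D signal by ratio p/q (p < q for downsampling) by summing
    sinc-weighted contributions of the original pixels, with the sinc
    cutoff set equal to the resampling ratio as described in the text."""
    ratio = p / q
    out = []
    for j in range(int(len(pixels) * ratio)):
        center = j / ratio  # position of the output sample on the input grid
        acc = 0.0
        for i, v in enumerate(pixels):
            x = ratio * (i - center)
            w = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            acc += v * ratio * w  # the ratio factor preserves DC gain
        out.append(acc)
    return out
```

For a constant input the interior output samples come out close to the input level, which checks the DC normalization; the small residual error is sinc-tail truncation at the signal borders.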
  • SMPTE Society of Motion Picture & Television Engineers
  • the safe title area is defined as the area where "all the useful information can be confined to ensure visibility on the majority of home television receivers". For example, as illustrated in Figure 43, the safe action area 4310 occupies the center 90% of the screen, giving a 5% border all around. The safe title area 4305 occupies the center 80% of the screen, giving a 10% border.
  • a deblocking filter can be applied to all the 4x4 block edges of a frame, except edges at the boundary of the frame and any edges for which the deblocking filter process is disabled.
  • This filtering process shall be performed on a macroblock basis after the completion of the frame construction process, with all macroblocks in a frame processed in order of increasing macroblock addresses. For each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered from top to bottom.
  • a 2-D filter can be adaptively applied to smooth out areas near edges. Edge pixels undergo little or no filtering in order to avoid blurring.
Bandwidth Map Generation
  • Human visual quality V can be a function of both encoding complexity C and allocated bits B (also referred to as bandwidth).
  • Figure 29 is a graph illustrating this relationship. It should be noted that the encoding complexity metric C considers spatial and temporal frequencies from the human vision point of view. For distortions more sensitive to human eyes, the complexity value is correspondingly higher. It can typically be assumed that V is monotonically decreasing in C, and monotonically increasing in B.
  • the bandwidth ratio β can be treated as unchanged within the neighborhood of a (C, V) pair,
  • the bandwidth ratio βINTRA is defined in the equation shown below:
  • Y is the average luminance component of a macroblock
  • αINTRA is a weighting factor for the luminance square and the term following it, and β0INTRA is a normalization factor. For example, a value of β0INTRA = 4 achieves good visual quality.
  • Content information (e.g., a content classification) can be used to set β0INTRA to a value that corresponds to a desired visual quality level for the particular content of the video. In one example, if the video content comprises a "talking head" news broadcast, the visual quality level may be set lower because the image or displayable portion of the video may be deemed of less importance than the audio portion, and fewer bits can be allocated to encode the data. In another example, if the video content comprises a sporting event, content information may be used to set β0INTRA to a value that corresponds to a higher visual quality level because the displayed images may be more important to a viewer, and accordingly more bits can be allocated to encode the data.
  • the temporal complexity is determined by a measure of a frame difference metric, which measures the difference between two consecutive frames taking into account the amount of motion (e.g., motion vectors) along with a frame difference metric such as the sum of the absolute differences (SAD).
  • Bit allocation for inter-coded pictures can consider spatial as well as temporal complexity. This is expressed below:
  • MVP and MVN are the forward and the backward motion vectors for the current MB. It can be noted that Y² in the intra-coded bandwidth formula is replaced by the sum of squared differences (SSD).
  • the motion compensator 23 can be configured to determine bi-directional motion information about frames in the video
  • the motion compensator 23 can also be configured to determine one or more difference metrics, for example, the sum of absolute differences (SAD) or the sum of squared differences (SSD).
  • the shot classifier can be configured to classify frames in the video into two or more categories of "shots" using information determined by the motion compensator
  • the encoder is configured to adaptively encode the plurality of frames based on the shot classifications.
  • the motion compensator, shot classifier, and encoder are described below in reference to Equations 1-10.
  • [0172] Figure 28 is a block diagram of a preprocessor 202 comprising a processor 2831 configured for shot detection and other preprocessing operations according to some aspects.
  • a digital video source can be provided by a source external to the preprocessor 202 as shown in Figure 4 and communicated to a communications module 2836 in the preprocessor 202.
  • the preprocessor 202 contains a storage medium 2825 which communicates with the processor 2831, both of which communicate with the communications module 2836.
  • the processor 2831 includes a motion compensator 2832, a shot classifier 2833, and other modules for preprocessing 2834, which can operate to generate motion information, classify shots in frames of the video data, and perform other preprocessing tasks as described herein
  • the motion compensator, shot classifier, and other modules can contain processes similar to corresponding modules in Figure 4, and can process video to determine information described below
  • the processor 2831 can have a configuration to obtain metrics indicative of a difference between adjacent frames of a plurality of video frames, the metrics comprising bi-directional motion information and luminance information
  • the metrics can be calculated by a device or process external to the processor 2831, which can also be external to the preprocessor 202, and communicated to the processor 2831, either directly or indirectly via another device or memory
  • the metrics can also be calculated by the processor 2831.
  • the preprocessor 202 provides video and metadata for further processing, encoding, and transmission to other devices, for example, terminals 6 ( Figure 1 )
  • the encoded video can be, in some aspects, scalable multi-layered encoded video which can comprise a base layer and an enhancement layer. Scalable layer encoding is further described in co-pending U.S. Patent Application No. [Attorney docket no. 050078] entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING" owned by the assignee hereof, and which is incorporated by reference in its entirety herein
  • a general purpose processor such as the one shown in Figure 28 may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine
  • a processor may also be implemented as a combination of computing devices, e.g., a DSP and a microprocessor
  • Video encoding usually operates on a structured group of pictures (GOP)
  • a GOP normally starts with an intra-coded frame (I-frame), followed by a series of P (predictive) or B (bi-directional) frames
  • GOP Group Of Pictures
  • an I-frame can store all the data to display the frame
  • a B-frame relies on data in the preceding and following frames (e.g., only containing data changed from the preceding frame or data that is different from data in the next frame)
  • a P-frame contains data that has changed from the preceding frame.
  • I-frames are interspersed with P-frames and B-frames in encoded video.
  • I-frames are typically much larger than P-frames, which in turn are larger than B-frames.
  • the length of a GOP should be long enough to reduce the efficiency loss from big I-frames, and short enough to fight mismatch between encoder and decoder, or channel impairment
  • macro blocks (MB) in P frames can be intra coded for the same reason.
  • Scene change detection can be used for a video encoder to determine a proper GOP length and insert I-frames based on the GOP length, instead of inserting an I-frame at a fixed interval.
  • the communication channel is usually impaired by bit errors or packet losses. Where to place I-frames or intra-coded MBs may significantly impact decoded video quality and viewing experience.
  • One encoding scheme is to use intra-coded frames for pictures or portions of pictures that have significant change from collocated previous pictures or picture portions. Normally these regions cannot be predicted effectively and efficiently with motion estimation, and encoding can be done more efficiently if such regions are exempted from inter-frame coding techniques (e.g., encoding using B-frames and P-frames). In the context of channel impairment, those regions are likely to suffer from error propagation, which can be reduced or eliminated (or nearly so) by intra-frame encoding.
  • [0178] Portions of the GOP video can be classified into two or more categories, where each region can have different intra-frame encoding criteria that may depend on the particular implementation.
  • the video can be classified into three categories: abrupt scene changes, cross-fading and other slow scene changes, and camera flashlights.
  • Abrupt scene changes include frames that are significantly different from the previous frame, usually caused by a camera operation. Since the content of these frames is different from that of the previous frame, the abrupt scene change frames should be encoded as I-frames.
  • Cross-fading and other slow scene changes include slow switching of scenes, usually caused by computer processing of camera shots. Gradual blending of two different scenes may look more pleasing to human eyes, but poses a challenge to video coding. Motion compensation cannot reduce the bitrate of those frames effectively, and more intra MBs can be updated for these frames.
  • Camera flashlights, or camera flash events, occur when the content of a frame includes camera flashes. Such flashes are relatively short in duration (e.g., one frame) and extremely bright, such that the pixels in a frame portraying the flashes exhibit unusually high luminance relative to a corresponding area on an adjacent frame. Camera flashlights shift the luminance of a picture suddenly and swiftly. Usually the duration of a camera flashlight is shorter than the temporal masking duration of the human vision system (HVS), which is typically defined to be 44 ms. Human eyes are not sensitive to the quality of these short bursts of brightness and therefore they can be encoded coarsely.
  • HVS human vision system
  • FIG. 30 illustrates a process 3000 that operates on a GOP and can be used in some aspects to encode video based on shot detection in video frames, where portions of the process 3000 (or sub-processes) are described and illustrated with reference to Figures 30-40.
  • the processor 2831 can be configured to incorporate process 3000. After process 3000 starts, it proceeds to block 3042 where metrics (information) are obtained for the video frames, the metrics including information indicative of a difference between adjacent frames.
  • the metrics include bidirectional motion information and luminance-based information that is subsequently used to determine changes that occurred between adjacent frames, which can be used for shot classification. Such metrics can be obtained from another device or process, or calculated by, for example, processor 2831. Illustrative examples of metrics generation are described in reference to process A in Figure 31
  • a video frame can be classified into two or more categories of what type of shot is contained in the frame, for example, an abrupt scene change, a slowly changing scene, or a scene containing high luminance values (camera flashes). Certain encoding implementations may necessitate other categories.
  • An illustrative example of shot classification is described in reference to process B in Figure 32, and in more detail with reference to processes D, E, and F in Figures 34-36, respectively
  • process 3000 proceeds to block 3046 where the frame can be encoded, or designated for encoding, using the shot classification results. Such results can influence whether to encode the frame with an intra-coded frame or a predictive frame (e.g., P-frame or B-frame).
  • Process C in Figure 33 shows an example of an encoding scheme using the shot results.
  • FIG. 31 illustrates an example of a process for obtaining metrics of the video. Figure 31 illustrates certain steps that occur in block 3042 of Figure 30
  • process A obtains or determines bidirectional motion estimation and compensation information of the video.
  • the motion compensator 2832 of Figure 28 can be configured to perform bi-directional motion estimation on the frames and determine motion compensation information that can be used for subsequent shot classification.
  • Process A then proceeds to block 3154 where it generates luminance information including a luminance difference histogram for a current or selected frame and one or more adjacent frames. Lastly, process A continues to block 3156 where a metric is calculated that is indicative of the shot contained in the frame.
  • a frame difference metric is shown in two examples in Equations 4 and 10
  • Illustrative examples of determining motion information, luminance information, and a frame difference metric are described below.

Motion Compensation
  • a video sequence can be preprocessed with a bi-directional motion compensator that matches every 8x8 block of the current frame with blocks in its two most adjacent neighboring frames, one in the past and one in the future. The motion compensator produces motion vectors and difference metrics for every block.
  • Figure 37 illustrates this concept, showing an example of matching pixels of a current frame C to a past frame P and a future (or next) frame N, and depicts motion vectors to the matched pixels (past motion vector MVP and future motion vector MVN).
  • FIG 40 illustrates an example of a motion vector determination process and predictive frame encoding in, for example, MPEG-4. The process described in Figure 40 is a more detailed illustration of an example process that can take place in block 3152 of Figure 31.
  • current picture 4034 is made up of 5 x 5 macroblocks, where the number of macroblocks in this example is arbitrary.
  • a macroblock is made up of 16 x 16 pixels. Pixels can be defined by an 8-bit luminance value (Y) and two 8-bit chrominance values (Cr and Cb).
  • Y, Cr and Cb components can be stored in a 4:2:0 format, where the Cr and Cb components are down-sampled by 2 in the X and the Y directions.
  • each macroblock would consist of 256 Y components, 64 Cr components and 64 Cb components.
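As a quick arithmetic check of the 4:2:0 layout described above:

```python
# In 4:2:0 the Cr and Cb planes are down-sampled by 2 in both X and Y,
# so a 16 x 16 macroblock carries:
y_components = 16 * 16                 # full-resolution luma
cr_components = (16 // 2) * (16 // 2)  # chroma, half resolution each way
cb_components = (16 // 2) * (16 // 2)
print(y_components, cr_components, cb_components)  # 256 64 64
```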
  • Macroblock 4036 of current picture 4034 is predicted from reference picture 4032 at a different time point than current picture 4034. A search is made in reference picture 4032 to locate the best matching macroblock 4038 that is closest, in terms of Y, Cr and Cb values, to current macroblock 4036 being encoded. The location of best matching macroblock 4038 in reference picture 4032 is encoded in motion vector 4040.
  • Reference picture 4032 can be an I-frame or P-frame that a decoder will have reconstructed prior to the construction of current picture 4034. Best matching macroblock 4038 is subtracted from current macroblock 4036 (a difference for each of the Y, Cr and Cb components is calculated), resulting in residual error 4042. Residual error 4042 is encoded with a 2D Discrete Cosine Transform (DCT) 4044 and then quantized 4046. Quantization 4046 can be performed to provide spatial compression by, for example, allotting fewer bits to the high frequency coefficients while allotting more bits to the low frequency coefficients.
  • the quantized coefficients of residual error 4042, along with motion vector 4040 and reference picture 4032 identifying information, are encoded information representing current macroblock 4036.
  • the encoded information can be stored in memory for future use or operated on for purposes of, for example, error correction or image enhancement, or transmitted over network 140
  • the encoded quantized coefficients of residual error 4042, along with encoded motion vector 4040, can be used to reconstruct current macroblock 4036 in the encoder for use as part of a reference frame for subsequent motion estimation and compensation.
  • the encoder can emulate the procedures of a decoder for this P-frame reconstruction. The emulation of the decoder will result in both the encoder and decoder working with the same reference picture.
  • the reconstruction process, whether done in an encoder, for further inter-coding, or in a decoder, is presented here.
  • Reconstruction of a P-frame can be started after the reference frame (or a portion of a picture or frame that is being referenced) is reconstructed.
  • the encoded quantized coefficients are dequantized 4050 and then 2D Inverse DCT, or IDCT, 4052 is performed resulting in decoded or reconstructed residual error 4054.
  • Encoded motion vector 4040 is decoded and used to locate the already reconstructed best matching macroblock 4056 in the already reconstructed reference picture 4032. Reconstructed residual error 4054 is then added to reconstructed best matching macroblock 4056 to form reconstructed macroblock 4058.
  • Reconstructed macroblock 4058 can be stored in memory, displayed independently or in a picture with other reconstructed macroblocks, or processed further for image enhancement.
  • Encoding using B-frames can exploit temporal redundancy between a region in a current picture and a best matching prediction region in a previous picture and a best matching prediction region in a subsequent picture
  • the subsequent best matching prediction region and the previous best matching prediction region are combined to form a combined bidirectional predicted region.
  • the difference between the current picture region and the best matching combined bi-directional prediction region is a residual error (or prediction error).
  • the locations of the best matching prediction region in the subsequent reference picture and the best matching prediction region in the previous reference picture can be encoded in two motion vectors.

Luminance Histogram Difference
  • the motion compensator can produce a difference metric for every block.
  • the difference metric can be a sum of squared differences (SSD) or a sum of absolute differences (SAD). Without loss of generality, SAD is used here as an example. [0191] For every frame, a SAD ratio γ = (ε + SADP) / (ε + SADN) is calculated as below:
  • SADP and SADN are the sums of absolute differences for the forward and the backward difference metrics, respectively.
  • the denominator contains a small positive number ε to prevent the "divide-by-zero" error.
  • the numerator also contains an ε to balance the effect of the unity in the denominator. For example, if the previous frame, the current frame, and the next frame are identical, motion search should yield SADP = SADN = 0; in this case, the above calculation generates a value of 1 instead of 0 or infinity
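Following the description above, the ratio can be sketched as below; the symbol γ and the default ε value are assumptions reconstructed from the surrounding text, not reproduced from the (missing) equation.

```python
def sad_ratio(sad_p, sad_n, eps=1.0):
    """gamma = (eps + SAD_P) / (eps + SAD_N). The eps in the denominator
    prevents divide-by-zero; the matching eps in the numerator balances the
    unity in the denominator, so identical frames (SAD_P = SAD_N = 0)
    yield 1 rather than 0 or infinity."""
    return (eps + sad_p) / (eps + sad_n)

print(sad_ratio(0, 0))  # 1.0
```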
  • a luminance histogram can be calculated for every frame.
  • the multimedia images have a luminance depth (e.g., number of "bins") of eight bits
  • the luminance depth used for calculating the luminance histogram according to some aspects can be set to 16 to obtain the histogram
  • the luminance depth can be set to an appropriate number which may depend upon the type of data being processed, the computational power available, or other predetermined criteria.
  • the luminance depth can be set dynamically based on a calculated or received metric, such as the content of the data
  • Equation 49 illustrates one example of calculating a luminance histogram difference (lambda):
  • NPi is the number of blocks in the i-th bin for the previous frame, NCi is the number of blocks in the i-th bin for the current frame, and N is the total number of blocks in a frame
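Using the definitions above, the histogram difference can be sketched as follows. Equation 49 itself is not reproduced in this extraction, so the sum-of-absolute-bin-differences form and the normalization by N are assumptions consistent with the surrounding definitions; the function name is hypothetical.

```python
def luminance_histogram_difference(prev_hist, curr_hist, n_blocks):
    """lambda = sum_i |N_Pi - N_Ci| / N over the 16 luminance bins
    (a sketch of Equation 49 based on the term definitions in the text)."""
    return sum(abs(p - c) for p, c in zip(prev_hist, curr_hist)) / n_blocks

print(luminance_histogram_difference([4, 0], [0, 4], 4))  # 2.0
```

Identical histograms give a difference of 0, and the value grows as blocks migrate between luminance bins from one frame to the next.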
  • process B determines three categories of shot (or scene) changes using metrics obtained or determined for the video. Figure 32 illustrates certain steps occurring in one aspect of block 3044 of Figure 30
  • process B first determines if the frame meets criteria to be designated an abrupt scene change. Process D in Figure 34 illustrates an example of this determination. Process B then proceeds to block 3264 where it determines if the frame is part of a slowly changing scene. Process E in Figure 35 illustrates an example of determining a slow changing scene
  • process B determines if the frame contains camera flashes, in other words, large luminance values differing from those of the previous frame. Process F in Figure 36 illustrates an example of determining a frame containing camera flashes. Illustrative examples of these processes are described below
  • Figure 34 is a flow diagram illustrating a process of determining abrupt scene changes. Figure 34 further elaborates certain steps that can occur in some aspects of block 3262 of Figure 32. At block 3482, process D checks if the frame difference metric D meets the criterion shown in Equation 51:
  • process D designates the frame as an abrupt scene change and, in this example, no further shot classification is necessary.
  • FIG. 35 illustrates further details of some aspects that can occur in block 3264 of Figure 32.
  • process E determines if the frame is part of a series of frames depicting a slow scene change.
  • Process E determines that the current frame is a cross-fading or other slow scene change if the frame difference metric D is less than the first threshold value T1 and greater than or equal to a second threshold value T2, as illustrated in Equation 52:
  • process E classifies the frame as part of a slow changing scene, and shot classification for the selected frame ends.
  • Process F shown in Figure 36 is an example of a process that can determine if the current frame comprises camera flashlights.
  • the luminance histogram statistics are used to determine if the current frame comprises camera flashlights.
  • Process F determines whether camera flash events are in the selected frame by first determining if the luminance of the current frame is greater than the luminance of the previous frame and the luminance of the next frame, shown at block 3602. If not, the frame is not a camera flash event; but if so, it may be.
  • Process F determines whether the backwards difference metric is greater than a threshold T4, and whether the forwards difference metric is greater than the threshold T4; if both these conditions are satisfied, at block 3606 process F classifies the current frame as having camera flashlights. In one example, at block 3602, process F determines if the average luminance of the current frame minus the average luminance of the previous frame equals or exceeds a threshold T3, and process F determines if the average luminance of the current frame minus the average luminance of the next frame is greater than or equal to the threshold T3, as shown in Equations 53 and 54:
  • process F proceeds to block 3604 where it determines if a backwards difference metric SADP and a forward difference metric SADN are greater than a certain threshold T4, as illustrated in Equations 55 and 56 below:
  • process F returns
  • one or more of the threshold values T1, T2, T3, and T4 are predetermined and such values are incorporated into the shot classifier in the encoding device. Typically, these threshold values are selected through testing of a particular implementation of shot detection. In some aspects, one or more of the threshold values T1, T2, T3, and T4 can be set during processing (e.g., dynamically) based on information (e.g., metadata) supplied to the shot classifier or based on information calculated by the shot classifier itself
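Pulling processes D, E, and F together, the threshold cascade might be sketched as below. All threshold values are hypothetical placeholders, consistent with the note above that they are implementation-dependent, and the function signature is illustrative.

```python
def classify_shot(d, sad_p, sad_n, luma_c, luma_p, luma_n,
                  t1=5.0, t2=2.0, t3=10.0, t4=50.0):
    """Sketch of processes D, E, and F: abrupt change if the frame
    difference metric D meets the first threshold; slow change if D falls
    between the second and first thresholds; camera flash if the current
    frame's average luminance exceeds both neighbors' by T3 and both
    difference metrics exceed T4."""
    if d >= t1:
        return "abrupt"
    if t2 <= d < t1:
        return "slow"
    if (luma_c - luma_p >= t3 and luma_c - luma_n >= t3
            and sad_p > t4 and sad_n > t4):
        return "flash"
    return "ordinary"

print(classify_shot(6.0, 0, 0, 0, 0, 0))  # abrupt
```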
  • FIG 33 shows a process C for determining encoding parameters for the video, or for encoding the video, based on the shot classification of the selected frame.
  • process C determines if the selected frame was classified as an abrupt scene change. If so, at block 3371 the current frame is classified as an abrupt scene change, and the frame can be encoded as an I-frame and a GOP boundary can be determined. If not, process C proceeds to block 3372; if the current frame is classified as a portion of a slowly changing scene, at block 3373 the current frame, and other frames in the slow changing scene, can be encoded as a predictive frame (e.g., P-frame or B-frame).
  • Process C then proceeds to block 3374 where it checks if the current frame was classified as a flashlight scene comprising camera flashes. If so, at block 3375 the frame can be identified for special processing, for example, removal, replication of a previous frame, or encoding a particular coefficient for the frame. If not, no classification of the current frame was made and the selected frame can be encoded in accordance with other criteria, encoded as an I-frame, or dropped. Process C can be implemented in an encoder.
  • the amount of difference between the frame to be compressed and its adjacent two frames is indicated by a frame difference metric D. If a significant amount of one-way luminance change is detected, it signifies a cross-fade effect in the frame. The more prominent the cross-fade is, the more gain may be achieved by using B-frames.
  • a modified frame difference metric is used as shown in Equation 57 below.
  • represents a constant that can be determined in normal experimentation as it can depend on the implementation
  • α is a weighting variable having a value between 0 and 1.
  • Table 1 below shows performance improvement by adding abrupt scene change detection.
  • the total numbers of I-frames in both the non-scene-change (NSC) and the scene-change (SC) cases are approximately the same.
  • NSC non-scene-change
  • SC scene-change
  • a "B" frame (B stands for bi-directional) can use the previous and next I or P pictures either individually or simultaneously as reference.
  • the number of bits used to encode an I-frame on average exceeds the number of bits used to encode a P-frame; likewise the number of bits used to encode a P-frame on average exceeds that of a B-frame. A skipped frame, if it is used, may use no bits for its representation.
  • a group of pictures partitioner adaptively encodes frames to minimize temporal redundancy. Differences between frames are quantified and a decision to represent the picture by an I, P, B, or skipped frame is automatically made after suitable tests are performed on the quantified differences.
  • the adaptive encoding process described herein is flexible and is made to adapt to these changes in content
  • the adaptive encoding process evaluates a frame difference metric, which can be thought of as a measure of distance between frames, with the same additive properties of distance. In concept, given frames F1, F2, and F3 having the inter-frame distances d12 and d23, the distance between F1 and F3 is taken as being at least d12 + d23. Frame assignments are made on the basis of this distance-like metric and other measures
  • the GOP partitioner 412 operates by assigning picture types to frames as they are received.
  • the picture type indicates the method of prediction that may be used to code each block.
  • the residual block is encoded, typically using the discrete cosine transform for the elimination of spatial redundancy. A P encoding type is assigned to a frame if the "distance" between it and the last frame assigned to be a P frame exceeds a second threshold, which is typically less than the first.
  • B-frame pictures can use the previous and next P- or I-pictures for motion compensation as described above.
  • a block in a B picture can be forward, backward or bi-directionally predicted, or it could be intra-coded without reference to other frames.
  • a reference block can be a linear combination of as many as 32 blocks from as many frames. If the frame cannot be assigned to be an I or P type, it is assigned to be a B type if the "distance" from it to its immediate predecessor is greater than a third threshold, which typically is less than the second threshold. If the frame cannot be assigned to be B-frame encoded, it is assigned to "skip frame" status. This frame can be skipped because it is virtually a copy of a previous frame.
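The threshold cascade described above (I if the distance since the last I frame exceeds a first threshold, P if the distance since the last P frame exceeds a second, B if the distance to the immediate predecessor exceeds a third, otherwise skip) can be sketched as follows. The function and parameter names are illustrative assumptions, not taken from the patent:

```python
def assign_picture_type(d_last_i, d_last_p, d_prev, t_scene, t_p, t_b):
    """Assign a picture type following the threshold cascade described above.
    Thresholds are assumed ordered t_scene > t_p > t_b."""
    if d_last_i > t_scene:   # large distance since the last I frame: scene change
        return "I"
    if d_last_p > t_p:       # moderate distance since the last P frame
        return "P"
    if d_prev > t_b:         # still enough change vs. the immediate predecessor
        return "B"
    return "skip"            # virtually a copy of the previous frame
```

With thresholds 80 > 40 > 5, a frame with distances (100, 50, 10) becomes an I frame, while (10, 20, 1) is skipped.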
  • the motion compensation is done on a field basis, the search for the reference blocks taking place in fields rather than frames.
  • a forward reference block is found in fields of the frame that follows it; likewise, a backward reference block is found in fields of the frame that immediately precedes the current field.
  • the current blocks are assembled into a compensated field.
  • the process continues with the second field of the frame.
  • the two compensated fields are combined to form a forward and a backward compensated frame.
  • the motion compensator produces motion vectors and difference metrics for every block. Note that the differences in the metric are evaluated between a block in the field or frame being considered and a block that best matches it, either in a preceding field or frame or a field or frame that immediately follows it, depending on whether a forward or backward difference is being evaluated. Only luminance values enter into this calculation. The motion compensation step thus generates two sets of differences. These are between blocks of current values of luminance and the luminance values in reference blocks taken from frames that are immediately ahead and immediately behind the current one in time.
  • each forward and each backward difference is determined for each pixel in a block, and each is separately summed over the entire frame. Both fields are included in the two summations when the deinterlaced NTSC fields that comprise a frame are processed. In this way, SADP and SADN, the summed absolute values of the forward and backward differences, are found. For every frame a SAD ratio is calculated using the relationship,
  • the difference can be the SSD, the sum of squared differences; the SAD, the sum of absolute differences; or the SATD, in which the blocks of pixel values are transformed by applying the two-dimensional Discrete Cosine Transform to them before differences in block elements are taken. The sums are evaluated over the area of active video, though a smaller area may be used in other aspects.
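The SAD and SSD block-difference measures named above are straightforward per-pixel sums; the sketch below shows both for flat luma blocks (the SATD variant, which transforms the blocks first, is omitted):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size luma blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def ssd(block_a, block_b):
    """Sum of squared differences between two equal-size luma blocks."""
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b))
```

For example, blocks [1, 2, 3] and [2, 2, 1] give a SAD of 3 and an SSD of 5.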
  • the luminance histogram of every frame as received (non-motion compensated) is also computed.
  • the histogram operates on the DC coefficient, i.e., the (0,0) coefficient, in the 16×16 array of coefficients that is the result of applying the two-dimensional Discrete Cosine Transform to the block of luminance values, if it were available. Equivalently, the average value of the 256 values of luminance in the 16×16 block may be used in the histogram. For images whose luminance depth is eight bits, the number of bins is set at 16. The next metric evaluates the histogram difference.
  • NPi is the number of blocks from the previous frame in the i-th bin
  • NCi is the number of blocks from the current frame that belong in the i-th bin
  • N is the total number of blocks in a frame.
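The histogram construction and its difference metric can be sketched as below. The 16 equal-width bins over 8-bit luminance follow the description above; the normalization by the total block count N is an assumption, since Equation 59 is not reproduced in this excerpt:

```python
def luma_histogram(block_means, bins=16, depth=8):
    """Bin per-block average luminance values into `bins` equal-width bins
    (16 luma values per bin for 8-bit video)."""
    hist = [0] * bins
    width = (1 << depth) // bins
    for m in block_means:
        hist[min(int(m) // width, bins - 1)] += 1
    return hist

def histogram_difference(prev_hist, cur_hist):
    """Summed absolute bin differences, normalized by the total block count N.
    The exact normalization of Equation 59 may differ."""
    n = sum(cur_hist)
    return sum(abs(p - c) for p, c in zip(prev_hist, cur_hist)) / n
```

Shifting one of three blocks from the dark bin to a mid bin, for instance, changes two bins by one block each, giving a difference of 2/3.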
  • Preprocessor 4135 delivers interlaced fields, in the case of video having an NTSC source, and frames of film images, when the source of the video is the result of inverse telecine, to the bi-directional motion compensator 4133.
  • the bi-directional motion compensator 4133 operates on a field (or frame in the case of a cinematic source of video) by breaking it into blocks of 16×16 pixels and comparing each block to all 16×16 blocks in a defined area of a field of the previous frame. The block which provides the best match is selected and subtracted from the current block.
  • the absolute values of the differences are taken and the result summed over the 256 pixels that comprise the current block.
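The block-matching step described above (comparing a current block against all candidate blocks in a defined area of a reference field and keeping the best SAD) can be sketched with an exhaustive search. The search-window parameterization and function name are illustrative:

```python
def best_match_sad(cur_block, ref_field, top, left, search, size=16):
    """Exhaustively search the reference field within +/-`search` pixels of
    the co-located position (top, left) for the block of `size` x `size`
    pixels that minimizes the sum of absolute differences."""
    h, w = len(ref_field), len(ref_field[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if not (0 <= y <= h - size and 0 <= x <= w - size):
                continue  # candidate block would fall outside the field
            s = sum(abs(cur_block[r][c] - ref_field[y + r][x + c])
                    for r in range(size) for c in range(size))
            if best is None or s < best[0]:
                best = (s, dy, dx)
    return best  # (SAD, dy, dx) of the best-matching block
```

A small 2×2 example: a block of 5s buried at offset (1, 1) in a field of 0s is found with SAD 0.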
  • the backward difference metric has been computed by a backward difference module 4137.
  • a similar procedure may be performed by a forward difference module 4136.
  • the forward difference module 4136 uses the frame which is immediately ahead of the current one in time as a source of reference blocks to develop SADP, the forward difference metric.
  • the same estimation process, albeit done using the recovered film frames, takes place when the input frames are formed in the inverse telecine.
  • the histograms that can be used to complete the computation of the frame difference metric may be formed in histogram difference module 4141. Each 16×16 block is assigned to a bin based on the average value of its luminance.
  • This information is formed by adding all 256 pixel luminance values in a block together, normalizing it by 256 if desired, and incrementing the count of the bin into which the average value would have been placed.
  • the calculation is done once for each pre-motion compensated frame, the histogram for the current frame becoming the histogram for the previous frame when a new current frame arrives.
  • the two histograms are differenced and normalized by the number of blocks in histogram difference module 4141 to form λ, defined by Equation 59.
  • frame difference combiner 4143 uses the intermediate results found in histogram difference module 4141 and forward and backward difference modules 4136 and 4137 to evaluate the current frame difference defined in Equation 60.
  • the system of flowchart 4100 and components or steps thereof, can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof.
  • Each functional component of flowchart 4100, including the preprocessor 4135, the bidirectional motion compensator 4133, the forward and backward difference metric modules 4136 and 4137, the histogram difference module 4141, and the frame difference metric combiner 4143, may be realized as a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, or be implemented in microcode or software that is executed on the processor, or a combination thereof.
  • the program code or code segments that perform the desired tasks may be stored in a machine readable medium such as a storage medium.
  • a code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
  • a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
  • the received and processed data can be stored in a storage medium which can include, for example, a chip configured storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical) connected to a processor.
  • the combiner 4143 can contain part or all of the storage medium.
  • Flowchart 4200 in Figure 42 illustrates a process of assigning compression types to frames.
  • M is the current frame difference defined in Equation 3
  • decision block 4253 indicates, if a frame under consideration is the first in a sequence, the decision path marked YES is followed to block 4255, thereby declaring the frame to be an I frame.
  • the accumulated frame difference is set to zero in block 4257, and the process returns (in block 4258) to the start block 4253. If the frame being considered is not the first frame in a sequence, the path marked NO is followed from block 4253, where the decision was made, and in test block 4259 the current frame difference is tested against the scene change threshold. If the current frame difference is larger than that threshold, the decision path marked YES is followed to block 4255, again leading to the assignment of an I-frame. If the current frame difference is less than the scene change threshold, the NO path is followed to block 4261, where the current frame difference is added to the accumulated frame difference.
  • the accumulated frame difference is compared with threshold t, which is in general less than the scene change threshold. If the accumulated frame difference is larger than t, control transfers to block 4265, and the frame is assigned to be a P frame; the accumulated frame difference is then reset to zero in step 4267. If the accumulated frame difference is less than t, control transfers from block 4263 to block 4269. There the current frame difference is compared with τ, which is less than t. If the current frame difference is smaller than τ, the frame is assigned to be skipped in block 4273; if the current frame difference is larger than τ, the frame is assigned to be a B frame.
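The decision flow of flowchart 4200 can be sketched as a loop over per-frame difference values M. Block numbers are omitted; the names t_scene, t, and tau stand for the scene change threshold, t, and τ described above, with t_scene > t > tau assumed:

```python
def assign_frame_types(frame_diffs, t_scene, t, tau):
    """Assign a compression type to each frame from its current-frame-
    difference value M, following the flowchart-4200 logic sketched above."""
    types, acc = [], 0.0
    for i, m in enumerate(frame_diffs):
        if i == 0 or m > t_scene:   # first frame in sequence, or scene change
            types.append("I")
            acc = 0.0
            continue
        acc += m                    # accumulate the frame difference
        if acc > t:                 # enough accumulated change: P frame
            types.append("P")
            acc = 0.0
        elif m < tau:               # nearly a copy of the previous frame
            types.append("skip")
        else:
            types.append("B")
    return types
```

For example, with t_scene=50, t=8, tau=1, the difference sequence [0, 5, 5, 100, 0.1] yields I, B, P, I, skip.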
  • is a scalar
  • SADP is the SAD with forward motion compensation
  • MVP is the sum of lengths, measured in pixels, of the motion vectors from the forward motion compensation
  • s and m are two threshold numbers that render the frame encoding complexity indicator to zero if SADP is lower than s or MVP is lower than m.
  • M* would be used in place of the current frame difference in flowchart 4200 of Figure 42. As can be seen, M* is different from M only if the forward motion compensation shows a low level of movement. In this case, M* is smaller than M.
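Since the equation for M* is not reproduced in this excerpt, the sketch below is only one plausible reading of the description: when forward motion compensation shows low movement (SADP below s or MVP below m), the complexity indicator is zero and M* drops below M; otherwise M* equals M. The scaling factor alpha and the exact reduction are assumptions:

```python
def modified_frame_difference(m, sad_p, mv_p, s, m_thr, alpha=0.5):
    """Plausible sketch of M*: reduce M when the forward-motion-compensation
    complexity indicator is zero (SADP < s or MVP < m_thr); otherwise M* = M.
    `alpha` and the reduction rule are assumptions, not the patent's equation."""
    low_motion = sad_p < s or mv_p < m_thr
    return alpha * m if low_motion else m
```

With thresholds s=20 and m_thr=10, a frame with SADP=100 and MVP=50 keeps M unchanged, while SADP=5 halves it.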
  • shot detection and encoding aspects described herein may be described as a process which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram.
  • the flowcharts shown in the figures may describe operations as a sequential process, but many operations can be performed in parallel or concurrently.
  • the order of operations may be re-arranged.
  • a process is typically terminated when its operations are completed.
  • a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
  • a process corresponds to a function
  • its termination corresponds to a return of the function to the calling function or the main function
  • information and multimedia data may be represented using any of a variety of different technologies and techniques.
  • various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, firmware, computer software, middleware, microcode, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality.
  • the steps of a method or algorithm described in connection with the shot detection and encoding examples and Figures disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • the methods and algorithms are particularly applicable to communication technology, including wireless transmissions of video to cell phones, computers, laptop computers, PDAs, and all types of personal and business communication devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC).
  • ASIC Application Specific Integrated Circuit
  • the ASIC may reside in a wireless modem
  • the processor and the storage medium may reside as discrete components in the wireless modem.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
EP07758479A 2006-04-03 2007-03-13 Präprozessorverfahren und -vorrichtung Withdrawn EP2002650A1 (de)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US78904806P 2006-04-03 2006-04-03
US78937706P 2006-04-04 2006-04-04
US78926606P 2006-04-04 2006-04-04
PCT/US2007/063929 WO2007114995A1 (en) 2006-04-03 2007-03-13 Preprocessor method and apparatus

Publications (1)

Publication Number Publication Date
EP2002650A1 true EP2002650A1 (de) 2008-12-17

Family

ID=38121947

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07758479A Withdrawn EP2002650A1 (de) 2006-04-03 2007-03-13 Präprozessorverfahren und -vorrichtung

Country Status (7)

Country Link
EP (1) EP2002650A1 (de)
JP (3) JP2009532741A (de)
KR (5) KR20140010190A (de)
CN (1) CN104159060B (de)
AR (1) AR060254A1 (de)
TW (1) TW200803504A (de)
WO (1) WO2007114995A1 (de)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI396975B (zh) * 2008-08-06 2013-05-21 Realtek Semiconductor Corp 可調適緩衝裝置及其方法
TWI392335B (zh) * 2009-08-14 2013-04-01 Sunplus Technology Co Ltd 在縮放器中去除一影像訊號之環形雜訊之濾波系統及方法
CN105739209B (zh) 2009-11-30 2022-05-27 株式会社半导体能源研究所 液晶显示设备、用于驱动该液晶显示设备的方法
WO2012100117A1 (en) * 2011-01-21 2012-07-26 Thomson Licensing System and method for enhanced remote transcoding using content profiling
CN103907136A (zh) * 2011-10-01 2014-07-02 英特尔公司 用于视频代码转换中的集成后处理和预处理的系统、方法和计算机程序产品
KR101906946B1 (ko) 2011-12-02 2018-10-12 삼성전자주식회사 고밀도 반도체 메모리 장치
US10136147B2 (en) 2014-06-11 2018-11-20 Dolby Laboratories Licensing Corporation Efficient transcoding for backward-compatible wide dynamic range codec
CN108702506B9 (zh) * 2016-03-07 2021-10-15 索尼公司 编码设备和编码方法
CN111656246B (zh) * 2018-01-02 2022-07-29 伦敦大学国王学院 用于定位显微的方法和系统、计算机可读存储介质
CN111310744B (zh) * 2020-05-11 2020-08-11 腾讯科技(深圳)有限公司 图像识别方法、视频播放方法、相关设备及介质
CN112949449B (zh) * 2021-02-25 2024-04-19 北京达佳互联信息技术有限公司 交错判断模型训练方法及装置和交错图像确定方法及装置
CN114363638B (zh) * 2021-12-08 2022-08-19 慧之安信息技术股份有限公司 基于h.265熵编码二值化的视频加密方法
CN114125346B (zh) * 2021-12-24 2023-08-29 成都索贝数码科技股份有限公司 视频转换方法及装置

Citations (3)

Publication number Priority date Publication date Assignee Title
JP2000287173A (ja) * 1999-03-31 2000-10-13 Toshiba Corp 映像データ記録装置
US20030219160A1 (en) * 2002-05-22 2003-11-27 Samsung Electronics Co., Ltd. Method of adaptively encoding and decoding motion image and apparatus therefor
US6970513B1 (en) * 2001-06-05 2005-11-29 At&T Corp. System for content adaptive video decoding

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
FR2700090B1 (fr) 1992-12-30 1995-01-27 Thomson Csf Procédé de désentrelacement de trames d'une séquence d'images animées.
DE69506076T2 (de) * 1994-04-05 1999-06-10 Koninkl Philips Electronics Nv Umsetzung einer zeilensprung-abtastung in eine sequentielle abtastung
JP2832927B2 (ja) * 1994-10-31 1998-12-09 日本ビクター株式会社 走査線補間装置及び走査線補間用動きベクトル検出装置
JPH09284770A (ja) * 1996-04-13 1997-10-31 Sony Corp 画像符号化装置および方法
US5864369A (en) 1997-06-16 1999-01-26 Ati International Srl Method and apparatus for providing interlaced video on a progressive display
JP3649370B2 (ja) * 1998-02-25 2005-05-18 日本ビクター株式会社 動き補償符号化装置及び動き補償符号化方法
US6297848B1 (en) * 1998-11-25 2001-10-02 Sharp Laboratories Of America, Inc. Low-delay conversion of 3:2 pulldown video to progressive format with field averaging
JP2001204026A (ja) * 2000-01-21 2001-07-27 Sony Corp 画像情報変換装置及び方法
JP4576783B2 (ja) 2000-03-13 2010-11-10 ソニー株式会社 データ処理方法及びデータ処理装置
KR100708091B1 (ko) * 2000-06-13 2007-04-16 삼성전자주식회사 양방향 움직임 벡터를 이용한 프레임 레이트 변환 장치 및그 방법
KR100393066B1 (ko) 2001-06-11 2003-07-31 삼성전자주식회사 적응 움직임 보상형 디-인터레이싱 장치 및 그 방법
US6784942B2 (en) * 2001-10-05 2004-08-31 Genesis Microchip, Inc. Motion adaptive de-interlacing method and apparatus
JP4016646B2 (ja) * 2001-11-30 2007-12-05 日本ビクター株式会社 順次走査変換装置及び順次走査変換方法
KR100446083B1 (ko) * 2002-01-02 2004-08-30 삼성전자주식회사 움직임 추정 및 모드 결정 장치 및 방법
KR20060011281A (ko) * 2004-07-30 2006-02-03 한종기 트랜스코더에 적용되는 해상도 변환장치 및 방법
JP2006074684A (ja) * 2004-09-06 2006-03-16 Matsushita Electric Ind Co Ltd 画像処理方法及び装置

Non-Patent Citations (1)

Title
See also references of WO2007114995A1 *

Also Published As

Publication number Publication date
WO2007114995A1 (en) 2007-10-11
KR20100126506A (ko) 2010-12-01
KR20140010190A (ko) 2014-01-23
KR20110128366A (ko) 2011-11-29
KR20090006159A (ko) 2009-01-14
JP2015109662A (ja) 2015-06-11
KR101019010B1 (ko) 2011-03-04
KR101377370B1 (ko) 2014-03-26
JP6352173B2 (ja) 2018-07-04
KR101373896B1 (ko) 2014-03-12
TW200803504A (en) 2008-01-01
KR101127432B1 (ko) 2012-07-04
JP5897419B2 (ja) 2016-03-30
AR060254A1 (es) 2008-06-04
JP2009532741A (ja) 2009-09-10
CN104159060A (zh) 2014-11-19
JP2013031171A (ja) 2013-02-07
KR20120091423A (ko) 2012-08-17
CN104159060B (zh) 2017-10-24

Similar Documents

Publication Publication Date Title
US9131164B2 (en) Preprocessor method and apparatus
WO2007114995A1 (en) Preprocessor method and apparatus
US8750372B2 (en) Treating video information
US8238421B2 (en) Apparatus and method for estimating compression modes for H.264 codings
KR100957479B1 (ko) 필드-기반 비디오에 대해 모션 보상을 이용한 공간-시간디인터레이싱을 위한 방법 및 장치
US6862372B2 (en) System for and method of sharpness enhancement using coding information and local spatial features
JP2009532741A6 (ja) プリプロセッサ方法および装置
EP1938580A1 (de) Verfahren und vorrichtung zur kameraeinstellungsdetektion beim video-streaming
EP1980115A2 (de) Verfahren und vorrichtung zur bestimmung eines codierungsverfahrens auf der basis eines verzerrungswerts in bezug auf die fehlerverbergung
EP1938615A1 (de) Adaptive gop-struktur beim video-streaming
US7031388B2 (en) System for and method of sharpness enhancement for coded digital video
JP2010232734A (ja) 画像符号化装置及び画像符号化方法
Segall et al. Super-resolution from compressed video
Jo et al. Hybrid error concealments based on block content
Manimaraboopathy et al. Frame Rate Up-Conversion using Trilateral Filtering For Video Processing

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081006

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SHI, FANG

Inventor name: TIAN, TAO

Inventor name: LIU, FANG

Inventor name: RAVEENDRAN, VIJAYALAKSHMI R.

17Q First examination report despatched

Effective date: 20120328

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20180628