CN104159060B - Preprocessor method and equipment - Google Patents

Preprocessor method and equipment

Info

Publication number
CN104159060B
CN104159060B (application CN201410438251.8A)
Authority
CN
China
Prior art keywords
frame
video
digital
information
progressive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410438251.8A
Other languages
Chinese (zh)
Other versions
CN104159060A (en)
Inventor
Tao Tian
Fang Liu
Fang Shi
Vijayalakshmi R. Raveendran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority claimed from CN200780010753.9A (published as CN101411183A)
Publication of CN104159060A
Application granted
Publication of CN104159060B

Classifications

All classifications fall within section H (electricity), class H04 (electric communication technique), subclass H04N (pictorial communication, e.g. television):

    • H04N19/50: coding/decoding of digital video signals using predictive coding
    • H04N19/102: adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/107: selection of coding mode or prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/147: data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176: adaptive coding in which the coding unit is an image region, e.g. a block or a macroblock
    • H04N19/19: adaptive coding using optimisation based on Lagrange multipliers
    • H04N19/523: motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/86: pre- or post-processing specially adapted for video compression, involving reduction of coding artifacts, e.g. blockiness
    • H04N19/87: pre- or post-processing involving scene cut or scene change detection in combination with video compression
    • H04N5/145: picture signal circuitry, movement estimation
    • H04N5/147: picture signal circuitry, scene change detection
    • H04N7/0112: conversion of standards, one of the standards corresponding to a cinematograph film standard
    • H04N7/0117: conversion of standards involving conversion of the spatial resolution of the incoming video signal
    • H04N7/012: conversion between an interlaced and a progressive signal

Abstract

The present invention relates to a preprocessor method and apparatus and, more particularly, to processing operations performed before, or in conjunction with, data compression. A method of processing multimedia data includes receiving interlaced video frames, obtaining metadata for the interlaced video frames, converting the interlaced video frames into progressive video using at least a portion of the metadata, and providing the progressive video and at least a portion of the metadata to an encoder for encoding the progressive video. The method may also include generating spatial information and bidirectional motion information for the interlaced video frames, and generating progressive video based on the interlaced video frames using the spatial information and the bidirectional motion information.

Description

Preprocessor method and equipment
Related information on the divisional application
This application is a divisional of the earlier Chinese invention patent application entitled "Preprocessor method and equipment". The application number of the original application is 200780010753.9; the filing date of the original application is March 13, 2007.
Claim of priority under 35 U.S.C. § 119
The present application for patent claims priority to Provisional Application No. 60/789,048, filed April 3, 2006, Provisional Application No. 60/789,266, filed April 4, 2006, and Provisional Application No. 60/789,377, filed April 4, 2006, all of which are assigned to the assignee hereof and hereby expressly incorporated by reference herein.
Technical field
The present invention relates generally to multimedia data processing and, more particularly, to processing operations performed before, or in conjunction with, data compression.
Background art
None.
Summary of the invention
The devices and methods of the invention described herein each have several aspects, no single one of which is solely responsible for their desirable attributes. Without limiting the scope of the invention, its more prominent features will now be discussed briefly. After considering this discussion, and particularly after reading the section entitled "Detailed description", it will be understood how the features of this invention provide improvements to multimedia data processing devices and methods.
In one aspect, a method of processing multimedia data comprises receiving interlaced video frames, converting the interlaced video frames into progressive video, generating metadata associated with the progressive video, and providing the progressive video and at least a portion of the metadata to an encoder for encoding the progressive video. The method can further comprise encoding the progressive video using the metadata. In some aspects, the interlaced video frames comprise NTSC video. Converting the video frames can comprise deinterlacing the interlaced video frames.
In some aspects, the metadata can include bandwidth information, bidirectional motion information, a bandwidth ratio, complexity values (for example, temporal complexity values, spatial complexity values, or both), and luminance information, and spatial information can include luminance and/or chrominance information. The method can also include generating spatial information and bidirectional motion information for the interlaced video frames, and generating progressive video based on the interlaced video frames using the spatial information and the bidirectional motion information. In some aspects, converting the interlaced video frames comprises inverse telecining 3/2 pulldown video frames, and/or resizing the progressive video. The method can further include partitioning the progressive video to determine group of picture information, where the partitioning can include shot detection on the progressive video. In some aspects, the method also includes filtering the progressive video with a denoising filter.
In another aspect, an apparatus for processing multimedia data can include a receiver configured to receive interlaced video frames, a deinterlacer configured to convert the interlaced video frames into progressive video, and a partitioner configured to generate metadata associated with the progressive video and to provide the progressive video and the metadata to an encoder for encoding the progressive video. In some aspects, the apparatus can further comprise an encoder configured to receive the progressive video from a communications module and to encode the progressive video using the provided metadata. The deinterlacer can be configured to perform spatio-temporal deinterlacing and/or inverse telecining. The partitioner can be configured to perform shot detection and to generate compression information based on the shot detection. In some aspects, the partitioner can be configured to generate bandwidth information. The apparatus can also include a resampler configured to resize the progressive frames. The metadata can include bandwidth information, bidirectional motion information, a bandwidth ratio, luminance information, a content-related spatial complexity value, and/or a content-related temporal complexity value. In some aspects, the deinterlacer is configured to generate spatial information and bidirectional motion information for the interlaced video frames, and to generate progressive video based on the interlaced video frames using the spatial information and the bidirectional motion information.
Another aspect comprises an apparatus for processing multimedia data, the apparatus including means for receiving interlaced video frames, means for converting the interlaced video frames into progressive video, means for generating metadata associated with the progressive video, and means for providing the progressive video and at least a portion of the metadata to an encoder for encoding the progressive video. In some aspects, the converting means comprises an inverse teleciner and/or a spatio-temporal deinterlacer. In some aspects, the generating means is configured to perform shot detection and to generate compression information based on the shot detection. In some aspects, the generating means is configured to generate bandwidth information. In some aspects, the generating means includes means for resampling to resize the progressive frames.
Another aspect comprises a machine-readable medium comprising instructions for processing multimedia data that, upon execution, cause a machine to receive interlaced video frames, convert the interlaced video frames into progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder for encoding the progressive video.
Another aspect comprises a processor comprising a configuration to receive interlaced video, convert the interlaced video into progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder for encoding the progressive video. Converting the interlaced video can include performing spatio-temporal deinterlacing. In some aspects, converting the interlaced video includes performing inverse telecine. In some aspects, generating the metadata includes detecting shot changes and generating compression information based on the shot changes. In some aspects, generating the metadata includes determining compression information for the progressive video. In some aspects, the configuration includes a configuration to resample the video to produce resized progressive frames. In some aspects, the metadata can include bandwidth information, bidirectional motion information, complexity information (for example, content-based temporal or spatial complexity information), and/or compression information.
Brief description of the drawings
Fig. 1 is a block diagram of a communication system for delivering streaming multimedia data;
Fig. 2 is a block diagram of a digital transmission facility that includes a preprocessor;
Fig. 3A is a block diagram of an illustrative aspect of a preprocessor;
Fig. 3B is a flowchart illustrating a process for processing multimedia data;
Fig. 3C is a block diagram illustrating an apparatus for processing multimedia data;
Fig. 4 is a block diagram illustrating the operation of an exemplary preprocessor;
Fig. 5 is a diagram of the phase decisions made during inverse telecine;
Fig. 6 is a flowchart illustrating a process of inverse telecining video;
Fig. 7 is an illustration of a trellis showing the possible phase transitions;
Fig. 8 is a guide identifying the respective fields used to create a number of metrics;
Fig. 9 is a flowchart illustrating how the metrics of Fig. 8 are created;
Fig. 10 is a flowchart showing the processing of arriving metrics to estimate the phase;
Fig. 11 is a data flow diagram illustrating a system for generating decision variables;
Fig. 12 is a block diagram describing the variables used to evaluate the branch information;
Figs. 13A, 13B and 13C are flowcharts showing how the lower envelopes are computed;
Fig. 14 is a flowchart showing the operation of a consistency detector;
Fig. 15 is a flowchart showing the process that computes an offset to a decision variable, the offset compensating for inconsistencies in the phase decisions;
Fig. 16 shows the operation of inverse telecine once the pulldown phase has been estimated;
Fig. 17 is a block diagram of a deinterlacer device;
Fig. 18 is a block diagram of another deinterlacer device;
Fig. 19 is a diagram of the subsampling pattern of an interlaced image;
Fig. 20 is a block diagram of a deinterlacer device that uses Wmed filtering and motion estimation to generate a deinterlaced frame;
Fig. 21 illustrates one aspect of an aperture for determining static areas of multimedia data;
Fig. 22 is a diagram illustrating one aspect of an aperture for determining slow-motion areas of multimedia data;
Fig. 23 is a diagram illustrating one aspect of motion estimation;
Fig. 24 illustrates two motion vector maps used in determining motion compensation;
Fig. 25 is a flowchart illustrating a method of deinterlacing multimedia data;
Fig. 26 is a flowchart illustrating a method of generating a deinterlaced frame using spatio-temporal information;
Fig. 27 is a flowchart illustrating a method of performing motion compensation for deinterlacing;
Fig. 28 is a block diagram of a preprocessor according to some aspects, the preprocessor comprising a processor configured for shot detection and other preprocessing operations;
Fig. 29 illustrates the relationship between encoding complexity C and allocated bits B;
Fig. 30 is a flowchart illustrating a process that operates on groups of pictures and can be used in some aspects to encode video based on shot detection in the video frames;
Fig. 31 is a flowchart illustrating a process for shot detection;
Fig. 32 is a flowchart illustrating a process for determining different classifications of shots in video;
Fig. 33 is a flowchart illustrating a process for assigning frame compression schemes to video frames based on shot detection results;
Fig. 34 is a flowchart illustrating a process for determining abrupt scene changes;
Fig. 35 is a flowchart illustrating a process for determining slowly changing scenes;
Fig. 36 is a flowchart illustrating a process for determining scenes containing camera flashes;
Fig. 37 illustrates the motion compensation vectors between the current frame and the previous frame (MV_P) and between the current frame and the next frame (MV_N);
Fig. 38 is a chart illustrating the relationship of the variables used in determining a frame difference metric;
Fig. 39 is a block diagram illustrating the encoding of data and the calculation of residuals;
Fig. 40 is a block diagram illustrating the determination of a frame difference metric;
Fig. 41 is a flowchart illustrating a procedure in which compression types are assigned to frames;
Fig. 42 illustrates an example of 1-D polyphase resampling;
Fig. 43 is a chart illustrating the safe action area and safe title area of a data frame; and
Fig. 44 is a chart illustrating the safe action area of a data frame.
Detailed description
The following description includes details to provide a thorough understanding of the examples. However, those skilled in the art will understand that the examples may be practiced even if every detail of a process or device in an example or aspect is not described or illustrated herein. For example, electrical components may be shown in block diagrams that do not illustrate every electrical connection or every electrical element, so as not to obscure the examples in unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the examples.
Described herein are certain inventive aspects of preprocessors and methods of preprocessor operation that can improve the performance of existing preprocessing and encoding systems. The preprocessor can process metadata and video in preparation for encoding, including performing deinterlacing, inverse telecine, filtering, identifying shot types, processing and generating metadata, and generating bandwidth information. Reference herein to "one aspect", "an aspect", "some aspects", or "certain aspects" means that one or more of the particular features, structures, or characteristics described in connection with the aspect can be included in at least one aspect of a preprocessor system. The appearances of such phrases in various places in the specification are not necessarily all referring to the same aspect, nor are separate or alternative aspects necessarily mutually exclusive of other aspects. Moreover, various features are described that may be exhibited by some aspects and not by others. Similarly, various steps are described that may be steps for some aspects but not for other aspects.
" multi-medium data " or " multimedia " is broad terms as used herein, and it includes video data, and (it can Including voice data), voice data, or video data and both audio.As used herein " video data " or " video " be broad terms, its refer to containing text, image and/or the image of voice data or one or more series or The image of sequence, and unless specified otherwise herein, otherwise it can be used for referring to multi-medium data or the term is used interchangeably.
Fig. 1 is a block diagram of a communication system 100 for delivering streaming multimedia. The system can be applied to the transmission of digitally compressed video to multiple terminals, as shown in Fig. 1. A digital video source can be, for example, a digital cable or satellite feed, or a digitized analog source. The video source is processed in a transmission facility 120, where it is encoded and modulated onto a carrier for transmission over a network 140 to one or more terminals 160. The terminals 160 decode the received video and typically display at least a portion of it. Network 140 refers to any type of communication network, wired or wireless, suitable for transmitting encoded data. For example, network 140 can be a cellular telephone network, a wired or wireless local area network (LAN) or wide area network (WAN), or the Internet. Terminals 160 can be any type of communication device capable of receiving and displaying data, including but not limited to cell phones, PDAs, home or commercial video display equipment, computers (portable, laptop, handheld, PCs, and larger server-based computer systems), and personal entertainment devices that can use multimedia data.
Figs. 2 and 3 illustrate sample aspects of a preprocessor 202. In Fig. 2, the preprocessor 202 resides in a digital transmission facility 120. A decoder 201 decodes encoded data from the digital video source and provides metadata 204 and video 205 to the preprocessor 202. The preprocessor 202 is configured to perform certain types of processing on the video 205 and the metadata 204, and to provide processed metadata 206 (for example, base layer reference frames, enhancement layer reference frames, bandwidth information, content information) and video 207 to an encoder 203. Such preprocessing of multimedia data can improve visual clarity, anti-aliasing, and compression efficiency. Generally, the preprocessor 202 receives video sequences provided by the decoder 201 and converts them into progressive video sequences for further processing (for example, encoding) by an encoder. In some aspects, the preprocessor 202 can be configured for numerous operations, including inverse telecine, deinterlacing, filtering (for example, artifact removal, de-ringing, de-blocking, and de-noising), resizing (for example, spatial resolution downsampling from standard definition to Quarter Video Graphics Array (QVGA)), and GOP structure generation (for example, computing complexity map generation, scene change detection, and fade/flash detection).
Fig. 3 A illustrate preprocessor 202, and it is configured with module or component (being collectively referred to herein " module ") to perform it There is provided to the metadata 204 and the pretreatment operation of video 205 that are received and then metadata 206 and progressive through processing 207 for further processing (such as there is provided to encoder).The module can be implemented with hardware, software, firmware or its combination. Preprocessor 202 may include various modules, and the module includes one of illustrated module or one or more of, illustrated Module include anti-telecine process 301, deinterlacer 302, denoising device 303, aliasing suppressor 304, resampler 305, Deblocking device/decyclization device 306, and GOP dispensers 307, all modules described further below.Preprocessor 202 may also include can Other appropriate modules for handling video and metadata, it includes memory 308 and communication module 309.Software module can be stayed Stay in RAM memory, flash memory, ROM memory, eprom memory, eeprom memory, register, hard disk, can fill In the storage media for unloading disk, CD-ROM or any other form known in the art.Exemplary storage medium is coupled to Processor, to cause the processor from read information and storage media can be write information to.Implement substituting In example, storage media can be integral with processor.Processor and storage media can reside within ASIC.The ASIC can reside within In user terminal.In alternative embodiments, processor and storage media can be resided in user terminal as discrete component.
Fig. 3 B are to illustrate to be used to handle the flow chart of the process 300 of multi-medium data.Process 300 starts and proceeds to frame 320, in a block 320, receive terleaved video.Fig. 2 and preprocessor 202 illustrated in fig. 3 can perform this step.In some sides In face, decoder (for example, Fig. 2 decoder 201), which can receive intercrossed data and provide intercrossed data, arrives preprocessor 202. In certain aspects, the data reception module 330 (it is a part for preprocessor 202) shown in Fig. 3 C can perform this step Suddenly.Process 300 proceeds to frame 322, in a block 322, and terleaved video is converted into progressive.It is pre- in Fig. 2 and Fig. 3 A Processor 202 and Fig. 3 C module 332 can perform this step.If terleaved video passes through at telecine process, frame 322 Reason may include to perform anti-telecine process to produce progressive.Process 300 proceeds to frame 324 to produce with regarding line by line The associated metadata of frequency.The module 334 in GOP dispensers 307 and Fig. 3 C in Fig. 3 A can perform the processing.Process 300 Frame 326 is proceeded to, in frame 326, at least a portion of progressive and metadata is provided to encoder to compile Code (for example, compression).Module 336 in the preprocessor 202 and Fig. 3 C that are shown in Fig. 2 and Fig. 3 A can perform this step. Progressive and associated metadata are provided after arriving another component for coding, process 300 can terminate.
Fig. 3 C are to illustrate to be used to handle the block diagram of the device of multi-medium data.Displaying described device is incorporated into pretreatment herein In device 202.Preprocessor 202 includes the device (for example, module 330) for being used to receive video.Preprocessor 202 also includes being used for Intercrossed data is converted into the device (for example, module 332) of progressive.Described device may include that (such as) space time solution is handed over Wrong device and/or anti-teleciner.Preprocessor 202 also includes being used to produce the metadata associated with progressive Device (for example, module 334).Described device may include that the GOP of polytype metadata can be produced as described in this article Dispenser 307 (Fig. 3 A).Preprocessor 202 may also include for progressive and metadata to be provided to encoder for volume The device of code, as illustrated by by module 336.In certain aspects, described device may include communication mould illustrated in Fig. 3 A Block 309.As those skilled in the art will understand, can many standard modes implementation described devices.
The preprocessor 202 can use obtained metadata (for example, metadata obtained from the decoder 201 or from another source) for one or more of the preprocessing operations. Metadata can include information relating to, describing, or classifying the content of the multimedia data ("content information"). In particular, the metadata can include a content classification. In some aspects, the metadata does not include the content information required for the encoding operations. In such cases, the preprocessor 202 can be configured to determine content information, use it for preprocessing operations, and/or provide it to other components (for example, the encoder 203). In some aspects, the preprocessor 202 can use such content information to influence GOP partitioning, to determine appropriate filtering types, and/or to determine encoding parameters that are communicated to the encoder.
Fig. 4 shows an illustrative example of the process blocks that can be included in the preprocessor, illustrating processing that can be performed by the preprocessor 202. In this example, the preprocessor 202 receives metadata and video 204, 205 and provides output data 206, 207, comprising (processed) metadata and video, to an encoder 228. Three types of video are generally received. First, the received video can be progressive video, in which case no deinterlacing needs to be performed. Second, the video data can be telecined video, that is, interlaced video converted from 24 fps film sequences. Third, the video can be non-telecined interlaced video. The preprocessor 226 can process these types of video as described below.
At block 401, the preprocessor 202 determines whether the received video 204, 205 is progressive video. In some cases, this can be determined from the metadata, if the metadata contains that information, or by processing the video itself. For example, the inverse telecine process described below can determine whether the received video 205 is progressive. If it is, the process proceeds to block 407, where filtering operations are performed on the video to reduce noise, such as white Gaussian noise. If the video is not progressive, the process proceeds from block 401 to block 404, a phase detector.
Phase detector 404 distinguishes between video that originated in a telecine process and video that began in a standard broadcast format. If the decision is that the video was telecined (the YES decision path exiting phase detector 404), the telecined video is returned to its original format in inverse telecine 406: redundant fields are identified and eliminated, and fields derived from the same video frame are rewoven into complete images. Because the sequence of reconstructed film images was photographically recorded at regular intervals of 1/24 second, the motion estimation performed in the GOP partitioner 412 or in a decoder is more accurate when it uses the inverse-telecined images, which have a regular time base, rather than the telecined data, which has an irregular time base.
In one aspect, phase detector 404 makes certain decisions after a video frame is received. These decisions include: (i) whether the current video is telecine output and which of the five phases P0, P1, P2, P3, and P4 shown in Fig. 5 is the 3:2 pulldown phase; and (ii) whether the video was generated as conventional NTSC, a decision denoted as phase P5. These decisions appear as outputs of the phase detector 404 shown in Fig. 4. The path from phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that it has been provided with the correct pulldown phase so that it can select the fields that were formed from the same photographic image and combine them. The path from phase detector 404 labeled "NO" similarly actuates the deinterlacer 405 to separate an apparent NTSC frame into fields for optimal processing. Inverse telecine is further described in the co-pending U.S. patent application entitled "Inverse Telecine Algorithm Based on State Machine" [attorney docket QFDM.021A (050943)], which is owned by the assignee of the present invention and incorporated herein by reference in its entirety.
Phase detector 404 can analyze video frames continuously, because different types of video can be received at any time. As an illustration, video conforming to the NTSC standard can be inserted into the video as a commercial. After inverse telecine, the resulting progressive video is sent to a denoiser (filter) 407, which can be used to reduce white Gaussian noise.
When conventional NTSC video is recognized (the NO path from phase detector 404), the video is transmitted to deinterlacer 405 for compression. The deinterlacer 405 transforms the interlaced fields into progressive video, and denoising operations can then be performed on the progressive video.
After the appropriate inverse telecine or deinterlacing processing, the progressive video is processed at block 408 for alias suppression and resampling (for example, resizing).
After resampling, the progressive video proceeds to block 410, where deblocking and de-ringing operations are performed. Two types of artifacts, "blocking" and "ringing", commonly occur in video compression applications. Blocking artifacts arise because compression algorithms divide each frame into blocks (for example, 8x8 blocks). Each block is reconstructed with some small error, and the errors at the edges of a block often contrast with the errors at the edges of neighboring blocks, making block boundaries visible. In contrast, ringing artifacts appear as distortions around the edges of image features. Ringing artifacts arise because the encoder discards too much information when quantizing the high-frequency DCT coefficients. In some illustrative examples, both deblocking and de-ringing can use low-pass FIR (finite impulse response) filters to hide these visible artifacts.
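As an illustration of the kind of low-pass FIR smoothing mentioned above, the sketch below filters a three-pixel neighborhood across each horizontal 8x8 block boundary of a luma plane. The [1, 2, 1]/4 kernel, the unconditional filtering of every boundary, and the 8-bit sample range are illustrative assumptions; a practical deblocker would adapt its strength to local image activity.

    import numpy as np

    def deblock_rows(plane: np.ndarray, block: int = 8) -> np.ndarray:
        """Smooth horizontal block boundaries with a short low-pass FIR
        kernel ([1, 2, 1] / 4), assuming 8-bit luma samples."""
        out = plane.astype(np.float32)
        for y in range(block, out.shape[0] - 1, block):
            above = out[y - 2:y + 1].copy()   # rows y-2, y-1, y before filtering
            below = out[y - 1:y + 2].copy()   # rows y-1, y, y+1 before filtering
            out[y - 1] = 0.25 * above[0] + 0.5 * above[1] + 0.25 * above[2]
            out[y] = 0.25 * below[0] + 0.5 * below[1] + 0.25 * below[2]
        return np.clip(out, 0, 255).astype(plane.dtype)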
After deblocking and de-ringing, the progressive video is processed by the GOP partitioner 412. GOP partitioning can include detecting shot changes, generating complexity maps (for example, temporal and spatial bandwidth maps), and adaptive GOP partitioning. Shot detection involves determining when a frame in a group of pictures (GOP) exhibits data indicating that a scene change has occurred. Scene change detection can be used by a video encoder to determine an appropriate GOP length and to insert I-frames based on that length, rather than inserting I-frames at fixed intervals. The preprocessor 202 can also be configured to generate a bandwidth map that can be used for encoding the multimedia data. In some aspects, a content classification module located external to the preprocessor generates the bandwidth map instead. Adaptive GOP partitioning can adaptively change the composition of a group of pictures coded together. Illustrative examples of the operations shown in Fig. 4 are described below.
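As context for the GOP partitioner, the sketch below shows the general idea of starting a new GOP with an I-frame at a detected scene change rather than at fixed intervals. The mean-absolute-difference detector and both thresholds are illustrative assumptions, not the shot-detection method of Figs. 30 through 36.

    import numpy as np

    def gop_boundaries(frames, diff_threshold=30.0, max_gop=60):
        """Return frame indices at which to start a new GOP: at detected
        scene changes (large mean absolute luma difference between
        consecutive frames) or when a maximum GOP length is reached."""
        starts, since_i = [0], 0
        for i in range(1, len(frames)):
            d = np.mean(np.abs(frames[i].astype(np.int16) -
                               frames[i - 1].astype(np.int16)))
            since_i += 1
            if d > diff_threshold or since_i >= max_gop:
                starts.append(i)   # place an I-frame here
                since_i = 0
        return starts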
Inverse telecine
The inverse telecine process is described below, and an illustrative example of inverse telecine is provided with reference to Figs. 4 through 16. Video compression gives its best results when the properties of the source are known and used to select the ideally matching form of processing. Off-the-air video, for example, can originate in several ways. Broadcast video that is conventionally generated, in video cameras, broadcast studios, and so on, conforms in the United States to the NTSC standard. According to that standard, each frame is made up of two fields: one field consists of the odd lines and the other of the even lines. This can be referred to as an "interlaced" format. While the frames are generated at approximately 30 frames/second, the fields are records of the television camera's image taken 1/60 second apart. Film, on the other hand, is shot at 24 frames/second, each frame consisting of a complete image. This can be referred to as a "progressive" format. For transmission on NTSC equipment, "progressive" video is converted into the "interlaced" video format via a telecine process. As discussed further below, in one aspect the system advantageously determines when video has been telecined and performs the appropriate transformations to regenerate the original progressive frames.
Fig. 5 shows the effect of the telecine process that has converted progressive frames into interlaced video. F1, F2, F3, and F4 are progressive images that form the input to the teleciner. The numerals "1" and "2" below the respective frames indicate the odd and even fields. Note that, in view of the disparity between the frame rates, some fields are repeated. Fig. 5 also shows the pulldown phases P0, P1, P2, P3, and P4. Phase P0 is marked by the first of the two NTSC-compatible frames that have identical first fields; the four subsequent frames correspond to phases P1, P2, P3, and P4. Note that the frames marked P2 and P3 have identical second fields. Because film frame F1 is scanned three times, two identical successive NTSC-compatible first fields are formed. All NTSC fields derived from film frame F1 are taken from the same film image and are therefore taken at the same instant of time. Other NTSC frames can have fields derived from film images that are 1/24 second apart.
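The 3:2 pulldown cadence can be made concrete with a small generator. This sketch is illustrative: a field is represented as a (frame, parity) pair, which is an assumed encoding. Run on four film frames, it yields five interlaced frames, of which the first two share an identical field one and the third and fourth share an identical field two, matching the repeated fields noted above.

    def telecine_32(film_frames):
        """Map every 4 progressive film frames (24 fps) to 10 fields, i.e.
        5 interlaced NTSC frames (~30 fps), using the 3:2 cadence: frames
        alternately contribute 3 fields and 2 fields."""
        fields = []
        for i, frame in enumerate(film_frames):
            copies = 3 if i % 2 == 0 else 2
            for _ in range(copies):
                parity = (len(fields) % 2) + 1   # 1 = odd field, 2 = even field
                fields.append((frame, parity))
        # pair consecutive fields into interlaced frames
        return [(fields[j], fields[j + 1]) for j in range(0, len(fields) - 1, 2)]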
The phase detector 404 illustrated in Fig. 4 makes certain decisions after a video frame is received. These decisions include: (i) whether the current video is telecine output and which of the five phases P0, P1, P2, P3, and P4 of definition 512 shown in Fig. 5 is the 3:2 pulldown phase; and (ii) whether the video was generated as conventional NTSC, a decision expressed as phase P5.
These decisions appear as outputs of the phase detector 404 shown in Fig. 4. The path from phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that the inverse telecine 406 has been provided with the correct pulldown phase so that it can select the fields formed from the same photographic image and combine them. The path from phase detector 404 labeled "NO" similarly actuates deinterlacer block 405 to separate an apparent NTSC frame into fields for optimal processing.
Fig. 6 is a flowchart illustrating a process 600 of inverse telecining a video stream. In one aspect, process 600 is performed by the inverse telecine 301 of Fig. 3. Starting at step 651, the inverse teleciner 301 determines a plurality of metrics based on the received video. In this aspect, four metrics are formed, which are sums of differences between fields taken from the same frame or adjacent frames. The four metrics are further assembled into a Euclidean measure of the distance between the four metrics derived from the received data and their most likely values under each of six hypothesized phases. The Euclidean sums are referred to as branch information; for each received frame there are six such quantities. Each hypothesized phase has a successor phase, which, in the case of the possible pulldown phases, changes with each received frame.
The possible transition paths are shown in Fig. 7, denoted 767. There are six paths. The decision process maintains six measures, equivalent to the sums of the Euclidean distances along each path of hypothesized phases. To keep the procedure responsive to changing conditions, each Euclidean distance is attenuated as its data ages. The phase track whose sum of Euclidean distances is smallest is deemed to be the operable phase track, and the current phase of this track is referred to as the "applicable phase". Inverse telecine based on the selected phase can now take place, provided the phase is not P5. If P5 is selected, the current frame is deinterlaced using the deinterlacer at block 405 (Fig. 4). In summary, the applicable phase is used either as the current pulldown phase or as an indicator that commands the deinterlacing of a frame estimated to be in valid NTSC format.
For each frame received from the video input, a new value of each of the four metrics is computed. The metrics are defined as:
SAD_FS = Σ |current field one value(i, j) - previous field one value(i, j)|    (1)
SAD_SS = Σ |current field two value(i, j) - previous field two value(i, j)|    (2)
SAD_PO = Σ |current field one value(i, j) - previous field two value(i, j)|    (3)
SAD_CO = Σ |current field one value(i, j) - current field two value(i, j)|    (4)
The term SAD is an abbreviation of "sum of absolute differences". The fields that are differenced to form the metrics are shown in the diagram of Fig. 8. The subscripts refer to the field number; the letters denote previous (= P) or current (= C). The brackets in Fig. 8 refer to the pairwise differences of fields. SAD_FS refers to the difference between field one of the current frame, labeled C1, and field one of the previous frame, labeled P1, spanned by the bracket labeled FS in the definitions given in Fig. 8; SAD_SS refers to the difference between field two of the current frame, labeled C2, and field two of the previous frame, labeled P2, spanned by the bracket labeled SS; SAD_CO refers to the difference between field two of the current frame, labeled C2, and field one of the current frame, labeled C1, spanned by the bracket labeled CO; and SAD_PO refers to the difference between field one of the current frame and field two of the previous frame, spanned by the bracket labeled PO.
The computational load of evaluating each SAD is assessed below. There are about 480 significant horizontal lines in conventional NTSC. For the resolution in the horizontal direction to be the same with a 4:3 aspect ratio, there should be 480 x 4/3 = 640 equivalent vertical lines, or degrees of freedom. The 640 x 480 pixel video format is one of the formats accepted by the Advanced Television Systems Committee. Thus every 1/30 second (the duration of one frame), 640 x 480 = 307,200 new pixels are generated. New data is produced at a rate of 9.2 x 10^6 pixels/second, implying that hardware or software running this system processes data at a rate of roughly 10 MB/s or more. This is one of the high-speed portions of the system. It can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. The SAD calculator can be a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, implemented in microcode or software executed on a processor, or a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the calculation can be stored in a machine-readable medium such as a storage medium. A code segment can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
Flowchart 900 in Fig. 9 makes the relationships of Fig. 8 explicit; Fig. 9 is a graphical representation of equations 1 through 4. Fig. 9 shows storage locations 941, 942, 943, and 944, which hold the most recent values of SAD_FS, SAD_CO, SAD_SS, and SAD_PO, respectively. Each value is produced by one of four absolute-difference calculators 940, which process the luminance values of the previous first field data 931, the current first field data 932, the current second field data 933, and the previous second field data 934. In the summations defining the metrics, the term "value(i, j)" means the luminance value at position (i, j), and the summation is over all valid pixels, though summation over a significant subset of the valid pixels is not excluded.
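Equations (1) through (4) translate directly into code. The sketch below assumes the four fields are available as equally sized numpy arrays of luminance values; the dictionary return format is an illustrative choice.

    import numpy as np

    def sad_metrics(prev_f1, prev_f2, cur_f1, cur_f2):
        """Compute the four SAD metrics of equations (1) through (4) from
        the luminance values of the previous and current fields."""
        def sad(a, b):
            return float(np.sum(np.abs(a.astype(np.int32) - b.astype(np.int32))))
        return {
            "SAD_FS": sad(cur_f1, prev_f1),  # (1) field one, current vs previous
            "SAD_SS": sad(cur_f2, prev_f2),  # (2) field two, current vs previous
            "SAD_PO": sad(cur_f1, prev_f2),  # (3) current field one vs previous field two
            "SAD_CO": sad(cur_f1, cur_f2),   # (4) current field one vs current field two
        }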
Flowchart 1000 in Fig. 10 is a detailed flowchart illustrating the process of detecting telecined video and inverting it to return to the originally scanned film images. In step 1030, the metrics defined in Fig. 9 are evaluated. Proceeding to step 1083, the lower envelope values of the four metrics are found. The lower envelope of a SAD metric is a dynamically determined quantity: it is the highest floor below which the SAD does not pass. Proceeding to step 1085, the branch information quantities defined below in equations 5 through 10 are determined, using the previously determined metrics, the lower envelope values, and an experimentally determined constant A. Because successive phase values may be inconsistent, a quantity Δ is determined in step 1087 to reduce this apparent instability. The phases are deemed consistent when the sequence of phase decisions is consistent with the model of the problem shown in Fig. 7. After that step, the process proceeds to step 1089 to compute the decision variables using the current value of Δ. The decision variable calculator 1089 evaluates the decision variables using all the information generated in block 1080. Steps 1030, 1083, 1085, 1087, and 1089 are an expansion of the metrics determination 651 of Fig. 6. The applicable phase is found from these variables by the phase selector 1090. As illustrated, decision step 1091 uses the applicable phase either to invert the telecined video or to deinterlace it. This is a more explicit description of the operation of the phase detector 404 of Fig. 4. In one aspect, the processing of Fig. 10 is performed by the phase detector 404 of Fig. 4: starting at step 1030, the detector 404 determines the plurality of metrics by the process described above with reference to Fig. 8, and continues through steps 1083, 1085, 1087, 1089, 1090, and 1091.
Flowchart 1000 illustrates the process of estimating the current phase. It describes the use of the determined metrics and lower envelope values at step 1083 to compute the branch information. The branch information can be recognized as the Euclidean distances discussed previously. Exemplary equations that can be used to generate the branch information are equations 5 through 10 below. The branch information quantities are computed in step 1209 of Fig. 12.
The processed video data can be stored in a storage medium, which can include, for example, a chip-based storage medium (for example, ROM, RAM) or a disc-type storage medium (for example, magnetic or optical) connected to a processor. In some aspects, the inverse telecine 406 and the deinterlacer 405 can each contain part or all of the storage medium. The branch information quantities are defined by the following equations:
Branch Info(0) = (SAD_FS - H_S)^2 + (SAD_SS - H_S)^2 + (SAD_PO - H_P)^2 + (SAD_CO - L_C)^2    (5)
Branch Info(1) = (SAD_FS - L_S)^2 + (SAD_SS - H_S)^2 + (SAD_PO - L_P)^2 + (SAD_CO - H_C)^2    (6)
Branch Info(2) = (SAD_FS - H_S)^2 + (SAD_SS - H_S)^2 + (SAD_PO - L_P)^2 + (SAD_CO - H_C)^2    (7)
Branch Info(3) = (SAD_FS - H_S)^2 + (SAD_SS - L_S)^2 + (SAD_PO - L_P)^2 + (SAD_CO - L_C)^2    (8)
Branch Info(4) = (SAD_FS - H_S)^2 + (SAD_SS - H_S)^2 + (SAD_PO - H_P)^2 + (SAD_CO - L_C)^2    (9)
Branch Info(5) = (SAD_FS - L_S)^2 + (SAD_SS - L_S)^2 + (SAD_PO - L_P)^2 + (SAD_CO - L_C)^2    (10)
The details of the branch computation are shown in the branch information calculator 1209 in Fig. 12. As shown in calculator 1209, the branch information is derived using the quantities L_S (the lower envelope value of SAD_FS and SAD_SS), L_P (the lower envelope value of SAD_PO), and L_C (the lower envelope value of SAD_CO). The lower envelopes are used as distance offsets in the branch information computation, either alone or together with a predetermined constant A to form H_S, H_P, and H_C. The lower envelope values are kept current in the lower envelope trackers discussed below. The H offsets are defined as:
H_S = L_S + A    (11)
H_P = L_P + A    (12)
H_C = L_C + A    (13)
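Equations (5) through (13) combine into a single routine. A minimal sketch, assuming the metric dictionary produced by the earlier SAD sketch; returning the six quantities as a list indexed by hypothesized phase is an illustrative choice.

    def branch_info(m, L_S, L_P, L_C, A):
        """Branch information of equations (5) through (10), with the H
        offsets of equations (11) through (13) formed from the lower
        envelopes L_S, L_P, L_C and the experimental constant A."""
        H_S, H_P, H_C = L_S + A, L_P + A, L_C + A
        fs, ss, po, co = m["SAD_FS"], m["SAD_SS"], m["SAD_PO"], m["SAD_CO"]
        return [
            (fs - H_S)**2 + (ss - H_S)**2 + (po - H_P)**2 + (co - L_C)**2,  # P0
            (fs - L_S)**2 + (ss - H_S)**2 + (po - L_P)**2 + (co - H_C)**2,  # P1
            (fs - H_S)**2 + (ss - H_S)**2 + (po - L_P)**2 + (co - H_C)**2,  # P2
            (fs - H_S)**2 + (ss - L_S)**2 + (po - L_P)**2 + (co - L_C)**2,  # P3
            (fs - H_S)**2 + (ss - H_S)**2 + (po - H_P)**2 + (co - L_C)**2,  # P4
            (fs - L_S)**2 + (ss - L_S)**2 + (po - L_P)**2 + (co - L_C)**2,  # P5
        ]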
The processes for tracking the values of L_S, L_P, and L_C are presented in Figs. 13A, 13B, and 13C. Consider, for example, the tracking algorithm 1300 for L_P shown at the top of Fig. 13A. In comparator 1305, the metric SAD_PO is compared with the current value of L_P plus a threshold T_P. If SAD_PO exceeds L_P plus T_P, the current value of L_P is left unchanged, as shown in block 1315. If SAD_PO does not exceed L_P plus T_P, the new value of L_P becomes a linear combination of SAD_PO and L_P, as seen in block 1313. In another aspect, for step 1315, the new value of L_P is L_P + T_P.
The quantities L_S and L_C in Figs. 13B and 13C are computed similarly. Processing blocks having the same function in Figs. 13A, 13B, and 13C are numbered identically, but given primes (' or '') to show that they operate on different sets of variables. For example, the operation of forming a linear combination of SAD_CO and L_C is shown in block 1313'. For the L_C case, another aspect would use L_C + T_C in place of L_C at block 1315'.
In the case of L_S, however, the algorithm in Fig. 13B processes SAD_FS and SAD_SS alternately, each labeled X in turn, because this lower envelope applies to two variables. The alternation between the value of SAD_FS and the value of SAD_SS occurs when the current value of SAD_FS in block 1308 is read into the X position in block 1303, and then the current value of SAD_SS in block 1307 is read into the X position in block 1302. For the L_S case, another aspect would use L_S + T_S in place of L_S at block 1315''. The quantity A and the thresholds used to test the current lower envelope values are determined by experiment.
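The tracker of Figs. 13A through 13C reduces to a two-branch update per metric. The sketch below is illustrative; the mixing weight of the linear combination is an assumption, since the text leaves it unspecified.

    def update_envelope(L, sad, T, beta=0.9):
        """One lower-envelope update (Figs. 13A-13C): if the new SAD
        exceeds the envelope by more than threshold T, leave the envelope
        unchanged (block 1315); otherwise relax it toward the SAD via a
        linear combination (block 1313). beta is an assumed weight."""
        if sad > L + T:
            return L
        return beta * L + (1.0 - beta) * sad

For L_P and L_C, the routine is fed SAD_PO and SAD_CO respectively; for L_S, it is called twice per frame, alternately with SAD_FS and SAD_SS, since that envelope applies to both variables.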
Fig. 11 is a flowchart illustrating an exemplary process for performing step 1089 of Fig. 10. Fig. 11 essentially shows the process for updating the decision variables. The six decision variables, corresponding to the six possible decisions, are updated in Fig. 11 with the new information derived from the metrics. The decision variables are obtained as follows:
D0 = αD4 + Branch Info(0)    (14)
D1 = αD0 + Branch Info(1)    (15)
D2 = αD1 + Branch Info(2)    (16)
D3 = αD2 + Branch Info(3)    (17)
D4 = αD3 + Branch Info(4)    (18)
D5 = αD5 + Branch Info(5)    (19)
The quantity α is less than one and limits the dependence of a decision variable on its past values; using α is equivalent to attenuating each Euclidean distance as its data ages. In flowchart 1162, the decision variables to be updated are listed on the left, available on lines 1101, 1102, 1103, 1104, 1105, and 1106. Each decision variable on one of the phase transition paths is then multiplied by α, a number less than one, in one of the blocks 1100; the attenuated old decision variable is then added to the current value of the branch information variable indexed by the next phase on that phase transition path. This takes place in blocks 1110. The variable D5 is offset by an amount Δ in block 1193; Δ is computed in block 1112. As described below, this quantity is chosen to reduce inconsistencies in the sequence of phases determined by this system. The minimum decision variable is found in block 1120.
In summary, the fresh information specific to each decision is added to the previous value of the appropriate decision variable, multiplied by α, to obtain the current value of that decision variable. A new decision can be made whenever new metrics arrive; this technique can therefore make a new decision upon receiving field 1 and field 2 of each frame. The decision variables are the sums of Euclidean distances mentioned at the outset.
The applicable phase chosen is the phase whose subscript has the minimum decision variable. The decision based on the decision variables is made explicitly in block 1090 of FIG. 10. The decision space permits several decisions. As described in block 1091, the decisions are: (i) the applicable phase is not P_5, in which case the video is inverse telecined; and (ii) the applicable phase is P_5, in which case the video is deinterlaced.
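As a sketch, the recursions of equations 14 to 19 and the minimum selection of block 1120 can be written as follows. The Δ offset of block 1193 is shown here simply as an additive argument, and all names are illustrative:

```python
def update_decision_variables(d, branch_info, alpha, delta):
    """One update of the six decision variables (equations 14-19).

    d           -- list [D0..D5] of current decision variables
    branch_info -- list of the six freshly computed branch informations
    alpha       -- decay factor, 0 < alpha < 1
    delta       -- offset added to D5 (block 1193)
    """
    new_d = [
        alpha * d[4] + branch_info[0],          # eq. 14
        alpha * d[0] + branch_info[1],          # eq. 15
        alpha * d[1] + branch_info[2],          # eq. 16
        alpha * d[2] + branch_info[3],          # eq. 17
        alpha * d[3] + branch_info[4],          # eq. 18
        alpha * d[5] + branch_info[5] + delta,  # eq. 19 with D5 offset
    ]
    phase = min(range(6), key=lambda i: new_d[i])  # block 1120
    return new_d, phase

# Decision of block 1091: phase 5 -> deinterlace, otherwise inverse telecine.
def choose_action(phase):
    return "deinterlace" if phase == 5 else "inverse_telecine"
```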
Because the metrics are drawn from inherently variable video, occasional errors may occur in an otherwise coherent string of decisions. This technique detects phase sequences that are inconsistent with FIG. 7. Its operation is summarized in FIG. 14. Algorithm 1400 stores the subscript (= x) of the current phase decision in block 1405 and stores the subscript (= y) of the previous phase decision in block 1406. In block 1410, the condition x = y = 5 is tested; in block 1411, the following conditions are tested:
whether
x = 1, y = 0; or
x = 2, y = 1; or
x = 3, y = 2; or
x = 4, y = 3; or
x = 0, y = 4.
If either of the two tests is affirmative, the decision is declared consistent in block 1420. If neither test is affirmative, the offset shown in block 1193 of FIG. 11 is computed in FIG. 15 and added to the decision variable D_5 associated with P_5.
The modification to D_5 also appears as part of process 1500 in FIG. 15; the modification provides corrective action against inconsistency in the phase sequence. Suppose the consistency test in block 1510 of flow chart 1500 has failed. Proceeding along the "No" branch from block 1510, the next test, in block 1514, is whether D_5 > D_i for all i < 5, or whether, for i < 5, at least one of the variables D_i exceeds D_5. If the first case holds, the parameter δ, whose initial value is δ_0, is changed to 3δ_0 in block 1516. If the second case holds, δ is changed to 4δ_0 in block 1517. In block 152B, the value of Δ is updated to Δ_B, where
Δ_B = max(Δ - δ, -40δ_0) (20)
Returning now to block 15210, suppose instead that the string of decisions has been determined to be consistent. In block 15215, the parameter δ is changed to δ_+, defined by:
δ_+ = max(2δ, 16δ_0) (21)
The new value of δ is inserted into the update relation Δ_A for Δ in block 152A. This is
Δ_A = max(Δ + δ, 40δ_0) (22)
The updated value of Δ is then added to the decision variable D_5 in block 1593.
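The consistency test of FIG. 14 and the step-size adaptation of equations 20 to 22 can be sketched together as follows; the function and variable names are illustrative only:

```python
def consistent(x, y):
    """FIG. 14 test on the (previous = y, current = x) phase subscripts."""
    if x == 5 and y == 5:
        return True
    # The allowed transitions (1,0), (2,1), (3,2), (4,3), (0,4).
    return x < 5 and y < 5 and x == (y + 1) % 5

def adapt_offset(d, big_delta, small_delta, delta0, x, y):
    """Update delta and the D5 offset per equations 20-22 (FIG. 15).

    d           -- list [D0..D5] of decision variables
    big_delta   -- current value of the offset (capital Delta)
    small_delta -- current value of the step parameter (delta)
    delta0      -- initial value of delta
    """
    if not consistent(x, y):
        if all(d[5] > d[i] for i in range(5)):
            small_delta = 3 * delta0                              # block 1516
        else:
            small_delta = 4 * delta0                              # block 1517
        big_delta = max(big_delta - small_delta, -40 * delta0)    # eq. 20
    else:
        small_delta = max(2 * small_delta, 16 * delta0)           # eq. 21
        big_delta = max(big_delta + small_delta, 40 * delta0)     # eq. 22
    return big_delta, small_delta  # big_delta is then added to D5
```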
FIG. 16 shows how the inverse telecine process proceeds once the pulldown phase has been determined. Using this information, fields 1605 and 1605' are identified as representing the same field of video. The two fields are averaged together and combined with field 1606 to reconstruct frame 1620; the reconstructed frame is 1620'. A similar procedure reconstructs frame 1622. The fields derived from frames 1621 and 1623 are not duplicated; those frames are reconstructed by weaving their first and second fields back together.
In the aspect described above, whenever a new frame is received, the four new values of the metrics are computed and the six-fold hypothesis set is tested using the freshly computed decision variables. Other processing structures could be adapted to compute the decision variables. A Viterbi decoder adds together the metrics of the branches that make up a path to form a path metric. The decision variables defined here are formed by a similar rule: each is a "leaky" sum of the fresh information variables. (In a leaky sum, the previous value of the decision variable is multiplied by a number less than one before the fresh information data is added to it.) A Viterbi decoder structure could be modified to support the operation of this procedure.
Although this aspect has been described in terms of processing conventional video (in which a new frame appears every 1/30 second), it should be noted that this process is also applicable to frames recorded and processed backward in time. The decision space remains the same, apart from small changes reflecting the time reversal of the sequence of input frames. For example, a coherent string of telecine processing decisions (presented here) from the time-reversed pattern
P_4 P_3 P_2 P_1 P_0
would likewise be inverted in time.
Applying this change to the first aspect would allow the decision making two attempts at a successful strategy: one attempt forward in time, the other backward in time. Although the two attempts are not independent, they differ, because the metrics would be processed in a different order in each attempt.
This idea can be applied together with a buffer that is maintained to store future video frames that may additionally be needed. If a video segment is found to give unacceptably inconsistent results in the forward processing direction, the procedure would take the future frames out of the buffer and attempt to overcome the difficulty by processing the stretch of video in the opposite direction.
The processing of video described in this patent is also applicable to video in PAL format.
Deinterlacer
" deinterlacer " is broad terms as used herein, and it can be used for description wholly or largely to handle friendship Wrong multi-medium data (including is for example configured to perform to form the release of an interleave system of multi-medium data line by line, device or process Software, firmware or the hardware of process).
Conventional broadcast video, produced in video cameras, broadcast studios and the like, conforms to the NTSC standard in the United States. A common way of compressing video is to interlace it. In interlaced data, each frame is made up of one of two fields: one field consists of the odd lines of the frame, the other of the even lines. Although frames are generated at approximately 30 frames/second, the fields are records of the television camera's image that are separated in time by 1/60 second. Each frame of an interlaced video signal displays every other horizontal line of the image. As the frames are projected on the screen, the video signal alternates between showing the even lines and the odd lines. When this alternation is performed fast enough (for example, around 60 fields per second), the video image appears smooth to the human eye.
Interlacing has been used for decades in analog television broadcasts based on the NTSC (U.S.) and PAL (Europe) formats. Because only half the image is sent with each frame, interlaced video uses approximately half the bandwidth that would be required to transmit the entire image. The eventual display format of the video at terminal 16 need not be NTSC-compatible and may not readily display interlaced data. Instead, modern pixel-based displays (e.g., LCD, DLP, LCOS, plasma, etc.) are progressively scanned and display progressively scanned video sources (whereas many older video devices use the older interlaced-scan technology). Examples of some commonly used deinterlacing algorithms are described in "Scan rate up-conversion using adaptive weighted median filtering" by P. Haavisto, J. Juhola and Y. Neuvo (Signal Processing of HDTV II, pp. 703-710, 1990) and "Deinterlacing of HDTV Images for Multimedia Applications" by R. Simonetti, S. Carrato, G. Ramponi and A. Polo Filisan (Signal Processing of HDTV IV, pp. 765-772, 1993).
Described below are aspects of deinterlacing systems and methods, usable in the deinterlacer 405 (FIG. 4), that may be used alone or in combination to improve the performance of deinterlacing. The aspects may include deinterlacing a selected frame using spatio-temporal filtering to determine a first provisional deinterlaced frame, determining a second provisional deinterlaced frame from the selected frame using bidirectional motion estimation and motion compensation, and then combining the first and second provisional frames to form a final progressive frame. The spatio-temporal filtering can use a weighted median filter ("Wmed"), which may include a horizontal edge detector that prevents blurring of horizontal or nearly horizontal edges. Spatio-temporal filtering of previous and subsequent neighboring fields of the "current" field produces an intensity motion-level map that classifies portions of the selected frame into different motion levels, for example, static, slow motion and fast motion.
In some aspects, the intensity map is produced by Wmed filtering using a filtering aperture that includes pixels from five neighboring fields (the two previous fields, the current field, and the two subsequent fields). The Wmed filtering can determine forward, backward and bidirectional static-area detection that effectively handles scene changes and object appearance and disappearance. In various aspects, Wmed filters can be utilized across one or more fields of the same parity in an inter-field filtering mode, and can be switched to an intra-field filtering mode by adjusting threshold criteria. In some aspects, motion estimation and compensation use luma (the intensity or brightness of the pixels) and chroma data (the color information of the pixels) to improve the deinterlacing of regions of the selected frame where the brightness level is nearly uniform but the color differs. A denoising filter can be used to increase the accuracy of motion estimation. The denoising filter can be applied to Wmed provisional deinterlaced frames to remove aliasing artifacts produced by the Wmed filtering. The deinterlacing methods and systems described below produce good deinterlacing results with relatively low computational complexity, permitting fast-running deinterlacing implementations suited to a wide variety of deinterlacing applications, including systems used to provide data to cell phones, computers and other kinds of electronic or communication devices that employ displays.
Aspects of the deinterlacer and of deinterlacing methods are described herein with reference to various components, modules and/or steps used to deinterlace multimedia data.
FIG. 17 is a block diagram illustrating one aspect of a deinterlacer 1700 that can be used as the deinterlacer 405 in FIG. 4. The deinterlacer 1700 includes a spatial filter 1730 that filters at least a portion of the interlaced data spatially and temporally ("spatio-temporally") and generates spatio-temporal information. For example, Wmed can be used in the spatial filter 1730. In some aspects, the deinterlacer 1700 also includes a denoising filter (not shown), for example, a Wiener filter or a wavelet shrinkage filter. The deinterlacer 1700 also includes a motion estimator 1732 that provides motion estimation and compensation of a selected frame of the interlaced data and generates motion information. A combiner 1734 receives and combines the spatio-temporal information and the motion information to form a progressive frame.
FIG. 18 is another block diagram of the deinterlacer 1700. A processor 1836 in the deinterlacer 1700 includes a spatial filter module 1838, a motion estimation module 1840 and a combiner module 1842. Interlaced multimedia data from an external source 48 can be provided to a communication module 44 in the deinterlacer 1700. The deinterlacer, and components or steps thereof, can be implemented in hardware, software, firmware, middleware, microcode or any combination thereof. For example, the deinterlacer may be a standalone component, incorporated as hardware, firmware or middleware in a component of another device, or implemented in microcode or software executed on a processor, or a combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the deinterlacer tasks may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters or memory contents.
The received interlaced data can be stored in a storage medium 1846 in the deinterlacer 1700, which can include, for example, a chip-based storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical) connected to the processor 1836. In some aspects, the processor 1836 can contain some or all of the storage medium. The processor 1836 is configured to process the interlaced multimedia data to form a progressive frame that is then supplied to another device or process.
Conventional analog video devices, such as televisions, render video in an interlaced manner; that is, such devices transmit the even-numbered scan lines (even field) and the odd-numbered scan lines (odd field). From a signal-sampling point of view, this is equivalent to a spatio-temporal subsampling performed in the pattern described by equation 23:
where Θ denotes the original frame picture, F denotes the interlaced field, and (x, y, n) denote the horizontal, vertical and temporal positions of a pixel, respectively.
Without loss of generality, it can be assumed in the present invention that n = 0 is always an even field, so that equation 23 above simplifies to equation 24.
Since no decimation is performed in the horizontal dimension, the subsampling pattern can be depicted in the (n, y) coordinates. In FIG. 19, both the circles and the stars represent positions at which the original full-frame picture has a sampled pixel. The interlacing process decimates the star pixels while leaving the circle pixels intact. Note that vertical positions are indexed starting from zero; the even field is therefore the top field and the odd field the bottom field.
The goal of a deinterlacer is to transform interlaced video (a sequence of fields) into non-interlaced progressive frames (a sequence of frames). In other words, the even and odd fields are interpolated to "recover" or generate the full-frame pictures. This can be represented by equation 25:
where F_i denotes the deinterlacing results for the missing pixels.
FIG. 20 is a block diagram illustrating certain aspects of one aspect of a deinterlacer that uses Wmed filtering and motion estimation to generate a progressive frame from interlaced multimedia data. The upper part of FIG. 20 shows a motion intensity map 2052 that can be generated using information from the current field, the two previous fields (PP and P) and the two subsequent fields (Next and Next-Next). The motion intensity map 2052 classifies, or partitions, the current frame into two or more different motion levels, and can be generated by the spatio-temporal filtering described in further detail below. In some aspects, the motion intensity map 2052 is generated to identify static areas, slow-motion areas and fast-motion areas, as described below with reference to equations 26 to 30. A spatio-temporal filter, e.g., Wmed filter 2054, filters the interlaced multimedia data using criteria based on the motion intensity map and produces a provisional spatio-temporally deinterlaced frame. In some aspects, the Wmed filtering involves a horizontal neighborhood of [-1, 1], a vertical neighborhood of [-3, 3], and a temporal neighborhood of five adjacent fields, represented by the five fields illustrated in FIG. 20 (PP, P, current field, Next, Next-Next), where Z^-1 denotes a delay of one field. Relative to the current field, the Next field and P are opposite-parity fields, while PP and Next-Next are same-parity fields. The "neighborhood" used for the spatio-temporal filtering refers to the spatial and temporal locations of the fields and pixels actually used during the filtering operation, and can be illustrated as an "aperture" as shown, for example, in FIG. 21 and FIG. 22.
The deinterlacer may also include a denoiser (denoising filter) 2056. The denoiser 2056 is configured to filter the provisional spatio-temporally deinterlaced frame generated by the Wmed filter 2054. Denoising the provisional spatio-temporally deinterlaced frame makes the subsequent motion-search process more accurate, especially if the source interlaced multimedia data sequence is contaminated by white noise. The denoiser 2056 can also remove, at least in part, the aliasing between even and odd lines in a Wmed picture. The denoiser 2056 can be implemented as a variety of filters, including wavelet-shrinkage-based and wavelet Wiener filter-based denoisers, which are likewise further described below.
The lower part of FIG. 20 illustrates an aspect for determining the motion information (e.g., motion-vector candidates, motion estimation, motion compensation) of the interlaced multimedia data. In particular, FIG. 20 illustrates a motion estimation and motion compensation scheme used to generate a motion-compensated provisional progressive frame of the selected frame, which is then combined with the Wmed provisional frame to form the resulting "final" progressive frame, shown as the deinterlaced current frame 2064. In some aspects, motion vector ("MV") candidates (or estimates) for the interlaced multimedia data are provided to the deinterlacer from external motion estimators and used to provide a starting point for a bidirectional motion estimator and compensator ("ME/MC") 2068. In some aspects, an MV candidate selector 2072 uses, as the MV candidates of the block being processed, MVs previously determined for neighboring blocks, for example the MVs of previously processed blocks (e.g., blocks in the previous deinterlaced frame 2070). The motion compensation can be performed bidirectionally, based on the previous deinterlaced frame 2070 and the next (e.g., future) Wmed frame 2058. The current Wmed frame 2060 and the motion-compensated ("MC") current frame 2066 are merged, or combined, by a combiner 2062. The resulting deinterlaced current frame 2064, now a progressive frame, is provided back to the ME/MC 2068 to serve as the previous deinterlaced frame 2070, and is also communicated outside the deinterlacer for further processing (e.g., compression and transmission to a display terminal). The various aspects shown in FIG. 20 are explained in more detail below.
FIG. 25 illustrates a process 2500 for processing multimedia data to generate a sequence of progressive frames from a sequence of interlaced frames. In one aspect, the progressive frames are generated by the deinterlacer 405 illustrated in FIG. 4. At block 2502, process 2500 (process "A") generates spatio-temporal information for a selected frame. The spatio-temporal information can include the information used to classify the motion levels of the multimedia data and generate the motion intensity map, and includes the Wmed provisional deinterlaced frame and the information used to generate it (e.g., information used in equations 26 to 33). This process can be performed by the Wmed filter 2054 and its associated processing, illustrated in the upper part of FIG. 20 and described in further detail below. In process A, illustrated in FIG. 26, regions are classified into fields of different motion levels at block 2602, as further described below.
Next, at block 2504 (process "B"), process 2500 generates motion compensation information for the selected frame. In one aspect, the bidirectional motion estimator/motion compensator 2068 illustrated in the lower part of FIG. 20 can perform this process. Process 2500 then proceeds to block 2506, where it deinterlaces fields of the selected frame based on the spatio-temporal information and the motion compensation information to form a progressive frame associated with the selected frame. This can be performed by the combiner 2062 illustrated in the lower part of FIG. 20.
Motion intensity map
For each frame, a motion intensity map 2052 can be determined by processing pixels in the current field to determine areas of different "motion." Illustrative aspects of determining a three-category motion intensity map are described below with reference to FIGS. 21 to 24. The motion intensity map designates areas of each frame as static areas, slow-motion areas or fast-motion areas based on comparing pixels in fields of the same parity and of opposite parity.
Static areas
Determining static areas of the motion map can comprise processing pixels in a neighborhood of adjacent fields to determine whether luminance differences of certain pixel(s) meet certain criteria. In some aspects, determining static areas of the motion map comprises processing pixels in a neighborhood of five adjacent fields (the current field (C), two fields temporally before the current field, and two fields temporally after the current field) to determine whether luminance differences of certain pixel(s) meet certain thresholds. The five fields are illustrated in FIG. 20, where Z^-1 denotes the delay of one field. In other words, the five adjacent fields would typically be displayed in a sequence with a delay of Z^-1 between them.
FIG. 21 illustrates an aperture identifying certain pixels of each of the five fields according to some aspects, the aperture being usable for the spatio-temporal filtering. The aperture includes, from left to right, 3x3 pixel groups of the previous-previous field (PP), the previous field (P), the current field (C), the next field (N) and the next-next field (NN). In some aspects, an area of the current field is considered a static area in the motion map if it meets the criteria described in equations 26 to 28, the pixel positions and corresponding fields being illustrated in FIG. 21:
|L_P - L_N| < T_1 (26)

and the criterion of equation 27, or the criterion of equation 28, where equation 27 compares the current-field pixels B and E with the corresponding pixels B_PP and E_PP of field PP, and equation 28 compares them with the pixels B_NN and E_NN of field NN. In these equations:

T_1 is a threshold,
L_P is the luminance of pixel P in field P,
L_N is the luminance of pixel N in field N,
L_B is the luminance of pixel B in the current field,
L_E is the luminance of pixel E in the current field,
L_BPP is the luminance of pixel B_PP in field PP,
L_EPP is the luminance of pixel E_PP in field PP,
L_BNN is the luminance of pixel B_NN in field NN, and
L_ENN is the luminance of pixel E_NN in field NN.
Threshold value T1It can be determined and be provided by the process in addition to release of an interleave through making a reservation for and being set as particular value (for example, as just by metadata of the video of release of an interleave), or threshold value T1It can be dynamically determined during release of an interleave.
The static-area criteria described above in equations 26, 27 and 28 use more fields than conventional deinterlacing techniques, for at least two reasons. First, comparisons between same-parity fields have lower aliasing and phase mismatch than comparisons between opposite-parity fields. However, the minimum time difference (and hence the lowest correlation) between the field being processed and its nearest same-parity field neighbor is two fields, larger than that between the field being processed and its nearest opposite-parity field neighbor. A combination of the more reliable opposite-parity fields with the lower-aliasing same-parity fields can improve the accuracy of the static-area detection.
In addition, the five fields can be distributed symmetrically in the past and the future relative to the pixel X in the current frame C, as shown in FIG. 21. The static area can be subdivided into three categories: forward static (static relative to the previous frame), backward static (static relative to the next frame), or bidirectional static (if both the forward and the backward criteria are met). This finer classification of static areas improves performance, especially at scene changes and at object appearance/disappearance.
Slow-motion areas
An area of the motion map can be considered a slow-motion area if the luminance values of certain pixels do not meet the criteria for designation as a static area but do meet criteria for designation as a slow-motion area. Equation 29 below defines criteria that can be used to determine slow-motion areas. Referring to FIG. 22, the positions of the pixels Ia, Ic, Ja, Jc, Ka, Kc, La, Lc, P and N identified in equation 29 are shown in an aperture centered on pixel X. The aperture includes a 3x7 pixel neighborhood of the current field (C) and 3x5 neighborhoods of the next field (N) and the previous field (P). Pixel X is considered part of a slow-motion area if it does not meet the above-listed criteria for a static area and if the pixels in the aperture meet the following criterion shown in equation 29:
(|L_Ia - L_Ic| + |L_Ja - L_Jc| + |L_Ka - L_Kc| + |L_La - L_Lc| + |L_P - L_N|)/5 < T_2 (29)
where T_2 is a threshold, and
L_Ia, L_Ic, L_Ja, L_Jc, L_Ka, L_Kc, L_La, L_Lc, L_P and L_N are the luminance values of pixels Ia, Ic, Ja, Jc, Ka, Kc, La, Lc, P and N, respectively.
Threshold value T2Also it can also be determined and be carried by the process in addition to release of an interleave through making a reservation for and being set as particular value For (for example, as just by metadata of the video of release of an interleave), or threshold value T2It can be dynamically determined during release of an interleave.
Note that, owing to the angle of the edge-detection capability of a filter, the filter can blur horizontal edges (for example, edges more than 45 degrees away from vertical alignment). For example, the edge-detection capability of the aperture (filter) illustrated in FIG. 22 is affected by the angle formed between pixels "A" and "F", or between "C" and "D". Any edge more horizontal than this angle would not be interpolated optimally, and staircase artifacts may appear at such edges. In some aspects, to account for this edge-detection effect, the slow-motion category can be divided into two subcategories, "horizontal edge" and "otherwise". A slow-motion pixel can be classified as a horizontal edge if the criterion shown in equation 30 below is met, and into the so-called "otherwise" category if the criterion in equation 30 is not met:
|(L_A + L_B + L_C) - (L_D + L_E + L_F)| < T_3 (30)
where T_3 is a threshold, and L_A, L_B, L_C, L_D, L_E and L_F are the luminance values of pixels A, B, C, D, E and F, respectively.
Different interpolation methods can be used for horizontal edges and for the other edges.
Fast-motion areas
If neither the criteria for a static area nor the criteria for a slow-motion area are met, the pixel can be considered to be in a fast-motion area.
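The following is a simplified sketch of this per-pixel classification. The same-parity static tests of equations 27 and 28 are not reproduced in the text above, so only the opposite-parity test of equation 26 is applied here, and the horizontal offsets assumed for the pixel pairs Ia/Ic through La/Lc in the FIG. 22 aperture are guesses; all thresholds and names are illustrative:

```python
import numpy as np

def classify_pixel(C, P, N, x, y, T1, T2, T3):
    """Classify the missing pixel at (x, y) as static / slow / fast.

    C, P, N are 2-D luma arrays on the frame grid; rows y-1 and y+1 of C
    hold original field lines and row y is the missing line.  Assumes an
    interior pixel (no border handling).
    """
    lp, ln = float(P[y, x]), float(N[y, x])
    if abs(lp - ln) < T1:                        # equation 26 only
        return "static"
    # Equation 29: mean absolute difference over the slow-motion aperture.
    offs = [-2, -1, 1, 2]                        # assumed Ia/Ic..La/Lc columns
    acc = sum(abs(float(C[y - 1, x + o]) - float(C[y + 1, x + o]))
              for o in offs)
    if (acc + abs(lp - ln)) / 5.0 < T2:          # equation 29
        # Equation 30: horizontal-edge subcategory.
        above = sum(float(C[y - 1, x + o]) for o in (-1, 0, 1))
        below = sum(float(C[y + 1, x + o]) for o in (-1, 0, 1))
        return ("slow_horizontal_edge" if abs(above - below) < T3
                else "slow_other")
    return "fast"
```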
Having classified the pixels in the selected frame, process A (FIG. 26) proceeds to block 2604 and generates a provisional deinterlaced frame based on the motion intensity map. In this aspect, the Wmed filter 2054 (FIG. 20) filters the selected field and the necessary adjacent field(s) to provide a candidate full-frame picture F_0, which can be defined as in equation 31.
Here α_i (i = 0, 1, 2, 3) are integer weights, computed as shown in equations 32 and 33.
The Wmed-filtered provisional deinterlaced frame is provided for further processing together with the motion estimation and motion compensation processing, as illustrated in the lower part of FIG. 20.
As described above and as shown in equation 31, static interpolation comprises inter-field interpolation, while slow-motion and fast-motion interpolation comprise intra-field interpolation. In certain aspects in which temporal (e.g., inter-field) interpolation of same-parity fields is not needed, the temporal interpolation can be "disabled" by setting the threshold T_1 (equations 26 to 28) to zero (T_1 = 0). With temporal interpolation disabled, the processing of the current field causes no region of the motion-level map to be classified as a static area, and the Wmed filter 2054 (FIG. 20) operates using the three fields illustrated in the aperture of FIG. 22: a current field and its two opposite-parity neighboring fields.
Denoising
In some aspects, a denoiser can be used to remove noise from the candidate Wmed frame before it is further processed using the motion compensation information. The denoiser can remove noise present in the Wmed frame while retaining the signal, regardless of the signal's frequency content. Various types of denoising filters can be used, including wavelet filters. Wavelets are a class of functions used to localize a given signal in both the space and the scaling domains. The fundamental idea behind wavelets is to analyze the signal at different scales (or resolutions) so that small changes in the wavelet representation produce correspondingly small changes in the original signal.
In some aspects, the denoising filter is based on an aspect of a (4, 2) biorthogonal cubic B-spline wavelet filter. Such a filter can be defined by a forward transform and an inverse transform (equations 34 and 35).
Applying a denoising filter can increase the accuracy of motion compensation in a noisy environment. Noise in the video sequence is assumed to be additive white Gaussian. The estimated noise variance can be estimated as the median absolute deviation of the highest-frequency subband coefficients divided by 0.6745. Implementations of such filters are further described in D.L. Donoho and I.M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage" (Biometrika, vol. 81, pp. 425-455, 1994), which is incorporated herein by reference in its entirety.
A wavelet shrinkage filter or a wavelet Wiener filter can also be used as the denoiser. Wavelet shrinkage denoising can involve shrinkage in the wavelet transform domain and typically comprises three steps: a linear forward wavelet transform, a nonlinear shrinkage denoising, and a linear inverse wavelet transform. The Wiener filter is the MSE-optimal linear filter and can be used to improve images degraded by additive noise and blurring. Such filters are generally known in the art and are described, for example, in "Ideal spatial adaptation by wavelet shrinkage" referenced above, and in S.P. Ghael, A.M. Sayeed and R.G. Baraniuk, "Improved wavelet denoising via empirical Wiener filtering" (Proceedings of SPIE, vol. 3169, pp. 389-399, San Diego, July 1997).
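A minimal wavelet-shrinkage denoiser along the lines described above can be sketched as follows, assuming the PyWavelets package. The 'bior2.2' wavelet used here is a biorthogonal spline wavelet shipped with PyWavelets and merely stands in for the (4, 2) biorthogonal cubic B-spline filter of the text; the soft universal threshold is the Donoho-Johnstone choice:

```python
import numpy as np
import pywt  # PyWavelets; assumed available

def wavelet_denoise(img, wavelet="bior2.2", levels=3):
    """Wavelet shrinkage: forward DWT, soft shrinkage, inverse DWT."""
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=levels)
    # Noise estimate: median absolute deviation of the finest diagonal
    # subband divided by 0.6745 (Donoho & Johnstone).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(img.size))  # universal threshold
    shrunk = [coeffs[0]] + [
        tuple(pywt.threshold(sb, thresh, mode="soft") for sb in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(shrunk, wavelet)
```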
Motion compensation
Referring to FIG. 27, at block 2702 process B performs bidirectional motion estimation, and then at block 2704 uses the motion estimates to perform motion compensation, which is illustrated in FIG. 20 and described below in an illustrative aspect. There is a one-field "lag" between the Wmed filter and the motion-compensation-based deinterlacer. The motion compensation information for the "missing" data of the current field "C" (the non-original rows of pixel data) is predicted from information in both the previous frame "P" and the next frame "N", as shown in FIG. 23. In the current field (FIG. 23), solid lines represent rows in which original pixel data exist, and dashed lines represent rows whose pixel data have been interpolated by Wmed. In some aspects, the motion compensation is performed in a pixel neighborhood of 4 rows by 8 columns. However, this pixel neighborhood is an example for purposes of explanation, and it will be apparent to those skilled in the art that motion compensation may be performed in other aspects based on pixel neighborhoods comprising different numbers of rows and columns, the choice of which can be based on many factors including, for example, computational speed, available processing power, or the characteristics of the multimedia data being deinterlaced. Because the current field only has half of the rows, the four rows to be matched actually correspond to an area of 8 pixels by 8 pixels.
Referring to FIG. 20, the bidirectional ME/MC 2068 can use the sum of squared errors (SSE) to measure the similarity between a predicting block of the Wmed current frame 2060 and predicting blocks taken from the Wmed next frame 2058 and the deinterlaced previous frame 2070. The generation of the motion-compensated current frame 2066 then uses the pixel information from the most similar matching blocks to fill in the missing data between the original pixel rows. In some aspects, the bidirectional ME/MC 2068 biases, or gives more weight to, the pixel information from the deinterlaced previous frame 2070, because that pixel information was produced from motion compensation information as well as Wmed information, whereas the Wmed next frame 2058 has only been deinterlaced by the spatio-temporal filtering.
In some aspects, to improve the matching performance in regions of fields with similar luminance areas but different chroma areas, a metric can be used whose pixel-value composition includes contributions from one or more luminance groups of pixels (for example, one 4-row x 8-column luma block) and one or more chrominance groups of pixels (for example, two 2-row x 4-column chroma blocks U and V). This method effectively reduces mismatches at color-sensitive regions.
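A sketch of such a combined matching cost is shown below, assuming 4:2:0 chroma sampling and the SSE measure used by the ME/MC stage; the block shapes follow the 4x8 luma and 2x4 chroma example in the text:

```python
import numpy as np

def match_cost(cur_y, ref_y, cur_u, ref_u, cur_v, ref_v):
    """Block-matching cost combining luma and chroma contributions.

    cur_y/ref_y are 4x8 luma blocks; cur_u/ref_u and cur_v/ref_v are the
    collocated 2x4 chroma blocks.  Lower cost means a better match.
    """
    def sse(a, b):
        d = a.astype(np.int64) - b.astype(np.int64)
        return int(np.sum(d * d))

    return sse(cur_y, ref_y) + sse(cur_u, ref_u) + sse(cur_v, ref_v)
```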
Motion vectors (MVs) have a granularity of 1/2 pixel in the vertical dimension, and 1/2 or 1/4 pixel in the horizontal dimension. Interpolation filters can be used to obtain the fractional-pixel samples. For example, some filters that can be used to obtain half-pixel samples include a bilinear filter (1, 1), an interpolation filter recommended by H.263/AVC: (1, -5, 20, 20, -5, 1), and a six-tap Hamming-windowed sinc function filter (3, -21, 147, 147, -21, 3). Quarter-pixel samples can be generated from the full- and half-pixel samples by applying a bilinear filter.
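The six-tap filter can be applied as in the following minimal sketch for one row of luma samples; the rounding and edge handling shown here are illustrative choices rather than values taken from the text:

```python
import numpy as np

def half_pel_row(row):
    """Half-pixel samples along one row using the six-tap
    (1, -5, 20, 20, -5, 1) filter, normalized by 32 and clipped to
    8-bit range; edges are handled by sample replication."""
    padded = np.pad(row.astype(np.int32), (2, 3), mode="edge")
    taps = np.array([1, -5, 20, 20, -5, 1], dtype=np.int32)
    half = np.empty(len(row), dtype=np.int32)
    for i in range(len(row)):
        # half[i] sits between row[i] and row[i + 1]
        half[i] = (padded[i:i + 6] * taps).sum()
    return np.clip((half + 16) >> 5, 0, 255).astype(np.uint8)

# Quarter-pel samples then follow by bilinear averaging of neighboring
# full- and half-pel samples, e.g. (full + half + 1) >> 1.
```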
In some aspects, the motion compensation can use a variety of search processes to match data at a certain location of the current frame (e.g., depicting an object) with corresponding data at a different location in another frame (e.g., the next frame or the previous frame), the difference in location within the respective frames indicating the motion of the object. For example, the search processes can use a full motion search, which may cover a larger search area, or a fast motion search, which can use fewer pixels, and/or the selected pixels used in the search pattern can have a particular shape, e.g., a diamond. For fast motion searches, the search area can be centered on motion estimates, or motion candidates, which can be used as a starting point for searching the adjacent frames. In some aspects, MV candidates can be generated from external motion estimators and provided to the deinterlacer. Motion vectors of a macroblock from a corresponding neighborhood in a previously motion-compensated adjacent frame can also be used as motion estimates. In some aspects, MV candidates can be generated from searching a neighborhood of macroblocks (e.g., 3 macroblocks by 3 macroblocks) of the corresponding previous and next frames.
FIG. 24 illustrates an example of two MV maps, MV_P and MV_N, that can be generated during motion estimation/compensation by searching the neighborhoods of the previous frame and the next frame, as shown in FIG. 23. In both MV_P and MV_N, the block to be processed to determine the motion information is the center block denoted by "X". In MV_P and MV_N, there are nine MV candidates that can be used during the motion estimation of the current block X being processed. In this example, four of the MV candidates lie in the same field, from previously performed motion searches, and are depicted by the lighter-colored blocks in MV_P and MV_N (FIG. 24). The five other MV candidates, depicted by the darker-colored blocks, are copied from the motion information (or maps) of the previously processed frame.
After the motion estimation/compensation is complete, two interpolation results can be produced for the missing rows (represented by the dashed lines in FIG. 23): one interpolation result generated by the Wmed filter (the Wmed current frame 2060 in FIG. 20) and one generated by the motion-estimation process of the motion compensator (the MC current frame 2066). The combiner 2062 typically merges the Wmed current frame 2060 and the MC current frame 2066 by using at least portions of each to generate the current deinterlaced frame 2064. Under certain conditions, however, the combiner 2062 may generate the current deinterlaced frame using only one of the Wmed current frame 2060 or the MC current frame 2066. In one example, the combiner 2062 merges the Wmed current frame 2060 with the MC current frame 2066 to produce a deinterlaced output signal as shown in equation 36:
where the left-hand quantity is the luminance value at position x = (x, y)^t in field n_1, t denoting transposition, and using a clip function defined as:
clip(0, 1, a) = 0 if a < 0; 1 if a > 1; a otherwise. (37)
k_1 may be calculated as shown in equation 38.
Here C_1 is a robustness parameter, and Diff is the luminance difference between the predicting frame pixel and the available pixel in the predicted frame (taken from the existing field). By suitably choosing C_1, the relative importance of the mean square error can be tuned. k_2 can be calculated as shown in equation 39:
where the remaining vector quantity is the motion vector, and δ is a small constant used to prevent division by zero. Deinterlacing using clip functions for filtering is further described in G.D. Haan and E.B. Bellers, "De-interlacing of video data" (IEEE Transactions on Consumer Electronics, vol. 43, no. 3, pp. 819-825, 1997), which is incorporated herein by reference in its entirety.
In some aspects, the combiner 2062 can be configured to try to maintain the following equation in order to achieve a high PSNR and robust results:
With a Wmed + MC deinterlacing scheme, it is possible to decouple inter-field interpolation from intra-field interpolation in the deinterlacing prediction scheme. In other words, the spatio-temporal Wmed filtering can be used mainly for intra-field interpolation purposes, while the inter-field interpolation can be performed during motion compensation. This reduces the peak signal-to-noise ratio of the Wmed result, but the visual quality after the motion compensation is applied is more pleasing, because bad pixels resulting from inaccurate inter-field prediction-mode decisions will be removed from the Wmed filtering process.
The chroma handling can be consistent with the collocated luma handling. In terms of motion-map generation, the motion level of a chroma pixel is obtained by observing the motion levels of its four collocated luma pixels. The operation could be based on voting (the chroma motion level borrowing the dominant luma motion level); however, the following conservative approach is proposed. If any of the four luma pixels has a fast motion level, the chroma motion level shall be fast motion; otherwise, if any of the four luma pixels has a slow motion level, the chroma motion level shall be slow motion; otherwise, the chroma motion level is static. The conservative approach may not achieve the highest PSNR, but it avoids the risk of using INTER prediction wherever there is ambiguity in the chroma motion level.
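The conservative rule is simple enough to state directly in code; the level labels are illustrative:

```python
def chroma_motion_level(luma_levels):
    """Conservative chroma motion level from the motion levels of the
    four collocated luma pixels ('fast', 'slow', or 'static')."""
    if "fast" in luma_levels:
        return "fast"
    if "slow" in luma_levels:
        return "slow"
    return "static"

# e.g. chroma_motion_level(["static", "slow", "static", "static"]) -> "slow"
```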
Multimedia data sequences were deinterlaced using the described Wmed algorithm alone and using the described combined Wmed and motion-compensation algorithm. The same multimedia data sequences were also deinterlaced using a pixel-blending (or averaging) algorithm and a "no deinterlacing" case (in which the fields were merely combined without any interpolation or blending). The resulting frames were analyzed to determine the PSNR of each approach.
Even though adding MC to the Wmed deinterlacing only marginally improves PSNR, the visual quality of the deinterlaced images produced by combining the Wmed and MC interpolation results is more pleasing, because, for the reasons set forth above, combining the Wmed results with the MC results suppresses aliasing and noise between the even and odd fields.
In some resampling aspects, a polyphase resampler is implemented for picture-size resizing. In one example of downsampling, the ratio between the original picture and the resized picture can be p/q, where p and q are relatively prime integers. The total number of phases is p. For resizing factors of around 0.5, the cutoff frequency of the polyphase filters is 0.6 in some aspects. The cutoff frequency does not exactly match the resizing ratio, in order to boost the high-frequency response of the resized sequence. This inevitably allows some aliasing. However, it is well known that human eyes prefer a sharp picture with some aliasing to a blurry picture without aliasing.
FIG. 42 illustrates an example of polyphase resampling, showing the phases when the resizing ratio is 3/4. The cutoff frequency illustrated in FIG. 42 is also 3/4. The original pixels are illustrated with the vertical axes in FIG. 42. A sinc function is also drawn, centered about the axes, to represent the filter waveform. Because the cutoff frequency is chosen to be identical to the resampling ratio, the zeros of the sinc function overlap the pixel positions after resizing; this overlap is illustrated with crosses in FIG. 42. To find a pixel value after resizing, the contributions from the original pixels can be summed up as shown in the following equation:
where f_c is the cutoff frequency. The above 1-D polyphase filter can be applied to both the horizontal dimension and the vertical dimension.
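A 1-D sketch of this kind of resampling is shown below. The tap count, truncation and weight normalization are illustrative choices (a production polyphase filter would precompute p windowed phase kernels), and all names are assumptions:

```python
import numpy as np

def sinc_resample_1d(signal, p, q, fc=None, taps=8):
    """Resample a 1-D signal by the rational factor p/q with a
    truncated sinc kernel; fc is the normalized cutoff, defaulting to
    the resizing ratio as in FIG. 42."""
    if fc is None:
        fc = p / q
    n_out = (len(signal) * p) // q
    out = np.zeros(n_out)
    for j in range(n_out):
        x = j * q / p                          # output position, input units
        i0 = int(np.floor(x)) - taps // 2 + 1  # leftmost contributing pixel
        acc, wsum = 0.0, 0.0
        for i in range(i0, i0 + taps):
            w = fc * np.sinc(fc * (x - i))     # lowpass sinc weight
            ii = min(max(i, 0), len(signal) - 1)  # edge replication
            acc += signal[ii] * w
            wsum += w
        out[j] = acc / wsum if wsum else acc   # normalize truncated kernel
    return out
```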
Another aspect of resampling (resizing) accounts for overscan. In an NTSC television signal, a picture has 486 scan lines, and in digital video each scan line can have 720 pixels. However, not all of the entire picture is visible on the television, owing to mismatches between the size and the screen format. The invisible part of the picture is called the overscan.
To help broadcasters put useful information in the area visible to as many televisions as possible, the Society of Motion Picture and Television Engineers (SMPTE) defined specific sizes of the action frame, called the safe action area and the safe title area. See SMPTE Recommended Practice RP 27.3-1989, "Specifications for Safe Action and Safe Title Areas Test Pattern for Television Systems". The safe action area is defined by SMPTE as the area in which "all significant action must take place". The safe title area is defined as the area in which "all useful information can be confined to ensure visibility on the majority of home television receivers". For example, as illustrated in FIG. 43, the safe action area 4310 occupies the center 90% of the screen, giving a 5% border all around, while the safe title area 4305 occupies the center 80% of the screen, giving a 10% border.
Referring now to FIG. 44, because the safe title area is so small, some broadcasts include text in the safe action area, inside the white rectangular window 4415, in order to add more content to the picture. Black borders are usually seen in the overscan. For example, in FIG. 44, black borders appear at the upper side 4420 and the lower side 4425 of the picture. The black borders can be removed in the overscan, because H.264 video uses border extension in motion estimation, and extended black borders can increase the residual. The borders can conservatively be cut by 2%, and the resizing then performed; the filters for resizing can be generated accordingly. Truncation to remove the overscan is performed before the polyphase downsampling.
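As a small sketch of this pre-resizing step, the 2% crop can be computed as follows (the rounding is an illustrative choice):

```python
def crop_overscan(frame_h, frame_w, border=0.02):
    """Crop rectangle trimming a 2% border on each side before
    polyphase downsampling.  Returns (top, bottom, left, right)."""
    top = int(round(frame_h * border))
    left = int(round(frame_w * border))
    return top, frame_h - top, left, frame_w - left

# e.g. a 486x720 NTSC frame: crop_overscan(486, 720) -> (10, 476, 14, 706);
# the resizing filters are then designed for the cropped dimensions.
```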
Deblocking/deringing
In one example of deblocking processing, a deblocking filter can be applied to all the 4x4 block edges of a frame, except for edges at the boundary of the frame and any edges for which the deblocking filter process is disabled. This filtering process shall be performed on a macroblock basis after the completion of the frame construction process, with all macroblocks in a frame processed in order of increasing macroblock addresses. For each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered from top to bottom. For both the horizontal direction and the vertical direction, the luma deblocking filter process is performed on four 16-sample edges, and the deblocking filter process for each chroma component is performed on two 8-sample edges, as shown in FIG. 39. Sample values above and to the left of the current macroblock, which may already have been modified by the deblocking process on previous macroblocks, shall be used as input to the deblocking filter process on the current macroblock and may be further modified during the filtering of the current macroblock. Sample values modified during the filtering of vertical edges can be used as input for the filtering of the horizontal edges of the same macroblock. A deblocking process can be invoked for the luma and chroma components separately.
In one example of deringing processing, a 2-D filter can be adaptively applied to smooth areas near edges. Edge pixels undergo little or no filtering in order to avoid blurring.
GOP partitioner
Illustrative examples of processing that may be included in the GOP partitioner are described below, including bandwidth map generation, shot detection and adaptive GOP partitioning.
Bandwidth map generation
Human visual quality V can be a function of both encoding complexity C and allocated bits B (also referred to as bandwidth). FIG. 29 is a graph illustrating this relationship. Note that the encoding complexity metric C considers spatial and temporal frequencies from the human-vision point of view. For distortions to which human eyes are more sensitive, the complexity value is correspondingly higher. It can typically be assumed that V is monotonically decreasing in C and monotonically increasing in B.
To achieve constant visual quality, a bandwidth (B_i) is assigned to the i-th object (frame or macroblock) to be encoded, where the bandwidth (B_i) satisfies the criteria expressed in the two equations immediately below:
Bi=B (Ci,V) (42)
In the two equations immediately above, C_i is the encoding complexity of the i-th object, B is the total available bandwidth, and V is the visual quality achieved for an object.
Human visual quality is difficult to formulate as an equation. Therefore, the above equation set is not precisely defined. However, if it is assumed that the 3-D model is continuous in all variables, the bandwidth ratio (B_i/B) can be treated as unchanged within the neighborhood of a (C, V) pair. The bandwidth ratio β_i is defined in the equation shown below:
β_i = B_i/B (44)
The bit allocation can then be defined as expressed in the following equation:
βi=β (Ci)
Wherein
(Ci,V)∈δ(C0,V0)
Wherein δ indicates " neighborhood ".
The encoding complexity is affected by human visual sensitivity, both spatially and temporally. Girod's human vision model is an example of a model that can be used to define the spatial complexity. This model considers the local spatial frequency and the ambient lighting. The resulting metric is called D_csat. At the preprocessing point in the process, it is not yet known whether a picture will be intra-coded or inter-coded, so bandwidth ratios are generated for both. Bits are allocated according to the ratio between the β_INTRA values of the different video objects. For intra-coded pictures, the bandwidth ratio is expressed in the following equation:
β_INTRA = β_0INTRA · log10(1 + α_INTRA · Y² · D_csat) (46)
In the above equation, Y is the average luminance component of a macroblock, α_INTRA is a weighting factor for the luminance-squared term and the D_csat term that follows it, and β_0INTRA is a normalization factor that ensures the bandwidth ratios sum to one. For example, a value of α_INTRA = 4 achieves good visual quality. Content information (for example, a content classification) can be used to set α_INTRA to a value that corresponds to a desired good visual quality level for the particular content of the video. In one example, if the video content comprises a "talking head" news broadcast, the visual quality level may be set lower, and fewer bits can be allocated to encode the data, because the video frames or the displayable portion may be deemed less important than the audio portion. In another example, if the video content comprises a sporting event, the content information may be used to set α_INTRA to a value corresponding to a higher visual quality level, and more bits can therefore be allocated to encode the data, because the displayed images may be more important to the viewer.
To understand this relationship, note that bandwidth is allocated logarithmically with encoding complexity. The luminance-squared term Y² reflects the fact that coefficients with larger magnitudes use more bits to encode. To prevent the logarithm from taking negative values, unity is added to the term in the parentheses. Logarithms with other bases could also be used.
The temporal complexity is determined by a measure of a frame difference metric, which measures the difference between two consecutive frames, taking into account the amount of motion (for example, motion vectors) together with a frame difference metric such as the sum of absolute differences (SAD).
The bit allocation for inter-coded pictures can consider spatial as well as temporal complexity. This is expressed below:
β_INTER = β_0INTER · log10(1 + α_INTER · SSD · D_csat · exp(-γ‖MV_P + MV_N‖²)) (47)
In the above equation, MV_P and MV_N are the forward and backward motion vectors for the current macroblock. It can be noted that Y² in the intra-coded bandwidth formula is replaced by the sum of squared differences (SSD). To understand the role of ‖MV_P + MV_N‖² in the above equation, note the following characteristic of the human visual system: areas undergoing smooth, predictable motion (small ‖MV_P + MV_N‖²) attract attention and can be tracked by the eyes, and generally cannot tolerate any more distortion than stationary regions. However, areas undergoing fast or unpredictable motion (large ‖MV_P + MV_N‖²) cannot be tracked and can tolerate significant quantization. Experiments show that α_INTER = 1, γ = 0.001 achieve good visual quality.
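The bandwidth-map computations of equations 42 to 47 can be sketched as follows; the default parameter values are the ones quoted above, and the normalization step implements the constraint that the ratios sum to one:

```python
import math

def beta_intra(y_mean, d_csat, alpha_intra=4.0, beta0=1.0):
    """Equation 46: intra bandwidth ratio (before normalization)."""
    return beta0 * math.log10(1.0 + alpha_intra * y_mean ** 2 * d_csat)

def beta_inter(ssd, d_csat, mv_sum_sq, alpha_inter=1.0, gamma=0.001,
               beta0=1.0):
    """Equation 47: inter bandwidth ratio; mv_sum_sq is ||MV_P + MV_N||^2."""
    return beta0 * math.log10(
        1.0 + alpha_inter * ssd * d_csat * math.exp(-gamma * mv_sum_sq))

def allocate_bits(total_bits, betas):
    """Equations 42-45: normalize the ratios and split the budget."""
    total = sum(betas)
    return [total_bits * b / total for b in betas]
```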
Shot detection
An illustrative example of shot detection is described below. The components and processes may be included in the GOP partitioner 412 (FIG. 4).
The motion compensator 23 can be configured to determine bidirectional motion information about frames in the video. The motion compensator 23 can also be configured to determine one or more difference metrics, for example, the sum of absolute differences (SAD) or the sum of squared differences (SSD), and to calculate other information, including luminance information for one or more frames (for example, macroblock (MB) luminance averages or differences), a luminance histogram difference, and a frame difference metric, examples of which are described in reference to equations 1 to 3. The shot classifier can be configured to classify frames in the video into two or more categories of "shots" using the information determined by the motion compensator. The encoder is configured to adaptively encode the plurality of frames based on the shot classifications. The motion compensator, shot classifier and encoder are described below with reference to equations 1 to 10.
FIG. 28 is a block diagram of a preprocessor 202 according to some aspects, the preprocessor 202 comprising a processor 2831 configured for shot detection and other preprocessing operations. A digital video source can be provided by a source external to the preprocessor 202, as shown in FIG. 4, and communicated to a communication module 2836 in the preprocessor 202. The preprocessor 202 contains a storage medium 2825 that communicates with the processor 2831, both of which communicate with the communication module 2836. The processor 2831 includes a motion compensator 2832, a shot classifier 2833, and other modules 2834 for preprocessing, operable to generate motion information, classify shots in frames of the video data, and perform other preprocessing tasks as described herein. The motion compensator, shot classifier and other modules can contain processes similar to the corresponding modules in FIG. 4, and can process video to determine the information described below. In particular, the processor 2831 can have a configuration to: obtain metrics indicative of a difference between adjacent frames of a plurality of video frames, the metrics comprising bidirectional motion information and luminance information; determine shot changes in the plurality of video frames based on the metrics; and adaptively encode the plurality of frames based on the shot changes. In some aspects, the metrics can be calculated by a device or process external to the processor 2831, which can also be external to the preprocessor 202 and can communicate with the processor 2831 directly or indirectly via another device or a memory. The metrics can also be calculated by the processor 2831, for example by the motion compensator 2832.
The preprocessor 202 provides video and metadata for further processing, encoding and transmission to other devices, for example, terminal 6 (FIG. 1). In some aspects, the encoded video can be scalable multi-layer encoded video comprising a base layer and an enhancement layer. Scalable layer encoding is further described in the co-pending U.S. patent application entitled "Scalable Video Coding With Two Layer Encoding And Single Layer Decoding" [attorney docket no. 050078], which is assigned to the assignee of the present invention and is incorporated herein by reference in its entirety.
The various illustrative logical blocks, components, modules, and circuits described with reference to Figure 28 and the other examples and figures disclosed herein may, in some aspects, be implemented or performed with: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor, such as the processor shown in Figure 28, may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
Video encoding usually operates on a structured group of pictures (GOP). A GOP normally begins with an intra-coded frame (I frame), followed by a series of P (predictive) or B (bidirectional) frames. Typically, an I frame can store all the data needed to display the frame, a B frame depends on data in the preceding and following frames (for example, it contains only data that changed from the preceding frame or that differs from data in the next frame), and a P frame contains data that has changed from the preceding frame.
In common usage, I frames are interspersed with P frames and B frames in encoded video. In terms of size (for example, the number of bits used to encode the frame), I frames are typically much larger than P frames, which in turn are larger than B frames. For efficient encoding, transmission, and decoding, the GOP length should be long enough to reduce the effective loss from large I frames, and short enough to prevent mismatch between encoder and decoder, or channel impairment. In addition, macroblocks (MB) in P frames can be intra-coded for the same reason.
Scene change detection can be used by a video encoder to determine a proper GOP length and to insert I frames based on the GOP length, instead of inserting I frames at fixed intervals. In a practical streaming video system, the communication channel is usually impaired by bit errors or packet losses. Where to place I frames or I MBs can significantly affect decoded video quality and the viewing experience. One encoding scheme is to use intra-coded frames for pictures or portions of pictures that differ significantly from collocated previous pictures or picture portions. Normally such regions cannot be predicted effectively and efficiently with motion estimation, and encoding can be done more efficiently if such regions are exempted from inter-frame coding techniques (for example, encoding using B frames and P frames). In the context of channel impairment, those regions are likely to suffer from error propagation, which can be reduced or eliminated (or nearly so) by intra-coding.
Portions of the GOP video can be classified into two or more categories, where each region can have different intra-coding criteria that may depend on the particular implementation. As an example, the video can be classified into three categories: abrupt scene changes, cross-fades and other slow scene changes, and camera flashlights. Abrupt scene changes include frames that are significantly different from the previous frame, usually caused by a camera operation. Since the content of such a frame is different from that of the previous frame, an abrupt scene change frame should be encoded as an I frame. Cross-fades and other slow scene changes include slow switching of scenes, usually caused by computer processing of camera shots. A gradual blending of two different scenes may look more pleasing to the human eye, but it poses a challenge to video encoding. Motion compensation cannot reduce the bit rate of those frames effectively, and more intra MBs can be updated for those frames.
Camera flashlight, or camera flash events, occur when the content of a frame includes camera flashes. Such flashes are relatively short in duration (for example, one frame) and extremely bright, such that the pixels in a frame depicting a flash exhibit unusually high luminance relative to the corresponding areas of adjacent frames. Camera flashlights shift the luminance of a picture suddenly and swiftly. The duration of a camera flashlight is usually shorter than the temporal masking duration of the human vision system (HVS), which is typically defined to be 44 ms. Human eyes are not sensitive to the quality of these short bursts of brightness, and therefore the bursts can be encoded coarsely. Because the flashlight frames cannot be handled effectively with motion compensation, and because they are bad prediction candidates for future frames, coarse encoding of these frames does not reduce the encoding efficiency of future frames. Scenes classified as flashlights should not be used to predict other frames because of the "artificial" high luminance, and, for the same reason, other frames cannot effectively be used to predict these frames. Once identified, these frames can be taken out, because they may require a relatively high amount of processing. One option is to remove the camera flashlight frames and encode a DC coefficient in their place; such a solution is simple, computationally fast, and saves many bits.
When any of the above types of frame is detected, a shot event is declared. Shot detection is not only useful for improving encoding quality; it can also aid in identifying video content for searching and indexing. One aspect of a scene detection process is described below.
Figure 30 illustrates a process 3000 that operates on a GOP and can be used in some aspects to encode video based on shot detection in video frames, where portions (or subprocesses) of the process 3000 are described and illustrated with reference to Figures 30 to 40. The processor 2831 can be configured to incorporate the process 3000. After the process 3000 starts, it proceeds to block 3042, where metrics (information) for the video frames are obtained, the metrics including information indicative of the difference between adjacent frames. The metrics include bidirectional motion information and luminance-based information that are subsequently used to determine changes that occurred between adjacent frames, and that can be used for shot classification. The metrics can be obtained from another device or process, or they can be calculated by, for example, the processor 2831. An illustrative example of metrics generation is described with reference to process A in Figure 31.
The process 3000 proceeds to block 3044, where shot changes in the video are determined based on the metrics. A video frame can be classified into two or more categories of the type of shot contained in the frame, for example, an abrupt scene change, a slowly changing scene, or a scene containing high luminance values (a camera flash). Certain implementations of encoding may necessitate other categories. An illustrative example of shot classification is described with reference to process B in Figure 32, and in more detail with reference to processes D, E, and F in Figures 34 to 36.
Once a frame is classified, the process 3000 proceeds to block 3046, where the frame can be encoded, or designated for encoding, using the shot classification results. Such results can influence whether to encode the frame as an intra-coded frame or as a predictive frame (for example, a P frame or a B frame). Process C in Figure 33 shows an example of an encoding scheme that uses the shot results.
Figure 31 illustrates an example of a process for obtaining metrics of the video, showing certain steps that occur in block 3042 of Figure 30. Still referring to Figure 31, in block 3152, process A obtains or determines bidirectional motion estimation and compensation information of the video. The motion compensator 2832 of Figure 28 can be configured to perform bidirectional motion estimation on the frames and to determine motion compensation information that can be used for subsequent shot classification. Process A proceeds to block 3154, where it generates luminance information including a luminance difference histogram for a current or selected frame and one or more adjacent frames. Finally, process A continues to block 3156, where metrics indicative of the shot contained in the frame are calculated. One such metric is a frame difference metric, two examples of which are shown in equations 4 and 10. Illustrative examples of determining the motion information, the luminance information, and the frame difference metric are described below.
Motion compensation
To perform bidirectional motion estimation/compensation, a video sequence can be preprocessed with a bidirectional motion compensator that matches every 8×8 block of the current frame with blocks in two of the frame's most adjacent neighboring frames, one in the past and one in the future. The motion compensator produces motion vectors and difference metrics for every block. Figure 37 illustrates this concept, showing an example of matching pixels of a current frame C to pixels of a past frame P and a future (or next) frame N, and depicting the motion vectors of the matched pixels (past motion vector MV_P and future motion vector MV_N). A brief description of an illustrative aspect of bidirectional motion vector generation and related encoding follows.
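The block matching just described can be sketched in a few lines. The following Python sketch (the function name, search range, and block size are illustrative assumptions, not taken from the source) performs an exhaustive SAD search for one block and returns the difference metric together with the motion vector:

```python
import numpy as np

def best_match_sad(block, ref_frame, bx, by, search=8, bs=8):
    """Exhaustive SAD search for the reference block that best matches `block`,
    whose top-left corner sits at (bx, by) in the current frame."""
    h, w = ref_frame.shape
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + bs <= h and 0 <= x and x + bs <= w:
                cand = ref_frame[y:y + bs, x:x + bs].astype(np.int32)
                sad = np.abs(block.astype(np.int32) - cand).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
    return best_sad, best_mv  # difference metric and motion vector
```

Running this once against the past frame P and once against the future frame N yields, for each block, SAD_P with MV_P and SAD_N with MV_N as depicted in Figure 37.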
Figure 40 illustrates an example of motion vector determination and predictive frame encoding in, for example, MPEG-4. The process depicted in Figure 40 is a more detailed illustration of an example process that can take place in block 3152 of Figure 31. In Figure 40, a current picture 4034 is made up of 5×5 macroblocks, where the number of macroblocks in this example is arbitrary. A macroblock is made up of 16×16 pixels. A pixel can be defined by an 8-bit luminance value (Y) and two 8-bit chrominance values (Cr and Cb).
In MPEG, the Y, Cr, and Cb components can be stored in a 4:2:0 format, where the Cr and Cb components are downsampled by a factor of 2 in the X and Y directions. Hence, each macroblock would consist of 256 Y components, 64 Cr components, and 64 Cb components. A macroblock 4036 of the current picture 4034 is predicted from a reference picture 4032 at a time point different from that of the current picture 4034. A search is made in the reference picture 4032 to locate the best matching macroblock 4038 that is closest, in terms of Y, Cr, and Cb values, to the current macroblock 4036 being encoded. The location of the best matching macroblock 4038 in the reference picture 4032 is encoded in a motion vector 4040. The reference picture 4032 can be an I frame or a P frame that the decoder will have reconstructed prior to the construction of the current picture 4034. The best matching macroblock 4038 is subtracted from the current macroblock 4036 (a difference for each of the Y, Cr, and Cb components is calculated), resulting in a residual error 4042. The residual error 4042 is encoded with a two-dimensional (2D) discrete cosine transform (DCT) 4044 and then quantized 4046. Quantization 4046 can be performed to provide spatial compression by, for example, allotting fewer bits to the high-frequency coefficients and more bits to the low-frequency coefficients. The quantized coefficients of the residual error 4042, together with the motion vector 4040 and the identifying information of the reference picture 4032, are the encoded information representing the current macroblock 4036. The encoded information can be stored in memory for future use, operated on for purposes of, for example, error correction or image enhancement, or transmitted over a network 140.
The encoded quantized coefficients of the residual error 4042, along with the encoded motion vector 4040, can be used to reconstruct the current macroblock 4036 in the encoder for use as part of a reference frame for subsequent motion estimation and compensation. The encoder can emulate the procedures of a decoder for this P frame reconstruction. Emulating the decoder results in both the encoder and the decoder working with the same reference picture. The reconstruction process, whether done in an encoder for further inter-coding or in a decoder, is presented here. Reconstruction of a P frame can be started after the reference frame (or the picture or frame portion that is being referenced) is reconstructed. The encoded quantized coefficients are dequantized 4050, and then a 2D inverse discrete cosine transform (IDCT) 4052 is performed, resulting in a decoded or reconstructed residual error 4054. The encoded motion vector 4040 is decoded and used to locate the already reconstructed best matching macroblock 4056 in the already reconstructed reference picture 4032. The reconstructed residual error 4054 is then added to the reconstructed best matching macroblock 4056 to form a reconstructed macroblock 4058. The reconstructed macroblock 4058 can be stored in memory, displayed independently or in a picture with other reconstructed macroblocks, or processed further for image enhancement.
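The residual path described above (transform, quantize, dequantize, inverse transform, add back the reference block) can be illustrated with a minimal numpy/scipy sketch. The uniform quantizer step and the function names are assumptions for illustration; real MPEG-4 quantization is more elaborate:

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_residual(current_mb, best_match_mb, qstep=8.0):
    """DCT 4044 and quantization 4046 of the residual error 4042 (one component)."""
    residual = current_mb.astype(np.float64) - best_match_mb
    return np.round(dctn(residual, norm="ortho") / qstep)

def reconstruct_macroblock(qcoeffs, best_match_mb, qstep=8.0):
    """Mirror the decoder: dequantize 4050, IDCT 4052, add the best match 4056."""
    residual = idctn(qcoeffs * qstep, norm="ortho")   # reconstructed residual 4054
    return np.clip(best_match_mb + residual, 0, 255)  # reconstructed macroblock 4058
```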
Encoding using B frames (or any section coded with bidirectional prediction) exploits the temporal redundancy between a region in a current picture, a best matching prediction region in a previous picture, and a best matching prediction region in a subsequent picture. The subsequent best matching prediction region and the previous best matching prediction region are combined to form a combined bidirectional predicted region. The difference between the current picture region and the best matching combined bidirectional predicted region is a residual error (or prediction error). The location of the best matching prediction region in the subsequent reference picture and the location of the best matching prediction region in the previous reference picture can be encoded in two motion vectors.
Luminance Histogram Difference
The motion compensator can produce a difference metric for every block. The difference metric can be a sum of squared differences (SSD) or a sum of absolute differences (SAD). Without loss of generality, SAD is used as an example herein.
For each frame, a SAD ratio is calculated as in equation 48:

γ = (ε + SAD_P) / (ε + SAD_N)    (48)

where SAD_P and SAD_N are the sums of absolute differences of the forward and the backward difference metrics, respectively. Note that the denominator contains a small positive number ε to prevent a divide-by-zero error. The numerator also contains ε to balance the effect of the ε in the denominator. For example, if the previous frame, the current frame, and the next frame are identical, motion search should yield SAD_P = SAD_N = 0. In this case, the above calculation yields γ = 1 rather than 0 or infinity.
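As a sketch, equation 48 is one line of code (ε = 1 is an illustrative choice; the source only requires a small positive number):

```python
def sad_ratio(sad_p, sad_n, eps=1.0):
    """Equation 48: eps in both numerator and denominator guards divide-by-zero."""
    return (eps + sad_p) / (eps + sad_n)
```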
A luminance histogram can be calculated for every frame. Multimedia images commonly have a luminance bit depth of eight bits. For this bit depth, the number of bins can be set to 16 to obtain the histogram. In other aspects, the number of bins can be set to an appropriate number, which may depend upon the type of data being processed, the computational power available, or other predetermined criteria. In some aspects, the number of bins can be set dynamically based on a calculated or received metric, such as the content of the data.
Equation 49 illustrates one example of calculating the luminance histogram difference (λ):

λ = ( Σ_{i=1}^{16} |N_Pi − N_Ci| ) / N    (49)

where N_Pi is the number of blocks in the i-th bin for the previous frame, N_Ci is the number of blocks in the i-th bin for the current frame, and N is the total number of blocks in a frame. If the luminance histograms of the previous frame and the current frame are completely different (or disjoint), then λ = 2.
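A sketch of equation 49 over the 16-bin histograms (the names are illustrative):

```python
def histogram_difference(hist_prev, hist_curr, n_blocks):
    """Equation 49: lambda = sum(|N_Pi - N_Ci|) / N over the 16 luminance bins."""
    return sum(abs(p - c) for p, c in zip(hist_prev, hist_curr)) / n_blocks
```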
A frame difference metric D, discussed with reference to block 56 of Figure 5, can be calculated as shown in equation 50:

D = γ_C / γ_P + A·λ(2λ + 1)    (50)

where A is a constant chosen by application, γ_C is the SAD ratio of equation 48 computed for the current frame, and γ_P is the corresponding SAD ratio computed for the previous frame.
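Combining the two building blocks above, equation 50 can be sketched as follows (A = 1 matches the simulation setting reported below):

```python
def frame_difference(gamma_c, gamma_p, lam, a=1.0):
    """Equation 50: D = gamma_C / gamma_P + A * lambda * (2 * lambda + 1)."""
    return gamma_c / gamma_p + a * lam * (2.0 * lam + 1.0)
```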
Figure 32 illustrates an example of a process B that determines three categories of shot (or scene) changes using the metrics obtained or determined for the video, showing certain steps that occur in one aspect of block 3044 of Figure 30. Referring to Figure 32, in block 3262, process B first determines whether the frame meets the criteria to be designated an abrupt scene change; process D in Figure 34 illustrates an example of this determination. Process B proceeds to block 3264, where it is determined whether the frame is part of a slowly changing scene; process E in Figure 35 illustrates an example of determining a slowly changing scene. Finally, at block 3266, process B determines whether the frame contains camera flashes, in other words, large luminance values differing from the previous frame; process F in Figure 36 illustrates an example of determining a frame containing camera flashes. An illustrative example of these processes is described below.
Abrupt scene changes
Figure 34 is a flowchart illustrating a process for determining abrupt scene changes, further elaborating certain steps that can occur in some aspects of block 3262 of Figure 32. At block 3482, it is checked whether the frame difference metric D meets the criterion shown in equation 51:

D = γ_C / γ_P + A·λ(2λ + 1) ≥ T_1    (51)

where A is a constant chosen by application and T_1 is a threshold. If the criterion is met, then at block 3484 process D designates the frame as an abrupt scene change and, in this example, no further shot classification is needed.
In one example, simulations show that setting A = 1 and T_1 = 5 achieves good detection performance. If the current frame is an abrupt scene change frame, then γ_C should be large and γ_P should be small. The ratio γ_C/γ_P can be used instead of γ_C alone so that the metric is normalized to the activity level of the context.
Note that the above criterion uses the luminance histogram difference (λ) in a non-linear way. Figure 39 illustrates that λ·(2λ + 1) is a convex function. When λ is small (for example, close to zero), there is little pre-emphasis. The larger λ becomes, the more emphasis the function applies. With this pre-emphasis, an abrupt scene change is detected for any λ larger than 1.4 if the threshold T_1 is set to 5.
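That claim is easy to verify: at λ = 1.4 the pre-emphasis term alone already reaches the threshold, before the SAD-ratio term contributes anything:

```python
lam = 1.4
print(lam * (2 * lam + 1))  # 5.32 >= T1 = 5: the criterion fires on lambda alone
```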
Cross-fades and slow scene changes
Figure 35 further illustrates details of some aspects that can occur in block 3264 of Figure 32. Referring to Figure 35, at block 3592, process E determines whether the frame is part of a series of frames depicting a slow scene change. Process E determines that the current frame is a cross-fade or other slow scene change if the frame difference metric D is less than the first threshold T_1 and greater than or equal to a second threshold T_2, as illustrated in equation 52:
T_2 ≤ D < T_1    (52)
for a certain number of continuous frames, where T_1 is the same threshold used above and T_2 is another threshold. Because of possible differences in implementations, the exact values of T_1 and T_2 are normally determined by experimentation. If the criterion is met, then at block 94 process E classifies the frame as part of a slowly changing scene, and shot classification for the selected frame ends. A combined decision rule is sketched after this paragraph.
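Equations 51 and 52 together give a small decision rule, sketched below. T_1 = 5 matches the simulation above; T_2 = 1.0 is purely an illustrative assumption, since the source leaves T_2 to experimentation:

```python
def classify_shot(d, t1=5.0, t2=1.0):
    """Classify a frame by its frame difference metric D (equations 51-52)."""
    if d >= t1:
        return "ABRUPT_SCENE_CHANGE"  # encode as an I frame, new GOP boundary
    if d >= t2:
        return "SLOW_SCENE_CHANGE"    # cross-fade candidate: P or B frames
    return "ORDINARY"
```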
Camera flashlight events
Process F, shown in Figure 36, is an example of a process that can determine whether the current frame comprises camera flashlights. In this illustrative aspect, luminance histogram statistics are used to make the determination. As shown at block 3602, process F determines whether a camera flash event is in the selected frame by first determining whether the luminance of the current frame is greater than the luminance of both the previous frame and the next frame. If not, the frame is not a camera flash event; if so, it may be one. At block 3604, process F determines whether the backward difference metric and the forward difference metric are greater than a threshold T_4; if both conditions are satisfied, then at block 3606 process F classifies the current frame as having camera flashlights. In one example, at block 3602, process F determines whether the average luminance of the current frame minus the average luminance of the previous frame equals or exceeds a threshold T_3, and whether the average luminance of the current frame minus the average luminance of the next frame equals or exceeds the threshold T_3, as shown in equations 53 and 54:

Y_C − Y_P ≥ T_3    (53)
Y_C − Y_N ≥ T_3    (54)
If the criteria are not satisfied, the current frame is not classified as comprising camera flashlights and process F returns. If the criteria are satisfied, process F proceeds to block 3604, where it determines whether the backward difference metric SAD_P and the forward difference metric SAD_N are greater than a certain threshold T_4, as illustrated in equations 55 and 56:
SAD_P ≥ T_4    (55)
SAD_N ≥ T_4    (56)
where Y_C is the average luminance of the current frame, Y_P is the average luminance of the previous frame, Y_N is the average luminance of the next frame, and SAD_P and SAD_N are the forward and backward difference metrics associated with the current frame. If the criteria are not satisfied, process F returns.
Values of T_3 are normally determined by experimentation, because implementations of the described processes can result in differences in operating parameters, including threshold values. The SAD values are included in the determination because camera flashes typically take only one frame and, due to the luminance difference, this frame cannot be predicted well using motion compensation from either the forward or the backward direction.
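Process F then reduces to two luminance tests (equations 53 and 54) and two SAD tests (equations 55 and 56). In the sketch below the threshold values T_3 and T_4 are illustrative assumptions, since the source determines them by experimentation:

```python
def is_camera_flash(y_c, y_p, y_n, sad_p, sad_n, t3=10.0, t4=3000.0):
    """Process F: the frame is a flash if it outshines both neighbors (eq. 53-54)
    and cannot be predicted well from either direction (eq. 55-56)."""
    brighter = (y_c - y_p >= t3) and (y_c - y_n >= t3)
    poorly_predicted = (sad_p >= t4) and (sad_n >= t4)
    return brighter and poorly_predicted
```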
In certain aspects, threshold value T1、T2、T3And T4One of or it is one or more of through predetermined and described value through being incorporated into In shot classification device in code device.The test of the particular generally detected via story board selects the threshold Value.In certain aspects, can be based on the use information (for example, metadata) for being fed to shot classification device or based on by shot classification The information that device the is calculated given threshold T (for example, dynamically) during processing in itself1、T2、T3And T4One of or one with On.
Referring now to Figure 33, which shows a process C for determining encoding parameters for the video, or for encoding the video, based on the shot classification of the selected frame. At block 3370, process C determines whether the selected frame was classified as an abrupt scene change. If so, at block 3371 the current frame is classified as an abrupt scene change, the frame can be encoded as an I frame, and a GOP boundary can be determined. If not, process C proceeds to block 3372; if the current frame was classified as a portion of a slowly changing scene, then at block 3373 the current frame and the other frames in the slowly changing scene can be encoded as predictive frames (for example, P frames or B frames). Process C then proceeds to block 3374, where it checks whether the current frame was classified as a flashlight scene comprising camera flashes. If so, at block 3375 the frame can be identified for special processing, for example, removal, replication of a previous frame, or encoding of particular coefficients for the frame. If not, no classification of the current frame was made, and the selected frame can be encoded in accordance with other criteria, encoded as an I frame, or dropped. Process C can be implemented in an encoder.
In the aspect described above, the amount of difference between the frame to be compressed and its two adjacent frames is indicated by the frame difference metric D. If a significant amount of one-way luminance change is detected, it signifies a cross-fade effect in the frame. The more pronounced the cross-fade is, the more gain can be achieved by using B frames. In some aspects, a modified frame difference metric is used, as shown in equation 57:

D_1 = (1 − α + α·|d_P − d_N| / (d_P + d_N)) × D,  if Y_P − Δ ≥ Y_C ≥ Y_N + Δ or Y_P + Δ ≤ Y_C ≤ Y_N − Δ;
D_1 = D,  otherwise,    (57)

where d_P = |Y_C − Y_P| and d_N = |Y_C − Y_N| are the luminance differences between the current frame and the previous frame, and between the current frame and the next frame, respectively, Δ represents a constant that can be determined in normal experimentation (as it may depend on the implementation), and α is a weighting variable having a value between 0 and 1.
The modified frame difference metric D_1 differs from the original frame difference metric D only if a consistent trend of luminance shift is observed and the shift strength is large enough. D_1 is equal to or less than D. If the change of luminance is steady (d_P = d_N), the modified frame difference metric D_1 is lower than the original frame difference metric D, the lowest ratio being (1 − α).
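A sketch of equation 57, using the Δ = 5.5 and α = 0.4 operating point reported in the simulation below (the function name is an assumption):

```python
def modified_frame_difference(d, y_c, y_p, y_n, alpha=0.4, delta=5.5):
    """Equation 57: de-emphasize D when a consistent, strong luminance trend
    (a cross-fade) is observed; otherwise leave D unchanged."""
    d_p, d_n = abs(y_c - y_p), abs(y_c - y_n)
    monotonic = (y_p - delta >= y_c >= y_n + delta) or \
                (y_p + delta <= y_c <= y_n - delta)
    if monotonic:  # the guard ensures d_p + d_n > 0 here
        return (1.0 - alpha + alpha * abs(d_p - d_n) / (d_p + d_n)) * d
    return d
```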
Table 1 below shows the performance improvement obtained by adding abrupt scene change detection. The total numbers of I frames in the non-scene-change (NSC) and scene-change (SC) cases are approximately the same. In the NSC case, the I frames are distributed uniformly over the whole sequence, while in the SC case, the I frames are assigned only to abrupt scene change frames.
It can be seen that an improvement of 0.2 to 0.3 dB in PSNR is typically achieved. The simulation results show that the shot detector is very accurate in determining the shot events mentioned above. Simulation of five clips with normal cross-fade effects shows that, with Δ = 5.5 and α = 0.4, a PSNR gain of 0.226031 dB is achieved at the same bit rate.
Sequence        Metric  Bit rate (kbps)  Average QP  PSNR (dB)
Animation       NSC     226.2403         31.1696     35.6426
Animation       SC      232.8023         29.8171     36.4513
Music           NSC     246.6394         32.8524     35.9337
Music           SC      250.0994         32.3209     36.1202
Headline news   NSC     216.9493         29.8304     38.9804
Headline news   SC      220.2512         28.9011     39.3151
Basketball      NSC     256.8726         33.1429     33.5262
Basketball      SC      254.9242         32.4341     33.8635

Table 1: Simulation results of abrupt scene change detection
Adaptive GOP structure
An illustrative example of adaptive GOP structure operation is described below. This operation can be included in the GOP partitioner 412 of Figure 4. MPEG2, an older video compression standard, does not require that the GOP have a regular structure, although one can be imposed. An MPEG2 sequence always begins with an I frame, that is, one which has been encoded without reference to previous pictures. The MPEG2 GOP format is usually prearranged at the encoder by fixing the spacing in the GOP of the P (predictive) pictures that follow the I frame. A P frame is a picture that has been partially predicted from previous I or P pictures. The frames between the starting I frame and the succeeding P frames are encoded as B frames. A "B" frame (B stands for bidirectional) can use the previous and next I or P pictures, either individually or simultaneously, as references. The number of bits needed to encode an I frame on average exceeds the number of bits needed to encode a P frame; likewise, the number of bits needed to encode a P frame on average exceeds that required for a B frame. A skipped frame, if it is used, requires no bits for its representation.
One benefit of using P frames, B frames, and (in more recent compression algorithms) skipped frames is that it is possible to reduce the size of the transmitted video. When temporal redundancy is high (for example, when there is little change from picture to picture), the use of P pictures, B pictures, or skipped pictures efficiently represents the video stream, because I or P pictures decoded earlier serve later as references for decoding other P or B pictures.
The group-of-pictures partitioner adaptively encodes frames to minimize temporal redundancy. Differences between frames are quantified, and a decision to represent a picture by an I frame, a P frame, a B frame, or a skipped frame is automatically made after suitable tests are performed on the quantified differences. The processing in the GOP partitioner is aided by other operations of the preprocessor 202, which provide filtering for noise removal.
The adaptive encoding process has advantages unavailable in a "fixed" encoding process. A fixed process ignores the possibility that little change in content has taken place; an adaptive procedure, however, allows far more B frames to be inserted between each I and P frame, or between two P frames, thereby reducing the number of bits needed to adequately represent the sequence of frames. Conversely, in a fixed encoding process, when the change in video content is significant, the efficiency of P frames is greatly reduced because the difference between the predicted and the reference frames is too large. Under those conditions, matching objects may fall out of the motion search regions, or the similarity between matching objects is reduced due to distortion caused by changes in camera angle. The adaptive encoding process can beneficially be used to optionally determine when P frames should be encoded.
In the system disclosed herein, the types of conditions described above are automatically sensed. The adaptive encoding process described herein is flexible and is made to adapt to these changes in content. The adaptive encoding process evaluates a frame difference metric, which can be thought of as a measure of distance between frames, with the same additive properties of distance. In concept, given frames F_1, F_2, and F_3 having inter-frame distances d_12 and d_23, the distance between F_1 and F_3 is taken to be at least d_12 + d_23. Frame assignments are made on the basis of this distance-like metric and other measures.
The GOP partitioner 412 operates by assigning picture types to frames as they are received. The picture type indicates the method of prediction that may be needed to encode each block:
I pictures are encoded without reference to other pictures. Since they stand alone, they provide access points in the data stream where decoding can begin. An I encoding type is assigned to a frame if the "distance" to its predecessor frame exceeds a scene change threshold.
P pictures can use the previous I or P pictures for motion-compensated prediction. They use blocks in the preceding fields or frames, which may be displaced from the block being predicted, as a basis for encoding. After the reference block is subtracted from the block being considered, the residual block is encoded, typically using the discrete cosine transform to eliminate spatial redundancy. A P encoding type is assigned to a frame if the "distance" between it and the last frame assigned to be a P frame exceeds a second threshold, which is typically less than the first.
B frame pictures can use the previous and next P or I pictures for motion compensation, as described above. A block in a B picture can be forward, backward, or bidirectionally predicted; or it can be intra-coded without reference to other frames. In H.264, a reference block can be a linear combination of up to 32 blocks from up to 32 frames. If the frame cannot be assigned to be an I or a P type, it is assigned to be a B type if the "distance" from it to its immediate predecessor is greater than a third threshold, which is typically less than the second threshold. If the frame cannot be assigned to become a B-encoded frame, it is assigned the "skip frame" status. This frame can be skipped because it is virtually a copy of a previous frame.
Evaluating a metric that quantifies the difference between adjacent frames in display order is the first part of this processing, and it takes place in the GOP partitioner 412. This metric is the distance referred to above; with it, every frame is evaluated for its proper type. Thus, the spacing between an I frame and adjacent P frames, or between two successive P frames, can be variable. Computing the metric begins by processing the video frames with a block-based motion compensator; a block, the basic unit of video compression, usually comprises 16×16 pixels, though other block sizes such as 8×8, 4×4, and 8×16 are possible. For frames consisting of two deinterlaced fields that are present at the output, the motion compensation is done on a field basis, the search for the reference blocks taking place in fields rather than frames. For a block in the first field of the current frame, a forward reference block is found in the fields of the frame that follows it; likewise, a backward reference block is found in the fields of the frame that immediately precedes the current field. The current blocks are assembled into a compensated field. The process continues with the second field of the frame. The two compensated fields are combined to form forward and backward compensated frames.
For frames created in the inverse telecine process 406, the search for reference blocks can be on a frame basis only, since only reconstructed film frames are generated. Two reference blocks and two differences, forward and backward, are found, likewise producing forward and backward compensated frames. In summary, the motion compensator produces motion vectors and difference metrics for every block. Note that the differences of the metric are evaluated between a block in the field or frame being considered and the block that best matches it, either in a preceding field or frame or in the field or frame that immediately follows it, depending on whether a forward or a backward difference is being evaluated. Only luminance values enter into this calculation.
The motion compensation step thus generates two sets of differences. These are between blocks of current luminance values and blocks of luminance values in reference blocks taken from frames that are immediately ahead of and immediately behind the current frame in time. The absolute value of each forward and each backward difference is determined for each pixel in a block, and each is separately summed over the entire frame. Both fields are included in the two summations when the deinterlaced NTSC fields that comprise a frame are processed. In this way, SAD_P and SAD_N, the summed absolute values of the forward and backward differences, are found.
For every frame, a SAD ratio is calculated using the relationship in equation 58:

γ = (ε + SAD_P) / (ε + SAD_N)    (58)

where SAD_P and SAD_N are the summed absolute values of the forward and backward differences, respectively. A small positive number ε is added to the numerator to prevent a divide-by-zero error. A similar ε term is added to the denominator, further reducing the sensitivity of γ when either SAD_P or SAD_N is close to zero.
In an alternative aspect, the difference can be the SSD (sum of squared differences), the SAD (sum of absolute differences), or the SATD (in which the blocks of pixel values are transformed by applying the two-dimensional discrete cosine transform to them before differences in block elements are taken). The sums are evaluated over the area of active video, though a smaller area may be used in other aspects.
The luminance histogram of every frame as received (non-motion compensated) is also computed. The histogram operates on the DC coefficient, that is, the (0,0) coefficient, in the 16×16 array of coefficients that results from applying the two-dimensional discrete cosine transform to the block of luminance values, if it is available. Equivalently, the average value of the 256 luminance values in a 16×16 block may be used in the histogram. For images whose luminance depth is eight bits, the number of bins is set at 16. The next metric evaluates the histogram difference as in equation 59:

λ = ( Σ_{i=1}^{16} |N_Pi − N_Ci| ) / N    (59)

where N_Pi is the number of blocks from the previous frame in the i-th bin, N_Ci is the number of blocks from the current frame that belong in the i-th bin, and N is the total number of blocks in a frame.
These intermediate results are assembled to form the current frame difference metric as in equation 60:

M = γ_C / γ_P + λ(2λ + 1)    (60)

where γ_C is the SAD ratio based on the current frame and γ_P is the SAD ratio based on the previous frame. If a scene has smooth motion and its luminance histogram barely changes, then M ≈ 1. If the current frame displays an abrupt scene change, then γ_C will be large and γ_P should be small. The ratio γ_C/γ_P is used instead of γ_C alone so that the metric is normalized to the activity level of the context.
The data flow 4100 in Figure 40 illustrates certain components that can be used to compute the frame difference metric. A preprocessor 4125 delivers frames of interlaced fields (in the case of video having an NTSC source) and film images (when the video source is the result of inverse telecine) to a bidirectional motion compensator 4133. The bidirectional motion compensator 4133 operates on a field (or on a frame, in the case of a film video source) by splitting it into blocks of 16×16 pixels and comparing each block to all the 16×16 blocks in a defined area of a field of the previous frame. The block which provides the best match is selected and subtracted from the current block. The absolute values of the differences are taken and the result is summed over the 256 pixels that comprise the current block. When this has been done for all current blocks of the field, and then for both fields, the quantity SAD_N, the backward difference metric, has been computed by a backward difference module 4137. A similar procedure may be performed by a forward difference module 4136, which uses the frame immediately ahead of the current one in time as the source of reference blocks to develop SAD_P, the forward difference metric. The same estimation process also takes place when the input frames are formed in the inverse telecine, though the operation then uses recovered film frames. The histograms needed to complete the computation of the frame difference metric can be formed in a histogram difference module 4141. Each 16×16 block is assigned to a bin based on the average value of its luminance. This information is formed by adding all 256 pixel luminance values in a block together, normalizing the sum by 256 if desired, and incrementing the count of the bin into which the average value falls. Each frame is computed once in advance of motion compensation; when a new current frame arrives, the histogram for the current frame becomes the histogram for the previous frame. The two histograms are differenced and normalized by the number of blocks in the histogram difference module 4141 to form λ as defined by equation 59. These results are combined in a frame difference combiner 4143, which uses the intermediate results obtained in the histogram difference module 4141 and the forward and backward difference modules 4136 and 4137 to evaluate the current frame difference defined in equation 60.
The system of flowchart 4100 and its components or steps can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. Each functional component of flowchart 4100, including the preprocessor 4135, the bidirectional motion compensator 4133, the forward and backward difference metric modules 4136 and 4137, the histogram difference module 4141, and the frame difference metric combiner 4143, may be realized as a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, or implemented in microcode or software executed on a processor, or in a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the tasks may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
The received and processed data can be stored in a storage medium, which can include, for example, a chip-configured storage medium (for example, ROM, RAM) or a disc-type storage medium (for example, magnetic or optical) connected to a processor. In some aspects, the combiner 4143 can contain some or all of the storage medium. The flowchart 4200 in Figure 41 illustrates the process of assigning compression types to frames. In one aspect, M, the current frame difference defined in equation 60, is the basis for all decisions made with respect to frame assignments. As decision block 4253 indicates, if the frame under consideration is the first in a sequence, the decision path marked YES is followed to block 4255, thereby declaring the frame to be an I frame. The accumulated frame difference is set to zero in block 4257, and the process returns (in block 4258) to the start block 4253. If the frame being considered is not the first frame in a sequence, the path marked NO is followed from block 4253, and in test block 4259 the current frame difference is tested against the scene change threshold. If the current frame difference is larger than that threshold, the decision path marked YES leads to block 4255, again resulting in the assignment of an I frame. If the current frame difference is less than the scene change threshold, the NO path leads to block 4261, where the current frame difference is added to the accumulated frame difference.
Continuing through the flowchart, at decision block 4263 the accumulated frame difference is compared with a threshold t, which is in general smaller than the scene change threshold. If the accumulated frame difference is larger than t, control transfers to block 4265 and the frame is assigned to be a P frame; the accumulated frame difference is then reset to zero in block 4267. If the accumulated frame difference is less than t, control transfers from block 4263 to block 4269. There the current frame difference is compared with τ, which is smaller than t. If the current frame difference is smaller than τ, the frame is assigned to be skipped in block 4273; if the current frame difference is larger than τ, the frame is assigned to be a B frame.
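The decision logic of flowchart 4200 can be sketched as a single loop over per-frame difference values. All three thresholds below are illustrative assumptions; the source only requires τ < t < scene-change threshold:

```python
def assign_picture_types(frame_diffs, scene_thresh=5.0, t=2.0, tau=0.5):
    """Flowchart 4200: assign I, P, B, or SKIP to each frame from its
    current frame difference M, tracking the accumulated difference."""
    types, accumulated = [], 0.0
    for i, m in enumerate(frame_diffs):
        if i == 0 or m > scene_thresh:   # first frame or scene change: I frame
            types.append("I")
            accumulated = 0.0            # block 4257
        else:
            accumulated += m             # block 4261
            if accumulated > t:          # block 4263: enough change for a P frame
                types.append("P")
                accumulated = 0.0        # block 4267
            elif m < tau:                # block 4269: nearly a duplicate
                types.append("SKIP")
            else:
                types.append("B")
    return types
```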
In an alternative aspect, another frame encoding complexity indicator M* is defined as in equation 61:
M* = M × min(1, α·max(0, SAD_P − s) × max(0, MV_P − m))    (61)
where α is a scaler, SAD_P is the SAD with forward motion compensation, MV_P is the sum of the lengths, measured in pixels, of the motion vectors from the forward motion compensation, and s and m are two threshold numbers that render the frame encoding complexity indicator zero if SAD_P is lower than s or MV_P is lower than m. M* is used in place of the current frame difference in the flowchart 4200 of Figure 41. As can be seen, M* differs from M only if the forward motion compensation shows a low level of movement; in that case, M* is smaller than M.
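Equation 61 in code form; α, s, and m below are illustrative scaling choices, not values given by the source:

```python
def complexity_indicator(m_metric, sad_p, mv_p, alpha=1e-3, s=1000.0, m_thr=10.0):
    """Equation 61: M* collapses to zero for low-motion frames (SAD_P < s
    or MV_P < m) and never exceeds the original metric M."""
    return m_metric * min(1.0, alpha * max(0.0, sad_p - s) * max(0.0, mv_p - m_thr))
```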
It should be noted that the shot detection and encoding aspects described herein may be described as a process, which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although the flowcharts shown in the figures may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
Those skilled in the art will further appreciate that one or more elements of a device disclosed herein may be rearranged without affecting the operation of the device. Similarly, one or more elements of a device disclosed herein may be combined without affecting the operation of the device. Those of ordinary skill in the art will understand that information and multimedia data may be represented using any of a variety of different technologies and techniques. Those of ordinary skill will further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, firmware, computer software, middleware, microcode, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed methods.
For example, the steps of a method or algorithm described in connection with the shot detection and encoding examples and figures disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The methods and algorithms are particularly applicable to communication technology, including wireless transmission of video to mobile phones, computers, laptop computers, PDAs, and all types of personal and business communication devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a modem. In the alternative, the processor and the storage medium may reside as discrete components in a modem.
In addition, the various illustrative logical blocks, components, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The previous description of the disclosed examples is provided to enable any person of ordinary skill in the art to make or use the disclosed methods and apparatus. Various modifications to these examples will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other examples, and additional elements may be added, without departing from the spirit or scope of the disclosed methods and apparatus. The description of the aspects is intended to be illustrative, and not to limit the scope of the claims.

Claims (21)

1. A method of processing multimedia data, the method comprising:
receiving digital interlaced video frames;
converting the digital interlaced video frames into digital progressive video frames by deinterlacing the digital interlaced video frames,
wherein the deinterlacing comprises:
generating spatial information for the digital interlaced video frames and motion information for at least one frame of the digital interlaced video frames; and
generating the digital progressive video frames using the spatial information and the motion information;
wherein pixel information corresponding to a first frame, generated using both the spatial information and the motion information, is given more weight than pixel information corresponding to a second frame immediately following the first frame that is generated without using the motion information.
2. The method of claim 1, wherein the deinterlacing further comprises:
generating bidirectional motion information for the digital interlaced video frames; and
generating the digital progressive video frames based on the digital interlaced video frames using the bidirectional motion information.
3. The method of claim 1, wherein converting the digital interlaced video frames comprises inverse telecining 3:2 pulldown video frames.
4. The method of claim 1, further comprising resizing the digital progressive video frames.
5. The method of claim 1, further comprising filtering the digital progressive video frames with a denoising filter.
6. The method of claim 1, further comprising: generating metadata based on the converted digital progressive video frames;
determining encoding parameters based on the metadata; and encoding the digital progressive video frames according to the encoding parameters.
7. An apparatus for processing multimedia data, the apparatus comprising:
a receiver configured to receive digital interlaced video frames;
a deinterlacer configured to convert the digital interlaced video frames into digital progressive video frames by deinterlacing the digital interlaced video frames, wherein the deinterlacing comprises:
generating spatial information for the digital interlaced video frames and motion information for at least one frame of the digital interlaced video frames; and
generating the digital progressive video frames using the spatial information and the motion information;
wherein pixel information corresponding to a first frame, generated using both the spatial information and the motion information, is given more weight than pixel information corresponding to a second frame immediately following the first frame that is generated without using the motion information.
8. The apparatus of claim 7, further comprising an encoder configured to receive the digital progressive video frames and to encode the digital progressive video frames according to compression information generated by a partitioner, the partitioner being configured to generate metadata associated with the digital progressive video frames.
9. The apparatus of claim 7, further comprising a denoising filter for denoising the digital progressive video frames.
10. The apparatus of claim 7, wherein the deinterlacer comprises an inverse teleciner.
11. The apparatus of claim 7, further comprising a resampler configured to resize the digital progressive video frames.
12. The apparatus of claim 7, wherein the deinterlacer is configured to:
generate bidirectional motion information for the digital interlaced video frames; and
generate the digital progressive video frames based on the digital interlaced video frames using the bidirectional motion information.
13. The apparatus of claim 7, further comprising a partitioner configured to: generate metadata associated with the digital progressive video frames, and provide the digital progressive video frames and the metadata to an encoder for encoding of the digital progressive video frames, wherein the metadata comprises compression information.
14. An apparatus for processing multimedia data, the apparatus comprising:
means for receiving digital interlaced video frames;
means for converting the digital interlaced video frames into digital progressive video frames by deinterlacing the digital interlaced video frames, wherein the deinterlacing comprises:
generating spatial information for the digital interlaced video frames and motion information for at least one frame of the digital interlaced video frames; and
generating the digital progressive video frames using the spatial information and the motion information;
wherein pixel information corresponding to a first frame, generated using both the spatial information and the motion information, is given more weight than pixel information corresponding to a second frame immediately following the first frame that is generated without using the motion information.
15. The apparatus of claim 14, wherein the converting means comprises an inverse teleciner.
16. The apparatus of claim 14, further comprising resampling means for resampling the progressive frames for resizing.
17. The apparatus of claim 14, further comprising encoding means for encoding the digital progressive video frames using provided metadata associated with the digital progressive video frames.
18. The apparatus of claim 14, further comprising denoising means for denoising the digital progressive video frames.
19. The apparatus of claim 14, wherein the converting means is configured to: generate bidirectional motion information for the digital interlaced video frames; and generate the digital progressive video frames based on the interlaced video frames using the bidirectional motion information.
20. The apparatus of claim 14, further comprising: means for generating metadata associated with the digital progressive video frames; and means for providing the digital progressive video frames and at least part of the metadata to an encoder for encoding of the digital progressive video frames, wherein encoding parameters are determined based on the at least part of the metadata.
21. A processor configured to:
receive digital interlaced video frames;
convert the digital interlaced video frames into digital progressive video frames by deinterlacing the digital interlaced video frames,
wherein the deinterlacing comprises:
generating spatial information for the digital interlaced video frames and motion information for at least one frame of the digital interlaced video frames; and
generating the digital progressive video frames using the spatial information and the motion information;
wherein pixel information corresponding to a first frame, generated using both the spatial information and the motion information, is given more weight than pixel information corresponding to a second frame immediately following the first frame that is generated without using the motion information.
CN201410438251.8A 2006-04-03 2007-03-13 Preprocessor method and equipment Expired - Fee Related CN104159060B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US78904806P 2006-04-03 2006-04-03
US60/789,048 2006-04-03
US78937706P 2006-04-04 2006-04-04
US78926606P 2006-04-04 2006-04-04
US60/789,377 2006-04-04
US60/789,266 2006-04-04
CNA2007800107539A CN101411183A (en) 2006-04-03 2007-03-13 Preprocessor method and apparatus

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNA2007800107539A Division CN101411183A (en) 2006-04-03 2007-03-13 Preprocessor method and apparatus

Publications (2)

Publication Number Publication Date
CN104159060A CN104159060A (en) 2014-11-19
CN104159060B true CN104159060B (en) 2017-10-24

Family

ID=38121947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410438251.8A Expired - Fee Related CN104159060B (en) 2006-04-03 2007-03-13 Preprocessor method and equipment

Country Status (7)

Country Link
EP (1) EP2002650A1 (en)
JP (3) JP2009532741A (en)
KR (5) KR101373896B1 (en)
CN (1) CN104159060B (en)
AR (1) AR060254A1 (en)
TW (1) TW200803504A (en)
WO (1) WO2007114995A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI396975B (en) * 2008-08-06 2013-05-21 Realtek Semiconductor Corp Adaptable buffer device and method thereof
TWI392335B (en) * 2009-08-14 2013-04-01 Sunplus Technology Co Ltd De-ring system and method for reducing the overshooting and undershooting of a video signal in a scaler
KR101839931B1 2009-11-30 2018-03-19 Semiconductor Energy Laboratory Co., Ltd. Liquid crystal display device, method for driving the same, and electronic device including the same
WO2012100117A1 (en) * 2011-01-21 2012-07-26 Thomson Licensing System and method for enhanced remote transcoding using content profiling
CN103907136A (en) * 2011-10-01 2014-07-02 英特尔公司 Systems, methods and computer program products for integrated post-processing and pre-processing in video transcoding
KR101906946B1 2011-12-02 2018-10-12 Samsung Electronics Co., Ltd. High density semiconductor memory device
US10136147B2 (en) 2014-06-11 2018-11-20 Dolby Laboratories Licensing Corporation Efficient transcoding for backward-compatible wide dynamic range codec
US11202074B2 (en) * 2016-03-07 2021-12-14 Sony Corporation Encoding apparatus and encoding method
CN111656246B (en) * 2018-01-02 2022-07-29 伦敦大学国王学院 Method and system for positioning microscopy, computer readable storage medium
CN111310744B * 2020-05-11 2020-08-11 Tencent Technology (Shenzhen) Co., Ltd. Image recognition method, video playing method, related device and medium
CN114363638B * 2021-12-08 2022-08-19 Huizhi'an Information Technology Co., Ltd. Video encryption method based on H.265 entropy coding binarization
CN114125346B * 2021-12-24 2023-08-29 Chengdu Sobey Digital Technology Co., Ltd. Video conversion method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864369A (en) * 1997-06-16 1999-01-26 Ati International Srl Method and apparatus for providing interlaced video on a progressive display
EP1005227A2 (en) * 1998-11-25 2000-05-31 Sharp Kabushiki Kaisha Low-delay interlace to progressive conversion with field averaging of a film sourced video signal
CN1372769A * 2000-03-13 2002-10-02 Sony Corp Method and apparatus for generating compact transcoding hints metadata
EP1164792A3 (en) * 2000-06-13 2003-08-13 Samsung Electronics Co., Ltd. Format converter using bidirectional motion vector and method thereof

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2700090B1 1992-12-30 1995-01-27 Thomson Csf Method for deinterlacing frames of a sequence of moving images.
KR100371039B1 * 1994-04-05 2003-05-12 Koninklijke Philips Electronics N.V. Interlaced-to-sequential scan conversion
JP2832927B2 * 1994-10-31 1998-12-09 Victor Company of Japan, Ltd. Scanning line interpolation apparatus and motion vector detection apparatus for scanning line interpolation
JPH09284770A * 1996-04-13 1997-10-31 Sony Corp Image coding device and method
JP3649370B2 * 1998-02-25 2005-05-18 Victor Company of Japan, Ltd. Motion compensation coding apparatus and motion compensation coding method
JP3588564B2 * 1999-03-31 2004-11-10 Toshiba Corp Video data recording device
JP2001204026A * 2000-01-21 2001-07-27 Sony Corp Image information converter and method
US6970513B1 * 2001-06-05 2005-11-29 At&T Corp. System for content adaptive video decoding
KR100393066B1 2001-06-11 2003-07-31 Samsung Electronics Co., Ltd. Apparatus and method for adaptive motion compensated de-interlacing of video data using adaptive compensated interpolation and method thereof
US6784942B2 * 2001-10-05 2004-08-31 Genesis Microchip, Inc. Motion adaptive de-interlacing method and apparatus
JP4016646B2 * 2001-11-30 2007-12-05 Victor Company of Japan, Ltd. Progressive scan conversion apparatus and progressive scan conversion method
KR100446083B1 * 2002-01-02 2004-08-30 Samsung Electronics Co., Ltd. Apparatus for motion estimation and mode decision and method thereof
KR100850706B1 * 2002-05-22 2008-08-06 Samsung Electronics Co., Ltd. Method for adaptive encoding and decoding of a motion image and apparatus thereof
KR20060011281A * 2004-07-30 2006-02-03 Han Jong-ki Apparatus for converting resolution of image applied to transcoder and method of the same
JP2006074684A * 2004-09-06 2006-03-16 Matsushita Electric Ind Co Ltd Image processing method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864369A (en) * 1997-06-16 1999-01-26 Ati International Srl Method and apparatus for providing interlaced video on a progressive display
EP1005227A2 (en) * 1998-11-25 2000-05-31 Sharp Kabushiki Kaisha Low-delay interlace to progressive conversion with field averaging of a film sourced video signal
CN1372769A * 2000-03-13 2002-10-02 Sony Corp Method and apparatus for generating compact transcoding hints metadata
EP1164792A3 (en) * 2000-06-13 2003-08-13 Samsung Electronics Co., Ltd. Format converter using bidirectional motion vector and method thereof

Also Published As

Publication number Publication date
KR20100126506A (en) 2010-12-01
KR101019010B1 (en) 2011-03-04
TW200803504A (en) 2008-01-01
JP6352173B2 (en) 2018-07-04
CN104159060A (en) 2014-11-19
KR20110128366A (en) 2011-11-29
JP2015109662A (en) 2015-06-11
JP2013031171A (en) 2013-02-07
JP5897419B2 (en) 2016-03-30
EP2002650A1 (en) 2008-12-17
KR20140010190A (en) 2014-01-23
KR101373896B1 (en) 2014-03-12
KR101377370B1 (en) 2014-03-26
WO2007114995A1 (en) 2007-10-11
AR060254A1 (en) 2008-06-04
JP2009532741A (en) 2009-09-10
KR101127432B1 (en) 2012-07-04
KR20120091423A (en) 2012-08-17
KR20090006159A (en) 2009-01-14

Similar Documents

Publication Publication Date Title
CN104159060B (en) Preprocessor method and equipment
US9131164B2 (en) Preprocessor method and apparatus
US7860167B2 (en) Apparatus and method for adaptive 3D artifact reducing for encoded image signal
US8750372B2 (en) Treating video information
US6862372B2 (en) System for and method of sharpness enhancement using coding information and local spatial features
RU2378790C1 (en) Scalability techniques based on content information
Lee et al. Loop filtering and post-filtering for low-bit-rates moving picture coding
US20100303150A1 (en) System and method for cartoon compression
TW200803517A (en) Redundant data encoding methods and device
Yuen Coding artifacts and visual distortions
JP2009532741A6 (en) Preprocessor method and apparatus
US7031388B2 (en) System for and method of sharpness enhancement for coded digital video
US6873657B2 (en) Method of and system for improving temporal consistency in sharpness enhancement for a video signal
Chen et al. Design a deblocking filter with three separate modes in DCT-based coding
CN101411183A (en) Preprocessor method and apparatus
WO1999059342A1 (en) Method and system for mpeg-2 encoding with frame partitioning
Boroczky et al. Artifact reduction for MPEG-2 encoded video using a unified metric for digital video processing
Cao et al. An Effective Error Concealment Method Based on the Scene Change

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171024