CN101313592A - Methods and device for data alignment with time domain boundary - Google Patents

Methods and device for data alignment with time domain boundary

Info

Publication number
CN101313592A
Authority
CN
China
Prior art keywords
frame
data
content
multimedia data
encoded
Prior art date
Legal status
Granted
Application number
CN200680044013.2A
Other languages
Chinese (zh)
Other versions
CN101313592B (en)
Inventor
Vijayalakshmi R. Raveendran
Gordon Kent Walker
Tao Tian
Phanikumar Bhamidipati
Fang Shi
Peisong Chen
Sitaraman Ganapathy Subramania
Seyfullah Halit Oguz
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Application filed by Qualcomm Inc
Priority claimed from PCT/US2006/037994 (WO2007038725A2)
Publication of CN101313592A
Application granted
Publication of CN101313592B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Apparatus and methods of using content information for encoding multimedia data are described. A method of processing multimedia data includes receiving content information of the multimedia data, and encoding the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information. In another aspect, a method of processing multimedia data includes obtaining a content classification of the multimedia data and encoding blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data. Apparatus capable of processing multimedia data as described in the methods is also claimed.

Description

Method and apparatus for data alignment with a time domain boundary
Claim of priority under 35 U.S.C. § 119
The present application for patent claims priority to the following applications: (a) Provisional Application No. 60/721,416, entitled "A VIDEO TRANSCODER FOR REAL-TIME STREAMING AND MOBILE BROADCAST APPLICATIONS," filed September 27, 2005; (b) Provisional Application No. 60/789,377, entitled "A VIDEO TRANSCODER FOR REAL-TIME STREAMING AND MOBILE BROADCAST APPLICATIONS," filed April 4, 2006; (c) Provisional Application No. 60/727,643, entitled "METHOD AND APPARATUS FOR SPATIO-TEMPORAL DEINTERLACING AIDED BY MOTION COMPENSATION FOR FIELD-BASED VIDEO," filed October 17, 2005; (d) Provisional Application No. 60/727,644, entitled "METHOD AND APPARATUS FOR SHOT DETECTION IN VIDEO STREAMING," filed October 17, 2005; (e) Provisional Application No. 60/727,640, entitled "A METHOD AND APPARATUS FOR USING AN ADAPTIVE GOP STRUCTURE IN VIDEO STREAMING," filed October 17, 2005; (f) Provisional Application No. 60/730,145, entitled "INVERSE TELECINE ALGORITHM BASED ON STATE MACHINE," filed October 24, 2005; and (g) Provisional Application No. 60/789,048, entitled "SPATIO-TEMPORAL DEINTERLACING AIDED BY MOTION COMPENSATION FOR FIELD-BASED MULTIMEDIA DATA," filed April 3, 2006. All seven of these provisional patent applications are assigned to the assignee of the present application and are expressly incorporated herein by reference.
Reference to co-pending patent application
The present application for patent is related to U.S. Patent Application No. 11/373,577, entitled "CONTENT CLASSIFICATION FOR MULTIMEDIA PROCESSING," filed March 10, 2006, which is assigned to the assignee of the present application and is expressly incorporated herein by reference.
Technical field
The present application is directed to apparatus and methods for transcoding video data for real-time streaming and, more particularly, to transcoding video data for real-time streaming in mobile broadcast applications.
Background
Because of limited bandwidth resources and the variability of available bandwidth, efficient video compression is useful in many multimedia applications, such as wireless video streaming and video telephony. Certain video coding standards, such as MPEG-4 (ISO/IEC) and H.264 (ITU), provide highly efficient coding well suited to applications such as wireless broadcasting. Some multimedia data, for example digital television presentations, is typically encoded according to other standards such as MPEG-2. Accordingly, transcoders are used to transcode or convert multimedia data encoded according to one standard (e.g., MPEG-2) to another standard (e.g., H.264) prior to wireless broadcast.
Improved, rate-optimized codecs can offer advantages in error resilience, error recovery, and scalability. Moreover, information determined from the multimedia data itself can be used to provide further improvements to the coding, including improvements in error resilience, error recovery, and scalability. Accordingly, there is a need for a transcoder that provides efficient processing and compression of multimedia data, that uses information determined from the multimedia data itself, that is scalable, and that is error resilient, for use in many multimedia data applications, including mobile broadcast of streaming multimedia information.
Summary of the invention
Each of the inventive content-based transcoding apparatus and methods described and illustrated has several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the invention, its more prominent features will now be discussed briefly. After considering this discussion, and particularly after reading the section entitled "Detailed Description," one will understand how the features of this content-driven transcoding provide improvements for multimedia data processing apparatus and methods.
Aspects of the invention described herein relate to methods of using content information in encoding multimedia data, and to the various modules or components of an encoder (for example, an encoder used in a transcoder). The transcoder can use content information to coordinate transcoding of the multimedia data. The content information can be received from another source, for example, as metadata received with the video. The transcoder can be configured to generate content information through a variety of processing operations. In some aspects, the transcoder generates a content classification of the multimedia data and then uses the content classification in one or more encoding processes. In some aspects, a content-driven transcoder can determine spatial and temporal content information of the multimedia data, and use that content information for content-aware uniform-quality encoding across a channel and for content-classification-based compression and bit allocation.
In some aspects, content information (for example, metadata, content metrics, and/or a content classification) of the multimedia data is obtained or calculated, and is then provided to components of the transcoder used in processing the multimedia data for encoding. For example, a preprocessor can use certain content information for scene-change detection; for performing inverse telecine ("IVTC"), deinterlacing, motion compensation, and noise suppression (e.g., 2D wavelet transform); and for spatio-temporal noise reduction (e.g., artifact removal, de-ringing, de-blocking, and/or de-noising). In some aspects, the preprocessor can also use content information for spatial-resolution downsampling, for example, in determining appropriate "safe action" and "safe title" areas when downsampling from standard definition (SD) to Quarter Video Graphics Array (QVGA).
In some aspects, the encoder includes a content classification module configured to calculate content information. The encoder can use the content classification for bit rate control (e.g., bit allocation) in determining the quantization parameter (QP) of each macroblock; for motion estimation (e.g., performing color motion estimation (ME) and motion vector (MV) prediction); for providing scalability in a base layer and an enhancement layer; and for error resilience, where the content classification is used to influence the prediction hierarchy and error resilience schemes (including, for example, adaptive intra refresh and boundary alignment processes) and to provide redundant I-frame data in the enhancement layer. In some aspects, the transcoder and a data multiplexer use the content classification to maintain optimum multimedia data quality on a channel. In some aspects, the encoder can use the content classification information to force periodic I-frames in the encoded data to enable fast channel switching. Such embodiments can also make use of I-blocks that may be required in the encoded data for error resilience purposes, so that random access switching and error resilience (based, for example, on the content classification) can be combined effectively through the prediction hierarchy to improve coding efficiency while increasing robustness to errors.
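As an illustration of such content-driven bit allocation, the following is a minimal sketch of how a per-macroblock QP could be modulated by a numeric content classification. The class-to-offset mapping and the clamping bounds are illustrative assumptions, not values specified in this disclosure.

```python
# Hypothetical sketch: modulate a macroblock QP by content classification.
# The offset scaling and bounds are illustrative assumptions, not values
# specified in the disclosure.

def macroblock_qp(base_qp: int, content_class: int, num_classes: int = 8) -> int:
    """Lower the QP (finer quantization, more bits) for high-activity content
    classes; raise it for low-activity classes, within H.264's 0..51 range."""
    offset = round(4 - 8 * content_class / (num_classes - 1))
    return max(0, min(51, base_qp + offset))

# A low-activity scene tolerates coarser quantization than a high-motion one:
print(macroblock_qp(base_qp=30, content_class=1))  # 33 (coarser)
print(macroblock_qp(base_qp=30, content_class=7))  # 26 (finer)
```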
In one aspect, a method of processing multimedia data includes obtaining content information of the multimedia data, and encoding the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information. The content information can comprise a content classification. Obtaining content information can include calculating the content information from the multimedia data. In some cases, the data boundary comprises an I-frame data boundary. The data boundary can also be the boundary of independently decodable encoded data of the multimedia data. In some cases, the data boundary comprises a slice boundary. The data boundary can also be an intra-coded access unit boundary, or a P-frame boundary or B-frame boundary. The content information can include the complexity of the multimedia data, and the complexity can include temporal complexity, spatial complexity, or both temporal and spatial complexity.
In another aspect, an apparatus for processing multimedia data includes a content classifier configured to determine a content classification of the multimedia data, and an encoder configured to encode the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information.
In another aspect, an apparatus for processing multimedia data includes means for obtaining content information of the multimedia data, and means for encoding the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information.
In another aspect, a processor is configured to obtain content information of multimedia data, and to encode the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information.
Another aspect comprises a machine-readable medium comprising instructions that, upon execution, cause a machine to obtain content information of multimedia data, and to encode the multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on the content information.
Another aspect comprises a method of processing multimedia data, including obtaining a content classification of the multimedia data, and encoding blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
In yet another aspect, an apparatus for processing multimedia data includes a content classifier configured to obtain a content classification of the multimedia data, and an encoder configured to encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
Another aspect comprises a processor configured to obtain a content classification of multimedia data, and to encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification so as to increase the error resilience of the encoded multimedia data.
In another aspect, an apparatus for processing multimedia data includes means for obtaining a content classification of the multimedia data, and means for encoding blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification so as to increase the error resilience of the encoded multimedia data.
Another aspect comprises a machine-readable medium comprising instructions that, upon execution, cause a machine to obtain a content classification of multimedia data, and to encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification so as to increase the error resilience of the encoded multimedia data.
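To make the intra/inter decision concrete, the following sketch biases the block mode decision toward intra coding for higher content classes. The cost model and bias values are illustrative assumptions; the disclosure specifies only that the decision is based on the content classification.

```python
# Hypothetical sketch: content-classification-driven intra/inter block mode
# decision. The bias rule is an illustrative assumption.

def choose_block_mode(intra_cost: float, inter_cost: float,
                      content_class: int, num_classes: int = 8) -> str:
    """Return 'intra' or 'inter' for a block. Higher content classes (high
    motion/texture) receive a stronger bias toward intra coding, since intra
    blocks stop temporal error propagation after data loss."""
    bias = 1.0 + 0.3 * content_class / (num_classes - 1)
    return "intra" if intra_cost < inter_cost * bias else "inter"

# The same costs yield different modes depending on the content class:
print(choose_block_mode(110.0, 100.0, content_class=7))  # intra
print(choose_block_mode(110.0, 100.0, content_class=0))  # inter
```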
Description of drawings
Figure 1A is a block diagram of a media broadcast system including a transcoder for transcoding between different video formats.
Figure 1B is a block diagram of an encoder configured to encode multimedia data and to provide an encoded first data set and an encoded second data set.
Figure 1C is a block diagram of a processor configured to encode multimedia data.
Figure 2 is a block diagram of an example of a transcoder for the system of Figure 1.
Figure 3 is a flowchart illustrating the operation of a parser used in the transcoder of Figure 2.
Figure 4 is a flowchart illustrating the operation of a decoder used in the transcoder of Figure 2.
Figure 5 is a system timing diagram illustrating the sequence of operations performed by the transcoder of Figure 2.
Figure 6 is a flowchart illustrating the operations and functional sequence of a preprocessor that can be used in the transcoder of Figure 2.
Figure 7 is a block diagram of an exemplary two-pass encoder that can be used in the transcoder of Figure 2.
Figure 8 illustrates an example of a classification chart illustrating how texture values and motion values are associated with content classification.
Figure 9 is a flowchart illustrating an example operation of content classification, for example, for use in the encoder of Figure 7.
Figure 10 is a flowchart illustrating the operation of rate control, for example, for use with the encoder of Figure 7.
Figure 11 is a flowchart illustrating the operation of an exemplary motion estimator, for example, for use with the encoder of Figure 7.
Figure 12 is a flowchart illustrating the operation of an exemplary mode-decision encoder function, for example, for use with the encoder of Figure 7.
Figure 13 is a flowchart illustrating an example operation for achieving scalability for the encoder of Figure 7.
Figure 14 is a flowchart illustrating an example operation of the rate-distortion data flow that occurs, for example, in the encoder of Figure 7.
Figure 15 is a graph illustrating the relationship between coding complexity, allocated bits, and human visual quality.
Figure 16 is a graph illustrating a nonlinear scene-detection formula.
Figure 17A is a flowchart of a process for encoding multimedia data so that a data boundary is aligned with a frame boundary in the time domain, wherein the encoding is based on content information.
Figure 17B is a high-level block diagram of an encoding device that can perform the processes illustrated in Figures 17A and 17C.
Figure 17C is a flowchart of a process for encoding multimedia data, wherein the multimedia data is encoded as intra-coded blocks or inter-coded blocks based on a content classification.
Figure 18 is a diagram illustrating a deinterlacing process that uses motion estimation/compensation.
Figure 19 is a block diagram of a multimedia communication system.
Figure 20 is a diagram illustrating the organization of a video bitstream in an enhancement layer and a base layer.
Figure 21 is a diagram illustrating the alignment of slices with video frame boundaries.
Figure 22 is a block diagram illustrating a prediction hierarchy.
Figure 23 is a flowchart illustrating a method of encoding multimedia data based on content information.
Figure 24 is a flowchart illustrating a method of encoding multimedia data based on a content information level so as to align data boundaries.
Figure 25 is a diagram illustrating the safe action area and the safe title area of a data frame.
Figure 26 is a diagram illustrating the safe action area of a data frame.
Figure 27 is a flowchart illustrating a process for encoding multimedia data using adaptive intra refresh based on multimedia content information.
Figure 28 is a flowchart illustrating a process for encoding multimedia data using redundant I-frames based on multimedia content information.
Figure 29 illustrates the motion compensation vector MV_P between a current frame and a previous frame, and the motion compensation vector MV_N between the current frame and a next frame.
Figure 30 is a flowchart illustrating shot detection.
Figure 31 is a flowchart illustrating the encoding of a base layer and an enhancement layer.
Figure 32 is a schematic diagram illustrating the encoding of macroblocks.
Figure 33 is a schematic diagram illustrating modules for encoding a base layer and an enhancement layer.
Figure 34 shows an example of a base-layer and enhancement-layer coefficient selector process.
Figure 35 shows another example of a base-layer and enhancement-layer coefficient selector process.
Figure 36 shows yet another example of a base-layer and enhancement-layer coefficient selector process.
Figure 37 is a flowchart illustrating the encoding of multimedia data based on content information.
Figure 38 is a diagram illustrating the possible system decisions in an inverse telecine process.
Figure 39 illustrates the boundaries in a macroblock to be filtered by a deblocking process.
Figure 40 is a diagram illustrating a spatio-temporal deinterlacing process.
Figure 41 illustrates an example of 1-D polyphase resampling.
Figure 42 is a flowchart illustrating an example of an adaptive GOP structure in video streaming.
Note that, where appropriate, like numerals refer to like parts throughout the several views of the drawings.
Detailed description
The following detailed description is directed to certain aspects discussed in this disclosure. However, the invention can be embodied in a multitude of different ways. Reference in this specification to "one aspect" or "an aspect" means that a particular feature, structure, or characteristic described in connection with the aspect is included in at least one aspect. The appearances of the phrases "in one aspect," "according to one aspect," or "in some aspects" in various places in the specification are not necessarily all referring to the same aspect, nor are separate or alternative aspects mutually exclusive of other aspects. Moreover, various features are described which may be exhibited by some aspects and not by others. Similarly, various requirements are described which may be requirements for some aspects but not for other aspects.
The following description includes numerous details to provide a thorough understanding of the examples. However, one of ordinary skill in the art will recognize that the examples can be practiced even if every detail of a process or device in an example or aspect is not described or illustrated herein. For example, electrical components may be shown in block diagrams that do not illustrate every electrical connection or every electrical element of a component, so as not to obscure the examples in unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the examples.
This disclosure relates to apparatus and methods for controlling encoding and transcoding using content information of the multimedia data being encoded. "Content information" or "content" (of multimedia data) are broad terms meaning information related to the content of the multimedia data, and can include, for example, metadata, metrics calculated from the multimedia data, and content-related information associated with one or more metrics (e.g., a content classification). Content information can be provided to an encoder or determined by the encoder, depending on the particular application. The content information can be used for many aspects of multimedia data encoding, including scene-change detection, temporal processing, spatio-temporal noise reduction, downsampling, determining bit rates for quantization, scalability, error resilience, maintaining optimum multimedia quality on a broadcast channel, and fast channel switching. Using one or more of these aspects, a transcoder can coordinate the processing of multimedia data and produce content-dependent encoded multimedia data. The descriptions and figures herein that describe transcoding aspects are also applicable to encoding aspects and decoding aspects.
The transcoding apparatus and methods relate to transcoding from one format to another, and are specifically described herein as relating to transcoding MPEG-2 video to an enhanced, scalable H.264 format for transmission to mobile devices over a wireless channel, which illustrates some aspects. However, the description of transcoding MPEG-2 video to H.264 is not intended to limit the scope of the invention, and is merely exemplary of some aspects of the invention. The disclosed apparatus and methods provide an efficient architecture supporting error-resilient coding with random access and layered resolution, and can also be applicable to transcoding and/or encoding video formats other than MPEG-2 and H.264.
"Multimedia data" or simply "multimedia," as used herein, is a broad term that encompasses video data (which can include audio data), audio data, or both video data and audio data. "Video data" or "video," as used herein as a broad term, refers to frame-based or field-based data, which includes one or more images or related image sequences containing text, image information, and/or audio data, and can also be used to refer to multimedia data (e.g., the terms can be used interchangeably), unless otherwise specified.
Examples of the various components of a transcoder, and examples of processes that can use content information to encode multimedia data, are described below.
Figure 1A is a block diagram illustrating the data flow of certain aspects of a multimedia data broadcast system 100. In system 100, a multimedia data provider 106 communicates encoded multimedia data 104 to a transcoder 200. The encoded multimedia data 104 is received by the transcoder 200, which at block 110 processes the multimedia data 104 into raw multimedia data. The processing at block 110 decodes and parses the encoded multimedia data 104, and further processes the multimedia data to prepare it for encoding into another format. The decoded multimedia data is provided to block 112, where it is encoded into a predetermined multimedia format or standard. Once the multimedia data has been encoded, at block 114 it is prepared for transmission via, for example, a wireless broadcast system (e.g., a cellular telephone broadcast network, or an alternative communication network). In some aspects, the received multimedia data 104 has been encoded according to the MPEG-2 standard. After decoding the multimedia data 104 to be transcoded, the transcoder 200 encodes the multimedia data to the H.264 standard.
Figure 1B is a block diagram of a transcoder 130 that can be configured to perform the processing in blocks 110 and 112 of Figure 1A. The transcoder 130 can be configured to receive multimedia data; to decode and parse the multimedia data into packetized elementary streams (for example, closed captions, audio, metadata, "raw" video, CC data, and presentation time stamps); to encode the packetized elementary streams into a desired format; and to provide the encoded data for further processing or transmission. The transcoder 130 can be configured to provide the encoded data in two or more data sets (for example, a first encoded data set and a second encoded data set), which is referred to as layered coding. In some examples of the various aspects, the data sets (or layers) of a layered coding scheme can be encoded at different quality levels and formatted such that data encoded in the first data set is of lower quality (e.g., provides a lower visual quality level when played) than data encoded in the second data set.
Figure 1C is a block diagram of a processor 140 that can be configured to transcode multimedia data, and that can be configured to perform some or all of the processing described in blocks 110 and 112 of Figure 1A. The processor 140 can include modules 124a...n for performing one or more of the transcoding processes described herein (including decoding, parsing, preprocessing, and encoding), and processes the data using content information. The processor 140 can also include internal memory 122, and can be configured to communicate with external memory 120 either directly or indirectly through another device. The processor 140 also includes a communication module 126 configured to communicate with one or more devices external to the processor 140, including receiving multimedia data and providing encoded data, such as data encoded in a first data set and data encoded in a second data set. In some examples of the various aspects, the data sets (or layers) of a layered coding scheme can be encoded at different quality levels and formatted such that data encoded in the first data set is of lower quality (e.g., provides a lower visual quality level when played) than data encoded in the second data set.
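The layered-coding notion used in Figures 1B and 1C can be sketched as follows, using a hypothetical encode_frame primitive. The scheme described later in this disclosure (Figures 31 through 36) derives the enhancement layer as a refinement of the base layer rather than as an independent second encoding.

```python
# Minimal sketch of two-layer output at different quality levels.
# encode_frame is a hypothetical stand-in for a real encoder call.

def encode_frame(frame, qp: int) -> bytes:
    step = 2 ** (qp // 8)  # stand-in "encoder": coarser step at higher QP
    return bytes((pixel // step) * step for pixel in frame)

def encode_layered(frames, base_qp: int = 40, enh_qp: int = 28):
    base, enhancement = [], []
    for frame in frames:
        base.append(encode_frame(frame, base_qp))        # lower visual quality
        enhancement.append(encode_frame(frame, enh_qp))  # higher visual quality
    return base, enhancement

base, enh = encode_layered([[17, 130, 250, 64]])
print(list(base[0]), list(enh[0]))  # base is more coarsely quantized
```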
The components of the transcoder 130 or of the processor 140 configured to perform transcoding, and the processes contained therein, can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. For example, a parser, decoder, preprocessor, or encoder can be a standalone component; can be incorporated as hardware, firmware, or middleware in a component of another device; or can be implemented in microcode or software executed on a processor, or a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the motion compensation, shot classification, and encoding processes can be stored in a machine-readable medium such as a storage medium. A code segment can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment by passing and/or receiving information, data, arguments, parameters, or memory contents.
Illustrative example of a transcoder architecture
Figure 2 is a block diagram illustrating an example of a transcoder that can be used as the transcoder 200 illustrated in the multimedia broadcast system 100 of Figure 1. The transcoder 200 comprises a parser/decoder 202, a preprocessor 226, an encoder 228, and a synchronizing layer 240, described further below. The transcoder 200 is configured to use content information of the multimedia data 104 for one or more aspects of the transcoding process, as described herein. The content information can be obtained from a source external to the transcoder 200, for example from multimedia metadata, or can be calculated by the transcoder (for example, by the preprocessor 226 or the encoder 228). The components shown in Figure 2 illustrate components that can be included in a transcoder that uses content information for one or more transcoding processes. In particular implementations, one or more of the components of the transcoder 200 can be excluded, or additional components can be included. In addition, portions of the transcoder and the transcoding process are described in such a way as to allow one of skill in the art to practice the invention even if every detail of a process or device is not described herein.
Figure 5 is a timing diagram illustrating the temporal relationship of the operations and/or processes of the various components of the transcoder 200. As shown in Figure 5, encoded streaming video 104 (encoded multimedia data, for example MPEG-2 video) is first received at an arbitrary time zero (0) by the parser 205 (Figure 2). Next, the video stream is parsed 501, demultiplexed 502, and decoded 503, for example by the parser 205 in conjunction with the decoder 214. As shown, these processes can occur in parallel (with slight timing offsets) so as to provide a streaming output of processed data to the preprocessor 226 (Figure 2). At time T1 504, once the preprocessor 226 has received enough data from the decoder 214 to begin producing output, the remaining processing steps become essentially sequential, with first-pass encoding 505, second-pass encoding 506, and re-encoding 507 occurring in turn after preprocessing, until the re-encoding is finished at time Tf 508.
The transcoder 200 described herein can be configured to transcode a variety of multimedia data, and many of the processes described apply to transcoding multimedia data of any type. Although some of the examples provided herein refer explicitly to transcoding MPEG-2 data to H.264 data, these examples are not meant to limit the invention to such data. The encoding aspects described below can be applied to transcoding any suitable multimedia data standard to another suitable multimedia data standard.
Parser/decoder
Referring again to Figure 2, the parser/decoder 202 receives the multimedia data 104. The parser/decoder 202 includes a transport stream parser ("parser") 205, which receives the multimedia data 104 and parses the data into a video elementary stream (ES) 206, an audio ES 208, presentation time stamps (PTS) 210, and other data such as closed captions 212. An ES carries one type of data (video or audio) from a single video or audio encoder. For example, a video ES comprises the video data for a sequence of data, including the sequence header and all the subparts of the sequence. A packetized elementary stream, or PES, consists of a single ES that has been made into packets, each of which typically begins with an added packet header. A PES stream contains only one type of data from one source (for example, from one video or audio encoder). PES packets have variable length, which does not correspond to the fixed length of transport packets, and can be much longer than a transport packet. When transport packets are formed from a PES stream, the PES header can be placed at the beginning of a transport packet payload, immediately following the transport packet header. The remaining PES packet content fills the payloads of successive transport packets until the PES packet is all used. The last transport packet can be filled to a fixed length, for example, by stuffing with bytes (e.g., byte = 0xFF (all ones)).
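As a concrete illustration of the packetization just described, the following sketch splits a PES packet across fixed-length transport packets and stuffs the final packet with 0xFF bytes. The 188-byte MPEG-2 transport packet size and the 0x47 sync byte are standard; the remaining header fields are simplified away here.

```python
# Simplified sketch of PES -> transport packet segmentation. Real transport
# packet headers carry a PID, continuity counter, and adaptation fields;
# here the header is reduced to the 0x47 sync byte plus placeholder bytes.

TS_PACKET_SIZE = 188
TS_HEADER = bytes([0x47, 0x00, 0x00, 0x00])
PAYLOAD_SIZE = TS_PACKET_SIZE - len(TS_HEADER)

def packetize_pes(pes_packet: bytes) -> list[bytes]:
    packets = []
    for offset in range(0, len(pes_packet), PAYLOAD_SIZE):
        payload = pes_packet[offset:offset + PAYLOAD_SIZE]
        payload += b"\xff" * (PAYLOAD_SIZE - len(payload))  # stuff last packet
        packets.append(TS_HEADER + payload)
    return packets

ts_packets = packetize_pes(bytes(400))                # a 400-byte PES packet
print(len(ts_packets), {len(p) for p in ts_packets})  # 3 packets, all 188 bytes
```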
The parser 205 communicates the video ES 206 to a decoder 214, shown here as part of the parser/decoder 202. In other configurations, the parser 205 and the decoder 214 are separate components. The PTS 210 is sent to a transcoder PTS generator 215, which can generate separate presentation time stamps specific to the transcoder 200 for use in arranging the data to be sent from the transcoder 200 to the broadcast system. The transcoder PTS generator 215 can be configured to provide data to the synchronizing layer 240 of the transcoder 200 to synchronize and coordinate data broadcasting.
Figure 3 is a flowchart of one example of a process 300 that the parser 205 can follow in parsing out the packetized elementary streams described above. Process 300 begins at block 302, where multimedia data 104 is received from the content provider 106 (Figure 1). Process 300 proceeds to block 304, where initialization of the parser 205 takes place. Initialization can be triggered by an independently generated acquisition command 306. For example, a process independent of the parser 205 can generate the acquisition command 306 based on externally received television schedule and channel lineup information. In addition, real-time transport stream (TS) buffer descriptors 308 can be input to assist initialization and main processing.
As illustrated in block 304, initialization can include: acquisition command syntax verification; performing first-pass PSI/PSIP/SI (program-specific information / program and system information protocol / system information) processing; performing processing specific to the acquisition command or PSI/PSIP/SI consistency checking; allocating a PES buffer for each PES; and setting timing (for example, for alignment to the desired acquisition start instant). The PES buffers hold the parsed ES data and communicate each parsed ES to the respective audio decoder 216, text encoder 220, decoder 214, or transcoder PTS generator 215.
After initialization, process 300 proceeds to block 310 for the main processing of the received multimedia data 104. The processing in block 310 can include target packet identifier (PID) filtering, continuous PSI/PSIP/SI monitoring and processing, and timing processes (for example, to achieve the desired acquisition period), such that the incoming multimedia data is delivered into the appropriate PES buffers. As the multimedia data is processed in block 310, program descriptors and PES buffer "read" indications are generated, which will interface with the decoder 214 (Figure 2) as described below.
After block 310, process 300 proceeds to block 314, where termination of the parsing operation takes place, including generation of a timer interrupt and release of the PES buffers to avoid their consumption. Note that a PES buffer exists for all relevant elementary streams of the program listed in the program descriptor, for example the audio, video, and caption streams.
Referring again to Figure 2, the parser 205 sends the audio ES 208 to an audio decoder 216 corresponding to the transcoder implementation, which decodes the audio information and provides the encoded audio to the synchronizing layer 240. The caption information 212 is delivered to a text encoder 220. Closed-caption (CC) data 218 from the decoder 214 is also provided to the text encoder 220, and the text encoder 220 encodes the caption information 212 and the CC data 218 into a format usable by the transcoder 200.
The parser/decoder 202 also comprises the decoder 214, which receives the video ES 206. The decoder 214 can generate metadata associated with the video data, decode the packetized video elementary stream into raw video 224 (for example, in standard definition format), and process the closed-caption data carried in the video ES stream.
Figure 4 shows a flowchart illustrating one example of a decoding process 400 that can be performed by the decoder 214. Process 400 begins at block 402 with input of the video elementary stream data 206. Process 400 proceeds to block 404, where the decoder is initialized. Initialization can include a number of tasks, including: detecting a video sequence header (VSH); performing first-pass VSH, video sequence (VS), and VS display extension processing (including video format, primary colors, and matrix coefficients); and allocating data buffers to respectively buffer the decoded pictures, the associated metadata, and the closed-caption (CC) data. In addition, the video PES buffer "read" information 406 provided by the parser 205 is input (for example, as generated in block 310 of the process 300 of Figure 3).
After initialization at block 404, process 400 proceeds to block 408, where the main processing of the video ES is performed by the decoder 214. The main processing includes: polling the video PES buffer "read" information or "interface" for new data availability; decoding the video ES; reconstructing and storing pixel data at picture boundaries; synchronizing audio and video; generating metadata and storing it at picture boundaries; and storing CC data at picture boundaries. The results 410 of the main processing 408 include generation of sequence descriptors, decoded picture buffer descriptors, metadata buffer descriptors, and CC data buffer descriptors.
After the main processing 408, process 400 proceeds to block 412, where a termination process is executed. The termination process can include: determining termination conditions, including the absence of new data for a period exceeding a predetermined threshold; detecting a sequence end code; and/or detecting an explicit termination signal. The termination process can further include releasing the decoded picture, associated metadata, and CC data buffers to avoid their consumption by the preprocessor, described below. Process 400 ends at block 414, where it can enter a wait state for receiving video ES as input.
Preprocessor
Figure 2 (and, in more detail, Figure 6) illustrates a sample aspect of the preprocessor 226, which can use content information to perform one or more preprocessing operations. The preprocessor 226 receives the metadata 222 and the decoded "raw" video data 224 from the parser/decoder 202. The preprocessor 226 is configured to perform certain types of processing on the video data 224 and the metadata 222, and to provide processed multimedia (for example, base layer reference frames, enhancement layer reference frames, bandwidth information, content information) and video to the encoder 228. Such preprocessing of the multimedia data can improve the visual clarity, anti-aliasing, and compression efficiency of the data. In general, the preprocessor 226 receives the video sequence provided by the decoder 214 of the parser/decoder 202 and converts it into a progressive video sequence for further processing (for example, encoding) by the encoder 228. In some aspects, the preprocessor 226 can be configured for numerous operations, including inverse telecine, deinterlacing, filtering (for example, artifact removal, de-ringing, de-blocking, and de-noising), resizing (for example, spatial resolution downsampling from standard definition to Quarter Video Graphics Array (QVGA)), and GOP structure generation (for example, computing complexity map generation, scene-change detection, and fade/flash detection).
The preprocessor 226 can use metadata from the decoder to affect one or more of the preprocessing operations. The metadata can include information about, describing, or classifying the content of the multimedia data ("content information"); in particular, the metadata can include a content classification. In some aspects, the metadata does not include the content information required for the encoding operations. In such cases, the preprocessor 226 can be configured to determine content information and to use the content information for preprocessing operations, and/or to provide the content information to other components of the transcoder 200 (for example, the encoder 228). In some aspects, the preprocessor 226 can use such content information to affect GOP partitioning, to determine appropriate types of filtering, and/or to determine encoding parameters that are communicated to the encoder.
Figure 6 shows an illustrative example of the process blocks that can be included in the preprocessor 226, and illustrates the processing that can be performed by the preprocessor 226. In this example, the preprocessor 226 receives metadata and video 222, 224, and provides output data 614, comprising (processed) metadata and video, to the encoder 228. Typically, three types of video can be received. First, the received video can be progressive video, for which no deinterlacing is needed. Second, the video data can be telecined video, that is, interlaced video converted from 24 fps film sequences. Third, the video can be non-telecined interlaced video. The preprocessor 226 can process these types of video as described below.
At block 601, the preprocessor 226 determines whether the received video data 222, 224 is progressive video. In some cases, this can be determined from the metadata, if the metadata contains such information, or by processing the video data itself. For example, an inverse telecine process, described below, can determine whether the received video 222 is progressive video. If it is, the process proceeds to block 607, where filtering operations (for example, a noise suppressor) are performed on the video to reduce noise such as white Gaussian noise. If the video data 222, 224 is not progressive video at block 601, the process proceeds to block 604, a phase detector 604.
The phase detector 604 distinguishes between video that originated in a telecine process and video that began in a standard broadcast format. If the decision is that the video was telecined (the YES decision path exiting the phase detector 604), the telecined video is returned to its original format in the inverse telecine process 606. Redundant frames are identified and eliminated, and fields derived from the same video frame are rewoven into a complete image. Because the sequence of reconstructed film images was photographically recorded at regular intervals of 1/24 of a second, a motion estimation process performed in the GOP partitioner 612 or the encoder 228 is more accurate using the inverse-telecined images than using the telecined data, which has an irregular time base.
In one aspect, the phase detector 604 makes certain decisions after receiving a video frame. These decisions include: (i) whether the present video is telecine output, and whether the 3:2 pulldown phase is one of the five phases P0, P1, P2, P3, and P4 shown in Figure 38; and (ii) whether the video was generated as conventional NTSC, a decision denoted as phase P5. These decisions appear as outputs of the phase detector 604 shown in Figure 2. The path from the phase detector 604 labeled "YES" actuates the inverse telecine process 606, indicating that it has been provided with the correct pulldown phase, so that it can select the fields that were formed from the same photographic image and combine them. The path from the phase detector 604 labeled "NO" similarly actuates the deinterlacer 605, so that an apparent NTSC frame is divided into fields for optimal processing. Because different types of video can be received at any time, the phase detector 604 can analyze video frames continuously. As an illustration, video conforming to the NTSC standard can be inserted into the video as a commercial. After inverse telecine, the resulting progressive video is sent to a noise suppressor (filter) 607, which can be used to reduce white Gaussian noise.
When conventional NTSC video is recognized (the NO path from the phase detector 604), it is transmitted to the deinterlacer 605 for compression. The deinterlacer 605 transforms the interlaced fields into progressive video, and de-noising operations can then be performed on the progressive video. An illustrative example of the deinterlacing processing is described below.
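To make the pulldown-phase decision concrete, the following sketch classifies a five-frame window by searching for the repeated field that 3:2 pulldown produces. The difference metric and the threshold rule are illustrative assumptions; the detector of this disclosure is a state machine that tracks these decisions over successive frames (see Figure 38).

```python
import numpy as np

# Illustrative sketch of inferring the 3:2 pulldown phase from field-repeat
# evidence. The SAD metric and the "decisive minimum" test are assumptions.

def field_sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute differences between two same-parity fields."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def detect_pulldown_phase(top_fields, bottom_fields):
    """Each argument: a list of 5 consecutive fields of one parity.
    Returns 0..4 for pulldown phases P0..P4, or 5 (P5) for conventional NTSC."""
    # Under 3:2 pulldown, one field per 5-frame cycle repeats its predecessor
    # of the same parity, so exactly one SAD should be near zero.
    sads = [min(field_sad(top_fields[i], top_fields[i - 1]),
                field_sad(bottom_fields[i], bottom_fields[i - 1]))
            for i in range(1, 5)]
    best = int(np.argmin(sads))
    others = [s for j, s in enumerate(sads) if j != best]
    if sads[best] < 0.1 * min(others):   # assumed decisiveness rule
        return best                      # phase index within the cycle
    return 5                             # no repeated field: conventional NTSC
```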
Traditional analog video devices, such as televisions, render video in an interlaced manner: such devices transmit the even-numbered scan lines (the even field) and the odd-numbered scan lines (the odd field). From a signal-sampling point of view, this is equivalent to spatio-temporal subsampling in a pattern described by:
$$
F(x, y, n) =
\begin{cases}
\Theta(x, y, n), & \text{if } y \bmod 2 \text{ matches the parity of field } n,\\
\text{discarded}, & \text{otherwise}
\end{cases}
\tag{1}
$$
where Θ denotes the original frame picture, F denotes the interlaced field, and (x, y, n) denote the horizontal, vertical, and temporal positions of a pixel, respectively.
Without loss of generality, it can be assumed throughout this disclosure that n = 0 is always an even field, so that Equation 1 above reduces to:
$$
F(x, y, n) =
\begin{cases}
\Theta(x, y, n), & \text{if } y \bmod 2 = n \bmod 2,\\
\text{discarded}, & \text{otherwise}
\end{cases}
\tag{2}
$$
Since decimation is not performed in the horizontal dimension, the subsampling pattern can be depicted in the n-y coordinate plane.
The goal of a deinterlacer is to transform interlaced video (a sequence of fields) into non-interlaced progressive frames (a sequence of frames). In other words, the even and odd fields are interpolated to "recover" or generate full-frame pictures. This can be represented by Equation 3:
$$
F_o(x, y, n) =
\begin{cases}
F(x, y, n), & \text{if } y \bmod 2 = n \bmod 2,\\
F_i(x, y, n), & \text{otherwise}
\end{cases}
\tag{3}
$$
where F_i denotes the deinterlacing result for the missing pixels.
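The relationship in Equations 1 through 3 can be illustrated with a short sketch, in which simple vertical averaging stands in for F_i; in the deinterlacer described next, F_i is instead derived from Wmed filtering combined with motion compensation.

```python
import numpy as np

# Sketch of Equations 1-3: split_fields applies the subsampling of Eqs. 1/2,
# and deinterlace fills the missing lines per Eq. 3. Vertical averaging
# stands in for F_i purely for illustration.

def split_fields(frame: np.ndarray, n: int) -> np.ndarray:
    """Keep only lines with y mod 2 == n mod 2 (Eqs. 1 and 2)."""
    return frame[n % 2::2]

def deinterlace(field: np.ndarray, n: int, height: int) -> np.ndarray:
    out = np.zeros((height, field.shape[1]))
    out[n % 2::2] = field                      # existing lines: F(x, y, n)
    for y in range(height):
        if y % 2 != n % 2:                     # missing line: F_i(x, y, n)
            above = out[y - 1] if y > 0 else out[y + 1]
            below = out[y + 1] if y + 1 < height else out[y - 1]
            out[y] = 0.5 * (above + below)
    return out

frame = np.arange(16, dtype=float).reshape(4, 4)
even_field = split_fields(frame, n=0)          # lines 0 and 2 survive
print(deinterlace(even_field, n=0, height=4))  # lines 1 and 3 interpolated
```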
Figure 40 is a block diagram illustrating certain aspects of one aspect of the deinterlacer 605 that uses Wmed filtering and motion estimation to generate progressive frames from interlaced multimedia data. The upper part of Figure 40 shows a motion intensity map 4002, which can be generated using information from the current field, two previous fields (the PP field and the P field), and two subsequent fields (the Next field and the Next-Next field). The motion intensity map 4002 categorizes or partitions the current frame into two or more different motion levels, and can be generated by spatio-temporal filtering, described in further detail below. In some aspects, the motion intensity map 4002 is generated to identify static areas, slow-motion areas, and fast-motion areas, as described below in reference to Equations 4 through 8. A spatio-temporal filter (for example, a Wmed filter 4004) filters the interlaced multimedia data using criteria based on the motion intensity map, and produces a spatio-temporal provisional deinterlaced frame. In some aspects, the Wmed filtering involves a horizontal neighborhood of [-1, 1], a vertical neighborhood of [-3, 3], and a temporal neighborhood of five adjacent fields, represented by the five fields illustrated in Figure 40 (PP field, P field, current field, Next field, Next-Next field), where Z^-1 denotes a delay of one field. Relative to the current field, the Next field and the P field are non-parity fields, and the PP field and the Next-Next field are parity fields. The "neighborhood" used for spatio-temporal filtering refers to the spatial and temporal locations of the fields and pixels actually used in the filtering operation, and can be illustrated as an "aperture" as shown, for example, in Figures 6 and 7.
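A minimal sketch of building such a motion intensity map from same-parity field differences follows; the per-pixel metric and the two thresholds are illustrative assumptions standing in for the criteria of Equations 4 through 8.

```python
import numpy as np

# Illustrative three-level motion intensity map from same-parity field
# differences. T_STATIC and T_SLOW are assumed thresholds.

T_STATIC, T_SLOW = 4.0, 16.0

def motion_intensity_map(pp_field: np.ndarray, current: np.ndarray,
                         next_next: np.ndarray) -> np.ndarray:
    """Return a per-pixel map: 0 = static, 1 = slow motion, 2 = fast motion,
    computed from the parity fields on either side of the current field."""
    backward = np.abs(current.astype(np.float64) - pp_field)
    forward = np.abs(current.astype(np.float64) - next_next)
    diff = np.maximum(backward, forward)
    levels = np.full(current.shape, 2, dtype=np.uint8)  # default: fast motion
    levels[diff < T_SLOW] = 1                           # slow motion
    levels[diff < T_STATIC] = 0                         # static
    return levels
```

A Wmed filter can then widen or narrow its spatio-temporal aperture per pixel according to the level assigned by such a map.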
The deinterlacer 605 can also include a noise suppressor (de-noising filter) 4006, configured to filter the spatio-temporal provisional deinterlaced frame generated by the Wmed filter 4004. De-noising the spatio-temporal provisional deinterlaced frame makes the subsequent motion search process more accurate, especially when the source interlaced multimedia data sequence is contaminated by white noise. It can also at least partly remove the alias artifacts between the even and odd lines in the Wmed picture. The noise suppressor 4006 can be implemented as a variety of filters, including noise suppressors based on wavelet shrinkage and on wavelet Wiener filters. A noise suppressor can be used to remove noise from the candidate Wmed frame before it is further processed using motion compensation information, and can remove noise that is present in the Wmed frame while retaining the signal that is present, regardless of the signal's frequency content. Various types of de-noising filters can be used, including wavelet filters. Wavelets are a class of functions used to localize a given signal in both the space and scaling domains. The fundamental idea behind wavelets is to analyze the signal at different scales or resolutions, so that small changes in the wavelet representation produce correspondingly small changes in the original signal.
Wavelet shrinkage or a wavelet Wiener filter can also serve as the noise suppressor. Wavelet shrinkage consists of a wavelet transform of the noisy signal, followed by shrinking the smaller wavelet coefficients to zero (or to a smaller value) while leaving the larger coefficients unchanged. Finally, an inverse transform is performed to obtain the estimated signal.
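Those three steps can be sketched directly, with a single-level Haar transform and a hard threshold standing in for the (4, 2) biorthogonal cubic B-spline filter pair given below in Equations 4 and 5.

```python
import numpy as np

# Wavelet-shrinkage de-noising in its three steps: forward transform, shrink
# small coefficients, inverse transform. The one-level Haar transform is an
# illustrative stand-in for the disclosure's biorthogonal B-spline filters.

def haar_forward(signal: np.ndarray):
    evens, odds = signal[0::2], signal[1::2]
    return (evens + odds) / np.sqrt(2), (evens - odds) / np.sqrt(2)

def haar_inverse(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    out = np.empty(approx.size * 2)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def wavelet_shrink(signal: np.ndarray, threshold: float) -> np.ndarray:
    approx, detail = haar_forward(signal)        # step 1: forward transform
    detail[np.abs(detail) < threshold] = 0.0     # step 2: shrink small coeffs
    return haar_inverse(approx, detail)          # step 3: inverse transform

noisy = np.array([10.0, 10.4, 9.8, 10.1, 50.0, 50.3, 49.9, 50.2])
print(wavelet_shrink(noisy, threshold=1.0))      # small fluctuations removed
```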
De-noising filtering improves the accuracy of motion compensation in noisy environments. Wavelet-shrinkage de-noising can involve shrinking in the wavelet transform domain, and typically comprises three steps: a linear forward wavelet transform, nonlinear shrinkage de-noising, and a linear inverse wavelet transform. The Wiener filter is the MSE-optimal linear filter that can be used to improve images degraded by additive noise and blurring. Such filters are generally known in the art and are described, for example, in "Ideal spatial adaptation by wavelet shrinkage," cited below, and in S. P. Ghael, A. M. Sayeed, and R. G. Baraniuk, "Improvement Wavelet denoising via empirical Wiener filtering," Proceedings of SPIE, vol. 3169, pp. 389-399, San Diego, July 1997, the latter of which is expressly incorporated herein by reference in its entirety.
In some aspects, the de-noising filter is based on an aspect of a (4, 2) biorthogonal cubic B-spline wavelet filter. Such a filter can be defined by the following forward and inverse transforms:
$$
h(z) = \frac{3}{4} + \frac{1}{2}\left(z + z^{-1}\right) + \frac{1}{8}\left(z^{2} + z^{-2}\right) \quad \text{(forward transform)} \tag{4}
$$

and

$$
g(z) = \frac{5}{4}\,z^{-1} - \frac{5}{32}\left(1 + z^{-2}\right) - \frac{3}{8}\left(z + z^{-3}\right) - \frac{3}{32}\left(z^{2} + z^{-4}\right) \quad \text{(inverse transform)} \tag{5}
$$
The application of a de-noising filter can increase the accuracy of motion compensation in a noisy environment. Implementations of such filters are further described in D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994, which is expressly incorporated herein by reference in its entirety.
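Written as tap arrays, the filter pair of Equations 4 and 5 can be applied by direct convolution, as in the sketch below; the subsampling and boundary extension of a complete wavelet filter bank are omitted.

```python
import numpy as np

# Tap arrays read off Equations 4 and 5 (coefficients of z^k, highest power
# first). This sketch shows only the convolution itself.

h_taps = np.array([1/8, 1/2, 3/4, 1/2, 1/8])                      # z^2 .. z^-2
g_taps = np.array([-3/32, -3/8, -5/32, 5/4, -5/32, -3/8, -3/32])  # z^2 .. z^-4

def apply_filter(signal: np.ndarray, taps: np.ndarray) -> np.ndarray:
    return np.convolve(signal, taps, mode="same")

impulse = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
print(apply_filter(impulse, h_taps))  # impulse response reproduces the taps
```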
The lower part of Figure 40 illustrates aspects of determining motion information (for example, motion vector candidates, motion estimation, and motion compensation) for the interlaced multimedia data. In particular, Figure 40 illustrates a motion estimation and motion compensation scheme used to generate a motion-compensated provisional progressive frame for the selected frame, which is then combined with the Wmed provisional frame to form the resulting "final" progressive frame (shown as deinterlaced current frame 4014). In some aspects, motion vector ("MV") candidates (or estimates) for the interlaced multimedia data are provided to the deinterlacer from an external motion estimator, and are used to provide a starting point for a bidirectional motion estimator and compensator ("ME/MC") 4018. In some aspects, an MV candidate selector 4022 uses the MVs of neighboring blocks as MV candidates for the block being processed, for example the MVs of previously processed blocks (e.g., blocks in the previously deinterlaced frame 4020). The motion compensation can be done bidirectionally, based on the previously deinterlaced frame 4020 and the next (e.g., future) Wmed frame 4008. The current Wmed frame 4010 and the motion-compensated ("MC") current frame 4016 are merged or combined by a combiner 4012. The resulting deinterlaced current frame 4014, now a progressive frame, is provided back to the ME/MC 4018 to serve as the previously deinterlaced frame 4020, and is also communicated external to the deinterlacer 605 for subsequent processing.
The Wmed-plus-MC deinterlacing scheme can decouple inter-field interpolation from intra-field interpolation in the deinterlacing prediction schemes. In other words, the spatio-temporal Wmed filtering can be used mainly for intra-field interpolation purposes, while inter-field interpolation can be performed during motion compensation. This reduces the peak signal-to-noise ratio of the Wmed result, but the visual quality after motion compensation is applied is more pleasing, because bad pixels generated from inaccurate inter-field prediction mode decisions are removed from the Wmed filtering.
After the appropriate inverse telecine or deinterlacing processing, at block 608 the progressive video is processed for artifact suppression and resampling (e.g., resizing). In some resampling aspects, a polyphase resampler is implemented for picture size resizing. In one example of downsampling, the ratio between the original and the resized picture can be p/q, where p and q are relatively prime integers. The total number of phases is p. In some aspects, the cutoff frequency of the polyphase filter is 0.6 for resizing factors of about 0.5. The cutoff frequency does not exactly match the resizing ratio, in order to boost the high-frequency response of the resized sequence. This inevitably allows some aliasing. However, it is well known that human eyes prefer pictures that are sharp but slightly aliased to pictures that are blurry but alias-free.
Figure 41 illustrates an example of polyphase resampling, showing the phases when the resizing ratio is 3/4. The cutoff frequency illustrated in Figure 41 is also 3/4. Original pixels are illustrated in the figure with vertical axes. A sinc function centered on those axes is also drawn to represent the filter waveform. Because the cutoff frequency is chosen to be exactly the same as the resampling ratio, the zeros of the sinc function overlap the pixel positions after resizing (illustrated in Figure 41 with crosses). To find a pixel value after resizing, the contributions from the original pixels can be summed up as shown in the following equation:
v(x) = \sum_{i=-\infty}^{\infty} u(i)\,\operatorname{sinc}\!\big(\pi f_c (i - x)\big) \quad [6]
where f_c is the cutoff frequency. The above 1-D polyphase filter can be applied to both the horizontal dimension and the vertical dimension.
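A sketch of equation [6] for one line of pixels follows, with the infinite sum truncated to a finite tap window and the truncated kernel renormalized (NumPy's sinc(t) equals sin(πt)/(πt), matching the convention in [6]). Because the fractional position cycles through p values, a practical implementation would precompute the p distinct weight vectors as a polyphase filter bank; the per-pixel evaluation here is only for clarity.

    import numpy as np

    def polyphase_resample_1d(u: np.ndarray, p: int, q: int, f_c: float,
                              taps_per_side: int = 8) -> np.ndarray:
        """Resize one line of pixels by the ratio p/q (p, q relatively prime)."""
        n_out = (len(u) * p) // q
        v = np.empty(n_out)
        for x in range(n_out):
            pos = x * q / p                      # output sample in input coordinates
            i = np.arange(int(np.floor(pos)) - taps_per_side + 1,
                          int(np.floor(pos)) + taps_per_side + 1)
            w = np.sinc(f_c * (i - pos))         # kernel of equation [6]
            src = u[np.clip(i, 0, len(u) - 1)]   # clamp indices at the edges
            v[x] = np.dot(src, w) / w.sum()      # normalize the truncated kernel
        return v

    # 3/4 downsizing with cutoff 3/4, as in the Figure 41 example:
    # resized = polyphase_resample_1d(line, p=3, q=4, f_c=0.75)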
Another aspect of resampling (resizing) accounts for overscan. In an NTSC television signal, an image has 486 scan lines, and in digital video each scan line can have 720 pixels. However, not all of the image is visible on the television, because of mismatches between the image size and the screen format. The part of the image that is not visible is called overscan.
To help broadcasters put useful information in the area of the television visible to as many viewers as possible, the Society of Motion Picture and Television Engineers (SMPTE) defined specific sizes of the action frame, called the safe action area and the safe title area. See SMPTE Recommended Practice RP 27.3-1989, "Specifications for Safe Action and Safe Title Areas Test Pattern for Television Systems." The safe action area is defined by SMPTE as the area in which "all significant action must take place." The safe title area is defined as the area where "all useful information can be confined" to ensure visibility on the majority of home television receivers.
For example, referring to Figure 25, the safe action area 2510 occupies the center 90% of the screen, leaving a 5% border all around. The safe title area 2505 occupies the center 80% of the screen, leaving a 10% border. Referring now to Figure 26, because the safe title area is so small that not much more content can be added to the image, some stations put text in the safe action area, which is inside the white rectangular window 2615.
Black borders can usually be seen in the overscan. For example, in Figure 26, black borders appear at the upper side 2620 and the lower side 2625 of the image. These black borders can be removed in the overscan, because H.264 video uses border extension in motion estimation, and extended black borders can increase the residual. Conservatively, the borders can be cut by 2% and the image then resized; the filters for resizing can be generated accordingly. Truncation is performed before the polyphase downsampling to remove the overscan.
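As a sketch of the cropping step (the 2% figure comes from the text above; the helper name is hypothetical):

    def crop_overscan(frame, fraction=0.02):
        """Cut `fraction` of the height and width off each border before the
        polyphase resize, removing dark overscan bands that would otherwise
        inflate residuals under H.264 border extension."""
        h, w = frame.shape[:2]
        dy, dx = int(h * fraction), int(w * fraction)
        return frame[dy:h - dy, dx:w - dx]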
Referring again to Fig. 6, the progressive video then proceeds to block 610, where deblocking and deringing operations are performed. Two types of artifacts commonly appear in video compression applications, namely "blocking" and "ringing." Blocking artifacts occur because compression algorithms divide each frame into blocks (e.g., 8×8 blocks). Each reconstructed block carries some small error, and the errors at the edges of one block often contrast with the errors at the edges of neighboring blocks, making block boundaries visible. In contrast, ringing artifacts appear as distortions around the edges of image features. Ringing artifacts occur because the encoder discards too much information in quantizing the high-frequency DCT coefficients. In some illustrative examples, both deblocking and deringing can use low-pass FIR (finite impulse response) filters to hide these visible artifacts.
In one example of deblocking processing, a deblocking filter can be applied to all 4×4 block edges of a frame, except edges located at the frame boundary and any edges for which the deblocking filter process is disabled. This filtering process should be performed on a macroblock basis after completion of the frame construction process, with all macroblocks in a frame processed in order of increasing macroblock addresses. For each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered from top to bottom. As shown in Figure 39, the luma deblocking filter process is performed on four 16-sample edges, and the deblocking filter process for each chroma component is performed on two 8-sample edges, in both the horizontal and the vertical direction. Sample values above and to the left of the current macroblock that may already have been modified by the deblocking process on previous macroblocks should be used as input to the deblocking filter process on the current macroblock, and may be further modified during the filtering of the current macroblock. Sample values modified during the filtering of vertical edges can be used as input for the filtering of the horizontal edges of the same macroblock. A deblocking process can be invoked for the luma and chroma components separately.
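The filtering order just described can be sketched as follows; filter_edge is a hypothetical stand-in for the actual low-pass edge filter, and only the luma edges are shown (chroma would follow the same pattern with two 8-sample edges per direction):

    def deblock_luma(frame, mb_rows: int, mb_cols: int, filter_edge) -> None:
        """Visit macroblocks in increasing address order; for each, filter the
        four vertical 4x4-grid edges left to right, then the four horizontal
        edges top to bottom, skipping edges on the frame boundary."""
        for addr in range(mb_rows * mb_cols):
            r, c = divmod(addr, mb_cols)
            for e in range(4):                 # vertical edges, left to right
                x = c * 16 + e * 4
                if x > 0:                      # skip the frame boundary
                    filter_edge(frame, x=x, y=r * 16, length=16, vertical=True)
            for e in range(4):                 # horizontal edges, top to bottom
                y = r * 16 + e * 4
                if y > 0:
                    filter_edge(frame, x=c * 16, y=y, length=16, vertical=False)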
In one example of deringing processing, a 2-D filter can be adaptively applied to smooth out areas near edges. Edge pixels undergo little or no filtering in order to avoid blurring.
GOP Partitioner
After deblocking and deringing, the progressive video is processed by a GOP partitioner 612. GOP partitioning can include detecting shot changes, generating complexity maps (e.g., temporal and spatial bandwidth maps), and adaptive GOP partitioning. These steps are described below in turn.
A. Scene Change Detection
Shot detection relates to determining when a frame in a group of pictures (GOP) exhibits data indicating that a scene change has occurred. Generally, within a GOP, the frames may have no significant changes in any two or three (or more) adjacent frames, or there may be slow changes, or fast changes. Of course, these scene change classifications can be further broken down into finer levels of change as necessary, depending on the specific application.
Detecting shot or scene changes is important for efficient encoding of video. Typically, when a GOP is not changing significantly, an I frame at the beginning of the GOP, followed by a number of predictive frames, can encode the video sufficiently well that subsequent decoding and display of the video is visually acceptable. However, when a scene is changing, either abruptly or slowly, additional I frames and less predictive encoding (P frames and B frames) may be needed to produce subsequently decoded, visually acceptable results.
Shot detection and encoding systems and methods that improve the performance of existing encoding systems are described below. Such aspects can be implemented in the GOP partitioner 612 of the preprocessor 226 (Fig. 7), or included in an encoder device that can operate with or without a preprocessor. These aspects make use of statistics (or metrics) that include statistical comparisons between adjacent frames of the video data to determine whether an abrupt scene change occurred, whether the scene is changing slowly, or whether there are camera flashes in the scene that can make video encoding especially complex. The statistics can be obtained from a preprocessor and then sent to an encoding device, or they can be generated in an encoding device (e.g., by a processor configured to perform motion compensation). The resulting statistics aid the scene change detection decision. In a system that performs transcoding, a suitable preprocessor or configurable processor often exists. If the preprocessor performs motion-compensation-aided deinterlacing, the motion compensation statistics are available and ready for use. In such systems, the shot detection algorithm may only slightly increase the system complexity.
The illustrative example of a shot detector described herein only needs to use statistics from a previous frame, a current frame, and a next frame, and consequently has very low latency. The shot detector differentiates several different types of shot events, including abrupt scene changes, cross-fades and other slow scene changes, and camera flashes. By determining different types of shot events with different strategies in the encoder, encoding efficiency and visual quality can be enhanced.
Scene change detection can be used by any video encoding system to intelligently conserve bits, as opposed to inserting I frames at fixed intervals. In some aspects, content information obtained by the preprocessor (e.g., incorporated in metadata or calculated by the preprocessor 226) can be used for scene change detection. For example, depending on the content information, the thresholds described below and other criteria can be dynamically adjusted for different types of video content.
Video encoding usually operates on a structured group of pictures (GOP). A GOP normally starts with an intra-coded frame (I frame), followed by a series of P (predictive) or B (bidirectional) frames. Typically, an I frame can store all the data necessary to display the frame, a B frame depends on data in the preceding and following frames (e.g., it only contains data changed from the preceding frame or data that is different from data in the next frame), and a P frame contains data that has changed from the preceding frame. In common usage, I frames are interspersed with P frames and B frames in the encoded video. In terms of size (e.g., the number of bits used to encode the frame), I frames are typically much larger than P frames, which in turn are larger than B frames. For efficient encoding, transmission, and decoding processing, the length of a GOP should be long enough to reduce the efficiency loss from large I frames, and short enough to fight mismatch between encoder and decoder, or channel impairment. In addition, macroblocks (MB) in P frames can be intra-coded for the same reason.
Scene change detection can be used by a video encoder to determine a proper GOP length and to insert I frames based on the GOP length, instead of inserting an often unneeded I frame at a fixed interval. In a practical streaming video system, the communication channel is usually impaired by bit errors or packet losses. Where to place I frames or I MBs can significantly affect decoded video quality and the viewing experience. One encoding scheme is to use intra-coded frames for pictures or portions of pictures that have significant change from collocated previous pictures or picture portions. Normally those regions cannot be predicted effectively and efficiently with motion estimation, and encoding can be done more efficiently if such regions are exempted from inter-frame coding techniques (e.g., encoding using B frames and P frames). In the context of channel impairment, those regions are likely to suffer from error propagation, which can be reduced or eliminated (or nearly so) by intra-frame encoding.
Portions of the GOP video can be classified into two or more categories, where each region can have different intra-frame encoding criteria that may depend on the particular implementation. As an example, the video can be classified into three categories: abrupt scene changes, cross-fades and other slow scene changes, and camera flashes.
Abrupt scene changes include frames that are significantly different from the previous frame, usually caused by a camera operation. Since the content of these frames is different from that of the previous frame, abrupt scene change frames should be encoded as I frames.
Cross-fades and other slow scene changes include slow switching of scenes, usually caused by computer processing of camera shots. A gradual blending of two different scenes may look more pleasing to human eyes, but poses a challenge to video encoding. Motion compensation cannot reduce the bit rate of those frames effectively, and more intra MBs can be updated for these frames.
Camera flashes, or camera flash events, occur when the frame content includes camera flashes. Such flashes are relatively short in duration (e.g., one frame) and extremely bright, such that the pixels in a frame portraying the flashes exhibit unusually high luminance relative to the corresponding areas in adjacent frames. Camera flashes shift the luminance of a picture suddenly and swiftly. Usually the duration of a camera flash is shorter than the temporal masking duration of the human visual system (HVS), which is typically defined to be 44 ms. Human eyes are not sensitive to the quality of these short bursts of brightness, and therefore they can be encoded coarsely. Because flash frames cannot be handled effectively with motion compensation, and they are bad prediction candidates for future frames, coarse encoding of these frames does not reduce the encoding efficiency of future frames. Because of the "artificial" high luminance, scenes classified as flashes should not be used to predict other frames, and other frames cannot effectively be used to predict these frames for the same reason. Once identified, these frames can be taken out, because they can require a relatively high amount of processing. One option is to remove the camera flash frames and encode a DC coefficient in their place; such a solution is simple, computationally fast, and saves many bits.
When any of the above categories of frames is detected, a shot event is declared. Shot detection is not only useful for improving encoding quality; it can also aid in identifying video content for searching and indexing. One illustrative aspect of a scene detection process is described below. In this example, the shot detection process first calculates information, or metrics, for a selected frame being processed for shot detection. The metrics can include information from bidirectional motion estimation and compensation processing of the video, and other luminance-based metrics.
To perform bidirectional motion estimation/compensation, a video sequence can be preprocessed with a bidirectional motion compensator that matches every 8×8 block of the current frame with blocks in two of the frame's most adjacent neighboring frames, one in the past and one in the future. The motion compensator produces motion vectors and difference metrics for every block. Figure 29 is an illustration showing an example of matching pixels of a current frame C with a past frame P and a future (or next) frame N, and depicts the motion vectors to the matched pixels (past motion vector MV_P and future motion vector MV_N). A general description of bidirectional motion vector generation and related encoding is provided below with reference to Figure 32.
After the bidirectional motion information is determined (e.g., motion information identifying MBs (best matched) in corresponding adjacent frames), additional metrics can be generated (e.g., by a motion compensator in the GOP partitioner 612 or another suitable component) by various comparisons of the current frame to the next frame and the previous frame. The motion compensator can produce a difference metric for every block. The difference metric can be a sum of squared differences (SSD) or a sum of absolute differences (SAD). Without loss of generality, SAD is used here as an example.
For every frame, a SAD ratio (also referred to as a "contrast ratio") is calculated as follows:
\gamma = \frac{\epsilon + \mathrm{SAD}_P}{\epsilon + \mathrm{SAD}_N} \quad [6]
where SAD_P and SAD_N are the sums of absolute differences of the forward and the backward difference metrics, respectively. Note that the denominator contains a small positive number ε to prevent a "divide-by-zero" error. The numerator also contains ε to balance the effect of the unity in the denominator. For example, if the previous frame, the current frame, and the next frame are identical, motion search should yield SAD_P = SAD_N = 0. In this case, the above calculation yields γ = 1 rather than 0 or infinity.
A luminance histogram can be calculated for every frame. Typically, multimedia images have a luminance depth of eight bits (e.g., the number of "bins"). The luminance depth used for calculating the luminance histogram according to some aspects can be set to 16 to obtain the histogram. In other aspects, the luminance depth can be set to an appropriate number, which may depend on the type of data being processed, the computational power available, or other predetermined criteria. In some aspects, the luminance depth can be set dynamically based on a calculated or received metric, such as the content of the data.
The following equation illustrates one example of calculating a luminance histogram difference (λ):
\lambda = \sum_{i=1}^{16} \lvert N_{Pi} - N_{Ci} \rvert / N \quad [7]
where N_Pi is the number of blocks in the i-th bin for the previous frame, N_Ci is the number of blocks in the i-th bin for the current frame, and N is the total number of blocks in a frame. If the luminance histograms of the previous and the current frame are completely different (or disjoint), then λ = 2.
Using this information, a frame difference metric (D) is calculated as follows:
D = \frac{\gamma_C}{\gamma_P} + A\lambda(2\lambda + 1) \quad [8]
where A is a constant chosen by the application, γ_C = (ε + SAD_P)/(ε + SAD_N) is the SAD ratio of the current frame, and γ_P is the corresponding SAD ratio of the previous frame.
The selected (current) frame is classified as an abrupt scene change frame if the frame difference metric meets the criterion shown in Equation 9:
D = \frac{\gamma_C}{\gamma_P} + A\lambda(2\lambda + 1) \ge T_1 \quad [9]
where A is a constant chosen by the application and T_1 is a threshold.
In one example, simulations show that setting A = 1 and T_1 = 5 achieves good detection performance. If the current frame is an abrupt scene change frame, then γ_C should be large and γ_P should be small. The ratio γ_C/γ_P can be used instead of γ_C alone, so that the metric is normalized to the activity level of the context.
Note that the above criterion uses the luminance histogram difference (λ) in a nonlinear way. Figure 16 illustrates that λ(2λ + 1) is a convex function. When λ is small (e.g., close to zero), it contributes little pre-emphasis. The larger λ becomes, the more emphasis the function applies. With this pre-emphasis, an abrupt scene change is detected for any λ larger than 1.4 if the threshold T_1 is set to 5.
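Putting equations [6]-[9] together, a sketch of the abrupt-scene-change test might look like the following; the A = 1 and T_1 = 5 values are the simulation example above, the ε value is assumed, and everything else follows the equations directly:

    EPS = 1.0          # small positive constant of equation [6] (value assumed)
    A, T1 = 1.0, 5.0   # example constants from the simulation above

    def sad_ratio(sad_p: float, sad_n: float) -> float:
        """Contrast ratio gamma of equation [6]."""
        return (EPS + sad_p) / (EPS + sad_n)

    def hist_diff(n_prev: list, n_cur: list, n_blocks: int) -> float:
        """Lambda of equation [7]: 16-bin luma histogram difference."""
        return sum(abs(p - c) for p, c in zip(n_prev, n_cur)) / n_blocks

    def is_abrupt_scene_change(gamma_c: float, gamma_p: float, lam: float) -> bool:
        """Equation [9]: D = gamma_c/gamma_p + A*lam*(2*lam + 1) >= T1."""
        return gamma_c / gamma_p + A * lam * (2 * lam + 1) >= T1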
The current frame is determined to be a cross-fade or other slow scene change if the scene intensity metric D meets the criterion shown in Equation 10:
T_2 \le D < T_1 \quad [10]
for a certain number of consecutive frames, where T_1 is the same threshold used above and T_2 is another threshold.
A flash event usually causes the luminance histogram to shift to the brighter side. In this illustrative aspect, the luminance histogram statistics are used to determine whether the current frame includes camera flashes. The shot detection process can determine whether the luminance of the current frame is greater than the luminance of the previous frame by a certain threshold T_3, and whether the luminance of the current frame is greater than the luminance of the next frame by the threshold T_3, as shown in Equations 11 and 12:
Y_C - Y_P \ge T_3 \quad [11]
Y_C - Y_N \ge T_3 \quad [12]
If these criteria are not met, the current frame is not classified as including camera flashes. If the criteria are met, the shot detection process determines whether the backward difference metric SAD_P and the forward difference metric SAD_N are greater than a certain threshold T_4, as illustrated in the following equations:
\mathrm{SAD}_P \ge T_4 \quad [13]
\mathrm{SAD}_N \ge T_4 \quad [14]
where Y_C is the average luminance of the current frame, Y_P is the average luminance of the previous frame, Y_N is the average luminance of the next frame, and SAD_P and SAD_N are the forward and backward difference metrics associated with the current frame.
The shot detection process determines a camera flash event by first determining whether the luminance of the current frame is greater than the luminance of the previous frame and the luminance of the next frame. If it is not, the frame is not a camera flash event; if it is, it may be one. The shot detection process can then evaluate whether the backward difference metric and the forward difference metric are both greater than the threshold T_4. If both conditions are satisfied, the shot detection process classifies the current frame as having camera flashes. If the criteria are not met, the frame is not classified as any type of shot event, or it can be given a default classification that identifies the encoding to be performed on the frame (e.g., drop the frame, or encode as an I frame).
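The flash test of equations [11]-[14] reduces to four comparisons; a sketch:

    def is_camera_flash(y_c, y_p, y_n, sad_p, sad_n, t3, t4) -> bool:
        """Equations [11]-[14]: the current frame is classified as a camera
        flash when it is brighter than both neighbors by at least T3 and both
        motion-compensated difference metrics stay large (>= T4), i.e. the
        frame is a poor predictor of, and poorly predicted by, its neighbors."""
        brighter = (y_c - y_p >= t3) and (y_c - y_n >= t3)
        poor_match = (sad_p >= t4) and (sad_n >= t4)
        return brighter and poor_match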
Some exemplary values of T_1, T_2, T_3, and T_4 are shown above. Typically, these threshold values are selected through testing of a particular implementation of shot detection. In some aspects, one or more of the thresholds T_1, T_2, T_3, and T_4 are predetermined, and such values are incorporated into the shot classifier in the encoding device. In other aspects, one or more of the thresholds T_1, T_2, T_3, and T_4 can be set during processing (e.g., dynamically) based on information (e.g., metadata) supplied to the shot classifier, or based on information calculated by the shot classifier itself.
Using the shot detection information to encode video is usually performed in an encoder, but encoding the video is described here for completeness of the shot detection disclosure. Referring to Figure 30, an encoding process 301 can use the shot detection information to encode the video based on the detected shots in the sequence of frames. Process 301 proceeds to block 303 and checks whether the current frame is classified as an abrupt scene change. If so, at block 305 the current frame can be encoded as an I frame and a GOP boundary can be determined. If not, process 301 proceeds to block 307; if the current frame is classified as a portion of a slowly changing scene, then at block 309 the current frame and other frames in the slowly changing scene can be encoded as predictive frames (e.g., P frames or B frames). Process 301 then proceeds to block 311, where it checks whether the current frame is classified as a flashing scene comprising camera flashes. If so, at block 313 the frame can be identified for special processing (e.g., removal, or encoding of a DC coefficient for the frame); if not, no classification of the current frame was made, and the current frame is encoded in accordance with other criteria (e.g., encoded as an I frame, or dropped).
In the aspect described above, the amount of difference between the frame to be compressed and its two adjacent frames is indicated by the frame difference metric D. If a significant amount of one-way luminance change is detected, it signifies a cross-fade effect in the frame. The more prominent the cross-fade, the more gain can be achieved by using B frames. In some aspects, a modified frame difference metric is used, as shown in the following equation:
[Equation for the modified frame difference metric D_1; rendered as an image in the original document.]
where d_P = |Y_C − Y_P| and d_N = |Y_C − Y_N| are the luminance differences between the current frame and the previous frame, and between the current frame and the next frame, respectively; Δ represents a constant that can be determined in normal experimentation, since it can depend on the implementation; and α is a weighting variable having a value between 0 and 1.
B. Bandwidth Map Generation
The preprocessor 226 (Fig. 6) can also be configured to generate a bandwidth map that can be used for encoding the multimedia data. In some aspects, a content classification module 712 (Fig. 7) in the encoder 228 generates the bandwidth map instead.
Human visual quality V can be a function of both encoding complexity C and allocated bits B (also referred to as bandwidth). Figure 15 is a graph illustrating this relationship. Note that the encoding complexity metric C considers spatial and temporal frequencies from the human vision point of view. For distortions to which human eyes are more sensitive, the complexity value is correspondingly higher. It can typically be assumed that V is monotonically decreasing in C and monotonically increasing in B.
To achieve constant visual quality, a bandwidth (B_i) is assigned to the i-th object (frame or MB) to be encoded that satisfies the criteria expressed in the two equations immediately below:
B_i = B(C_i, V) \quad [16]

B = \sum_i B_i \quad [17]
In the two equations just listed, C_i is the encoding complexity of the i-th object, B is the total available bandwidth, and V is the achieved visual quality of an object. Human visual quality is difficult to formulate as an equation, so the above equation set is not precisely defined. However, if it is assumed that the 3-D model is continuous in all variables, the bandwidth ratio (B_i/B) can be treated as unchanged within the neighborhood of a (C, V) pair. The bandwidth ratio β_i is defined in the equation below:
\beta_i = B_i / B \quad [18]
The bit allocation can then be defined as expressed in the following equation:

\beta_i = \beta(C_i), \qquad 1 = \sum_i \beta_i \quad \text{for } (C_i, V) \in \delta(C_0, V_0) \quad [19]

where δ indicates the "neighborhood."
Encoding complexity is affected by human visual sensitivity both spatially and temporally. Girod's human vision model is an example of a model that can be used to define the spatial complexity. This model considers the local spatial frequency and ambient lighting. The resulting metric is called D_csat. At the preprocessing point in the process, whether a picture will be intra-coded or inter-coded is not yet known, so bandwidth ratios for both cases are generated. Bits are allocated according to the ratio between the β_INTRA values of the different video objects. For intra-coded pictures, the bandwidth ratio is expressed in the following equation:
\beta_{\mathrm{INTRA}} = \beta_{0\mathrm{INTRA}} \log_{10}\!\left(1 + \alpha_{\mathrm{INTRA}} Y^2 D_{\mathrm{csat}}\right) \quad [20]
In the above equation, Y is the average luminance component of the macroblock, α_INTRA is a weighting factor for the luminance-squared term and the D_csat term that follows it, and β_0INTRA is a normalization factor that guarantees 1 = Σ_i β_i. For example, a value of α_INTRA = 4 achieves good visual quality. Content information (e.g., a content classification) can be used to set α_INTRA to a value that corresponds to the desired visual quality level for the particular video content. In one example, if the video content comprises a "talking head" news broadcast, the visual quality level may be set lower, because the video frames or displayable portion may be deemed less important than the audio portion, and fewer bits can be allocated to encode the data. In another example, if the video content comprises a sporting event, the content information may be used to set α_INTRA to a value corresponding to a higher visual quality level, because the displayed images may be more important to the viewer, and more bits can be allocated to encode the data.
To understand this relationship, note that bandwidth is allocated logarithmically with encoding complexity. The luminance-squared term Y² reflects the fact that coefficients with larger magnitudes use more bits to encode. To prevent the logarithm from taking negative values, unity is added to the term in parentheses. Logarithms with other bases can also be used.
Temporal complexity is determined by a measure of the frame difference metric, which measures the difference between two consecutive frames, taking into account the amount of motion (e.g., motion vectors) together with a frame difference metric such as the sum of absolute differences (SAD).
Bit allocation for inter-coded pictures can consider spatial as well as temporal complexity. This is expressed below:
\beta_{\mathrm{INTER}} = \beta_{0\mathrm{INTER}} \log_{10}\!\left(1 + \alpha_{\mathrm{INTER}} \cdot \mathrm{SSD} \cdot D_{\mathrm{csat}} \exp\!\left(-\gamma \lVert MV_P + MV_N \rVert^2\right)\right) \quad [21]
In the above equation, MV_P and MV_N are the forward and the backward motion vectors for the current MB (see Figure 29). Note that Y² in the intra-coded bandwidth formula is replaced by the sum of squared differences (SSD). To understand the role of ‖MV_P + MV_N‖² in the above equation, note the following characteristic of the human visual system: areas undergoing smooth, predictable motion (small ‖MV_P + MV_N‖²) attract attention and can be tracked by the eye, and typically cannot tolerate any more distortion than stationary regions. However, areas undergoing fast or unpredictable motion (large ‖MV_P + MV_N‖²) cannot be tracked and can tolerate significant quantization. Experiments show that α_INTER = 1 and γ = 0.001 achieve good visual quality.
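A sketch of both bandwidth-ratio formulas follows, with the α and γ values quoted above; the β₀ normalization factors are left at 1 here on the assumption that the ratios are renormalized afterwards by dividing each by their sum:

    import math

    def beta_intra(y_mean, d_csat, alpha_intra=4.0, beta0_intra=1.0):
        """Equation [20]: intra-coded bandwidth ratio."""
        return beta0_intra * math.log10(1 + alpha_intra * y_mean ** 2 * d_csat)

    def beta_inter(ssd, d_csat, mv_p, mv_n, alpha_inter=1.0, gamma=0.001,
                   beta0_inter=1.0):
        """Equation [21]: inter-coded bandwidth ratio. Smooth, trackable
        motion (small ||MV_P + MV_N||^2) keeps the exponential near 1."""
        mv_norm2 = sum((a + b) ** 2 for a, b in zip(mv_p, mv_n))
        return beta0_inter * math.log10(
            1 + alpha_inter * ssd * d_csat * math.exp(-gamma * mv_norm2))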
C. Adaptive GOP Partitioning
In another illustrative example of processing that the preprocessor 226 can perform, the GOP partitioner 612 of Fig. 6 can also adaptively change the composition of the group of pictures coded together; this is discussed with reference to an example using MPEG2. Some older video compression standards (e.g., MPEG2) do not require that a GOP have a regular structure, although one can be imposed. An MPEG2 sequence always begins with an I frame, i.e., one that has been encoded without reference to previous pictures. The MPEG2 GOP format is usually prearranged at the encoder by fixing the spacing in the GOP of the P (predictive) pictures that follow the I frame. P frames are pictures that have been partially predicted from previous I or P pictures. The frames between the starting I frame and the succeeding P frames are encoded as B frames. A "B" frame (B stands for bidirectional) can use the previous and next I or P pictures, either individually or simultaneously, as references. The number of bits needed to encode an I frame on average exceeds the number of bits needed to encode a P frame; likewise, the number of bits needed to encode a P frame on average exceeds that required for a B frame. A skipped frame, if it is used, requires no bits for its representation.
The underlying concept of using P frames and B frames, and in more recent compression algorithms skipped frames, is to eliminate temporal redundancy so as to reduce the data rate needed to represent the video. When temporal redundancy is high (i.e., there is little change from picture to picture), the use of P, B, or skipped pictures efficiently represents the video stream, because I or P pictures decoded earlier are used later as references to decode other P or B pictures.
Adaptive GOP partitioning is based on using this concept adaptively. The differences between frames are quantified, and a decision to represent the picture by an I frame, a P frame, a B frame, or a skipped frame is made automatically after suitable tests are performed on the quantified differences. An adaptive structure has advantages not available in a fixed GOP structure. A fixed structure would ignore the possibility that little change in content has taken place; an adaptive procedure would allow far more B frames to be inserted between each I frame and P frame, or between two P frames, thereby reducing the number of bits needed to adequately represent the sequence of frames. Conversely, when the change in video content is significant, the efficiency of P frames is greatly reduced, because the difference between the predicted and the reference frames is too large. Under these conditions, matching objects may fall out of the motion search regions, or the similarity between matching objects is reduced due to distortion caused by changes in camera angle. At that point, the P frames, or the I frame and its adjacent P frame, should be chosen to be closer to each other, and fewer B frames should be inserted. A fixed GOP cannot make such adjustments.
In the system disclosed herein, these conditions are sensed automatically. The GOP structure is flexible and is made to adapt to these changes in content. The system evaluates a frame difference metric, which can be thought of as a measure of distance between frames, with the same additive properties of distance. Conceptually, given frames F_1, F_2, and F_3 having inter-frame distances d_12 and d_23, the distance between F_1 and F_3 is taken to be at least d_12 + d_23. Frame assignments are made on the basis of this distance-like metric.
The GOP partitioner operates by assigning picture types to frames as they are received. The picture type indicates the method of prediction that may be needed to code each block.
I pictures are coded without reference to other pictures. Since they stand alone, they provide access points in the data stream where decoding can begin. An I encoding type is assigned to a frame if the "distance" to its predecessor frame exceeds a scene change threshold.
P pictures can use the previous I or P pictures for motion-compensated prediction. They use blocks in the previous fields or frames, possibly displaced from the block being predicted, as a basis for encoding. After the reference block is subtracted from the block being considered, the residual block is encoded, typically using the discrete cosine transform to eliminate spatial redundancy. A P encoding type is assigned to a frame if the "distance" between it and the last frame assigned to be a P frame exceeds a second threshold, which is typically less than the first.
B frame pictures can use the previous and next P or I pictures for motion compensation, as described above. A block in a B picture can be forward, backward, or bidirectionally predicted, or it can be intra-coded without reference to other frames. In H.264, a reference block can be a linear combination of as many as 32 blocks from as many frames. If a frame cannot be assigned to be an I or P type, it is assigned to be a B type if the "distance" from it to its immediate predecessor is greater than a third threshold, which is typically less than the second threshold.
If a frame cannot be assigned to become a B frame, it is assigned "skip frame" status. This frame can be skipped because it is virtually a copy of the previous frame.
Evaluating a metric that quantifies the difference between adjacent frames in display order is the first part of this processing that takes place. This metric is the distance referred to above; with it, every frame is evaluated for its proper type. Thus, the spacing between an I frame and adjacent P frames, or between two successive P frames, can be variable. Computing the metric begins by processing the video frames with a block-based motion compensator, a block being the basic unit of video compression, usually composed of 16×16 pixels, though other block sizes such as 8×8, 4×4, and 8×16 are possible. For frames consisting of two deinterlaced fields, motion compensation can be done on a field basis, the search for reference blocks taking place in fields rather than frames. For a block in the first field of the current frame, a forward reference block is found in fields of the frame that follows it; likewise, a backward reference block is found in fields of the frame that immediately precedes the current field. The current blocks are assembled into a compensated field. The process continues with the second field of the frame. The two compensated fields are combined to form forward and backward compensated frames.
For frames created in the inverse telecine 606, the search for reference blocks is done on a frame basis only, since only reconstructed film frames are generated. Two reference blocks and two differences, forward and backward, are found, likewise producing forward and backward compensated frames. In general, the motion compensator produces motion vectors and difference metrics for every block; but a block is a portion of an NTSC field if the output of the deinterlacer 605 is being processed, and a portion of a film frame if the output of the inverse telecine is being processed. Note that the differences in the metric are evaluated between a block in the field or frame being considered and the block that best matches it, either in a preceding field or frame or in a field or frame that immediately follows it, depending on whether a forward or a backward difference is being evaluated. Only luminance values enter into this calculation.
The motion compensation step thus generates two sets of differences. These are between blocks of current luminance values and blocks of luminance values in reference blocks taken from the frames immediately before and immediately after the current frame in time. The absolute value of each forward and each backward difference is determined for each pixel, and each is separately summed over the entire frame. Both fields are included in the two summations when the deinterlaced NTSC fields that comprise a frame are processed. In this way, SAD_P and SAD_N, the total absolute values of the forward and backward differences, are found.
For every frame, a SAD ratio is calculated using the following relationship:
\gamma = \frac{\epsilon + \mathrm{SAD}_P}{\epsilon + \mathrm{SAD}_N} \quad [22]
where SAD_P and SAD_N are the total absolute values of the forward and backward differences, respectively. A small positive number ε is added to the numerator to prevent a "divide-by-zero" error. A similar ε term is added to the denominator, further reducing the sensitivity of γ when either SAD_P or SAD_N is close to zero.
In an alternative aspect, the difference can be the SSD (sum of squared differences), the SAD (sum of absolute differences), or the SATD, in which the blocks of pixel values are transformed by applying the two-dimensional discrete cosine transform to them before differences in the block elements are taken. The sums are evaluated over the area of active video, though a smaller area may be used in other aspects.
The luminance histogram of every frame as received (non-motion-compensated) is also computed. The histogram operates on the DC coefficient, i.e., the (0,0) coefficient, of the 16×16 array of coefficients that results from applying the two-dimensional discrete cosine transform to a block of luminance values, if it were available. Equivalently, the average of the 256 luminance values in the 16×16 block may be used in the histogram. For images whose luminance depth is eight bits, the number of bins is set at 16. The next metric evaluates the histogram difference:
\lambda = \frac{1}{N} \sum_{i=1}^{16} \lvert N_{Pi} - N_{Ci} \rvert \quad [23]
In the above equation, N_Pi is the number of blocks from the previous frame in the i-th bin, N_Ci is the number of blocks from the current frame that belong to the i-th bin, and N is the total number of blocks in a frame.
These intermediate results are assembled to form the current frame difference metric:
D = \frac{\gamma_C}{\gamma_P} + \lambda(2\lambda + 1) \quad [24]
where γ_C is the SAD ratio based on the current frame and γ_P is the SAD ratio based on the previous frame. If a scene has smooth motion and its luminance histogram barely changes, then D ≈ 1. If the current frame displays an abrupt scene change, γ_C will be large and γ_P should be small. The ratio γ_C/γ_P is used instead of γ_C alone so that the metric is normalized to the activity level of the context.
Figure 42 illustrates the process of assigning compression types to frames. D, the current frame difference defined in Equation 24, is the basis for decisions made with respect to frame assignments. As decision block 4202 indicates, if the frame under consideration is the first in the sequence, the decision path marked YES is followed to block 4206, thereby declaring the frame to be an I frame. The accumulated frame difference is set to zero at block 4208, and the process returns (at block 4210) to the start block. If the frame being considered is not the first frame in the sequence, the path marked NO is followed from block 4202, and in test block 4204 the current frame difference is tested against the scene change threshold. If the current frame difference is larger than that threshold, the decision path marked YES is followed to block 4206, again leading to the assignment of an I frame.
If the current frame difference is less than the scene change threshold, the NO path is followed to block 4212, where the current frame difference is added to the accumulated frame difference. Continuing through the flowchart at decision block 4214, the accumulated frame difference is compared with the threshold t, which is in general less than the scene change threshold. If the accumulated frame difference is larger than t, control transfers to block 4216 and the frame is assigned to be a P frame; the accumulated frame difference is then reset to zero in step 4218. If the accumulated frame difference is less than t, control transfers from block 4214 to block 4220. There the current frame difference is compared with τ, which is less than t. If the current frame difference is smaller than τ, the frame is assigned to be skipped in block 4222 and the process returns; if the current frame difference is larger than τ, the frame is assigned to be a B frame in block 4226.
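The Figure 42 flow can be summarized in a few lines. Threshold names follow the text (scene-change threshold > t > τ); whether the accumulator is also cleared on a scene-change I frame is an assumption of this sketch, since the flowchart description only states it explicitly for the first-frame path:

    def assign_frame_type(d_current, is_first, acc, scene_threshold, t, tau):
        """One step of the Figure 42 flow; returns (frame_type, new_accumulator).
        Assumes scene_threshold > t > tau."""
        if is_first or d_current > scene_threshold:
            return "I", 0.0                 # blocks 4206/4208: start a new GOP
        acc += d_current                    # block 4212
        if acc > t:
            return "P", 0.0                 # blocks 4216/4218
        if d_current < tau:
            return "skip", acc              # block 4222: near-copy of previous frame
        return "B", acc                     # block 4226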
Encoder
Referring back to Fig. 2, the transcoder 200 includes an encoder 228 that receives processed metadata and raw video from the preprocessor 226. The metadata can include any information originally received in the source video 104 and any information calculated by the preprocessor 226. The encoder 228 comprises a first-pass encoder 230, a second-pass encoder 232, and a re-encoder 234. The encoder 228 also receives input from the transcoder control 231, which can provide information from the second-pass encoder 232 (e.g., metadata, error resilience information, content information, encoded bit rate information, base-layer and enhancement-layer balance information, and quantization information) to the first-pass encoder 230, the re-encoder 234, and the preprocessor 226. The encoder 228 encodes the received video using content information received from the preprocessor 226 and/or content information generated by the encoder 228 itself (e.g., by a content classification module 712 (Fig. 7)).
Fig. 7 illustrates a block diagram of functional modules that can be included in an exemplary two-pass encoder, such as can be used for the encoder 228 illustrated in Fig. 2. Various aspects of those functional modules are shown in Fig. 7, but Fig. 7 and the description here do not necessarily address all the functionality that can be incorporated into an encoder. Accordingly, some aspects of those functional modules are described below, after the following discussion of base-layer and enhancement-layer encoding.
Base Layer and Enhancement Layer Encoding
The encoder 228 can be an SNR scalable encoder, which can encode the raw video and the metadata from the preprocessor 226 into a first group of encoded data (also referred to herein as a base layer) and one or more additional groups of encoded data (also referred to herein as enhancement layers). The encoding algorithm generates base-layer coefficients and enhancement-layer coefficients that, when both layers are available for decoding, can be combined at the decoder when decoded. When both layers are not available, the encoding of the base layer allows it to be decoded as a single layer.
One aspect of this multi-layer encoding process is described with reference to Figure 31. At block 321, an I frame is encoded with entirely intra-coded macroblocks (intra-coded MBs). In H.264, intra-coded MBs in an I frame are encoded with fully exploited spatial prediction, which provides a significant amount of coding gain. There are two sub-modes: intra 4×4 and intra 16×16. If the base layer is to take advantage of the coding gain provided by spatial prediction, then the base layer needs to be encoded and decoded before the enhancement layer is encoded and decoded. A two-pass encoding and decoding of I frames is used. In the base layer, a base-layer quantization parameter QP_b affords the transform coefficients a coarse quantization step size. The pixel-wise difference between the original frame and the reconstructed base-layer frame will be encoded at the enhancement layer. The enhancement layer uses a quantization parameter QP_e, which affords a finer quantization step size. An encoding means such as encoder 228 of Fig. 2 can perform the encoding at block 321.
At block 323, the encoder encodes base-layer data and enhancement-layer data for the P frames and/or B frames in the GOP being processed. An encoding means such as encoder 228 can perform the encoding at block 323. At block 325, the encoding process checks whether there are more P frames or B frames to encode. An encoding means such as SNR scalable encoder 228 can perform act 325. If more P or B frames remain, step 323 is repeated until all the frames in the GOP are finished being encoded. P frames and B frames are comprised of inter-coded macroblocks (inter-coded MBs), although there can be intra-coded MBs in P frames and B frames as well, as discussed below.
In order for a decoder to distinguish between base-layer and enhancement-layer data, the encoder 228 encodes overhead information (block 327). The types of overhead information include, for example, data identifying the number of layers, data identifying a layer as a base layer, data identifying a layer as an enhancement layer, data identifying inter-layer relationships (e.g., layer 2 is an enhancement layer for base layer 1, or layer 3 is an enhancement layer for enhancement layer 2), or data identifying a layer as the final enhancement layer in a string of enhancement layers. The overhead information can be contained in headers connected with the base-layer and/or enhancement-layer data to which it pertains, or contained in separate data messages. An encoding means such as encoder 228 of Fig. 2 can perform the process at block 327.
For single-layer decoding, the coefficients of the two layers must be combined before inverse quantization. Therefore, the coefficients of the two layers have to be generated interactively; otherwise, a significant amount of overhead could be introduced. The reason for the increase in overhead is that base-layer encoding and enhancement-layer encoding could use different temporal references. An algorithm is needed that generates base-layer and enhancement-layer coefficients that can be combined at the decoder before dequantization when both layers are available. At the same time, the algorithm should provide acceptable base-layer video when the enhancement layer is unavailable, or when the decoder decides not to decode the enhancement layer for reasons such as, for example, power savings. Details of an illustrative embodiment of such a process are discussed further below, after the following brief discussion of standard predictive encoding.
P frames (or any inter-coded sections) can exploit the temporal redundancy between a region in a current picture and a best-matching prediction region in a reference picture. The location of the best-matching prediction region in the reference frame can be encoded in a motion vector. The difference between the current region and the best-matching reference prediction region is known as the residual error (or prediction error).
Figure 32 is an illustration of an example of a P-frame construction process in, for example, MPEG-4. Process 331 is a more detailed illustration of an example process that can take place in block 323 of Figure 31. Process 331 includes a current picture 333 made up of 5×5 macroblocks, where the number of macroblocks in this example is arbitrary. A macroblock is made up of 16×16 pixels. Pixels can be defined by an 8-bit luminance value (Y) and two 8-bit chrominance values (Cr and Cb). In MPEG, the Y, Cr, and Cb components can be stored in a 4:2:0 format, in which the Cr and Cb components are downsampled by 2 in the X and Y directions. Hence, each macroblock would consist of 256 Y components, 64 Cr components, and 64 Cb components. Macroblock 335 of current picture 333 is predicted from a reference picture 337 at a different time point than current picture 333. A search is made in reference picture 337 to locate the best-matching macroblock 339 that is closest, in terms of Y, Cr, and Cb values, to current macroblock 335 being encoded. The location of best-matching macroblock 339 in reference picture 337 is encoded in motion vector 341. Reference picture 337 can be an I frame or a P frame that a decoder will have reconstructed prior to the construction of current picture 333. Best-matching macroblock 339 is subtracted from current macroblock 335 (a difference for each of the Y, Cr, and Cb components is calculated), resulting in residual error 343. Residual error 343 is encoded with a 2-D discrete cosine transform (DCT) 345 and then quantized 347. Quantization 347 can be performed to provide spatial compression by, for example, allotting fewer bits to the high-frequency coefficients while allotting more bits to the low-frequency coefficients. The quantized coefficients of residual error 343, together with motion vector 341 and reference picture 337 identifying information, are encoded information representing current macroblock 335. The encoded information can be stored in memory for future use, operated on for purposes such as error correction or image enhancement, or transmitted over network 349.
The encoded quantized coefficients of residual error 343, along with encoded motion vector 341, can be used to reconstruct current macroblock 335 in the encoder for use as part of a reference frame for subsequent motion estimation and compensation. The encoder can emulate the procedures of a decoder for this P-frame reconstruction. Emulating the decoder will result in both the encoder and the decoder working with the same reference picture. The reconstruction process is presented here, whether done in an encoder for further inter-coding or in a decoder. Reconstruction of a P frame can start after the reference frame (or the portion of a picture or frame being referenced) is reconstructed. The encoded quantized coefficients are dequantized 351, and then the 2-D inverse DCT, or IDCT, 353 is performed, resulting in a decoded or reconstructed residual error 355. Encoded motion vector 341 is decoded and used to locate the already-reconstructed best-matching macroblock 357 in the already-reconstructed reference picture 337. Reconstructed residual error 355 is then added to reconstructed best-matching macroblock 357 to form reconstructed macroblock 359. Reconstructed macroblock 359 can be stored in memory, displayed independently or in a picture with other reconstructed macroblocks, or processed further for image enhancement.
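A compact round trip of blocks 343-359 can be sketched as follows, using an orthonormal 2-D DCT and a plain scalar quantizer; MPEG-4's actual quantization matrices and entropy coding are omitted:

    import numpy as np

    def dct_mat(n: int = 8) -> np.ndarray:
        """Orthonormal DCT-II basis matrix (rows = frequencies)."""
        k = np.arange(n)
        m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        m[0] *= 1 / np.sqrt(2)
        return m * np.sqrt(2 / n)

    def encode_reconstruct_block(cur, best_match, qstep):
        """DCT -> quantize -> dequantize -> IDCT -> add back the
        motion-compensated prediction, mirroring blocks 343-359."""
        D = dct_mat(cur.shape[0])
        resid = cur.astype(float) - best_match      # residual error 343
        coeff = D @ resid @ D.T                     # forward DCT 345
        q = np.round(coeff / qstep)                 # quantization 347
        rec_resid = D.T @ (q * qstep) @ D           # dequantize 351 + IDCT 353
        return best_match + rec_resid               # reconstructed MB 359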
B frames (or any sections coded with bidirectional prediction) can exploit the temporal redundancy between a region in a current picture and a best-matching prediction region in a previous picture and a best-matching prediction region in a subsequent picture. The subsequent best-matching prediction region is combined with the previous best-matching prediction region to form a combined bidirectional prediction region. The difference between the current picture region and the best-matching combined bidirectional prediction region is the residual error (or prediction error). The locations of the best-matching prediction region in the subsequent reference picture and the best-matching prediction region in the previous reference picture can be encoded in two motion vectors.
Figure 33 illustrates an example of an encoder process for encoding base-layer and enhancement-layer coefficients that can be performed by the encoder 228. The base and enhancement layers are encoded to provide an SNR scalable bitstream. Figure 33 depicts an example of encoding inter-MB residual error coefficients, such as would be performed in step 323 of Fig. 31. However, similar methods can also be used to encode intra-MB coefficients. An encoding means such as encoder component 228 of Fig. 2 can perform the process illustrated in Figure 33 and step 323 of Figure 31. Original (to-be-encoded) video data 406 (in this example, the video data comprises luma and chroma information) is input to a base-layer best-matching macroblock loop 363 and an enhancement-layer best-matching macroblock loop 365. The purpose of both loops 363 and 365 is to minimize the residual errors calculated at adders 367 and 369, respectively. Loops 363 and 365 can be performed in parallel (as shown) or sequentially. Loops 363 and 365 include logic for searching buffers 371 and 373, respectively (which contain reference frames), to identify the best-matching macroblock that minimizes the residual error between the best-matching macroblock and the original data 361 (buffers 371 and 373 can be the same buffer). Because the base-layer loop 363 will generally utilize a coarser quantization step size (a higher QP value) than the enhancement-layer loop 365, the residual errors of loops 363 and 365 will differ. Transform blocks 375 and 377 transform the residual errors of each loop.
The transformed coefficients are then parsed into base layer and enhancement layer coefficients in a selector 379. The parsing in selector 379 can take several forms, as discussed below. One common feature of the parsing techniques is that the enhancement layer coefficient C′_enh is computed to be a differential refinement of the base layer coefficient C′_base. Computing the enhancement layer as a refinement of the base layer allows a decoder to decode the base layer coefficients by themselves and have a reasonable representation of the image, or to combine the base and enhancement layer coefficients and have a refined representation of the image. The coefficients selected by the selector 379 are then quantized by quantizers 381 and 383. The quantized base and enhancement layer coefficients (computed with quantizers 381 and 383, respectively) can be stored in memory or transmitted over a network to a decoder.
To match the macroblock reconstruction in the decoder, a dequantizer 385 dequantizes the base layer residual coefficients. The dequantized residual coefficients are inverse transformed 387 and added 389 to the best matching macroblock found in buffer 371, resulting in a reconstructed macroblock that matches the macroblock that will be reconstructed in the decoder. Quantizer 383, dequantizer 391, inverse transformer 393, adder 397 and buffer 373 perform calculations in the enhancement loop 365 similar to those performed in the base layer loop 363. In addition, the adder 397 is used to combine the dequantized enhancement layer coefficients and the base layer coefficients used in the enhancement layer reconstruction. The enhancement layer quantizer and dequantizer will generally utilize a finer quantizer step size (a lower QP) than the base layer.
FIGS. 34, 35 and 36 show examples of base layer and enhancement layer coefficient selector processes that can be employed in the selector 379 of FIG. 33. Selecting means such as the encoder 228 of FIG. 2 can perform the processes depicted in FIGS. 34, 35 and 36. Using FIG. 34 as an example, the transformed coefficients are parsed into base and enhancement layer coefficients as shown in the following equations:
C′_base = min(C_base, C_enh)   [25]

C′_enh = C_enh - Q_b^-1(Q_b(C′_base))   [26]

where the "min" function can be either a mathematical minimum or a minimum in magnitude of the two arguments. In FIG. 34, equation 25 is depicted as block 401 and equation 26 is depicted as adder 510. In equation 26, Q_b stands for the base layer quantizer 381, and Q_b^-1 stands for the base layer dequantizer 385. Equation 26 converts the enhancement layer coefficient into a differential refinement of the base layer coefficient computed with equation 25.
FIG. 35 is an illustration of another example of a base layer and enhancement layer coefficient selector 379. In this example, the equation contained in block 405 can be expressed as:

C′_base = min(|C_base|, |C_enh|)*sgn(C_enh) if C_base*C_enh > 0; otherwise C′_base = 0   [27]
The adder 407 computes the enhancement layer coefficients as shown in the following equation:

C′_enh = C_enh - Q_b^-1(Q_b(C′_base))   [28]

where C′_base is given by equation 27.
FIG. 36 is an illustration of another example of a base layer and enhancement layer selector 379. In this example, the base layer coefficients are left unmodified, and the enhancement layer is equal to the difference between the quantized/dequantized base layer coefficient and the original enhancement layer coefficient.
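The three selector strategies of FIGS. 34 through 36 (equations 25 through 28 and the FIG. 36 variant) can be summarized in the following sketch, assuming a plain uniform quantizer of step size qb as a stand-in for Q_b (the text does not specify the quantizer):

```python
import numpy as np

def q(c, qb=16.0):
    """Uniform base-layer quantizer Q_b (an assumed stand-in)."""
    return np.round(c / qb)

def q_inv(level, qb=16.0):
    """Base-layer dequantizer Q_b^-1."""
    return level * qb

def select_fig34(c_base, c_enh):
    # Eq. 25: base coefficient is the (magnitude) minimum of the two.
    c_base_p = min(c_base, c_enh, key=abs)
    # Eq. 26: enhancement is a differential refinement of the base.
    return c_base_p, c_enh - q_inv(q(c_base_p))

def select_fig35(c_base, c_enh):
    # Eq. 27: keep a base coefficient only when the two layers agree in sign.
    if c_base * c_enh > 0:
        c_base_p = float(np.sign(c_enh)) * min(abs(c_base), abs(c_enh))
    else:
        c_base_p = 0.0
    # Eq. 28: same differential refinement as eq. 26.
    return c_base_p, c_enh - q_inv(q(c_base_p))

def select_fig36(c_base, c_enh):
    # Base layer unchanged; enhancement refines the quantization error.
    return c_base, c_enh - q_inv(q(c_base))

for sel in (select_fig34, select_fig35, select_fig36):
    print(sel.__name__, sel(-25.0, 40.0))
```

The sign-agreement test in the FIG. 35 variant keeps the base layer from spending bits on energy that the enhancement layer would immediately have to cancel.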
In addition to the base layer and enhancement layer residual coefficients, a decoder needs information identifying how MBs are encoded. Encoding means such as the encoder component 228 of FIG. 2 can encode overhead information that can include a map of intra-coded and inter-coded portions, such as an MB map in which macroblocks (or sub-macroblocks) are identified as being intra-coded or inter-coded (also identifying which type of inter-coding, for example forward, backward or bidirectional), together with which frame(s) the inter-coded portions reference. In an exemplary aspect, the MB map and the base layer coefficients are encoded in the base layer, and the enhancement layer coefficients are encoded in the enhancement layer.
P frames and B frames can contain both intra-coded MBs and inter MBs. Hybrid video encoders commonly use rate-distortion (R-D) optimization to decide to encode certain macroblocks in P or B frames as intra-coded MBs. To enable single layer decoding, in which intra-coded MBs do not depend on enhancement layer inter MBs, neighboring inter MBs are not used for the spatial prediction of base layer intra-coded MBs. To keep the computational complexity unchanged for enhancement layer decoding, refinement at the enhancement layer can be skipped for intra-coded MBs in base layer P or B frames.
Intra-coded MBs in P or B frames require many more bits than inter MBs. For this reason, intra-coded MBs in P or B frames could be encoded only at base layer quality, at a higher QP. This will introduce some deterioration in video quality, but this deterioration should be unnoticeable if it is refined in a later frame with the inter MB coefficients in the base and enhancement layers, as discussed above. Two reasons make this deterioration unnoticeable. The first is a feature of the human visual system (HVS), and the second is that inter MBs refine intra MBs. With objects that change position from a first frame to a second frame, some pixels in the first frame are invisible in the second frame (to-be-covered information), and some pixels in the second frame are visible for the first time (uncovered information). Human eyes are not sensitive to the uncovered and to-be-covered visual information, so for the uncovered information, even though it is encoded at a lower quality, the eyes may not be able to tell the difference. If the same information remains in the following P frame, there will most likely be a refinement at the enhancement layer of the following P frame, because the enhancement layer has a lower QP.
Another common technique that introduces intra-coded MBs in P or B frames is called intra refresh. In this case, some MBs are coded as intra-coded MBs even though standard R-D optimization would dictate that they should be inter-coded MBs. These intra-coded MBs (contained in the base layer) can be encoded with either QP_b or QP_e. If QP_e is used for the base layer, then no refinement is needed at the enhancement layer. If QP_b is used for the base layer, refinement may be needed, since otherwise the drop in quality at the enhancement layer would be more noticeable. Since inter-coding is more efficient than intra-coding in the sense of coding efficiency, these refinements at the enhancement layer will be inter-coded. In this way, the base layer coefficients will not be used for the enhancement layer, and the quality is improved at the enhancement layer without introducing new operations.
Because B frames offer higher compression quality, they are commonly used in the enhancement layer. However, a B frame may have to reference the intra-coded MBs of a P frame. If the pixels of the B frame were to be encoded at enhancement layer quality, too many bits might be required because of the lower quality of the P frame intra-coded MBs, as discussed above. By exploiting the qualities of the HVS, as discussed above, B frame MBs can be encoded at a lower quality when they reference the lower-quality intra-coded MBs of a P frame.
One extreme case of intra-coded MBs in P or B frames arises when all the MBs in a P or B frame are coded in intra mode because of a scene change in the video being encoded. In this case, the whole frame can be coded at base layer quality with no refinement at the enhancement layer. If a scene change occurs at a B frame, and assuming that B frames are encoded only in the enhancement layer, the B frame can be coded at base layer quality or simply dropped. If a scene change occurs at a P frame, no changes may be needed, but the P frame can be dropped or coded at base layer quality. Scalable layer coding with two layer encoding and single layer decoding is further described in [attorney docket/reference number 050078] U.S. patent application entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING".
First-pass portion of the encoder
FIG. 7 shows an illustrative example of the encoder 228 of FIG. 2. The blocks shown illustrate various encoder processes that can be included in the encoder 228. In this example, the encoder 228 comprises a first-pass portion 702 above demarcation line 704 and a second-pass portion 706 below line 704 (including the functionality of the second-pass encoder 232 and the re-encoder 234 of FIG. 2).
The encoder 228 receives metadata and raw video from the preprocessor 226. The metadata can include any metadata received or computed by the preprocessor 226, including metadata related to the content information of the video. The first-pass portion 702 of the encoder 228 illustrates example processes that can be included in first-pass encoding 702, described below in terms of their functionality. As will be appreciated by those skilled in the art, such functionality can be embodied in various forms (for example, hardware, software, firmware, or a combination thereof).
FIG. 7 illustrates an adaptive intra refresh (AIR) module. The AIR module 710 provides input to an I-frame instantiation module 708, which instantiates I frames based on the metadata. The first-pass portion 702 can also include a content classification module 712, which is configured to receive the metadata and video and to determine content information related to the video. The content information can be provided to a rate control bit allocation module 714, which also receives the metadata and video. The rate control bit allocation module 714 determines rate control bit information and provides it to a mode decision module 715. The content information and video can be provided to an intra-model (distortion) module 716, which provides intra-coding distortion information to the mode decision module 715 and to a base and enhancement layer scalability rate-distortion module 718. The video and metadata are provided to a motion estimation (distortion) module 720, which provides inter-coding distortion information to the base and enhancement layer scalability rate-distortion module 718. The base and enhancement layer scalability rate-distortion module 718 uses the distortion estimates from the motion estimation module 720 and the intra-model distortion module 716 to determine scalability rate-distortion information, which is provided to the mode decision module 715. The mode decision module 715 also receives input from a slice/MB ordering module 722. The slice/MB ordering module 722 receives input from an error resilience module 740 (shown in the second-pass portion 706) and provides the mode decision module 715 with information on aligning independently encodable portions (slices) of video with access unit boundaries for error resilience. The mode decision module 715 determines encoding mode information based on its inputs and provides the "best" encoding mode to the second-pass portion 706. Further illustrative explanations of some examples of this first-pass portion 702 are described below.
As stated above, the content classification module 712 receives metadata and raw video supplied by the preprocessor 226. In some examples, the preprocessor 226 computes content information from the multimedia data and provides the content information to the content classification module 712 (for example, in the form of metadata), which can use it to determine a content classification of the multimedia data. In some other aspects, the content classification module 712 is configured to determine various content information from the multimedia data itself, and can also be configured to determine the content classification.
The content classification module 712 can be configured to determine different content classifications for video having different types of content. A different content classification can result in different parameters being used in aspects of encoding the multimedia data, for example, determining a bit rate (e.g., bit allocation) for determining quantization parameters, motion estimation, scalability, error resilience, maintaining optimal quality of the multimedia data over a channel, and fast channel switching schemes (for example, periodically forcing I frames to allow fast channel switching). According to one example, the encoder 228 is configured to determine rate-distortion (R-D) optimization and bit rate allocation based on the content classification. Determining a content classification allows the multimedia data to be compressed, based on the content classification, to a given quality level corresponding to a desired bit rate. Also, by classifying the content of the multimedia data (for example, determining a content classification based on the human visual system), the resulting perceived quality of the communicated multimedia data on a display of a receiving device is made dependent on the video content.
As an example of the procedure the content classification module 712 goes through to classify content, FIG. 9 shows a process 900 describing example operations of the classification module 712. As shown, process 900 begins at input block 902, where the content classification module 712 receives raw multimedia data and metadata. Process 900 then proceeds to block 904, where the content classification module 712 determines spatial information and temporal information of the multimedia data. In some aspects, the spatial and temporal information is determined by spatial and temporal masking (for example, filtering). The spatial and temporal information can be determined based on metadata that includes scene change data and motion vector (MV) smoothing. Process 900 then proceeds to block 912, which performs spatial complexity, temporal complexity and sensitivity estimation. Process 900 then proceeds to block 916, where the content of the multimedia data is classified based on the results of the spatial, temporal and sensitivity data determined in blocks 904 and 912. Also at block 916, a particular rate-distortion (R-D) curve can be selected and/or R-D curve data can be updated. Process 900 then proceeds to output block 918, where the output can include a complexity-distortion map or value indicating spatial and temporal activity (for example, a content classification), and/or the selected R-D curve. Referring back to FIG. 7, the content classification module 712 provides its output to the rate control bit allocation module 714 and the intra-model (distortion) module 716, and also to the I-frame instantiation module 708 (discussed above).
Content information
The content classification module 712 can be configured to compute various content information from the multimedia data, including a variety of content-related metrics, such as spatial complexity, temporal complexity, contrast ratio, standard deviation, and frame difference metrics, as described further below.
The content classification module 712 can be configured to determine the spatial complexity and the temporal complexity of the multimedia data, and also to associate a texture value with the spatial complexity and a motion value with the temporal complexity. The content classification module 712 receives preprocessed content information related to the content of the multimedia data being encoded from the preprocessor 226; alternatively, the preprocessor 226 can be configured to compute the content information. As described above, the content information can include, for example, one or more D_csat values, contrast ratios, motion vectors (MVs) and sums of absolute differences (SADs).
In general, multimedia data includes one or more sequences of images, or frames. Each frame can be decomposed into blocks of pixels for processing. Spatial complexity is a broad term that generally describes a measure of the level of spatial detail within a frame. Scenes with mainly plain or constant or low-changing areas of luminance and chrominance have low spatial complexity. Spatial complexity is associated with the texture of the video data. In this aspect, spatial complexity is based on a human visual sensitivity metric called D_csat, which is computed for each block as a function of local spatial frequency and ambient lighting. Those of ordinary skill in the art are aware of techniques for using spatial frequency patterns and the lighting and contrast characteristics of visual images to exploit the human visual system. A number of sensitivity metrics are known for exploiting the perspective limitations of the human visual system and could be used with the methods described herein.
Temporal complexity is a broad term generally used to describe a measure of the level of motion in the multimedia data, as referenced between frames in a sequence of frames. Scenes (for example, sequences of frames of video data) with little or no motion have low temporal complexity. Temporal complexity can be computed for each macroblock, and can be based on the D_csat value, the motion vectors, and the sum of absolute pixel differences between one frame and another frame (for example, a reference frame).
The frame difference metric gives a measure of the difference between two consecutive frames by considering the amount of motion (for example, motion vectors or MVs) together with the residual energy expressed as the sum of absolute differences (SAD) between a predictor and the current macroblock. The frame difference also provides a measure of bidirectional or unidirectional prediction efficiency.
An example of a frame difference metric, based on motion information received from a preprocessor that potentially performs motion-compensated deinterlacing, is as follows. The deinterlacer performs bidirectional motion estimation, so bidirectional motion vector and SAD information are available. A frame difference, represented by SAD_MV for each macroblock, can be derived as follows:
SAD_MV = log10[SAD*exp(-min(1, MV))]   [29]

where MV = sqrt(MV_x^2 + MV_y^2), and SAD = min(SAD_N, SAD_P), where SAD_N is the SAD computed from the backward reference frame and SAD_P is the SAD computed from the forward reference frame.
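Equation 29 can be sketched directly, assuming the per-macroblock SAD values and the bidirectional motion vector are already available from the deinterlacer:

```python
import math

def sad_mv(sad_n, sad_p, mv_x, mv_y):
    """Frame difference metric of eq. 29 for one macroblock.

    sad_n / sad_p: SADs against the backward / forward reference frames
    (assumed strictly positive, since eq. 29 takes a logarithm).
    """
    mv = math.sqrt(mv_x ** 2 + mv_y ** 2)
    sad = min(sad_n, sad_p)               # better of the two predictions
    return math.log10(sad * math.exp(-min(1.0, mv)))

# Toy usage: motion discounts the SAD term through exp(-min(1, MV)).
print(sad_mv(500.0, 600.0, 3.0, 4.0))     # moving block
print(sad_mv(500.0, 600.0, 0.0, 0.0))     # static block scores higher
```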
Another method of estimating the frame difference was described above with reference to equations 6 through 8. The SAD ratio (or contrast ratio) γ can be computed as previously described in equation 6. A luminance histogram of each frame can also be determined, with the histogram difference λ calculated using equation 7. The frame difference metric D can be computed as shown in equation 8.
In one exemplary embodiment, the contrast ratio and the frame difference metric are utilized in the following manner to obtain a video content classification that can reliably predict the features in a given video sequence. Although described here as occurring in the encoder 228, a preprocessor 226 can also be configured to determine the content classification (or other content information) and pass the content classification to the encoder 228 via metadata. The process described in the following example classifies the content into eight possible classes, similar to the classification obtained from R-D curve based analysis. Depending on the scene complexity and the number of scene changes in each superframe, the classification process outputs a value in the range between 0 and 1 for each superframe. The content classification module in the preprocessor can execute the following steps (1) to (5) for each superframe to obtain a content classification metric from the frame contrast and frame difference values (a sketch of these steps in code follows the list).
1. Calculate the frame contrast mean and frame contrast deviation from the macroblock contrast values.
2. Normalize the frame contrast and frame difference values using values obtained from simulation, which are 40 and 5, respectively.
3. Compute the content classification metric using, for example, the following generalized equation:
CCMetric=CCW1*I_Frame_Contrast_Mean+CCW2*Frame_Difference_Mean-CCW3*I_Contrast_Deviation^2*exp(CCW4*Frame_Difference_Deviation^2) [30]
where CCW1, CCW2, CCW3 and CCW4 are weighting factors. In this example, the values are chosen as: CCW1 = 0.2, CCW2 = 0.9, CCW3 = 0.1, and CCW4 = -0.00009.
4. Determine the number of scene changes in the superframe. In general, a superframe refers to a group of pictures or frames that can be displayed in a particular time period. Typically, that time period is one second. In some aspects, a superframe comprises 30 frames (for 30 fps video); in others, a superframe comprises 24 frames (for 24 fps video). Depending on the number of scene changes, one of the following cases is executed:
(a) No scene changes: when there are no scene changes in the superframe, the metric depends entirely on the frame difference values, as shown in the following equation:
CCMetric=(CCW2+(CCW1/2))*Frame_Difference_Mean-(CCW3-(CCW1/2))*1*exp(-CCW4*Frame_Difference_Deviation^2) [31]
(b) A single scene change: when a single scene change is observed in the superframe, the default equation is used to compute the metric, as follows:
CCMetric=CCW1*I_Frame_Contrast_Mean+CCW2*Frame_Difference_Mean-CCW3*I_Contrast_Deviation^2*exp(CCW4*Frame_Difference_Deviation^2) [32]
(c) Two scene changes: when it is observed that there are at most two scene changes in a given superframe, more weight is given to the last I frame than to the first, since the first would in any case be refreshed quickly by the later one, as shown in the following equation:
CCMetric=0.1*I_Frame_Contrast_Mean1+CCW1*I_Frame_Contrast_Mean2+(CCW2-0.1)*Frame_Difference_Mean-CCW3*I_Contrast_Deviation1^2*I_Contrast_Deviation2^2*exp(CCW4*Frame_Difference_Deviation^2) [33]
(d) Three or more scene changes: if the given superframe is observed to have more than three I frames (say N), the last I frame is given more weight and all the other I frames are given a weight of 0.05, as shown in the following equation:
CCMetric = 0.05*I_Frame_Contrast_Mean(1..N-1) + CCW1*I_Frame_Contrast_Mean(N) + (CCW2-(0.05*(N-1)))*Frame_Difference_Mean - CCW3*I_Contrast_Deviation(N)^2*I_Contrast_Deviation(1..N-1)^2*exp(CCW4*Frame_Difference_Deviation^2)   [34]
5. A correction can be applied to the metric for low-motion scenes when the frame difference mean is less than 0.05: an offset of 0.33 (CCOFFSET) is added to CCMetric.
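Steps 1 through 5 can be sketched as follows, with two stated assumptions about the flattened notation of equation 34: the 0.05-weighted term is read as a sum over the first N-1 I-frame contrast means, and the deviation term as a product over the first N-1 deviations; both readings are interpretations rather than explicit in the text:

```python
import math

CCW1, CCW2, CCW3, CCW4 = 0.2, 0.9, 0.1, -0.00009
CCOFFSET = 0.33

def cc_metric(i_contrast_means, i_contrast_devs, fd_mean, fd_dev):
    """Content classification metric for one superframe (eqs. 30-34).

    i_contrast_means/devs: one entry per I frame (scene change) in the
    superframe, already normalized (by 40); fd_mean/fd_dev are the
    normalized (by 5) frame difference mean and deviation.
    """
    n = len(i_contrast_means)
    if n == 0:    # eq. 31: no scene change
        cc = ((CCW2 + CCW1 / 2) * fd_mean
              - (CCW3 - CCW1 / 2) * 1 * math.exp(-CCW4 * fd_dev ** 2))
    elif n == 1:  # eq. 32: single scene change (default form, eq. 30)
        cc = (CCW1 * i_contrast_means[0] + CCW2 * fd_mean
              - CCW3 * i_contrast_devs[0] ** 2 * math.exp(CCW4 * fd_dev ** 2))
    elif n == 2:  # eq. 33: weight the later I frame more
        cc = (0.1 * i_contrast_means[0] + CCW1 * i_contrast_means[1]
              + (CCW2 - 0.1) * fd_mean
              - CCW3 * i_contrast_devs[0] ** 2 * i_contrast_devs[1] ** 2
              * math.exp(CCW4 * fd_dev ** 2))
    else:         # eq. 34: N >= 3 scene changes
        cc = (0.05 * sum(i_contrast_means[:-1])
              + CCW1 * i_contrast_means[-1]
              + (CCW2 - 0.05 * (n - 1)) * fd_mean
              - CCW3 * i_contrast_devs[-1] ** 2
              * math.prod(d ** 2 for d in i_contrast_devs[:-1])
              * math.exp(CCW4 * fd_dev ** 2))
    if fd_mean < 0.05:  # step 5: low-motion correction
        cc += CCOFFSET
    return cc

print(cc_metric([0.6], [0.1], 0.3, 0.2))
```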
The content classification module 712 uses the D_csat values, motion vectors and/or sums of absolute differences to determine a value indicating the spatial complexity of a macroblock (or a designated amount of video data). The temporal complexity is determined by a measure of the frame difference metric (which considers the amount of motion, via motion vectors, together with the difference between two consecutive frames and the sum of absolute differences between the frames).
In some aspects, the content classification module 712 can be configured to generate a bandwidth map. For example, if the preprocessor 226 does not generate a bandwidth map, bandwidth map generation can be performed by the content classification module 712.
Determining texture and motion values
For each macroblock in the multimedia data, the content classification module 712 associates a texture value with the spatial complexity and a motion value with the temporal complexity. The texture value relates to the luminance values of the multimedia data, with a low texture value indicating small changes in the luminance values of neighboring pixels of the data, and a high texture value indicating large changes in the luminance values of neighboring pixels of the data. Once the texture and motion values are computed, the content classification module 712 determines a content classification by considering both the motion and texture information. The content classification module 712 associates the texture of the video data being classified with a relative texture value, for example "low" texture, "medium" texture, or "high" texture, which generally indicates the complexity of the luminance values of the macroblocks. Also, the content classification module 712 associates the motion value computed for the video data being classified with a relative motion value, for example "low" motion, "medium" motion, or "high" motion, which generally indicates the amount of motion of the macroblocks. In alternative aspects, fewer or more categories for motion and texture can be used. The content classification metric is then determined by considering the associated texture and motion values. Further illustrative aspects of content classification are disclosed in co-pending U.S. patent application Ser. No. 11/373,577, entitled "CONTENT CLASSIFICATION FOR MULTIMEDIA PROCESSING", filed March 10, 2006, assigned to the assignee hereof and expressly incorporated herein by reference.
FIG. 8 illustrates an example of a classification chart that shows how texture and motion values are associated with a content classification. A person of ordinary skill in the art is familiar with many ways of implementing such a classification chart (for example, as a lookup table or a database). The classification chart is generated based on predetermined evaluations of video data content. To determine a video data classification, a texture value of "low", "medium", or "high" (on the "x axis") is cross-referenced with a motion value of "low", "medium", or "high" (on the "y axis"). The content classification indicated in the intersecting block is assigned to the video data. For example, a texture value of "high" and a motion value of "medium" results in a classification of seven (7). FIG. 8 illustrates the various combinations of relative texture and motion values that are associated with the eight different content classifications in this example. In some other aspects, more or fewer classifications can be used. Further illustrative aspects of content classification are disclosed in co-pending U.S. patent application Ser. No. 11/373,577, entitled "CONTENT CLASSIFICATION FOR MULTIMEDIA PROCESSING", filed March 10, 2006, assigned to the assignee hereof and expressly incorporated herein by reference.
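One way such a classification chart could be implemented as a lookup table is sketched below; only the cell mapping "high" texture and "medium" motion to class 7 is stated in the text, so every other entry is a hypothetical placeholder:

```python
# Hypothetical 3x3 lookup table in the spirit of FIG. 8. Only the
# ("high", "medium") -> 7 cell is given by the text; all other entries
# are made-up placeholders (eight classes spread over nine cells).
CLASSIFICATION_CHART = {
    ("low",    "low"):    1, ("medium", "low"):    2, ("high", "low"):    3,
    ("low",    "medium"): 4, ("medium", "medium"): 5, ("high", "medium"): 7,
    ("low",    "high"):   6, ("medium", "high"):   8, ("high", "high"):   8,
}

def classify(texture: str, motion: str) -> int:
    """Map relative texture/motion values to a content classification."""
    return CLASSIFICATION_CHART[(texture, motion)]

assert classify("high", "medium") == 7
```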
Rate control bit allocation
As described herein, the multimedia data content classification can be used in encoding algorithms to effectively improve bit management while maintaining a constant perceived quality of the video. For example, the classification metric can be used in algorithms for scene change detection, encoding bit rate allocation control, and frame rate up-conversion (FRUC). Compressor/decompressor (codec) systems and digital signal processing algorithms are commonly used in video data communications and can be configured to conserve bandwidth, but there is a trade-off between quality and bandwidth conservation. The best codecs provide the most bandwidth conservation while producing the least degradation of video quality.
In one illustrative example, the rate control bit allocation module 714 uses the content classification to determine a bit rate (for example, the number of bits allocated for encoding the multimedia data) and stores the bit rate in memory for use by other processes and components of the encoder 228. A bit rate determined from the classification of the video data can help conserve bandwidth while providing the multimedia data at a consistent quality level. In one aspect, a different bit rate can be associated with each of the eight different content classifications, and those bit rates are then used to encode the multimedia data. The resulting effect is that, although the different content classifications of multimedia data are allocated different numbers of bits for encoding, the perceived quality is similar or consistent when viewed on a display.
In general, multimedia data with a higher content classification is indicative of a higher level of motion and/or texture, and is allocated more bits when encoded. Multimedia data with a lower classification (indicative of less texture and motion) is allocated fewer bits. For multimedia data of a particular content classification, the bit rate can be determined based on a target perceived quality level selected for viewing the multimedia data. Determining multimedia data quality can be done by humans viewing and grading the multimedia data. In some alternative aspects, estimates of the multimedia data quality can be made by automatic test systems using, for example, signal-to-noise ratio (SNR) algorithms. In one aspect, a set of standard quality levels (for example, five) and the corresponding bit rate needed to achieve each particular quality level are predetermined for multimedia data of each content classification. To determine a set of quality levels, multimedia data of a particular content classification can be evaluated by generating a Mean Opinion Score (MOS), which provides a numerical indication of the visually perceived quality of the multimedia data when it is encoded using a certain bit rate. The MOS can be expressed as a single number in the range of 1 to 5, where 1 is the lowest perceived quality and 5 is the highest perceived quality. In other aspects, the MOS can have more than five or fewer than five quality levels, and different descriptions of each quality level can be used.
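A sketch of this per-class allocation follows; the class-to-bit-rate table and the linear scaling toward a target MOS are illustrative assumptions, since the text specifies neither the rates nor the calibrated MOS/bit-rate tables:

```python
# Hypothetical bit rates (kbps) per content classification. The text
# associates one rate with each of the eight classes but does not give
# the numbers; these are placeholders from a made-up calibration.
CLASS_BITRATE_KBPS = {1: 96, 2: 128, 3: 160, 4: 192,
                      5: 224, 6: 256, 7: 288, 8: 320}

def allocate_bitrate(content_class: int, target_mos: float = 4.0,
                     mos_at_nominal: float = 4.0) -> float:
    """Scale the nominal per-class rate toward a target MOS level.

    Assumes rate scales roughly linearly with the desired MOS around the
    calibration point, standing in for the predetermined MOS/bit-rate
    relationship described in the text.
    """
    return CLASS_BITRATE_KBPS[content_class] * (target_mos / mos_at_nominal)

print(allocate_bitrate(7))         # high motion/texture class: more bits
print(allocate_bitrate(1, 3.5))    # low class at a lower target quality
```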
With knowledge of the relationship between the visually perceived quality level and the bit rate for multimedia data of a particular content classification, a bit rate can be determined by selecting a target (for example, desired) quality level. The target quality level used to determine the bit rate can be preselected, selected by a user, selected through an automatic process or a semi-automatic process requiring input from a user or from another process, or selected dynamically by the encoding device or system based on predetermined criteria. A target quality level can be selected based on, for example, the type of encoding application or the type of client device that will be receiving the multimedia data.
In the example illustrated in FIG. 7, the rate control bit allocation module 714 receives data from the content classification module 712 and metadata directly from the preprocessor 226. The rate control bit allocation module 714 resides in the first-pass portion of the encoder 228, and a rate control fine-tuning module 738 resides in the second-pass portion 706. This two-pass rate control aspect is configured such that the first pass (the rate control bit allocation module 714) performs context-adaptive bit allocation by looking ahead one superframe (for example, targeting a long-term average bit rate of 256 kbps) and limits the peak rate, and the second pass (the rate control fine-tuning module 738) refines the first-pass results to obtain two-layer scalability and performs rate adaptation. The rate control operates at four levels: (1) the GOP level, which controls the bit allocation among I, P, B and F frames so that the distribution within the GOP can be non-uniform; (2) the superframe level, which controls a hard limit on the maximum superframe size; (3) the frame level, which controls the bit requirements according to the spatial and temporal complexity of the multimedia data frames, based on the content information (for example, a content classification); and (4) the macroblock level, which controls the bit allocation of macroblocks based on spatial and temporal complexity maps, also based on the content information (for example, a content classification).
An exemplary flowchart of the operation of the rate control module 714 is illustrated in FIG. 10. As shown in FIG. 10, process 1000 begins at input block 1002. The rate control module 714 receives a variety of inputs, not all of which are necessarily illustrated in FIG. 7. For example, the input information can include metadata from the preprocessor 226, a target bit rate, an encoder buffer size (or, equivalently, the maximum delay time for rate control), an initial rate control delay, and frame rate information. Further input information can include group-of-pictures (GOP) level inputs, including, for example, the maximum superframe size, the length of the GOP and the P/B frame distribution (including scene change information), required base and enhancement layer arrangements, and a complexity-distortion metric for the pictures in the next 30 frames of the GOP. Other input information includes picture level inputs, including the complexity-distortion map for the current picture (received from the content classification module 712), the quantization parameters (QPs), and the bit breakdown of the past 30 frames (fitted over a sliding window). Finally, the macroblock (MB) level input information includes, for example, the mean absolute difference (MAD) of the co-located macroblocks (MBs) in the reference picture, and the coded block patterns (CBPs) of macroblocks after quantization (whether skipped or not).
After the inputs at block 1002, process 1000 proceeds to block 1004 for initialization of the encoding stream. At the same time, a buffer initialization 1006 is performed. Next, as shown in block 1008, a GOP is initialized, with the GOP bit allocation 1010 received as part of the initialization. After the GOP initialization, flow proceeds to block 1012, where a slice is initialized. This initialization includes updating the header bits, as shown in block 1014. After the initializations of blocks 1004, 1008 and 1012 are performed, rate control (RC) is executed for a basic unit or macroblock (MB), as shown by block 1016. As part of the rate control determination for a macroblock in block 1016, inputs are received via interfaces in the encoder 228. These inputs can include the macroblock (MB) bit allocation 1018, an update 1020 of the quadratic model parameters, and an update 1022 of the median absolute deviation from the median ("MAD", a robust estimate of dispersion) parameters. Process 1000 next proceeds to block 1024 for operations performed after encoding one picture. This procedure includes receiving an update of the buffer parameters, as shown in block 1026. Process 1000 then proceeds to output block 1028, where the rate control module 714 outputs the quantization parameter QP for each macroblock MB, to be used by the mode decision module 715 as shown in FIG. 7.
Motion estimation
The motion estimation module 720 receives inputs of metadata and raw video from the preprocessor 226, and can provide outputs including a block size, motion vectors, a distortion metric and reference frame identifiers to the mode decision module 715. FIG. 11 illustrates example operations of the motion estimation module 720. As shown, process 1100 begins with input 1102. At the frame level, the module 720 receives inputs of reference frame IDs and motion vectors. At the macroblock level, the inputs 1102 include input pixels and reference frame pixels. Process 1100 proceeds to step 1104, where color motion estimation (ME) and motion vector prediction are performed. To perform this process, a variety of inputs are received, including MPEG-2 motion vectors and luma motion vectors MV 1106, motion vector smoothing 1108, and non-causal motion vectors 1110. Next, process 1100 proceeds to block 1112, where a motion vector search algorithm or method, such as hexagon or diamond search, is performed. Inputs to the process at block 1112 can include the sum of absolute differences (SAD), the sum of squared differences (SSD) and/or other metrics, as shown by block 1114. Once the motion vector search has been performed, process 1100 proceeds to termination block 1116, where termination processing is executed. Process 1100 then ends at output block 1118, which produces outputs of block size, motion vectors (MVs), a distortion metric and reference frame identifiers.
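A minimal sketch of an SAD-driven diamond search of the kind block 1112 could perform is shown below; the small-diamond step pattern, the search range, and the termination rule are illustrative choices rather than the patent's:

```python
import numpy as np

def sad(block, ref, y, x):
    h, w = block.shape
    return float(np.abs(ref[y:y + h, x:x + w] - block).sum())

def diamond_search(block, ref, y0, x0, max_range=16):
    """Small-diamond motion search around (y0, x0); returns (mv, best SAD)."""
    h, w = block.shape
    best, best_cost = (0, 0), sad(block, ref, y0, x0)
    while True:
        improved = False
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # diamond pattern
            cy, cx = best[0] + dy, best[1] + dx
            if abs(cy) > max_range or abs(cx) > max_range:
                continue
            yy, xx = y0 + cy, x0 + cx
            if 0 <= yy <= ref.shape[0] - h and 0 <= xx <= ref.shape[1] - w:
                cost = sad(block, ref, yy, xx)
                if cost < best_cost:
                    best, best_cost, improved = (cy, cx), cost, True
        if not improved:   # terminate when the center is the local minimum
            return best, best_cost

# Toy usage on a smooth surface where the greedy search can converge.
yy, xx = np.mgrid[0:64, 0:64]
ref = np.exp(-((yy - 30.0) ** 2 + (xx - 30.0) ** 2) / 200.0)
cur = ref[22:38, 19:35].copy()       # true displacement (+2, -3) from (20, 22)
mv, cost = diamond_search(cur, ref, 20, 22)
print(mv, cost)                      # (2, -3) with cost ~0 on this toy surface
```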
Base and enhancement layer scalability R-D
FIG. 13 illustrates an exemplary flowchart of a scalability process 1300 that can be performed by the scalability R-D module 718. Process 1300 begins at start block 1302 and proceeds to block 1304, where the scalability R-D module 718 receives input from the motion estimation module 720 and performs motion estimation. The motion estimation relies on inputs of base layer reference frames, enhancement layer reference frames, and the original frame to be encoded, as indicated by block 1306. Such information can be computed by the GOP partitioner 612 and communicated to the scalability R-D module 718 via, for example, metadata. Process 1300 proceeds to block 1308 to determine the scalability information for the base layer and enhancement layer data. Next, base layer encoding is performed, as shown in block 1310, followed by enhancement layer encoding in block 1312. The encoding of the enhancement layer can use the base layer encoding results for inter-layer prediction as an input (as illustrated by block 1314), and therefore it is performed after the base layer encoding in time. This is further described in co-pending [attorney docket/reference number 050078] U.S. patent application entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING". Upon completion of the encoding, process 1300 ends at block 1316.
Slice/macroblock ordering
The first-pass portion 702 also includes the slice/macroblock ordering module 722, which receives input from the error resilience module 740 in the second-pass portion and provides slice alignment information to the mode decision module 715. A slice is a chunk of encoded video data that can be decoded independently (in the sense of entropy decoding). An access unit (AU) is an encoded video frame, each comprising a set of NAL units that always contains exactly one primary coded picture. In addition to the primary coded picture, an access unit can also contain one or more redundant coded pictures, or other NAL units that do not contain slices or slice data partitions of a coded picture. The decoding of an access unit always results in a decoded picture.
Frames can provide time-division multiplexed blocks of physical layer packets (called TDM capsules) that offer high time diversity. A superframe corresponds to one unit of time (for example, 1 second) and contains four frames. Aligning slice and AU boundaries with frame boundaries in the time domain results in the most efficient separation and localization of corrupted data. During a deep fade, most of the contiguous data in a TDM capsule is affected by errors. Because of the time diversity, the remaining TDM capsules have a high probability of being intact, and the uncorrupted data can be utilized to recover and conceal the lost data from the affected TDM capsule. Similar logic applies to frequency-division multiplexing (FDM), where frequency diversity is attained through separation in the frequency subcarriers that the data symbols modulate. In addition, similar logic applies to spatial diversity (through separation of transmitter and receiver antennas) and to other forms of diversity often applied in wireless networks.
In order to align slices and AUs with frames, the creation of outer code (FEC) code blocks and the MAC layer encapsulation should be aligned as well. FIG. 20 illustrates the organization of slices and AUs in coded video data, or a video bitstream. The coded video can be constituted as one or more bitstreams (for example, a base layer bitstream and an enhancement layer bitstream where layered video coding is applied).
The video bitstream comprises AUs, as illustrated in FIG. 20 by Frame 1′ 2005, Frame 3′ 2010 and Frame M′ 2015. The AUs comprise slices of data, as illustrated by slice 1 2020, slice 2 2025 and slice N 2030. Each start of a slice is identified by a start code and provides for network adaptation. In general, I-frame or intra-coded AUs are large, followed by P-frame or forward-predicted-frame AUs, followed by B-frame AUs. Encoding an AU into multiple slices incurs a significant overhead cost in terms of the encoded bit rate, because spatial prediction across slices is restricted and the slice headers also contribute to the overhead. Because slice boundaries are resynchronization points, restricting contiguous physical layer packets (PLPs) to slices controls errors: when a PLP is corrupted, the errors are confined to the slices in that PLP, whereas if the PLP contained parts of multiple slices, the errors would affect all of the slices, or the parts of slices, in that PLP.
Because I frames are usually large (for example, on the order of tens of kilobits), the overhead due to multiple slices is not a large proportion of the total I-frame size or the total bit rate. Also, having more slices in an intra-coded AU enables better and more frequent resynchronization and more efficient spatial error concealment. Also, because P and B frames are predicted from I frames, the I frames carry the most important information in the video bitstream. I frames also serve as random access points for channel acquisition.
Referring now to FIG. 21, carefully aligning I frames with frame boundaries, and likewise aligning the slices of an I AU with frame boundaries, enables the most efficient error control and error protection (because, if a slice belonging to Frame 1 2105 is lost, the slices belonging to Frame 2 2110 have a high probability of being intact, since Frame 2 2110 has significant time separation from Frame 1 2105), and error recovery can be performed through resynchronization and error concealment.
Because P frames are typically on the order of a few kilobits, aligning the slices of a P frame and an integer number of P frames with frame boundaries achieves error resilience without a detrimental loss of efficiency (for reasons similar to those for I frames). Temporal error concealment can be utilized in such aspects. Alternatively, consecutive P frames can be dispersed so that they arrive in different frames, providing added time diversity among the P frames; this is possible because temporal concealment is based on motion vectors and data from previously reconstructed I or P frames. B frames can be extremely small (hundreds of bits) to moderately large (a few kilobits). Hence, aligning an integer number of B frames with frame boundaries is desirable to achieve error resilience without a detrimental loss of efficiency.
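The alignment rule can be illustrated with a toy packing sketch, assuming fixed-size PLPs and a policy that starts a new PLP at every slice boundary and a new physical-layer frame at every AU boundary (the actual MAC-layer encapsulation is not specified by the text):

```python
from dataclasses import dataclass, field

PLP_SIZE = 1024  # bytes per physical layer packet (illustrative)

@dataclass
class Slice:
    au_id: int       # which access unit (encoded frame) it belongs to
    nbytes: int

@dataclass
class PhysFrame:     # physical-layer (TDM) frame
    plps: list = field(default_factory=list)

def pack(slices):
    """Align AUs to frame boundaries and slices to PLP boundaries."""
    frames, current, last_au = [], PhysFrame(), None
    for s in slices:
        if last_au is not None and s.au_id != last_au and current.plps:
            frames.append(current)            # AU boundary: new physical frame
            current = PhysFrame()
        n_plps = -(-s.nbytes // PLP_SIZE)     # slice boundary: new PLP(s); ceil
        current.plps.extend([s.au_id] * n_plps)
        last_au = s.au_id
    if current.plps:
        frames.append(current)
    return frames

aus = [Slice(0, 9000), Slice(0, 8000), Slice(1, 2500), Slice(2, 2600)]
for i, f in enumerate(pack(aus)):
    print(f"frame {i}: PLPs carry AUs {f.plps}")
```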
Mode decision module
FIG. 12 illustrates some examples of the operations of the mode decision module 715. As shown, process 1200 begins at input block 1202. In one illustrative example, the various inputs to the mode decision module 715 include slice type, intra 4×4 cost, intra 16×16 cost, intra UV 8×8 cost, intra Y 16×16 mode, intra UV mode, motion vector data (MVD), quantization parameters (QPs), SpPredMB4×4Y, SpPredMB16×16Y, SpPredMB8×8U, SpPredMB8×8V, rate-distortion flags, and the original YMB, UMB and VMB pixels. Process 1200 then proceeds to block 1204 for encoding initialization, which can be initiated by an input signal or interface directing encoder initialization, as indicated by block 1206. The initialization can include setting the allowed modes (including skip and direct), setting the mode weights (if needed; the default is equal weights for all modes), and setting the buffers. After initialization, process 1200 proceeds to block 1208, where the main processing for mode decision is performed, including: computing the macroblock (MB) mode costs for each allowed mode, weighting each MB mode cost with a weighting factor, and selecting the mode with the minimum MB mode cost. The inputs involved in these operations include motion estimation (for example, MVD and predictions) and spatial prediction (for example, all intra costs and predictions), as illustrated by blocks 1210 and 1212. Interfacing with the mode decision module 715 is entropy coding, in block 1214, which among other things improves the compression rate. Process 1200 proceeds to block 1216, where buffers are updated to communicate the information to the second-pass portion 706 of the encoder. Finally, process 1200 proceeds to block 1218, where the "best" encoding mode can be communicated to the second-pass portion 706 of the encoder.
Second-pass portion of the encoder
Referring again to FIG. 7, the second-pass portion 706 of the encoder 228 includes the second-pass encoder module 232 for performing the second pass of encoding. The second-pass encoder 232 receives the output of the mode decision module 715. The second-pass encoder 232 includes an MC/transform quantization module 726 and a zigzag (ZZ)/entropy coder 728. The results of the second-pass encoder 232 are output to the scalability module 730 and to a bitstream packing module 731, which outputs the encoded base and enhancement layers for transmission by the transcoder 200 via the sync layer 240 (illustrated in FIG. 2). As shown in FIG. 2, note that the base and enhancement layers from the second-pass encoder 232 and the re-encoder 234 are assembled by the sync layer 240 into a packetized PES 242 comprising the base and enhancement layers, a data PES 244 (for example, CC and other text data), and an audio PES 246. Note that the audio encoder 236 receives the decoded audio information 218, re-encodes it, and outputs the encoded information 238 to the sync layer 240.
Re-encoder
Referring again to FIG. 7, the second-pass portion 706 of the encoder also includes a re-encoder 234, corresponding to the re-encoder 234 of FIG. 2. The re-encoder 234 also receives the output of the first-pass portion 702 and includes MC/transform quantization 726 and ZZ/entropy coding 728 portions. In addition, the scalability module 730 outputs to the re-encoder 234. The re-encoder 234 outputs the resulting base and enhancement layers from re-encoding to the bitstream packing module 731 for transmission to a synchronizer (for example, the sync layer 240 shown in FIG. 2). The encoder 228 example in FIG. 7 also includes a rate control fine-tuning module 738, which provides bitstream packing feedback to both the MC/transform quantization module in the second-pass encoder 232 and the ZZ/entropy module 736 in the re-encoder 234, to help adjust the second-pass encoding (for example, to increase compression efficiency).
Error resilience module
The encoder 228 example illustrated in FIG. 7 also includes the error resilience module 740 in the second-pass portion 706. The error resilience module 740 communicates with the bitstream packing module 731 and the slice/MB ordering module 722. The error resilience module 740 receives metadata from the preprocessor 226 and selects an error resilience scheme, for example, aligning slices and access units with frame boundaries, a prediction hierarchy, or adaptive intra refresh. The error resilience scheme can be selected based on information received in the metadata, or based on information communicated to the error resilience module from the bitstream packing module 731 and the slice/MB ordering module 722. The error resilience module 740 provides information to the slice/macroblock (MB) ordering module in the first-pass portion 702 to implement the selected error resilience processing. Video transmission in error-prone environments can employ error resilience strategies and algorithms that result in the presentation of clearer data, with fewer errors, to the viewing user. The error resilience description below can apply to any individual one, or combination, of existing or future applications, transport and physical layer technologies, or other technologies. Effective error robustness algorithms integrate an understanding of the error susceptibility properties and the error protection capabilities across the OSI layers with the desirable properties of the communication system, such as low latency and high throughput. Error resilience processing can be based on the content information of the multimedia data (for example, on the content classification of the multimedia data). One of the main advantages is recoverability from fading and multipath channel errors. The error resilience approaches described below are described specifically in terms of processes that can be incorporated into the encoder 228 (for example, particularly in the error resilience module 740 and the slice/MB ordering module 722), and can generally be extended to data communication in error-prone environments.
Error resilience
For prediction-based hybrid compression systems, intra-coded frames are independently coded without any temporal prediction. Inter-coded frames can be temporally predicted from past frames (P frames) and future frames (B frames). The best predictor can be identified through a search process in the reference frame (or more than one reference frame), and a distortion measure such as SAD can be used to identify the best match. The predictively coded region of the current frame can be a block of varying size and shape (16×16, 32×32, 8×4, etc.) or a group of pixels identified as an object through, for example, segmentation.
Temporal prediction typically extends over many frames (for example, 10 to tens of frames) and is terminated when a frame is coded as an I frame; the GOP is typically defined by the I-frame frequency. For maximum coding efficiency, a GOP is a scene; for example, the GOP boundaries are aligned with scene boundaries and scene change frames are coded as I frames. Low-motion sequences comprise a relatively static background, with motion typically restricted to the foreground objects. Examples of content with such low-motion sequences include news and weather forecast programs, where more than 30% of the most-viewed content is of this nature. In low-motion sequences, most regions are inter-coded, and the predicted frames refer back to the I frame through intermediate predicted frames.
Referring to FIG. 22, the intra-coded block 2205 in the I frame is the predictor for the inter-coded block 2210 of the coded frame (or AU) P1. In this example, the region containing these blocks is a stationary part of the background. Through sustained temporal prediction, the sensitivity of the intra-coded block 2205 to errors rises, because it is a "good" predictor, which also implies that its "importance" is higher. Additionally, by virtue of this chain of temporal predictions, called a prediction chain, the intra-coded block 2205 persists longer in the display (lasting for the duration of the scene in the example in the figure).
The prediction hierarchy is defined as the tree of blocks created based on this importance level, or persistence metric, with the parent at the top (the intra-coded block 2205) and the children at the bottom. Note that the inter-coded block 2215 in P1 is on the second level of the hierarchy, and so on. The leaves are the blocks that terminate prediction chains.
A prediction hierarchy can be created for video sequences regardless of content type (for music and sports as well, and not just news), and it applies to prediction-based video (and data) compression in general (this applies to all approaches described in this application). Once the prediction hierarchy is established, error resilience algorithms such as adaptive intra refresh (described below) can be applied more effectively. The importance metric can be based on a given block's recoverability from errors, for example through concealment operations, with adaptive intra refresh applied to enhance the error robustness of the coded bitstream. The estimate of the importance metric can be based on the number of times a block is used as a predictor (also referred to as the persistence (RT) metric). The persistence metric is also used to improve coding efficiency by arresting the propagation of prediction error, and it increases the bit allocation for blocks with higher importance.
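One way to estimate the persistence metric is sketched below, assuming each inter-coded block records the reference block that predicts it; the count of direct and transitive descendants stands in for "the number of times a block is used as a predictor":

```python
from collections import defaultdict

def persistence(predictions):
    """predictions: list of (frame, block, ref_frame, ref_block) edges,
    one per inter-coded block. Returns, for each block, how many blocks
    it predicts directly or transitively along prediction chains."""
    children = defaultdict(list)
    for frame, block, ref_frame, ref_block in predictions:
        children[(ref_frame, ref_block)].append((frame, block))

    counts = defaultdict(int)

    def descendants(node):
        total = 0
        for child in children[node]:
            total += 1 + descendants(child)
        counts[node] = total
        return total

    for node in list(children):
        if node not in counts:
            descendants(node)
    return counts

# Toy chain: an intra block in frame 0 predicts blocks in frames 1 to 3.
edges = [(1, "b0", 0, "a0"), (2, "b0", 1, "b0"), (3, "b0", 2, "b0")]
scores = persistence(edges)
print(scores[(0, "a0")])   # 3: the root of the chain has highest importance
```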
Adaptive intra refresh
Adaptive intra refresh (AIR) is an error resilience technique that can be based on the content information of the multimedia data. In an intra refresh process, some MBs are intra-coded even though standard R-D optimization would indicate that they should be inter-coded MBs. AIR employs motion-weighted intra refresh to introduce intra-coded MBs into P or B frames. These intra-coded MBs (contained in the base layer) can be encoded with either QP_b or QP_e. If QP_e is used for the base layer, no refinement is needed at the enhancement layer. If QP_b is used for the base layer, refinement may be appropriate, since otherwise the drop in quality at the enhancement layer would be noticeable. Since inter-coding is more efficient than intra-coding in the sense of coding efficiency, these refinements will be inter-coded at the enhancement layer. In this way, the base layer coefficients will not be used for the enhancement layer, and the quality is improved at the enhancement layer without introducing new operations.
In some aspects, adaptive intra refresh can be based on the content information of the multimedia data (for example, a content classification) instead of, or in addition to, the motion-weighted basis. For example, if the content classification is relatively high (for example, a scene having high spatial and temporal complexity), adaptive intra refresh can introduce relatively more intra-coded MBs into P or B frames. Alternatively, if the content classification is relatively low (indicating a less dynamic scene with low spatial and/or temporal complexity), adaptive intra refresh can introduce fewer intra-coded MBs into P or B frames. Such methods and metrics for improving error resilience can be applied not only in wireless multimedia communication, but also more generally in data compression and multimedia processing (for example, in graphics rendering).
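A sketch of content-classification-driven AIR follows; the mapping from classification to refresh fraction and the cyclic choice of which MBs to force intra are illustrative policies, not taken from the text:

```python
def air_refresh_count(total_mbs, content_class, base_fraction=0.02):
    """More intra-refresh MBs for higher content classes (1..8).

    base_fraction is an assumed tuning knob: the per-class fraction of
    macroblocks forced to intra mode in each P or B frame.
    """
    return max(1, round(total_mbs * base_fraction * content_class))

def select_refresh_mbs(total_mbs, frame_index, n_refresh):
    """Cycle deterministically through the frame so every MB is
    eventually refreshed (one simple policy among many)."""
    start = (frame_index * n_refresh) % total_mbs
    return [(start + i) % total_mbs for i in range(n_refresh)]

total_mbs = 396  # e.g. a CIF frame: 22 x 18 macroblocks
for cls in (2, 7):
    n = air_refresh_count(total_mbs, cls)
    print(cls, n, select_refresh_mbs(total_mbs, frame_index=5, n_refresh=n)[:5])
```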
Channel switch frame
A channel switch frame (CSF), as defined herein, is a broad term describing a random access frame that is inserted at an appropriate position in a broadcast stream for the purpose of fast channel acquisition in a broadcast multiplex, and hence fast channel changes between streams. Channel switch frames also increase error resilience, because they provide redundant data that can be used if the primary frame transmission suffers an error. An I frame, or a progressive I frame such as the progressive decoder refresh frame in H.264, typically serves as a random access point. However, frequent I frames (or short GOPs, shorter than the scene duration) result in a significant reduction in compression efficiency. Since intra-coded blocks can be used for error resilience, random access and error resilience can be effectively combined through the prediction hierarchy to improve coding efficiency while increasing robustness to errors.
Improvements in random access switching and in error robustness can be achieved jointly, and they can be based on content information such as the content classification. For low-motion sequences, prediction chains are long, and a significant portion of the information needed to reconstruct a superframe or scene is contained in the I frame that occurred at the start of the scene. Channel errors tend to be bursty, and when a fade strikes and FEC and channel coding fail, there is heavy residual error and concealment fails. This phenomenon is particularly severe for low-motion (and hence low-bit-rate) sequences, because the amount of coded data is not sufficient to provide good time diversity within the video bitstream, and because these are highly compressible sequences in which every bit counts for reconstruction. High-motion sequences are more robust to errors due to the nature of their content: more new information in each frame increases the number of intra-coded blocks, which are independently decodable and inherently more error resilient. Adaptive intra refresh based on the prediction hierarchy achieves high performance for high-motion sequences, while the performance improvement is not significant for low-motion sequences. Hence, a channel switch frame consisting mostly of I-frame data is a good source of diversity for low-motion sequences. When an error strikes a superframe, decoding in the consecutive superframe starts from the CSF, which recovers the lost information through prediction and achieves error resilience.
For high-motion sequences (for example, sequences with a relatively high content classification, such as 6 to 8), the CSF can be composed of the blocks that persist in the superframe (SF), that is, those blocks that are good predictors. All other regions of the CSF need not be coded, because those blocks have short prediction chains, implying that they terminate with intra blocks within the frame. Hence, the CSF still serves to recover from lost information through prediction when an error occurs. A CSF for a low-motion sequence is on a par with the size of an I frame, and can be coded at a lower bit rate through heavier quantization, whereas a CSF for a high-motion sequence is much smaller than the corresponding I frame.
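A minimal sketch of this CSF construction choice, reusing the hypothetical persistence measure from the earlier sketch; the motion threshold and QP offset are illustrative assumptions:

```python
# Hypothetical sketch: for high-motion content, the CSF keeps only the good
# predictors (blocks with long prediction chains); for low-motion content,
# the CSF is essentially a full I frame coded with heavier quantization.
def build_channel_switch_frame(blocks, content_classification, persistence,
                               high_motion_threshold=6):
    if content_classification >= high_motion_threshold:
        selected = [b for b in blocks if persistence[b] > 0]  # good predictors
        qp_offset = 0
    else:
        selected = list(blocks)  # all blocks: CSF is roughly an I frame
        qp_offset = 6            # coarser quantization -> lower bit rate
    return selected, qp_offset
```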
Error resilience based on the prediction hierarchy works well with scalability and enables highly efficient layered coding. Scalability to support hierarchical modulation in physical layer technologies may require data partitioning of the video bitstream at specific bandwidth ratios. These specific bandwidth ratios may not always be the ideal ratios for optimal scalability (for example, scalability with the least overhead). In some aspects, two-layer scalability with a 1:1 bandwidth ratio is used. For low-motion sequences, partitioning the video bitstream into two layers of equal size may not be efficient: for such sequences, the base layer, which contains all the header and metadata information, is larger than the enhancement layer. However, because the CSF for low-motion sequences is larger, it fits nicely in the bandwidth remaining at the enhancement layer.
High-motion sequences have enough residual information that a 1:1 data partitioning can be achieved with minimal overhead. In addition, the channel switch frame for such sequences is much smaller than the corresponding I frame. Hence, error resilience based on the prediction hierarchy works well with scalability for high-motion sequences as well. Extending these concepts to moderate-motion clips is possible based on the descriptions of the algorithms above, and the proposed concepts apply to video coding in general.
Multiplexer
In some encoder aspects, a multiplexer can be used to encode multiple multimedia streams produced by the encoders and to prepare the encoded bits for broadcast. For instance, in the illustrative aspect of encoder 228 shown in Figure 2, the sync layer 240 comprises a multiplexer. The multiplexer can be implemented to provide bit rate allocation control. The estimated complexity can be provided to the multiplexer, which can then allocate the available bandwidth for a collection of multiplexed video channels according to the coding complexity anticipated for those channels. This permits the quality of a particular channel to remain relatively constant even though the bandwidth of the collection of multiplexed video streams is relatively constant; that is, a channel within the collection has a variable bit rate and relatively constant visual quality, rather than a relatively constant bit rate and variable visual quality.
Figure 18 is a block diagram illustrating a system that encodes multiple multimedia streams or channels 1802. The multimedia streams 1802 are encoded by respective encoders 1804, which are in communication with a multiplexer (MUX) 1806, which in turn is in communication with a transmission medium 1808. For example, the multimedia streams 1802 can correspond to various content channels, such as news channels, sports channels, movie channels, and the like. The encoders 1804 encode the multimedia streams 1802 into the encoding format specified for the system. While described in the context of encoding video streams, the principles and advantages of the disclosed techniques are generally applicable to multimedia streams including, for example, audio streams. The encoded multimedia streams are provided to the multiplexer 1806, which combines the encoded multimedia streams and sends the combined stream to the transmission medium 1808 for transmission.
The transmission medium 1808 can correspond to a variety of media, such as, but not limited to, digital satellite communication (for example, DirecTV®), digital cable, wired and wireless Internet communications, optical networks, cell phone networks, and the like. The transmission medium 1808 can include, for example, radio frequency (RF) modulation. Typically, due to spectral constraints and the like, the transmission medium has a limited bandwidth, and the data from the multiplexer 1806 to the transmission medium is maintained at a relatively constant bit rate (CBR).
In conventional systems, the use of a constant bit rate (CBR) at the output of the multiplexer 1806 may require that the encoded multimedia or video streams input to the multiplexer 1806 also be CBR. As described in the background, the use of CBR with video content can result in variable visual quality, which is typically undesirable.
In the illustrated system, two or more of the encoders 1804 communicate the anticipated coding complexity of their input data. One or more of the encoders 1804 may receive adapted bit rate control from the multiplexer 1806 in response. This permits an encoder 1804 that expects to encode relatively complex video to receive a higher bit rate or higher bandwidth (more bits per frame) for those video frames, in a quasi-variable-bit-rate manner. This permits the multimedia streams 1802 to be encoded with more constant visual quality. The extra bandwidth used by a particular encoder 1804 encoding relatively complex video comes from the bits that would otherwise have been used to encode the other video streams had the encoders been implemented to operate at constant bit rates. This maintains the output of the multiplexer 1806 at a constant bit rate (CBR).
While an individual multimedia stream 1802 can be relatively "bursty," that is, vary in utilized bandwidth, the cumulative sum of multiple video streams can be less bursty. The bit rate from channels that are encoding less complex video can be reallocated, for example by the multiplexer 1806, to channels that are encoding relatively complex video, and this can enhance the visual quality of the combined video streams as a whole.
The encoders 1804 provide the multiplexer 1806 with an indication of the complexity of a set of video frames to be encoded and multiplexed together. The output of the multiplexer 1806 should provide an output that is no higher than the bit rate specified for the transmission medium 1808. The complexity indication can be based on the content classification, as discussed above, to provide a selected quality level. The multiplexer 1806 analyzes the complexity indications and provides the encoders 1804 with an allotted number of bits or bandwidth, and the encoders 1804 use this information to encode the video frames in the set. This permits each set of video frames to be individually variable bit rate while still achieving a constant bit rate as a group.
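A minimal sketch of this complexity-proportional allocation, under the assumption that each channel reports a scalar complexity indication per frame group (the function name and units are illustrative):

```python
# Hypothetical sketch: split a fixed aggregate budget among channels in
# proportion to their reported complexity, so each channel is VBR while the
# multiplex as a whole stays CBR.
def allocate_bandwidth(total_bits, complexities):
    """complexities: per-channel complexity indications for one frame group."""
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]

# Example: 3 Mbit shared by three channels; the most complex channel (2.5)
# receives the largest share, yet the aggregate remains exactly 3 Mbit.
print(allocate_bandwidth(3_000_000, [1.0, 2.5, 0.5]))
# [750000.0, 1875000.0, 375000.0]
```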
Content classification can also be used to enable essentially any general-purpose compressor to compress multimedia based on quality. The content classification and the methods and apparatuses described herein can be used for quality-based and/or content-based multimedia processing of any multimedia data. One example is their use in multimedia compression by essentially any general-purpose compressor. Another example is their use in decompression or decoding by any decompressor, decoder, or post-processor, for operations such as interpolation, resampling, enhancement, restoration, and presentation.
Referring now to Figure 19, a typical video communication system includes a video compression system consisting of a video encoder and a video decoder connected by a communication network. Wireless networks are one class of error-prone networks in which the communication channel, in addition to path loss, also exhibits log-normal fading (or shadowing) and multipath fading in mobile scenarios. In order to combat channel errors and provide reliable communications for the application layer data, the RF modulator includes forward error correction, including interleavers and channel coding such as convolutional or turbo coding.
Video compression reduces redundancy in the source video and increases the amount of information carried in each bit of the coded video data. Because of this, the loss of even a small fraction of the coded video has a greater impact on quality. Spatial and temporal prediction, which are inherent in video compression systems, aggravate the loss and cause error propagation, producing visible artifacts in the reconstructed video. Error resilience algorithms at the video encoder and error recovery algorithms at the video decoder enhance the error robustness of the video compression system.
Typically, the video compression system is agnostic of the underlying network. However, in error-prone networks, integrating or aligning the error protection algorithms in the application layer with the FEC and channel coding in the link/physical layers is highly desirable, and it provides the most efficiency in enhancing the error performance of the system.
Figure 14 illustrates an example of the rate-distortion data flow that can occur in encoder 228 to encode a frame. Process 1400 begins at start 1402 and proceeds to decision block 1404, where it receives scene change detector input 1410 from preprocessor 226 (for example, via metadata) and obtains error resilience input 1406. If the information indicates that the selected frame is an I frame, the process intra-codes the frame. If the information indicates that the selected frame is a P or B frame, the process encodes the frame using intra-coding and motion estimation (inter-coding).
After an affirmative condition occurs at block 1404, process 1400 proceeds to preparation block 1414, where the rate R is set to the value R = Rqual (the desired target quality based on the R-D curves). This setting is received from data block 1416, which contains the R-D curves. Process 1400 then proceeds to block 1418, where the rate control bit allocation {Qpi} is performed based on the image/video activity information (for example, the content classification) from the content classification process at block 1420.
The rate control bit allocation of block 1418 is in turn used for the motion estimation of block 1422. Motion estimation 1422 can also receive inputs of metadata from the preprocessor 1412, motion vector smoothing (MPEG-2 plus history) from block 1424, and multiple reference frames (causal and non-causal macroblocks, MBs) from block 1426. Process 1400 then proceeds to block 1428, where the rate calculation for intra-coded modes is determined for the rate control bit allocation (Qpi). Process 1400 next proceeds to block 1430, where the mode and quantization parameters are determined. The mode decision of block 1430 is made based on the motion estimation input of block 1422, the error resilience input 1406, and the scalability R-D, which is determined at block 1432. Once the mode is decided, flow proceeds to block 1432. Note that the flow from block 1430 to 1432 occurs when the data passes from the first-pass part of the encoder to the second-pass part.
At block 1432, the transform and quantization of the second pass of encoder 228 are performed. The transform/quantization process is adjusted or fine-tuned as indicated at block 1444; this process may be influenced by a rate control fine-tuning module (Figure 7). Process 1400 then proceeds to block 1434 for zig-zag sorting and entropy coding to produce the coded base layer. Zig-zag sorting arranges the quantized data in an efficient format for coding. Entropy coding is a compression technique that uses a series of bit codes to represent a set of possible symbols. The enhancement layer result of transform/quantization block 1432 is also sent to adder 1436, which subtracts the base layer and sends the result to the ZZ/entropy coder 1438 for the enhancement layer, as previously described with reference to Figures 31 to 36. Note also that the enhancement layer is fed back (see line 1440, true rate update) to update the true rate used by the rate control operation of content classification 1420, as well as the long-term and short-term histories used for determining bit rates.
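As a minimal sketch of the bit allocation step at block 1418 (not the patent's actual algorithm), one could map the content classification onto per-MB quantization parameters around a base QP derived from R = Rqual; the linear offset and the 0-51 clamp (H.264's QP range) are illustrative assumptions:

```python
# Hypothetical sketch of rate control bit allocation {Qpi}: more complex
# content (higher classification) gets a finer QP so its quality holds up,
# paid for with bits that simpler content gives back.
def rate_control_bit_allocation(base_qp, content_classification, num_mbs):
    qp = int(min(51, max(0, base_qp - (content_classification - 4))))
    return [qp] * num_mbs  # {Qpi}: one QP per macroblock

print(rate_control_bit_allocation(30, 7, 4))  # [27, 27, 27, 27]
print(rate_control_bit_allocation(30, 2, 4))  # [32, 32, 32, 32]
```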
Figure 17A is a flowchart of a process for encoding multimedia data so that data boundaries are aligned with frame boundaries in the time domain, where the encoding is based on content information. Process 1700 can be performed by the apparatus illustrated in Figure 17B and by other components disclosed herein, including transcoder 200. Process 1700 begins by obtaining content information of the multimedia data at block 1702. This can be performed by content classifier 1712 (Figure 17B) or by another component (for example, content classifier 712 of Figure 7). Process 1700 then proceeds to block 1704, where it encodes the multimedia data so as to align data boundaries with frame boundaries in the time domain, the encoding being based on the content information obtained at block 1702. This can be performed by encoder 1714 (Figure 17B) or, in another example, by transcoder 200 (Figure 7). Aligning the data boundaries with the time boundaries can lead to error recovery.
Figure 17B is a high-level block diagram of an encoding apparatus 1710 that can perform the processes illustrated in Figures 17A and 17C. Apparatus 1710 can comprise means for content classification, such as module 1712 for obtaining content information of multimedia data. Examples of content classifiers are also illustrated and described with reference to Figure 7, and can include, for example, a content classification module, an encoder, a preprocessor, or a transcoder. Apparatus 1710 also comprises means for encoding the multimedia data, such as module 1714 for encoding the multimedia data so as to align data boundaries with frame boundaries in the time domain. Examples of such means include an encoder (for example, encoder 228) or a transcoder (for example, transcoder 200).
Figure 17C is a flowchart of a process 1750 for encoding multimedia data. Process 1750 begins by obtaining a content classification of the multimedia data at block 1752. This can be performed by content classifier 1712 (Figure 17B) or by another component (for example, content classifier 712 of Figure 7). Process 1750 proceeds to block 1754, where blocks in the multimedia data are encoded as intra-coded blocks or inter-coded blocks based on the content classification. This can be performed by encoder 1714 (Figure 17B) or, in another example, by transcoder 200 (Figure 7). Process 1750 then ends until further multimedia data requires processing.
Figures 23, 24, 27, and 28 are process flow diagrams exemplifying methods of encoding multimedia data that implement aspects described herein. Figure 23 is a process flow diagram illustrating a process 2300 for encoding multimedia data based on content information. At block 2305, process 2300 receives encoded multimedia data, and at block 2310, process 2300 decodes the multimedia data. At block 2315, process 2300 determines content information associated with the decoded multimedia data. At block 2320, process 2300 encodes the multimedia data based on the content information.
Figure 24 is a process flow diagram illustrating a process 2400 for encoding multimedia data so as to align data boundaries based on content information. At block 2405, process 2400 obtains content information associated with the multimedia data; this can be performed by, for example, preprocessor 226 or content classification module 712 shown in Figure 7. At block 2410, process 2400 encodes the multimedia data so as to align data boundaries based on the content information. For instance, slice boundaries and access unit boundaries are aligned with frame boundaries based on the content classification of the multimedia data being encoded. The encoded data is then available for subsequent processing and/or transmission to a mobile device, and process 2400 ends.
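A minimal sketch of the alignment at block 2410, under the assumption of a hypothetical per-frame encoder call (encode_frame, the frame objects, and the access unit layout are illustrative, not an actual codec API):

```python
# Hypothetical sketch: close every slice and access unit at the frame
# boundary, so that each frame begins an independently decodable unit and a
# lost packet cannot corrupt data belonging to the next frame.
def encode_time_aligned(frames, encoder):
    access_units = []
    for frame in frames:
        slices = encoder.encode_frame(frame)  # hypothetical encoder call
        # No slice carries over into the next frame's access unit: the data
        # boundary and the frame (time) boundary coincide.
        access_units.append({"frame_id": frame.frame_id, "slices": slices})
    return access_units
```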
Figure 27 is a process flow diagram illustrating a process 2700 for encoding data using an adaptive intra refresh scheme based on content information. When process 2700 begins, the multimedia data has been obtained. At block 2705, process 2700 obtains content information of the multimedia data; this can be performed by, for example, the preprocessor 226 or the content classification module 712 described above. Process 2700 proceeds to block 2710, where it encodes the multimedia data using an adaptive intra refresh error resilience scheme, the adaptive intra refresh error resilience scheme being based on the content information. The functionality of block 2710 can be performed by encoder 228. The encoded data is made available for subsequent processing and transmission, and process 2700 then ends.
Figure 28 is a process flow diagram illustrating a process 2800 for encoding multimedia data using redundant I frames based on content information of the multimedia. When process 2800 begins, the multimedia data is available for processing. At block 2805, process 2800 obtains content information of the multimedia data. As described above, this can be performed by, for example, preprocessor 226 and/or encoder 228. At block 2810, process 2800 encodes the multimedia data so that one or more additional I frames are inserted into the encoded data based on the content information. This can be performed by encoder 228, as described above, in conjunction with an error resilience scheme, and whether an I frame is inserted into the base layer or the enhancement layer depends on the error resilience scheme employed. After block 2810, the encoded data is available for subsequent processing and/or transmission to a mobile device.
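A minimal sketch of block 2810's decision, under assumed superframe objects and an assumed low-motion threshold (both illustrative; the actual layer choice is dictated by the error resilience scheme in use):

```python
# Hypothetical sketch: insert a redundant, coarsely quantized I frame (e.g.,
# a channel switch frame) per superframe, placing it in the base or the
# enhancement layer according to the error resilience scheme.
def insert_redundant_i_frames(superframes, content_classifications,
                              scheme="layered", low_motion_threshold=3):
    for sf, cc in zip(superframes, content_classifications):
        if cc <= low_motion_threshold:  # low motion benefits most (see above)
            csf = sf.encode_intra(coarse=True)  # hypothetical call
            layer = sf.enhancement_layer if scheme == "layered" else sf.base_layer
            layer.insert(csf)
    return superframes
```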
It should be noted that the methods described herein can be implemented on a variety of communication hardware, processors, and systems known to those of skill in the art. For instance, the general requirements for a client to operate as described herein are that the client has a display to display content and information, a processor to control the operation of the client, and a memory to store data and programs related to the operation of the client. In one aspect, the client is a cellular phone. In another aspect, the client is a handheld computer having communications capabilities. In yet another aspect, the client is a personal computer having communications capabilities. In addition, hardware such as a GPS receiver can be incorporated in the client to implement the various aspects described. The various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein can be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The disclosed methods and apparatus provide transcoding of video data encoded in one format into video data encoded in another format, where the encoding is based on the content of the video data and the encoding is error resilient. The methods or algorithms described in connection with the examples disclosed herein can be embodied directly in hardware, in a software module executed by a processor, in firmware, or in a combination of two or more of these. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC. The ASIC can reside in a user terminal. In the alternative, the processor and the storage medium can reside as discrete components in a user terminal.
The examples described above are merely exemplary, and those skilled in the art can now make numerous uses of, and departures from, the above-described examples without departing from the inventive concepts disclosed herein. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein can be applied to other examples, for example in an instant messaging service or any general wireless data communication application, without departing from the spirit or scope of the novel aspects described herein. Thus, the scope of this disclosure is not intended to be limited to the examples shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. The word "exemplary" is used exclusively herein to mean "serving as an example, instance, or illustration." Any example described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other examples. Accordingly, the novel aspects described herein are to be defined solely by the scope of the following claims.

Claims (54)

1. A method of processing multimedia data, comprising:
obtaining content information of multimedia data; and
encoding the multimedia data so as to align a data boundary with a frame boundary in the time domain, wherein the encoding is based on the content information.
2. The method of claim 1, wherein the content information comprises a content classification.
3. The method of claim 1, wherein obtaining the content information comprises calculating the content information from the multimedia data.
4. The method of claim 1, wherein the data boundary comprises an I frame data boundary.
5. The method of claim 1, wherein the data boundary comprises a boundary of independently decodable encoded data of the multimedia data.
6. The method of claim 1, wherein the data boundary comprises a slice boundary.
7. The method of claim 1, wherein the boundary of the encoded data is an intra-coded access unit boundary.
8. The method of claim 1, wherein the data boundary comprises a P frame boundary.
9. The method of claim 1, wherein the data boundary comprises a B frame boundary.
10. The method of claim 1, wherein the content information comprises a complexity of the multimedia data.
11. The method of claim 10, wherein the complexity comprises temporal complexity, spatial complexity, or both temporal and spatial complexity.
12. An apparatus for processing multimedia data, comprising:
a content classifier configured to determine a content classification of multimedia data; and
an encoder configured to encode the multimedia data so as to align a data boundary with a frame boundary in the time domain, wherein the encoding is based on the content information.
13. The apparatus of claim 12, wherein the content information comprises the content classification.
14. The apparatus of claim 12, wherein the content classifier is configured to obtain the content information by calculating the content information from the multimedia data.
15. The apparatus of claim 12, wherein the data boundary comprises an I frame data boundary.
16. The apparatus of claim 12, wherein the data boundary comprises a boundary of independently decodable encoded data of the multimedia data.
17. The apparatus of claim 12, wherein the data boundary comprises a slice boundary.
18. The apparatus of claim 12, wherein the boundary of the encoded data is an intra-coded access unit boundary.
19. The apparatus of claim 12, wherein the data boundary comprises a P frame boundary.
20. The apparatus of claim 12, wherein the data boundary comprises a B frame boundary.
21. The apparatus of claim 12, wherein the content information comprises a complexity of the multimedia data.
22. The apparatus of claim 21, wherein the complexity comprises temporal complexity, spatial complexity, or both temporal and spatial complexity.
23. An apparatus for processing multimedia data, comprising:
means for obtaining content information of multimedia data; and
means for encoding the multimedia data so as to align a data boundary with a frame boundary in the time domain, wherein the encoding is based on the content information.
24. A processor configured to:
obtain content information of multimedia data; and
encode the multimedia data so as to align a data boundary with a frame boundary in the time domain, wherein the encoding is based on the content information.
25. The processor of claim 24, wherein the content information comprises a content classification.
26. The processor of claim 24, wherein the processor is configured to obtain the content information by calculating the content information from the multimedia data.
27. The processor of claim 24, wherein the data boundary comprises an I frame data boundary.
28. The processor of claim 24, wherein the data boundary comprises a boundary of independently decodable encoded data of the multimedia data.
29. The processor of claim 24, wherein the data boundary comprises a slice boundary.
30. The processor of claim 24, wherein the boundary of the encoded data is an intra-coded access unit boundary.
31. The processor of claim 24, wherein the data boundary comprises a P frame boundary.
32. The processor of claim 24, wherein the data boundary comprises a B frame boundary.
33. A machine-readable medium comprising instructions that, when executed, cause a machine to:
obtain content information of multimedia data; and
encode the multimedia data so as to align a data boundary with a frame boundary in the time domain, wherein the encoding is based on the content information.
34. The machine-readable medium of claim 33, wherein the content information comprises a content classification.
35. The machine-readable medium of claim 33, further comprising instructions to obtain the content information by calculating the content information from the multimedia data.
36. The machine-readable medium of claim 33, wherein the data boundary comprises an I frame data boundary.
37. The machine-readable medium of claim 33, wherein the data boundary comprises a boundary of independently decodable encoded data of the multimedia data.
38. The machine-readable medium of claim 33, wherein the data boundary comprises a slice boundary.
39. The machine-readable medium of claim 33, wherein the boundary of the encoded data is an intra-coded access unit boundary.
40. A method of processing multimedia data, comprising:
obtaining a content classification of the multimedia data; and
encoding blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
41. The method of claim 40, wherein the encoding comprises increasing a number of macroblocks encoded as intra-coded macroblocks corresponding to a decrease in the content classification.
42. The method of claim 40, wherein the content classification is based on a spatial complexity, a temporal complexity, or both the spatial complexity and the temporal complexity of the multimedia data.
43. An apparatus for processing multimedia data, comprising:
a content classifier configured to obtain a content classification of the multimedia data; and
an encoder configured to encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
44. The apparatus of claim 43, wherein the encoder is further configured to increase a number of macroblocks encoded as intra-coded macroblocks corresponding to a decrease in the content classification.
45. The apparatus of claim 43, wherein the content classification is based on a spatial complexity, a temporal complexity, or both the spatial complexity and the temporal complexity of the multimedia data.
46. A processor configured to:
obtain a content classification of the multimedia data; and
encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
47. The processor of claim 46, wherein the encoding comprises increasing a number of macroblocks encoded as intra-coded macroblocks corresponding to a decrease in the content classification.
48. The processor of claim 46, wherein the content classification is based on a spatial complexity, a temporal complexity, or both the spatial complexity and the temporal complexity of the multimedia data.
49. An apparatus for processing multimedia data, comprising:
means for obtaining a content classification of the multimedia data; and
means for encoding blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
50. The apparatus of claim 49, wherein the means for encoding increases a number of macroblocks encoded as intra-coded macroblocks corresponding to a decrease in the content classification.
51. The apparatus of claim 49, wherein the content classification is based on a spatial complexity, a temporal complexity, or both the spatial complexity and the temporal complexity of the multimedia data.
52. A machine-readable medium comprising instructions that, when executed, cause a machine to:
obtain a content classification of the multimedia data; and
encode blocks in the multimedia data as intra-coded blocks or inter-coded blocks based on the content classification, so as to increase the error resilience of the encoded multimedia data.
53. The machine-readable medium of claim 52, wherein the instructions for encoding comprise instructions to increase a number of macroblocks encoded as intra-coded macroblocks corresponding to a decrease in the content classification.
54. The machine-readable medium of claim 52, wherein the content classification is based on a spatial complexity, a temporal complexity, or both the spatial complexity and the temporal complexity of the multimedia data.
CN200680044013.2A 2005-09-27 2006-09-27 Methods and device for data alignment with time domain boundary Active CN101313592B (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
US72141605P 2005-09-27 2005-09-27
US60/721,416 2005-09-27
US72764305P 2005-10-17 2005-10-17
US72764005P 2005-10-17 2005-10-17
US72764405P 2005-10-17 2005-10-17
US60/727,643 2005-10-17
US60/727,644 2005-10-17
US60/727,640 2005-10-17
US73014505P 2005-10-24 2005-10-24
US60/730,145 2005-10-24
US78904806P 2006-04-03 2006-04-03
US60/789,048 2006-04-03
US78937706P 2006-04-04 2006-04-04
US60/789,377 2006-04-04
PCT/US2006/037994 WO2007038725A2 (en) 2005-09-27 2006-09-27 Methods and device for data alignment with time domain boundary

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201010579114.8A Division CN101982977B (en) 2005-09-27 2006-09-27 For the method and apparatus carrying out data alignment with time domain border

Publications (2)

Publication Number Publication Date
CN101313592A true CN101313592A (en) 2008-11-26
CN101313592B CN101313592B (en) 2011-03-02

Family

ID=40101112

Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200680043239 Pending CN101313580A (en) 2005-09-27 2006-09-27 Content driven transcoder that orchestrates multimedia transcoding using content information
CNA2006800439065A Pending CN101313589A (en) 2005-09-27 2006-09-27 Redundant data encoding methods and device
CN200680044013.2A Active CN101313592B (en) 2005-09-27 2006-09-27 Methods and device for data alignment with time domain boundary
CN200680043886.1A Expired - Fee Related CN101313588B (en) 2005-09-27 2006-09-27 Coding method and device of scalability techniques based on content information

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN 200680043239 Pending CN101313580A (en) 2005-09-27 2006-09-27 Content driven transcoder that orchestrates multimedia transcoding using content information
CNA2006800439065A Pending CN101313589A (en) 2005-09-27 2006-09-27 Redundant data encoding methods and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN200680043886.1A Expired - Fee Related CN101313588B (en) 2005-09-27 2006-09-27 Coding method and device of scalability techniques based on content information

Country Status (3)

Country Link
CN (4) CN101313580A (en)
ES (1) ES2371188T3 (en)
UA (1) UA92368C2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104735449A (en) * 2015-02-27 2015-06-24 成都信息工程学院 Image transmission method and system based on rectangular segmentation and interlaced scanning
CN111247801A (en) * 2017-09-28 2020-06-05 苹果公司 System and method for event camera data processing

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5347849B2 (en) * 2009-09-01 2013-11-20 ソニー株式会社 Image encoding apparatus, image receiving apparatus, image encoding method, and image receiving method
JPWO2011099254A1 (en) * 2010-02-15 2013-06-13 パナソニック株式会社 Data processing apparatus and data encoding apparatus
US8644383B2 (en) * 2011-03-10 2014-02-04 Microsoft Corporation Mean absolute difference prediction for video encoding rate control
JP5948659B2 (en) * 2011-10-01 2016-07-06 インテル・コーポレーション System, method and computer program for integrating post-processing and pre-processing in video transcoding
US9538200B2 (en) * 2012-01-19 2017-01-03 Qualcomm Incorporated Signaling of deblocking filter parameters in video coding
CN107911699B (en) * 2012-07-02 2021-08-10 三星电子株式会社 Video encoding method and apparatus, and non-transitory computer-readable medium
KR101586367B1 (en) * 2013-08-07 2016-01-18 주식회사 더블유코퍼레이션 Method for processing multi-channel substitutional advertisement with single source and managing schedule
EP3023983B1 (en) * 2014-11-21 2017-10-18 AKG Acoustics GmbH Method of packet loss concealment in ADPCM codec and ADPCM decoder with PLC circuit
US20170026659A1 (en) * 2015-10-13 2017-01-26 Mediatek Inc. Partial Decoding For Arbitrary View Angle And Line Buffer Reduction For Virtual Reality Video
CN106209773A (en) * 2016-06-24 2016-12-07 深圳羚羊极速科技有限公司 The method that the sampling transmission of a kind of audio packet is recombinated again
US10116981B2 (en) * 2016-08-01 2018-10-30 Microsoft Technology Licensing, Llc Video management system for generating video segment playlist using enhanced segmented videos
US10708666B2 (en) * 2016-08-29 2020-07-07 Qualcomm Incorporated Terrestrial broadcast television services over a cellular broadcast system
CN111143108B (en) * 2019-12-09 2023-05-02 成都信息工程大学 Coding and decoding method and device for reducing array code Xcode repair
CN112260694B (en) * 2020-09-21 2022-01-11 广州中望龙腾软件股份有限公司 Data compression method of simulation file

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6538688B1 (en) * 1998-07-02 2003-03-25 Terran Interactive Method and apparatus for performing an automated inverse telecine process
WO2000019726A1 (en) * 1998-09-29 2000-04-06 General Instrument Corporation Method and apparatus for detecting scene changes and adjusting picture coding type in a high definition television encoder
US6639943B1 (en) * 1999-11-23 2003-10-28 Koninklijke Philips Electronics N.V. Hybrid temporal-SNR fine granular scalability video coding
US20030118097A1 (en) * 2001-12-21 2003-06-26 Koninklijke Philips Electronics N.V. System for realization of complexity scalability in a layered video coding framework
KR100501933B1 (en) * 2002-11-21 2005-07-18 삼성전자주식회사 Coding compression apparatus and method for multimedia data
US7606472B2 (en) * 2003-05-30 2009-10-20 Canon Kabushiki Kaisha Video stream data recording apparatus

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104735449A (en) * 2015-02-27 2015-06-24 成都信息工程学院 Image transmission method and system based on rectangular segmentation and interlaced scanning
CN104735449B (en) * 2015-02-27 2017-12-26 成都信息工程学院 A kind of image transfer method split based on rectangle every column scan
CN111247801A (en) * 2017-09-28 2020-06-05 苹果公司 System and method for event camera data processing
CN111247801B (en) * 2017-09-28 2022-06-14 苹果公司 System and method for event camera data processing

Also Published As

Publication number Publication date
CN101313592B (en) 2011-03-02
CN101313588B (en) 2012-08-22
ES2371188T3 (en) 2011-12-28
UA92368C2 (en) 2010-10-25
CN101313580A (en) 2008-11-26
CN101313589A (en) 2008-11-26
CN101313588A (en) 2008-11-26

Similar Documents

Publication Publication Date Title
CN101313592B (en) Methods and device for data alignment with time domain boundary
CN102724498B (en) The coding method of the scalability techniques of content-based information and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant