CN101313580A - Content driven transcoder that orchestrates multimedia transcoding using content information - Google Patents

Content driven transcoder that orchestrates multimedia transcoding using content information

Info

Publication number
CN101313580A
CN101313580A (application CN200680043239A)
Authority
CN
China
Prior art keywords
data
frame
medium data
content
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200680043239
Other languages
Chinese (zh)
Inventor
Vijayalakshmi R. Raveendran
Gordon Kent Walker
Tao Tian
Phanikumar Bhamidipati
Fang Shi
Peisong Chen
Sitaraman Ganapathy Subramania
Seyfullah Halit Oguz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of CN101313580A


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Apparatus and methods of using content information for encoding multimedia data are described. A method of processing multimedia data includes receiving multimedia data, and encoding the multimedia data into a first data group and a second data group based on content of the multimedia data, the first data group being configured to be independently decodable from the second data group, and wherein the first and second data groups are encoded at different quality levels. The method can also include classifying the content of the multimedia data and encoding the multimedia data based on the content classification.

Description

A content-driven transcoder that orchestrates multimedia transcoding using content information
Claim of priority under 35 U.S.C. § 119
The present application for patent claims priority to: (a) Provisional Application No. 60/721,416, entitled "A VIDEO TRANSCODER FOR REAL-TIME STREAMING AND MOBILE BROADCAST APPLICATIONS", filed September 27, 2005; (b) Provisional Application No. 60/789,377, entitled "A VIDEO TRANSCODER FOR REAL-TIME STREAMING AND MOBILE BROADCAST APPLICATIONS", filed April 4, 2006; (c) Provisional Application No. 60/727,643, entitled "METHOD AND APPARATUS FOR SPATIO-TEMPORAL DEINTERLACING AIDED BY MOTION COMPENSATION FOR FIELD-BASED VIDEO", filed October 17, 2005; (d) Provisional Application No. 60/727,644, entitled "METHOD AND APPARATUS FOR SHOT DETECTION IN VIDEO STREAMING", filed October 17, 2005; (e) Provisional Application No. 60/727,640, entitled "A METHOD AND APPARATUS FOR USING AN ADAPTIVE GOP STRUCTURE IN VIDEO STREAMING", filed October 17, 2005; (f) Provisional Application No. 60/730,145, entitled "INVERSE TELECINE ALGORITHM BASED ON STATE MACHINE", filed October 24, 2005; and (g) Provisional Application No. 60/789,048, entitled "SPATIO-TEMPORAL DEINTERLACING AIDED BY MOTION COMPENSATION FOR FIELD-BASED MULTIMEDIA DATA", filed April 3, 2006. All seven provisional patent applications are assigned to the assignee hereof and are hereby expressly incorporated herein by reference.
Reference to co-pending patent applications
The present application for patent is related to U.S. Patent Application No. 11/373,577, entitled "CONTENT CLASSIFICATION FOR MULTIMEDIA PROCESSING", filed March 10, 2006, which is assigned to the assignee hereof and is hereby expressly incorporated herein by reference.
Technical field
The present application is directed to apparatus and methods for transcoding video data for real-time streaming, and more particularly to transcoding video data for real-time streaming in mobile broadcast applications.
Background
Because of limited bandwidth resources and the variability of available bandwidth, efficient video compression is useful in many multimedia applications, such as wireless video streaming and video telephony. Certain video coding standards, such as MPEG-4 (ISO/IEC) and H.264 (ITU), provide highly efficient coding well suited to applications such as wireless broadcasting. Some multimedia data, for example digital television presentations, is typically encoded according to other standards such as MPEG-2. Accordingly, transcoders are used to transcode or convert multimedia data encoded according to one standard (e.g., MPEG-2) to another standard (e.g., H.264) prior to wireless broadcast.
Rate-optimized improved codecs could offer advantages in error resilience, error recovery, and scalability. Moreover, using information determined from the multimedia data itself can provide further improvements to the coding, including improvements in error resilience, error recovery, and scalability. Accordingly, there is a need for a transcoder that uses information determined from the multimedia data itself to provide efficient processing and compression of the multimedia data, that is scalable, and that is error resilient for a variety of multimedia data applications, including mobile broadcast of streaming multimedia information.
Summary of the invention
Each of the inventive content-based transcoding apparatus and methods described and illustrated herein has several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of this disclosure, its more prominent features will now be discussed briefly. After considering this discussion, and particularly after reading the section entitled "Detailed description," one will understand how the features of this content-driven transcoding provide improvements for multimedia data processing apparatus and methods.
The inventive aspects described herein relate to various methods of using content information for encoding multimedia data, where the content information is used in various modules or components of an encoder (for example, an encoder used in a transcoder). A transcoder can use content information to orchestrate the transcoding of multimedia data. The content information can be received from another source, for example metadata received together with the video. The transcoder can also be configured to generate content information through various processing operations. In some aspects, the transcoder generates a content classification of the multimedia data, which is then used in one or more encoding processes. In some aspects, a content-driven transcoder can determine spatial and temporal content information of the multimedia data, and use the content information for content-aware uniform-quality encoding across the channel and for content-classification-based compression and bit allocation.
In some aspects, content information (e.g., metadata, content metrics, and/or a content classification) of the multimedia data is obtained or calculated, and then provided to components of the transcoder for use in processing and encoding the multimedia data. For example, a preprocessor can use certain content information for scene-change detection, for performing inverse telecine ("IVTC"), deinterlacing, motion compensation, and noise suppression (e.g., 2D wavelet transforms), and for spatio-temporal noise reduction (e.g., artifact removal, de-ringing, de-blocking, and/or denoising). In some aspects, a preprocessor can also use content information for spatial-resolution downsampling, for example, determining appropriate "safe" and "action handling" areas when downsampling from standard definition (SD) to Quarter Video Graphics Array (QVGA).
In some aspects, the encoder includes a content classification module configured to calculate content information. The encoder can use a content classification for bit-rate control (e.g., bit allocation) to determine a quantization parameter (QP) for each macroblock; use a content classification for motion estimation, for example to perform color motion estimation (ME) and motion vector (MV) prediction; use a content classification for scalability, to provide a base layer and an enhancement layer; and use a content classification for error resilience, to influence the prediction hierarchy and the error resiliency schemes, which can include, for example, adaptive intra refresh, boundary alignment processing, and providing redundant I-frame data in the enhancement layer. In some aspects, the transcoder uses the content classification in cooperation with a data multiplexer to maintain optimal multimedia data quality across the channel. In some aspects, the encoder can use content classification information to force I-frames to appear periodically in the encoded data, enabling fast channel switching. Such implementations can also exploit I-blocks that may be required for error resilience in the encoded data, so that random-access switching and error resilience (e.g., based on the content classification) can be combined effectively through the prediction hierarchy, improving coding efficiency while increasing robustness to errors.
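As an illustration of how a content classification might drive bit-rate control, the following minimal sketch derives a per-macroblock QP from texture and motion metrics. The class granularity, thresholds, and QP offsets are illustrative assumptions, not values taken from this disclosure.

```python
# Minimal sketch of content-classification-driven QP selection.
# Class boundaries and QP offsets are illustrative assumptions.

def classify_content(texture: float, motion: float) -> int:
    """Map texture/motion metrics (0..1) to a coarse content class 1..8."""
    score = 0.5 * texture + 0.5 * motion   # busier content -> higher class
    return min(8, max(1, int(score * 8) + 1))

def macroblock_qp(content_class: int, base_qp: int = 30) -> int:
    """Give visually demanding content a lower QP (finer quantization)."""
    qp = base_qp - (content_class - 4)     # classes above 4 get finer QP
    return min(51, max(0, qp))             # clamp to the H.264 QP range

if __name__ == "__main__":
    for tex, mot in [(0.1, 0.1), (0.5, 0.4), (0.9, 0.8)]:
        c = classify_content(tex, mot)
        print(f"texture={tex} motion={mot} -> class {c}, QP {macroblock_qp(c)}")
```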
In one aspect, a method of processing multimedia data comprises receiving multimedia data, and encoding the multimedia data into a first data group and a second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels. In one aspect of this first aspect, the first data group comprises I-frames and P-frames, and the second data group comprises I-frames, P-frames, and B-frames. In another aspect, the first data group comprises a base layer and the second data group comprises an enhancement layer. In a third aspect, the method further comprises classifying the content of the multimedia data, wherein the encoding is based on the content classification. In a fourth aspect, the encoding comprises determining a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group, where the determination of the first and second quantization parameters is based on the content classification. In a fifth aspect, the encoding comprises allocating a bit rate to at least a portion of the multimedia data based on the content classification. In a sixth aspect, the encoding further comprises using the content classification to detect scene changes, and determining whether to include I-frames in the first data group and the second data group based on the detected scene changes. In a seventh aspect, the encoding comprises determining a frame rate for encoding the multimedia data based on the content classification. In an eighth aspect, the encoding comprises performing motion estimation of the multimedia data based on the content classification. In a ninth aspect, the method also comprises determining a first frame rate for encoding the first data group and a second frame rate for encoding the second data group, wherein the first frame rate is less than the second frame rate. In a tenth aspect, the encoding comprises performing error resilience processing on the multimedia data based on the content classification. In an eleventh aspect, the encoding comprises encoding the first data group and the second data group such that, if the second data group is unavailable, the first data group can be decoded to form displayable multimedia data, and, if both the first and second data groups are available, the first and second data groups can be decoded in combination to form displayable multimedia data. In a twelfth aspect, the first quantization parameter comprises a first step size for encoding data and the second quantization parameter comprises a second step size for encoding data, wherein the first step size is larger than the second step size. In a thirteenth aspect, the method further comprises classifying the content of the multimedia data, wherein the encoding is based on the content classification and comprises reducing noise in the multimedia data based on the content classification. In a fourteenth aspect, reducing noise comprises performing artifact removal. In a fifteenth aspect, reducing noise comprises processing at least a portion of the multimedia data with a de-ringing filter, where the strength of the de-ringing filter is based on the content of the multimedia data. In a sixteenth aspect, reducing noise comprises processing at least a portion of the multimedia data with a de-blocking filter, where the strength of the de-blocking filter is based on the content of the multimedia data. In a seventeenth aspect, reducing noise comprises filtering selected frequencies of the multimedia data. In an eighteenth aspect, the strength of the de-ringing filter is based on the content classification of the multimedia data. In a nineteenth aspect, the strength of the de-blocking filter is based on the content classification of the multimedia data. In a twentieth aspect, the encoding comprises downsampling the multimedia data. Finally, in a twenty-first aspect, the encoding comprises associating a quality level with the multimedia data, and using the quality level and content information of the multimedia data to determine a bit rate for encoding the multimedia data.
In a second aspect, an apparatus for processing multimedia data comprises an encoder configured to receive multimedia data and to encode the multimedia data into a first data group and a second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels. In one aspect of the second aspect, the encoder comprises a content classification module configured to determine a content classification of the multimedia data, and an encoding module further configured to encode the multimedia data based on the content classification. In another aspect, the encoder is further configured to determine a first quantization parameter for encoding the first data group and a second quantization parameter for encoding the second data group of the multimedia data, where the first and second quantization parameters are determined based on the content classification of the multimedia data. In a third aspect, the encoder comprises a motion estimation module configured to perform motion estimation of the multimedia data based on the content classification and to produce motion compensation information for the data, and the encoding module is further configured to encode the multimedia data using the motion compensation information. In a fourth aspect, the encoder also comprises a quantization module for determining quantization parameters for the multimedia data based on the content classification, and the encoder is further configured to encode the multimedia data using the quantization parameters. In a fifth aspect, the encoder also comprises a bit allocation module configured to allocate a bit rate to at least a portion of the multimedia data based on the content classification. In a sixth aspect, the encoder also comprises a scene change detection module configured to detect scene changes, and the encoding module is further configured to include I-frames in the encoded multimedia data based on the detected scene changes. In a seventh aspect, the encoder also comprises a frame rate module configured to determine a frame rate for the multimedia data based on the content classification, and the encoding module encodes the multimedia data based on that frame rate. In an eighth aspect, the encoder is also configured to encode the first data group and the second data group based on the content classification. In a ninth aspect, the encoder is also configured to perform error processing on the multimedia data based on the content classification.
In a third aspect, an apparatus for processing multimedia data comprises means for receiving multimedia data, and means for encoding the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels. In one aspect of the third aspect, the receiving means comprises an encoder. In another aspect, the means for encoding comprises an encoder. In a third aspect, the encoding means comprises means for determining a content classification of the multimedia data, wherein the encoding means encodes the multimedia data based on the content classification. In a fourth aspect, the encoding means comprises a transcoder that includes an encoder.
In a fourth aspect, a machine-readable medium comprises instructions that, when executed, cause a machine to receive multimedia data, and to encode the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels. In one aspect of the fourth aspect, the computer-readable medium further comprises instructions for generating a content classification indicative of the content of the multimedia data, wherein encoding the multimedia data into the encoded first and second data groups comprises encoding the multimedia data based on the content classification. In another aspect, the encoding comprises determining a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group, where the determination of the first and second quantization parameters is based on the content classification. In a third aspect, the instructions for encoding the multimedia data comprise instructions for allocating a bit rate to at least a portion of the multimedia data based on the content of the multimedia data.
In a fifth aspect, a processor is configured to receive multimedia data, and to encode the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels. In one aspect of the fifth aspect, the processor is further configured to generate a content classification indicative of the content of the multimedia data, wherein the encoding comprises encoding the multimedia data based on the content classification. In another aspect, the processor is further configured to determine a first quantization parameter for encoding the first data group and a second quantization parameter for encoding the second data group of the multimedia data, wherein the first and second quantization parameters are based on the content classification.
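The base-plus-enhancement structure that recurs throughout these aspects can be pictured with a short sketch: the first data group is independently decodable at a coarse quantization step, and the second refines it. The scalar-quantization scheme and step sizes below are simplifying assumptions for illustration only.

```python
# Sketch of two-layer encoding at different quality levels (assumed scheme:
# base layer = coarse quantization, enhancement layer = residual refinement).

import numpy as np

def quantize(block, step):
    return np.round(block / step) * step

def encode_layers(frame, base_step=16, enh_step=4):
    """Base layer is decodable alone; enhancement refines the base residual."""
    base = quantize(frame, base_step)               # coarse, independently decodable
    enhancement = quantize(frame - base, enh_step)  # finer refinement of the residual
    return base, enhancement

def decode(base, enhancement=None):
    # If the enhancement layer is lost, the base layer alone is displayable.
    return base if enhancement is None else base + enhancement

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (4, 4)).astype(float)
    b, e = encode_layers(frame)
    print("base-only error:", np.abs(decode(b, None) - frame).mean())
    print("base+enh error: ", np.abs(decode(b, e) - frame).mean())
```

If the enhancement layer is lost in transit, the base layer alone still yields displayable output, which is the property the eleventh aspect above describes.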
Brief description of the drawings
Figure 1A is a block diagram of a media broadcast system that includes a transcoder for transcoding between different video formats.
Figure 1B is a block diagram of an encoder configured to encode multimedia data and to provide an encoded first data group and an encoded second data group.
Figure 1C is a block diagram of a processor configured to encode multimedia data.
Figure 2 is a block diagram of an example of the transcoder of the system of Figure 1.
Figure 3 is a flowchart illustrating the operation of the parser in the transcoder of Figure 2.
Figure 4 is a flowchart illustrating the operation of the decoder in the transcoder of Figure 2.
Figure 5 is a system timing diagram illustrating a sequence of operations performed by the transcoder of Figure 2.
Figure 6 is a flowchart illustrating a sequence of operations and functions of a preprocessor that can be used in the transcoder of Figure 2.
Figure 7 is a block diagram of an exemplary two-pass encoder that can be used in the transcoder of Figure 2.
Figure 8 illustrates an example of a classification chart showing aspects of how texture values and motion values can be associated with content classifications.
Figure 9 is a flowchart illustrating an example operation of content classification, for example for the encoder of Figure 7.
Figure 10 is a flowchart illustrating an example operation of rate control, for example as used with the encoder of Figure 7.
Figure 11 is a flowchart illustrating an example operation of an exemplary motion estimator, for example as used with the encoder of Figure 7.
Figure 12 is a flowchart illustrating an example operation of a mode-decision encoder function, for example as used with the encoder of Figure 7.
Figure 13 is a flowchart illustrating example operations for achieving scalability, for example in the encoder of Figure 7.
Figure 14 is a flowchart illustrating example operations for achieving the rate-distortion data flow that occurs, for example, in the encoder of Figure 7.
Figure 15 is a graph illustrating the relationship between encoding complexity, allocated bits, and human visual quality.
Figure 16 is a graph illustrating a non-linear scene detection formula.
Figure 17 is a block diagram of a system illustrating means for receiving multimedia data and means for encoding the received multimedia data.
Figure 18 is a diagram illustrating a deinterlacing process that uses motion estimation/compensation.
Figure 19 is a block diagram of a multimedia communication system.
Figure 20 is a diagram illustrating the organization of a video bitstream in an enhancement layer and a base layer.
Figure 21 is a diagram illustrating the alignment of slices to video frame boundaries.
Figure 22 is a block diagram illustrating a prediction hierarchy.
Figure 23 is a flowchart illustrating a method of encoding multimedia data based on content information.
Figure 24 is a flowchart illustrating a method of encoding multimedia data based on content information so as to align data boundaries.
Figure 25 is a diagram illustrating a safe action area and a safe title area of a data frame.
Figure 26 is a diagram illustrating a safe action area of a data frame.
Figure 27 is a flowchart illustrating a process of encoding multimedia data using adaptive intra refresh based on multimedia content information.
Figure 28 is a flowchart illustrating a process of encoding multimedia data using redundant I-frames based on multimedia content information.
Figure 29 illustrates a motion compensation vector MV_P between a current frame and a previous frame, and a motion compensation vector MV_N between the current frame and a next frame.
Figure 30 is a flowchart illustrating shot detection.
Figure 31 is a flowchart illustrating the encoding of a base layer and an enhancement layer.
Figure 32 is a schematic diagram illustrating the encoding of macroblocks.
Figure 33 is a schematic diagram illustrating modules for encoding a base layer and an enhancement layer.
Figure 34 shows an example of a base-layer and enhancement-layer coefficient selector process.
Figure 35 shows another example of a base-layer and enhancement-layer coefficient selector process.
Figure 36 shows another example of a base-layer and enhancement-layer coefficient selector process.
Figure 37 is a process flowchart illustrating the encoding of multimedia data based on content information.
Figure 38 is a diagram illustrating possible system decisions in an inverse telecine process.
Figure 39 illustrates the boundaries in a macroblock that are to be filtered by a de-blocking process.
Figure 40 is a diagram illustrating a spatio-temporal deinterlacing process.
Figure 41 illustrates an example of one-dimensional (1-D) polyphase resampling.
Figure 42 is a flowchart illustrating an example of an adaptive GOP structure in video streaming.
Note that, where appropriate, like reference numerals refer to like parts throughout the several views of the drawings.
Detailed description
The following detailed description is directed to certain aspects discussed in this disclosure. However, the invention can be embodied in a multitude of different ways. Reference in this specification to "one aspect" or "an aspect" means that a particular feature, structure, or characteristic described in connection with the aspect is included in at least one aspect. The appearances of the phrases "in one aspect," "according to one aspect," or "in some aspects" in various places in the specification are not necessarily all referring to the same aspect, nor are they separate or alternative aspects mutually exclusive of other aspects. Moreover, various features are described that may be exhibited by some aspects and not by others. Similarly, various requirements are described that may be requirements for some aspects but not for others.
The following description includes details to provide a thorough understanding of the examples. However, one of ordinary skill in the art will appreciate that the examples may be practiced even if every detail of a process or device in an example or aspect is not described or illustrated herein. For example, electrical components may be shown in block diagrams that do not illustrate every electrical connection or every electrical element of the components, so as not to obscure the examples in unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the examples.
This disclosure relates to apparatus and methods for controlling the encoding and transcoding of multimedia data using content information of the multimedia data being encoded. "Content information" or "content" (of multimedia data) are broad terms meaning information related to the content of the multimedia data, and can include, for example, metadata, metrics calculated from the multimedia data, and content-related information associated with one or more metrics (for example, a content classification). Depending on the particular application, content information can be provided to an encoder or determined by the encoder. The content information can be used for many aspects of multimedia data encoding, including scene-change detection, temporal processing, spatio-temporal noise reduction, downsampling, determining bit rates for quantization, scalability, error resilience, maintaining optimal multimedia quality across the broadcast channel, and fast channel switching. Using one or more of these aspects, a transcoder can orchestrate the processing of multimedia data and produce content-aware encoded multimedia data. The descriptions and figures herein describing transcoding aspects are also applicable to encoding aspects and decoding aspects.
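One way to picture the "content information" passed between transcoder components is as a small record; the field names and types below are illustrative assumptions rather than structures defined in this disclosure.

```python
# Illustrative sketch of a content-information record handed from the
# preprocessor to the encoder; the field names are assumptions.

from dataclasses import dataclass, field

@dataclass
class ContentInfo:
    texture: float = 0.0          # spatial complexity metric (0..1)
    motion: float = 0.0           # temporal complexity metric (0..1)
    content_class: int = 1        # coarse classification, e.g. 1..8
    scene_change: bool = False    # scene-change flag for this frame
    metadata: dict = field(default_factory=dict)  # e.g. data supplied with the stream
```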
The transcoder apparatus and methods relate to transcoding from one format to another, and are described explicitly herein as transcoding MPEG-2 video to an enhanced, scalable H.264 format for transmission to mobile devices over a wireless channel, which illustrates some aspects. However, the description of transcoding MPEG-2 video to H.264 format is not intended to limit the scope of the invention; it merely illustrates some aspects of the invention. The disclosed apparatus and methods provide a highly efficient architecture that supports error-resilient coding with random access and layering, and can also be applied to transcoding and/or encoding video formats other than MPEG-2 and H.264.
"Multimedia data" or simply "multimedia," as used herein, is a broad term that includes video data (which can include audio data), audio data, or both video data and audio data. "Video data" or "video," as used herein, is a broad term referring to frame-based or field-based data, which includes one or more images or sequences of related images containing text, image information, and/or audio data, and (unless otherwise specified) can be used to refer to multimedia data (e.g., the terms can be used interchangeably).
Described below are examples of the various components of a transcoder, and examples of processes that can use content information to encode multimedia data.
Multimedia broadcast system
Figure 1A is a block diagram illustrating the data flow of some aspects of a multimedia data broadcast system 100. In system 100, a multimedia data provider 106 communicates encoded multimedia data 104 to a transcoder 200. The encoded multimedia data 104 is received by the transcoder 200, which processes the multimedia data 104 into raw multimedia data in block 110. The processing in block 110 decodes and parses the encoded multimedia data 104, and further processes the multimedia data to prepare it for encoding into another format. The decoded multimedia data is provided to block 112, where it is encoded into a predetermined multimedia format or standard. Once the multimedia data has been encoded, at block 114 it is ready for transmission via, for example, a wireless broadcast system (e.g., a cellular telephone broadcast network, or via an alternative communication network). In some aspects, the received multimedia data 104 has been encoded according to the MPEG-2 standard. After decoding the multimedia data 104 to be transcoded, the transcoder 200 encodes the multimedia data to the H.264 standard.
Figure 1B is a block diagram of a transcoder 130 that can be configured to perform the processing in blocks 110 and 112 of Figure 1A. The transcoder 130 can be configured to receive multimedia data, to decode and parse the multimedia data into packetized elementary streams (e.g., subtitles, audio, metadata, "raw" video, CC data, and presentation time stamps), to encode it into a desired format, and to provide the encoded data for further processing or transmission. The transcoder 130 can be configured to provide the encoded data in two or more data groups, for example a first encoded data group and a second encoded data group, which can be referred to as layered encoding. In some examples of layered encoding schemes, the various data groups (or layers) can be encoded at different quality levels, and formatted so that data encoded in the first data group has lower quality (e.g., provides a lower visual quality level when displayed) than data encoded in the second data group.
Figure 1C is a block diagram of a processor 140 that can be configured to transcode multimedia data, and that can be configured to perform part or all of the processing described in blocks 110 and 112 of Figure 1A. The processor 140 can include modules 124a...n that perform one or more of the transcoding processes described herein, including decoding, parsing, preprocessing, and encoding, and that use content information for the processing. The processor 140 also includes internal memory 122 and can be configured to communicate with an external memory 120, either directly or indirectly through another device. The processor 140 also includes a communication module 126 configured to communicate with one or more devices external to the processor 140, including receiving multimedia data and providing encoded data, such as data encoded in a first data group and data encoded in a second data group. In some examples of layered encoding schemes, the various data groups (or layers) can be encoded at different quality levels, and formatted so that data encoded in the first data group has lower quality (e.g., provides a lower visual quality level when displayed) than data encoded in the second data group.
The components of the transcoder 130, or of a processor 140 configured for transcoding, and the processes contained therein, can be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For example, a parser, decoder, preprocessor, or encoder can be a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, or implemented in microcode or software executed on a processor, or in a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the motion compensation, shot classification, and encoding processes can be stored in a machine-readable medium, such as a storage medium. A code segment can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
Illustrative example of a transcoder architecture
Figure 2 illustrates a block diagram of an example of a transcoder that can be used as the transcoder 200 illustrated in the multimedia broadcast system 100 of Figure 1. The transcoder 200 comprises a parser/decoder 202, a preprocessor 226, an encoder 228, and a synchronization layer 240, described further below. The transcoder 200 is configured to use content information of the multimedia data 104 for one or more aspects of the transcoding process, as described herein. Content information can be obtained from a source external to the transcoder 200, obtained from multimedia metadata, or calculated by the transcoder (for example, by the preprocessor 226 or the encoder 228). The components shown in Figure 2 illustrate components that can be included in a transcoder that uses content information for one or more transcoding processes. In particular implementations, one or more of the components of the transcoder 200 can be excluded, or additional components can be included. Moreover, portions of the transcoder and the transcoding process are described such that one skilled in the art can practice the invention even though every detail of each process or device is not described herein.
Figure 5 illustrates a timing diagram showing the temporal relationships of the operations and/or processes of the various components of the transcoder 200. As shown in Figure 5, encoded streaming video 104 (encoded multimedia data), for example MPEG-2 video, is first received at an arbitrary time zero (0) by the parser 205 (Figure 2). Next, the video stream is parsed (501), demultiplexed (502), and decoded (503), for example by the parser 205 in conjunction with the decoder 214. As illustrated, these processes can take place in parallel with small timing offsets, providing a streamed output of processed data to the preprocessor 226 (Figure 2). At time T1 (504), once enough data has been received from the decoder 214 for the preprocessor 226 to begin outputting results, the remaining processing steps become essentially sequential, with first-pass encoding (505), second-pass encoding (506), and re-encoding (507) occurring in sequence after preprocessing, until the re-encoding finishes at time Tf (508).
The transcoder 200 described herein can be configured to transcode a variety of multimedia data, and many of the processes described apply to transcoding multimedia data of any type. Although some of the examples provided herein relate specifically to transcoding MPEG-2 data to H.264 data, these examples are not intended to limit this disclosure to such data. The encoding aspects described below can be applied to transcoding any suitable multimedia data standard to another suitable multimedia data standard.
Parser/decoder
Referring again to Figure 2, the parser/decoder 202 receives the multimedia data 104. The parser/decoder 202 includes a transport stream parser ("parser") 205, which receives the multimedia data 104 and parses the data into a video elementary stream (ES) 206, an audio ES 208, presentation time stamps (PTS) 210, and other data such as subtitles 212. An ES carries one type of data (video or audio) from a single video or audio encoder. For example, a video ES comprises the video data for a sequence of data, including the sequence header and all the subparts of the sequence. A packetized elementary stream (PES) consists of a single ES that has been made into packets, each typically beginning with an added packet header. A PES stream contains only one type of data from one source, for example from one video or audio encoder. PES packets have a variable length that does not correspond to the fixed length of transport packets, and they can be much longer than a transport packet. When transport packets are formed from a PES stream, a PES header can be placed at the beginning of a transport packet payload, immediately following the transport packet header. The remaining PES packet content fills the payloads of successive transport packets until the PES packet is fully consumed. The final transport packet can be filled to a fixed length, for example by stuffing it with bytes (e.g., bytes = 0xFF, all ones).
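The PES-to-transport-packet mapping just described can be sketched as follows: a variable-length PES packet is split across fixed-length 188-byte MPEG-2 transport packets, and the final packet is padded with 0xFF stuffing bytes. The 4-byte header here is a heavily simplified placeholder for the full MPEG-2 transport syntax.

```python
# Simplified sketch of splitting one PES packet into fixed-length transport
# packets, padding the final packet with 0xFF stuffing bytes. A real MPEG-2
# transport header carries more fields (flags, continuity counter, etc.).

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4  # sync byte 0x47, PID, flags

def packetize_pes(pes: bytes, pid: int) -> list[bytes]:
    payload_size = TS_PACKET_SIZE - TS_HEADER_SIZE
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])  # simplified
        chunk = chunk + b"\xff" * (payload_size - len(chunk))  # stuffing bytes
        packets.append(header + chunk)
    return packets

if __name__ == "__main__":
    pes_packet = bytes(400)                    # a 400-byte PES packet
    ts = packetize_pes(pes_packet, pid=0x100)
    print(len(ts), "transport packets; tail:", ts[-1][-4:])  # trailing 0xFF stuffing
```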
The parser 205 conveys the video ES 206 to the decoder 214, shown here as part of the parser/decoder 202. In other configurations, the parser 205 and the decoder 214 are separate components. The PTS 210 is sent to a transcoder PTS generator 215, which can generate separate presentation time stamps that are particular to the transcoder 200 and are used to arrange the sending of data from the transcoder 200 to the broadcast system. The transcoder PTS generator 215 can be configured to provide the data to the synchronization layer 240 to coordinate the synchronization of the data broadcast.
Figure 3 illustrates a flowchart of an example of a process 300 that the parser 205 can follow when parsing out the various packetized elementary streams described above. Process 300 begins at block 302 when multimedia data 104 is received from the content provider 106 (Figure 1). Process 300 proceeds to block 304, where initialization of the parser 205 is performed. Initialization can be triggered by an independently generated acquisition command 306. For example, a process independent of the parser 205, based on externally received TV-schedule and channel lineup information, can generate the acquisition command 306. In addition, real-time transport stream (TS) buffer descriptors 308 can be input to assist the initialization and the main processing.
As illustrated in block 304, initialization can include acquisition command syntax verification, performing a first pass of PSI/PSIP/SI (program specific information / program and system information protocol / system information) processing, performing processing specifically related to the acquisition command or to PSI/PSIP/SI consistency checking, allocating a PES buffer for each PES, and setting the timing (e.g., to align with the required acquisition start instant). The PES buffers hold the parsed ES data and convey each parsed ES to the corresponding audio decoder 216, text encoder 220, decoder 214, or transcoder PTS generator 215.
After initialization, process 300 proceeds to block 310 for the main processing of the received multimedia data 104. The processing in block 310 can include target packet identifier (PID) filtering, continuous PSI/PSIP/SI monitoring and processing, and timing processing (e.g., to achieve the required acquisition duration), so that the incoming multimedia data is passed into the appropriate PES buffers. As a result of the multimedia data processing in block 310, program descriptors and PES buffer "read" indications are generated, which will interface with the decoder 214 (Figure 2) as described below.
After block 310, process 300 proceeds to block 314, where termination of the parsing operations takes place, including the generation of timer interrupts and the release of PES buffers after their consumption. Note that PES buffers will exist for all relevant elementary streams (e.g., audio, video, and subtitle streams) of the program referenced by the descriptors.
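As a rough picture of the target-PID filtering performed in block 310, the following sketch routes incoming transport packets into per-stream PES buffers by packet identifier; the PID assignments are arbitrary example values.

```python
# Illustrative sketch of PID filtering: route transport packets into
# per-stream PES buffers. PID values here are arbitrary examples.

from collections import defaultdict

TARGET_PIDS = {0x100: "video", 0x101: "audio", 0x102: "subtitle"}

def demux(ts_packets: list[bytes]) -> dict[str, bytearray]:
    pes_buffers: dict[str, bytearray] = defaultdict(bytearray)
    for pkt in ts_packets:
        if len(pkt) != 188 or pkt[0] != 0x47:
            continue                          # skip malformed packets
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        stream = TARGET_PIDS.get(pid)
        if stream is not None:                # target-PID filtering
            pes_buffers[stream] += pkt[4:]    # simplified: payload after header
    return pes_buffers
```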
Referring again to Figure 2, the parser 205 sends the audio ES 208 to an audio decoder 216, which decodes the audio information in a manner corresponding to the transcoder implementation and provides it to the synchronization layer 240. The subtitle information 212 is delivered to a text encoder 220. Closed caption (CC) data 218 from the decoder 214 is also provided to the text encoder 220, which encodes the subtitle information 212 and the CC data 218 in a format determined by the transcoder 200.
The parser/decoder 202 also comprises the decoder 214, which receives the video ES 206. The decoder 214 can generate metadata associated with the video data, decode the packetized video elementary stream into raw video 224 (for example, in standard-definition format), and process the closed-caption data carried in the video ES stream.
Figure 4 shows a flowchart illustrating an example of a decoding process 400 that can be performed by the decoder 214. Process 400 begins at block 402 with the input of video elementary stream data 206. Process 400 proceeds to block 404, where the decoder is initialized. Initialization can comprise a number of tasks, including detecting a video sequence header (VSH); performing first-pass VSH, video sequence (VS), and VS display extension processing (including video format, primary colors, and matrix coefficients); and allocating data buffers to buffer, respectively, the decoded pictures, the associated metadata, and the closed caption (CC) data. In addition, the video PES buffer "read" information 406 provided by the parser 205 (which can be generated, for example, by process 300 in block 310 of Figure 3) is input.
After the initialization at block 404, process 400 proceeds to block 408, where the decoder 214 performs the main processing of the video ES. The main processing comprises polling the video PES buffer "read" information or interface for new data availability, decoding the video ES, reconstructing picture boundaries and storing the pixel data, synchronizing the video for audio/video alignment, generating metadata and storing it at picture boundaries, and storing the CC data at picture boundaries. The results of the main processing 408, shown in block 410, include the generation of sequence descriptors, decoded picture buffer descriptors, metadata buffer descriptors, and CC data buffer descriptors.
After the main processing 408, process 400 proceeds to block 412, where termination processing is performed. The termination can include determining termination conditions, including the absence of new data beyond a predetermined threshold for a particular duration, the detection of an end-of-sequence code, and/or the detection of an explicit termination signal. The termination can further include releasing the decoded picture, associated metadata, and CC data buffers after they have been consumed by the preprocessor described below. Process 400 ends at block 414, where it can enter a state of waiting for video ES to be received as input.
Preprocessor
Figure 2 (and Figure 6, in more detail) illustrates an example aspect of a preprocessor 226 in which content information can be used for one or more preprocessing operations. The preprocessor 226 receives the metadata 222 and the decoded "raw" video data 224 from the parser/decoder 202. The preprocessor 226 is configured to perform certain types of processing on the video data 224 and the metadata 222 and to provide processed multimedia (e.g., base layer reference frames, enhancement layer reference frames, bandwidth information, content information) and video to the encoder 228. Such preprocessing of the multimedia data can improve the visual clarity, anti-aliasing, and compression efficiency of the data. Generally, the preprocessor 226 receives the video sequences provided by the decoder 214 in the parser/decoder 202 and converts them into progressive video sequences for further processing (e.g., encoding) by the encoder 228. In some aspects, the preprocessor 226 can be configured for many operations, including inverse telecine, deinterlacing, filtering (e.g., artifact removal, de-ringing, de-blocking, and denoising), resizing (e.g., spatial-resolution downsampling from standard definition to Quarter Video Graphics Array (QVGA)), and GOP structure generation (e.g., complexity map generation, scene-change detection, and fade/flash detection).
The preprocessor 226 can use metadata from the decoder to affect one or more of the preprocessing operations. The metadata can include information describing or classifying the content of the multimedia data ("content information"); in particular, the metadata can include a content classification. In some aspects, the metadata does not include content information required for the encoding operations. In such cases, the preprocessor 226 can be configured to determine content information and to use the content information for preprocessing operations and/or to provide the content information to other components of the transcoder 200, for example the encoder 228. In some aspects, the preprocessor 226 can use such content information to influence GOP partitioning, to determine appropriate types of filtering, and/or to determine encoding parameters that are communicated to the encoder.
Figure 6 shows an illustrative example of the various process blocks that can be included in the preprocessor 226, and illustrates processing that can be performed by the preprocessor 226. In this example, the preprocessor 226 receives the metadata and video 222, 224 and provides output data 614, comprising (processed) metadata and video, to the encoder 228. Typically, three types of video can be received. First, the received video can be progressive video, in which case no deinterlacing may be needed. Second, the video data can be telecined video, that is, interlaced video converted from 24 fps film sequences, in which case inverse telecine operations may be needed. Third, the video can be non-telecined interlaced video. The preprocessor 226 can process these types of video as described below.
At block 601, the preprocessor 226 determines whether the received video data 222, 224 is progressive video. In some cases, this can be determined from the metadata, if the metadata contains this information, or by processing of the video data itself. For example, an inverse telecine process, described below, can determine whether the received video 222 is progressive video. If it is progressive video, the process proceeds to block 607, where filtering operations (e.g., a noise suppressor) are performed on the video to reduce noise, such as white Gaussian noise. If the video data 222, 224 is not progressive video at block 601, the process proceeds to block 604, to a phase detector 604.
The phase detector 604 distinguishes between video that originated as telecine and video that began in a standard broadcast format. If the decision is that the video was telecined (the YES decision path exiting the phase detector 604), the telecined video is returned to its original format in inverse telecine 606. Redundant frames are identified and eliminated, and fields derived from the same video frame are rewoven into a complete image. Since the reconstructed sequence of film images was photographically recorded at regular intervals of 1/24 of a second, a motion estimation process performed in the GOP partitioner 612 or in the encoder 228 is more accurate using the inverse-telecined images rather than the telecined data, which has an irregular time base.
In one aspect, the phase detector 604 makes certain decisions after receiving a video frame. These decisions include: (i) whether the current video is from telecine output and the 3:2 pulldown phase is one of the five phases P0, P1, P2, P3, and P4 shown in Figure 38; and (ii) whether the video was generated as conventional NTSC, a decision denoted as phase P5. These decisions appear as outputs of the phase detector 604 shown in Figure 2. The path from the phase detector 604 labeled "YES" actuates the inverse telecine 606, indicating that it has been provided with the correct pulldown phase so that it can select the fields that were formed from the same photographic image and combine them. Similarly, the path from the phase detector 604 labeled "NO" actuates the deinterlacer 605 to separate an apparent NTSC frame into fields for optimal processing. Because different types of video can be received at any time, the phase detector 604 can analyze video frames continuously. As an example, video conforming to the NTSC standard may be inserted into the video stream as a commercial. After inverse telecine, the resulting progressive video is sent to a noise suppressor (filter) 607 that can be used to reduce white Gaussian noise.
When conventional NTSC video is recognized (the NO path from the phase detector 604), it is transmitted to the deinterlacer 605 for compression. The deinterlacer 605 transforms the interlaced fields into progressive video, and denoising operations can then be performed on the progressive video. An illustrative example of the deinterlacing processing is described below.
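The routing just described (blocks 601, 604, 605, 606, and 607) can be summarized in a short sketch; the stage functions are trivial stubs standing in for the operations described in the text.

```python
# Sketch of the preprocessor's video-type routing (Figure 6). The stage
# functions are placeholder stubs, not the real operations.

def is_progressive(meta): return meta.get("progressive", False)
def detect_pulldown_phase(meta): return meta.get("phase", "P5")
def inverse_telecine(frame, phase): return frame   # block 606 (stub)
def deinterlace(frame): return frame               # block 605 (stub)
def denoise(frame): return frame                   # block 607 (stub)

def preprocess_frame(frame, meta):
    if is_progressive(meta):                       # block 601
        return denoise(frame)
    phase = detect_pulldown_phase(meta)            # phase detector 604
    if phase in {"P0", "P1", "P2", "P3", "P4"}:    # telecined -> YES path
        return denoise(inverse_telecine(frame, phase))
    return denoise(deinterlace(frame))             # phase P5: conventional NTSC
```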
Traditional analog video devices, such as televisions, render video in an interlaced fashion: such a device transmits the even-numbered scan lines (the even field) and the odd-numbered scan lines (the odd field). From the standpoint of signal sampling, this is equivalent to spatio-temporal sub-sampling in a pattern described by the following formula:

F(x, y, n) = Θ(x, y, n), if y mod 2 = 0 for even fields and y mod 2 = 1 for odd fields; the remaining pixels are dropped    [1]

where Θ denotes the original frame picture, F denotes the interlaced field, and (x, y, n) denote, respectively, the horizontal, vertical, and temporal position of a pixel.

Without loss of generality, it can be assumed throughout this disclosure that n = 0 is an even field, so that equation 1 above reduces to:

F(x, y, n) = Θ(x, y, n), if y mod 2 = n mod 2; dropped otherwise    [2]

Since decimation is not performed in the horizontal dimension, the sub-sampling pattern can be described in the (n, y) coordinates.

The goal of a deinterlacer is to transform interlaced video (a sequence of fields) into non-interlaced progressive frames (a sequence of frames). In other words, the even and odd fields are interpolated to "recover" or generate the full-frame pictures. This can be represented by equation 3:

F_o(x, y, n) = F(x, y, n), if y mod 2 = n mod 2; F_i(x, y, n), otherwise    [3]

where F_i denotes the deinterlacing result for the missing pixels.
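Equations [1] through [3] can be illustrated directly in code: interlacing keeps only the lines whose parity matches the field index, and a deinterlacer must fill in the rest. The vertical-averaging interpolation below is a didactic stand-in for F_i, not the Wmed/motion-compensated method this disclosure actually describes.

```python
# Didactic sketch of eqs. [1]-[3]: interlacing keeps line y of frame n
# iff y % 2 == n % 2; deinterlacing must interpolate the missing lines.

import numpy as np

def interlace(frames: np.ndarray) -> np.ndarray:
    """frames: (n_frames, height, width). Missing samples become NaN."""
    fields = np.full(frames.shape, np.nan)
    for n in range(frames.shape[0]):
        fields[n, n % 2::2, :] = frames[n, n % 2::2, :]   # eq. [1]
    return fields

def deinterlace_linear(fields: np.ndarray) -> np.ndarray:
    """Fill missing lines F_i by vertical averaging (stand-in for eq. [3])."""
    out = fields.copy()
    n_frames, height, _ = out.shape
    for n in range(n_frames):
        for y in range(1 - n % 2, height, 2):        # lines dropped by eq. [1]
            lo = y - 1 if y - 1 >= 0 else y + 1      # nearest kept lines
            hi = y + 1 if y + 1 < height else y - 1
            out[n, y, :] = 0.5 * (out[n, lo, :] + out[n, hi, :])
    return out
```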
Figure 40 is a block diagram illustrating certain aspects of one aspect of the deinterlacer 605 that uses Wmed filtering and motion estimation to generate progressive frames from interlaced multimedia data. The upper part of Figure 40 shows a motion intensity map 4002, which can be generated using information from the current field, two previous fields (the PP field and the P field), and two subsequent fields (the Next field and the Next-Next field). The motion intensity map 4002 classifies or partitions the current frame into two or more different motion levels, and can be generated by spatio-temporal filtering, described in further detail below. In some aspects, the motion intensity map 4002 is generated to identify static areas, slow-motion areas, and fast-motion areas, as described below with reference to equations 4-8. A spatio-temporal filter, for example a Wmed filter 4004, filters the interlaced multimedia data using criteria based on the motion intensity map and produces a spatio-temporal provisional deinterlaced frame. In some aspects, the Wmed filtering process involves a horizontal neighborhood of [-1, 1], a vertical neighborhood of [-3, 3], and a temporal neighborhood of five adjacent fields, represented by the five fields illustrated in Figure 40 (the PP field, the P field, the current field, the Next field, and the Next-Next field), where Z^-1 represents a delay of one field. Relative to the current field, the Next field and the P field are non-parity fields, and the PP field and the Next-Next field are parity fields. The "neighborhood" used for the spatio-temporal filtering refers to the spatial and temporal locations of the fields and pixels actually used during the filtering operation, and can be illustrated as an "aperture," as shown, for example, in Figures 6 and 7.
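A toy version of the motion intensity map: each pixel of the current field is labeled static, slow-motion, or fast-motion from differences against neighboring fields. The particular differences compared and the thresholds are assumptions for illustration; the disclosure's actual criteria are given by its equations 4-8, which are not reproduced here.

```python
# Toy motion-intensity map: label pixels static / slow / fast from
# field-to-field differences. Thresholds and difference terms are
# illustrative assumptions, not the disclosure's equations 4-8.

import numpy as np

def motion_intensity_map(pp, p, cur, nxt, nxt2, t_static=4.0, t_slow=12.0):
    d_parity = np.abs(cur - pp) + np.abs(nxt2 - cur)   # same-parity differences
    d_nonpar = np.abs(p - pp) + np.abs(nxt - p)        # neighboring-field cue
    motion = 0.5 * d_parity + 0.5 * d_nonpar
    levels = np.full(cur.shape, 2, dtype=np.uint8)     # 2 = fast motion
    levels[motion < t_slow] = 1                        # 1 = slow motion
    levels[motion < t_static] = 0                      # 0 = static
    return levels
```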
The deinterlacer 605 can also include a noise suppressor (denoising filter) 4006 configured to filter the spatio-temporal provisional deinterlaced frame generated by the Wmed filter 4004. Denoising the spatio-temporal provisional deinterlaced frame makes the subsequent motion search process more accurate, especially if the source interlaced multimedia data sequence is contaminated by white noise. It can also at least partly remove alias artifacts between the even and odd rows in the Wmed picture. The noise suppressor 4006 can be implemented as a variety of filters, including a wavelet-shrinkage-based noise suppressor and a wavelet Wiener filter. A noise suppressor can be used to remove noise from the candidate Wmed frame before it is further processed using motion compensation information, and can remove noise that is present in the Wmed frame while retaining the signal regardless of the signal's frequency content. Various types of denoising filters can be used, including wavelet filters. Wavelets are a class of functions used to localize a given signal in both space and scaling domains. The fundamental idea behind wavelets is to analyze the signal at different scales or resolutions, such that small changes in the wavelet representation produce correspondingly small changes in the original signal.
A wavelet shrinkage or wavelet Wiener filter can also be applied as the noise suppressor. Wavelet shrinkage consists of a wavelet transform of the noisy signal, followed by shrinking the smaller wavelet coefficients to zero (or to smaller values) while leaving the larger coefficients unaffected. Finally, an inverse transform is performed to obtain the estimated signal.
Denoising filtering increases the accuracy of motion compensation in noisy environments. Wavelet shrinkage denoising can involve shrinking in the wavelet transform domain, and typically comprises three steps: a linear forward wavelet transform, a nonlinear shrinkage denoising, and a linear inverse wavelet transform. The Wiener filter is an MSE-optimal linear filter that can be used to improve images degraded by additive noise and blurring. Such filters are generally known in the art and are described, for example, in "Ideal spatial adaptation by wavelet shrinkage," referenced below, and in S. P. Ghael, A. M. Sayeed, and R. R. Baraniuk, "Improved Wavelet denoising via empirical Wiener filtering," Proceedings of SPIE, vol. 3169, pp. 389-399, San Diego, July 1997, the latter of which is expressly incorporated by reference herein in its entirety.
In some aspects, the denoising filter is based on an aspect of a (4, 2) biorthogonal cubic B-spline wavelet filter. Such a filter can be defined by the following forward and inverse transforms:
h(z) = 3/4 + (1/2)(z + z^-1) + (1/8)(z^2 + z^-2)    (forward transform)    [4]

and

g(z) = (5/4)z^-1 - (5/32)(1 + z^-2) - (3/8)(z + z^-3) - (3/32)(z^2 + z^-4)    (inverse transform)    [5]
Application of a denoising filter can increase the accuracy of motion compensation in a noisy environment. Implementations of such filters are further described in D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994, which is expressly incorporated by reference herein in its entirety.
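A minimal wavelet-shrinkage denoiser along these lines can be sketched with the third-party PyWavelets package; the "bior2.2" wavelet and the median-based threshold rule are stand-in assumptions here, not the exact (4, 2) filter of equations 4 and 5:

```python
import numpy as np
import pywt

def wavelet_shrink_denoise(frame, wavelet="bior2.2", level=2, k=3.0):
    """Soft-threshold wavelet shrinkage of a 2-D luma frame."""
    coeffs = pywt.wavedec2(frame.astype(np.float64), wavelet, level=level)
    # Robust noise estimate from the finest diagonal subband (median / 0.6745).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    shrunk = [coeffs[0]]                      # approximation kept untouched
    for detail in coeffs[1:]:                 # shrink small detail coefficients
        shrunk.append(tuple(pywt.threshold(d, k * sigma, mode="soft")
                            for d in detail))
    return pywt.waverec2(shrunk, wavelet)
```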
The lower part of Figure 40 illustrates an aspect of determining motion information (e.g., motion-vector candidates, motion estimation, motion compensation) for the interlaced multimedia data. In particular, Figure 40 illustrates a motion estimation and motion compensation scheme that is used to generate a motion-compensated provisional progressive frame for the selected frame, which is then combined with the Wmed provisional frame to form the resulting "final" progressive frame, shown as the deinterlaced current frame 4014. In some aspects, motion-vector ("MV") candidates (or estimates) of the interlaced multimedia data are provided to the deinterlacer from external motion estimators and are used to provide a starting point for a bidirectional motion estimator and compensator ("ME/MC") 4018. In some aspects, an MV candidate selector 4022 uses the previously determined MVs of neighboring blocks as MV candidates for the block being processed, such as the MVs of previously processed blocks (for example, blocks in a deinterlaced previous frame 4020). The motion compensation can be done bidirectionally, based on the deinterlaced previous frame 4020 and a next (e.g., future) Wmed frame 4008. The current Wmed frame 4010 and a motion-compensated ("MC") current frame 4016 are merged, or combined, by a combiner 4012. The resulting deinterlaced current frame 4014, now a progressive frame, is provided back to the ME/MC 4018 to be used as the deinterlaced previous frame 4020, and is also communicated external to the deinterlacer 605 for subsequent processing.
With the Wmed + MC deinterlacing scheme, it is possible to decouple the inter-field interpolation from the intra-field interpolation of the deinterlacing prediction scheme. In other words, the spatio-temporal Wmed filtering can be used mainly for intra-field interpolation purposes, while inter-field interpolation can be performed during motion compensation. This reduces the peak signal-to-noise ratio of the Wmed result, but the visual quality after motion compensation is applied is more pleasing, because bad pixels resulting from inaccurate inter-field prediction-mode decisions are removed from the Wmed filtering process.
After the appropriate inverse telecine or deinterlacing processing, the progressive video is processed for alias artifact suppression and resampling (e.g., resizing) at block 608. In some resampling aspects, a polyphase resampler is implemented for picture-size resizing. In one example of downsampling, the ratio between the original and the resized picture can be p/q, where p and q are relatively prime integers. The total number of phases is p. In some aspects, the cutoff frequency of the polyphase filter is 0.6 for a resizing factor of about 0.5. The cutoff frequency does not exactly match the resizing ratio, in order to boost the high-frequency response of the resized sequence. This inevitably allows some aliasing. However, it is well known that human eyes prefer a sharp picture with some aliasing over a blurry picture without aliasing.
Figure 41 illustrates an example of polyphase resampling, showing the phases when the resizing ratio is 3/4. The cutoff frequency illustrated in Figure 41 is also 3/4. The original pixels are illustrated in the figure on vertical axes. A sinc function centered about the axes is also drawn to represent the filter waveform. Because the cutoff frequency is chosen to be exactly equal to the resampling ratio, the zeros of the sinc function overlap the positions of the pixels after resizing, depicted in Figure 41 with crosses. To find a pixel value after resizing, the contributions from the original pixels can be summed up as shown in the following equation:
v(x) = Σ_{i=-∞}^{∞} u(i) · sinc(π f_c (i - x))    [6]

where f_c is the cutoff frequency. The above 1-D polyphase filter can be applied to both the horizontal dimension and the vertical dimension.
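The summation of equation 6 can be sketched directly; the truncation to a finite tap count and the per-output weight normalization are practical assumptions added here:

```python
import numpy as np

def resample_1d(u, p, q, taps=16):
    """Resize a 1-D signal by p/q by summing sinc-weighted contributions of
    the original pixels, per equation 6 (np.sinc(t) = sin(pi*t)/(pi*t))."""
    u = np.asarray(u, dtype=np.float64)
    fc = min(1.0, p / q)                  # cutoff tied to the resize ratio
    n_out = len(u) * p // q
    v = np.empty(n_out)
    for j in range(n_out):
        x = j * q / p                     # output position on the input grid
        lo = max(0, int(x) - taps)
        hi = min(len(u), int(x) + taps + 1)
        i = np.arange(lo, hi)
        w = np.sinc(fc * (i - x))         # truncated sinc kernel
        v[j] = np.dot(u[i], w) / w.sum()  # normalized so flat areas stay flat
    return v
```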
Another aspect of resampling (resizing) accounts for overscan. In an NTSC television signal, an image has 486 scan lines, and in digital video each scan line can have 720 pixels. However, not all of the entire image is visible on the television, due to mismatches between the image size and the screen format. The part of the image that is not visible is called overscan.
To help broadcasters put useful information in the area visible by as many televisions as possible, the Society of Motion Picture & Television Engineers (SMPTE) defined specific sizes of the action frame, called the safe action area and the safe title area. See SMPTE Recommended Practice RP 27.3-1989, Specifications for Safe Action and Safe Title Areas Test Pattern for Television Systems. The safe action area is defined by SMPTE as the area in which "all significant action must take place." The safe title area is defined as the area where "all the useful information can be confined to ensure visibility on the majority of home television receivers."
For example, referring to Figure 25, the safe action area 2510 occupies the center 90% of the screen, leaving a 5% border all around. The safe title area 2505 occupies the center 80% of the screen, leaving a 10% border. Referring now to Figure 26, because the safe title area is so small, some stations put text into the safe action area, inside the white rectangular window 2615, in order to add more contents to the image.
Black borders can usually be seen in the overscan. For example, in Figure 26, black borders appear at the upper side 2620 and the lower side 2625 of the image. These black borders can be removed in the overscan, because H.264 video uses boundary extension in motion estimation. Extended black borders can increase the residual. Conservatively, the borders can be cut by 2%, and the resizing is then performed. The filters for resizing can be generated accordingly. Truncation is performed to remove the overscan before polyphase downsampling.
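A sketch of the pre-resampling crop follows, assuming the 2% trim stated above is taken from every edge:

```python
def crop_overscan(frame, fraction=0.02):
    """Trim a fixed fraction from each edge prior to polyphase downsampling."""
    h, w = frame.shape[:2]
    dy, dx = int(round(h * fraction)), int(round(w * fraction))
    return frame[dy:h - dy, dx:w - dx]
```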
Referring again to Figure 6, the progressive video then proceeds to block 610, where deblocking and deringing operations are performed. Two types of artifacts, "blocking" and "ringing," commonly occur in video compression applications. Blocking artifacts occur because compression algorithms divide each frame into blocks (e.g., 8x8 blocks). Each reconstructed block has some small errors, and the errors at the edge of a block frequently contrast with the errors at the edges of neighboring blocks, making block boundaries visible. In contrast, ringing artifacts appear as distortions around the edges of image features. Ringing artifacts occur because the encoder discards too much information in quantizing the high-frequency DCT coefficients. In some illustrative examples, both deblocking and deringing can use low-pass FIR (finite impulse response) filters to hide these visible artifacts.
In one example of deblocking processing, a deblocking filter can be applied to all the 4x4 block edges of a frame, except edges at the boundary of the frame and any edges for which the deblocking filter process is disabled. This filtering process is performed on a macroblock basis after the completion of the frame-construction process, with all macroblocks in a frame processed in order of increasing macroblock addresses. For each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered, from top to bottom. The luma deblocking filter process is performed on four 16-sample edges, and the deblocking filter process for each chroma component is performed on two 8-sample edges, for both the horizontal and the vertical direction, as shown in Figure 39. Sample values above and to the left of the current macroblock, which may already have been modified by the deblocking process operation on previous macroblocks, are used as input to the deblocking filter process on the current macroblock and may be further modified during the filtering of the current macroblock. Sample values modified during filtering of vertical edges can be used as input for the filtering of the horizontal edges of the same macroblock. A deblocking process can be invoked for the luma and chroma components separately.
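The following sketch runs a [1 2 1]/4 low-pass FIR across vertical 4x4 boundaries to illustrate the idea; it is a simplified stand-in, not the adaptive, boundary-strength-dependent filter that H.264 defines:

```python
import numpy as np

def smooth_vertical_edges(luma, step=4):
    """Filter the two pixels nearest each vertical block boundary with a
    [1 2 1]/4 FIR (illustrative only, not the standard-exact filter)."""
    out = luma.astype(np.float64)                 # astype makes a copy
    for x in range(step, luma.shape[1] - 1, step):
        p1, p0 = out[:, x - 2].copy(), out[:, x - 1].copy()
        q0, q1 = out[:, x].copy(), out[:, x + 1].copy()
        out[:, x - 1] = (p1 + 2.0 * p0 + q0) / 4.0   # edge pixel left of boundary
        out[:, x] = (p0 + 2.0 * q0 + q1) / 4.0       # edge pixel right of boundary
    return out
```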
In an example of deringing processing, a 2-D filter can be adaptively applied to smooth out areas near edges. Edge pixels undergo little or no filtering, in order to avoid blurring.
The GOP partitioner
After deblocking and deringing, the progressive video is processed by the GOP partitioner 612. GOP positioning can include detecting shot changes, generating complexity maps (e.g., temporal and spatial bandwidth maps), and adaptive GOP partitioning. Each of these operations is described below.
A. Scene change detection
Shot detection relates to determining when a frame in a group of pictures (GOP) exhibits data indicating that a scene change has occurred. Generally, within a GOP, the frames may have no significant changes in any two or three (or more) adjacent frames, or there may be slow changes or fast changes. Of course, these scene-change classifications can be further broken down to a greater level of granularity depending on the specific application.
Detecting shot or scene changes is important for the efficient encoding of video. Typically, when a GOP is not changing significantly, an I frame at the beginning of the GOP followed by a number of predictive frames can encode the video sufficiently well that subsequent decoding and display of the video is visually acceptable. However, when a scene is changing, either abruptly or slowly, additional I frames and less predictive encoding (P frames and B frames) may be needed to produce subsequently decoded, visually acceptable results.
Shot detection and encoding systems, and methods of improving the performance of existing encoding systems, are described below. Such aspects can be implemented in the GOP partitioner 612 of the preprocessor 226, or can be included in an encoder device (which can operate with or without a preprocessor). These aspects make use of statistics (or metrics) that include statistical comparisons between adjacent frames of the video data to determine whether an abrupt scene change occurred, whether the scene is changing slowly, or whether there are camera flashlights in the scene that can make video encoding especially complex. The statistics can be obtained from a preprocessor and then sent to an encoding device, or they can be generated in an encoding device (e.g., by a processor configured to perform motion compensation). The resulting statistics aid the scene change detection decision. In a system that performs transcoding, a suitable preprocessor or configurable processor often exists. If the preprocessor performs motion-compensation-aided deinterlacing, the motion-compensation statistics are available and ready to use. In such systems, a shot detection algorithm adds little to the system complexity.
The illustrative example of a shot detector described herein only needs to utilize statistics from a previous frame, a current frame, and a next frame, and consequently has very low latency. The shot detector differentiates several different types of shot events, including abrupt scene changes, cross-fades and other slow scene changes, and camera flashlights. By determining different types of shot events with different strategies in the encoder, the encoding efficiency and the visual quality are enhanced.
Scene change detection can be used for any video encoding system, since it intelligently conserves bits compared with inserting an I frame at a fixed interval. In some aspects, content information obtained by the preprocessor (e.g., incorporated in metadata or calculated by the preprocessor 226) can be used for scene change detection. For example, depending on the content information, the thresholds described below and other criteria can be dynamically adjusted for different types of video content.
Video encoding usually operates on a structured group of pictures (GOP). A GOP normally starts with an intra-coded frame (I frame), followed by a series of P (predictive) or B (bidirectional) frames. Typically, an I frame can store all the data required to display the frame, a B frame relies on data in the preceding and following frames (e.g., only containing data changed from the preceding frame or that is different from data in the next frame), and a P frame contains data that has changed from the preceding frame. In common usage, I frames are interspersed with P frames and B frames in encoded video. In terms of size (e.g., the number of bits used to encode the frame), I frames are usually much larger than P frames, and P frames are larger than B frames. For efficient encoding, transmission and decoding processing, the length of a GOP should be long enough to reduce the efficiency loss from big I frames, and short enough to fight mismatch between encoder and decoder, or channel impairment. In addition, macroblocks (MBs) in P frames can be intra-coded for the same reasons.
Scene change detection can be used by a video encoder to determine a proper GOP length and to insert I frames based on the GOP length, instead of inserting often-unneeded I frames at a fixed interval. In a practical streaming video system, the communication channel is usually impaired by bit errors or packet losses. Where to place I frames or I MBs can significantly affect the decoded video quality and the viewing experience. One encoding scheme is to use intra-coded frames for pictures or portions of pictures that have significant change from collocated previous pictures or picture portions. Normally, these regions cannot be predicted effectively and efficiently with motion estimation, and encoding can be done more efficiently if such regions are exempted from inter-frame coding techniques (e.g., encoding using B frames and P frames). In the context of channel impairment, those regions are likely to suffer from error propagation, which can be reduced or eliminated (or nearly so) by intra-coding.
Portions of the GOP video can be classified into two or more categories, where each region can have different intra-coding criteria that may depend on the particular implementation. As an example, the video can be classified into three categories: abrupt scene changes, cross-fades and other slow scene changes, and camera flashlights.
Abrupt scene changes include frames that are significantly different from the previous frame, usually caused by a camera operation. Since the content of these frames is different from that of the previous frame, an abrupt scene change frame should be encoded as an I frame.
Cross-fades and other slow scene changes include slow switching of scenes, usually caused by computer processing of camera shots. The gradual blending of two different scenes may look more pleasing to human eyes, but poses a challenge to video encoding. Motion compensation cannot effectively reduce the bit rate of those frames, and more intra MBs can be updated for these frames.
A camera flash, or camera flashlight, event occurs when the content of a frame includes camera flashes. Such flashes are relatively short in duration (e.g., one frame) and extremely bright, such that the pixels in a frame portraying the flashes exhibit unusually high luminance relative to the corresponding areas on adjacent frames. Camera flashlights shift the luminance of a picture suddenly and swiftly. The duration of a camera flashlight is usually shorter than the temporal masking duration of the human vision system (HVS), which is typically defined to be 44 ms. Human eyes are not sensitive to the quality of these short bursts of brightness, and therefore they can be encoded coarsely. Because the flashlight frames cannot be handled effectively with motion compensation, and because they are bad prediction candidates for future frames, coarse encoding of these frames does not reduce the encoding efficiency of future frames. Scenes classified as flashlights should not be used to predict other frames because of the "artificial" high luminance, and other frames cannot effectively be used to predict these frames for the same reason. Once identified, these frames can be taken out, because they can require a relatively high amount of processing. One option is to remove the camera flashlight frames and encode a DC coefficient in their place; such a solution is simple, computationally fast, and saves many bits.
When any of the above types of frames are detected, a shot event is declared. Shot detection is not only useful to improve encoding quality; it can also aid in identifying video content for searching and indexing purposes. One illustrative aspect of a scene detection process is described hereinbelow. In this example, a shot detection process first calculates information, or metrics, for a selected frame being processed for shot detection. The metrics can include information from bidirectional motion estimation and compensation processing of the video, and other luminance-based metrics.
To perform bidirectional motion estimation/compensation, a video sequence can be preprocessed with a bidirectional motion compensator that matches every 8x8 block of the current frame with blocks in two of the frame's most adjacent neighboring frames, one in the past and one in the future. The motion compensator produces motion vectors and difference metrics for every block. Figure 29 is an illustration showing an example of matching pixels of a current frame C to a past frame P and a future (or next) frame N, and depicts motion vectors to the matched pixels (a past motion vector MV_P and a future motion vector MV_N). A general description of bidirectional motion vector generation and related encoding is provided below with reference to Figure 32.
After the bidirectional motion information of the corresponding adjacent frames has been determined (e.g., the motion information identifying the best-matching MBs), additional metrics can be generated (e.g., by the motion compensator in the GOP partitioner 612 or another suitable component) through various comparisons of the current frame to the next frame and to the previous frame. The motion compensator can produce a difference metric for every block. The difference metric can be a sum of squared differences (SSD) or a sum of absolute differences (SAD). Without loss of generality, SAD is used as an example here.
For every frame, a SAD ratio (also referred to as a "contrast ratio") is calculated as follows:
γ = (ε + SAD_P) / (ε + SAD_N)    [6]

where SAD_P and SAD_N are the sums of absolute differences of the forward and the backward difference metrics, respectively. Note that the denominator contains a small positive number ε to prevent a "divide-by-zero" error. The numerator also contains an ε to balance the effect of the unity in the denominator. For example, if the previous frame, the current frame, and the next frame are identical, motion search should yield SAD_P = SAD_N = 0. In this case, the above calculation generates γ = 1 instead of 0 or infinity.
A luminance histogram can be calculated for every frame. Typically, multimedia images have a luminance depth of eight bits (e.g., the number of "bins"). According to some aspects, the luminance depth used for calculating the luminance histogram can be set to 16 to obtain the histogram. In other aspects, the luminance depth can be set to an appropriate number, which may depend upon the type of data being processed, the computational power available, or other predetermined criteria. In some aspects, the luminance depth can be set dynamically based on a calculated or received metric, such as the content of the data.
The following equation illustrates one example of calculating the luminance histogram difference (lambda):
λ = Σ_{i=1}^{16} |N_Pi - N_Ci| / N    [7]

where N_Pi is the number of blocks in the i-th bin for the previous frame, N_Ci is the number of blocks in the i-th bin for the current frame, and N is the total number of blocks in a frame. If the luminance histograms of the previous frame and the current frame are completely dissimilar (or disjoint), then λ = 2.
Using this information, a frame difference metric (D) is calculated as follows:
D = γ_C / γ_P + A·λ(2λ + 1)    [8]

where A is a constant chosen by the application, γ_C = (ε + SAD_P)/(ε + SAD_N), and γ_P = (ε + SAD_PP)/(ε + SAD_C), where SAD_PP and SAD_C are the corresponding difference metrics for the previous frame.
A selected (current) frame is classified as an abrupt scene change frame if the frame difference metric meets the criterion shown in equation 9:
D = γ_C / γ_P + A·λ(2λ + 1) ≥ T_1    [9]

where A is a constant chosen by the application, and T_1 is a threshold.
In one example, simulations show that setting A = 1 and T_1 = 5 achieves good detection performance. If the current frame is an abrupt scene change frame, then γ_C should be large and γ_P should be small. The ratio γ_C/γ_P is used instead of γ_C alone, so that the metric is normalized to the activity level of the context.
Note that the above criterion uses the luminance histogram difference lambda (λ) in a nonlinear way. Figure 16 illustrates that λ(2λ + 1) is a convex function. When λ is small (e.g., close to zero), there is little preemphasis. The larger λ becomes, the more emphasis the function applies. With this preemphasis, for any λ larger than 1.4, an abrupt scene change is detected (if the threshold T_1 is set to 5).
The current frame is determined to be a cross-fade or a slow scene change if the scene intensity metric D meets the criterion shown in equation 10:

T_2 < D < T_1    [10]

for a certain number of continuous frames, where T_1 is the same threshold used above and T_2 is another threshold.
A flashlight event usually causes the luminance histogram to shift to the brighter side. In this illustrative aspect, luminance histogram statistics are used to determine whether the current frame comprises camera flashlights. A shot detection process can determine whether the luminance of the current frame is greater than the luminance of the previous frame by a certain threshold T_3, and whether the luminance of the current frame is greater than the luminance of the next frame by the threshold T_3, as shown in equations 11 and 12:
Y_C - Y_P ≥ T_3    [11]
Y_C - Y_N ≥ T_3    [12]
If the above criteria are not met, the current frame is not classified as comprising camera flashlights. If the criteria are met, the shot detection process determines whether the backward difference metric SAD_P and the forward difference metric SAD_N are greater than a certain threshold T_4, as illustrated in the following equations:
SAD_P ≥ T_4    [13]
SAD_N ≥ T_4    [14]
where Y_C is the average luminance of the current frame, Y_P is the average luminance of the previous frame, Y_N is the average luminance of the next frame, and SAD_P and SAD_N are the forward and backward difference metrics associated with the current frame.
The shot detection process determines a camera flash event by first determining whether the luminance of the current frame is greater than the luminance of the previous frame and greater than the luminance of the next frame. If not, the frame is not a camera flash event; but if so, it may be one. The shot detection process can then evaluate whether the backward difference metric and the forward difference metric are both greater than the threshold T_4. If both of these conditions are satisfied, the shot detection process classifies the current frame as having camera flashlights. If the criteria are not met, the frame is not classified as any type of shot, or it can be given a default classification identifying the encoding to be performed on the frame (e.g., drop the frame, encode as an I frame).
Some exemplary values of T_1, T_2, T_3 and T_4 are shown above. Typically, these threshold values are selected through testing of a particular implementation of shot detection. In some aspects, one or more of the threshold values T_1, T_2, T_3 and T_4 are predetermined, and such values are incorporated into the shot classifier in the encoding device. In some aspects, one or more of the thresholds T_1, T_2, T_3 and T_4 can be set during processing (e.g., dynamically), based on information (e.g., metadata) supplied to the shot classifier, or based on information calculated by the shot classifier itself.
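Putting equations 6 through 14 together, a shot classifier can be sketched as follows; the values of T_2, T_3 and T_4 shown are placeholder assumptions, since only A = 1 and T_1 = 5 are given above, and the histogram input is assumed to be per-block average luma values:

```python
import numpy as np

EPS = 1e-3  # small positive epsilon guard of equation 6

def contrast_ratio(sad_p, sad_n):
    """Equation 6: forward/backward SAD ratio with divide-by-zero guard."""
    return (EPS + sad_p) / (EPS + sad_n)

def hist_difference(prev_block_luma, cur_block_luma, bins=16):
    """Equation 7: 16-bin luma histograms over block averages, scaled by N."""
    hp, _ = np.histogram(prev_block_luma, bins=bins, range=(0, 256))
    hc, _ = np.histogram(cur_block_luma, bins=bins, range=(0, 256))
    return np.abs(hp - hc).sum() / hc.sum()

def classify_shot(gamma_c, gamma_p, lam, yc, yp, yn, sad_p, sad_n,
                  A=1.0, T1=5.0, T2=2.0, T3=10.0, T4=1e5):
    """Equations 8-14; T2, T3 and T4 are illustrative placeholders."""
    D = gamma_c / gamma_p + A * lam * (2.0 * lam + 1.0)   # equation 8
    if D >= T1:                                           # equation 9
        return "abrupt_scene_change"
    if T2 < D < T1:                                       # equation 10
        return "cross_fade_or_slow_change"
    if (yc - yp >= T3 and yc - yn >= T3                   # equations 11-12
            and sad_p >= T4 and sad_n >= T4):             # equations 13-14
        return "camera_flash"
    return "ordinary"
```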
Using shot detection information to encode video is typically performed in an encoder, but it is described here for completeness of the shot detection disclosure. Referring to Figure 30, an encoding process 301 can use the shot detection information to encode the video based on the detected shots in the sequence of frames. Process 301 proceeds to block 303, where it checks whether the current frame is classified as an abrupt scene change. If so, at block 305 the current frame can be encoded as an I frame and a GOP boundary can be determined. If not, process 301 proceeds to block 307; if the current frame is classified as a portion of a slowly changing scene, then at block 309 the current frame, and other frames in the slowly changing scene, can be encoded as predictive frames (e.g., P frames or B frames). Process 301 then proceeds to block 311, where it checks whether the current frame is classified as a flashlight scene comprising camera flashes. If so, at block 313 the frame can be identified for special processing, for example, removal, or encoding of a DC coefficient for the frame. If not, no classification of the current frame was made, and the current frame can be encoded in accordance with other criteria, encoded as an I frame, or dropped.
In the aspects described above, the amount of difference between the frame to be compressed and its two adjacent frames is indicated by the frame difference metric D. If a significant amount of one-way luminance change is detected, it signifies a cross-fade effect in the frame. The more prominent the cross-fade is, the more gain can be achieved by using B frames. In some aspects, a modified frame difference metric, shown below as equation 15, is used:
[Equation 15: the modified frame difference metric, a function of D and the quantities d_P, d_N, Δ and α defined below; the original expression appears only as an image in the source and is not reproduced here.]
where d_P = |Y_C - Y_P| and d_N = |Y_C - Y_N| are the luma differences between the current frame and the previous frame and between the current frame and the next frame, respectively; Δ represents a constant that can be determined by routine experimentation, since it can depend on the implementation; and α is a weight variable having a value between 0 and 1.
B. Bandwidth map generation
The preprocessor 226 (Figure 6) can also be configured to generate a bandwidth map that can be used for encoding the multimedia data. In some aspects, a content classification module 712 (Figure 7) in the encoder 228 generates the bandwidth map instead of the preprocessor.
Human visual quality V can be a function of both encoding complexity C and allocated bits B (also referred to as bandwidth). Figure 15 is a graph illustrating this relationship. Note that the encoding complexity metric C considers spatial and temporal frequencies from the human-vision point of view. For distortions to which human eyes are more sensitive, the complexity value is correspondingly higher. It can typically be assumed that V is monotonically decreasing in C and monotonically increasing in B.
To achieve constant visual quality, a bandwidth B_i is assigned to the i-th object (a frame or a macroblock) to be encoded, where B_i satisfies the criteria expressed in the two equations immediately below:

B_i = B(C_i, V)    [16]

B = Σ_i B_i    [17]
In the two equations immediately above, C_i is the encoding complexity of the i-th object, B is the total available bandwidth, and V is the achieved visual quality for an object. Human visual quality is difficult to express exactly as an equation, so the above equation set is not precisely defined. However, if it is assumed that the 3-D (three-dimensional) model is continuous in all variables, the bandwidth ratio (B_i/B) can be treated as unchanging within a neighborhood of a (C, V) pair. The bandwidth ratio β_i is defined in the equation below:

β_i = B_i / B    [18]
The bit allocation can then be defined as expressed in the following equations:

β_i = β(C_i)

1 = Σ_i β_i, for (C_i, V) ∈ δ(C_0, V_0)    [19]

where δ indicates the "neighborhood."
The encoding complexity is affected by human visual sensitivity, both spatially and temporally. Girod's human vision model is an example of a model that can be used to define the spatial complexity. This model considers the local spatial frequency and ambient lighting. The resulting metric is called D_csat. At this preprocessing point in the process, whether a picture will be intra-coded or inter-coded is not known, and bandwidth ratios are generated for both. Bits are allocated according to the ratio between the β_INTRA values of different video objects. For intra-coded pictures, the bandwidth ratio is expressed in the following equation:

β_INTRA = β_0INTRA · log10(1 + α_INTRA · Y² · D_csat)    [20]
In the equation above, Y is the average luminance component of a macroblock, α_INTRA is a weighting factor for the luminance square and the D_csat term following it, and β_0INTRA is a normalization factor to guarantee 1 = Σ_i β_i. For example, a value of α_INTRA = 4 achieves good visual quality. Content information (e.g., a content classification) can be used to set α_INTRA to a value that corresponds to a desired good visual quality level for the particular content of the video. In one example, if the video content comprises a "talking head" news broadcast, the visual quality level can be set lower because the video frames or displayable portion may be deemed less important than the audio portion, and fewer bits can be allocated to encode the data. In another example, if the video content comprises a sporting event, content information can be used to set α_INTRA to a value that corresponds to a higher visual quality level, because the displayed images may be more important to a viewer, and accordingly more bits can be allocated to encode the data.
To understand this relationship, note that bandwidth is allocated logarithmically with encoding complexity. The luminance-squared term Y² reflects the fact that coefficients with larger magnitudes use more bits to encode. To prevent the logarithm from returning negative values, unity is added to the term inside the parentheses. Logarithms with other bases can also be used.
The temporal complexity is determined by a measure of the frame difference metric, which measures the difference between two consecutive frames, taking into account the amount of motion (e.g., motion vectors) along with a frame difference metric such as the sum of absolute differences (SAD).
Bit allocation for inter-coded pictures can consider spatial as well as temporal complexity. This is expressed below:

β_INTER = β_0INTER · log10(1 + α_INTER · SSD · D_csat · exp(-γ‖MV_P + MV_N‖²))    [21]
In the equation above, MV_P and MV_N are the forward and backward motion vectors for the current MB (see Figure 29). Note that Y² in the intra-coded bandwidth formula is replaced by the sum of squared differences (SSD). To understand the role of ‖MV_P + MV_N‖² in the equation above, note the following characteristic of the human visual system: areas undergoing smooth, predictable motion (small ‖MV_P + MV_N‖²) attract attention and can be tracked by the eyes, and typically cannot tolerate any more distortion than stationary regions. However, areas undergoing fast or unpredictable motion (large ‖MV_P + MV_N‖²) cannot be tracked and can tolerate significant quantization. Experiments show that α_INTER = 1 and γ = 0.001 achieve good visual quality.
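Equations 20 and 21 can be sketched directly; the default constants follow the values given above, and the final normalization of the ratios to sum to one per frame (equation 19) is left to the caller:

```python
import math

def beta_intra(y_mean, d_csat, alpha_intra=4.0, beta0=1.0):
    """Equation 20: intra bandwidth ratio for one macroblock. beta0 is the
    normalization factor chosen so the ratios sum to 1 over the frame."""
    return beta0 * math.log10(1.0 + alpha_intra * y_mean ** 2 * d_csat)

def beta_inter(ssd, d_csat, mv_p, mv_n, alpha_inter=1.0, gamma=0.001, beta0=1.0):
    """Equation 21: inter bandwidth ratio; mv_p and mv_n are (x, y) vectors."""
    mv_sum_sq = (mv_p[0] + mv_n[0]) ** 2 + (mv_p[1] + mv_n[1]) ** 2
    return beta0 * math.log10(1.0 + alpha_inter * ssd * d_csat
                              * math.exp(-gamma * mv_sum_sq))
```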
C. Adaptive GOP partitioning
In another illustrative example of processing that can be performed by the preprocessor 226, the GOP partitioner 612 of Figure 6 can also adaptively change the composition of a group of pictures coded together; it is discussed here with reference to an example using MPEG2. Some older video compression standards (e.g., MPEG2) do not require a GOP to have a regular structure, although one can be imposed. An MPEG2 sequence always begins with an I frame, i.e., one that has been encoded without reference to previous pictures. The MPEG2 GOP format is usually prearranged at the encoder by fixing the spacing in the GOP of the P, or predicted, pictures that follow the I frame. P frames are pictures that have been in part predicted from previous I or P pictures. The frames between the starting I frame and the succeeding P frames are encoded as B frames. A "B" frame (B stands for bidirectional) can use the previous and the next I or P pictures, either individually or simultaneously, as reference. The number of bits needed to encode an I frame on average exceeds the number of bits needed to encode a P frame; likewise, the number of bits needed to encode a P frame on average exceeds that required for a B frame. A skipped frame, if it is used, requires no bits for its representation.
The use of P frames and B frames, and, in more recent compression algorithms, skipped frames, to eliminate temporal redundancy rests on the idea of reducing the data rate needed to represent the video. When temporal redundancy is high, that is, there is little change from picture to picture, the use of P, B, or skipped pictures efficiently represents the video stream, because I or P pictures decoded earlier are used later as references to decode other P or B pictures.
Adaptive GOP partitioning is based on using this concept in an adaptive way. Differences between frames are quantified, and a decision to represent a picture by an I, P, B, or skipped frame is made automatically after suitable tests are performed on the quantified differences. An adaptive structure has advantages not available in a fixed GOP structure. A fixed structure would ignore the possibility that little change in content has taken place; an adaptive procedure would allow far more B frames to be inserted between each I and P frame, or between two P frames, thereby reducing the number of bits needed to adequately represent the sequence of frames. Conversely, when the change in video content is significant, the efficiency of P frames is greatly reduced because the difference between the predicted and the reference frames is too large. Under these conditions, matching objects may fall out of the motion search regions, or the similarity between matching objects is reduced due to distortion caused by changes in camera angle. At that point, the P frames, or an I frame and its adjacent P frame, should be chosen to be closer to each other, and fewer B frames should be inserted. A fixed GOP cannot make that adjustment.
In the system disclosed herein, these conditions are automatically sensed. The GOP structure is flexible and is made to adapt to these changes in content. The system evaluates a frame difference metric, which can be thought of as a measure of inter-frame distance, with the same additive properties as distance. Conceptually, given frames F_1, F_2 and F_3 having inter-frame distances d_12 and d_23, the distance between F_1 and F_3 is taken to be at least d_12 + d_23. Frame-type assignments are made on the basis of this distance-like metric.
The GOP partitioner operates by assigning picture types to frames as they are received. The picture type indicates the prediction method that may be needed to encode each block.
I pictures are encoded without reference to other pictures. Since they stand alone, they provide access points in the data stream where decoding can begin. If the "distance" between a frame and its predecessor frame exceeds a scene change threshold, the I encoding type is assigned to the frame.
P pictures can use the previous I or P picture for motion-compensated prediction. They use, as the basis for encoding, blocks in the prior field or frame that can be displaced from the block being predicted. After the reference block is subtracted from the block being considered, the residual block is typically encoded with the discrete cosine transform to eliminate spatial redundancy. If the "distance" between a frame and the last frame assigned to be a P frame exceeds a second threshold, which is typically less than the first, the P encoding type is assigned to the frame.
B-frame pictures can use the previous and the next P or I pictures for motion compensation, as described above. A block in a B picture can be forward, backward, or bidirectionally predicted; or it can be intra-coded without reference to other frames. In H.264, a reference block can be a linear combination of up to 32 blocks from as many frames. If a frame cannot be assigned to be an I or P type, it is assigned to be a B type if the "distance" from it to its immediate predecessor is greater than a third threshold, which is typically less than the second threshold.
If a frame cannot be assigned to be B-frame encoded, it is assigned the "skip frame" status. The frame can be skipped because it is effectively a copy of the previous frame.
Evaluating a metric that quantifies the difference between adjacent frames in display order is the first part of this processing. This metric is the distance referred to above; with it, every frame is evaluated for its proper type. Thus, the spacing between an I frame and adjacent P frames, or between two successive P frames, can be variable. Computing the metric begins by processing the video frames with a block-based motion compensator, a block being the basic unit of video compression, usually composed of 16x16 pixels, although other block sizes such as 8x8, 4x4 and 8x16 are possible. For frames consisting of two deinterlaced fields, the motion compensation can be performed on a field basis, the search for the reference blocks taking place in fields rather than frames. For a block in the first field of the current frame, a forward reference block is found in the fields of the frame that follows it; likewise, a backward reference block is found in the fields of the frame that immediately precedes the current field. The current blocks are assembled into a compensated field. The process continues with the second field of the frame. The two compensated fields are combined to form forward and backward compensated frames.
For frames created in the inverse telecine 606, only reconstructed film frames exist, so the search for reference blocks is on a frame basis only. Two reference blocks and two differences, forward and backward, are found, likewise leading to forward and backward compensated frames. In summary, the motion compensator produces motion vectors and difference metrics for every block; a block is part of an NTSC field if the output of the deinterlacer 605 is being processed, and part of a film frame if the output of the inverse telecine is being processed. Note that the differences in the metric are evaluated between a block in the field or frame under consideration and the block that best matches it, either in a preceding field or frame or in a field or frame that immediately follows it, depending on whether a forward or a backward difference is being evaluated. Only luminance values enter into this calculation.
The motion compensation step thus generates two sets of differences. These are between blocks of current luminance values and blocks of luminance values in reference blocks taken from frames that immediately precede and immediately follow the current frame in time. The absolute value of each forward and each backward difference is determined for each pixel, and each is separately summed over the entire frame. Both fields are included in the two summations when the deinterlaced NTSC fields comprising a frame are processed. In this way, SAD_P and SAD_N, the summed absolute values of the forward and backward differences, are obtained.
For every frame, a SAD ratio is calculated using the following relationship:

γ = (ε + SAD_P) / (ε + SAD_N)    [22]

where SAD_P and SAD_N are the summed absolute values of the forward and backward differences, respectively. A small positive number ε is added to the numerator to prevent a "divide-by-zero" error. A similar ε term is added to the denominator, further reducing the sensitivity of γ when either SAD_P or SAD_N is close to zero.
In an alternative aspect, the difference can be the SSD (sum of squared differences), the SAD (sum of absolute differences), or the SATD, in which the blocks of pixel values are transformed by applying the two-dimensional discrete cosine transform to the blocks before the differences in block elements are measured. The sums are evaluated over the area of active video, though smaller regions may be used in other aspects.
The luminance histogram of every frame as received (non-motion-compensated) is also computed. The histogram operates on the DC coefficient, i.e., the (0, 0) coefficient, in the 16x16 array of coefficients that is the result of applying the two-dimensional discrete cosine transform to the block of luminance values, if it is available. Equivalently, the average value of the 256 values of luminance in a 16x16 block may be used in the histogram. For images whose luminance depth is eight bits, the number of bins is set at 16. The next metric evaluates the histogram difference:
λ = (1/N) Σ_{i=1}^{16} |N_Pi - N_Ci|    [23]

In the above, N_Pi is the number of blocks from the previous frame in the i-th bin, N_Ci is the number of blocks from the current frame that belong in the i-th bin, and N is the total number of blocks in a frame.
These intermediate results are assembled to form the current frame difference metric as:

D = γ_C / γ_P + λ(2λ + 1)    [24]

where γ_C is the SAD ratio based on the current frame and γ_P is the SAD ratio based on the previous frame. If a scene has smooth motion and its luma histogram barely changes, then D ≈ 1. If the current frame displays an abrupt scene change, then γ_C will be large and γ_P should be small. The ratio γ_C/γ_P is used instead of γ_C alone, so that the metric is normalized to the activity level of the context.
Figure 42 illustrates the process of assigning compression types to frames. D, the current frame difference defined in equation 24, is the basis on which decisions are made with respect to the frame. As indicated in decision block 4202, if the frame under consideration is the first in a sequence, the decision path marked YES is followed to block 4206, thereby declaring the frame to be an I frame. The accumulated frame differences are set to zero in block 4208, and the process returns (in block 4210) to the start block. If the frame being considered is not the first frame in a sequence, the path marked NO is followed from block 4202 where the decision was made, and in test block 4204 the current frame difference is tested against the scene change threshold. If the current frame difference is larger than that threshold, the decision path marked YES is followed to block 4206, again leading to the assignment of an I frame.
If the current frame difference is less than the scene change threshold, the NO path is followed to block 4212, where the current frame difference is added to the accumulated frame difference. Continuing through the flowchart at decision block 4214, the accumulated frame difference is compared with a threshold t, which is in general smaller than the scene change threshold. If the accumulated frame difference is larger than t, control transfers to block 4216, and the frame is assigned to be a P frame; the accumulated frame difference is then reset to zero in step 4218. If the accumulated frame difference is smaller than t, control transfers from block 4214 to block 4220, where the current frame difference is compared with τ, which is smaller than t. If the current frame difference is smaller than τ, the frame is assigned to be skipped in block 4222 and the process then returns; if the current frame difference is larger than τ, the frame is assigned to be a B frame in block 4226.
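The flowchart of Figure 42 reduces to a compact loop; the sketch below assumes τ < t < the scene change threshold, as the text implies:

```python
def assign_frame_types(frame_diffs, scene_change_t, t, tau):
    """Map per-frame difference metrics D to I/P/B/skip, per Figure 42."""
    types, acc = [], 0.0
    for i, d in enumerate(frame_diffs):
        if i == 0 or d > scene_change_t:      # blocks 4202/4204 -> 4206
            types.append("I")
            acc = 0.0                         # block 4208
        else:
            acc += d                          # block 4212
            if acc > t:                       # block 4214
                types.append("P")             # block 4216
                acc = 0.0                     # block 4218
            elif d < tau:                     # block 4220
                types.append("skip")          # block 4222
            else:
                types.append("B")             # block 4226
    return types
```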
Encoder
Returning to Figure 2, the transcoder 200 includes an encoder 228 that receives the processed metadata and the raw video from the preprocessor 226. The metadata can include any information initially received in the source video 104 and any information calculated by the preprocessor 226. The encoder 228 comprises a first-pass encoder 230, a second-pass encoder 232 and a re-encoder 234. The encoder 228 also receives input from a transcoder control 231, which can provide information (e.g., metadata, error resilience information, content information, encoded bit rate information, base-layer and enhancement-layer balance information, and quantization information) derived from the second-pass encoder 232 to the first-pass encoder 230, the re-encoder 234, and the preprocessor 226. The encoder 228 encodes the received video using content information received from the preprocessor 226 and/or content information that the encoder 228 itself generates (e.g., by a content classification module 712 (Figure 7)).
Figure 7 illustrates a block diagram of functional modules that can be included in an exemplary two-pass encoder usable for the encoder 228 illustrated in Figure 2. Various aspects of the functional modules are shown in Figure 7, but Figure 7 and the description herein do not necessarily address all the functionality that can be incorporated into an encoder. Accordingly, some aspects of the functional modules are described below, following the discussion of base-layer and enhancement-layer encoding.
Base layer and enhancement layer encoding
The encoder 228 can be an SNR scalable encoder that can encode the raw video and the metadata from the preprocessor 226 into a first group of encoded data (also referred to herein as a base layer) and one or more additional groups of encoded data (also referred to herein as enhancement layers). The encoding algorithm generates base-layer and enhancement-layer coefficients that, when decoded, can be combined at the decoder when both layers are available for decoding. When both layers are not available, the encoding of the base layer allows it to be decoded as a single layer.
Aspects of this multi-layer encoding process are described with reference to Figure 31. At block 321, an I frame is encoded with entirely intra-coded macroblocks (intra-coded MBs). In H.264, the intra-coded MBs in I frames are encoded with fully exploited spatial prediction, which provides a significant amount of coding gain. There are two sub-modes: Intra 4x4 and Intra 16x16. If the base layer is to take advantage of the coding gain provided by spatial prediction, the base layer needs to be encoded and decoded before the enhancement layer is encoded and decoded. A two-pass encoding and decoding of I frames is used. In the base layer, a base-layer quantization parameter QP_b provides the transform coefficients with a coarse quantization step size. The pixel-wise difference between the original frame and the reconstructed base-layer frame is encoded at the enhancement layer. The enhancement layer uses a quantization parameter QP_e, which provides a finer quantization step size. An encoding device (e.g., the encoder 228 of Figure 2) can perform the encoding at block 321.
At block 323, the encoder encodes base-layer data and enhancement-layer data for the P frames and/or B frames in the GOP being processed. An encoding device (e.g., the encoder 228) can perform the encoding at block 323. At block 325, the encoding process checks whether there are more P or B frames to encode. An encoding device (e.g., the SNR scalable encoder 228) can perform act 325. If P or B frames remain, step 323 is repeated until all the frames in the GOP are finished being encoded. P frames and B frames comprise inter-coded macroblocks (inter-coded MBs), although there can be intra-coded MBs in P and B frames as well, as discussed below.
For a decoder to distinguish between base-layer data and enhancement-layer data, the encoder 228 encodes overhead information (block 327). The types of overhead information include, for example, data identifying the number of layers, data identifying a layer as a base layer, data identifying a layer as an enhancement layer, data identifying inter-relationships between layers (e.g., that layer 2 is an enhancement layer for base layer 1, or that layer 3 is an enhancement layer for enhancement layer 2), or data identifying a layer as the final enhancement layer in a string of enhancement layers. The overhead information can be contained in headers connected with the base-layer and/or enhancement-layer data to which it pertains, or contained in separate data messages. An encoding device (e.g., the encoder 228 of Figure 2) can perform the process at block 327.
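A minimal record of the overhead fields enumerated above might look as follows; the field names are illustrative assumptions, not a defined bitstream syntax:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerInfo:
    """Illustrative per-layer overhead record."""
    num_layers: int               # total number of layers in the stream
    layer_id: int
    is_base: bool                 # identifies the base layer
    depends_on: Optional[int]     # id of the layer this enhancement layer refines
    is_final_enhancement: bool    # marks the last layer in a string of layers
```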
To enable single-layer decoding, the coefficients of the two layers need to be combined before dequantization. Therefore, the coefficients of the two layers have to be generated jointly; otherwise, a significant amount of overhead could be introduced. One reason for the increased overhead is that base-layer encoding and enhancement-layer encoding could otherwise use different temporal references. An algorithm is needed that generates base-layer and enhancement-layer coefficients that can be combined at the decoder before dequantization when both layers are available. At the same time, the algorithm should provide acceptable base-layer video when the enhancement layer is not available, or when the decoder decides not to decode the enhancement layer for reasons such as power savings. The details of an illustrative example of this processing are discussed further below, in the context of the brief discussion of standard predictive coding that immediately follows.
P frames (or any inter-coded sections) can exploit temporal redundancy between a region in a current picture and a best-matching prediction region in a reference picture. The location of the best-matching prediction region in the reference frame can be encoded in a motion vector. The difference between the current region and the best-matching reference prediction region is known as the residual error (or prediction error).
Figure 32 is an illustration of an example of a P-frame construction process in, for example, MPEG-4. Process 331 details an example of the processing that can take place in block 323 of Figure 31. Process 331 includes a current picture 333 made up of 5x5 macroblocks, where the number of macroblocks in this example is arbitrary. A macroblock is made up of 16x16 pixels. A pixel can be defined by an 8-bit luminance value (Y) and two 8-bit chrominance values (Cr and Cb). In MPEG, the Y, Cr and Cb components can be stored in a 4:2:0 format, in which the Cr and Cb components are downsampled by 2 in the X and Y directions. Hence, each macroblock would consist of 256 Y components, 64 Cr components and 64 Cb components. Macroblock 335 of current picture 333 is predicted from reference picture 337, which is at a different time point than current picture 333. A search is made in reference picture 337 to locate best-matching macroblock 339, which is closest, in terms of Y, Cr and Cb values, to current macroblock 335 being encoded. The location of best-matching macroblock 339 in reference picture 337 is encoded in motion vector 341. Reference picture 337 can be an I frame or a P frame that a decoder will have reconstructed prior to the construction of current picture 333. Best-matching macroblock 339 is subtracted from current macroblock 335 (a difference for each of the Y, Cr and Cb components is calculated), resulting in residual error 343. Residual error 343 is encoded with a 2D discrete cosine transform (DCT) 345 and then quantized 347. Quantization 347 can provide spatial compression by, for example, allotting fewer bits to the high-frequency coefficients while allotting more bits to the low-frequency coefficients. The quantized coefficients of residual error 343, together with motion vector 341 and information identifying reference picture 337, are encoded information representing current macroblock 335. The encoded information can be stored in memory for future use, operated on for purposes such as error correction or image enhancement, or transmitted over network 349.
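The search-and-residual step of this process can be sketched as follows. The exhaustive full search shown here is only one option (the motion estimation section below mentions faster hexagon and diamond patterns), and the block size and search radius are assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def full_search(cur_block, ref, bx, by, radius=7):
    """Exhaustively search the reference picture within +/- radius pixels
    of (bx, by) for the block minimizing SAD against cur_block; return
    the motion vector and the residual to be transformed and quantized."""
    n = cur_block.shape[0]
    best_mv = (0, 0)
    best_cost = sad(cur_block, ref[by:by + n, bx:bx + n])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= ref.shape[1] - n and 0 <= y <= ref.shape[0] - n:
                cost = sad(cur_block, ref[y:y + n, x:x + n])
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    mx, my = best_mv
    match = ref[by + my:by + my + n, bx + mx:bx + mx + n]
    residual = cur_block.astype(np.int64) - match.astype(np.int64)
    return best_mv, residual
```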
The encoded quantized coefficients of residual error 343, along with encoded motion vector 341, can be used to reconstruct current macroblock 335 in the encoder, for use as part of a reference frame for subsequent motion estimation and compensation. The encoder can emulate the procedures of a decoder for this P-frame reconstruction. Emulating the decoder results in both the encoder and the decoder working with the same reference picture. The reconstruction process, whether performed in an encoder (for further inter-coding) or in a decoder, is presented here. Reconstruction of a P frame can start after the reference frame (or the portion of a picture or frame that is being referenced) is reconstructed. The encoded quantized coefficients are dequantized 351 and a 2D inverse DCT (IDCT) 353 is then performed, resulting in a decoded or reconstructed residual error 355. Encoded motion vector 341 is decoded and used to locate the already-reconstructed best-matching macroblock 357 in the already-reconstructed reference picture 337. Reconstructed residual error 355 is then added to reconstructed best-matching macroblock 357 to form reconstructed macroblock 359. Reconstructed macroblock 359 can be stored in memory, displayed independently or in a picture with other reconstructed macroblocks, or processed further for image enhancement.
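The encoder-side reconstruction mirroring the decoder can be sketched as below; the inverse transform is left as a stand-in parameter, since the sketch is about the dequantize, inverse-transform, and add structure rather than the DCT itself.

```python
import numpy as np

def reconstruct_macroblock(quant_levels, step, mv, ref, bx, by,
                           idct=lambda r: r):
    """Mirror the decoder: dequantize the residual levels, inverse-transform
    them (idct is a stand-in for the 2-D IDCT), and add the motion-
    compensated best match from the reconstructed reference picture."""
    n = quant_levels.shape[0]
    residual = idct(quant_levels.astype(float) * step)  # dequantize + IDCT
    mx, my = mv
    match = ref[by + my:by + my + n, bx + mx:bx + mx + n].astype(float)
    return match + residual                             # reconstructed MB
```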
B frames (or any sections coded with bidirectional prediction) can exploit the temporal redundancy between a region in a current picture and a best-matching prediction region in a previous picture and a best-matching prediction region in a subsequent picture. The subsequent best-matching prediction region and the previous best-matching prediction region are combined to form a combined bidirectional prediction region. The difference between the current-picture region and the best-matching combined bidirectional prediction region is the residual error (or prediction error). The location of the best-matching prediction region in the subsequent reference picture and the location of the best-matching prediction region in the previous reference picture can be encoded in two motion vectors.
Figure 33 illustrates an example of an encoder process, which can be performed by the encoder 228, for encoding base-layer and enhancement-layer coefficients. The base and enhancement layers are encoded to provide an SNR-scalable bitstream. Figure 33 depicts an example of encoding inter-coded MB residual-error coefficients, such as would be performed in step 323 of Figure 31. However, similar methods can also be used to encode intra-coded MB coefficients. An encoding means, for example the encoder component 228 of Fig. 2, can perform the processing illustrated in Figure 33 and step 323 of Figure 31. Original (to-be-encoded) video data 361 (comprising, in this example, luma and chroma information) is input to a base-layer best-matching macroblock loop 363 and an enhancement-layer best-matching macroblock loop 365. The objective of both loops (363 and 365) is to minimize the residual errors calculated at adders 367 and 369, respectively. Loops 363 and 365 can be performed in parallel (as shown) or sequentially. Loops 363 and 365 include logic for searching buffers 371 and 373, respectively, which contain reference frames, to identify the best-matching macroblock that minimizes the residual error between itself and the original data 361 (buffers 371 and 373 can be the same buffer). Because base-layer loop 363 will generally use a coarser quantization step size (a higher QP value) than enhancement-layer loop 365, the residual errors of loops 363 and 365 will differ. Transform blocks 375 and 377 transform the residual error of each loop.
The transformed coefficients are then parsed in selector 379 into base-layer and enhancement-layer coefficients. As discussed below, the parsing in selector 379 can take several forms. A common feature of the parsing techniques is that the enhancement-layer coefficient C′_enh is calculated so as to be a differential refinement of the base-layer coefficient C′_base. Calculating the enhancement layer as a refinement of the base layer allows a decoder either to decode the base-layer coefficients by themselves and have a reasonable representation of the image, or to combine the base- and enhancement-layer coefficients and have a refined representation of the image. The coefficients selected by selector 379 are then quantized by quantizers 381 and 383. The quantized coefficients Q_b(C′_base) and Q_e(C′_enh) (calculated by quantizers 381 and 383, respectively) can be stored in memory or transmitted over a network to a decoder.
To keep the macroblock reconstruction matched to that in the decoder, dequantizer 385 dequantizes the base-layer residual-error coefficients. The dequantized residual-error coefficients are inverse-transformed 387 and added 389 to the best-matching macroblock found in buffer 371, resulting in a reconstructed macroblock that matches the one reconstructed in the decoder. Quantizer 383, dequantizer 391, inverse transformer 395, adder 397 and buffer 373 perform calculations in enhancement loop 365 similar to those performed in base-layer loop 363. In addition, adder 393 is used to combine the dequantized enhancement-layer and base-layer coefficients used in the enhancement-layer reconstruction. The enhancement-layer quantizer and dequantizer will generally use a finer quantizer step size (a lower QP) than the base layer.
Figures 34, 35 and 36 show examples of base- and enhancement-layer coefficient selector processes that can be used in the selector 379 of Figure 33. A selecting means, for example the encoder 228 of Fig. 2, can perform the processes depicted in Figures 34, 35 and 36. Taking Figure 34 as an example, the transformed coefficients are parsed into base- and enhancement-layer coefficients as shown in the following equations:
C′_base = min(C_base, C_enh)   [25]

C′_enh = C_enh − Q_b⁻¹(Q_b(C′_base))   [26]
where the "min" function can be either the mathematical minimum or the minimum in magnitude of the two arguments. Equation 25 is depicted in Figure 34 as block 401, and equation 26 is depicted in Figure 34 as adder 510. In equation 26, Q_b represents base-layer quantizer 381 and Q_b⁻¹ represents base-layer dequantizer 385. Equation 26 converts the enhancement-layer coefficient into a differential refinement of the base-layer coefficient calculated with equation 25.
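A sketch of the Figure 34 selector, under the "minimum in magnitude" reading of the min function and with a uniform scalar quantizer standing in for Q_b, is:

```python
import numpy as np

def min_mag(a, b):
    """Minimum in magnitude of two arrays (one reading of 'min' in eq. 25)."""
    return np.where(np.abs(a) <= np.abs(b), a, b)

def select_coefficients(c_base, c_enh, step_b):
    """Selector of Figure 34, sketched: equation 25 chooses the base-layer
    coefficient; equation 26 makes the enhancement-layer coefficient a
    differential refinement of the quantized/dequantized base coefficient."""
    c_base_sel = min_mag(c_base, c_enh)                    # eq. 25
    dequant_base = np.round(c_base_sel / step_b) * step_b  # Q_b then Q_b^-1
    c_enh_sel = c_enh - dequant_base                       # eq. 26
    return c_base_sel, c_enh_sel
```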
Figure 35 is an illustration of another example of a base- and enhancement-layer coefficient selector 379. In this example, the equation contained in block 405 represents the following:
C′_base = C_base if C_base · C_enh > 0, and C′_base = 0 otherwise   [27]
Adder 407 calculates the enhancement-layer coefficient as shown in the following equation:

C′_enh = C_enh − Q_b⁻¹(Q_b(C′_base))   [28]

where C′_base is given by equation 27.
Figure 36 is an illustration of another example of a base- and enhancement-layer coefficient selector 379. In this example, the base-layer coefficient is left unchanged, and the enhancement-layer coefficient equals the difference between the quantized/dequantized base-layer coefficient and the original enhancement-layer coefficient.
In addition to the base- and enhancement-layer residual-error coefficients, a decoder needs information identifying how the MBs were encoded. An encoding means, for example the encoder component 228 of Fig. 2, can encode overhead information that can include a map of intra-coded and inter-coded portions, for example an MB map in which macroblocks (or sub-macroblocks) are identified as being intra-coded or inter-coded (also identifying which type of inter-coding, including, for example, forward, backward or bidirectional), together with the frame(s) that the inter-coded portions reference. In an example aspect, the MB map and the base-layer coefficients are encoded in the base layer, and the enhancement-layer coefficients are encoded in the enhancement layer.
P frames and B frames can contain both intra-coded MBs and inter-coded MBs. Hybrid video encoders typically use rate-distortion (R-D) optimization to decide to encode certain macroblocks in P or B frames as intra-coded MBs. To enable single-layer decoding, in which intra-coded MBs do not depend on enhancement-layer inter-coded MBs, no neighboring inter-coded MBs are used for the spatial prediction of base-layer intra-coded MBs. To keep the computational complexity of enhancement-layer decoding unchanged, the refinement at the enhancement layer can be skipped for the intra-coded MBs in base-layer P or B frames.
Intra-coded MBs in P or B frames can require many more bits than inter-coded MBs. For that reason, intra-coded MBs in P or B frames can be encoded at base-layer quality only, with a higher QP. This introduces some degradation in video quality, but if this degradation is refined in later frames by the inter-coded MB coefficients in the base and enhancement layers (as discussed above), the degradation should be imperceptible. Two reasons make the degradation imperceptible: the first is a characteristic of the human visual system (HVS), and the second is that inter-coded MBs refine the intra-coded MBs. For objects that change position from a first frame to a second frame, some pixels in the first frame are invisible in the second frame (to-be-covered information), and some pixels in the second frame are visible for the first time (uncovered information). Human eyes are not sensitive to uncovered and to-be-covered visual information. Therefore, even if the uncovered information is encoded at a lower quality, the eyes may not be able to tell the difference. If the same information remains in the subsequent P frames, there is a high likelihood that the subsequent P frames can refine it at the enhancement layer, because the enhancement layer has a lower QP.
Another common technique that introduces intra-coded MBs in P or B frames is known as intra refresh. In this case, certain MBs are encoded as intra-coded MBs even though standard R-D optimization would dictate that they should be inter-coded MBs. These intra-coded MBs, contained in the base layer, can be encoded with either QP_b or QP_e. If QP_e is used for the base layer, then no refinement is needed at the enhancement layer. If QP_b is used for the base layer, refinement can be needed; otherwise, at the enhancement layer, the drop in quality would be noticeable. Because inter-coding is more efficient than intra-coding in the sense of coding efficiency, these refinements at the enhancement layer are inter-coded. In this way, the base-layer coefficients are not used for the enhancement layer, and the quality improves at the enhancement layer without introducing new operations.
Because B frames offer high compression quality, they are commonly used in the enhancement layer. However, a B frame may reference the intra-coded MBs of a P frame. If the pixels of the B frame were to be encoded at enhancement-layer quality, too many bits could be required because of the lower quality of the P frame's intra-coded MBs, as discussed above. By taking advantage of the qualities of the HVS (as discussed above), the B-frame MBs can be encoded at a lower quality when they reference the lower-quality intra-coded MBs of the P frame.
An extreme case of intra-coded MBs in P or B frames occurs when all the MBs in a P or B frame are encoded in intra mode because of a scene change present in the video being encoded. In this case, the whole frame can be encoded at base-layer quality, with no refinement at the enhancement layer. If a scene change occurs at a B frame, and assuming that B frames are encoded only in the enhancement layer, the B frame can be encoded at base-layer quality or simply dropped. If a scene change occurs at a P frame, no change may be needed, but the P frame can be dropped or encoded at base-layer quality. Scalable layer encoding is further described in co-pending U.S. patent application [Attorney Docket/Ref. No. 050078] entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING", which is owned by the assignee hereof and incorporated by reference herein in its entirety.
First-pass portion of the encoder
Figure 7 shows an illustrative example of the encoder 228 of Fig. 2. The illustrated blocks represent various encoder processes that can be included in the encoder 228. In this example, the encoder 228 comprises a first-pass portion 702 above demarcation line 704 and a second-pass portion 706 below line 704 (including the functionality of the second-pass encoder 232 and the re-encoder 234 of Fig. 2).
The encoder 228 receives metadata and raw video from the preprocessor 226. The metadata can include any metadata received or calculated by the preprocessor 226, including metadata related to the content information of the video. The first-pass portion 702 of the encoder 228 illustrates exemplary processes that can be included in first-pass encoding, which are described below in terms of their functionality. As one skilled in the art will appreciate, such functionality can be embodied in various forms (for example, hardware, software, firmware, or a combination thereof).
Fig. 7 illustrates an adaptive intra refresh (AIR) module 710. The AIR module 710 provides an input, based on the metadata, to an I-frame instantiation module 708 that instantiates I frames. The first-pass portion 702 can also include a content classification module 712, which is configured to receive the metadata and video and to determine content information related to the video. The content information can be provided to a rate-control bit allocation module 714, which also receives the metadata and video. The rate-control bit allocation module 714 determines rate-control bit information and provides it to a mode decision module 715. The content information and video can be provided to an intra-mode (distortion) module 716, which provides intra-coding distortion information to the mode decision module 715 and to a scalability rate-distortion module 718 for the base and enhancement layers. The video and metadata are provided to a motion estimation (distortion) module 720, which provides inter-coding distortion information to the scalability rate-distortion module 718 for the base and enhancement layers. The scalability rate-distortion module 718 uses the distortion estimates from the motion estimation module 720 and the intra-mode distortion module 716 to determine scalability rate-distortion information for the base and enhancement layers, and provides that information to the mode decision module 715. The mode decision module 715 also receives inputs from a slice/MB ordering module 722. The slice/MB ordering module 722 receives an input from an error resilience module 740 (shown in the second-pass portion 706) and provides the mode decision module 715 with information about aligning independently encodable portions of the video (slices) with access-unit boundaries for error resilience. The mode decision module 715 determines encoding mode information based on its inputs and provides the "best" encoding mode to the second-pass portion 706. Illustrative explanations of some examples of the encoding in this first-pass portion 702 are described further below.
As noted above, the content classification module 712 receives the metadata and raw video supplied by the preprocessor 226. In some examples, the preprocessor 226 calculates content information from the multimedia data and provides that content information to the content classification module 712 (for example, in the metadata), and the content classification module 712 can use the content information to determine a content classification for the multimedia data. In some other aspects, the content classification module 712 is configured to determine various content information from the multimedia data, and can also be configured to determine the content classification.
The content classification module 712 can be configured to determine different content classifications for video with different types of content. The different content classifications can result in different parameters for aspects of encoding the multimedia data, for example, determining bit rates (e.g., bit allocation) used in determining quantization parameters, motion estimation, scalability, error resilience, maintaining optimum quality of the multimedia data over the channel, and fast channel-switching schemes (for example, periodically forcing I frames to allow fast channel switching). According to one example, the encoder 228 is configured to determine rate-distortion (R-D) optimization and bit-rate allocation based on the content classification. Determining a content classification allows multimedia data to be compressed, based on its content, to a given quality level corresponding to a desired bit rate. Also, by classifying the content of the multimedia data based on the human visual system (for example, by determining a content classification), the resulting perceived quality of the transmitted multimedia data on the display of a receiving device is made dependent on the video content.
Fig. 9 shows a process 900, an example of a procedure by which the content classification module 712 can classify content, and describes an exemplary process by which the classification module 712 can operate. As shown, process 900 begins at input block 902, where the content classification module 712 receives the raw multimedia data and metadata. Process 900 then proceeds to block 904, where the content classification module 712 determines spatial information and temporal information of the multimedia data. In some aspects, the spatial and temporal information is determined by spatial and temporal masking (for example, filtering). The spatial and temporal information can be determined based on metadata including scene-change data and motion-vector (MV) smoothing. Process 900 then proceeds to block 912, which performs spatial complexity, temporal complexity and sensitivity estimation. Process 900 then proceeds to block 916, where the content of the multimedia data is classified based on the results of the spatial, temporal and sensitivity data determined in blocks 904 and 912. Also in block 916, a particular rate-distortion (R-D) curve can be selected and/or R-D curve data can be updated. Process 900 then proceeds to output block 918, where the output can include a complexity-distortion map or value indicating the spatial and temporal activity (for example, a content classification) and/or the selected R-D curve. Referring back to Fig. 7, the content classification module 712 provides its output to the rate-control bit allocation module 714, to the intra-mode (distortion) module 716, and also to the I-frame instantiation module 708 (discussed above).
Content information
The content classification module 712 can be configured to calculate various content information from the multimedia data, including various content-related metrics, such as the spatial complexity, temporal complexity, contrast ratio values, standard deviations and frame-difference metrics described further below.
The content classification module 712 can be configured to determine the spatial complexity and temporal complexity of the multimedia data, and also to associate a texture value with the spatial complexity and a motion value with the temporal complexity. The content classification module 712 receives preprocessed content information, related to the content of the multimedia data being encoded, from the preprocessor 226; alternatively, the preprocessor 226 can be configured to calculate the content information. As noted, the content information can include, for example, one or more D_csat values, contrast ratio values, motion vectors (MVs) and sums of absolute differences (SADs).
In general, multimedia data includes one or more sequences of images, or frames. Each frame can be decomposed into blocks of pixels for processing. Spatial complexity is a broad term that generally describes a measure of the level of spatial detail within a frame. Scenes with mainly plain, unchanging, or slightly changing areas of luminance and chrominance will have low spatial complexity. The spatial complexity is associated with the texture of the video data. In this aspect, spatial complexity is based on a human visual sensitivity metric called D_csat, which is calculated for each block as a function of local spatial frequency and ambient lighting. Persons of ordinary skill in the art are aware of techniques for using spatial frequency patterns and the lighting and contrast characteristics of visual images to exploit the human visual system. A number of sensitivity metrics are known for exploiting the perceptual limitations of the human visual system, and they could be used with the methods described herein.
Temporal complexity is a broad term that generally describes a measure of the level of motion in the multimedia data, as referenced between frames in a sequence of frames. Scenes (for example, sequences of frames of video data) with little or no motion have low temporal complexity. Temporal complexity can be calculated for each macroblock, and can be based on the D_csat value, the motion vectors and the sum of absolute pixel differences between one frame and another frame (for example, a reference frame).
The frame-difference metric gives a measure of the difference between two consecutive frames, taking into account the amount of motion (for example, motion vectors, or MVs) together with the residual energy, expressed as a sum of absolute differences (SAD), between a predictor macroblock and the current macroblock. Frame difference also provides a measure of bidirectional or unidirectional prediction efficiency.
One example of a frame-difference metric, based on motion information received from a preprocessor (which may perform motion-compensated deinterlacing), is as follows. The deinterlacer performs bidirectional motion estimation, so bidirectional motion-vector and SAD information is available. The frame difference for each macroblock, represented by SAD_MV, can be derived as follows:
SAD_MV = log₁₀[SAD · exp(−min(1, MV))]   [29]
where MV = √(MV_x² + MV_y²) and SAD = min(SAD_N, SAD_P), SAD_N being the SAD calculated from the backward reference frame and SAD_P the SAD calculated from the forward reference frame.
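Equation 29 can be computed per macroblock as in the following sketch; the guard against a zero SAD is an addition of ours, since the equation as written assumes SAD > 0.

```python
import math

def sad_mv(mv_x, mv_y, sad_n, sad_p):
    """Frame-difference contribution of one macroblock (equation 29).
    sad_n and sad_p are the SADs from the backward and forward references."""
    mv = math.sqrt(mv_x ** 2 + mv_y ** 2)
    sad = max(min(sad_n, sad_p), 1e-3)   # our guard: eq. 29 assumes SAD > 0
    return math.log10(sad * math.exp(-min(1.0, mv)))
```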
Another method of estimating frame difference was described above with reference to equations 6-8. The SAD ratio (or contrast ratio) γ can be calculated as described earlier in equation 6. A luminance histogram of each frame can also be determined, with the histogram difference λ calculated using equation 7. The frame-difference metric D can be calculated as shown in equation 8.
In one illustrative example, the contrast ratio and the frame-difference metric are utilized in the following manner to obtain a video content classification that can reliably predict the characteristics of a given video sequence. Although described herein as occurring in the encoder 228, the preprocessor 226 can also be configured to determine a content classification (or other content information) and pass the content classification to the encoder 228 via metadata. The process described in the example below classifies the content into eight possible classes, similar to the classification obtained from R-D curve-based analysis. The classification process outputs a value in the range between 0 and 1 for each superframe, depending on the complexity of the scene and the number of scene changes occurring in that superframe. The content classification module in the preprocessor can execute the following steps (1)-(5) for each superframe to obtain a content classification metric from the frame contrast and frame difference values.
1. Calculate the mean frame contrast and the frame-contrast deviation from the macroblock contrast values.
2. Normalize the frame-contrast and frame-difference values using values obtained from simulation (40 and 5, respectively).
3. Calculate the content classification metric using, for example, the generalized equation:
CCMetric = CCW1·I_Frame_Contrast_Mean + CCW2·Frame_Difference_Mean − CCW3·I_Contrast_Deviation²·exp(CCW4·Frame_Difference_Deviation²)   [30]
where CCW1, CCW2, CCW3 and CCW4 are weighting factors. In this example, the values are chosen as: CCW1 = 0.2, CCW2 = 0.9, CCW3 = 0.1 and CCW4 = −0.00009.
4. Determine the number of scene changes in the superframe. Generally, a superframe refers to a group of pictures or frames that can be displayed in a particular time period; typically, the period is one second. In some aspects, a superframe comprises 30 frames (for 30 fps video). In other aspects, a superframe comprises 24 frames (for 24 fps video). Depending on the number of scene changes, one of the following cases applies.
(a) No scene change: when there is no scene change in the superframe, the metric depends entirely on the frame-difference values, as shown in the following equation:
CCMetric = (CCW2 + (CCW1/2))·Frame_Difference_Mean − (CCW3 − (CCW1/2))·1·exp(−CCW4·Frame_Difference_Deviation²)   [31]
(b) Single scene change: when a single scene-change frame is observed in the superframe, the default equation is used to compute the metric, as follows:
CCMetric = CCW1·I_Frame_Contrast_Mean + CCW2·Frame_Difference_Mean − CCW3·I_Contrast_Deviation²·exp(CCW4·Frame_Difference_Deviation²)   [32]
(c) Two scene changes: when at most two scene changes are observed in the given superframe, the last scene-change frame is given more weight than the first (since the first would in any case be quickly refreshed by the latter), as shown in the following equation:
CCMetric = 0.1·I_Frame_Contrast_Mean1 + CCW1·I_Frame_Contrast_Mean2 + (CCW2 − 0.1)·Frame_Difference_Mean − CCW3·I_Contrast_Deviation1²·I_Contrast_Deviation2²·exp(CCW4·Frame_Difference_Deviation²)   [33]
(d) Three or more scene changes: if the given superframe is observed to have more than three I frames (say N), the last I frame is given more weight and all the other I frames are given a weight of 0.05, as shown in the following equation:
CCMetric = 0.05·I_Frame_Contrast_Mean(1…N−1) + CCW1·I_Frame_Contrast_Mean(N) + (CCW2 − (0.05·(N−1)))·Frame_Difference_Mean − CCW3·I_Contrast_Deviation(N)²·I_Contrast_Deviation(1…N−1)²·exp(CCW4·Frame_Difference_Deviation²)   [34]
5. When the frame-difference mean is less than 0.05, a correction can be applied to the metric for the case of low-motion scenes: an offset of 0.33 (CCOFFSET) is added to CCMetric.
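Steps (1)-(5) can be sketched as one function. The sketch assumes the inputs are already normalized as in step (2) and, for brevity, collapses the per-scene contrast deviations of equations 33 and 34 into a single deviation term, so it approximates those two cases rather than transcribing them literally.

```python
from math import exp

# Weighting factors and offset from the text; normalization (step 2) is
# assumed to have been applied to the inputs already.
CCW1, CCW2, CCW3, CCW4, CCOFFSET = 0.2, 0.9, 0.1, -0.00009, 0.33

def cc_metric(i_contrast_means, frame_diff_mean,
              i_contrast_dev, frame_diff_dev):
    """Content classification metric for one superframe (equations 30-34,
    approximated: one deviation term stands in for the per-scene products)."""
    n = len(i_contrast_means)          # number of scene changes (I frames)
    damp = exp(CCW4 * frame_diff_dev ** 2)
    if n == 0:                                                  # eq. 31
        m = ((CCW2 + CCW1 / 2) * frame_diff_mean
             - (CCW3 - CCW1 / 2) * exp(-CCW4 * frame_diff_dev ** 2))
    elif n == 1:                                                # eq. 32
        m = (CCW1 * i_contrast_means[0] + CCW2 * frame_diff_mean
             - CCW3 * i_contrast_dev ** 2 * damp)
    elif n == 2:                                                # eq. 33
        m = (0.1 * i_contrast_means[0] + CCW1 * i_contrast_means[1]
             + (CCW2 - 0.1) * frame_diff_mean
             - CCW3 * i_contrast_dev ** 2 * damp)
    else:                                                       # eq. 34
        m = (0.05 * sum(i_contrast_means[:-1])
             + CCW1 * i_contrast_means[-1]
             + (CCW2 - 0.05 * (n - 1)) * frame_diff_mean
             - CCW3 * i_contrast_dev ** 2 * damp)
    if frame_diff_mean < 0.05:         # step (5): low-motion correction
        m += CCOFFSET
    return m
```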
The content classification module 712 uses the D_csat values, motion vectors and/or sums of absolute differences to determine a value indicating the spatial complexity of a macroblock (or of a specified amount of video data). The temporal complexity is determined by a measure of the frame-difference metric (the difference between two consecutive frames, taking into account the amount of motion, via motion vectors, and the sum of absolute differences between the frames).
In some aspects, the content classification module 712 can be configured to generate a bandwidth map. For example, if the preprocessor 226 does not generate a bandwidth map, the bandwidth-map generation can be performed by the content classification module 712.
Determining texture and motion values
For each macroblock in the multimedia data, the content classification module 712 associates a texture value with the spatial complexity and a motion value with the temporal complexity. The texture value relates to the luminance values of the multimedia data, with a low texture value indicating small changes in the luminance values of neighboring pixels of the data, and a high texture value indicating large changes in the luminance values of neighboring pixels of the data. Once the texture and motion values are calculated, the content classification module 712 determines a content classification by considering both the motion and the texture information. The content classification module 712 associates the texture of the video data being classified with a relative texture value, for example "low" texture, "medium" texture or "high" texture, generally indicating the complexity of the luminance values of the macroblocks. Also, the content classification module 712 associates the motion value calculated for the video data being classified with a relative motion value, for example "low" motion, "medium" motion or "high" motion, generally indicating the amount of motion of the macroblocks. In alternative aspects, fewer or more categories for motion and texture can be used. A content classification metric is then determined by considering the associated texture and motion values.
Fig. 8 illustrates an example of a classification chart showing how texture and motion values are associated with content classifications. A person of ordinary skill in the art is familiar with many ways of implementing such a classification chart, for example in a lookup table or a database. The classification chart is generated based on predetermined evaluations of video data content. To determine the video data classification, a texture value of "low", "medium" or "high" (on the "x-axis") is cross-referenced with a motion value of "low", "medium" or "high" (on the "y-axis"). The content classification indicated in the intersecting block is assigned to the video data. For example, a texture value of "high" and a motion value of "medium" result in a classification of seven (7). Fig. 8 illustrates the various combinations of relative texture and motion values associated with the eight different content classifications in this example. In some other aspects, more or fewer classifications can be used. Further description of an illustrative aspect of content classification is disclosed in co-pending U.S. patent application No. 11/373,577 entitled "Content Classification for Multimedia Processing", filed March 10, 2006, which is assigned to the assignee hereof and hereby expressly incorporated by reference herein.
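One possible encoding of such a chart as a lookup table is sketched below. Only the "high texture, medium motion yields class 7" entry comes from the text; the remaining cell assignments are invented for illustration.

```python
TEXTURE = {"low": 0, "medium": 1, "high": 2}
MOTION = {"low": 0, "medium": 1, "high": 2}

# class_chart[motion][texture] -> content classification 1..8
# (illustrative assignments; only the asserted cell is from the text)
class_chart = [
    [1, 2, 3],   # low motion
    [4, 5, 7],   # medium motion
    [6, 7, 8],   # high motion
]

def content_class(texture, motion):
    return class_chart[MOTION[motion]][TEXTURE[texture]]

assert content_class("high", "medium") == 7   # example given in the text
```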
Rate-control bit allocation
As described herein, the multimedia-data content classification can be used in encoding algorithms to effectively improve bit management while maintaining a constant perceived quality of the video. For example, the classification metric can be used in algorithms for scene-change detection, encoding bit-rate allocation control, and frame-rate up-conversion (FRUC). Compressor/decompressor (codec) systems and digital signal processing algorithms are generally used in video data communication and can be configured to conserve bandwidth, but there is a trade-off between quality and bandwidth conservation. The best codecs provide the most bandwidth conservation while producing the least degradation of video quality.
In one illustrative example, the rate-control bit allocation module 714 uses the content classification to determine a bit rate (for example, the number of bits allocated for encoding the multimedia data), and stores the bit rate in memory for use by other processes and components of the encoder 228. A bit rate determined from the classification of the video data can help conserve bandwidth while providing the multimedia data at a consistent quality level. In one aspect, a different bit rate can be associated with each of the eight different content classifications, and those bit rates are then used to encode the multimedia data. The resulting effect is that, although the different content classifications of the multimedia data are allocated different numbers of bits for encoding, the perceived quality when viewed on a display is similar or consistent.
Generally, multimedia data with a higher content classification is indicative of a higher level of motion and/or texture, and it is allocated more bits when encoded. Multimedia data with a lower classification (indicating less texture and motion) is allocated fewer bits. For multimedia data of a particular content classification, the bit rate can be determined based on a selected target level of perceived quality for viewing the multimedia data. Determining multimedia-data quality can be done by humans viewing and grading the multimedia data. In some alternative aspects, estimates of multimedia-data quality can be made by automatic test systems using, for example, signal-to-noise-ratio algorithms. In one aspect, a set of standard quality levels (for example, five) and the corresponding bit rate needed to achieve each particular quality level are predetermined for multimedia data of each content classification. To determine a set of quality levels, multimedia data of a particular content classification can be evaluated by generating a Mean Opinion Score (MOS), which provides a numerical indication of the visually perceived quality of the multimedia data when it is encoded at a certain bit rate. The MOS can be expressed as a single number in the range 1 to 5, where 1 is the lowest perceived quality and 5 is the highest perceived quality. In other aspects, the MOS can have more than five or fewer than five quality levels, and different descriptions of each quality level can be used.
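A sketch of per-class bit allocation is shown below. The scaling factors are hypothetical (the patent predetermines bit rates per class and quality level rather than simple multipliers); the 256 kbps figure echoes the long-term average target mentioned below.

```python
# Hypothetical per-class scaling of the bit budget: higher classes (more
# motion and texture) get more bits for the same perceived quality.
CLASS_BITRATE_FACTOR = {1: 0.50, 2: 0.60, 3: 0.70, 4: 0.80,
                        5: 0.90, 6: 1.00, 7: 1.15, 8: 1.30}

def allocate_bitrate(target_avg_bps, content_class):
    return target_avg_bps * CLASS_BITRATE_FACTOR[content_class]

print(allocate_bitrate(256_000, 7))   # against a 256 kbps average target
```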
The relationship between the visually perceived quality level and the bit rate for multimedia data of a certain content classification can be determined by selecting a target (for example, desired) quality level. The target quality level used for determining the bit rate can be preselected, selected by a user, selected through an automatic process or a semi-automatic process requiring an input from a user or from another process, or selected dynamically by the encoding device or system based on predetermined criteria. A target quality level can be selected based on, for example, the type of encoding application or the type of client device that will be receiving the multimedia data.
In the example illustrated in Fig. 7, the rate-control bit allocation module 714 receives data from the content classification module 712 and metadata directly from the preprocessor 226. The rate-control bit allocation module 714 resides in the first-pass portion of the encoder 228, while a rate-control fine-tuning module 738 resides in the second-pass portion 706. This two-pass rate-control aspect is configured so that the first pass (the rate-control bit allocation module 714) performs context-adaptive bit allocation with a look-ahead of one superframe (for example, targeting a long-term average bit rate of 256 kbps) and limits the peak rate, and the second pass (the rate-control fine-tuning module 738) refines the first-pass results for two-layer scalability and performs rate adaptation. Rate control operates at four levels: (1) GOP level: controls the bit distribution among the I, P, B and F frames within a GOP, which can be heterogeneous; (2) superframe level: controls hard limits on the maximum superframe size; (3) frame level: controls the bit requirements according to the spatial and temporal complexity of the multimedia-data frames, based on content information (for example, a content classification); and (4) macroblock level: controls the bit allocation of the macroblocks based on spatial and temporal complexity maps, based on content information (for example, a content classification).
An exemplary flowchart of the operation of the rate-control module 714 is illustrated in Figure 10. As shown in Figure 10, process 1000 begins at input block 1002. The rate-control module 714 receives various inputs, not all of which are necessarily illustrated in Fig. 7. For example, the input information can include metadata from the preprocessor 226, a target bit rate, an encoder buffer size (or, equivalently, a maximum delay time for rate control), an initial rate-control delay, and frame-rate information. Further input information can include inputs at the group-of-pictures (GOP) level, including, for example, the maximum superframe size, the length of the GOP and the arrangement of the P/B frame distribution (including scene-change information), the required base and enhancement layers, and a complexity-distortion metric for the pictures in the next 30 frames of the GOP. Other input information includes inputs at the picture level, including the complexity-distortion map for the current picture (received from the content classification module 712), the quantization parameters (QPs) and the bit breakdown of the past 30 frames (fitted over a sliding window). Finally, the input information at the macroblock (MB) level includes, for example, the mean absolute difference (MAD) of the collocated macroblock (MB) in the reference picture, and the coded block pattern (CBP) of the macroblock after quantization (whether or not it is skipped).
After the inputs at block 1002, process 1000 proceeds to block 1004 to perform initialization for encoding the bitstream. Meanwhile, buffer initialization 1006 is carried out. Next, a GOP is initialized as shown in block 1008, with a GOP bit allocation 1010 received as part of the initialization. After the GOP initialization, flow proceeds to block 1012, where a slice is initialized. This initialization includes an update of the header bits, as shown in block 1014. After the initializations of blocks 1004, 1008 and 1012, rate control (RC) is performed for a basic unit, or macroblock (MB), as shown in block 1016. As part of the rate-control determination for a macroblock in block 1016, inputs are received via interfaces in the encoder 228. These inputs can include the macroblock (MB) bit allocation 1018, an update 1020 of the quadratic model parameters, and an update 1022 of the median-of-MADs parameter (the median absolute difference, a robust estimate of dispersion). Process 1000 then proceeds to block 1024, where operations are performed after the encoding of one picture (1024). This procedure includes receiving updated buffer parameters, as shown in block 1026. Process 1000 then proceeds to output block 1028, where the rate-control module 714 outputs the quantization parameter QP for each macroblock MB, for use by the mode decision module 715 shown in Fig. 7.
Motion estimation
The motion estimation module 720 receives inputs of metadata and raw video from the preprocessor 226, and can provide an output including block sizes, motion vectors, distortion metrics and reference-frame identifiers to the mode decision module 715. Figure 11 illustrates an example of the operation of the motion estimation module 720. As shown, process 1100 begins with input 1102. At the frame level, the module 720 receives inputs of reference-frame IDs and motion vectors. At the macroblock level, input 1102 comprises the input pixels and the reference-frame pixels. Process 1100 proceeds to step 1104, where color motion estimation (ME) and motion-vector prediction are performed. To perform this processing, various inputs are received, including MPEG-2 motion vectors and luma motion vectors (MVs) 1106, motion-vector smoothing 1108, and non-causal motion vectors 1110. Next, process 1100 proceeds to block 1112, where a motion-vector search algorithm or method, for example a hexagon or diamond search method, is performed. Inputs to the processing at block 1112 can include a sum of absolute differences (SAD), a sum of squared differences (SSD) and/or other metrics, as shown in block 1114. Once the motion-vector search is performed, process 1100 proceeds to termination block 1116, where termination processing is executed. Process 1100 then ends at output block 1118, which produces an output of the block sizes, motion vectors (MVs), distortion metrics and reference-frame identifiers.
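A small-diamond-pattern search of the kind named in block 1112 can be sketched as follows; the SAD cost, the termination rule and the iteration cap are ordinary choices, not specifics from the patent.

```python
import numpy as np

def diamond_search(cur_block, ref, bx, by, max_iter=16):
    """Small-diamond-pattern motion search minimizing SAD; one of the fast
    patterns named in block 1112 (hexagon search is analogous)."""
    n = cur_block.shape[0]

    def cost(dx, dy):
        x, y = bx + dx, by + dy
        if not (0 <= x <= ref.shape[1] - n and 0 <= y <= ref.shape[0] - n):
            return float("inf")
        return int(np.abs(cur_block.astype(np.int64)
                          - ref[y:y + n, x:x + n].astype(np.int64)).sum())

    mv, best = (0, 0), cost(0, 0)
    for _ in range(max_iter):
        improved = False
        for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # diamond points
            cand = (mv[0] + sx, mv[1] + sy)
            c = cost(*cand)
            if c < best:
                mv, best, improved = cand, c, True
        if not improved:           # center is the local minimum: stop
            break
    return mv, best
```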
Scalability R-D for the base and enhancement layers
Figure 13 illustrates an exemplary flowchart of a scalability process 1300, which can be performed by the scalability R-D module 718. Process 1300 starts at start block 1302 and proceeds to block 1304, where the scalability R-D module 718 receives an input from the motion estimation module 720 and performs motion estimation. The motion estimation relies on inputs of base-layer reference frames, enhancement-layer reference frames and the original frame to be encoded, as indicated by block 1306. Such information can be calculated by the GOP partitioner 612 and communicated to the scalability R-D module 718 via, for example, metadata. Process 1300 proceeds to block 1308 to determine the scalability of the base-layer and enhancement-layer data. Base-layer encoding is then performed as shown in block 1310, followed by enhancement-layer encoding in block 1312. As shown in block 1314, the encoding of the enhancement layer can use the base-layer encoding results as an input for inter-layer prediction, so it is performed after the base-layer encoding in time. This is further described in co-pending U.S. patent application [Attorney Docket/Ref. No. 050078] entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING". After the encoding is complete, process 1300 ends at block 1316.
Slice/macroblock ordering
The first-pass portion 702 also comprises a slice/macroblock ordering module 722, which receives an input from the error resilience module 740 in the second-pass portion and provides slice-alignment information to the mode decision module 715. A slice is a chunk of independently decodable (entropy-decoded) coded video data. An access unit (AU) is a coded video frame, each comprising a set of NAL units that always contain exactly one primary coded picture. In addition to the primary coded picture, an access unit can also contain one or more redundant coded pictures, or other NAL units not containing slices or slice data partitions of a coded picture. The decoding of an access unit always results in one decoded picture.
Frames can provide time-division-multiplexed blocks of physical-layer packets (called TDM capsules) that afford the highest time diversity. A superframe corresponds to one unit of time (for example, 1 second) and contains four frames. Aligning slice and AU boundaries with frame boundaries in the time domain results in the most efficient separation and localization of corrupted data. During a deep fade, most of the contiguous data in one TDM capsule is affected by errors; because of time diversity, the remaining TDM capsules have a high probability of being intact. The uncorrupted data can be utilized to recover and conceal the data lost from the affected TDM capsule. Similar logic applies to frequency-division multiplexing (FDM), where frequency diversity is attained through separation in the frequency subcarriers that the data symbols modulate. Similar logic also applies to spatial diversity (through separation in transmitter and receiver antennas) and other forms of diversity often applied in wireless networks.
In order to align slices and AUs with frames, the outer-code (FEC) code-block creation and the MAC-layer encapsulation should be aligned as well. Figure 20 illustrates the organization of coded video data, or a video bitstream, in slices and AUs. The coded video can be constituted in one or more bitstreams (for example, a base-layer bitstream and an enhancement-layer bitstream where layered video coding is applied).
The video bitstream comprises AUs, as illustrated in Figure 20 by Frame 1′ 2005, Frame 3′ 2010 and Frame M′ 2015. The AUs comprise slices of data, as illustrated by slice 1 2020, slice 2 2025 and slice N 2030. Each beginning of a slice is identified by a start code and provides network adaptation. In general, I frames, or intra-coded AUs, are large, followed in size by P frames, or forward-predicted frames, followed by B frames. Encoding an AU into multiple slices incurs a significant overhead cost in terms of the coded bit rate, because spatial prediction across slices is restricted and the slice headers also add to the overhead. Because slice boundaries are resynchronization points, restricting contiguous physical-layer packets (PLPs) to slices helps control errors: when a PLP is corrupted, the error is confined to the slice in that PLP, whereas if the PLP contained multiple slices or portions of multiple slices, the error would affect all the slices or slice portions in that PLP.
Because I frames are typically large (for example, on the order of tens of kilobits), the overhead due to multiple slices is not a large proportion of the total I-frame size or of the total bit rate. Also, having more slices in an intra-coded AU enables better and more frequent resynchronization and more efficient spatial error concealment. Furthermore, because P and B frames are predicted from I frames, the I frames carry the most important information in the video bitstream. I frames also serve as random access points for channel acquisition.
Referring now to Figure 21, carefully aligning I frames with frame boundaries, and likewise aligning the slices belonging to an I AU with frame boundaries, enables the most efficient error control and error protection: if a slice belonging to Frame 1 2105 is lost, the slices belonging to Frame 2 2110 have a high probability of being unaffected, because there is significant time separation between Frame 2 2110 and Frame 1 2105, and error recovery can proceed through resynchronization and error concealment.
Because P frames are typically on the order of a few kilobits, aligning an integer number of P frames, and the slices of a P frame, with frame boundaries achieves error resilience without a harmful loss of efficiency (for reasons similar to those for I frames). Temporal error concealment can be employed in such aspects. Alternatively, dispersing consecutive P frames so that they arrive in different frames provides added time diversity among the P frames; this is possible because temporal concealment is based on motion vectors and data from previously reconstructed I or P frames. B frames can be extremely small (hundreds of bits) to moderately large (a few kilobits). Hence, aligning an integer number of B frames with frame boundaries is desirable to achieve error resilience without a harmful loss of efficiency.
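A toy illustration of the alignment idea, flushing the physical-layer capsule at every frame boundary so that no slice or AU straddles one, is sketched below; the capsule size and byte-string representation are assumptions.

```python
def pack_slices(slices, capsule_size):
    """Toy packing: start each frame's slices in a fresh TDM capsule so
    slice/AU boundaries align with physical-layer packet boundaries.
    `slices` maps frame index -> list of encoded-slice byte strings."""
    capsules, current = [], b""
    for frame_idx in sorted(slices):
        if current:                  # frame boundary: flush, never straddle
            capsules.append(current)
            current = b""
        for s in slices[frame_idx]:
            if current and len(current) + len(s) > capsule_size:
                capsules.append(current)   # slice would straddle: new capsule
                current = b""
            current += s
        # a slice larger than a capsule would need fragmentation (omitted)
    if current:
        capsules.append(current)
    return capsules
```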
The mode decision module
Figure 12 illustrates some examples of the operation of the mode decision module 715. As shown, process 1200 begins at input block 1202. In one illustrative example, the various inputs to the mode decision module 715 include: slice type, intra 4x4 cost, intra 16x16 cost, intra UV 8x8 cost, intra Y 16x16 mode, intra UV mode, motion-vector data (MVD), quantization parameters (QPs), SpPredMB4x4Y, SpPredMB16x16Y, SpPredMB8x8U, SpPredMB8x8V, a rate-distortion flag, raw YMB pixels, raw UMB pixels and raw VMB pixels. Process 1200 then proceeds to block 1204 for encoding initialization, which can be triggered by an input signal or interface directing encoder initialization, as indicated by block 1206. The initialization can include setting the allowed modes (including skip and direct), setting the mode weights (optionally defaulting to equal weights for all modes), and setting up the buffers. After the initialization, process 1200 proceeds to block 1208, where the main mode-decision processing is performed, including: computing the macroblock (MB) mode cost for each allowed mode, weighting each MB mode cost with a weighting factor, and selecting the mode with the minimum MB mode cost. Inputs related to these operations include motion estimation (for example, MVD and predictions) and spatial prediction (for example, all the intra costs and predictions), as illustrated by blocks 1210 and 1212. Interfaced with the mode decision module 715, in block 1214, is entropy coding that (among other things) improves compression speed. Process 1200 then proceeds to block 1216, where the buffers are updated to communicate information to the second-pass portion 706 of the encoder. Finally, process 1200 proceeds to block 1218, where the "best" encoding mode can be communicated to the second-pass portion 706 of the encoder.
Second-pass portion of the encoder
Referring again to Fig. 7, the second-pass portion 706 of the encoder 228 includes a second-pass encoder module 232 for performing the second pass of encoding. The second-pass encoder 232 receives the output of the mode decision module 715. The second-pass encoder 232 includes an MC/transform-quantization module 726 and a zigzag (ZZ)/entropy coder 728. The results of the second-pass encoder 232 are output to a scalability module 730 and to a bitstream packing module 731, which outputs the encoded base and enhancement layers for transmission by the transcoder 200 via the sync layer 240 (illustrated in Fig. 2). As shown in Fig. 2, it is noted that the base and enhancement layers from the second-pass encoder 232 and the re-encoder 234 are assembled by the sync layer 240 into a packetized PES 242 comprising the base and enhancement layers, a data PES 244 (for example, CC and other text data) and an audio PES 246. It is noted that the audio encoder 236 receives the decoded audio information 218, re-encodes it, and outputs the encoded information 238 to the sync layer 240.
Re-encoder
Referring again to Fig. 7, the second-pass portion 706 of the encoder also includes a re-encoder 234 (corresponding to the re-encoder 234 in Fig. 2). The re-encoder 234 also receives the output of the first-pass portion 702 and includes MC/transform-quantization 726 and ZZ/entropy-coding 728 portions. In addition, the scalability module 730 outputs to the re-encoder 234. The re-encoder 234 outputs the base and enhancement layers resulting from the re-encoding to the bitstream packing module 731 for transmission to a synchronizer (for example, the sync layer 240 shown in Fig. 2). The encoder 228 example of Fig. 7 also includes a rate-control fine-tuning module 738, which provides bitstream-packing feedback both to the MC/transform-quantization module in the second-pass encoder 232 and to the ZZ/entropy module 736 in the re-encoder 234, thereby helping to tune the second-pass encoding (for example, to increase compression efficiency).
The error resilience module
The encoder 228 example illustrated in Fig. 7 also includes an error resilience module 740 in the second-pass portion 706. The error resilience module 740 communicates with the bitstream packing module 731 and the slice/MB ordering module 722. The error resilience module 740 receives metadata from the preprocessor 226 and selects an error resilience scheme, for example, aligning slices and access units with frame boundaries, prediction hierarchy, and adaptive intra refresh. The selection of the error resilience scheme can be based on information received in the metadata, or on information communicated to the error resilience module from the bitstream packing module 731 and the slice/MB ordering module 722. The error resilience module 740 provides information to the slice/macroblock (MB) ordering module in the first-pass portion 702 to implement the selected error resilience processing. Video transmission over error-prone environments can employ error resilience strategies and algorithms that result in presentations of the data that are clearer and contain fewer errors for the viewing user. The error resilience description below can apply to any individual one, or any combination, of existing or future applications, transport and physical layers, or other technologies. Effective error-robustness algorithms combine an understanding of the error-susceptibility properties and error-protection capabilities across the OSI layers with the desirable properties of the communication system, such as low latency and high throughput. The error resilience processing can be based on the content information of the multimedia data, for example, on the content classification of the multimedia data. One of the main advantages is recoverability from fading and multipath channel errors. The error resilience approaches described below pertain particularly to processing that can be incorporated in the encoder 228 (for example, incorporated in particular in the error resilience module 740 and the slice/MB ordering module 722), and they can generally be extended to data communications in error-prone environments.
Error resilience
In prediction-based hybrid compression systems, intra-coded frames are coded independently, without any temporal prediction. Inter-coded frames can be predicted temporally from past frames (P frames) and future frames (B frames). The best predictor can be identified through a search in the reference frame (or in more than one reference frame), and a distortion measure such as the sum of absolute differences (SAD) can be used to identify the best match. The predictively coded region of the current frame can be a block of varying size and shape (16x16, 32x32, 8x4, etc.), or a group of pixels identified as an object through, for example, segmentation.
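To make the search concrete, the following is a minimal sketch of an exhaustive SAD block-matching search of the kind described above. It is illustrative only: the block size, search radius, and array layout are assumptions, and a practical encoder would normally use fast search patterns rather than a full scan.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences -- the distortion measure named above."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def best_match(cur, ref, y, x, size=16, radius=8):
    """Exhaustively scan a (2*radius+1)^2 window of the reference frame for
    the block that best predicts cur[y:y+size, x:x+size]; returns the motion
    vector and its SAD."""
    target = cur[y:y + size, x:x + size]
    best = (0, 0, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and 0 <= rx and ry + size <= ref.shape[0] and rx + size <= ref.shape[1]:
                d = sad(target, ref[ry:ry + size, rx:rx + size])
                if d < best[2]:
                    best = (dy, dx, d)
    return best  # (mv_y, mv_x, SAD)
```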
Temporal prediction typically extends over many frames (e.g., 10 to a few tens of frames) and terminates when a frame is coded as an I frame; the GOP is usually defined by the I-frame rate. For maximum coding efficiency, a GOP is a scene; that is, GOP boundaries are aligned with scene boundaries, and scene-change frames are coded as I frames. Low-motion sequences usually comprise a relatively static background, with motion confined to foreground objects. Examples of the content of such low-motion sequences include news and weather-forecast programs, where more than 30% of the most-viewed content has this character. In low-motion sequences, most regions are inter-coded, and predicted frames refer back to the I frame through intermediate predicted frames.
Referring to Figure 22, an intra-coded block 2205 in the I frame is the predictor for an inter-coded block 2215 of a coded frame (or AU) P1. In this example, the region of these blocks is a static part of the background. Through successive temporal prediction, the sensitivity of the intra-coded block 2205 to errors rises, because it is a "good" predictor, which also means that its "importance" is higher. Moreover, by virtue of this chain of temporal predictions (called its prediction chain), the intra-coded block 2205 persists longer in the display (in the example of this figure, for the duration of the scene).
A prediction hierarchy is defined as a tree of blocks built on this importance level, or persistence measure, with the parent at the top (the intra-coded block 2205) and the children at the bottom. Note that the inter-coded block 2215 in P1 is on the second level of the hierarchy, and so on. The leaves are the blocks that terminate their prediction chains.
A prediction hierarchy can be established for video sequences regardless of content type (for example, music and sports, not just news), and it applies generally to prediction-based video (and data) compression (this applies to all aspects described in this application). Once the prediction hierarchy is established, error recovery algorithms, such as the adaptive intra refresh described below, can be applied more effectively. The importance measure can be based, for example, on the recoverability of a given block from errors, such as through concealment operations and the application of adaptive intra refresh, to enhance the recovery of the encoded bitstream from errors. An estimate of the importance measure can be based on the number of times a block is used as a predictor (also referred to as a persistence metric). The persistence metric is also used to improve coding efficiency by curbing the propagation of prediction errors, and it increases the bit allocation for blocks of higher importance.
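The persistence metric can be illustrated with a short sketch. This is a simplified model rather than the specification's algorithm: it assumes each inter-coded block records the index of the block it predicts from, whereas a real encoder would derive this linkage from motion vectors.

```python
def persistence(frames):
    """frames[t] maps a block index to the index of its predictor block in
    frame t-1, or to None if the block is intra coded (chain terminates).
    Returns, for each block of frame 0, how many later blocks trace their
    prediction chain back to it -- an estimate of the importance measure."""
    root = {b: b for b in frames[0]}   # frame-0 ancestor of each live block
    counts = {b: 0 for b in frames[0]}
    for frame in frames[1:]:
        nxt = {}
        for block, pred in frame.items():
            if pred is not None and pred in root:
                r = root[pred]
                nxt[block] = r
                counts[r] += 1         # one more descendant on this chain
        root = nxt
    return counts

# Block 0 of frame 0 is predicted from for two more frames; block 1's chain
# dies immediately, so block 0 is the "important" parent.
print(persistence([{0: None, 1: None}, {0: 0, 1: None}, {0: 0}]))  # {0: 2, 1: 0}
```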
Adaptive intra refresh
Adaptive intra refresh (AIR) is an error resilience technique that can be based on content information of the multimedia data. In intra-refresh processing, some MBs are intra-coded even though standard R-D optimization would dictate that they should be inter-coded MBs. AIR uses motion-weighted intra refresh to introduce intra-coded MBs in P or B frames. These intra-coded MBs, contained in the base layer, can be encoded with either QP_b or QP_e. If QP_e is used for the base layer, then no refinement need be performed at the enhancement layer. If QP_b is used for the base layer, refinement may be appropriate, since otherwise the quality drop at the enhancement layer would be rather noticeable. Because inter coding is more efficient than intra coding in the sense of coding efficiency, these refinements at the enhancement layer are inter-coded. In this way, base-layer coefficients are not used for the enhancement layer, and the quality at the enhancement layer improves without introducing new operations.
In some aspects, adaptive intra refresh can be based on content information of the multimedia data (for example, its content classification) instead of the motion-weighted basis, or on both the content information and the motion-weighted basis. For instance, if the content classification is relatively high (e.g., a scene of high spatial and temporal complexity), adaptive intra refresh can introduce relatively more intra-coded MBs into the P or B frames. Alternatively, if the content information is relatively low (indicating a less dynamic scene of lower spatial and/or temporal complexity), adaptive intra refresh can introduce fewer intra-coded MBs into the P and B frames. Such metrics and methods for improving error resilience apply not only to wireless multimedia communication but also to data compression and multimedia processing in general (for example, in graphics rendering).
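A sketch of how the intra-refresh budget might scale with the content classification follows. The class range (1 to 8) and the fraction bounds are illustrative assumptions, not values taken from the specification.

```python
def air_intra_mb_count(total_mbs, content_class, min_frac=0.02, max_frac=0.15):
    """Map a content classification (assumed 1..8, low to high spatio-temporal
    complexity) to the number of macroblocks to force-intra-code in a P or B
    frame: dynamic scenes get more refresh, calm scenes less."""
    frac = min_frac + (max_frac - min_frac) * (content_class - 1) / 7.0
    return max(1, round(total_mbs * frac))

# A QCIF frame has 99 macroblocks.
print(air_intra_mb_count(99, 1))  # low-motion news scene    -> 2 MBs
print(air_intra_mb_count(99, 8))  # high-motion sports scene -> 15 MBs
```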
Channel switch frames
A channel switch frame (CSF), as defined herein, is a broad term describing a random access frame inserted at an appropriate position in a broadcast stream for fast channel acquisition, and hence for fast channel changes between the multiplexed streams of a broadcast. The channel switch frame also increases error robustness, because it provides redundant data that can be used if the primary frame is transmitted with errors. An I frame, or a progressive I frame (such as a progressive decoder refresh frame in H.264), typically serves as the random access point. However, frequent I frames (a short GOP, shorter than the scene duration) significantly reduce compression efficiency. Since intra-coded blocks can be required for error resilience anyway, random access and error resilience can be combined effectively through the prediction hierarchy, improving coding efficiency while increasing error robustness.
The improvements in random access switching and in error robustness can be achieved jointly, and can be based on content information such as content classification. For a low-motion sequence, the prediction chains are long, and most of the information required to reconstruct a superframe or scene is contained in the I frame at the beginning of the scene. Channel errors tend to be bursty, and when a fade strikes and FEC and channel coding fail, there is heavy residual error for which concealment fails. This is particularly severe for low-motion (and hence low-bit-rate) sequences, because the amount of coded data is not significant enough to provide good time diversity within the video bitstream, and because these are highly compressible sequences in which every bit counts for reconstruction. High-motion sequences are by nature more error-robust owing to the character of the content: the greater amount of new information in each frame increases the number of intra-coded blocks, which can be decoded independently and recover readily from errors. Adaptive intra refresh based on the prediction hierarchy achieves high performance for high-motion sequences, while the performance improvement is not significant for low-motion sequences. Hence, a channel switch frame consisting largely of an I frame is a good source of diversity for low-motion sequences: when an error strikes a superframe, decoding in the successive frame starts from the CSF, which recovers the lost information by enabling prediction and error recovery.
In the case of high-motion sequences (for example, those with relatively high content classifications, e.g., 6-8), a CSF can be composed of the blocks that persist in the superframe (the good predictors). All other regions of the CSF need not be coded, since those regions are blocks with short prediction chains, meaning that they terminate with intra blocks. Hence, when an error occurs, the CSF still serves to recover the lost information through prediction. A CSF for a low-motion sequence is on par with the size of an I frame, although it can be coded at a lower bit rate through heavier quantization, whereas a CSF for a high-motion sequence is much smaller than the corresponding I frame.
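The two CSF construction strategies above can be combined into one illustrative decision. The sketch below assumes the 1-8 content-classification scale, a per-block persistence value from the prediction hierarchy, and an arbitrary QP offset for the coarser quantization; the Block type and all constants are illustrative assumptions, not values from the specification.

```python
from dataclasses import dataclass

@dataclass
class Block:
    index: int
    persistence: int  # chain length from the prediction hierarchy

def build_csf(blocks, superframe_len, content_class, base_qp):
    """Coding plan for a channel switch frame.
    High-motion content (assumed classes 6-8): intra-code only the blocks
    whose prediction chains persist through most of the superframe -> small CSF.
    Low-motion content: intra-code everything, but quantize coarsely so the
    CSF costs much less than the scene's leading I frame."""
    if content_class >= 6:
        keep = [b.index for b in blocks if b.persistence > superframe_len // 2]
        return {"intra_blocks": keep, "qp": base_qp}
    return {"intra_blocks": [b.index for b in blocks], "qp": base_qp + 6}
```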
Error resilience based on the prediction hierarchy can also work well with scalability, enabling highly efficient layered coding. Scalability that supports hierarchical modulation in physical-layer technologies can require data partitioning of the video bitstream at specific bandwidth ratios. These may not always be the ideal ratios for optimal scalability (e.g., with least overhead). In some aspects, 2-layer scalability with a 1:1 bandwidth ratio is used. Partitioning a video bitstream into two layers of equal size may not be as efficient for low-motion sequences: for a low-motion sequence, the base layer, which contains all the headers and metadata information, is larger than the enhancement layer. However, because the CSF for a low-motion sequence is larger, it fits well within the remaining bandwidth in the enhancement layer.
High-motion sequences have enough residual information that data partitioning at 1:1 can be achieved with minimal overhead. Moreover, the channel switch frame for such sequences is much smaller than the corresponding I frame. Hence, error resilience based on the prediction hierarchy also cooperates well with scalability for high-motion sequences. Extending these concepts to medium-motion clips is possible based on the descriptions above, and the proposed notions apply to video coding in general.
Multiplexer
In some encoder aspects, a multiplexer can be used to encode multiple multimedia streams produced by the encoder and to prepare the encoded bits for broadcast. For example, in the illustrative aspect of the encoder 228 shown in Fig. 2, the sync layer 240 comprises a multiplexer. The multiplexer can be implemented to provide bit-rate allocation control. The estimated complexity can be provided to the multiplexer, which can then allocate the available bandwidth in a pool of multiplexed video channels according to the encoding complexity anticipated for each channel. This allows the quality of a particular channel to remain relatively constant even though the bandwidth for the pool of multiplexed video streams is relatively constant; that is, a channel within the pool has variable bit rate and relatively constant visual quality, rather than relatively constant bit rate and variable visual quality.
Figure 18 is a block diagram illustrating a system encoding multiple multimedia streams or channels 1802. The multimedia streams 1802 are encoded by respective encoders 1804, which communicate with a multiplexer (MUX) 1806, which in turn communicates with a transmission medium 1808. For example, the multimedia streams 1802 can correspond to various content channels, such as news channels, sports channels, movie channels, and the like. The encoders 1804 encode the multimedia streams 1802 into the encoding format specified for the system. While described in the context of encoding video streams, the principles and advantages of the disclosed techniques apply generally to multimedia streams including, for example, audio streams. The encoded multimedia streams are provided to the multiplexer 1806, which combines the various encoded multimedia streams and sends the combined stream to the transmission medium 1808 for transmission.
The transmission medium 1808 can correspond to a variety of media, such as, but not limited to, digital satellite communication (e.g., DirecTV®), digital cable, wired and wireless Internet communication, optical networks, cellular telephone networks, and the like. The transmission medium 1808 can include, for example, modulation to radio frequency (RF). Typically, for reasons such as spectral constraints, the transmission medium has limited bandwidth, and the data from the multiplexer 1806 to the transmission medium are maintained at a relatively constant bit rate (CBR).
In conventional systems, the use of constant bit rate (CBR) at the output of the multiplexer 1806 can require that the encoded multimedia or video streams input to the multiplexer 1806 also be CBR. As described in the background, the use of CBR for video content can result in poor, variable visual quality.
In the illustrated system, two or more of the encoders 1804 communicate the anticipated encoding complexity of their input data. In response, one or more of the encoders 1804 can receive adapted bit-rate control from the multiplexer 1806. This permits an encoder 1804 that expects to encode relatively complex video to receive, in a quasi-variable-bit-rate manner, a higher bit rate or higher bandwidth (more bits per frame) for those video frames, allowing the multimedia streams 1802 to be encoded with consistent visual quality. The extra bandwidth used by a particular encoder 1804 encoding relatively complex video comes from the bits that would otherwise have been used to encode the other video streams had the encoders been implemented to operate at constant bit rates. This keeps the output of the multiplexer 1806 at a constant bit rate (CBR).
While an individual multimedia stream 1802 can be relatively "bursty" (i.e., vary in utilized bandwidth), the cumulative sum of multiple video streams can be far less bursty. The bit rate from channels encoding less complex video can be reallocated, by the multiplexer 1806 for example, to channels encoding relatively complex video, which can improve the visual quality of the combined video streams as a whole.
The encoders 1804 provide the multiplexer 1806 with an indication of the complexity of a set of video frames to be encoded and multiplexed together. The output of the multiplexer 1806 should provide an output no higher than the bit rate specified for the transmission medium 1808. The indication of complexity can be based on the content classification (as discussed above) to provide a selected quality level. The multiplexer 1806 analyzes the indications of complexity and provides the various encoders 1804 with an allotted number of bits or bandwidth, and the encoders 1804 use this information to encode the video frames in the set. This permits a set of video frames to individually be variable bit rate, yet still achieve constant bit rate as a group, as sketched below.
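A minimal sketch of such complexity-proportional allocation follows. The proportional rule and the per-channel floor are assumptions chosen for illustration; they are not the multiplexer's specified algorithm.

```python
def allocate_bits(total_bits, complexities, floor_frac=0.5):
    """Split a fixed multiplex budget across channels in proportion to the
    complexity each encoder reports for its next set of frames.  Every
    channel keeps floor_frac of an equal share so a simple channel is never
    starved; the remainder follows complexity.  The result always sums to
    total_bits, preserving CBR at the multiplexer output."""
    n = len(complexities)
    floor = (total_bits / n) * floor_frac
    spare = total_bits - floor * n
    total_c = sum(complexities)
    return [floor + spare * c / total_c for c in complexities]

# Three channels sharing 3 Mbit: the sports channel (complexity 8) borrows
# bits from the two news channels (complexities 2 and 3).
print(allocate_bits(3_000_000, [2, 8, 3]))
```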
Content classification can also be used, more generally, to enable quality-based multimedia compression with any generic compressor. The content classification and the methods and apparatuses described herein can be used in the quality-based and/or content-based multimedia processing of any multimedia data. One example is their use in multimedia compression with any generic compressor. Another example is their use in decompression or decoding in any decompressor, decoder, or post-processor, for example in interpolation, resampling, enhancement, restoration, and display operations.
Referring now to Figure 19, a typical video communication system includes a video compression system consisting of a video encoder and a video decoder connected by a communication network. Wireless networks are one class of error-prone network, in which the communication channel exhibits, in addition to path loss, lognormal fading or shadowing and multipath fading in mobile scenarios. To combat channel errors and provide reliable communication for application-layer data, the RF modulator includes forward error correction (FEC), comprising an interleaver and channel coding such as convolutional or turbo coding.
Video compression reduces redundancy in the source video and increases the amount of information carried in each bit of the encoded video data. This increases the impact on quality even when a small fraction of the encoded information is lost. The spatial and temporal prediction inherent in video compression systems aggravates the loss and leads to error propagation, resulting in visible artifacts in the reconstructed video. Error resilience algorithms at the video encoder and error recovery algorithms at the video decoder enhance the error robustness of the video compression system.
Generally, the video compression system is agnostic to the underlying network. However, in error-prone networks, integrating or aligning the error protection algorithms in the application layer with the FEC and channel coding in the link/physical layers is highly desirable, and provides the greatest efficiency in enhancing the error performance of the system.
Figure 14 illustrates an example of a rate-distortion data flow that can occur in the encoder 228 to encode a frame. The process 1400 begins at start 1402 and proceeds to decision block 1404, where it receives scene-change detector input 1410 from the preprocessor 226 (e.g., via metadata) and obtains error resilience input 1406. If the information indicates that the selected frame is an I frame, the process intra-codes the frame. If the information indicates that the selected frame is a P or B frame, the process encodes the frame using intra coding and motion estimation (inter coding).
After an affirmative condition occurs at block 1404, the process 1400 proceeds to a preparation block 1414, where the rate R is set to the value R = Rqual, the desired target quality based on the R-D curves. This setting is received from a block 1416 comprising the R-D curves. The process 1400 then proceeds to block 1418, where rate-control bit allocation {Qpi} is performed based on image/video activity information (e.g., a content classification) from the content classification processing at block 1420.
The rate-control bit allocation of block 1418 is then used for the motion estimation at block 1422. The motion estimation 1422 can also receive the following inputs: metadata from the preprocessor (1412), motion vector smoothing (MPEG-2 + history) from block 1424, and multiple reference frames (causal + non-causal macroblocks, MBs) from block 1426. The process 1400 then proceeds to block 1428, where the rate calculation for intra-coded modes is determined from the rate-control bit allocation {Qpi}. The process 1400 then proceeds to block 1430, where the mode and quantization parameters are determined. The mode decision of block 1430 is made based on the motion estimation of block 1422, the error resilience input 1406, and the scalability R-D, determined at block 1432. Once the mode is determined, flow proceeds to block 1432. It is noted that the flow from block 1430 to 1432 occurs when the data are passed from the first-pass part of the encoder to the second-pass part.
At block 1432, transform and quantization are performed by the second-pass part of the encoder 228. The transform/quantization process is adjusted or fine-tuned, as indicated by block 1444; this process can be influenced by the rate-control fine-tuning module (Fig. 7). The process 1400 then proceeds to block 1434 for zigzag ordering and entropy coding to produce the encoded base layer. Zigzag ordering arranges the quantized data in an efficient format for encoding, and entropy coding is a compression technique that uses a series of bit codes to represent a set of possible symbols. The enhancement-layer result of the transform/quantization block 1432 is also sent to an adder 1436, which subtracts the base layer and sends the result to the ZZ/entropy coder 1438 for the enhancement layer, as previously described with reference to Figures 31-36. It is further noted that the true rate of the enhancement layer is fed back (see line 1440) to update the content classification 1420 with the true rate, and is used by the rate control in determining the long-term and short-term histories of bit-rate operations.
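The zigzag ordering mentioned above is easy to generate directly. The following sketch produces the standard anti-diagonal scan for a square coefficient block (the helper names are illustrative; the principle is the same for any block size).

```python
def zigzag_order(n=8):
    """(row, col) visit order scanning an n x n block along anti-diagonals,
    so low-frequency coefficients come first and trailing zeros form long,
    cheaply coded runs for the entropy coder."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                       # diagonal index
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def scan(block):
    """Flatten a quantized 2-D block into its zigzag-ordered 1-D sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

assert zigzag_order(3) == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1),
                           (0, 2), (1, 2), (2, 1), (2, 2)]
```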
Figure 17 is a high-level block diagram of a multimedia encoding system. The multimedia encoding system comprises means for receiving multimedia data, as illustrated by the module 1705 for receiving multimedia data. Such means can include, for example, a transcoder, an encoder, a preprocessor, a processor configured to receive multimedia data, or a receiver. More particularly, the receiving means can comprise the components and modules described herein for receiving multimedia data, which in various examples include the transcoder 200. The encoding system also comprises means for encoding the multimedia data, as illustrated by the module 1710 for encoding multimedia data. Such encoding means 1710 can comprise the transcoder 200, the encoder 228, or the preprocessor 226.
Figures 23, 24, 27, and 28 are process flow diagrams illustrating methods of encoding multimedia data that implement the aspects described herein. Figure 23 is a process flow diagram illustrating a process 2300 for encoding multimedia data based on content information. At block 2305, the process 2300 receives encoded multimedia data, and at block 2310, the process 2300 decodes the multimedia data. At block 2315, the process 2300 determines content information associated with the decoded multimedia data. At block 2320, the process 2300 encodes the multimedia data based on the content information.
Figure 24 is a process flow diagram illustrating a process 2400 for encoding multimedia data such that data boundaries are aligned based on content information. At block 2405, the process 2400 obtains content information associated with the multimedia data, which can be performed by, for example, the preprocessor 226 or the content classification module 712 shown in Fig. 7. At block 2410, the process 2400 encodes the multimedia data so that data boundaries are aligned based on the content information; for example, slice boundaries and access unit boundaries are aligned with frame boundaries based on the content classification of the multimedia data being encoded, as sketched below. The encoded data are then available for subsequent processing and/or transmission to a mobile device, and the process 2400 ends.
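A minimal sketch of such boundary alignment: macroblocks are packed into slices that are forced to end at frame boundaries, so a lost slice never compromises more than one frame's resynchronization point. The slice-size parameter is an assumption chosen for illustration.

```python
def align_slices(frame_mb_counts, mbs_per_slice):
    """Assign macroblocks to slices so that no slice crosses a frame
    boundary: each frame starts a fresh slice, letting the decoder
    resynchronize at every frame even when a slice is lost."""
    slices = []
    for frame_idx, n_mbs in enumerate(frame_mb_counts):
        start = 0
        while start < n_mbs:
            end = min(start + mbs_per_slice, n_mbs)  # never spill over
            slices.append((frame_idx, start, end))
            start = end
    return slices

# Two 99-MB frames, 40 MBs per slice: three slices per frame, none shared.
print(align_slices([99, 99], 40))
```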
Figure 27 is a process flow diagram illustrating a process 2700 for encoding data using an adaptive intra refresh scheme based on content information. When the process 2700 starts, multimedia data have been obtained. At block 2705, the process 2700 obtains content information of the multimedia data; this can be performed by, for example, the preprocessor 226 or the content classification module 712 described above. The process 2700 proceeds to block 2710, where it encodes the multimedia data using an adaptive intra refresh error resilience scheme, the adaptive intra refresh scheme being based on the content information. The functionality of block 2710 can be performed by the encoder 228. The encoded data are made available for subsequent processing and transmission, and the process 2700 then ends.
Figure 28 is a process flow diagram illustrating a process 2800 for encoding multimedia data using redundant I frames based on multimedia content information. When the process 2800 starts, multimedia data are available for processing. At block 2805, the process 2800 obtains content information of the multimedia data; as described above, this can be performed by, for example, the preprocessor 226 and/or the encoder 228. At block 2810, the process 2800 encodes the multimedia data such that one or more additional I frames are inserted into the encoded data based on the content information. This can be performed by the encoder 228 in conjunction with an error resilience scheme, as described above, and whether an I frame is inserted into the base layer or the enhancement layer depends on the error resilience scheme employed. Following block 2810, the encoded data can be used for subsequent processing and/or transmission to a mobile device.
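The insertion decision of block 2810 might look like the following sketch, which places a redundant I frame only in long low-complexity stretches, where prediction chains are longest and a lost I frame is most damaging. The threshold and spacing constants are illustrative assumptions, not values from the specification.

```python
def redundant_iframe_positions(frame_classes, threshold=4, spacing=30):
    """Return frame indices at which to insert an extra (redundant) I frame.
    Stretches whose content classification stays below the threshold have the
    longest prediction chains and benefit most from extra random-access
    points; high-motion stretches already carry many intra-coded blocks."""
    positions, since_last = [], 0
    for i, c in enumerate(frame_classes):
        since_last += 1
        if c < threshold and since_last >= spacing:
            positions.append(i)
            since_last = 0
    return positions

# 90 consecutive low-complexity frames -> an extra I frame roughly every 30.
print(redundant_iframe_positions([2] * 90))  # [29, 59, 89]
```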
It is noted that the methods described herein can be implemented on a variety of communication hardware, processors, and systems known to one of ordinary skill in the art. For example, a general requirement for the client to operate as described herein is that the client have a display to display content and information, a processor to control the operation of the client, and a memory for storing data and programs related to the operation of the client. In one aspect, the client is a cellular phone. In another aspect, the client is a handheld computer having communications capabilities. In yet another aspect, the client is a personal computer having communications capabilities. In addition, hardware such as a GPS receiver can be incorporated in the client to implement the various aspects. The various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein can be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The disclosed methods and apparatus transcode video data encoded in one format into video data encoded in another format, where the encoding is based on the content of the video data and the encoding can achieve error resilience. The methods or algorithms described in connection with the examples disclosed herein can be embodied directly in hardware, in a software module executed by a processor, in firmware, or in a combination of two or more of these. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC, and the ASIC can reside in a user terminal. In the alternative, the processor and the storage medium can reside as discrete components in a user terminal.
The examples described above are merely exemplary, and those skilled in the art can now make numerous uses of, and departures from, the above-described examples without departing from the inventive concepts disclosed herein. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein can be applied to other examples, for example in an instant messaging service or any general wireless data communication application, without departing from the spirit or scope of the novel aspects described herein. Thus, the scope of this disclosure is not intended to be limited to the examples shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. The word "exemplary" is used exclusively herein to mean "serving as an example, instance, or illustration." Any example described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other examples. Accordingly, the novel aspects described herein are to be defined solely by the scope of the following claims.

Claims (44)

1. A method of processing multimedia data, comprising:
receiving multimedia data; and
encoding the multimedia data into a first data group and a second data group based on the content of the multimedia data,
the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels.
2. The method of claim 1, wherein the first data group comprises I frames and P frames, and wherein the second data group comprises I frames, P frames, and B frames.
3. The method of claim 1, wherein the first data group comprises a base layer and the second data group comprises an enhancement layer.
4. The method of claim 1, further comprising classifying the content of the multimedia data, and wherein the encoding is based on the content classification.
5. The method of claim 4, wherein the encoding comprises determining a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group, and wherein the determination of the first and second quantization parameters is based on the content classification.
6. The method of claim 4, wherein the encoding comprises allocating a bit rate to at least a portion of the multimedia data based on the content classification.
7. The method of claim 4, wherein the encoding further comprises:
detecting scene changes using the content classification; and
determining whether to include an I frame in the first data group and the second data group based on the detected scene changes.
8. The method of claim 4, wherein the encoding further comprises determining a frame rate for encoding the multimedia data based on the content classification.
9. The method of claim 4, wherein the encoding comprises performing motion estimation of the multimedia data based on the content classification.
10. The method of claim 4, further comprising determining a first frame rate for encoding the first data group and a second frame rate for encoding the second data group, wherein the first frame rate is less than the second frame rate.
11. The method of claim 4, wherein the encoding comprises performing error resilience processing on the multimedia data based on the content classification.
12. The method of claim 4, wherein the encoding comprises encoding the first data group and the second data group such that, if the second data group is unavailable, the first data group can be decoded to form displayable multimedia data, and, if both the first data group and the second data group are available, the first data group and the second data group can be decoded in combination to form displayable multimedia data.
13. The method of claim 4, wherein the first quantization parameter comprises a first step size for encoding data and the second quantization parameter comprises a second step size for encoding data, and wherein the first step size is larger than the second step size.
14. The method of claim 1, further comprising classifying the content of the multimedia data, wherein the encoding is based on the content classification, and wherein the encoding comprises reducing noise in the multimedia data based on the content classification.
15. The method of claim 14, wherein reducing noise comprises performing artifact removal.
16. The method of claim 14, wherein reducing noise comprises processing at least a portion of the multimedia data with a de-ringing filter, wherein the strength of the de-ringing filter is based on the content of the multimedia data.
17. The method of claim 14, wherein reducing noise comprises processing at least a portion of the multimedia data with a deblocking filter, wherein the strength of the deblocking filter is based on the content of the multimedia data.
18. The method of claim 14, wherein reducing noise comprises filtering selected frequencies of the multimedia data.
19. The method of claim 16, wherein the strength of the de-ringing filter is based on the content classification of the multimedia data.
20. The method of claim 17, wherein the strength of the deblocking filter is based on the content classification of the multimedia data.
21. The method of claim 1, wherein the encoding comprises downsampling the multimedia data.
22. The method of claim 1, wherein the encoding comprises associating a quality level with the multimedia data, and using the quality level and content information of the multimedia data to determine a bit rate for encoding the multimedia data.
23. An apparatus for processing multimedia data, comprising an encoder configured to receive multimedia data and to encode the multimedia data into a first data group and a second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels.
24. The apparatus of claim 23, wherein the encoder comprises a content classification module configured to determine a content classification of the multimedia data, and wherein an encoding module is further configured to encode the multimedia data based on the content classification.
25. The apparatus of claim 24, wherein the encoder is further configured to determine a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group, the first and second quantization parameters being determined based on the content classification of the multimedia data.
26. The apparatus of claim 24, wherein the encoder comprises a motion estimation module configured to perform motion estimation of the multimedia data based on the content classification and to produce motion compensation information for the data, and wherein the encoding module is further configured to encode the multimedia data using the motion compensation information.
27. The apparatus of claim 24, wherein the encoder further comprises a quantization module for determining a quantization parameter for the multimedia data based on the content classification, and wherein the encoder is further configured to encode the multimedia data using the quantization parameter.
28. The apparatus of claim 24, wherein the encoder further comprises a bit allocation module configured to provide a bit rate for at least a portion of the multimedia data based on the content classification.
29. The apparatus of claim 24, wherein the encoder further comprises a scene change detection module configured to detect scene changes, and wherein the encoding module is further configured to include an I frame in the encoded multimedia data based on a detected scene change.
30. The apparatus of claim 24, wherein the encoder further comprises a frame rate module configured to determine a frame rate for the multimedia data based on the content classification, and wherein the encoding module encodes the multimedia data based on the frame rate.
31. The apparatus of claim 24, wherein the encoder is further configured to encode the first data group and the second data group based on the content classification.
32. The apparatus of claim 24, wherein the encoder is further configured to perform error resilience processing on the multimedia data based on the content classification.
33. An apparatus for processing multimedia data, comprising:
means for receiving multimedia data; and
means for encoding the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels.
34. The apparatus of claim 33, wherein the receiving means comprises an encoder.
35. The apparatus of claim 33, wherein the encoding means comprises an encoder.
36. The apparatus of claim 33, wherein the encoding means comprises means for determining a content classification of the multimedia data, and wherein the encoding means encodes the multimedia data based on the content classification.
37. The apparatus of claim 33, wherein the encoding means comprises a transcoder comprising an encoder.
38. A machine-readable medium comprising instructions that, when executed, cause a machine to:
receive multimedia data; and
encode the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels.
39. The computer-readable medium of claim 38, further comprising instructions for generating a content classification indicative of the content of the multimedia data, and wherein the instructions for encoding the multimedia data into an encoded first data group and an encoded second data group comprise instructions for encoding the multimedia data based on the content classification.
40. The computer-readable medium of claim 39, wherein the instructions for encoding comprise instructions for determining, based on the content classification, a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group.
41. The computer-readable medium of claim 38, wherein the instructions for encoding comprise instructions for allocating a bit rate to at least a portion of the multimedia data based on the content of the multimedia data.
42. A processor comprising a configuration to:
receive multimedia data; and
encode the multimedia data into an encoded first data group and an encoded second data group based on the content of the multimedia data, the first data group being configured to be decodable independently of the second data group, and wherein the first and second data groups are encoded at different quality levels.
43. The processor of claim 42, wherein the processor further comprises a configuration to generate a content classification indicative of the content of the multimedia data, and wherein the encoding comprises encoding the multimedia data based on the content classification.
44. The processor of claim 42, wherein the processor further comprises a configuration to determine a first quantization parameter for encoding the first data group of the multimedia data and a second quantization parameter for encoding the second data group, and wherein the first and second quantization parameters are based on the content classification.
CN 200680043239 2005-09-27 2006-09-27 Content driven transcoder that orchestrates multimedia transcoding using content information Pending CN101313580A (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US72141605P 2005-09-27 2005-09-27
US60/721,416 2005-09-27
US60/727,640 2005-10-17
US60/727,644 2005-10-17
US60/727,643 2005-10-17
US60/730,145 2005-10-24
US60/789,048 2006-04-03
US60/789,377 2006-04-04

Publications (1)

Publication Number Publication Date
CN101313580A true CN101313580A (en) 2008-11-26

Family

ID=40101112

Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200680043239 Pending CN101313580A (en) 2005-09-27 2006-09-27 Content driven transcoder that orchestrates multimedia transcoding using content information
CNA2006800439065A Pending CN101313589A (en) 2005-09-27 2006-09-27 Redundant data encoding methods and device
CN200680044013.2A Active CN101313592B (en) 2005-09-27 2006-09-27 Methods and device for data alignment with time domain boundary
CN200680043886.1A Expired - Fee Related CN101313588B (en) 2005-09-27 2006-09-27 Coding method and device of scalability techniques based on content information

Family Applications After (3)

Application Number Title Priority Date Filing Date
CNA2006800439065A Pending CN101313589A (en) 2005-09-27 2006-09-27 Redundant data encoding methods and device
CN200680044013.2A Active CN101313592B (en) 2005-09-27 2006-09-27 Methods and device for data alignment with time domain boundary
CN200680043886.1A Expired - Fee Related CN101313588B (en) 2005-09-27 2006-09-27 Coding method and device of scalability techniques based on content information

Country Status (3)

Country Link
CN (4) CN101313580A (en)
ES (1) ES2371188T3 (en)
UA (1) UA92368C2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104054339A (en) * 2012-01-19 2014-09-17 高通股份有限公司 Signaling of deblocking filter parameters in video coding
WO2017063566A1 (en) * 2015-10-13 2017-04-20 Mediatek Inc. Partial decoding for arbitrary view angle and line buffer reduction for virtual reality video

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5347849B2 (en) * 2009-09-01 2013-11-20 Sony Corporation Image encoding apparatus, image receiving apparatus, image encoding method, and image receiving method
JPWO2011099254A1 (en) * 2010-02-15 2013-06-13 Panasonic Corporation Data processing apparatus and data encoding apparatus
US8644383B2 (en) * 2011-03-10 2014-02-04 Microsoft Corporation Mean absolute difference prediction for video encoding rate control
JP5948659B2 (en) * 2011-10-01 2016-07-06 インテル・コーポレーション System, method and computer program for integrating post-processing and pre-processing in video transcoding
CN107911699B (en) * 2012-07-02 2021-08-10 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and non-transitory computer-readable medium
KR101586367B1 (en) * 2013-08-07 2016-01-18 W Corporation Co., Ltd. Method for processing multi-channel substitutional advertisement with single source and managing schedule
EP3023983B1 (en) * 2014-11-21 2017-10-18 AKG Acoustics GmbH Method of packet loss concealment in ADPCM codec and ADPCM decoder with PLC circuit
CN104735449B (en) * 2015-02-27 2017-12-26 Chengdu University of Information Technology Image transmission method based on rectangular segmentation and alternate-column scanning
CN106209773A (en) * 2016-06-24 2016-12-07 Shenzhen Lingyang Jisu Technology Co., Ltd. Method for sampled transmission and reassembly of audio packets
US10116981B2 (en) * 2016-08-01 2018-10-30 Microsoft Technology Licensing, Llc Video management system for generating video segment playlist using enhanced segmented videos
US10708666B2 (en) * 2016-08-29 2020-07-07 Qualcomm Incorporated Terrestrial broadcast television services over a cellular broadcast system
KR102373261B1 (en) * 2017-09-28 2022-03-10 Apple Inc. Systems and methods for processing event camera data
CN111143108B (en) * 2019-12-09 2023-05-02 Chengdu University of Information Technology Coding and decoding method and device for reducing array code X-code repair
CN112260694B (en) * 2020-09-21 2022-01-11 Guangzhou Zhongwang Longteng Software Co., Ltd. Data compression method for simulation files

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6538688B1 (en) * 1998-07-02 2003-03-25 Terran Interactive Method and apparatus for performing an automated inverse telecine process
WO2000019726A1 (en) * 1998-09-29 2000-04-06 General Instrument Corporation Method and apparatus for detecting scene changes and adjusting picture coding type in a high definition television encoder
US6639943B1 (en) * 1999-11-23 2003-10-28 Koninklijke Philips Electronics N.V. Hybrid temporal-SNR fine granular scalability video coding
US20030118097A1 (en) * 2001-12-21 2003-06-26 Koninklijke Philips Electronics N.V. System for realization of complexity scalability in a layered video coding framework
KR100501933B1 (en) * 2002-11-21 2005-07-18 삼성전자주식회사 Coding compression apparatus and method for multimedia data
US7606472B2 (en) * 2003-05-30 2009-10-20 Canon Kabushiki Kaisha Video stream data recording apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104054339A (en) * 2012-01-19 2014-09-17 高通股份有限公司 Signaling of deblocking filter parameters in video coding
US9723331B2 (en) 2012-01-19 2017-08-01 Qualcomm Incorporated Signaling of deblocking filter parameters in video coding
WO2017063566A1 (en) * 2015-10-13 2017-04-20 Mediatek Inc. Partial decoding for arbitrary view angle and line buffer reduction for virtual reality video

Also Published As

Publication number Publication date
CN101313592B (en) 2011-03-02
CN101313588B (en) 2012-08-22
CN101313592A (en) 2008-11-26
ES2371188T3 (en) 2011-12-28
UA92368C2 (en) 2010-10-25
CN101313589A (en) 2008-11-26
CN101313588A (en) 2008-11-26

Similar Documents

Publication Publication Date Title
CN102724498B (en) Coding method and device of scalability techniques based on content information
CN101313592B (en) Methods and device for data alignment with time domain boundary
Valdez Objective video quality assessment considering frame and display time variation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20081126