CN1870748A - Internet Protocol TV
- Publication number: CN1870748A (application CN200510066767A)
The present invention relates generally to Internet Protocol TV (IPTV) and video program delivery systems and, more particularly, to optimization techniques for H.264/AVC Baseline Profile codecs that improve encoding and decoding speed.
H.264/AVC is an international video coding standard jointly developed by two organizations, ITU-T VCEG and ISO/IEC MPEG, and was formally published in May 2003. H.264/AVC adopts a large number of advanced video coding techniques to improve coding efficiency and to provide flexibility for network applications. At the same visual quality, H.264/AVC reduces the bit rate by about 50% compared with earlier standards. For applications that tolerate high delay, the advantage of H.264/AVC is even more pronounced.
The present invention focuses on optimizing H.264/AVC Baseline Profile codecs, achieving the highest possible encoding and decoding speeds through algorithmic optimization and multimedia-instruction optimization. The main contents are as follows.
The inventors compared H.264/AVC in detail with previous video coding standards and summarized the specific techniques by which H.264/AVC improves coding efficiency and channel adaptability. The gain in coding efficiency comes mainly from four aspects: enhanced motion prediction, an exact-match small-block (4 × 4) DCT-like transform, an adaptive in-loop deblocking filter, and enhanced entropy coding. For robust data transmission and flexible operation across different network environments, H.264/AVC introduces new techniques including the parameter-set structure, the NAL unit structure, flexible use of slices, slice data partitioning, redundant coded pictures, and so on.
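The exact-match small-block transform mentioned above is the 4 × 4 integer core transform of H.264/AVC. The minimal Python sketch below computes the forward core transform Y = CF · X · CFᵀ with a butterfly structure, using only additions, subtractions and shifts, so that encoder and decoder obtain bit-identical results. The normalizing scale factors, which the standard folds into quantization, are deliberately omitted, and the function names are illustrative rather than taken from any codec.

```python
# Forward 4x4 integer core transform of H.264/AVC: Y = CF * X * CF^T.
# Scaling is folded into quantization in the standard and is omitted here.
# Function names are illustrative, not taken from any reference codec.

CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def core_1d(v):
    """Multiply a 4-vector by CF with a butterfly (adds, subtracts, shifts only)."""
    s03, d03 = v[0] + v[3], v[0] - v[3]
    s12, d12 = v[1] + v[2], v[1] - v[2]
    return [s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1)]


def forward_core_4x4(block):
    """Transform every row, then every column of the intermediate result."""
    rows = [core_1d(r) for r in block]  # rows of X * CF^T
    cols = [core_1d([rows[i][j] for i in range(4)]) for j in range(4)]  # columns of CF * (X * CF^T)
    return [[cols[j][i] for j in range(4)] for i in range(4)]  # transpose back to row-major


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(forward_core_4x4(flat))  # only the DC coefficient (16) is nonzero
```

Because every coefficient of CF is ±1 or ±2, the butterfly needs no multiplications, which is one reason the inverse transform is cheap in a decoder.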
For the optimization of the Baseline Profile decoder, the inventors first analyzed the complexity of the decoder's main functional modules and identified the most time-consuming ones. Targeting these modules, they proposed fast algorithms for motion compensation, entropy decoding and the IDCT, together with MMX/SSE/SSE2 multimedia-instruction optimizations for Intel CPUs. Experiments show that pure-C optimization raises decoding speed to about seven times the original, and that with multimedia-instruction optimization the final Baseline Profile decoder runs about nine times as fast as the unoptimized version.
For the optimization of the Baseline Profile encoder, the inventors proposed an improved rate-distortion optimization algorithm, a novel fast detection method for "skip" macroblocks, and a fast motion search algorithm. Experiments show that the optimization strategy increases encoding speed by 29 to 47 times at the cost of slight distortion.
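Fig. 22 refers to hexagon search. As a hedged illustration of that family of fast motion searches, and not the inventors' specific algorithm (which this section does not spell out), the sketch below implements the commonly described hexagon-based pattern: a six-point large hexagon moves toward lower cost until its center is the best point, and a four-point small pattern then refines the result. The `cost` callback (in practice, the SAD of the current block against the reference block displaced by `(dx, dy)`) and all names are assumptions made for the example.

```python
# Hexagon-based integer motion search: a generic sketch, not the patented method.
LARGE_HEX = [(2, 0), (1, 2), (-1, 2), (-2, 0), (-1, -2), (1, -2)]
SMALL_HEX = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def hexagon_search(cost, start=(0, 0), max_steps=64):
    """Return ((dx, dy), best_cost) found by a large-then-small hexagon search.

    cost(dx, dy) is any block-matching cost, e.g. the SAD of the current
    block against the reference block displaced by (dx, dy).
    """
    cx, cy = start
    best = cost(cx, cy)
    for _ in range(max_steps):
        nx, ny = cx, cy
        for dx, dy in LARGE_HEX:
            c = cost(cx + dx, cy + dy)
            if c < best:
                best, nx, ny = c, cx + dx, cy + dy
        if (nx, ny) == (cx, cy):
            break  # the center of the large hexagon is the best point
        cx, cy = nx, ny
    # Final refinement with the small pattern around the last center.
    fx, fy = cx, cy
    for dx, dy in SMALL_HEX:
        c = cost(cx + dx, cy + dy)
        if c < best:
            best, fx, fy = c, cx + dx, cy + dy
    return (fx, fy), best


if __name__ == "__main__":
    # Toy cost surface with its minimum at (5, -3).
    print(hexagon_search(lambda dx, dy: (dx - 5) ** 2 + (dy + 3) ** 2))
```

Compared with an exhaustive search, the pattern evaluates only a handful of candidates per step, at the risk of stopping in a local minimum on irregular cost surfaces.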
Finally, the inventors summarize the current research hotspots and the related industry chain of H.264/AVC, in the hope of providing some useful leads for subsequent researchers.
Summary of the invention
Based on the above circumstances and the inventors' research, the invention provides a system for providing video programs, characterized by comprising: a program center for receiving video source programs and converting them into media signals suitable for streaming over the Internet; a content delivery center for receiving the media signals from the program center and delivering requested programs by streaming the media signals over the Internet; and a subscriber device for requesting programs from the content delivery center.
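The three claimed components can be modelled as a toy pipeline to make the data flow concrete. The Python sketch below is purely illustrative: the class and method names are hypothetical, "conversion" is reduced to splitting bytes into chunks (a real program center would encode, for example to H.264/AVC, and packetize for streaming), and Internet transport is replaced by in-process calls.

```python
from typing import Dict, Iterator, List


class MediaSignal:
    """A program already converted into chunks suitable for streaming."""

    def __init__(self, program_id: str, chunks: List[bytes]):
        self.program_id = program_id
        self.chunks = chunks


class ProgramCenter:
    """Receives a source program and converts it into a streamable media signal."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size

    def ingest(self, program_id: str, source: bytes) -> MediaSignal:
        # Placeholder for real encoding/packetizing: split into fixed-size chunks.
        chunks = [source[i:i + self.chunk_size] for i in range(0, len(source), self.chunk_size)]
        return MediaSignal(program_id, chunks)


class ContentDeliveryCenter:
    """Stores media signals from the program center and streams them on request."""

    def __init__(self) -> None:
        self.catalog: Dict[str, MediaSignal] = {}

    def receive(self, signal: MediaSignal) -> None:
        self.catalog[signal.program_id] = signal

    def stream(self, program_id: str) -> Iterator[bytes]:
        yield from self.catalog[program_id].chunks


class SubscriberDevice:
    """Requests a program from the content delivery center and collects the stream."""

    def __init__(self, cdc: ContentDeliveryCenter):
        self.cdc = cdc

    def request(self, program_id: str) -> bytes:
        return b"".join(self.cdc.stream(program_id))


if __name__ == "__main__":
    cdc = ContentDeliveryCenter()
    cdc.receive(ProgramCenter().ingest("news", b"hello world"))
    print(SubscriberDevice(cdc).request("news"))
```

Keeping the program center and the delivery center separate mirrors the claim: conversion happens once at ingest, while delivery can be repeated for any number of subscriber requests.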
Brief description of the drawings
The details of the present invention, both as to its structure and its operation, can best be understood with reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
Fig. 1 shows the development history of the ITU-T and MPEG video standards;
Fig. 2 shows a block diagram of a video encoder;
Fig. 3 shows a block diagram of a video decoder;
Fig. 4 shows the scope of a video standard;
Fig. 5 shows the structure of an H.264/AVC video encoder;
Fig. 6 shows the structure of an access unit;
Fig. 7 shows the H.264/AVC macroblock coding structure;
Fig. 8 shows an example of partitioning a picture into slices without FMO;
Fig. 9 shows two examples of FMO applied to a QCIF picture;
Fig. 10 shows the Intra_4 × 4 prediction modes, where (a) shows that samples a-p are predicted from samples A-Q, and (b) shows the eight prediction directions of the Intra_4 × 4 modes;
Fig. 11 shows macroblock partitions;
Fig. 12 shows multi-frame motion compensation;
Fig. 13 shows the principle of the deblocking filter;
Fig. 14 shows the performance of the deblocking filter, without the filter on the left and with the filter on the right;
Fig. 15 shows the evolution of H.264/AVC;
Fig. 16 shows the standard video test sequences: (a) flowergarden, 250 frames; (b) foreman, 300 frames; (c) tempete, 260 frames; (d) mobile, 300 frames; (e) highway, 2000 frames; (f) paris, 1065 frames;
Fig. 17 shows luma interpolation at quarter-pixel precision, where integer sample positions are denoted by shaded uppercase letters and fractional positions by unshaded lowercase letters;
Fig. 18 shows chroma interpolation at 1/8-pixel positions;
Fig. 19 shows reference frame padding: (a) the original reference frame, (b) a padding example, (c) the padded reference frame, (d) a padding sketch;
Fig. 20 shows the selection of neighboring blocks: (a) the current block and its neighbors have the same size, (b) the current block and its neighbors have different sizes;
Fig. 21 shows the full search method;
Fig. 22 shows hexagon search, where 1 denotes the large hexagon and 2 the small hexagon;
Fig. 23 shows full fractional-pixel motion search;
Fig. 24 compares subjective quality at different bit rates: (a) the original picture, (b) 400 kbit/s, (c) 600 kbit/s, (d) 800 kbit/s;
Fig. 25 shows the H.264/AVC media industry chain;
Fig. 26 shows an example production platform; and
Fig. 27 shows an example of playback.
H.264 is currently the latest international video coding standard and was formally published in May 2003. It was jointly developed by ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union), which is responsible for telecommunication standards, and ISO (the International Organization for Standardization); H.264 is also known as "MPEG-4 AVC (Advanced Video Coding)". The compression ratio of H.264/AVC is more than twice that of MPEG-2 and more than 1.5 times that of MPEG-4. Besides a high compression ratio, H.264/AVC also offers better support for network transmission. It introduces an IP-packet-oriented coding mechanism that facilitates packet transmission and supports video streaming over networks. H.264/AVC has stronger error resilience and can cope with video transmission over wireless channels with high packet loss and severe interference. It supports layered coding and transmission under different network resources, yielding stable picture quality. Because of its higher compression ratio and better adaptability to IP and wireless channels, the standard will be used ever more widely in digital video communication and storage.
It should also be noted that H.264/AVC obtains its superior performance at the cost of increased computational complexity. By some estimates, its encoding complexity is about three times that of H.263 and its decoding complexity about twice that of H.263. Practical implementation of H.264/AVC is therefore a current research focus in the video field. The present invention focuses on optimizing H.264/AVC Baseline Profile codecs, achieving the highest possible encoding and decoding speeds through algorithmic optimization and multimedia-instruction optimization. This specification is organized as follows.
Part 1 gives an overview of the H.26X series of video standards of the International Telecommunication Union (ITU-T) and the MPEG series of the International Organization for Standardization (ISO), and briefly introduces the profiles and levels commonly used in international video standards.
Part 2 describes the network abstraction layer (NAL) and video coding layer (VCL) of the H.264/AVC standard in detail, compares H.264/AVC with previous video coding standards, and summarizes the specific techniques by which H.264/AVC improves coding efficiency and channel adaptability.
Part 3 is devoted to optimization techniques for the H.264/AVC Baseline Profile decoder. First, the reference code was pruned to obtain a Baseline Profile decoder; a complexity analysis then identified its most time-consuming modules: motion compensation, entropy decoding, IDCT, deblocking filtering and buffer management. Based on the specific characteristics of these modules, platform-independent pure-C optimizations and MMX/SSE/SSE2 multimedia-instruction optimizations for Intel CPUs were proposed. Experimental results show that pure-C optimization raises decoding speed to about seven times the original, and that with multimedia-instruction optimization the final Baseline Profile decoder runs about nine times as fast as the unoptimized version.
Part 4 presents a complete algorithm optimization strategy for the H.264/AVC Baseline Profile encoder, proposing an improved rate-distortion optimization algorithm, a novel fast detection method for "skip" macroblocks, and a fast motion search algorithm. Experimental results show that the strategy increases encoding speed by 29 to 47 times at the cost of slight distortion (at most 0.16 dB).
In Part 5 the inventors summarize the current research hotspots and the related industry chain of H.264/AVC, in the hope of providing some leads for subsequent researchers.
1. Overview of video standards
Applications such as digital storage media, television broadcasting and communications increasingly require a general-purpose coding method for moving pictures and their associated audio, and video standards arose in response to this requirement. Their purpose is to turn motion video into a form of data that computers can process, so that it can be stored on various storage media, sent and received over existing and future networks, and broadcast over existing and future broadcast channels.
1.1 Introduction to video standards
Standards are vital to communication. Without a common language understood by both sender and receiver, communication is impossible. For multimedia communication that involves transmitting video data, standards are even more important. A video coding standard must not only define a common language (what is usually called the bitstream syntax), but that language must also be highly efficient. Efficiency has two aspects: on the one hand, the standard must rely on an excellent compression algorithm that reduces the bandwidth needed to transmit video data; on the other hand, encoders and decoders should be as simple as possible to implement, that is, the compression algorithm should have the lowest possible complexity.
There are two types of standards. The first is industry or commercial standards formulated by agreement among a few companies. Sometimes such a standard is accepted by the market, becomes a de facto standard, and is widely adopted by other companies. The second is voluntary standards defined by volunteers in open committees. Such standards are driven by market demand and are also ahead of the current state of technology, because once product developers have built products on their own proprietary technology, it is difficult to get them to accept a new standard. The standards discussed in this specification are of the second type.
In the field of multimedia communication there are two main standards organizations: the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO). The history and current status of the video standards developed by these two organizations are shown in Fig. 1.
In terms of coding algorithms, all video standards follow the same framework and differ only in certain parameter ranges and coding modes. The essential difference among them is the bit-rate range they target: MPEG-1 and MPEG-2 are suited to high-bandwidth, high-quality, low-delay video and audio applications, whereas H.261, H.263, MPEG-4 and H.264 (then under development) are suited to low-bandwidth applications with less demanding picture-quality and delay requirements.
There are two ways to understand a video coding standard. One focuses on the bitstream syntax: understanding every aspect of the syntax and what each bit of the bitstream represents. This approach is essential for manufacturers that must build standard-compliant equipment. The other focuses on the coding algorithms that produce a standard-compliant bitstream: understanding the function of each module and the strengths and weaknesses of the various algorithms. Strictly speaking, a standard does not specify any encoding algorithm. The second approach gives a better overall understanding of video coding technology instead of being confined to the standard bitstream syntax.
1.2 H.26X overview
This section introduces the video standards developed by the Telecommunication Standardization Sector of the International Telecommunication Union (International Telecommunication Union - Telecommunication Standardization Sector, ITU-T). The predecessor of ITU-T was the International Telegraph and Telephone Consultative Committee (CCITT). The ITU-T video standards include H.261, H.263 and H.263+.
H.261 is the international video standard defined by ITU-T SG15 for videophone and videoconferencing; it emphasizes low bit rate and low coding delay. Work on H.261 began in 1984 with a target bit rate of m × 384 kbit/s, where m ranges from 1 to 5. In 1988 the goal changed and the bit rate became p × 64 kbit/s, where p ranges from 1 to 30; hence H.261 is informally called "p × 64". H.261 was formally published in December 1990. Its coding algorithm uses motion compensation to remove temporal redundancy and discrete cosine transform (DCT) coding to remove spatial redundancy. All later video coding standards are based on this coding framework, so H.261 has had a far-reaching influence on the other existing standards and on those still being drafted.
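H.261 applies the DCT to 8 × 8 blocks. The sketch below computes the orthonormal two-dimensional DCT-II directly from its definition, to show how a smooth block concentrates its energy in a few low-frequency coefficients, which is what makes the spatial redundancy removable by quantization and entropy coding. It is an O(N⁴) floating-point reference written for readability; practical codecs use fast factorizations and fixed-point arithmetic, and the names here are illustrative.

```python
import math

N = 8  # H.261 transforms 8x8 blocks


def _alpha(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block given as 8 rows of 8 numbers."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            s = 0.0
            for x in range(N):      # row index
                for y in range(N):  # column index
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _alpha(u) * _alpha(v) * s
    return out


if __name__ == "__main__":
    coeffs = dct_8x8([[10] * 8 for _ in range(8)])
    print(round(coeffs[0][0], 6))  # a flat block keeps all its energy in the DC term
```

Because the transform is orthonormal, total energy is preserved; compression comes from the fact that, after quantization, most high-frequency coefficients of natural image blocks become zero.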
The frame rate specified by H.261 is 30000/1001 (approximately 29.97) frames per second, and the picture formats are CIF (Common Intermediate Format) and QCIF (Quarter CIF). In computer-industry terms, CIF is close to the CGA format used for computer displays. At this resolution picture quality is modest, close to that of a videocassette recorder and well below broadcast television. This is because H.261 was designed for videophone and videoconferencing, where the video consists mainly of talking heads, whereas ordinary television programs contain a great deal of motion and many scene changes.
The coding algorithm used by H.261 is shown in Figs. 2 and 3. At the encoder, each block of the input picture is motion-searched in the decoded reference frame; the residual is then DCT-transformed and quantized, and the quantized coefficients are entropy coded and transmitted. At the decoder, the inverse DCT and motion compensation are performed to reconstruct the picture.
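The encoder-side motion search described above can be illustrated with an exhaustive (full) search that minimizes the sum of absolute differences (SAD), the kind of method listed for Fig. 21. This is a minimal sketch under stated assumptions: frames are lists of rows of 8-bit luma samples, the block is 8 × 8, the search range is ±4 integer pixels, and candidates whose block would fall outside the reference frame are skipped. All names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=8, search_range=4):
    """Return ((dx, dy), sad) of the best integer motion vector for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would leave the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a ±R window the search evaluates (2R + 1)² candidates per block, which is why fast patterns such as the hexagon search of Fig. 22 are attractive in practice.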
Because the prediction for each block of the current frame is assembled from sample blocks at different positions in the reference frame, the prediction frame contains coding noise and blocking artifacts. To reduce the prediction error, the prediction frame is first low-pass filtered and only then used as the reference for the current frame. Because this filter lies inside the motion-compensation loop, it is called a loop filter.
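The loop filter can be sketched as a separable low-pass filter applied to each 8 × 8 prediction block. The sketch below assumes the H.261 form as we understand it (taps 1/4, 1/2, 1/4 in each direction; taps 0, 1, 0 at the block edges where a tap would fall outside the block; full precision kept until a single final rounding with halves rounded up); check the exact rules against Recommendation H.261 before relying on it. Names are illustrative.

```python
def _filter_1d(line):
    """1-D pass scaled by 4: taps [1, 2, 1] inside the block, [0, 4, 0] at both ends."""
    n = len(line)
    out = []
    for i in range(n):
        if i == 0 or i == n - 1:
            out.append(4 * line[i])  # edge sample: no tap outside the block
        else:
            out.append(line[i - 1] + 2 * line[i] + line[i + 1])
    return out


def loop_filter_8x8(block):
    """Separable low-pass filter of an 8x8 prediction block (8 rows of ints)."""
    rows = [_filter_1d(r) for r in block]  # horizontal pass, values scaled by 4
    cols = [_filter_1d([rows[y][x] for y in range(8)]) for x in range(8)]  # vertical, scaled by 16
    # Single rounding at the end; adding 8 before >> 4 rounds halves up.
    return [[(cols[x][y] + 8) >> 4 for x in range(8)] for y in range(8)]


if __name__ == "__main__":
    impulse = [[0] * 8 for _ in range(8)]
    impulse[3][3] = 16
    for row in loop_filter_8x8(impulse)[1:6]:
        print(row[1:6])  # the impulse is spread into a 3x3 kernel
```

Flat areas pass through unchanged, while isolated noise and sharp block-boundary steps are smoothed before the block is used as a prediction.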
Fig. 4 shows the scope of the standard and a typical video encoding and decoding chain (excluding transmission and storage of the video signal). Like all other ITU-T and ISO/IEC video coding standards, H.261 specifies only the bitstream syntax and how a decoder interprets and decodes the bitstream. In other words, the standard specifies only the decoder, defining the bitstream syntax and the decoding process for each syntax element, without saying how to encode.
Thus, for any given coded bitstream, all standard-compliant decoders produce the same output. Limiting the scope of the standard in this way gives companies considerable freedom when implementing encoders: they can trade off compression quality, implementation cost and time to market according to the specific application. On the other hand, a standard defined this way cannot guarantee coding quality: if all motion vectors are zero and every pixel is transform coded, the resulting bitstream is still standard compliant, but its compression performance is very poor.
Therefore, to demonstrate the effectiveness of a video coding standard, the group defining it usually provides a reference model (RM). The reference model implements complete reference code for motion estimation, quantization, inter/intra coding decisions, motion compensation, buffering and rate control.
H.261 is part of several ITU-T H-series terminal standards formulated for different network environments. For example, H.320 (the protocol defined for N-ISDN terminals) uses H.261 as its video codec standard, H.221 as its multiplexing standard, H.242 as its control signaling standard, and G.711, G.722 and G.728 as its audio codec standards. H.261 is also used in other terminal standards, such as H.321, H.322, H.323 and H.324.
H.263 was developed by ITU-T SG15 on the basis of H.261. Work on the standard began in November 1993, and it was formally adopted in 1996. It is designed for very low bit-rate applications below 64 kbit/s, for example mobile networks or the public switched telephone network (PSTN), where typical video bit rates range from 10 to 24 kbit/s. In essence, H.263 combines features of H.261 and MPEG and adds many optimizations for very low bit rates. In terms of signal-to-noise ratio (SNR), H.263 outperforms H.261 by 3 to 4 dB at bit rates below 64 kbit/s. In fact, H.263 outperforms H.261 at all bit rates. Compared with MPEG-1, H.263 needs about 30% less bit rate.
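The dB figures above are usually computed as peak signal-to-noise ratio (PSNR) between original and decoded frames; reading the SNR comparison as PSNR is an assumption. The sketch below computes PSNR for 8-bit samples (peak value 255) given as flat lists; the function name is illustrative.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-length, non-empty sequences of samples."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)
```

Halving the mean squared error raises PSNR by about 3.01 dB, so a 3 to 4 dB gain corresponds to roughly 2 to 2.5 times lower MSE at the same bit rate.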
Because H.263 is based on the structure of H.261, the two standards share the same core structure. They differ as follows:
(1) H.263 supports more picture formats and uses a different GOB structure.
(2) H.263 uses half-pixel motion compensation and, unlike H.261, does not use a loop filter.
(3) H.263 uses 3D VLC to code the DCT coefficients.
(4) In addition to the baseline coding algorithm, there are four optional modes that encoder and decoder can negotiate:
Unrestricted Motion Vector mode: motion vectors may point outside the picture boundary.
Syntax-based Arithmetic Coding mode: arithmetic coding achieves higher compression efficiency than variable-length coding.
Advanced Prediction mode: in addition to 16 × 16 blocks, 8 × 8 blocks can also be used for motion search.
PB-frames mode: the B-frame in a PB-frame does not use conventional bidirectional motion vectors; instead, scaled motion vectors of the P-frame are used as forward motion vectors to reduce the bit rate.
(5) H.263 allows the quantization step size to change for each macroblock.
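The half-pixel motion compensation in item (2) derives half-pel samples by bilinear averaging of neighboring integer samples with rounding. The sketch below returns one reference sample at a position given in half-pixel units, assuming H.263-style rounding ((A + B + 1) >> 1 for two neighbors, (A + B + C + D + 2) >> 2 for four) and assuming the position lies far enough inside the frame that no boundary extension is needed. The name is illustrative.

```python
def half_pel_sample(ref, x2, y2):
    """Sample ref at (x2 / 2, y2 / 2), where x2 and y2 are in half-pixel units."""
    x, y = x2 >> 1, y2 >> 1   # integer part of the position
    fx, fy = x2 & 1, y2 & 1   # 1 if the position is a half-pel in that direction
    a = ref[y][x]
    if not fx and not fy:
        return a  # integer position: copy the sample
    if fx and not fy:
        return (a + ref[y][x + 1] + 1) >> 1  # horizontal half-pel
    if fy and not fx:
        return (a + ref[y + 1][x] + 1) >> 1  # vertical half-pel
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) >> 2  # center


if __name__ == "__main__":
    ref = [[10, 21], [30, 41]]
    print([half_pel_sample(ref, x2, y2) for x2, y2 in [(0, 0), (1, 0), (0, 1), (1, 1)]])
```

A motion-compensated block is built by calling this for every sample of the block at the displaced positions; an encoder typically searches integer vectors first and then tests the surrounding half-pel positions.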
As with H.261, ITU-T SG15 also provides a reference codec, namely the test model. TMN6 provides reference code for all four negotiable options.
Like H.261, H.263 is part of several ITU-T H-series terminal standards formulated for different network environments. For example, H.324 (the protocol defined for PSTN terminals) uses H.263 as its video codec standard, H.223 as its multiplexing standard, H.245 as its control signaling standard, and G.723 as its audio codec standard. H.263 is also used in other terminal standards, for example H.323, which is defined for local area networks (LANs) that do not guarantee quality of service (QoS).
H.263+ was developed by ITU-T SG16 as an enhancement of H.263. Its enhancements were organized into 12 Key Technical Areas (KTAs) and were formally published in 1997. They fall into the following six categories:
● Extended source formats: Higher Picture Clock Frequency (PCF), Custom Picture Formats, Custom Pixel Aspect Ratios (PAR)
● New coding modes that improve coding efficiency: Advanced Intra Coding Mode, Alternate Inter VLC Mode, Modified Quantization Mode, Deblocking Filter Mode, Improved PB-frame Mode
● Enhanced modes that improve error robustness: Slice Structured Mode, Reference Picture Selection Mode, Independent Segment Decoding Mode
● Scalability enhancement modes: Temporal, SNR and Spatial Scalability Modes
● Other enhanced modes: Reference Picture Resampling Mode, Reduced-Resolution Update Mode
● Supplemental enhancement information
1.3 MPEG overview
The MPEG standards have long been a focus of attention for many research institutions and universities, and also a focus of industrial development of digital video terminal products. The Moving Picture Experts Group (MPEG) is an expert group established in 1988 jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It is responsible for developing standards for coding and decoding moving-picture and audio data and for their synchronization. The standards it develops are called MPEG standards; the MPEG standards developed or under development so far are:
MPEG-1: digital television standard, formally published in 1992.
MPEG-2: digital television standard.
MPEG-3: merged into the high-definition television (HDTV) work in July 1992.
MPEG-4: multimedia application standard, published in 1999.
MPEG-5: not yet defined as of September 1998.
MPEG-6: not yet defined as of September 1998.
MPEG-7: Multimedia Content Description Interface standard (under study).
The MPEG standards describe the encoding and decoding processes for audio and moving pictures, strictly specify the syntax of the bitstreams produced by coding audio and picture data, and provide test methods for decoders. Not everything is strictly specified, however, in particular the compression and decompression algorithms. This both guarantees that decoders can correctly decode MPEG-compliant audio and video data and leaves wide latitude for specific implementations. Implementers can keep improving their encoding and decoding algorithms to raise the quality and coding efficiency of audio and video.
As with other ISO standards documents, an MPEG standard document is produced in four stages:
(1) Working Draft (WD): a working document prepared by the working group.
(2) Committee Draft (CD): a document promoted from the working group's working draft. This is the first formal form of an ISO document; it is studied and voted on through formal consultation within ISO.
(3) Draft International Standard (DIS): a document promoted from the committee draft once the voting member bodies are satisfied with the content and explanations of the committee draft.
(4) International Standard (IS): the document published after approval by ballot of the member bodies and of other ISO departments and committees.
1.3.1 MPEG-1 digital television standard
MPEG-1 handles television in the Standard Interchange Format (SIF), also called the Source Input Format (SIF): 352 pixels × 240 lines/frame × 30 frames/s for NTSC and 352 pixels × 288 lines/frame × 25 frames/s for PAL, with the compressed output rate limited to below 1.5 Mbit/s. The standard was developed mainly for the CD-ROMs and networks of the time, which offered this data rate, so that moving pictures could be stored on CD-ROM and transmitted over networks.
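A quick calculation shows how much compression the 1.5 Mbit/s target implies. The sketch below computes the uncompressed bit rate of both SIF variants, assuming 8-bit samples and 4:2:0 chroma subsampling (one full-resolution luma plane plus two quarter-size chroma planes) and using the nominal 30 frames/s stated above; names are illustrative.

```python
def raw_bitrate_420(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate of 4:2:0 video: one luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample * fps


MPEG1_TARGET = 1_500_000  # bit/s, "about 1.5 Mbit/s"

if __name__ == "__main__":
    for name, (w, h, fps) in {"SIF NTSC": (352, 240, 30), "SIF PAL": (352, 288, 25)}.items():
        raw = raw_bitrate_420(w, h, fps)
        print(f"{name}: {raw / 1e6:.2f} Mbit/s raw, about {raw / MPEG1_TARGET:.0f}:1 needed")
    # both lines report 30.41 Mbit/s raw, about 20:1 needed
```

Both SIF variants carry the same number of samples per second, so MPEG-1 must compress either one by roughly 20:1 to fit the 1.5 Mbit/s target.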
The standard number of MPEG-1 is ISO/IEC 11172, and its title is "Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s". It was adopted by ISO/IEC at the end of 1991 and consists of five parts:
1. MPEG-1 Systems, ISO/IEC 11172-1, specifies the synchronization of video data, audio data and other related data.
2. MPEG-1 Video, ISO/IEC 11172-2, specifies the coding and decoding of video data.
3. MPEG-1 Audio, ISO/IEC 11172-3, specifies the coding and decoding of audio data.
4. MPEG-1 Conformance testing, ISO/IEC 11172-4, specifies how to test whether bitstreams and decoders meet the requirements of the first three parts of MPEG-1. These tests can be carried out by manufacturers and users.
5. MPEG-1 Software simulation, ISO/IEC 11172-5. This part is in fact not a standard but a technical report, giving the results of implementing the first three parts of MPEG-1 in software.
1.3.2 MPEG-2 digital television standard
Work on MPEG-2 began in 1990, and its DIS was issued in 1994. It is a high-quality picture and audio coding standard directly related to digital television broadcasting. MPEG-2 can be regarded as an extension of MPEG-1, since their basic coding algorithms are the same, but MPEG-2 adds many functions that MPEG-1 lacks, for example coding of interlaced video and bit-rate scalability. In terms of rate, the basic target of MPEG-2 is 4 to 9 Mbit/s, up to a maximum of 15 Mbit/s.
The standard number of MPEG-2 is ISO/IEC 13818, and its title is "Information technology - Generic coding of moving pictures and associated audio information". MPEG-2 comprises 10 parts:
1. MPEG-2 Systems, ISO/IEC 13818-1, specifies the synchronization of video data, audio data and other related data. It mainly defines how video, audio and other data are combined into one or more streams suitable for storage or transmission. There are two stream formats: the Program Stream (PS) and the Transport Stream (TS). A program stream is formed by combining one or more Packetized Elementary Streams (PES); it is intended for relatively error-free environments and suits applications that use software processing. A transport stream is also formed by combining one or more PES, but it is intended for relatively error-prone environments, for example lossy or noisy transmission systems.
2. the MPEG-2 television image is write as MPEG-2 Video, and the Code And Decode of regulation TV data is called for short ISO/IEC 13818-2.
3. MPEG-2 sound is write as MPEG-2 Audio, and the Code And Decode of regulation voice data is the expansion of MPEG-1 Audio, supports a plurality of sound channels, is called for short ISO/IEC 13818-3.
4. the MPEG-2 uniformity test is write as MPEG-2 Conformance testing, is called for short ISO/IEC 13818-4.
5. the MPEG-2 software simulation is write as MPEG-2 Software simulation, is called for short ISO/IEC 13818-5.
6. order of MPEG-2 digital storage media and control Extended Protocol are write as MPEG-2Extensions for DSM-CC, are called for short ISO/IEC 13818-6.This agreement is used to manage the data flow of MPEG-1 and MPEG-2, and data flow both can be moved on unit, again can operation under heterogeneous network (promptly constructing with similar devices but the network of operation different agreement) environment.
7. the advanced acoustic coding of MPEG-2 is write as MPEG-2 AAC, is multi-channel sound encryption algorithm standard.This standard also has the sound standard of non-backward compatible except that backward compatibility MPEG-1 Audio standard, be called for short ISO/IEC 13818-7.
8. MPEG-2 system decoder real-time interface extension standards is called for short ISO/IEC 13818-8.This is and the real-time interface standard of transmitting data stream that it can be used for receiving the transmitting data stream of automatic network.
9. MPEG-2 DSM-CC consistency extend testing is called for short ISO/IEC 13818-9.
10. the advanced acoustic coding standard of MPEG-2 revision.
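To make the Transport Stream described in Part 1 more concrete, the following minimal sketch decodes the fixed header of one TS packet. The 188-byte packet size, the 0x47 sync byte and the header bit layout are taken from ISO/IEC 13818-1 rather than from the text above; the function name `parse_ts_header` is hypothetical.

```python
TS_PACKET_SIZE = 188  # every MPEG-2 Transport Stream packet is 188 bytes long
SYNC_BYTE = 0x47      # first byte of every TS packet


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of a single Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),       # set when the channel detected an error
        "payload_unit_start": bool(b1 & 0x40),    # a PES packet (or section) starts here
        "priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,           # 13-bit packet identifier
        "scrambling": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,          # lets a receiver detect lost packets per PID
    }


# A packet on PID 256 that starts a new PES packet, payload only, counter 7.
example = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184))
```

The small fixed packet size and the per-PID continuity counter are what make the TS format suited to the lossy channels mentioned above: a receiver can resynchronize on the next sync byte and notice gaps.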
1.3.3 MPEG-4 multimedia application standard
Work on MPEG-4 began in 1994. It develops algorithms and tools for the coding and interactive playback of audio-visual data, and is a multimedia communication standard for very low data rates. The goals of MPEG-4 are highly reliable operation in heterogeneous network environments and strong interactivity.
To reach these goals, MPEG-4 introduced object-based representation to express Audio/Visual Objects (AVO); it extended the coded data types from natural data objects to computer-generated synthetic objects, using Synthetic/Natural Hybrid Coding (SNHC); and it introduced key concepts such as composition, synthesis and layout of objects to support interactivity and object reuse.
MPEG-4 is intended for mobile communication and the Public Switched Telephone Network (PSTN), and supports videophone, video mail, electronic newspapers and other applications with low transmission rates.
The standard name of MPEG-4 is "Very low bitrate audio-visual coding". Its standard number is ISO/IEC 14496, which comprises the following 10 parts:
1. MPEG-4 Systems, ISO/IEC 14496-1.
2. MPEG-4 Visual, ISO/IEC 14496-2.
3. MPEG-4 Audio, ISO/IEC 14496-3.
4. MPEG-4 Conformance testing, ISO/IEC 14496-4.
5. MPEG-4 Reference software, ISO/IEC 14496-5.
6. MPEG-4 Delivery Multimedia Integration Framework (DMIF), ISO/IEC 14496-6. This protocol manages multimedia data streams and is similar in principle to the File Transfer Protocol (FTP); the difference is that FTP returns the data itself, whereas DMIF returns a pointer to where the data stream can be obtained. DMIF covers three major technologies: broadcast, the Internet and optical discs.
7. Optimized software for MPEG-4 tools, ISO/IEC 14496-7.
8. MPEG-4 IP framework, ISO/IEC 14496-8.
9. MPEG-4 reference hardware description, ISO/IEC 14496-9.
10. MPEG-4 Advanced Video Coding (AVC), ISO/IEC 14496-10. AVC is an international video standard developed jointly by the MPEG group of ISO and the VCEG group of ITU-T; in ITU-T it is called H.264. Optimizing the codec for this video standard is the research focus of the present invention.
1.3.4 MPEG-7 Multimedia Content Description Interface
In October 1996, MPEG launched a new work item called the Multimedia Content Description Interface, that is, the MPEG-7 international standard. The goal of MPEG-7 is to standardize a set of descriptors for audio-visual features, together with their structures and the relationships between them, called Description Schemes (DS). MPEG-7 also defines a standardized language, the Description Definition Language (DDL), for specifying descriptors and description schemes, which ensures extensibility and a long life cycle so that the standard can be widely adopted. Users can search and index audio-visual material associated with MPEG-7 data; this material may be still images, graphics, 3D models, audio, speech, video, or multimedia presentations composed of these elements. MPEG-7 aims to let users find the different kinds of multimedia material they need quickly and effectively. To this end, it describes many different kinds of multimedia information in a standardized way and links each description to the content it describes. The standard does not cover the automatic extraction of the described features, nor does it specify the tools or programs that search using the descriptions.
MPEG-7 focuses on coding and expressing information about audio-visual data; in other words, it concentrates on standardizing a generic interface for describing multimedia material (information that describes the content, rather than the content itself). For this reason, MPEG-7 emphasizes the interactivity and globalization of data resources and the flexibility of data management.
MPEG-7 can be used independently of the other MPEG standards, but the descriptions of audio and video objects defined in MPEG-4 also apply to MPEG-7 and serve as a basis for classification. MPEG-7 descriptions can also be used to enhance the content description capabilities of other MPEG standards. The biggest difference between MPEG-7 and the other MPEG standards is that MPEG-7 gives more weight to human factors. MPEG-7 must combine the characteristics and techniques of many related fields, such as computer vision, databases and signal processing. Database specialists focus on high-level description and want MPEG-7 to provide standard structures and linking techniques; signal-processing specialists focus more on image analysis and content understanding.
Applications of MPEG-7 include digital libraries (for example image catalogs and music dictionaries); Multimedia Directory Services (for example yellow pages); selection of broadcast media (for example radio and TV channels); and multimedia editing (for example personalized electronic news services and multimedia authoring). Potential application areas include education, entertainment, news, tourism, medicine and shopping.
1.3.5 MPEG-21 multimedia framework
As multimedia technology continues to develop, related multimedia standards keep emerging, covering many aspects of multimedia technology. Multimedia information is distributed across devices all over the world, and delivering it effectively over heterogeneous networks requires the combined use of multimedia standards at different levels. Whether the existing standards actually fit together, and whether there are gaps between them, calls for a comprehensive standard to coordinate them. This was the original motivation for the concept of a "Multimedia Framework".
The goal of MPEG-21 is to provide users of multimedia information with a transparent and efficient environment for electronic transactions and use.
The scope of MPEG-21 can be described as the integration of key technologies that enable transparent and augmented use of multimedia resources across global networks and devices. Its functions include content creation, content production, content distribution, content consumption and use, content representation, intellectual property management and protection, content identification and description, financial management, user privacy, terminal and network resource abstraction, and event reporting.
The basic framework elements of MPEG-21 include Digital Item Declaration, Content Representation, Digital Item Identification and Description, Content Management and Usage, Intellectual Property Management and Protection, Terminals and Networks, and Event Reporting.
1.4 Video application fields
Video applications include, but are not limited to, the following fields:
BSS---Broadcasting satellite service (to the home)
CATV---Cable TV distribution over optical networks, coaxial cable, etc.
CDAD---Cable digital audio distribution
DAB---Digital audio broadcasting (terrestrial and satellite)
DTTB---Digital terrestrial television broadcast
EC---Electronic cinema
ENG---Electronic news gathering (including SNG, satellite news gathering)
FSS---Fixed satellite service
HTT---Home television theatre
IPC---Interpersonal communications (video conferencing, videophone, etc.)
ISM---Interactive storage media (optical discs, etc.)
NCA---News and current affairs
NDB---Networked database services
RVS---Remote video surveillance
SSM---Serial storage media (digital VTR, etc.)
1.5 Profiles and levels
Each video standard is intended for a wide range of bit rates, resolutions, qualities and services. While the standard is being developed, the necessary syntax elements are defined according to the requirements of typical applications in each field and integrated into a single syntax. Because implementing the complete syntax of a standard is often impractical, a limited number of syntax subsets are defined by means of "profiles" and "levels".
A profile is a subset of the full bitstream syntax defined by a standard. Even within the bounds imposed by the syntax of a given profile, the values taken by bitstream parameters can still cause very large variations in the requirements on encoders and decoders; for example, the frame size can be specified as (approximately) 2^14 pixels wide by 2^14 pixels high. It is currently neither practical nor economical to build a decoder that can handle frames of arbitrary size. To address this problem, "levels" are defined within each profile. A level is a set of constraints on the parameters of the bitstream. These constraints may be simple limits on numerical values, or may be expressed as limits on arithmetic combinations of several parameters (for example, the product of frame width, frame height and frame rate).
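A level constraint of the "product of width, height and frame rate" kind can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the limits are expressed in 16×16 macroblocks (as H.264 does) and takes them as parameters; the function names and the example budget (1620 macroblocks per frame, 40500 macroblocks per second, i.e. exactly one 720×576 picture at 25 frames/s) are assumptions chosen for the example, so real limits should be read from the level tables of the relevant standard.

```python
def frame_size_in_mbs(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame (partial ones round up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def fits_level(width: int, height: int, fps: int,
               max_frame_mbs: int, max_mbs_per_sec: int) -> bool:
    """Check a simple limit (frame size) and a product limit (frame size x frame rate)."""
    frame_mbs = frame_size_in_mbs(width, height)
    return frame_mbs <= max_frame_mbs and frame_mbs * fps <= max_mbs_per_sec


# Hypothetical level budget: one 720x576 frame at 25 frames/s.
print(fits_level(720, 576, 25, max_frame_mbs=1620, max_mbs_per_sec=40500))   # True
print(fits_level(1280, 720, 30, max_frame_mbs=1620, max_mbs_per_sec=40500))  # False
```

A decoder built for this budget can then reject any bitstream whose declared level exceeds it, instead of having to support frames of arbitrary size.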
1.6 Summary
This part has summarized the H.26x series of video standards from the International Telecommunication Union (ITU-T) and the MPEG series of video standards from the International Organization for Standardization (ISO), outlined the applications of video, and briefly introduced the profiles and levels commonly used in international video standards. The content and specific techniques of the H.264/AVC video standard, on which the present invention focuses, are described in the next part.
2. Overview of the H.264/AVC video standard
H.264/AVC is currently the latest international video coding standard. It was developed jointly by ITU-T and ISO/IEC; ITU-T calls it H.264, and ISO/IEC designates it as Part 10 of MPEG-4, with the code name 14496-10 AVC (Advanced Video Coding).
As early as 1998, the ITU-T SG16 Q6 Video Coding Experts Group (VCEG) issued a call for proposals for the H.26L project. The goal of H.26L was to double coding efficiency relative to all other existing video standards (that is, to halve the bit rate for a given fidelity) and to suit all applications. The first draft of the standard was adopted in October 1999. In December 2001, VCEG and MPEG (ISO/IEC JTC 1/SC 29/WG 11) formed the Joint Video Team (JVT), which formally released H.264/AVC (JVT-G050) in May 2003. As with all other ITU-T and ISO/IEC video coding standards, H.264/AVC standardizes only the decoder: it specifies the bitstream syntax and the decoding process for each syntax element.
H.264/AVC provides technical solutions for at least the following applications:
● Broadcasting over cable, satellite, cable modem, DSL, etc.
● Interactive or serial storage on optical and magnetic devices, such as DVD.
● Conversational services over ISDN, Ethernet, LAN, DSL, wireless networks, modems, etc.
● Video-on-demand or multimedia streaming services over ISDN, cable modem, LAN, DSL, wireless networks, etc.
● Multimedia Messaging Services (MMS) over ISDN, DSL, Ethernet, LAN, wireless networks, etc.
New applications will also be deployed over existing and future networks. How, then, can one video coding standard serve such a variety of applications and networks? In H.264/AVC, the split into a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL) provides this flexibility and customizability. The VCL efficiently represents the video content, and the NAL packages the VCL data in a form suitable for various transport layers and storage media (Fig. 5).
This part is organized as follows: first the Network Abstraction Layer (NAL) and the structure of H.264/AVC coded video data are described; then the Video Coding Layer (VCL); finally, the technical features of H.264/AVC are summarized.
2.1 Network abstraction layer
The main function of the NAL is to provide "network friendliness", that is, to allow the VCL data to be customized simply and effectively for transport in applications across different fields.
Although tailoring video content to specific applications is outside the scope of the H.264/AVC standard, the design of the NAL makes such tailoring feasible. Key NAL concepts include NAL units, the byte-stream and packet formats, parameter sets and access units. These concepts are briefly introduced below; for details see S. Wenger (2003) and T. Stockhammer et al. (2003).
Coded video data is organized into NAL units, each of which is effectively a packet containing an integer number of bytes. The first byte of each NAL unit is a header that indicates the type of data in the NAL unit; the remaining bytes contain data of the type indicated by the header and are called the payload.
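The one-byte header mentioned above can be decoded with simple bit masks. In this sketch the field layout (a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type) and the type values in the table come from the H.264 specification rather than from the text above; the helper names are hypothetical, and only a subset of the types is listed.

```python
# Subset of H.264 nal_unit_type values (see the specification for the full table).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
}


def parse_nal_header(first_byte: int):
    """Split the first NAL unit byte into its three fields."""
    forbidden_zero_bit = first_byte >> 7        # must be 0 in a valid stream
    nal_ref_idc = (first_byte >> 5) & 0x03      # 0 means "not used for reference"
    nal_unit_type = first_byte & 0x1F           # what kind of payload follows
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1 to 5 carry coded slice data (VCL); the others are non-VCL."""
    return 1 <= nal_unit_type <= 5


for byte in (0x67, 0x68, 0x65, 0x41):
    _, ref_idc, nal_type = parse_nal_header(byte)
    print(hex(byte), ref_idc, NAL_TYPE_NAMES.get(nal_type), is_vcl(nal_type))
```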
When necessary, emulation prevention bytes are inserted into the payload data of a NAL unit; their purpose is to prevent a start code prefix from accidentally appearing inside the payload.
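As a sketch of how emulation prevention works, the functions below insert an emulation prevention byte (0x03) whenever two zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03, and remove it again on the decoder side. The 0x03 value and the insertion rule follow the H.264 specification; the function names are hypothetical, and the special case of a payload ending in 0x00 is not handled.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Encoder side: break up any 00 00 0x (x <= 3) pattern with an inserted 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Decoder side: drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the inserted byte
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


raw = bytes([0x00, 0x00, 0x01, 0x25])          # would look like a start code
escaped = add_emulation_prevention(raw)         # 00 00 03 01 25
assert remove_emulation_prevention(escaped) == raw
```

After escaping, the pattern 00 00 01 can no longer occur inside the payload, which is what makes the start code prefix a reliable boundary marker.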
The NAL unit structure is a generic format that can be used both in packet-oriented and in byte-stream-oriented transport systems. The series of NAL units generated by an encoder is called a NAL unit stream.
2.1.2 NAL units in byte-stream format
Some systems (for example, H.320 and MPEG-2/H.222.0 systems) require delivery of the entire or partial NAL unit stream as an ordered stream of bytes. In such systems, the boundaries of NAL units must be identifiable from the data itself. For this kind of application, the H.264/AVC standard defines a byte stream format. In this format, each NAL unit is prefixed with a three-byte start code prefix, so the boundaries of NAL units can be determined by searching for this unique prefix. The use of emulation prevention bytes guarantees that the start code prefix uniquely identifies the start of a NAL unit.
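The boundary search described above can be sketched as follows. The three-byte prefix 0x00 0x00 0x01 comes from Annex B of H.264; the function name is hypothetical. Extra zero bytes (for example the leading zero of a four-byte start code) are stripped from the end of the preceding NAL unit, which is safe here on the assumption that a NAL unit never legitimately ends in a 0x00 byte.

```python
START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> list:
    """Return the NAL units contained in an Annex B style byte stream."""
    starts = []
    pos = stream.find(START_CODE)
    while pos >= 0:
        starts.append(pos + len(START_CODE))    # payload begins right after the prefix
        pos = stream.find(START_CODE, pos + len(START_CODE))

    nal_units = []
    for k, begin in enumerate(starts):
        end = starts[k + 1] - len(START_CODE) if k + 1 < len(starts) else len(stream)
        # Trailing zeros belong to the next start code or to stuffing, not to this unit.
        nal_units.append(stream[begin:end].rstrip(b"\x00"))
    return nal_units


stream = bytes([0, 0, 0, 1, 0x67, 0x42, 0, 0, 0, 1, 0x68, 0xCE, 0, 0, 1, 0x65, 0x88])
for unit in split_byte_stream(stream):
    print(unit.hex())   # 6742, 68ce, 6588
```

In the packet format of IP/RTP systems this search is unnecessary, because each packet already delimits its NAL units.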
A small amount of additional data (one byte per frame) may also be added to the stream to help decoders operating on the byte stream recover byte alignment. Further additional data may be inserted into the byte stream, which both allows the amount of transmitted data to be expanded and helps achieve byte alignment more quickly.
2.1.3 NAL units in packet format
In other systems (for example, IP/RTP systems), the coded data is carried in packets framed by the system transport protocol, so the boundaries of NAL units within the packets can be identified without start code prefixes. In such systems, NAL units are transmitted without start code prefixes.
2.1.4 VCL and non-VCL NAL units
NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the sample values of the video pictures; non-VCL NAL units contain additional information such as parameter sets and supplemental enhancement information (SEI). Parameter sets contain important header data that applies to a large number of VCL NAL units. SEI contains timing information and other information that is not necessary for decoding the sample values of the video sequence but can enhance the usability of the decoded video.
2.1.5 Parameter sets
A parameter set contains information that applies to a large number of VCL NAL units and is expected to change rarely during decoding. There are two types of parameter set:
Sequence parameter set, which applies to a series of consecutive coded video frames;
Picture parameter set, which applies to the decoding of one or more individual video frames.
The sequence and picture parameter set mechanism decouples the transmission of infrequently changing information from the transmission of the coded sample values of the video pictures. Each VCL NAL unit contains an identifier that refers to a picture parameter set, and each picture parameter set contains an identifier that refers to a sequence parameter set. In this way a small amount of data (the identifier) can refer to a large amount of information (the parameter set), without repeating the parameter set information in every VCL NAL unit.
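The two-step indirection (slice to picture parameter set, picture parameter set to sequence parameter set) can be sketched with two dictionaries keyed by identifier. The class and field names below are hypothetical simplifications; in the real syntax the identifiers are `pic_parameter_set_id` and `seq_parameter_set_id`, and the parameter sets hold many more fields.

```python
from dataclasses import dataclass


@dataclass
class SequenceParameterSet:
    sps_id: int
    width_in_mbs: int
    height_in_mbs: int


@dataclass
class PictureParameterSet:
    pps_id: int
    sps_id: int            # reference to the SPS this PPS depends on
    entropy_coding: str    # illustrative stand-in for the many PPS fields


class ParameterSetStore:
    """Keeps the most recently received parameter sets, indexed by identifier."""

    def __init__(self):
        self.sps = {}
        self.pps = {}

    def put_sps(self, sps: SequenceParameterSet) -> None:
        self.sps[sps.sps_id] = sps   # a repeated SPS with the same id replaces the old one

    def put_pps(self, pps: PictureParameterSet) -> None:
        self.pps[pps.pps_id] = pps

    def activate(self, slice_pps_id: int):
        """Resolve the small id carried by a slice into the full parameter sets."""
        pps = self.pps[slice_pps_id]
        return pps, self.sps[pps.sps_id]


store = ParameterSetStore()
store.put_sps(SequenceParameterSet(sps_id=0, width_in_mbs=22, height_in_mbs=18))
store.put_pps(PictureParameterSet(pps_id=3, sps_id=0, entropy_coding="CAVLC"))
pps, sps = store.activate(3)   # every slice only needs to carry "3"
```

Because the store only needs the parameter sets to arrive before the slices that use them, it works the same whether they come in-band or out-of-band.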
Sequence and picture parameter sets must be transmitted before the VCL NAL units that use them, and may be transmitted repeatedly to guard against data loss. In some applications, parameter sets are sent over the same channel as the VCL NAL units, which is called "in-band" transmission; in other applications, they are sent using a more reliable mechanism than the video channel, which is called "out-of-band" transmission.
2.1.6 Access units
A set of NAL units in a specified form is called an access unit. Decoding an access unit yields one complete frame. The structure of an access unit is shown in Fig. 6.
Each access unit must contain a primary coded picture consisting of a set of VCL NAL units. The access unit may be preceded by an access unit delimiter that helps locate its start. Supplemental enhancement information, such as picture timing information, may also precede the primary coded picture.
The primary coded picture consists of a set of VCL NAL units containing the slices or slice data partitions that represent the samples of the video picture.
The primary coded picture may be followed by additional VCL NAL units called redundant coded pictures, which redundantly represent regions of the same video picture; this allows the picture to be recovered when the data of the primary coded picture is lost in transmission. Decoders are not required to decode redundant coded pictures when they are present.
Finally, if the coded frame is the last frame of a coded video sequence, an end of sequence NAL unit may be used to mark the end of the sequence; if the coded frame is the last frame of the entire NAL unit stream, an end of stream NAL unit may be used to mark the end of the stream.
2.1.7 Coded video sequences
A coded video sequence consists of a series of access units that are consecutive in the NAL unit stream and use the same sequence parameter set. Given the necessary parameter set information, each coded video sequence can be decoded independently of any other coded video sequence. The first access unit of each coded video sequence is an IDR (instantaneous decoding refresh) access unit, which contains an intra-coded frame. The presence of an IDR access unit also indicates that no subsequent frame in the stream uses a frame preceding the IDR access unit as a reference frame.
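Because every coded video sequence starts with an IDR access unit, a stream can be cut into independently decodable sequences by splitting before each IDR. This is a minimal sketch: access units are represented by hypothetical `(name, is_idr)` pairs, whereas a real parser would recognise an IDR picture from its slice NAL units (nal_unit_type 5).

```python
def split_coded_video_sequences(access_units):
    """Group (name, is_idr) access units into coded video sequences.

    A new sequence begins at every IDR access unit, so no access unit in one
    sequence needs any frame from an earlier sequence for reference.
    """
    sequences = []
    for name, is_idr in access_units:
        if is_idr:
            sequences.append([])          # IDR: start a new, independent sequence
        elif not sequences:
            raise ValueError("a NAL unit stream must begin with an IDR access unit")
        sequences[-1].append(name)
    return sequences


stream = [("I0", True), ("P1", False), ("B2", False), ("I3", True), ("P4", False)]
print(split_coded_video_sequences(stream))   # [['I0', 'P1', 'B2'], ['I3', 'P4']]
```

A player that wants to start decoding in the middle of a stream (random access) can therefore jump to any IDR access unit and begin there.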
A NAL unit stream can comprise one or several encoded video sequence.
2.2 Video coding layer
Since H.261, the ITU-T and ISO/IEC video standards have all used block-based hybrid video coding, and the H.264/AVC VCL uses the same approach; Fig. 7 shows the macroblock-based H.264/AVC encoding and decoding process. The large improvement in compression performance of H.264/AVC over other video standards does not come from any single functional module, but from the combined contribution of many small improvements.
2.2.1 Pictures, frames and fields
A coded video sequence in H.264/AVC consists of a series of coded pictures. As in the MPEG-2 video standard, an H.264/AVC picture can represent either a complete frame or a single field.
In general, a video frame can be considered to consist of two interleaved fields, the top field and the bottom field. The top field consists of the even-numbered lines and the bottom field consists of the odd-numbered lines (counting lines from 0).
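The frame/field relationship described above amounts to taking every other line. In this sketch a frame is simply a list of rows (an assumption made for illustration); lines are counted from 0, so the top field takes lines 0, 2, 4, ... and the bottom field takes lines 1, 3, 5, ..., and the frame height is assumed to be even.

```python
def split_fields(frame):
    """Separate a frame (list of rows) into its top and bottom fields."""
    top = frame[0::2]     # even-numbered lines
    bottom = frame[1::2]  # odd-numbered lines
    return top, bottom


def interleave_fields(top, bottom):
    """Rebuild the frame by alternating lines from the two fields."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)     # ['line0', 'line2'], ['line1', 'line3']
```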
2.2.2 YCbCr color space and 4:2:0 sampling
The human visual system perceives a scene through brightness and color information and is more sensitive to brightness than to color; video coding systems mainly exploit this property. Like earlier video standards, H.264/AVC uses the YCbCr color space with reduced sampling rates for Cb and Cr. Y represents luminance, and Cb and Cr represent how far the color deviates from gray toward blue and toward red, respectively.
Because the human visual system is much more sensitive to luminance than to chrominance, H.264/AVC uses a sampling structure in which each chrominance component has one quarter as many samples as the luminance component (half the resolution both horizontally and vertically). This is called 4:2:0 sampling with 8 bits of precision per sample.
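The effect of 4:2:0 sampling on storage can be worked out directly. This sketch assumes even picture dimensions and 8-bit samples (one byte each), as stated above; the function names are hypothetical.

```python
def yuv420_plane_sizes(width: int, height: int):
    """Sample counts of the Y, Cb and Cr planes for 4:2:0 sampling."""
    luma = width * height
    chroma = (width // 2) * (height // 2)   # half resolution in each direction
    return luma, chroma, chroma


def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes per frame at 8 bits per sample: 1.5 bytes per luma sample overall."""
    return sum(yuv420_plane_sizes(width, height))


print(yuv420_plane_sizes(352, 288))    # (101376, 25344, 25344) for CIF
print(yuv420_frame_bytes(352, 288))    # 152064
```

Compared with full-resolution color (4:4:4, three bytes per pixel), 4:2:0 halves the raw data before any compression takes place.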
2.2.3 Macroblocks
H.264/AVC partitions each picture into fixed-size macroblocks, each consisting of one 16×16 luminance block and two 8×8 chrominance blocks. The macroblock is the basic processing unit of the decoding process specified in the H.264/AVC standard. The basic coding algorithm for a macroblock is described after explaining how macroblocks are grouped into slices.
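The macroblock grid follows directly from the picture size. The sketch below, with hypothetical function names, counts macroblocks (rounding up partial ones, so a 1080-line picture is coded as 68 macroblock rows) and maps a macroblock address in raster order to the top-left corner of its 16×16 luminance block and its 8×8 chrominance blocks under 4:2:0 sampling.

```python
def macroblock_grid(width: int, height: int):
    """Picture size in macroblocks (width_in_mbs, height_in_mbs)."""
    return (width + 15) // 16, (height + 15) // 16


def macroblock_origin(mb_addr: int, width_in_mbs: int):
    """Top-left sample positions of a macroblock given its raster-scan address."""
    mb_x, mb_y = mb_addr % width_in_mbs, mb_addr // width_in_mbs
    luma = (16 * mb_x, 16 * mb_y)     # 16x16 luminance block
    chroma = (8 * mb_x, 8 * mb_y)     # 8x8 Cb and Cr blocks (4:2:0)
    return luma, chroma


w_mbs, h_mbs = macroblock_grid(352, 288)       # CIF: 22 x 18 = 396 macroblocks
print(w_mbs * h_mbs, macroblock_origin(23, w_mbs))
```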
2.2.4 Slices and slice groups
Without FMO (flexible macroblock ordering), a slice consists of a series of macroblocks that are consecutive in raster scan order. As shown in Fig. 8, a picture can be divided into one or more slices; a picture is therefore a collection of one or more slices. Given the active sequence and picture parameter sets, a decoder can correctly decode the sample data of a slice from that slice's own bitstream alone; information from other slices is needed only in the final deblocking filter, to smooth across slice boundaries.
FMO uses the concept of slice groups to partition a picture into slices flexibly. Each slice group is the set of macroblocks specified by a macroblock-to-slice-group map, which is defined by information in the picture parameter set and the slice headers. Each slice group consists of one or more slices, and each slice is a series of macroblocks that are consecutive in raster scan order within its slice group. Without FMO, the whole picture can be regarded as a single slice group. With FMO, a picture can be divided in many different ways, for example the two patterns shown in Fig. 9. The macroblock-to-slice-group map on the left is suited to coding a region of interest. The map on the right is suited to error concealment in video conferencing applications (note: slice group #0 and slice group #1 are sent in different packets, and one of the packets is lost). For more information on FMO, see S. Wenger (2003).
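As an illustration of a macroblock-to-slice-group map, the sketch below builds a "dispersed" map, which with two slice groups yields a checkerboard of the kind used for error concealment: every lost macroblock is then surrounded by correctly received neighbors. The formula follows, to our understanding, the dispersed map type (slice_group_map_type 1) of H.264; the function names are hypothetical, and this is not necessarily the exact pattern of Fig. 9.

```python
def dispersed_slice_group_map(width_in_mbs: int, height_in_mbs: int, num_groups: int):
    """Slice group index for each macroblock address, in raster-scan order."""
    w = width_in_mbs
    return [((i % w) + ((i // w) * num_groups) // 2) % num_groups
            for i in range(w * height_in_mbs)]


def macroblocks_in_group(mb_map, group: int):
    """Macroblock addresses of one slice group, in raster-scan order.

    The slices of this group are formed from consecutive entries of this list.
    """
    return [addr for addr, g in enumerate(mb_map) if g == group]


mb_map = dispersed_slice_group_map(4, 2, 2)
print(mb_map)                           # [0, 1, 0, 1, 1, 0, 1, 0]
print(macroblocks_in_group(mb_map, 0))  # [0, 2, 5, 7]
```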
Whether or not FMO is used, each slice can be coded using one of the following coding types:
I slice: all macroblocks in the slice are coded using intra prediction.
P slice: macroblocks in the slice can be coded using either intra prediction or inter prediction. Each inter-predicted block uses only one motion-compensated prediction signal.
B slice: in addition to the coding types available in P slices, macroblocks in a B slice can be coded using bi-predictive inter prediction. Blocks coded this way use two motion-compensated prediction signals.
These three coding types are the same as those used in previous standards. In addition, H.264/AVC adds the following two coding types:
SP slice: a switching P slice, which allows a decoder to switch directly from decoding one video stream to decoding another without an I slice having to be inserted.
SI slice: a switching I slice, coded entirely with 4×4 intra prediction, used to switch between two video streams that are completely uncorrelated.
2.2.5 Macroblock encoding and decoding process
The input video signal is divided into macroblocks, each macroblock is assigned to its slice group and slice, and then each macroblock in each slice is coded following the flow shown in Fig. 7. First, the luminance and chrominance samples of a macroblock are predicted spatially or temporally; then the difference between the original signal and the prediction is transformed with a 4×4 integer transform approximating the DCT; finally, the prediction information and the quantized transform coefficients are entropy coded. Because slices are independently decodable, macroblocks in different slices of a picture can be processed in parallel.
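The 4×4 integer transform can be written as Y = Cf · X · Cf^T, where X is the residual block. The sketch below uses the forward core transform matrix of H.264 (a fact about the standard, not stated in the text above); the post-scaling and quantization that follow in a real encoder are deliberately omitted, and the function names are hypothetical.

```python
# H.264 forward core transform matrix (integer approximation of the 4x4 DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul4(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(residual):
    """Y = CF * X * CF^T using integer arithmetic only (no scaling/quantization)."""
    return matmul4(matmul4(CF, residual), transpose4(CF))


flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)   # all energy lands in the DC coefficient
```

Because the matrix contains only small integers, the transform needs only additions and shifts, and the encoder and decoder produce bit-exact results, avoiding the drift that floating-point DCT implementations can cause.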
2.2.6 Adaptive frame/field coding
When there are moving objects or camera motion in the scene, the correlation between adjacent lines in an interlaced frame is clearly lower than in a progressively scanned frame. In this case it is more efficient to compress each field separately. To improve coding efficiency, H.264/AVC allows the encoder to choose among the following options when coding a frame:
1) Combine the two fields and code them as a single frame (frame mode).
2) Do not combine the two fields; code each field separately (field mode).
3) Combine the two fields into a single frame, but for each pair of vertically adjacent macroblocks (a macroblock pair), choose during coding either to split the pair into two field macroblocks (top field and bottom field) or to code it as two frame macroblocks.
The coding mode can be chosen freely among these three for each frame in the sequence. Choosing between the first two modes is called Picture-Adaptive Frame/Field (PAFF) coding. Tests show that for the standard sequences "Canoa" and "Rugby", PAFF coding reduces the bit rate by 16% to 20% compared with frame-only coding.
If some regions of a frame are moving and the rest are not, coding the stationary regions in frame mode and the moving regions in field mode usually improves compression further. Therefore the frame/field decision can also be made independently for each vertical macroblock pair (a 16×32 luminance region) in a frame. This is called MacroBlock-Adaptive Frame/Field (MBAFF) coding. Note that in MBAFF the frame/field decision is made at the macroblock-pair level rather than the macroblock level. This preserves the basic macroblock processing structure while still allowing motion compensation blocks as large as 16×16. Tests show that for the standard sequences "Mobile and Calendar" and "MPEG-4 World News", MBAFF coding reduces the bit rate by 14% to 16% compared with PAFF coding.
2.2.7 Intra prediction
All slice coding types support the following intra macroblock coding types: Intra_4×4, Intra_16×16 and I_PCM.
The Intra_4×4 mode predicts each 4×4 luminance block separately and is well suited to coding detailed parts of a picture. The Intra_16×16 mode predicts the whole 16×16 luminance block and is therefore better suited to coding smooth regions. With either of these luminance prediction types, the corresponding chrominance blocks are also intra predicted. Besides Intra_4×4 and Intra_16×16, there is the I_PCM mode, which sends the sample values of the current macroblock directly, without prediction or transform coding.
With Intra_4×4, each 4×4 block is predicted from the spatially neighboring samples shown in Fig. 10(a): the 16 samples of the block (labeled a-p) are predicted from previously decoded samples in neighboring blocks (labeled A-Q). There are nine 4×4 prediction modes in total. Apart from "DC" prediction, the eight directional prediction modes are shown in Fig. 10(b). These directional modes are suited to predicting directional textures in a picture, such as edges at various angles.
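The non-directional "DC" mode is the simplest of the nine Intra_4×4 modes to illustrate. The sketch below follows the H.264 rounding rules as we understand them for 8-bit video: the four decoded samples above the block and the four to its left are averaged with rounding, only one side is used if the other is unavailable, and 128 is used if neither is. Variable names are hypothetical and do not follow the A-Q labels of Fig. 10(a).

```python
def intra4x4_dc(top=None, left=None):
    """DC prediction for a 4x4 luma block from its decoded neighbors.

    top:  4 samples directly above the block, or None if unavailable
    left: 4 samples directly to the left, or None if unavailable
    """
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3    # rounded mean of 8 samples
    elif top is not None:
        dc = (sum(top) + 2) >> 2                # rounded mean of 4 samples
    elif left is not None:
        dc = (sum(left) + 2) >> 2
    else:
        dc = 128                                # mid-gray for 8-bit video
    return [[dc] * 4 for _ in range(4)]


pred = intra4x4_dc(top=[10, 20, 30, 40], left=[50, 60, 70, 80])
print(pred[0])   # [45, 45, 45, 45]
```

The encoder then codes only the residual between the actual block and this prediction.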
In Intra_16×16 mode, the whole 16×16 luma block of the macroblock is predicted at once. Intra_16×16 has four prediction modes: mode 0 is vertical prediction, mode 1 is horizontal prediction, mode 2 is DC prediction, and mode 3 is plane prediction.
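Plane prediction (mode 3) is the least obvious of the four, so here is a sketch of it. It follows the usual formulation of the plane equations, in which horizontal and vertical gradients are estimated from the neighboring row and column and a linear ramp is fitted through the block; treat it as an illustration to check against the standard rather than a reference implementation. The argument names `top`, `left` and `corner` are assumptions.

```python
def intra16x16_plane(top, left, corner, bit_depth=8):
    """Intra_16x16 plane prediction (mode 3).

    top:    16 reconstructed samples above the macroblock, p[x, -1]
    left:   16 reconstructed samples to its left, p[-1, y]
    corner: the sample above-left, p[-1, -1]
    Returns a 16x16 prediction as a list of rows.
    """
    def above(x):    # p[x, -1], where x == -1 is the corner sample
        return corner if x < 0 else top[x]

    def leftcol(y):  # p[-1, y], where y == -1 is the corner sample
        return corner if y < 0 else left[y]

    # Horizontal and vertical gradients, weighting distant sample pairs more.
    h = sum((i + 1) * (above(8 + i) - above(6 - i)) for i in range(8))
    v = sum((i + 1) * (leftcol(8 + i) - leftcol(6 - i)) for i in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    max_val = (1 << bit_depth) - 1
    return [[min(max((a + b * (x - 7) + c * (y - 7) + 16) >> 5, 0), max_val)
             for x in range(16)]
            for y in range(16)]


# Neighbours sampled from the plane 10 + 4x + 2y are extended exactly.
top = [8 + 4 * x for x in range(16)]    # row y = -1
left = [6 + 2 * y for y in range(16)]   # column x = -1
pred = intra16x16_plane(top, left, corner=4)
print(pred[0][0], pred[15][15])  # 10 100
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which is the behavior the integer formulas assume.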
Because chroma is usually smooth over large areas, chroma blocks use a prediction scheme similar to Intra_16×16. In addition, to keep slices independent of one another, intra prediction may not use samples from other slices.
2.2.8 Inter prediction
1. Inter prediction in P slices
A macroblock in a P slice can be coded in either an intra or an inter prediction mode. Each inter-predicted macroblock is partitioned as required for motion-compensated prediction: the standard allows the luma block to be partitioned into 16×16, 16×8, 8×16, or 8×8 blocks, and each 8×8 block can be further split into 8×4, 4×8, or 4×4 sub-blocks, as shown in Figure 11.
The prediction for each predictively coded M×N luma block is taken from an area of a reference frame; this motion is described by a translational motion vector and a reference frame index. In the extreme case, where a macroblock is split into four 8×8 blocks and each 8×8 block is further split into four 4×4 blocks, 16 motion vectors must be transmitted for a single macroblock.
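The partition rules above determine how many motion vectors a P macroblock carries. The following sketch only counts them; the dictionary and function names are illustrative assumptions, and it assumes a single reference list as in P slices.

```python
# Luma partitions of a P macroblock and of each 8x8 sub-macroblock (Figure 11).
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_partition, sub_partitions=()):
    """Number of motion vectors in one P macroblock (single reference list).

    sub_partitions lists the split of each 8x8 block and is only used
    when mb_partition == "8x8".
    """
    if mb_partition != "8x8":
        return MB_PARTITIONS[mb_partition]
    if len(sub_partitions) != 4:
        raise ValueError("an 8x8 partitioning needs four sub-partition types")
    return sum(SUB_PARTITIONS[s] for s in sub_partitions)


print(motion_vector_count("16x8"))                                # 2
print(motion_vector_count("8x8", ["4x4"] * 4))                    # 16, the extreme case
print(motion_vector_count("8x8", ["8x8", "8x4", "4x8", "4x4"]))   # 9
```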
Motion compensation has an accuracy of one quarter of the luma sample spacing. If a motion vector points to an integer sample position, the prediction signal consists of the corresponding samples of the reference frame; otherwise the prediction signal is obtained by interpolation. Values at half-sample positions are computed with a one-dimensional 6-tap FIR filter applied horizontally or vertically. Values at quarter-sample positions are the average of the neighboring samples at integer and half-sample positions.
Chroma prediction values are obtained by bilinear interpolation. Because chroma is sampled at a lower resolution than luma, chroma displacements have one-eighth-sample accuracy.
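A one-dimensional sketch of the luma interpolation: half-sample values use the 6-tap filter with taps (1, -5, 20, 20, -5, 1) and rounding by 16 before the shift by 5, and quarter-sample values are the rounded-up average of the two nearest integer and half-sample values. Positions outside the row repeat the edge sample, matching the boundary extension described below. The helper names are assumptions, and the sketch covers only horizontal (or vertical) positions; the centre half-sample position, which filters unrounded intermediate values in both directions, is not shown.

```python
def clip1(value, bit_depth=8):
    """Clip a sample to the valid range [0, 2**bit_depth - 1]."""
    return max(0, min(value, (1 << bit_depth) - 1))


def sample(row, i):
    """Integer sample, repeating the nearest edge sample outside the row."""
    return row[max(0, min(i, len(row) - 1))]


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (6-tap FIR)."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * sample(row, i - 2 + k) for k, t in enumerate(taps))
    return clip1((acc + 16) >> 5)


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (sample(row, i) + half_sample(row, i) + 1) >> 1


row = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_sample(row, 3), quarter_sample(row, 3))  # 35 33
```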
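The chroma bilinear interpolation weights the four surrounding integer samples by their distance in eighth-sample units. The sketch assumes a 4:2:0 chroma plane stored as a list of rows and a chroma motion vector already expressed in 1/8-sample units; the function name is hypothetical.

```python
def chroma_sample(ref, mv_x, mv_y, x, y):
    """Predict chroma sample (x, y) displaced by (mv_x, mv_y) in 1/8-sample units.

    ref is a list of rows; positions outside the picture repeat the edge sample.
    """
    h, w = len(ref), len(ref[0])

    def at(px, py):
        return ref[max(0, min(py, h - 1))][max(0, min(px, w - 1))]

    xi, dx = x + (mv_x >> 3), mv_x & 7   # integer part and eighth-sample fraction
    yi, dy = y + (mv_y >> 3), mv_y & 7
    a, b = at(xi, yi), at(xi + 1, yi)
    c, d = at(xi, yi + 1), at(xi + 1, yi + 1)
    return ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
            + (8 - dx) * dy * c + dx * dy * d + 32) >> 6


ref = [[0, 80], [80, 160]]
print(chroma_sample(ref, 4, 4, 0, 0))   # 80  (half-way between all four samples)
print(chroma_sample(ref, 0, 0, 1, 1))   # 160 (integer position)
```

The weights always sum to 64, so the shift by 6 with a rounding offset of 32 normalizes the result.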
H.264/AVC adopts the quarter-sample motion prediction proposed by T. Wedi (2003), an important improvement over earlier standards. H.264/AVC also allows motion vectors to point outside the picture boundary; in that case the reference picture is extended by repeating the picture sample nearest to the position the motion vector points to. The motion vector transmitted in the bitstream is the difference between the current motion vector and a predicted motion vector, which is the median of the motion vectors of neighboring blocks. In addition, motion vector prediction may not cross slice boundaries.
H.264/AVC supports the multi-frame motion-compensated prediction proposed by T. Wiegand (1999, 2001): several previously coded frames can serve as references for motion-compensated prediction, as shown in Figure 12.
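The median rule can be sketched as below, assuming the usual neighbors A (left), B (above) and C (above-right) are all available. The standard adds special cases (directional prediction for 16×8 and 8×16 partitions, unavailable neighbors, a single neighbor with the same reference index) that this sketch deliberately omits; the function names are hypothetical.

```python
def median(a, b, c):
    """Median of three integers."""
    return a + b + c - min(a, b, c) - max(a, b, c)


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the left (A), above (B) and above-right (C) MVs."""
    return (median(mv_a[0], mv_b[0], mv_c[0]),
            median(mv_a[1], mv_b[1], mv_c[1]))


def encode_mvd(mv, mv_a, mv_b, mv_c):
    """Motion vector difference written to the bitstream."""
    px, py = predict_mv(mv_a, mv_b, mv_c)
    return (mv[0] - px, mv[1] - py)


def decode_mv(mvd, mv_a, mv_b, mv_c):
    """Decoder side: add the difference back to the same prediction."""
    px, py = predict_mv(mv_a, mv_b, mv_c)
    return (mvd[0] + px, mvd[1] + py)


neighbours = ((4, -2), (6, 0), (5, 8))            # quarter-sample units
mvd = encode_mvd((7, 1), *neighbours)
print(predict_mv(*neighbours), mvd)               # (5, 0) (2, 1)
assert decode_mv(mvd, *neighbours) == (7, 1)
```

Because encoder and decoder form the same prediction from already decoded neighbors, only the usually small difference needs to be entropy coded.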
Multi-frame motion-compensated prediction requires both the encoder and the decoder to store reference frames in a multi-picture buffer. The decoder replicates the encoder's multi-picture buffer according to the memory management control operations (MMCO) specified in the bitstream. Unless the multi-picture buffer holds only one frame, the index of the reference frame within the buffer must be signaled. A reference index is transmitted for each motion-compensated 16×16, 16×8, 8×16, or 8×8 luma block; motion-compensated blocks smaller than 8×8 within the same 8×8 block must use the same reference frame.
In addition to the motion-compensated macroblock modes described above, a P macroblock can also be coded in P_Skip mode. In this mode neither quantized transform coefficients nor motion vectors or reference indices are transmitted. A P_Skip macroblock uses a 16×16 partition, reference index 0, and the predicted motion vector. The purpose of P_Skip is to represent large areas with no change or with constant motion using very few bits.
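A minimal sliding-window model of the multi-picture buffer helps show how a reference index selects a frame. The class name and its methods are hypothetical, and the model assumes only short-term references, no MMCO commands, and an ordering in which index 0 is the most recently decoded frame.

```python
from collections import deque


class MultiPictureBuffer:
    """Sliding-window model of a short-term reference buffer.

    Simplifications (assumptions): only short-term references, no MMCO commands,
    and reference index 0 always denotes the most recently decoded frame.
    """

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)   # oldest frame drops out when full

    def store(self, frame):
        self.frames.appendleft(frame)          # newest frame gets index 0

    def reference(self, ref_idx):
        return self.frames[ref_idx]

    def needs_ref_idx(self):
        """A reference index is only signalled when more than one frame can be held."""
        return self.frames.maxlen > 1


buf = MultiPictureBuffer(capacity=3)
for name in ["f0", "f1", "f2", "f3"]:
    buf.store(name)
print(list(buf.frames), buf.reference(2))   # ['f3', 'f2', 'f1'] f1
```

Because the decoder applies the same storage rules (and, in the real standard, the same MMCO commands), its buffer stays identical to the encoder's, so a transmitted index always refers to the same frame on both sides.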
2. Inter prediction in B slices
H.264/AVC adopts the B slices proposed by M. Flierl et al. (1998, 2003), and B slices can themselves be used as references for motion-compensated prediction. The essential difference between B and P slices is that some macroblocks or blocks in a B slice may use a weighted average of two distinct motion-compensated predictions as their prediction signal. B slices use two reference picture lists, list 0 and list 1.
B slices support four inter prediction types: list 0, list 1, bi-predictive, and direct prediction. In bi-predictive mode, the prediction signal is a weighted average of the motion-compensated list 0 and list 1 prediction signals. In direct mode, the prediction is inferred from previously transmitted syntax elements and can be list 0, list 1, or bi-predictive.
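In the default (unweighted) case, the bi-predictive average is simply the rounded mean of the two motion-compensated blocks. The sketch below shows only that case; explicit weights and offsets are not modeled, and the function name is hypothetical.

```python
def bipred_average(pred0, pred1):
    """Default (unweighted) bi-prediction: rounded mean of list 0 and list 1 blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


print(bipred_average([[10, 11], [200, 0]], [[20, 20], [201, 1]]))
# [[15, 16], [201, 1]]
```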
B slices use a macroblock partitioning similar to that of P slices. In addition to the P_16×16, P_16×8, P_8×16, P_8×8, and intra coding types, B slices support bi-predictive and direct prediction. For each 16×16, 16×8, 8×16, and 8×8 partition, the prediction method (list 0, list 1, or bi-predictive) can be chosen separately. An 8×8 partition of a B macroblock can also be coded in direct mode. If no quantized transform coefficients need to be transmitted for a direct-mode macroblock, the mode is called B_Skip and is coded in a similar way to P_Skip in P slices. Motion vectors are coded much as in P slices, with some adjustments because neighboring blocks may be coded in different prediction modes.
2.2.9 Transform and quantization
As in earlier video coding standards, H.264/AVC transform-codes the prediction residual. Unlike them, however, H.264/AVC uses a 4×4 integer transform whose properties are similar to those of a 4×4 discrete cosine transform (DCT). The transform matrix is as follows:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
Because the inverse transform is defined with exact integer operations, the inverse-transform mismatch problems of other standards are avoided. The basic transform coding process is very similar to that of other standards: the encoder performs the forward transform, zigzag scanning, quantization, and entropy coding, and the decoder performs the inverse steps. For details of the H.264/AVC transform, see H. Malvar et al. (2003).
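The claim that the transform needs only additions and shifts can be checked directly. The sketch below computes the core transform Y = C X Cᵀ, with C the 4×4 integer matrix [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]], using a butterfly form, then compares the result against a plain matrix product. It covers only the core transform; the scaling that H.264/AVC folds into quantization is omitted, and the function names are hypothetical.

```python
# Core matrix of the 4x4 forward integer transform.
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def core_1d(v):
    """Compute C @ v for a 4-element vector using only additions and shifts."""
    s03, d03 = v[0] + v[3], v[0] - v[3]
    s12, d12 = v[1] + v[2], v[1] - v[2]
    return [s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1)]


def forward_core(block):
    """Y = C X C^T for a 4x4 residual block (scaling left to quantization)."""
    cols = [core_1d([block[r][c] for r in range(4)]) for c in range(4)]
    tmp = [[cols[c][r] for c in range(4)] for r in range(4)]   # tmp = C X
    return [core_1d(row) for row in tmp]                        # (C X) C^T


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
C_T = [list(col) for col in zip(*C)]
assert forward_core(X) == matmul(matmul(C, X), C_T)   # butterflies match the matrix form
print(forward_core(X)[0][0])   # 140, the DC coefficient (sum of all residuals)
```

Every operation is an integer add, subtract, or shift by one, so encoder and decoder implementations produce bit-identical results.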
As noted earlier, the Intra_16×16 prediction mode and chroma intra prediction are intended for coding smooth areas. For Intra_16×16, the DC coefficients of the sixteen transformed 4×4 blocks undergo a second, 4×4 Hadamard transform. For each chroma component, the DC coefficients of its four transformed 4×4 blocks undergo a second, 2×2 Hadamard transform. The second stage is used because, for a smooth area, the reconstruction error of a two-dimensional transform is inversely proportional to the one-dimensional size of the transform. Hence, for very smooth areas, the second-stage transform yields a much smaller reconstruction error than using the 4×4 transform alone.
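The second-stage transforms can be sketched as unnormalized Hadamard products H W H applied to the array W of DC coefficients. Normalization and the scaling folded into quantization are left out, and the names are hypothetical. The example shows why the second stage helps in flat regions: sixteen equal DC values collapse into a single nonzero coefficient.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]
H2 = [[1, 1],
      [1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard(dc, h):
    """Unnormalised second-stage transform H W H (H is symmetric)."""
    return matmul(matmul(h, dc), h)


# A flat Intra_16x16 macroblock: all sixteen 4x4 DC coefficients are equal,
# so the second stage compacts them into a single coefficient.
flat = hadamard([[7] * 4 for _ in range(4)], H4)
print(flat[0][0], sum(abs(v) for row in flat for v in row) - abs(flat[0][0]))  # 112 0
print(hadamard([[1, 2], [3, 4]], H2))   # [[10, -2], [-4, 0]]
```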
The small 4×4 transform was chosen for the following reason: the improved intra and inter prediction of H.264/AVC leaves residual signals with little spatial correlation. The transform therefore has less work to do in removing correlation, and a 4×4 transform is sufficient to remove the statistical correlation.
At the same objective compression performance, the 4×4 transform visually reduces the noise around edges (commonly known as ringing artifacts). The smaller transform also requires less computation and a shorter processing word length. And because the H.264/AVC transform uses only additions and shifts, it avoids encoder/decoder mismatch (a problem shared by all standards based on the 8×8 DCT).
H.264/AVC controls the quantization of transform coefficients with a quantization parameter (QP), which can take any integer value from 0 to 51. Increasing QP by 1 increases the quantization step size by about 12%, and increasing QP by 6 doubles it. Note that a 12% increase in step size corresponds roughly to a 12% reduction in bit rate.
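The QP-to-step-size relationship can be written as a base table for QP 0 to 5 that doubles every 6 steps. The base values below are the nominal step sizes commonly quoted when describing the design; the standard itself works with integer scaling tables, so treat this as an illustration of the "+6 doubles the step" rule rather than a normative table.

```python
# Nominal step sizes for QP 0..5; each further +6 doubles the step.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Nominal quantization step size for 0 <= qp <= 51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


print(qstep(4), qstep(10), qstep(51))   # 1.0 2.0 224.0
print(qstep(29) / qstep(28))            # 1.125, i.e. one QP step is about +12%
```

Individual steps vary between about 10% and 18%, averaging 2^(1/6), which is roughly 1.12.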
The quantized transform coefficients are usually scanned in zigzag order and then entropy coded and transmitted. The 2×2 chroma DC coefficients are scanned in raster order. The H.264/AVC inverse transform can be implemented using only additions and shifts on 16-bit integers; likewise, the encoder needs only 16-bit operations to implement the forward transform and quantization.
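The zigzag order for a frame-coded 4×4 block can be written as a list of raster indices. The sketch below serializes a block so that the low-frequency coefficients come first and the zeros cluster at the end, which is what the entropy coder exploits. The table and function names are hypothetical, and field-coded blocks use a different scan that is not shown.

```python
# Raster indices (row * 4 + column) of a 4x4 block in zigzag order (frame coding).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Serialize a 4x4 block of quantized coefficients in zigzag order."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in ZIGZAG_4X4]


def chroma_dc_scan(dc):
    """The 2x2 chroma DC block is read in raster order."""
    return [dc[0][0], dc[0][1], dc[1][0], dc[1][1]]


coeffs = [[9, 3, 0, 0],
          [2, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
print(zigzag_scan(coeffs))  # [9, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```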
2.2.10 Entropy coding
H.264/AVC supports two entropy coding methods. In the variable-length method, all syntax elements other than the quantized transform coefficients are coded with a single table of Exp-Golomb codes. This avoids designing a different VLC table for each syntax element; only the mapping of each syntax element onto this single code table needs to be tailored to its statistics.
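The single code table is the unsigned Exp-Golomb code: a value k is written as the binary form of k + 1 preceded by one fewer leading zeros than it has bits. The sketch represents bitstreams as Python strings of "0"/"1" for readability, which is an assumption made for illustration only. It also shows the usual mapping of signed values onto code numbers.

```python
def ue_encode(k):
    """Unsigned Exp-Golomb codeword ue(v) for k >= 0, as a bit string."""
    if k < 0:
        raise ValueError("ue(v) codes non-negative integers")
    info = bin(k + 1)[2:]                 # binary representation of k + 1
    return "0" * (len(info) - 1) + info   # prefix of leading zeros, then k + 1


def ue_decode(bits, pos=0):
    """Decode one ue(v) codeword starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def se_to_ue(v):
    """Map a signed value to a codeNum: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


print([ue_encode(k) for k in range(5)])  # ['1', '010', '011', '00100', '00101']
stream = ue_encode(3) + ue_encode(0)
print(ue_decode(stream), ue_decode(stream, 5))  # (3, 5) (0, 6)
```

Only the mapping from each syntax element's values to code numbers changes between syntax elements; the codeword construction stays the same.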
For transmitting quantized transform coefficients, H.264/AVC uses a more efficient method called CAVLC (Context-Adaptive Variable Length Coding). In this scheme, different syntax elements use different VLC tables. Because these tables are designed according to the statistics of the corresponding syntax elements, entropy coding performance is better than with a single VLC table.
Entropy coding efficiency can be improved further with CABAC (Context-Adaptive Binary Arithmetic Coding), proposed by D. Marpe et al. (2003). First, arithmetic coding allows a non-integer number of bits to be assigned to each symbol, which is highly beneficial for symbols with probabilities greater than 0.5. Another key property of CABAC is its context modeling: the statistics of already coded syntax elements are used to estimate conditional probabilities, which in turn are used to switch between probability models. The arithmetic coding engine and its associated probability estimation in H.264/AVC are implemented with table lookups, avoiding multiplications and reducing complexity. Compared with CAVLC, CABAC reduces the bit rate by 5% to 15%, with the largest gains for interlaced TV signals.
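The benefit of non-integer code lengths can be checked with a little arithmetic. A VLC must spend at least one bit per binary symbol, while an ideal entropy coder, which arithmetic coding approaches, spends -log2(p) bits on a symbol of probability p. The sketch uses a static empirical model; it does not implement CABAC's adaptive probability states or its table-driven coding engine.

```python
import math


def ideal_code_length(p):
    """Bits an ideal entropy coder spends on a symbol of probability p."""
    return -math.log2(p)


def ideal_total_bits(counts):
    """Ideal length of a sequence, using its empirical symbol probabilities."""
    n = sum(counts.values())
    return sum(c * ideal_code_length(c / n) for c in counts.values() if c)


counts = {0: 90, 1: 10}   # a skewed binary source, e.g. "coefficient is zero"
print(round(ideal_code_length(0.9), 3))        # 0.152 bits for the likely symbol
print(round(ideal_total_bits(counts), 1))      # 46.9 bits, versus 100 bits at 1 bit/symbol
```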
2.2.11 In-loop deblocking filter
A characteristic of block-based coding is that decoded pictures show visible block structures. Reconstruction errors are usually larger at block edges than in block interiors, and this "blocking" artifact is the most visible distortion in current coding technology. H.264/AVC uses the adaptive in-loop deblocking filter proposed by P. List et al. (2003) to smooth out blocking artifacts.
The basic idea of deblocking is as follows: if a relatively large absolute difference is measured between samples on either side of a block edge, a blocking artifact is likely, so the difference should be reduced by filtering. However, if the difference is so large that it cannot be explained by the coarseness of quantization, the edge probably reflects a real edge in the original picture and should not be smoothed. Figure 13 illustrates the principle with a one-dimensional edge. Samples $p_0$ and $q_0$ are filtered only if

$$|p_0 - q_0| < \alpha(QP) \quad\text{and}\quad |p_1 - p_0| < \beta(QP) \quad\text{and}\quad |q_1 - q_0| < \beta(QP),$$

where $\beta(QP)$ is smaller than $\alpha(QP)$. In addition, $p_1$ is filtered if $|p_2 - p_0| < \beta(QP)$, and $q_1$ is filtered if $|q_2 - q_0| < \beta(QP)$.
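The filtering decision can be written as a small predicate. In this sketch `alpha` and `beta` stand in for α(QP) and β(QP), which in the standard come from QP-indexed tables. The numeric values in the example are arbitrary illustrations, and the boundary-strength logic and the filter taps themselves are omitted.

```python
def filter_edge_samples(p, q, alpha, beta):
    """Decide which samples next to a block edge are filtered.

    p = [p0, p1, p2] and q = [q0, q1, q2] are the samples on either side of the
    edge, p0/q0 closest to it. alpha and beta stand in for alpha(QP) and beta(QP).
    Returns (filter_p0_q0, filter_p1, filter_q1).
    """
    edge = (abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)
    filter_p1 = edge and abs(p[2] - p[0]) < beta
    filter_q1 = edge and abs(q[2] - q[0]) < beta
    return edge, filter_p1, filter_q1


print(filter_edge_samples([60, 61, 60], [66, 65, 72], alpha=15, beta=4))     # (True, True, False)
print(filter_edge_samples([20, 20, 20], [200, 200, 200], alpha=15, beta=4))  # (False, False, False)
```

The second call models a genuine picture edge: the step of 180 exceeds α, so nothing is smoothed.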
The overall idea of deblocking is to reduce blocking artifacts while leaving the true edges of the original picture essentially unchanged, which greatly improves subjective quality. At the same objective quality, using the filter reduces the bit rate by 5% to 10% compared with not using it. Figure 14 shows the performance of the deblocking filter.
2.2.12 Hypothetical reference decoder
A video standard must ensure that every compliant decoder can decode every bitstream that conforms to the standard. Describing the decoding algorithm alone is not enough for this: in a real-time system, the standard must also specify how bits are fed to the decoder and when decoded pictures can be removed. A hypothetical reference decoder (HRD) is therefore defined, specifying input and output buffer models as an implementation-independent receiver model. An encoder is considered conforming only if the bitstreams it generates can be decoded by the HRD. Consequently, any receiver implementation that imitates the behavior of the HRD is guaranteed to decode all conforming bitstreams correctly.
The H.264/AVC HRD specifies the operation of two buffers: (1) the coded picture buffer (CPB) and (2) the decoded picture buffer (DPB). Its design is similar to that of MPEG-2, but it is far more flexible for video transmission and fully supports variable bit rate transmission with a fixed delay.
Unlike MPEG-2, H.264/AVC uses multiple reference frames, and the order of reference frames can differ from the display order. The H.264/AVC HRD therefore also specifies a buffer management model for decoded pictures, ensuring that an appropriate amount of memory is used to store the reference frames at the decoder.
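The role of the coded picture buffer can be illustrated with a simplified constant-bit-rate leaky-bucket check. This is a sketch in the spirit of the HRD, not the HRD itself. It assumes bits arrive at a fixed rate, each picture is removed instantaneously at a fixed interval after an initial delay, and all parameter names are hypothetical. It reports the first picture that would overflow the buffer or would not have fully arrived by its removal time.

```python
from fractions import Fraction


def first_cpb_violation(picture_bits, bit_rate, frame_rate, cpb_size, initial_delay):
    """Simplified constant-bit-rate CPB (leaky-bucket) check.

    Bits enter the buffer at bit_rate until the whole stream has arrived; picture i
    is removed instantaneously at initial_delay + i / frame_rate seconds.
    Returns None if the stream fits, else (picture_index, "overflow" | "underflow").
    """
    total = sum(picture_bits)
    delay = Fraction(initial_delay)
    removed = 0
    for i, bits in enumerate(picture_bits):
        t = delay + Fraction(i, frame_rate)
        arrived = min(bit_rate * t, total)
        fullness = arrived - removed        # buffer level just before removal
        if fullness > cpb_size:
            return i, "overflow"
        if fullness < bits:
            return i, "underflow"           # picture not completely received in time
        removed += bits
    return None


print(first_cpb_violation([100] * 5, 1000, 10, 500, "0.1"))           # None
print(first_cpb_violation([100, 500, 100], 1000, 10, 500, "0.1"))     # (1, 'underflow')
print(first_cpb_violation([100] * 5, 1000, 10, 150, "0.2"))           # (0, 'overflow')
```

Exact `Fraction` arithmetic is used so that removal times such as 0.1 s do not suffer floating-point rounding at the buffer-level comparisons.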
2.3 Key techniques of H.264/AVC
Compared with previous video coding standards, H.264/AVC improves coding efficiency through the following prediction techniques:
● Variable block-size motion compensation: H.264/AVC is much more flexible than other standards in choosing motion compensation block sizes, with a minimum block size of 4×4.
● Quarter-sample motion compensation: most previous standards use at most half-sample motion vectors. H.264/AVC improves on this with quarter-sample motion vectors, an idea that originated in the Advanced Simple Profile of the MPEG-4 Visual standard (Part 2).
● Motion vectors over picture boundaries: MPEG-2 and earlier standards required motion vectors to point only within the reference frame. H.264/AVC allows motion vectors to point outside the boundary of the reference frame, with samples outside the boundary extrapolated from the edge samples. This boundary extrapolation technique originated as an optional feature of H.263.
● Multiple reference frames: in MPEG-2 and earlier standards, predictively coded ("P") frames use only the immediately preceding frame to predict the current frame. H.264/AVC extends the enhanced reference picture selection technique of H.263++, allowing the encoder to choose matching blocks from among several decoded frames. H.264/AVC also allows multiple references in bi-predictive "B" frames, whereas in MPEG-2 a B frame may use only two specific frames: the previous I or P frame in display order and the next I or P frame in display order.
● Decoupling of reference order from display order: in standards before H.264/AVC, the order of motion compensation reference frames was strictly tied to the display order. H.264/AVC removes this dependency and lets the encoder choose the reference order and display order flexibly, the only constraint being the memory capacity of the decoder. Removing the dependency also eliminates the extra delay previously associated with displaying bi-predictive frames.
● Decoupling of picture representation from referencing capability: in standards before H.264/AVC, B frames could not be used as references. The new standard removes this restriction, giving the encoder more flexibility and, in many cases, allowing it to use a frame closer to the one being coded as the reference.
● Weighted prediction: H.264/AVC allows the motion-compensated prediction to be weighted and offset as specified by the encoder. This innovation greatly improves coding efficiency for fade-in and fade-out scenes.
● Improved "skipped" and "direct" motion inference: in standards before H.264/AVC, a "skipped" area of a predictively coded picture had to have a zero motion vector. As a result, when the motion between the current frame and its reference is global motion, skip mode cannot be used, which hurts compression. In H.264/AVC, skip mode instead uses the motion vector predicted from neighboring blocks as the motion vector of the current macroblock. For bi-predictive areas (B slices), H.264/AVC also uses an improved "direct" motion compensation mode that has advantages over the "direct" modes of H.263+ and MPEG-4 (Part 2).
● Directional spatial prediction for intra coding: H.264/AVC predicts intra-coded macroblocks, including intra macroblocks in P frames, in the spatial domain. The current block is extrapolated from the samples of already decoded blocks of the current frame, which improves prediction quality.
● In-loop deblocking filter: block-based video coding produces blocking artifacts, which arise from block-based prediction and transform. An adaptive deblocking filter is an effective way to improve video quality and, if well designed, can greatly improve both subjective and objective quality. The deblocking filter of H.264/AVC is based on an idea from an optional feature of H.263+. In H.264/AVC the filter is placed inside the motion-compensated prediction loop, so frames of improved quality are used as references for subsequent inter-coded frames, which improves prediction quality.
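The weighted prediction item in the list above can be made concrete with the single-reference-list case: the prediction sample is scaled by an encoder-chosen weight over a power-of-two denominator, rounded, offset, and clipped. The sketch assumes 8-bit samples; the function and parameter names are hypothetical, and the bi-predictive weighted case is not shown.

```python
def clip1(value, bit_depth=8):
    return max(0, min(value, (1 << bit_depth) - 1))


def weighted_sample(pred, weight, offset, log_wd):
    """Explicit weighted prediction of one sample from a single reference list.

    weight, offset and log_wd (the weight denominator exponent) are encoder-chosen
    values carried in the slice header; the formula follows the single-list case.
    """
    if log_wd >= 1:
        scaled = (pred * weight + (1 << (log_wd - 1))) >> log_wd
    else:
        scaled = pred * weight
    return clip1(scaled + offset)


# A fade to black: the current frame is about half as bright as the reference.
print(weighted_sample(100, weight=32, offset=0, log_wd=5))    # 100 (unit weight)
print(weighted_sample(100, weight=16, offset=0, log_wd=5))    # 50
print(weighted_sample(100, weight=16, offset=-10, log_wd=5))  # 40
```

Without weighting, every sample of a fading frame would leave a large, uniform residual to code; one weight and one offset per reference remove most of it.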
Besides the enhanced prediction methods summarized above, other parts of H.264/AVC also improve coding efficiency:
● Small block-size transform: all video coding standards before H.264/AVC used an 8×8 DCT, whereas H.264/AVC primarily uses a 4×4 transform, which effectively reduces ringing artifacts. The 4×4 transform also matches the 4×4 prediction block size.
● Hierarchical block transform: although a 4×4 transform is beneficial in most cases, some signals are better suited to larger transforms. H.264/AVC allows an 8×8 transform for low-frequency chroma information and, correspondingly, a 16×16 transform for low-frequency luma information in intra-coded macroblocks.
● Short word-length transform: H.264/AVC uses 16-bit transform arithmetic, whereas previous standards used 32-bit transform arithmetic.
● Exact-match inverse transform: because previous video coding standards used a floating-point DCT, the inverse transform was inexact; for the same compressed sequence, the video output of different decoders could differ slightly, degrading video quality. H.264/AVC uses an integer transform, which guarantees that all decoders produce identical output.
● Arithmetic entropy coding: H.264/AVC adopts an advanced entropy coding method, arithmetic coding, which was an optional feature of H.263. H.264/AVC uses an enhanced form of arithmetic coding called CABAC (Context-Adaptive Binary Arithmetic Coding).
● Context-adaptive entropy coding: the two entropy coding methods of H.264/AVC are called CAVLC (Context-Adaptive Variable-Length Coding) and CABAC. Their advance over earlier standards lies in context-based adaptivity.
H.264/AVC also adopts several new techniques that provide robustness to data errors and losses and flexibility for operation in a variety of network environments. These include:
● Parameter set structure: in standards before H.264/AVC, losing a few key bits of the sequence header or picture header could prevent the decoder from producing correct output. H.264/AVC separates this key information out, making its transmission more flexible and allowing the header information to be delivered efficiently and reliably.
● NAL unit syntax structure: each H.264/AVC syntax structure is encapsulated in a logical data packet called a NAL unit, which allows the video data to be carried over a particular network in a customized way.
● Flexible slice size: unlike the fixed slice structure of MPEG-2, H.264/AVC allows slice sizes to be set flexibly. Slices have both advantages and drawbacks: on the one hand, they add header data and reduce the effectiveness of prediction, which lowers coding efficiency; on the other hand, they increase the degree of parallelism available to the encoder and help make network transmission more robust.
● Flexible macroblock ordering (FMO): H.264/AVC partitions a picture into regions called slice groups, and each slice is an independently decodable subset of a slice group. Used appropriately, FMO can significantly improve the robustness of data transmission by managing the spatial relationship between the regions coded in each slice. FMO can also be used for other purposes.
● Arbitrary slice ordering (ASO): because each slice of a frame can be decoded independently of the other slices in the frame, H.264/AVC allows the slices of a frame to be sent and received in any order. This can reduce end-to-end delay in real-time applications and is well suited to networks with out-of-order delivery, such as IP networks.
● Redundant pictures: to improve robustness to data loss, H.264/AVC allows the encoder to transmit redundant representations of picture regions. For example, if the high-resolution picture is lost, a low-resolution redundant representation of the current frame can be used instead.
● Slice data partitioning: because some coded information (such as motion vectors and other prediction information) is more important than other information in the bitstream, H.264/AVC allows the syntax elements of each slice to be separated into up to three partitions for transmission.
● SP/SI synchronization/switching pictures: SP/SI are picture types newly introduced in H.264/AVC. With SP/SI pictures, the decoder can switch between different bitstreams or recover from data loss without needing I frames, and fast-forward and fast-rewind functions are supported efficiently.
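Several of the robustness tools above (parameter sets, NAL units, data partitioning) show up directly in the one-byte NAL unit header. The sketch splits that header into its three fields and names a few common NAL unit types. The dictionary is a partial, illustrative selection and the function name is hypothetical.

```python
NAL_TYPES = {1: "non-IDR slice", 2: "slice data partition A", 3: "slice data partition B",
             4: "slice data partition C", 5: "IDR slice", 6: "SEI",
             7: "sequence parameter set", 8: "picture parameter set", 9: "access unit delimiter"}


def parse_nal_header(first_byte):
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    forbidden_zero_bit = first_byte >> 7
    nal_ref_idc = (first_byte >> 5) & 0x3     # 0 means not used for reference
    nal_unit_type = first_byte & 0x1F
    if forbidden_zero_bit:
        raise ValueError("forbidden_zero_bit must be 0")
    return nal_ref_idc, nal_unit_type, NAL_TYPES.get(nal_unit_type, "other")


print(parse_nal_header(0x67))  # (3, 7, 'sequence parameter set')
print(parse_nal_header(0x65))  # (3, 5, 'IDR slice')
```

Because parameter sets travel in their own NAL units, they can be sent out of band or repeated, which is how the loss of a few header bits no longer breaks decoding.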
2.4 H.264/AVC profiles and levels
Profiles and levels provide a means of defining subsets of the syntax and semantics of the H.264/AVC standard, and thereby the decoder capabilities required to decode a particular bitstream. A profile is a defined subset of the entire bitstream syntax specified by H.264/AVC. A level is a specified set of constraints imposed on parameters in the bitstream. Conformance is tested against a specified profile at a specified level.
In the parts of H.264/AVC that define profiles and levels, syntax elements and parameters that are not directly constrained may take any value allowed by the standard. A decoder conforms to a given profile at a given level if it correctly decodes all allowed values of the syntax elements specified for that profile at that level. A bitstream conforms to a given profile at a given level if it does not exceed the allowed ranges of values and contains no disallowed syntax elements. An encoder is not required to use all syntax elements of a given profile at a given level, but it must generate bitstreams that conform to that profile and level.
The H.264/AVC standard specifies three profiles: Baseline, Main, and Extended.
The Baseline profile supports all features of H.264/AVC except the following two feature sets:
● Set 1: B slices, weighted prediction, CABAC, field coding, and picture- or macroblock-adaptive frame/field coding.
● Set 2: SP/SI slices and slice data partitioning.
The Main profile supports Set 1. However, it does not support FMO, ASO, and redundant pictures, which the Baseline profile supports. Consequently, of the coded video sequences that a Baseline decoder can decode, a Main decoder can decode only a subset. Flags in the sequence parameter set indicate which profiles' decoders can decode a coded video sequence.
The Extended profile supports all features of the Baseline profile, plus all features in both sets above except CABAC.
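The profile definitions above reduce to set inclusion. The sketch encodes the three feature sets exactly as described in the text, using informal tool names rather than real syntax elements, plus a hypothetical `CORE` set standing in for the tools all three profiles share. It then lists which profiles cover a given set of tools.

```python
# Feature sets as described in the text (informal names, not syntax elements).
SET_1 = {"B slices", "weighted prediction", "CABAC", "field coding",
         "adaptive frame/field coding"}
SET_2 = {"SP/SI slices", "data partitioning"}
BASELINE_EXTRAS = {"FMO", "ASO", "redundant pictures"}
CORE = {"I slices", "P slices", "CAVLC"}   # hypothetical stand-in for the shared tools

PROFILES = {
    "Baseline": CORE | BASELINE_EXTRAS,
    "Main": CORE | SET_1,
    "Extended": (CORE | BASELINE_EXTRAS | SET_1 | SET_2) - {"CABAC"},
}


def decodable_by(tools):
    """Profiles whose feature set covers every tool used by a stream."""
    return [name for name, features in PROFILES.items() if set(tools) <= features]


print(decodable_by({"P slices", "CAVLC"}))   # ['Baseline', 'Main', 'Extended']
print(decodable_by({"P slices", "CABAC"}))   # ['Main']
print(decodable_by({"FMO", "B slices"}))     # ['Extended']
print(decodable_by({"FMO", "CABAC"}))        # []
```

The last case shows why Main is not a superset of Baseline: a stream combining FMO with CABAC fits none of the three profiles.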
All three profiles share the same level definitions. The standard defines 15 levels in total, which specify limits on picture size, decoder processing rate, multi-picture buffer size, video bit rate, video buffer size, and so on. A given implementation may support any level of a profile.
The high compression performance of H.264/AVC will not only enhance existing applications but also extend video applications into new areas. Possible applications of H.264/AVC are listed below:
● Conversational services: these services typically require a transmission bandwidth below 1 Mbps and low latency. In the short term they can use the Baseline profile, possibly moving to the Extended profile over time. Typical applications include:
---H.320 conversational video services over circuit-switched ISDN
---3GPP conversational H.324/M services
---H.323 conversational services over the Internet using IP/RTP
---3GPP conversational services using IP/RTP as the transport protocol and SIP as the session-layer protocol
● Entertainment video applications: these services require 1 to 8 Mbps and tolerate a latency of 0.5 to 2 seconds. They are likely to use the Main profile. Typical applications include:
---Broadcast via satellite, cable, or DSL
---DVD for standard-definition or high-definition video
---Video on demand over various channels
● Streaming services: these services require 50 to 1500 kbps, with a latency of 2 seconds or more. They may use the Baseline or Extended profile. Typical applications fall into two categories by transmission channel, wired and wireless:
---3GPP streaming using IP/RTP as the transport protocol and SIP as the session-layer protocol; this service would generally use the Baseline profile.
---Streaming over the Internet using IP/RTP as the transport protocol and SIP as the session-layer protocol. This area is currently dominated by proprietary solutions; such services should use the Extended profile and may need to be integrated with future system designs.
● Other services: these services use low bit rates and are delivered by file transfer, so they have no latency requirements. Depending on the application, they may use any of the three profiles. Examples include:
---3GPP Multimedia Messaging Service (MMS)
2.5 Summary
The H.264/AVC video coding standard was developed and standardized jointly by ITU-T VCEG and ISO/IEC MPEG. H.264/AVC adopts a large number of advanced video coding techniques to improve coding efficiency and to provide flexibility for network applications. The design of the H.264/AVC video coding layer follows the traditional block-based hybrid of motion compensation plus transform coding, but it differs significantly from earlier standards. The main differences are summarized briefly as follows:
● Enhanced motion prediction capability;
● Small-block (4×4) exact-match transform;
● Adaptive in-loop deblocking filter;
● Enhanced entropy coding.
At the same visual quality, H.264/AVC reduces the bit rate by about 50% compared with earlier standards. For applications that tolerate higher latency, the advantage of H.264/AVC is even more pronounced.
3 H.264/AVC decoder optimization
H.264/AVC is an ideal standard for transmitting broadcast-quality video over cable, satellite, and communication networks. Because the standard is complex, its encoding and decoding require far more computation than existing standards. Optimizing the algorithms and the implementation of H.264/AVC codec software is therefore an urgent requirement for storage and streaming applications.
This part focuses on techniques for an optimized implementation of the H.264/AVC Baseline profile decoder. Starting from the reference software JM6.1e, we obtain an H.264/AVC Baseline decoder by trimming the reference code. We then analyze the complexity of each important module of the Baseline decoder and identify its most time-consuming modules. Next, we propose platform-independent pure-C optimization techniques, including reference frame padding, adaptive block-size motion compensation, and a fast IDCT; this version serves as the basis for further optimization for a specific processor (CPU or DSP). We then present optimization methods based on the MMX/SSE/SSE2 multimedia instruction sets of Intel CPUs, together with experimental results. Finally, we summarize this part.
3.1 Obtaining the Baseline profile decoder
Optimizing the pure-C decoder can be divided into two steps: first, obtain a Baseline profile decoder by removing the non-Baseline parts of the reference code; second, apply algorithm-level and program-level optimizations to improve speed. The trimming step is much simpler than the optimization step. This section describes how to obtain a Baseline profile decoder from the reference code.
3.1.1 Evolution of the H.264/AVC reference code
Table 1 Standardization history of H.264/AVC
The evolution of H.264/AVC can be summarized as the progressive refinement of successive versions between August 1999 and March 2003. Table 1 lists when and where each version was created; versions marked with "*" have no corresponding reference code. While VCEG was developing the standard on its own, the documents and software model were called TML (Test Model Long-term); later, the Joint Video Team (JVT) of MPEG and VCEG called them JM (Joint Model).
VCEG began work on H.264 while H.263 was being finalized, and the first TML version, TML-1, appeared in August 1999. Figure 15 shows the encoder performance of each version, tested on the QCIF Foreman sequence at 10 frames/s. As the figure shows, performance improves only slightly between adjacent reference code versions. The PSNR performance of TML-1 is similar to that of H.263 and slightly lower than that of MPEG-4 ASP. The final H.264/AVC version, JM-6, includes features not covered by TML-1, such as field coding, B pictures, and the NAL. At the same bit rate, JM-6 improves PSNR by 2-3 dB over TML-1; equivalently, at the same PSNR, JM-6 reduces the bit rate by 40%-60% relative to TML-1.
3.1.2 How to obtain the Baseline profile decoder
This part explains how to trim the reference code to obtain a Baseline profile decoder. JM6.1e was the latest version of the reference code when this work began, so we started trimming from JM6.1e. Although later versions of the reference code appeared afterwards, JM6.1e implements the complete Baseline profile codec, so it is well suited as the starting point for optimizing a Baseline codec.
A decoder conforming to a given profile must correctly decode all permitted values of the syntax elements defined for that profile. This requires preparing a sufficient set of Baseline bitstreams before trimming, and continually testing the current state of the decoder with all of them during trimming, to ensure that the trimmed version is functionally correct and complete. Correctness means that the PSNR (peak signal-to-noise ratio) values stay unchanged throughout trimming; completeness means that every Baseline bitstream can be decoded.
If ensuring correctness is the basic requirement of trimming, ensuring completeness is the primary problem to solve when deriving the Baseline decoder. How can completeness be ensured? It requires testing with multiple test sequences and with all permitted values of the syntax elements defined for the Baseline profile.
The configuration file of the reference encoder has 75 parameters, covering input/output files, coding parameters, slices, B frames, SP frames, search range, rate-distortion optimization, the loop filter, CABAC context initialization, and so on. Table 2 lists the parameters, with their value ranges, that a Baseline decoder must be able to decode correctly.
Table 2 Parameters that a Baseline profile decoder must decode correctly
We use the six standard test sequences shown in Figure 16. All are CIF size, and their lengths range from 250 to 2000 frames. The flowergarden, tempete, and mobile sequences contain fast motion, while foreman, highway, and paris contain gentler motion. Encoding these six sequences with combinations of the parameters in Table 2 produces the Baseline bitstream database we use for testing.
The trimming procedure is as follows:
(1) The Baseline profile has only I slices and P slices, so the first step is to delete all code related to B, BS, SP, and SI slices.
(2) Keep only a single partition for the input bitstream (the Baseline profile does not use data partitioning).
(3) In the Baseline profile, every coded picture of a video sequence is a frame containing only frame macroblocks; that is, the Baseline profile does not support interlaced (field) coding. The third step therefore deletes all decoding processes related to fields; this step is very tedious.
(4) P slices in the Baseline profile do not support weighted prediction, so the fourth step deletes the code related to weighted prediction.
(5) The Baseline profile uses CAVLC entropy coding, so the CABAC decoding code is deleted.
(6) Final cleanup: delete redundant variables and functions, and use as little memory as possible.
3.1.3 Complexity analysis of the Baseline profile decoder
Before optimizing the trimmed Baseline decoder, we must first analyze its complexity to locate the computational bottlenecks, and then optimize the most time-consuming modules. How should the complexity analysis be done? From the standpoint of completeness testing, all bitstream files in the Baseline bitstream database should be used. From the standpoint of decoder optimization, however, not every parameter in Table 2 needs to be considered, because many of them (such as UseHadamard and RDOptimization) mainly affect encoder performance and are irrelevant to the decoder. To analyze the complexity of the Baseline decoder, we can therefore analyze the six video sequences under a fixed, relatively simple configuration. This masks the influence of complicated input parameters and directly reveals the functional modules that are time-consuming for all video sequences.
The encoder configuration we used for the complexity analysis is as follows:
(1) a single reference frame (NumberReferenceFrames=1);
(2) one slice per frame (SliceMode=0);
(3) one I frame every 100 frames (IntraPeriod=100);
(4) quantization parameter 25 (QPFirstFrame=QPRemainingFrame=25);
(5) no Hadamard transform (UseHadamard=0);
(6) maximum search range 16 (SearchRange=16);
(7) no rate-distortion optimization (RDOptimization=0);
(8) all block types from 16 × 16 down to 4 × 4 enabled (InterSearch16x16=1, InterSearch16x8=1, InterSearch8x16=1, InterSearch8x8=1, InterSearch8x4=1, InterSearch4x8=1, InterSearch4x4=1);
(9) no FMO (num_slice_groups_minus1=0);
(10) inter-coded pixels may be used for intra macroblock prediction (UseConstrainedIntraPred=0);
(11) POC type 0 (PicOrderCntType=0).
Table 3 shows the coding performance obtained by encoding the six sequences with this configuration. Because the sequences differ in content, the quality (SNR) of the reconstructed sequences differs even under the same coding configuration, and the difference in bit rate is especially pronounced. In general, sequences with gentle motion have lower bit rates and better objective quality after reconstruction, whereas sequences with fast motion have higher bit rates and somewhat worse objective quality.
Table 3 Coding performance of the six standard test sequences under the same configuration
Using the bitstreams of these six sequences, we analyzed the performance of the trimmed Baseline decoder; the results are shown in Table 4. The table shows that motion compensation and entropy decoding are the two most time-consuming modules; together they account for 70% to 85% of the total decoding time. The IDCT, loop filtering, and buffer management modules are also fairly time-consuming. The last two columns of the table give the decoding time in seconds and the decoded frame rate, respectively.
Table 4 Time distribution across the main functional modules of the trimmed Baseline decoder
Note: all tests in the present invention were run on the same platform: a Pentium 4 processor at 2.4 GHz with 512 MB of memory, running Windows 2000 Professional.
3.2 Algorithm-level optimization
The reference code exists to verify the concepts in the standard and to provide a tool for checking the quality of a particular implementation. Its main goals are correctness and conformance to the standard; speed is not a priority. It is therefore necessary to study optimized algorithms for the H.264 codec.
In this part we discuss platform-independent algorithm-level optimization techniques for the decoder. As shown above, the five most critical modules of the decoder are motion compensation, entropy decoding, buffer management, inverse transform, and loop filtering. To speed up the decoder, we concentrate the optimization effort on these time-consuming modules; fast algorithms for each are described in turn below.
3.2.1 Fast algorithms for the motion compensation module
From an algorithmic point of view, the motion compensation module can be further divided into two submodules: luma motion compensation and chroma motion compensation. H.264/AVC uses quarter-pixel-precision motion vectors; the computations for luma and chroma block motion compensation are illustrated in Figures 17 and 18, respectively. For luma blocks, half-pixel samples are interpolated with a 6-tap FIR filter whose weights are (1/32, -5/32, 5/8, 5/8, -5/32, 1/32); quarter-pixel samples are the linear average of the adjacent integer or half-pixel samples. For 4:2:0 video, a quarter-pixel luma motion vector corresponds to an eighth-pixel chroma motion vector, and each eighth-pixel chroma sample is obtained by linear interpolation of the four neighboring integer samples.
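The luma interpolation rules above can be sketched in C as follows. This is a minimal illustration, not the reference code: the helper names `luma_half_h` and `luma_quarter` are hypothetical, only the horizontal half-pixel position is shown, and the integer taps (1, -5, 20, 20, -5, 1) with rounding by (x + 16) >> 5 are simply the filter weights above scaled by 32.

```c
#include <stdint.h>

/* Clip an intermediate value to the 8-bit sample range [0, 255]. */
static inline int clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Horizontal half-pixel sample between p[0] and p[1] using the 6-tap filter
 * (1, -5, 20, 20, -5, 1) / 32. p must point at an integer sample with
 * p[-2] .. p[3] readable. */
static inline int luma_half_h(const uint8_t *p)
{
    int acc = p[-2] - 5 * p[-1] + 20 * p[0] + 20 * p[1] - 5 * p[2] + p[3];
    return clip255((acc + 16) >> 5);   /* round, divide by 32, clip */
}

/* Quarter-pixel sample: rounded average of two neighboring
 * integer or half-pixel samples. */
static inline int luma_quarter(int a, int b) { return (a + b + 1) >> 1; }
```

On a linear ramp such as 10, 20, 30, 40, 50, 60, the half-pixel value between 30 and 40 comes out as 35, and the quarter-pixel value between 30 and that half-pixel sample is 33.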
The interpolation formulas for H.264/AVC motion compensation are strictly specified in the standard, and the encoder and decoder use identical algorithms, so the motion compensation module can only be accelerated through its implementation. For this module we propose two optimization techniques, "reference frame padding" and "adaptive block-size MC", introduced in turn below. Both techniques apply to luma as well as chroma motion compensation.
I. Reference frame padding
The H.264/AVC standard allows motion vectors to point outside the picture boundaries, so when the inter prediction of a P macroblock is computed, the pixel positions used in the reference frame may lie beyond the picture's width and height. When a position is out of bounds, the value of the nearest pixel in the reference frame is used instead. In the reference code, whenever a reference-frame pixel (x_pos, y_pos) is used in motion compensation, the pixel (x_real, y_real) actually used must be computed as follows, where W and H are the picture width and height:
x_real = max(0, min(W - 1, x_pos)), y_real = max(0, min(H - 1, y_pos))
To avoid these max/min computations, we propose the reference frame padding technique shown in Figure 19.
The four bands above, below, left of, and right of the reference frame are padded as follows (see Figure 19(d)):
Region0: the corresponding values of the original reference frame
Region1: every pixel equals the top-left pixel of the original reference frame
Region2: every pixel equals the top-right pixel of the original reference frame
Region3: every pixel equals the bottom-left pixel of the original reference frame
Region4: every pixel equals the bottom-right pixel of the original reference frame
Region5: obtained by horizontal extrapolation of the first column of the original reference frame
Region6: obtained by vertical extrapolation of the first row of the original reference frame
Region7: obtained by horizontal extrapolation of the last column of the original reference frame
Region8: obtained by vertical extrapolation of the last row of the original reference frame
The Y, U, and V components are padded with the same technique; the only difference is that the padding width of the U and V components is half that of the Y component. The padding width of the Y component is max(mv_x, mv_y) + 8, where mv_x and mv_y are the largest horizontal and vertical motion-vector magnitudes in whole pixels. If the motion search range is 16 and the motion in the video is gentle, the padding width can be set to 24. Sequences with more violent motion require a larger padding width. Note that the side effects of this technique are the extra memory it needs and the extra time spent on padding.
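A minimal sketch of the padding step for one 8-bit plane follows, assuming a simple row-major layout; `pad_plane` and `fetch_clamped` are hypothetical names, not functions of the reference code. After padding, any access within `pad` samples of the picture can index the buffer directly and returns the same value as the clamped, per-access lookup done by the reference code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build a padded copy of a w x h plane with `pad` samples on every side.
 * Left/right bands replicate the first/last column (Regions 5 and 7),
 * top/bottom bands replicate the first/last row (Regions 6 and 8), and the
 * corners end up equal to the corner samples (Regions 1-4).
 * Returns the buffer to free(); *origin points at sample (0, 0). */
static uint8_t *pad_plane(const uint8_t *src, int w, int h, int pad,
                          uint8_t **origin, int *stride)
{
    int s = w + 2 * pad;
    uint8_t *buf = malloc((size_t)s * (size_t)(h + 2 * pad));
    if (!buf) return NULL;
    for (int y = 0; y < h; y++) {
        uint8_t *row = buf + (size_t)(y + pad) * s;
        memset(row, src[y * w], (size_t)pad);                   /* left band  */
        memcpy(row + pad, src + y * w, (size_t)w);              /* Region0    */
        memset(row + pad + w, src[y * w + w - 1], (size_t)pad); /* right band */
    }
    /* Copy the already-padded first/last rows outward; this also fills
     * the four corner regions with the corner samples. */
    for (int y = 0; y < pad; y++) {
        memcpy(buf + (size_t)y * s, buf + (size_t)pad * s, (size_t)s);
        memcpy(buf + (size_t)(h + pad + y) * s,
               buf + (size_t)(h + pad - 1) * s, (size_t)s);
    }
    *origin = buf + (size_t)pad * s + pad;
    *stride = s;
    return buf;
}

/* Reference-code style access: clamp the coordinates on every fetch. */
static int fetch_clamped(const uint8_t *src, int w, int h, int x, int y)
{
    x = x < 0 ? 0 : (x > w - 1 ? w - 1 : x);
    y = y < 0 ? 0 : (y > h - 1 ? h - 1 : y);
    return src[y * w + x];
}
```

The padding is paid once per reference frame, whereas the clamp is paid on every sample fetched during interpolation, which is where the saving comes from.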
II. Adaptive block-size motion compensation
In the reference code, all luma motion compensation is performed on 4 × 4 blocks, while the actual motion compensation block size may be 16 × 16, 8 × 16, 16 × 8, 8 × 8, 4 × 8, 8 × 4, or 4 × 4. If the prediction mode of a macroblock is 16 × 16, the reference code therefore calls the 4 × 4 motion compensation function 16 times. Because of function-call overhead, the characteristics of the 6-tap FIR filter, and data-cache effects, sixteen 4 × 4 motion compensations inevitably take longer than a single 16 × 16 one (in the worst case each 4 × 4 block reads a 9 × 9 window of reference samples, so sixteen calls read 1296 samples where one 16 × 16 call reads 21 × 21 = 441). On the other hand, larger blocks (16 × 16, 8 × 16, 16 × 8) are used far more often than smaller ones: X.S. Zhou (2003) found through extensive tests that 4 × 4, 4 × 8, and 8 × 4 blocks account for only 5% of all blocks in P frames. We therefore propose adaptive block-size motion compensation: an M × N block (M and N may be 16, 8, or 4) is interpolated directly at size M × N, avoiding repeated calls to the 4 × 4 motion compensation function.
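The sketch below contrasts one W × H call with the reference-code pattern of repeated 4 × 4 calls, using only the horizontal half-pixel case for brevity. The function names are hypothetical, and the reference frame is assumed to be padded as in technique I so that no boundary checks are needed. Both paths produce identical samples; the saving comes only from fewer calls and better reuse of the filter's input window.

```c
#include <stdint.h>

static inline int clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Horizontal half-pixel interpolation of one W x H block in a single call
 * (W, H in {4, 8, 16}). ref points at the block's top-left integer sample;
 * 2 samples to the left and 3 to the right of each row must be readable,
 * e.g. inside a padded reference frame. */
static void mc_half_h_block(const uint8_t *ref, int ref_stride,
                            uint8_t *dst, int dst_stride, int W, int H)
{
    for (int y = 0; y < H; y++) {
        const uint8_t *p = ref + y * ref_stride;
        for (int x = 0; x < W; x++) {
            int acc = p[x - 2] - 5 * p[x - 1] + 20 * p[x] + 20 * p[x + 1]
                    - 5 * p[x + 2] + p[x + 3];
            dst[y * dst_stride + x] = (uint8_t)clip255((acc + 16) >> 5);
        }
    }
}

/* Reference-code style: always split the W x H block into 4x4 calls. */
static void mc_half_h_by_4x4(const uint8_t *ref, int ref_stride,
                             uint8_t *dst, int dst_stride, int W, int H)
{
    for (int by = 0; by < H; by += 4)
        for (int bx = 0; bx < W; bx += 4)
            mc_half_h_block(ref + by * ref_stride + bx, ref_stride,
                            dst + by * dst_stride + bx, dst_stride, 4, 4);
}
```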
The chroma motion compensation block size is one quarter (by area) of the luma block size, i.e. 8 × 8, 4 × 8, 8 × 4, 4 × 4, 2 × 4, 4 × 2, and 2 × 2. In the reference code, chroma motion compensation is computed sample by sample, so dx, dy, 8-dx, 8-dy and the positions A, B, C, D shown in Figure 18 are recomputed for every sample in the chroma block. Computing chroma motion compensation with our adaptive block-size technique therefore significantly reduces the amount of computation.
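For chroma, a block-level routine can compute the four bilinear weights once and reuse them for every sample. The hypothetical `mc_chroma_block` below assumes a padded reference plane and uses the standard's eighth-pixel weighting ((8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dyC + dx dy D + 32) >> 6, with A, B, C, D as in Figure 18.

```c
#include <stdint.h>

/* Chroma motion compensation for one W x H block (W, H in {2, 4, 8}).
 * dx, dy: fractional MV parts in 1/8 sample (0..7).
 * ref points at integer sample A; B is to its right, C below, D below-right.
 * The weights depend only on (dx, dy), so they are computed once per block. */
static void mc_chroma_block(const uint8_t *ref, int stride, int dx, int dy,
                            uint8_t *dst, int dst_stride, int W, int H)
{
    const int wA = (8 - dx) * (8 - dy), wB = dx * (8 - dy);
    const int wC = (8 - dx) * dy,       wD = dx * dy;
    for (int y = 0; y < H; y++) {
        const uint8_t *p = ref + y * stride;
        for (int x = 0; x < W; x++)
            dst[y * dst_stride + x] = (uint8_t)((wA * p[x] + wB * p[x + 1]
                + wC * p[x + stride] + wD * p[x + stride + 1] + 32) >> 6);
    }
}
```

With dx = dy = 0 the routine reduces to a copy; with dx = dy = 4 each output is the rounded mean of its four neighbors.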
3.2.2 Fast algorithm for entropy decoding
Comparing Table 3 with Table 4 shows that for high-bit-rate sequences entropy decoding takes a large share of the decoding time (up to 60%), so optimizing this module is crucial. Entropy decoding mainly reads the following information: headers, macroblock modes, reference frame indices, motion vectors, cbp, quantization parameter deltas, intra prediction modes, and residuals. Reading the residuals is the most time-consuming part, so the optimization of entropy decoding focuses on residual reading.
We first briefly introduce CAVLC (Context-based Adaptive Variable Length Coding), the coding method for 4 × 4 residuals, to show where the bottleneck of residual entropy decoding lies.
In CAVLC entropy coding, the number of nonzero quantized coefficients (N), the actual coefficient values, and the coefficient positions are coded separately. The design of CAVLC exploits the following properties of quantized 4 × 4 blocks:
(1) after prediction, transform, and quantization, 4 × 4 blocks are usually sparse (i.e. they contain many zeros);
(2) after zig-zag scanning, the most frequent nonzero coefficients are +/-1;
(3) the numbers of nonzero coefficients in neighboring blocks are correlated;
(4) the magnitudes of nonzero coefficients tend to decrease from low to high frequencies.
Below we use an example to show how the quantized transform coefficients of a 4 × 4 luma block are coded. Suppose the zig-zag-scanned coefficients are:
1) Number of nonzero coefficients (N) and trailing ones (Trailing 1s): "Trailing 1s" (T1s) is the number of coefficients with absolute value 1 at the end of the scan. In this example T1s = 2 and the number of coefficients is N = 5. These two values are coded together, using one of 4 VLC tables selected according to the numbers of coefficients in neighboring blocks.
2) Coding the nonzero coefficient values: since a T1 can only be +1 or -1, only its sign needs to be specified. Because high-frequency coefficients are smaller than low-frequency ones, the coefficient values are coded in reverse order. In this example the first coefficient to be coded is -2, which is coded with the initial table. When the next coefficient (6 in this example) is coded, a new table is used if the coefficient just coded exceeded a threshold; otherwise the current table is kept. This achieves adaptive selection of the VLC table. There are six exp-Golomb tables for coding nonzero coefficients.
3) Sign information: for T1s, each sign is transmitted with a single bit. For the other coefficients, the sign is included in the exp-Golomb codeword.
The positions of the nonzero coefficients are coded by specifying where the zeros before the last nonzero coefficient lie, in two steps:
4) Total number of zeros (TotalZeros): this codeword gives the number of zeros between the start of the scan and the last nonzero coefficient. In this example TotalZeros is 3. Since N = 5 is known, TotalZeros must lie in the range 0 to 11. Since N ranges from 1 to 15, there are 15 tables for TotalZeros in total. If N = 16, there are no zero coefficients.
5) RunBefore: the distribution of the zeros must still be specified. First the number of zeros immediately before the last coefficient is coded; in this example it is 2. Only one zero now remains, so the number of zeros before the second-to-last coefficient must be 0 or 1; in this example it is 1. At this point all the information has been coded.
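To make steps 1), 4) and 5) concrete, the following hypothetical helper `cavlc_analyze` derives N, T1s, TotalZeros and the run_before values from a zig-zag-ordered 4 × 4 block. It only counts the quantities; it does not perform any VLC table lookup. The block used in the test is likewise hypothetical, chosen to match the counts in the example above.

```c
/* CAVLC quantities for one 4x4 block (hypothetical helper type). */
typedef struct {
    int total_coeff;      /* N: number of nonzero coefficients            */
    int trailing_ones;    /* T1s: trailing +/-1 values, at most 3         */
    int total_zeros;      /* zeros before the last nonzero coefficient    */
    int run_before[16];   /* run_before[k]: zeros just before the k-th
                             nonzero coefficient, counted from the
                             highest-frequency one                        */
} CavlcInfo;

/* coef[0..15] are in zig-zag order, index 0 = lowest frequency. */
static CavlcInfo cavlc_analyze(const int coef[16])
{
    CavlcInfo info = {0};
    int last = -1;
    for (int i = 0; i < 16; i++)
        if (coef[i] != 0) { info.total_coeff++; last = i; }

    /* Trailing ones: consecutive +/-1 among the nonzero coefficients,
       scanning backwards from the high-frequency end, capped at 3. */
    for (int i = last; i >= 0 && info.trailing_ones < 3; i--) {
        if (coef[i] == 0) continue;
        if (coef[i] == 1 || coef[i] == -1) info.trailing_ones++;
        else break;
    }

    for (int i = 0; i < last; i++)
        if (coef[i] == 0) info.total_zeros++;

    /* Run of zeros in front of each nonzero coefficient, in reverse order. */
    int k = 0;
    for (int i = last; i >= 0; i--) {
        if (coef[i] == 0) continue;
        int run = 0;
        for (int j = i - 1; j >= 0 && coef[j] == 0; j--) run++;
        info.run_before[k++] = run;
    }
    return info;
}
```

For the hypothetical block {7, 6, -2, 0, 1, 0, 0, -1, 0, ...}, this gives N = 5, T1s = 2, TotalZeros = 3, and runs of 2 and 1 before the last two nonzero coefficients; the first two non-trailing-one levels met in reverse order are -2 and then 6.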
Table 5 shows the detailed time distribution of entropy decoding for the six test bitstreams. The percentages express each part of entropy decoding as a share of the total decoding time. Column 2 is the share for step 1, column 3 for step 2, column 4 for step 4, and column 5 for step 5; column 6 is the share for the remaining parts of the residual decoding module (including step 3 and various function-call overheads); column 7 is the share of the total decoding time spent decoding residuals. The last column is the share of time spent entropy decoding the rest of the bitstream information, excluding residuals.
Table 5 Detailed time distribution of entropy decoding
Table 5 shows that decoding step 1 (the total number of nonzero coefficients and the number of high-frequency +/-1 coefficients), step 4 (the total number of zeros before the last nonzero coefficient), and step 5 (the runs of zeros between nonzero coefficients) are very time-consuming. What these three steps have in common is that they all require trial-and-error table lookups. The reference encoder and decoder use the same code tables for residuals. The encoder looks tables up in the forward direction, i.e. it fetches a value from the table using known two-dimensional coordinates, so table lookup is cheap for the encoder. The decoder, however, must determine the two-dimensional coordinates without knowing the specific codeword (because its length is unknown), which involves a great deal of uncertainty. Restructuring the tables is therefore the key to optimizing the residual decoding module of the decoder. The goal of the restructuring is to minimize the number of table lookups and bitstream reads, and the program flow changes accordingly.
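One common way to restructure a VLC table for decoding, shown here as an assumption rather than the authors' exact tables, is to expand it into a direct-lookup table indexed by the next few bits of the stream: one peek and one table access then yield both the symbol and its codeword length, with no trial matching. The four-entry prefix code below is a made-up example, not an H.264 table, and all names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define VLC_BITS 4   /* longest codeword length in the table */

typedef struct { int symbol; unsigned code; int len; } VlcCode;  /* len <= VLC_BITS */
typedef struct { int8_t symbol; int8_t len; } VlcEntry;          /* len 0 = invalid */

/* Expand the code list into 2^VLC_BITS entries: every index whose leading
 * bits equal a codeword maps to that codeword's symbol and length. */
static void vlc_build(const VlcCode *codes, int n, VlcEntry table[1 << VLC_BITS])
{
    for (int i = 0; i < (1 << VLC_BITS); i++) table[i].len = 0;
    for (int c = 0; c < n; c++) {
        int shift = VLC_BITS - codes[c].len;
        unsigned first = codes[c].code << shift;
        for (unsigned i = 0; i < (1u << shift); i++) {
            table[first + i].symbol = (int8_t)codes[c].symbol;
            table[first + i].len = (int8_t)codes[c].len;
        }
    }
}

/* MSB-first bit reader; bits past the end read as 0. */
typedef struct { const uint8_t *buf; size_t nbits, pos; } BitReader;

/* Bit-by-bit peek kept simple for clarity; a real decoder would keep
 * a cached 32-bit window instead. */
static unsigned peek_bits(const BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        size_t p = br->pos + (size_t)i;
        unsigned bit = p < br->nbits ? (br->buf[p >> 3] >> (7 - (p & 7))) & 1u : 0u;
        v = (v << 1) | bit;
    }
    return v;
}

/* One peek plus one table access per symbol. Returns -1 if no code matches. */
static int vlc_decode(BitReader *br, const VlcEntry table[1 << VLC_BITS])
{
    VlcEntry e = table[peek_bits(br, VLC_BITS)];
    if (e.len == 0) return -1;
    br->pos += (size_t)e.len;
    return e.symbol;
}
```

With the codewords 1, 01, 001, 0001 for symbols 0 to 3, the 10-bit stream 0011000101 (bytes 0x31, 0x40) decodes to 2, 0, 3, 1.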
3.2.3 Fast IDCT
Unlike the floating-point DCT/IDCT used in other standards, H.264/AVC uses a 4 × 4 integer transform to convert signals between the spatial and frequency domains. In the reference code, a two-dimensional IDCT is applied to every 4 × 4 block. In practice, however, the transformed and quantized coefficients of a 4 × 4 block may all be zero, or only the DC coefficient may be nonzero. We can therefore decide which form of the inverse transform is needed from the syntax element coded_block_pattern (cbp) before performing the actual IDCT. The syntax element coded_block_pattern indicates which 8 × 8 blocks contain nonzero coefficients. Whenever the macroblock prediction mode is not Intra_16×16, the bitstream contains coded_block_pattern, and the two variables CodedBlockPatternLuma and CodedBlockPatternChroma are derived from it as follows:
CodedBlockPatternLuma = coded_block_pattern % 16, CodedBlockPatternChroma = coded_block_pattern / 16
If a 4 × 4 block really contains nonzero coefficients, the inverse transform must be performed. If it has only one nonzero coefficient and that coefficient is the DC component, the two-dimensional inverse transform can be simplified further, because every output sample then equals the same rounded DC value.
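The decision described above can be sketched as follows. `idct4x4_full` follows the H.264 4 × 4 inverse transform butterfly with the final (x + 32) >> 6 rounding; `idct4x4_fast` and `luma8x8_has_coeffs` are hypothetical helpers illustrating the all-zero/DC-only shortcut and the cbp test. This is a sketch, not the authors' code: a real decoder would usually skip the call entirely for blocks whose cbp bit is 0 and then add the residual to the prediction.

```c
/* H.264 4x4 inverse integer transform of dequantized coefficients d
 * (row-major), producing the residual r. */
static void idct4x4_full(const int d[16], int r[16])
{
    int t[16];
    for (int i = 0; i < 4; i++) {                      /* horizontal pass */
        const int *s = d + 4 * i;
        int e0 = s[0] + s[2], e1 = s[0] - s[2];
        int e2 = (s[1] >> 1) - s[3], e3 = s[1] + (s[3] >> 1);
        t[4 * i + 0] = e0 + e3; t[4 * i + 1] = e1 + e2;
        t[4 * i + 2] = e1 - e2; t[4 * i + 3] = e0 - e3;
    }
    for (int j = 0; j < 4; j++) {                      /* vertical pass */
        int e0 = t[j] + t[8 + j], e1 = t[j] - t[8 + j];
        int e2 = (t[4 + j] >> 1) - t[12 + j], e3 = t[4 + j] + (t[12 + j] >> 1);
        r[j]      = (e0 + e3 + 32) >> 6; r[4 + j]  = (e1 + e2 + 32) >> 6;
        r[8 + j]  = (e1 - e2 + 32) >> 6; r[12 + j] = (e0 - e3 + 32) >> 6;
    }
}

/* Skip the butterflies when only the DC coefficient can be nonzero:
 * all 16 outputs then equal (DC + 32) >> 6 (zero for an all-zero block). */
static void idct4x4_fast(const int d[16], int r[16])
{
    int ac = 0;
    for (int i = 1; i < 16; i++) ac |= d[i];
    if (ac == 0) {
        int v = (d[0] + 32) >> 6;
        for (int i = 0; i < 16; i++) r[i] = v;
        return;
    }
    idct4x4_full(d, r);
}

/* Bit b8 (0..3) of CodedBlockPatternLuma says whether luma 8x8 block b8
 * may contain nonzero coefficients. */
static int luma8x8_has_coeffs(int coded_block_pattern, int b8)
{
    return ((coded_block_pattern % 16) >> b8) & 1;
}
```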
3.2.4 Other optimization measures
For the buffer management module, use as little external memory as possible and minimize data exchange between the cache and main memory.
For the deblocking filter module, the algorithm is strictly specified in the standard, so there is little room for optimization; we only simplify the computation of the boundary strength of each edge.
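The boundary-strength (bS) decision that this simplification targets can be sketched as below, assuming a Baseline-profile frame (no field coding, no SP/SI slices) with one motion vector and one reference index per 4 × 4 block. The struct and function names are hypothetical, and comparing reference indices instead of the referenced pictures is a simplification.

```c
#include <stdlib.h>

/* One 4x4 block on either side of an edge (hypothetical layout). */
typedef struct {
    int intra;      /* block belongs to an intra macroblock         */
    int nonzero;    /* block has nonzero transform coefficients     */
    int ref;        /* reference index (stands in for the picture)  */
    int mvx, mvy;   /* motion vector in quarter luma samples        */
} EdgeSide;

/* Boundary strength for the edge between blocks p and q.
 * mb_edge is nonzero when the edge is also a macroblock edge. */
static int boundary_strength(const EdgeSide *p, const EdgeSide *q, int mb_edge)
{
    if (p->intra || q->intra) return mb_edge ? 4 : 3;
    if (p->nonzero || q->nonzero) return 2;
    if (p->ref != q->ref) return 1;
    if (abs(p->mvx - q->mvx) >= 4 || abs(p->mvy - q->mvy) >= 4) return 1;
    return 0;   /* bS = 0: the edge is not filtered */
}
```

Because the tests are ordered from strongest to weakest, most inter edges exit after a few integer comparisons.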
In addition to the implementation techniques described above, we also adopted the following optimization methods introduced by M.E. Lee: loop unrolling, loop distribution, loop interchange, and cache optimization.
3.2.5 Experimental results
We tested the performance of the H.264/AVC Baseline decoder after algorithm optimization. Table 6 gives the speedup factor of each part of the optimized Baseline decoder relative to the unoptimized one. The table shows that, thanks to the effectiveness of the proposed algorithms, the execution speed of residual reading, motion compensation, and inverse transform has improved greatly. The loop filtering module improves little, to about twice its original speed. The overall decoder speed rises to about seven times the original.
Table 6 Speedup factors of the core modules of the Baseline decoder after algorithm optimization
Table 7 shows the time distribution of the core modules of the optimized decoder, together with the decoding time and decoded frame rate. Compared with Table 4, the share of every core module in the optimized decoder has changed. Overall, the shares of motion compensation and residual reading have dropped considerably, whereas the loop filtering module now accounts for a larger share of the optimized decoder.
Table 7 Time distribution of the main functional modules of the Baseline decoder after algorithm optimization
3.3 Multimedia-instruction-level optimization
On top of the algorithm-level optimized program, further platform-level (CPU or DSP) optimization is possible. This section takes Intel CPUs as an example and describes the gains achieved with multimedia instruction sets.
3.3.1 Overview of the Intel multimedia instruction sets
To meet the needs of multimedia technology for processing large amounts of data, Intel added the MMX multimedia extension instructions to its fifth-generation 80x86 microprocessor, the Pentium, and later the Streaming SIMD Extensions (SSE) and SSE2 instructions, producing the Pentium II, Pentium III, and Pentium 4 microprocessors with multimedia processing capability.
I. The MMX instruction set
MMX (MultiMedia eXtension) is a microprocessor enhancement technology formally announced by Intel in 1996. It was the most significant improvement since Intel introduced the 32-bit 80386 microprocessor in 1985. Its core is a set of 57 new multimedia instructions designed around the characteristics of multimedia data; they greatly improve the performance of Pentium/Pentium Pro microprocessors and let personal computers run graphics, animation, audio, video, communication, virtual reality, and similar applications faster.
MMX technology defines a basic, general-purpose set of 57 packed-integer instructions that largely meets the needs of various multimedia applications. "Packed integer data" means several 8-, 16-, or 32-bit integers combined into one 64-bit value. MMX instructions operate mainly on packed integer data, which comes in four types: packed bytes, packed words, packed doublewords, and a quadword. A 64-bit packed value can therefore represent 8 bytes, 4 words, 2 doublewords, or 1 quadword. For multimedia software that makes heavy use of 8-, 16-, or 32-bit data, one MMX instruction can process 8, 4, or 2 data elements at once; this is the so-called single-instruction, multiple-data (SIMD) structure, and it is the fundamental way MMX technology improves machine performance.
For example, the MMX instructions PADD[B, W, D] add two packed values. The operands of PADDB are 8 pairs of independent byte elements, the operands of PADDW are 4 pairs of independent 16-bit word elements, and the operands of PADDD are 2 pairs of independent 32-bit doubleword elements. Each pair is added to form its own result, without affecting the others. Multimedia software contains large amounts of data that need this kind of parallel processing.
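The lane independence of PADDB can be illustrated without MMX hardware. The sketch below is a portable scalar emulation, not Intel intrinsics, and the helper names are hypothetical: `paddb_emul` adds eight packed bytes with per-byte wraparound, and `paddusb_emul` shows the corresponding unsigned saturating behavior.

```c
#include <stdint.h>

/* Emulates PADDB: eight independent 8-bit additions packed in one 64-bit
 * word, wrapping around within each byte (no carry between lanes). */
static uint64_t paddb_emul(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;   /* top bit of every byte */
    /* Adding the low 7 bits of each byte can never carry into the next
       byte; the top bits are then folded in with XOR (sum mod 2). */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

/* Emulates PADDUSB: per-byte unsigned addition saturating at 255. */
static uint64_t paddusb_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)((a >> (8 * i)) & 0xFF)
                   + (unsigned)((b >> (8 * i)) & 0xFF);
        r |= (uint64_t)(s > 255 ? 255 : s) << (8 * i);
    }
    return r;
}
```

For a byte pair 0xFF + 0x02, the wraparound form yields 0x01 while the saturating form yields 0xFF; neighboring bytes are unaffected in both cases.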
To make 64-bit packed integer data easy to use, MMX technology provides eight 64-bit MMX registers (MM0-MM7), which only MMX instructions can use.
MMX instructions fall into the following classes:
Arithmetic instructions---add, subtract, and multiply packed data, with wraparound, saturating, and multiply-add forms.
Comparison instructions---compare the corresponding data elements (bytes, words, or doublewords) of two operands. If a comparison is true, the corresponding element of the destination register is set to all 1s; otherwise it is set to all 0s.
Conversion instructions---convert between the various packed data types.
Logical instructions---perform bitwise logical operations on 64 bits and return the result in the destination MMX register.
Shift instructions---shift each data element (word, doubleword, or quadword) of the destination operand by the amount specified in the source operand. The source operand giving the shift count can be an MMX register, a memory operand, or an 8-bit immediate.
Data transfer instructions---move 32- or 64-bit data between MMX registers and between MMX registers and main memory.
State management instruction (EMMS)---marks all registers in the floating-point tag word as empty.
II. The SSE instruction set
The Pentium and Pentium II processors with MMX instructions were very successful; they spurred the development of multimedia software and, at the same time, raised the demands on processor capability. Targeting Internet applications and building on SIMD, the key technique of the MMX instruction set, Intel released the Pentium III processor with the SSE (Streaming SIMD Extensions) instruction set in February 1999.
Streaming SIMD Extensions mainly add the following to the existing IA-32 programming environment:
an SSE instruction set of 70 instructions;
support for 128-bit packed floating-point data (the SIMD floating-point type);
eight SIMD floating-point data registers, XMM0-XMM7.
The 70 SSE instructions fall into three groups: 50 SIMD floating-point instructions, 12 SIMD integer instructions, and 8 cache optimization instructions. SSE instructions can be used in all IA-32 execution modes.
The 50 SIMD floating-point instructions are the core of the SSE instruction set and the key to the performance gains of the Pentium III processor.
The SSE instruction set also contains 12 SIMD integer instructions, added to strengthen and extend the MMX instruction set. They let programmers improve algorithms and thus further improve the quality of video and image processing.
In a high-performance computer system, the memory hierarchy places one or two levels of cache between the CPU and main memory to speed up access to data and programs. To give programs better control over the cache and improve their performance, SSE technology provides 8 cache optimization instructions on the Pentium III.
III. The SSE2 instruction set
The MMX instruction set mainly provides parallel processing of integer data, and the SSE instruction set mainly provides parallel processing of single-precision floating-point data. In November 2000 Intel released the Pentium 4 microprocessor, which added the SSE2 instructions based on SIMD technology and extended parallel processing to double-precision floating-point data. SSE2 aims to strengthen IA-32 microprocessors in areas such as 3-D graphics, video encoding and decoding, speech recognition, e-commerce, and the Internet.
While remaining compatible with existing IA-32 microprocessors, applications, and operating systems, SSE2 mainly adds 6 data types and the instructions to process them in parallel, making the multimedia instruction set more complete.
The SSE2 programming environment comprises the existing IA-32 32-bit general-purpose registers, the 64-bit MMX registers, and the 128-bit XMM registers, as well as the 32-bit EFLAGS register and the MXCSR floating-point status/control register; it introduces no new registers or execution states. Its main addition, using the XMM registers, is one 128-bit packed double-precision floating-point data type and four 128-bit SIMD integer data types.
Packed double-precision floating-point: this 128-bit data type consists of two 64-bit IEEE 754 double-precision floating-point values packed into one 128-bit double quadword.
128-bit packed integer: these four 128-bit packed integer data types can hold 16 byte integers, 8 word integers, 4 doubleword integers or 2 quadword integers.
The most important instructions in SSE2 are the packed double-precision floating-point instructions, which operate in 128-bit and 64-bit modes. SSE2 also provides 64-bit and 128-bit SIMD integer instructions, 128-bit extensions of the MMX and SSE instructions, and cache-control and instruction-ordering instructions. Instructions with 128-bit memory operands require the data to be aligned on a 16-byte boundary in main memory, with the following exceptions: the movupd instruction supports unaligned access, and instructions operating in 64-bit mode on 8-byte memory operands are not subject to the alignment restriction.
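To make the register layouts above concrete, the following Python sketch (an illustration only, not SSE2 code) reinterprets one 128-bit value as each of the packed types listed, using the standard struct module, and checks the 16-byte alignment rule for 128-bit memory operands. LANE_FORMATS, lanes and is_16_byte_aligned are hypothetical helper names, and the movupd and 8-byte-operand exceptions are not modeled.

```python
import struct

# struct format codes: b/h/i/q = signed 8/16/32/64-bit integers,
# d = IEEE 754 double; "<" = little-endian byte order, as on IA-32.
LANE_FORMATS = {
    "16 byte integers": "<16b",
    "8 word integers": "<8h",
    "4 doubleword integers": "<4i",
    "2 quadword integers": "<2q",
    "2 packed doubles": "<2d",
}


def lanes(xmm: bytes, kind: str) -> list:
    """Reinterpret a 16-byte (128-bit) value as the lanes of one packed type."""
    if len(xmm) != 16:
        raise ValueError("an XMM register holds exactly 128 bits (16 bytes)")
    return list(struct.unpack(LANE_FORMATS[kind], xmm))


def is_16_byte_aligned(address: int) -> bool:
    """True if a 128-bit memory operand at `address` meets the 16-byte alignment rule."""
    return address % 16 == 0


xmm = struct.pack("<2d", 1.5, -2.0)          # two doubles fill one 128-bit value
print(lanes(xmm, "2 packed doubles"))         # [1.5, -2.0]
print(len(lanes(xmm, "8 word integers")))     # 8
print(is_16_byte_aligned(0x1000), is_16_byte_aligned(0x1008))  # True False
```

The same 16 bytes can be read as any of the five layouts; only the instruction chosen decides how the lanes are interpreted.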
3.3.2 Optimization process and experimental results
Because only the Pentium 4 microprocessor supports SSE2, we optimized the decoder using only the MMX and SSE instruction sets, for the sake of code portability. The optimized program therefore runs on CPUs from a wider range of manufacturers.
Given the data-processing characteristics of the H.264/AVC decoder, the loop filtering and residual-reading modules are not suited to multimedia instructions; only the motion compensation and inverse transform modules can be optimized with them. Although MMX instructions already improve performance, the specific MMX code must be tuned further to take full advantage of the Pentium/Pentium Pro processor architecture. This mainly involves: scheduling instructions so that as many as possible execute in parallel inside the processor; unrolling loops judiciously to reduce the time spent on branches; aligning data boundaries; and so on. Table 8 shows the time distribution over the main functional modules of the H.264/AVC Baseline profile decoder after multimedia-instruction optimization. Compared with Table 7, the shares of motion compensation and inverse transform drop further, while those of loop filtering and residual reading rise.
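The inverse transform is one of the two modules optimized with MMX. As a point of reference, the following Python sketch shows the scalar form of the standard H.264 4×4 inverse core transform (row butterflies, column butterflies, then (x + 32) >> 6 rounding); it is not the authors' MMX code. A SIMD version would evaluate the same butterfly for several rows or columns at once, and the fully unrolled butterfly illustrates the loop unrolling mentioned above. The function name inverse_transform_4x4 is hypothetical, and the input is assumed to be already dequantized.

```python
def inverse_transform_4x4(d):
    """Scalar H.264 4x4 inverse core transform of dequantized coefficients d[row][col]."""

    def butterfly(v0, v1, v2, v3):
        # One unrolled 1-D inverse transform (no loop, no branches).
        e = v0 + v2
        f = v0 - v2
        g = (v1 >> 1) - v3
        h = v1 + (v3 >> 1)
        return [e + h, f + g, f - g, e - h]

    # Horizontal pass over each row.
    tmp = [butterfly(*row) for row in d]
    # Vertical pass over each column; cols[c][r] is the result at row r, column c.
    cols = [butterfly(*(tmp[r][c] for r in range(4))) for c in range(4)]
    # Final rounding, returned in row-major order.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_transform_4x4(dc_only))  # [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
```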
Table 8 Time distribution over the main functional modules of the Baseline profile decoder after multimedia-instruction optimization
Table 9 gives the speedup of the Baseline profile decoder after instruction-set optimization relative to the Baseline profile decoder after algorithm optimization. The motion compensation and inverse transform modules run 2 to 3 times as fast as before, and the overall decoding speed of the instruction-set-optimized decoder is about 1.3 times that of the algorithm-optimized decoder.
Table 9 Speedup of the Baseline profile decoder after multimedia-instruction optimization
Table 10 gives the speedup of each module after combining algorithm optimization with instruction-set optimization, relative to the trimmed reference decoder. Motion compensation gains the most, followed by the inverse transform and then residual reading; loop filtering gains the least. After both optimization stages, decoding is about nine times faster than the unoptimized version.
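The overall figure is consistent with composing the two stages multiplicatively. Using the approximate factors reported in this part (about 7× from algorithm optimization and about 1.3× from instruction-set optimization):

$$S_{\text{total}} = S_{\text{alg}} \times S_{\text{SIMD}} \approx 7 \times 1.3 \approx 9.1$$

which matches the roughly ninefold speedup. The modest 1.3× gain, despite 2 to 3× speedups of motion compensation and inverse transform, is what Amdahl's law predicts: if those modules took a fraction $p$ of the decoding time and were accelerated by a factor $s$, the overall speedup would be

$$S = \frac{1}{(1 - p) + p/s}$$

Here $p$ and $s$ are illustrative symbols, not values taken from Tables 7 to 10.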
Table 10 Speedup of the Baseline profile decoder after algorithm and multimedia-instruction optimization
3.4 Summary
This part has summarized the optimization techniques for the H.264/AVC Baseline profile decoder. Starting from the reference software JM6.1e, we first trimmed it down to a Baseline profile decoder. A complexity analysis of this version then identified the modules that most needed optimization: motion compensation, inverse transform, loop filtering, entropy decoding and buffer management. Section 3.2 presented platform-independent, algorithm-level optimizations for each of these modules, which raised the decoding speed to about seven times the original. Section 3.3 applied multimedia-instruction-level optimization on top of the algorithm-optimized version, raising the speed further; the final Baseline profile decoder decodes about nine times as fast as the unoptimized version.
4 H.264/AVC Encoder Optimization
The previous part introduced optimization techniques for implementing the H.264/AVC Baseline profile decoder; this part turns to optimizing the Baseline profile encoder. Decoder optimization is tightly constrained, because the standard strictly specifies both the decoding algorithm and the syntax elements that must be decoded, which leaves little room to maneuver. The encoder is quite different: a Baseline profile encoder is only required to produce a bitstream that a Baseline profile decoder can decode correctly, and that bitstream need not use every syntax element the Baseline profile contains. Moreover, the standard does not specify which algorithms the encoder should use. An encoder implementation can therefore trade coding speed against coding quality and choose its algorithms according to the target application.
Like the decoder, our H.264/AVC encoder targets the Baseline profile and was derived by trimming the reference software JM6.1e. The trimming is relatively simple and follows essentially the same procedure as for the decoder, so it is not repeated here. Our optimization of the Baseline profile encoder focuses mainly on algorithmic improvements: an improved rate-distortion optimization (RDO) algorithm, a new fast detection method for SKIP macroblocks, and a fast motion search algorithm. The end of this part describes the rate control algorithm we adopted and presents experimental results.
4.1 Improved RDO algorithm
The encoder uses a rate-distortion optimization (RDO) algorithm to choose the best macroblock coding mode and motion vectors. The basic principle of RDO is to minimize D + λ·R, where D is the distortion, R is the bit rate and λ is the Lagrange multiplier; lower distortion at a lower bit rate means better coding efficiency. T. Wiegand (2002) summarized the RDO procedure used in the reference software in JVT-B118r8. The algorithm is as follows:
A) Given the reference frames, the Lagrange multiplier and the macroblock quantization parameter QP, where the Lagrange multiplier is
$$\lambda_{MODE} = 0.85 \times 2^{QP/3}$$
B) For the INTRA4×4 macroblock mode, select for each 4×4 block the intra prediction mode IMODE that minimizes
$$J(s, c, IMODE \mid QP, \lambda_{MODE}) = SSD(s, c, IMODE \mid QP) + \lambda_{MODE} \cdot R(s, c, IMODE \mid QP)$$
C) Determine the best INTRA16×16 prediction mode, i.e., the one with the minimum SATD;
D) For each 8×8 sub-macroblock, choose the motion vectors and reference frame that minimize the motion Lagrangian cost, and then determine the coding mode of each 8×8 partition by minimizing the mode Lagrangian cost.
Here SSD is the sum of squared differences between the original block s and the reconstructed signal c (after DCT, quantization and IDCT):
$$SSD(s, c, MODE \mid QP) = \sum_{x,y} \bigl(s[x,y] - c[x,y,MODE \mid QP]\bigr)^2$$
E) Choose the motion vectors and reference frames of the 16×16, 16×8 and 8×16 partitions by minimizing the motion Lagrangian cost;
F) Choose the macroblock prediction mode MODE that minimizes
$$J(s, c, MODE \mid QP, \lambda_{MODE}) = SSD(s, c, MODE \mid QP) + \lambda_{MODE} \cdot R(s, c, MODE \mid QP)$$
QP and λ_MODE are held fixed while the different modes are compared. The candidate macroblock modes (MODE) are:
I frame: MODE ∈ {INTRA4×4, INTRA16×16},
P frame: MODE ∈ {INTRA4×4, INTRA16×16, SKIP, 16×16, 16×8, 8×16, 8×8}.
At this point, the coding mode of the current macroblock has been selected.
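The mode decisions in steps B) and F) are argmin selections over Lagrangian costs. The following Python sketch illustrates that selection under simplifying assumptions: each candidate mode's distortion D (for example, the SSD defined above) and rate R in bits are taken as precomputed, and λ_MODE uses the expression exactly as written in step A). The function names are hypothetical; this is not the reference software's code.

```python
def lambda_mode(qp):
    """Lagrange multiplier as written in step A): 0.85 * 2^(QP/3)."""
    return 0.85 * 2 ** (qp / 3)


def ssd(orig, recon):
    """Sum of squared differences between original and reconstructed samples."""
    return sum((s - c) ** 2
               for row_s, row_c in zip(orig, recon)
               for s, c in zip(row_s, row_c))


def choose_mode(candidates, qp):
    """candidates maps mode -> (D, R); return the mode minimizing J = D + lambda * R."""
    lam = lambda_mode(qp)
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Made-up (D, R) pairs, for illustration only.
example = {"INTRA4x4": (1000, 120), "INTER16x16": (1200, 40)}
print(choose_mode(example, qp=0))   # INTRA4x4
print(choose_mode(example, qp=28))  # INTER16x16
```

With a small λ the low-distortion mode wins; as QP (and hence λ) grows, the rate term dominates and the cheaper mode is chosen.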
The reference software's RDO algorithm delivers good quality but is far too slow. To raise speed while leaving quality essentially unaffected, we substantially modified the rate-distortion algorithm. The improved RDO flow is as follows:
A) Given the reference frames, the Lagrange multiplier and the macroblock quantization parameter QP, where the Lagrange multiplier is
$$\lambda_{MODE} = 0.85 \times 2^{QP/3}$$
B) If the frame is an I frame, jump directly to step H);
C) Check whether the SKIP mode can be used; if it can, detection ends;
D) Choose the motion vectors and reference frames of the 16×16, 16×8, 8×16 and 8×8 partitions by minimizing the motion Lagrangian cost;
E) If 8×8 has the minimum cost in step D), go to step F); otherwise go to step G);
F) For each 8×8 partition, choose the motion vector and reference frame of each sub-block by minimizing the motion Lagrangian cost, and then determine the coding mode of each 8×8 partition by minimizing the mode Lagrangian cost;
G) If J is greater than a threshold (for example 512), go to step H) to test the intra modes; otherwise go to step J);
H) For the INTRA4×4 macroblock mode, select for each 4×4 block the intra prediction mode that minimizes
$$J(s, c, IMODE \mid QP, \lambda_{MODE}) = SSD(s, c, IMODE \mid QP) + \lambda_{MODE} \cdot R(s, c, IMODE \mid QP)$$
I) Determine the best INTRA16×16 prediction mode, i.e., the one with the minimum SATD;
J) Final step: choose the macroblock prediction mode that minimizes
$$J(s, c, MODE \mid QP, \lambda_{MODE}) = SSD(s, c, MODE \mid QP) + \lambda_{MODE} \cdot R(s, c, MODE \mid QP)$$
Compared with the RDO algorithm in the reference software, our improved RDO algorithm is much faster, mainly for the following reasons:
1) For P-frame macroblocks, SKIP detection is performed first (i.e., using the predicted motion vector as the 16×16 motion vector of the current macroblock, check whether CBP is 0; the fast detection algorithm is described in the next subsection). If the SKIP mode can be used, the RDO process terminates immediately.
2) After SKIP detection, the larger block sizes (16×16, 16×8, 8×16 and 8×8) are tested directly; block sizes smaller than 8×8 are tested only if 8×8 is selected among these four.
3) If the inter-mode result is already below the preset threshold, intra-mode detection is skipped.
4) If coding speed must be increased further, the Hadamard transform can be dropped, i.e., SAD is used directly for the SA(T)D term of the motion cost.
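Reason 4) trades SATD for SAD. The Python sketch below computes both for a 4×4 difference block (original minus prediction): SAD sums the absolute differences, while SATD first applies a 4×4 Hadamard transform H·D·Hᵀ and sums the absolute transform coefficients. Halving the SATD sum is a common normalization and is an assumption here; the function names are hypothetical.

```python
# 4x4 Hadamard matrix (entries +/-1; it happens to be symmetric).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul4(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def sad4x4(diff):
    """Sum of absolute differences."""
    return sum(abs(v) for row in diff for v in row)


def satd4x4(diff):
    """Sum of absolute Hadamard-transformed differences, halved with rounding."""
    h_t = [list(col) for col in zip(*H4)]        # H^T
    coeffs = matmul4(matmul4(H4, diff), h_t)
    return (sum(abs(v) for row in coeffs for v in row) + 1) // 2


flat = [[1] * 4 for _ in range(4)]
print(sad4x4(flat), satd4x4(flat))  # 16 8
```

A flat difference block has SAD 16 but SATD 8, because the Hadamard transform concentrates it into a single coefficient; SATD therefore tends to track the cost of coding the transformed residual more closely than SAD, at the price of the extra transform.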
4.2 Fast detection of SKIP macroblocks
In general, the bitstream of an intra-coded macroblock contains the macroblock type, the luma prediction modes, the chroma prediction mode, the QP delta, the CBP and the residual. The bitstream of an inter-coded macroblock contains the macroblock type, the reference frame indices, the difference between the motion vectors and the predicted motion vectors, the QP delta, the CBP and the residual.
Macroblocks in P frames may use the SKIP mode. The advantage of SKIP coding is that it uses very few bits: only the macroblock type information needs to be transmitted. A macroblock can use the SKIP mode only if all of the following five conditions hold:
1) the current frame is a P frame;
2) the best coding mode of the current macroblock is 16×16;
3) the CBP of the current macroblock equals zero;
4) the reference frame used by the current macroblock has index 0 in the reference frame list;
5) the best motion vector of the current macroblock equals the 16×16 predicted motion vector.
We propose the following four steps for fast SKIP detection:
Step 1: Compute the 16×16 motion vector predictor
Coding motion vectors costs a large number of bits, so video standards generally exploit the correlation between the motion vectors of neighboring blocks: the motion vector of the current block is predicted from those of its neighbors, and only the difference MVD between the actual and predicted motion vectors is coded. H.264/AVC adopts the same strategy, but because it allows more block types, the way the predicted motion vector MVp is formed is also different. The calculation of MVp depends on the block size and on whether the neighboring motion vectors exist. The calculation of the 16×16 motion vector predictor is briefly described below.
Let E be the current macroblock, A the block immediately to the left of E, B the block immediately above E, and C the block above and to the right of E. If more than one block adjoins the left side of E, the topmost is A; if more than one block adjoins the top of E, the leftmost is B. Figure 20(a) illustrates the choice of neighboring blocks when all blocks have the same size (16×16 in this example); Figure 20(b) shows how the prediction blocks are chosen when the neighboring blocks differ in size from the current block.
For a SKIP macroblock, the 16×16 predicted motion vector MVp is the median of the motion vectors of blocks A, B and C. If any of A, B and C is unavailable (for example, it lies outside the current slice, or its macroblock is intra-coded), the calculation of MVp is modified accordingly.
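The median prediction for a 16×16 block can be sketched as below, taking the median separately for the horizontal and vertical components. This sketch covers only the case where A, B and C are all available; the substitution rules for unavailable neighbors are not modeled, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers without sorting."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv_16x16(mv_a, mv_b, mv_c):
    """Component-wise median of neighbours A (left), B (above) and C (above-right)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


print(predict_mv_16x16((1, 2), (3, -1), (2, 5)))  # (2, 2)
```

Because the median is taken per component, the predicted vector need not equal any single neighbor's vector, as the example shows.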
Step 2: Check whether the CBP of the Y component is 0
Using the predicted motion vector MVp as the current motion vector, compute the prediction of the Y component and subtract it from the original samples of the current block to obtain the residual. Apply the DCT to each of the 16 4×4 residual blocks and quantize the result; as soon as any 4×4 block has a nonzero coefficient, set the CBP to 1 and terminate detection. The inventors propose a fast algorithm, ETAL (Early Termination Algorithm for Luma), to detect whether a 4×4 block contains nonzero coefficients; its principle is briefly described below.
H.264 uses a 4×4 integer transform:
$$Y = \left(C X C^{T}\right) \otimes E, \qquad C = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix}$$
where $C X C^{T}$ is the core 2-D transform of the 4×4 block X and E is the scaling-factor matrix; the symbol ⊗ means that each element of $C X C^{T}$ is multiplied by the scaling factor at the same position in E (element-wise multiplication, not matrix multiplication).
The forward quantization used in H.264/AVC is
$$Z_{ij} = \mathrm{round}\left(Y_{ij} / Q_{step}\right)$$
where $Y_{ij}$ is a coefficient of the transformed matrix, $Q_{step}$ is the quantization step and $Z_{ij}$ is the quantized coefficient. The H.264/AVC standard defines 52 values of $Q_{step}$, indexed by the quantization parameter QP. The range of quantization steps in H.264/AVC is wider than in other standards, which lets the encoder control bit rate and quality accurately and flexibly.
We assume that if the DC coefficient after the transform is zero, all AC coefficients are zero as well. This assumption is reasonable given the energy-compaction property of the DCT (after the transform, energy is concentrated in the low frequencies). For a 4×4 block, the DC term of $C X C^{T}$ is the sum of the 16 residual samples $x_{ij}$, so the quantized DC value is
$$|Z_{00}| = \Bigl(\bigl|\textstyle\sum_{i,j} x_{ij}\bigr| \times QE[q_{rem}] + f\Bigr) \gg qbits$$
Therefore, the quantized DC value is zero if
$$\bigl|\textstyle\sum_{i,j} x_{ij}\bigr| < \frac{2^{qbits} - f}{QE[q_{rem}]}$$
where qbits = 15 + QP/6 (integer division), q_rem = QP % 6, f = (1 << qbits)/6, QE is the quantization coefficient table and QP is the quantization parameter.
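The ETAL test for one 4×4 luma block can be written with integer arithmetic only, as in the Python sketch below. It assumes that the QE entries for the DC position are the standard H.264 multipliers (13107, 11916, 10082, 9362, 8192, 7282 for q_rem = 0 to 5) and uses f = (1 << qbits)/6 as in the text; the comparison dc·QE + f < 2^qbits is the condition above with the division removed. The function name is hypothetical.

```python
# Forward quantization multipliers for the DC position (0, 0), indexed by
# q_rem = QP % 6 (assumed to match the QE table referred to in the text).
QE_DC = [13107, 11916, 10082, 9362, 8192, 7282]


def etal_dc_is_zero(residual, qp):
    """ETAL: True if the DC coefficient of a 4x4 residual block quantizes to zero.

    The DC term of the core transform C X C^T is the sum of all 16 residual
    samples, so no transform needs to be computed.
    """
    qbits = 15 + qp // 6
    q_rem = qp % 6
    f = (1 << qbits) // 6                     # rounding offset, as in the text
    dc = abs(sum(sum(row) for row in residual))
    # Same as (dc * QE + f) >> qbits == 0, i.e. dc < (2^qbits - f) / QE.
    return dc * QE_DC[q_rem] + f < (1 << qbits)


block = [[53, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(etal_dc_is_zero(block, qp=28))  # True
```

At QP 28 the limit works out to a residual sum of 53. As the text states, treating a zero DC as an all-zero block is a heuristic: a block such as [[100, -100, 0, 0], ...] also passes, although its AC coefficients would quantize to nonzero values at this QP.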
Step 3: Check whether the CBP of the U component is 0
First compute the prediction of the U component and subtract it from the original samples of the current block to obtain an 8×8 residual matrix. Apply the DCT to each of the four 4×4 sub-blocks and collect their DC coefficients into a 2×2 matrix $W_D$, to which a 2×2 Hadamard transform is then applied to give $Y_D$.
The CBP of the U component depends on the quantized values of the $Y_D$ coefficients and on the quantized AC coefficients of each 4×4 sub-block after the DCT. As soon as any quantized coefficient is nonzero, the CBP of the U component is nonzero and detection can stop. The inventors propose a fast algorithm, ETAC (Early Termination Algorithm for Chroma), to detect whether an 8×8 chroma block contains nonzero coefficients. Its principle is as follows:
We assume that, because the $W_D$ matrix gathers all the DC components of the 8×8 block, the CBP of the U component is 0 if all four elements of $Y_D$ quantize to zero.
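Each of the four components of $W_D$ is the sum of the residual samples in the corresponding 4×4 sub-block (the DC term of its core transform). With sub-block (i, j) in block row i and block column j, and r(x, y) the residual at column x and row y:
$$W_D(i,j) = \sum_{x=0}^{3}\sum_{y=0}^{3} r(4j + x,\ 4i + y), \qquad i, j \in \{0, 1\}$$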
The four components of $Y_D$ are computed as:
$$\begin{aligned}
Y_D(0,0) &= W_D(0,0) + W_D(0,1) + W_D(1,0) + W_D(1,1),\\
Y_D(1,0) &= W_D(0,0) - W_D(0,1) + W_D(1,0) - W_D(1,1),\\
Y_D(0,1) &= W_D(0,0) + W_D(0,1) - W_D(1,0) - W_D(1,1),\\
Y_D(1,1) &= W_D(0,0) - W_D(0,1) - W_D(1,0) + W_D(1,1).
\end{aligned}$$
Each component of $Y_D$ is quantized as
$$|Z_D(i,j)| = \bigl(|Y_D(i,j)| \times QE[q_{rem}] + 2f\bigr) \gg (qbits + 1)$$
Therefore, the CBP of the U component equals 0 if all four components of $Y_D$ satisfy
$$|Y_D(i,j)| < \frac{2^{qbits+1} - 2f}{QE[q_{rem}]}$$
where qbits = 15 + QP_SCALE_CR[QP]/6 (integer division), q_rem = QP_SCALE_CR[QP] % 6, f = (1 << qbits)/6, QE is the quantization coefficient table, QP is the quantization parameter and QP_SCALE_CR is the constant table that maps QP to the chroma quantization parameter.
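A Python sketch of ETAC for one 8×8 chroma residual block follows. It assumes that the QE entries for the DC position are the standard H.264 multipliers (13107, 11916, 10082, 9362, 8192, 7282 for q_rem = 0 to 5) and that QP_SCALE_CR is the standard H.264 luma-to-chroma QP mapping; $W_D(i, j)$ is the sum of the sub-block in block row i and block column j, and the four $Y_D$ terms follow the formulas above. All names are hypothetical.

```python
QE_DC = [13107, 11916, 10082, 9362, 8192, 7282]
# H.264 chroma QP for luma QP 30..51 (the mapping is the identity below 30).
_CHROMA_QP_HIGH = [29, 30, 31, 32, 32, 33, 34, 34, 35, 35, 36,
                   36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 39]


def qp_scale_cr(qp):
    return qp if qp < 30 else _CHROMA_QP_HIGH[qp - 30]


def etac_cbp_is_zero(residual8x8, qp):
    """ETAC: True if all four 2x2-Hadamard DC terms of an 8x8 chroma residual quantize to 0."""
    qpc = qp_scale_cr(qp)
    qbits = 15 + qpc // 6
    q_rem = qpc % 6
    f = (1 << qbits) // 6
    # W_D(i, j): sum of the 16 samples of the 4x4 sub-block in block row i, block column j.
    w = [[sum(residual8x8[4 * i + y][4 * j + x] for y in range(4) for x in range(4))
          for j in range(2)] for i in range(2)]
    y_d = [w[0][0] + w[0][1] + w[1][0] + w[1][1],   # Y_D(0,0)
           w[0][0] - w[0][1] + w[1][0] - w[1][1],   # Y_D(1,0)
           w[0][0] + w[0][1] - w[1][0] - w[1][1],   # Y_D(0,1)
           w[0][0] - w[0][1] - w[1][0] + w[1][1]]   # Y_D(1,1)
    # (|Y_D| * QE + 2f) >> (qbits + 1) == 0  <=>  |Y_D| * QE + 2f < 2^(qbits + 1)
    return all(abs(v) * QE_DC[q_rem] + 2 * f < (1 << (qbits + 1)) for v in y_d)
```

The same function serves Step 4 unchanged, since the V component is tested with the identical ETAC rule.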
Step 4: Check whether the CBP of the V component is 0
First compute the prediction of the V component and subtract it from the original samples of the current block to obtain an 8×8 residual matrix, then use the ETAC method to check whether its CBP is 0.
4.3 Fast motion search algorithm
Like earlier video standards (H.261, MPEG-1, MPEG-2, H.263 and MPEG-4), H.264/AVC uses a hybrid coding framework of motion search and transform coding. In this framework, motion search mainly exploits the temporal redundancy between successive frames, and tests show that it is the most time-consuming module. To improve prediction accuracy and compression performance, H.264/AVC additionally adopts multiple prediction modes, multiple reference frames and high-precision motion vectors, which greatly increases the complexity and computational load of motion search. Experiments show that with a single reference frame, motion search accounts for 60% of the total encoding time in H.264/AVC, and with 5 reference frames it accounts for 80%. If rate-distortion optimization (RDO) is not used, or a larger search range (for example 48 or 64) is used, motion search takes an even larger share of the time.
Motion search usually consists of two steps: integer-pixel motion search, followed by fractional-pixel motion search around the best integer point. For fractional-pixel search, H.263, MPEG-1, MPEG-2 and MPEG-4 use 1/2-pixel precision, whereas H.264/AVC and the advanced profile of the MPEG-4 video standard (Part 2) use 1/4-pixel precision to describe motion more accurately and achieve higher compression.
Fast motion search algorithms have long been a research focus in the video field, and fast integer-pixel search algorithms are a particularly active topic. Existing fast integer-pixel search algorithms generally use different search step sizes and search patterns, which reduces computational complexity while preserving video quality. Classical motion estimation algorithms include the three-step search (TSS) proposed by T. Koga et al. (1981), the two-dimensional logarithmic search (2-D LOGS) proposed by A. Jain (1981), the block-based gradient descent search (BBGDS) proposed by L. Liu and E. Feig (1996), the four-step search (FSS) proposed by L. M. Po and W. C. Ma (1996), and the hexagon-based search (HEXBS) proposed by C. Zhu et al. (2002).
Building on these earlier algorithms and targeting the particular characteristics of motion search in H.264/AVC, we propose an improved hexagon-based fast algorithm for integer-pixel motion search and a diamond-based fast algorithm for fractional-pixel motion search, described in turn below.
4.3.1 Fast integer-pixel search algorithm
We first briefly describe the integer-pixel motion estimation algorithm used in the reference software: a spiral full search centered on the predicted motion vector. Starting from the predicted motion vector, it evaluates every point in the search range one by one in spiral order. Figure 21 shows the search order for a search range of 2. In practice, the search range should generally be at least 16.
The biggest drawback of this full search is the large number of search points: with a search range of 16, (16+1+16) × (16+1+16) = 1089 points must be searched. The fast motion search algorithms in the existing literature essentially all address this problem, reducing the number of search points as far as possible while preserving quality.
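The point count follows directly from the window size, (2·range + 1)². The Python sketch below enumerates candidate offsets ring by ring outward from the predicted motion vector; the visiting order within each ring is an assumption and need not match Figure 21 exactly, but the set of points and their number are the same. The function name is hypothetical.

```python
def spiral_points(search_range):
    """Integer-pel offsets visited ring by ring outward from the predicted MV (0, 0).

    The order inside each square ring (row by row) is an assumption; only the
    outward ring-by-ring progression and the total count matter here.
    """
    points = [(0, 0)]
    for k in range(1, search_range + 1):
        points.extend((x, y)
                      for y in range(-k, k + 1)
                      for x in range(-k, k + 1)
                      if max(abs(x), abs(y)) == k)   # points exactly on ring k
    return points


print(len(spiral_points(2)))    # 25
print(len(spiral_points(16)))   # 1089
```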
Our integer-pixel motion estimation algorithm pursues the same goal while fully accounting for the requirements of RDO. The algorithm flow is as follows:
Step 1: As shown in Figure 20, the motion vector predictor of the current block is the median of the motion vectors of the blocks to its left (Mv_A), above (Mv_B) and above-right (Mv_C), i.e., pred_mv = median(Mv_A, Mv_B, Mv_C). Take the predicted motion vector as the current best motion vector and compute its rate-distortion cost.
Step 2: Compute the rate-distortion cost of the (0,0) motion vector and compare it with that of the predicted motion vector pred_mv; the motion vector with the lower cost becomes the best motion vector.
Step 3: If the current block size is 16×16, compute the rate-distortion costs of Mv_A, Mv_B and Mv_C and compare them with that of the best motion vector; the motion vector with the minimum cost becomes the best motion vector.
Step 4: If the current block size is not 16×16, compute the rate-distortion cost of the motion vector of the parent (next-larger) block and compare it with that of the best motion vector; the motion vector with the lower cost becomes the best motion vector. Note that this RDO algorithm searches block sizes in descending order, so 16×16 is the parent of 16×8 and 8×16, 8×16 is the parent of 8×8, 8×8 is the parent of 8×4 and 4×8, and 4×8 is the parent of 4×4.
Step 5: Centered on the best motion vector, search the six points of the large hexagon around it, as shown in Figure 22. If the center has the minimum rate-distortion cost, this step ends; otherwise, the point with the minimum cost becomes the new center and the large hexagon search is repeated, until the center is the minimum-cost point. To speed up the search, the number of large hexagon iterations can be capped, for example at 16.
Step 6: Centered on the best motion vector, search the four points of the small search pattern around it, as shown in Figure 22. If the center has the minimum rate-distortion cost, this step ends; otherwise, the minimum-cost point becomes the new center and the small-pattern search is repeated, until the center is the minimum-cost point.
Step 7: The search stops; the current best motion vector is the best integer-pixel motion vector for the block at the current position and of the current size.
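Steps 1 to 7 can be sketched in Python as follows. Here cost is a hypothetical callable returning the rate-distortion cost of an integer motion vector, and start_candidates stands for the predictor-type vectors of Steps 1 to 4 (pred_mv, (0,0), Mv_A/B/C or the parent block's vector). The six large-hexagon offsets are those of the usual HEXBS pattern and the small pattern is the four nearest neighbors; both are assumptions about Figure 22, and clipping to the search window is omitted.

```python
# Large hexagon: six points around the centre (Step 5).
LARGE_HEX = [(2, 0), (-2, 0), (1, 2), (1, -2), (-1, 2), (-1, -2)]
# Small pattern: the four nearest neighbours (Step 6).
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def pattern_descent(cost, centre, pattern, max_iters=None):
    """Move to the cheapest pattern point until the centre itself is the cheapest."""
    best_cost = cost(centre)
    iters = 0
    while max_iters is None or iters < max_iters:
        iters += 1
        cx, cy = centre
        cand = min(((cx + dx, cy + dy) for dx, dy in pattern), key=cost)
        cand_cost = cost(cand)
        if cand_cost >= best_cost:
            break                      # centre is the minimum: this stage stops
        centre, best_cost = cand, cand_cost
    return centre, best_cost


def hex_integer_search(cost, start_candidates, max_large_iters=16):
    """Return (best integer MV, its cost)."""
    start = min(start_candidates, key=cost)                               # Steps 1-4
    centre, _ = pattern_descent(cost, start, LARGE_HEX, max_large_iters)  # Step 5
    return pattern_descent(cost, centre, SMALL_PATTERN)                   # Steps 6-7


bowl = lambda mv: (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2   # toy cost surface
print(hex_integer_search(bowl, [(0, 0)]))  # ((5, -3), 0)
```

On this toy bowl the large hexagon walks from (0, 0) to (5, -2) in three moves and the small pattern then takes the final step to (5, -3), far fewer than the 1089 points of a full search.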
4.3.2 Fast fractional-pixel search algorithm
Fractional-pixel motion search is carried out within the eight windows bounded by the integer pixels adjacent to the best integer point, as shown in Figure 23, where uppercase letters denote integer-pixel positions, digits denote half-pixel positions and lowercase letters denote quarter-pixel positions. The fractional-pixel search in the reference software proceeds in two steps. Suppose the best integer search point is E: first the half-pixel points labeled 1 to 8 are searched; then, supposing 7 is the best half-pixel position, the quarter-pixel points labeled a to h are searched.
This search method in the reference software must examine 16 fractional points to find the best quarter-pixel motion vector. If the Hadamard transform is also used, fractional-pixel search becomes very expensive. As a result, once integer-pixel motion search has been made fast, fractional-pixel motion search becomes the bottleneck for encoder speed.
We found that the small diamond search pattern shown as pattern 2 in Figure 22 is both simple and effective for integer-pixel motion search. We therefore extended this diamond pattern to fractional-pixel motion search to reduce the number of search points and increase execution speed.
The fast fractional-pixel motion search is implemented as follows. With the best integer point as the center, search the four vertices of a diamond with a 1/4-pixel step. If the center has the minimum rate-distortion cost, the integer point is the best position and the search stops; otherwise, the minimum-cost point becomes the new center and the diamond search continues within the fractional search window, until the center is the minimum-cost point.
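A Python sketch of this diamond refinement, working in quarter-pixel units, follows. cost is a hypothetical callable giving the rate-distortion cost of a quarter-pel motion vector, and the fractional search window is assumed to extend up to 3 quarter-pixels from the best integer point in each direction (the interior of the eight windows of Figure 23).

```python
DIAMOND = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # four diamond vertices, 1/4-pel step


def fractional_diamond_search(cost, best_int_mv_qpel, window=3):
    """Quarter-pel diamond refinement around the best integer MV (quarter-pel units)."""
    ox, oy = best_int_mv_qpel
    centre, best_cost = (ox, oy), cost((ox, oy))
    while True:
        cx, cy = centre
        # Keep only vertices that stay inside the fractional search window.
        cands = [(cx + dx, cy + dy) for dx, dy in DIAMOND
                 if abs(cx + dx - ox) <= window and abs(cy + dy - oy) <= window]
        if not cands:
            break
        cand = min(cands, key=cost)
        cand_cost = cost(cand)
        if cand_cost >= best_cost:
            break                      # centre is the minimum: stop
        centre, best_cost = cand, cand_cost
    return centre, best_cost


print(fractional_diamond_search(lambda mv: (mv[0] - 2) ** 2 + (mv[1] + 1) ** 2, (0, 0)))
# ((2, -1), 0)
```

If the integer point itself is cheapest, the sketch stops after evaluating only four fractional points, compared with the 16 evaluated by the reference software.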
4.4 Rate Control
Rate control is an essential component of an encoder; its function is to produce a high-quality bitstream at a given bit rate. Rate control has been studied extensively for video standards such as MPEG-2, H.263 and MPEG-4 (H. J. Lee, 2000; A. Vetro, 1999; J. R. Corbera, S. Lei, 1999; T. H. Chiang, Y. Q. Zhang, 1997; S. Kondo, H. Fukuda, 1997; L. M. Wang, 2001; W. Ding, B. Liu, 1996; S. H. Hong, 2003). However, rate control for H.264/AVC (S. W. Ma, 2003; Z. G. Li, 2003) is considerably more complex than for other standards, because in H.264/AVC both rate control and rate-distortion optimization depend on the quantization parameter. This creates the following dilemma: rate control needs the MAD (Mean Absolute Difference) of the current frame or macroblock to determine the quantization parameter, yet that MAD is only available after the quantization parameter has been determined and RDO has been performed.
Our rate control uses the CBR algorithm proposed by Z. G. Li (2003), an adaptive rate control algorithm based on basic units and a linear model. Its basic steps are as follows:
1. Compute the target bits of the current frame from the fluid flow traffic model and linear tracking theory, subject to upper and lower bounds.
2. Distribute the remaining bits of the current frame evenly among all basic units of the current frame that are still to be coded.
3. Using the linear model, predict the MAD of the current basic unit of the current frame from the actual MAD of the co-located basic unit in the previous frame.
4. Compute the corresponding quantization parameter with the quadratic rate-distortion model.
5. Perform rate-distortion optimization for each macroblock of the current basic unit.
When the basic unit is a frame, the rate control algorithm has two layers: GOP-level and frame-level rate control. When the basic unit is smaller than a frame, a third layer is added: basic-unit-level rate control.
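Steps 3 and 4 can be sketched in Python as below. This is a simplified illustration, not Z. G. Li's full algorithm: the linear-model parameters a1, a2 and the quadratic-model coefficients c1, c2 are passed in rather than updated adaptively, header bits and QP clipping are omitted, and the QP is chosen as the one whose H.264 quantization step (0.625 at QP 0, doubling every 6 QP) is closest to the model's solution. All names are hypothetical.

```python
import math

# H.264 quantization steps for QP 0..5; Qstep doubles every 6 QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def predict_mad(prev_mad, a1, a2):
    """Step 3: linear MAD prediction from the co-located basic unit of the previous frame."""
    return a1 * prev_mad + a2


def qp_from_target_bits(target_bits, mad, c1, c2, qp_min=0, qp_max=51):
    """Step 4: solve target_bits = c1*MAD/Q + c2*MAD/Q^2 for Q, then pick the nearest QP.

    target_bits must be positive (texture bits only; header bits are ignored here).
    """
    if c2 == 0:
        q = c1 * mad / target_bits
    else:
        # Multiply through by Q^2: T*Q^2 - c1*MAD*Q - c2*MAD = 0; take the positive root.
        b = c1 * mad
        q = (b + math.sqrt(b * b + 4 * target_bits * c2 * mad)) / (2 * target_bits)
    return min(range(qp_min, qp_max + 1), key=lambda qp: abs(qstep(qp) - q))


print(qp_from_target_bits(1.0, 4.0, c1=0.0, c2=1.0))  # 10  (Q = 2.0)
```

Solving the quadratic in closed form avoids any iteration; the nearest-QP mapping simply inverts the Qstep table.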
4.5 Other optimization measures and experimental results
The previous part introduced multimedia instruction set optimization for Intel CPUs, i.e., the use of MMX, SSE and SSE2 instructions. For the encoder, this kind of optimization is even more important, for two reasons: first, the encoder is far more computationally complex than the decoder, so a more powerful machine should be used to meet real-time encoding requirements; second, multimedia instructions are well suited to the encoder's algorithms.
Other techniques for improving encoder speed include: using small temporary buffers rather than accessing large global variables; aligning data boundaries; making extensive use of pointers; and so on.
Our optimizations raise the encoder's speed substantially with very little loss of quality. For comparison, we used the optimized encoder to re-encode the six standard test sequences in Figure 16, still with the configuration of Table 2; the new coding performance is shown in Table 11.
Table 11 Encoder performance after optimization
Comparing Table 11 with Table 3 gives the performance difference between the optimized and unoptimized encoders, shown in Table 12. Columns 3, 4, 5 and 6 of Table 12 are the values in Table 11 minus the corresponding values in Table 3. The last column is the speedup factor, i.e., the ratio of frame rates. Table 12 shows that our optimization strategy increases coding speed by tens of times at the cost of little distortion (at most 0.16 dB).
Table 12 Performance difference of the encoder after optimization compared with before optimization
The above test results were obtained on CIF (352×288) standard test sequences of up to 300 frames with a fixed QP. Since a major application of encoders is compressing films and TV programs, we next give subjective and objective test results for the film "Behind Enemy Lines" compressed at different bit rates. The video coding settings were: VGA size (640×480), automatic I-frame detection, a maximum I-frame interval of 130 frames, a frame rate of 24 frames/s, a maximum QP of 32, a minimum QP of 22, and rate control at the macroblock level. The audio coding settings were: MP3 coding at 48 kbit/s with a sampling rate of 24000 Hz. The original DVD of this film is 4.29 GB.
Table 13 gives the objective coding performance (SNR) of this film at different bit rates and the total size of the audio-video files produced with the above settings. Figure 24 shows an original frame from the film and the corresponding reconstructed images after compression. Comparing the four pictures shows that compression loses detail, and the lower the bit rate, the more detail is lost. Overall, the 800 kbit/s stream essentially reaches DVD quality; however, because DVD uses MPEG-2 coding, the total DVD file size is about 7 times that of the 800 kbit/s H.264/AVC file. For bandwidth-limited video-on-demand over networks, the 400 kbit/s stream is a very good choice. Of course, at the same bit rate, encoding at a smaller picture size can further improve video quality.
Table 13 Coding performance of the film "Deep in Enemy Rear" at different bit rates
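The file sizes in Table 13 follow, to first order, from the combined bit rate and the running time. A minimal sketch of that estimate is given below; it assumes both streams run at their average bit rate for the whole duration, uses decimal megabytes (10^6 bytes) and ignores container and header overhead, so real files will differ somewhat.

```python
def estimated_file_size_mb(video_kbps: float, audio_kbps: float,
                           duration_s: float) -> float:
    """Rough size of an audio/video file in megabytes (10**6 bytes).

    Assumes constant average bit rates for the whole duration and ignores
    container and header overhead.
    """
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000


# One hour at the 800K setting with the 48 kbit/s MP3 audio track.
print(estimated_file_size_mb(800, 48, 3600))
```

For one hour of material at the 800K setting plus 48 kbit/s audio this gives about 381.6 MB; the same formula explains why halving the video rate to 400K roughly halves the file.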
4.6 Summary
This part presented a complete algorithm optimization strategy for an H.264/AVC Baseline Profile encoder, proposing an improved rate-distortion optimization (RDO) algorithm, a novel fast "skip" macroblock detection method and a fast motion search algorithm. Experimental results show that our optimization strategy trades a small loss in quality (at most 0.16 dB) for a 29 to 47 times increase in encoding speed.
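For reference, the baseline that an improved RDO algorithm starts from is Lagrangian mode decision: each candidate macroblock mode is scored with J = D + λ·R and the cheapest is kept. The sketch below shows that standard formulation using the mode-decision multiplier of the JM reference encoder for I and P slices; it is not the improved RDO proposed in this document, and the candidate modes and their distortion and rate values in the usage note are made up for illustration.

```python
from typing import Iterable, Tuple


def lambda_mode(qp: int) -> float:
    """Lagrange multiplier for mode decision as used in the JM reference
    encoder for I and P slices: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def choose_mode(candidates: Iterable[Tuple[str, float, float]], qp: int) -> str:
    """Pick the macroblock mode with the smallest cost J = D + lambda * R.

    Each candidate is (mode_name, distortion, rate_bits), where distortion is
    the sum of squared differences between the original and reconstructed
    macroblock and rate_bits is the number of bits needed to code it.
    """
    lam = lambda_mode(qp)
    best_mode, _, _ = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best_mode
```

With hypothetical candidates `[("SKIP", 5000.0, 1.0), ("P16x16", 3000.0, 120.0), ("I16x16", 2500.0, 300.0)]`, QP 12 selects `I16x16`, QP 24 selects `P16x16` and QP 36 selects `SKIP`: as λ grows, cheap modes win. Computing D and R for every candidate is what makes full RDO slow, which is why fast skip detection and fast motion search pay off.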
5 Outlook for H.264/AVC
H.264/AVC is an international video standard for low to medium bit-rate applications. Its higher compression ratio and better network adaptability give it broad application prospects in the video industry, for example: real-time video communication, Internet video transmission, video streaming services, multipoint communication over heterogeneous networks, compressed video storage and video databases. The standard can be widely applied in fields such as distance education, video conferencing, video surveillance, video storage and transmission, and streaming media production, and it can operate over transmission networks of different media, such as IP networks, HFC cable networks and mobile communication networks.
In addition, on 17 November 2003 MPEG LA, the patent licensing administrator for the MPEG standards, announced its final MPEG-4 AVC licensing policy: the license fee per encoder or decoder product is at most US$0.20, an order of magnitude lower than the US$2.50 per unit charged for MPEG-2. MPEG LA's earlier MPEG-4 licensing terms also included clauses charging by content and by codec usage time; assuming two hours of viewing per day, a digital TV receiver would have had to pay more than 10 yuan per month in codec technology fees. The newly released policy abolishes time-based charging, but still levies annual license fees on program operators based on parameters such as the number of programs, the number of subscribers or the number of local broadcast stations. From the standpoint of patent licensing, too, H.264/AVC is bound to be applied ever more widely in digital video communication and storage.
From the standpoint of making H.264/AVC video coding technology practical and commercial, further research on the core H.264/AVC technology should be carried out in the following areas:
(1) Algorithm research and code optimization for the H.264/AVC video standard
This covers both MMX/SSE/SSE2 optimization of the codec on the PC platform and pure-C optimization of the codec for embedded systems on non-Intel/AMD platforms. Although we have done a great deal of research in this area, algorithm development has no end, and over time faster and better algorithms and implementations are bound to be developed.
(2) Research on and implementation of parallel processing for H.264/AVC video coding algorithms
Our current encoder implementation is designed for a single processor, whereas on the hardware side there are already CPUs with two logical processors (Hyper-Threading) and dual-core DSPs. How, then, can such dual processors be used to achieve a high degree of parallelism in the encoding flow? The H.264/AVC standard itself provides some parallel mechanisms, such as partitioning into slice groups, slices and macroblocks. In a real implementation, however, various difficulties remain, such as synchronization and the deblocking filter. Only by restructuring the H.264/AVC encoding algorithm and processing flow for a specific dual-processor platform can truly efficient parallelism be achieved, enabling real-time encoding at D1 size on a Hyper-Threading CPU or a low-cost dual-core DSP.
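Slice-level partitioning is the most direct of the parallel mechanisms mentioned above. The following is a structural sketch only, under stated assumptions: `encode_slice` and `deblock_frame` are hypothetical callbacks standing in for a real encoder, and Python threads illustrate the control flow rather than deliver CPU parallelism (the interpreter lock serializes pure-Python work). Intra prediction and motion-vector prediction do not cross slice boundaries in H.264/AVC, so slices can be coded independently; the in-loop deblocking filter may still cross them (unless `disable_deblocking_filter_idc` is set to 2), so it is run after a synchronization point.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_rows_into_slices(mb_rows: int, n_slices: int) -> List[range]:
    """Partition macroblock rows 0..mb_rows-1 into n contiguous slices
    whose sizes differ by at most one row."""
    if n_slices <= 0 or mb_rows < n_slices:
        raise ValueError("need 1 <= n_slices <= mb_rows")
    base, extra = divmod(mb_rows, n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        size = base + (1 if i < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


def encode_frame_parallel(mb_rows: int,
                          encode_slice: Callable[[range], T],
                          deblock_frame: Callable[[Sequence[T]], None],
                          workers: int = 2) -> List[T]:
    """Encode the slices of one frame concurrently, then deblock.

    encode_slice and deblock_frame are hypothetical callbacks. The deblocking
    step runs only after every slice has been reconstructed, because the
    filter may read samples on both sides of a slice edge.
    """
    slices = split_rows_into_slices(mb_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(encode_slice, slices))
    deblock_frame(results)  # synchronization point for the whole frame
    return results
```

A PAL D1 frame (720 × 576) has 36 macroblock rows, so two workers would each receive 18 rows. On a real dual-core DSP the same split would map to the two cores, with the deblocking pass either serialized or itself split with boundary handling.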
(3) Bit-rate conversion (transrating) and transcoding for the H.264/AVC video standard
H.264/AVC transrating refers to converting a high-bit-rate H.264/AVC stream into a lower-bit-rate one.
Transcoding refers to conversion among H.264/AVC, MPEG-2 and MPEG-4 bitstreams.
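One simple, widely used transrating approach, named here as an illustration rather than as the method this document intends, is open-loop requantization: the transform coefficient levels of the incoming stream are rescaled to a coarser quantizer without a full decode and re-encode. The sketch uses the nominal H.264 quantizer step sizes (the step doubles every 6 QP values); a conforming encoder would use the standard's integer scaling tables instead, and this open-loop method accumulates drift that the sketch does not address.

```python
from typing import List, Sequence

# Nominal quantizer step sizes for QP 0..5; the step doubles every 6 QP values.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Nominal H.264 quantizer step size for a QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


def requantize(levels: Sequence[int], qp_in: int, qp_out: int) -> List[int]:
    """Open-loop transrating sketch: map coefficient levels coded at qp_in
    to levels at a coarser qp_out, rounding magnitudes to the nearest integer."""
    ratio = qstep(qp_in) / qstep(qp_out)
    out = []
    for level in levels:
        mag = int(abs(level) * ratio + 0.5)
        out.append(mag if level >= 0 else -mag)
    return out
```

Raising QP from 22 to 28 doubles the step size, so `requantize([10, -7, 3, 0], 22, 28)` roughly halves every level, and small coefficients tend toward zero, which is where the bit-rate saving comes from.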
(4) Tracking the development of video coding standards and new technologies
While closely following the progress of the video standards organizations, we should study the new products and technologies of the world's well-known video technology providers, for example proprietary formats such as Real, QuickTime, DivX and Xvid, and learn from their strengths to offset our own weaknesses.
Also worth mentioning is the newly emerging domestic AVS Working Group (the digital audio/video coding technology standardization working group). The working group was approved for establishment in June 2002 by the Department of Science and Technology of the Ministry of Information Industry, and its host institution is the Institute of Computing Technology, Chinese Academy of Sciences. Its task is to meet the needs of China's information industry by bringing together domestic enterprises and research institutions to formulate (and revise) common technical standards for the compression, decompression, processing and representation of digital audio and video. These standards are intended to provide efficient and economical coding and decoding technology for digital audio/video equipment and systems, serving major information-industry applications such as high-definition digital broadcasting, high-density laser digital storage media, broadband wireless multimedia communication and Internet broadband streaming media. The official website of the AVS Working Group is http://www.avs.org.cn.
"AVS" is the short name of the "Information Technology: Advanced Audio and Video Coding" series of standards. The series comprises three main standards (systems, video and audio) plus supporting standards such as conformance testing. "Part 1: Systems" is compatible with the widely used international standard MPEG-2, makes specific provisions and definitions for applications such as digital TV, optical disc players, network streaming media and multimedia communication, and its framework supports the major domestic and international video and audio coding standards. "Part 2: Video" is the most complex part of the AVS standard; its coding efficiency is 2 to 3 times that of MPEG-2 Video, its implementation complexity is clearly lower than that of H.264/AVC, and it is suitable for HDTV. "Part 3: Audio" adopts a mainstream technical framework and is an audio coding scheme whose performance is similar to that of the international standards. Whether AVS will become a national standard remains to be seen.
At present, the commercial prospects of H.264/AVC mainly include IP video set-top box applications, H.264 codec DSP chip applications, very-low-bit-rate streaming media solutions for mobile platforms (GPRS/CDMA 1X), and IP audio/video collaboration applications (video conferencing, video telephony, video chat, video surveillance). Figure 25 illustrates the H.264/AVC media industry chain. Examples of products related to H.264/AVC video technology are given below:
(1) H.264/AVC codec development tools (SDK software products)
These are H.264/AVC codec SDK software products for different application fields. With the SDK, users can develop video projects such as network video-on-demand, video conferencing, video telephony, video chat and video surveillance applications.
(2) DSP chip products implementing H.264/AVC video encoding and decoding
This product series implements the H.264/AVC codec algorithms on specific DSP models from the world's mainstream DSP chip manufacturers (such as Analog Devices, Texas Instruments and Philips), adding value to the sale of those DSP chips.
(3) H.264/AVC audio/video encoding, transcoding and program production system products
This product is a software tool for producing and editing digital video programs, aimed at media organizations and value-added video operators. It can be sold separately as a software tool, or bundled with hardware and dedicated capture and output equipment as a video program production workstation. Figures 26 and 27 show examples of the production platform and of playback, respectively.
(4) H.264/AVC video IP set-top box products
The IP video set-top box application comprises three parts: first, the terminal H.264/AVC video decoding player software (Windows CE or Linux platform) together with a high-end H.264/AVC encoded video program capture and production system; second, IP video network application management software; third, the set-top box hardware.
(5) Smartphone streaming media player products
Mobile streaming media technology involves every link of hardware, software and system integration. A streaming media application system for public wireless networks (GPRS/CDMA 1X) based on H.264/AVC has already been developed. Work on this software mainly concerns developing and refining product versions for different mobile phone platforms. It has so far been implemented on the Pocket PC, Smartphone 2003 and BREW platforms; future product development will focus on Linux, Palm and Symbian, to make the product more widely adaptable.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200510066767 (CN1870748A) | 2005-04-27 | 2005-04-27 | Internet protocol TV |
| Publication Number | Publication Date |
|---|---|
| CN1870748A | 2006-11-29 |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102186079A | 2011-05-11 | 2011-09-14 | 北京航空航天大学 (Beihang University) | Motion-vector-based H.264 baseline profile intra mode decision method |
| CN102244781A | 2011-03-31 | 2011-11-16 | 苏州汉辰数字多媒体有限公司 (Suzhou Hanchen Digital Multimedia Co., Ltd.) | Audio video coding standard (AVS) video coder adopting high speed multi-core digital signal processor (DSP) platform |
| CN101742295B | 2008-11-14 | 2012-11-28 | 北京中星微电子有限公司 (Beijing Vimicro Electronics Co., Ltd.) | Image adaptive strip division-based adaptive frame/field encoding method and device |
| CN101918937B | 2007-12-05 | 2014-03-05 | 欧乐2号公司 (OL2, Inc.) | System for collaborative conferencing using streaming interactive video |
| CN103891288A | 2011-11-07 | 2014-06-25 | 株式会社Ntt都科摩 (NTT Docomo, Inc.) | Video prediction encoding device, video prediction encoding method, video prediction encoding program, video prediction decoding device, video prediction decoding method, and video prediction decoding program |
| CN104717498A | 2010-04-23 | 2015-06-17 | M&K控股株式会社 (M&K Holdings Inc.) | Image encoding apparatus |
- 2005-04-27: CN application CN 200510066767 (CN1870748A), status: not active (application discontinuation)
Legal Events
| Code | Title | Description |
|---|---|---|
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C02 | Deemed withdrawal of patent application after publication (Patent Law 2001) | |
| WD01 | Invention patent application deemed withdrawn after publication | Open date: 2006-11-29 |