CN105007483A

CN105007483A - Screen content encoding and decoding method compatible with H264 standard

Info

Publication number: CN105007483A
Application number: CN201510400827.6A
Authority: CN
Inventors: 王中元; 傅佑铭; 何政
Original assignee: Wuhan University WHU
Current assignee: Chengdu Suirui cloud Technology Co. Ltd.
Priority date: 2015-07-09
Filing date: 2015-07-09
Publication date: 2015-10-28
Anticipated expiration: 2035-07-09
Also published as: CN105007483B

Abstract

The invention discloses a screen content encoding and decoding method compatible with the H264 standard. It introduces dictionary compression into the conventional video coding framework, adding a coding mode for text content, namely dictionary encoding. Through joint optimization of bit rate and distortion, the most appropriate coding mode is selected for each image block: text regions generally select dictionary encoding, while other regions retain the original coding modes, which improves the compression quality of the many text regions in screen content. At the same time, by making appropriate use of a coding mode reserved in H264 and properly timing the dictionary encoding, compatibility with the standard technology is maintained. The method achieves higher compression quality while its bitstream remains compatible with the H264 standard.

Description

A screen content encoding and decoding method compatible with the H264 standard

Technical field

The invention belongs to the technical field of video encoding and decoding, relates to a screen content encoding and decoding method, and specifically relates to a screen content encoding and decoding method compatible with the H264 standard.

Technical background

In video conferencing, distance education and remote collaborative office systems, sharing computer screen content is an important function; screen sharing offers a fast way to present and share remote document material. A screen content image mixes text and graphics with natural images and covers many types, such as Word/PDF documents, PPT presentations and web pages, with the variety growing steadily. Because screen images also have high resolution and consume considerable network bandwidth, they must be compressed effectively.

The text and graphics in a mixed image contain much high-frequency information to which the human eye is sensitive. Conventional still image compression standards (such as JPEG) and video compression standards (such as H264) quantize the high-frequency part coarsely, relying on the eye's insensitivity to mid- and high-frequency information in natural images; applied directly to mixed images, they often blur the text and graphics. Some improved techniques intended to preserve the high-frequency information at text and graphics edges, such as direct spatial-domain quantization, palette coding and lossless compression, require modifying the kernel of the standard coding framework and so cannot remain compatible with standard decoders, which harms the interoperability of screen content sharing.

Summary of the invention

To solve the above technical problem, the present invention provides a screen content encoding and decoding method compatible with the H264 standard.

The technical solution adopted by the present invention is a screen content encoding and decoding method compatible with the H264 standard, characterized in that the encoding method comprises the following steps:

Step 1: estimate the number of coded bits per image block. Select a number of typical text screen contents to form a large training data set, perform dictionary encoding frame by frame on the images in the training set, and count the total number of bits produced; then divide this total by the total number of image blocks to obtain the bit count R of a single image block under the dictionary encoding mode;

Step 2: among the H264 standard coding modes and the dictionary encoding mode characterized in step 1, select the optimal coding mode for each image block using the rate-distortion optimization cost function; for an image block for which the dictionary encoding mode is selected, set its mode code to I_PCM but do not encode it immediately;

Step 3: collect the image block data by writing the data of each image block determined to be I_PCM into a common buffer; repeat step 2 until the whole frame has been processed;

Step 4: reorder the pixels of each buffered image block, including the luma component and the two chroma components, in column-major order, then perform dictionary encoding, and write the dictionary-encoded bitstream ahead of the H264 standard bitstream to form a composite bitstream.

Preferably, the typical text screen content described in step 1 comprises Word documents, PPT slides, web pages and CAD drawings.

Preferably, the dictionary encoding described in step 1 uses the Lempel-Ziv-Markov chain algorithm (LZMA).

Preferably, the selection of the optimal coding mode for each image block by the rate-distortion optimization cost function described in step 2 is implemented as follows: the distortion D and the bit count R of the image block are computed under the two kinds of coding modes, and the mode with the smallest joint cost J under the cost function J = D + λR is chosen as the optimal coding mode, where J is the joint rate-distortion cost and λ is the Lagrange multiplier, which weights the trade-off between distortion and bit rate.
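
The mode decision can be illustrated with a minimal Python sketch. The names `ModeCost`, `rd_cost` and `choose_mode`, as well as the distortion and bit figures, are hypothetical and chosen only for illustration; the sketch assumes that the dictionary mode contributes zero distortion and a pre-estimated bit count.

```python
from dataclasses import dataclass


@dataclass
class ModeCost:
    """Hypothetical container for one candidate coding mode of a block."""
    name: str
    distortion: float  # D, e.g. sum of squared errors; 0 for a lossless mode
    bits: float        # R, measured or pre-estimated bit count


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Joint rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_mode(candidates, lam: float) -> ModeCost:
    """Return the candidate with the smallest joint cost J."""
    return min(candidates, key=lambda m: rd_cost(m.distortion, m.bits, lam))


# Illustrative figures only: a lossy H264 mode versus the lossless dictionary mode.
h264_mode = ModeCost("h264_intra", distortion=900.0, bits=300.0)
dict_mode = ModeCost("dictionary", distortion=0.0, bits=420.0)

print(choose_mode([h264_mode, dict_mode], lam=4.0).name)
print(choose_mode([h264_mode, dict_mode], lam=16.0).name)
```

With a small λ the zero distortion of the dictionary mode dominates, and with a large λ its higher bit count does, which is why the accuracy of the estimate R matters for the decision.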

Preferably, the decoding method performed after encoding comprises the following steps:

Step 1: extract the dictionary bitstream from the composite bitstream and perform dictionary decoding to obtain the decoded sample data of all image blocks whose mode is I_PCM;

Step 2: sequentially scan the decoded sample data and parse the H264 bitstream; for each parsed image block whose mode is I_PCM, write its corresponding pixel sample data into the H264 bitstream;

Step 3: perform the standard H264 decoding process.
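
Step 2 of the decoding method can be sketched as follows. This is a simplified illustration, not the patented implementation: the function name `assign_pcm_samples` and the string mode labels are hypothetical, parsing of the real H264 syntax is omitted, and each I_PCM macroblock is assumed to carry 384 bytes (16x16 luma plus two 8x8 chroma planes, 8-bit 4:2:0).

```python
MB_BYTES = 16 * 16 + 2 * 8 * 8  # 384 bytes per macroblock, assuming 8-bit 4:2:0


def assign_pcm_samples(mb_modes, samples):
    """Map decoded sample bytes, in scan order, to macroblocks flagged I_PCM.

    mb_modes: list of mode labels, one per macroblock (hypothetical strings).
    samples:  bytes recovered by dictionary decoding.
    Returns a dict {macroblock index: 384-byte sample chunk}.
    """
    assigned = {}
    offset = 0
    for index, mode in enumerate(mb_modes):
        if mode != "I_PCM":
            continue
        chunk = samples[offset:offset + MB_BYTES]
        if len(chunk) != MB_BYTES:
            raise ValueError("dictionary-decoded data is shorter than expected")
        assigned[index] = chunk
        offset += MB_BYTES
    if offset != len(samples):
        raise ValueError("dictionary-decoded data is longer than expected")
    return assigned


modes = ["I_16x16", "I_PCM", "P_SKIP", "I_PCM"]
decoded = bytes([1]) * MB_BYTES + bytes([2]) * MB_BYTES
result = assign_pcm_samples(modes, decoded)
print(sorted(result))
```

A full implementation would write each chunk into the PCM sample fields of the corresponding macroblock and undo any pixel reordering applied by the encoder; both are left out of this sketch.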

Compared with the standardized H264 coding technique and with existing improved schemes for screen video coding, the present invention has the following advantages and beneficial effects:

(1) through the newly added dictionary encoding mode, the present invention applies lossless compression to text regions whose foreground is sparsely distributed, improving the coding quality and compression efficiency of screen video;

(2) the present invention uses the I_PCM mode, which H264 defines but which is seldom used, to indicate the dictionary encoding mode; the bitstream can therefore be recognized by a standard H264 decoder without modifying the decoder, maintaining standard compatibility;

(3) the newly added dictionary encoding mode does not require computing coding distortion and bit consumption block by block, and therefore adds no extra computational complexity.

Brief description of the drawings

Fig. 1: encoding flow chart of an embodiment of the present invention.

Fig. 2: decoding flow chart of an embodiment of the present invention.

Detailed description of the embodiments

To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described herein serve only to illustrate and explain the present invention and are not intended to limit it.

Screen content consists mainly of text and graphics. Foreground and background colors are clearly separated, the background color is uniform, and the foreground colors are relatively few, so the distribution of data in the pixel color space is sparse and generally concentrated on a small number of values. Local correlation between pixels within an image block is weak, so transform coding, which removes spatial correlation, is of limited use. Moreover, unlike natural-scene video, screen content generally contains no noise and text edges are sharp, so quantization distortion blurs the text edges and makes the text illegible. For these two reasons, the conventional transform-plus-quantization coding scheme is not suited to high-quality compression of screen content. By contrast, the sparse pixel distribution of the text regions of screen content is well suited to lossless dictionary compression.

Following this line of thinking, the present invention needs to solve three key problems:

(1) Existing compression standards such as H264 define multiple coding modes, and the present invention adds a further dictionary encoding mode. Typically, the existing coding modes are more effective for regions with natural-video properties, such as embedded images and animations, whereas the newly added dictionary mode is generally better suited to compressing text blocks; selecting the wrong coding mode reduces overall compression performance. It is therefore crucial to assign an appropriate coding mode accurately to each image block.

(2) The core principle of dictionary encoding is to search the historical data for a match to the current data and, if a match is found, to replace the original data with a (match length, match distance) pair, thereby compressing the data losslessly. The efficiency of dictionary encoding is therefore closely tied to the length of the data to be encoded: the more input data encoded at once, the higher the efficiency, and vice versa. Extensive experience with file compression tools such as WinZip and WinRAR confirms this. The raw data of a 16x16 pixel image block, however, is very short, and compressing blocks one at a time severely limits the performance of the dictionary encoder. Ensuring dictionary encoding efficiency through suitable data reorganization is therefore essential.

(3) The newly added dictionary encoding mode is obviously not accepted by decoders conforming to standards such as H264; if the bitstream is handed directly to a standard decoder, the decoder treats it as an error. Modifying the decoder kernel could make it compatible with the added mode, but in many cases the decoder is a black box to users and application developers and cannot be modified, as with hardware decoders. It is therefore important to restore the bitstream that mixes dictionary and standard coding to a standard bitstream before it is fed into the decoder kernel, thereby maintaining standard compatibility.
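
The length dependence described in problem (2) can be observed with a small experiment, sketched here with the `lzma` module of the Python standard library. The synthetic two-colour "glyph" blocks and the helper name `lzma_size` are illustrative assumptions; raw LZMA2 streams are used so that container headers do not inflate the per-block totals.

```python
import lzma
import random

# Raw LZMA2 filter chain, so that no container header is counted.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def lzma_size(data: bytes) -> int:
    """Size in bytes of the raw LZMA2 stream produced for `data`."""
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=RAW_FILTERS))


rng = random.Random(0)
# Eight synthetic two-colour 16x16 blocks (256 bytes each) standing in for glyphs.
glyphs = [bytes(rng.choice((0, 255)) for _ in range(256)) for _ in range(8)]
# A frame of 200 text-like blocks drawn from those glyphs.
frame_blocks = [rng.choice(glyphs) for _ in range(200)]

# Compressing each block on its own versus compressing the pooled data once.
per_block_total = sum(lzma_size(block) for block in frame_blocks)
pooled_total = lzma_size(b"".join(frame_blocks))

print("per-block total:", per_block_total, "bytes")
print("pooled total:   ", pooled_total, "bytes")
```

Pooling lets the encoder match a repeated block against earlier occurrences, which a 256-byte input processed in isolation cannot do.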

To address these problems, the present invention analyzes the features of the H264 coding standard and of representative dictionary compression techniques and proposes the following solutions, one for each problem.

(1) Video coding usually selects the optimal coding mode by rate-distortion optimization (RDO): the preferred coding mode jointly minimizes coding distortion and bit rate consumption, that is, it minimizes the cost function J = D + λR, where J is the joint cost, D is the image distortion introduced by lossy coding, R is the number of bits produced by coding in that mode, and λ is the Lagrange multiplier weighting distortion against bit rate, generally preset from statistics or experience. The original H264 coding modes perform lossy coding, so both the distortion D and the bit count R must be measured; the dictionary encoding mode is lossless, so its distortion is zero and only the bit count needs to be measured. In lossy coding, the parameters produced by stages such as transform and quantization are normally fed to the entropy coder, and R is determined from the number of bits actually produced. If the same approach were taken for the dictionary encoding mode, the compression efficiency would be underestimated, because a dictionary encoder performs poorly on small amounts of data; the resulting R would be higher than in the actual coding situation, making the mode decision inaccurate, and many blocks that should be judged as dictionary-encoded could be misjudged as other modes. Therefore, when measuring the bits required by a dictionary-encoded image block, the present invention does not actually encode the block but estimates a value for it in advance by training. Specifically, a number of typical text screen contents are selected to form a large training data set, dictionary encoding is performed frame by frame on the images in the training set, the total number of bits produced is counted, and this total is divided by the total number of image blocks to obtain the estimated bit count of a single image block under the dictionary encoding mode. This process is trained offline in advance, and during actual encoding the estimate replaces R in the RDO formula for the RDO selection.

(2) Because the compression efficiency of a dictionary encoder is limited when it processes block-level amounts of data, the present invention does not immediately and separately encode each image block judged to be in the dictionary encoding mode. Instead, the blocks are collected and, once the whole frame has been processed, dictionary-encoded together; the compressed bitstream is then multiplexed with the standard bitstream into one complete bitstream. In addition, the characters in a bitmap font are often taller than they are wide in pixels. To strengthen the repetition between neighboring pixels and so improve dictionary encoding efficiency, an image block with a 16x16 pixel matrix structure is therefore scanned not in the conventional row-major order but in column-major order. These two measures reorganize the image block data sensibly and effectively improve dictionary encoding efficiency.

(3) Normally, it is almost impossible to define a custom mode outside the standard in the encoder and still remain standard-compatible. In the present invention, the encoder kernel only makes the dictionary encoding mode decision and the encoding itself is carried out collectively outside the encoder, which creates the opportunity to maintain standard compatibility. Moreover, the H264 standard defines an I_PCM coding mode that directly encapsulates pixel sample data without any lossy or lossless compression; it is therefore hardly ever used in ordinary compression-oriented applications and can be used to indicate dictionary encoding. In summary, the standard compatibility strategy adopted at the encoder and decoder is as follows. The encoder kernel marks blocks judged to be in the dictionary encoding mode with the I_PCM mode but does not encode them; the data are collected and encoded collectively outside the encoder, and the resulting bitstream is multiplexed ahead of the normal H264 bitstream. Before kernel decoding, the decoder extracts the dictionary bitstream, performs dictionary decoding, then pre-scans the dictionary decoding result and the H264 bitstream and maps the sample data recovered by dictionary decoding, in order, to the macroblocks whose coding mode is I_PCM; once this preprocessing is complete, the standard H264 decoding operation is performed. Because the result of dictionary decoding is the original sample data, the H264 decoder interprets it according to the I_PCM mode without any ambiguity. With this post-processing step at the encoder and the corresponding pre-processing step at the decoder, the dictionary encoding method proposed by the present invention is compatible with a standard H264 decoder without modifying the decoder kernel.
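
The encoder-side post-processing and the matching decoder-side pre-processing can be sketched in Python as follows. This is a simplified illustration under stated assumptions: only 16x16 luma blocks are handled (chroma is omitted), the function names are hypothetical, and the 4-byte length prefix that delimits the dictionary bitstream is an assumed container layout, since the description does not specify how the two bitstreams are separated.

```python
import lzma
import struct

BLOCK = 16  # width and height of a luma block in pixels


def to_column_major(plane: bytes, size: int) -> bytes:
    """Reorder a row-major size x size plane so that columns are contiguous."""
    return bytes(plane[r * size + c] for c in range(size) for r in range(size))


def to_row_major(plane: bytes, size: int) -> bytes:
    """Inverse of to_column_major."""
    return bytes(plane[c * size + r] for r in range(size) for c in range(size))


def pack_frame(pcm_blocks, h264_stream: bytes) -> bytes:
    """Pool the blocks flagged I_PCM, dictionary-encode them once per frame,
    and place the result ahead of the H264 bitstream."""
    pooled = b"".join(to_column_major(block, BLOCK) for block in pcm_blocks)
    dict_stream = lzma.compress(pooled)
    # Assumed layout: 4-byte big-endian length, dictionary stream, H264 stream.
    return struct.pack(">I", len(dict_stream)) + dict_stream + h264_stream


def unpack_frame(composite: bytes):
    """Split the composite bitstream and recover the row-major blocks."""
    (length,) = struct.unpack(">I", composite[:4])
    pooled = lzma.decompress(composite[4:4 + length])
    step = BLOCK * BLOCK
    blocks = [to_row_major(pooled[i:i + step], BLOCK)
              for i in range(0, len(pooled), step)]
    return blocks, composite[4 + length:]


blocks_in = [bytes(range(256)), bytes([255]) * 256]
h264_in = b"\x00\x00\x00\x01\x65"  # placeholder bytes, not a valid H264 stream
blocks_out, h264_out = unpack_frame(pack_frame(blocks_in, h264_in))
print(blocks_out == blocks_in, h264_out == h264_in)
```

Because the dictionary stage is lossless, the recovered blocks are identical to the originals and can be handed to a standard decoder as I_PCM sample data.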

Referring to Fig. 1, the encoding method of the screen content encoding and decoding method compatible with the H264 standard provided by the present invention comprises the following steps:

Step 1: estimate the number of coded bits per image block. Select a number of typical text screen contents (including Word documents, PPT slides, web pages and CAD drawings) to form a large training data set, perform dictionary encoding frame by frame on the images in the training set, and count the total number of bits produced; then divide this total by the total number of image blocks to obtain the bit count R of a single image block under the dictionary encoding mode;

This embodiment selects representative screen content to form the training data set, comprising 50 Word document images, 50 PPT document images and 50 web page images; the dictionary encoder uses the LZMA (Lempel-Ziv-Markov chain algorithm) algorithm.
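
The offline estimation of R can be sketched with the `lzma` module of the Python standard library. The function name `estimate_bits_per_block` is hypothetical, each training frame is assumed to be available as raw sample bytes, and a block is assumed to occupy 384 bytes (16x16 luma plus two 8x8 chroma planes, 8-bit 4:2:0); the two synthetic frames stand in for the real training images.

```python
import lzma

MB_BYTES = 384  # assumed bytes per image block (8-bit 4:2:0 macroblock)


def estimate_bits_per_block(training_frames, block_bytes: int = MB_BYTES) -> float:
    """Dictionary-encode each training frame as a whole and return the
    average number of bits produced per image block."""
    total_bits = 0
    total_blocks = 0
    for frame in training_frames:
        total_bits += 8 * len(lzma.compress(frame))
        total_blocks += len(frame) // block_bytes
    if total_blocks == 0:
        raise ValueError("training set contains no complete image block")
    return total_bits / total_blocks


# Synthetic stand-ins for training frames: 10 and 20 uniform blocks.
frames = [bytes(MB_BYTES * 10), bytes([255]) * (MB_BYTES * 20)]
estimated_r = estimate_bits_per_block(frames)
print(f"estimated R: {estimated_r:.1f} bits per block")
```

The value is computed once offline and then substituted for R whenever the dictionary mode is evaluated in the rate-distortion decision.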

The optimal coding mode for each image block is chosen by the rate-distortion optimization cost function as follows: the distortion D and the bit count R of the image block are computed under the two kinds of coding modes, and the mode with the smallest joint cost J under the cost function J = D + λR is chosen as the optimal coding mode, where J is the joint rate-distortion cost and λ is the Lagrange multiplier, which weights the trade-off between distortion and bit rate.

The Lagrange multiplier in this embodiment is determined by the empirical formula λ = 2^(qp/6-2), where qp is the quantization parameter.
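
Reading the empirical formula as λ = 2^(qp/6 - 2), which is an interpretation of the expression above, the multiplier can be computed as in the following sketch; the function name `lagrange_multiplier` is hypothetical.

```python
def lagrange_multiplier(qp: float) -> float:
    """Empirical Lagrange multiplier, read as lambda = 2 ** (qp / 6 - 2)."""
    return 2.0 ** (qp / 6.0 - 2.0)


for qp in (12, 24, 36):
    print(qp, lagrange_multiplier(qp))
```

Under this reading λ doubles for every increase of 6 in qp, so bit rate is weighted more heavily as quantization becomes coarser.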

Referring to Fig. 2, the screen content decoding method compatible with the H264 standard provided by the present invention comprises the following steps:

Step 1: extract the dictionary bitstream from the composite bitstream and perform dictionary decoding to obtain the decoded sample data of all image blocks whose mode is I_PCM;

Step 2: sequentially scan the decoded sample data and parse the H264 bitstream; for each parsed image block whose mode is I_PCM, write its corresponding pixel sample data into the H264 bitstream;

Step 3: perform the standard H264 decoding process.

The present invention introduces dictionary compression into the conventional video coding framework, adding a coding mode for text content, namely dictionary encoding. Through joint optimization of bit rate and distortion, the most appropriate coding mode is selected for each image block: text regions generally select dictionary encoding, while other regions retain the original coding modes, improving the compression quality of the many text regions that occur in screen content. At the same time, by making appropriate use of a coding mode reserved in H264 and properly timing the dictionary encoding, compatibility with the standard technology is maintained. The present invention achieves higher compression quality while its bitstream remains compatible with the H264 standard.

It should be understood that the parts not elaborated in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Guided by the present invention and without departing from the scope protected by its claims, those of ordinary skill in the art may make substitutions or modifications, all of which fall within the protection scope of the present invention; the scope of protection claimed shall be determined by the appended claims.

Claims

1. A screen content encoding and decoding method compatible with the H264 standard, characterized in that the encoding method comprises the following steps:

Step 1: estimate the number of coded bits per image block. Select a number of typical text screen contents to form a large training data set, perform dictionary encoding frame by frame on the images in the training set, and count the total number of bits produced; then divide this total by the total number of image blocks to obtain the bit count R of a single image block under the dictionary encoding mode;

Step 2: among the H264 standard coding modes and the dictionary encoding mode characterized in step 1, select the optimal coding mode for each image block using the rate-distortion optimization cost function; for an image block for which the dictionary encoding mode is selected, set its mode code to I_PCM but do not encode it immediately;

Step 3: collect the image block data by writing the data of each image block determined to be I_PCM into a common buffer; repeat step 2 until the whole frame has been processed;

Step 4: reorder the pixels of each buffered image block, including the luma component and the two chroma components, in column-major order, then perform dictionary encoding, and write the dictionary-encoded bitstream ahead of the H264 standard bitstream to form a composite bitstream.

2. The screen content encoding method compatible with the H264 standard according to claim 1, characterized in that the typical text screen content described in step 1 comprises Word documents, PPT slides, web pages and CAD drawings.

3. The screen content encoding method compatible with the H264 standard according to claim 1, characterized in that the dictionary encoding described in step 1 uses the Lempel-Ziv-Markov chain algorithm (LZMA).

4. The screen content encoding method compatible with the H264 standard according to claim 1, characterized in that the selection of the optimal coding mode for each image block by the rate-distortion optimization cost function described in step 2 is implemented as follows: the distortion D and the bit count R of the image block are computed under the two kinds of coding modes, and the mode with the smallest joint cost J under the cost function J = D + λR is chosen as the optimal coding mode, where J is the joint rate-distortion cost and λ is the Lagrange multiplier, which weights the trade-off between distortion and bit rate.

5. The screen content encoding method compatible with the H264 standard according to claim 1, characterized in that the decoding method performed after encoding comprises the following steps:

Step 1: extract the dictionary bitstream from the composite bitstream and perform dictionary decoding to obtain the decoded sample data of all image blocks whose mode is I_PCM;

Step 2: sequentially scan the decoded sample data and parse the H264 bitstream; for each parsed image block whose mode is I_PCM, write its corresponding pixel sample data into the H264 bitstream;

Step 3: perform the standard H264 decoding process.