CN101883284B - Video encoding/decoding method and system based on background modeling and optional differential mode - Google Patents

Video encoding/decoding method and system based on background modeling and optional differential mode

Info

Publication number
CN101883284B
Authority
CN
China
Prior art keywords
data
background image
video
decoding
encoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010203823
Other languages
Chinese (zh)
Other versions
CN101883284A (en)
Inventor
高文 (Wen Gao)
张贤国 (Xianguo Zhang)
梁路宏 (Luhong Liang)
黄铁军 (Tiejun Huang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN 201010203823 priority Critical patent/CN101883284B/en
Publication of CN101883284A publication Critical patent/CN101883284A/en
Application granted granted Critical
Publication of CN101883284B publication Critical patent/CN101883284B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a video encoding/decoding method and system based on background modeling and an optional differential mode. The encoding method comprises the following steps: modeling the input video sequence to generate a background image, and encoding it to obtain a reconstructed background image; carrying out global motion estimation of pixel or sub-pixel accuracy on each input image to obtain a global motion vector; and encoding each video block by selectively using an original mode or a differential mode on the basis of the reconstructed background image and the global motion vector. The invention can improve coding performance, does not increase coding delay and, as the bitstream contains the background image itself, is beneficial to further processing.

Description

Video encoding/decoding method and system based on background modeling and an optional differential mode
Technical field
The present invention relates to video compression technology in the field of digital media processing, and more specifically to a video encoding/decoding method and system based on background modeling and an optional differential mode.
Background technology
Video compression (also referred to as video coding) is one of the key technologies in applications such as digital media storage and transmission; its objective is to reduce the amount of data to be stored and transmitted by eliminating redundant information. All current mainstream video compression standards adopt a block-based prediction-transform hybrid coding framework, which eliminates the statistical redundancy in video images (including spatial redundancy, temporal redundancy and information-entropy redundancy) through prediction, transform, entropy coding and similar methods, so as to reduce the data volume. Because video data tends to keep an unchanged scene over a certain period of time, background modeling techniques have, with their development and progress in recent years, been applied more and more to video coding; by making reasonable use of the background generated by modeling, the information redundancy in the video can be further eliminated and better compression performance obtained.
For the use of the modeled background, current methods fall into two classes. The first class departs from the block-based prediction-transform hybrid coding framework and adopts object-based video compression: a modeling algorithm generates the background, and techniques such as object detection, object tracking and foreground/background segmentation separate the individual objects in the video; by applying different compression schemes to different objects, the information redundancy in the video is further exploited and compression efficiency improved. Object-based video compression is therefore an important research direction for the video compression problem, but it faces two problems. First, object detection and segmentation in video remains an open problem in computer vision and image processing; existing methods are still not satisfactory in the precision and accuracy of detection and segmentation, which has become a bottleneck of object-based video compression. Second, the computational complexity of the above detection and segmentation methods is high, which hinders encoder implementation.
The second class of methods, which keeps the block-based prediction-transform hybrid coding framework, has therefore become another research direction that attracts much attention.
Summary of the invention
The present invention provides a video encoding/decoding method and system based on background modeling and an optional differential mode, with which coding performance can be improved.
In one aspect, the invention discloses a video coding method based on background modeling and an optional differential mode, comprising the following steps: a background modeling step, in which a background image is generated by modeling the input video sequence and a reconstructed background image is obtained after encoding; a global motion estimation step, in which global motion estimation of pixel or sub-pixel precision is performed on each input image to obtain a global motion vector; and a mode selection step, in which, based on the reconstructed background image and the global motion vector, each video block is encoded by selectively using an original mode or a differential mode.
In the above video coding method, preferably, in the background modeling step, generating the background image by modeling the input video sequence comprises: for each pixel position, finding the set of pixel values of that position in the training set and then traversing it; for each pixel value, judging the difference between the current pixel value and the next adjacent pixel value in the pixel set against a dynamic threshold generated at the current time, and, if the absolute difference is greater than the threshold, ending the current data segment and starting the next one, so that the whole pixel set of the current pixel position is divided into several data segments; assigning each segment a weight equal to the size of its data set; and calculating the background pixel value of the pixel position based on these weights.
In the above video coding method, preferably, the background modeling step further comprises a step of periodically re-selecting the training set to update the background image.
In the above video coding method, preferably, in the background modeling step, the background image is encoded to obtain the reconstructed background image, and the encoding method comprises: encoding the background image generated by modeling with a lossy or lossless image or video coding method; or treating all background images as one sequence and encoding it with a video coding method such as MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000 or MJPEG.
In the above video coding method, preferably, in the global motion estimation step, the global motion estimation comprises: taking a data block as the basic unit, performing a global integer-pixel or sub-pixel motion search on the current image with the reconstructed background image as the reference picture, and taking the median, the largest cluster, or the mean of the resulting set of motion vectors as the global motion vector of the current image.
In the above video coding method, preferably, in the mode selection step, the selection between the original mode and the differential mode used to encode each video block is made by comparing the rate-distortion results of the two modes.
In the above video coding method, preferably, the original-mode encoding is: according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of the reference data and its corresponding background data as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the data to be encoded.
In the above video coding method, preferably, the differential-mode encoding is: according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the original mode, taking the decoded difference between the reference data and its corresponding background data as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the difference between the data to be encoded and the corresponding data in the background image.
In the above video coding method, preferably, periodically re-selecting the training set to update the background image is specifically: the background image is updated per video segment, a video segment being a section of the input video sequence that is encoded using the same reconstructed background image, the whole input video sequence being regarded as a concatenation of consecutive video segments; during encoding, a training image set is selected from the current video segment and background modeling is performed to generate a background image for encoding the next video segment, so that when the current video segment is encoded, the background image generated during the encoding of the previous video segment is used.
In another aspect, the invention also discloses a video coding system based on background modeling and an optional differential mode, comprising: a background modeling module for generating a background image by modeling the input video sequence, a reconstructed background image being obtained after encoding; a global motion estimation module for performing global motion estimation of pixel or sub-pixel precision on each input image to obtain a global motion vector; and a mode selection module for encoding each video block by selectively using an original mode or a differential mode, based on the reconstructed background image and the global motion vector.
In the above video coding system, preferably, in the background modeling module, generating the background image by modeling the input video sequence is performed by a submodule configured to: for each pixel position, find the set of pixel values of that position in the training set and then traverse it; for each pixel value, judge the difference between the current pixel value and the next adjacent pixel value in the pixel set against a dynamic threshold generated at the current time and, if the absolute difference is greater than the threshold, end the current data segment and start the next one, so that the whole pixel set of the current pixel position is divided into several data segments; assign each segment a weight equal to the size of its data set; and calculate the background pixel value of the pixel position based on these weights.
In the above video coding system, preferably, the background modeling module further comprises a submodule for periodically re-selecting the training set to update the background image.
In the above video coding system, preferably, in the background modeling module, the background image is encoded to obtain the reconstructed background image, and the encoding method comprises: encoding the background image generated by modeling with a lossy or lossless image or video coding method; or treating all background images as one sequence and encoding it with a video coding method such as MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000 or MJPEG.
In the above video coding system, preferably, the global motion estimation module further comprises a submodule for taking a data block as the basic unit, performing a global integer-pixel or sub-pixel motion search on the current image with the reconstructed background image as the reference picture, and taking the median, the largest cluster, or the mean of the resulting set of motion vectors as the global motion vector of the current image.
In the above video coding system, preferably, in the mode selection module, the selection between the original mode and the differential mode used to encode each video block is made by comparing the rate-distortion results of the two modes.
In the above video coding system, preferably, the original-mode encoding is: according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of the reference data and its corresponding background data as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the data to be encoded.
In the above video coding system, preferably, the differential-mode encoding is: according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the original mode, taking the decoded difference between the reference data and its corresponding background data as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the difference between the data to be encoded and the corresponding data in the background image.
In the above video coding system, preferably, periodically re-selecting the training set to update the background image is specifically: the background image is updated per video segment, a video segment being a section of the input video sequence that is encoded using the same reconstructed background image, the whole input video sequence being regarded as a concatenation of consecutive video segments; during encoding, a training image set is selected from the current video segment and background modeling is performed to generate a background image for encoding the next video segment, so that when the current video segment is encoded, the background image generated during the encoding of the previous video segment is used.
In a further aspect, the invention also discloses a video decoding method corresponding to the above video coding method, comprising: decoding the background image and the global motion vector; and decoding each video block in the original mode or in the differential mode.
In the above video decoding method, preferably, the original-mode decoding comprises: if the data to be decoded has been encoded in the original mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of the reference data and its corresponding background data as the reference, otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the data to be decoded.
In the above video decoding method, preferably, the differential-mode decoding comprises: if the data to be decoded has been encoded in the differential mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the original mode, taking the decoded difference between the reference data and its corresponding background data as the reference, otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the current data, the decoded data then being superposed with the corresponding data in the background image.
In a further aspect, the invention also discloses a video decoding system corresponding to the above video coding system, comprising: a module for decoding the background image and the global motion vector; and a module for decoding each video block in the original mode or in the differential mode.
In the above video decoding system, preferably, the original-mode decoding comprises: if the data to be decoded has been encoded in the original mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of the reference data and its corresponding background data as the reference, otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the data to be decoded.
In the above video decoding system, preferably, the differential-mode decoding comprises: if the data to be decoded has been encoded in the differential mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the original mode, taking the decoded difference between the reference data and its corresponding background data as the reference, otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the current data, the decoded data then being superposed with the corresponding data in the background image to obtain the final decoded data.
Compared with the prior art, the present invention has the following features: first, no object or foreground/background segmentation is performed; second, coding is carried out in units of blocks or macroblocks; third, global motion compensation is added; fourth, a mode-selection mechanism chooses the better of the two classes of coding modes to guarantee coding efficiency. The present invention can improve coding performance, does not increase coding delay and, because the bitstream itself contains the background image, is advantageous for further processing.
Description of drawings
Fig. 1 is a flow chart of the steps of an embodiment of the video coding method based on background modeling and an optional differential mode according to the present invention;
Fig. 2 is a block diagram of an encoding/decoding framework for implementing the video coding method of the present invention;
Fig. 3 shows the correspondence between data in the current image and data in the background image under global motion estimation;
Fig. 4 is a schematic diagram of the background modeling process;
Fig. 5 is a schematic diagram of the selection of the training set;
Fig. 6 is an example of encoding in which the current data to be encoded is encoded in the differential mode and the prediction reference has been encoded in the original mode;
Fig. 7 is an example of encoding in which the current data to be encoded is encoded in the differential mode and the prediction reference has been encoded in the differential mode;
Fig. 8 is an example of encoding in which the current data to be encoded is encoded in the original mode and the prediction reference has been encoded in the differential mode;
Fig. 9 is an example of decoding in which the current data to be decoded has been encoded in the differential mode and the prediction reference has been encoded in the original mode;
Fig. 10 is an example of decoding in which the current data to be decoded has been encoded in the differential mode and the prediction reference has been encoded in the differential mode;
Fig. 11 is an example of decoding in which the current data to be decoded has been encoded in the original mode and the prediction reference has been encoded in the differential mode.
Embodiment
To make the above objects, features and advantages of the present invention more apparent, the present invention is described in further detail below in conjunction with the drawings and specific embodiments.
The present invention exploits the characteristic that the scene in a video sequence remains fixed over a certain period of time: a background modeling method is used to describe the relatively static scene in the video and to generate a background image. During encoding, by building and updating this background image, each data block is encoded by optionally using a newly introduced differential coding mode, so that the redundancy in the video sequence is eliminated to a greater extent and better compression performance is obtained. Accordingly, the basic idea of the coding method is as follows: model and update the background image that describes the static scene information of the video images; optionally perform global motion estimation to obtain the global motion vector of each image; then, for each block, select the optimal coding mode between the original coding mode, which encodes the original data directly, and the differential coding mode, which encodes the difference between the original data and the corresponding background data. In this scheme, the background image is obtained from the original input video by background modeling, and the generated background image must be written into the bitstream; if global motion estimation is used, the global motion vector must also be written into the bitstream. The decoder is designed to match the encoder: when decoding the bitstream, the background image and the optionally coded global motion vector are decoded first.
Referring to Fig. 1, which is a flow chart of the steps of an embodiment of the video coding method based on background modeling and an optional differential mode according to the present invention, the method comprises the following steps:
Background modeling step S1: a background image is generated by modeling the input video sequence, and a reconstructed background image is obtained after the background image is encoded and decoded. Global motion estimation step S2: global motion estimation of pixel or sub-pixel precision is performed on each input image to obtain a global motion vector. Mode selection step S3: based on the reconstructed background image and the global motion vector, each video block is encoded by selectively using the original mode or the differential mode.
The above embodiment has the following features: first, no object or foreground/background segmentation is performed; second, coding is carried out in units of blocks or macroblocks; third, global motion compensation is added; fourth, a mode-selection mechanism chooses the better of the two classes of coding modes for each block, to guarantee coding efficiency (a sketch of this selection follows). The present invention can improve coding performance, does not increase coding delay, and, because the bitstream itself contains the background image, is advantageous for further processing. Moreover, video compression based on background modeling is particularly suitable for video surveillance, video conferencing, smart rooms and similar applications, whose videos are characterized by long-lasting scenes and a very low shot-switching frequency, which favors using background modeling to improve compression efficiency.
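The mode selection of step S3 compares the rate-distortion results of the two modes. The following is a minimal sketch, assuming a Lagrangian cost J = D + λ·R; the function names and return conventions are illustrative assumptions, not definitions taken from the patent.

```python
# A minimal sketch (Python) of per-block mode selection by rate-distortion comparison.
# The Lagrangian cost J = D + lambda * R and the helper names are assumptions.
def select_mode(block, background_block, encode_original, encode_differential, lam):
    """encode_* return (distortion, bits, payload) for the block under the given mode."""
    d_o, r_o, payload_o = encode_original(block)
    d_d, r_d, payload_d = encode_differential(block, background_block)
    cost_original = d_o + lam * r_o          # rate-distortion cost of the original mode
    cost_differential = d_d + lam * r_d      # rate-distortion cost of the differential mode
    if cost_differential < cost_original:
        return "differential", payload_d
    return "original", payload_o
```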
Referring to Fig. 2, the present invention can be carried out under the encoding/decoding framework shown in Fig. 2. On the encoder side, the framework comprises seven functional units: background modeling, optional global motion estimation, global motion compensation, background image encoding, background image decoding, differential-mode encoding and original-mode encoding, which respectively perform the background modeling algorithm, the optional global motion estimation algorithm and the data compensation against the background image, the background image encoding algorithm, the background image decoding algorithm, the data encoding algorithm in the differential mode, and the data encoding algorithm in the original mode. The corresponding decoder is composed of differential-mode decoding, original-mode decoding, background image decoding, and difference-plus-background superposition units, which respectively decode data coded in the differential mode, decode data coded in the original mode, decode the background image, and superpose the differential decoding result with the corresponding data in the background image.
As shown in Fig. 2, on the encoder side the present embodiment comprises: a background image modeling operation that receives the input video sequence; a background image encoding operation that receives the output of the background image modeling operation; a background image decoding operation connected to the output of the background image encoding operation; an optional global motion estimation operation that receives the input video sequence and the output of the background image decoding; a global motion compensation operation that receives the optional global motion vector, the input sequence and the output of the background image decoding; a differential-mode encoding operation that receives the differential data output by the global motion compensation operation; an original-mode encoding operation that receives the input video sequence and the optional global motion vector; and finally, through mode selection, the coded information and bitstream output by either the differential-mode encoding operation or the original-mode encoding operation are selected. On the decoder side, the embodiment comprises: a background image decoding operation that receives the bitstream produced by the encoder; a differential-mode decoding operation that optionally receives input bitstream information according to the mode-selection flag; an original-mode decoding operation that optionally receives input bitstream information according to the mode-selection flag; and a difference-plus-background superposition operation that receives the output of the differential block decoding and the output of the background image decoding. Each function and operation of the video encoding/decoding method of the present invention shown in Fig. 2 is described in detail below:
One, background image modeling
1. For the input video sequence, a training image set is selected, and a background image is generated by modeling and passed to the video encoding module for compression. Before the first background image is generated, this module outputs the first frame. Taking the luminance component as an example, the modeling methods that may be used include, but are not limited to, the background modeling algorithm shown in Fig. 3, which comprises the following steps:
Step S11: initialize the mean value and the threshold for the current pixel position;
Step S12: create a data segment and initialize its weight and mean value;
Step S13: read a new pixel value from the training set;
Step S14: judge whether the squared difference between temporally adjacent pixel values is greater than the threshold; if not, go to step S15a; if so, go to step S15b;
Step S15a: update the mean value of the current data segment;
Step S16a: update the threshold under the current weight;
Step S17a: increase the weight of the current data segment by 1;
Step S15b: update the overall mean and the threshold under the weight of the current data segment;
Step S16b: create a new data segment and initialize its weight and mean value;
Step S18: after step S17a or step S16b has been executed, judge whether the training set has been fully traversed; if so, go to step S19; if not, return to step S13;
Step S19: update the overall mean under the weight of the current data segment, and take the overall mean as the background value of the pixel.
In Fig. 3, the overall mean and the mean value and weight of each new data segment are all initialized to 0. The threshold is initialized to the pixel-level mean of the squared differences between the first two frames of the training set. Both the overall mean and the threshold are updated based on the weights. Let the weight of data segment i be W_i, its pixel sum be Sum_i, and the sum of squared differences of all adjacent pixels within data segment i be T_i; the overall mean AVG and the threshold Th are calculated as in formulas (1) and (2):
AVG = Σ_{j=1..i} (Sum_j × W_j) / Σ_{j=1..i} (W_j²),   (1)
Th = Σ_{j=1..i} (T_j × W_j) / Σ_{j=1..i} (W_j²).   (2)
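The following is a minimal per-pixel sketch of the segment-weighted modeling of steps S11-S19 and formulas (1) and (2), written in Python. The function name is an assumption, and recomputing the threshold after every sample, as well as initializing it from the single pixel's first two values, are simplifications of the description above rather than the patent's exact procedure.

```python
# A minimal sketch (assumed Python/NumPy) of the segment-weighted background value of one
# pixel position, following steps S11-S19 and formulas (1) and (2).
import numpy as np

def model_background_pixel(samples):
    """samples: 1-D array of the values of one pixel position over the training set."""
    samples = np.asarray(samples, dtype=np.float64)
    # Simplified threshold init: squared difference of the first two samples of this pixel
    # (the text uses the pixel-level mean of the squared difference of the first two frames).
    th = (samples[1] - samples[0]) ** 2 if len(samples) > 1 else 0.0
    seg_sum, seg_sq, seg_w = [samples[0]], [0.0], [1.0]   # Sum_j, T_j, W_j of the segments
    for prev, cur in zip(samples[:-1], samples[1:]):
        d2 = (cur - prev) ** 2
        if d2 > th:                              # current segment ends, start a new one
            seg_sum.append(cur); seg_sq.append(0.0); seg_w.append(1.0)
        else:                                    # continue the current segment
            seg_sum[-1] += cur; seg_sq[-1] += d2; seg_w[-1] += 1.0
        w = np.array(seg_w)
        # Dynamic threshold, formula (2): Th = sum(T_j * W_j) / sum(W_j^2)
        th = float(np.dot(seg_sq, w) / np.dot(w, w))
    w = np.array(seg_w)
    # Background value, formula (1): AVG = sum(Sum_j * W_j) / sum(W_j^2)
    return float(np.dot(seg_sum, w) / np.dot(w, w))
```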
Corresponding to the sequential structure required by the background image updating described above, each time a training image set is input, background modeling is performed anew and a new background image is generated, completing the background image update. The modeling process for the chrominance components is identical.
Two, background image encoding operation
The background image generated by the background modeling module is compression-encoded, the coding result is written into the bitstream, and it is passed to the background image reconstruction module. The encoders that may be used include, but are not limited to, MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000 and MJPEG. The encoder configurations include, but are not limited to, independent intra coding of each background image, treating all background images as a sequence and coding it with an IPPP structure, and lossless compression of each background image. In a specific embodiment, the background image is encoded with QP = 0 using an AVS coding technique extended to a 9-bit input bit width.
Three, background image decode operation
The background image bitstream output by the background image encoding module is decoded and reconstructed and, in order to keep the encoder and decoder matched, the reconstructed background image is passed to the reconstructed-background compensation operation. In a specific embodiment, the AVS decoding technique extended to 9 bits is used to decode the background image bitstream, and the decoded output is also 9 bits.
Four, optional global motion estimation operation
Global motion estimation is applied between the reconstructed background image output by the background image decoding operation and the current input video image, yielding a global motion vector. Methods include, but are not limited to: taking a data block as the basic unit, performing a global integer-pixel or sub-pixel motion search on the current image with the reconstructed background image as the reference picture, and taking the median, the largest cluster, or the mean of the resulting motion vectors as the global motion vector of the current image. The resulting global motion vector must be written into the bitstream; if the global motion vector is zero, a corresponding flag may be written into the bitstream instead.
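The following is a minimal sketch of the block-based variant described above: full-search integer-pixel block matching against the reconstructed background, followed by the component-wise median of the block motion vectors. It is written in Python with assumed function and parameter names; the patent equally allows sub-pixel search and the largest-cluster or mean statistics.

```python
# A minimal sketch (assumed Python/NumPy) of the optional global motion estimation:
# block-wise SAD search against the reconstructed background, median of the block vectors.
import numpy as np

def global_motion_vector(cur, background, block=16, search=8):
    """cur, background: 2-D arrays of the same size (e.g. the luminance plane)."""
    h, w = cur.shape
    mvs = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            blk = cur[y:y + block, x:x + block].astype(np.int64)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = y + dy, x + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue
                    ref = background[ry:ry + block, rx:rx + block].astype(np.int64)
                    sad = int(np.abs(blk - ref).sum())   # sum of absolute differences
                    if best is None or sad < best:
                        best, best_mv = sad, (dy, dx)
            mvs.append(best_mv)
    if not mvs:
        return (0, 0)
    # Median per component of the block motion vectors (largest cluster or mean also allowed).
    return tuple(int(v) for v in np.median(np.array(mvs), axis=0))
```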
Five, global motion compensation operation
Using the global motion vector generated by the global motion estimation, the data in the background image corresponding to the original data currently to be encoded is located, as shown in Fig. 4. The matched background data is subtracted from the original data to be encoded, and the difference result is passed to the differential-mode encoding operation for encoding.
Six, differential-mode encoding operation
The differential data block output by the reconstructed-background compensation operation is compression-encoded. The selected coding algorithm must match the format of the differential data block; for example, when the differential image is output as 9 bits, a block compression algorithm configured for 9 bits should be selected, as described for the reconstructed-background compensation operation. In this mode, both the prediction reference data used for encoding this data block and the current data of the block are differenced against the corresponding data in the reconstructed background image, so that all data referenced by this block are differences. Taking the luminance difference with a global motion vector of 0 as an example, let s(x, y) be the luminance pixel value of the data to be differenced at position (x, y), and b(x, y) the luminance pixel value of the final reconstructed background image at position (x, y); the differential data can be computed by, including but not limited to, either of the following two formulas:
r(x,y) = Clip1(s(x,y) - b(x,y) + 256),   (3)
r(x,y) = Clip2(((s(x,y) - b(x,y)) >> 1) + 128),   (4)
where Clip1 limits the computed result to [0, 511] and Clip2 limits it to [0, 255]; in both cases an out-of-range value is clamped to the nearest bound. In combination with the global motion compensation operation, when the prediction reference data of the current data block was coded in the original mode, this unit operates as shown in Fig. 5; when the prediction reference data of the current data block was coded in the differential mode, it operates as shown in Fig. 6. For intra prediction, the reference picture in Figs. 5 and 6 can be taken to be the current image.
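As an illustration of formulas (3) and (4), a minimal Python sketch follows. The helper names clip1, clip2 and the diff_pixel_* functions are assumptions for illustration; only the per-pixel arithmetic is shown.

```python
# A minimal sketch (Python; names are assumptions) of the differential-pixel computations of
# formulas (3) and (4) for 9-bit sample values s (current) and b (reconstructed background).
def clip1(v):                      # limit to [0, 511]
    return max(0, min(511, v))

def clip2(v):                      # limit to [0, 255]
    return max(0, min(255, v))

def diff_pixel_9bit(s, b):
    """Formula (3): full-precision difference, re-centered by 256."""
    return clip1(s - b + 256)

def diff_pixel_8bit(s, b):
    """Formula (4): difference halved and re-centered to an 8-bit range."""
    return clip2(((s - b) >> 1) + 128)
```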
Seven, original-mode encoding operation
The input data block is compression-encoded. The selected coding algorithm must match the input data block; for example, when the input data block is 9 bits, a block compression algorithm configured for 9 bits should be selected, as described for the differential-image computation. In this mode, the corresponding data in the reconstructed background image is added back to the prediction reference data used for encoding this data block, so that all data referenced by this block are non-differential. Taking the luminance computation with a global motion vector of 0 as an example, let s(x, y) be the decoded luminance pixel value at position (x, y) of differentially coded data, and b(x, y) the luminance pixel value of the decoded background image at position (x, y); the final reference value r(x, y) can be computed by, including but not limited to, either of the following two formulas:
r(x,y) = Clip1(s(x,y) - 256 + b(x,y)),   (5)
r(x,y) = Clip2(((s(x,y) - 128) << 1) + b(x,y)),   (6)
When the prediction reference data of the current data block was coded in the original mode, this unit encodes the data block directly against the decoded reference. When the prediction reference data of the current data block was coded in the differential mode, this unit operates as shown in Fig. 7. For intra prediction, the reference picture in Fig. 7 can be taken to be the current image.
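A minimal Python sketch of formulas (5) and (6) follows, reusing the assumed clip1/clip2 helpers above; it shows only how a non-differential reference value is rebuilt from a differentially coded sample s and the decoded background sample b.

```python
# A minimal sketch (Python; names are assumptions) of formulas (5) and (6).
def restore_reference_9bit(s, b):
    """Formula (5): inverse of the full-precision difference of formula (3)."""
    return clip1(s - 256 + b)

def restore_reference_8bit(s, b):
    """Formula (6): inverse of the halved, re-centered difference of formula (4)."""
    return clip2(((s - 128) << 1) + b)
```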
In implementing the above coding method, the periodic background image updating can be described and realized through a new video sequence structure: the video segment. A video segment is a long section of the input video sequence (several hundred frames or more), and the whole input video sequence can be regarded as a concatenation of consecutive video segments. Each video segment uses the same reconstructed background image to compute difference images. During encoding, the background modeling module selects a training image set from the current video segment and performs background modeling to generate a background image used for encoding the next video segment. Viewed from the other direction, when the current video segment is encoded, the background image generated during the encoding of the previous video segment is used; the whole coding method therefore incurs no extra delay due to background image generation. For the first video segment, its first few images can be encoded with conventional video coding techniques (including but not limited to MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000, MJPEG). While these images are being coded, the background modeling module selects a training image set from them as shown in Fig. 8, generates the first background image, and transmits it to the decoder. The remaining images of the first video segment can then use the reconstruction of this first background image to generate difference images for encoding. In this way no extra delay is produced by background modeling even at the start of the whole sequence; a small pipeline sketch follows.
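The following is a minimal sketch of the video-segment pipeline, in Python with assumed callback names: segment k is encoded against the background modeled from segment k-1, while a training set drawn from segment k yields the background for segment k+1. The half-and-half split of the first segment is an arbitrary placeholder, not a value from the patent.

```python
# A minimal sketch (Python; callback and split choices are assumptions) of the video-segment
# structure: no extra delay, because each segment's background comes from the previous one.
def encode_sequence(segments, model_background, encode_with_background, encode_conventional):
    """segments: list of lists of frames; the callbacks stand in for the operations above."""
    bitstream = []
    background = None
    for segment in segments:
        if background is None:
            # First segment: its leading images are coded conventionally while the first
            # background image is modeled from them; the remaining images already use it.
            head, tail = segment[:len(segment) // 2], segment[len(segment) // 2:]
            bitstream.append(encode_conventional(head))
            background = model_background(head)
            bitstream.append(encode_with_background(tail, background))
        else:
            bitstream.append(encode_with_background(segment, background))
        # The background for the next segment is modeled from the current segment.
        background = model_background(segment)
    return bitstream
```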
Corresponding to the above sequence structure, the first content written into the bitstream is the directly coded first training image set, followed by the coded bitstream of the first background image. Next come the coded difference images for the part of the first video segment outside the first training image set. Thereafter, the coded background image and the coded difference images of each subsequent video segment are written into the final bitstream in alternation.
The bitstream produced by the above coding method and system can be decoded with the four decoder-side operations shown in Fig. 2:
One, background image decode operation
The background image bitstream is decoded, and the decoded background image is passed to the difference-image compensation module.
Two, difference-plus-background superposition operation
The differential data output by the differential-mode block decoding operation is superposed with the background data that corresponds to it in the background image under the global motion vector, and the superposition result is output.
Three, differential-mode decoding operation
The differential-mode image bitstream written by the encoder is decoded. During decoding, if a referenced pixel was coded in the original mode, the reference pixel used for decoding the current block is first obtained according to formulas (3) and (4). After the current block data has been decoded, the final decoded pixels must still be reconstructed. Taking the luminance component with a global motion vector of 0 as an example, let b′(x, y) and r′(x, y) be, respectively, the decoded background pixel value and the decoded differential pixel value at position (x, y); the pixel value d′(x, y) of the output image at this position can be computed according to the following formulas:
d′(x,y) = Clip1(b′(x,y) + r′(x,y) - 256),   (7)
d′(x,y) = Clip2(b′(x,y) + ((r′(x,y) - 128) << 1)).   (8)
Formulas (7) and (8) are the counterparts of formulas (3) and (4) at the encoder, respectively. In combination with the difference-plus-background superposition operation, when the prediction reference data of the data block to be decoded was coded in the original mode, this unit operates as shown in Fig. 9; when the prediction reference data of the current data block was coded in the differential mode, it operates as shown in Fig. 10. For intra prediction, the reference picture in Figs. 9 and 10 can be taken to be the current image.
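A minimal Python sketch of the decoder-side superposition of formulas (7) and (8) follows, reusing the assumed clip1/clip2 helpers above; r_dec is the decoded differential value and b_dec the decoded background value.

```python
# A minimal sketch (Python; names are assumptions) of formulas (7) and (8).
def reconstruct_pixel_9bit(r_dec, b_dec):
    """Formula (7), the counterpart of encoder-side formula (3)."""
    return clip1(b_dec + r_dec - 256)

def reconstruct_pixel_8bit(r_dec, b_dec):
    """Formula (8), the counterpart of encoder-side formula (4)."""
    return clip2(b_dec + ((r_dec - 128) << 1))
```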
Four, original-mode block decoding operation
The original-mode image bitstream written by the encoder is decoded. During decoding, if a referenced pixel was coded in the differential mode, the reference pixel used for decoding the current block is first obtained according to formulas (5) and (6); the decoded current block data is then the final reconstructed decoded image. When the prediction reference data of the data to be decoded was coded in the original mode, the bitstream is decoded directly; when it was coded in the differential mode, this unit operates as shown in Fig. 11. For intra prediction, the reference picture in Fig. 11 can be taken to be the current image.
A concrete example is given below to illustrate one possible implementation of the method of the present invention. The input video is in YUV 4:2:0 format, and the video segment length is set to 990 frames. Input pixel values are extended to 9 bits by adding 256. The background modeling operation adopts the segment-weight-based modeling method shown in Fig. 11 and formulas (1) and (2).
Specifically, for each video segment, 118 images evenly distributed within the segment can be selected as the training image set, and background modeling is performed separately on the luminance component and each chrominance component to generate the background image for the next video segment. In addition, video segment 0 is introduced as the initial video segment, and the first image serves as the background image of video segment 0. The pixel values of the generated background images are extended to 9 bits by adding 256, and the AVS-S encoder RM0903, extended to 9 bits, encodes each of them directly as an I frame with QP = 0. The background image decoding operation uses the RM0903 decoder extended to 9 bits to decode the background images. Global motion estimation is not used, and no global motion vector is coded. The differential-mode coding uses the method of formula (3) for the difference calculation and the AVS-S coding method extended to 9 bits to encode the differenced current block. The original-mode coding uses the method of formula (5) for the superposition calculation and the AVS-S coding method extended to 9 bits to encode the current block to be coded. The contents of the bitstream of background images and difference images are, in order: the directly coded first 118 images, the first background image, the first video segment, the second background image, the second video segment, and so on.
For the above implementation, the following performance test was carried out. Eight static-camera sequences of indoor/outdoor scenes, each 3088 frames long, were selected for testing and compared with the Shenzhan Profile of the AVS reference encoder RM0903 in its generic configuration. In the bitrate range of 1 Mbps to 4 Mbps, the embodiment of the method of the present invention achieves a performance gain of 0.92-1.53 dB on SD sequences, corresponding to bitrate savings of 40.1%-74.76%; in the bitrate range of 128 kbps to 768 kbps, it achieves a performance gain of 1.27-1.87 dB on CIF sequences, corresponding to bitrate savings of 36.61%-85.77%.
A video encoding/decoding method and system based on background modeling and an optional differential mode provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the system of the present invention and its core idea. At the same time, for those of ordinary skill in the art, changes may be made to the specific implementation and to the scope of application in accordance with the idea of the present invention. In summary, the contents of this description should not be construed as limiting the present invention.

Claims (24)

1. A video coding method based on background modeling and an optional differential mode, characterized in that the method comprises the following steps:
a background modeling step: generating a background image by modeling the input video sequence, a reconstructed background image being obtained after the background image is encoded and decoded;
a global motion estimation step: performing global motion estimation of pixel or sub-pixel precision on each input image to obtain a global motion vector;
a mode selection step: based on the reconstructed background image and the global motion vector, selectively using an original mode or a differential mode to encode each video block.
2. The video coding method according to claim 1, characterized in that, in the background modeling step, generating the background image by modeling the input video sequence comprises:
for each pixel position, finding the set of pixel values of that position in the training set and then traversing it;
for each pixel value, judging, by means of a dynamic threshold generated at the current time, the difference between the current pixel value and the next adjacent pixel value in the pixel set; if the absolute difference is greater than the threshold, judging that the current data segment ends and starting the next data segment, whereby the whole pixel set of the current pixel position is divided into several data segments;
assigning each segment a weight, the weight being the size of the data set of that segment; and calculating the background pixel value of the pixel position based on the weights.
3. The video coding method according to claim 2, characterized in that the background modeling step further comprises a step of periodically re-selecting the training set to update the background image.
4. The video coding method according to claim 1, characterized in that, in the background modeling step, the method of encoding the background image to obtain the reconstructed background image comprises:
encoding the background image generated by modeling with a lossy or lossless image or video coding method; or treating all background images as one sequence and encoding the sequence with MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000 or MJPEG.
5. The video coding method according to claim 1, characterized in that, in the global motion estimation step, the global motion estimation comprises:
taking a data block as the basic unit, performing a global integer-pixel or sub-pixel motion search on the current image with the reconstructed background image as the reference picture, and taking the median, the largest cluster, or the mean of the resulting set of motion vectors as the global motion vector of the current image.
6. The video coding method according to claim 1, characterized in that, in the mode selection step, the selection between the original mode and the differential mode used to encode each video block is made by comparing the rate-distortion results of the original mode and the differential mode.
7. The video coding method according to claim 6, characterized in that the original-mode encoding is:
according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the data to be encoded.
8. The video coding method according to claim 7, characterized in that the differential-mode encoding is:
according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the original mode, taking the decoded difference between this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the difference between the data to be encoded and the corresponding data in the background image.
9. The video coding method according to claim 3, characterized in that periodically re-selecting the training set to update the background image is specifically:
updating the background image per video segment, a video segment being a section of the input video sequence encoded using the same reconstructed background image, the whole input video sequence being regarded as a concatenation of consecutive video segments; during encoding, selecting a training image set from the current video segment and performing background modeling to generate a background image for encoding the next video segment, so that when the current video segment is encoded, the background image generated during the encoding of the previous video segment is used.
10. A video coding system based on background modeling and an optional differential mode, characterized in that it comprises:
a background modeling module for generating a background image by modeling the input video sequence, a reconstructed background image being obtained after the background image is encoded and decoded;
a global motion estimation module for performing global motion estimation of pixel or sub-pixel precision on each input image to obtain a global motion vector;
a mode selection module for selectively using an original mode or a differential mode to encode each video block, based on the reconstructed background image and the global motion vector.
11. The video coding system according to claim 10, characterized in that, in the background modeling module, generating the background image by modeling the input video sequence comprises:
a submodule for: for each pixel position, finding the set of pixel values of that position in the training set and then traversing it; for each pixel value, judging, by means of a dynamic threshold generated at the current time, the difference between the current pixel value and the next adjacent pixel value in the pixel set, and, if the absolute difference is greater than the threshold, judging that the current data segment ends and starting the next data segment, whereby the whole pixel set of the current pixel position is divided into several data segments; assigning each segment a weight, the weight being the size of the data set of that segment; and calculating the background pixel value of the pixel position based on the weights.
12. The video coding system according to claim 11, characterized in that the background modeling module further comprises a submodule for periodically re-selecting the training set to update the background image.
13. The video coding system according to claim 10, characterized in that, in the background modeling module, in the submodule that encodes the background image to obtain the reconstructed background image, the encoding method comprises:
encoding the background image generated by modeling with a lossy or lossless image or video coding method; or
treating all background images as one sequence and encoding the sequence with MPEG-1/2/4, H.263, H.264/AVC, VC1, AVS, JPEG, JPEG2000 or MJPEG.
14. The video coding system according to claim 10, characterized in that the global motion estimation module further comprises:
a submodule for taking a data block as the basic unit, performing a global integer-pixel or sub-pixel motion search on the current image with the reconstructed background image as the reference picture, and taking the median, the largest cluster, or the mean of the resulting set of motion vectors as the global motion vector of the current image.
15. The video coding system according to claim 10, characterized in that, in the mode selection module, the selection between the original mode and the differential mode used to encode each video block is made by comparing the rate-distortion results of the original mode and the differential mode.
16. The video coding system according to claim 15, characterized in that the original-mode encoding is:
according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the data to be encoded.
17. The video coding system according to claim 16, characterized in that the differential-mode encoding is:
according to the global motion vector, finding the data in the background image corresponding to the prediction reference data of the data to be encoded; if the prediction reference data has been encoded in the original mode, taking the decoded difference between this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to encode the difference between the data to be encoded and the corresponding data in the background image.
18. The video coding system according to claim 12, characterized in that periodically re-selecting the training set to update the background image is specifically:
updating the background image per video segment, a video segment being a section of the input video sequence encoded using the same reconstructed background image, the whole input video sequence being regarded as a concatenation of consecutive video segments; during encoding, selecting a training image set from the current video segment and performing background modeling to generate a background image for encoding the next video segment, so that when the current video segment is encoded, the background image generated during the encoding of the previous video segment is used.
19. A video decoding method for a bitstream generated by the video coding method according to claim 1, characterized in that it comprises:
decoding and reconstructing the background image and the global motion vector;
decoding each video block in the original mode or in the differential mode.
20. The video decoding method according to claim 19, characterized in that the original-mode decoding comprises:
if the data to be decoded has been encoded in the original mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the data to be decoded.
21. The video decoding method according to claim 19, characterized in that the differential-mode decoding comprises:
if the data to be decoded has been encoded in the differential mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the original mode, taking the decoded difference between this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the current data to be decoded, the decoded data then being superposed with the corresponding data in the background image.
22. A video decoding system for a bitstream generated by the video coding system according to claim 10, characterized in that it comprises:
a module for decoding the background image and the global motion vector;
a module for decoding each video block in the original mode or in the differential mode.
23. The video decoding system according to claim 22, characterized in that the original-mode decoding comprises:
if the data to be decoded has been encoded in the original mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the differential mode, taking the decoded superposition of this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference for direct decoding, to obtain the final decoded data.
24. The video decoding system according to claim 22, characterized in that the differential-mode decoding comprises:
if the data to be decoded has been encoded in the differential mode, obtaining, according to the global motion vector, the data in the background image corresponding to the prediction reference data; if the prediction reference data has been encoded in the original mode, taking the decoded difference between this prediction reference data and its corresponding data in the background image as the reference; otherwise directly taking the decoded original value of the prediction reference data as the reference to decode the current data to be decoded, the decoded data then being superposed with the corresponding data in the background image to obtain the final decoded data.
CN 201010203823 2010-06-21 2010-06-21 Video encoding/decoding method and system based on background modeling and optional differential mode Active CN101883284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010203823 CN101883284B (en) 2010-06-21 2010-06-21 Video encoding/decoding method and system based on background modeling and optional differential mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010203823 CN101883284B (en) 2010-06-21 2010-06-21 Video encoding/decoding method and system based on background modeling and optional differential mode

Publications (2)

Publication Number Publication Date
CN101883284A CN101883284A (en) 2010-11-10
CN101883284B true CN101883284B (en) 2013-06-26

Family

ID=43055156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010203823 Active CN101883284B (en) 2010-06-21 2010-06-21 Video encoding/decoding method and system based on background modeling and optional differential mode

Country Status (1)

Country Link
CN (1) CN101883284B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102333221B (en) * 2011-10-21 2013-09-04 北京大学 Panoramic background prediction video coding and decoding method
CN102665077A (en) * 2012-05-03 2012-09-12 北京大学 Rapid and efficient encoding-transcoding method based on macro block classification
CN102868891B (en) * 2012-09-18 2015-02-18 哈尔滨商业大学 Multi-angle view video chromatic aberration correction method based on support vector regression
CN105847793B (en) 2015-01-16 2019-10-22 杭州海康威视数字技术股份有限公司 Video coding-decoding method and its device
CN104702956B (en) * 2015-03-24 2017-07-11 武汉大学 A kind of background modeling method towards Video coding
CN106331700B (en) 2015-07-03 2019-07-19 华为技术有限公司 Method, encoding device and the decoding device of reference picture coding and decoding
CN107396138A (en) * 2016-05-17 2017-11-24 华为技术有限公司 A kind of video coding-decoding method and equipment
CN110062235B (en) * 2019-04-08 2023-02-17 上海大学 Background frame generation and update method, system, device and medium
CN112702602A (en) * 2020-12-04 2021-04-23 浙江智慧视频安防创新中心有限公司 Video coding and decoding method and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742319A (en) * 2010-01-15 2010-06-16 北京大学 Background modeling-based static camera video compression method and background modeling-based static camera video compression system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4697500B2 (en) * 1999-08-09 2011-06-08 ソニー株式会社 TRANSMISSION DEVICE, TRANSMISSION METHOD, RECEPTION DEVICE, RECEPTION METHOD, AND RECORDING MEDIUM

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742319A (en) * 2010-01-15 2010-06-16 北京大学 Background modeling-based static camera video compression method and background modeling-based static camera video compression system

Also Published As

Publication number Publication date
CN101883284A (en) 2010-11-10

Similar Documents

Publication Publication Date Title
CN101883284B (en) Video encoding/decoding method and system based on background modeling and optional differential mode
CN111405283B (en) End-to-end video compression method, system and storage medium based on deep learning
CN101742319B (en) Background modeling-based static camera video compression method and background modeling-based static camera video compression system
CN101204094B (en) Method for scalably encoding and decoding video signal
CN1316433C (en) Video-information encoding method and video-information decoding method
CN103141092B (en) The method and apparatus carrying out encoded video signal for the super-resolution based on example of video compress use motion compensation
CN104737540A (en) Video codec architecture for next generation video
CN102026000A (en) Distributed video coding system with combined pixel domain-transform domain
CN101272489B (en) Encoding and decoding device and method for video image quality enhancement
CN110691250B (en) Image compression apparatus combining block matching and string matching
CN102137263A (en) Distributed video coding and decoding methods based on classification of key frames of correlation noise model (CNM)
CN106170093A (en) A kind of infra-frame prediction performance boost coded method
CN103002283A (en) Multi-view distributed video compression side information generation method
CN105100814A (en) Methods and devices for image encoding and decoding
WO2018120019A1 (en) Compression/decompression apparatus and system for use with neural network data
CN111726614A (en) HEVC (high efficiency video coding) optimization method based on spatial domain downsampling and deep learning reconstruction
CN102316323B (en) Rapid binocular stereo-video fractal compressing and uncompressing method
CN113068041B (en) Intelligent affine motion compensation coding method
CN101998117B (en) Video transcoding method and device
CN109379590B (en) Pulse sequence compression method and system
CN109474825B (en) Pulse sequence compression method and system
CN112001854A (en) Method for repairing coded image and related system and device
Jilani et al. JPEG image compression using FPGA with Artificial Neural Networks
CN102333220B (en) Video coding and decoding method capable of selectively finishing predictive coding in transform domain
CN105359508A (en) Multi-level spatial-temporal resolution increase of video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant