Summary of the invention
Unlike existing methods for extracting moving objects from video sequences, such as the optical flow method, the frame difference method, the motion energy detection method and the background difference method, the present invention is based on coding information already present in the video bitstream, such as prediction modes and motion vectors. By exploiting the correlation between this coding information and visual regions of interest, it identifies the spatial-domain and temporal-domain visual feature saliency regions in the coded video content, thereby achieving automatic marking and extraction of video regions of interest.
According to the characteristics of the human visual system (HVS), the human eye is more sensitive to luminance information than to chrominance information; the inventive method therefore performs the automatic marking and extraction of video regions of interest on the coding information of the luminance component of the video sequence.
The inventive method comprises the following steps:
Step 1: input a video sequence in YUV format whose GOP (Group of Pictures) structure is IPPP, read the luminance component Y of the coded macroblocks, and configure the coding parameters and initialization parameters.
Step 2: perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame.
In video coding standards, the I frame serves as a reference point for random access and contains a large amount of information. Because it cannot be coded by exploiting the temporal correlation between adjacent frames, intra-frame prediction is used: the coding information of already coded macroblocks in the current frame is used to predict the current macroblock, eliminating spatial redundancy. Intra-frame predictive coding of the first frame (I frame) of a video sequence is a conventional coding scheme in video coding.
Step 3: perform inter-frame predictive coding on the current P frame, using the correlation between the video content of adjacent frames to eliminate temporal redundancy. Record the inter-frame prediction mode type of every coded macroblock in the current frame, denoted Mode_pn;
where p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
Step 4: identify the spatial-domain visual feature saliency region of the current P frame. Specifically: if the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, the macroblock is marked S_Yp(x, y, Mode_pn) = 1 and belongs to the spatial-domain visual feature saliency region; otherwise it is marked S_Yp(x, y, Mode_pn) = 0. Here Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p and Mode_pn are defined as above. All coded macroblocks in the current P frame are traversed.
Fig. 1 gives a schematic flow chart of inter-frame prediction mode selection in the H.264 standard.
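The Step-4 decision can be sketched as follows. This is an illustrative sketch, not the patented encoder itself; the string mode labels are hypothetical stand-ins for the H.264 prediction modes named above, which a real encoder would expose as enum values in its macroblock records.

```python
# Illustrative sketch of the Step-4 spatial-saliency test. The string
# mode labels are hypothetical stand-ins for H.264 prediction modes.

SUB_SPLIT_MODES = {"8x8", "8x4", "4x8", "4x4"}   # inter sub-partition set
INTRA_MODES = {"Intra16x16", "Intra4x4"}         # intra prediction set

def spatial_saliency(mode_pn):
    """Return S_Yp = 1 if the macroblock's prediction mode lies in the
    sub-partition or intra mode sets, else S_Yp = 0."""
    return 1 if mode_pn in SUB_SPLIT_MODES | INTRA_MODES else 0
```

Modes of the macroblock partition set {Skip, 16×16, 16×8, 8×16} fall through to 0, matching the smooth-background case described below.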
Experiments show that in H.264/AVC coding there is a strong correlation between the prediction coding results and human regions of interest: in moving regions or texture-rich regions that attract high visual attention, Mode_pn mostly selects the sub-partition mode set {8×8, 8×4, 4×8, 4×4}; at shot changes, where the video content changes abruptly, or when moving objects with large motion amplitude appear, visual attention is highest and Mode_pn selects the intra prediction mode set {Intra16×16, Intra4×4}; for smooth background regions of low visual attention, Mode_pn mostly selects the macroblock partition mode set {Skip, 16×16, 16×8, 8×16}. Taking the Claire sequence as an example, Fig. 2 gives the inter-frame prediction mode distribution of the 50th frame of the Claire sequence; it can be seen that in regions of high visual attention the coded macroblocks mostly select the inter-frame sub-partition prediction mode set.
Step 5: record the motion vector V_xpn in the horizontal direction and the motion vector V_ypn in the vertical direction of each coded macroblock in the p-th frame, and compute the average motion vector of all coded macroblocks in the previous coded frame in the horizontal direction,

V̄_x(p-1) = (1/Num) · Σ(n=1..Num) V_x(p-1)n,

and the average motion vector in the vertical direction,

V̄_y(p-1) = (1/Num) · Σ(n=1..Num) V_y(p-1)n,

where V_x(p-1)n and V_y(p-1)n are the motion vectors of each coded macroblock in the previous coded frame in the horizontal and vertical directions, p and n are defined as in Step 3, and Num is the number of macroblocks contained in one coded frame, i.e. the number of accumulated terms. Taking video in QCIF format (176 × 144) as an example, Fig. 3 gives the positions and sequence numbers n of all coded macroblocks (16 × 16) in one coded frame; in this case Num = 99.
Step 6: identify the temporal-domain visual feature saliency region of the current P frame. Specifically: if the horizontal motion vector V_xpn of the current coded macroblock is greater than the average horizontal motion vector V̄_x(p-1) of the coded macroblocks of the previous frame, or the vertical motion vector V_ypn of the current coded macroblock is greater than the average vertical motion vector V̄_y(p-1) of the coded macroblocks of the previous frame, the macroblock belongs to the temporal-domain visual feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0. All coded macroblocks in the current P frame are traversed.
Here Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p is defined as in Step 3.
Motion perception is one of the most important visual processing mechanisms in the human visual system. Experiments show that coded content with larger motion vectors corresponds precisely to the moving regions of interest to the human eye (such as the head, arms and figures), while coded content with small or zero motion vectors corresponds to static background regions of low visual attention. Taking the Akiyo sequence as an example, Fig. 4 gives the motion vector distribution of the 50th frame of the Akiyo sequence; it can be seen that in the face and head-and-shoulder regions, which attract high visual attention, the coded macroblocks generally have larger motion vectors.
Whether the motion of the current coded macroblock is judged significant depends strongly on the setting of the decision threshold. To reduce the false detection rate, the present invention sets the motion decision thresholds in the horizontal and vertical directions to V̄_x(p-1) and V̄_y(p-1) respectively, where V̄_x(p-1) is the average horizontal motion vector of all coded macroblocks in the previous frame and V̄_y(p-1) is the average vertical motion vector of all coded macroblocks in the previous frame. This dynamic threshold setting takes full account of the temporal correlation of the video sequence: the thresholds change with the average motion vector of the coded macroblocks of the previous frame, which effectively reduces misjudgment and allows the temporal-domain visual feature saliency region to be obtained quickly and accurately.
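Steps 5 and 6 can be sketched as follows. This is an illustrative sketch only; the text compares motion vectors directly, and this sketch uses their magnitudes on the assumption that motion strength rather than direction is what is intended.

```python
# Illustrative sketch of Steps 5-6: dynamic thresholds computed from the
# previous frame's mean motion vectors. Magnitudes are assumed.

def mv_means(prev_mvs):
    """Average |Vx| and |Vy| over all macroblocks of the previous coded
    frame; prev_mvs is a list of (Vx, Vy) pairs, one per macroblock."""
    num = len(prev_mvs)                           # Num: macroblocks per frame
    vx_bar = sum(abs(vx) for vx, _ in prev_mvs) / num
    vy_bar = sum(abs(vy) for _, vy in prev_mvs) / num
    return vx_bar, vy_bar

def temporal_saliency(vx, vy, vx_bar, vy_bar):
    """T_Yp = 1 when either motion component of the current macroblock
    exceeds the previous frame's mean; otherwise T_Yp = 0."""
    return 1 if abs(vx) > vx_bar or abs(vy) > vy_bar else 0
```

Because the thresholds are recomputed for every frame from the previous frame's averages, a macroblock is flagged only when it moves more than its surroundings did, which is the adaptivity the paragraph above describes.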
Step 7: mark the video region of interest of the current P frame. Specifically: traverse all coded macroblocks in the current P frame and mark each according to its spatial-domain and temporal-domain visual feature saliency, distinguishing the following cases:
If the current coded macroblock has both spatial-domain and temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, it is rich in texture detail and also produces a larger motion vector; the level of visual interest is highest, and it is marked ROI_Yp(x, y) = 3.
If it has only temporal-domain visual feature saliency and no spatial-domain visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the current coded macroblock produces a larger motion vector; according to the perceptual characteristics of the HVS, the human eye is highly sensitive to object motion, so the level of visual interest is second highest and it is marked ROI_Yp(x, y) = 2.
If the macroblock's degree of motion is low, so that it has no temporal-domain visual feature saliency, but it contains rich texture information and thus has only spatial-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of visual interest is third highest and it is marked ROI_Yp(x, y) = 1.
If it has neither spatial-domain nor temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, the current coded macroblock has smooth texture and mild or no motion; it is generally a static background region and thus a region of no interest to the human eye, the level of visual interest is lowest, and it is marked ROI_Yp(x, y) = 0.
Here ROI_Yp(x, y) denotes the visual interest priority of the current coded macroblock; T_Yp(x, y, V_xpn, V_ypn) denotes its temporal-domain visual feature saliency; S_Yp(x, y, Mode_pn) denotes its spatial-domain visual feature saliency; (x, y) is the position coordinate of the current coded macroblock; Y denotes the luminance component of the macroblock; p indexes the p-th inter-coded video frame; and n is the sequence number of the n-th coded macroblock in the current coded frame.
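The four cases of Step 7 reduce to a small priority table; an illustrative sketch of the combination:

```python
# Illustrative sketch of the Step-7 combination of the spatial flag S_Yp
# and the temporal flag T_Yp into the four interest levels.

def roi_priority(s_yp, t_yp):
    """Map the two saliency flags (0 or 1 each) to ROI_Yp in {0,1,2,3}."""
    if s_yp == 1 and t_yp == 1:
        return 3    # textured and moving: highest interest
    if t_yp == 1:
        return 2    # moving only: second highest
    if s_yp == 1:
        return 1    # textured only: third highest
    return 0        # smooth, static background: no interest
```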
Step 8: output the coded video bitstream. Since the luminance component of a coded macroblock has the value range Y ∈ [0, 255], representing 256 brightness levels from completely black (0) to completely white (255), the luminance component Y of every macroblock in the current P frame is processed according to its marked ROI_Yp(x, y) interest priority level as follows, and the marked video stream is output:
If ROI_Yp(x, y) = 3, the interest level is highest and visual attention is highest; the luminance component of this coded macroblock is set to 255, the highest output value, i.e. Y_p(x, y) = 255.
If ROI_Yp(x, y) = 2, the interest level is second highest and visual attention is high; the luminance component of this coded macroblock is set to 150, i.e. Y_p(x, y) = 150.
If ROI_Yp(x, y) = 1, the interest level is third highest and visual attention is low; the luminance component of this coded macroblock is set to 100, i.e. Y_p(x, y) = 100.
If ROI_Yp(x, y) = 0, the macroblock lies in a region of no interest and visual attention is lowest; the luminance component of this coded macroblock is set to 0, i.e. Y_p(x, y) = 0.
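Step 8 is a fixed mapping from interest level to output luminance; a minimal sketch:

```python
# Illustrative sketch of Step 8: each macroblock's luminance component is
# replaced by the fixed value assigned to its interest level.

ROI_LUMA = {3: 255, 2: 150, 1: 100, 0: 0}   # levels quoted in the text

def mark_frame(roi_levels):
    """roi_levels: per-macroblock list of ROI_Yp values for one frame.
    Returns the output luminance values Y_p of the marked video stream."""
    return [ROI_LUMA[level] for level in roi_levels]
```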
Step 9: return to Step 3 and process the next frame, until the whole video sequence has been traversed.
Fig. 5 gives a flow chart of the video region-of-interest marking and extraction method.
Fig. 6 gives the output of the marked video regions of interest for typical video sequences.
Beneficial effects
This method achieves rapid extraction of video regions of interest from basic coding information. It exploits the correlation between basic coding information and human visual regions of interest to identify the spatial-domain and temporal-domain visual feature saliency regions in the coded video content, combines the marking results of the two, defines video interest priorities, and finally achieves automatic extraction of video regions of interest. The inventive method can provide an important coding basis for video coding techniques based on regions of interest (ROI, Region of Interest).
In a specific embodiment, the following procedure is completed in a computer:
First step: read in the video sequence according to the coding configuration file encoder.cfg and configure the encoder according to the parameters in the configuration file, for example: video bitstream structure GOP = IPPP...; number of coded frames FramesToBeEncoded = 100; frame rate FrameRate = 30 f/s; video width SourceWidth = 176 and height SourceHeight = 144; output file name OutputFile = ROI.264; quantization step values QPISlice = 28 and QPPSlice = 28; motion estimation search range SearchRange = ±16; number of reference frames NumberReferenceFrames = 5; rate-distortion optimization RDOptimization = on; entropy coding type SymbolMode = CAVLC. Set the initialization parameters L = number of coded frames and p = 1.
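The parameter names above follow the configuration-file style of the H.264 JM reference software. A hypothetical fragment of such an encoder.cfg, using the values quoted above, might look like the following; exact names and value encodings vary between JM versions (e.g. SymbolMode: 0 = CAVLC), so this is a sketch rather than a verbatim file:

```
FramesToBeEncoded     = 100       # number of frames to be coded
FrameRate             = 30.0      # frames per second
SourceWidth           = 176       # QCIF width
SourceHeight          = 144       # QCIF height
OutputFile            = "ROI.264" # coded bitstream file
QPISlice              = 28        # quantization parameter, I slices
QPPSlice              = 28        # quantization parameter, P slices
SearchRange           = 16        # motion estimation search range (±16)
NumberReferenceFrames = 5         # reference frames for inter prediction
RDOptimization        = 1         # rate-distortion optimized mode decision
SymbolMode            = 0         # entropy coding: 0 = CAVLC
```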
Second step: read the luma component values Y of the coded macroblocks frame by frame from the input video sequence.
Third step: perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame.
Fourth step: perform inter-frame predictive coding on the current P frame and record the inter-frame prediction mode type Mode_pn of each coded macroblock, where p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
Fifth step: mark the spatial-domain visual feature saliency region. If the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, the macroblock is marked S_Yp(x, y, Mode_pn) = 1 and belongs to the spatial-domain visual feature saliency region; otherwise it is marked S_Yp(x, y, Mode_pn) = 0.
Sixth step: if p ≠ 1, record the horizontal motion vector V_xpn and vertical motion vector V_ypn of each coded macroblock in the p-th frame, and compute the average motion vectors V̄_x(p-1) and V̄_y(p-1) of all coded macroblocks in the previous coded frame in the horizontal and vertical directions; otherwise, jump to the tenth step.
Seventh step: mark the temporal-domain visual feature saliency region. If the horizontal motion vector V_xpn of the current coded macroblock is greater than the average horizontal motion vector V̄_x(p-1) of the previous frame's coded macroblocks, or its vertical motion vector V_ypn is greater than the average vertical motion vector V̄_y(p-1) of the previous frame's coded macroblocks (either criterion suffices), the macroblock belongs to the temporal-domain visual feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0.
Eighth step: mark the video region of interest.
If the current coded macroblock has both spatial-domain and temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, the level of visual interest is highest and it is marked ROI_Yp(x, y) = 3.
If it has only temporal-domain visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the level of visual interest is second highest and it is marked ROI_Yp(x, y) = 2.
If it has only spatial-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of visual interest is third highest and it is marked ROI_Yp(x, y) = 1.
If it has neither spatial-domain nor temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, it is a region of no interest to the human eye and is marked ROI_Yp(x, y) = 0.
Ninth step: output the coded video bitstream.
If ROI_Yp(x, y) = 3, the interest level is highest and visual attention is highest; the luminance component of this coded macroblock is set to 255, i.e. Y_p(x, y) = 255.
If ROI_Yp(x, y) = 2, the interest level is second highest and visual attention is high; the luminance component of this coded macroblock is set to 150, i.e. Y_p(x, y) = 150.
If ROI_Yp(x, y) = 1, the interest level is third highest and visual attention is low; the luminance component of this coded macroblock is set to 100, i.e. Y_p(x, y) = 100.
If ROI_Yp(x, y) = 0, the macroblock lies in a region of no interest and visual attention is lowest; the luminance component of this coded macroblock is set to 0, i.e. Y_p(x, y) = 0.
Tenth step: if p ≠ L-1, set p = p + 1 and jump to the third step; otherwise, end the coding.
Fig. 6 shows example outputs of the inventive method for marking video regions of interest. Taking a typical surveillance video sequence (Hall) and an indoor activity video sequence (Salesman) as examples, the video regions of interest are marked using the motion vector distribution and the inter-frame prediction mode selection results: the higher the visual interest level of a macroblock, the higher the luminance value at that position in the output video, and vice versa. From the marking results in the rightmost column of Fig. 6, it can be seen that the shape of the video regions of interest obtained by the inventive method is irregular. Compared with the regions of interest obtained by traditional moving object detection methods that use fixed-shape templates, the marking results of the inventive method are closer to the shapes of the interesting targets actually attended to by the human eye, so regions of interest can be marked more accurately.
The inventive method can also be combined with other fast coding techniques: while guaranteeing the quality of the coded regions of interest to the human eye, it can reduce the coding complexity of the background regions of no interest and further reduce the coding time. It can also be used in H.264-based scalable coding to achieve selective enhancement coding of the regions of interest.