Summary of the invention
Unlike existing methods for extracting moving objects from video sequences, such as the optical flow method, the frame-difference method, kinetic-energy detection, and background subtraction, the present invention takes the coded information in the video bitstream, such as the prediction mode and the motion vector, as its basis. Exploiting the correlation between coded information and visually interesting regions, it identifies the spatial and temporal visual-feature saliency regions in the coded video content, and thereby automatically marks and extracts the region of interest in the video.
According to the characteristics of the human visual system (HVS), the human eye is more sensitive to luminance information than to chrominance information. The inventive method therefore operates on the coded information of the luminance component of the video sequence to automatically mark and extract the region of interest.
The inventive method specifically comprises the following steps:
Step 1: input a video sequence in YUV format whose GOP (Group of Pictures) structure is IPPP, read the luminance component Y of the coded macroblocks, and configure the coding parameters and initialize the parameters;
Step 2: apply intra-frame predictive coding to the first frame (the I frame) of the video sequence;
In video coding standards, the I frame serves as the reference point for random access and contains a large amount of information. Because it cannot exploit the temporal correlation between adjacent frames, it is coded by intra-frame prediction, which uses the coded information of already coded and reconstructed macroblocks in the current frame to predict the current macroblock and thereby eliminate spatial redundancy. Intra-frame predictive coding of the first frame (the I frame) is a conventional practice in video coding.
Step 3: apply inter-frame predictive coding to the current p-th frame, exploiting the correlation of adjacent frames to eliminate temporal redundancy. Record the inter-frame prediction mode type of every coded macroblock in the current frame, denoted Mode_pn;
Wherein p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
Step 4: identify the spatial visual-feature saliency region of the current p-th frame. Specifically: if the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-split mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8x8, 8x4, 4x8, 4x4} or {Intra16x16, Intra4x4}, mark the macroblock as S_Yp(x, y, Mode_pn) = 1, meaning it belongs to the spatial visual-feature saliency region; otherwise mark S_Yp(x, y, Mode_pn) = 0. Wherein Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p and Mode_pn are defined as above. Traverse all coded macroblocks in the current p-th frame;
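As a minimal sketch of this Step 4 classification, the mode-name strings below stand in for the H.264 partition and intra prediction modes; they are illustrative labels, not a real encoder API:

```python
# Spatial saliency by prediction mode (Step 4, sketch).
# Mode names are illustrative strings, not a real codec API.
SUB_SPLIT_MODES = {"8x8", "8x4", "4x8", "4x4"}
INTRA_MODES = {"Intra16x16", "Intra4x4"}

def spatial_saliency(mode_pn):
    """Return S_Yp = 1 if Mode_pn is in the sub-split or intra mode set
    (texture-rich or abruptly changing content), else 0."""
    return 1 if mode_pn in SUB_SPLIT_MODES or mode_pn in INTRA_MODES else 0

print(spatial_saliency("8x4"))       # sub-split mode -> 1
print(spatial_saliency("Skip"))      # background mode -> 0
print(spatial_saliency("Intra4x4"))  # intra mode -> 1
```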
Fig. 1 shows a schematic flowchart of inter-frame prediction mode selection in the H.264 standard.
Experiments show that, in H.264/AVC standard coding, there is a strong correlation between the predictive coding result and the region of interest of the human eye. For moving regions or texture-rich regions that attract high visual attention, Mode_pn mostly selects the sub-split mode set {8x8, 8x4, 4x8, 4x4}. At shot changes, when the video content changes abruptly, or when a moving object of large motion amplitude appears, visual attention is highest and Mode_pn selects the intra prediction mode set {Intra16x16, Intra4x4}. For smooth background regions of low visual attention, Mode_pn mostly selects the macroblock partition mode set {Skip, 16x16, 16x8, 8x16}. Taking the Claire sequence as an example, Fig. 2 shows the inter-frame prediction mode distribution of the 50th frame of the Claire sequence; in regions of high visual attention, the coded macroblocks mostly select the inter-frame sub-split prediction mode set.
Step 5: record, for each coded macroblock in the p-th frame, the motion vector V_xpn in the horizontal direction and the motion vector V_ypn in the vertical direction, and compute the average motion vector of all coded macroblocks in the previous coded frame in the horizontal direction, V̄_x(p-1), and in the vertical direction, V̄_y(p-1):

V̄_x(p-1) = (1/Num) Σ_n V_x(p-1)n,  V̄_y(p-1) = (1/Num) Σ_n V_y(p-1)n

Wherein V_x(p-1)n and V_y(p-1)n denote the horizontal and vertical motion vectors of each coded macroblock in the previous coded frame; p and n are defined as in Step 3; and Num is the number of macroblocks contained in one coded frame, i.e. the number of accumulated terms. Taking QCIF-format video (176 x 144) as an example, Fig. 3 shows the positions and sequence numbers n of all coded macroblocks (16 x 16) in one coded frame; in this case Num = 11 x 9 = 99.
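The per-frame averages of Step 5 can be sketched as below. The input `mvs`, a list of (Vx, Vy) pairs, one per coded macroblock, is a hypothetical format, and averaging magnitudes rather than signed components is an assumption of this sketch:

```python
def mean_motion_vectors(mvs):
    """Average the horizontal and vertical motion-vector magnitudes over
    the Num macroblocks of one coded frame (Step 5, sketch)."""
    num = len(mvs)  # Num, the number of macroblocks in the frame
    avg_x = sum(abs(vx) for vx, _ in mvs) / num
    avg_y = sum(abs(vy) for _, vy in mvs) / num
    return avg_x, avg_y

# Toy frame of 4 macroblocks:
print(mean_motion_vectors([(2, -2), (4, 0), (0, 1), (2, 1)]))  # (2.0, 1.0)
```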
Step 6: identify the temporal visual-feature saliency region of the current p-th frame. Specifically: if the horizontal motion vector V_xpn of the current coded macroblock is greater than the mean horizontal motion vector V̄_x(p-1) of the coded macroblocks of the previous frame, or the vertical motion vector V_ypn of the current coded macroblock is greater than the mean vertical motion vector V̄_y(p-1) of the coded macroblocks of the previous frame, the macroblock belongs to the temporal visual-feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise mark T_Yp(x, y, V_xpn, V_ypn) = 0. Traverse all coded macroblocks in the current p-th frame;
Wherein Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p is defined as in Step 3.
Motion perception is one of the most important visual processing mechanisms in the human visual system. Experiments show that coded content with larger motion vectors corresponds exactly to the moving regions the human eye is interested in (such as the head, arms, or a person), while coded content whose motion vectors are small or even zero corresponds to the static background regions of low visual attention. Taking the Akiyo sequence as an example, Fig. 4 shows the motion vector distribution of the 50th frame of the Akiyo sequence; the coded macroblocks generally have larger motion vectors in the face and head-and-shoulder regions, where visual attention is highest.
Whether the motion of the current coded macroblock is judged intense depends strongly on the setting of the decision threshold. To reduce the false-detection rate, the present invention sets the motion-intensity decision thresholds of the horizontal and vertical directions to V̄_x(p-1) and V̄_y(p-1) respectively, where V̄_x(p-1) denotes the average horizontal motion vector of all coded macroblocks in the previous frame and V̄_y(p-1) denotes the average vertical motion vector of all coded macroblocks in the previous frame. This dynamic threshold fully accounts for the temporal correlation of the video sequence: the threshold changes with the mean motion vector of the previous coded frame, which effectively reduces misjudgments and allows the temporal visual-feature saliency region to be obtained quickly and accurately.
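The Step 6 decision against these dynamic thresholds can be sketched as follows; comparing motion-vector magnitudes is an assumption of this sketch:

```python
def temporal_saliency(vx, vy, thr_x, thr_y):
    """Return T_Yp = 1 if either motion-vector component of the current
    macroblock exceeds the corresponding previous-frame mean (Step 6)."""
    return 1 if abs(vx) > thr_x or abs(vy) > thr_y else 0

# Previous-frame means serve as the thresholds:
print(temporal_saliency(5, 0, 2.0, 1.0))  # large horizontal motion -> 1
print(temporal_saliency(1, 1, 2.0, 1.0))  # below both thresholds -> 0
```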
Step 7: mark the region of interest of the current p-th frame. Specifically: traverse all coded macroblocks in the current p-th frame and mark each according to its spatial and temporal visual-feature saliency. The marking rules distinguish the following cases:
If the current coded macroblock has both spatial and temporal visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, the macroblock is both rich in texture detail and has produced a large motion vector; the human eye's level of interest is highest, so mark ROI_Yp(x, y) = 3;
If it has only temporal visual-feature saliency and no spatial visual-feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the current coded macroblock has produced a large motion vector; according to the perceptual characteristics of the HVS, the human eye is highly sensitive to object motion, so the level of interest is second highest; mark ROI_Yp(x, y) = 2;
If the macroblock's motion is weak, so that it has no temporal visual-feature saliency, but it has rich texture information and thus only spatial visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of interest is third; mark ROI_Yp(x, y) = 1;
If it has neither spatial nor temporal visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, the current coded macroblock is smooth in texture and slow-moving or static, typically a static background area; it is a non-interesting region for the human eye and the level of interest is lowest; mark ROI_Yp(x, y) = 0;
Wherein ROI_Yp(x, y) denotes the visual-interest priority of the current coded macroblock; T_Yp(x, y, V_xpn, V_ypn) denotes its temporal visual-feature saliency; S_Yp(x, y, Mode_pn) denotes its spatial visual-feature saliency; (x, y) is the position coordinate of the current coded macroblock; Y denotes the luminance component of the macroblock; p indexes the p-th inter-coded video frame; and n is the sequence number of the n-th coded macroblock in the current coded frame.
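As a sketch of this Step 7 case table, note that the four cases (S, T) = (1,1) -> 3, (0,1) -> 2, (1,0) -> 1, (0,0) -> 0 collapse to the single expression ROI = 2T + S:

```python
def roi_priority(s, t):
    """Combine spatial (S) and temporal (T) saliency flags into the
    four-level interest priority of Step 7. The case table
    (1,1)->3, (0,1)->2, (1,0)->1, (0,0)->0 is exactly 2*t + s."""
    return 2 * t + s

for s, t in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(s, t, roi_priority(s, t))  # priorities 3, 2, 1, 0 respectively
```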
Step 8: output the video bitstream. Because the luminance component of a coded macroblock has the value range Y ∈ [0, 255], from 0 to 255 representing the 256 levels from completely black to completely white, the present invention sets the luminance component Y of every macroblock in the current p-th frame according to the marked interest priority ROI_Yp(x, y) as follows, and outputs the marked video stream:
If ROI_Yp(x, y) = 3, the level of interest and the visual attention are highest; the luminance component of this coded macroblock is set to 255, the highest output luminance value, i.e. Y_p(x, y) = 255;
If ROI_Yp(x, y) = 2, the level of interest is second highest and the visual attention is high; the luminance component of this coded macroblock is set to 150, a high output luminance value, i.e. Y_p(x, y) = 150;
If ROI_Yp(x, y) = 1, the level of interest is third and the visual attention is low; the luminance component of this coded macroblock is set to 100, a low output luminance value, i.e. Y_p(x, y) = 100;
If ROI_Yp(x, y) = 0, the macroblock lies in a non-interesting region and the visual attention is lowest; the luminance component of this coded macroblock is set to 0, the lowest output luminance value, i.e. Y_p(x, y) = 0.
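The Step 8 luminance mapping is a fixed four-entry table and can be sketched as:

```python
# ROI priority -> output luminance of the marked stream (Step 8).
ROI_LUMA = {3: 255, 2: 150, 1: 100, 0: 0}

def mark_luma(roi_level):
    """Replace the macroblock's luminance component Y with the value
    assigned to its interest priority."""
    return ROI_LUMA[roi_level]

print([mark_luma(r) for r in (3, 2, 1, 0)])  # [255, 150, 100, 0]
```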
Step 9: return to Step 3 and process the next frame, until the whole video sequence has been traversed.
Fig. 5 shows a flowchart of the video region-of-interest marking and extraction method.
Fig. 6 shows the marked region-of-interest output for exemplary video sequences.
Beneficial effect
The method achieves fast extraction of the video region of interest from basic coding information. It exploits the correlation between basic coding information and the visual region of interest of the human eye to identify the spatial and temporal visual-feature saliency regions in the coded video content, combines the two saliency marking results to define the region-of-interest priority, and finally extracts the region of interest automatically. The inventive method can provide an important coding basis for region-of-interest (ROI) based video coding techniques.
Embodiment
Since the human eye is more sensitive to luminance information than to chrominance information, the inventive method codes the luminance component of each video frame. The video sequence is first read in, its luminance component is extracted, and the region-of-interest extraction module of the present invention is invoked to complete automatic marking and extraction of the region of interest.
In an implementation of the invention, a video capture device (such as a digital camera) acquires the video images and transmits them to a computer, where the region of interest is automatically marked according to the coded information in the video bitstream. The spatial visual-feature saliency region is identified from the predictive coding mode of the current coded macroblock; the temporal visual-feature saliency region is then identified from its horizontal and vertical motion vectors, with a dynamic motion-vector decision threshold reducing the impact of different video motion types on extraction accuracy; finally, the region-of-interest classification result is obtained from the spatial and temporal visual-feature saliency, realizing automatic extraction of the region of interest.
In a concrete implementation, the following program is executed on the computer:
The first step: read in the video sequence according to the coding configuration file encoder.cfg and configure the encoder according to the parameters in that file. For example: GOP structure GOP = IPPP; number of coded frames FramesToBeEncoded = 100; frame rate FrameRate = 30 f/s; video width SourceWidth = 176 and height SourceHeight = 144; output file name OutputFile = ROI.264; quantization step values QPISlice = 28 and QPPSlice = 28; motion estimation search range SearchRange = ±16; number of reference frames NumberReferenceFrames = 5; rate-distortion cost function RDOptimization = on; entropy coding type SymbolMode = CAVLC; and initialize the parameters L = number of coded frames and p = 1;
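For illustration only, the configuration values listed in this step could be collected in a simple dictionary; the key names mirror the encoder.cfg parameters in the text, and this is a hypothetical stand-in, not a real configuration parser:

```python
# Hypothetical mirror of the encoder.cfg parameters from the first step.
encoder_cfg = {
    "GOP": "IPPP",
    "FramesToBeEncoded": 100,
    "FrameRate": 30,            # frames per second
    "SourceWidth": 176,
    "SourceHeight": 144,
    "OutputFile": "ROI.264",
    "QPISlice": 28,
    "QPPSlice": 28,
    "SearchRange": 16,          # +/- 16 pixels
    "NumberReferenceFrames": 5,
    "RDOptimization": True,
    "SymbolMode": "CAVLC",
}

L = encoder_cfg["FramesToBeEncoded"]  # initialize L = number of coded frames
p = 1                                 # initialize the frame index
print(L, p)  # 100 1
```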
Second step: read the luminance component values Y of the coded macroblocks frame by frame, in order, from the input video sequence;
The third step: apply intra-frame predictive coding to the first frame (the I frame) of the video sequence;
The fourth step: apply inter-frame predictive coding to the current p-th frame and record the inter-frame prediction mode type Mode_pn of each coded macroblock; wherein p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
The fifth step: identify the spatial visual-feature saliency region. If the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-split mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8x8, 8x4, 4x8, 4x4} or {Intra16x16, Intra4x4}, mark the macroblock as S_Yp(x, y, Mode_pn) = 1, belonging to the spatial visual-feature saliency region; otherwise mark S_Yp(x, y, Mode_pn) = 0;
The sixth step: if p ≠ 1, record, for each coded macroblock in the p-th frame, the horizontal motion vector V_xpn and the vertical motion vector V_ypn, and compute the average motion vector of all coded macroblocks in the previous coded frame in the horizontal direction, V̄_x(p-1), and in the vertical direction, V̄_y(p-1); otherwise jump to the tenth step;
The seventh step: identify the temporal visual-feature saliency region. If the horizontal motion vector V_xpn of the current coded macroblock is greater than the mean horizontal motion vector V̄_x(p-1) of the previous frame's coded macroblocks, or its vertical motion vector V_ypn is greater than the mean vertical motion vector V̄_y(p-1) of the previous frame's coded macroblocks, i.e. either criterion is met, the macroblock belongs to the temporal visual-feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise mark T_Yp(x, y, V_xpn, V_ypn) = 0;
The eighth step: mark the video region of interest.
If the current coded macroblock has both spatial and temporal visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, the human eye's level of interest is highest; mark ROI_Yp(x, y) = 3;
If it has only temporal visual-feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the level of interest is second highest; mark ROI_Yp(x, y) = 2;
If it has only spatial visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of interest is third; mark ROI_Yp(x, y) = 1;
If it has neither spatial nor temporal visual-feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, it is a non-interesting region for the human eye; mark ROI_Yp(x, y) = 0;
The ninth step: output the video bitstream.
If ROI_Yp(x, y) = 3, the level of interest and the visual attention are highest; set the luminance component of this coded macroblock to 255, the highest output luminance value, i.e. Y_p(x, y) = 255;
If ROI_Yp(x, y) = 2, the level of interest is second highest and the visual attention is high; set the luminance component of this coded macroblock to 150, i.e. Y_p(x, y) = 150;
If ROI_Yp(x, y) = 1, the level of interest is third and the visual attention is low; set the luminance component of this coded macroblock to 100, i.e. Y_p(x, y) = 100;
If ROI_Yp(x, y) = 0, the macroblock lies in a non-interesting region and the visual attention is lowest; set the luminance component of this coded macroblock to 0, the lowest output luminance value, i.e. Y_p(x, y) = 0.
The tenth step: if p ≠ L-1, set p = p + 1 and jump to the fourth step (inter-frame coding of the next P frame); otherwise, end coding.
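As an illustrative, self-contained sketch of the marking loop of the fourth through ninth steps (not the encoder itself): the mode names are stand-in strings, each macroblock is represented by a hypothetical (mode, Vx, Vy) triple, magnitude comparisons against the previous frame's means are assumed, and the first P frame, having no previous-frame threshold, gets T = 0:

```python
SUB_SPLIT = {"8x8", "8x4", "4x8", "4x4"}
INTRA = {"Intra16x16", "Intra4x4"}

def label_sequence(frames):
    """frames: list of coded P frames, each a list of (mode, vx, vy) per
    macroblock. Returns the per-frame ROI priorities 0..3."""
    labels, prev_thr = [], None
    for frame in frames:
        out = []
        for mode, vx, vy in frame:
            s = 1 if mode in SUB_SPLIT or mode in INTRA else 0  # spatial flag
            t = 0  # temporal flag; 0 when no previous frame exists
            if prev_thr is not None:
                t = 1 if abs(vx) > prev_thr[0] or abs(vy) > prev_thr[1] else 0
            out.append(2 * t + s)  # ROI priority
        labels.append(out)
        num = len(frame)  # dynamic thresholds: this frame's mean magnitudes
        prev_thr = (sum(abs(v[1]) for v in frame) / num,
                    sum(abs(v[2]) for v in frame) / num)
    return labels

frames = [[("Skip", 0, 0), ("8x8", 2, 0)],
          [("16x16", 3, 0), ("4x4", 0, 0)]]
print(label_sequence(frames))  # [[0, 1], [2, 1]]
```

The marked output stream would then replace each macroblock's luminance with the value assigned to its priority, as in the ninth step.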
Fig. 6 shows example output of marking the video region of interest with the inventive method. Taking a typical video surveillance sequence (Hall) and an indoor activity sequence (Salesman) as examples, the region of interest is marked using the motion vector distribution and the inter-frame prediction mode selection results: the higher the human eye's interest in a macroblock, the higher the luminance value at that position in the output video, and vice versa. From the marking results in the rightmost column of Fig. 6 it can be seen that the shape of the region of interest obtained by the inventive method is irregular; compared with the region of interest obtained by a traditional moving-object detection method using a fixed-shape template, the marking result of the inventive method is closer to the shape of the target the human eye actually attends to and marks the region of interest more accurately.
The inventive method can also be combined with other fast coding techniques to reduce the coding complexity of the background regions the human eye is not interested in, while guaranteeing the coding quality of the region of interest, thereby further reducing the coding time. It can likewise be used in H.264-based scalable coding to realize selective enhancement coding of the region of interest.