Summary of the invention
Unlike existing methods for extracting moving objects from video sequences, such as the optical flow method, the frame difference method, the motion energy detection method and the background difference method, the present invention is based on coding information already present in the video bitstream, such as prediction modes and motion vectors. By exploiting the correlation between this coding information and visual regions of interest, it identifies the spatial-domain and temporal-domain visual feature saliency regions in the coded video content, thereby achieving automatic marking and extraction of video regions of interest.
According to the characteristics of the human visual system (HVS), the human eye is more sensitive to luminance information than to chrominance information; the inventive method therefore performs the automatic marking and extraction of video regions of interest on the coding information of the luminance component of the video sequence.
The inventive method comprises the following steps:
Step 1: input a video sequence in YUV format whose GOP (Group of Pictures) structure is IPPP, read the luminance component Y of the coded macroblocks, and configure the coding parameters and initialization parameters.
Step 2: perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame.
In video coding standards, the I frame serves as a reference point for random access and contains a large amount of information. Because it cannot be coded by exploiting the temporal correlation between adjacent frames, intra-frame prediction is used: the coding information of already coded macroblocks in the current frame is used to predict the current macroblock, eliminating spatial redundancy. Intra-frame predictive coding of the first frame (I frame) of a video sequence is a conventional coding scheme in video coding.
Step 3: perform inter-frame predictive coding on the current P frame, using the correlation between the video content of adjacent frames to eliminate temporal redundancy. Record the inter-frame prediction mode type of every coded macroblock in the current frame, denoted Mode_pn;
where p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
Step 4: identify the spatial-domain visual feature saliency region of the current P frame. Specifically: if the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, the macroblock is marked S_Yp(x, y, Mode_pn) = 1 and belongs to the spatial-domain visual feature saliency region; otherwise it is marked S_Yp(x, y, Mode_pn) = 0. Here Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p and Mode_pn are defined as above. All coded macroblocks in the current P frame are traversed.
Fig. 1 gives a schematic flow chart of inter-frame prediction mode selection in the H.264 standard.
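The Step-4 decision can be sketched as follows. This is an illustrative sketch, not the patented encoder itself; the string mode labels are hypothetical stand-ins for the H.264 prediction modes named above, which a real encoder would expose as enum values in its macroblock records.

```python
# Illustrative sketch of the Step-4 spatial-saliency test. The string
# mode labels are hypothetical stand-ins for H.264 prediction modes.

SUB_SPLIT_MODES = {"8x8", "8x4", "4x8", "4x4"}   # inter sub-partition set
INTRA_MODES = {"Intra16x16", "Intra4x4"}         # intra prediction set

def spatial_saliency(mode_pn):
    """Return S_Yp = 1 if the macroblock's prediction mode lies in the
    sub-partition or intra mode sets, else S_Yp = 0."""
    return 1 if mode_pn in SUB_SPLIT_MODES | INTRA_MODES else 0
```

Modes of the macroblock partition set {Skip, 16×16, 16×8, 8×16} fall through to 0, matching the smooth-background case described below.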
Experiments show that in H.264/AVC coding there is a strong correlation between the prediction coding results and human regions of interest: in moving regions or texture-rich regions that attract high visual attention, Mode_pn mostly selects the sub-partition mode set {8×8, 8×4, 4×8, 4×4}; at shot changes, where the video content changes abruptly, or when moving objects with large motion amplitude appear, visual attention is highest and Mode_pn selects the intra prediction mode set {Intra16×16, Intra4×4}; for smooth background regions of low visual attention, Mode_pn mostly selects the macroblock partition mode set {Skip, 16×16, 16×8, 8×16}. Taking the Claire sequence as an example, Fig. 2 gives the inter-frame prediction mode distribution of the 50th frame of the Claire sequence; it can be seen that in regions of high visual attention the coded macroblocks mostly select the inter-frame sub-partition prediction mode set.
Step 5: record the motion vector V_xpn in the horizontal direction and the motion vector V_ypn in the vertical direction of each coded macroblock in the p-th frame, and compute the average motion vector of all coded macroblocks in the previous coded frame in the horizontal direction,

V̄_x(p-1) = (1/Num) · Σ(n=1..Num) V_x(p-1)n,

and the average motion vector in the vertical direction,

V̄_y(p-1) = (1/Num) · Σ(n=1..Num) V_y(p-1)n,

where V_x(p-1)n and V_y(p-1)n are the motion vectors of each coded macroblock in the previous coded frame in the horizontal and vertical directions, p and n are defined as in Step 3, and Num is the number of macroblocks contained in one coded frame, i.e. the number of accumulated terms. Taking video in QCIF format (176 × 144) as an example, Fig. 3 gives the positions and sequence numbers n of all coded macroblocks (16 × 16) in one coded frame; in this case Num = 99.
Step 6: identify the temporal-domain visual feature saliency region of the current P frame. Specifically: if the horizontal motion vector V_xpn of the current coded macroblock is greater than the average horizontal motion vector V̄_x(p-1) of the coded macroblocks of the previous frame, or the vertical motion vector V_ypn of the current coded macroblock is greater than the average vertical motion vector V̄_y(p-1) of the coded macroblocks of the previous frame, the macroblock belongs to the temporal-domain visual feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0. All coded macroblocks in the current P frame are traversed.
Here Y denotes the luminance component of the coded macroblock, (x, y) is the position coordinate of the coded macroblock, and p is defined as in Step 3.
Motion perception is one of the most important visual processing mechanisms in the human visual system. Experiments show that coded content with larger motion vectors corresponds precisely to the moving regions of interest to the human eye (such as the head, arms and figures), while coded content with small or zero motion vectors corresponds to static background regions of low visual attention. Taking the Akiyo sequence as an example, Fig. 4 gives the motion vector distribution of the 50th frame of the Akiyo sequence; it can be seen that in the face and head-and-shoulder regions, which attract high visual attention, the coded macroblocks generally have larger motion vectors.
Whether the motion of the current coded macroblock is judged significant depends strongly on the setting of the decision threshold. To reduce the false detection rate, the present invention sets the motion decision thresholds in the horizontal and vertical directions to V̄_x(p-1) and V̄_y(p-1) respectively, where V̄_x(p-1) is the average horizontal motion vector of all coded macroblocks in the previous frame and V̄_y(p-1) is the average vertical motion vector of all coded macroblocks in the previous frame. This dynamic threshold setting takes full account of the temporal correlation of the video sequence: the thresholds change with the average motion vector of the coded macroblocks of the previous frame, which effectively reduces misjudgment and allows the temporal-domain visual feature saliency region to be obtained quickly and accurately.
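Steps 5 and 6 can be sketched as follows. This is an illustrative sketch only; the text compares motion vectors directly, and this sketch uses their magnitudes on the assumption that motion strength rather than direction is what is intended.

```python
# Illustrative sketch of Steps 5-6: dynamic thresholds computed from the
# previous frame's mean motion vectors. Magnitudes are assumed.

def mv_means(prev_mvs):
    """Average |Vx| and |Vy| over all macroblocks of the previous coded
    frame; prev_mvs is a list of (Vx, Vy) pairs, one per macroblock."""
    num = len(prev_mvs)                           # Num: macroblocks per frame
    vx_bar = sum(abs(vx) for vx, _ in prev_mvs) / num
    vy_bar = sum(abs(vy) for _, vy in prev_mvs) / num
    return vx_bar, vy_bar

def temporal_saliency(vx, vy, vx_bar, vy_bar):
    """T_Yp = 1 when either motion component of the current macroblock
    exceeds the previous frame's mean; otherwise T_Yp = 0."""
    return 1 if abs(vx) > vx_bar or abs(vy) > vy_bar else 0
```

Because the thresholds are recomputed for every frame from the previous frame's averages, a macroblock is flagged only when it moves more than its surroundings did, which is the adaptivity the paragraph above describes.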
Step 7: mark the video region of interest of the current P frame. Specifically: traverse all coded macroblocks in the current P frame and mark each according to its spatial-domain and temporal-domain visual feature saliency, distinguishing the following cases:
If the current coded macroblock has both spatial-domain and temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, it is rich in texture detail and also produces a larger motion vector; the level of visual interest is highest, and it is marked ROI_Yp(x, y) = 3.
If it has only temporal-domain visual feature saliency and no spatial-domain visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the current coded macroblock produces a larger motion vector; according to the perceptual characteristics of the HVS, the human eye is highly sensitive to object motion, so the level of visual interest is second highest and it is marked ROI_Yp(x, y) = 2.
If the macroblock's degree of motion is low, so that it has no temporal-domain visual feature saliency, but it contains rich texture information and thus has only spatial-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of visual interest is third highest and it is marked ROI_Yp(x, y) = 1.
If it has neither spatial-domain nor temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, the current coded macroblock has smooth texture and mild or no motion; it is generally a static background region and thus a region of no interest to the human eye, the level of visual interest is lowest, and it is marked ROI_Yp(x, y) = 0.
Here ROI_Yp(x, y) denotes the visual interest priority of the current coded macroblock; T_Yp(x, y, V_xpn, V_ypn) denotes its temporal-domain visual feature saliency; S_Yp(x, y, Mode_pn) denotes its spatial-domain visual feature saliency; (x, y) is the position coordinate of the current coded macroblock; Y denotes the luminance component of the macroblock; p indexes the p-th inter-coded video frame; and n is the sequence number of the n-th coded macroblock in the current coded frame.
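The four cases of Step 7 reduce to a small priority table; an illustrative sketch of the combination:

```python
# Illustrative sketch of the Step-7 combination of the spatial flag S_Yp
# and the temporal flag T_Yp into the four interest levels.

def roi_priority(s_yp, t_yp):
    """Map the two saliency flags (0 or 1 each) to ROI_Yp in {0,1,2,3}."""
    if s_yp == 1 and t_yp == 1:
        return 3    # textured and moving: highest interest
    if t_yp == 1:
        return 2    # moving only: second highest
    if s_yp == 1:
        return 1    # textured only: third highest
    return 0        # smooth, static background: no interest
```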
Step 8: output the coded video bitstream. Since the luminance component of a coded macroblock has the value range Y ∈ [0, 255], representing 256 brightness levels from completely black (0) to completely white (255), the luminance component Y of every macroblock in the current P frame is processed according to its marked ROI_Yp(x, y) interest priority level as follows, and the marked video stream is output:
If ROI_Yp(x, y) = 3, the interest level is highest and visual attention is highest; the luminance component of this coded macroblock is set to 255, the highest output value, i.e. Y_p(x, y) = 255.
If ROI_Yp(x, y) = 2, the interest level is second highest and visual attention is high; the luminance component of this coded macroblock is set to 150, i.e. Y_p(x, y) = 150.
If ROI_Yp(x, y) = 1, the interest level is third highest and visual attention is low; the luminance component of this coded macroblock is set to 100, i.e. Y_p(x, y) = 100.
If ROI_Yp(x, y) = 0, the macroblock lies in a region of no interest and visual attention is lowest; the luminance component of this coded macroblock is set to 0, i.e. Y_p(x, y) = 0.
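Step 8 is a fixed mapping from interest level to output luminance; a minimal sketch:

```python
# Illustrative sketch of Step 8: each macroblock's luminance component is
# replaced by the fixed value assigned to its interest level.

ROI_LUMA = {3: 255, 2: 150, 1: 100, 0: 0}   # levels quoted in the text

def mark_frame(roi_levels):
    """roi_levels: per-macroblock list of ROI_Yp values for one frame.
    Returns the output luminance values Y_p of the marked video stream."""
    return [ROI_LUMA[level] for level in roi_levels]
```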
Step 9: return to Step 3 and process the next frame, until the whole video sequence has been traversed.
Fig. 5 gives a flow chart of the video region-of-interest marking and extraction method.
Fig. 6 gives the output of the marked video regions of interest for typical video sequences.
Beneficial effects
This method achieves rapid extraction of video regions of interest from basic coding information. It exploits the correlation between basic coding information and human visual regions of interest to identify the spatial-domain and temporal-domain visual feature saliency regions in the coded video content, combines the marking results of the two, defines video interest priorities, and finally achieves automatic extraction of video regions of interest. The inventive method can provide an important coding basis for video coding techniques based on regions of interest (ROI, Region of Interest).
In a specific embodiment, the following procedure is completed in a computer:
First step: read in the video sequence according to the coding configuration file encoder.cfg and configure the encoder according to the parameters in the configuration file, for example: video bitstream structure GOP = IPPP...; number of coded frames FramesToBeEncoded = 100; frame rate FrameRate = 30 f/s; video width SourceWidth = 176 and height SourceHeight = 144; output file name OutputFile = ROI.264; quantization step values QPISlice = 28 and QPPSlice = 28; motion estimation search range SearchRange = ±16; number of reference frames NumberReferenceFrames = 5; rate-distortion optimization RDOptimization = on; entropy coding type SymbolMode = CAVLC. Set the initialization parameters L = number of coded frames and p = 1.
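The parameter names above follow the configuration-file style of the H.264 JM reference software. A hypothetical fragment of such an encoder.cfg, using the values quoted above, might look like the following; exact names and value encodings vary between JM versions (e.g. SymbolMode: 0 = CAVLC), so this is a sketch rather than a verbatim file:

```
FramesToBeEncoded     = 100       # number of frames to be coded
FrameRate             = 30.0      # frames per second
SourceWidth           = 176       # QCIF width
SourceHeight          = 144       # QCIF height
OutputFile            = "ROI.264" # coded bitstream file
QPISlice              = 28        # quantization parameter, I slices
QPPSlice              = 28        # quantization parameter, P slices
SearchRange           = 16        # motion estimation search range (±16)
NumberReferenceFrames = 5         # reference frames for inter prediction
RDOptimization        = 1         # rate-distortion optimized mode decision
SymbolMode            = 0         # entropy coding: 0 = CAVLC
```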
Second step: read the luma component values Y of the coded macroblocks frame by frame from the input video sequence.
Third step: perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame.
Fourth step: perform inter-frame predictive coding on the current P frame and record the inter-frame prediction mode type Mode_pn of each coded macroblock, where p = 1, 2, 3, ..., L-1 indexes the p-th inter-coded video frame, L is the total number of frames coded in the whole video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.
Fifth step: mark the spatial-domain visual feature saliency region. If the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, the macroblock is marked S_Yp(x, y, Mode_pn) = 1 and belongs to the spatial-domain visual feature saliency region; otherwise it is marked S_Yp(x, y, Mode_pn) = 0.
Sixth step: if p ≠ 1, record the horizontal motion vector V_xpn and vertical motion vector V_ypn of each coded macroblock in the p-th frame, and compute the average motion vectors V̄_x(p-1) and V̄_y(p-1) of all coded macroblocks in the previous coded frame in the horizontal and vertical directions; otherwise, jump to the tenth step.
Seventh step: mark the temporal-domain visual feature saliency region. If the horizontal motion vector V_xpn of the current coded macroblock is greater than the average horizontal motion vector V̄_x(p-1) of the previous frame's coded macroblocks, or its vertical motion vector V_ypn is greater than the average vertical motion vector V̄_y(p-1) of the previous frame's coded macroblocks (either criterion suffices), the macroblock belongs to the temporal-domain visual feature saliency region and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0.
Eighth step: mark the video region of interest.
If the current coded macroblock has both spatial-domain and temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, the level of visual interest is highest and it is marked ROI_Yp(x, y) = 3.
If it has only temporal-domain visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the level of visual interest is second highest and it is marked ROI_Yp(x, y) = 2.
If it has only spatial-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, the level of visual interest is third highest and it is marked ROI_Yp(x, y) = 1.
If it has neither spatial-domain nor temporal-domain visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, it is a region of no interest to the human eye and is marked ROI_Yp(x, y) = 0.
Ninth step: output the coded video bitstream.
If ROI_Yp(x, y) = 3, the interest level is highest and visual attention is highest; the luminance component of this coded macroblock is set to 255, i.e. Y_p(x, y) = 255.
If ROI_Yp(x, y) = 2, the interest level is second highest and visual attention is high; the luminance component of this coded macroblock is set to 150, i.e. Y_p(x, y) = 150.
If ROI_Yp(x, y) = 1, the interest level is third highest and visual attention is low; the luminance component of this coded macroblock is set to 100, i.e. Y_p(x, y) = 100.
If ROI_Yp(x, y) = 0, the macroblock lies in a region of no interest and visual attention is lowest; the luminance component of this coded macroblock is set to 0, i.e. Y_p(x, y) = 0.
Tenth step: if p ≠ L-1, set p = p + 1 and jump to the third step; otherwise, end the coding.
Fig. 6 shows example outputs of the inventive method for marking video regions of interest. Taking a typical surveillance video sequence (Hall) and an indoor activity video sequence (Salesman) as examples, the video regions of interest are marked using the motion vector distribution and the inter-frame prediction mode selection results: the higher the visual interest level of a macroblock, the higher the luminance value at that position in the output video, and vice versa. From the marking results in the rightmost column of Fig. 6, it can be seen that the shape of the video regions of interest obtained by the inventive method is irregular. Compared with the regions of interest obtained by traditional moving object detection methods that use fixed-shape templates, the marking results of the inventive method are closer to the shapes of the interesting targets actually attended to by the human eye, so regions of interest can be marked more accurately.
The inventive method can also be combined with other fast coding techniques: while guaranteeing the quality of the coded regions of interest to the human eye, it can reduce the coding complexity of the background regions of no interest and further reduce the coding time. It can also be used in H.264-based scalable coding to achieve selective enhancement coding of the regions of interest.