Face video retrieval method and system
Technical Field
The invention relates to the field of video retrieval, in particular to a face video retrieval method and a face video retrieval system.
Background
With the rapid development of multimedia technology and computer network technology, video is becoming one of the mainstream carriers of information dissemination. The problem people face is no longer a lack of video content, but how to quickly and effectively find the needed content within a vast amount of video information. In the field of social public security, the video monitoring system has become an important component for maintaining social security and strengthening social management, and face video retrieval is an urgent need in public security monitoring systems. Among the most popular video search technologies at present, whether video content retrieval is based on the non-compressed domain or on the compressed domain, the common design does not exploit the characteristics of face retrieval, which limits the efficiency of existing face video retrieval technology.
Disclosure of Invention
The embodiment of the invention aims to provide a face video retrieval method, and aims to solve the problem of low efficiency of the existing face video retrieval technology.
The embodiment of the invention is realized in such a way that a face video retrieval method comprises the following steps:
step A: judging whether the judgment parameter par_t of the current frame pic_t of the current search video is 1; if par_t is 1, entering step B; otherwise, entering step E;
step B: searching the current frame by using a first video search mode;
step C: if a next frame of the current search video exists, making t equal to t + 1, setting the next frame as the current frame of the current search video, and then entering step D; otherwise, ending; t represents the frame number of the search video sequence, and the initial value of t is 1;
step D: if no sbk_t(i, j) equals 1, entering step E; otherwise, entering step G; sbk_t(i, j) denotes the identification parameter of bk_t(i, j), and bk_t(i, j) denotes the code block in the ith row and jth column of pic_t;
step E: if the current frame pic_t of the current search video is an intra-predicted frame, letting tp_t = bkh × bkw; otherwise, calculating tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
step F: if tp_t = 0, setting all sbk_t(i, j) = 0 and then entering step C; otherwise, if tp_t ≥ 0.9 × bkh × bkw, entering step B; otherwise, entering step G; bkw and bkh respectively represent the number of columns and rows, in units of blocks, after a frame of the image is divided into blocks;
step G: searching the current frame by using a second video search mode, and then entering the step C;
the first video search mode comprises the steps of:
decoding a current frame of a current search video to obtain a decoded image;
all decoded blocks of the decoded image are processed as follows: if bk_t(i, j) uses a sub-block prediction mode, entering a subdivision judgment mode; otherwise, entering a rough judgment mode;
unifying the resolutions of the current search area and the search target, and then scaling both to the same size at the unified resolution;
firstly, extracting image features from the search area of the current decoded image; then comparing the features with the search target and matching, thereby completing the search of the current frame of the current search video;
according to the matching result of the current frame of the current search video, marking the identification parameter of each decoded block of the current frame;
wherein sbk_t(i, j) = sign(bk_t(i, j) | Condition 3); Condition 3 represents: bk_t(i, j) matches the target.
Another objective of an embodiment of the present invention is to provide a face video retrieval system, where the system includes:
a first judgment processing module for judging whether the judgment parameter par_t of the current frame pic_t of the current search video is 1; if so, entering a first video search device; otherwise, entering a scene switching parameter calculation module;
wherein par_t represents the determination parameter of pic_t; pic_t represents the t-th frame of the current search video, t represents the frame number of the search video sequence, and the initial value of t is 1; Condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variable over all cases that satisfy the condition; Condition 2 represents: bk_t(i, j) is an intra-prediction block or contains at least one intra-prediction sub-block; bk_t(i, j) denotes the decoded block in the ith row and jth column of pic_t; bkw and bkh respectively represent the number of columns and rows, in units of blocks, after a frame of the image is divided into blocks;
first video search means for searching for a current frame using a first video search mode;
the second judgment processing module is used for judging whether a next frame of the current frame of the current search video exists; if so, t is made equal to t + 1, the next frame is set as the current frame of the current search video, and then the third judgment processing module is entered; otherwise, the process ends;
a third judgment processing module for judging whether any sbk_t(i, j) equals 1; if no sbk_t(i, j) equals 1, entering the scene switching parameter calculation module; otherwise, entering the second video search device;
a scene switching parameter calculation module for setting tp_t = bkh × bkw if the current frame pic_t of the current search video is an intra-predicted frame, and otherwise calculating tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
a fourth judgment processing module for setting all sbk_t(i, j) = 0 and entering the second judgment processing module if tp_t = 0; otherwise, entering the first video search device if tp_t ≥ 0.9 × bkh × bkw, and entering the second video search device otherwise;
the second video searching device is used for searching the current frame by using a second video searching mode and then entering a second judgment processing module;
the first video search device includes:
the decoded image acquisition module is used for decoding the current frame of the current search video to acquire a decoded image;
a prediction mode judging module for judging: if bk_t(i, j) uses a sub-block prediction mode, entering the subdivision judgment device; otherwise, entering the rough judgment device;
the first size unifying module is connected with the prediction mode judging module and is used for unifying the resolution of the current search area and the search target and then scaling the current search area and the search target to the same size according to the unified resolution;
the first target image search module is used for extracting image features from the search area of the current decoded image, then comparing the features with the search target and matching, thereby completing the search of the current frame of the current search video;
the first identification parameter identification module is used for identifying the identification parameter of each decoded block of the current frame of the current search video according to the matching result of the current frame;
wherein sbk_t(i, j) = sign(bk_t(i, j) | Condition 3), sbk_t(i, j) represents the identification parameter of bk_t(i, j); Condition 3 represents: bk_t(i, j) matches the target.
Advantageous Effects
The invention provides a face video retrieval method, which determines a search area of a key frame through information of a non-compressed domain, and then acquires a tracking search area through motion and prediction information of a compressed domain, so that the data volume and the operation amount of video search are reduced, and the timeliness of the video search is improved; in addition, the method also aims at the characteristics of face retrieval, and reduces the calculation amount by reducing the search area; through preprocessing, the accuracy of searching is improved.
Drawings
FIG. 1 is a flow chart of a face video retrieval method according to a preferred embodiment of the present invention;
FIG. 2 is a flowchart of the method of Step1 in FIG. 1;
FIG. 3 is a block diagram of a face video retrieval system in accordance with a preferred embodiment of the present invention;
FIG. 4 is a block diagram of the first video search apparatus of FIG. 3;
FIG. 5 is a structural diagram of the subdivision judgment device in FIG. 4;
FIG. 6 is a structural diagram of the rough judgment device in FIG. 4;
fig. 7 is a structural diagram of the second video search apparatus in fig. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples, and for convenience of description, only parts related to the examples of the present invention are shown. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a face video retrieval method and a face video retrieval system, wherein the method determines a search area of a key frame through information of a non-compressed domain, and then acquires a tracking search area through motion and prediction information of a compressed domain, so that the data volume and the operation amount of video search are reduced, and the timeliness of video search is improved; in addition, the method also aims at the characteristics of face retrieval, and reduces the calculation amount by reducing the search area; through preprocessing, the accuracy of searching is improved.
Example one
FIG. 1 is a flow chart of a face video retrieval method according to a preferred embodiment of the present invention; the method comprises the following steps:
Step 0: if the judgment parameter par_t is 1, Step 1 is entered; otherwise, Step 4 is entered.
Wherein par_t represents the determination parameter of pic_t; pic_t represents the t-th frame of the current search video (namely the current frame of the current search video), t represents the frame number of the search video sequence, and the initial value of t is 1. Condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw. tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variable over all cases that satisfy the condition. Condition 2 represents: bk_t(i, j) is an intra-prediction block or contains at least one intra-prediction sub-block. bk_t(i, j) denotes the decoded block in the ith row and jth column of pic_t (the size of a block is, for example, 16×16 in standards such as H.264, or 64×64 in HEVC; when a block is further divided, the resulting smaller blocks are called sub-blocks). bkw and bkh respectively represent the number of columns and rows, in units of blocks, after a frame of the image is divided into blocks.
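As a concrete illustration of these definitions, the sketch below computes bkw and bkh for a frame, the scene switching parameter tp_t, and the judgment parameter par_t. The boolean matrix `cond2` marking which blocks satisfy Condition 2 is a hypothetical input layout, not something prescribed by the method.

```python
def block_grid(width, height, block_size):
    """bkw, bkh: columns and rows of blocks after partitioning a frame
    (edge blocks may be partial, hence the ceiling division)."""
    return -(-width // block_size), -(-height // block_size)

def scene_switch_param(is_intra_frame, cond2, bkh, bkw):
    """tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 <= i <= bkh and 1 <= j <= bkw).

    cond2[i][j] is True when bk_t(i, j) is an intra-prediction block or
    contains at least one intra-prediction sub-block (Condition 2)."""
    if is_intra_frame:          # every block of an intra-predicted frame counts
        return bkh * bkw
    return sum(1 for i in range(bkh) for j in range(bkw) if cond2[i][j])

def judgment_param(t, is_intra_frame, tp, bkh, bkw):
    """par_t is 1 when Condition 1 holds: t == 1, pic_t is an
    intra-predicted frame, or tp_t >= 0.9 * bkh * bkw."""
    return 1 if (t == 1 or is_intra_frame or tp >= 0.9 * bkh * bkw) else 0
```

For a hypothetical 1920×1080 frame with 16×16 blocks this gives bkw = 120 and bkh = 68 (the bottom row of blocks is partial).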
step 1: the current frame is searched using a first video search mode.
First video search mode (fig. 2 is the method flow diagram of Step1 in fig. 1):
step 11: and decoding the current frame of the current search video to obtain a decoded image.
Step 12: according to the characteristics of face recognition, a search area is defined for the decoded image; namely, all decoded blocks of the decoded image are processed as follows: if bkt(i, j) the prediction mode is a subblock prediction mode, namely, if the block is further divided, entering a subdivision judgment mode; otherwise, go intoEnter the rough point decision mode. A subdivision judgment mode:
Step A1: taking each pixel point in the block as a skin color judgment point, judging whether the judgment point is skin color, and if so, adding 1 to the count of skin color pixel points in the block.
Step A2: if the count of skin color pixel points in the block is greater than a fourteenth threshold, drawing the block into the face video search area; otherwise, drawing the block into the non-face video search area. The upper limit of the fourteenth threshold is the total number of pixel points in the block, and an optional lower limit is half of that total.
Rough judgment mode:
Step B1: taking the mean of the pixel points in the block as the skin color judgment point; namely, taking the mean of the corresponding components of all pixel points in the block as the value of each color model component.
Step B2: judging whether the judgment point is skin color; if so, drawing the block into the face video search area; otherwise, drawing the block into the non-face video search area.
In the subdivision judgment mode and the rough judgment mode, a skin color judgment point is judged to be skin color if the following six requirements are satisfied simultaneously:
Requirement 1: Thres1 < b - g < Thres2; Requirement 2: Thres3 < r - g < Thres4 × Wr; Requirement 3: Gup < g < Gdown; Requirement 4: Thres5 < Wr; Requirement 5: Thres6 < Co < Thres7; Requirement 6: Thres8 < energyUV < Thres9 && U × Thres10 < V && U × Thres11 > V, or Thres12 < energyUV < Thres13.
Wherein Thresjj, jj ∈ [1,13 ]]The first threshold value to the thirteenth threshold value are respectively set according to the actual situation; based on the normalized RGB model, the RGB model,
obtaining normalized RGB color components r, g and b; color balance parameter Wr ═ (r-1/3)
2+(g-1/3)
2(ii) a Constructing a green component upper bound model Gup ═ a
upr
2+b
upr+c
upWherein a is
up,b
up,c
upAs a model parameter, Gdown ═ a
downr
2+b
downr+c
down(ii) a Wherein a is
down,b
down,c
downIs a model parameter; model-based YUV model
Obtaining color energy
Y is a brightness component, and U, V represents two chrominance components of the YUV model respectively; based on YCoCg model
Obtaining Co, wherein Co is a color component value of a YCgCo model;
In the first video search mode, the search area comprises the face video search area and the non-face video search area; the skin color judgment method may be any one disclosed in the art.
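The six requirements above can be collected into a single predicate, sketched below. All thirteen thresholds and the model parameters a_up … c_down are application-tuned values chosen here only for illustration, and since the text does not give a formula for the color energy energyUV, it is treated as an input:

```python
def is_skin_point(r, g, b, U, V, Co, energy_uv, thres,
                  a_up, b_up, c_up, a_down, b_down, c_down):
    """Evaluate Requirements 1-6 for one skin color judgment point.

    r, g, b: normalized RGB components; U, V: YUV chrominance components;
    Co: YCoCg color component; energy_uv: color energy (formula not given
    in the text, so supplied by the caller); thres: sequence holding
    Thres1..Thres13 at indices 1..13."""
    wr = (r - 1/3) ** 2 + (g - 1/3) ** 2            # color balance parameter
    gup = a_up * r ** 2 + b_up * r + c_up           # green component bound models
    gdown = a_down * r ** 2 + b_down * r + c_down
    req1 = thres[1] < b - g < thres[2]
    req2 = thres[3] < r - g < thres[4] * wr
    req3 = gup < g < gdown
    req4 = thres[5] < wr
    req5 = thres[6] < Co < thres[7]
    req6 = ((thres[8] < energy_uv < thres[9] and U * thres[10] < V < U * thres[11])
            or thres[12] < energy_uv < thres[13])
    return req1 and req2 and req3 and req4 and req5 and req6
```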
Step 13: the resolution of the current search area and the search target are unified, and then the current search area and the search target are scaled to the same size with the unified resolution.
Step 14: firstly, extracting image characteristics from a search area of a current decoding image; and then comparing with a search target, matching and finishing the search of the current frame of the current search video.
The image features are extracted and compared with the search target, and the matching method can be any method disclosed in the field of corresponding video search, and is not repeated herein.
Step 15: and identifying the identification parameters of each decoding block of the current frame of the current search video according to the matching result of the current frame of the current search video.
Wherein, sbkt(i,j)=sign(bkt(i, j) | Condition 3), sbkt(i, j) represents bkt(ii) the identification parameter of (i, j); condition 3 represents: bkt(i, j) matching the target.
Step 2: if the next frame of the current search video exists, making t equal to t +1, setting the next frame of the current search video as the current frame of the current search video, and then entering Step 3; otherwise, ending.
Step 3: if not sbktIf (i, j) is 1, go to Step 4; otherwise, go to Step 6.
Step 4: if pictFor intra-predicted frames, let tptBkh × bkw; otherwise, calculate tpt=sum(sign(bkt(i, j) | Condition 2) |1 ≦ i ≦ bkh and 1 ≦ j ≦ bkw).
Step 5: if tptFirst, all sbk are set to 0t(i, j) ═ 0, then proceed to Step 2; otherwise, if tptEntering Step1 when the pressure is not less than 0.9 × bkh × bkw; otherwise, Step6 is entered.
Step 6: using the second video search mode, the current frame is searched and then Step2 is entered.
Second video search mode:
Step 61: if bk_t(i, j) is an intra-prediction block, the block is decoded and then delimited as a search region; otherwise, if spbk_t(i, j) = 1, sbk_t(i, j) is set to 1, namely the current block matches the target; otherwise, sbk_t(i, j) is set to 0, namely the current block does not match the target.
Wherein spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j).
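Step 61's per-block decision — decode and search intra blocks, propagate the reference block's identification parameter spbk_t(i, j) otherwise — can be sketched as:

```python
def second_mode_block(is_intra_block, spbk):
    """Returns ('search', None) when the block must be decoded and delimited
    as a search region, or ('inherit', sbk) where sbk copies the reference
    block's identification parameter spbk_t(i, j)."""
    if is_intra_block:
        return ('search', None)
    return ('inherit', 1 if spbk == 1 else 0)
```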
Step 62: the current search area is preprocessed, i.e. the resolutions of the current search area and the search target are unified, and then the current search area and the search target are scaled to the same size with the unified resolution.
Step 63: firstly, extracting image characteristics of a search area, then comparing the image characteristics with a search target, matching and finishing the search of the current frame of the current search video.
The image features are extracted and compared with the search target, and the matching method can be any method disclosed in the field of corresponding video search, and is not repeated herein.
Step 64: and identifying the identification parameters of the decoding blocks according to the matching results of the decoding blocks in the search area.
Example two
FIG. 3 is a block diagram of a face video retrieval system in accordance with a preferred embodiment of the present invention; the system comprises:
a first judgment processing module for judging the pic of the current frame of the current search videotIs determined by the judgment parameter partIf the video is 1, entering a first video searching device if the video is 1, and otherwise entering a scene switching parameter calculation module;
wherein par_t represents the determination parameter of pic_t; pic_t represents the t-th frame of the current search video (namely the current frame of the current search video), t represents the frame number of the search video sequence, and the initial value of t is 1; Condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variable over all cases that satisfy the condition; Condition 2 represents: bk_t(i, j) is an intra-prediction block or contains at least one intra-prediction sub-block; bk_t(i, j) denotes the decoded block in the ith row and jth column of pic_t (the size of a block is, for example, 16×16 in standards such as H.264, or 64×64 in HEVC; when a block is further divided, the resulting smaller blocks are called sub-blocks); bkw and bkh respectively represent the number of columns and rows, in units of blocks, after a frame of the image is divided into blocks;
first video search means for searching for a current frame using a first video search mode;
and the second judgment processing module is used for judging whether a next frame of the current frame of the current search video exists; if so, t is made equal to t + 1, the next frame is set as the current frame of the current search video, and then the third judgment processing module is entered; otherwise, the process ends.
A third judgment processing module for judging whether any sbk_t(i, j) equals 1; if no sbk_t(i, j) equals 1, entering the scene switching parameter calculation module; otherwise, entering the second video search device;
a scene switching parameter calculation module for setting tp_t = bkh × bkw if the current frame pic_t of the current search video is an intra-predicted frame, and otherwise calculating tp_t = sum(sign(bk_t(i, j) | Condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw).
A fourth judgment processing module for setting all sbk_t(i, j) = 0 and entering the second judgment processing module if tp_t = 0; otherwise, entering the first video search device if tp_t ≥ 0.9 × bkh × bkw, and entering the second video search device otherwise.
The second video searching device is used for searching the current frame by using a second video searching mode and then entering a second judgment processing module;
further, fig. 4 is a structural diagram of the first video search apparatus in fig. 3, the first video search apparatus comprising:
the decoded image acquisition module is used for decoding the current frame of the current search video to acquire a decoded image;
a prediction mode judging module for judging: if bk_t(i, j) uses a sub-block prediction mode, entering the subdivision judgment device; otherwise, entering the rough judgment device.
The first size unifying module is connected with the prediction mode judging module and is used for unifying the resolution of the current search area and the search target and then scaling the current search area and the search target to the same size according to the unified resolution;
the first target image searching module is used for extracting image characteristics from a searching area of a current decoding image; and then comparing with a search target, matching and finishing the search of the current frame of the current search video.
And the first identification parameter identification module is used for identifying the identification parameters of each decoding block of the current frame of the current search video according to the matching result of the current frame of the current search video.
Wherein sbk_t(i, j) = sign(bk_t(i, j) | Condition 3), sbk_t(i, j) represents the identification parameter of bk_t(i, j); Condition 3 represents: bk_t(i, j) matches the target.
Further, fig. 5 is a structural view of the subdivision determination device in fig. 4;
the subdivision judging device comprises a block skin color pixel point counting module and a first face video searching area dividing module,
the block skin color pixel counting module is used for taking each pixel in the block as a skin color judgment point, judging the skin color of the skin color judgment point, and if the skin color judgment point is skin color, adding 1 to the number of the block skin color pixels;
and the first face video search area dividing module is connected with the block skin color pixel point counting module and is used for drawing the block into the face video search area if the number of skin color pixel points in the block is greater than a fourteenth threshold, and otherwise drawing the block into the non-face video search area.
The upper limit of the fourteenth threshold is the total number of pixel points in the block, and an optional lower limit is half of that total.
FIG. 6 is a structural view of the rough judgment means in FIG. 4;
the rough-dividing judging device comprises a block color model component value calculating module and a second human face video searching area dividing module,
the block color model component value calculation module is used for taking the mean of the pixel points in the block as the skin color judgment point, namely taking the mean of the corresponding components of all pixel points in the block as the value of each color model component;
and the second face video search area dividing module is connected with the block color model component value calculation module and is used for judging whether the skin color judgment point is skin color; if so, the block is drawn into the face video search area; otherwise, the block is drawn into the non-face video search area.
In the subdivision judgment mode and the rough judgment mode, a skin color judgment point is judged to be skin color if the following six requirements are satisfied simultaneously:
Requirement 1: Thres1 < b - g < Thres2; Requirement 2: Thres3 < r - g < Thres4 × Wr; Requirement 3: Gup < g < Gdown; Requirement 4: Thres5 < Wr; Requirement 5: Thres6 < Co < Thres7; Requirement 6: Thres8 < energyUV < Thres9 && U × Thres10 < V && U × Thres11 > V, or Thres12 < energyUV < Thres13.
Wherein Thres1 to Thres13 are the first to thirteenth thresholds, respectively, each set according to the actual situation. Based on the normalized RGB model, the normalized color components r, g and b are obtained. The color balance parameter is Wr = (r - 1/3)² + (g - 1/3)². A green component upper bound model Gup = a_up·r² + b_up·r + c_up is constructed, wherein a_up, b_up and c_up are model parameters; likewise Gdown = a_down·r² + b_down·r + c_down, wherein a_down, b_down and c_down are model parameters. Based on the YUV model, the color energy energyUV is obtained, wherein Y is the luminance component and U and V respectively represent the two chrominance components of the YUV model. Based on the YCoCg model, Co is obtained, wherein Co is a color component value of the YCoCg model. The skin color judgment method may be any one disclosed in the art.
Further, fig. 7 is a structural diagram of a second video search apparatus in fig. 3, the second video search apparatus comprising:
a second search area delimiting module for: if bk_t(i, j) is an intra-prediction block, decoding the block and then delimiting it as a search region; otherwise, if spbk_t(i, j) = 1, setting sbk_t(i, j) = 1, namely the current block matches the target, and otherwise setting sbk_t(i, j) = 0, namely the current block does not match the target; wherein spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j).
And the second size unifying module is connected with the second searching area demarcating module and is used for preprocessing the current searching area, namely unifying the resolution of the current searching area and the searching target, and then zooming the current searching area and the searching target to the same size with the unified resolution.
And the second target image searching module is used for firstly extracting image characteristics from a searching area, then comparing the image characteristics with a searching target, matching and finishing the searching of the current frame of the current searching video.
The image features are extracted and compared with the search target, and the matching method can be any method disclosed in the field of corresponding video search, and is not repeated herein.
And the second identification parameter identification module is used for identifying the identification parameters of the decoding blocks according to the matching results of the decoding blocks in the search area.
It will be understood by those skilled in the art that all or part of the steps in the method according to the above embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, such as ROM, RAM, magnetic disk, optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.