CN106682094B - Face video retrieval method and system - Google Patents

Publication number
CN106682094B
Authority
CN
China
Prior art keywords
video
search
block
current
skin color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611087529.7A
Other languages
Chinese (zh)
Other versions
CN106682094A (en)
Inventor
马国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Mengwang Video Co Ltd
Original Assignee
Shenzhen Mengwang Video Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Mengwang Video Co Ltd filed Critical Shenzhen Mengwang Video Co Ltd
Priority to CN201611087529.7A priority Critical patent/CN106682094B/en
Publication of CN106682094A publication Critical patent/CN106682094A/en
Application granted granted Critical
Publication of CN106682094B publication Critical patent/CN106682094B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face video retrieval method and system. The method determines the search area of a key frame from information of the non-compressed domain, and then obtains the tracking search area from the motion and prediction information of the compressed domain, which reduces the data volume and computation of video search and improves its timeliness. The method also exploits the characteristics of face retrieval, reducing computation by shrinking the search area, and improves search accuracy through preprocessing.

Description

Face video retrieval method and system
Technical Field
The invention relates to the field of video retrieval, in particular to a face video retrieval method and a face video retrieval system.
Background
With the rapid development of multimedia and computer network technology, video is becoming one of the mainstream carriers of information. The problem people face is no longer a lack of video content, but how to quickly and effectively find the content they need in a vast amount of video. In the field of public security, video surveillance systems have become an important component of maintaining social order and strengthening social management, and face video retrieval is an urgent need in public security surveillance systems. Among the current mainstream video search technologies, neither video content retrieval based on the non-compressed domain nor video content retrieval based on the compressed domain is designed to exploit the characteristics of face retrieval, which limits the efficiency of face video retrieval.
Disclosure of Invention
The embodiment of the invention aims to provide a face video retrieval method, and aims to solve the problem of low efficiency of the existing face video retrieval technology.
The embodiment of the invention is realized in such a way that a face video retrieval method comprises the following steps:
step A: judging the current frame pic_t of the current search video: if its judgment parameter par_t is 1, entering step B; otherwise, entering step E;
step B: searching the current frame by using a first video search mode;
step C: if a next frame of the current search video exists, letting t = t + 1, setting that next frame as the current frame of the current search video, and then entering step D; otherwise, ending; t represents the frame number in the search video sequence, and the initial value of t is 1;
step D: if no sbk_t(i, j) = 1 exists, entering step E; otherwise, entering step G;
sbk_t(i, j) denotes the identification parameter of bk_t(i, j), and bk_t(i, j) denotes the decoded block in row i and column j of pic_t;
step E: if the current frame pic_t of the current search video is an intra-predicted frame, letting tp_t = bkh × bkw; otherwise, calculating tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
step F: if tp_t = 0, first setting all sbk_t(i, j) = 0 and then entering step C; otherwise, if tp_t ≥ 0.9 × bkh × bkw, entering step B; otherwise, entering step G; bkw and bkh respectively represent the number of columns and the number of rows of a frame, counted in blocks, after the frame is divided into blocks;
step G: searching the current frame by using a second video search mode, and then entering step C;
the first video search mode comprises the steps of:
decoding a current frame of a current search video to obtain a decoded image;
all decoded blocks of the decoded image are processed as follows: if the prediction mode of bk_t(i, j) is a sub-block prediction mode, entering a subdivision judgment mode; otherwise, entering a rough judgment mode;
unifying the resolution of the current search area and the search target, and then scaling the current search area and the search target to the same size with the unified resolution;
extracting image features from the search area of the current decoded image, then comparing and matching them against the search target to complete the search of the current frame of the current search video;
identifying the identification parameter of each decoded block of the current frame according to the matching result of the current frame of the current search video;
wherein sbk_t(i, j) = sign(bk_t(i, j) | condition 3), and condition 3 represents: bk_t(i, j) matches the target.
Another objective of an embodiment of the present invention is to provide a face video retrieval system, where the system includes:
a first judgment processing module for judging the current frame pic_t of the current search video: if its judgment parameter par_t is 1, entering a first video search device; otherwise, entering a scene switching parameter calculation module;
wherein par_t represents the judgment parameter of pic_t, given by
Figure BDA0001168067990000021
pic_t represents the t-th frame of the current search video, t represents the frame number in the search video sequence, and the initial value of t is 1; condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variables that satisfy the condition;
Figure BDA0001168067990000031
condition 2 represents: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block; bk_t(i, j) denotes the decoded block in row i and column j of pic_t; bkw and bkh respectively represent the number of columns and the number of rows of a frame, counted in blocks, after the frame is divided into blocks;
first video search means for searching for a current frame using a first video search mode;
a second judgment processing module for judging whether a next frame of the current search video exists; if so, letting t = t + 1, setting that next frame as the current frame of the current search video, and then entering the third judgment processing module; otherwise, ending;
a third judgment processing module for judging whether any sbk_t(i, j) = 1 exists; if none exists, entering the scene switching parameter calculation module; otherwise, entering the second video search device;
a scene switching parameter calculation module for letting tp_t = bkh × bkw if the current frame pic_t of the current search video is an intra-predicted frame, and otherwise calculating tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
a fourth judgment processing module for judging whether tp_t = 0; if so, setting all sbk_t(i, j) = 0 and then entering the second judgment processing module; otherwise, if tp_t ≥ 0.9 × bkh × bkw, entering the first video search device; otherwise, entering the second video search device;
a second video search device for searching the current frame by using a second video search mode and then entering the second judgment processing module;
the first video search device includes:
a decoded image acquisition module for decoding the current frame of the current search video to obtain a decoded image;
a prediction mode judgment module for entering a subdivision judgment device if the prediction mode of bk_t(i, j) is a sub-block prediction mode, and otherwise entering a rough judgment device;
the first size unifying module is connected with the prediction mode judging module and is used for unifying the resolution of the current search area and the search target and then scaling the current search area and the search target to the same size according to the unified resolution;
a first target image search module for extracting image features from the search area of the current decoded image, then comparing and matching them against the search target to complete the search of the current frame of the current search video;
a first identification parameter identification module for identifying the identification parameter of each decoded block of the current frame according to the matching result of the current frame of the current search video;
wherein sbk_t(i, j) = sign(bk_t(i, j) | condition 3), and sbk_t(i, j) represents the identification parameter of bk_t(i, j); condition 3 represents: bk_t(i, j) matches the target.
The invention has the advantages of
The invention provides a face video retrieval method that determines the search area of a key frame from information of the non-compressed domain, and then obtains the tracking search area from the motion and prediction information of the compressed domain, which reduces the data volume and computation of video search and improves its timeliness. The method also exploits the characteristics of face retrieval, reducing computation by shrinking the search area, and improves search accuracy through preprocessing.
Drawings
FIG. 1 is a flow chart of a face video retrieval method according to a preferred embodiment of the present invention;
FIG. 2 is a flowchart of the method of Step1 in FIG. 1;
FIG. 3 is a block diagram of a face video retrieval system in accordance with a preferred embodiment of the present invention;
FIG. 4 is a block diagram of the first video search apparatus of FIG. 3;
FIG. 5 is a view showing the structure of the subdivision determination device in FIG. 4;
FIG. 6 is a structural view of the rough judgment means in FIG. 4;
FIG. 7 is a structural diagram of the second video search apparatus in FIG. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples, and for convenience of description, only parts related to the examples of the present invention are shown. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a face video retrieval method and a face video retrieval system. The method determines the search area of a key frame from information of the non-compressed domain, and then obtains the tracking search area from the motion and prediction information of the compressed domain, which reduces the data volume and computation of video search and improves its timeliness. The method also exploits the characteristics of face retrieval, reducing computation by shrinking the search area, and improves search accuracy through preprocessing.
Example one
FIG. 1 is a flow chart of a face video retrieval method according to a preferred embodiment of the present invention; the method comprises the following steps:
Step 0: if the judgment parameter par_t is 1, go to Step 1; otherwise, go to Step 4.
Here par_t represents the judgment parameter of pic_t, given by
Figure BDA0001168067990000051
pic_t represents the t-th frame of the current search video (namely the current frame of the current search video), t represents the frame number in the search video sequence, and the initial value of t is 1; condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variables that satisfy the condition;
Figure BDA0001168067990000052
condition 2 represents: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block; bk_t(i, j) denotes the decoded block in row i and column j of pic_t (the block size is, for example, 16x16 in standards such as H.264 and 64x64 in HEVC; when a block is further divided, the smaller blocks are called sub-blocks); bkw and bkh respectively represent the number of columns and the number of rows of a frame, counted in blocks, after the frame is divided into blocks;
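The scene switching parameter and the judgment parameter defined above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Block` class and all function names are hypothetical, and reading par_t as "1 when condition 1 holds, otherwise 0" is an assumption, because the defining formula is only available as an image in this text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for one decoded block bk_t(i, j)."""
    is_intra: bool = False                                      # block is intra predicted
    sub_block_intra: List[bool] = field(default_factory=list)   # intra flag per sub-block


def satisfies_condition_2(block: Block) -> bool:
    # Condition 2: intra prediction block, or at least one intra prediction sub-block.
    return block.is_intra or any(block.sub_block_intra)


def scene_switch_parameter(blocks: List[List[Block]], frame_is_intra: bool) -> int:
    """tp_t: bkh * bkw for an intra-predicted frame, else the count of condition-2 blocks."""
    bkh = len(blocks)
    bkw = len(blocks[0]) if blocks else 0
    if frame_is_intra:
        return bkh * bkw
    return sum(1 for row in blocks for b in row if satisfies_condition_2(b))


def judgment_parameter(t: int, frame_is_intra: bool, tp_t: int, bkh: int, bkw: int) -> int:
    """par_t, assumed here to be 1 when condition 1 holds and 0 otherwise."""
    condition_1 = t == 1 or frame_is_intra or tp_t >= 0.9 * bkh * bkw
    return 1 if condition_1 else 0
```

Counting blocks rather than pixels keeps this check cheap, since the prediction mode of each block is available from the bitstream without full decoding.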
Step 1: the current frame is searched using a first video search mode.
First video search mode (FIG. 2 is the flow chart of Step 1 in FIG. 1):
Step 11: decode the current frame of the current search video to obtain a decoded image.
Step 12: according to the characteristics of face recognition, define a search area for the decoded image; namely, all decoded blocks of the decoded image are processed as follows: if the prediction mode of bk_t(i, j) is a sub-block prediction mode, i.e. the block is further divided, enter the subdivision judgment mode; otherwise, enter the rough judgment mode.
Subdivision judgment mode:
Step A1: take each pixel in the block as a skin color judgment point and judge its skin color; if the judgment point is skin color, add 1 to the skin color pixel count of the block.
Step A2: if the number of skin color pixels in the block is greater than a fourteenth threshold, the block is assigned to the face video search area; otherwise, the block is assigned to the non-face video search area. The upper limit of the fourteenth threshold is the total number of pixels in the block, and the lower limit may optionally be half of the total number of pixels in the block.
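Steps A1 and A2 amount to a per-pixel count followed by a threshold test. The sketch below assumes a block is given as a list of (R, G, B) tuples and that a skin color predicate is supplied by the caller; the function names and the default threshold (half the pixel count, the lower limit mentioned above) are illustrative choices, not values fixed by the text.

```python
from typing import Callable, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # hypothetical (R, G, B) pixel representation


def count_skin_pixels(pixels: Sequence[Pixel], is_skin: Callable[[Pixel], bool]) -> int:
    # Step A1: every pixel of the block is a skin color judgment point.
    return sum(1 for p in pixels if is_skin(p))


def subdivision_judgment(pixels: Sequence[Pixel],
                         is_skin: Callable[[Pixel], bool],
                         threshold14: Optional[float] = None) -> bool:
    """Step A2: True if the block belongs to the face video search area."""
    if threshold14 is None:
        # Assumed default: the lower limit named in the text (half of the pixels).
        threshold14 = len(pixels) / 2
    return count_skin_pixels(pixels, is_skin) > threshold14
```

Because the comparison is strict, a threshold equal to the total pixel count would reject every block, so practical values presumably sit below the stated upper limit.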
Rough judgment mode:
Step B1: take the block as a unit and use the mean of its pixels as the skin color judgment point, namely take the mean of the corresponding component over all pixels in the block as the value of each color model component.
Step B2: judge the skin color of the skin color judgment point; if the judgment point is skin color, the block is assigned to the face video search area; otherwise, the block is assigned to the non-face video search area.
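The rough judgment mode replaces the per-pixel loop with a single test on the block mean. A minimal sketch, again assuming (R, G, B) tuples and a caller-supplied skin color predicate (both hypothetical):

```python
from typing import Callable, Sequence, Tuple

Color = Tuple[float, float, float]


def block_mean(pixels: Sequence[Tuple[int, int, int]]) -> Color:
    # Step B1: per-component mean over all pixels of the block.
    if not pixels:
        raise ValueError("block must contain at least one pixel")
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def rough_judgment(pixels: Sequence[Tuple[int, int, int]],
                   is_skin: Callable[[Color], bool]) -> bool:
    """Step B2: True if the block belongs to the face video search area."""
    return is_skin(block_mean(pixels))
```

One skin color test per block instead of one per pixel is what makes this mode cheaper for blocks that the encoder did not subdivide.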
In both the subdivision judgment mode and the rough judgment mode, a skin color judgment point is judged to be skin color if the following 6 requirements are satisfied at the same time:
Requirement 1: Thres1 < b - g < Thres2; requirement 2: Thres3 < r - g < Thres4 * Wr;
requirement 3: Gup < g < Gdown; requirement 4: Thres5 < Wr; requirement 5: Thres6 < Co < Thres7;
requirement 6: Thres8 < energyUV < Thres9 && U * Thres10 < V && U * Thres11 > V, or Thres12 < energyUV < Thres13.
Here Thres_jj, jj ∈ [1, 13], denote the first to thirteenth thresholds, each set according to the actual situation. Based on the normalized RGB model,
Figure BDA0001168067990000061
the normalized RGB color components r, g and b are obtained. The color balance parameter is Wr = (r - 1/3)^2 + (g - 1/3)^2. The green component upper bound model is Gup = a_up * r^2 + b_up * r + c_up, where a_up, b_up and c_up are model parameters, and the green component lower bound model is Gdown = a_down * r^2 + b_down * r + c_down, where a_down, b_down and c_down are model parameters. Based on the YUV model
Figure BDA0001168067990000062
the color energy
Figure BDA0001168067990000063
is obtained, where Y is the luminance component and U and V are the two chrominance components of the YUV model. Based on the YCoCg model
Figure BDA0001168067990000064
Co is obtained, where Co is a color component value of the YCoCg model.
In the first video search mode, the search area comprises a face video search area and a non-face video search area. The skin color judgment method may be any one of those disclosed in the art.
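The six requirements can be combined into one predicate, sketched below under several stated assumptions. The normalized components are taken as r = R/(R+G+B) and likewise for g and b, which is the usual definition of normalized RGB but is not readable from this text. The formulas for Co, U, V and energyUV are only available as images, so they are passed in as precomputed inputs rather than guessed. Requirement 6 is read as "(first three clauses together) or (last clause)", and requirement 3 is implemented literally as written. All names are hypothetical.

```python
from typing import Dict, Tuple


def normalized_rgb(R: float, G: float, B: float) -> Tuple[float, float, float]:
    # Assumed standard normalization; a black pixel is mapped to zeros by choice.
    s = R + G + B
    if s == 0:
        return 0.0, 0.0, 0.0
    return R / s, G / s, B / s


def is_skin_point(rgb: Tuple[float, float, float], co: float, u: float, v: float,
                  energy_uv: float, thres: Dict[int, float],
                  up: Tuple[float, float, float],
                  down: Tuple[float, float, float]) -> bool:
    """thres maps 1..13 to Thres1..Thres13; up/down hold (a, b, c) of Gup/Gdown."""
    r, g, b = normalized_rgb(*rgb)
    wr = (r - 1 / 3) ** 2 + (g - 1 / 3) ** 2          # color balance parameter Wr
    g_up = up[0] * r * r + up[1] * r + up[2]          # Gup model
    g_down = down[0] * r * r + down[1] * r + down[2]  # Gdown model
    req1 = thres[1] < b - g < thres[2]
    req2 = thres[3] < r - g < thres[4] * wr
    req3 = g_up < g < g_down                          # literal reading of the text
    req4 = thres[5] < wr
    req5 = thres[6] < co < thres[7]
    req6 = ((thres[8] < energy_uv < thres[9]
             and u * thres[10] < v and u * thres[11] > v)
            or thres[12] < energy_uv < thres[13])     # assumed grouping of "or"
    return all((req1, req2, req3, req4, req5, req6))
```

The thresholds and the six model parameters have to come from calibration on real data; the text only says they are set according to the actual situation.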
Step 13: the resolution of the current search area and the search target are unified, and then the current search area and the search target are scaled to the same size with the unified resolution.
Step 14: extract image features from the search area of the current decoded image, then compare and match them against the search target to complete the search of the current frame of the current search video.
Feature extraction, comparison with the search target, and matching may use any method disclosed in the field of video search, and are not described further here.
Step 15: identify the identification parameter of each decoded block of the current frame according to the matching result of the current frame of the current search video.
Here sbk_t(i, j) = sign(bk_t(i, j) | condition 3), where sbk_t(i, j) represents the identification parameter of bk_t(i, j), and condition 3 represents: bk_t(i, j) matches the target.
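Step 15 turns the per-block matching result into a grid of 0/1 flags. The sketch below assumes the matcher already produced a grid of booleans, and it reads sign(variable | condition) as 1 when the condition holds and 0 otherwise, which is consistent with how the flags are used elsewhere in the description but is not spelled out in this text. The helper names are hypothetical.

```python
from typing import List


def sign(condition_holds: bool) -> int:
    # Assumed meaning of sign(variable | condition): 1 if satisfied, else 0.
    return 1 if condition_holds else 0


def identify_blocks(matched: List[List[bool]]) -> List[List[int]]:
    """sbk_t(i, j) for every block, from a grid of 'block matches target' results."""
    return [[sign(m) for m in row] for row in matched]


def any_block_matched(sbk: List[List[int]]) -> bool:
    """True if at least one sbk_t(i, j) equals 1."""
    return any(flag == 1 for row in sbk for flag in row)
```

Storing the result per block is what later frames rely on when they inherit a flag from a reference block instead of searching again.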
Step 2: if a next frame of the current search video exists, let t = t + 1, set that next frame as the current frame of the current search video, and then go to Step 3; otherwise, end.
Step 3: if no sbk_t(i, j) = 1 exists, go to Step 4; otherwise, go to Step 6.
Step 4: if pic_t is an intra-predicted frame, let tp_t = bkh × bkw; otherwise, calculate tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw).
Step 5: if tp_t = 0, first set all sbk_t(i, j) = 0 and then go to Step 2; otherwise, if tp_t ≥ 0.9 × bkh × bkw, go to Step 1; otherwise, go to Step 6.
Step 6: search the current frame using the second video search mode, and then go to Step 2.
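The control flow of Steps 0 to 6 can be sketched as a single loop over frames. This is an illustrative reading, not the patented implementation: frames are modelled as hypothetical dictionaries with an `intra_frame` flag and an `intra_blocks` count (the number of blocks satisfying condition 2), both search modes are caller-supplied functions that return a grid of sbk flags, and Step 3 is assumed to inspect the flags carried over from the previously processed frame.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]


def run_retrieval(frames: List[Dict], bkh: int, bkw: int,
                  first_mode: Callable[[Dict], Grid],
                  second_mode: Callable[[Dict, Grid], Grid]) -> List[str]:
    """Returns, per frame, which branch handled it: 'first', 'second' or 'skip'."""
    trace: List[str] = []
    sbk: Grid = [[0] * bkw for _ in range(bkh)]
    for t, frame in enumerate(frames, start=1):      # Step 2 advances t
        if t == 1:
            # Step 0: condition 1 holds because t = 1, so Step 1 runs.
            sbk = first_mode(frame)
            trace.append("first")
            continue
        if any(flag == 1 for row in sbk for flag in row):
            # Step 3: a matched block exists, so track it (Step 6).
            sbk = second_mode(frame, sbk)
            trace.append("second")
            continue
        # Step 4: scene switching parameter tp_t.
        tp = bkh * bkw if frame["intra_frame"] else frame["intra_blocks"]
        # Step 5: choose the branch.
        if tp == 0:
            sbk = [[0] * bkw for _ in range(bkh)]
            trace.append("skip")
        elif tp >= 0.9 * bkh * bkw:
            sbk = first_mode(frame)                  # Step 1
            trace.append("first")
        else:
            sbk = second_mode(frame, sbk)            # Step 6
            trace.append("second")
    return trace


# Usage with stub search modes (hypothetical): a frame "hits" if its flag says so.
def _grid(hit: bool) -> Grid:
    return [[1 if hit else 0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]


demo_frames = [
    {"intra_frame": True, "intra_blocks": 10, "hit": False},
    {"intra_frame": False, "intra_blocks": 0, "hit": False},
    {"intra_frame": False, "intra_blocks": 4, "hit": True},
    {"intra_frame": False, "intra_blocks": 0, "hit": False},
    {"intra_frame": True, "intra_blocks": 10, "hit": False},
]
trace = run_retrieval(demo_frames, 2, 5,
                      lambda f: _grid(f["hit"]),
                      lambda f, prev: _grid(f["hit"]))
```

In this reading, a frame with no matched blocks and no intra-coded blocks is skipped entirely, which is where most of the saving over full-frame search would come from.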
Second video search mode:
Step 61: if bk_t(i, j) is an intra prediction block, decode the block and then delimit it as a search area; otherwise, if spbk_t(i, j) = 1, set sbk_t(i, j) = 1, i.e. the current block matches the target; otherwise, set sbk_t(i, j) = 0, i.e. the current block does not match the target.
Here spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j).
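Step 61 can be sketched as a single pass over the block grid. The inputs are assumptions for illustration: `is_intra[i][j]` says whether bk_t(i, j) is an intra prediction block, and `ref_flags[i][j]` holds spbk_t(i, j), the flag of its reference block. Indices are 0-based here, whereas the text counts from 1, and intra blocks are marked `None` until the later matching step assigns their flag.

```python
from typing import List, Optional, Tuple


def delimit_search_region(is_intra: List[List[bool]],
                          ref_flags: List[List[int]]
                          ) -> Tuple[List[Tuple[int, int]], List[List[Optional[int]]]]:
    """Returns (blocks to decode and search, sbk grid with None for pending blocks)."""
    region: List[Tuple[int, int]] = []
    sbk: List[List[Optional[int]]] = []
    for i, row in enumerate(is_intra):
        sbk_row: List[Optional[int]] = []
        for j, intra in enumerate(row):
            if intra:
                # Intra block: has no reference block, so it must be decoded and searched.
                region.append((i, j))
                sbk_row.append(None)
            else:
                # Inter block: inherit the flag of its reference block.
                sbk_row.append(1 if ref_flags[i][j] == 1 else 0)
        sbk.append(sbk_row)
    return region, sbk
```

Only the intra blocks reach the decoder and the feature matcher in this mode, so the cost of tracking scales with the amount of new content in the frame rather than with the frame size.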
Step 62: preprocess the current search area, i.e. unify the resolutions of the current search area and the search target, and then scale both to the same size at the unified resolution.
Step 63: extract image features from the search area, then compare and match them against the search target to complete the search of the current frame of the current search video.
Feature extraction, comparison with the search target, and matching may use any method disclosed in the field of video search, and are not described further here.
Step 64: identify the identification parameter of each decoded block according to the matching results of the decoded blocks in the search area.
Example two
FIG. 3 is a block diagram of a face video retrieval system in accordance with a preferred embodiment of the present invention; the system comprises:
a first judgment processing module for judging the current frame pic_t of the current search video: if its judgment parameter par_t is 1, entering a first video search device; otherwise, entering a scene switching parameter calculation module;
wherein par_t represents the judgment parameter of pic_t, given by
Figure BDA0001168067990000081
pic_t represents the t-th frame of the current search video (namely the current frame of the current search video), t represents the frame number in the search video sequence, and the initial value of t is 1; condition 1 represents: t = 1, or pic_t is an intra-predicted frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) means summing the variables that satisfy the condition;
Figure BDA0001168067990000082
condition 2 represents: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block; bk_t(i, j) denotes the decoded block in row i and column j of pic_t (the block size is, for example, 16x16 in standards such as H.264 and 64x64 in HEVC; when a block is further divided, the smaller blocks are called sub-blocks); bkw and bkh respectively represent the number of columns and the number of rows of a frame, counted in blocks, after the frame is divided into blocks;
first video search means for searching for a current frame using a first video search mode;
a second judgment processing module for judging whether a next frame of the current search video exists; if so, letting t = t + 1, setting that next frame as the current frame of the current search video, and then entering the third judgment processing module; otherwise, ending;
a third judgment processing module for judging whether any sbk_t(i, j) = 1 exists; if none exists, entering the scene switching parameter calculation module; otherwise, entering the second video search device;
a scene switching parameter calculation module for letting tp_t = bkh × bkw if the current frame pic_t of the current search video is an intra-predicted frame, and otherwise calculating tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
a fourth judgment processing module for judging whether tp_t = 0; if so, setting all sbk_t(i, j) = 0 and then entering the second judgment processing module; otherwise, if tp_t ≥ 0.9 × bkh × bkw, entering the first video search device; otherwise, entering the second video search device;
a second video search device for searching the current frame by using a second video search mode and then entering the second judgment processing module.
further, fig. 4 is a structural diagram of the first video search apparatus in fig. 3, the first video search apparatus comprising:
a decoded image acquisition module for decoding the current frame of the current search video to obtain a decoded image;
a prediction mode judgment module for entering a subdivision judgment device if the prediction mode of bk_t(i, j) is a sub-block prediction mode, and otherwise entering a rough judgment device;
a first size unifying module, connected with the prediction mode judgment module, for unifying the resolutions of the current search area and the search target and then scaling both to the same size at the unified resolution;
a first target image search module for extracting image features from the search area of the current decoded image, then comparing and matching them against the search target to complete the search of the current frame of the current search video;
a first identification parameter identification module for identifying the identification parameter of each decoded block of the current frame according to the matching result of the current frame of the current search video.
Here sbk_t(i, j) = sign(bk_t(i, j) | condition 3), where sbk_t(i, j) represents the identification parameter of bk_t(i, j), and condition 3 represents: bk_t(i, j) matches the target.
Further, fig. 5 is a structural view of the subdivision determination device in fig. 4;
the subdivision judgment device comprises a block skin color pixel counting module and a first face video search area dividing module;
the block skin color pixel counting module is used for taking each pixel in the block as a skin color judgment point and judging its skin color, and, if the judgment point is skin color, adding 1 to the skin color pixel count of the block;
the first face video search area dividing module is connected with the block skin color pixel counting module and is used for assigning the block to the face video search area if the number of skin color pixels in the block is greater than a fourteenth threshold, and otherwise assigning the block to the non-face video search area.
The upper limit of the fourteenth threshold is the total number of pixels in the block, and the lower limit may optionally be half of the total number of pixels in the block.
FIG. 6 is a structural view of the rough judgment means in FIG. 4;
the rough judgment device comprises a block color model component value calculation module and a second face video search area dividing module;
the block color model component value calculation module is used for taking the mean of the pixels in the block as the skin color judgment point, i.e. taking the mean of the corresponding component over all pixels in the block as the value of each color model component;
the second face video search area dividing module is connected with the block color model component value calculation module and is used for judging the skin color of the skin color judgment point; if the judgment point is skin color, the block is assigned to the face video search area; otherwise, the block is assigned to the non-face video search area.
In both the subdivision judgment mode and the rough judgment mode, a skin color judgment point is judged to be skin color if the following 6 requirements are satisfied at the same time:
Requirement 1: Thres1 < b - g < Thres2; requirement 2: Thres3 < r - g < Thres4 * Wr;
requirement 3: Gup < g < Gdown; requirement 4: Thres5 < Wr; requirement 5: Thres6 < Co < Thres7;
requirement 6: Thres8 < energyUV < Thres9 && U * Thres10 < V && U * Thres11 > V, or Thres12 < energyUV < Thres13.
Here Thres_jj, jj ∈ [1, 13], denote the first to thirteenth thresholds, each set according to the actual situation. Based on the normalized RGB model,
Figure BDA0001168067990000101
the normalized RGB color components r, g and b are obtained. The color balance parameter is Wr = (r - 1/3)^2 + (g - 1/3)^2. The green component upper bound model is Gup = a_up * r^2 + b_up * r + c_up, where a_up, b_up and c_up are model parameters, and the green component lower bound model is Gdown = a_down * r^2 + b_down * r + c_down, where a_down, b_down and c_down are model parameters. Based on the YUV model
Figure BDA0001168067990000102
the color energy
Figure BDA0001168067990000103
is obtained, where Y is the luminance component and U and V are the two chrominance components of the YUV model. Based on the YCoCg model
Figure BDA0001168067990000104
Co is obtained, where Co is a color component value of the YCoCg model. The skin color judgment method may be any one of those disclosed in the art.
Further, fig. 7 is a structural diagram of a second video search apparatus in fig. 3, the second video search apparatus comprising:
a second search area defining module for, if bk_t(i, j) is an intra prediction block, decoding the block and then delimiting it as a search area; otherwise, if spbk_t(i, j) = 1, setting sbk_t(i, j) = 1, i.e. the current block matches the target, and otherwise setting sbk_t(i, j) = 0, i.e. the current block does not match the target; here spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j);
a second size unifying module, connected with the second search area defining module, for preprocessing the current search area, i.e. unifying the resolutions of the current search area and the search target and then scaling both to the same size at the unified resolution;
a second target image search module for extracting image features from the search area, then comparing and matching them against the search target to complete the search of the current frame of the current search video.
Feature extraction, comparison with the search target, and matching may use any method disclosed in the field of video search, and are not described further here.
a second identification parameter identification module for identifying the identification parameter of each decoded block according to the matching results of the decoded blocks in the search area.
It will be understood by those skilled in the art that all or part of the steps in the method according to the above embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, such as ROM, RAM, magnetic disk, optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A face video retrieval method, the method comprising the steps of:
step A: judging the judgment parameter par_t of the current frame pic_t of the current search video; if par_t is 1, entering step B; otherwise, entering step E;
step B: searching the current frame using a first video search mode;
step C: if a next frame of the current search video exists, letting t = t + 1, setting the next frame as the current frame of the current search video, and then entering step D; otherwise, ending; t denotes the frame number in the search video sequence, and the initial value of t is 1;
step D: if no sbk_t(i, j) equals 1, entering step E; otherwise, entering step G;
sbk_t(i, j) denotes the identification parameter of bk_t(i, j), and bk_t(i, j) denotes the coding block in row i, column j of pic_t;
step E: if the current frame pic_t of the current search video is an intra prediction frame, letting tp_t = bkh × bkw; otherwise, calculating tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); tp_t is the scene switching parameter, and condition 2 denotes: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block;
step F: if tp_t = 0, first setting all sbk_t(i, j) = 0 and then entering step C; otherwise, if tp_t ≥ 0.9 × bkh × bkw, entering step B; otherwise, entering step G; bkw and bkh denote, respectively, the number of columns and the number of rows, in units of blocks, after a frame image is divided into blocks;
step G: searching the current frame using a second video search mode, and then entering step C;
characterized in that
the first video search mode comprises the steps of:
decoding a current frame of a current search video to obtain a decoded image;
processing every decoded block of the decoded image as follows: if the prediction mode of bk_t(i, j) is a sub-block prediction mode, entering a subdivision judgment mode; otherwise, entering a rough division judgment mode;
unifying the resolution of the current search area and the search target, and then scaling the current search area and the search target to the same size at the unified resolution;
first extracting image features from the search area of the current decoded image, then comparing them with the search target and performing matching, thereby completing the search of the current frame of the current search video;
setting the identification parameter of each decoding block of the current frame of the current search video according to the matching result of the current frame;
wherein sbk_t(i, j) = sign(bk_t(i, j) | condition 3), and condition 3 denotes: bk_t(i, j) matches the target.
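The control flow of steps A to G can be sketched as a small dispatch function. This is an illustrative reading of the claim, not the patented implementation: the function name `next_step`, the return labels, and the boolean `any_block_matched` (interpreted here as "some sbk flag carried over from the previously searched frame equals 1") are assumptions introduced for the example.

```python
def next_step(t, any_block_matched, tp, bkh, bkw):
    """Choose how to handle frame t (hypothetical helper).

    t                 -- frame number, starting at 1
    any_block_matched -- assumed meaning of step D: some sbk flag equals 1
    tp                -- scene switching parameter tp_t for this frame
    bkh, bkw          -- number of block rows and block columns
    Returns "first", "second", or "skip".
    """
    total_blocks = bkh * bkw
    if t == 1:
        # Step A: condition 1 holds for t = 1, so par_t = 1 -> step B.
        return "first"
    if any_block_matched:
        # Step D: a matched block exists -> step G.
        return "second"
    # Steps E and F: decide from the scene switching parameter.
    if tp == 0:
        return "skip"          # all flags are reset to 0, back to step C
    if tp >= 0.9 * total_blocks:
        return "first"         # treated as a scene switch -> step B
    return "second"            # partial change -> step G


if __name__ == "__main__":
    print(next_step(1, False, 0, 2, 2))
    print(next_step(2, False, 1, 2, 2))
```

In this reading, a full decode and search (the first mode) is only paid for on the first frame and on frames where at least 90% of the blocks are intra coded.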
2. The face video retrieval method of claim 1,
Figure FDA0002397051330000021
pic_t denotes the t-th frame of the current search video; condition 1 denotes: t = 1, or pic_t is an intra prediction frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) denotes summing the variables that satisfy the condition;
Figure FDA0002397051330000022
condition 2 denotes: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block.
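A minimal sketch of the two parameters follows. The defining formulas for par_t and sign(variable | condition) are images that are not reproduced in this text, so the sketch assumes that sign is an indicator (1 when the condition holds, 0 otherwise) and that par_t is the indicator of condition 1. The function names and the boolean grid `intra_flags` (condition 2 per block) are hypothetical.

```python
def sign(condition_holds):
    # Assumed indicator semantics: 1 if the condition holds, else 0.
    return 1 if condition_holds else 0


def scene_switching_parameter(intra_flags, is_intra_frame):
    """tp_t: number of blocks satisfying condition 2.

    intra_flags[i][j] is True when block (i, j) is an intra prediction
    block or contains at least one intra prediction sub-block.
    For an intra prediction frame, tp_t is set to bkh * bkw (step E of claim 1).
    """
    bkh = len(intra_flags)
    bkw = len(intra_flags[0])
    if is_intra_frame:
        return bkh * bkw
    return sum(sign(flag) for row in intra_flags for flag in row)


def judgment_parameter(t, is_intra_frame, tp, bkh, bkw):
    """par_t under the assumption par_t = sign(condition 1)."""
    condition_1 = t == 1 or is_intra_frame or tp >= 0.9 * bkh * bkw
    return sign(condition_1)


if __name__ == "__main__":
    flags = [[True, False], [False, False]]
    tp = scene_switching_parameter(flags, is_intra_frame=False)
    print(tp, judgment_parameter(2, False, tp, 2, 2))
```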
3. The face video retrieval method of claim 1,
subdivision judgment mode:
step A1: taking each pixel in the block as a skin color decision point and judging whether it is skin color; if it is, adding 1 to the number of skin color pixels of the block;
step A2: if the number of skin color pixels in the block is greater than a fourteenth threshold, placing the block in the face video search area; otherwise, placing the block in the non-face video search area;
rough division judgment mode:
step B1: taking the mean of the pixels in the block as the skin color decision point, that is, taking the mean of the corresponding components of all pixels in the block as the value of each color model component;
step B2: judging whether the skin color decision point is skin color; if it is, placing the block in the face video search area; otherwise, placing the block in the non-face video search area.
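The two judgment modes can be sketched as follows. The skin color test itself is passed in as a callable, because any skin color decision method may be used; the function names, the pixel representation as (R, G, B) tuples, and the toy predicate in the usage lines are assumptions made only for illustration.

```python
def subdivision_judgment(block, is_skin, fourteenth_threshold):
    """Steps A1 and A2: count skin color pixels and compare with the threshold.

    block   -- list of (R, G, B) tuples for one block (assumed layout)
    is_skin -- hypothetical callable that takes one pixel and returns bool
    Returns True if the block belongs to the face video search area.
    """
    skin_pixels = sum(1 for pixel in block if is_skin(pixel))
    return skin_pixels > fourteenth_threshold


def rough_division_judgment(block, is_skin):
    """Steps B1 and B2: judge the per-component mean of the block once."""
    n = len(block)
    mean_pixel = tuple(sum(pixel[k] for pixel in block) / n for k in range(3))
    return is_skin(mean_pixel)


def classify_block(block, uses_subblock_prediction, is_skin, fourteenth_threshold):
    # Blocks coded with a sub-block prediction mode get the per-pixel test;
    # all other blocks get the cheaper single test on the mean.
    if uses_subblock_prediction:
        return subdivision_judgment(block, is_skin, fourteenth_threshold)
    return rough_division_judgment(block, is_skin)


if __name__ == "__main__":
    toy_is_skin = lambda p: p[0] > p[2]   # placeholder rule, not the claimed test
    block = [(200, 150, 120)] * 3 + [(10, 10, 200)]
    print(classify_block(block, True, toy_is_skin, 2))
    print(classify_block(block, False, toy_is_skin, 2))
```

The design intent suggested by the claim is that finely partitioned blocks are more likely to contain mixed content, so they are examined pixel by pixel, while uniformly predicted blocks are summarized by their mean.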
4. The face video retrieval method of claim 3,
the fourteenth threshold has an upper limit of the total number of pixels of the block and a lower limit of half the total number of pixels of the block.
5. The face video retrieval method of claim 3,
in the subdivision judgment mode and the rough division judgment mode, a skin color decision point is judged to be skin color if the following six requirements are satisfied simultaneously:
requirement 1: Thres1 < b − g < Thres2; requirement 2: Thres3 < r − g < Thres4 × Wr;
requirement 3: Gup < g < Gdown; requirement 4: Thres5 < Wr; requirement 5: Thres6 < Co < Thres7;
requirement 6: Thres8 < energyUV < Thres9 && U × Thres10 < V && U × Thres11 > V, or Thres12 < energyUV < Thres13;
wherein Thres_jj, jj ∈ [1, 13], denote the first to thirteenth thresholds respectively, each set according to the actual situation; based on the normalized RGB model,
Figure FDA0002397051330000023
the normalized RGB color components r, g and b are acquired; the color balance parameter is Wr = (r − 1/3)² + (g − 1/3)²; a green component upper bound model Gup = a_up × r² + b_up × r + c_up is constructed, wherein a_up, b_up, c_up are model parameters, together with a green component lower bound model Gdown = a_down × r² + b_down × r + c_down, wherein a_down, b_down, c_down are model parameters; based on the YUV model
Figure FDA0002397051330000024
Obtaining color energy
Figure FDA0002397051330000025
Y is the luminance component, and U and V denote the two chrominance components of the YUV model; based on the YCoCg model
Figure FDA0002397051330000031
Co is obtained, which is a color component value of the YCoCg model.
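The six requirements can be combined into one predicate, sketched below. Because the conversion formulas for U, V, energyUV and Co are images not reproduced in this text, the sketch takes those values as precomputed inputs. The function name, the dict-based threshold container, the grouping of requirement 6 as "(A and B and C) or D", and every numeric value in the usage lines are assumptions for illustration only.

```python
def is_skin_color(r, g, b, u, v, energy_uv, co, thres, up, down):
    """Evaluate requirements 1 to 6 for one skin color decision point.

    r, g, b   -- normalized RGB components
    u, v      -- chrominance components of the YUV model (precomputed)
    energy_uv -- color energy (precomputed; formula not reproduced here)
    co        -- Co component of the YCoCg model (precomputed)
    thres     -- dict mapping 1..13 to Thres1..Thres13
    up, down  -- (a, b, c) parameters of the Gup and Gdown quadratic models
    """
    wr = (r - 1 / 3) ** 2 + (g - 1 / 3) ** 2           # color balance parameter
    g_up = up[0] * r * r + up[1] * r + up[2]           # green upper bound model
    g_down = down[0] * r * r + down[1] * r + down[2]   # green lower bound model

    req1 = thres[1] < b - g < thres[2]
    req2 = thres[3] < r - g < thres[4] * wr
    req3 = g_up < g < g_down                           # inequality as written in the claim
    req4 = thres[5] < wr
    req5 = thres[6] < co < thres[7]
    # Assumed grouping: the three &&-joined tests form one alternative.
    req6 = (thres[8] < energy_uv < thres[9]
            and u * thres[10] < v
            and u * thres[11] > v) or (thres[12] < energy_uv < thres[13])
    return req1 and req2 and req3 and req4 and req5 and req6


if __name__ == "__main__":
    # Purely illustrative thresholds, not values from the patent.
    thres = {1: -0.5, 2: 0.0, 3: 0.0, 4: 10.0, 5: 0.001, 6: 0.0, 7: 50.0,
             8: 5.0, 9: 100.0, 10: 1.0, 11: -2.0, 12: 1000.0, 13: 2000.0}
    print(is_skin_color(0.5, 0.3, 0.2, -10.0, 15.0, 20.0, 10.0,
                        thres, (0.0, 0.0, 0.1), (0.0, 0.0, 0.9)))
```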
6. The face video retrieval method of claim 1, wherein the second video search mode comprises the steps of:
if bk_t(i, j) is an intra prediction block, decoding the block and then delimiting it as a search region; otherwise, if spbk_t(i, j) = 1, setting sbk_t(i, j) = 1; otherwise, setting sbk_t(i, j) = 0; wherein spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j);
preprocessing the current search area, that is, unifying the resolution of the current search area and the search target, and then scaling the current search area and the search target to the same size at the unified resolution;
first extracting image features from the search area, then comparing them with the search target and performing matching, thereby completing the search of the current frame of the current search video;
setting the identification parameter of each decoding block according to the matching results of the decoding blocks in the search area.
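The second video search mode can be sketched as a per-block pass in which only intra prediction blocks are decoded and matched, while every other block inherits the flag of its reference block. The function name and the `matches` callable, which stands in for decoding, feature extraction and matching and is simplified here to a per-block result, are assumptions introduced for the example.

```python
def second_mode_flags(is_intra, spbk, matches):
    """Compute sbk_t(i, j) for one frame in the second search mode.

    is_intra[i][j] -- True if block (i, j) is an intra prediction block
    spbk[i][j]     -- identification parameter of the reference block
    matches(i, j)  -- hypothetical callable: decode block (i, j), search it,
                      and return True if it matches the target
    """
    sbk = []
    for i, row in enumerate(is_intra):
        sbk_row = []
        for j, intra in enumerate(row):
            if intra:
                # Intra blocks have no reference block: decode and search them.
                sbk_row.append(1 if matches(i, j) else 0)
            else:
                # Inter blocks reuse the result of their reference block.
                sbk_row.append(1 if spbk[i][j] == 1 else 0)
        sbk.append(sbk_row)
    return sbk


if __name__ == "__main__":
    is_intra = [[True, False], [False, True]]
    spbk = [[0, 1], [0, 1]]
    print(second_mode_flags(is_intra, spbk, lambda i, j: (i, j) == (0, 0)))
```

Reusing the reference block's flag is what lets this mode avoid decoding inter-predicted blocks at all.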
7. A face video retrieval system, the system comprising:
a first judgment processing module, configured to judge the judgment parameter par_t of the current frame pic_t of the current search video; if par_t is 1, entering a first video search device; otherwise, entering a scene switching parameter calculation module;
wherein par_t denotes the judgment parameter of pic_t,
Figure FDA0002397051330000032
pic_t denotes the t-th frame of the current search video, t denotes the frame number in the search video sequence, and the initial value of t is 1; condition 1 denotes: t = 1, or pic_t is an intra prediction frame, or tp_t ≥ 0.9 × bkh × bkw; tp_t is the scene switching parameter, tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw); sum(variable | condition) denotes summing the variables that satisfy the condition;
Figure FDA0002397051330000033
condition 2 denotes: bk_t(i, j) is an intra prediction block or comprises at least one intra prediction sub-block; bk_t(i, j) denotes the decoding block in row i, column j of pic_t; bkw and bkh denote, respectively, the number of columns and the number of rows, in units of blocks, after a frame image is divided into blocks;
the first video search device, configured to search the current frame using a first video search mode;
a second judgment processing module, configured to judge whether a next frame of the current search video exists; if so, letting t = t + 1, setting the next frame as the current frame of the current search video, and then entering a third judgment processing module; otherwise, ending;
the third judgment processing module, configured to: if no sbk_t(i, j) equals 1, enter the scene switching parameter calculation module; otherwise, enter a second video search device;
the scene switching parameter calculation module, configured to: if the current frame pic_t of the current search video is an intra prediction frame, let tp_t = bkh × bkw; otherwise, calculate tp_t = sum(sign(bk_t(i, j) | condition 2) | 1 ≤ i ≤ bkh and 1 ≤ j ≤ bkw);
a fourth judgment processing module, configured to: if tp_t = 0, set all sbk_t(i, j) = 0 and then enter the second judgment processing module; otherwise, if tp_t ≥ 0.9 × bkh × bkw, enter the first video search device; otherwise, enter the second video search device;
the second video search device, configured to search the current frame using a second video search mode and then enter the second judgment processing module;
characterized in that
the first video search device includes:
a decoded image acquisition module, configured to decode the current frame of the current search video to obtain a decoded image;
a prediction mode judging module, configured to: if the prediction mode of bk_t(i, j) is a sub-block prediction mode, enter a subdivision judgment device; otherwise, enter a rough division judgment device;
a first size unifying module, connected to the prediction mode judging module and configured to unify the resolution of the current search area and the search target and then scale both to the same size at the unified resolution;
a first target image searching module, configured to first extract image features from the search area of the current decoded image, then compare them with the search target and perform matching, thereby completing the search of the current frame of the current search video;
a first identification parameter identification module, configured to set the identification parameter of each decoding block of the current frame of the current search video according to the matching result of the current frame;
wherein sbk_t(i, j) = sign(bk_t(i, j) | condition 3), sbk_t(i, j) denotes the identification parameter of bk_t(i, j), and condition 3 denotes: bk_t(i, j) matches the target.
8. The face video retrieval system of claim 7,
the subdivision judgment device comprises a block skin color pixel counting module and a first face video search area dividing module,
the block skin color pixel counting module is configured to take each pixel in the block as a skin color decision point and judge whether it is skin color; if it is, 1 is added to the number of skin color pixels of the block;
the first face video search area dividing module is connected to the block skin color pixel counting module and is configured to place the block in the face video search area if the number of skin color pixels in the block is greater than a fourteenth threshold, and otherwise to place the block in the non-face video search area;
the rough division judgment device comprises a block color model component value calculation module and a second face video search area dividing module,
the block color model component value calculation module is configured to take the mean of the pixels in the block as the skin color decision point, that is, to take the mean of the corresponding components of all pixels in the block as the value of each color model component;
the second face video search area dividing module is connected to the block color model component value calculation module and is configured to judge whether the skin color decision point is skin color; if it is, the block is placed in the face video search area; otherwise, the block is placed in the non-face video search area.
9. The face video retrieval system of claim 8,
in the subdivision judgment mode and the rough division judgment mode, a skin color decision point is judged to be skin color if the following six requirements are satisfied simultaneously:
requirement 1: Thres1 < b − g < Thres2; requirement 2: Thres3 < r − g < Thres4 × Wr;
requirement 3: Gup < g < Gdown; requirement 4: Thres5 < Wr; requirement 5: Thres6 < Co < Thres7;
requirement 6: Thres8 < energyUV < Thres9 && U × Thres10 < V && U × Thres11 > V, or Thres12 < energyUV < Thres13;
wherein Thres_jj, jj ∈ [1, 13], denote the first to thirteenth thresholds respectively, each set according to the actual situation; based on the normalized RGB model,
Figure FDA0002397051330000051
the normalized RGB color components r, g and b are acquired; the color balance parameter is Wr = (r − 1/3)² + (g − 1/3)²; a green component upper bound model Gup = a_up × r² + b_up × r + c_up is constructed, wherein a_up, b_up, c_up are model parameters, together with a green component lower bound model Gdown = a_down × r² + b_down × r + c_down, wherein a_down, b_down, c_down are model parameters; based on the YUV model
Figure FDA0002397051330000052
Obtaining color energy
Figure FDA0002397051330000053
Y is the luminance component, and U and V denote the two chrominance components of the YUV model; based on the YCoCg model
Figure FDA0002397051330000054
Co is obtained, which is a color component value of the YCoCg model.
10. The face video retrieval system of claim 7,
the second video search device comprises:
a second search area defining module, configured to: if bk_t(i, j) is an intra prediction block, decode the block and then delimit it as a search region; otherwise, if spbk_t(i, j) = 1, set sbk_t(i, j) = 1, indicating that the current block matches the target; otherwise, set sbk_t(i, j) = 0, indicating that the current block does not match the target; wherein spbk_t(i, j) denotes the identification parameter of the reference block of bk_t(i, j);
a second size unifying module, connected to the second search area defining module and configured to preprocess the current search area, that is, to unify the resolution of the current search area and the search target and then scale both to the same size at the unified resolution;
a second target image searching module, configured to first extract image features from the search area, then compare them with the search target and perform matching, thereby completing the search of the current frame of the current search video;
and a second identification parameter identification module, configured to set the identification parameter of each decoding block according to the matching results of the decoding blocks in the search area.
CN201611087529.7A 2016-12-01 2016-12-01 Face video retrieval method and system Active CN106682094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611087529.7A CN106682094B (en) 2016-12-01 2016-12-01 Face video retrieval method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611087529.7A CN106682094B (en) 2016-12-01 2016-12-01 Face video retrieval method and system

Publications (2)

Publication Number Publication Date
CN106682094A CN106682094A (en) 2017-05-17
CN106682094B CN106682094B (en) 2020-05-22

Family

ID=58867143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611087529.7A Active CN106682094B (en) 2016-12-01 2016-12-01 Face video retrieval method and system

Country Status (1)

Country Link
CN (1) CN106682094B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563278B (en) * 2017-07-21 2020-08-04 深圳市梦网视讯有限公司 Rapid eye and lip positioning method and system based on skin color detection
CN107423704B (en) * 2017-07-21 2020-07-03 深圳市梦网视讯有限公司 Lip video positioning method and system based on skin color detection
CN107481222B (en) * 2017-07-21 2020-07-03 深圳市梦网视讯有限公司 Rapid eye and lip video positioning method and system based on skin color detection
CN107527015B (en) * 2017-07-21 2020-08-04 深圳市梦网视讯有限公司 Human eye video positioning method and system based on skin color detection
CN107516067B (en) * 2017-07-21 2020-08-04 深圳市梦网视讯有限公司 Human eye positioning method and system based on skin color detection
CN107861990B (en) * 2017-10-17 2020-11-06 深圳市梦网视讯有限公司 Video searching method and system and terminal equipment
CN108710853B (en) * 2018-05-21 2021-01-01 深圳市梦网科技发展有限公司 Face recognition method and device
CN111815653B (en) * 2020-07-08 2024-01-30 深圳市梦网视讯有限公司 Method, system and equipment for segmenting human face and body skin color region

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129682A (en) * 2011-03-09 2011-07-20 深圳市融创天下科技发展有限公司 Foreground and background area division method and system
CN102214291A (en) * 2010-04-12 2011-10-12 云南清眸科技有限公司 Method for quickly and accurately detecting and tracking human face based on video sequence
CN103167290A (en) * 2013-04-01 2013-06-19 深圳市云宙多媒体技术有限公司 Method and device for quantizing video coding movement intensity
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7881505B2 (en) * 2006-09-29 2011-02-01 Pittsburgh Pattern Recognition, Inc. Video retrieval system for human face content

Also Published As

Publication number Publication date
CN106682094A (en) 2017-05-17

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 518000 Guangdong city of Shenzhen province Nanshan District Guangdong streets high in the four Longtaili Technology Building Room 325 No. 30

Applicant after: Shenzhen Monternet encyclopedia Information Technology Co. Ltd.

Address before: The central Shenzhen city of Guangdong Province, 518057 Keyuan Road, Nanshan District science and Technology Park No. 15 Science Park Sinovac A Building 1 unit 403, No. 405 unit

Applicant before: BAC Information Technology Co., Ltd.

SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Guangdong city of Shenzhen province Nanshan District Guangdong streets high in the four Longtaili Technology Building Room 325 No. 30

Applicant after: Shenzhen mengwang video Co., Ltd

Address before: 518000 Guangdong city of Shenzhen province Nanshan District Guangdong streets high in the four Longtaili Technology Building Room 325 No. 30

Applicant before: SHENZHEN MONTNETS ENCYCLOPEDIA INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant