CN103218603A - Face automatic labeling method and system - Google Patents

Face automatic labeling method and system

Info

Publication number
CN103218603A
CN103218603A (application CN201310115471.2A)
Authority
CN
China
Prior art keywords
face
people
sequence
speaker
lip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101154712A
Other languages
Chinese (zh)
Other versions
CN103218603B (en)
Inventor
丁宇新
张逸彬
燕泽权
戴蔚
高德坤
柴光忍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201310115471.2A priority Critical patent/CN103218603B/en
Publication of CN103218603A publication Critical patent/CN103218603A/en
Application granted granted Critical
Publication of CN103218603B publication Critical patent/CN103218603B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to an automatic face labeling method and system. First, faces are detected in a captured video to obtain a set of face images, which is then filtered; at the same time, the HSV (Hue, Saturation, Value) color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the number of matched corners is counted and compared against a threshold, and proceeding frame by frame in this way yields face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and their feature points located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The disclosed automatic face labeling method and system are convenient to use and highly accurate.

Description

Face automatic labeling method and system
Technical field
The present invention relates to a face labeling method and system, and in particular to a method and system for accurate automatic face labeling.
Background technology
Face labeling in video is one form of video information mining. The existing and common approach is manual labeling; its workflow is shown in Fig. 1. Manual labeling is inefficient, time-consuming, and labor-intensive, and differences between annotators can make the labels inconsistent. Automatic face labeling in video is still largely at the experimental research stage, and no effective, stable, and accurate automatic labeling system has yet appeared.
Summary of the invention
The technical problem solved by the present invention is to construct an automatic face labeling method and system, overcoming the lack in the prior art of an effective, stable, and accurate automatic labeling system.
The technical solution of the present invention provides an automatic face labeling method comprising the following steps:
Face detection: faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time, the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the number of matched corners is counted and compared against a threshold, and proceeding in this way yields the face sequences.
Speaking-face sequence labeling: a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time.
Non-speaking-face sequence labeling: the already classified faces in the training samples are first encoded, and a coding dictionary is learned from all training faces with the LC-KSVD algorithm. After dictionary learning, the code of each unclassified face is computed: the feature points are located with the PSM method, an affine transformation is applied, and the face features are extracted and normalized; the LC-KSVD algorithm then encodes the extracted features of the sequence's faces, which are matched against the learned dictionary. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method, which completes the labeling.
A further technical solution of the present invention is: the face detection step further comprises skin-color filtering of the captured face images. The threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
A further technical solution of the present invention is: the face detection step further comprises lip filtering of the captured face images. Using the geometric properties of the mouth region within a face, the mouth region is obtained by fixed numerical ratios; at the same time the threshold characteristics of lip color are collected statistically to build a lip color model, and finally this model is applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
A further technical solution of the present invention is: in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A further technical solution of the present invention is: a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, so that time, name, and utterance content are fused.
A further technical solution of the present invention is: a lower limit on face sequence length is set during face tracking, and erroneous faces are rejected.
The technical solution of the present invention also constructs an automatic face labeling system comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit. The face detection unit detects faces in the captured video to obtain a set of face images, which is then filtered; at the same time, it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences. The speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time. The non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
A further technical solution of the present invention is: the non-speaking-face sequence labeling unit further comprises a classification module. The classification module first encodes the already classified faces in the training samples and learns a coding dictionary from all training faces with the LC-KSVD algorithm; after dictionary learning, it computes the code of each unclassified face and matches it against the dictionary. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method.
A further technical solution of the present invention is: the face detection unit comprises a dual-threshold module. After the shot-level extraction is finished, the dual-threshold module runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A further technical solution of the present invention is: the face detection unit further comprises a lip filtering module. The lip filtering module uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The technical effect of the present invention is to construct an automatic face labeling method and system. First, faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The automatic face labeling method and system of the present invention are convenient to use and highly accurate.
Description of drawings
Fig. 1 is a structural diagram of an existing labeling system.
Fig. 2 is a flowchart of the labeling system of the present invention.
Fig. 3 is a flowchart of the KLT tracking algorithm adopted by the present invention.
Fig. 4 is a face tracking flowchart of the present invention.
Fig. 5 is a structural diagram of the labeling system of the present invention.
Fig. 6 is a detailed structural diagram of the labeling system of the present invention.
Embodiment
The technical solution of the present invention is further explained below with reference to specific embodiments.
As shown in Fig. 2, a specific embodiment of the present invention provides an automatic face labeling method comprising the following steps:
Step 100: face detection. Faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time, the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences.
The specific implementation is as follows. First the AdaBoost algorithm performs coarse extraction: the minimum detection window is 20*20 pixels, the zoom factor of the detection window is 1.2, and each detected face is normalized to an 80*80 block. The set obtained by AdaBoost extraction does not consist solely of face images; it also contains non-face false positives, which must be detected and filtered out. Skin-color filtering is adopted here, implemented by the function modelSkinColor(IplImage* img): the threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
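As a concrete illustration of this stage, the following Python/OpenCV sketch pairs a Haar cascade (an AdaBoost detector) with a simple HSV skin filter. The cascade file, the HSV skin range, and the 0.3 acceptance threshold are illustrative assumptions, not values fixed by the patent.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Coarse extraction: minimum window 20x20, scale factor 1.2,
    detected faces resized to 80x80, as in the description."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.2, minSize=(20, 20))
    return [cv2.resize(frame[y:y+h, x:x+w], (80, 80)) for x, y, w, h in boxes]

def skin_ratio(face):
    """Fraction of pixels inside a simple HSV skin range (a stand-in
    for the patent's modelSkinColor; the range is an assumption)."""
    hsv = cv2.cvtColor(face, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    return mask.mean() / 255.0

def filter_non_faces(faces, threshold=0.3):
    """Keep only candidates whose skin ratio passes the threshold."""
    return [f for f in faces if skin_ratio(f) >= threshold]
```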
After the video is loaded, all frames are read and face detection is run on each frame; the detections are displayed under the video and saved. During detection, the HSV color histogram difference of adjacent frames is computed for use in shot segmentation. Shot segmentation uses a spatial color histogram shot-boundary detection algorithm; since video is strongly affected by illumination, a color histogram in HSV space is chosen because the H component is relatively stable under illumination changes. In the shot segmentation submodule the default segmentation threshold is 0.4; to segment accurately under different video conditions, the user can manually enter several segmentation thresholds, inspect the results, and pick the best one.
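The shot segmentation step can be sketched as follows; the 0.4 default threshold comes from the description above, while the histogram bin counts are an assumption.

```python
import cv2
import numpy as np

def hsv_hist(frame, bins=(16, 4, 4)):
    """L1-normalized HSV color histogram of one frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
    return hist.flatten()

def shot_boundaries(video_path, threshold=0.4):
    """Return frame indices where the histogram difference between
    adjacent frames exceeds the segmentation threshold."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cur = hsv_hist(frame)
        if prev is not None:
            diff = np.abs(cur - prev).sum() / 2.0  # L1 distance, in [0, 1]
            if diff > threshold:
                boundaries.append(idx)
        prev, idx = cur, idx + 1
    cap.release()
    return boundaries
```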
Face tracking to extract face sequences: operating on sequences is preferable to operating on individual face images, because the amount of data to label drops and using the sequence as the labeling unit greatly improves accuracy. For face tracking inside a shot, this system uses the KLT (Kanade-Lucas-Tomasi) corner-based tracking algorithm. The algorithm has two parts, Harris corner detection and KLT corner tracking: Harris corner detection first detects corners in the target region, and KLT corner tracking then tracks them, so tracking a face amounts to tracking the corners in the face region. The usual procedure is to detect corners in the target region of the first frame, propagate them to the next frame by local matching, update them accordingly, and continue in this way. Here, faces in adjacent frames within a shot are examined: suppose faces A and B come from adjacent frames; the Harris corner detector finds the corners of A, the pyramidal LK sparse optical flow method then searches for the matching corners in B, and the match count m(c_i, c_{i+1}) is recorded, where c_i denotes the corners of A and c_{i+1} the corners of B. A threshold is set, and when m(c_i, c_{i+1}) exceeds it, the two target regions are judged to belong to the same sequence.
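The corner-matching test between faces of adjacent frames might look like the following sketch; the corner-detector parameters and the match threshold are assumptions.

```python
import cv2
import numpy as np

def match_count(frame_a, frame_b, box_a, box_b):
    """m(c_i, c_{i+1}): Harris corners of face A, tracked into the next
    frame with pyramidal Lucas-Kanade, counted if they land in face B's box."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    x, y, w, h = box_a
    mask = np.zeros_like(gray_a)
    mask[y:y+h, x:x+w] = 255
    corners = cv2.goodFeaturesToTrack(gray_a, maxCorners=50, qualityLevel=0.01,
                                      minDistance=5, mask=mask,
                                      useHarrisDetector=True)
    if corners is None:
        return 0
    moved, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, corners, None)
    bx, by, bw, bh = box_b
    inside = [(bx <= px <= bx + bw and by <= py <= by + bh)
              for (px, py), ok in zip(moved.reshape(-1, 2), status.ravel()) if ok]
    return sum(inside)

def same_sequence(frame_a, frame_b, box_a, box_b, threshold=10):
    """Two face regions belong to one sequence when enough corners match."""
    return match_count(frame_a, frame_b, box_a, box_b) > threshold
```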
Step 200: speaking-face sequence labeling. A lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time.
The specific implementation is as follows. Speaker labeling also provides training data for the non-speaking-face module: the system uses speaker detection to label speaking-face sequences, and speaker detection is at the same time the process that produces training data. The speaker detection technique used here first merges the script and the subtitles with a dynamic time warping algorithm: the script carries the character names and utterance content, while the subtitles carry the utterance times and content, so by building a data dictionary of utterance content, the character name, the utterance content, and the time are fused. In this embodiment a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, fusing time, name, and utterance content.
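A minimal sketch of this dynamic time warping fusion, assuming the script supplies (name, utterance) pairs and the subtitles supply (time, utterance) pairs, with a generic string-similarity cost standing in for whatever measure the patent intends:

```python
from difflib import SequenceMatcher

def cost(a, b):
    """Dissimilarity of two utterance strings (assumed measure)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def dtw_align(script, subtitles):
    """script: list of (name, text); subtitles: list of (time, text).
    Returns fused (time, name, text) triples along the DTW warping path."""
    n, m = len(script), len(subtitles)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(script[i - 1][1], subtitles[j - 1][1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Trace the warping path back and emit fused triples
    fused, i, j = [], n, m
    while i > 0 and j > 0:
        fused.append((subtitles[j - 1][0], script[i - 1][0], subtitles[j - 1][1]))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return list(reversed(fused))
```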
With the already labeled speaking-face sequences as training samples, the PSM method locates a number of specific regions of each face one by one; an affine transformation is then applied according to the localization result to rectify the face pose, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted and normalized as the face features. These faces serve as training samples: after feature extraction, the LC-KSVD algorithm encodes the features of the speaking faces and performs dictionary learning.
Step 300: non-speaking-face sequence labeling. The feature points are located with the PSM method, an affine transformation is applied, and the face features are extracted and normalized; the LC-KSVD algorithm then encodes the extracted features of the sequence's faces and matches them against the learned coding dictionary by voting, which determines the class each non-speaking face belongs to and completes the labeling.
The specific implementation is as follows. Feature extraction comes first, using both global and local pixel features, with the Pictorial Structures Model method for localization: the localization model is read in and initialized, then each face sequence is read in turn; for the faces of each sequence, the feature points are located one by one, an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features.
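The feature extraction step admits a compact sketch; the landmark localization itself (the PSM step) is not reproduced, and the canonical landmark positions and disc radius are assumptions.

```python
import cv2
import numpy as np

# Canonical positions (eyes, nose tip) in the 80x80 aligned face (assumed)
REF_POINTS = np.float32([[24, 30], [56, 30], [40, 52]])

def circular_patch(gray, center, radius=6):
    """Grayscale values inside a fixed-radius disc around one landmark."""
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    cx, cy = center
    disc = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    return gray[disc].astype(np.float32)

def face_features(face_bgr, landmarks):
    """landmarks: 3x2 array of located points. Warp the face to the
    canonical pose, then concatenate the circular patches and normalize."""
    M = cv2.getAffineTransform(np.float32(landmarks[:3]), REF_POINTS)
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    aligned = cv2.warpAffine(gray, M, (80, 80))
    feat = np.concatenate([circular_patch(aligned, p) for p in REF_POINTS])
    return feat / (np.linalg.norm(feat) + 1e-8)   # normalization step
```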
A preferred embodiment of the present invention is: the face detection step further comprises skin-color filtering of the captured face images. The threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
A preferred embodiment of the present invention is: the face detection step further comprises lip filtering of the captured face images. Using the geometric properties of the mouth region within a face, the mouth region is obtained by fixed numerical ratios; at the same time the threshold characteristics of lip color are collected statistically to build a lip color model, and finally this model is applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The preferred implementation is as follows. Although the skin color model filters out most erroneous face images from the set, experiments show that it works poorly on objects whose color closely resembles a face, such as yellow floors and flesh-colored clothing. To overcome this, the system introduces a lip color model. To let the lip color model filter erroneous faces, the mouth region is extracted first, implemented by the function modelLipColor(const IplImage* img); the extraction exploits the geometric properties of the mouth region within a face and obtains the mouth region by fixed numerical ratios. At the same time the threshold characteristics of lip color are collected statistically to build the lip color model, which is finally applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
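A sketch of this lip filter, as a counterpart to modelLipColor; the geometric crop ratios and the HSV lip range are assumptions.

```python
import cv2

def mouth_region(face):
    """Lower-middle crop by fixed ratios: rows 65%-90%, cols 25%-75% (assumed)."""
    h, w = face.shape[:2]
    return face[int(0.65 * h):int(0.90 * h), int(0.25 * w):int(0.75 * w)]

def lip_ratio(face):
    """Fraction of reddish (lip-colored) pixels in the mouth crop."""
    hsv = cv2.cvtColor(mouth_region(face), cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (160, 40, 60), (180, 255, 255))
    return mask.mean() / 255.0

def passes_lip_filter(face, threshold=0.1):
    """Keep the candidate only if enough lip-colored pixels are present."""
    return lip_ratio(face) >= threshold
```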
A preferred embodiment of the present invention is: in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
The tracking flow designed here is shown in Fig. 3. KLT corner tracking yields three sequences from frame i to frame i+4: when a new face 3 appears in frame i+2, a new sequence is formed; when frame i+4 contains no face that matches face 2 of frame i+3, sequence 1 ends. Analysis shows that excessive character motion between nearby frames may break a sequence into two independent sequences; if the two fragments come from the same shot, they can very likely be aggregated. Sequence extraction is therefore performed on a per-shot basis: after extraction, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; this is the dual-threshold method. The dual-threshold method effectively merges broken sequences.
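The dual-threshold merge can be sketched as follows, reusing the corner-matching routine from the tracking sketch above; the relaxed threshold value and the sequence representation are assumptions.

```python
def merge_broken_sequences(sequences, frames, match_count, low_thr=4):
    """sequences: list of dicts with 'faces' = [(frame_idx, box), ...],
    ordered by start time and restricted to one shot; frames: list of
    decoded frames; match_count: the corner-matching routine above."""
    merged = [sequences[0]]
    for seq in sequences[1:]:
        last_idx, last_box = merged[-1]["faces"][-1]
        first_idx, first_box = seq["faces"][0]
        # Re-run the tracker once with the relaxed (second) threshold
        m = match_count(frames[last_idx], frames[first_idx],
                        last_box, first_box)
        if m > low_thr:
            merged[-1]["faces"].extend(seq["faces"])  # glue the fragments
        else:
            merged.append(seq)
    return merged
```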
Face tracking is shown in Fig. 4, where threshold 1 is the KLT algorithm threshold used to partition sequences, and threshold 2 is the lower limit on sequence length used for coarse filtering, removing overly short sequences. The sequences obtained by tracking are saved for the user to inspect; the tracking thresholds can be adjusted to obtain more accurate tracking results.
As shown in Fig. 5, the technical solution of the present invention constructs an automatic face labeling system comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit. The face detection unit detects faces in the captured video to obtain a set of face images, which is then filtered; at the same time, it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences. The speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time. The non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
As shown in Fig. 6, the specific implementation of the automatic face labeling system of the present invention is as follows.
First the AdaBoost algorithm performs coarse extraction: the minimum detection window is 20*20 pixels, the zoom factor of the detection window is 1.2, and each detected face is normalized to an 80*80 block. The set obtained by AdaBoost extraction does not consist solely of face images; it also contains non-face false positives, which must be detected and filtered out. Skin-color filtering is adopted here, implemented by the function modelSkinColor(IplImage* img): the threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
After the video is loaded, all frames are read and face detection is run on each frame; the detections are displayed under the video and saved. During detection, the HSV color histogram difference of adjacent frames is computed for use in shot segmentation. Shot segmentation uses a spatial color histogram shot-boundary detection algorithm; since video is strongly affected by illumination, a color histogram in HSV space is chosen because the H component is relatively stable under illumination changes. In the shot segmentation submodule the default segmentation threshold is 0.4; to segment accurately under different video conditions, the user can manually enter several segmentation thresholds, inspect the results, and pick the best one.
Face tracking to extract face sequences: operating on sequences is preferable to operating on individual face images, because the amount of data to label drops and using the sequence as the labeling unit greatly improves accuracy. For face tracking inside a shot, this system uses the KLT (Kanade-Lucas-Tomasi) corner-based tracking algorithm. The algorithm has two parts, Harris corner detection and KLT corner tracking: Harris corner detection first detects corners in the target region, and KLT corner tracking then tracks them, so tracking a face amounts to tracking the corners in the face region. The usual procedure is to detect corners in the target region of the first frame, propagate them to the next frame by local matching, update them accordingly, and continue in this way. Faces in adjacent frames within a shot are examined: suppose faces A and B come from adjacent frames; the Harris corner detector finds the corners of A, the pyramidal LK sparse optical flow method searches for the matching corners in B, and the match count m(c_i, c_{i+1}) is recorded, where c_i denotes the corners of A and c_{i+1} the corners of B. A threshold is set, and when m(c_i, c_{i+1}) exceeds it, the two target regions are judged to belong to the same sequence.
Speaking-face sequence labeling: speaker labeling also provides training data for the non-speaking-face module; the system uses speaker detection to label speaking-face sequences, and speaker detection is at the same time the process that produces training data. The speaker detection technique used here first merges the script and the subtitles with a dynamic time warping algorithm: the script carries the character names and utterance content, while the subtitles carry the utterance times and content, so by building a data dictionary of utterance content, the character name, the utterance content, and the time are fused. In this embodiment a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, fusing time, name, and utterance content.
Non-speaking-face sequence labeling: feature extraction comes first, using both global and local pixel features, with the Pictorial Structures Model method for localization. The localization model is read in and initialized, then each face sequence is read in turn; for the faces of each sequence, the feature points are located one by one, an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features.
A preferred embodiment of the present invention is: the non-speaking-face sequence labeling unit further comprises a classification module. For classification, the already classified faces in the training samples are first encoded, and a coding dictionary is learned from all training faces with the LC-KSVD algorithm; after dictionary learning, the code of each unclassified face is computed and matched. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method: for every face of the same test sequence, the class to which it belongs is tallied as a "vote value"; if a test face matches some face of a class, the test face is said to have "cast a vote" for that class. When a class accounts for a larger share of the face sequence than every other class, the sequence belongs to that class. This is the voting process.
Classification uses the LC-KSVD (label-consistent KSVD dictionary learning and coding) algorithm, optimized here for video characteristics and applied on a per-class basis: the coding dictionary is learned first, the code of each test face is then computed, and classification is performed on a per-sequence basis. Before dictionary learning, LC-KSVD must construct an initial dictionary as the input of iterative KSVD learning; the present invention optimizes the construction of this initial dictionary in a sequence-based manner. Specifically, a number of faces are selected evenly from each sequence of each class to construct D_0. Suppose the dictionary has K elements, there are N classes, and tf_{ij} denotes the j-th sequence of class i; then D_0 is

D_0 = (d_0, d_1, \ldots, d_k, \ldots), \quad d_k \in tf_{ij}, \; i = 1, 2, \ldots, N; \; k = 1, 2, \ldots, K

Each column of D_0 is an element, and every element carries a corresponding class label. After dictionary learning is finished, the code of every face to be classified is computed. Video faces are classified by aggregating the per-face results over a sequence. First, all samples of a test sequence vote over all classes; the vote value is the classification score of each test face on each class, computed as

j^* = \arg\max_j l_j, \qquad l = W x_i

where W is the coefficient (classifier) matrix and x_i is the code of the input signal. Let S_j = [s_1, s_2, \ldots, s_c, \ldots, s_C]^T, j = 1, \ldots, n, where S_j is the classification score vector of test face j, n is the test sequence length, C is the total number of classes, and s_c is the sample's score on class c. The scores are then summed on a per-sequence basis, i.e.

S = \frac{1}{N_i} \sum_{j=1}^{n} S_j

where N_i is the length of sequence i. The class label of the test sequence is finally determined as

label = \arg\max_{i} S(i), \quad i = 1, \ldots, C.
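The voting classifier defined by these formulas reduces to a few lines; W and the face codes are assumed to come from prior LC-KSVD training, which is not reproduced here.

```python
import numpy as np

def classify_sequence(W, codes):
    """W: (C, K) classifier matrix; codes: (n, K) LC-KSVD codes of the
    n test faces in one sequence. Returns the winning class index."""
    scores = codes @ W.T                          # S_j = W x_j, shape (n, C)
    seq_score = scores.sum(axis=0) / len(codes)   # (1/N_i) * sum_j S_j
    return int(np.argmax(seq_score))              # label = argmax_i S(i)
```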
A preferred embodiment of the present invention is: the face detection unit comprises a dual-threshold module. After the shot-level extraction is finished, the dual-threshold module runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A preferred embodiment of the present invention is: the face detection unit further comprises a lip filtering module. The lip filtering module uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The technical effect of the present invention is to construct an automatic face labeling method and system. First, faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The automatic face labeling method and system of the present invention are convenient to use and highly accurate.
The content above further describes the present invention with reference to specific preferred embodiments, but the specific implementation of the present invention shall not be considered limited to these descriptions. For those of ordinary skill in the technical field of the present invention, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of these shall be considered to fall within the protection scope of the present invention.

Claims (10)

1. An automatic face labeling method, characterized by comprising the following steps:
Face detection: detecting faces in the captured video to obtain a set of face images and filtering the set; at the same time, computing the HSV color histogram difference between adjacent frames and performing shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, detecting corner points in the target region of the first frame, propagating them to the next frame by local matching and updating them accordingly, counting the matched corners and comparing the count against a threshold, and proceeding in this way to obtain the face sequences;
Speaking-face sequence labeling: distinguishing speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labeling each speaker by fusing the speaker identity, the utterance content, and the utterance time;
Non-speaking-face sequence labeling: first encoding the already classified faces in the training samples and learning a coding dictionary from all training faces with the LC-KSVD algorithm; after dictionary learning, computing the code of each unclassified face by locating the feature points with the PSM method, applying an affine transformation, and extracting and normalizing the face features; encoding the extracted features of the sequence's faces with the LC-KSVD algorithm and matching them against the learned dictionary; setting a threshold such that when the Euclidean distance between two code values is below the threshold the match is considered successful; and classifying the faces in the video by a statistical method, completing the labeling.
2. The automatic face labeling method according to claim 1, characterized in that the face detection step further comprises skin-color filtering of the captured face images: first collecting the threshold characteristics of facial skin color statistically, then building a skin color model, and finally using this model to perform pixel-level numerical analysis of the face images and filter out images that do not qualify.
3. The automatic face labeling method according to claim 1, characterized in that the face detection step further comprises lip filtering of the captured face images: using the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collecting the threshold characteristics of lip color statistically to build a lip color model, and finally applying this model numerically to the face image set already filtered by the skin color model to remove the remaining contaminating images.
4. The automatic face labeling method according to claim 1, characterized in that, in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated, thereby merging broken sequences.
5. The automatic face labeling method according to claim 1, characterized in that a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, so that time, name, and utterance content are fused.
6. The automatic face labeling method according to claim 1, characterized in that a lower limit on face sequence length is set during face tracking and erroneous faces are rejected.
7. An automatic face labeling system, characterized by comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit, wherein: the face detection unit detects faces in the captured video to obtain a set of face images and filters the set; at the same time it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences; the speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time; and the non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
8. The automatic face labeling system according to claim 7, characterized in that the non-speaking-face sequence labeling unit further comprises a classification module, the classification module first encoding the already classified faces in the training samples, learning a coding dictionary from all training faces with the LC-KSVD algorithm, then computing the code of each unclassified face after dictionary learning and matching it, a threshold being set such that when the Euclidean distance between two code values is below the threshold the match is considered successful, and the faces in the video being classified by a statistical method.
9. The automatic face labeling system according to claim 7, characterized in that the face detection unit comprises a dual-threshold module which, after the shot-level extraction is finished, runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated, thereby merging broken sequences.
10. The automatic face labeling system according to claim 7, characterized in that the face detection unit further comprises a lip filtering module which uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model to remove the remaining contaminating images.
CN201310115471.2A 2013-04-03 2013-04-03 Face automatic labeling method and system Expired - Fee Related CN103218603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310115471.2A CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310115471.2A CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Publications (2)

Publication Number Publication Date
CN103218603A true CN103218603A (en) 2013-07-24
CN103218603B CN103218603B (en) 2016-06-01

Family

ID=48816372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310115471.2A Expired - Fee Related CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Country Status (1)

Country Link
CN (1) CN103218603B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390282A (en) * 2013-07-30 2013-11-13 百度在线网络技术(北京)有限公司 Image tagging method and device
CN104091164A (en) * 2014-07-28 2014-10-08 北京奇虎科技有限公司 Face picture name recognition method and system
CN104951730A (en) * 2014-03-26 2015-09-30 联想(北京)有限公司 Lip movement detection method, lip movement detection device and electronic equipment
CN108171135A (en) * 2017-12-21 2018-06-15 深圳云天励飞技术有限公司 Method for detecting human face, device and computer readable storage medium
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN109190520A (en) * 2018-08-16 2019-01-11 广州视源电子科技股份有限公司 A kind of super-resolution rebuilding facial image method and device
CN109472217A (en) * 2018-10-19 2019-03-15 广州慧睿思通信息科技有限公司 Intelligent training model building method and device, training method and device
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109948441A (en) * 2019-02-14 2019-06-28 北京奇艺世纪科技有限公司 Model training, image processing method, device, electronic equipment and computer readable storage medium
CN110442873A (en) * 2019-08-07 2019-11-12 云南电网有限责任公司信息中心 A kind of hot spot work order acquisition methods and device based on CBOW model
CN110998606A (en) * 2017-08-14 2020-04-10 华为技术有限公司 Generating marker data for deep object tracking
CN111191708A (en) * 2019-12-25 2020-05-22 浙江省北大信息技术高等研究院 Automatic sample key point marking method, device and system
CN112381065A (en) * 2020-12-07 2021-02-19 福建天创信息科技有限公司 Face positioning method and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN101510255A (en) * 2009-03-30 2009-08-19 北京中星微电子有限公司 Method for identifying and positioning human face, apparatus and video processing chip
CN102521581A (en) * 2011-12-22 2012-06-27 刘翔 Parallel face recognition method with biological characteristics and local image characteristics
CN102799870A (en) * 2012-07-13 2012-11-28 复旦大学 Single-training sample face recognition method based on blocking consistency LBP (Local Binary Pattern) and sparse coding
CN102902961A (en) * 2012-09-21 2013-01-30 武汉大学 Face super-resolution processing method based on K neighbor sparse coding average value constraint

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN101510255A (en) * 2009-03-30 2009-08-19 北京中星微电子有限公司 Method for identifying and positioning human face, apparatus and video processing chip
CN102521581A (en) * 2011-12-22 2012-06-27 刘翔 Parallel face recognition method with biological characteristics and local image characteristics
CN102799870A (en) * 2012-07-13 2012-11-28 复旦大学 Single-training sample face recognition method based on blocking consistency LBP (Local Binary Pattern) and sparse coding
CN102902961A (en) * 2012-09-21 2013-01-30 武汉大学 Face super-resolution processing method based on K neighbor sparse coding average value constraint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘广征: "基于视频与文本信息的说话者人脸标注", 《万方数据》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390282B (en) * 2013-07-30 2016-04-13 百度在线网络技术(北京)有限公司 Image labeling method and device thereof
CN103390282A (en) * 2013-07-30 2013-11-13 百度在线网络技术(北京)有限公司 Image tagging method and device
CN104951730A (en) * 2014-03-26 2015-09-30 联想(北京)有限公司 Lip movement detection method, lip movement detection device and electronic equipment
CN104951730B (en) * 2014-03-26 2018-08-31 联想(北京)有限公司 A kind of lip moves detection method, device and electronic equipment
CN104091164A (en) * 2014-07-28 2014-10-08 北京奇虎科技有限公司 Face picture name recognition method and system
CN110998606A (en) * 2017-08-14 2020-04-10 华为技术有限公司 Generating marker data for deep object tracking
CN110998606B (en) * 2017-08-14 2023-08-22 华为技术有限公司 Generating marker data for depth object tracking
CN108171135A (en) * 2017-12-21 2018-06-15 深圳云天励飞技术有限公司 Method for detecting human face, device and computer readable storage medium
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN109190520A (en) * 2018-08-16 2019-01-11 广州视源电子科技股份有限公司 A kind of super-resolution rebuilding facial image method and device
CN109472217A (en) * 2018-10-19 2019-03-15 广州慧睿思通信息科技有限公司 Intelligent training model building method and device, training method and device
CN109472217B (en) * 2018-10-19 2021-08-31 广州慧睿思通信息科技有限公司 Intelligent exercise training model construction method and device and training method and device
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109753975B (en) * 2019-02-02 2021-03-09 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109948441A (en) * 2019-02-14 2019-06-28 北京奇艺世纪科技有限公司 Model training, image processing method, device, electronic equipment and computer readable storage medium
CN110442873A (en) * 2019-08-07 2019-11-12 云南电网有限责任公司信息中心 A kind of hot spot work order acquisition methods and device based on CBOW model
CN111191708A (en) * 2019-12-25 2020-05-22 浙江省北大信息技术高等研究院 Automatic sample key point marking method, device and system
CN112381065A (en) * 2020-12-07 2021-02-19 福建天创信息科技有限公司 Face positioning method and terminal
CN112381065B (en) * 2020-12-07 2024-04-05 福建天创信息科技有限公司 Face positioning method and terminal

Also Published As

Publication number Publication date
CN103218603B (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN103218603A (en) Face automatic labeling method and system
CN110363140B (en) Human body action real-time identification method based on infrared image
Shahab et al. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images
CN104866829B (en) A kind of across age face verification method based on feature learning
Li et al. Delving into egocentric actions
CN100565559C (en) Image text location method and device based on connected component and support vector machine
CN102163284B (en) Chinese environment-oriented complex scene text positioning method
Avgerinakis et al. Recognition of activities of daily living for smart home environments
US20160154469A1 (en) Mid-air gesture input method and apparatus
WO2019080203A1 (en) Gesture recognition method and system for robot, and robot
CN103824091B (en) A kind of licence plate recognition method for intelligent transportation system
CN108647625A (en) A kind of expression recognition method and device
CN106446952A (en) Method and apparatus for recognizing score image
CN106297755B (en) Electronic equipment and identification method for music score image identification
CN103735253A (en) Tongue appearance analysis system and method thereof in traditional Chinese medicine based on mobile terminal
CN105516802A (en) Multi-feature fusion video news abstract extraction method
CN108805076A (en) The extracting method and system of environmental impact assessment report table word
CN104281839A (en) Body posture identification method and device
CN105138983B (en) The pedestrian detection method divided based on weighting block model and selective search
CN105631039A (en) Picture browsing method
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN111046886A (en) Automatic identification method, device and equipment for number plate and computer readable storage medium
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN106709438A (en) Method for collecting statistics of number of people based on video conference
Shivakumara et al. Gradient-angular-features for word-wise video script identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160601

Termination date: 20200403