CN103218603A - Face automatic labeling method and system - Google Patents

Face automatic labeling method and system

Info

Publication number
CN103218603A
CN103218603A (application CN201310115471.2A)
Authority
CN
China
Prior art keywords
face
people
sequence
speaker
lip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101154712A
Other languages
Chinese (zh)
Other versions
CN103218603B (en)
Inventor
丁宇新
张逸彬
燕泽权
戴蔚
高德坤
柴光忍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201310115471.2A priority Critical patent/CN103218603B/en
Publication of CN103218603A publication Critical patent/CN103218603A/en
Application granted granted Critical
Publication of CN103218603B publication Critical patent/CN103218603B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to an automatic face labeling method and system. First, faces are detected in a captured video to obtain a set of face images, which is then filtered; at the same time, the HSV (Hue, Saturation, Value) color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the number of matched corners is counted and compared against a threshold, and proceeding frame by frame in this way yields face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and their feature points located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The disclosed automatic face labeling method and system are convenient to use and highly accurate.

Description

Face automatic labeling method and system
Technical field
The present invention relates to a face labeling method and system, and in particular to a method and system for accurate automatic face labeling.
Background technology
Face labeling in video is one form of video information mining. The existing and common approach is manual labeling; its workflow is shown in Fig. 1. Manual labeling is inefficient, time-consuming, and labor-intensive, and differences between annotators can make the labels inconsistent. Automatic face labeling in video is still largely at the experimental research stage, and no effective, stable, and accurate automatic labeling system has yet appeared.
Summary of the invention
The technical problem solved by the present invention is to construct an automatic face labeling method and system, overcoming the lack in the prior art of an effective, stable, and accurate automatic labeling system.
The technical solution of the present invention provides an automatic face labeling method comprising the following steps:
Face detection: faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time, the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the number of matched corners is counted and compared against a threshold, and proceeding in this way yields the face sequences.
Speaking-face sequence labeling: a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time.
Non-speaking-face sequence labeling: the already classified faces in the training samples are first encoded, and a coding dictionary is learned from all training faces with the LC-KSVD algorithm. After dictionary learning, the code of each unclassified face is computed: the feature points are located with the PSM method, an affine transformation is applied, and the face features are extracted and normalized; the LC-KSVD algorithm then encodes the extracted features of the sequence's faces, which are matched against the learned dictionary. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method, which completes the labeling.
A further technical solution of the present invention is: the face detection step further comprises skin-color filtering of the captured face images. The threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
A further technical solution of the present invention is: the face detection step further comprises lip filtering of the captured face images. Using the geometric properties of the mouth region within a face, the mouth region is obtained by fixed numerical ratios; at the same time the threshold characteristics of lip color are collected statistically to build a lip color model, and finally this model is applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
A further technical solution of the present invention is: in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A further technical solution of the present invention is: a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, so that time, name, and utterance content are fused.
A further technical solution of the present invention is: a lower limit on face sequence length is set during face tracking, and erroneous faces are rejected.
The technical solution of the present invention also constructs an automatic face labeling system comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit. The face detection unit detects faces in the captured video to obtain a set of face images, which is then filtered; at the same time, it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences. The speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time. The non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
A further technical solution of the present invention is: the non-speaking-face sequence labeling unit further comprises a classification module. The classification module first encodes the already classified faces in the training samples and learns a coding dictionary from all training faces with the LC-KSVD algorithm; after dictionary learning, it computes the code of each unclassified face and matches it against the dictionary. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method.
A further technical solution of the present invention is: the face detection unit comprises a dual-threshold module. After the shot-level extraction is finished, the dual-threshold module runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A further technical solution of the present invention is: the face detection unit further comprises a lip filtering module. The lip filtering module uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The technical effect of the present invention is to construct an automatic face labeling method and system. First, faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The automatic face labeling method and system of the present invention are convenient to use and highly accurate.
Description of drawings
Fig. 1 is a structural diagram of an existing labeling system.
Fig. 2 is a flowchart of the labeling system of the present invention.
Fig. 3 is a flowchart of the KLT tracking algorithm adopted by the present invention.
Fig. 4 is a face tracking flowchart of the present invention.
Fig. 5 is a structural diagram of the labeling system of the present invention.
Fig. 6 is a detailed structural diagram of the labeling system of the present invention.
Embodiment
The technical solution of the present invention is further explained below with reference to specific embodiments.
As shown in Fig. 2, a specific embodiment of the present invention provides an automatic face labeling method comprising the following steps:
Step 100: face detection. Faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time, the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences.
The specific implementation is as follows. First the AdaBoost algorithm performs coarse extraction: the minimum detection window is 20*20 pixels, the zoom factor of the detection window is 1.2, and each detected face is normalized to an 80*80 block. The set obtained by AdaBoost extraction does not consist solely of face images; it also contains non-face false positives, which must be detected and filtered out. Skin-color filtering is adopted here, implemented by the function modelSkinColor(IplImage* img): the threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
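As a concrete illustration of this stage, the following Python/OpenCV sketch pairs a Haar cascade (an AdaBoost detector) with a simple HSV skin filter. The cascade file, the HSV skin range, and the 0.3 acceptance threshold are illustrative assumptions, not values fixed by the patent.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Coarse extraction: minimum window 20x20, scale factor 1.2,
    detected faces resized to 80x80, as in the description."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.2, minSize=(20, 20))
    return [cv2.resize(frame[y:y+h, x:x+w], (80, 80)) for x, y, w, h in boxes]

def skin_ratio(face):
    """Fraction of pixels inside a simple HSV skin range (a stand-in
    for the patent's modelSkinColor; the range is an assumption)."""
    hsv = cv2.cvtColor(face, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    return mask.mean() / 255.0

def filter_non_faces(faces, threshold=0.3):
    """Keep only candidates whose skin ratio passes the threshold."""
    return [f for f in faces if skin_ratio(f) >= threshold]
```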
After the video is loaded, all frames are read and face detection is run on each frame; the detections are displayed under the video and saved. During detection, the HSV color histogram difference of adjacent frames is computed for use in shot segmentation. Shot segmentation uses a spatial color histogram shot-boundary detection algorithm; since video is strongly affected by illumination, a color histogram in HSV space is chosen because the H component is relatively stable under illumination changes. In the shot segmentation submodule the default segmentation threshold is 0.4; to segment accurately under different video conditions, the user can manually enter several segmentation thresholds, inspect the results, and pick the best one.
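The shot segmentation step can be sketched as follows; the 0.4 default threshold comes from the description above, while the histogram bin counts are an assumption.

```python
import cv2
import numpy as np

def hsv_hist(frame, bins=(16, 4, 4)):
    """L1-normalized HSV color histogram of one frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
    return hist.flatten()

def shot_boundaries(video_path, threshold=0.4):
    """Return frame indices where the histogram difference between
    adjacent frames exceeds the segmentation threshold."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cur = hsv_hist(frame)
        if prev is not None:
            diff = np.abs(cur - prev).sum() / 2.0  # L1 distance, in [0, 1]
            if diff > threshold:
                boundaries.append(idx)
        prev, idx = cur, idx + 1
    cap.release()
    return boundaries
```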
Face tracking to extract face sequences: operating on sequences is preferable to operating on individual face images, because the amount of data to label drops and using the sequence as the labeling unit greatly improves accuracy. For face tracking inside a shot, this system uses the KLT (Kanade-Lucas-Tomasi) corner-based tracking algorithm. The algorithm has two parts, Harris corner detection and KLT corner tracking: Harris corner detection first detects corners in the target region, and KLT corner tracking then tracks them, so tracking a face amounts to tracking the corners in the face region. The usual procedure is to detect corners in the target region of the first frame, propagate them to the next frame by local matching, update them accordingly, and continue in this way. Here, faces in adjacent frames within a shot are examined: suppose faces A and B come from adjacent frames; the Harris corner detector finds the corners of A, the pyramidal LK sparse optical flow method then searches for the matching corners in B, and the match count m(c_i, c_{i+1}) is recorded, where c_i denotes the corners of A and c_{i+1} the corners of B. A threshold is set, and when m(c_i, c_{i+1}) exceeds it, the two target regions are judged to belong to the same sequence.
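The corner-matching test between faces of adjacent frames might look like the following sketch; the corner-detector parameters and the match threshold are assumptions.

```python
import cv2
import numpy as np

def match_count(frame_a, frame_b, box_a, box_b):
    """m(c_i, c_{i+1}): Harris corners of face A, tracked into the next
    frame with pyramidal Lucas-Kanade, counted if they land in face B's box."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    x, y, w, h = box_a
    mask = np.zeros_like(gray_a)
    mask[y:y+h, x:x+w] = 255
    corners = cv2.goodFeaturesToTrack(gray_a, maxCorners=50, qualityLevel=0.01,
                                      minDistance=5, mask=mask,
                                      useHarrisDetector=True)
    if corners is None:
        return 0
    moved, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, corners, None)
    bx, by, bw, bh = box_b
    inside = [(bx <= px <= bx + bw and by <= py <= by + bh)
              for (px, py), ok in zip(moved.reshape(-1, 2), status.ravel()) if ok]
    return sum(inside)

def same_sequence(frame_a, frame_b, box_a, box_b, threshold=10):
    """Two face regions belong to one sequence when enough corners match."""
    return match_count(frame_a, frame_b, box_a, box_b) > threshold
```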
Step 200: speaking-face sequence labeling. A lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time.
The specific implementation is as follows. Speaker labeling also provides training data for the non-speaking-face module: the system uses speaker detection to label speaking-face sequences, and speaker detection is at the same time the process that produces training data. The speaker detection technique used here first merges the script and the subtitles with a dynamic time warping algorithm: the script carries the character names and utterance content, while the subtitles carry the utterance times and content, so by building a data dictionary of utterance content, the character name, the utterance content, and the time are fused. In this embodiment a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, fusing time, name, and utterance content.
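A minimal sketch of this dynamic time warping fusion, assuming the script supplies (name, utterance) pairs and the subtitles supply (time, utterance) pairs, with a generic string-similarity cost standing in for whatever measure the patent intends:

```python
from difflib import SequenceMatcher

def cost(a, b):
    """Dissimilarity of two utterance strings (assumed measure)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def dtw_align(script, subtitles):
    """script: list of (name, text); subtitles: list of (time, text).
    Returns fused (time, name, text) triples along the DTW warping path."""
    n, m = len(script), len(subtitles)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(script[i - 1][1], subtitles[j - 1][1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Trace the warping path back and emit fused triples
    fused, i, j = [], n, m
    while i > 0 and j > 0:
        fused.append((subtitles[j - 1][0], script[i - 1][0], subtitles[j - 1][1]))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return list(reversed(fused))
```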
With the already labeled speaking-face sequences as training samples, the PSM method locates a number of specific regions of each face one by one; an affine transformation is then applied according to the localization result to rectify the face pose, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted and normalized as the face features. These faces serve as training samples: after feature extraction, the LC-KSVD algorithm encodes the features of the speaking faces and performs dictionary learning.
Step 300: non-speaking-face sequence labeling. The feature points are located with the PSM method, an affine transformation is applied, and the face features are extracted and normalized; the LC-KSVD algorithm then encodes the extracted features of the sequence's faces and matches them against the learned coding dictionary by voting, which determines the class each non-speaking face belongs to and completes the labeling.
The specific implementation is as follows. Feature extraction comes first, using both global and local pixel features, with the Pictorial Structures Model method for localization: the localization model is read in and initialized, then each face sequence is read in turn; for the faces of each sequence, the feature points are located one by one, an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features.
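The feature extraction step admits a compact sketch; the landmark localization itself (the PSM step) is not reproduced, and the canonical landmark positions and disc radius are assumptions.

```python
import cv2
import numpy as np

# Canonical positions (eyes, nose tip) in the 80x80 aligned face (assumed)
REF_POINTS = np.float32([[24, 30], [56, 30], [40, 52]])

def circular_patch(gray, center, radius=6):
    """Grayscale values inside a fixed-radius disc around one landmark."""
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    cx, cy = center
    disc = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    return gray[disc].astype(np.float32)

def face_features(face_bgr, landmarks):
    """landmarks: 3x2 array of located points. Warp the face to the
    canonical pose, then concatenate the circular patches and normalize."""
    M = cv2.getAffineTransform(np.float32(landmarks[:3]), REF_POINTS)
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    aligned = cv2.warpAffine(gray, M, (80, 80))
    feat = np.concatenate([circular_patch(aligned, p) for p in REF_POINTS])
    return feat / (np.linalg.norm(feat) + 1e-8)   # normalization step
```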
A preferred embodiment of the present invention is: the face detection step further comprises skin-color filtering of the captured face images. The threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
A preferred embodiment of the present invention is: the face detection step further comprises lip filtering of the captured face images. Using the geometric properties of the mouth region within a face, the mouth region is obtained by fixed numerical ratios; at the same time the threshold characteristics of lip color are collected statistically to build a lip color model, and finally this model is applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The preferred implementation is as follows. Although the skin color model filters out most erroneous face images from the set, experiments show that it works poorly on objects whose color closely resembles a face, such as yellow floors and flesh-colored clothing. To overcome this, the system introduces a lip color model. To let the lip color model filter erroneous faces, the mouth region is extracted first, implemented by the function modelLipColor(const IplImage* img); the extraction exploits the geometric properties of the mouth region within a face and obtains the mouth region by fixed numerical ratios. At the same time the threshold characteristics of lip color are collected statistically to build the lip color model, which is finally applied numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
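A sketch of this lip filter, as a counterpart to modelLipColor; the geometric crop ratios and the HSV lip range are assumptions.

```python
import cv2

def mouth_region(face):
    """Lower-middle crop by fixed ratios: rows 65%-90%, cols 25%-75% (assumed)."""
    h, w = face.shape[:2]
    return face[int(0.65 * h):int(0.90 * h), int(0.25 * w):int(0.75 * w)]

def lip_ratio(face):
    """Fraction of reddish (lip-colored) pixels in the mouth crop."""
    hsv = cv2.cvtColor(mouth_region(face), cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (160, 40, 60), (180, 255, 255))
    return mask.mean() / 255.0

def passes_lip_filter(face, threshold=0.1):
    """Keep the candidate only if enough lip-colored pixels are present."""
    return lip_ratio(face) >= threshold
```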
A preferred embodiment of the present invention is: in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
The tracking flow designed here is shown in Fig. 3. KLT corner tracking yields three sequences from frame i to frame i+4: when a new face 3 appears in frame i+2, a new sequence is formed; when frame i+4 contains no face that matches face 2 of frame i+3, sequence 1 ends. Analysis shows that excessive character motion between nearby frames may break a sequence into two independent sequences; if the two fragments come from the same shot, they can very likely be aggregated. Sequence extraction is therefore performed on a per-shot basis: after extraction, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; this is the dual-threshold method. The dual-threshold method effectively merges broken sequences.
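The dual-threshold merge can be sketched as follows, reusing the corner-matching routine from the tracking sketch above; the relaxed threshold value and the sequence representation are assumptions.

```python
def merge_broken_sequences(sequences, frames, match_count, low_thr=4):
    """sequences: list of dicts with 'faces' = [(frame_idx, box), ...],
    ordered by start time and restricted to one shot; frames: list of
    decoded frames; match_count: the corner-matching routine above."""
    merged = [sequences[0]]
    for seq in sequences[1:]:
        last_idx, last_box = merged[-1]["faces"][-1]
        first_idx, first_box = seq["faces"][0]
        # Re-run the tracker once with the relaxed (second) threshold
        m = match_count(frames[last_idx], frames[first_idx],
                        last_box, first_box)
        if m > low_thr:
            merged[-1]["faces"].extend(seq["faces"])  # glue the fragments
        else:
            merged.append(seq)
    return merged
```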
Face tracking is shown in Fig. 4, where threshold 1 is the KLT algorithm threshold used to partition sequences, and threshold 2 is the lower limit on sequence length used for coarse filtering, removing overly short sequences. The sequences obtained by tracking are saved for the user to inspect; the tracking thresholds can be adjusted to obtain more accurate tracking results.
As shown in Fig. 5, the technical solution of the present invention constructs an automatic face labeling system comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit. The face detection unit detects faces in the captured video to obtain a set of face images, which is then filtered; at the same time, it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences. The speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time. The non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
As shown in Fig. 6, the specific implementation of the automatic face labeling system of the present invention is as follows.
First the AdaBoost algorithm performs coarse extraction: the minimum detection window is 20*20 pixels, the zoom factor of the detection window is 1.2, and each detected face is normalized to an 80*80 block. The set obtained by AdaBoost extraction does not consist solely of face images; it also contains non-face false positives, which must be detected and filtered out. Skin-color filtering is adopted here, implemented by the function modelSkinColor(IplImage* img): the threshold characteristics of facial skin color are first collected statistically, a skin color model is then built, and finally this model performs pixel-level numerical analysis of the face images, filtering out images that do not qualify.
After the video is loaded, all frames are read and face detection is run on each frame; the detections are displayed under the video and saved. During detection, the HSV color histogram difference of adjacent frames is computed for use in shot segmentation. Shot segmentation uses a spatial color histogram shot-boundary detection algorithm; since video is strongly affected by illumination, a color histogram in HSV space is chosen because the H component is relatively stable under illumination changes. In the shot segmentation submodule the default segmentation threshold is 0.4; to segment accurately under different video conditions, the user can manually enter several segmentation thresholds, inspect the results, and pick the best one.
Face tracking to extract face sequences: operating on sequences is preferable to operating on individual face images, because the amount of data to label drops and using the sequence as the labeling unit greatly improves accuracy. For face tracking inside a shot, this system uses the KLT (Kanade-Lucas-Tomasi) corner-based tracking algorithm. The algorithm has two parts, Harris corner detection and KLT corner tracking: Harris corner detection first detects corners in the target region, and KLT corner tracking then tracks them, so tracking a face amounts to tracking the corners in the face region. The usual procedure is to detect corners in the target region of the first frame, propagate them to the next frame by local matching, update them accordingly, and continue in this way. Faces in adjacent frames within a shot are examined: suppose faces A and B come from adjacent frames; the Harris corner detector finds the corners of A, the pyramidal LK sparse optical flow method searches for the matching corners in B, and the match count m(c_i, c_{i+1}) is recorded, where c_i denotes the corners of A and c_{i+1} the corners of B. A threshold is set, and when m(c_i, c_{i+1}) exceeds it, the two target regions are judged to belong to the same sequence.
Speaking-face sequence labeling: speaker labeling also provides training data for the non-speaking-face module; the system uses speaker detection to label speaking-face sequences, and speaker detection is at the same time the process that produces training data. The speaker detection technique used here first merges the script and the subtitles with a dynamic time warping algorithm: the script carries the character names and utterance content, while the subtitles carry the utterance times and content, so by building a data dictionary of utterance content, the character name, the utterance content, and the time are fused. In this embodiment a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, fusing time, name, and utterance content.
Non-speaking-face sequence labeling: feature extraction comes first, using both global and local pixel features, with the Pictorial Structures Model method for localization. The localization model is read in and initialized, then each face sequence is read in turn; for the faces of each sequence, the feature points are located one by one, an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features.
A preferred embodiment of the present invention is: the non-speaking-face sequence labeling unit further comprises a classification module. For classification, the already classified faces in the training samples are first encoded, and a coding dictionary is learned from all training faces with the LC-KSVD algorithm; after dictionary learning, the code of each unclassified face is computed and matched. A threshold is set, and when the Euclidean distance between two code values is below this threshold the match is considered successful. Faces in the video are classified by a statistical method: for every face of the same test sequence, the class to which it belongs is tallied as a "vote value"; if a test face matches some face of a class, the test face is said to have "cast a vote" for that class. When a class accounts for a larger share of the face sequence than every other class, the sequence belongs to that class. This is the voting process.
Classification uses the LC-KSVD (label-consistent KSVD dictionary learning and coding) algorithm, optimized here for video characteristics and applied on a per-class basis: the coding dictionary is learned first, the code of each test face is then computed, and classification is performed on a per-sequence basis. Before dictionary learning, LC-KSVD must construct an initial dictionary as the input of iterative KSVD learning; the present invention optimizes the construction of this initial dictionary in a sequence-based manner. Specifically, a number of faces are selected evenly from each sequence of each class to construct D_0. Suppose the dictionary has K elements, there are N classes, and tf_{ij} denotes the j-th sequence of class i; then D_0 is

D_0 = (d_0, d_1, \ldots, d_k, \ldots), \quad d_k \in tf_{ij}, \; i = 1, 2, \ldots, N; \; k = 1, 2, \ldots, K

Each column of D_0 is an element, and every element carries a corresponding class label. After dictionary learning is finished, the code of every face to be classified is computed. Video faces are classified by aggregating the per-face results over a sequence. First, all samples of a test sequence vote over all classes; the vote value is the classification score of each test face on each class, computed as

j^* = \arg\max_j l_j, \qquad l = W x_i

where W is the coefficient (classifier) matrix and x_i is the code of the input signal. Let S_j = [s_1, s_2, \ldots, s_c, \ldots, s_C]^T, j = 1, \ldots, n, where S_j is the classification score vector of test face j, n is the test sequence length, C is the total number of classes, and s_c is the sample's score on class c. The scores are then summed on a per-sequence basis, i.e.

S = \frac{1}{N_i} \sum_{j=1}^{n} S_j

where N_i is the length of sequence i. The class label of the test sequence is finally determined as

label = \arg\max_{i} S(i), \quad i = 1, \ldots, C.
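The voting classifier defined by these formulas reduces to a few lines; W and the face codes are assumed to come from prior LC-KSVD training, which is not reproduced here.

```python
import numpy as np

def classify_sequence(W, codes):
    """W: (C, K) classifier matrix; codes: (n, K) LC-KSVD codes of the
    n test faces in one sequence. Returns the winning class index."""
    scores = codes @ W.T                          # S_j = W x_j, shape (n, C)
    seq_score = scores.sum(axis=0) / len(codes)   # (1/N_i) * sum_j S_j
    return int(np.argmax(seq_score))              # label = argmax_i S(i)
```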
A preferred embodiment of the present invention is: the face detection unit comprises a dual-threshold module. After the shot-level extraction is finished, the dual-threshold module runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated; broken sequences are thereby merged.
A preferred embodiment of the present invention is: the face detection unit further comprises a lip filtering module. The lip filtering module uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model, removing the remaining contaminating images.
The technical effect of the present invention is to construct an automatic face labeling method and system. First, faces are detected in the captured video to obtain a set of face images, which is then filtered; at the same time the HSV color histogram difference between adjacent frames is computed, and shot segmentation is performed with a spatial color histogram shot-boundary detection algorithm. For faces in adjacent frames, corner points are detected in the target region of the first frame, propagated to the next frame by local matching, and updated accordingly; the match count is compared against a threshold, and proceeding in this way yields the face sequences. Next, a lip motion detection module distinguishes speaking from non-speaking faces according to the lip movement in each face sequence, and each speaker is labeled by fusing the speaker identity, the utterance content, and the utterance time. Finally, the faces of each sequence are read in and located one by one; an affine transformation is applied according to the localization result, and the grayscale pixel values in a fixed-size circular region around each transformed feature point are extracted as the face features. The automatic face labeling method and system of the present invention are convenient to use and highly accurate.
The content above further describes the present invention with reference to specific preferred embodiments, but the specific implementation of the present invention shall not be considered limited to these descriptions. For those of ordinary skill in the technical field of the present invention, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of these shall be considered to fall within the protection scope of the present invention.

Claims (10)

1. An automatic face labeling method, characterized by comprising the following steps:
Face detection: detecting faces in the captured video to obtain a set of face images and filtering the set; at the same time, computing the HSV color histogram difference between adjacent frames and performing shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames, detecting corner points in the target region of the first frame, propagating them to the next frame by local matching and updating them accordingly, counting the matched corners and comparing the count against a threshold, and proceeding in this way to obtain the face sequences;
Speaking-face sequence labeling: distinguishing speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence, and labeling each speaker by fusing the speaker identity, the utterance content, and the utterance time;
Non-speaking-face sequence labeling: first encoding the already classified faces in the training samples and learning a coding dictionary from all training faces with the LC-KSVD algorithm; after dictionary learning, computing the code of each unclassified face by locating the feature points with the PSM method, applying an affine transformation, and extracting and normalizing the face features; encoding the extracted features of the sequence's faces with the LC-KSVD algorithm and matching them against the learned dictionary; setting a threshold such that when the Euclidean distance between two code values is below the threshold the match is considered successful; and classifying the faces in the video by a statistical method, completing the labeling.
2. The automatic face labeling method according to claim 1, characterized in that the face detection step further comprises skin-color filtering of the captured face images: first collecting the threshold characteristics of facial skin color statistically, then building a skin color model, and finally using this model to perform pixel-level numerical analysis of the face images and filter out images that do not qualify.
3. The automatic face labeling method according to claim 1, characterized in that the face detection step further comprises lip filtering of the captured face images: using the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collecting the threshold characteristics of lip color statistically to build a lip color model, and finally applying this model numerically to the face image set already filtered by the skin color model to remove the remaining contaminating images.
4. The automatic face labeling method according to claim 1, characterized in that, in obtaining the face sequences, after the shot-level extraction is finished, the tracking algorithm is run once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated, thereby merging broken sequences.
5. The automatic face labeling method according to claim 1, characterized in that a data coordinate system is established in which the horizontal axis is time, the vertical axis is the name, and the coordinate value is the utterance content, so that time, name, and utterance content are fused.
6. The automatic face labeling method according to claim 1, characterized in that a lower limit on face sequence length is set during face tracking and erroneous faces are rejected.
7. An automatic face labeling system, characterized by comprising a face detection unit, a speaking-face sequence labeling unit, and a non-speaking-face sequence labeling unit, wherein: the face detection unit detects faces in the captured video to obtain a set of face images and filters the set; at the same time it computes the HSV color histogram difference between adjacent frames and performs shot segmentation with a spatial color histogram shot-boundary detection algorithm; for faces in adjacent frames it detects corner points in the target region of the first frame, propagates them to the next frame by local matching, updates them accordingly, counts the matched corners, compares the count against a threshold, and proceeds in this way to obtain the face sequences; the speaking-face sequence labeling unit distinguishes speaking from non-speaking faces with a lip motion detection module according to the lip movement in each face sequence and labels each speaker by fusing the speaker identity, the utterance content, and the utterance time; and the non-speaking-face sequence labeling unit reads in the faces of each sequence, locates their feature points one by one, applies an affine transformation according to the localization result, and extracts the grayscale pixel values in a fixed-size circular region around each transformed feature point as the face features.
8. The automatic face labeling system according to claim 7, characterized in that the non-speaking-face sequence labeling unit further comprises a classification module, the classification module first encoding the already classified faces in the training samples, learning a coding dictionary from all training faces with the LC-KSVD algorithm, then computing the code of each unclassified face after dictionary learning and matching it, a threshold being set such that when the Euclidean distance between two code values is below the threshold the match is considered successful, and the faces in the video being classified by a statistical method.
9. The automatic face labeling system according to claim 7, characterized in that the face detection unit comprises a dual-threshold module which, after the shot-level extraction is finished, runs the tracking algorithm once more between the last image of the preceding sequence and the first image of the adjacent following sequence within the same shot, with the threshold lowered, to re-check whether they can be aggregated, thereby merging broken sequences.
10. The automatic face labeling system according to claim 7, characterized in that the face detection unit further comprises a lip filtering module which uses the geometric properties of the mouth region within a face to obtain the mouth region by fixed numerical ratios, collects the threshold characteristics of lip color statistically to build a lip color model, and finally applies this model numerically to the face image set already filtered by the skin color model to remove the remaining contaminating images.
CN201310115471.2A 2013-04-03 2013-04-03 Face automatic labeling method and system Expired - Fee Related CN103218603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310115471.2A CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310115471.2A CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Publications (2)

Publication Number Publication Date
CN103218603A true CN103218603A (en) 2013-07-24
CN103218603B CN103218603B (en) 2016-06-01

Family

ID=48816372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310115471.2A Expired - Fee Related CN103218603B (en) 2013-04-03 2013-04-03 Face automatic labeling method and system

Country Status (1)

Country Link
CN (1) CN103218603B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390282A (en) * 2013-07-30 2013-11-13 百度在线网络技术(北京)有限公司 Image tagging method and device
CN104091164A (en) * 2014-07-28 2014-10-08 北京奇虎科技有限公司 Face picture name recognition method and system
CN104951730A (en) * 2014-03-26 2015-09-30 联想(北京)有限公司 Lip movement detection method, lip movement detection device and electronic equipment
CN108171135A (en) * 2017-12-21 2018-06-15 深圳云天励飞技术有限公司 Method for detecting human face, device and computer readable storage medium
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN109190520A (en) * 2018-08-16 2019-01-11 广州视源电子科技股份有限公司 A kind of super-resolution rebuilding facial image method and device
CN109472217A (en) * 2018-10-19 2019-03-15 广州慧睿思通信息科技有限公司 Intelligent training model building method and device, training method and device
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109948441A (en) * 2019-02-14 2019-06-28 北京奇艺世纪科技有限公司 Model training, image processing method, device, electronic equipment and computer readable storage medium
CN110442873A (en) * 2019-08-07 2019-11-12 云南电网有限责任公司信息中心 A kind of hot spot work order acquisition methods and device based on CBOW model
CN110998606A (en) * 2017-08-14 2020-04-10 华为技术有限公司 Generating marker data for deep object tracking
CN111191708A (en) * 2019-12-25 2020-05-22 浙江省北大信息技术高等研究院 Automatic sample key point marking method, device and system
CN112381065A (en) * 2020-12-07 2021-02-19 福建天创信息科技有限公司 Face positioning method and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN101510255A (en) * 2009-03-30 2009-08-19 北京中星微电子有限公司 Method for identifying and positioning human face, apparatus and video processing chip
CN102521581A (en) * 2011-12-22 2012-06-27 刘翔 Parallel face recognition method with biological characteristics and local image characteristics
CN102799870A (en) * 2012-07-13 2012-11-28 复旦大学 Single-training sample face recognition method based on blocking consistency LBP (Local Binary Pattern) and sparse coding
CN102902961A (en) * 2012-09-21 2013-01-30 武汉大学 Face super-resolution processing method based on K neighbor sparse coding average value constraint

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN101510255A (en) * 2009-03-30 2009-08-19 北京中星微电子有限公司 Method for identifying and positioning human face, apparatus and video processing chip
CN102521581A (en) * 2011-12-22 2012-06-27 刘翔 Parallel face recognition method with biological characteristics and local image characteristics
CN102799870A (en) * 2012-07-13 2012-11-28 复旦大学 Single-training sample face recognition method based on blocking consistency LBP (Local Binary Pattern) and sparse coding
CN102902961A (en) * 2012-09-21 2013-01-30 武汉大学 Face super-resolution processing method based on K neighbor sparse coding average value constraint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘广征: "基于视频与文本信息的说话者人脸标注", 《万方数据》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390282B (en) * 2013-07-30 2016-04-13 百度在线网络技术(北京)有限公司 Image labeling method and device thereof
CN103390282A (en) * 2013-07-30 2013-11-13 百度在线网络技术(北京)有限公司 Image tagging method and device
CN104951730A (en) * 2014-03-26 2015-09-30 联想(北京)有限公司 Lip movement detection method, lip movement detection device and electronic equipment
CN104951730B (en) * 2014-03-26 2018-08-31 联想(北京)有限公司 A kind of lip moves detection method, device and electronic equipment
CN104091164A (en) * 2014-07-28 2014-10-08 北京奇虎科技有限公司 Face picture name recognition method and system
CN110998606A (en) * 2017-08-14 2020-04-10 华为技术有限公司 Generating marker data for deep object tracking
CN110998606B (en) * 2017-08-14 2023-08-22 华为技术有限公司 Generating marker data for depth object tracking
CN108171135A (en) * 2017-12-21 2018-06-15 深圳云天励飞技术有限公司 Method for detecting human face, device and computer readable storage medium
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN109190520A (en) * 2018-08-16 2019-01-11 广州视源电子科技股份有限公司 A kind of super-resolution rebuilding facial image method and device
CN109472217A (en) * 2018-10-19 2019-03-15 广州慧睿思通信息科技有限公司 Intelligent training model building method and device, training method and device
CN109472217B (en) * 2018-10-19 2021-08-31 广州慧睿思通信息科技有限公司 Intelligent exercise training model construction method and device and training method and device
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109753975B (en) * 2019-02-02 2021-03-09 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN109948441A (en) * 2019-02-14 2019-06-28 北京奇艺世纪科技有限公司 Model training, image processing method, device, electronic equipment and computer readable storage medium
CN110442873A (en) * 2019-08-07 2019-11-12 云南电网有限责任公司信息中心 A kind of hot spot work order acquisition methods and device based on CBOW model
CN111191708A (en) * 2019-12-25 2020-05-22 浙江省北大信息技术高等研究院 Automatic sample key point marking method, device and system
CN112381065A (en) * 2020-12-07 2021-02-19 福建天创信息科技有限公司 Face positioning method and terminal
CN112381065B (en) * 2020-12-07 2024-04-05 福建天创信息科技有限公司 Face positioning method and terminal

Also Published As

Publication number Publication date
CN103218603B (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN103218603A (en) Face automatic labeling method and system
CN110363140B (en) Human body action real-time identification method based on infrared image
Shahab et al. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images
CN104866829B (en) A kind of across age face verification method based on feature learning
Li et al. Delving into egocentric actions
CN100565559C (en) Image text location method and device based on connected component and support vector machine
CN102163284B (en) Chinese environment-oriented complex scene text positioning method
Avgerinakis et al. Recognition of activities of daily living for smart home environments
US20160154469A1 (en) Mid-air gesture input method and apparatus
WO2019080203A1 (en) Gesture recognition method and system for robot, and robot
CN103824091B (en) A kind of licence plate recognition method for intelligent transportation system
CN108647625A (en) A kind of expression recognition method and device
CN106446952A (en) Method and apparatus for recognizing score image
CN106297755B (en) Electronic equipment and identification method for music score image identification
CN103735253A (en) Tongue appearance analysis system and method thereof in traditional Chinese medicine based on mobile terminal
CN105516802A (en) Multi-feature fusion video news abstract extraction method
CN108805076A (en) The extracting method and system of environmental impact assessment report table word
CN104281839A (en) Body posture identification method and device
CN105138983B (en) The pedestrian detection method divided based on weighting block model and selective search
CN105631039A (en) Picture browsing method
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN111046886A (en) Automatic identification method, device and equipment for number plate and computer readable storage medium
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN106709438A (en) Method for collecting statistics of number of people based on video conference
Shivakumara et al. Gradient-angular-features for word-wise video script identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160601

Termination date: 20200403