CN103984738B - Role labelling method based on search matching - Google Patents
- Publication number
- CN103984738B (grant) · CN201410218854.7A / CN201410218854A (application)
- Authority
- CN
- China
- Prior art keywords
- face
- role
- marked
- mark
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
Abstract
The invention discloses a method for labelling film and television drama roles based on search matching. The method comprises the following steps: obtaining the set of objects to be labelled in a labelling scene, together with the information of each object to be labelled, from a list of objects to be labelled; constructing text keywords for each object to be labelled and obtaining the corresponding image set with an image search engine; performing face detection and visual-attribute analysis on the retrieved images and removing the noise among them, so as to obtain, for each object to be labelled, a set of role faces closely related to the labelling scene; performing face detection and tracking on the labelling scene to obtain all face sequences in it; and labelling the roles of the scene based on the visual similarity between face sequences and the analysis of the visual similarity between face sequences and the role faces of the objects to be labelled. The method labels drama roles using the face images of those roles available on the Internet. Its beneficial effects are that the labelling process is fully automatic, the labelling accuracy is high, and the method is highly extensible and universal.
Description
Technical field
The present invention relates to the technical field of intelligent video analysis, and in particular to a role labelling method based on search matching.
Background technology
With the flourishing of the film and television industry, a large number of film and television programmes are produced every year, greatly enriching people's entertainment. The story of most dramas centres on characters; these roles are played by real actors, and the plot develops through the appearance of, and interaction between, the roles. Character labelling of a drama, i.e. attaching the corresponding role name to each face that appears in it, establishes the mapping between faces and role names, from which the specific time segments and spatial regions in which each character appears can be obtained. This makes character labelling an important topic with broad application value. At present, drama character labelling has become a basic supporting technology for services such as the intelligent and personalized management, browsing and retrieval of large-scale drama data, and plays the role of a core module in applications such as role-centred drama browsing, intelligent video summarization and role-oriented video retrieval.
A number of drama character labelling methods have been proposed; they can be broadly divided into face-model-based methods and script-based methods. Face-model-based methods collect a certain number of faces for each role as training samples and use them to build a face model per role; character labelling is then performed according to the similarity between the faces in the drama and the face models of the different roles. Although such methods have been successfully applied in many systems, they require training samples to be collected manually, which usually costs time and effort, and the trained face models are generally difficult to apply to other dramas: even for the same actor, his or her visual appearance may differ considerably between dramas, so face-model-based methods are hard to extend to the processing and analysis of large-scale drama collections. Script-based methods, on the other hand, perform character labelling by mining the temporal consistency between the textual and visual information of a drama. Typically, the script and subtitle text of a programme are first obtained from an external channel such as the Internet; by aligning the script with the subtitles, the information of which role is speaking at which point in time is obtained. According to the time points at which faces are detected in the drama, a preliminary mapping between faces and role names is then established, which is further refined and made more accurate using the visual similarity between faces. The advantage of script-based methods is that the labelling process is automatic (without manual intervention). However, the scripts and subtitles of many dramas are not easy to obtain: many dramas do not publish their scripts, scripts and subtitles may not fully correspond, and many dubbed films have no Chinese script or subtitles. These factors limit the universality of script-based methods.
In addition to the above methods, some search-based celebrity image labelling methods have recently been proposed. These methods first collect celebrity face images with a search engine to build a celebrity library. For an image to be labelled, they compute its visual similarity with the images in the library, obtain a small number of highly similar images, and label the image with the celebrity information attached to those images. However, the effectiveness of such methods has only been confirmed on libraries containing a few hundred celebrities; moreover, this line of work targets the image domain rather than the video domain, and therefore cannot exploit valuable labelling cues such as video structure.
The prosperity of the Internet has put a large number of character images online. For an actor with a certain popularity, many of his or her face images can be retrieved through an image search engine using the actor's real name as the query. These faces generally have the following characteristics: 1) the results contain images of the actor both in different dramas and in daily life, so the faces show a certain variation in visual appearance; 2) the face images usually contain some noise, such as faces of other people appearing in the images; 3) the proportion of correct images is generally higher among the top-ranked results than among the lower-ranked ones. On the other hand, if the drama title plus the name of the role the actor plays in the drama is used as the query, the retrieved face images have different characteristics, because the query is stricter. Usually, when the queried role is a leading role of the drama, most of the top-ranked images in the results are face images of that role in that drama; but when the role is not a leading role, the noise proportion among the top-ranked results is usually higher, and the results also have a higher probability of containing face images of leading roles of other dramas.
The face images obtained by searching for drama roles, together with the above characteristics, can obviously be used to achieve better role labelling. However, the prior art does not make good use of this information, in particular of the characteristics of the images returned by different queries. The present invention is based on this insight. Specifically, the images retrieved with "drama title plus role name" generally contain face images of the role as it appears in that drama, so a vision-matching approach can achieve a good labelling effect; but the retrieved image set may also contain a few, or even many, noise images, and how to identify the noise and remove its influence is a difficulty. The present invention therefore novelly exploits the fact that the image set retrieved with the real name usually has a relatively low noise proportion: the visual attributes of the actor are mined from the "real name" face set and then used to denoise the "drama title plus role name" face set, thereby obtaining the actor's role face set. On this basis, the visual similarity between the role faces and the faces in the drama, together with the visual similarity between faces within the drama, is used to achieve high-accuracy role labelling. Compared with traditional face-model-based methods, the labelling process of the invention is automatic and needs no manual intervention, and the role face images are determined adaptively for each drama, giving good extensibility. Compared with script-based methods, the invention only needs the cast list of the drama; obtaining a cast list is much easier than obtaining scripts and subtitles, and even if no cast list can be obtained, compiling one manually is far easier than manually compiling script and subtitle text. The invention therefore has stronger universality and is applicable to more dramas. In addition, whereas search-based celebrity labelling methods collect face images using names only, the invention fully mines the correlation between the face images returned by different queries and accordingly collects drama role faces in a highly targeted way. Moreover, the invention also mines the structural information of the video to better achieve character labelling, and is thus technically more advanced, with higher labelling precision. See also the invention patent of Application No. 201210215951.1, entitled "A method for automatically generating summaries of the main characters in TV programmes", and the invention patent of Application No. 201110406765.1, entitled "A role-based TV drama video analysis method".
Content of the invention
The purpose of the present invention is to fully mine and effectively use the face images of drama roles available on the Internet, so as to provide an automatic, extensible, highly universal and high-precision character labelling method, offering a basic supporting technology for services such as the intelligent and personalized management, browsing and retrieval of massive drama data.
To achieve the above purpose, the present invention provides a character labelling method based on search matching, comprising the following steps:
S1, according to a list of objects to be labelled, obtaining the set of objects to be labelled in the labelling scene and the information of all objects to be labelled;
S2, constructing text keywords for each object to be labelled and obtaining the corresponding set of search result images with an image search engine;
S3, performing face detection and visual-attribute analysis on the obtained search result images, removing the noise among them using the consistency of facial visual attributes, and obtaining, for each object to be labelled, a role face set closely related to the labelling scene;
S4, performing face detection and tracking on the labelling scene to obtain all face sequences in it;
S5, labelling the roles of the labelling scene based on the visual similarity between face sequences and the analysis of the visual similarity between face sequences and the role faces of the objects to be labelled.
According to the invention, a drama character labelling method based on search matching is proposed. By mining the relations between the face images returned by different queries, the method obtains role face images closely related to the drama, and then performs character labelling according to the visual similarity between these role face images and the face sequences in the drama, as well as the visual similarity between face sequences within the drama. The method has the advantages that the labelling process is fully automatic and free of manual intervention, the labelling precision is high, it is suitable for large-scale drama data processing, and it is strongly extensible and universal, being applicable to many types of dramas. The method can also serve as an important basic supporting technology in services such as the intelligent and personalized management, browsing and retrieval of large-scale drama data, and plays the role of a core module in applications such as role-centred drama browsing, intelligent video summarization and role-oriented video retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of the character labelling method based on search matching according to one embodiment of the invention.
Specific embodiment
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawing.
As shown in Fig. 1, the character labelling method based on search matching of the invention comprises the following steps:
S1, according to a list of objects to be labelled, such as a cast list, obtaining the set of objects to be labelled in the labelling scene and the information of each object to be labelled: the real name and the role name;
S2, constructing text keywords for each actor and obtaining the corresponding set of search result images with an image search engine;
S3, performing face detection and visual-attribute analysis on the obtained search result image sets, removing the noise among them using the consistency of facial visual attributes, and obtaining, for each actor, a role face set closely related to the drama;
S4, performing face detection and tracking on the drama to obtain all face sequences in the drama;
S5, labelling the roles of the drama based on the visual similarity between face sequences and the analysis of the visual similarity between face sequences and the actors' role faces.
According to a preferred embodiment of the invention, the detailed process of obtaining the real names and role names of all objects to be labelled from a list of objects to be labelled, such as a cast list, is as follows:
Step 11, visiting websites that specialize in drama cast lists and plot introductions, such as Aiyanyuan (http://www.ayanyuan.com/) and IMDB (http://www.imdb.com/), and querying with the drama title to obtain the web pages related to the drama, i.e. the labelling scene;
Step 12, according to the page layout of those web pages, crawling the cast-list section to obtain the actor set of the drama and, for each actor, information such as the real name and the role name.
According to a preferred embodiment of the invention, for the actor set obtained in step 12, two groups of text keywords, the real name and the drama title plus the role name, are constructed for each actor, and search result images are obtained with an image search engine as follows:
Step 21, for each actor in the actor set obtained in step 12, constructing two text keywords: one is the actor's real name, the other is the combination of the full drama title and the name of the role the actor plays;
Step 22, after the text keywords are constructed, submitting the two text keywords in turn to an image search engine, for example the Google image search engine through the application programming interface provided by Google, with the search parameter set to retrieve images containing faces, and returning a number of search result images for the actor. If, for example, the number of retrieved result images is set to 64, the Google image search engine returns the uniform resource locators (URL addresses) of the face images of the top 64 retrieval results to the retrieval end, which then downloads the corresponding images from those addresses. Ideally all images can be downloaded normally, and this step yields 64 search result images; in practice, the number of images downloadable per keyword is generally between 50 and 64. The image sets downloaded with the real name and with the drama title plus role name are called the "real name" and the "drama title plus role name" image sets respectively.
The above process is repeated for each actor in the actor set, obtaining the "real name" and "drama title plus role name" image sets of every actor.
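Steps 21–22 above can be sketched as follows. This is a minimal illustration of the keyword construction only; the patent does not specify how the two query strings are formatted, so the single-space separator is our assumption, and the actual submission to the search engine and downloading of the returned URLs is omitted.

```python
def build_keywords(real_name, drama_title, role_name):
    # Step 21: one query is the actor's real name, the other the full
    # drama title plus the name of the role the actor plays.
    # The separator (a single space) is an assumption of this sketch.
    return [real_name, drama_title + " " + role_name]

# For each keyword, the retrieval end would then request up to 64
# face-containing result images and download them from the returned
# URLs (step 22), yielding the "real name" and "drama title plus role
# name" image sets.
queries = build_keywords("Zhang San", "Some Drama", "Li Si")
```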
According to a preferred embodiment of the invention, face detection and visual-attribute analysis is performed on the "real name" and "drama title plus role name" image sets obtained in step 2, and the noise among them is removed using the consistency of facial visual attributes, obtaining for each actor a role face set closely related to the drama, as follows:
Step 31, calling a face detection tool, such as the face detection interface of the face recognition cloud service Face++ (http://www.faceplusplus.com.cn/), to perform face detection on the "real name" and "drama title plus role name" image sets, and expressing the image sets as the corresponding "real name" and "drama title plus role name" face sets according to the detection results. At the same time, the visual attributes of each face to be labelled are extracted; in one embodiment of the invention, the visual attributes are of three kinds, sex, age and ethnicity, and M facial key regions of each face are located. In one embodiment of the invention there are nine facial key regions, namely: the left and right corners of the two eyes, the lower-left, lower-middle and lower-right edges of the nose, and the left and right corners of the mouth. In each facial key region an N-dimensional feature vector is extracted (for example a 128-dimensional SIFT feature vector), and the nine 128-dimensional feature vectors are concatenated into a 1152-dimensional face visual feature descriptor. The above process is repeated for each actor in the actor set, obtaining the "real name" and "drama title plus role name" face sets of every actor, together with the three visual attributes and the facial key region locations of every face;
Step 32, for the "real name" face set of each actor, generating a statistical histogram for each of the three visual attributes: for the sex attribute, a 2-dimensional histogram whose two bins correspond to male and female; for the age attribute, an 8-dimensional histogram whose 1st and 8th bins correspond to faces below 10 and above 70 years old respectively, an age falling in the interval [10*(i-1), 10*i) corresponding to the i-th bin; for the ethnicity attribute, a 3-dimensional histogram whose three bins correspond to "Asian", "White" and "Black". Each face votes for the corresponding bins of the statistical histograms according to its three visual attributes. When all faces in the actor's "real name" face set have voted, the ratio of the bin with the most votes to the number of faces is computed for each histogram; if the ratio exceeds a set threshold, such as 0.5, the visual attribute is considered significant in the "real name" face set. An actor is defined as recognizable if and only if all three of his or her visual attributes are significant; these three significant attributes are also defined as the character attributes of the actor. The above process is repeated for the "real name" face sets of all actors, obtaining all recognizable actors and their character attributes. Actors not determined to be recognizable will not be considered in the subsequent character labelling, since their character attributes cannot be identified from the network face images;
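The histogram voting and significance test of step 32 can be sketched as follows. This assumes the raw attribute values have already been mapped to bin indices (e.g. the age bin as min(age // 10, 7) clamped to the first bin for under-10s); that mapping helper is not part of the patent text.

```python
def significant_attribute(votes, num_bins, threshold=0.5):
    # Build the statistical histogram of step 32 by voting each face's
    # attribute bin, then check whether the most-voted bin's share of
    # all votes exceeds the threshold (0.5 in the example embodiment).
    hist = [0] * num_bins
    for v in votes:
        hist[v] += 1
    best = max(range(num_bins), key=lambda b: hist[b])
    return hist[best] / len(votes) > threshold, best

# Six faces in a "real name" set voting on sex (0 = male, 1 = female):
# five of six agree, 5/6 > 0.5, so the attribute is significant.
ok, dominant = significant_attribute([0, 0, 0, 1, 0, 0], num_bins=2)
```

An actor is recognizable only when this test passes for all three attribute histograms (sex, age, ethnicity).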
Step 33, for each recognizable actor obtained in step 32 (without loss of generality, denote the actor's role name and "drama title plus role name" face set by Per_i and CF_i respectively), performing face clustering on the "drama title plus role name" face set based on the 1152-dimensional face visual feature descriptors obtained in step 31. In one embodiment of the invention, the affinity propagation (Affinity Propagation) algorithm is used for face clustering. The clustering algorithm requires the face similarity matrix S ≡ [s_{i,j}]_{T×T}, where the element s_{i,j} is the visual similarity of faces f_i and f_j: when i ≠ j it is the cosine similarity of the descriptors of f_i and f_j, and when i = j it is the average of all pairwise face similarities in the set; T is the number of faces in the set CF_i.
According to the clustering process, CF_i can be expressed in the form of formula (1):
CF_i = {C_i^1, C_i^2, …, C_i^w}   (1)
where w is the number of classes produced by the clustering, C_i^j is the j-th clustering result in the set, and each face in C_i^j is represented by its descriptor (the k-th face of C_i^j being denoted d_k^j). Only result classes containing 3 or more faces are retained by the clustering.
For each cluster result class C_i^j obtained from formula (1), the appearance ratios within that class of the actor's three character attributes obtained in step 32 (sex, age, ethnicity) are counted. When the appearance ratios of all three attributes are greater than a predetermined threshold, such as 0.6, the faces in C_i^j are all considered candidate role faces of actor Per_i closely related to the drama. The above process is repeated for all classes C_i^j, obtaining all candidate role faces of Per_i, and then for all recognizable actors, obtaining their respective candidate role face sets;
Step 34, performing image de-duplication on the candidate role face set of each object to be labelled, i.e. on the candidate role face set of actor Per_i obtained in step 33: since a certain number of visual copy images generally exist among network face images, visual copy detection is performed on the face images in the set to remove the influence of copy images. In one embodiment of the invention, the tool kit SOTU is used as the detection kit for visual copy detection (for details see http://vireo.cs.cityu.edu.hk/research/project/sotu.htm). If visual copy faces are detected within the set, the face ranked lower in the Google retrieval results is deleted, according to the ranking of the face images in the retrieval results; this process is repeated until no copy faces remain in the set. The above process is repeated for all recognizable actors, so that no visual copy faces remain in their respective candidate role face sets;
Step 35, based on the results of step 34, further performing face de-duplication, i.e. detecting whether visual copy faces exist across the candidate role face sets of different recognizable actors, since a visual copy face can belong to only one role. If a copy face f is detected in the candidate role face sets of actors Per_i and Per_j, the average visual similarity of f to the other faces in each of the two sets is computed, and f is deleted from the face set with the lower similarity. This process is repeated until no copy faces remain between different actors. Through the above steps, the set Γ of the K recognizable actors and their respective role face sets A_i are obtained, denoted respectively as:
Γ = {A_1, A_2, …, A_K}, where A_i = {a_1^i, a_2^i, …}
and a_j^i denotes the descriptor of the j-th face in the role face set of Per_i.
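The cross-set copy resolution of step 35 can be sketched as follows. The function shape is our own; the patent only specifies the rule (keep the copy face in the set it is more similar to on average, delete it from the other), and any pairwise similarity function can be plugged in.

```python
def resolve_copy_face(set_i, set_j, f, sim):
    # Step 35: face f was detected as a visual copy in both candidate
    # role face sets. Compare f's average similarity to the *other*
    # faces of each set, and delete it from the set where it fits
    # less well, since a copy face can belong to only one role.
    def avg(faces):
        others = [g for g in faces if g != f]
        return sum(sim(f, g) for g in others) / len(others)
    if avg(set_i) >= avg(set_j):
        return set_i, [g for g in set_j if g != f]
    return [g for g in set_i if g != f], set_j
```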
According to a preferred embodiment of the invention, the detailed process of performing face detection and tracking on the drama to obtain its face sequences is as follows:
Step 41, performing shot boundary detection on the drama; if s-1 shot boundary points are detected, the drama is decomposed into s shots according to these s-1 shot boundary points;
Step 42, calling a tool such as the face detection and tracking interface of the face recognition cloud service Face++ to perform face detection and tracking within each shot, obtaining the face sequences detected in that shot. This process is repeated for all s shots, obtaining all face sequences in the drama. Of course, other face detection and tracking methods may also be used; the present invention places no limitation on the face detection and tracking method.
According to a preferred embodiment of the invention, the recognizable performer and the respective role of s/hes for being obtained based on step 35
Face set, and the movie and television play face sequence that step 42 is obtained, based on the vision similarity between face sequence, and face
Sequence is analyzed with the vision similarity of performer role's face, and realization is to the detailed process of the character labeling of movie and television play:
Step 51: suppose T face sequences were obtained in step 42. For each face sequence, extract color histogram features from all of its faces and cluster based on this feature. The clustering algorithm again uses the Affinity Propagation algorithm, and the face similarity matrix is computed on the same principle as described in step 33. According to the clustering result, the face sequence FTk is represented as a set of pairs, each pair consisting of the class center vector of class i and the number of faces in class i, for i = 1, ..., w; the class center vector is the feature representation of the face nearest to the center of class i, and w is the number of classes;
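The per-face color histogram feature of step 51 might be computed as below; this is a sketch, and the choice of 8 bins per RGB channel with L1 normalisation is an assumption not specified in the patent:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel color histogram for one RGB face crop (H x W x 3 uint8
    array): the three channel histograms are concatenated and the result
    is L1-normalised so faces of different sizes are comparable."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()
```

The resulting 3x8 = 24-dimensional vectors would then feed the Affinity Propagation clustering mentioned above.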
Step 52: face sequences that appear at the same moment generally cannot correspond to the same person. According to the overlap of the face sequences' appearance times, generate the collision matrix C ≡ [ci,j]T×T: if the appearance times of face sequences FTi and FTj overlap, then ci,j = 1; if they do not overlap, ci,j = 0;
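The collision matrix of step 52 follows directly once each face sequence is summarised by its on-screen time interval; representing appearance times as (start, end) pairs is an assumption of this sketch:

```python
def collision_matrix(intervals):
    """Build C for step 52. `intervals` holds the (start, end) appearance
    times of the T face sequences; C[i][j] = 1 when sequences i and j are
    on screen at the same time (so they cannot be the same person)."""
    T = len(intervals)
    C = [[0] * T for _ in range(T)]
    for i in range(T):
        for j in range(T):
            if i != j:
                a, b = intervals[i], intervals[j]
                if a[0] < b[1] and b[0] < a[1]:  # open-interval overlap test
                    C[i][j] = 1
    return C
```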
Step 53: based on the face sequence representation obtained in step 51, calculate the visual similarity of face sequences FTi and FTj using the Earth Mover's Distance, denoted fsi,j. Repeat this calculation for every pair of face sequences, and obtain the probability propagation matrix P ≡ [pi,j]T×T of face sequence similarities by formula (2);
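The body of formula (2) is not reproduced above; a common construction consistent with the description, offered here only as an assumption, is to row-normalise the pairwise similarities fsi,j into a stochastic matrix:

```python
import numpy as np

def propagation_matrix(fs):
    """One plausible reading of formula (2): row-normalise the pairwise
    similarity matrix fs so each row of P sums to 1, making P a stochastic
    matrix suitable for the label propagation of step 57."""
    fs = np.asarray(fs, dtype=float)
    np.fill_diagonal(fs, 0.0)            # no self-propagation (assumption)
    row_sums = fs.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0        # isolated sequences keep zero rows
    return fs / row_sums
```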
Step 54: calculate the matching confidence matrix S ≡ [si,j]T×K between roles and face sequences, where si,j is the similarity between face sequence FTi and the role face set of Perj. This similarity equals the visual similarity of the two most similar faces in the two sets, and is calculated according to formula (3), in which each term is the similarity between the m-th class center vector of face sequence FTi and the n-th role face in the role face set of Perj;
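Formula (3) takes the similarity of a sequence and a role set as the best pairwise face similarity; the sketch below assumes cosine similarity over feature vectors, a measure the patent does not fix:

```python
import numpy as np

def match_confidence(seq_centers, role_faces):
    """si,j of formula (3): the similarity of a face sequence and a role
    face set, taken as the maximum pairwise similarity between any class
    center vector of the sequence and any face feature in the role set.
    Cosine similarity is an assumption of this sketch."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(cos(c, f) for c in seq_centers for f in role_faces)
```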
Step 55: update the matching confidence matrix S using the collision matrix C by formula (4). This operation avoids simultaneously assigning high matching confidence to face sequences whose appearance times overlap, so that they are not labeled as the same role in subsequent steps;
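Formula (4) is not reproduced above; one plausible reading, shown purely as an assumption, is that of two time-overlapping sequences only the one with the higher confidence for a given role keeps its score:

```python
def resolve_conflicts(S, C):
    """A hypothetical reading of formula (4): when sequences i and k
    overlap in time (C[i][k] == 1) they cannot both be role j, so only
    the sequence with the higher confidence keeps its score for j and
    the other's score is zeroed."""
    T, K = len(S), len(S[0])
    S2 = [row[:] for row in S]
    for i in range(T):
        for k in range(T):
            if C[i][k]:
                for j in range(K):
                    if S[i][j] < S[k][j]:
                        S2[i][j] = 0.0
    return S2
```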
Step 56: using the matrix S updated in step 55, a similarity threshold V1 (e.g. V1 = 0.8) and a dissimilarity threshold V2 (e.g. V2 = 0.2), generate the initial label matrix L(0) by formula (5). In the matrix L(0), an entry of 1 indicates that FTi is a face of role Perj, an entry of -1 indicates that face sequence FTi is not a face of role Perj, and an entry of 0 indicates that the role corresponding to face sequence FTi cannot yet be determined from matching confidence alone. Every two-tuple <FTi,Perj> whose entry is 1 is added to the labeled-role set LFaces. This realizes role labeling of the two-tuples <FTi,Perj> with high matching confidence values and no conflicts;
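The thresholding of step 56 can be sketched as follows, assuming the conventional +1 / -1 / 0 encoding for "is the role", "is not the role" and "undetermined" (the patent's own symbols for these entries were rendered as images and are not reproduced here):

```python
def initial_labels(S, v1=0.8, v2=0.2):
    """Formula (5) as described in step 56: threshold the updated
    confidence matrix S into the initial label matrix L(0), with +1
    (sequence is the role), -1 (is not) and 0 (still undetermined)."""
    return [[1 if s >= v1 else (-1 if s <= v2 else 0) for s in row]
            for row in S]
```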
Step 57: based on the probability propagation matrix P obtained by formula (2) and the initial label matrix L(0) obtained by formula (5), apply the Label Propagation algorithm, that is, iteratively execute formula (6) and formula (7) to update the undetermined elements of the initial label matrix L(0) until the algorithm converges:
L(t+1) ≡ PL(t) (6)
By executing the label propagation algorithm, the existing high-confidence role labeling results are propagated, with certain probabilities, to the face sequences whose roles cannot yet be determined, according to the similarities between face sequences;
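Step 57's iteration of formula (6) can be sketched as below; re-clamping the already-decided entries of L(0) after every multiplication is our reading of the unshown formula (7), and is an assumption:

```python
import numpy as np

def label_propagation(P, L0, iters=100, tol=1e-6):
    """Iterate formula (6), L <- P @ L, re-clamping the entries already
    decided in L0 after each step (assumed content of formula (7)),
    until the matrix stops changing."""
    P = np.asarray(P, dtype=float)
    L = np.asarray(L0, dtype=float)
    clamped = L != 0                     # high-confidence entries stay fixed
    fixed = L.copy()
    for _ in range(iters):
        L_new = P @ L
        L_new[clamped] = fixed[clamped]
        if np.abs(L_new - L).max() < tol:
            return L_new
        L = L_new
    return L
```

In a two-sequence toy case where sequence 0 is already labeled and P links the two sequences, the label spreads to sequence 1 and the iteration converges in two steps.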
Step 58: let LΔ be the label matrix after the algorithm converges. Update, according to formula (8), the labeling confidence of the elements of LΔ that satisfy the stated condition, where α ∈ (0,1) is a threshold regulating the relative weight of labeling confidence and matching confidence, set to 0.5. Through formula (8), the similarity between face sequences and the matching confidence between face sequences and role faces are effectively fused;
Step 59: from the LΔ updated in step 58, successively find the element with the maximum value that satisfies condition (9), add the corresponding two-tuple <FTi,Perj> to the labeled-role set LFaces, and update the matrix LΔ according to formula (10). Repeat this search process until no element in LΔ satisfies condition (9). Here <FTi,Perj> denotes a two-tuple, composed of face sequence FTi and role Perj, with a high matching confidence value and no conflict, and Tlabel is a preset decision threshold, set to 0.5.
According to formulas (9) and (10), the face sequence and role name combination with the highest current confidence is chosen and labeled at each step. When no element of LΔ satisfies condition (9) any longer, the labeling process ends. The results in the labeled-role set LFaces are the role labeling results.
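Steps 58-59 amount to a greedy assignment loop. The sketch below assumes L already holds the fused confidences of formula (8), and treats formula (10) as suppressing the chosen sequence's other entries and barring its time-overlapping rivals from the same role; both readings are assumptions:

```python
def greedy_assign(L, C, t_label=0.5):
    """Greedy loop for steps 58-59: repeatedly pick the largest remaining
    entry of the fused confidence matrix L, label that (sequence, role)
    pair, then suppress the chosen sequence's row and bar every
    time-overlapping sequence (C[i][k] == 1) from the same role, until
    no entry exceeds the threshold T_label."""
    T, K = len(L), len(L[0])
    L = [row[:] for row in L]
    labeled = []                          # plays the part of the LFaces set
    while True:
        best = max((L[i][j], i, j) for i in range(T) for j in range(K))
        if best[0] <= t_label:
            return labeled
        _, i, j = best
        labeled.append((i, j))
        for jj in range(K):
            L[i][jj] = float('-inf')      # sequence i is now assigned
        for k in range(T):
            if C[i][k]:
                L[k][j] = float('-inf')   # conflicting sequences lose role j
```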
The particular embodiments described above further elaborate the purpose, technical scheme and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall be included within its scope of protection.
Claims (9)
1. A role labeling method based on search matching, characterized in that the method comprises the following steps:
S1: according to a list of objects to be labeled, obtain the set of objects to be labeled of the labeling scene and the information of all objects to be labeled;
S2: form a text keyword for each object to be labeled, and obtain the corresponding search result image set using an image search engine;
S3: carry out face detection and perceptual attribute analysis on the obtained search result images, remove the noise therein using the consistency of face perceptual attributes, and obtain the role face sets of the objects to be labeled that are closely related to the labeling scene;
S4: carry out face detection and tracking on the labeling scene, and obtain all face sequences therein;
S5: carry out role labeling of the labeling scene based on the analysis of the visual similarity between face sequences and the visual similarity between face sequences and the role faces of the objects to be labeled;
wherein said step S4 comprises the following steps:
step 41: carry out shot boundary detection on the labeling scene, and decompose the labeling scene into s shots according to the detection result;
step 42: carry out face detection and tracking in each of the s shots, and obtain all face sequences in the labeling scene.
2. The method according to claim 1, characterized in that said step S1 comprises the following steps:
step 11: retrieve web pages related to the labeling scene;
step 12: according to the retrieved web pages, obtain the set of objects to be labeled of the labeling scene and the information of each object to be labeled.
3. The method according to claim 2, characterized in that the information of the object to be labeled includes a real name and a role name.
4. The method according to claim 1, characterized in that said step S2 comprises the following steps:
step 21: form a text keyword for each object to be labeled in the set of objects to be labeled;
step 22: based on the text keywords, retrieve with an image search engine, for each object to be labeled, the several search result image sets corresponding to its text keywords.
5. The method according to claim 4, characterized in that the text keywords include the combination of the labeling scene title with the role name corresponding to the object to be labeled, as well as the real name of the object to be labeled; the search result image set corresponding to the real name of the object to be labeled is denoted Peri, and the search result image set corresponding to the combination of the labeling scene title with the role name of the object to be labeled is denoted CFi.
6. The method according to claim 1, characterized in that said step S3 comprises the following steps:
step 31: carry out face detection on the search result image sets, extract the visual attributes of each object to be labeled's face, locate M facial key areas of the face, extract an N-dimensional feature vector in each facial key area, and obtain an M×N-dimensional facial visual feature descriptor;
step 32: for the image set Peri of each object to be labeled, generate statistical histograms corresponding to the respective perceptual attributes, vote on the corresponding dimensions of the statistical histograms according to the occurrence of each perceptual attribute, and judge the significance of each perceptual attribute according to the voting result; if and only if all perceptual attributes of an object to be labeled are significant, the object is considered recognizable, and its corresponding perceptual attributes are taken as the role attributes of the object;
step 33: for each recognizable object to be labeled, carry out face clustering on its corresponding image set CFi based on the facial visual feature descriptor, and obtain the candidate role face set of the corresponding object to be labeled according to the occurrence ratio of its role attributes in each cluster category;
step 34: carry out image de-duplication on the candidate role face set of the object to be labeled;
step 35: carry out face de-duplication on the candidate role face set after image de-duplication, using the average visual similarity of the faces.
7. The method according to claim 6, characterized in that the perceptual attributes include gender, age and ethnicity.
8. The method according to claim 1, characterized in that said step S5 comprises the following steps:
step 51: extract color histogram features from all faces in each face sequence, and cluster based on this feature;
step 52: according to the clustering result and the overlap of the face sequences' appearance times, generate the collision matrix C;
step 53: calculate the visual similarity between face sequences, and obtain the probability propagation matrix P of face sequence similarities;
step 54: calculate the matching confidence matrix S between roles and face sequences, wherein the elements of S are the similarities between face sequences and role face sets;
step 55: update the matching confidence matrix S using the collision matrix C, to avoid simultaneously assigning high matching confidence to face sequences whose appearance times overlap;
step 56: generate the initial label matrix L(0) using the updated matching confidence matrix S, a similarity threshold V1 and a dissimilarity threshold;
step 57: based on the probability propagation matrix P and the initial label matrix L(0), update the undetermined elements of the initial label matrix L(0) by the label propagation algorithm until the algorithm converges;
step 58: let LΔ be the label matrix after the algorithm converges, and update the labeling confidence of the elements of LΔ, so as to fuse the similarity between face sequences and the matching confidence between face sequences and role faces;
step 59: successively find, in the updated label matrix LΔ, the element with the maximum value that satisfies a certain condition, and update the label matrix LΔ; repeat the above process until no element in LΔ satisfies the condition, labeling at each step the face sequence and role name combination with the highest current confidence.
9. The method according to claim 8, characterized in that the certain condition in step 59 is that the element indicates that the role corresponding to face sequence FTi cannot yet be determined from matching confidence alone; wherein Tlabel is a preset decision threshold, <FTi, Perj> denotes a two-tuple, composed of face sequence FTi and role Perj, with a high matching confidence value and no conflict, LFaces denotes the labeled-role set, Label(FTk) denotes the label of face sequence FTk, ci,k is the element in row i and column k of the collision matrix C, and ci,k = 1 indicates that the appearance times of face sequences FTi and FTk overlap.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410218854.7A CN103984738B (en) | 2014-05-22 | 2014-05-22 | Role labelling method based on search matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103984738A CN103984738A (en) | 2014-08-13 |
CN103984738B true CN103984738B (en) | 2017-05-24 |
Family
ID=51276711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410218854.7A Expired - Fee Related CN103984738B (en) | 2014-05-22 | 2014-05-22 | Role labelling method based on search matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103984738B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217008B (en) * | 2014-09-17 | 2018-03-13 | 中国科学院自动化研究所 | Internet personage video interactive mask method and system |
CN104778481B (en) * | 2014-12-19 | 2018-04-27 | 五邑大学 | A kind of construction method and device of extensive face pattern analysis sample storehouse |
CN105335726B (en) * | 2015-11-06 | 2018-11-27 | 广州视源电子科技股份有限公司 | Recognition of face confidence level acquisition methods and system |
CN105913275A (en) * | 2016-03-25 | 2016-08-31 | 哈尔滨工业大学深圳研究生院 | Clothes advertisement putting method and system based on video leading role identification |
CN105843949B (en) * | 2016-04-11 | 2019-07-16 | 麒麟合盛网络技术股份有限公司 | A kind of image display method and device |
CN106682094B (en) * | 2016-12-01 | 2020-05-22 | 深圳市梦网视讯有限公司 | Face video retrieval method and system |
CN106708806B (en) * | 2017-01-17 | 2020-06-02 | 科大讯飞股份有限公司 | Sample confirmation method, device and system |
CN107153817B (en) * | 2017-04-29 | 2021-04-27 | 深圳市深网视界科技有限公司 | Pedestrian re-identification data labeling method and device |
CN107273859B (en) * | 2017-06-20 | 2020-10-02 | 南京末梢信息技术有限公司 | Automatic photo marking method and system |
CN108228871A (en) | 2017-07-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | Facial image dynamic storage method and device, electronic equipment, medium, program |
CN107633048B (en) * | 2017-09-15 | 2021-02-26 | 国网重庆市电力公司电力科学研究院 | Image annotation identification method and system |
CN107909088B (en) * | 2017-09-27 | 2022-06-28 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and computer storage medium for obtaining training samples |
CN107886109B (en) * | 2017-10-13 | 2021-06-25 | 天津大学 | Video abstraction method based on supervised video segmentation |
CN108228845B (en) * | 2018-01-09 | 2020-10-27 | 华南理工大学 | Mobile phone game classification method |
JP2020035086A (en) * | 2018-08-28 | 2020-03-05 | 富士ゼロックス株式会社 | Information processing system, information processing apparatus and program |
CN109740623B (en) * | 2018-11-21 | 2020-12-04 | 北京奇艺世纪科技有限公司 | Actor screening method and device |
CN109933719B (en) * | 2019-01-30 | 2021-08-31 | 维沃移动通信有限公司 | Searching method and terminal equipment |
CN110135804B (en) * | 2019-04-29 | 2024-03-29 | 深圳市元征科技股份有限公司 | Data processing method and device |
CN110555117B (en) * | 2019-09-10 | 2022-05-31 | 联想(北京)有限公司 | Data processing method and device and electronic equipment |
CN110807108A (en) * | 2019-10-15 | 2020-02-18 | 华南理工大学 | Asian face data automatic collection and cleaning method and system |
CN111770299B (en) * | 2020-04-20 | 2022-04-19 | 厦门亿联网络技术股份有限公司 | Method and system for real-time face abstract service of intelligent video conference terminal |
CN111813660B (en) * | 2020-06-12 | 2021-10-12 | 北京邮电大学 | Visual cognition search simulation method, electronic equipment and storage medium |
CN113052079B (en) * | 2021-03-26 | 2022-01-21 | 重庆紫光华山智安科技有限公司 | Regional passenger flow statistical method, system, equipment and medium based on face clustering |
CN113283480B (en) * | 2021-05-13 | 2023-09-05 | 北京奇艺世纪科技有限公司 | Object identification method and device, electronic equipment and storage medium |
CN113792186B (en) * | 2021-08-16 | 2023-07-11 | 青岛海尔科技有限公司 | Method, device, electronic equipment and storage medium for name retrieval |
CN115482618A (en) * | 2022-08-10 | 2022-12-16 | 青岛民航凯亚系统集成有限公司 | Remote airplane boarding check auxiliary method based on face recognition |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1311677C (en) * | 2004-03-12 | 2007-04-18 | 冯彦 | Substitute method of role head of digital TV. program |
US8300953B2 (en) * | 2008-06-05 | 2012-10-30 | Apple Inc. | Categorization of digital media based on media characteristics |
KR20130000828A (en) * | 2011-06-24 | 2013-01-03 | 엘지이노텍 주식회사 | A method of detecting facial features |
CN102521340B (en) * | 2011-12-08 | 2014-09-03 | 中国科学院自动化研究所 | Method for analyzing TV video based on role |
CN102542292B (en) * | 2011-12-26 | 2014-03-26 | 湖北莲花山计算机视觉和信息科学研究院 | Method for determining roles of staffs on basis of behaviors |
CN102902821B (en) * | 2012-11-01 | 2015-08-12 | 北京邮电大学 | The image high-level semantics mark of much-talked-about topic Network Based, search method and device |
CN103309953B (en) * | 2013-05-24 | 2017-02-08 | 合肥工业大学 | Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers |
CN103793697B (en) * | 2014-02-17 | 2018-05-01 | 北京旷视科技有限公司 | The identity mask method and face personal identification method of a kind of facial image |
- 2014-05-22 CN CN201410218854.7A patent/CN103984738B/en not_active Expired - Fee Related
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170524