CN104217008B - Internet person video interactive annotation method and system - Google Patents

Internet person video interactive annotation method and system

Info

Publication number
CN104217008B
Authority
CN
China
Prior art keywords
face
name
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410475211.0A
Other languages
Chinese (zh)
Other versions
CN104217008A (en)
Inventor
陈智能
白锦峰
冯柏岚
黄向生
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410475211.0A
Publication of CN104217008A
Application granted
Publication of CN104217008B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G06F16/7837 - Retrieval using objects detected or recognised in the video content
    • G06F16/784 - Retrieval where the detected or recognised objects are people

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an interactive annotation method and system for Internet person videos. The method comprises: extracting the face sequences in the video to be annotated and the person names in its surrounding text; using each name as a text keyword and obtaining the corresponding set of web images of that person via a search engine; computing an importance score for each face sequence, a pairwise merge recommendation score for face sequence pairs, and the similarity between the face sequences and the person web images, and using these quantities to determine which face sequences, names and person web images are displayed during annotation; and producing the corresponding annotation actions through a variety of user interactions, thereby annotating the persons in the video. By mining multiple resources related to the video and the persons to be annotated, and by designing friendly and diverse modes of user interaction, the invention simplifies the annotation process and supports annotation decisions, effectively alleviating the problem that the annotation process stalls because the annotator does not recognize the person to be annotated, and substantially improving the efficiency and accuracy of person video annotation.

Description

Internet person video interactive annotation method and system
Technical field
The present invention relates to the technical field of intelligent video analysis, and in particular to an Internet person video interactive annotation method and system.
Background art
With the development of Internet technology and the popularity of online video sharing, large numbers of professional and amateur videos are produced, uploaded to the Internet, and browsed and watched by users all over the world. Because such videos are typically embedded in web pages and served through online players, they are collectively referred to as Internet videos. People are among the most watched subjects of Internet videos: video websites host many popular videos involving people, celebrities in particular, and celebrity names have always been an important component of the most frequent queries submitted to video search engines.
Although Internet person videos attract wide attention, finding videos of a person of interest in a large-scale Internet video library is not easy. Mainstream video search engines currently realize video retrieval by matching text keywords. For person videos, this approach has three shortcomings. 1) The surrounding text of an Internet video (for example its title, tags and user comments) is usually incomplete and noisy: a video in which a person appears is not necessarily tagged with that person's name, and conversely a video tagged with a name does not necessarily show that person; as a result, a search finds only part of the relevant videos, and a certain proportion of the results are noise. 2) The surrounding text describes the whole video rather than fragments within it; jumping directly to the segment in which a person appears, given only the name, is a service that mainstream video websites still cannot provide, although it would clearly be a great convenience for users browsing videos. 3) In the ranked result list, the videos shown first are often not the most relevant, because an accurate query relevance is hard to judge from name occurrence alone. Industry therefore urgently needs more intelligent and effective methods for retrieving, browsing and ranking person videos.
The key to solving the above problems is to annotate the faces appearing in a video with the corresponding names, in other words, to establish a mapping between the faces in a video and the names in its surrounding text; this task is generally referred to as face annotation. Although face detection and name recognition are comparatively mature technologies, face annotation, especially under unconstrained facial pose, expression, illumination and occlusion, remains a very challenging problem. In the past several years, effective face annotation methods have been proposed for specific types of video such as news broadcasts, films and TV series. Although these methods differ in realization, they essentially all follow the technical route of multimodal information fusion. First, they obtain the names of the main persons involved in a video (for example the leading actors of a film) from external channels such as news transcripts, speech transcriptions or the Internet, together with the script and subtitle text of the video; by using the news transcripts, or by aligning script and subtitles, they determine what a particular person says at a particular time point in the video. Combining this with the time points at which faces are detected in the video, they establish a preliminary mapping between faces and names, and then refine this mapping using the visual similarity between faces so as to realize the annotation. Because news transcripts, scripts and subtitle text usually provide rich and specific cues about names and person appearances, and the number of main persons involved in a film or TV series is usually limited, these methods can achieve fully automatic annotation of the main persons in particular news programs, films and TV series with fairly high accuracy.
Internet videos, however, are different from films and TV series. Although there is some textual information on the web page of an Internet video, this text is usually limited in quantity, not accurate enough, and poorly organized. Moreover, it is attached at the level of the whole video and, unlike subtitle text, carries no timestamp information. These characteristics mean that the above methods, which rely on mining rich text, are difficult to generalize directly to Internet videos. In addition, Internet video content covers an extremely wide range: the persons a video may involve come from all sectors of society and are extremely numerous; even restricted to celebrities, the number is not small. At present, fully automatic face annotation for large-scale open Internet video is still at an early stage; because satisfactory annotation performance is hard to reach, no mature methods or systems have yet emerged.
As massive numbers of Internet videos accumulate on video websites, and new videos arrive at an ever faster rate, person video annotation has become a problem placed before academia and industry that must be solved. Consequently, interactive annotation methods, which bring humans into the annotation loop with the goal of improving annotation accuracy, have begun to attract attention. For general visual concepts such as sky, grass and buildings, several effective interactive annotation methods have been proposed, but these methods cannot be applied directly to the problem of distinguishing and annotating different people. The reason is that manually annotating such general visual concepts is relatively easy to realize, since in most cases they can be distinguished by common knowledge alone; distinguishing and annotating different persons is another matter: even an experienced annotator personally recognizes only a very small fraction of the people in the world, and people cannot assign names to persons they do not know. If, as in existing interactive annotation systems, only the image or video frames containing the person and one or more candidate names were presented to the annotating user, the user would very likely not recognize the person to be annotated and could not annotate persons the way general visual concepts are annotated, even if all the persons to be annotated were celebrities. In interactive person annotation, and in annotating persons in video in particular, relevant results remain very rare.
When people see an unfamiliar person in an image or a video and want to know who he or she is, the solution they typically adopt is: find a name in the surrounding text, use the found name as a keyword for an image search engine, and then compare the result images returned by the search engine with the person seen in the image to judge who that person is. This scheme uses image retrieval based on text keywords. Although a few 'search by image' systems now exist, the search target here is images of a specific person, all result images need not be visually highly similar to the query image, and video faces vary greatly in visual appearance and are usually of low resolution, which also challenges the precision of 'search by image' systems; for this task, the text-keyword-based search method is therefore still mainly adopted. Since a large number of person images, celebrity images in particular, can be found through a search engine, this scheme is in many cases an effective way to help users identify a person they did not previously recognize.
This common practice can naturally be borrowed for the design of an interactive annotation method and system for person videos. When performing person annotation, an annotator will likewise run into unfamiliar persons and be forced to pause, resorting to external tools such as search engines to identify the person before the annotation can proceed. Since annotation and search-and-compare operations must be switched between frequently, this process is undoubtedly inefficient and cumbersome. If, instead, the names in the surrounding text of the video could be extracted by text parsing and visual analysis techniques, and the related person web images obtained and displayed accordingly during annotation; and if, at the same time, the faces in the video were analyzed, processed, and displayed in a form convenient for annotation, then the annotator would neither need to switch to a search engine to identify the person to be annotated, nor face raw frames: what the annotator sees are organized, friendly presented video face images on which annotation decisions are easier to make. This would undoubtedly simplify the annotation process and markedly improve the efficiency and accuracy of person video annotation. However, a search of published patent databases found no interactive annotation method or system specifically for persons in video; this background and understanding are precisely the motivation for the present invention.
Summary of the invention
For Internet person video annotation, the present invention addresses the situation in which the annotation process stalls because the annotator very likely does not recognize the person to be annotated. It proposes an Internet person video interactive annotation method and system which, by mining multiple resources related to the video to be annotated and the persons involved, and by designing friendly and diverse modes of user interaction, simplifies the annotation process, supports annotation decisions, improves the efficiency and accuracy of person video annotation, and in turn promotes better retrieval, browsing and ranking services for Internet person videos.
To achieve the above object, the present invention provides an Internet person video interactive annotation method, comprising the following steps:
S1, analyzing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the web image set of the person corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video.
The present invention also proposes an Internet person video interactive annotation system, comprising:
a device for analyzing the video to be annotated and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name in the name set as a text keyword and searching to obtain the web image set of the person corresponding to that name;
a device for computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images corresponding to the names, and for determining, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, the face sequences, names and person web images to be displayed when the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and interactively annotating the face sequences, thereby realizing the annotation of the video.
By mining multiple annotation-supporting resources related to the video to be annotated and the persons involved, and by correspondingly designing friendly and diverse modes of user interaction, the present invention simplifies the annotation process, supports annotation decisions, and effectively alleviates the problem that annotation stalls because the annotator does not recognize the person to be annotated. With the present invention, the efficiency and accuracy of Internet person video annotation can be substantially improved, in turn promoting better retrieval, browsing and ranking services for Internet person videos.
Brief description of the drawings
Fig. 1 is a flowchart of an Internet person video interactive annotation method according to an embodiment of the present invention;
Fig. 2 is a screenshot of an Internet person video interactive annotation system according to an embodiment of the present invention, with an explanation of the related modules.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
As shown in Fig. 1, the Internet person video interactive annotation method of the present invention comprises the following steps:
S1, analyzing the video, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the web image set of the person corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video.
According to a preferred embodiment of the present invention, the detailed procedure of step S1, which analyzes the video and extracts the face sequences in the video and the names in its surrounding text, is as follows:
Step S11, performing shot segmentation on the video, performing face detection and tracking on each resulting shot to obtain the face sequences in that shot, and combining the face sequences obtained from all shots into the face sequence set of the video.
In a specific implementation, shot boundary detection is first performed on the Internet person video to be annotated, and the video is decomposed into a set of shots according to the shot boundary points. The face detection function of the open-source computer vision library OpenCV is then called to perform frame-by-frame face detection on each shot. Next, detection-based tracking is used to group together the faces detected in different video frames that belong to the same person, generating face sequences. Repeating this face sequence generation process for all shots yields the set of all detected face sequences of the video, Ω_F = {F_1, ..., F_FN}, where FN denotes the number of face sequences.
The steps of detection-based tracking for face sequence generation are as follows. First, according to the face detection results, the color histogram feature of each face is extracted, and the pairwise similarities of the faces are computed from it. Then the pairwise similarities are sorted from large to small, and an agglomerative clustering method merges pairs of faces that satisfy the following four conditions: 1) the pairwise similarity of the two faces is greater than a preset merging threshold; 2) in the face set formed by all faces of the two clusters containing the two faces being merged, no two faces appear in the same video frame; 3) the interval between the appearances of the two faces is no more than 1 second; 4) the distance between the center coordinates of the two faces is no more than 2.5 times the face width. This merging process is repeated until no two faces satisfy all four conditions simultaneously, yielding the face clustering result. Finally, the faces belonging to the same cluster are sorted by time of appearance (video frame), faces missed by the detection process are supplemented by interpolation, and complete face sequences are generated. All faces in a face sequence belong to the same person.
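Purely as an illustration of the detection-based tracking just described, the following Python sketch implements the four merge conditions over a list of per-frame detections; the histogram similarity measure, the assumed frame rate, the merge threshold value and the data layout are assumptions of this sketch, not prescriptions of the embodiment.

```python
import itertools
import cv2
import numpy as np

FPS = 25           # assumed frame rate, so "1 second" equals FPS frames
MERGE_THR = 0.6    # assumed value for the preset merge threshold (condition 1)

def color_hist(face_img):
    """8x8x8 color histogram of a face crop, L1-normalized."""
    hist = cv2.calcHist([face_img], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist, norm_type=cv2.NORM_L1).flatten()

def track_faces(dets):
    """dets: list of dicts {frame, cx, cy, w, hist}. Returns face sequences
    as lists of detection indices sorted by frame."""
    sim = lambda i, j: cv2.compareHist(dets[i]["hist"], dets[j]["hist"],
                                       cv2.HISTCMP_CORREL)
    cluster = {i: {i} for i in range(len(dets))}    # detection id -> cluster
    for i, j in sorted(itertools.combinations(range(len(dets)), 2),
                       key=lambda p: -sim(*p)):
        if sim(i, j) <= MERGE_THR:                  # condition 1 fails; pairs
            break                                   # are sorted, so stop here
        ci, cj = cluster[i], cluster[j]
        if ci is cj:
            continue
        if {dets[k]["frame"] for k in ci} & {dets[k]["frame"] for k in cj}:
            continue                                # condition 2: shared frame
        if abs(dets[i]["frame"] - dets[j]["frame"]) > FPS:
            continue                                # condition 3: gap over 1 s
        if np.hypot(dets[i]["cx"] - dets[j]["cx"],
                    dets[i]["cy"] - dets[j]["cy"]) > 2.5 * dets[i]["w"]:
            continue                                # condition 4: centers too far
        merged = ci | cj                            # merge the two clusters
        for k in merged:
            cluster[k] = merged
    groups = {id(c): c for c in cluster.values()}.values()
    return [sorted(c, key=lambda k: dets[k]["frame"]) for c in groups]
```

Interpolation of missed detections, omitted from the sketch, would fill the frame gaps inside each returned sequence.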
The above describes the face sequence acquisition method of one embodiment of the present invention. Of course, face sequences may also be obtained by other face detection and tracking methods; the present invention imposes no restriction on how face sequences are acquired.
Step S12, obtaining the representative face images of each face sequence in the face sequence set.
One embodiment is as follows. For each face sequence, say F_k, where the subscript k denotes the k-th face sequence in the face sequence set, suppose it contains t faces. The color histogram features of these t faces are extracted, and the pairwise face similarity matrix T_k is computed from them, where T_k(i, j) is the similarity between the i-th and j-th faces (i and j natural numbers not exceeding t). The preference parameter is set to the average of all face similarities in T_k, and the Affinity Propagation clustering algorithm is applied to adaptively cluster the t faces. If the clustering produces |F_k| classes, the representative face image set of F_k can be expressed as the |F_k| face images, each being the face image closest to the center of its class.
Of course, the representative face images of a face sequence may also be obtained by other methods; the present invention imposes no restriction on how they are acquired.
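For illustration only, the clustering of step S12 can be realized with an off-the-shelf Affinity Propagation implementation; the use of scikit-learn, the negative-Euclidean-distance similarity and the function name below are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def representative_faces(features):
    """features: (t, d) array of per-face color histogram features of one
    face sequence F_k. Returns the indices of the exemplar faces."""
    diff = features[:, None, :] - features[None, :, :]
    sim = -np.linalg.norm(diff, axis=2)        # pairwise similarity matrix T_k
    ap = AffinityPropagation(affinity="precomputed",
                             preference=sim.mean(),   # mean of all similarities
                             random_state=0)
    labels = ap.fit_predict(sim)
    # One exemplar per class: the face closest to its class center, i.e. the
    # representative face images of F_k.
    return ap.cluster_centers_indices_, labels
```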
Step S13, collecting the surrounding text of the video and extracting person names from it.
The surrounding text of a video refers to the textual content related to the video on the web page where the Internet video resides, including but not limited to the video title, tags, descriptive text and user comments. Considering that different types of surrounding text differ in relevance and noise level, the present invention considers only the video title, the tags, and the user comments longer than 20 words.
Specifically, when the surrounding text is English (as on English video websites), a name extraction method based on matching against a Wikipedia biographical dictionary is employed. For each contiguous word sequence in the above text, starting from the first word, the method tests successively whether the phrase formed by the first n words (n < 4) constitutes a Wikipedia entry; if so, the Wikipedia entry with the largest n is retained, and the test continues from the (n+1)-th word. In this way, Wikipedia entries such as 'Barack Obama' and 'World Cup 2014' can be found in contiguous word sequences. Repeating the above parsing over the title, tags and comment set yields a set of Wikipedia entries. The method then verifies one by one whether these entries are person names. Specifically, it examines the category description section of the Wikipedia page of each entry and queries whether a description category of the form 'xxxx births' exists, where xxxx is a four- or three-digit year. If it exists, the entry is judged to be a person name; otherwise it is judged to be another kind of named entity and is discarded.
The above describes the processing of English text. When the surrounding text of the video is Chinese, Chinese word segmentation is first performed with the Chinese word segmentation tool ICTCLAS, and the above name extraction method based on Wikipedia biographical dictionary matching is then applied (the judgment criterion on the category description section correspondingly becomes the Chinese equivalent of the 'xxxx births' description category). Through the above processing, the name set related to the video, Ω_N = {N_1, ..., N_CN}, is obtained, where N_k denotes the k-th extracted name and CN denotes the number of extracted names.
Because the surrounding text of Internet videos is usually provided by the uploading users, its syntactic structure is loose, word collocations are free, and misspellings and abbreviations are common. The above name extraction method based on Wikipedia biographical dictionary matching does not depend on syntactic structure and has a certain tolerance to misspellings and abbreviations, making it particularly suitable for name extraction from the surrounding text of Internet videos. Of course, other name extraction methods may also be used; the present invention imposes no restriction on the name extraction method.
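For illustration, a greedy longest-match sketch of this extraction follows; it assumes the Wikipedia entry titles and their category lists have been preloaded into local dictionaries (for example from a dump), and the dictionary contents shown are placeholders.

```python
import re

WIKI_TITLES = {"barack obama", "world cup 2014"}               # placeholder
WIKI_CATEGORIES = {"barack obama": ["1961 births"]}            # placeholder
BIRTHS = re.compile(r"^\d{3,4} births$")                       # 'xxxx births'

def extract_names(text, max_n=3):
    words, names, i = text.split(), [], 0
    while i < len(words):
        best = None
        for n in range(1, max_n + 1):          # test n = 1, 2, 3 (n < 4)
            phrase = " ".join(words[i:i + n]).lower()
            if phrase in WIKI_TITLES:
                best = (n, phrase)             # keep the largest matching n
        if best is None:
            i += 1
            continue
        n, phrase = best
        # Keep the entry only if its categories contain an 'xxxx births' class.
        if any(BIRTHS.match(c) for c in WIKI_CATEGORIES.get(phrase, [])):
            names.append(phrase)
        i += n                                 # continue after the match
    return names
```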
Step S2 uses each name in the name set obtained in step S1 as a text keyword and searches to obtain the person web image set corresponding to that name. It specifically comprises the following steps:
Step S21, using each name in the name set obtained in step S1 as a text keyword, searching the web for images related to the text keyword and downloading them.
Specifically, an existing image search engine can be used, for example by calling the application programming interface provided by Google: the text keyword is submitted to the Google image search engine, and the search parameters are set to retrieve 64 images containing faces. Under this setting, the Google image search engine returns to the retrieval end the URLs (web addresses) of the top 64 person images in its ranked results, and the retrieval end then downloads the corresponding images from those URLs. That is, in the ideal case in which all images download normally, this step yields 64 search result images; in practice, each name typically yields between 50 and 64 downloaded images.
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected.
For example, the face detection function of the open-source computer vision library OpenCV can be called to detect faces in the successfully downloaded images. The detection result for an image can be: no face detected, or one or more faces detected. Since images in which multiple faces are detected usually also contain faces of persons other than the queried one, which would disturb the annotator's judgment during reference comparison, this step retains only the images in which exactly one face is detected; images with no detected face or with multiple detected faces are removed.
Step S23, repeating steps S21 and S22 for all names in the name set, obtaining the person web image set corresponding to each name in the name set.
The person web image sets can be denoted C_1, ..., C_CN, where C_k denotes all the person web images corresponding to name N_k.
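As a hedged sketch of steps S21 to S23, the following assumes a wrapper `search_image_urls` around whatever image search API is available (the concrete Google API call of the embodiment is not reproduced), and uses an OpenCV Haar cascade for the single-face filter.

```python
import urllib.request
import cv2
import numpy as np

def search_image_urls(keyword, count=64):
    """Assumed wrapper around an image search API; returns up to count URLs."""
    raise NotImplementedError("plug in the search engine of your choice")

CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def person_web_images(name):
    """Download candidate images for a name (S21) and keep only those in
    which exactly one face is detected (S22)."""
    kept = []
    for url in search_image_urls(name, 64):
        try:
            raw = urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            continue                           # failed downloads are skipped
        img = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)
        if img is None:
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if len(CASCADE.detectMultiScale(gray, 1.1, 5)) == 1:
            kept.append(img)                   # exactly one face: retain
    return kept

# Repeating person_web_images over the whole name set yields C_1, ..., C_CN (S23).
```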
For convenience of the subsequent description of the embodiment, the system composition of the present invention is briefly introduced first. Fig. 2 gives a screenshot of the interactive annotation system. As can be seen, the system interface is divided into four parts: the management region, the annotation region, the annotation reference region and the annotation history region. The management region lets the annotator interactively select the video to be annotated and loads the face sequences and related names of the selected video. The annotation region is further divided into a similar face merge/distinguish annotation subregion and a name-face association annotation subregion, which respectively display the current face sequence pair to be annotated, Q_i = <F_m, F_n>, and the current face sequence to be annotated, F_j, and execute the corresponding interactive annotation operations. In addition, the names most similar to the face sequence shown in the name-face association annotation subregion, each with its first six person web images, are displayed accordingly in the annotation reference region. The annotation history region on the far right displays the annotated name-face two-tuples in annotation order, with the most recently annotated two-tuple shown at the top. The annotation reference region and the annotation history region mainly serve as auxiliary information that helps the annotator make decisions.
Step S3 computes the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name, and determines, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, the face sequences, names and person web images to be displayed when the video is annotated. Step S3 comprises the following sub-steps:
Step S31, computing the saliency value of each face sequence in the face sequence set.
In a video, face sequences that appear for longer and whose faces are larger attract more attention, and are also more likely to belong to a key person of the video. The present invention refers to this property of a face sequence as its saliency, and proposes the following saliency calculation formula:
$$Sai(F_i) = e^{-size_\theta / size_i} + e^{-dura_\theta / dura_i} \qquad (1)$$
where size_i and dura_i are respectively the average face size and the appearance duration of face sequence F_i, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of appearance duration on the saliency. By formula (1), face sequences with long appearance times and large average faces receive large saliency values.
Step S32, computing the pairwise similarities between the face sequences in the face sequence set.
In a video, two face sequences whose appearance times overlap usually correspond to different persons, whereas two face sequences separated by a short interval may well be the same person split into different face sequences by shot changes and similar causes. Based on this understanding, the pairwise similarity of face sequences is computed from their pairwise visual similarity, the interval between their appearance times, and whether their appearance times overlap. The corresponding calculation formula is:
$$sim(F_i, F_j) = e^{-\Delta time_{i,j} / time_\theta} \cdot (1 - CO_{i,j}) \cdot vs(F_i, F_j) \qquad (2)$$
where time_θ is a threshold controlling the influence of the appearance time difference, and Δtime_{i,j} is the appearance time difference of face sequences F_i and F_j, calculated by the following formula (3):
$$\Delta time_{i,j} = \begin{cases} time_j^{beg} - time_i^{end}, & \text{if } time_i^{beg} \le time_j^{beg} \\ time_i^{beg} - time_j^{end}, & \text{if } time_j^{beg} \le time_i^{beg} \end{cases} \qquad (3)$$
In formula (3), time_i^{beg} and time_i^{end} are respectively the start and end times of the appearance of face sequence F_i; a small time value indicates that the face sequence appears in an earlier part of the video. In formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, and calculated as:
$$vs(F_i, F_j) = e^{-\min_{f_i^m \in F_i,\, f_j^n \in F_j,\, i \ne j} \| f_i^m - f_j^n \|} \qquad (4)$$
In formula (4), f_i^m is the facial feature vector of the m-th representative face of face sequence F_i.
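The sketch below transcribes formulas (2) to (4) directly; it assumes each face sequence is stored as a dict holding its start time, end time and an array of representative-face feature vectors, which is a data layout chosen for the sketch only.

```python
import numpy as np

def vs(feats_i, feats_j):
    """Formula (4): similarity of the two most similar representative faces.
    feats_*: (k, d) arrays of facial feature vectors."""
    d = np.linalg.norm(feats_i[:, None, :] - feats_j[None, :, :], axis=2)
    return float(np.exp(-d.min()))

def sim(seq_i, seq_j, time_theta):
    """Formulas (2)-(3). seq_*: dicts with keys 'beg', 'end', 'feats'."""
    if seq_i["beg"] <= seq_j["beg"]:
        dt = seq_j["beg"] - seq_i["end"]       # formula (3), first case
    else:
        dt = seq_i["beg"] - seq_j["end"]       # formula (3), second case
    co = 1.0 if dt < 0 else 0.0                # negative gap means overlap
    return np.exp(-max(dt, 0.0) / time_theta) * (1.0 - co) * vs(
        seq_i["feats"], seq_j["feats"])
```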
Step S33, computing the pairwise merge recommendation scores of the face sequences according to the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation.
Specifically, the following formula is used:
$$MS(F_i, F_j) = (1 - PM_{i,j}) \cdot sim(F_i, F_j) \qquad (5)$$
where PM_{i,j} indicates whether the pair of face sequences F_i and F_j has been 'skipped' or annotated as 'different' by the user during the annotation process: PM_{i,j} = 1 if so, and PM_{i,j} = 0 otherwise. By formula (5), pairs of face sequences that are highly similar and have not been 'skipped' or annotated as 'different' during user annotation are assigned large pairwise merge recommendation scores. On this basis, all face sequence pairs whose score is greater than or equal to a previously given threshold are arranged by MS(F_i, F_j) value from high to low, yielding the pairwise merge recommendation list Rank_MS, whose elements are Q_k = <F_i, F_j>, i ≠ j. During annotation, the similar face merge/distinguish annotation subregion of the system in Fig. 2 displays the face sequence pairs to be annotated according to Rank_MS.
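Continuing the same sketch, formula (5) and the construction of Rank_MS; here `pm` is the set of index pairs the user has 'skipped' or marked 'different', and `threshold` is the previously given score threshold.

```python
def merge_recommendation_list(seqs, time_theta, pm, threshold):
    """Formula (5): rank candidate pairs by merge recommendation score,
    reusing sim() from the previous sketch."""
    scored = []
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if (i, j) in pm:                   # PM_{i,j} = 1 zeroes the score
                continue
            s = sim(seqs[i], seqs[j], time_theta)
            if s >= threshold:
                scored.append(((i, j), s))
    scored.sort(key=lambda x: -x[1])           # Rank_MS, high to low
    return [pair for pair, _ in scored]
```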
Step S34, computing the importance score of each face sequence, using the face sequence saliency values obtained in step S31, the pairwise face sequence similarity scores obtained in step S32, and the user interaction information gathered during annotation.
The importance of a face sequence expresses the degree to which it deserves annotation, after taking into account multiple kinds of information about the face sequences, the video and the user interaction. It can be calculated by the following formula (6):
$$IS(F_i) = (1 - PA_i) \cdot (\overline{Sai_i} + \overline{AR_i}) \qquad (6)$$
where PA_i indicates whether face sequence F_i has been 'skipped' by the user during annotation: PA_i = 1 if so, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are respectively the saliency Sai_i and the accumulated relevance AR_i after min-max normalization, the latter being defined as:
$$AR_i = \sum_{j=1,\, j \ne i}^{FN} L_j \cdot sim(F_i, F_j) \qquad (7)$$
where L_j is the annotation state function of face sequence F_j: L_j = 1 if F_j has already been annotated, and L_j = 0 otherwise. By formula (6), face sequences that have large saliency, are similar to many already annotated faces, and have not been 'skipped' during user annotation are assigned large importance scores.
On this basis, the face sequences are arranged by importance score IS(F_i) from high to low, yielding the importance score list Rank_IS. During annotation, the name-face association annotation subregion of the system in Fig. 2 displays the face sequences to be annotated according to Rank_IS.
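A sketch of formulas (6) and (7) in the same vein, assuming a pairwise similarity matrix with zero diagonal and boolean masks for the L_j and PA_i flags.

```python
import numpy as np

def importance_order(sal, sim_mat, labeled, skipped):
    """sal: per-sequence saliency; sim_mat: pairwise sim (zero diagonal);
    labeled: mask L_j; skipped: mask PA_i. Returns indices in Rank_IS order."""
    ar = sim_mat @ labeled.astype(float)       # formula (7)
    def minmax(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    imp = (1.0 - skipped.astype(float)) * (minmax(sal) + minmax(ar))  # (6)
    return np.argsort(-imp)                    # high to low
```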
Step S35, computing the similarities between the face sequences in the face sequence set and the person web images in the person web image sets, and ranking by similarity, obtaining the ranked name list for each face sequence and the K most similar person web images for each name. In the present invention, K is set to 6.
This step mainly addresses the situation, frequently encountered in interactive annotation, in which the annotator does not recognize the person to be annotated: displaying the above names and person web images in the annotation system alleviates this problem. Specifically, once the face sequence to be shown in the name-face association annotation subregion is determined, the name most similar to it and that name's K most similar person web images are displayed for the annotator to compare against, assisting the decision on the name corresponding to the face sequence to be annotated. The calculation in this step comprises the following three sub-steps:
Step S351, computing the pairwise similarities between the face sequences in the face sequence set and the names in the name set. The visual characteristics of a name can be represented by its corresponding person web images. On this basis, the similarity between face sequence F_i and person web image set C_j is calculated by the following formula (8) and used as the similarity between face sequence F_i and name N_j:
$$vs(F_i, N_j) = vs(F_i, C_j) = \frac{1}{|C_j|} \sum_{n=1}^{|C_j|} vs(F_i, c_j^n) \qquad (8)$$
where
$$vs(F_i, c_j^n) = e^{-\min_{f_i^m \in F_i} \| f_i^m - c_j^n \|} \qquad (9)$$
and c_j^n is the facial feature vector of the n-th image in the person web image set C_j corresponding to name N_j.
Step S352, ranking the names according to the similarities computed in step S351. Generally, the larger the similarity vs(F_i, N_j) between face sequence F_i and name N_j, the larger the probability that F_i is a face of N_j. On this basis, the name set Ω_N is ranked by vs(F_i, N_j) value from high to low, yielding the name ranking Rank(F_i) corresponding to face sequence F_i.
Step S353, computing, for each name, the K person web images most similar to the face sequence. As can be seen from formula (9), the similarity between a person web image and a face sequence is represented by the similarity between that image and the most similar representative face of the face sequence. Therefore, for each pair of face sequence F_i and name N_j, the person web images in C_j are ranked by vs(F_i, c_j^n) value from high to low, and the K most similar images are retained, yielding the person web image list of face sequence F_i relative to name N_j, where K is set to 6.
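A sketch of the three sub-steps of step S35 under the same assumed feature layout; the ranked names and the top-K image indices per name are returned together.

```python
import numpy as np

def name_ranking(seq_feats, web_feats_by_name, k=6):
    """seq_feats: (m, d) representative-face features of one face sequence;
    web_feats_by_name: {name: (n, d) face features of that name's images C_j}."""
    ranked, topk = [], {}
    for name, c in web_feats_by_name.items():
        d = np.linalg.norm(seq_feats[:, None, :] - c[None, :, :], axis=2)
        vs_img = np.exp(-d.min(axis=0))        # formula (9), one value per image
        ranked.append((name, float(vs_img.mean())))    # formula (8)
        topk[name] = np.argsort(-vs_img)[:k]   # the K most similar web images
    ranked.sort(key=lambda x: -x[1])           # step S352: names high to low
    return ranked, topk
```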
According to a preferred embodiment of the present invention, the face sequence importance score list Rank_IS obtained in step S34, together with the similar name list Rank(F_i) and the similar person web image lists obtained in step S35 for each face sequence F_i, are used to produce the corresponding annotation actions through a variety of user interaction operations, thereby realizing the annotation of the persons in the video. Specifically, step S4 comprises the following steps:
Step S41, initializing the various resources involved in the annotation process.
The specific practice is:
S411, letting ULSets = Ω_F, i.e., initializing the set of unannotated face sequences to contain all face sequences of the video;
S412, automatically annotating the face sequence pairs Q_i = <F_m, F_n> that satisfy the condition of formula (10), and removing all pairs so annotated from the Rank_MS list (a sketch of this step is given after step S414 below):
$$Label(F_i) = Label(F_j), \quad \text{if } vs(F_i, F_j) \ge T_s \qquad (10)$$
where Label(F_i) denotes the name annotated on face sequence F_i, and T_s is a threshold expressing whether two face sequences are visually similar enough;
S413, taking out the top-ranked elements Q_i = <F_m, F_n> and F_j from Rank_MS and Rank_IS respectively, i.e., the face sequence pair with the current highest pairwise merge score and the face sequence with the current highest importance score, and displaying these resources in the annotation system;
S414, taking out the top-ranked name in Rank(F_j) together with the K images in its person web image list, and displaying these resources in the annotation system.
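The equal-label constraint of formula (10) in step S412 can be maintained, for example, with a union-find structure over face sequences, as sketched below; the data structure choice is an assumption of the sketch, not part of the embodiment.

```python
class LabelGroups:
    """Union-find: sequences in one group must receive the same label."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i
    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def auto_merge(n, vs_mat, ts):
    """S412: pairs with vs >= T_s (formula (10)) are constrained to share
    whatever label is later assigned."""
    groups = LabelGroups(n)
    for i in range(n):
        for j in range(i + 1, n):
            if vs_mat[i, j] >= ts:
                groups.union(i, j)             # Label(F_i) = Label(F_j)
    return groups
```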
Step S42, producing the corresponding annotation actions according to the various user interaction operations.
There are three classes of user interaction operations: 1) similar face merge/distinguish annotation operations, which annotate the face sequence pair Q_i = <F_m, F_n> displayed by the system as 'same' or 'different'; 2) name-face association annotation operations, which select a particular name to annotate face sequence F_j; 3) operations that select different names and their person web images for the annotator's reference. Of these three classes, the third is an auxiliary operation whose purpose is to provide information supporting the user's annotation decisions, while the second annotates the corresponding name on F_j and removes F_j from the unannotated face sequence set ULSets. The annotation actions corresponding to the three classes of interactive operations are respectively:
1) annotation actions for similar face merge/distinguish annotation operations:
a) if the user annotates Q_i with the 'same' option, then let Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user annotates Q_i with the 'different' option, then let Label(F_m) ≠ Label(F_n), and at the same time let PM_{m,n} = 1;
c) if the user selects the 'skip' option for Q_i, then let PM_{m,n} = 1;
2) annotation actions for name-face association annotation operations:
a) if the user selects name N_k to annotate F_j, then let ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects 'skip' for the annotation of F_j, then let PA_j = 1;
3) actions for name and person web image selection operations:
a) if the user clicks the 'previous' option, let k = k - 1 (when k > 1), and display the name ranked k-th in Rank(F_j) together with the K images in its person web image list;
b) if the user clicks the 'next' option, let k = k + 1 (when k < CN), and display the name ranked k-th in Rank(F_j) together with the K images in its person web image list.
Step S43, annotating the other unannotated face sequences using a label propagation algorithm.
The user's interactive annotation actions provide additional annotation cues. Therefore, a label propagation algorithm is used to automatically annotate the other unannotated face sequences F_i that satisfy the conditions described by formula (11) or (12), where T_s is the similarity threshold defined in formula (10).
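Formulas (11) and (12) appear as images in the source and are not reproduced here; as a hedged sketch only, a propagation rule in the spirit of formula (10) would spread an assigned label to unannotated sequences that are visually close enough.

```python
def propagate(labels, ulsets, vs_mat, ts):
    """labels: {seq index: name}; ulsets: set of unannotated indices;
    vs_mat: pairwise visual similarity; ts: the threshold of formula (10)."""
    changed = True
    while changed:
        changed = False
        for i in list(ulsets):
            for j, name in list(labels.items()):
                if i != j and vs_mat[i, j] >= ts:
                    labels[i] = name           # propagate the label to F_i
                    ulsets.discard(i)
                    changed = True
                    break
    return labels
```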
Step S44, pruning and reordering the pairwise merge recommendation list and the importance score list, deciding the resources to be presented in the next round of user annotation.
Through steps S42 and S43, some face sequences in the pairwise merge recommendation list Rank_MS and in the importance score list Rank_IS have been annotated. According to the annotation results, this step prunes and reorders Rank_MS and Rank_IS, deciding the resources presented in the next round of user annotation. The specific practices of pruning and reordering are respectively:
1) Pruning: the elements Q_i = <F_m, F_n> and F_j satisfying the conditions of formulas (13), (14) or (15) are removed from Rank_MS and Rank_IS respectively; in particular,
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } Label(F_m) = Label(F_n) \qquad (14)$$
$$Rank_{IS} = Rank_{IS} \setminus F_j, \quad \text{if } F_j \notin ULSets \qquad (15)$$
2) Reordering: for the remaining elements of Rank_MS and Rank_IS, the pairwise merge recommendation scores and the importance scores are recalculated using formulas (5) and (6) respectively, and the lists are re-sorted by score to regenerate Rank_MS and Rank_IS, which serve as the basis for resource display in the next round of interactive annotation.
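A sketch of this pruning and reordering; it implements conditions (14) and (15) shown above (formula (13) is not reproduced in the source and is omitted), and takes the rescoring functions of formulas (5) and (6) as callables.

```python
def next_round(rank_ms, rank_is, labels, ulsets, rescore_ms, rescore_is):
    """Prune resolved entries, then re-sort both lists by fresh scores."""
    rank_ms = [(m, n) for (m, n) in rank_ms
               if not (m in labels and labels.get(m) == labels.get(n))]  # (14)
    rank_is = [j for j in rank_is if j in ulsets]                        # (15)
    rank_ms.sort(key=lambda p: -rescore_ms(*p))    # formula (5)
    rank_is.sort(key=lambda j: -rescore_is(j))     # formula (6)
    return rank_ms, rank_is
```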
Step S45, repeating steps S42 to S44 until all unannotated face sequences have been annotated (i.e., ULSets = ∅), or until the user actively exits the annotation process.
The specific embodiments described above further elaborate the object, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (15)

1. An Internet person video interactive annotation method, characterized in that it comprises the following steps:
S1, analyzing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the person web image set corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video,
wherein step S3 comprises the following steps:
Step S31, computing the saliency value of each face sequence in the face sequence set;
Step S32, computing the pairwise similarities between the face sequences in the face sequence set;
Step S33, computing the pairwise merge recommendation scores of the face sequences according to the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation;
Step S34, computing the importance score of each face sequence, using the face sequence saliency values obtained in step S31, the pairwise face sequence similarity scores obtained in step S32, and the user interaction information gathered during annotation;
Step S35, computing the similarities between the face sequences in the face sequence set and the person web images in the person web image sets, and ranking by similarity, obtaining the ranked name list for each face sequence and the K most similar person web images for each name;
and step S4 comprises the following steps:
Step S41, initializing the various resources involved in the annotation process;
Step S42, producing the corresponding annotation actions according to the various user interaction operations;
Step S43, annotating the other unannotated face sequences using a label propagation algorithm;
Step S44, pruning and reordering the pairwise merge recommendation list and the importance score list, deciding the resources to be presented in the next round of user annotation;
Step S45, repeating steps S42 to S44 until all unannotated face sequences have been annotated,
wherein in step S34 the importance score of a face sequence is calculated by the following formula (1):
$$IS(F_i) = (1 - PA_i) \cdot (\overline{Sai_i} + \overline{AR_i}) \qquad (1)$$
where PA_i indicates whether face sequence F_i has been 'skipped' by the user during annotation: PA_i = 1 if so, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are respectively the saliency Sai_i and the accumulated relevance AR_i after min-max normalization, the latter being defined as:
$$AR_i = \sum_{j=1,\, j \ne i}^{FN} L_j \cdot sim(F_i, F_j) \qquad (2)$$
where FN denotes the number of face sequences, sim(F_i, F_j) denotes the similarity of face sequences F_i and F_j, and L_j is the annotation state function of face sequence F_j: L_j = 1 if F_j has already been annotated, and L_j = 0 otherwise;
the face sequences are arranged by importance score IS(F_i) from high to low, yielding the importance score list Rank_IS;
and the various user interaction operations in step S42 include: 1) similar face merge/distinguish annotation operations, which annotate the face sequence pair Q_i = <F_m, F_n> displayed by the system as 'same' or 'different'; 2) name-face association annotation operations, which select a particular name to annotate face sequence F_j; 3) interactive operations that select and display different names and their person web images.
2. The method according to claim 1, characterized in that step S1 comprises the following steps:
Step S11, performing shot segmentation on the video, performing face detection and tracking on each resulting shot to obtain the face sequences in that shot, and combining the face sequences obtained from all shots into the face sequence set of the video;
Step S12, obtaining the representative face images of each face sequence in the face sequence set;
Step S13, collecting the surrounding text of the video and extracting person names from it.
3. The method according to claim 1, characterized in that step S2 comprises the following steps:
Step S21, using each name in the name set obtained in step S1 as a text keyword, searching the web for images related to the text keyword and downloading them;
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected;
Step S23, repeating steps S21 and S22 for all names in the name set, obtaining the person web image set corresponding to each name in the name set.
4. The method according to claim 1, characterized in that in step S31 the saliency of face sequence F_i is calculated by the following formula (3):
$$Sai(F_i) = e^{-size_\theta / size_i} + e^{-dura_\theta / dura_i} \qquad (3)$$
where size_i and dura_i are respectively the average face size and the appearance duration of face sequence F_i, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of appearance duration on the saliency.
5. The method according to claim 1, characterized in that in step S32 the pairwise similarity of face sequences is calculated by the following formula (4):
$$sim(F_i, F_j) = e^{-\Delta time_{i,j} / time_\theta} \cdot (1 - CO_{i,j}) \cdot vs(F_i, F_j) \qquad (4)$$
where time_θ is a threshold controlling the influence of the appearance time difference, and Δtime_{i,j} is the appearance time difference of face sequences F_i and F_j, calculated by the following formula (5):
$$\Delta time_{i,j} = \begin{cases} time_j^{beg} - time_i^{end}, & \text{if } time_i^{beg} \le time_j^{beg} \\ time_i^{beg} - time_j^{end}, & \text{if } time_j^{beg} \le time_i^{beg} \end{cases} \qquad (5)$$
in formula (5), time_i^{beg} and time_i^{end} are respectively the start and end times of the appearance of face sequence F_i, and a small time value indicates that the face sequence appears in an earlier part of the video;
in formula (4), CO_{i,j} is a binary function indicating whether the appearance times of face sequences F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, calculated as:
$$vs(F_i, F_j) = e^{-\min_{f_i^m \in F_i,\, f_j^n \in F_j,\, i \ne j} \| f_i^m - f_j^n \|} \qquad (6)$$
in formula (6), f_i^m is the facial feature vector of the m-th representative face of face sequence F_i.
6. The method according to claim 5, characterized in that in step S33 the pairwise merge recommendation score of face sequences is calculated by the following formula (7):
$$MS(F_i, F_j) = (1 - PM_{i,j}) \cdot sim(F_i, F_j) \qquad (7)$$
where PM_{i,j} indicates whether the pair of face sequences F_i and F_j has been 'skipped' or annotated as 'different' by the user during the annotation process: PM_{i,j} = 1 if so, and PM_{i,j} = 0 otherwise; by formula (7), pairs of face sequences that are highly similar and have not been 'skipped' or annotated as 'different' during user annotation are assigned large pairwise merge recommendation scores; on this basis, all face sequence pairs whose score is greater than or equal to a previously given threshold are arranged by MS(F_i, F_j) value from high to low, yielding the pairwise merge recommendation list Rank_MS, whose elements are Q_k = <F_i, F_j>, i ≠ j.
7. The method according to claim 6, characterized in that step S35 comprises the following steps:
Step S351, computing the pairwise similarities between the face sequences in the face sequence set and the names in the name set;
Step S352, ranking the names according to the similarities computed in step S351;
Step S353, computing, for each name, the K person web images most similar to the face sequence.
8. according to the method for claim 7, it is characterised in that
The step S351 calculates face sequence F by equation below (8)iWith personage's network image set CjSimilitude, be used in combination The similitude is as face sequence FiWith name NjSimilitude:
<mrow> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>N</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mo>|</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>|</mo> </mrow> </mfrac> <munderover> <mo>&amp;Sigma;</mo> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mo>|</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>|</mo> </mrow> </munderover> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msubsup> <mi>c</mi> <mi>j</mi> <mi>n</mi> </msubsup> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow>
Wherein
$$vs(F_i, c_j^n) = e^{-\min_{f_i^m \in F_i} \left\| f_i^m - c_j^n \right\|} \qquad (9)$$
In formula (9), c_j^n is the facial feature vector of the n-th image in person web-image set C_j.
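Formulas (8) and (9) can be sketched as follows, assuming F_i is an array of representative-face features and C_j an array of per-image facial features; the helper names are illustrative:

```python
import numpy as np

def vs_face_to_image(Fi, c_jn):
    """Formula (9): similarity between face sequence Fi (shape (m, d))
    and one web image's facial feature vector c_jn (shape (d,)),
    via the closest representative face."""
    dists = np.linalg.norm(Fi - c_jn[None, :], axis=-1)
    return float(np.exp(-dists.min()))

def vs_face_to_name(Fi, Cj):
    """Formula (8): average the per-image similarities over the person
    web-image set Cj (shape (|Cj|, d)) to score Fi against name Nj."""
    return float(np.mean([vs_face_to_image(Fi, c) for c in Cj]))
```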
9. The method according to claim 8, characterised in that
in step S352, for each face sequence F_i, the names are ranked by vs(F_i, N_j) value from high to low, yielding a name list Rank(F_i) over all names, where CN denotes the number of extracted names.
10. The method according to claim 9, characterised in that
in step S353, for each pair of face sequence F_i and name N_j, the person web images in C_j are ranked by vs(F_i, c_j^n) value from high to low and the K most similar images are retained, yielding the person web-image list corresponding to F_i and N_j.
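Steps S352 and S353 then reduce to sorting by these scores. A sketch reusing the helpers above, with illustrative names:

```python
import numpy as np

def rank_names(Fi, image_sets):
    """Step S352: names sorted by vs(Fi, Nj) from high to low.
    `image_sets` maps each name to its web-image feature array Cj."""
    scores = {name: vs_face_to_name(Fi, Cj)
              for name, Cj in image_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_images(Fi, Cj, K):
    """Step S353: the K web images of one name most similar to Fi,
    ranked by vs(Fi, c_j^n) from high to low."""
    scores = np.array([vs_face_to_image(Fi, c) for c in Cj])
    order = np.argsort(scores)[::-1][:K]
    return Cj[order]
```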
11. The method according to claim 10, characterised in that step S41 includes:
S411: initialize ULSets as the set of all face sequences, where ULSets denotes the set of face sequences that have not yet been labeled;
S412: automatically label the face sequence combinations Q_i = ⟨F_m, F_n⟩ that satisfy the condition shown in formula (10), and remove all labeled combinations from the Rank_MS list:
$$Label(F_i) = Label(F_j), \quad \text{if } vs(F_i, F_j) \geq T_s \qquad (10)$$
where Label(F_i) denotes the name corresponding to face sequence F_i, Label(F_j) denotes the name corresponding to face sequence F_j, and T_s is a threshold indicating whether two face sequences are sufficiently similar visually;
S413: take the top-ranked elements Q_i = ⟨F_m, F_n⟩ and F_j from Rank_MS and Rank_IS respectively, i.e. the face sequence combination with the current highest pairwise merging score and the face sequence with the highest importance score, and display these resources in the annotation system;
S414: take the top-ranked name in Rank(F_j) and the K images in the corresponding person web-image list, and display these resources in the annotation system.
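A simplified sketch of one round of steps S412-S414, assuming precomputed pair similarities and list-of-pairs data structures; it mirrors the control flow of the claim, with illustrative names:

```python
def prepare_round(rank_ms, rank_is, vs_pairs, merged, Ts):
    """rank_ms: list of ((m, n), score) sorted high to low;
    rank_is: face-sequence ids sorted by importance;
    vs_pairs: dict mapping (m, n) to vs(Fm, Fn);
    merged: set collecting pairs auto-marked as the same person."""
    remaining = []
    for pair, score in rank_ms:
        if vs_pairs[pair] >= Ts:
            merged.add(pair)     # S412 / formula (10): same-person merge
        else:
            remaining.append((pair, score))
    # S413: top merge candidate and most important face sequence
    # are surfaced for display in the annotation system.
    top_pair = remaining[0][0] if remaining else None
    top_face = rank_is[0] if rank_is else None
    return remaining, top_pair, top_face
```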
12. The method according to claim 11, characterised in that the annotation behaviors corresponding to the various user interactions in step S42 are respectively:
1) merging/distinguishing behaviors corresponding to the similar-face marking operation:
a) if the user marks Q_i with the "same" option, let Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user marks Q_i with the "different" option, let Label(F_m) ≠ Label(F_n), and at the same time let PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, let PM_{m,n} = 1;
2) name labeling behaviors corresponding to the face-name association operation:
a) if the user chooses to label F_j with name N_k, let ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects the "skip" option for F_j, let PA_j = 1;
3) behaviors corresponding to the name and person web-image selection operations:
a) if the user clicks the "previous" option, let k = k − 1 (when k > 1), and display the k-th ranked name and the K images in its corresponding person web-image list;
b) if the user clicks the "next" option, let k = k + 1 (when k < CN), and display the k-th ranked name and the K images in its corresponding person web-image list.
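The annotation behaviors of claim 12 amount to small state updates. A sketch, with `labels`, `penalized` and `skipped_faces` standing in for Label(·), PM and PA (all names illustrative):

```python
def handle_pair_mark(action, pair, labels, penalized):
    """Item 1): "same" merges labels; "different" and "skip" set
    PM_mn = 1 so the pair is not recommended again (the label
    inequality of the "different" case is tracked via `penalized`)."""
    m, n = pair
    if action == "same":
        # Merge to whichever label already exists (None if neither yet).
        labels[m] = labels[n] = labels.get(m) or labels.get(n)
    elif action in ("different", "skip"):
        penalized.add((m, n))

def handle_name_mark(action, face, name, labels, unlabeled, skipped_faces):
    """Item 2): labeling Fj with Nk removes it from ULSets;
    skipping sets PA_j = 1."""
    if action == "label":
        unlabeled.discard(face)
        labels[face] = name
    elif action == "skip":
        skipped_faces.add(face)
```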
13. The method according to claim 12, characterised in that in step S43, the specific practice of automatically labeling other unlabeled face sequences F_i that satisfy certain conditions is as shown in formula (11) or (12):
$$\begin{cases} Label(F_i) = N_k \\ ULSets = ULSets \setminus F_i \end{cases} \quad \text{if } \begin{cases} F_i \in ULSets \\ vs(F_i, F_j) \geq T_s \\ Label(F_j) = N_k \end{cases} \qquad (11)$$
$$Label(F_i) = Label(F_j), \quad \text{if } \begin{cases} F_i \in ULSets \\ F_j \in ULSets \\ vs(F_i, F_j) \geq T_s \end{cases} \qquad (12)$$
where T_s is the similarity threshold defined in formula (10).
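A sketch of the label propagation of formula (11), assuming face sequences are referenced by identifiers and `vs` is a similarity callable (illustrative names); formula (12) would analogously equate the labels of two sufficiently similar unlabeled sequences:

```python
def propagate_labels(labeled_face, name, unlabeled, labels, vs, Ts):
    """Formula (11): after Fj is labeled Nk, every unlabeled Fi with
    vs(Fi, Fj) >= Ts inherits the label and leaves ULSets."""
    for face in list(unlabeled):
        if vs(face, labeled_face) >= Ts:
            labels[face] = name          # Label(Fi) = Nk
            unlabeled.discard(face)      # ULSets = ULSets \ Fi
```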
14. The method according to claim 13, characterised in that in step S44, according to the annotation results, the specific practice of pruning and reordering Rank_MS and Rank_IS is:
1) pruning: delete from Rank_MS and Rank_IS respectively the elements Q_i = ⟨F_m, F_n⟩ and F_j that satisfy the conditions of the following formulas (13), (14) or (15):
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } \begin{cases} F_m \notin ULSets \\ F_n \notin ULSets \end{cases} \qquad (13)$$
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } Label(F_m) = Label(F_n) \qquad (14)$$
$$Rank_{IS} = Rank_{IS} \setminus F_j, \quad \text{if } F_j \notin ULSets \qquad (15)$$
2) reordering: for the elements remaining in Rank_MS and Rank_IS, recalculate their pairwise merging recommendation scores and importance scores using formulas (7) and (1), and regenerate Rank_MS and Rank_IS accordingly, to serve as the basis for resource display in the next round of interactive annotation.
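The pruning and reordering of claim 14 can be sketched as follows, with `merge_score` and `importance_score` standing in for formulas (7) and (1) (all names illustrative):

```python
def prune_and_rerank(rank_ms, rank_is, unlabeled, labels,
                     merge_score, importance_score):
    """Prune per formulas (13)-(15), then recompute scores and re-sort.
    rank_ms: list of ((m, n), score); rank_is: face-sequence ids."""
    def same_label(m, n):
        return labels.get(m) is not None and labels.get(m) == labels.get(n)

    # (13): both members already labeled; (14): members share a label.
    kept_pairs = [(m, n) for (m, n), _ in rank_ms
                  if (m in unlabeled or n in unlabeled)
                  and not same_label(m, n)]
    # (15): drop already-labeled sequences from the importance list.
    kept_faces = [f for f in rank_is if f in unlabeled]
    # Reorder with freshly recomputed scores for the next round.
    rank_ms = sorted(((p, merge_score(*p)) for p in kept_pairs),
                     key=lambda item: item[1], reverse=True)
    rank_is = sorted(kept_faces, key=importance_score, reverse=True)
    return rank_ms, rank_is
```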
15. An internet person video interactive annotation system employing the internet person video interactive annotation method of any one of claims 1-14, characterised by comprising:
a device for analyzing the video to be annotated and extracting the face sequence set in the video and the name set in the video's surrounding text;
a device for using the names in the name set as text keywords and searching to obtain the person web-image sets corresponding to the names;
a device for calculating the importance scores of the face sequences, the pairwise merging recommendation scores of the face sequences, and the similarity scores between the face sequences and the person web images corresponding to the names, and for determining, according to the importance scores, the pairwise merging recommendation scores and the similarity scores, the face sequences, names and person web images to be shown when the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and interactively annotating the face sequences, thereby realizing the annotation of persons in the video.
CN201410475211.0A 2014-09-17 2014-09-17 Internet personage video interactive mask method and system Active CN104217008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410475211.0A CN104217008B (en) 2014-09-17 2014-09-17 Internet personage video interactive mask method and system

Publications (2)

Publication Number Publication Date
CN104217008A CN104217008A (en) 2014-12-17
CN104217008B (en) 2018-03-13

Family

ID=52098498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410475211.0A Active CN104217008B (en) 2014-09-17 2014-09-17 Internet personage video interactive mask method and system

Country Status (1)

Country Link
CN (1) CN104217008B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809096A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Figure labeling method and terminal
CN104796781B (en) * 2015-03-31 2019-01-18 小米科技有限责任公司 Video clip extracting method and device
US10405045B2 (en) * 2015-12-14 2019-09-03 Google Llc Systems and methods for estimating user attention
CN106227836B (en) * 2016-07-26 2020-07-14 上海交通大学 Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN109214247B (en) * 2017-07-04 2022-04-22 腾讯科技(深圳)有限公司 Video-based face identification method and device
CN107480236B (en) * 2017-08-08 2021-03-26 深圳创维数字技术有限公司 Information query method, device, equipment and medium
CN107832662B (en) * 2017-09-27 2022-05-27 百度在线网络技术(北京)有限公司 Method and system for acquiring image annotation data
CN108882033B (en) * 2018-07-19 2021-12-14 上海影谱科技有限公司 Character recognition method, device, equipment and medium based on video voice
CN111046235B (en) * 2019-11-28 2022-06-14 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111144306A (en) * 2019-12-27 2020-05-12 联想(北京)有限公司 Information processing method, information processing apparatus, and information processing system
CN111126069B (en) * 2019-12-30 2022-03-29 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111639599B (en) * 2020-05-29 2024-04-02 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739428A (en) * 2008-11-10 2010-06-16 中国科学院计算技术研究所 Method for establishing index for multimedia
CN102629275A (en) * 2012-03-21 2012-08-08 复旦大学 Face and name aligning method and system facing to cross media news retrieval
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A semi-automatic photo person annotation system based on a friendly interaction mode; Zhang Jie et al.; Proceedings of the 8th Joint Conference on Harmonious Human-Machine Environment (HHME2012), NCMT; 2014-05-30; pp. 1, 4-6 *
Design and implementation of a web image annotation framework based on user interaction; Gao Xinxin; China Master's Theses Full-text Database, Information Science and Technology; 2011-03-15 (No. 3); I138-1075 *

Similar Documents

Publication Publication Date Title
CN104217008B (en) Internet personage video interactive mask method and system
CN110968699B (en) Logic map construction and early warning method and device based on fact recommendation
Arulanandam et al. Extracting crime information from online newspaper articles
US8935197B2 (en) Systems and methods for facilitating open source intelligence gathering
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
Foley et al. Learning to extract local events from the web
CN106462640B (en) Contextual search of multimedia content
CN103544266B (en) A kind of method and device for searching for suggestion word generation
Nguyen et al. LifeSeeker 3.0: An Interactive Lifelog Search Engine for LSC'21
CN106204156A (en) A kind of advertisement placement method for network forum and device
US20120323905A1 (en) Ranking data utilizing attributes associated with semantic sub-keys
CN113722478B (en) Multi-dimensional feature fusion similar event calculation method and system and electronic equipment
CN109446399A (en) A kind of video display entity search method
Jahagirdar et al. Watching the news: Towards videoqa models that can read
US20120317141A1 (en) System and method for ordering of semantic sub-keys
CN103823868B (en) Event recognition method and event relation extraction method oriented to on-line encyclopedia
CN109783612B (en) Report data positioning method and device, storage medium and terminal
CN103020311B (en) A kind of processing method of user search word and system
US9875298B2 (en) Automatic generation of a search query
CN106874365A (en) Tracking based on social event on Social Media platform
CN105205075B (en) From the name entity sets extended method of extension and recommended method is inquired based on collaboration
Guo et al. Multi-modal identification of state-sponsored propaganda on social media
Arya et al. Predicting behavioural patterns in discussion forums using deep learning on hypergraphs
CN114880572A (en) Intelligent news client recommendation system
US20120317103A1 (en) Ranking data utilizing multiple semantic keys in a search query

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant