CN104217008A - Interactive annotation method and system for Internet person videos

Info

Publication number
CN104217008A (application CN201410475211.0A; granted as CN104217008B)
Authority
CN
China
Prior art keywords
people
face
name
sequence
video
Legal status
Granted
Application number
CN201410475211.0A
Other languages
Chinese (zh)
Other versions
CN104217008B (en)
Inventor
陈智能
白锦峰
冯柏岚
黄向生
徐波
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410475211.0A
Publication of CN104217008A
Application granted
Publication of CN104217008B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval using metadata automatically derived from the content
    • G06F 16/7837 Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F 16/784 Retrieval using metadata automatically derived from the content, the detected or recognised objects being people

Abstract

The invention discloses an interactive annotation method and system for Internet person videos. The method comprises the following steps: extracting the face sequences of a video to be annotated and the person names in its surrounding text; using each name as a text keyword and obtaining the corresponding set of person web images through a search engine; computing the saliency score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarities between face sequences and person web images; determining, from these scores, which face sequences, names and person web images are displayed during annotation; and generating the corresponding annotation actions from several types of user interaction, thereby annotating the people in the video. By mining several kinds of resources related to the video to be annotated and to its people, and by designing friendly and varied modes of user interaction, the method simplifies the annotation process and assists annotation decisions; it effectively alleviates the problem that annotation stalls because the annotator does not recognise the people to be annotated, and it can substantially improve both the efficiency and the precision of person-video annotation.

Description

Interactive annotation method and system for Internet person videos
Technical field
The present invention relates to the field of intelligent video analysis, and in particular to an interactive annotation method and system for Internet person videos.
Background art
With the development of Internet technology and the popularity of online video sharing, large numbers of professional and amateur videos are produced, uploaded to the Internet, and browsed and watched by users all over the world. Because such videos are usually embedded in web pages and served for online playback, they are collectively called Internet videos. People are one of the subjects of Internet video that attract the most attention: video websites host large numbers of popular videos involving people, celebrities in particular, and celebrity names have long been an important component of the hot queries submitted to video search engines.
Although Internet person videos receive wide attention, finding the videos of a person of interest in a large-scale Internet video library is not easy. Mainstream video search engines currently realise video retrieval by matching text keywords. For person videos, this search method has three deficiencies. 1) The surrounding text of an Internet video (for example its title, tags and user comments) is usually incomplete and noisy: a video in which a person appears does not necessarily mention that person's name, and a video whose text mentions a name does not necessarily show that person; as a result, search can find only part of the relevant videos, and the result list usually contains a certain proportion of noise. 2) The surrounding text describes the video as a whole rather than its segments; jumping directly, given only a name, to the segments in which that person appears is still not a service mainstream video websites can provide, although such a service would clearly be of great convenience to users browsing videos. 3) The videos ranked highest in the result list are usually not the most relevant ones, because an accurate relevance judgement is hard to obtain from the mere presence of a name. Industry therefore urgently needs more intelligent and effective methods for retrieving, browsing and ranking person videos.
The key to the above problems is to annotate each face appearing in a video with its corresponding person name; in other words, to establish a mapping from the faces in the video to the names in the surrounding text. This task is generally called face annotation. Although face detection and name recognition are comparatively mature technologies, face annotation remains a challenging problem, especially under unconstrained pose, expression, illumination and occlusion. In the past several years, some effective face annotation methods have been proposed for particular video genres such as news video, films and TV series. Although these methods differ in their realisation, they essentially all adopt a multimodal information fusion approach. First, the names of the principal people of a video (for example the leading actors of a film) are obtained from external channels such as news transcripts, speech transcription text or the Internet, and by exploiting news transcripts or by aligning the script with the subtitle text, the content spoken by a particular person at a particular time point of the video is obtained. From the time points at which faces are detected in the video, a preliminary mapping between faces and names is established; this mapping is then refined using the visual similarity between faces, realising the annotation. Because news transcripts, scripts and subtitle text usually provide rich and concrete cues about names and person appearances, and the number of principal people in films and TV series is usually rather limited, these methods can automatically annotate the principal people of particular news programmes, films and TV series with fairly high precision.
Internet videos, however, differ from films and TV series. Although the web page of an Internet video also carries some text, that text is usually limited in quantity, not accurate enough, and poorly organised. Moreover, it refers to the whole video and, unlike subtitle text, carries no timestamp information. These characteristics mean that the above methods, which rely on mining rich text, are difficult to generalise directly to Internet videos. In addition, Internet video content is extremely diverse: the people a video may involve span all sectors of society and are extremely numerous, and even if only celebrities are considered, their number is far from small. At present, automatic face annotation for large-scale open Internet video is still at a preliminary stage; because good annotation results are hard to achieve, no mature method or system in this area has yet emerged.
As massive numbers of Internet videos accumulate on video websites and new videos arrive at an ever faster rate, person-video annotation has again become a problem that academia and industry have to face and solve. Interactive annotation methods, which bring a human into the annotation loop in order to improve annotation accuracy, have therefore begun to attract attention. Some effective interactive annotation methods have been proposed for generic visual concepts such as sky, grass and buildings, but these methods cannot be applied directly to the problem of distinguishing and annotating different people. The reason is that generic visual concepts are comparatively easy to annotate manually, since in most cases common knowledge suffices to distinguish them; but when different people must be told apart, even an experienced annotator usually recognises only very few of the world's people, and nobody can assign a name to a person they do not know. If, as in existing interactive annotation systems, only the image or video frames containing a person and one or more candidate names were submitted to the annotating user, the user, very likely unfamiliar with the person to be annotated, could hardly annotate people the way generic visual concepts are annotated in images, even if all the people to be annotated were celebrities. On interactive person annotation, and in particular on annotating the people in videos, relevant results are still very rare.
Observe what people do when they see an unfamiliar person in an image or video. To find out who the person is, the usual solution is: find a name in the surrounding text, use that name as a keyword in an image search engine, and compare the result images returned by the search engine with the person in the picture at hand, thereby judging who the person in the image is. This scheme uses image retrieval based on text keywords. Although a minority of "search by image" systems now exist, the search target here is the image of a particular person, the result images are not required to be visually similar to a query image, and video faces vary widely in appearance and are usually of low resolution, all of which challenges the precision of "search by image" systems; retrieval based on text keywords therefore remains the method mainly adopted for this task. Since a search engine can find a large number of person images, of celebrities in particular, this scheme is in many cases an effective way of helping a user get to know a previously unfamiliar person.
This human practice naturally lends itself to the design of an interactive annotation method and system for person videos. While annotating people, an annotator will likewise run into unfamiliar people and be forced to pause the annotation, learn about the person with external tools such as a search engine, and only then resume the annotation process. Because of the frequent switching between annotation and search-and-compare operations, this process is undoubtedly inefficient and tedious. Suppose instead that text parsing and visual analysis were used to extract the names in the surrounding text of the video, fetch the corresponding person web images, and display them during annotation, while the faces in the video were analysed, processed and shown in a form that is easy to annotate. The annotator would then neither have to switch to a search engine to learn about the people to be annotated, nor see anything but video face images organised and presented in a decision-friendly annotation layout. This would undoubtedly simplify the annotation process and markedly improve the efficiency and precision of person-video annotation. A search of published patent databases, however, found no interactive annotation method or system specifically for the people in videos. The above background and insights are precisely the motivation for the present invention.
Summary of the invention
Aiming at the situation in Internet person-video annotation where the annotation process is difficult to carry out because the annotator very likely does not recognise the people to be annotated, the present invention proposes an interactive annotation method and system for Internet person videos. By mining several kinds of resources related to the video to be annotated and to its people, and by designing friendly and varied modes of user interaction, the invention simplifies the annotation process, assists annotation decisions, and improves the efficiency and precision of person-video annotation, thereby promoting the level of Internet person-video retrieval, browsing and ranking services.
To achieve the above object, the invention provides an interactive annotation method for Internet person videos, comprising the following steps:
S1, analysing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from the importance scores, pairwise merging recommendation scores and similarity scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
The present invention also proposes an interactive annotation system for Internet person videos, comprising:
a device for analysing the video to be annotated and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name of the name set as a text keyword and searching to obtain the set of person web images corresponding to that name;
a device for computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images, and for determining from these scores which face sequences, names and person web images are displayed while the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and for interactively annotating the face sequences, thereby realising the annotation of the video.
By mining several kinds of resources related to the video to be annotated and to its people, and by correspondingly designing friendly and varied modes of user interaction, the invention simplifies the annotation process, assists annotation decisions, and effectively alleviates the problem that annotation is difficult to carry out because the annotator does not recognise the people to be annotated. With the invention, the efficiency and precision of Internet person-video annotation can be improved substantially, promoting the level of Internet person-video retrieval, browsing and ranking services.
Brief description of the drawings
Fig. 1 is a flowchart of an interactive annotation method for Internet person videos according to an embodiment of the present invention;
Fig. 2 is a screenshot of an interactive annotation system for Internet person videos according to an embodiment of the present invention, with explanations of the related modules.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
As shown in Fig. 1, the interactive annotation method for Internet person videos of the present invention comprises the following steps:
S1, analysing the video, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from these scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
According to a preferred embodiment of the invention, the detailed procedure of step S1, which analyses the video and extracts the face sequences in the video and the names in its surrounding text, is as follows:
Step S11, segmenting the video into shots, performing face detection and tracking in each resulting shot to obtain the face sequences of that shot, and combining the face sequences obtained from all shots into the face-sequence set of the video.
In a concrete implementation, shot boundary detection is first performed on the Internet person video to be annotated, and the video is decomposed at the shot boundaries into a set of shots. The face detection function of the open-source computer vision library OpenCV is then called to detect faces frame by frame in each shot. Next, detection-based tracking gathers together the detected faces that belong to the same person across different video frames and generates a face sequence. Repeating this face-sequence generation for all shots yields the set of all face sequences detected in the video, Ω_F = {F_k}, k = 1..FN, where FN denotes the number of face sequences.
The detection-based tracking that generates the face sequences proceeds as follows. First, a colour histogram feature is extracted for every detected face, and the pairwise similarities between faces are computed from these features. The pairwise similarities are then sorted in descending order and agglomerative clustering is applied: two faces are merged when the following four conditions are all met: 1) the pairwise similarity of the two faces is greater than a preset merging threshold; 2) in the face cluster formed by merging the clusters containing the two faces, no two faces appear in the same video frame; 3) the appearance times of the two faces are no more than one second apart; 4) the distance between the centre coordinates of the two faces is no more than 2.5 times the face width. The merging is repeated until no two faces satisfy all four conditions simultaneously, which yields the face clustering result. Finally, the faces belonging to the same cluster are sorted by their (video-frame) appearance time, faces missed by the face detector are filled in by interpolation, and a complete face sequence is generated. Every face in a face sequence belongs to the same person.
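The following Python sketch illustrates this agglomerative merging under stated assumptions: each detected face is a dictionary carrying a frame index, centre coordinates, width and a colour-histogram feature; the frame rate, the merging-threshold value and the cosine similarity measure are illustrative choices, not values prescribed by the embodiment.

```python
import itertools
import numpy as np

FPS = 25.0              # assumed frame rate
MERGE_THRESHOLD = 0.8   # condition 1 threshold (assumed value)
MAX_GAP_SEC = 1.0       # condition 3: appearance gap of at most 1 second
MAX_DIST_FACTOR = 2.5   # condition 4: centre distance <= 2.5 x face width

def similarity(a, b):
    """Cosine similarity of colour-histogram features (one possible choice)."""
    return float(np.dot(a["hist"], b["hist"]) /
                 (np.linalg.norm(a["hist"]) * np.linalg.norm(b["hist"]) + 1e-9))

def merge_clusters(faces):
    """Group detected faces into face sequences by pairwise agglomerative merging."""
    clusters = [[f] for f in faces]
    idx = {id(f): i for i, f in enumerate(faces)}   # face -> cluster index
    for a, b in sorted(itertools.combinations(faces, 2),
                       key=lambda p: similarity(*p), reverse=True):
        if similarity(a, b) <= MERGE_THRESHOLD:                 # condition 1
            break                               # pairs are sorted, so stop here
        ca, cb = idx[id(a)], idx[id(b)]
        if ca == cb:
            continue
        if {f["frame"] for f in clusters[ca]} & \
           {f["frame"] for f in clusters[cb]}:                  # condition 2
            continue
        if abs(a["frame"] - b["frame"]) / FPS > MAX_GAP_SEC:    # condition 3
            continue
        if np.hypot(a["cx"] - b["cx"], a["cy"] - b["cy"]) > \
           MAX_DIST_FACTOR * max(a["w"], b["w"]):               # condition 4
            continue
        moved = clusters[cb]
        clusters[cb] = []
        clusters[ca].extend(moved)
        for f in moved:
            idx[id(f)] = ca
    # each surviving cluster, sorted by frame, is one face sequence
    return [sorted(c, key=lambda f: f["frame"]) for c in clusters if c]
```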
The above is the face-sequence extraction method of one embodiment of the present invention. Of course, other face detection and tracking methods may be used to obtain the face sequences; the present invention places no restriction on how the face sequences are obtained.
Step S12, obtaining the representative face images of every face sequence in the face-sequence set.
One embodiment is as follows. For each face sequence F_k (the subscript k denoting the k-th face sequence of the set), suppose it contains t faces. The colour histogram features of these t faces are extracted, and from them the pairwise face similarity matrix T_k is computed, whose entry (i, j) is the similarity between the i-th and the j-th face. With the mean of the pairwise similarities in T_k as the preference value, affinity propagation clustering is applied to cluster the t faces self-adaptively. If the clustering produces |F_k| classes, F_k can be represented by its representative face set {f_k^i}, i = 1..|F_k|, where f_k^i is the face nearest the centre of the i-th class.
Of course, other methods may be used to obtain the representative face images of a face sequence; the present invention places no restriction on how the representative face images are obtained.
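For illustration only, a minimal sketch of the representative-face selection follows. It assumes scikit-learn's AffinityPropagation as the clustering implementation and negative euclidean distance between histogram features as the pairwise similarity; both are assumptions, as the embodiment only fixes the preference to the mean pairwise similarity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def representative_faces(features):
    """features: (t, d) array, one colour-histogram vector per face of F_k."""
    # pairwise similarity as negative euclidean distance (one possible choice)
    sim = -np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # the embodiment sets the preference to the mean pairwise similarity
    ap = AffinityPropagation(affinity="precomputed",
                             preference=sim.mean(),
                             random_state=0).fit(sim)
    # the exemplar of each class is the face nearest its class centre
    return features[ap.cluster_centers_indices_]
```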
Step S13, collecting the video's surrounding text and extracting the person names from it.
The surrounding text of a video is the textual content related to the video on the web page where the Internet video resides; it includes, but is not limited to, the video title, tags, descriptive text and user comments. Since different types of surrounding text differ in relevance and noise level, the present invention considers only the video title, the tags, and those user comments longer than 20 words.
Specifically, when the surrounding text is English (as on English video websites), a name extraction method based on matching against a Wikipedia biographical dictionary is adopted. Over a continuous word sequence of the text, the method starts from the first word and tests in turn whether the phrase formed by the next n words (n < 4) constitutes a Wikipedia entry; if so, the longest matching entry is kept and testing continues from the (n+1)-th word. In this way Wikipedia entries such as "Barack Obama" and "World Cup 2014" are found in the word sequence. Repeating this parsing over the title, tag and comment collections yields a set of Wikipedia entries. The method then verifies one by one whether these entries are person names: it examines the category section of the entry's Wikipedia page and queries whether a category of the form "xxxx births" is present, where xxxx is a four- or three-digit year. If such a category exists, the entry is judged to be a person name; otherwise it is judged to be another kind of named entity and ignored.
The above is the processing for English text. When the surrounding text is Chinese, word segmentation is first performed with the Chinese segmentation tool ICTCLAS, after which the same name extraction based on Wikipedia biographical dictionary matching is applied (with the category criterion correspondingly changed to the Chinese form of the "born in xxxx" category). This processing yields the set of names related to the video, Ω_N = {N_k}, k = 1..CN, where N_k denotes the k-th extracted name and CN the number of extracted names.
Because the surrounding text of an Internet video is usually supplied by the uploading user, its syntax is loose, its word collocations are rather free, and misspellings and abbreviations are common. The above name extraction method based on Wikipedia biographical dictionary matching does not rely on syntactic structure and has a certain tolerance of misspellings and abbreviations, which makes it particularly suitable for extracting names from the surrounding text of Internet videos. Of course, other name extraction methods may be used; the present invention places no restriction on the name extraction method.
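A minimal sketch of the greedy dictionary matching and the "xxxx births" test described above, assuming a pre-built set of Wikipedia entry titles and a mapping from entry titles to their category names as inputs (building those is outside this sketch):

```python
import re

def extract_names(tokens, wiki_titles, wiki_categories):
    """tokens: the surrounding text as a list of words."""
    names, i = [], 0
    while i < len(tokens):
        match = None
        for n in (3, 2, 1):                       # prefer the longest entry (n < 4)
            phrase = " ".join(tokens[i:i + n])
            if phrase in wiki_titles:
                match = (phrase, n)
                break
        if match is None:
            i += 1
            continue
        phrase, n = match
        # the entry is a person name iff some category reads "xxxx births"
        cats = wiki_categories.get(phrase, [])
        if any(re.fullmatch(r"\d{3,4} births", c) for c in cats):
            names.append(phrase)
        i += n                                    # resume after the matched entry
    return names

# usage example with toy dictionary data
print(extract_names("US president Barack Obama spoke".split(),
                    {"Barack Obama"},
                    {"Barack Obama": ["1961 births"]}))   # -> ['Barack Obama']
```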
Step S2 uses each name of the name set obtained in step S1 as a text keyword and searches to obtain the set of person web images corresponding to that name. It specifically comprises the following steps:
Step S21, using each name of the name set obtained in step S1 as a text keyword, and searching for and downloading the images related to that keyword on the web.
Specifically, an existing image search engine can be used. For example, through the application programming interface provided by Google, the text keyword is submitted to the Google image search engine with the search parameters set to retrieve 64 images containing faces. Under this setting, the image search engine returns to the retrieval client the uniform resource locators (URL addresses) of the top 64 person images of the ranked retrieval results, and the client then downloads the corresponding images from those URL addresses. That is, ideally, when every image downloads normally, this step obtains 64 result images; in practice, each name usually yields between 50 and 64 downloadable images.
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected.
For example, the face detection function of the open-source computer vision library OpenCV can be called on each successfully downloaded person image. The detector may return no face, one face, or several faces. Images in which several faces are detected usually contain, besides the queried person, the faces of other people, which would disturb the annotator's judgement during reference comparison; this step therefore keeps only the images in which exactly one face is detected, and removes those in which no face, or several faces, are detected.
Step S23, repeating steps S21 and S22 for all names in the name set, to obtain the person web-image set corresponding to each name in the set.
The person web-image sets can be denoted Ω_C = {C_k}, k = 1..CN, where C_k denotes all the person web images corresponding to name N_k.
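As an illustration of the single-face filter of step S22, the sketch below runs OpenCV's Haar-cascade face detector over a list of already downloaded image files; the cascade choice and detector parameters are assumptions standing in for "the OpenCV face detection function" of the text.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def filter_single_face(image_paths):
    """Keep only images in which exactly one face is detected."""
    kept = []
    for path in image_paths:
        img = cv2.imread(path)
        if img is None:
            continue                      # download failed or file unreadable
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 1:               # discard 0 faces or more than 1 face
            kept.append(path)
    return kept
```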
For the convenience of the subsequent description of the embodiment, the composition of the system of the present invention is first introduced briefly. Fig. 2 gives a screenshot of the interactive annotation system. As can be seen, the system interface is divided into four parts: a management area, an annotation area, an annotation reference area and an annotation history area. The management area is used by the annotator to interactively select the video to be annotated, and loads the face sequences and the related names of the selected video. The annotation area is further divided into a similar-face merging/distinguishing subarea and a name-face association subarea, which respectively display the current face-sequence pair to be annotated, Q_i = <F_m, F_n>, and the current face sequence F_j, and in which the corresponding interactive annotation operations are carried out. In addition, the first six person web images of the name most similar to the face sequence shown in the name-face association subarea are displayed in the corresponding annotation reference area. The annotation history area on the far right shows the annotated name-face pairs in annotation order, with the most recently annotated pair at the top. The annotation reference area and the annotation history area mainly provide auxiliary information that helps the annotator make reference-based decisions.
Step S3 computes the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determines from these scores which face sequences, names and person web images are displayed while the video is annotated. Step S3 comprises the following sub-steps:
Step S31, computing the saliency value of every face sequence in the face-sequence set.
A face sequence that stays on screen longer and whose faces are larger attracts more attention in the video, and the probability that it belongs to a core person of the video is also larger. The present invention calls this property of a face sequence its saliency, and proposes the following saliency formula:

    Sai(F_i) = e^{-size_θ / size_i} + e^{-dura_θ / dura_i}    (1)

where size_i and dura_i are the average face size and the on-screen duration of face sequence F_i respectively, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of duration on the saliency. By formula (1), a face sequence with a long duration and a large average face size obtains a larger saliency value.
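Formula (1) as code, a minimal sketch; the units of face size and duration and the two threshold values are assumptions, since the text only states that the thresholds are set empirically.

```python
import math

def saliency(avg_face_size, duration, size_theta=40.0 * 40.0, dura_theta=2.0):
    """Formula (1); size in pixels^2 and duration in seconds are assumed units."""
    return math.exp(-size_theta / avg_face_size) + math.exp(-dura_theta / duration)
```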
Step S32, computing the pairwise similarities between the face sequences of the face-sequence set.
Two face sequences whose appearance times overlap usually correspond to different people, whereas face sequences separated by a short time interval may be different face sequences of the same person, produced for example by shot changes. Based on this observation, the pairwise similarity of face sequences is computed from their pairwise visual similarity, the time interval between their appearances, and whether their appearance times overlap:

    sim(F_i, F_j) = e^{-Δtime_{i,j} / time_θ} · (1 - CO_{i,j}) · vs(F_i, F_j)    (2)

where time_θ is a threshold controlling the influence of the time difference and Δtime_{i,j} is the time difference between face sequences F_i and F_j, computed by the following formula (3):

    Δtime_{i,j} = time_j^{beg} - time_i^{end},  if time_i^{beg} ≤ time_j^{beg}
                  time_i^{beg} - time_j^{end},  if time_j^{beg} ≤ time_i^{beg}    (3)

In formula (3), time_i^{beg} and time_i^{end} are the start time and the end time of the appearance of face sequence F_i; a small time value means that the face sequence appears in an earlier part of the video. In formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise. vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences:

    vs(F_i, F_j) = e^{-min_{f_i^m ∈ F_i, f_j^n ∈ F_j, i ≠ j} ||f_i^m - f_j^n||}    (4)

where f_i^m is the feature vector of the m-th representative face of face sequence F_i.
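Formulas (2) to (4) as code, a minimal sketch in which each face sequence is summarised by its representative-face feature matrix and a [begin, end] time span; the time_theta value is an assumption.

```python
import math
import numpy as np

def visual_sim(reps_i, reps_j):
    """Formula (4): similarity of the two closest representative faces."""
    d = np.linalg.norm(reps_i[:, None, :] - reps_j[None, :, :], axis=-1)
    return float(np.exp(-d.min()))

def pair_sim(seq_i, seq_j, time_theta=10.0):
    """Formulas (2)-(3); seq = {'span': (beg, end), 'reps': (m, d) array}."""
    beg_i, end_i = seq_i["span"]
    beg_j, end_j = seq_j["span"]
    if beg_i <= end_j and beg_j <= end_i:      # CO_{i,j} = 1: overlap in time
        return 0.0                             # the (1 - CO_{i,j}) factor
    # formula (3): gap from the earlier sequence's end to the later one's start
    dt = beg_j - end_i if beg_i <= beg_j else beg_i - end_j
    return math.exp(-dt / time_theta) * visual_sim(seq_i["reps"], seq_j["reps"])
```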
Step S33, computing the pairwise merging recommendation scores of the face sequences from the pairwise similarities obtained in step S32 and from the user interaction information gathered during annotation.
Specifically, the following formula is used:

    MS(F_i, F_j) = (1 - PM_{i,j}) · sim(F_i, F_j)    (5)

where PM_{i,j} indicates whether the combination of face sequences F_i and F_j has been "skipped" or labelled "different" by the user during annotation: PM_{i,j} = 1 if it has, and PM_{i,j} = 0 otherwise. By formula (5), pairs of face sequences with high similarity that have not been "skipped" or labelled "different" during annotation are given large pairwise merging recommendation scores. On this basis, all pairs whose score is at least a given threshold are arranged in descending order of MS(F_i, F_j), producing the pairwise merging recommendation list Rank_MS = {Q_k}, where Q_k = <F_i, F_j>, i ≠ j. During annotation, the similar-face merging/distinguishing subarea of the system in Fig. 2 displays the face-sequence pairs to be annotated in Rank_MS order.
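A sketch of formula (5) and of the construction of Rank_MS; the cut-off value is an assumption, as the text only requires a preset threshold.

```python
def merge_score(sim_ij, pm_ij):
    """Formula (5): skipped / 'different' pairs (PM = 1) score zero."""
    return (1 - pm_ij) * sim_ij

def build_rank_ms(sim, PM, threshold=0.5):
    """sim, PM: FN x FN matrices; threshold is the preset cut-off (assumed value)."""
    pairs = [((i, j), merge_score(sim[i][j], PM[i][j]))
             for i in range(len(sim)) for j in range(i + 1, len(sim))]
    pairs = [p for p in pairs if p[1] >= threshold]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs      # Rank_MS: most promising merge candidates first
```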
Step S34, computing the importance score of each face sequence from the saliency values obtained in step S31, the pairwise similarity scores obtained in step S32, and the user interaction information gathered during annotation.
The importance of a face sequence represents how much the sequence deserves annotation once the information of the face sequences, the video and the user interaction is taken into account. It is computed by the following formula (6):

    IS(F_i) = (1 - PA_i) · (\overline{Sai_i} + \overline{AR_i})    (6)

where PA_i indicates whether face sequence F_i has been "skipped" by the user during annotation: PA_i = 1 if it has, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are, respectively, the saliency Sai_i and the accumulated relevance AR_i after min-max normalisation, the latter being defined as:

    AR_i = Σ_{j=1, j≠i}^{FN} L_j · sim(F_i, F_j)    (7)

where L_j is the annotation-state function of face sequence F_j: L_j = 1 if F_j has been annotated, and L_j = 0 otherwise. By formula (6), a face sequence that has a large saliency value, is similar to several already annotated face sequences, and has not been "skipped" during annotation is given a large importance score.
On this basis, the face sequences are arranged in descending order of their importance score IS(F_i), producing the importance list Rank_IS = {F_i}, i = 1..FN. During annotation, the name-face association subarea of the system in Fig. 2 displays the face sequences to be annotated in Rank_IS order.
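Formulas (6) and (7) as code, a minimal sketch producing the Rank_IS order; the min-max normalisation is applied across all face sequences, as the text describes.

```python
import numpy as np

def minmax(x):
    """Min-max normalisation used for both saliency and accumulated relevance."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def build_rank_is(sai, sim, labelled, skipped):
    """sai: saliency per sequence; sim: FN x FN matrix; labelled, skipped: 0/1."""
    n = len(sai)
    ar = np.array([sum(labelled[j] * sim[i][j] for j in range(n) if j != i)
                   for i in range(n)])                               # formula (7)
    scores = (1 - np.asarray(skipped)) * (minmax(sai) + minmax(ar))  # formula (6)
    return scores, list(np.argsort(-scores))                         # Rank_IS order
```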
Step S35, computing the similarities between the face sequences of the face-sequence set and the person web images of the web-image sets, sorting by similarity, and obtaining for each face sequence the sorted name list and the K most similar person web images of each name. In the present invention, K is set to 6.
This step mainly addresses the situation, frequently encountered during interactive annotation, in which the annotator does not recognise the person to be annotated; displaying the names and person web images in the annotation system alleviates this problem. Specifically, once the face sequence shown in the name-face association subarea is determined, the name most similar to it and the K person web images most similar to it are displayed for the annotator to compare against, assisting the decision on which name corresponds to the face sequence to be annotated. The computation of this step comprises the following three sub-steps:
Step S351, computing the pairwise similarities between the face sequences of the face-sequence set and the names of the name set. The visual appearance of a name can be represented by its corresponding person web images. On this basis, the similarity between face sequence F_i and web-image set C_j is computed by the following formula (8) and taken as the similarity between face sequence F_i and name N_j:

    vs(F_i, N_j) = vs(F_i, C_j) = (1 / |C_j|) Σ_{n=1}^{|C_j|} vs(F_i, c_j^n)    (8)

where

    vs(F_i, c_j^n) = e^{-min_{f_i^m ∈ F_i} ||f_i^m - c_j^n||}    (9)

and c_j^n is the face feature vector of the n-th image of the web-image set C_j corresponding to name N_j.
Step S352, sorting the names by the similarity computed in step S351. In general, the larger the similarity vs(F_i, N_j) between face sequence F_i and name N_j, the larger the probability that F_i is a face of N_j. On this basis, the name set Ω_N is sorted in descending order of vs(F_i, N_j), producing the name list Rank(F_i) corresponding to face sequence F_i.
Step S353, computing, for each face sequence, the K most similar person web images of each name. As can be seen from formula (9), the similarity of a person web image to a face sequence is represented by the image's similarity to the most similar representative face of the sequence. Hence, for every pair of face sequence F_i and name N_j, the person web images in C_j are sorted in descending order of vs(F_i, c_j^n) and the K most similar images are kept, producing the person web-image list Rank(F_i, N_j) = {c_k}, k = 1..K, of face sequence F_i with respect to name N_j, where K is set to 6.
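Formulas (8) and (9) and the top-K selection of step S353 as a minimal sketch; face sequences and web images are represented here simply as representative-face feature matrices and per-image feature vectors.

```python
import numpy as np

def face_image_sim(reps, img_feat):
    """Formula (9): governed by the representative face closest to the image."""
    return float(np.exp(-np.linalg.norm(reps - img_feat, axis=1).min()))

def face_name_sim(reps, name_images):
    """Formula (8): average over the name's web images."""
    return sum(face_image_sim(reps, c) for c in name_images) / len(name_images)

def top_k_images(reps, name_images, k=6):
    """Step S353: the K web images most similar to the face sequence (K = 6)."""
    return sorted(name_images,
                  key=lambda c: face_image_sim(reps, c), reverse=True)[:k]
```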
According to a preferred embodiment of the invention, the face-sequence importance list obtained in step S34, together with the name list and the person web-image lists obtained in step S35 for each face sequence F_i, are used in step S4, in which various user interaction operations produce the corresponding annotation actions and realise the annotation of the people in the video. The detailed procedure of step S4 is as follows:
Step S41, initialising the various resources involved in the annotation process.
The specific practice is:
S411, setting PA_k = 0 for k = 1..FN, PM_{m,n} = 0 for all m ≠ n, and ULSets = {F_k}, k = 1..FN, where ULSets denotes the set of face sequences not yet annotated;
S412, automatically annotating the face-sequence pairs Q_i = <F_m, F_n> that satisfy the condition of formula (10), and removing all annotated pairs from the Rank_MS list:

    Label(F_i) = Label(F_j),  if vs(F_i, F_j) ≥ T_s    (10)

where T_s is a threshold deciding whether two face sequences are visually similar enough;
S413, taking from Rank_MS and Rank_IS respectively the top-ranked element Q_i = <F_m, F_n> and F_j, i.e. the face-sequence pair with the currently highest pairwise merging recommendation score and the face sequence with the currently highest importance score, and displaying these resources in the annotation system;
S414, taking the top-ranked name N_k of Rank(F_j) together with its K images Rank(F_j, N_k), and displaying these resources in the annotation system.
Step S42, producing the corresponding annotation actions according to the various user interaction operations.
There are three classes of user interaction operation: 1) similar-face merging/distinguishing operations, which label the displayed face-sequence pair Q_i = <F_m, F_n> as "identical" or "different"; 2) name-face association operations, which select a particular name to annotate face sequence F_j; 3) operations that select other names and their person web images for the annotator's reference. Of the three classes, the third is an annotation auxiliary whose purpose is to provide information that assists the user's annotation decisions, while the second annotates F_j with the corresponding name and removes it from the set ULSets of face sequences not yet annotated. The annotation actions corresponding to the three classes of interaction are respectively as follows (a state-update sketch follows the list):
1) Annotation actions of the similar-face merging/distinguishing operations:
a) if the user labels Q_i with the "identical" option, set Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user labels Q_i with the "different" option, set Label(F_m) ≠ Label(F_n) and at the same time set PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, set PM_{m,n} = 1.
2) Annotation actions of the name-face association operations:
a) if the user selects name N_k to annotate F_j, set ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects "skip" for the annotation of F_j, set PA_j = 1.
3) Actions of the name and person web-image selection operations:
a) if the user clicks the "previous" option, set k = k - 1 (when k > 1) and display name N_k and its person web-image list of K images;
b) if the user clicks the "next" option, set k = k + 1 (when k < CN) and display name N_k and its person web-image list of K images.
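The sketch below collects the state updates of the three interaction classes in one illustrative container; the class and method names are assumptions, and the "identical" action is simplified to copying a label when one side of the pair already has one (a full implementation would also record the equality constraint between two still-unlabelled sequences).

```python
class AnnotationState:
    def __init__(self, num_seqs):
        self.label = {}                       # sequence index -> name
        self.PA = [0] * num_seqs              # "skipped" flags per sequence
        self.PM = {}                          # (m, n) -> skipped/"different" flag
        self.unlabelled = set(range(num_seqs))  # ULSets

    def mark_same(self, m, n):                # 1a) "identical"
        if n in self.label:
            m, n = n, m
        if m in self.label:                   # copy the label if one side has one
            self.assign_name(n, self.label[m])

    def mark_different_or_skip(self, m, n):   # 1b) "different", 1c) "skip"
        self.PM[(m, n)] = 1

    def assign_name(self, j, name):           # 2a) name-face association
        self.label[j] = name
        self.unlabelled.discard(j)            # ULSets = ULSets \ F_j

    def skip_sequence(self, j):               # 2b) "skip"
        self.PA[j] = 1
```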
Step S43, annotating the other not-yet-annotated face sequences by label propagation.
The user's interactive annotation actions provide extra annotation cues. Therefore, every other not-yet-annotated face sequence F_i that satisfies the condition of formula (11) or (12) below is annotated automatically:

    Label(F_i) = N_k,  ULSets = ULSets \ F_i,
        if F_i ∈ ULSets, vs(F_i, F_j) ≥ T_s, and Label(F_j) = N_k    (11)

    Label(F_i) = Label(F_j),
        if F_i ∈ ULSets, F_j ∈ ULSets, and vs(F_i, F_j) ≥ T_s    (12)

where T_s is the similarity threshold defined in formula (10).
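A sketch of the propagation pass of formulas (11) and (12), reusing the AnnotationState container of the previous sketch; the T_s value is an assumption. Iterating to a fixed point realises formula (12) implicitly: once one of two similar unlabelled sequences receives a name through formula (11), the other receives the same name on the next pass.

```python
def propagate(state, sim, T_s=0.9):
    """Formulas (11)-(12); state: AnnotationState, sim: FN x FN matrix."""
    changed = True
    while changed:                             # iterate to a fixed point
        changed = False
        for i in list(state.unlabelled):
            for j, name in list(state.label.items()):
                if i != j and sim[i][j] >= T_s:
                    state.assign_name(i, name)   # formula (11)
                    changed = True
                    break
```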
Step S44, pruning and reordering the pairwise merging recommendation list and the importance list, and determining the resources presented in the next round of user annotation.
Through steps S42 and S43, some face sequences in the pairwise merging recommendation list Rank_MS and the importance list Rank_IS have been annotated. This step prunes and reorders Rank_MS and Rank_IS according to the annotation results and determines the resources presented in the next round of user annotation. The specific practices of pruning and reordering are respectively:
1) pruning: removing from Rank_MS and Rank_IS, respectively, the elements Q_i = <F_m, F_n> and F_j that satisfy the conditions of formulas (13), (14) or (15):

    Rank_MS = Rank_MS \ Q_i,  if F_m ∉ ULSets and F_n ∉ ULSets    (13)

    Rank_MS = Rank_MS \ Q_i,  if Label(F_m) = Label(F_n)    (14)

    Rank_IS = Rank_IS \ F_j,  if F_j ∉ ULSets    (15)

2) reordering: for the elements remaining in Rank_MS and Rank_IS, recomputing the pairwise merging recommendation scores by formula (5) and the importance scores by formula (6), and regenerating the Rank_MS and Rank_IS lists by re-sorting on the recomputed scores, as the basis for the resources displayed in the next round of interactive annotation.
Step S45, repeating steps S42 to S44 until all face sequences have been annotated (i.e. ULSets = ∅), or the user actively exits the annotation process.
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and do not limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (19)

1. An interactive annotation method for Internet person videos, characterised by comprising the following steps:
S1, analysing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from the importance scores, pairwise merging recommendation scores and similarity scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
2. The method according to claim 1, characterised in that step S1 comprises the following steps:
step S11, segmenting the video into shots, performing face detection and tracking in each resulting shot to obtain the face sequences of that shot, and combining the face sequences obtained from all shots into the face-sequence set of the video;
step S12, obtaining the representative face images of every face sequence in the face-sequence set;
step S13, collecting the video's surrounding text and extracting the person names from it.
3. The method according to claim 1, characterised in that step S2 comprises the following steps:
step S21, using each name of the name set obtained in step S1 as a text keyword, and searching for and downloading the images related to that keyword on the web;
step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected;
step S23, repeating steps S21 and S22 for all names in the name set, to obtain the person web-image set corresponding to each name in the set.
4. The method according to claim 1, characterised in that step S3 comprises the following steps:
step S31, computing the saliency value of every face sequence in the face-sequence set;
step S32, computing the pairwise similarities between the face sequences of the face-sequence set;
step S33, computing the pairwise merging recommendation scores of the face sequences from the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation;
step S34, computing the importance score of each face sequence from the saliency values obtained in step S31, the pairwise similarity scores obtained in step S32, and the user interaction information gathered during annotation;
step S35, computing the similarities between the face sequences of the face-sequence set and the person web images of the web-image sets, sorting by similarity, and obtaining for each face sequence the sorted name list and the K most similar person web images of each name.
5. The method according to claim 4, characterised in that in step S31 the saliency of face sequence F_i is computed by the following formula (1):

    Sai(F_i) = e^{-size_θ / size_i} + e^{-dura_θ / dura_i}    (1)

where size_i and dura_i are the average face size and the on-screen duration of face sequence F_i respectively, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of duration on the saliency.
6. The method according to claim 4, characterised in that in step S32 the pairwise similarity of face sequences is computed by the following formula (2):

    sim(F_i, F_j) = e^{-Δtime_{i,j} / time_θ} · (1 - CO_{i,j}) · vs(F_i, F_j)    (2)

where time_θ is a threshold controlling the influence of the time difference and Δtime_{i,j} is the time difference between face sequences F_i and F_j, computed by the following formula (3):

    Δtime_{i,j} = time_j^{beg} - time_i^{end},  if time_i^{beg} ≤ time_j^{beg}
                  time_i^{beg} - time_j^{end},  if time_j^{beg} ≤ time_i^{beg}    (3)

in formula (3), time_i^{beg} and time_i^{end} are the start time and the end time of the appearance of face sequence F_i, a small time value meaning that the face sequence appears in an earlier part of the video;
in formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, computed by:

    vs(F_i, F_j) = e^{-min_{f_i^m ∈ F_i, f_j^n ∈ F_j, i ≠ j} ||f_i^m - f_j^n||}    (4)

where f_i^m is the feature vector of the m-th representative face of face sequence F_i.
7. The method according to claim 4, characterised in that in step S33 the pairwise merging recommendation score of face sequences is computed by the following formula (5):

    MS(F_i, F_j) = (1 - PM_{i,j}) · sim(F_i, F_j)    (5)

where PM_{i,j} indicates whether the combination of face sequences F_i and F_j has been "skipped" or labelled "different" by the user during annotation: PM_{i,j} = 1 if it has, and PM_{i,j} = 0 otherwise; by formula (5), pairs of face sequences with high similarity that have not been "skipped" or labelled "different" during annotation are given large pairwise merging recommendation scores; on this basis, all pairs whose score is at least a given threshold are arranged in descending order of MS(F_i, F_j), producing the pairwise merging recommendation list Rank_MS = {Q_k}, where Q_k = <F_i, F_j>, i ≠ j.
8. The method according to claim 4, characterised in that in step S34 the importance score of a face sequence is computed by the following formula (6):

    IS(F_i) = (1 - PA_i) · (\overline{Sai_i} + \overline{AR_i})    (6)

where PA_i indicates whether face sequence F_i has been "skipped" by the user during annotation: PA_i = 1 if it has, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are, respectively, the saliency Sai_i and the accumulated relevance AR_i after min-max normalisation, the latter being defined as:

    AR_i = Σ_{j=1, j≠i}^{FN} L_j · sim(F_i, F_j)    (7)

where L_j is the annotation-state function of face sequence F_j: L_j = 1 if F_j has been annotated, and L_j = 0 otherwise;
the face sequences are arranged in descending order of their importance score IS(F_i), producing the importance list Rank_IS = {F_i}, i = 1..FN.
9. The method according to claim 4, characterised in that step S35 comprises the following steps:
step S351, computing the pairwise similarities between the face sequences of the face-sequence set and the names of the name set;
step S352, sorting the names by the similarity computed in step S351;
step S353, computing, for each face sequence, the K most similar person web images of each name.
10. The method according to claim 9, characterised in that
step S351 computes the similarity between face sequence F_i and web-image set C_j by the following formula (8) and takes it as the similarity between face sequence F_i and name N_j:

    vs(F_i, N_j) = vs(F_i, C_j) = (1 / |C_j|) Σ_{n=1}^{|C_j|} vs(F_i, c_j^n)    (8)

where

    vs(F_i, c_j^n) = e^{-min_{f_i^m ∈ F_i} ||f_i^m - c_j^n||}    (9)

and c_j^n is the face feature vector of the n-th image of the web-image set C_j.
11. The method according to claim 10, characterised in that
step S352, for each face sequence F_i, sorts the names in descending order of vs(F_i, N_j), producing the name list Rank(F_i).
12. The method according to claim 11, characterised in that
step S353, for each pair of face sequence and name, e.g. F_i and N_j, sorts the person web images in C_j in descending order of vs(F_i, c_j^n) and keeps the K most similar images, producing the person web-image list Rank(F_i, N_j) = {c_k}, k = 1..K, corresponding to F_i and N_j.
13. The method according to claim 1, characterised in that step S4 comprises the following steps:
step S41, initialising the various resources involved in the annotation process;
step S42, producing the corresponding annotation actions according to the various user interaction operations;
step S43, annotating the other not-yet-annotated face sequences by label propagation;
step S44, pruning and reordering the pairwise merging recommendation list and the importance list, and determining the resources presented in the next round of user annotation;
step S45, repeating steps S42 to S44 until all face sequences have been annotated.
14. The method according to claim 13, characterized in that said step S41 comprises:
S411: setting {PA_k = 0}_{k=1}^{FN}, {PM_{m,n} = 0}_{m=1,n=1,m \neq n}^{FN}, and ULSets = {F_k}_{k=1}^{FN};
S412: automatically labeling each face sequence pair Q_i = <F_m, F_n> that satisfies the condition of formula (10), and removing all such labeled pairs from the Rank_MS list:

Label(F_i) = Label(F_j), \text{ if satisfies } vs(F_i, F_j) \geq T_s    (10)

wherein "if satisfies" means "if the following condition holds", and T_s is the threshold deciding whether two face sequences are visually similar enough;
S413: taking from Rank_MS and Rank_IS, respectively, the highest-ranked elements Q_i = <F_m, F_n> and F_j, i.e. the pair with the currently highest pairwise merging recommendation score and the face sequence with the highest importance score, and displaying these resources in the labeling system;
S414: taking the highest-ranked name in Rank(F_j) together with its K person network images, and displaying these resources in the labeling system.
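A sketch of the initialization S411-S414, assuming `state` carries the ranking lists as Python lists of indices, a precomputed similarity matrix `vs`, and a union-find structure `same_label` recording Label(F_m) = Label(F_n) constraints; the threshold value and all names are placeholders:

def initialize_resources(state, T_s=0.8):
    """Sketch of S411-S414 (claim 14); T_s is the formula (10) threshold."""
    FN = len(state.face_seqs)
    state.PA = [0] * FN                             # S411: nothing skipped yet
    state.PM = [[0] * FN for _ in range(FN)]
    state.ULSets = set(range(FN))                   # every sequence unlabeled
    # S412: auto-label pairs satisfying formula (10), drop them from Rank_MS
    for (m, n) in list(state.Rank_MS):
        if state.vs[m][n] >= T_s:
            state.same_label.union(m, n)            # Label(F_m) = Label(F_n)
            state.Rank_MS.remove((m, n))
    # S413: surface the top-ranked pair and the top-ranked face sequence
    pair, face = state.Rank_MS[0], state.Rank_IS[0]
    # S414: surface the highest-ranked name for that sequence and its K images
    display(state, pair, face, state.name_rank[face][0])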
15. The method according to claim 13, characterized in that, in said step S42, the multiple kinds of user interactions comprise: 1) a similar-face merge/distinguish labeling operation that marks a displayed face sequence pair Q_i = <F_m, F_n> as "same" or "different"; 2) a name-face association labeling operation that selects a particular name to label face sequence F_j; 3) an interactive browsing operation that selects different names and their person network images for display.
16. The method according to claim 13, characterized in that, in said step S42, the labeling behaviors corresponding to the multiple kinds of user interactions are respectively:
1) behaviors corresponding to the similar-face merge/distinguish labeling operation:
a) if the user marks Q_i with the "same" option, set Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user marks Q_i with the "different" option, set Label(F_m) ≠ Label(F_n) and simultaneously set PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, set PM_{m,n} = 1;
2) behaviors corresponding to the name-face association labeling operation:
a) if the user selects name N_k to label F_j, set ULSets = ULSets \setminus F_j and Label(F_j) = N_k;
b) if the user selects the "skip" option for F_j, set PA_j = 1;
3) behaviors corresponding to the name and person network image browsing operation:
a) if the user clicks the "previous" option, set k = k - 1 (when k > 1), and display the k-th name in Rank(F_j) together with its K person network images;
b) if the user clicks the "next" option, set k = k + 1 (when k < CN), and display the k-th name in Rank(F_j) together with its K person network images.
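The three operation families of claim 16 map naturally onto an event handler. A hedged sketch, with `action` as a hypothetical (kind, payload) tuple produced by the UI and the same `state` assumptions as in the earlier sketches:

def apply_annotation_behavior(state, action):
    """Sketch of the claim 16 labeling behaviors."""
    kind, payload = action
    if kind == "same":                       # 1a) merge: Label(F_m) = Label(F_n)
        m, n = payload
        state.same_label.union(m, n)
    elif kind == "different":                # 1b) distinguish
        m, n = payload
        state.cannot_link.add((m, n))        # Label(F_m) != Label(F_n)
        state.PM[m][n] = 1
    elif kind == "skip_pair":                # 1c) skip the displayed pair
        m, n = payload
        state.PM[m][n] = 1
    elif kind == "assign_name":              # 2a) Label(F_j) = N_k
        j, k = payload
        state.labels[j] = k
        state.ULSets.discard(j)              # ULSets = ULSets \ F_j
    elif kind == "skip_face":                # 2b) skip the displayed sequence
        state.PA[payload] = 1
    elif kind in ("previous", "next"):       # 3) browse names and their images
        step = -1 if kind == "previous" else 1
        state.k = min(max(state.k + step, 1), state.CN)  # clamp to [1, CN]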
17. The method according to claim 13, characterized in that said step S43 automatically labels any other unlabeled face sequence F_i satisfying the relevant condition as specified by formula (11) or (12):

Label(F_i) = N_k,\ ULSets = ULSets \setminus F_i, \text{ if satisfies } F_i \in ULSets,\ vs(F_i, F_j) \geq T_s,\ Label(F_j) = N_k    (11)

Label(F_i) = Label(F_j), \text{ if satisfies } F_i \in ULSets,\ F_j \in ULSets,\ vs(F_i, F_j) \geq T_s    (12)

wherein T_s is the similarity threshold defined in formula (10).
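A sketch of the claim 17 propagation, iterated to a fixed point: formula (11) pushes a known name to similar unlabeled sequences, and formula (12) ties pairs of unlabeled sequences together via the same_label structure (same hypothetical `state` as above):

def propagate_labels(state, T_s=0.8):
    """Sketch of step S43, formulas (11)-(12) of claim 17."""
    changed = True
    while changed:
        changed = False
        # Formula (11): F_i unlabeled, F_j labeled N_k, vs(F_i, F_j) >= T_s
        for i in list(state.ULSets):
            for j, name in list(state.labels.items()):
                if state.vs[i][j] >= T_s:
                    state.labels[i] = name           # Label(F_i) = N_k
                    state.ULSets.discard(i)          # ULSets = ULSets \ F_i
                    changed = True
                    break
    # Formula (12): two unlabeled, sufficiently similar sequences will share
    # whichever label either of them eventually receives
    ul = list(state.ULSets)
    for a in range(len(ul)):
        for b in range(a + 1, len(ul)):
            if state.vs[ul[a]][ul[b]] >= T_s:
                state.same_label.union(ul[a], ul[b])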
18. The method according to claim 13, characterized in that, according to the labeling results, said step S44 prunes and reorders Rank_MS and Rank_IS as follows:
1) pruning: deleting from Rank_MS and Rank_IS, respectively, the elements Q_i = <F_m, F_n> and F_j that satisfy the condition of formula (13), (14) or (15):

Rank_MS = Rank_MS \setminus Q_i, \text{ if satisfies } F_m \notin ULSets,\ F_n \notin ULSets    (13)

Rank_MS = Rank_MS \setminus Q_i, \text{ if satisfies } Label(F_m) = Label(F_n)    (14)

Rank_IS = Rank_IS \setminus F_j, \text{ if satisfies } F_j \notin ULSets    (15)

2) reordering: for the elements remaining in Rank_MS and Rank_IS, recomputing the pairwise merging recommendation scores and importance scores with formulas (6) and (5), and regenerating Rank_MS and Rank_IS accordingly, as the basis for the resources displayed in the next round of interactive labeling.
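A sketch of step S44 per claim 18: merge_score and importance_score are hypothetical stand-ins for formulas (6) and (5), both defined earlier in the document, and the pruning follows formulas (13)-(15):

def prune_and_reorder(state):
    """Sketch of formulas (13)-(15) plus the re-ranking of claim 18."""
    def resolved(m, n):
        both_labeled = m not in state.ULSets and n not in state.ULSets  # (13)
        same_label = (m in state.labels and n in state.labels
                      and state.labels[m] == state.labels[n])           # (14)
        return both_labeled or same_label

    state.Rank_MS = [p for p in state.Rank_MS if not resolved(*p)]
    # (15): labeled sequences leave the importance list
    state.Rank_IS = [j for j in state.Rank_IS if j in state.ULSets]
    # Re-score the survivors with formulas (6) and (5), then re-sort
    state.Rank_MS.sort(key=lambda p: merge_score(state, *p), reverse=True)
    state.Rank_IS.sort(key=lambda j: importance_score(state, j), reverse=True)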
19. An interactive labeling system for Internet person videos, characterized by comprising:
a device for analyzing a video to be labeled, extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name of said name set as a text keyword and searching to obtain the person network image set corresponding to that name;
a device for calculating the importance scores of said face sequences, the pairwise merging recommendation scores of said face sequences, and the similarity scores between said face sequences and the person network images corresponding to said names, and for determining, according to said importance scores, said pairwise merging recommendation scores and said similarity scores, the face sequences, names and person network images to be displayed while said video is labeled;
a device for displaying the face sequences, names and person network images to be labeled, and for interactively labeling the face sequences, thereby realizing the labeling of said video.
CN201410475211.0A 2014-09-17 2014-09-17 Internet person video interactive labeling method and system Active CN104217008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410475211.0A CN104217008B (en) 2014-09-17 2014-09-17 Internet person video interactive labeling method and system

Publications (2)

Publication Number Publication Date
CN104217008A 2014-12-17
CN104217008B CN104217008B (en) 2018-03-13

Family

ID=52098498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410475211.0A Active CN104217008B (en) Internet person video interactive labeling method and system

Country Status (1)

Country Link
CN (1) CN104217008B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739428A (en) * 2008-11-10 2010-06-16 中国科学院计算技术研究所 Method for establishing index for multimedia
CN102629275A (en) * 2012-03-21 2012-08-08 复旦大学 Face and name aligning method and system facing to cross media news retrieval
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Jie et al.: "A Semi-Automatic Photo Person Annotation System Based on a Friendly Interaction Mode", Proceedings of the 8th Joint Conference on Harmonious Human-Machine Environment (HHME2012), NCMT *
Gao Xinxin: "Design and Implementation of a User-Interaction-Based Web Image Annotation Framework", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809096A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Figure labeling method and terminal
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
CN104796781B (en) * 2015-03-31 2019-01-18 小米科技有限责任公司 Video clip extracting method and device
CN107710261A (en) * 2015-12-14 2018-02-16 谷歌有限责任公司 The system and method paid attention to for estimating user
CN107710261B (en) * 2015-12-14 2021-06-29 谷歌有限责任公司 System and method for estimating user attention
CN106227836A (en) * 2016-07-26 2016-12-14 上海交通大学 Associating visual concept learning system and method is supervised with the nothing of word based on image
CN106227836B (en) * 2016-07-26 2020-07-14 上海交通大学 Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN109214247A (en) * 2017-07-04 2019-01-15 腾讯科技(深圳)有限公司 Face identification method and device based on video
CN107480236A (en) * 2017-08-08 2017-12-15 深圳创维数字技术有限公司 A kind of information query method, device, equipment and medium
CN107832662A (en) * 2017-09-27 2018-03-23 百度在线网络技术(北京)有限公司 A kind of method and system for obtaining picture labeled data
CN108882033B (en) * 2018-07-19 2021-12-14 上海影谱科技有限公司 Character recognition method, device, equipment and medium based on video voice
CN108882033A (en) * 2018-07-19 2018-11-23 北京影谱科技股份有限公司 A kind of character recognition method based on video speech, device, equipment and medium
CN111046235A (en) * 2019-11-28 2020-04-21 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111046235B (en) * 2019-11-28 2022-06-14 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111144306A (en) * 2019-12-27 2020-05-12 联想(北京)有限公司 Information processing method, information processing apparatus, and information processing system
CN111126069A (en) * 2019-12-30 2020-05-08 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111126069B (en) * 2019-12-30 2022-03-29 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111639599A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium
CN111639599B (en) * 2020-05-29 2024-04-02 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN104217008B (en) 2018-03-13

Similar Documents

Publication Publication Date Title
CN104217008A (en) Interactive type labeling method and system for Internet figure video
Lokoč et al. Is the reign of interactive search eternal? findings from the video browser showdown 2020
CN106201177B (en) A kind of operation execution method and mobile terminal
US20130243249A1 (en) Electronic device and method for recognizing image and searching for concerning information
CN104281649A (en) Input method and device and electronic equipment
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
WO2015176525A1 (en) Time-serialization-based document identification, association, search, and display system
KR20150091053A (en) Method and apparatus for video retrieval
CN103593363A (en) Video content indexing structure building method and video searching method and device
CN102222103A (en) Method and device for processing matching relationship of video content
Nguyen et al. LifeSeeker 3.0: An Interactive Lifelog Search Engine for LSC'21
CN109614482A (en) Processing method, device, electronic equipment and the storage medium of label
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN104933171B (en) Interest point data association method and device
CN104462590A (en) Information searching method and device
CN103399855B (en) Behavior intention determining method and device based on multiple data sources
CN107665188A (en) A kind of semantic understanding method and device
JP2016157492A (en) Method and apparatus for providing retrieval service interactively displaying type of retrieval target
Baidya et al. LectureKhoj: automatic tagging and semantic segmentation of online lecture videos
Zang et al. Multimodal icon annotation for mobile applications
CN103955480A (en) Method and equipment for determining target object information corresponding to user
CN108763369A (en) A kind of video searching method and device
RU2459242C1 (en) Method of generating and using recursive index of search engines
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
CN111639234B (en) Method and device for mining core entity attention points

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant