CN104217008B - Internet person video interactive annotation method and system - Google Patents

Internet person video interactive annotation method and system

Info

Publication number
CN104217008B
Authority
CN
China
Prior art keywords
face
name
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410475211.0A
Other languages
Chinese (zh)
Other versions
CN104217008A (en)
Inventor
陈智能
白锦峰
冯柏岚
黄向生
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410475211.0A
Publication of CN104217008A
Application granted
Publication of CN104217008B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G06F16/7837 - Retrieval using objects detected or recognised in the video content
    • G06F16/784 - Retrieval where the detected or recognised objects are people

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an interactive annotation method and system for Internet person videos. The method comprises: extracting the face sequences in the video to be annotated and the person names in its surrounding text; using each name as a text keyword and obtaining the corresponding set of web images of that person via a search engine; computing an importance score for each face sequence, a pairwise merge recommendation score for face sequence pairs, and the similarity between the face sequences and the person web images, and using these quantities to determine which face sequences, names and person web images are displayed during annotation; and producing the corresponding annotation actions through a variety of user interactions, thereby annotating the persons in the video. By mining multiple resources related to the video and the persons to be annotated, and by designing friendly and diverse modes of user interaction, the invention simplifies the annotation process and supports annotation decisions, effectively alleviating the problem that the annotation process stalls because the annotator does not recognize the person to be annotated, and substantially improving the efficiency and accuracy of person video annotation.

Description

Internet person video interactive annotation method and system
Technical field
The present invention relates to the technical field of intelligent video analysis, and in particular to an Internet person video interactive annotation method and system.
Background art
With the development of Internet technology and the popularity of online video sharing, large numbers of professional and amateur videos are produced, uploaded to the Internet, and browsed and watched by users all over the world. Because such videos are typically embedded in web pages and served through online players, they are collectively referred to as Internet videos. People are among the most watched subjects of Internet videos: video websites host many popular videos involving people, celebrities in particular, and celebrity names have always been an important component of the most frequent queries submitted to video search engines.
Although Internet person videos attract wide attention, finding videos of a person of interest in a large-scale Internet video library is not easy. Mainstream video search engines currently realize video retrieval by matching text keywords. For person videos, this approach has three shortcomings. 1) The surrounding text of an Internet video (for example its title, tags and user comments) is usually incomplete and noisy: a video in which a person appears is not necessarily tagged with that person's name, and conversely a video tagged with a name does not necessarily show that person; as a result, a search finds only part of the relevant videos, and a certain proportion of the results are noise. 2) The surrounding text describes the whole video rather than fragments within it; jumping directly to the segment in which a person appears, given only the name, is a service that mainstream video websites still cannot provide, although it would clearly be a great convenience for users browsing videos. 3) In the ranked result list, the videos shown first are often not the most relevant, because an accurate query relevance is hard to judge from name occurrence alone. Industry therefore urgently needs more intelligent and effective methods for retrieving, browsing and ranking person videos.
The key to solving the above problems is to annotate the faces appearing in a video with the corresponding names, in other words, to establish a mapping between the faces in a video and the names in its surrounding text; this task is generally referred to as face annotation. Although face detection and name recognition are comparatively mature technologies, face annotation, especially under unconstrained facial pose, expression, illumination and occlusion, remains a very challenging problem. In the past several years, effective face annotation methods have been proposed for specific types of video such as news broadcasts, films and TV series. Although these methods differ in realization, they essentially all follow the technical route of multimodal information fusion. First, they obtain the names of the main persons involved in a video (for example the leading actors of a film) from external channels such as news transcripts, speech transcriptions or the Internet, together with the script and subtitle text of the video; by using the news transcripts, or by aligning script and subtitles, they determine what a particular person says at a particular time point in the video. Combining this with the time points at which faces are detected in the video, they establish a preliminary mapping between faces and names, and then refine this mapping using the visual similarity between faces so as to realize the annotation. Because news transcripts, scripts and subtitle text usually provide rich and specific cues about names and person appearances, and the number of main persons involved in a film or TV series is usually limited, these methods can achieve fully automatic annotation of the main persons in particular news programs, films and TV series with fairly high accuracy.
Internet videos, however, are different from films and TV series. Although there is some textual information on the web page of an Internet video, this text is usually limited in quantity, not accurate enough, and poorly organized. Moreover, it is attached at the level of the whole video and, unlike subtitle text, carries no timestamp information. These characteristics mean that the above methods, which rely on mining rich text, are difficult to generalize directly to Internet videos. In addition, Internet video content covers an extremely wide range: the persons a video may involve come from all sectors of society and are extremely numerous; even restricted to celebrities, the number is not small. At present, fully automatic face annotation for large-scale open Internet video is still at an early stage; because satisfactory annotation performance is hard to reach, no mature methods or systems have yet emerged.
As massive numbers of Internet videos accumulate on video websites, and new videos arrive at an ever faster rate, person video annotation has become a problem placed before academia and industry that must be solved. Consequently, interactive annotation methods, which bring humans into the annotation loop with the goal of improving annotation accuracy, have begun to attract attention. For general visual concepts such as sky, grass and buildings, several effective interactive annotation methods have been proposed, but these methods cannot be applied directly to the problem of distinguishing and annotating different people. The reason is that manually annotating such general visual concepts is relatively easy to realize, since in most cases they can be distinguished by common knowledge alone; distinguishing and annotating different persons is another matter: even an experienced annotator personally recognizes only a very small fraction of the people in the world, and people cannot assign names to persons they do not know. If, as in existing interactive annotation systems, only the image or video frames containing the person and one or more candidate names were presented to the annotating user, the user would very likely not recognize the person to be annotated and could not annotate persons the way general visual concepts are annotated, even if all the persons to be annotated were celebrities. In interactive person annotation, and in annotating persons in video in particular, relevant results remain very rare.
When people see an unfamiliar person in an image or a video and want to know who he or she is, the solution they typically adopt is: find a name in the surrounding text, use the found name as a keyword for an image search engine, and then compare the result images returned by the search engine with the person seen in the image to judge who that person is. This scheme uses image retrieval based on text keywords. Although a few 'search by image' systems now exist, the search target here is images of a specific person, all result images need not be visually highly similar to the query image, and video faces vary greatly in visual appearance and are usually of low resolution, which also challenges the precision of 'search by image' systems; for this task, the text-keyword-based search method is therefore still mainly adopted. Since a large number of person images, celebrity images in particular, can be found through a search engine, this scheme is in many cases an effective way to help users identify a person they did not previously recognize.
This common practice can naturally be borrowed for the design of an interactive annotation method and system for person videos. When performing person annotation, an annotator will likewise run into unfamiliar persons and be forced to pause, resorting to external tools such as search engines to identify the person before the annotation can proceed. Since annotation and search-and-compare operations must be switched between frequently, this process is undoubtedly inefficient and cumbersome. If, instead, the names in the surrounding text of the video could be extracted by text parsing and visual analysis techniques, and the related person web images obtained and displayed accordingly during annotation; and if, at the same time, the faces in the video were analyzed, processed, and displayed in a form convenient for annotation, then the annotator would neither need to switch to a search engine to identify the person to be annotated, nor face raw frames: what the annotator sees are organized, friendly presented video face images on which annotation decisions are easier to make. This would undoubtedly simplify the annotation process and markedly improve the efficiency and accuracy of person video annotation. However, a search of published patent databases found no interactive annotation method or system specifically for persons in video; this background and understanding are precisely the motivation for the present invention.
Summary of the invention
For Internet person video annotation, the present invention addresses the situation in which the annotation process stalls because the annotator very likely does not recognize the person to be annotated. It proposes an Internet person video interactive annotation method and system which, by mining multiple resources related to the video to be annotated and the persons involved, and by designing friendly and diverse modes of user interaction, simplifies the annotation process, supports annotation decisions, improves the efficiency and accuracy of person video annotation, and in turn promotes better retrieval, browsing and ranking services for Internet person videos.
To achieve the above object, the present invention provides an Internet person video interactive annotation method, comprising the following steps:
S1, analyzing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the web image set of the person corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video.
The present invention also proposes an Internet person video interactive annotation system, comprising:
a device for analyzing the video to be annotated and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name in the name set as a text keyword and searching to obtain the web image set of the person corresponding to that name;
a device for computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images corresponding to the names, and for determining, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, the face sequences, names and person web images to be displayed when the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and interactively annotating the face sequences, thereby realizing the annotation of the video.
By mining multiple annotation-supporting resources related to the video to be annotated and the persons involved, and by correspondingly designing friendly and diverse modes of user interaction, the present invention simplifies the annotation process, supports annotation decisions, and effectively alleviates the problem that annotation stalls because the annotator does not recognize the person to be annotated. With the present invention, the efficiency and accuracy of Internet person video annotation can be substantially improved, in turn promoting better retrieval, browsing and ranking services for Internet person videos.
Brief description of the drawings
Fig. 1 is a flowchart of an Internet person video interactive annotation method according to an embodiment of the present invention;
Fig. 2 is a screenshot of an Internet person video interactive annotation system according to an embodiment of the present invention, with an explanation of the related modules.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
As shown in Fig. 1, the Internet person video interactive annotation method of the present invention comprises the following steps:
S1, analyzing the video, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the web image set of the person corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video.
According to a preferred embodiment of the present invention, the detailed procedure of step S1, which analyzes the video and extracts the face sequences in the video and the names in its surrounding text, is as follows:
Step S11, performing shot segmentation on the video, performing face detection and tracking on each resulting shot to obtain the face sequences in that shot, and combining the face sequences obtained from all shots into the face sequence set of the video.
In a specific implementation, shot boundary detection is first performed on the Internet person video to be annotated, and the video is decomposed into a set of shots according to the shot boundary points. The face detection function of the open-source computer vision library OpenCV is then called to perform frame-by-frame face detection on each shot. Next, detection-based tracking is used to group together the faces detected in different video frames that belong to the same person, generating face sequences. Repeating this face sequence generation process for all shots yields the set of all detected face sequences of the video, Ω_F = {F_1, ..., F_FN}, where FN denotes the number of face sequences.
The steps of detection-based tracking for face sequence generation are as follows. First, according to the face detection results, the color histogram feature of each face is extracted, and the pairwise similarities of the faces are computed from it. Then the pairwise similarities are sorted from large to small, and an agglomerative clustering method merges pairs of faces that satisfy the following four conditions: 1) the pairwise similarity of the two faces is greater than a preset merging threshold; 2) in the face set formed by all faces of the two clusters containing the two faces being merged, no two faces appear in the same video frame; 3) the interval between the appearances of the two faces is no more than 1 second; 4) the distance between the center coordinates of the two faces is no more than 2.5 times the face width. This merging process is repeated until no two faces satisfy all four conditions simultaneously, yielding the face clustering result. Finally, the faces belonging to the same cluster are sorted by time of appearance (video frame), faces missed by the detection process are supplemented by interpolation, and complete face sequences are generated. All faces in a face sequence belong to the same person.
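Purely as an illustration of the detection-based tracking just described, the following Python sketch implements the four merge conditions over a list of per-frame detections; the histogram similarity measure, the assumed frame rate, the merge threshold value and the data layout are assumptions of this sketch, not prescriptions of the embodiment.

```python
import itertools
import cv2
import numpy as np

FPS = 25           # assumed frame rate, so "1 second" equals FPS frames
MERGE_THR = 0.6    # assumed value for the preset merge threshold (condition 1)

def color_hist(face_img):
    """8x8x8 color histogram of a face crop, L1-normalized."""
    hist = cv2.calcHist([face_img], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist, norm_type=cv2.NORM_L1).flatten()

def track_faces(dets):
    """dets: list of dicts {frame, cx, cy, w, hist}. Returns face sequences
    as lists of detection indices sorted by frame."""
    sim = lambda i, j: cv2.compareHist(dets[i]["hist"], dets[j]["hist"],
                                       cv2.HISTCMP_CORREL)
    cluster = {i: {i} for i in range(len(dets))}    # detection id -> cluster
    for i, j in sorted(itertools.combinations(range(len(dets)), 2),
                       key=lambda p: -sim(*p)):
        if sim(i, j) <= MERGE_THR:                  # condition 1 fails; pairs
            break                                   # are sorted, so stop here
        ci, cj = cluster[i], cluster[j]
        if ci is cj:
            continue
        if {dets[k]["frame"] for k in ci} & {dets[k]["frame"] for k in cj}:
            continue                                # condition 2: shared frame
        if abs(dets[i]["frame"] - dets[j]["frame"]) > FPS:
            continue                                # condition 3: gap over 1 s
        if np.hypot(dets[i]["cx"] - dets[j]["cx"],
                    dets[i]["cy"] - dets[j]["cy"]) > 2.5 * dets[i]["w"]:
            continue                                # condition 4: centers too far
        merged = ci | cj                            # merge the two clusters
        for k in merged:
            cluster[k] = merged
    groups = {id(c): c for c in cluster.values()}.values()
    return [sorted(c, key=lambda k: dets[k]["frame"]) for c in groups]
```

Interpolation of missed detections, omitted from the sketch, would fill the frame gaps inside each returned sequence.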
The above describes the face sequence acquisition method of one embodiment of the present invention. Of course, face sequences may also be obtained by other face detection and tracking methods; the present invention imposes no restriction on how face sequences are acquired.
Step S12, obtaining the representative face images of each face sequence in the face sequence set.
One embodiment is as follows. For each face sequence, say F_k, where the subscript k denotes the k-th face sequence in the face sequence set, suppose it contains t faces. The color histogram features of these t faces are extracted, and the pairwise face similarity matrix T_k is computed from them, where T_k(i, j) is the similarity between the i-th and j-th faces (i and j natural numbers not exceeding t). The preference parameter is set to the average of all face similarities in T_k, and the Affinity Propagation clustering algorithm is applied to adaptively cluster the t faces. If the clustering produces |F_k| classes, the representative face image set of F_k can be expressed as the |F_k| face images, each being the face image closest to the center of its class.
Of course, the representative face images of a face sequence may also be obtained by other methods; the present invention imposes no restriction on how they are acquired.
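For illustration only, the clustering of step S12 can be realized with an off-the-shelf Affinity Propagation implementation; the use of scikit-learn, the negative-Euclidean-distance similarity and the function name below are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def representative_faces(features):
    """features: (t, d) array of per-face color histogram features of one
    face sequence F_k. Returns the indices of the exemplar faces."""
    diff = features[:, None, :] - features[None, :, :]
    sim = -np.linalg.norm(diff, axis=2)        # pairwise similarity matrix T_k
    ap = AffinityPropagation(affinity="precomputed",
                             preference=sim.mean(),   # mean of all similarities
                             random_state=0)
    labels = ap.fit_predict(sim)
    # One exemplar per class: the face closest to its class center, i.e. the
    # representative face images of F_k.
    return ap.cluster_centers_indices_, labels
```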
Step S13, collecting the surrounding text of the video and extracting person names from it.
The surrounding text of a video refers to the textual content related to the video on the web page where the Internet video resides, including but not limited to the video title, tags, descriptive text and user comments. Considering that different types of surrounding text differ in relevance and noise level, the present invention considers only the video title, the tags, and the user comments longer than 20 words.
Specifically, when the surrounding text is English (as on English video websites), a name extraction method based on matching against a Wikipedia biographical dictionary is employed. For each contiguous word sequence in the above text, starting from the first word, the method tests successively whether the phrase formed by the first n words (n < 4) constitutes a Wikipedia entry; if so, the Wikipedia entry with the largest n is retained, and the test continues from the (n+1)-th word. In this way, Wikipedia entries such as 'Barack Obama' and 'World Cup 2014' can be found in contiguous word sequences. Repeating the above parsing over the title, tags and comment set yields a set of Wikipedia entries. The method then verifies one by one whether these entries are person names. Specifically, it examines the category description section of the Wikipedia page of each entry and queries whether a description category of the form 'xxxx births' exists, where xxxx is a four- or three-digit year. If it exists, the entry is judged to be a person name; otherwise it is judged to be another kind of named entity and is discarded.
The above describes the processing of English text. When the surrounding text of the video is Chinese, Chinese word segmentation is first performed with the Chinese word segmentation tool ICTCLAS, and the above name extraction method based on Wikipedia biographical dictionary matching is then applied (the judgment criterion on the category description section correspondingly becomes the Chinese equivalent of the 'xxxx births' description category). Through the above processing, the name set related to the video, Ω_N = {N_1, ..., N_CN}, is obtained, where N_k denotes the k-th extracted name and CN denotes the number of extracted names.
Because the surrounding text of Internet videos is usually provided by the uploading users, its syntactic structure is loose, word collocations are free, and misspellings and abbreviations are common. The above name extraction method based on Wikipedia biographical dictionary matching does not depend on syntactic structure and has a certain tolerance to misspellings and abbreviations, making it particularly suitable for name extraction from the surrounding text of Internet videos. Of course, other name extraction methods may also be used; the present invention imposes no restriction on the name extraction method.
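For illustration, a greedy longest-match sketch of this extraction follows; it assumes the Wikipedia entry titles and their category lists have been preloaded into local dictionaries (for example from a dump), and the dictionary contents shown are placeholders.

```python
import re

WIKI_TITLES = {"barack obama", "world cup 2014"}               # placeholder
WIKI_CATEGORIES = {"barack obama": ["1961 births"]}            # placeholder
BIRTHS = re.compile(r"^\d{3,4} births$")                       # 'xxxx births'

def extract_names(text, max_n=3):
    words, names, i = text.split(), [], 0
    while i < len(words):
        best = None
        for n in range(1, max_n + 1):          # test n = 1, 2, 3 (n < 4)
            phrase = " ".join(words[i:i + n]).lower()
            if phrase in WIKI_TITLES:
                best = (n, phrase)             # keep the largest matching n
        if best is None:
            i += 1
            continue
        n, phrase = best
        # Keep the entry only if its categories contain an 'xxxx births' class.
        if any(BIRTHS.match(c) for c in WIKI_CATEGORIES.get(phrase, [])):
            names.append(phrase)
        i += n                                 # continue after the match
    return names
```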
Step S2 uses each name in the name set obtained in step S1 as a text keyword and searches to obtain the person web image set corresponding to that name. It specifically comprises the following steps:
Step S21, using each name in the name set obtained in step S1 as a text keyword, searching the web for images related to the text keyword and downloading them.
Specifically, an existing image search engine can be used, for example by calling the application programming interface provided by Google: the text keyword is submitted to the Google image search engine, and the search parameters are set to retrieve 64 images containing faces. Under this setting, the Google image search engine returns to the retrieval end the URLs (web addresses) of the top 64 person images in its ranked results, and the retrieval end then downloads the corresponding images from those URLs. That is, in the ideal case in which all images download normally, this step yields 64 search result images; in practice, each name typically yields between 50 and 64 downloaded images.
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected.
For example, the face detection function of the open-source computer vision library OpenCV can be called to detect faces in the successfully downloaded images. The detection result for an image can be: no face detected, or one or more faces detected. Since images in which multiple faces are detected usually also contain faces of persons other than the queried one, which would disturb the annotator's judgment during reference comparison, this step retains only the images in which exactly one face is detected; images with no detected face or with multiple detected faces are removed.
Step S23, repeating steps S21 and S22 for all names in the name set, obtaining the person web image set corresponding to each name in the name set.
The person web image sets can be denoted C_1, ..., C_CN, where C_k denotes all the person web images corresponding to name N_k.
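As a hedged sketch of steps S21 to S23, the following assumes a wrapper `search_image_urls` around whatever image search API is available (the concrete Google API call of the embodiment is not reproduced), and uses an OpenCV Haar cascade for the single-face filter.

```python
import urllib.request
import cv2
import numpy as np

def search_image_urls(keyword, count=64):
    """Assumed wrapper around an image search API; returns up to count URLs."""
    raise NotImplementedError("plug in the search engine of your choice")

CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def person_web_images(name):
    """Download candidate images for a name (S21) and keep only those in
    which exactly one face is detected (S22)."""
    kept = []
    for url in search_image_urls(name, 64):
        try:
            raw = urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            continue                           # failed downloads are skipped
        img = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)
        if img is None:
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if len(CASCADE.detectMultiScale(gray, 1.1, 5)) == 1:
            kept.append(img)                   # exactly one face: retain
    return kept

# Repeating person_web_images over the whole name set yields C_1, ..., C_CN (S23).
```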
For convenience of the subsequent description of the embodiment, the system composition of the present invention is briefly introduced first. Fig. 2 gives a screenshot of the interactive annotation system. As can be seen, the system interface is divided into four parts: the management region, the annotation region, the annotation reference region and the annotation history region. The management region lets the annotator interactively select the video to be annotated and loads the face sequences and related names of the selected video. The annotation region is further divided into a similar face merge/distinguish annotation subregion and a name-face association annotation subregion, which respectively display the current face sequence pair to be annotated, Q_i = <F_m, F_n>, and the current face sequence to be annotated, F_j, and execute the corresponding interactive annotation operations. In addition, the names most similar to the face sequence shown in the name-face association annotation subregion, each with its first six person web images, are displayed accordingly in the annotation reference region. The annotation history region on the far right displays the annotated name-face two-tuples in annotation order, with the most recently annotated two-tuple shown at the top. The annotation reference region and the annotation history region mainly serve as auxiliary information that helps the annotator make decisions.
Step S3 computes the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name, and determines, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, the face sequences, names and person web images to be displayed when the video is annotated. Step S3 comprises the following sub-steps:
Step S31, computing the saliency value of each face sequence in the face sequence set.
In a video, face sequences that appear for longer and whose faces are larger attract more attention, and are also more likely to belong to a key person of the video. The present invention refers to this property of a face sequence as its saliency, and proposes the following saliency calculation formula:
$$Sai(F_i) = e^{-size_\theta / size_i} + e^{-dura_\theta / dura_i} \qquad (1)$$
where size_i and dura_i are respectively the average face size and the appearance duration of face sequence F_i, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of appearance duration on the saliency. By formula (1), face sequences with long appearance times and large average faces receive large saliency values.
Step S32, computing the pairwise similarities between the face sequences in the face sequence set.
In a video, two face sequences whose appearance times overlap usually correspond to different persons, whereas two face sequences separated by a short interval may well be the same person split into different face sequences by shot changes and similar causes. Based on this understanding, the pairwise similarity of face sequences is computed from their pairwise visual similarity, the interval between their appearance times, and whether their appearance times overlap. The corresponding calculation formula is:
$$sim(F_i, F_j) = e^{-\Delta time_{i,j} / time_\theta} \cdot (1 - CO_{i,j}) \cdot vs(F_i, F_j) \qquad (2)$$
where time_θ is a threshold controlling the influence of the appearance time difference, and Δtime_{i,j} is the appearance time difference of face sequences F_i and F_j, calculated by the following formula (3):
$$\Delta time_{i,j} = \begin{cases} time_j^{beg} - time_i^{end}, & \text{if } time_i^{beg} \le time_j^{beg} \\ time_i^{beg} - time_j^{end}, & \text{if } time_j^{beg} \le time_i^{beg} \end{cases} \qquad (3)$$
In formula (3), time_i^{beg} and time_i^{end} are respectively the start and end times of the appearance of face sequence F_i; a small time value indicates that the face sequence appears in an earlier part of the video. In formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, and calculated as:
$$vs(F_i, F_j) = e^{-\min_{f_i^m \in F_i,\, f_j^n \in F_j,\, i \ne j} \| f_i^m - f_j^n \|} \qquad (4)$$
In formula (4), f_i^m is the facial feature vector of the m-th representative face of face sequence F_i.
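The sketch below transcribes formulas (2) to (4) directly; it assumes each face sequence is stored as a dict holding its start time, end time and an array of representative-face feature vectors, which is a data layout chosen for the sketch only.

```python
import numpy as np

def vs(feats_i, feats_j):
    """Formula (4): similarity of the two most similar representative faces.
    feats_*: (k, d) arrays of facial feature vectors."""
    d = np.linalg.norm(feats_i[:, None, :] - feats_j[None, :, :], axis=2)
    return float(np.exp(-d.min()))

def sim(seq_i, seq_j, time_theta):
    """Formulas (2)-(3). seq_*: dicts with keys 'beg', 'end', 'feats'."""
    if seq_i["beg"] <= seq_j["beg"]:
        dt = seq_j["beg"] - seq_i["end"]       # formula (3), first case
    else:
        dt = seq_i["beg"] - seq_j["end"]       # formula (3), second case
    co = 1.0 if dt < 0 else 0.0                # negative gap means overlap
    return np.exp(-max(dt, 0.0) / time_theta) * (1.0 - co) * vs(
        seq_i["feats"], seq_j["feats"])
```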
Step S33, computing the pairwise merge recommendation scores of the face sequences according to the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation.
Specifically, the following formula is used:
$$MS(F_i, F_j) = (1 - PM_{i,j}) \cdot sim(F_i, F_j) \qquad (5)$$
where PM_{i,j} indicates whether the pair of face sequences F_i and F_j has been 'skipped' or annotated as 'different' by the user during the annotation process: PM_{i,j} = 1 if so, and PM_{i,j} = 0 otherwise. By formula (5), pairs of face sequences that are highly similar and have not been 'skipped' or annotated as 'different' during user annotation are assigned large pairwise merge recommendation scores. On this basis, all face sequence pairs whose score is greater than or equal to a previously given threshold are arranged by MS(F_i, F_j) value from high to low, yielding the pairwise merge recommendation list Rank_MS, whose elements are Q_k = <F_i, F_j>, i ≠ j. During annotation, the similar face merge/distinguish annotation subregion of the system in Fig. 2 displays the face sequence pairs to be annotated according to Rank_MS.
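Continuing the same sketch, formula (5) and the construction of Rank_MS; here `pm` is the set of index pairs the user has 'skipped' or marked 'different', and `threshold` is the previously given score threshold.

```python
def merge_recommendation_list(seqs, time_theta, pm, threshold):
    """Formula (5): rank candidate pairs by merge recommendation score,
    reusing sim() from the previous sketch."""
    scored = []
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if (i, j) in pm:                   # PM_{i,j} = 1 zeroes the score
                continue
            s = sim(seqs[i], seqs[j], time_theta)
            if s >= threshold:
                scored.append(((i, j), s))
    scored.sort(key=lambda x: -x[1])           # Rank_MS, high to low
    return [pair for pair, _ in scored]
```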
Step S34, computing the importance score of each face sequence, using the face sequence saliency values obtained in step S31, the pairwise face sequence similarity scores obtained in step S32, and the user interaction information gathered during annotation.
The importance of a face sequence expresses the degree to which it deserves annotation, after taking into account multiple kinds of information about the face sequences, the video and the user interaction. It can be calculated by the following formula (6):
$$IS(F_i) = (1 - PA_i) \cdot (\overline{Sai_i} + \overline{AR_i}) \qquad (6)$$
where PA_i indicates whether face sequence F_i has been 'skipped' by the user during annotation: PA_i = 1 if so, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are respectively the saliency Sai_i and the accumulated relevance AR_i after min-max normalization, the latter being defined as:
$$AR_i = \sum_{j=1,\, j \ne i}^{FN} L_j \cdot sim(F_i, F_j) \qquad (7)$$
where L_j is the annotation state function of face sequence F_j: L_j = 1 if F_j has already been annotated, and L_j = 0 otherwise. By formula (6), face sequences that have large saliency, are similar to many already annotated faces, and have not been 'skipped' during user annotation are assigned large importance scores.
On this basis, the face sequences are arranged by importance score IS(F_i) from high to low, yielding the importance score list Rank_IS. During annotation, the name-face association annotation subregion of the system in Fig. 2 displays the face sequences to be annotated according to Rank_IS.
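A sketch of formulas (6) and (7) in the same vein, assuming a pairwise similarity matrix with zero diagonal and boolean masks for the L_j and PA_i flags.

```python
import numpy as np

def importance_order(sal, sim_mat, labeled, skipped):
    """sal: per-sequence saliency; sim_mat: pairwise sim (zero diagonal);
    labeled: mask L_j; skipped: mask PA_i. Returns indices in Rank_IS order."""
    ar = sim_mat @ labeled.astype(float)       # formula (7)
    def minmax(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    imp = (1.0 - skipped.astype(float)) * (minmax(sal) + minmax(ar))  # (6)
    return np.argsort(-imp)                    # high to low
```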
Step S35, computing the similarities between the face sequences in the face sequence set and the person web images in the person web image sets, and ranking by similarity, obtaining the ranked name list for each face sequence and the K most similar person web images for each name. In the present invention, K is set to 6.
This step mainly addresses the situation, frequently encountered in interactive annotation, in which the annotator does not recognize the person to be annotated: displaying the above names and person web images in the annotation system alleviates this problem. Specifically, once the face sequence to be shown in the name-face association annotation subregion is determined, the name most similar to it and that name's K most similar person web images are displayed for the annotator to compare against, assisting the decision on the name corresponding to the face sequence to be annotated. The calculation in this step comprises the following three sub-steps:
Step S351, computing the pairwise similarities between the face sequences in the face sequence set and the names in the name set. The visual characteristics of a name can be represented by its corresponding person web images. On this basis, the similarity between face sequence F_i and person web image set C_j is calculated by the following formula (8) and used as the similarity between face sequence F_i and name N_j:
$$vs(F_i, N_j) = vs(F_i, C_j) = \frac{1}{|C_j|} \sum_{n=1}^{|C_j|} vs(F_i, c_j^n) \qquad (8)$$
where
$$vs(F_i, c_j^n) = e^{-\min_{f_i^m \in F_i} \| f_i^m - c_j^n \|} \qquad (9)$$
and c_j^n is the facial feature vector of the n-th image in the person web image set C_j corresponding to name N_j.
Step S352, ranking the names according to the similarities computed in step S351. Generally, the larger the similarity vs(F_i, N_j) between face sequence F_i and name N_j, the larger the probability that F_i is a face of N_j. On this basis, the name set Ω_N is ranked by vs(F_i, N_j) value from high to low, yielding the name ranking Rank(F_i) corresponding to face sequence F_i.
Step S353, computing, for each name, the K person web images most similar to the face sequence. As can be seen from formula (9), the similarity between a person web image and a face sequence is represented by the similarity between that image and the most similar representative face of the face sequence. Therefore, for each pair of face sequence F_i and name N_j, the person web images in C_j are ranked by vs(F_i, c_j^n) value from high to low, and the K most similar images are retained, yielding the person web image list of face sequence F_i relative to name N_j, where K is set to 6.
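A sketch of the three sub-steps of step S35 under the same assumed feature layout; the ranked names and the top-K image indices per name are returned together.

```python
import numpy as np

def name_ranking(seq_feats, web_feats_by_name, k=6):
    """seq_feats: (m, d) representative-face features of one face sequence;
    web_feats_by_name: {name: (n, d) face features of that name's images C_j}."""
    ranked, topk = [], {}
    for name, c in web_feats_by_name.items():
        d = np.linalg.norm(seq_feats[:, None, :] - c[None, :, :], axis=2)
        vs_img = np.exp(-d.min(axis=0))        # formula (9), one value per image
        ranked.append((name, float(vs_img.mean())))    # formula (8)
        topk[name] = np.argsort(-vs_img)[:k]   # the K most similar web images
    ranked.sort(key=lambda x: -x[1])           # step S352: names high to low
    return ranked, topk
```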
According to a preferred embodiment of the present invention, the face sequence importance score list Rank_IS obtained in step S34, together with the similar name list Rank(F_i) and the similar person web image lists obtained in step S35 for each face sequence F_i, are used to produce the corresponding annotation actions through a variety of user interaction operations, thereby realizing the annotation of the persons in the video. Specifically, step S4 comprises the following steps:
Step S41, initializing the various resources involved in the annotation process.
The specific practice is:
S411, letting ULSets = Ω_F, i.e., initializing the set of unannotated face sequences to contain all face sequences of the video;
S412, automatically annotating the face sequence pairs Q_i = <F_m, F_n> that satisfy the condition of formula (10), and removing all pairs so annotated from the Rank_MS list (a sketch of this step is given after step S414 below):
$$Label(F_i) = Label(F_j), \quad \text{if } vs(F_i, F_j) \ge T_s \qquad (10)$$
where Label(F_i) denotes the name annotated on face sequence F_i, and T_s is a threshold expressing whether two face sequences are visually similar enough;
S413, taking out the top-ranked elements Q_i = <F_m, F_n> and F_j from Rank_MS and Rank_IS respectively, i.e., the face sequence pair with the current highest pairwise merge score and the face sequence with the current highest importance score, and displaying these resources in the annotation system;
S414, taking out the top-ranked name in Rank(F_j) together with the K images in its person web image list, and displaying these resources in the annotation system.
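The equal-label constraint of formula (10) in step S412 can be maintained, for example, with a union-find structure over face sequences, as sketched below; the data structure choice is an assumption of the sketch, not part of the embodiment.

```python
class LabelGroups:
    """Union-find: sequences in one group must receive the same label."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i
    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def auto_merge(n, vs_mat, ts):
    """S412: pairs with vs >= T_s (formula (10)) are constrained to share
    whatever label is later assigned."""
    groups = LabelGroups(n)
    for i in range(n):
        for j in range(i + 1, n):
            if vs_mat[i, j] >= ts:
                groups.union(i, j)             # Label(F_i) = Label(F_j)
    return groups
```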
Step S42, producing the corresponding annotation actions according to the various user interaction operations.
There are three classes of user interaction operations: 1) similar face merge/distinguish annotation operations, which annotate the face sequence pair Q_i = <F_m, F_n> displayed by the system as 'same' or 'different'; 2) name-face association annotation operations, which select a particular name to annotate face sequence F_j; 3) operations that select different names and their person web images for the annotator's reference. Of these three classes, the third is an auxiliary operation whose purpose is to provide information supporting the user's annotation decisions, while the second annotates the corresponding name on F_j and removes F_j from the unannotated face sequence set ULSets. The annotation actions corresponding to the three classes of interactive operations are respectively:
1) annotation actions for similar face merge/distinguish annotation operations:
a) if the user annotates Q_i with the 'same' option, then let Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user annotates Q_i with the 'different' option, then let Label(F_m) ≠ Label(F_n), and at the same time let PM_{m,n} = 1;
c) if the user selects the 'skip' option for Q_i, then let PM_{m,n} = 1;
2) annotation actions for name-face association annotation operations:
a) if the user selects name N_k to annotate F_j, then let ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects 'skip' for the annotation of F_j, then let PA_j = 1;
3) actions for name and person web image selection operations:
a) if the user clicks the 'previous' option, let k = k - 1 (when k > 1), and display the name ranked k-th in Rank(F_j) together with the K images in its person web image list;
b) if the user clicks the 'next' option, let k = k + 1 (when k < CN), and display the name ranked k-th in Rank(F_j) together with the K images in its person web image list.
Step S43, annotating the other unannotated face sequences using a label propagation algorithm.
The user's interactive annotation actions provide additional annotation cues. Therefore, a label propagation algorithm is used to automatically annotate the other unannotated face sequences F_i that satisfy the conditions described by formula (11) or (12), where T_s is the similarity threshold defined in formula (10).
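Formulas (11) and (12) appear as images in the source and are not reproduced here; as a hedged sketch only, a propagation rule in the spirit of formula (10) would spread an assigned label to unannotated sequences that are visually close enough.

```python
def propagate(labels, ulsets, vs_mat, ts):
    """labels: {seq index: name}; ulsets: set of unannotated indices;
    vs_mat: pairwise visual similarity; ts: the threshold of formula (10)."""
    changed = True
    while changed:
        changed = False
        for i in list(ulsets):
            for j, name in list(labels.items()):
                if i != j and vs_mat[i, j] >= ts:
                    labels[i] = name           # propagate the label to F_i
                    ulsets.discard(i)
                    changed = True
                    break
    return labels
```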
Step S44, pruning and reordering the pairwise merge recommendation list and the importance score list, deciding the resources to be presented in the next round of user annotation.
Through steps S42 and S43, some face sequences in the pairwise merge recommendation list Rank_MS and in the importance score list Rank_IS have been annotated. According to the annotation results, this step prunes and reorders Rank_MS and Rank_IS, deciding the resources presented in the next round of user annotation. The specific practices of pruning and reordering are respectively:
1) Pruning: the elements Q_i = <F_m, F_n> and F_j satisfying the conditions of formulas (13), (14) or (15) are removed from Rank_MS and Rank_IS respectively; in particular,
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } Label(F_m) = Label(F_n) \qquad (14)$$
$$Rank_{IS} = Rank_{IS} \setminus F_j, \quad \text{if } F_j \notin ULSets \qquad (15)$$
2) Reordering: for the remaining elements of Rank_MS and Rank_IS, the pairwise merge recommendation scores and the importance scores are recalculated using formulas (5) and (6) respectively, and the lists are re-sorted by score to regenerate Rank_MS and Rank_IS, which serve as the basis for resource display in the next round of interactive annotation.
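A sketch of this pruning and reordering; it implements conditions (14) and (15) shown above (formula (13) is not reproduced in the source and is omitted), and takes the rescoring functions of formulas (5) and (6) as callables.

```python
def next_round(rank_ms, rank_is, labels, ulsets, rescore_ms, rescore_is):
    """Prune resolved entries, then re-sort both lists by fresh scores."""
    rank_ms = [(m, n) for (m, n) in rank_ms
               if not (m in labels and labels.get(m) == labels.get(n))]  # (14)
    rank_is = [j for j in rank_is if j in ulsets]                        # (15)
    rank_ms.sort(key=lambda p: -rescore_ms(*p))    # formula (5)
    rank_is.sort(key=lambda j: -rescore_is(j))     # formula (6)
    return rank_ms, rank_is
```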
Step S45, repeating steps S42 to S44 until all unannotated face sequences have been annotated (i.e., ULSets = ∅), or until the user actively exits the annotation process.
The specific embodiments described above further elaborate the object, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (15)

1. An Internet person video interactive annotation method, characterized in that it comprises the following steps:
S1, analyzing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name in the name set obtained in step S1 as a text keyword, searching to obtain the person web image set corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merge recommendation scores of face sequence pairs, and the similarity scores between the face sequences and the person web images obtained in step S2 for each name; and, according to the importance scores, the pairwise merge recommendation scores and the similarity scores, determining the face sequences, names and person web images to be displayed when the video is annotated;
S4, according to the face sequences, names and person web images displayed in step S3, interactively annotating the face sequences, thereby realizing the annotation of the video,
wherein step S3 comprises the following steps:
Step S31, computing the saliency value of each face sequence in the face sequence set;
Step S32, computing the pairwise similarities between the face sequences in the face sequence set;
Step S33, computing the pairwise merge recommendation scores of the face sequences according to the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation;
Step S34, computing the importance score of each face sequence, using the face sequence saliency values obtained in step S31, the pairwise face sequence similarity scores obtained in step S32, and the user interaction information gathered during annotation;
Step S35, computing the similarities between the face sequences in the face sequence set and the person web images in the person web image sets, and ranking by similarity, obtaining the ranked name list for each face sequence and the K most similar person web images for each name;
and step S4 comprises the following steps:
Step S41, initializing the various resources involved in the annotation process;
Step S42, producing the corresponding annotation actions according to the various user interaction operations;
Step S43, annotating the other unannotated face sequences using a label propagation algorithm;
Step S44, pruning and reordering the pairwise merge recommendation list and the importance score list, deciding the resources to be presented in the next round of user annotation;
Step S45, repeating steps S42 to S44 until all unannotated face sequences have been annotated,
wherein in step S34 the importance score of a face sequence is calculated by the following formula (1):
$$IS(F_i) = (1 - PA_i) \cdot (\overline{Sai_i} + \overline{AR_i}) \qquad (1)$$
where PA_i indicates whether face sequence F_i has been 'skipped' by the user during annotation: PA_i = 1 if so, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are respectively the saliency Sai_i and the accumulated relevance AR_i after min-max normalization, the latter being defined as:
$$AR_i = \sum_{j=1,\, j \ne i}^{FN} L_j \cdot sim(F_i, F_j) \qquad (2)$$
where FN denotes the number of face sequences, sim(F_i, F_j) denotes the similarity of face sequences F_i and F_j, and L_j is the annotation state function of face sequence F_j: L_j = 1 if F_j has already been annotated, and L_j = 0 otherwise;
the face sequences are arranged by importance score IS(F_i) from high to low, yielding the importance score list Rank_IS;
and the various user interaction operations in step S42 include: 1) similar face merge/distinguish annotation operations, which annotate the face sequence pair Q_i = <F_m, F_n> displayed by the system as 'same' or 'different'; 2) name-face association annotation operations, which select a particular name to annotate face sequence F_j; 3) interactive operations that select and display different names and their person web images.
2. The method according to claim 1, characterized in that step S1 comprises the following steps:
Step S11, performing shot segmentation on the video, performing face detection and tracking on each resulting shot to obtain the face sequences in that shot, and combining the face sequences obtained from all shots into the face sequence set of the video;
Step S12, obtaining the representative face images of each face sequence in the face sequence set;
Step S13, collecting the surrounding text of the video and extracting person names from it.
3. The method according to claim 1, characterized in that step S2 comprises the following steps:
Step S21, using each name in the name set obtained in step S1 as a text keyword, searching the web for images related to the text keyword and downloading them;
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected;
Step S23, repeating steps S21 and S22 for all names in the name set, obtaining the person web image set corresponding to each name in the name set.
4. The method according to claim 1, characterized in that in step S31 the saliency of face sequence F_i is calculated by the following formula (3):
$$Sai(F_i) = e^{-size_\theta / size_i} + e^{-dura_\theta / dura_i} \qquad (3)$$
where size_i and dura_i are respectively the average face size and the appearance duration of face sequence F_i, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of appearance duration on the saliency.
5. The method according to claim 1, characterized in that in step S32 the pairwise similarity of face sequences is calculated by the following formula (4):
$$sim(F_i, F_j) = e^{-\Delta time_{i,j} / time_\theta} \cdot (1 - CO_{i,j}) \cdot vs(F_i, F_j) \qquad (4)$$
where time_θ is a threshold controlling the influence of the appearance time difference, and Δtime_{i,j} is the appearance time difference of face sequences F_i and F_j, calculated by the following formula (5):
$$\Delta time_{i,j} = \begin{cases} time_j^{beg} - time_i^{end}, & \text{if } time_i^{beg} \le time_j^{beg} \\ time_i^{beg} - time_j^{end}, & \text{if } time_j^{beg} \le time_i^{beg} \end{cases} \qquad (5)$$
in formula (5), time_i^{beg} and time_i^{end} are respectively the start and end times of the appearance of face sequence F_i, and a small time value indicates that the face sequence appears in an earlier part of the video;
in formula (4), CO_{i,j} is a binary function indicating whether the appearance times of face sequences F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, calculated as:
$$vs(F_i, F_j) = e^{-\min_{f_i^m \in F_i,\, f_j^n \in F_j,\, i \ne j} \| f_i^m - f_j^n \|} \qquad (6)$$
in formula (6), f_i^m is the facial feature vector of the m-th representative face of face sequence F_i.
6. The method according to claim 5, characterized in that in step S33 the pairwise merge recommendation score of face sequences is calculated by the following formula (7):
$$MS(F_i, F_j) = (1 - PM_{i,j}) \cdot sim(F_i, F_j) \qquad (7)$$
where PM_{i,j} indicates whether the pair of face sequences F_i and F_j has been 'skipped' or annotated as 'different' by the user during the annotation process: PM_{i,j} = 1 if so, and PM_{i,j} = 0 otherwise; by formula (7), pairs of face sequences that are highly similar and have not been 'skipped' or annotated as 'different' during user annotation are assigned large pairwise merge recommendation scores; on this basis, all face sequence pairs whose score is greater than or equal to a previously given threshold are arranged by MS(F_i, F_j) value from high to low, yielding the pairwise merge recommendation list Rank_MS, whose elements are Q_k = <F_i, F_j>, i ≠ j.
7. The method according to claim 6, characterized in that step S35 comprises the following steps:
Step S351, computing the pairwise similarities between the face sequences in the face sequence set and the names in the name set;
Step S352, ranking the names according to the similarities computed in step S351;
Step S353, computing, for each name, the K person web images most similar to the face sequence.
8. according to the method for claim 7, it is characterised in that
The step S351 calculates face sequence F by equation below (8)iWith personage's network image set CjSimilitude, be used in combination The similitude is as face sequence FiWith name NjSimilitude:
<mrow> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>N</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mo>|</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>|</mo> </mrow> </mfrac> <munderover> <mo>&amp;Sigma;</mo> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mo>|</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>|</mo> </mrow> </munderover> <mi>v</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>F</mi> <mi>i</mi> </msub> <mo>,</mo> <msubsup> <mi>c</mi> <mi>j</mi> <mi>n</mi> </msubsup> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow>
Wherein
$$vs(F_i, c_j^n) = e^{-\min_{f_i^m \in F_i} \left\| f_i^m - c_j^n \right\|} \qquad (9)$$
In formula (9), c_j^n is the facial feature vector of the n-th image in person web-image set C_j.
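Formulas (8) and (9) can be sketched as follows, assuming F_i is an array of representative-face features and C_j an array of per-image facial features; the helper names are illustrative:

```python
import numpy as np

def vs_face_to_image(Fi, c_jn):
    """Formula (9): similarity between face sequence Fi (shape (m, d))
    and one web image's facial feature vector c_jn (shape (d,)),
    via the closest representative face."""
    dists = np.linalg.norm(Fi - c_jn[None, :], axis=-1)
    return float(np.exp(-dists.min()))

def vs_face_to_name(Fi, Cj):
    """Formula (8): average the per-image similarities over the person
    web-image set Cj (shape (|Cj|, d)) to score Fi against name Nj."""
    return float(np.mean([vs_face_to_image(Fi, c) for c in Cj]))
```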
9. The method according to claim 8, characterised in that
in step S352, for each face sequence F_i, the names are ranked by vs(F_i, N_j) value from high to low, yielding a name list Rank(F_i) over all names, where CN denotes the number of extracted names.
10. The method according to claim 9, characterised in that
in step S353, for each pair of face sequence F_i and name N_j, the person web images in C_j are ranked by vs(F_i, c_j^n) value from high to low and the K most similar images are retained, yielding the person web-image list corresponding to F_i and N_j.
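Steps S352 and S353 then reduce to sorting by these scores. A sketch reusing the helpers above, with illustrative names:

```python
import numpy as np

def rank_names(Fi, image_sets):
    """Step S352: names sorted by vs(Fi, Nj) from high to low.
    `image_sets` maps each name to its web-image feature array Cj."""
    scores = {name: vs_face_to_name(Fi, Cj)
              for name, Cj in image_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_images(Fi, Cj, K):
    """Step S353: the K web images of one name most similar to Fi,
    ranked by vs(Fi, c_j^n) from high to low."""
    scores = np.array([vs_face_to_image(Fi, c) for c in Cj])
    order = np.argsort(scores)[::-1][:K]
    return Cj[order]
```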
11. The method according to claim 10, characterised in that step S41 includes:
S411: initialize ULSets as the set of all face sequences, where ULSets denotes the set of face sequences that have not yet been labeled;
S412: automatically label the face sequence combinations Q_i = ⟨F_m, F_n⟩ that satisfy the condition shown in formula (10), and remove all labeled combinations from the Rank_MS list:
$$Label(F_i) = Label(F_j), \quad \text{if } vs(F_i, F_j) \geq T_s \qquad (10)$$
where Label(F_i) denotes the name corresponding to face sequence F_i, Label(F_j) denotes the name corresponding to face sequence F_j, and T_s is a threshold indicating whether two face sequences are sufficiently similar visually;
S413: take the top-ranked elements Q_i = ⟨F_m, F_n⟩ and F_j from Rank_MS and Rank_IS respectively, i.e. the face sequence combination with the current highest pairwise merging score and the face sequence with the highest importance score, and display these resources in the annotation system;
S414: take the top-ranked name in Rank(F_j) and the K images in the corresponding person web-image list, and display these resources in the annotation system.
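A simplified sketch of one round of steps S412-S414, assuming precomputed pair similarities and list-of-pairs data structures; it mirrors the control flow of the claim, with illustrative names:

```python
def prepare_round(rank_ms, rank_is, vs_pairs, merged, Ts):
    """rank_ms: list of ((m, n), score) sorted high to low;
    rank_is: face-sequence ids sorted by importance;
    vs_pairs: dict mapping (m, n) to vs(Fm, Fn);
    merged: set collecting pairs auto-marked as the same person."""
    remaining = []
    for pair, score in rank_ms:
        if vs_pairs[pair] >= Ts:
            merged.add(pair)     # S412 / formula (10): same-person merge
        else:
            remaining.append((pair, score))
    # S413: top merge candidate and most important face sequence
    # are surfaced for display in the annotation system.
    top_pair = remaining[0][0] if remaining else None
    top_face = rank_is[0] if rank_is else None
    return remaining, top_pair, top_face
```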
12. The method according to claim 11, characterised in that the annotation behaviors corresponding to the various user interactions in step S42 are respectively:
1) merging/distinguishing behaviors corresponding to the similar-face marking operation:
a) if the user marks Q_i with the "same" option, let Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user marks Q_i with the "different" option, let Label(F_m) ≠ Label(F_n), and at the same time let PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, let PM_{m,n} = 1;
2) name labeling behaviors corresponding to the face-name association operation:
a) if the user chooses to label F_j with name N_k, let ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects the "skip" option for F_j, let PA_j = 1;
3) behaviors corresponding to the name and person web-image selection operations:
a) if the user clicks the "previous" option, let k = k − 1 (when k > 1), and display the k-th ranked name and the K images in its corresponding person web-image list;
b) if the user clicks the "next" option, let k = k + 1 (when k < CN), and display the k-th ranked name and the K images in its corresponding person web-image list.
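The annotation behaviors of claim 12 amount to small state updates. A sketch, with `labels`, `penalized` and `skipped_faces` standing in for Label(·), PM and PA (all names illustrative):

```python
def handle_pair_mark(action, pair, labels, penalized):
    """Item 1): "same" merges labels; "different" and "skip" set
    PM_mn = 1 so the pair is not recommended again (the label
    inequality of the "different" case is tracked via `penalized`)."""
    m, n = pair
    if action == "same":
        # Merge to whichever label already exists (None if neither yet).
        labels[m] = labels[n] = labels.get(m) or labels.get(n)
    elif action in ("different", "skip"):
        penalized.add((m, n))

def handle_name_mark(action, face, name, labels, unlabeled, skipped_faces):
    """Item 2): labeling Fj with Nk removes it from ULSets;
    skipping sets PA_j = 1."""
    if action == "label":
        unlabeled.discard(face)
        labels[face] = name
    elif action == "skip":
        skipped_faces.add(face)
```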
13. The method according to claim 12, characterised in that in step S43, the specific practice of automatically labeling other unlabeled face sequences F_i that satisfy certain conditions is as shown in formula (11) or (12):
$$\begin{cases} Label(F_i) = N_k \\ ULSets = ULSets \setminus F_i \end{cases} \quad \text{if } \begin{cases} F_i \in ULSets \\ vs(F_i, F_j) \geq T_s \\ Label(F_j) = N_k \end{cases} \qquad (11)$$
$$Label(F_i) = Label(F_j), \quad \text{if } \begin{cases} F_i \in ULSets \\ F_j \in ULSets \\ vs(F_i, F_j) \geq T_s \end{cases} \qquad (12)$$
where T_s is the similarity threshold defined in formula (10).
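A sketch of the label propagation of formula (11), assuming face sequences are referenced by identifiers and `vs` is a similarity callable (illustrative names); formula (12) would analogously equate the labels of two sufficiently similar unlabeled sequences:

```python
def propagate_labels(labeled_face, name, unlabeled, labels, vs, Ts):
    """Formula (11): after Fj is labeled Nk, every unlabeled Fi with
    vs(Fi, Fj) >= Ts inherits the label and leaves ULSets."""
    for face in list(unlabeled):
        if vs(face, labeled_face) >= Ts:
            labels[face] = name          # Label(Fi) = Nk
            unlabeled.discard(face)      # ULSets = ULSets \ Fi
```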
14. The method according to claim 13, characterised in that in step S44, according to the annotation results, the specific practice of pruning and reordering Rank_MS and Rank_IS is:
1) pruning: delete from Rank_MS and Rank_IS respectively the elements Q_i = ⟨F_m, F_n⟩ and F_j that satisfy the conditions of the following formulas (13), (14) or (15):
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } \begin{cases} F_m \notin ULSets \\ F_n \notin ULSets \end{cases} \qquad (13)$$
$$Rank_{MS} = Rank_{MS} \setminus Q_i, \quad \text{if } Label(F_m) = Label(F_n) \qquad (14)$$
$$Rank_{IS} = Rank_{IS} \setminus F_j, \quad \text{if } F_j \notin ULSets \qquad (15)$$
2) reordering: for the elements remaining in Rank_MS and Rank_IS, recalculate their pairwise merging recommendation scores and importance scores using formulas (7) and (1), and regenerate Rank_MS and Rank_IS accordingly, to serve as the basis for resource display in the next round of interactive annotation.
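The pruning and reordering of claim 14 can be sketched as follows, with `merge_score` and `importance_score` standing in for formulas (7) and (1) (all names illustrative):

```python
def prune_and_rerank(rank_ms, rank_is, unlabeled, labels,
                     merge_score, importance_score):
    """Prune per formulas (13)-(15), then recompute scores and re-sort.
    rank_ms: list of ((m, n), score); rank_is: face-sequence ids."""
    def same_label(m, n):
        return labels.get(m) is not None and labels.get(m) == labels.get(n)

    # (13): both members already labeled; (14): members share a label.
    kept_pairs = [(m, n) for (m, n), _ in rank_ms
                  if (m in unlabeled or n in unlabeled)
                  and not same_label(m, n)]
    # (15): drop already-labeled sequences from the importance list.
    kept_faces = [f for f in rank_is if f in unlabeled]
    # Reorder with freshly recomputed scores for the next round.
    rank_ms = sorted(((p, merge_score(*p)) for p in kept_pairs),
                     key=lambda item: item[1], reverse=True)
    rank_is = sorted(kept_faces, key=importance_score, reverse=True)
    return rank_ms, rank_is
```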
15. An internet person video interactive annotation system employing the internet person video interactive annotation method of any one of claims 1-14, characterised by comprising:
a device for analyzing the video to be annotated and extracting the face sequence set in the video and the name set in the video's surrounding text;
a device for using the names in the name set as text keywords and searching to obtain the person web-image sets corresponding to the names;
a device for calculating the importance scores of the face sequences, the pairwise merging recommendation scores of the face sequences, and the similarity scores between the face sequences and the person web images corresponding to the names, and for determining, according to the importance scores, the pairwise merging recommendation scores and the similarity scores, the face sequences, names and person web images to be shown when the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and interactively annotating the face sequences, thereby realizing the annotation of persons in the video.
CN201410475211.0A 2014-09-17 2014-09-17 Internet personage video interactive mask method and system Active CN104217008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410475211.0A CN104217008B (en) 2014-09-17 2014-09-17 Internet personage video interactive mask method and system

Publications (2)

Publication Number Publication Date
CN104217008A CN104217008A (en) 2014-12-17
CN104217008B (en) 2018-03-13

Family

ID=52098498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410475211.0A Active CN104217008B (en) 2014-09-17 2014-09-17 Internet personage video interactive mask method and system

Country Status (1)

Country Link
CN (1) CN104217008B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809096A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Figure labeling method and terminal
CN104796781B (en) * 2015-03-31 2019-01-18 小米科技有限责任公司 Video clip extracting method and device
US10405045B2 (en) * 2015-12-14 2019-09-03 Google Llc Systems and methods for estimating user attention
CN106227836B (en) * 2016-07-26 2020-07-14 上海交通大学 Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN109214247B (en) * 2017-07-04 2022-04-22 腾讯科技(深圳)有限公司 Video-based face identification method and device
CN107480236B (en) * 2017-08-08 2021-03-26 深圳创维数字技术有限公司 Information query method, device, equipment and medium
CN107832662B (en) * 2017-09-27 2022-05-27 百度在线网络技术(北京)有限公司 Method and system for acquiring image annotation data
CN108882033B (en) * 2018-07-19 2021-12-14 上海影谱科技有限公司 Character recognition method, device, equipment and medium based on video voice
CN111046235B (en) * 2019-11-28 2022-06-14 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111144306A (en) * 2019-12-27 2020-05-12 联想(北京)有限公司 Information processing method, information processing apparatus, and information processing system
CN111126069B (en) * 2019-12-30 2022-03-29 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111639599B (en) * 2020-05-29 2024-04-02 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739428A (en) * 2008-11-10 2010-06-16 中国科学院计算技术研究所 Method for establishing index for multimedia
CN102629275A (en) * 2012-03-21 2012-08-08 复旦大学 Face and name aligning method and system facing to cross media news retrieval
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A semi-automatic photo person annotation system based on a friendly interaction mode; Zhang Jie et al.; Proceedings of the 8th Joint Conference on Harmonious Human-Machine Environment (HHME2012), NCMT; 2014-05-30; pp. 1, 4-6 *
Design and implementation of a web image annotation framework based on user interaction; Gao Xinxin; China Master's Theses Full-text Database, Information Science and Technology; 2011-03-15 (No. 3); I138-1075 *

Similar Documents

Publication Publication Date Title
CN104217008B (en) Internet personage video interactive mask method and system
CN110968699B (en) Logic map construction and early warning method and device based on fact recommendation
Arulanandam et al. Extracting crime information from online newspaper articles
US8935197B2 (en) Systems and methods for facilitating open source intelligence gathering
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
Foley et al. Learning to extract local events from the web
CN106462640B (en) Contextual search of multimedia content
CN103544266B (en) A kind of method and device for searching for suggestion word generation
Nguyen et al. LifeSeeker 3.0: An Interactive Lifelog Search Engine for LSC'21
CN106204156A (en) A kind of advertisement placement method for network forum and device
US20120323905A1 (en) Ranking data utilizing attributes associated with semantic sub-keys
CN113722478B (en) Multi-dimensional feature fusion similar event calculation method and system and electronic equipment
CN109446399A (en) A kind of video display entity search method
Jahagirdar et al. Watching the news: Towards videoqa models that can read
US20120317141A1 (en) System and method for ordering of semantic sub-keys
CN103823868B (en) Event recognition method and event relation extraction method oriented to on-line encyclopedia
CN109783612B (en) Report data positioning method and device, storage medium and terminal
CN103020311B (en) A kind of processing method of user search word and system
US9875298B2 (en) Automatic generation of a search query
CN106874365A (en) Tracking based on social event on Social Media platform
CN105205075B (en) From the name entity sets extended method of extension and recommended method is inquired based on collaboration
Guo et al. Multi-modal identification of state-sponsored propaganda on social media
Arya et al. Predicting behavioural patterns in discussion forums using deep learning on hypergraphs
CN114880572A (en) Intelligent news client recommendation system
US20120317103A1 (en) Ranking data utilizing multiple semantic keys in a search query

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant