CN104217008A - Interactive annotation method and system for Internet person videos

Info

Publication number
CN104217008A (application CN201410475211.0A; granted as CN104217008B)
Authority
CN
China
Prior art keywords
people
face
name
sequence
video
Legal status
Granted
Application number
CN201410475211.0A
Other languages
Chinese (zh)
Other versions
CN104217008B (en)
Inventor
陈智能
白锦峰
冯柏岚
黄向生
徐波
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410475211.0A
Publication of CN104217008A
Application granted
Publication of CN104217008B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval using metadata automatically derived from the content
    • G06F 16/7837 Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F 16/784 Retrieval using metadata automatically derived from the content, the detected or recognised objects being people

Abstract

The invention discloses an interactive annotation method and system for Internet person videos. The method comprises the following steps: extracting the face sequences of a video to be annotated and the person names in its surrounding text; using each name as a text keyword and obtaining the corresponding set of person web images through a search engine; computing the saliency score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarities between face sequences and person web images; determining, from these scores, which face sequences, names and person web images are displayed during annotation; and generating the corresponding annotation actions from several types of user interaction, thereby annotating the people in the video. By mining several kinds of resources related to the video to be annotated and to its people, and by designing friendly and varied modes of user interaction, the method simplifies the annotation process and assists annotation decisions; it effectively alleviates the problem that annotation stalls because the annotator does not recognise the people to be annotated, and it can substantially improve both the efficiency and the precision of person-video annotation.

Description

Interactive annotation method and system for Internet person videos
Technical field
The present invention relates to the field of intelligent video analysis, and in particular to an interactive annotation method and system for Internet person videos.
Background art
With the development of Internet technology and the popularity of online video sharing, large numbers of professional and amateur videos are produced, uploaded to the Internet, and browsed and watched by users all over the world. Because such videos are usually embedded in web pages and served for online playback, they are collectively called Internet videos. People are one of the subjects of Internet video that attract the most attention: video websites host large numbers of popular videos involving people, celebrities in particular, and celebrity names have long been an important component of the hot queries submitted to video search engines.
Although Internet person videos receive wide attention, finding the videos of a person of interest in a large-scale Internet video library is not easy. Mainstream video search engines currently realise video retrieval by matching text keywords. For person videos, this search method has three deficiencies. 1) The surrounding text of an Internet video (for example its title, tags and user comments) is usually incomplete and noisy: a video in which a person appears does not necessarily mention that person's name, and a video whose text mentions a name does not necessarily show that person; as a result, search can find only part of the relevant videos, and the result list usually contains a certain proportion of noise. 2) The surrounding text describes the video as a whole rather than its segments; jumping directly, given only a name, to the segments in which that person appears is still not a service mainstream video websites can provide, although such a service would clearly be of great convenience to users browsing videos. 3) The videos ranked highest in the result list are usually not the most relevant ones, because an accurate relevance judgement is hard to obtain from the mere presence of a name. Industry therefore urgently needs more intelligent and effective methods for retrieving, browsing and ranking person videos.
The key to the above problems is to annotate each face appearing in a video with its corresponding person name; in other words, to establish a mapping from the faces in the video to the names in the surrounding text. This task is generally called face annotation. Although face detection and name recognition are comparatively mature technologies, face annotation remains a challenging problem, especially under unconstrained pose, expression, illumination and occlusion. In the past several years, some effective face annotation methods have been proposed for particular video genres such as news video, films and TV series. Although these methods differ in their realisation, they essentially all adopt a multimodal information fusion approach. First, the names of the principal people of a video (for example the leading actors of a film) are obtained from external channels such as news transcripts, speech transcription text or the Internet, and by exploiting news transcripts or by aligning the script with the subtitle text, the content spoken by a particular person at a particular time point of the video is obtained. From the time points at which faces are detected in the video, a preliminary mapping between faces and names is established; this mapping is then refined using the visual similarity between faces, realising the annotation. Because news transcripts, scripts and subtitle text usually provide rich and concrete cues about names and person appearances, and the number of principal people in films and TV series is usually rather limited, these methods can automatically annotate the principal people of particular news programmes, films and TV series with fairly high precision.
Internet videos, however, differ from films and TV series. Although the web page of an Internet video also carries some text, that text is usually limited in quantity, not accurate enough, and poorly organised. Moreover, it refers to the whole video and, unlike subtitle text, carries no timestamp information. These characteristics mean that the above methods, which rely on mining rich text, are difficult to generalise directly to Internet videos. In addition, Internet video content is extremely diverse: the people a video may involve span all sectors of society and are extremely numerous, and even if only celebrities are considered, their number is far from small. At present, automatic face annotation for large-scale open Internet video is still at a preliminary stage; because good annotation results are hard to achieve, no mature method or system in this area has yet emerged.
As massive numbers of Internet videos accumulate on video websites and new videos arrive at an ever faster rate, person-video annotation has again become a problem that academia and industry have to face and solve. Interactive annotation methods, which bring a human into the annotation loop in order to improve annotation accuracy, have therefore begun to attract attention. Some effective interactive annotation methods have been proposed for generic visual concepts such as sky, grass and buildings, but these methods cannot be applied directly to the problem of distinguishing and annotating different people. The reason is that generic visual concepts are comparatively easy to annotate manually, since in most cases common knowledge suffices to distinguish them; but when different people must be told apart, even an experienced annotator usually recognises only very few of the world's people, and nobody can assign a name to a person they do not know. If, as in existing interactive annotation systems, only the image or video frames containing a person and one or more candidate names were submitted to the annotating user, the user, very likely unfamiliar with the person to be annotated, could hardly annotate people the way generic visual concepts are annotated in images, even if all the people to be annotated were celebrities. On interactive person annotation, and in particular on annotating the people in videos, relevant results are still very rare.
Observe what people do when they see an unfamiliar person in an image or video. To find out who the person is, the usual solution is: find a name in the surrounding text, use that name as a keyword in an image search engine, and compare the result images returned by the search engine with the person in the picture at hand, thereby judging who the person in the image is. This scheme uses image retrieval based on text keywords. Although a minority of "search by image" systems now exist, the search target here is the image of a particular person, the result images are not required to be visually similar to a query image, and video faces vary widely in appearance and are usually of low resolution, all of which challenges the precision of "search by image" systems; retrieval based on text keywords therefore remains the method mainly adopted for this task. Since a search engine can find a large number of person images, of celebrities in particular, this scheme is in many cases an effective way of helping a user get to know a previously unfamiliar person.
This human practice naturally lends itself to the design of an interactive annotation method and system for person videos. While annotating people, an annotator will likewise run into unfamiliar people and be forced to pause the annotation, learn about the person with external tools such as a search engine, and only then resume the annotation process. Because of the frequent switching between annotation and search-and-compare operations, this process is undoubtedly inefficient and tedious. Suppose instead that text parsing and visual analysis were used to extract the names in the surrounding text of the video, fetch the corresponding person web images, and display them during annotation, while the faces in the video were analysed, processed and shown in a form that is easy to annotate. The annotator would then neither have to switch to a search engine to learn about the people to be annotated, nor see anything but video face images organised and presented in a decision-friendly annotation layout. This would undoubtedly simplify the annotation process and markedly improve the efficiency and precision of person-video annotation. A search of published patent databases, however, found no interactive annotation method or system specifically for the people in videos. The above background and insights are precisely the motivation for the present invention.
Summary of the invention
Aiming at the situation in Internet person-video annotation where the annotation process is difficult to carry out because the annotator very likely does not recognise the people to be annotated, the present invention proposes an interactive annotation method and system for Internet person videos. By mining several kinds of resources related to the video to be annotated and to its people, and by designing friendly and varied modes of user interaction, the invention simplifies the annotation process, assists annotation decisions, and improves the efficiency and precision of person-video annotation, thereby promoting the level of Internet person-video retrieval, browsing and ranking services.
To achieve the above object, the invention provides an interactive annotation method for Internet person videos, comprising the following steps:
S1, analysing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from the importance scores, pairwise merging recommendation scores and similarity scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
The present invention also proposes an interactive annotation system for Internet person videos, comprising:
a device for analysing the video to be annotated and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name of the name set as a text keyword and searching to obtain the set of person web images corresponding to that name;
a device for computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images, and for determining from these scores which face sequences, names and person web images are displayed while the video is annotated;
a device for displaying the face sequences, names and person web images to be annotated and for interactively annotating the face sequences, thereby realising the annotation of the video.
By mining several kinds of resources related to the video to be annotated and to its people, and by correspondingly designing friendly and varied modes of user interaction, the invention simplifies the annotation process, assists annotation decisions, and effectively alleviates the problem that annotation is difficult to carry out because the annotator does not recognise the people to be annotated. With the invention, the efficiency and precision of Internet person-video annotation can be improved substantially, promoting the level of Internet person-video retrieval, browsing and ranking services.
Brief description of the drawings
Fig. 1 is a flowchart of an interactive annotation method for Internet person videos according to an embodiment of the present invention;
Fig. 2 is a screenshot of an interactive annotation system for Internet person videos according to an embodiment of the present invention, with explanations of the related modules.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
As shown in Fig. 1, the interactive annotation method for Internet person videos of the present invention comprises the following steps:
S1, analysing the video, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from these scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
According to a preferred embodiment of the invention, the detailed procedure of step S1, which analyses the video and extracts the face sequences in the video and the names in its surrounding text, is as follows:
Step S11, segmenting the video into shots, performing face detection and tracking in each resulting shot to obtain the face sequences of that shot, and combining the face sequences obtained from all shots into the face-sequence set of the video.
In a concrete implementation, shot boundary detection is first performed on the Internet person video to be annotated, and the video is decomposed at the shot boundaries into a set of shots. The face detection function of the open-source computer vision library OpenCV is then called to detect faces frame by frame in each shot. Next, detection-based tracking gathers together the detected faces that belong to the same person across different video frames and generates a face sequence. Repeating this face-sequence generation for all shots yields the set of all face sequences detected in the video, Ω_F = {F_k}, k = 1..FN, where FN denotes the number of face sequences.
The detection-based tracking that generates the face sequences proceeds as follows. First, a colour histogram feature is extracted for every detected face, and the pairwise similarities between faces are computed from these features. The pairwise similarities are then sorted in descending order and agglomerative clustering is applied: two faces are merged when the following four conditions are all met: 1) the pairwise similarity of the two faces is greater than a preset merging threshold; 2) in the face cluster formed by merging the clusters containing the two faces, no two faces appear in the same video frame; 3) the appearance times of the two faces are no more than one second apart; 4) the distance between the centre coordinates of the two faces is no more than 2.5 times the face width. The merging is repeated until no two faces satisfy all four conditions simultaneously, which yields the face clustering result. Finally, the faces belonging to the same cluster are sorted by their (video-frame) appearance time, faces missed by the face detector are filled in by interpolation, and a complete face sequence is generated. Every face in a face sequence belongs to the same person.
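The following Python sketch illustrates this agglomerative merging under stated assumptions: each detected face is a dictionary carrying a frame index, centre coordinates, width and a colour-histogram feature; the frame rate, the merging-threshold value and the cosine similarity measure are illustrative choices, not values prescribed by the embodiment.

```python
import itertools
import numpy as np

FPS = 25.0              # assumed frame rate
MERGE_THRESHOLD = 0.8   # condition 1 threshold (assumed value)
MAX_GAP_SEC = 1.0       # condition 3: appearance gap of at most 1 second
MAX_DIST_FACTOR = 2.5   # condition 4: centre distance <= 2.5 x face width

def similarity(a, b):
    """Cosine similarity of colour-histogram features (one possible choice)."""
    return float(np.dot(a["hist"], b["hist"]) /
                 (np.linalg.norm(a["hist"]) * np.linalg.norm(b["hist"]) + 1e-9))

def merge_clusters(faces):
    """Group detected faces into face sequences by pairwise agglomerative merging."""
    clusters = [[f] for f in faces]
    idx = {id(f): i for i, f in enumerate(faces)}   # face -> cluster index
    for a, b in sorted(itertools.combinations(faces, 2),
                       key=lambda p: similarity(*p), reverse=True):
        if similarity(a, b) <= MERGE_THRESHOLD:                 # condition 1
            break                               # pairs are sorted, so stop here
        ca, cb = idx[id(a)], idx[id(b)]
        if ca == cb:
            continue
        if {f["frame"] for f in clusters[ca]} & \
           {f["frame"] for f in clusters[cb]}:                  # condition 2
            continue
        if abs(a["frame"] - b["frame"]) / FPS > MAX_GAP_SEC:    # condition 3
            continue
        if np.hypot(a["cx"] - b["cx"], a["cy"] - b["cy"]) > \
           MAX_DIST_FACTOR * max(a["w"], b["w"]):               # condition 4
            continue
        moved = clusters[cb]
        clusters[cb] = []
        clusters[ca].extend(moved)
        for f in moved:
            idx[id(f)] = ca
    # each surviving cluster, sorted by frame, is one face sequence
    return [sorted(c, key=lambda f: f["frame"]) for c in clusters if c]
```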
The above is the face-sequence extraction method of one embodiment of the present invention. Of course, other face detection and tracking methods may be used to obtain the face sequences; the present invention places no restriction on how the face sequences are obtained.
Step S12, obtaining the representative face images of every face sequence in the face-sequence set.
One embodiment is as follows. For each face sequence F_k (the subscript k denoting the k-th face sequence of the set), suppose it contains t faces. The colour histogram features of these t faces are extracted, and from them the pairwise face similarity matrix T_k is computed, whose entry (i, j) is the similarity between the i-th and the j-th face. With the mean of the pairwise similarities in T_k as the preference value, affinity propagation clustering is applied to cluster the t faces self-adaptively. If the clustering produces |F_k| classes, F_k can be represented by its representative face set {f_k^i}, i = 1..|F_k|, where f_k^i is the face nearest the centre of the i-th class.
Of course, other methods may be used to obtain the representative face images of a face sequence; the present invention places no restriction on how the representative face images are obtained.
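For illustration only, a minimal sketch of the representative-face selection follows. It assumes scikit-learn's AffinityPropagation as the clustering implementation and negative euclidean distance between histogram features as the pairwise similarity; both are assumptions, as the embodiment only fixes the preference to the mean pairwise similarity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def representative_faces(features):
    """features: (t, d) array, one colour-histogram vector per face of F_k."""
    # pairwise similarity as negative euclidean distance (one possible choice)
    sim = -np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # the embodiment sets the preference to the mean pairwise similarity
    ap = AffinityPropagation(affinity="precomputed",
                             preference=sim.mean(),
                             random_state=0).fit(sim)
    # the exemplar of each class is the face nearest its class centre
    return features[ap.cluster_centers_indices_]
```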
Step S13, collecting the video's surrounding text and extracting the person names from it.
The surrounding text of a video is the textual content related to the video on the web page where the Internet video resides; it includes, but is not limited to, the video title, tags, descriptive text and user comments. Since different types of surrounding text differ in relevance and noise level, the present invention considers only the video title, the tags, and those user comments longer than 20 words.
Specifically, when the surrounding text is English (as on English video websites), a name extraction method based on matching against a Wikipedia biographical dictionary is adopted. Over a continuous word sequence of the text, the method starts from the first word and tests in turn whether the phrase formed by the next n words (n < 4) constitutes a Wikipedia entry; if so, the longest matching entry is kept and testing continues from the (n+1)-th word. In this way Wikipedia entries such as "Barack Obama" and "World Cup 2014" are found in the word sequence. Repeating this parsing over the title, tag and comment collections yields a set of Wikipedia entries. The method then verifies one by one whether these entries are person names: it examines the category section of the entry's Wikipedia page and queries whether a category of the form "xxxx births" is present, where xxxx is a four- or three-digit year. If such a category exists, the entry is judged to be a person name; otherwise it is judged to be another kind of named entity and ignored.
The above is the processing for English text. When the surrounding text is Chinese, word segmentation is first performed with the Chinese segmentation tool ICTCLAS, after which the same name extraction based on Wikipedia biographical dictionary matching is applied (with the category criterion correspondingly changed to the Chinese form of the "born in xxxx" category). This processing yields the set of names related to the video, Ω_N = {N_k}, k = 1..CN, where N_k denotes the k-th extracted name and CN the number of extracted names.
Because the surrounding text of an Internet video is usually supplied by the uploading user, its syntax is loose, its word collocations are rather free, and misspellings and abbreviations are common. The above name extraction method based on Wikipedia biographical dictionary matching does not rely on syntactic structure and has a certain tolerance of misspellings and abbreviations, which makes it particularly suitable for extracting names from the surrounding text of Internet videos. Of course, other name extraction methods may be used; the present invention places no restriction on the name extraction method.
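A minimal sketch of the greedy dictionary matching and the "xxxx births" test described above, assuming a pre-built set of Wikipedia entry titles and a mapping from entry titles to their category names as inputs (building those is outside this sketch):

```python
import re

def extract_names(tokens, wiki_titles, wiki_categories):
    """tokens: the surrounding text as a list of words."""
    names, i = [], 0
    while i < len(tokens):
        match = None
        for n in (3, 2, 1):                       # prefer the longest entry (n < 4)
            phrase = " ".join(tokens[i:i + n])
            if phrase in wiki_titles:
                match = (phrase, n)
                break
        if match is None:
            i += 1
            continue
        phrase, n = match
        # the entry is a person name iff some category reads "xxxx births"
        cats = wiki_categories.get(phrase, [])
        if any(re.fullmatch(r"\d{3,4} births", c) for c in cats):
            names.append(phrase)
        i += n                                    # resume after the matched entry
    return names

# usage example with toy dictionary data
print(extract_names("US president Barack Obama spoke".split(),
                    {"Barack Obama"},
                    {"Barack Obama": ["1961 births"]}))   # -> ['Barack Obama']
```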
Step S2 uses each name of the name set obtained in step S1 as a text keyword and searches to obtain the set of person web images corresponding to that name. It specifically comprises the following steps:
Step S21, using each name of the name set obtained in step S1 as a text keyword, and searching for and downloading the images related to that keyword on the web.
Specifically, an existing image search engine can be used. For example, through the application programming interface provided by Google, the text keyword is submitted to the Google image search engine with the search parameters set to retrieve 64 images containing faces. Under this setting, the image search engine returns to the retrieval client the uniform resource locators (URL addresses) of the top 64 person images of the ranked retrieval results, and the client then downloads the corresponding images from those URL addresses. That is, ideally, when every image downloads normally, this step obtains 64 result images; in practice, each name usually yields between 50 and 64 downloadable images.
Step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected.
For example, the face detection function of the open-source computer vision library OpenCV can be called on each successfully downloaded person image. The detector may return no face, one face, or several faces. Images in which several faces are detected usually contain, besides the queried person, the faces of other people, which would disturb the annotator's judgement during reference comparison; this step therefore keeps only the images in which exactly one face is detected, and removes those in which no face, or several faces, are detected.
Step S23, repeating steps S21 and S22 for all names in the name set, to obtain the person web-image set corresponding to each name in the set.
The person web-image sets can be denoted Ω_C = {C_k}, k = 1..CN, where C_k denotes all the person web images corresponding to name N_k.
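As an illustration of the single-face filter of step S22, the sketch below runs OpenCV's Haar-cascade face detector over a list of already downloaded image files; the cascade choice and detector parameters are assumptions standing in for "the OpenCV face detection function" of the text.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def filter_single_face(image_paths):
    """Keep only images in which exactly one face is detected."""
    kept = []
    for path in image_paths:
        img = cv2.imread(path)
        if img is None:
            continue                      # download failed or file unreadable
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 1:               # discard 0 faces or more than 1 face
            kept.append(path)
    return kept
```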
For the convenience of the subsequent description of the embodiment, the composition of the system of the present invention is first introduced briefly. Fig. 2 gives a screenshot of the interactive annotation system. As can be seen, the system interface is divided into four parts: a management area, an annotation area, an annotation reference area and an annotation history area. The management area is used by the annotator to interactively select the video to be annotated, and loads the face sequences and the related names of the selected video. The annotation area is further divided into a similar-face merging/distinguishing subarea and a name-face association subarea, which respectively display the current face-sequence pair to be annotated, Q_i = <F_m, F_n>, and the current face sequence F_j, and in which the corresponding interactive annotation operations are carried out. In addition, the first six person web images of the name most similar to the face sequence shown in the name-face association subarea are displayed in the corresponding annotation reference area. The annotation history area on the far right shows the annotated name-face pairs in annotation order, with the most recently annotated pair at the top. The annotation reference area and the annotation history area mainly provide auxiliary information that helps the annotator make reference-based decisions.
Step S3 computes the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determines from these scores which face sequences, names and person web images are displayed while the video is annotated. Step S3 comprises the following sub-steps:
Step S31, computing the saliency value of every face sequence in the face-sequence set.
A face sequence that stays on screen longer and whose faces are larger attracts more attention in the video, and the probability that it belongs to a core person of the video is also larger. The present invention calls this property of a face sequence its saliency, and proposes the following saliency formula:

    Sai(F_i) = e^{-size_θ / size_i} + e^{-dura_θ / dura_i}    (1)

where size_i and dura_i are the average face size and the on-screen duration of face sequence F_i respectively, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of duration on the saliency. By formula (1), a face sequence with a long duration and a large average face size obtains a larger saliency value.
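Formula (1) as code, a minimal sketch; the units of face size and duration and the two threshold values are assumptions, since the text only states that the thresholds are set empirically.

```python
import math

def saliency(avg_face_size, duration, size_theta=40.0 * 40.0, dura_theta=2.0):
    """Formula (1); size in pixels^2 and duration in seconds are assumed units."""
    return math.exp(-size_theta / avg_face_size) + math.exp(-dura_theta / duration)
```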
Step S32, computing the pairwise similarities between the face sequences of the face-sequence set.
Two face sequences whose appearance times overlap usually correspond to different people, whereas face sequences separated by a short time interval may be different face sequences of the same person, produced for example by shot changes. Based on this observation, the pairwise similarity of face sequences is computed from their pairwise visual similarity, the time interval between their appearances, and whether their appearance times overlap:

    sim(F_i, F_j) = e^{-Δtime_{i,j} / time_θ} · (1 - CO_{i,j}) · vs(F_i, F_j)    (2)

where time_θ is a threshold controlling the influence of the time difference and Δtime_{i,j} is the time difference between face sequences F_i and F_j, computed by the following formula (3):

    Δtime_{i,j} = time_j^{beg} - time_i^{end},  if time_i^{beg} ≤ time_j^{beg}
                  time_i^{beg} - time_j^{end},  if time_j^{beg} ≤ time_i^{beg}    (3)

In formula (3), time_i^{beg} and time_i^{end} are the start time and the end time of the appearance of face sequence F_i; a small time value means that the face sequence appears in an earlier part of the video. In formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise. vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences:

    vs(F_i, F_j) = e^{-min_{f_i^m ∈ F_i, f_j^n ∈ F_j, i ≠ j} ||f_i^m - f_j^n||}    (4)

where f_i^m is the feature vector of the m-th representative face of face sequence F_i.
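Formulas (2) to (4) as code, a minimal sketch in which each face sequence is summarised by its representative-face feature matrix and a [begin, end] time span; the time_theta value is an assumption.

```python
import math
import numpy as np

def visual_sim(reps_i, reps_j):
    """Formula (4): similarity of the two closest representative faces."""
    d = np.linalg.norm(reps_i[:, None, :] - reps_j[None, :, :], axis=-1)
    return float(np.exp(-d.min()))

def pair_sim(seq_i, seq_j, time_theta=10.0):
    """Formulas (2)-(3); seq = {'span': (beg, end), 'reps': (m, d) array}."""
    beg_i, end_i = seq_i["span"]
    beg_j, end_j = seq_j["span"]
    if beg_i <= end_j and beg_j <= end_i:      # CO_{i,j} = 1: overlap in time
        return 0.0                             # the (1 - CO_{i,j}) factor
    # formula (3): gap from the earlier sequence's end to the later one's start
    dt = beg_j - end_i if beg_i <= beg_j else beg_i - end_j
    return math.exp(-dt / time_theta) * visual_sim(seq_i["reps"], seq_j["reps"])
```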
Step S33, computing the pairwise merging recommendation scores of the face sequences from the pairwise similarities obtained in step S32 and from the user interaction information gathered during annotation.
Specifically, the following formula is used:

    MS(F_i, F_j) = (1 - PM_{i,j}) · sim(F_i, F_j)    (5)

where PM_{i,j} indicates whether the combination of face sequences F_i and F_j has been "skipped" or labelled "different" by the user during annotation: PM_{i,j} = 1 if it has, and PM_{i,j} = 0 otherwise. By formula (5), pairs of face sequences with high similarity that have not been "skipped" or labelled "different" during annotation are given large pairwise merging recommendation scores. On this basis, all pairs whose score is at least a given threshold are arranged in descending order of MS(F_i, F_j), producing the pairwise merging recommendation list Rank_MS = {Q_k}, where Q_k = <F_i, F_j>, i ≠ j. During annotation, the similar-face merging/distinguishing subarea of the system in Fig. 2 displays the face-sequence pairs to be annotated in Rank_MS order.
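A sketch of formula (5) and of the construction of Rank_MS; the cut-off value is an assumption, as the text only requires a preset threshold.

```python
def merge_score(sim_ij, pm_ij):
    """Formula (5): skipped / 'different' pairs (PM = 1) score zero."""
    return (1 - pm_ij) * sim_ij

def build_rank_ms(sim, PM, threshold=0.5):
    """sim, PM: FN x FN matrices; threshold is the preset cut-off (assumed value)."""
    pairs = [((i, j), merge_score(sim[i][j], PM[i][j]))
             for i in range(len(sim)) for j in range(i + 1, len(sim))]
    pairs = [p for p in pairs if p[1] >= threshold]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs      # Rank_MS: most promising merge candidates first
```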
Step S34, computing the importance score of each face sequence from the saliency values obtained in step S31, the pairwise similarity scores obtained in step S32, and the user interaction information gathered during annotation.
The importance of a face sequence represents how much the sequence deserves annotation once the information of the face sequences, the video and the user interaction is taken into account. It is computed by the following formula (6):

    IS(F_i) = (1 - PA_i) · (\overline{Sai_i} + \overline{AR_i})    (6)

where PA_i indicates whether face sequence F_i has been "skipped" by the user during annotation: PA_i = 1 if it has, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are, respectively, the saliency Sai_i and the accumulated relevance AR_i after min-max normalisation, the latter being defined as:

    AR_i = Σ_{j=1, j≠i}^{FN} L_j · sim(F_i, F_j)    (7)

where L_j is the annotation-state function of face sequence F_j: L_j = 1 if F_j has been annotated, and L_j = 0 otherwise. By formula (6), a face sequence that has a large saliency value, is similar to several already annotated face sequences, and has not been "skipped" during annotation is given a large importance score.
On this basis, the face sequences are arranged in descending order of their importance score IS(F_i), producing the importance list Rank_IS = {F_i}, i = 1..FN. During annotation, the name-face association subarea of the system in Fig. 2 displays the face sequences to be annotated in Rank_IS order.
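Formulas (6) and (7) as code, a minimal sketch producing the Rank_IS order; the min-max normalisation is applied across all face sequences, as the text describes.

```python
import numpy as np

def minmax(x):
    """Min-max normalisation used for both saliency and accumulated relevance."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def build_rank_is(sai, sim, labelled, skipped):
    """sai: saliency per sequence; sim: FN x FN matrix; labelled, skipped: 0/1."""
    n = len(sai)
    ar = np.array([sum(labelled[j] * sim[i][j] for j in range(n) if j != i)
                   for i in range(n)])                               # formula (7)
    scores = (1 - np.asarray(skipped)) * (minmax(sai) + minmax(ar))  # formula (6)
    return scores, list(np.argsort(-scores))                         # Rank_IS order
```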
Step S35, computing the similarities between the face sequences of the face-sequence set and the person web images of the web-image sets, sorting by similarity, and obtaining for each face sequence the sorted name list and the K most similar person web images of each name. In the present invention, K is set to 6.
This step mainly addresses the situation, frequently encountered during interactive annotation, in which the annotator does not recognise the person to be annotated; displaying the names and person web images in the annotation system alleviates this problem. Specifically, once the face sequence shown in the name-face association subarea is determined, the name most similar to it and the K person web images most similar to it are displayed for the annotator to compare against, assisting the decision on which name corresponds to the face sequence to be annotated. The computation of this step comprises the following three sub-steps:
Step S351, computing the pairwise similarities between the face sequences of the face-sequence set and the names of the name set. The visual appearance of a name can be represented by its corresponding person web images. On this basis, the similarity between face sequence F_i and web-image set C_j is computed by the following formula (8) and taken as the similarity between face sequence F_i and name N_j:

    vs(F_i, N_j) = vs(F_i, C_j) = (1 / |C_j|) Σ_{n=1}^{|C_j|} vs(F_i, c_j^n)    (8)

where

    vs(F_i, c_j^n) = e^{-min_{f_i^m ∈ F_i} ||f_i^m - c_j^n||}    (9)

and c_j^n is the face feature vector of the n-th image of the web-image set C_j corresponding to name N_j.
Step S352, sorting the names by the similarity computed in step S351. In general, the larger the similarity vs(F_i, N_j) between face sequence F_i and name N_j, the larger the probability that F_i is a face of N_j. On this basis, the name set Ω_N is sorted in descending order of vs(F_i, N_j), producing the name list Rank(F_i) corresponding to face sequence F_i.
Step S353, computing, for each face sequence, the K most similar person web images of each name. As can be seen from formula (9), the similarity of a person web image to a face sequence is represented by the image's similarity to the most similar representative face of the sequence. Hence, for every pair of face sequence F_i and name N_j, the person web images in C_j are sorted in descending order of vs(F_i, c_j^n) and the K most similar images are kept, producing the person web-image list Rank(F_i, N_j) = {c_k}, k = 1..K, of face sequence F_i with respect to name N_j, where K is set to 6.
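Formulas (8) and (9) and the top-K selection of step S353 as a minimal sketch; face sequences and web images are represented here simply as representative-face feature matrices and per-image feature vectors.

```python
import numpy as np

def face_image_sim(reps, img_feat):
    """Formula (9): governed by the representative face closest to the image."""
    return float(np.exp(-np.linalg.norm(reps - img_feat, axis=1).min()))

def face_name_sim(reps, name_images):
    """Formula (8): average over the name's web images."""
    return sum(face_image_sim(reps, c) for c in name_images) / len(name_images)

def top_k_images(reps, name_images, k=6):
    """Step S353: the K web images most similar to the face sequence (K = 6)."""
    return sorted(name_images,
                  key=lambda c: face_image_sim(reps, c), reverse=True)[:k]
```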
According to a preferred embodiment of the invention, the face-sequence importance list obtained in step S34, together with the name list and the person web-image lists obtained in step S35 for each face sequence F_i, are used in step S4, in which various user interaction operations produce the corresponding annotation actions and realise the annotation of the people in the video. The detailed procedure of step S4 is as follows:
Step S41, initialising the various resources involved in the annotation process.
The specific practice is:
S411, setting PA_k = 0 for k = 1..FN, PM_{m,n} = 0 for all m ≠ n, and ULSets = {F_k}, k = 1..FN, where ULSets denotes the set of face sequences not yet annotated;
S412, automatically annotating the face-sequence pairs Q_i = <F_m, F_n> that satisfy the condition of formula (10), and removing all annotated pairs from the Rank_MS list:

    Label(F_i) = Label(F_j),  if vs(F_i, F_j) ≥ T_s    (10)

where T_s is a threshold deciding whether two face sequences are visually similar enough;
S413, taking from Rank_MS and Rank_IS respectively the top-ranked element Q_i = <F_m, F_n> and F_j, i.e. the face-sequence pair with the currently highest pairwise merging recommendation score and the face sequence with the currently highest importance score, and displaying these resources in the annotation system;
S414, taking the top-ranked name N_k of Rank(F_j) together with its K images Rank(F_j, N_k), and displaying these resources in the annotation system.
Step S42, producing the corresponding annotation actions according to the various user interaction operations.
There are three classes of user interaction operation: 1) similar-face merging/distinguishing operations, which label the displayed face-sequence pair Q_i = <F_m, F_n> as "identical" or "different"; 2) name-face association operations, which select a particular name to annotate face sequence F_j; 3) operations that select other names and their person web images for the annotator's reference. Of the three classes, the third is an annotation auxiliary whose purpose is to provide information that assists the user's annotation decisions, while the second annotates F_j with the corresponding name and removes it from the set ULSets of face sequences not yet annotated. The annotation actions corresponding to the three classes of interaction are respectively as follows (a state-update sketch follows the list):
1) Annotation actions of the similar-face merging/distinguishing operations:
a) if the user labels Q_i with the "identical" option, set Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user labels Q_i with the "different" option, set Label(F_m) ≠ Label(F_n) and at the same time set PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, set PM_{m,n} = 1.
2) Annotation actions of the name-face association operations:
a) if the user selects name N_k to annotate F_j, set ULSets = ULSets \ F_j and Label(F_j) = N_k;
b) if the user selects "skip" for the annotation of F_j, set PA_j = 1.
3) Actions of the name and person web-image selection operations:
a) if the user clicks the "previous" option, set k = k - 1 (when k > 1) and display name N_k and its person web-image list of K images;
b) if the user clicks the "next" option, set k = k + 1 (when k < CN) and display name N_k and its person web-image list of K images.
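The sketch below collects the state updates of the three interaction classes in one illustrative container; the class and method names are assumptions, and the "identical" action is simplified to copying a label when one side of the pair already has one (a full implementation would also record the equality constraint between two still-unlabelled sequences).

```python
class AnnotationState:
    def __init__(self, num_seqs):
        self.label = {}                       # sequence index -> name
        self.PA = [0] * num_seqs              # "skipped" flags per sequence
        self.PM = {}                          # (m, n) -> skipped/"different" flag
        self.unlabelled = set(range(num_seqs))  # ULSets

    def mark_same(self, m, n):                # 1a) "identical"
        if n in self.label:
            m, n = n, m
        if m in self.label:                   # copy the label if one side has one
            self.assign_name(n, self.label[m])

    def mark_different_or_skip(self, m, n):   # 1b) "different", 1c) "skip"
        self.PM[(m, n)] = 1

    def assign_name(self, j, name):           # 2a) name-face association
        self.label[j] = name
        self.unlabelled.discard(j)            # ULSets = ULSets \ F_j

    def skip_sequence(self, j):               # 2b) "skip"
        self.PA[j] = 1
```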
Step S43, annotating the other not-yet-annotated face sequences by label propagation.
The user's interactive annotation actions provide extra annotation cues. Therefore, every other not-yet-annotated face sequence F_i that satisfies the condition of formula (11) or (12) below is annotated automatically:

    Label(F_i) = N_k,  ULSets = ULSets \ F_i,
        if F_i ∈ ULSets, vs(F_i, F_j) ≥ T_s, and Label(F_j) = N_k    (11)

    Label(F_i) = Label(F_j),
        if F_i ∈ ULSets, F_j ∈ ULSets, and vs(F_i, F_j) ≥ T_s    (12)

where T_s is the similarity threshold defined in formula (10).
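A sketch of the propagation pass of formulas (11) and (12), reusing the AnnotationState container of the previous sketch; the T_s value is an assumption. Iterating to a fixed point realises formula (12) implicitly: once one of two similar unlabelled sequences receives a name through formula (11), the other receives the same name on the next pass.

```python
def propagate(state, sim, T_s=0.9):
    """Formulas (11)-(12); state: AnnotationState, sim: FN x FN matrix."""
    changed = True
    while changed:                             # iterate to a fixed point
        changed = False
        for i in list(state.unlabelled):
            for j, name in list(state.label.items()):
                if i != j and sim[i][j] >= T_s:
                    state.assign_name(i, name)   # formula (11)
                    changed = True
                    break
```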
Step S44, pruning and reordering the pairwise merging recommendation list and the importance list, and determining the resources presented in the next round of user annotation.
Through steps S42 and S43, some face sequences in the pairwise merging recommendation list Rank_MS and the importance list Rank_IS have been annotated. This step prunes and reorders Rank_MS and Rank_IS according to the annotation results and determines the resources presented in the next round of user annotation. The specific practices of pruning and reordering are respectively:
1) pruning: removing from Rank_MS and Rank_IS, respectively, the elements Q_i = <F_m, F_n> and F_j that satisfy the conditions of formulas (13), (14) or (15):

    Rank_MS = Rank_MS \ Q_i,  if F_m ∉ ULSets and F_n ∉ ULSets    (13)

    Rank_MS = Rank_MS \ Q_i,  if Label(F_m) = Label(F_n)    (14)

    Rank_IS = Rank_IS \ F_j,  if F_j ∉ ULSets    (15)

2) reordering: for the elements remaining in Rank_MS and Rank_IS, recomputing the pairwise merging recommendation scores by formula (5) and the importance scores by formula (6), and regenerating the Rank_MS and Rank_IS lists by re-sorting on the recomputed scores, as the basis for the resources displayed in the next round of interactive annotation.
Step S45, repeating steps S42 to S44 until all face sequences have been annotated (i.e. ULSets = ∅), or the user actively exits the annotation process.
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and do not limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (19)

1. An interactive annotation method for Internet person videos, characterised by comprising the following steps:
S1, analysing the video to be annotated, and extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
S2, using each name of the name set obtained in step S1 as a text keyword, and searching to obtain the set of person web images corresponding to that name;
S3, computing the importance score of each face sequence, the pairwise merging recommendation scores between face sequences, and the similarity scores between the face sequences and the person web images obtained in step S2, and determining from the importance scores, pairwise merging recommendation scores and similarity scores which face sequences, names and person web images are displayed while the video is annotated;
S4, interactively annotating the face sequences according to the face sequences, names and person web images displayed in step S3, thereby realising the annotation of the video.
2. The method according to claim 1, characterised in that step S1 comprises the following steps:
step S11, segmenting the video into shots, performing face detection and tracking in each resulting shot to obtain the face sequences of that shot, and combining the face sequences obtained from all shots into the face-sequence set of the video;
step S12, obtaining the representative face images of every face sequence in the face-sequence set;
step S13, collecting the video's surrounding text and extracting the person names from it.
3. The method according to claim 1, characterised in that step S2 comprises the following steps:
step S21, using each name of the name set obtained in step S1 as a text keyword, and searching for and downloading the images related to that keyword on the web;
step S22, performing face detection on the downloaded images related to the text keyword, and filtering out the images in which no face, or more than one face, is detected;
step S23, repeating steps S21 and S22 for all names in the name set, to obtain the person web-image set corresponding to each name in the set.
4. The method according to claim 1, characterised in that step S3 comprises the following steps:
step S31, computing the saliency value of every face sequence in the face-sequence set;
step S32, computing the pairwise similarities between the face sequences of the face-sequence set;
step S33, computing the pairwise merging recommendation scores of the face sequences from the pairwise similarities obtained in step S32 and the user interaction information gathered during annotation;
step S34, computing the importance score of each face sequence from the saliency values obtained in step S31, the pairwise similarity scores obtained in step S32, and the user interaction information gathered during annotation;
step S35, computing the similarities between the face sequences of the face-sequence set and the person web images of the web-image sets, sorting by similarity, and obtaining for each face sequence the sorted name list and the K most similar person web images of each name.
5. The method according to claim 4, characterised in that in step S31 the saliency of face sequence F_i is computed by the following formula (1):

    Sai(F_i) = e^{-size_θ / size_i} + e^{-dura_θ / dura_i}    (1)

where size_i and dura_i are the average face size and the on-screen duration of face sequence F_i respectively, and size_θ and dura_θ are two empirically set thresholds that respectively control the influence of face size and of duration on the saliency.
6. The method according to claim 4, characterised in that in step S32 the pairwise similarity of face sequences is computed by the following formula (2):

    sim(F_i, F_j) = e^{-Δtime_{i,j} / time_θ} · (1 - CO_{i,j}) · vs(F_i, F_j)    (2)

where time_θ is a threshold controlling the influence of the time difference and Δtime_{i,j} is the time difference between face sequences F_i and F_j, computed by the following formula (3):

    Δtime_{i,j} = time_j^{beg} - time_i^{end},  if time_i^{beg} ≤ time_j^{beg}
                  time_i^{beg} - time_j^{end},  if time_j^{beg} ≤ time_i^{beg}    (3)

in formula (3), time_i^{beg} and time_i^{end} are the start time and the end time of the appearance of face sequence F_i, a small time value meaning that the face sequence appears in an earlier part of the video;
in formula (2), CO_{i,j} is a binary function indicating whether the appearance times of F_i and F_j overlap: CO_{i,j} = 1 if they overlap, and CO_{i,j} = 0 otherwise; vs(F_i, F_j) is the visual similarity of F_i and F_j, represented by the similarity of the two most similar faces in the representative face sets of the two sequences, computed by:

    vs(F_i, F_j) = e^{-min_{f_i^m ∈ F_i, f_j^n ∈ F_j, i ≠ j} ||f_i^m - f_j^n||}    (4)

where f_i^m is the feature vector of the m-th representative face of face sequence F_i.
7. The method according to claim 4, characterised in that in step S33 the pairwise merging recommendation score of face sequences is computed by the following formula (5):

    MS(F_i, F_j) = (1 - PM_{i,j}) · sim(F_i, F_j)    (5)

where PM_{i,j} indicates whether the combination of face sequences F_i and F_j has been "skipped" or labelled "different" by the user during annotation: PM_{i,j} = 1 if it has, and PM_{i,j} = 0 otherwise; by formula (5), pairs of face sequences with high similarity that have not been "skipped" or labelled "different" during annotation are given large pairwise merging recommendation scores; on this basis, all pairs whose score is at least a given threshold are arranged in descending order of MS(F_i, F_j), producing the pairwise merging recommendation list Rank_MS = {Q_k}, where Q_k = <F_i, F_j>, i ≠ j.
8. The method according to claim 4, characterised in that in step S34 the importance score of a face sequence is computed by the following formula (6):

    IS(F_i) = (1 - PA_i) · (\overline{Sai_i} + \overline{AR_i})    (6)

where PA_i indicates whether face sequence F_i has been "skipped" by the user during annotation: PA_i = 1 if it has, and PA_i = 0 otherwise; \overline{Sai_i} and \overline{AR_i} are, respectively, the saliency Sai_i and the accumulated relevance AR_i after min-max normalisation, the latter being defined as:

    AR_i = Σ_{j=1, j≠i}^{FN} L_j · sim(F_i, F_j)    (7)

where L_j is the annotation-state function of face sequence F_j: L_j = 1 if F_j has been annotated, and L_j = 0 otherwise;
the face sequences are arranged in descending order of their importance score IS(F_i), producing the importance list Rank_IS = {F_i}, i = 1..FN.
9. The method according to claim 4, characterised in that step S35 comprises the following steps:
step S351, computing the pairwise similarities between the face sequences of the face-sequence set and the names of the name set;
step S352, sorting the names by the similarity computed in step S351;
step S353, computing, for each face sequence, the K most similar person web images of each name.
10. The method according to claim 9, characterised in that
step S351 computes the similarity between face sequence F_i and web-image set C_j by the following formula (8) and takes it as the similarity between face sequence F_i and name N_j:

    vs(F_i, N_j) = vs(F_i, C_j) = (1 / |C_j|) Σ_{n=1}^{|C_j|} vs(F_i, c_j^n)    (8)

where

    vs(F_i, c_j^n) = e^{-min_{f_i^m ∈ F_i} ||f_i^m - c_j^n||}    (9)

and c_j^n is the face feature vector of the n-th image of the web-image set C_j.
11. The method according to claim 10, characterised in that
step S352, for each face sequence F_i, sorts the names in descending order of vs(F_i, N_j), producing the name list Rank(F_i).
12. The method according to claim 11, characterised in that
step S353, for each pair of face sequence and name, e.g. F_i and N_j, sorts the person web images in C_j in descending order of vs(F_i, c_j^n) and keeps the K most similar images, producing the person web-image list Rank(F_i, N_j) = {c_k}, k = 1..K, corresponding to F_i and N_j.
13. The method according to claim 1, characterised in that step S4 comprises the following steps:
step S41, initialising the various resources involved in the annotation process;
step S42, producing the corresponding annotation actions according to the various user interaction operations;
step S43, annotating the other not-yet-annotated face sequences by label propagation;
step S44, pruning and reordering the pairwise merging recommendation list and the importance list, and determining the resources presented in the next round of user annotation;
step S45, repeating steps S42 to S44 until all face sequences have been annotated.
14. The method according to claim 13, characterized in that said step S41 comprises:
S411: setting {PA_k = 0}_{k=1}^{FN}, {PM_{m,n} = 0}_{m=1,n=1,m \neq n}^{FN}, and ULSets = {F_k}_{k=1}^{FN};
S412: automatically labeling each face sequence pair Q_i = <F_m, F_n> that satisfies the condition of formula (10), and removing all such labeled pairs from the Rank_MS list:

Label(F_i) = Label(F_j), \text{ if satisfies } vs(F_i, F_j) \geq T_s    (10)

wherein "if satisfies" means "if the following condition holds", and T_s is the threshold deciding whether two face sequences are visually similar enough;
S413: taking from Rank_MS and Rank_IS, respectively, the highest-ranked elements Q_i = <F_m, F_n> and F_j, i.e. the pair with the currently highest pairwise merging recommendation score and the face sequence with the highest importance score, and displaying these resources in the labeling system;
S414: taking the highest-ranked name in Rank(F_j) together with its K person network images, and displaying these resources in the labeling system.
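A sketch of the initialization S411-S414, assuming `state` carries the ranking lists as Python lists of indices, a precomputed similarity matrix `vs`, and a union-find structure `same_label` recording Label(F_m) = Label(F_n) constraints; the threshold value and all names are placeholders:

def initialize_resources(state, T_s=0.8):
    """Sketch of S411-S414 (claim 14); T_s is the formula (10) threshold."""
    FN = len(state.face_seqs)
    state.PA = [0] * FN                             # S411: nothing skipped yet
    state.PM = [[0] * FN for _ in range(FN)]
    state.ULSets = set(range(FN))                   # every sequence unlabeled
    # S412: auto-label pairs satisfying formula (10), drop them from Rank_MS
    for (m, n) in list(state.Rank_MS):
        if state.vs[m][n] >= T_s:
            state.same_label.union(m, n)            # Label(F_m) = Label(F_n)
            state.Rank_MS.remove((m, n))
    # S413: surface the top-ranked pair and the top-ranked face sequence
    pair, face = state.Rank_MS[0], state.Rank_IS[0]
    # S414: surface the highest-ranked name for that sequence and its K images
    display(state, pair, face, state.name_rank[face][0])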
15. The method according to claim 13, characterized in that, in said step S42, the multiple kinds of user interactions comprise: 1) a similar-face merge/distinguish labeling operation that marks a displayed face sequence pair Q_i = <F_m, F_n> as "same" or "different"; 2) a name-face association labeling operation that selects a particular name to label face sequence F_j; 3) an interactive browsing operation that selects different names and their person network images for display.
16. The method according to claim 13, characterized in that, in said step S42, the labeling behaviors corresponding to the multiple kinds of user interactions are respectively:
1) behaviors corresponding to the similar-face merge/distinguish labeling operation:
a) if the user marks Q_i with the "same" option, set Label(F_m) = Label(F_n), where Label(F_m) denotes the name corresponding to face sequence F_m;
b) if the user marks Q_i with the "different" option, set Label(F_m) ≠ Label(F_n) and simultaneously set PM_{m,n} = 1;
c) if the user selects the "skip" option for Q_i, set PM_{m,n} = 1;
2) behaviors corresponding to the name-face association labeling operation:
a) if the user selects name N_k to label F_j, set ULSets = ULSets \setminus F_j and Label(F_j) = N_k;
b) if the user selects the "skip" option for F_j, set PA_j = 1;
3) behaviors corresponding to the name and person network image browsing operation:
a) if the user clicks the "previous" option, set k = k - 1 (when k > 1), and display the k-th name in Rank(F_j) together with its K person network images;
b) if the user clicks the "next" option, set k = k + 1 (when k < CN), and display the k-th name in Rank(F_j) together with its K person network images.
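The three operation families of claim 16 map naturally onto an event handler. A hedged sketch, with `action` as a hypothetical (kind, payload) tuple produced by the UI and the same `state` assumptions as in the earlier sketches:

def apply_annotation_behavior(state, action):
    """Sketch of the claim 16 labeling behaviors."""
    kind, payload = action
    if kind == "same":                       # 1a) merge: Label(F_m) = Label(F_n)
        m, n = payload
        state.same_label.union(m, n)
    elif kind == "different":                # 1b) distinguish
        m, n = payload
        state.cannot_link.add((m, n))        # Label(F_m) != Label(F_n)
        state.PM[m][n] = 1
    elif kind == "skip_pair":                # 1c) skip the displayed pair
        m, n = payload
        state.PM[m][n] = 1
    elif kind == "assign_name":              # 2a) Label(F_j) = N_k
        j, k = payload
        state.labels[j] = k
        state.ULSets.discard(j)              # ULSets = ULSets \ F_j
    elif kind == "skip_face":                # 2b) skip the displayed sequence
        state.PA[payload] = 1
    elif kind in ("previous", "next"):       # 3) browse names and their images
        step = -1 if kind == "previous" else 1
        state.k = min(max(state.k + step, 1), state.CN)  # clamp to [1, CN]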
17. The method according to claim 13, characterized in that said step S43 automatically labels any other unlabeled face sequence F_i satisfying the relevant condition as specified by formula (11) or (12):

Label(F_i) = N_k,\ ULSets = ULSets \setminus F_i, \text{ if satisfies } F_i \in ULSets,\ vs(F_i, F_j) \geq T_s,\ Label(F_j) = N_k    (11)

Label(F_i) = Label(F_j), \text{ if satisfies } F_i \in ULSets,\ F_j \in ULSets,\ vs(F_i, F_j) \geq T_s    (12)

wherein T_s is the similarity threshold defined in formula (10).
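A sketch of the claim 17 propagation, iterated to a fixed point: formula (11) pushes a known name to similar unlabeled sequences, and formula (12) ties pairs of unlabeled sequences together via the same_label structure (same hypothetical `state` as above):

def propagate_labels(state, T_s=0.8):
    """Sketch of step S43, formulas (11)-(12) of claim 17."""
    changed = True
    while changed:
        changed = False
        # Formula (11): F_i unlabeled, F_j labeled N_k, vs(F_i, F_j) >= T_s
        for i in list(state.ULSets):
            for j, name in list(state.labels.items()):
                if state.vs[i][j] >= T_s:
                    state.labels[i] = name           # Label(F_i) = N_k
                    state.ULSets.discard(i)          # ULSets = ULSets \ F_i
                    changed = True
                    break
    # Formula (12): two unlabeled, sufficiently similar sequences will share
    # whichever label either of them eventually receives
    ul = list(state.ULSets)
    for a in range(len(ul)):
        for b in range(a + 1, len(ul)):
            if state.vs[ul[a]][ul[b]] >= T_s:
                state.same_label.union(ul[a], ul[b])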
18. The method according to claim 13, characterized in that, according to the labeling results, said step S44 prunes and reorders Rank_MS and Rank_IS as follows:
1) pruning: deleting from Rank_MS and Rank_IS, respectively, the elements Q_i = <F_m, F_n> and F_j that satisfy the condition of formula (13), (14) or (15):

Rank_MS = Rank_MS \setminus Q_i, \text{ if satisfies } F_m \notin ULSets,\ F_n \notin ULSets    (13)

Rank_MS = Rank_MS \setminus Q_i, \text{ if satisfies } Label(F_m) = Label(F_n)    (14)

Rank_IS = Rank_IS \setminus F_j, \text{ if satisfies } F_j \notin ULSets    (15)

2) reordering: for the elements remaining in Rank_MS and Rank_IS, recomputing the pairwise merging recommendation scores and importance scores with formulas (6) and (5), and regenerating Rank_MS and Rank_IS accordingly, as the basis for the resources displayed in the next round of interactive labeling.
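A sketch of step S44 per claim 18: merge_score and importance_score are hypothetical stand-ins for formulas (6) and (5), both defined earlier in the document, and the pruning follows formulas (13)-(15):

def prune_and_reorder(state):
    """Sketch of formulas (13)-(15) plus the re-ranking of claim 18."""
    def resolved(m, n):
        both_labeled = m not in state.ULSets and n not in state.ULSets  # (13)
        same_label = (m in state.labels and n in state.labels
                      and state.labels[m] == state.labels[n])           # (14)
        return both_labeled or same_label

    state.Rank_MS = [p for p in state.Rank_MS if not resolved(*p)]
    # (15): labeled sequences leave the importance list
    state.Rank_IS = [j for j in state.Rank_IS if j in state.ULSets]
    # Re-score the survivors with formulas (6) and (5), then re-sort
    state.Rank_MS.sort(key=lambda p: merge_score(state, *p), reverse=True)
    state.Rank_IS.sort(key=lambda j: importance_score(state, j), reverse=True)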
19. An interactive labeling system for Internet person videos, characterized by comprising:
a device for analyzing a video to be labeled, extracting the set of face sequences in the video and the set of person names in the video's surrounding text;
a device for using each name of said name set as a text keyword and searching to obtain the person network image set corresponding to that name;
a device for calculating the importance scores of said face sequences, the pairwise merging recommendation scores of said face sequences, and the similarity scores between said face sequences and the person network images corresponding to said names, and for determining, according to said importance scores, said pairwise merging recommendation scores and said similarity scores, the face sequences, names and person network images to be displayed while said video is labeled;
a device for displaying the face sequences, names and person network images to be labeled, and for interactively labeling the face sequences, thereby realizing the labeling of said video.
CN201410475211.0A 2014-09-17 2014-09-17 Internet person video interactive labeling method and system Active CN104217008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410475211.0A CN104217008B (en) 2014-09-17 2014-09-17 Internet person video interactive labeling method and system

Publications (2)

Publication Number Publication Date
CN104217008A 2014-12-17
CN104217008B CN104217008B (en) 2018-03-13

Family

ID=52098498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410475211.0A Active CN104217008B (en) Internet person video interactive labeling method and system

Country Status (1)

Country Link
CN (1) CN104217008B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739428A (en) * 2008-11-10 2010-06-16 中国科学院计算技术研究所 Method for establishing index for multimedia
CN102629275A (en) * 2012-03-21 2012-08-08 复旦大学 Face and name aligning method and system facing to cross media news retrieval
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Jie et al.: "A Semi-Automatic Photo Person Annotation System Based on a Friendly Interaction Mode", Proceedings of the 8th Joint Conference on Harmonious Human-Machine Environment (HHME2012), NCMT *
Gao Xinxin: "Design and Implementation of a User-Interaction-Based Web Image Annotation Framework", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809096A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Figure labeling method and terminal
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
CN104796781B (en) * 2015-03-31 2019-01-18 小米科技有限责任公司 Video clip extracting method and device
CN107710261A (en) * 2015-12-14 2018-02-16 谷歌有限责任公司 The system and method paid attention to for estimating user
CN107710261B (en) * 2015-12-14 2021-06-29 谷歌有限责任公司 System and method for estimating user attention
CN106227836A (en) * 2016-07-26 2016-12-14 上海交通大学 Associating visual concept learning system and method is supervised with the nothing of word based on image
CN106227836B (en) * 2016-07-26 2020-07-14 上海交通大学 Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN109214247A (en) * 2017-07-04 2019-01-15 腾讯科技(深圳)有限公司 Face identification method and device based on video
CN107480236A (en) * 2017-08-08 2017-12-15 深圳创维数字技术有限公司 A kind of information query method, device, equipment and medium
CN107832662A (en) * 2017-09-27 2018-03-23 百度在线网络技术(北京)有限公司 A kind of method and system for obtaining picture labeled data
CN108882033B (en) * 2018-07-19 2021-12-14 上海影谱科技有限公司 Character recognition method, device, equipment and medium based on video voice
CN108882033A (en) * 2018-07-19 2018-11-23 北京影谱科技股份有限公司 A kind of character recognition method based on video speech, device, equipment and medium
CN111046235A (en) * 2019-11-28 2020-04-21 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111046235B (en) * 2019-11-28 2022-06-14 福建亿榕信息技术有限公司 Method, system, equipment and medium for searching acoustic image archive based on face recognition
CN111144306A (en) * 2019-12-27 2020-05-12 联想(北京)有限公司 Information processing method, information processing apparatus, and information processing system
CN111126069A (en) * 2019-12-30 2020-05-08 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111126069B (en) * 2019-12-30 2022-03-29 华南理工大学 Social media short text named entity identification method based on visual object guidance
CN111639599A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium
CN111639599B (en) * 2020-05-29 2024-04-02 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN104217008B (en) 2018-03-13

Similar Documents

Publication Publication Date Title
CN104217008A (en) Interactive type labeling method and system for Internet figure video
Lokoč et al. Is the reign of interactive search eternal? findings from the video browser showdown 2020
CN106201177B (en) A kind of operation execution method and mobile terminal
US20130243249A1 (en) Electronic device and method for recognizing image and searching for concerning information
CN104281649A (en) Input method and device and electronic equipment
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
WO2015176525A1 (en) Time-serialization-based document identification, association, search, and display system
KR20150091053A (en) Method and apparatus for video retrieval
CN103593363A (en) Video content indexing structure building method and video searching method and device
CN102222103A (en) Method and device for processing matching relationship of video content
Nguyen et al. LifeSeeker 3.0: An Interactive Lifelog Search Engine for LSC'21
CN109614482A (en) Processing method, device, electronic equipment and the storage medium of label
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN104933171B (en) Interest point data association method and device
CN104462590A (en) Information searching method and device
CN103399855B (en) Behavior intention determining method and device based on multiple data sources
CN107665188A (en) A kind of semantic understanding method and device
JP2016157492A (en) Method and apparatus for providing retrieval service interactively displaying type of retrieval target
Baidya et al. LectureKhoj: automatic tagging and semantic segmentation of online lecture videos
Zang et al. Multimodal icon annotation for mobile applications
CN103955480A (en) Method and equipment for determining target object information corresponding to user
CN108763369A (en) A kind of video searching method and device
RU2459242C1 (en) Method of generating and using recursive index of search engines
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
CN111639234B (en) Method and device for mining core entity attention points

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant