WO2013044407A1 - Retrieving visual media - Google Patents


Info

Publication number
WO2013044407A1
Authority
WO
WIPO (PCT)
Prior art keywords
visual media
processor
instances
content type
group
Prior art date
Application number
PCT/CN2011/001629
Other languages
French (fr)
Inventor
Tong Zhang
Keyan LIU
Xinyun SUN
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to US14/239,387 (US9229958B2)
Priority to CN201180073732.8A (CN103827856A)
Priority to PCT/CN2011/001629 (WO2013044407A1)
Priority to EP11873267.6A (EP2734931A4)
Publication of WO2013044407A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • FLICKR® Google, Inc.
  • PICASA® Google, Inc.
  • YOUTUBE® Google, Inc.
  • returned results from text query may be mixed.
  • Some of the visual media may not contain the particular person at all, or may not even be related to the particular person.
  • a text search for "John Smith" can produce visual media for many different people named John Smith in addition to a particular John Smith of interest.
  • By applying face clustering to frames in top returned visual media, facial features of the person can be obtained, which can subsequently be used to find more relevant visual media.
  • No input sample of visual media (such as in query by example approaches) is necessary, and no training of a classifier is needed.
  • located visual media segments of the particular person can be used for repurposing.
  • Figure 1 illustrates a text-based search portal for retrieving visual media in accordance with one or more examples of the present disclosure.
  • the text-based search portal 100 can be, for example, a web page associated with an Internet, or other database, search engine 106.
  • the text-based search portal 100 can be a front end of a commercially-available search engine 106 from which techniques of the present disclosure can be applied, or can be a front end for a stand-alone visual media search system (e.g., a private visual media dataset).
  • the search portal 100 can include a search field 102 by which to receive a text query 104.
  • the text query 104 can be, for example, one or more people's name or another descriptor of the sought-after visual media.
  • the text query 104 can be a title such as "president" or "pope," or be a description such as "first black president" or "leading man in Gone with the Wind movie."
  • the visual media search system can search a collection of visual media (e.g., images, video) and return visual media results based on text descriptions associated with particular ones of the visual media, such as metadata therefor.
  • Associated text descriptions can be in the form of visible and/or invisible text information associated with the visual media.
  • Visible text information associated with visual media can include tagging or labeling on videos and images, the tags being capable of display along with the
  • Invisible text information associated with visual media can include metadata associated with particular visual media, such as time, date, and/or place of capture, description of subject matter, etc. stored in a file associated with the visual media.
  • various methods for retrieving visual media can involve retrieving visual media via the Internet (e.g., images stored in the cloud, YOUTUBE® videos).
  • Using a text-based search engine, a number of videos can be returned.
  • some of the videos may not be related to the particular person(s) being searched due to the noisy nature of text annotations. That is, text annotations can be general, inaccurate, vague, imprecise, etc.
  • some of them may not contain the person(s). For example, a video annotated as "Johnny's graduation party" may capture those relatives who attended, rather than the subject of interest, Johnny.
  • the videos may not be ranked by how much the target person(s) appear either.
  • FIG. 2A illustrates a display of visual media 210 returned responsive to a text-based query in accordance with one or more examples of the present disclosure.
  • the display of visual media 210 can include a number of returned visual media 213 (e.g., videos, images) of the target content (e.g., person(s) being searched for), as well as a number of query images 211 derived from the returned visual media 213.
  • the query images 211 can be derived from the returned visual media 213, for example, by face clustering or other identification techniques. Where the query images 211 are derived from the returned visual media 213 by face clustering techniques, the query images 211 may be face images 212, as shown.
  • the display of visual media 210 can be based on a text query.
  • the system can search, for example, in a collection of tagged visual media over the Internet (e.g., YOUTUBE® videos) using a text search engine (such as the one used in YOUTUBE®).
  • a number of videos 214 can be returned. However, among the number of returned videos 214, some of the returned videos may not be related to the target content (e.g., particular person(s) being searched).
  • Keyframes can be extracted from the videos 214 returned from the text query. A keyframe is a portion (e.g., a frame) of a returned video 214. Keyframes can be selected from the top N returned videos (e.g., 20). The system and/or method of the present disclosure are not limited to the example quantity discussed here, and may include keyframes selected from more or fewer returned visual media. The keyframes may be evenly sampled over time, or may be selected through an intelligent scheme.
  • a keyframe collection can contain, for example, the keyframes selected from the N videos 214, or the query images 211 may be based on the keyframes (e.g., include some additional area around a keyframe).
  • a face detector can be applied to all the keyframes to detect one or more faces 216 in a keyframe. Face clustering can be performed in the keyframes. Face clustering can be conducted on all the detected faces 216. Faces of the same person can be grouped into a cluster. Even though there may be videos not related to the target person in the top N returned videos 214, or some of the top N returned videos 214 may not contain the target person, the largest face cluster can be assumed to correspond to the target person based on an assumption that at least some of the returned videos 214 contain the target person. There may be other, smaller face clusters that correspond to people who appear with the target person in the returned videos 214, or people who are not relevant at all.
  • Candidate query faces 212 can be automatically generated.
  • a quantity, K, of face images (e.g., 4, 5) from each of the top face clusters can be selected.
  • the quantity K is not limited to any particular value, and can be more or fewer than the example quantities provided herein.
  • the face image 216 with the biggest face can be chosen.
  • a face image 216 that may be most different from the chosen face image 216 can be selected. This process can continue until K face images 216 have been selected. For example, if the text query is "Barack Obama," the largest face cluster should correspond to President Barack Obama, and K face images 216 of his can be selected as a query image 212.
  • Face images 216 of people who appear frequently enough in the returned videos 214 and have large enough face clusters may also be selected as a query image 212.
  • If the text query is "Clinton," there might be large face clusters of both Bill Clinton and Hillary Clinton, which can each be selected as a query image 212. There may also be face clusters of other people related to the name. Such automatically selected face images can be displayed to a user as query images 212 in an order of face cluster size (e.g., largest face cluster size being the topmost or right-most or top-right-most image), which is most likely to present face images 212 of the target person(s) in a most prominent position (e.g., at the top).
  • examples of the present disclosure are not limited to any particular ordering. Other ordering schemes are possible and/or other means for indicating preferred candidate query images are contemplated, such as by highlighting, labeling, ordering, ranking, etc.
  • incremental clustering (e.g., online clustering)
  • a dynamic environment in which new visual media data is continually added to the dataset, such as the Internet.
  • Incremental clustering can be performed prior to a particular text query and/or stored based on a previous text query, and subsequently used to return visual media and/or determine appropriate clusters.
  • re-ranking of the query images 212 can be performed.
  • a user may conduct a visual inspection of the query images 212, and select one or more query images 212 from the displayed array of query images 212.
  • the query image 212 is a face image representative of a face cluster. Selecting a query image 212 indicates the face image of the targeted person(s) from among query images that may be of other person(s) (e.g., other persons with a same name).
  • the original returned M videos from the text query, where M is greater than or equal to N (i.e., the number of returned videos displayed), can be re-ranked.
  • a collection of keyframes can be selected from each of the videos.
  • the keyframes now selected may be the same as the keyframes previously selected, or may be a more densely sampled collection.
  • Detected faces can be clustered in the keyframe set from each video. The resulting face clusters can be compared (e.g., matched) with a selected query face image.
  • a video that contains the target person can be identified as relevant.
  • a ranking score can be computed for each video.
  • the ranking score can be comprised of one or more of the following factors: (a) a relevant video in which the target person appears ranks higher than a non-relevant video; (b) the total time period in the video in which the target person appears; (c) the number of times that the video has been viewed. Other factors may be included in determining the ranking score.
  • the videos can be arranged (e.g., listed) according to the new rank.
  • retrieving visual media involves two rounds of keyframe extraction and face clustering.
  • keyframes can be extracted from a number of top returned visual media, for example, and all keyframes from the different returned visual media can be utilized together as a collection to which face clustering can be applied.
  • a keyframe set can be extracted within each returned visual media, and face clustering can be applied on the keyframe set within each visual media.
  • Figure 2B illustrates a display of re-ranked visual media 221 in accordance with one or more examples of the present disclosure.
  • the re-ranked visual media 221 can include the top T videos 220, for example, as determined from the ranking score.
  • the top T videos 220 can indicate the face images 216 associated with the cluster corresponding to the selected query image.
  • the results of a video search can be cached and updated.
  • videos can be processed offline (e.g., prior to a particular search) and the analysis results can be cached.
  • the visual media retrieval system can analyze tags and/or metadata of the most viewed videos to obtain the collection of most popular people (e.g., celebrities). With names of these people as text queries, the system can generate re-ranked results based on these queries.
  • the video lists and locations of segments in which the target person(s) appear can be cached.
  • the visual media retrieval system can constantly update the cached queries (e.g., list of people popular in viewed videos) with new input queries by users.
  • the visual media retrieval system can regularly update the video search results with new videos uploaded to the dataset (e.g., Internet).
  • the visual media retrieval system can be arranged such that video analysis may only need to be done on newly uploaded videos of a person in the existing list, or on videos of a query of people not previously in an existing list. While the visual media retrieval system can keep computing to provide better and better results, the storage required to store the cache can be minimized, for example, by only storing the pointers to the visual media and/or locations of related segments within particular visual media.
  • the user may view the visual media, and/or directly jump to segments within the visual media in which a target person appears.
  • the visual media may also be repurposed to compose customized visual media (e.g., video, photo) products. For example, the user may pick one or more segments of visual media that include the target person(s) from multiple visual media sources and make a new visual media containing selected appearances of the target person(s).
  • the visual media retrieval system can include visual media editing tools that may be applied to identified visual media. Keyframes of the target person(s) automatically identified or semi-automatically selected may also be displayed, which the user may edit. For example, a user may make a photobook of the target person(s).
  • Returned visual media can include, or a user may compose visual media that includes, the target person with other people who appear in the same scene.
  • the visual media retrieval system can be applied to reveal certain social relations relative to the targeted person(s), and related statistics.
  • the user may further compose visual media products of multiple target people appearing together (e.g., such as a user appearing with a celebrity).
  • Figure 3 illustrates a flow diagram of a method for retrieving visual media in accordance with one or more examples of the present disclosure.
  • One example method includes receiving a text query associated with a target content at 360.
  • a first group of visual media can be identified based on correspondence of metadata of the visual media with the text query, as shown at 362.
  • keyframes from the first group of identified visual media are selected.
  • the method further includes detecting instances of a content type in the selected keyframes as indicated at 366, and grouping similar instances of the content type into clusters at 368.
  • the target content can be associated with a cluster having a greatest quantity of similar instances, as indicated at 370.
  • Figure 4 illustrates a block diagram of an example computing system used to implement visual media searching according to the present disclosure.
  • the computing system 474 can be comprised of a number of computing resources communicatively coupled to the network 478.
  • Figure 4 shows a first computing device 475 that may also have an associated data source 476, and may have one or more input/output devices (e.g., keyboard, electronic display).
  • a second computing device 479 is also shown in Figure 4 being
  • Second computing device 479 may include one or more processors 480 communicatively coupled to a non-transitory computer-readable medium 481.
  • the non-transitory computer-readable medium 481 may be structured to store executable instructions 482 (e.g., one or more programs) that can be executed by the one or more processors 480 and/or data.
  • the second computing device 479 may be further communicatively coupled to a production device 483 (e.g., electronic display, printer, etc.). Second computing device 479 can also be communicatively coupled to an external computer-readable memory 484.
  • the second computing device 479 can cause an output to the production device 483, for example, as a result of executing instructions of one or more programs stored on non-transitory computer-readable medium 481, by the at least one processor 480, to implement a system for retrieving visual media according to the present disclosure.
  • Causing an output can include, but is not limited to, displaying text and images to an electronic display and/or printing text and images to a tangible medium (e.g., paper).
  • Executable instructions to implement visual media retrieving may be executed by the first computing device 475 and/or second computing device 479, stored in a database such as may be maintained in external computer-readable memory 484, output to production device 483, and/or printed to a tangible medium.
  • One or more additional computers 477 may also be communicatively coupled to the network 478 via a communication link that includes a wired and/or wireless portion.
  • the computing system can be comprised of additional multiple interconnected computing devices, such as server devices and/or clients. Each computing device can include control circuitry such as a processor, a state machine, application specific integrated circuit (ASIC), controller, and/or similar machine.
  • the control circuitry can have a structure that provides a given
  • non-transitory computer-readable medium (e.g., 476, 481, 484).
  • the non-transitory computer-readable medium can be integral (e.g., 481), or
  • the non-transitory computer-readable medium can be an internal memory, a portable memory, a portable disk, or a memory located internal to another computing resource (e.g., enabling the computer-readable instructions to be downloaded over the Internet).
  • the non-transitory computer-readable medium 330 can have computer-readable instructions stored thereon that are executed by the control circuitry (e.g., processor) to provide a particular functionality.
  • the non-transitory computer-readable medium can include volatile and/or non-volatile memory.
  • Volatile memory can include memory that depends upon power to store information, such as various types of dynamic random access memory (DRAM), among others.
  • Non-volatile memory can include memory that does not depend upon power to store information.
  • non-volatile memory can include solid state media such as flash memory, EEPROM, phase change random access memory (PCRAM), among others.
  • the non-transitory computer-readable medium can include optical discs, digital video discs (DVD), Blu-ray discs, compact discs (CD), laser discs, and magnetic media such as tape drives, floppy discs, and hard drives, solid state media such as flash memory, EEPROM, phase change random access memory (PCRAM), as well as other types of machine-readable media.
  • Logic can be used to implement the method(s) of the present disclosure, in whole or part. Logic can be implemented using appropriately configured hardware and/or machine readable instructions (including software). The above-mentioned logic portions may be discretely implemented and/or implemented in a common arrangement.
  • FIG. 5 illustrates a block diagram of an example computer readable medium (CRM) 595 in communication, e.g., via a communication path 596, with processing resources 593 according to the present disclosure.
  • processor resources 593 can include one or a plurality of processors 594 such as in a parallel processing arrangement.
  • a computing device having processor resources can be in communication with, and/or receive a tangible non-transitory computer readable medium (CRM) 595 storing a set of computer readable instructions (e.g., software) for capturing and/or replaying network traffic, as described herein.
  • CRM computer readable medium
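
Two steps from the list above lend themselves to a short sketch: choosing K query face images from a cluster (largest face first, then the most different faces) and re-ranking videos by the listed factors. The following Python is only an illustration under stated assumptions. The farthest-point rule used for "most different", the Euclidean distance on toy feature vectors, the lexicographic ordering of the ranking factors, and all names (`select_query_faces`, `VideoResult`, `rerank`) are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from math import dist


def select_query_faces(faces, k):
    """Pick up to k faces from one cluster.

    faces: list of (face_area, feature_vector) pairs (toy stand-ins).
    Assumption: "most different" is read as farthest-point selection,
    i.e. maximize the minimum distance to the faces already chosen.
    """
    remaining = list(faces)
    chosen = [max(remaining, key=lambda f: f[0])]  # biggest face first
    remaining.remove(chosen[0])
    while remaining and len(chosen) < k:
        nxt = max(remaining,
                  key=lambda f: min(dist(f[1], c[1]) for c in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


@dataclass
class VideoResult:
    video_id: str
    target_seconds: float  # total time the target person appears
    views: int             # number of times the video has been viewed


def rerank(videos):
    """Order videos by (relevant?, appearance time, view count), descending.

    Assumption: the factors are combined lexicographically; the source
    lists the factors but gives no weights or formula.
    """
    return sorted(videos,
                  key=lambda v: (v.target_seconds > 0, v.target_seconds, v.views),
                  reverse=True)


faces = [(100, (0.0, 0.0)), (80, (1.0, 0.0)), (60, (5.0, 0.0)), (90, (0.1, 0.0))]
picked = select_query_faces(faces, 3)

results = [
    VideoResult("a", 0.0, 900_000),
    VideoResult("b", 42.0, 1_200),
    VideoResult("c", 42.0, 5_000),
    VideoResult("d", 120.0, 300),
]
order = [v.video_id for v in rerank(results)]
```

With this toy data, the non-relevant but heavily viewed video "a" ends up last, which matches factor (a) taking precedence over view count. A weighted sum would be an equally plausible reading of the ranking score.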

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Examples of the present disclosure may include methods, systems, and computer readable media with executable instructions. An example method for retrieving visual media can include receiving a text query associated with a target content. A first group of visual media is identified based on correspondence of metadata of the visual media with the text query, and keyframes from the first group of identified visual media are selected. The method further includes detecting instances of a content type in the selected keyframes, and grouping similar instances of the content type into clusters. The target content is associated with a cluster having a greatest quantity of similar instances.

Description

RETRIEVING VISUAL MEDIA
Background
The amount of visual media on the Internet is growing due to people sharing photos and video, and due to commercial efforts in response to increasing speeds and bandwidth capabilities of the network. Internet data transfer speeds are increasing. WEB 2.0 applications that facilitate participatory information sharing, such as social networking sites, blogs, social media, and others, are growing in number. Image-based and video sharing websites, such as FLICKR® (Google, Inc.), PICASA® (Google, Inc.), YOUTUBE® (Google, Inc.), etc., are growing in popularity. All of these capabilities and developments are making online content-based image manipulations very useful. Since new visual media is being uploaded to the Internet all the time, efficiently organizing, indexing, and retrieving desired visual media is a constant and ever-growing challenge. Organizing visual media can be an enormous endeavor.
People are often the principal subject matter in visual media, such as photos, images, and video frames. The ability to find visual media of a particular person easily and quickly in a visual media dataset is highly desired. Searching for visual media including a particular person can have many applications. Visual media content is best evaluated visually. However, legacy search tools are often text based, originally designed to return text results, and have only more recently expanded into applications involving image searches. That is, the search input is limited to text, such as a person's name, a noun, or a written description of the visual media being sought. Text-based searching alone can be imprecise with respect to visual media results since, for example, many people can share the same name, so a single query can return visual media of many different people. Users aren't typically interested in all results returned in response to a text search query (e.g., images of all people named "Bob Smith"), but rather in some portion of the returned images, such as an image of the "Bob Smith" they know. Therefore, some ordering of visual media search results can be beneficial to a user.
Brief Description of the Drawings
Figure 1 illustrates a text-based search portal for retrieving visual media in accordance with one or more examples of the present disclosure.
Figure 2A illustrates a display of visual media returned responsive to a text-based query in accordance with one or more examples of the present disclosure.
Figure 2B illustrates a display of re-ranked visual media in accordance with one or more examples of the present disclosure.
Figure 3 illustrates a flow diagram of a method for retrieving visual media in accordance with one or more examples of the present disclosure.
Figure 4 illustrates a block diagram of an example computing system for retrieving visual media in accordance with one or more examples of the present disclosure.
Figure 5 illustrates a block diagram of an example computer readable medium (CRM) in communication with processing resources in accordance with one or more examples of the present disclosure.
Detailed Description
Examples of the present disclosure may include methods, systems, and computer readable media with executable instructions, and/or logic. According to one or more examples of the present disclosure, an example method can include receiving a text query associated with a target content. A first group of visual media is identified based on correspondence of metadata of the visual media with the text query. Keyframes from the first group of identified visual media are selected. The method further includes detecting instances of a content type in the selected keyframes, and grouping similar instances of the content type into clusters. The target content is associated with a cluster having a greatest quantity of similar instances.
As used herein, the term "includes" means includes but is not limited to, and the term "including" means including but not limited to. The term "based on" means based at least in part on. This disclosure provides a system and method for searching to find visual media of a particular person(s), for example, by using a network such as the Internet. According to examples of the present disclosure, the input to the system and/or method can be a text query, such as the names of one or more of the person(s) to be searched. The output of the system and/or method can be a display of visual media and/or a list of visual media containing the person(s). The list may include the location of visual media and/or segments thereof in each returned visual media that contain the person(s). For example, the list may indicate that the subject appears at certain times and/or locations in a particular visual media.
With such results, a user may view and/or edit the visual media of the person(s). That is, a user may select portions of the returned visual media to use in composing a new visual media. For example, a new video may be formed with one or more segments of the person(s) extracted from multiple returned videos. Such new visual media may include still images, either from original still images or extracted from video. The disclosed system and method may also be applied to discover people who appear frequently with the person(s) who were the identified target of the search. Visual media can be composed showing the person(s) who were the identified target of the search and others together. The results of the system and method for searching to find visual media of the present disclosure can also be used to generate statistics of people's co-appearance with the person(s) who were the identified target of the search.
When searching for visual media of a particular person, the results returned from a text query may be mixed. Some of the visual media may not contain the particular person at all, or may not even be related to the particular person. For example, a text search for "John Smith" can produce visual media for many different people named John Smith in addition to a particular John Smith of interest. By applying face clustering to frames in top returned visual media, facial features of the person can be obtained, which can subsequently be used to find more relevant visual media. No input sample of visual media (such as in query by example approaches) is necessary, and no training of a classifier is needed. Also, located visual media segments of the particular person can be used for repurposing.
Figure 1 illustrates a text-based search portal for retrieving visual media in accordance with one or more examples of the present disclosure. The text-based search portal 100 can be, for example, a web page associated with an Internet, or other database, search engine 106. The text-based search portal 100 can be a front end of a commercially-available search engine 106 from which techniques of the present disclosure can be applied, or can be a front end for a stand-alone visual media search system (e.g., a private visual media dataset).
The search portal 100 can include a search field 102 by which to receive a text query 104. The text query 104 can be, for example, one or more people's names or another descriptor of the sought-after visual media. For example, the text query 104 can be a title such as "president" or "pope," or be a description such as "first black president" or "leading man in Gone with the Wind movie." For a text query 104, the visual media search system can search a collection of visual media (e.g., images, video) and return visual media results based on text descriptions associated with particular ones of the visual media, such as metadata therefor. Associated text descriptions can be in the form of visible and/or invisible text information associated with the visual media. Visible text information associated with visual media can include tagging or labeling on videos and images, the tags being capable of display along with the video/image. Invisible text information associated with visual media can include metadata associated with particular visual media, such as time, date, and/or place of capture, description of subject matter, etc. stored in a file associated with the visual media.
According to one or more examples of the present disclosure, various methods for retrieving visual media can involve retrieving visual media via the Internet (e.g., images stored in the cloud, YOUTUBE® videos). With a text-based search engine, a number of videos can be returned. However, among the returned videos, some of the videos may not be related to the particular person(s) being searched due to the noisy nature of text annotations. That is, text annotations can be general, inaccurate, vague, imprecise, etc. Furthermore, among related videos, some of them may not contain the person(s). For example, a video annotated as "Johnny's graduation party" may capture those relatives who attended, rather than the subject of interest, Johnny. The videos may not be ranked by how much the target person(s) appear either.
Figure 2A illustrates a display of visual media 210 returned responsive to a text-based query in accordance with one or more examples of the present disclosure. The display of visual media 210 can include a number of returned visual media 213 (e.g., videos, images) of the target content (e.g., person(s) being searched for), as well as a number of query images 211 derived from the returned visual media 213. The query images 211 can be derived from the returned visual media 213, for example, by face clustering or other identification techniques. Where the query images 211 are derived from the returned visual media 213 by face clustering techniques, the query images 211 may be face images 212, as shown.
As an example of the system and method for retrieving visual media of the present disclosure, the following discussion refers to video clips, such as those found on YOUTUBE®. However, the present disclosure is not limited to visual media being only such video and can include other types of visual media such as still images and/or other visual media file formats. The display of visual media 210 can be based on a text query. For a text query containing one or more people's names, the system can search, for example, in a collection of tagged visual media over the Internet (e.g., YOUTUBE® videos) using a text search engine (such as the one used in YOUTUBE®). A number of videos 214 can be returned. However, among the number of returned videos 214, some of the returned videos may not be related to the target content (e.g., particular person(s) being searched).
Keyframes can be extracted from the videos 214 returned from the text query. Keyframes intend one or more portions (e.g., a frame) of a returned video 214. Keyframes can be selected from the top N returned videos (e.g., 20). The system and/or method of the present disclosure are not limited to the example quantity discussed here, and may include keyframes selected from more or fewer returned visual media. The keyframes may be evenly sampled over time, or may be selected through an intelligent scheme. A keyframe collection can contain, for example, the keyframes selected from the N videos 214, or the query images 211 may be based on the keyframes (e.g., include some additional area around a keyframe).
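As a minimal sketch of the even sampling option mentioned above, the following computes keyframe timestamps at the centers of equal-length segments of a video. The function name and the choice of segment centers are assumptions made for illustration; the disclosure does not prescribe a particular sampling scheme.

```python
def evenly_sampled_timestamps(duration_s: float, num_keyframes: int) -> list[float]:
    """Return keyframe timestamps (in seconds) spread evenly over a video.

    Hypothetical helper: the video is split into `num_keyframes` equal
    segments and the center of each segment is used as a keyframe time.
    """
    if duration_s <= 0 or num_keyframes <= 0:
        return []
    step = duration_s / num_keyframes
    return [step * (i + 0.5) for i in range(num_keyframes)]


# Example: five keyframes from a ten-second clip.
timestamps = evenly_sampled_timestamps(10.0, 5)
```

Sampling at segment centers avoids the first and last frames, which are often title cards or fades, but any other even spacing would serve the same purpose.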
A face detector can be applied to all the keyframes to detect one or more faces 216 in a keyframe. Face clustering can then be conducted on all the detected faces 216, such that faces of the same person are grouped into a cluster. Even though there may be videos not related to the target person in the top N returned videos 214, or some of the top N returned videos 214 may not contain the target person, the largest face cluster can be assumed to correspond to the target person based on an assumption that at least some of the returned videos 214 contain the target person. There may be other, smaller face clusters that correspond to people who appear with the target person in the returned videos 214, or people who are not relevant at all.
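The grouping step can be illustrated with a small sketch. The disclosure does not name a clustering algorithm, so the greedy threshold ("leader") clustering below, the fixed-length feature vectors, and the function names are all assumptions chosen for brevity; a real system would obtain features from a face detector and descriptor.

```python
import math


def cluster_faces(features, threshold):
    """Greedy clustering sketch: each face joins the first cluster whose
    first member lies within `threshold`; otherwise it starts a new cluster.

    `features` is a list of equal-length numeric vectors (assumed to be
    face descriptors). Returns a list of clusters, each a list of indices.
    """
    clusters = []
    for idx, feat in enumerate(features):
        for members in clusters:
            if math.dist(feat, features[members[0]]) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def largest_cluster(clusters):
    """Return the cluster with the most members (assumed target person)."""
    return max(clusters, key=len)


# Toy descriptors: three faces near (0, 0) and two near (5, 5).
toy_features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.1, 5.0)]
toy_clusters = cluster_faces(toy_features, threshold=0.5)
```

In this toy input the faces at indices 0, 1, and 3 form the largest cluster, which under the stated assumption would be associated with the target person.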
Candidate query faces 212 can be automatically generated. A quantity, K, of face images (e.g., 4, 5) from each of the top face clusters can be selected. However, the quantity K is not limited to any particular value, and can be more or fewer than the example quantities provided herein. For one cluster, the face image 216 with the biggest face can be chosen first. A face image 216 that is most different from the chosen face image 216 can then be selected. This process can continue until K face images 216 are selected. For example, if the text query is "Barack Obama," the largest face cluster should correspond to President Barack Obama, and K of his face images 216 can be selected as query images 212. Face images 216 of people who appear frequently enough in the returned videos 214 and have large enough face clusters may also be selected as a query image 212.
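One way to read the selection process above is as a farthest-point selection within a cluster: start with the biggest face, then repeatedly add the face that is most different from those already chosen. The sketch below assumes each face is given as a pixel area plus a numeric feature vector, and measures "most different" as the largest minimum Euclidean distance to the chosen faces; both are illustrative assumptions rather than details from the disclosure.

```python
import math


def select_query_faces(faces, k):
    """Pick up to k representative faces from one cluster.

    `faces` is a list of (area_px, feature_vector) pairs (assumed format).
    The first pick is the face with the largest area; each later pick is
    the remaining face farthest from all faces picked so far.
    """
    if not faces or k <= 0:
        return []
    chosen = [max(range(len(faces)), key=lambda i: faces[i][0])]
    while len(chosen) < min(k, len(faces)):
        remaining = [i for i in range(len(faces)) if i not in chosen]
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(faces[i][1], faces[j][1]) for j in chosen),
        )
        chosen.append(farthest)
    return chosen


# Toy cluster: (face area in pixels, 2-D descriptor).
toy_faces = [(100, (0.0, 0.0)), (400, (1.0, 0.0)), (50, (10.0, 0.0)), (80, (5.0, 0.0))]
picked = select_query_faces(toy_faces, k=3)
```

Choosing dissimilar faces gives the user several views (pose, lighting, expression) of the same person, which may make the visual confirmation step easier.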
If the text query is "Clinton," there might be large face clusters of both Bill Clinton and Hillary Clinton, which can each be selected as a query image 212. There may also be face clusters of other people related to the name. Such automatically selected face images can be displayed to a user as query images 212 in an order of face cluster size (e.g., largest face cluster size being the topmost or right-most or top-right-most image), which is most likely to present face images 212 of the target person(s) in a most prominent position (e.g., at the top). However, examples of the present disclosure are not limited to any particular ordering. Other ordering schemes are possible and/or other means for indicating preferred candidate query images are contemplated, such as by highlighting, labeling, ordering, ranking, etc.
According to one or more examples, incremental clustering, e.g., online clustering, can be used for a dynamic environment in which new visual media data is continually added to the dataset, such as the Internet. Incremental clustering can be performed prior to a particular text query and/or stored based on a previous text query, and subsequently used to return visual media and/or determine appropriate clusters.
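A minimal sketch of incremental clustering follows, assuming faces arrive one at a time as numeric feature vectors. Each new face either updates the running-mean centroid of the nearest existing cluster or starts a new cluster. The class name, the distance threshold, and the running-mean update are illustrative assumptions; the disclosure does not specify an incremental clustering algorithm.

```python
import math


class IncrementalFaceClusters:
    """Online clustering sketch: clusters are summarized by a centroid and a
    count, so previously seen faces never need to be revisited."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # one running-mean vector per cluster
        self.counts = []     # number of faces absorbed by each cluster

    def add(self, feature):
        """Assign `feature` to a cluster and return that cluster's index."""
        best, best_dist = None, None
        for i, centroid in enumerate(self.centroids):
            d = math.dist(feature, centroid)
            if best_dist is None or d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= self.threshold:
            n = self.counts[best] + 1
            # Running-mean update of the matched centroid.
            self.centroids[best] = [
                c + (f - c) / n for c, f in zip(self.centroids[best], feature)
            ]
            self.counts[best] = n
            return best
        self.centroids.append(list(feature))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Because only centroids and counts are kept, clusters built for an earlier text query can be stored and extended when new visual media is added to the dataset.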
According to examples of the present disclosure, re-ranking of the query images 212 can be performed. A user may conduct a visual inspection of the query images 212, and select one or more query images 212 from the displayed array of query images 212. The query image 212 is a face image representative of a face cluster. Selecting a query image 212 indicates the face image of the targeted person(s) from among query images that may be of other person(s) (e.g., other persons with the same name).
Based on this visual query, the M videos originally returned from the text query can be re-ranked, where M is greater than or equal to N (i.e., the number of returned videos displayed). Within the returned videos, a collection of keyframes can be selected from each of the videos. The keyframes now selected may be the same as the keyframes previously selected, or may be a more densely sampled collection. Detected faces can be clustered in the keyframe set from each video. The resulting face clusters can be compared (e.g., matched) with a selected query face image.
For a particular video, if there is at least one face cluster that matches the selected query face, the video can be identified as a relevant video that contains the target person. A ranking score can be computed for each video. The ranking score can be comprised of one or more of the following factors: (a) a relevant video in which the target person appears ranks higher than a non-relevant video; (b) the total time period in the video in which the target person appears; (c) the number of times that the video has been viewed. Other factors may be included in determining the ranking score. The videos can be arranged (e.g., listed) according to the new rank.
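The three factors can be combined in many ways; the disclosure does not give a formula. As one hedged illustration, the sketch below orders videos lexicographically by factor (a), then (b), then (c). The `VideoResult` fields and the `rerank` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class VideoResult:
    video_id: str
    matches_query_face: bool   # factor (a): a face cluster matched the query face
    appearance_seconds: float  # factor (b): total time the target person appears
    view_count: int            # factor (c): number of times the video was viewed


def rerank(videos):
    """Sort relevant videos first, then by appearance time, then by views."""
    def key(v):
        seconds = v.appearance_seconds if v.matches_query_face else 0.0
        return (v.matches_query_face, seconds, v.view_count)
    return sorted(videos, key=key, reverse=True)


toy_results = [
    VideoResult("a", False, 0.0, 1_000_000),
    VideoResult("b", True, 30.0, 10),
    VideoResult("c", True, 120.0, 5),
    VideoResult("d", True, 30.0, 500),
]
ranked_ids = [v.video_id for v in rerank(toy_results)]
```

A weighted sum of normalized factors would be an equally plausible design; the lexicographic key is used here only because it makes factor (a) strictly dominate, as the text suggests.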
As described above, from text-based visual media search results, retrieving visual media according to the present disclosure involves two rounds of keyframe extraction and face clustering. In the first round, keyframes can be extracted from a number of top returned visual media, for example, and all keyframes from the different returned visual media can be utilized together as a collection to which face clustering can be applied. In the second round, a keyframe set can be extracted within each returned visual media, and face clustering can be applied on the keyframe set within each visual media.
Figure 2B illustrates a display of re-ranked visual media 221 in accordance with one or more examples of the present disclosure. The re-ranked visual media 221 can include the top T videos 220, for example, as determined from the ranking score. The top T videos 220 can indicate the face images 216 associated with the cluster corresponding to the selected query image.
The results of a video search can be cached and updated. For quick response to a user's query, videos can be processed offline (e.g., prior to a particular search) and the analysis results can be cached. For example, with respect to an offline process, the visual media retrieval system can analyze tags and/or metadata of the most viewed videos to obtain the collection of most popular people (e.g., celebrities). With names of these people as text queries, the system can generate re-ranked results based on these queries. The video lists and locations of segments in which the target person(s) appear can be cached. The visual media retrieval system can constantly update the cached queries (e.g., list of people popular in viewed videos) with new input queries by users.
Also, the visual media retrieval system can regularly update the video search results with new videos uploaded to the dataset (e.g., Internet). The visual media retrieval system can be arranged such that video analysis may only need to be done on newly uploaded videos of a person in the existing list, or on videos of a query of people not previously in an existing list. While the visual media retrieval system can keep computing to provide better and better results, the storage required to store the cache can be minimized, for example, by only storing the pointers to the visual media and/or locations of related segments within particular visual media.
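The caching strategy described above can be sketched as a small in-memory structure that stores, per query, only a pointer to each visual media item and the segment locations within it. The class name, the query normalization, and the example URLs are assumptions made for illustration.

```python
class SegmentCache:
    """Cache sketch: normalized query -> {media pointer: [(start_s, end_s), ...]}.

    Only pointers and segment locations are stored, not the media itself.
    """

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _normalize(query):
        # Assumed normalization: case-insensitive, whitespace-collapsed.
        return " ".join(query.lower().split())

    def needs_analysis(self, query, media_url):
        """True if this media item has not yet been analyzed for the query."""
        return media_url not in self._entries.get(self._normalize(query), {})

    def store(self, query, media_url, segments):
        """Record the segments in which the queried person appears."""
        self._entries.setdefault(self._normalize(query), {})[media_url] = list(segments)

    def lookup(self, query):
        """Return a copy of the cached pointers and segments for the query."""
        return dict(self._entries.get(self._normalize(query), {}))


cache = SegmentCache()
url = "https://example.invalid/video/1"  # hypothetical pointer
cache.store("Bob  Smith", url, [(12.0, 20.5)])
```

With this arrangement, `needs_analysis` lets the system skip videos already processed for a person in the existing list and analyze only newly uploaded ones.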
Once the most relevant visual media are identified and/or retrieved, the user may view the visual media, and/or directly jump to segments within the visual media in which a target person appears. The visual media may also be repurposed to compose customized visual media (e.g., video, photo) products. For example, the user may pick one or more segments of visual media that include the target person(s) from multiple visual media sources and make a new visual media containing selected appearances of the target person(s).
The visual media retrieval system can include visual media editing tools that may be applied to identified visual media. Keyframes of the target person(s) automatically identified or semi-automatically selected may also be displayed, which the user may edit. For example, a user may make a photobook of the target person(s).
Moreover, from face clusters within each relevant visual media, people who appear frequently with the target person may be discovered. Statistics can be obtained regarding who appears most often with the target person.
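As a brief sketch of how such statistics might be gathered, the following counts, for each other person, the number of videos in which that person appears together with the target. It assumes each video has already been reduced to a set of person labels (for example, face cluster identifiers); the function name and labels are hypothetical.

```python
from collections import Counter


def co_appearance_counts(videos, target):
    """Count videos in which each other person appears with `target`.

    `videos` is an iterable of collections of person labels, one per video.
    """
    counts = Counter()
    for people in videos:
        if target in people:
            counts.update(p for p in set(people) if p != target)
    return counts


toy_videos = [
    {"target", "A", "B"},
    {"target", "A"},
    {"A", "C"},       # target absent, so this video is ignored
    {"target"},
]
stats = co_appearance_counts(toy_videos, "target")
```

Calling `stats.most_common(1)` on the toy data yields the person who appears most often with the target.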
Returned visual media can include, or a user may compose visual media that includes, the target person with other people who appear in the same scene.
Furthermore, the visual media retrieval system can be applied to reveal certain social relations relative to the targeted person(s), and related statistics. The user may further compose visual media products of multiple target people appearing together (e.g., such as a user appearing with a celebrity).
Figure 3 illustrates a flow diagram of a method for retrieving visual media in accordance with one or more examples of the present disclosure. One example method includes receiving a text query associated with a target content at 360. A first group of visual media can be identified based on correspondence of metadata of the visual media with the text query, as shown at 362. At 364, keyframes from the first group of identified visual media are selected. The method further includes detecting instances of a content type in the selected keyframes as indicated at 366, and grouping similar instances of the content type into clusters at 368. The target content can be associated with a cluster having a greatest quantity of similar instances, as indicated at 370.
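The flow of Figure 3 can be summarized in a short sketch in which keyframe selection, instance detection, and grouping are passed in as functions, since the disclosure leaves their implementations open. The function name, the dictionary layout of the media index, and the substring match on metadata are all assumptions made for illustration.

```python
def associate_target_cluster(text_query, media_index, select_keyframes,
                             detect_instances, group_similar):
    """Sketch of steps 360-370: return the largest cluster of instances.

    `media_index` is assumed to be a list of dicts with a "metadata" string.
    The three callables are placeholders for components not specified here.
    """
    query = text_query.lower()                                        # 360
    first_group = [m for m in media_index
                   if query in m["metadata"].lower()]                 # 362
    keyframes = [kf for m in first_group for kf in select_keyframes(m)]  # 364
    instances = [inst for kf in keyframes
                 for inst in detect_instances(kf)]                    # 366
    clusters = group_similar(instances)                               # 368
    return max(clusters, key=len) if clusters else []                 # 370


def group_by_label(instances):
    """Toy grouping: identical labels fall into the same cluster."""
    groups = {}
    for inst in instances:
        groups.setdefault(inst, []).append(inst)
    return list(groups.values())


toy_index = [
    {"metadata": "Bob Smith birthday", "frames": [["bob", "amy"], ["bob"]]},
    {"metadata": "Bob Smith talk", "frames": [["bob"]]},
    {"metadata": "cat video", "frames": [["cat"]]},
]
target_cluster = associate_target_cluster(
    "bob smith", toy_index,
    select_keyframes=lambda m: m["frames"],
    detect_instances=lambda kf: kf,
    group_similar=group_by_label,
)
```

In the toy data, labels stand in for detected faces; the "bob" cluster is the largest and is therefore associated with the target content.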
Figure 4 illustrates a block diagram of an example computing system used to implement visual media searching according to the present disclosure. The computing system 474 can be comprised of a number of computing resources communicatively coupled to the network 478. Figure 4 shows a first computing device 475 that may also have an associated data source 476, and may have one or more input/output devices (e.g., keyboard, electronic display). A second computing device 479 is also shown in Figure 4 being communicatively coupled to the network 478, such that executable instructions may be communicated through the network between the first and second computing devices.
Second computing device 479 may include one or more processors 480 communicatively coupled to a non-transitory computer-readable medium 481. The non-transitory computer-readable medium 481 may be structured to store executable instructions 482 (e.g., one or more programs) that can be executed by the one or more processors 480 and/or data. The second computing device 479 may be further communicatively coupled to a production device 483 (e.g., electronic display, printer, etc.). Second computing device 479 can also be communicatively coupled to an external computer-readable memory 484.
The second computing device 479 can cause an output to the production device 483, for example, as a result of executing instructions of one or more programs stored on non-transitory computer-readable medium 481, by the at least one processor 480, to implement a system for retrieving visual media according to the present disclosure. Causing an output can include, but is not limited to, displaying text and images to an electronic display and/or printing text and images to a tangible medium (e.g., paper). Executable instructions to implement visual media retrieving may be executed by the first computing device 475 and/or second computing device 479, stored in a database such as may be maintained in external computer-readable memory 484, output to production device 483, and/or printed to a tangible medium.
One or more additional computers 477 may also be communicatively coupled to the network 478 via a communication link that includes a wired and/or wireless portion. The computing system can be comprised of additional multiple interconnected computing devices, such as server devices and/or clients. Each computing device can include control circuitry such as a processor, a state machine, application specific integrated circuit (ASIC), controller, and/or similar machine.
The control circuitry can have a structure that provides a given functionality, and/or execute computer-readable instructions that are stored on a non-transitory computer-readable medium (e.g., 476, 481, 484). The non-transitory computer-readable medium can be integral (e.g., 481), or communicatively coupled (e.g., 476, 484) to the respective computing device (e.g., 475, 479) in either a wired or wireless manner. For example, the non-transitory computer-readable medium can be an internal memory, a portable memory, a portable disk, or a memory located internal to another computing resource (e.g., enabling the computer-readable instructions to be downloaded over the Internet). The non-transitory computer-readable medium 330 can have computer-readable instructions stored thereon that are executed by the control circuitry (e.g., processor) to provide a particular functionality.
The non-transitory computer-readable medium, as used herein, can include volatile and/or non-volatile memory. Volatile memory can include memory that depends upon power to store information, such as various types of dynamic random access memory (DRAM), among others. Non-volatile memory can include memory that does not depend upon power to store information. Examples of non-volatile memory can include solid state media such as flash memory, EEPROM, phase change random access memory (PCRAM), among others. The non-transitory computer-readable medium can include optical discs, digital video discs (DVD), Blu-ray discs, compact discs (CD), laser discs, and magnetic media such as tape drives, floppy discs, and hard drives, solid state media such as flash memory, EEPROM, phase change random access memory (PCRAM), as well as other types of machine-readable media.
Logic can be used to implement the method(s) of the present disclosure, in whole or part. Logic can be implemented using appropriately configured hardware and/or machine readable instructions (including software). The above-mentioned logic portions may be discretely implemented and/or implemented in a common arrangement.
Figure 5 illustrates a block diagram of an example computer readable medium (CRM) 595 in communication, e.g., via a communication path 596, with processing resources 593 according to the present disclosure. As used herein, processing resources 593 can include one or a plurality of processors 594 such as in a parallel processing arrangement. A computing device having processing resources can be in communication with, and/or receive a tangible non-transitory computer readable medium (CRM) 595 storing a set of computer readable instructions (e.g., software) for retrieving visual media, as described herein.
The above specification, examples and data provide a description of the method and applications, and use of the system and method of the present disclosure. Since many examples can be made without departing from the spirit and scope of the system and method of the present disclosure, this specification merely sets forth some of the many possible example configurations and implementations.
Although specific examples have been illustrated and described herein, those of ordinary skill in the art will appreciate that an arrangement calculated to achieve the same results can be substituted for the specific examples shown. This disclosure is intended to cover adaptations or variations of one or more examples provided herein. The above description has been made in an illustrative fashion, and not a restrictive one. Combination of the above examples, and other examples not specifically described herein will be apparent upon reviewing the above description. Therefore, the scope of one or more examples of the present disclosure should be determined based on the appended claims, along with the full range of equivalents that are entitled. Throughout the specification and claims, the meanings identified below do not necessarily limit the terms, but merely provide illustrative examples for the terms. The meaning of "a," "an," and "the" includes plural reference, and the meaning of "in" includes "in" and "on." "Embodiment," as used herein, does not necessarily refer to the same embodiment, although it may.
In the foregoing discussion of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of this disclosure.
Some features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed examples of the present disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. The following claims are hereby incorporated into the Detailed Description, with each claim standing on its own.

Claims

What is claimed:
1. A method for retrieving visual media, comprising:
receiving, using a processor, a text query associated with a target content;
identifying, using the processor, a first group of visual media based on correspondence of metadata of the visual media with the text query;
selecting, using the processor, keyframes from the first group of identified visual media;
detecting, using the processor, instances of a content type in the selected keyframes;
grouping, using the processor, similar instances of the content type into clusters; and
associating, using the processor, the target content with a cluster having a greatest quantity of similar instances.
2. The method of claim 1, wherein:
receiving, using the processor, a text query associated with a target content includes receiving a name, the target content being one or more images of a person with the name; and
detecting, using the processor, instances of a content type in the selected keyframes includes detecting face images.
3. The method of claim 1, further comprising:
selecting, using the processor, instances of a content type from each cluster having a threshold quantity of similar instances; and
displaying, using the processor, the instances of a content type for each of the clusters having the threshold quantity of similar instances in an order corresponding to cluster size,
wherein cluster size corresponds to quantity of similar instances of the content type in the cluster.
4. The method of claim 3, wherein selecting and displaying the instances of a content type for each of the clusters includes respectively listing segments within one or more video clips at which the target content appears.
5. The method of claim 4, wherein displaying images of faces appearing in one or more video clips includes displaying a largest face image and displaying a face image most different from a selected face image.
6. The method of claim 3, further comprising:
receiving, using the processor, a user selection of at least one of the displayed instances;
identifying, using the processor, a second group of visual media based on correspondence of metadata of the visual media with the text query and the selected instances of a content type;
selecting, using the processor, second keyframes from the second group of identified visual media;
detecting, using the processor, second instances of a content type in the selected second keyframes;
grouping, using the processor, similar second instances of the content type into clusters; and
determining, using the processor, matches between the second instances of the content type and the selected instances of a content type.
7. The method of claim 6, further comprising:
determining, using the processor, a ranking score for visual media having at least one determined match based on cumulative time during which the selected instances of a content type appear; and
displaying, using the processor, a listing of the visual media based on ranking score.
8. The method of claim 7, further comprising creating, using the processor, an index of the visual media based on ranking score and a quantity of occurrences that the visual media is viewed, the index including a location of the visual media and one or more locations within the visual media at which a particular instance of a content type appears.
9. The method of claim 7, further comprising:
analyzing, using the processor, metadata of most-viewed visual media and a name associated with each respective most-viewed visual media prior to receiving the text query;
generating, using the processor, a re-ranked result corresponding to the name;
caching, using the processor, the re-ranked result; and
updating, using the processor, the cached re-ranked result responsive to new text queries.
10. The method of claim 1, further comprising indexing, using the processor, locations within the visual media based on content type in the selected keyframes.
11. The method of claim 10, further comprising repurposing, using the processor, images from the indexed locations within the visual media onto customized image arrangements.
12. A non-transitory computer-readable medium having computer-executable instructions stored thereon, the computer-executable instructions comprising instructions that, if executed by one or more processors, cause the one or more processors to:
retrieve a group of visual media based on correspondence of metadata of the visual media with a text query;
select keyframes from the group of retrieved visual media;
apply face clustering to the keyframes from the group of retrieved visual media;
generate query face images based on the face clusters; and
re-rank the group of retrieved visual media on a display based on a received input corresponding to a particular one of the query face images.
13. The non-transitory computer-readable medium of claim 12, further comprising instructions that, if executed by one or more processors, cause the one or more processors to indicate portions of a selected one of the group of retrieved visual media corresponding to a selected query face image.
14. A computing system, comprising:
a display;
a non-transitory computer-readable medium having computer-executable instructions stored thereon; and
a processor coupled to the display and the non-transitory computer-readable medium, wherein the computer-executable instructions comprise instructions that, if executed by the processor, cause the processor to:
retrieve a group of visual media based on correspondence of metadata of the visual media with a text query;
select keyframes from the group of retrieved visual media;
apply face clustering to the keyframes from the group of retrieved visual media;
generate query face images based on face clusters; and
re-rank the group of retrieved visual media on the display based on a received input corresponding to a particular one of the query face images.
15. The computing system of claim 14, wherein the processor executes the instructions to display portions of a particular retrieved visual media based on a received input corresponding to a particular one of the query face images.
PCT/CN2011/001629 2011-09-27 2011-09-27 Retrieving visual media WO2013044407A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US14/239,387 US9229958B2 (en) 2011-09-27 2011-09-27 Retrieving visual media
CN201180073732.8A CN103827856A (en) 2011-09-27 2011-09-27 Retrieving visual media
PCT/CN2011/001629 WO2013044407A1 (en) 2011-09-27 2011-09-27 Retrieving visual media
EP11873267.6A EP2734931A4 (en) 2011-09-27 2011-09-27 Retrieving visual media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/001629 WO2013044407A1 (en) 2011-09-27 2011-09-27 Retrieving visual media

Publications (1)

Publication Number Publication Date
WO2013044407A1 (en) 2013-04-04

Family

ID=47994088

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/001629 WO2013044407A1 (en) 2011-09-27 2011-09-27 Retrieving visual media

Country Status (4)

Country Link
US (1) US9229958B2 (en)
EP (1) EP2734931A4 (en)
CN (1) CN103827856A (en)
WO (1) WO2013044407A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140298219A1 (en) * 2013-03-29 2014-10-02 Microsoft Corporation Visual Selection and Grouping
US10521472B2 (en) * 2015-02-27 2019-12-31 Realnetworks, Inc. Composing media stories method and system
CN106294454A (en) * 2015-05-29 2017-01-04 中兴通讯股份有限公司 Video retrieval method and device
CN105426515B (en) * 2015-12-01 2018-12-18 小米科技有限责任公司 video classifying method and device
US10810744B2 (en) * 2016-05-27 2020-10-20 Rakuten, Inc. Image processing device, image processing method and image processing program
US10606887B2 (en) * 2016-09-23 2020-03-31 Adobe Inc. Providing relevant video scenes in response to a video search query
US10346727B2 (en) * 2016-10-28 2019-07-09 Adobe Inc. Utilizing a digital canvas to conduct a spatial-semantic search for digital visual media
CN106874936B (en) * 2017-01-17 2023-07-11 腾讯科技(上海)有限公司 Image propagation monitoring method and device
US10789284B2 (en) * 2018-04-13 2020-09-29 Fuji Xerox Co., Ltd. System and method for associating textual summaries with content media
US11093839B2 (en) * 2018-04-13 2021-08-17 Fujifilm Business Innovation Corp. Media object grouping and classification for predictive enhancement
US11281934B2 (en) * 2020-02-24 2022-03-22 Gfycat, Inc. Identification and tracking of internet memes
CN114153342A (en) * 2020-08-18 2022-03-08 深圳市万普拉斯科技有限公司 Visual information display method and device, computer equipment and storage medium
EP4161085A4 (en) * 2021-03-30 2023-11-01 BOE Technology Group Co., Ltd. Real-time audio/video recommendation method and apparatus, device, and computer storage medium
CN113343029B (en) * 2021-06-18 2024-04-02 中国科学技术大学 Complex video character retrieval method with enhanced social relationship

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5805746A (en) * 1993-10-20 1998-09-08 Hitachi, Ltd. Video retrieval method and apparatus
JP2000112958A (en) * 1998-09-30 2000-04-21 Canon Inc Information retrieval device/method and computer readable memory
CN1851709A (en) * 2006-05-25 2006-10-25 浙江大学 Embedded multimedia content-based inquiry and search realizing method
CN101000633A (en) * 2007-01-17 2007-07-18 北京航空航天大学 Search method and system for MPEG-7 file
CN101901249A (en) * 2009-05-26 2010-12-01 复旦大学 Text-based query expansion and sort method in image retrieval

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751343B1 (en) 1999-09-20 2004-06-15 Ut-Battelle, Llc Method for indexing and retrieving manufacturing-specific digital imagery based on image content
US20030210808A1 (en) * 2002-05-10 2003-11-13 Eastman Kodak Company Method and apparatus for organizing and retrieving images containing human faces
US7986372B2 (en) * 2004-08-02 2011-07-26 Microsoft Corporation Systems and methods for smart media content thumbnail extraction
GB2424091A (en) * 2005-03-11 2006-09-13 Alamy Ltd Ranking of images in the results of a search
US7864989B2 (en) 2006-03-31 2011-01-04 Fujifilm Corporation Method and apparatus for adaptive context-aided human classification
US20080159383A1 (en) * 2006-12-27 2008-07-03 Yahoo! Inc. Tagboard for video tagging
CN101398832A (en) * 2007-09-30 2009-04-01 国际商业机器公司 Image searching method and system by utilizing human face detection
US8068676B2 (en) 2007-11-07 2011-11-29 Palo Alto Research Center Incorporated Intelligent fashion exploration based on clothes recognition
US20100030663A1 (en) 2008-06-30 2010-02-04 Myshape, Inc. System and method for networking shops online and offline
WO2010006334A1 (en) * 2008-07-11 2010-01-14 Videosurf, Inc. Apparatus and software system for and method of performing a visual-relevance-rank subsequent search
US8538943B1 (en) * 2008-07-24 2013-09-17 Google Inc. Providing images of named resources in response to a search query
US8150169B2 (en) 2008-09-16 2012-04-03 Viewdle Inc. System and method for object clustering and identification in video
US8180766B2 (en) 2008-09-22 2012-05-15 Microsoft Corporation Bayesian video search reranking
WO2010041377A1 (en) 2008-10-06 2010-04-15 パナソニック株式会社 Representative image display device and representative image selection method
KR101471204B1 (en) 2008-12-19 2014-12-10 주식회사 케이티 Apparatus and method for detecting clothes in image
US8452794B2 (en) * 2009-02-11 2013-05-28 Microsoft Corporation Visual and textual query suggestion
US20100306249A1 (en) * 2009-05-27 2010-12-02 James Hill Social network systems and methods
US20110099488A1 (en) * 2009-10-26 2011-04-28 Verizon Patent And Licensing Inc. Method and apparatus for presenting video assets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2734931A4 *

Also Published As

Publication number Publication date
US9229958B2 (en) 2016-01-05
CN103827856A (en) 2014-05-28
EP2734931A1 (en) 2014-05-28
EP2734931A4 (en) 2015-04-01
US20140193048A1 (en) 2014-07-10

Similar Documents

Publication Publication Date Title
US9229958B2 (en) Retrieving visual media
US11461392B2 (en) Providing relevant cover frame in response to a video search query
US10922350B2 (en) Associating still images and videos
US20220004573A1 (en) Method for creating view-based representations from multimedia collections
Wu et al. Practical elimination of near-duplicates from web video search
Wang et al. Event driven web video summarization by tag localization and key-shot identification
Jain et al. Learning to re-rank: query-dependent image re-ranking using click data
US9244923B2 (en) Hypervideo browsing using links generated based on user-specified content features
US20140093174A1 (en) Systems and methods for image management
US20080247610A1 (en) Apparatus, Method and Computer Program for Processing Information
US20110173190A1 (en) Methods, systems and/or apparatuses for identifying and/or ranking graphical images
US10311038B2 (en) Methods, computer program, computer program product and indexing systems for indexing or updating index
EP2588976A1 (en) Method and apparatus for managing video content
KR20110007179A (en) Method and apparatus for searching a plurality of stored digital images
WO2012115829A1 (en) Method for media browsing and reliving
Sandhaus et al. Semantic analysis and retrieval in personal and social photo collections
Münzer et al. lifexplore at the lifelog search challenge 2018
Friedland et al. Multimodal location estimation on flickr videos
Nixon et al. Multimodal video annotation for retrieval and discovery of newsworthy video in a news verification scenario
Liu et al. Automatic concept detector refinement for large-scale video semantic annotation
Ashok Kumar et al. An efficient scene content-based indexing and retrieval on video lectures
Kim et al. User‐Friendly Personal Photo Browsing for Mobile Devices
Liu et al. EventEnricher: a novel way to collect media illustrating events
Blighe et al. MyPlaces: detecting important settings in a visual diary
Shinde et al. Late Semantic Fusion Approaches for Multimedia Information Retrieval with Automatic Tag Generation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11873267

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14239387

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2011873267

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE