US10657176B1 - Associating object related keywords with video metadata - Google Patents

Associating object related keywords with video metadata

Info

Publication number
US10657176B1
US10657176B1 (application US16/437,649)
Authority
US
United States
Prior art keywords
video
keyword
word
product
transcription
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/437,649
Other languages
English (en)
Inventor
Dominick Khanh Pham
Sven Daehne
Mike Dodge
Janet Galore
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Amazon Technologies Inc filed Critical Amazon Technologies Inc
Priority to US16/437,649
Assigned to AMAZON TECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DODGE, MIKE, DAEHNE, SVEN, GALORE, JANET, PHAM, DOMINICK KHANH
Application granted
Publication of US10657176B1
Priority to PCT/US2020/036908 (published as WO2020251967A1)
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/732Query formulation
    • G06F16/7335Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34Indicating arrangements 

Definitions

  • the present invention relates to associating object-related keywords (or attributes) in a video with video metadata (e.g., a video timestamp).
  • video consumption, navigation, and interaction are fairly limited. For example, users can fast forward a video, rewind the video, or scrub the seeker to skip to different segments in the video. This limited interaction allows for some level of coarse searching and navigation of a video, but does not allow a user to search for specific words/phrases mentioned in the video. This is due in part to the lack of contextual metadata available for the video. As a result, searching and navigating the contents of a video (or a collection of videos) for particular content is laborious and inefficient.
  • FIG. 1 illustrates presenting markers on a user device to indicate time instances when object-related keywords are mentioned in the video, according to one embodiment.
  • FIG. 2 is a block diagram for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • FIG. 3 is a flowchart of a method for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • FIG. 4 illustrates an example user interface for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • FIG. 5 is a flowchart of a method for displaying tags on a user device, according to one embodiment.
  • FIGS. 6A-6C illustrate an example user interface of a user device, according to one embodiment.
  • FIG. 7 illustrates another example user interface of a user device, according to one embodiment.
  • FIG. 8 is a flowchart of a method for using keywords as navigational anchors on a user device, according to one embodiment.
  • FIGS. 9A-9B illustrate yet another example user interface of a user device, according to one embodiment.
  • Embodiments herein describe a video tagging system that can mark, within a timeline (or duration) of a video, occurrences of keywords associated with an object (e.g., product) (or a collection of objects) in the video.
  • the video may be a review video describing the performance of different travel headphones, a promotional video advertising different items for decorating a house, a video advertising sporting equipment for sale, etc.
  • the video tagging system can identify the object(s) that is associated with a video (e.g., the object appears in and/or is mentioned in the video) based on an object identifier(s) (or identities) (IDs) associated with the video.
  • the product ID can include, e.g., a particular brand and model of a smartphone, a model number of a toy set, or a standard identification number.
  • the video tagging system can use transcription data of audio content associated with the video to identify terms (or keywords) mentioned in the video that are relevant to one or more attributes of the object(s).
  • the attributes can include, e.g., a type of the product for sale (e.g., basketball, shoe, clothing item, etc.), a brand name of the product (e.g., Brand A headphone, Brand B headphone, etc.), a physical feature of the product (e.g., size, shape, color, etc.), and the like.
  • the attributes of the object can be identified based on information associated with the object ID for the object. Such information can include, but is not limited to, product reviews, product description, a detailed product page, visual appearance of the product, etc.
  • the video tagging system enables a user to easily search and navigate for particular content within a video (or a collection of videos).
  • the video tagging system can create markers on a user device playing the video to indicate to a user the relevant time instances where the terms (or keywords) are mentioned.
  • a user may identify an object (e.g., a product for sale, an actor in a scene, etc.) that the user is interested in from a video displayed on a user device (e.g., a television, tablet, smartphone, etc.).
  • Using the user device (or an input/output (I/O) device communicatively coupled to the user device), the user can search for terms relevant to the object in the video.
  • the user may search for “sound quality” for a pair of headphones that are mentioned in the video.
  • the user device can retrieve the relevant tags associated with “sound quality,” and display markers on the user interface (UI), indicating the occurrences (e.g., time instances or timestamps) where “sound quality” (or other keywords related to “sound quality”) is mentioned in the video.
  • the markers enable the user to quickly navigate to different portions (e.g., frames) of the video that the user is interested in.
  • embodiments help the user to easily learn more information about the object associated with the video. This, in turn, enables the user to make a more informed decision as to whether the user would like to purchase the product in the video.
  • FIG. 1 illustrates presenting markers on a user device to indicate time instances when object-related (e.g., product-related) keywords are mentioned in the video, according to one embodiment.
  • FIG. 1 includes a video editor 100 , a television 130 , and an I/O device 170 .
  • the video editor 100 adds product metadata 104 into a video 102 , based on a transcription 110 corresponding to the video 102 .
  • the product metadata 104 includes additional contextual information regarding the product in the video that may help a user viewing the video 102 to decide whether to purchase the product in the video 102 .
  • the product metadata 104 includes keywords 106 and tags 108 .
  • the transcription 110 includes text 112 , timestamps 114 , and confidences 116 .
  • the text 112 includes words transcribed from the audio content in the video 102 .
  • the timestamps 114 include, for each word in the text 112 , the timestamp when the word is mentioned in the (audio content of the) video 102 .
  • the confidences 116 include, for each word in the text 112 , the confidence (or accuracy) level that the word in the text 112 has been accurately transcribed from the audio content.
  • the transcription 110 may be stored in and retrieved from another location (e.g., in the cloud).
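  • As a concrete illustration of this structure, the transcription data might be shaped roughly as follows; the field names and values here are assumptions for illustration, not taken from the patent.

```python
# Illustrative (assumed) shape of the transcription 110: per-word text, the
# timestamp (in seconds) at which the word is spoken, and the transcription
# confidence for that word.
transcription = {
    "video_id": "video-102",  # hypothetical identifier
    "words": [
        {"text": "these",       "timestamp": 12.4, "confidence": 0.97},
        {"text": "headphones",  "timestamp": 12.7, "confidence": 0.95},
        {"text": "are",         "timestamp": 13.1, "confidence": 0.98},
        {"text": "really",      "timestamp": 13.3, "confidence": 0.93},
        {"text": "comfortable", "timestamp": 13.6, "confidence": 0.91},
    ],
}
```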
  • the keywords 106 generally act as search terms for different content within the video 102 .
  • the keywords 106 can include terms relevant to the object(s) in the video 102 .
  • the keywords 106 can include terms (or words) mentioned in the video 102 (e.g., as determined from the transcription 110 ).
  • the keywords 106 can include terms (or words) that are not mentioned in the video 102 (e.g., as determined from the transcription 110 ), but that are related to other terms mentioned in the video 102 about the object.
  • the user may use the I/O device 170 to manually input (or search) for a keyword 106 .
  • the user may have identified (via subtitles 140 ) a particular term (e.g., “comfort”) mentioned in the video 102 that is related to a product (e.g., headphones) in the video 102 , and may use the I/O device 170 to search for relevant portions (e.g., frames) of the video that mention “comfort.”
  • the user may search for a keyword (e.g., “fabric”) that the user expects is mentioned in a video about clothing items.
  • the video editor 100 may generate a set of keywords 106 to display to the user, based on the transcription 110 .
  • the keywords 106 may be displayed in a navigation panel 142 or on the I/O device 170 itself, if provisioned with a screen/display 172 and communication infrastructure (e.g., Bluetooth) to allow communication with the video editor 100 .
  • the user can select one of the keywords 106 to navigate (or jump) to the relevant portion (frame) of the video that mentions (or is related) to the searched keyword.
  • the tags 108 include associations between the keywords 106 , timestamps 114 , and the object(s) in the video 102 . That is, the tags 108 indicate the relevant timestamps 114 when the keywords 106 associated with an object are mentioned (or indirectly mentioned) in the video 102 . As described below, the indications (e.g., markers) of the tags may be displayed on the user device (e.g., television 130 ) in various formats.
  • the tags 108 may indicate a timestamp 114 of a different word within the video 102 relative to the occurrence of the keyword 106 .
  • the transcription 110 may infer punctuations used in the text 112 and indicate when periods, commas, and other punctuation occurs. Assuming this information is available, the tag 108 can indicate a timestamp 114 corresponding to the beginning of a sentence (or phrase) that includes the keyword 106 as opposed to the timestamp 114 corresponding to the exact time instance when the keyword 106 is mentioned in another portion of the sentence (or phrase).
  • the tag 108 can indicate a timestamp 114 corresponding to another word that is a predefined distance away from (e.g., N words prior to) the keyword 106 .
  • the keywords 106 and the tags 108 are stored in one or more files as metadata.
  • the metadata can be stored with the video (e.g., in addition to the audio and visual data of the video) or can be stored elsewhere (e.g., in the cloud).
  • a user device can retrieve the keywords 106 and/or the tags 108 from a server (e.g., in the cloud).
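  • A minimal sketch of such a retrieval, assuming a hypothetical HTTPS metadata endpoint and JSON response shape (the URL and field names are illustrative only):

```python
import json
import urllib.request

def fetch_video_metadata(video_id: str) -> dict:
    """Fetch the keywords and tags for a video from a (hypothetical) metadata service."""
    url = f"https://metadata.example.com/videos/{video_id}/tags"  # illustrative endpoint
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Assumed response shape:
# {
#   "keywords": ["comfort", "battery life", "noise canceling"],
#   "tags": [
#     {"product_id": "B00EXAMPLE", "keyword": "comfort", "timestamp": 60.0},
#     {"product_id": "B00EXAMPLE", "keyword": "noise canceling",
#      "timestamp": 180.0, "time_range": [180.0, 210.0]}
#   ]
# }
```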
  • the arrow 120 illustrates transferring the video 102 to a television 130 .
  • the television 130 may be internet-capable so it can download the video 102 from a video distribution system.
  • a user (not shown) can use the I/O device 170 to play the video 102 on the television 130 .
  • the I/O device 170 can include controls for navigating through a UI (on the television 130 ) which offers multiple videos, selecting a video, and playing the selected video.
  • the I/O device 170 is a remote that includes interactive buttons and controls.
  • the I/O device 170 allows the user to input (e.g., via an interactive keyboard displayed on the UI on the television 130 ) queries containing keywords related to an object (e.g., product for sale) displayed (or mentioned) in the video 102 .
  • the I/O device 170 includes an audio device (e.g., a microphone) for receiving audio input from the user. Using the audio device, the user can input queries (or searches) for keywords 106 related to a product displayed (or mentioned) in the video 102 .
  • the I/O device 170 also includes a screen/display 172 , which allows the user to interact with content received from video editor 100 and/or the television 130 .
  • the screen/display 172 can be implemented as a touch screen interface.
  • the user may see (or hear about) an object (e.g., product for sale) the user wishes to learn more about.
  • the current frame of the video 102 being displayed on the television 130 includes three products 132 (e.g., a first headphone device (product 132 A), a second headphone device (product 132 B), and a third headphone device (product 132 C)).
  • the user can search for a keyword 106 related to one of the products (e.g., product 132 A) or for more information related to a collection of products (e.g., products 132 A-C).
  • the keyword 106 can be a word that has been mentioned in a previous (or current) frame in the video 102 , a word that appears in subtitles 140 , a word that the user infers will be mentioned in the video 102 about the product, a suggested word (generated by the video editor 100 ) that appears in the navigation panel 142 , a brand name of the product (e.g., “Brand A”), etc.
  • the I/O device 170 can include a scroll wheel or arrow keys to navigate to different parts of the UI on the television 130 in order to search for a keyword 106 .
  • the user can navigate through different keywords 106 presented in the navigation panel 142 , keywords presented in subtitles 140 , etc. Once a keyword 106 is highlighted, the user can use a different button to submit a query for the selected keyword 106 .
  • the user can use the arrow buttons to type a keyword 106 on a virtual keyboard within the UI of the television 130 .
  • the user could use voice commands (e.g., “search for audio quality”) to submit a query for a keyword 106 .
  • the user can navigate on the I/O device itself, via the screen/display 172 .
  • the television 130 presents markers 180 in the timeline 160 to identify the time instances when the keyword 106 is mentioned (or indirectly mentioned) in the video 102 .
  • the television 130 presents three markers 180 A, 180 B, and 180 C that indicate different time instances when the keyword 106 is mentioned.
  • the user can select the different markers 180 A-C to jump to the frame of the video 102 that is related to the keyword 106 , e.g., to learn more information regarding the product.
  • the user can select the frame selector 162 to navigate to the different markers 180 A, 180 B, and 180 C to learn more information regarding the product.
  • the markers 180 A-C can be displayed on the television 130 in various formats.
  • the markers 180 A-C could be in the form of a thumbnail video frame, which may provide the user with a better visual context for the object in question.
  • the television 130 can display the navigation panel 142 in a pop-up graphic 150 . That is, the pop-up graphic 150 may overlay certain portions of the video 102 being displayed. In some embodiments, the pop-up graphic 150 may appear adjacent to the video 102 , e.g., in a side panel of a UI on the television 130 . In some embodiments, the pop-up graphic 150 may appear when the video 102 has been paused.
  • While the embodiments herein primarily discuss associating product-related keywords with timestamps in a video 102 and presenting markers to indicate the occurrences of the product-related keywords in the video 102 , this disclosure is not limited to products for sale and can be used for other applications.
  • the embodiments herein can be used to enhance user experience when watching a television show or movie.
  • the video editor 100 can generate and associate the timestamps to keywords related to actors in a scene. For example, the user can search for the actor's name to find timestamps of other scenes in the current show in which the actor appears.
  • the video editor 100 can generate and associate keywords related to any object (e.g., product for sale, a person, object, geographical landmark, etc.) in a video 102 with timestamps in the video.
  • While embodiments herein discuss objects associated with video, the embodiments can also be applied to audio feeds (e.g., without the need for a video feed).
  • Embodiments can also be applied to live video content.
  • metadata can be created during the live broadcast, allowing the user to search the portion of the broadcast that has been transmitted.
  • FIG. 2 is a block diagram for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • FIG. 2 includes a (tagging) computing system 200 , a network (e.g., video distribution network) 240 , a transcription computing service 220 , and user devices 230 .
  • the computing system 200 includes a processor(s) 202 and memory 204 .
  • the processor(s) 202 represents any number of processing elements which can include any number of processing cores.
  • the memory 204 can include volatile memory, non-volatile memory, and combinations thereof.
  • the memory 204 includes the video editor 100 , a keyword generator 206 , and a tagging component 208 .
  • the video editor 100 may be a software application executing on the computing system 200 .
  • the video editor 100 permits a user to generate keywords 106 associated with objects in a video 102 and generate tags 108 corresponding to time instances when the keywords 106 are mentioned in the video 102 . This process is described in more detail below.
  • the keyword generator 206 may be a software application executing on the computing system 200 .
  • the keyword generator 206 generates keywords 106 associated with different objects that appear (and/or are mentioned) in a video 102 .
  • the keyword generator 206 can evaluate the transcription of the audio content associated with the video and determine a set of keywords 106 relevant to one or more attributes of a given object in the video.
  • the keyword generator 206 can analyze the transcription using one or more techniques to determine the set of keywords 106 relevant to a given object.
  • These techniques can include, but are not limited to, determining frequency (or number occurrences) of words, filtering (or removing) non-stop words, computing term relevance scores of the words relative to an attribute(s) of an object, identifying different parts of speech used for the words, named entity recognition, etc.
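  • To make a few of these techniques concrete, the sketch below combines word-frequency counting, stop-word filtering, and a simple TF-IDF-style relevance score computed against a small background corpus; the stop-word list and scoring are simplifications, not the patent's algorithm.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "it", "this"}  # abbreviated list

def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def tfidf_keywords(transcript: str, background: list[str], top_n: int = 10) -> list[str]:
    """Rank candidate keywords by term frequency in the transcript weighted by a
    simple inverse-document-frequency score over a background corpus."""
    terms = tokenize(transcript)
    tf = Counter(terms)
    docs = [set(tokenize(d)) for d in background]
    n_docs = len(docs) + 1  # count the transcript itself as a document
    scores = {
        term: (count / len(terms)) * math.log(n_docs / (1 + sum(term in d for d in docs)))
        for term, count in tf.items()
    }
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]
```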
  • the keyword generator 206 can generate keywords 106 that may or may not be mentioned in the video 102 (e.g., the keywords 106 do not appear in the transcription). For example, the keyword generator 206 can use video-to-product associations to retrieve other contextual information regarding objects in the video (e.g., product attributes, customer reviews, related videos, etc.). Using this data, the keyword generator 206 can use machine learning tools (e.g., applications, algorithms, etc.) to identify keywords 106 related to the objects in a given video.
  • the keyword generator 206 can determine (e.g., using a machine learning tool) that a similarity between a first keyword (not mentioned in the video 102 ) and a second keyword (that is mentioned in the video 102 ) satisfies a condition (e.g., is greater than or equal to a threshold).
  • the keywords 106 that are generated by the keyword generator 206 can be used as navigation anchors (e.g., markers or links) to different time instances within the video.
  • the set of keywords 106 can be presented within a navigation panel 142 on a UI of the user device. In this manner, the keyword generator 206 can proactively suggest object-related keywords that may be of interest to the user.
  • the keyword generator 206 can rank (or adjust the relevance of) keywords 106 , based on external inputs (e.g., user preferences, user search/navigation history, etc.). For example, the keyword generator 206 may determine, based on previous user history, that the user has a preference for “comfort” attributes when shopping for headphones. Absent this information, the user may have been presented with the following ordered list of keywords: battery life, charge time, comfort, fit, size, etc. However, once the user's history is accounted for, the keyword generator 206 can modify the ordered list of keywords to be the following: comfort, fit, size, battery life, charge time, etc.
  • the keyword generator 206 can rank (or adjust the relevance of) suggested keywords, based on similarities to other words within a given context. For example, for a video about headphones, the keyword generator 206 can determine, based on a similarity (or distance) metric, that the keyword “battery life” is highly related (or similar) to another word “charge time” (e.g., the distance between the two words is below a threshold).
  • the keyword generator 206 can determine, based on the similarity (or distance) metric, that the keyword “comfort” is highly related to another word “fit.” In these cases, the keyword generator 206 can adjust the order of the suggested keywords, such that the words “comfort” and “fit” appear closer together, and the words “battery life” and “charge time” appear closer together. Doing so can enable additional keyword associations that may not be present in the transcription.
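  • A minimal sketch of this re-ranking idea, assuming an embedding function (my_embedding_fn below is a stand-in for a word2vec/GloVe-style model) that maps a word or short phrase to a vector:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank_keywords(keywords, preferences, embed):
    """Order suggested keywords so that terms most similar to the user's preferred
    attributes (e.g., 'comfort') appear first; related terms such as 'fit'
    naturally move up alongside them."""
    def score(kw):
        return max((cosine(embed(kw), embed(p)) for p in preferences), default=0.0)
    return sorted(keywords, key=score, reverse=True)

# e.g., rerank_keywords(["battery life", "charge time", "comfort", "fit", "size"],
#                       preferences=["comfort"], embed=my_embedding_fn)
```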
  • the tagging component 208 can also use the similarity (or relatedness) between words to associate unmentioned keywords to time instances corresponding to mentioned keywords, and vice versa.
  • the tagging component 208 can associate the timestamp of the unsearched (but mentioned) word with the related searched word.
  • “comfort” may not be mentioned in a video about headphones, but “fit” is mentioned in the video.
  • a marker indicating the timestamp for “fit” can be displayed on the user device.
  • the tagging component 208 may be a software application that generates tags 108 to identify associations between timestamps 114 in a video and keywords 106 .
  • the tagging component 208 includes a mapping tool 210 .
  • the mapping tool 210 can employ acoustic pattern recognition techniques (applied solely to the audio content without analyzing transcription data) to identify time instances in the video associated with keywords 106 .
  • the mapping tool 210 can retrieve the audio feed from the video 102 and compare acoustic features (e.g., audio frequency, pitch, tone, etc.) of different portions of the audio feed to a set of audio fingerprints associated with the set of keywords 106 .
  • a video that advertises different items for decorating a house can play different theme music depending on the location within the house (e.g., a first song when discussing kitchen items, a second song when discussing bedroom items, etc.).
  • an audio fingerprint based on the first song can be associated with kitchen item related keywords
  • another audio fingerprint based on the second song can be associated with bedroom item related keywords, and so on.
  • the mapping tool 210 can associate the timestamp for that portion of the audio feed to the keyword 106 .
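  • Acoustic pattern recognition can be implemented in many ways; the following is a deliberately simplified sketch that reduces each audio window to a coarse band-energy fingerprint and compares it to a stored reference fingerprint, purely to illustrate the idea.

```python
import numpy as np

def spectral_fingerprint(samples: np.ndarray, bands: int = 32) -> np.ndarray:
    """Reduce an audio window to a normalized band-energy vector (a toy fingerprint)."""
    spectrum = np.abs(np.fft.rfft(samples))
    band_energy = np.array([b.sum() for b in np.array_split(spectrum, bands)])
    return band_energy / (np.linalg.norm(band_energy) + 1e-9)

def matching_window_starts(audio, sample_rate, reference_fp, window_s=5.0, threshold=0.9):
    """Return start times (seconds) of windows whose fingerprint resembles reference_fp."""
    hop = int(window_s * sample_rate)
    matches = []
    for start in range(0, len(audio) - hop, hop):
        fp = spectral_fingerprint(audio[start:start + hop], bands=len(reference_fp))
        if float(np.dot(fp, reference_fp)) >= threshold:
            matches.append(start / sample_rate)
    return matches
```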
  • the mapping tool 210 can use the transcription 110 of the audio extracted from the video 102 to identify associations between the timestamps 114 and keywords 106 .
  • the mapping tool 210 can identify the keyword 106 in the text 112 of the transcription 110 , extract the timestamp 114 (in the transcription 110 ) corresponding to the keyword 106 , and use the extracted timestamp 114 as the relevant time instance in the video 102 to associate with the keyword 106 .
  • the mapping tool 210 can use the transcription 110 to identify another time instance corresponding to a different starting point within the video 102 .
  • the mapping tool 210 can associate the keyword 106 to a time instance corresponding to the start of the sentence. To do so, the mapping tool 210 can identify another timestamp 114 of a word in the transcription 110 in proximity to the keyword 106 and associate that timestamp 114 to the keyword 106 .
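  • A sketch of this mapping step, reusing the per-word transcription shape from the earlier example; backing off to the start of the sentence is approximated here by walking back to the previous sentence-ending punctuation (or at most a fixed number of words), which is an assumption about how the offset could be chosen.

```python
def keyword_time_instances(words, keyword, back_to_sentence_start=True, max_backoff=20):
    """Return time instances associated with `keyword` in a word-level transcription.

    `words` is a list of {"text": ..., "timestamp": ...} dicts (single-word keywords
    only, for simplicity). When back_to_sentence_start is True, the timestamp of the
    first word of the containing sentence is returned instead of the keyword's own."""
    timestamps = []
    for i, w in enumerate(words):
        if w["text"].lower().strip(".,!?") != keyword.lower():
            continue
        j = i
        if back_to_sentence_start:
            while j > 0 and i - j < max_backoff and not words[j - 1]["text"].endswith((".", "?", "!")):
                j -= 1
        timestamps.append(words[j]["timestamp"])
    return timestamps
```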
  • the transcription computing service 220 provides transcriptions of audio and video content.
  • the transcription computing service 220 includes one or more computing systems 224 for providing transcriptions.
  • the computing system 224 includes a retrieval tool 226 and a transcription tool 228 .
  • the retrieval tool 226 receives video 102 and extracts audio content (or feed) from the video 102 .
  • the retrieval tool 226 may store the extracted audio content into an audio file 222 .
  • the transcription tool 228 employs natural language processing/speech recognition techniques and/or machine learning techniques to generate a transcription 110 of the audio content in the audio file 222 .
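  • The patent does not specify the transcription service's API; as a rough sketch, the retrieval and transcription steps might look like the following, where the ffmpeg invocation is one common way to extract an audio track and the `client.transcribe` call is a hypothetical placeholder.

```python
import subprocess

def extract_audio(video_path: str, audio_path: str) -> str:
    """Extract the mono audio track from a video file (here via the ffmpeg CLI)."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", audio_path], check=True)
    return audio_path

def transcribe(audio_path: str, client) -> dict:
    """Submit the audio file to a (hypothetical) transcription client and return a
    word-level transcription with timestamps and confidences."""
    with open(audio_path, "rb") as f:
        return client.transcribe(audio=f.read(), word_timestamps=True)
```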
  • the network (e.g., video distribution network) 240 can be a wide area network (WAN) or a local access network (LAN) which distributes the video 102 to the user devices 230 .
  • the network 240 can be hosted in a cloud computing environment.
  • the network 240 is part of a subscription service or provided for free to users who sign up for an account.
  • the network 240 can provide a repository of videos that can be provided to the user devices 230 on-demand or as part of advertisements.
  • the user devices 230 A, 230 B, and 230 C can be mobile phones, internet-capable televisions, laptops, tablets, streaming players (e.g., online media players), and the like.
  • a user can pause the video 102 currently being played in order to learn more about a particular object (mentioned) in the video 102 .
  • the I/O device can be separate from the user device 230 —e.g., a remote controller or mouse—or may be an integrated component in the user device 230 —e.g., a touch screen or touch pad.
  • FIG. 3 is a flowchart of a method 300 for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • the method 300 may be performed by one or more components (e.g., video editor 100 , keyword generator 206 , tagging component 208 , etc.) of a computing system (e.g., computing system 200 ).
  • the method 300 begins at block 302 , where the computing system identifies at least one object associated with a video.
  • the at least one object can include a product for sale, a person, a geographical landmark, etc.
  • the computing system may identify the object(s) associated with the video based on a set of product IDs configured for that video. For example, the set of product IDs (or other identification information) for the object(s) may be stored within the video file or within a separate file.
  • the computing system determines one or more attributes of the object.
  • the one or more attributes can include a type of the object, a brand name of the object, a physical feature (e.g., size, shape, color, etc.) of the object, and the like.
  • the object attributes can be stored along with the product ID for the object (e.g., within the video file or within a separate file). In some cases, the object attributes can be determined from a catalog of items (e.g., maintained in a database). In general, the object attributes can be determined from any source of information related to the product ID for the object (e.g., product reviews, product description, detailed product page, visual appearance of the object, etc.).
  • the computing system obtains a transcription of audio content within the video.
  • the computing system can retrieve the transcription from a database (e.g., transcriptions 110 ) storing different audio transcriptions for different videos.
  • the computing system can request the transcription computing service 220 to perform a transcription of the audio content.
  • the computing system can upload the audio content to the transcription computing service 220 and receive a transcription of the audio content in response.
  • the transcription may include text of the words mentioned in the audio content.
  • the transcription may include timestamped data indicating a time instance when each word in the text is mentioned in the video.
  • the computing system determines, based on an analysis of the transcription, one or more keywords that are associated with at least one of the attributes of the at least one object.
  • the keyword(s) may include words (or terms) that are mentioned in the audio content that indicate one of the attributes of the object.
  • a keyword for a Brand A headphone device may be “Brand A” or “headphones.”
  • a keyword for a laptop may be “battery life” relating to one of the components (e.g., battery) of the laptop.
  • the keyword(s) may (additionally or alternatively) include words (or terms) that are not mentioned in the audio content.
  • the unmentioned keywords may indicate one of the attributes of the object and be related or similar to one or more other keywords mentioned in the audio content.
  • the keyword(s) may be provided to the computing system (e.g., via the video editor 100 ) by a user.
  • For example, an advertiser or product reviewer may identify a set of relevant keywords associated with the product that a user may provide in a query about the product.
  • the video editor 100 may provide a UI (e.g., UI 400 ) to enable a user to identify keyword(s) relevant to an object in the video.
  • the keyword(s) can be generated by a keyword generator (e.g., keyword generator 206 ) of the computing system.
  • the keyword generator can analyze the audio transcription to derive keywords based on one or more techniques.
  • the keyword generator can determine the frequency of words in the transcription and select words that satisfy a predetermined frequency threshold (e.g., frequency of occurrence is greater than a threshold) as keywords.
  • the keyword generator can filter out (or down weight) stop words, such as “the,” “a,” “an,” “in,” etc., when processing the natural language data in the transcription.
  • the keyword generator can filter out stop words before or after processing the natural language data in the transcription.
  • the keyword generator can compute term weight (or relevance) scores (e.g., term frequency-inverse document frequency (TFIDF) scores) for words in the transcription relative to an attribute(s) of the object.
  • the keyword generator can select words having a term weight score that satisfies a threshold (e.g., above a threshold) as keywords.
  • the keyword generator can evaluate parts of speech used in the transcription to identify keywords relevant to an attribute(s) of the object. For example, the keyword generator can identify different adjectives (e.g., “comfortable,” “loud,” etc.) used in relation to the object (e.g., a noun, such as “headphones”) and use the adjectives (or similar words) as keywords.
  • the keyword generator can use named entity recognition techniques (also known as entity identification, entity extraction, etc.) to classify terms in the transcription into pre-defined categories, which are used as keywords. In sum, the keyword generator can employ any one of or combination of the above techniques to determine keywords relevant to the object in the video.
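  • As an illustration of the part-of-speech and named-entity techniques, the sketch below uses spaCy (one possible library, not named by the patent) to collect adjectives that appear near a product noun plus named entities such as brand names.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model; installed separately

def pos_and_ner_keywords(transcript_text: str, product_noun: str, window: int = 5):
    """Collect adjectives occurring within `window` tokens of the product noun,
    along with organization/product entities found in the transcript."""
    doc = nlp(transcript_text)
    product_positions = [t.i for t in doc if t.text.lower() == product_noun.lower()]
    adjectives = {
        t.text.lower()
        for t in doc
        if t.pos_ == "ADJ" and any(abs(t.i - p) <= window for p in product_positions)
    }
    entities = {ent.text for ent in doc.ents if ent.label_ in {"ORG", "PRODUCT"}}
    return adjectives, entities
```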
  • the computing system generates one or more tags that each include an indication of the association between a keyword and the time instance when a word (in the transcription) associated with the keyword is mentioned in the audio content of the video.
  • the keyword and the associated word in the transcription may be the same.
  • the time instance may correspond to the time instance when the keyword is mentioned in the audio content of the video.
  • the keyword may be another word in the transcription that is different from the associated word in the transcription.
  • the time instance corresponding to the associated word may be prior to or subsequent to the time instance when the keyword is mentioned in the audio content.
  • the keyword may be different from the associated word and may not be a word that is mentioned in the audio content.
  • the keyword may be a word (e.g., “comfort”) that is related (or similar) to another word (e.g., “fit”) that is mentioned in the audio content.
  • each tag includes an indication of the product ID, the keyword(s), and the starting time instance in the video associated with the keyword(s).
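  • A tag could be represented as a simple record along these lines; the field names mirror the description above, but the exact schema is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple
import json

@dataclass
class Tag:
    product_id: str
    keyword: str
    timestamp: float  # starting time instance, in seconds
    time_range: Optional[Tuple[float, float]] = None  # optional span of repeated mentions

def build_tags(product_id: str, keyword_timestamps) -> list:
    """Create one Tag per (keyword, timestamp) pair produced by the mapping step."""
    return [Tag(product_id, kw, ts) for kw, ts in keyword_timestamps]

# The tags can then be serialized into the video's metadata, e.g.:
# json.dumps([asdict(t) for t in build_tags("B00EXAMPLE", [("comfort", 60.0)])])
```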
  • the one or more tags can be submitted to the computing system by a user, e.g., via the UI 400 of the video editor 100 .
  • the computing system may automatically generate tags to indicate the time instances in the video associated with the keywords.
  • the computing system sends an indication of the tags.
  • the tags can be stored in a file associated with the video (e.g., in the cloud) for later retrieval by a user device.
  • the computing system can package the tags along with the video and transmit the package to a user device.
  • FIG. 4 illustrates a UI 400 of the video editor 100 for generating tags that associate an object-related keyword with a time instance in a video, according to one embodiment.
  • the UI 400 can be used to perform one or more of blocks 302 - 312 of the method 300 depicted in FIG. 3 .
  • the UI 400 can be displayed on a computing system (e.g., computing system 200 ).
  • the UI 400 includes a current frame 420 , a timeline 460 , a frame selector 462 , a tag list 410 , a tag creator 430 , and a tag editor 440 .
  • the user can move across the timeline 460 to select a particular frame in the video, which is displayed as current frame 420 .
  • the timeline 460 indicates the temporal location of each frame in the video.
  • the frame selector 462 permits the user to select a particular one of the frames to be displayed as the current frame 420 .
  • the tag creator 430 includes a Product ID field 436 , a Timestamp field 438 , a Keyword(s) field 450 , and a Time Range field 452 , which allow the user to create tags corresponding to time instances associated with keywords.
  • the Product ID field 436 allows a user to select a product ID associated with one of the objects in the video being displayed and/or mentioned in the audio content associated with the video.
  • the Product ID field 436 is pre-loaded with a set of Product IDs associated with objects in the video. In this case, the user can select one of the product IDs via a drop-down button in the Product ID field 436 .
  • the Product ID field 436 permits a user to type in product IDs for objects in the video.
  • the user can provide a standard identification number or other unique product ID for an object.
  • the Product ID field 436 may be a search field which permits the user to identify the standard identification number or unique ID by putting in general information such as the type of product, its manufacturer, partial product ID number, and the like.
  • the Product ID field 436 can provide a list of potential IDs that match the criteria provided by the user from which the user can select the correct product ID.
  • the UI 400 can provide a field that lets the user identify the object (e.g., name of the actor).
  • the Keyword(s) field 450 permits a user to identify keywords (or terms) associated with the object (identified by the product ID) in the video. As noted, the user can select words or terms that are mentioned in the audio content associated with the video (e.g., the words or terms may appear in the subtitles 140 ). In other cases, the user can select words that are not mentioned in the audio content to associate with the object.
  • the Timestamp field 438 permits a user to indicate a time instance in the video when the keyword is mentioned or a time instance that is related to the keyword (e.g., the keyword is indirectly mentioned). In some embodiments, the Timestamp field 438 can be automatically updated with a timestamp corresponding to a position of the frame selector 462 within the timeline 460 .
  • the Time Range field 452 permits a user to indicate a region (or time range) within the video where multiple instances of the keyword are mentioned. For example, if the computing system (or user) determines that a predefined number of time instances (e.g., X number of time instances) related to the same keyword appear within a predefined time window (e.g., Y seconds), the computing system can indicate the time range in which the number of time instances are located. Doing so avoids multiple display markers (greater than a threshold) for the same keyword being presented on a user device.
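  • A sketch of this grouping heuristic: if at least X time instances of the same keyword fall within a window of Y seconds, they are collapsed into a single (start, end) range marker. The thresholds below are illustrative.

```python
def collapse_into_ranges(timestamps, min_count=3, window_s=30.0):
    """Group the time instances of one keyword: when at least `min_count` instances
    fall within `window_s` seconds of the group's start, emit one (start, end)
    range marker; otherwise keep the individual timestamps."""
    markers = []
    group = []
    for ts in sorted(timestamps) + [float("inf")]:  # sentinel flushes the last group
        if group and ts - group[0] > window_s:
            if len(group) >= min_count:
                markers.append((group[0], group[-1]))  # one range marker
            else:
                markers.extend(group)                  # individual markers
            group = []
        if ts != float("inf"):
            group.append(ts)
    return markers

# e.g., collapse_into_ranges([180, 195, 210, 600]) -> [(180, 210), 600]
```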
  • the tag creator 430 includes a CREATE TAG button 434 that the user can select to create a tag based on the Product ID field 436 , Timestamp field 438 , Keyword(s) field 450 , and Time Range field 452 .
  • the created tag is added to the Tag List 410 .
  • the Tag List 410 includes tags 108 A, 108 B, 108 C, and 108 D.
  • Each tag 108 may include the product ID, keyword(s), and timestamp (corresponding to the time instance in the video when the keyword(s) are mentioned).
  • tags 108 A, 108 B, and 108 C indicate that: the timestamp for “waterproof” associated with object 432 A (e.g., a first watch device) is at 10 seconds (relative to a start of the video); the timestamp for “LTE, cellular” associated with object 432 A is at 120 seconds (relative to a start of the video); and the timestamp for “compact, fit” associated with object 432 B (e.g., a second watch device) is at 60 seconds (relative to a start of the video).
  • In addition to the product ID, keyword(s), and timestamp, a tag can include a time range corresponding to multiple time instances in the video when the same keyword(s) are mentioned.
  • tag 108 D indicates that the starting timestamp for “noise canceling” associated with object 432 B is at 180 seconds (relative to a start of the video) and that the time range for multiple references to “noise canceling” is between 180 seconds and 210 seconds.
  • the UI 400 allows the user to select a tag 108 in the Tag List 410 in order to preview the tag 108 in the current frame 420 .
  • the user can use the Tag Editor 440 to modify the tag 108 or delete the tag.
  • Using the Tag Editor 440 , the user can modify values of the Product ID field 436 , Timestamp field 438 , Keyword(s) field 450 , and Time Range field 452 .
  • the user can select the UPDATE TAG button 472 to update the values of a tag 108 in the Tag List 410 .
  • the user can select the DELETE TAG button 474 to delete a tag 108 from the Tag List 410 .
  • FIG. 5 is a flowchart of a method 500 for displaying tags on a user device, according to one embodiment.
  • the method 500 may be performed by a user device (e.g., user device 230 ).
  • the method 500 is performed by a user device that has received the audio/visual data of a video together with tags generated using the method 300 described above.
  • the method 500 may begin when a viewer instructs the user device to play a video.
  • the method 500 begins at block 502 , where the user device displays a video.
  • the user can control when the user device plays the video using a separate I/O device (e.g., a remote) or an integrated I/O device (e.g., touchscreen or trackpad).
  • the user device determines if a query has been received. For example, the user may have paused the video to submit a query regarding an object in the video the user is interested in. If the user device does not detect a query, the method 500 returns to block 502 where the user device continues to play the video. However, once the user device detects a query, the method proceeds to block 506 .
  • the user device determines a keyword associated with the query.
  • the user device may identify a term in the query that corresponds to a keyword for the video. For example, the user may have searched for a word (or term) mentioned in the video about an object.
  • the user device can search metadata transmitted along with the video to determine if the term in the query is a keyword.
  • the user device may determine that terms in the query are not mentioned in the video. In this case, the user device can identify another keyword similar to the searched term.
  • the user device may interact with the keyword generator 206 to identify a keyword similar to the searched term, given the context of the video.
  • the keyword generator 206 may use a machine learning algorithm (e.g., word2vec, GloVe, etc.) to compute a similarity or distance score between the searched term and other keywords associated with the object in the video, and select another keyword having a score that satisfies a particular threshold for the user device to use.
  • the keyword generator 206 can provide an indication of the other keyword to the user device.
  • the user device may be configured to use one or more machine learning algorithm(s) to determine a similar keyword.
  • the user device determines a tag corresponding to the keyword. For example, the user device can search the metadata associated with the video to identify the tag corresponding to the keyword. In some embodiments, the user device can interact with another computing system (e.g., computing system 200 ) to retrieve the tag. For example, the user device can send a request that includes the keyword to the other computing system and receive, in response, the corresponding tag.
  • the user device identifies a time instance in the video related to the keyword from the tag.
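  • Putting blocks 506-510 together, a rough client-side sketch (assuming the tag metadata shape from earlier and an optional similarity function for the unmentioned-keyword fallback) might look like this.

```python
def markers_for_query(query: str, metadata: dict, similarity=None, threshold=0.7):
    """Resolve a user query to marker time instances.

    First look for an exact keyword match among the video's tags; if none is found
    and a `similarity` function (e.g., backed by word2vec or GloVe embeddings) is
    supplied, fall back to the most similar known keyword meeting `threshold`."""
    tags = metadata.get("tags", [])
    keyword = query.strip().lower()
    matches = [t for t in tags if t["keyword"].lower() == keyword]
    if not matches and similarity is not None and tags:
        scored = [(similarity(keyword, t["keyword"]), t) for t in tags]
        best_score = max(score for score, _ in scored)
        if best_score >= threshold:
            matches = [t for score, t in scored if score == best_score]
    return [t["timestamp"] for t in matches]
```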
  • the user device displays a marker corresponding to the time instance on the user interface.
  • the user device can display the marker along the timeline (or seeker bar) 160 .
  • the marker can be displayed on the user interface in other locations and/or formats.
  • FIG. 6A illustrates an example UI 600 of a user device, where markers 680 A and 680 B are displayed on the timeline 160 .
  • the video may be a review video describing a particular set of headphones (e.g., object 632 ).
  • the review video may be introducing the object 632 , e.g., by describing various features of the object 632 .
  • the user may be interested in hearing more about the comfort (or fit) of the object 632 .
  • the user can search for “comfort” in the search field 604 on the UI 600 .
  • the user device can display markers 680 A and 680 B corresponding to time instances when the “comfort” of the object 632 is mentioned in the video.
  • the user device can display marker 680 C to indicate a region within the video corresponding to multiple time instances when the “comfort” of the object 632 is mentioned in the video.
  • the user can select one of the markers 680 A, 680 B, and 680 C to go directly to the timestamp corresponding to the keyword “comfort.”
  • As shown in FIG. 6B , after selecting the marker 680 A, the user is taken to current frame 620 B of the video where “comfort” of the object 632 is mentioned.
  • As shown in FIG. 6C , after selecting the marker 680 B, the user is taken to current frame 620 C in the video where another instance of “comfort” of the object 632 is mentioned.
  • the user can be taken to a starting time instance of the time region corresponding to the marker 680 C where multiple instances of “comfort” of the object 632 are mentioned.
  • the user device can display the marker(s) in a list of results shown on the user interface, where the marker(s) indicates at least the timestamp corresponding to the keyword.
  • the marker(s) can act as navigation anchors (or links) to various parts within the video.
  • FIG. 7 illustrates an example UI 700 of a user device that displays marker(s) in search results 750 .
  • search results 750 can be displayed in a panel (e.g., navigation panel 142 ) on the UI 700 .
  • the search results 750 can appear as a pop-up graphic (e.g., pop-up graphic 150 ).
  • the video may be a review video describing the ten best travel carry-on bags (e.g., objects 732 A, 732 B, 732 C, etc.).
  • the review video may be playing an introductory title introducing the subject of the video.
  • the user may be interested in hearing more about carry-on bags for pets.
  • the user can search for “pet” in the search field 704 .
  • the user device can display a set of markers 780 A and 780 B corresponding to the timestamps associated with the keyword “pet.”
  • Each marker 780 can include an image of the frame of interest, the timestamp, and corresponding portion of text.
  • the marker 780 A includes an image of the frame 720 B depicting object 732 D, the timestamp 114 , and corresponding text 112 .
  • the marker 780 B includes an image of the frame 720 C depicting object 732 E, the timestamp 114 , and corresponding text 112 .
  • the user device can display markers (or links) corresponding to time instances in other videos where the keyword is mentioned.
  • at least one of the markers presented in the search results 750 can correspond to a particular time instance in another video (e.g., about pet carry on bags) related to the current video.
  • FIG. 8 is a flowchart of a method 800 for using keywords as navigational anchors on a user device, according to one embodiment.
  • the method 800 may be performed by a user device (e.g., user device 230 ).
  • the method 800 is performed by a user device that has received the audio/visual data of a video together with tags generated using the method 300 described above.
  • the method 800 may begin when a viewer instructs the user device to play a video.
  • the method 800 may begin when a viewer has paused a video playing on the user device.
  • the method 800 begins at block 802 , where the user device identifies an object associated with a video. For example, the user device can identify the object based on the tags (indicating product IDs) transmitted along with the metadata of the video.
  • the user device displays one or more keywords associated with the object. For example, the user device can determine from the tags the keywords associated with the object and display the keywords in a pop-up graphic (e.g., pop-up graphic 150 ) on a UI of the user device.
  • Each keyword acts as a navigational anchor (or hook or link) to the time instance in the video that is associated with the keyword (e.g., the keyword is mentioned or indirectly mentioned at the time instance, the keyword is mentioned within a threshold amount of time after the time instance, etc.).
  • the user device determines that the user has selected one of the keywords. That is, the viewer can use the I/O device to select one of the keywords displayed on the UI of the user device by, for example, touching the keyword in a touch-sensitive screen, using arrows in a remote, using a cursor, and the like.
  • the user device determines a time instance in the video corresponding to the selected keyword. For example, the user device can retrieve from the metadata transmitted along with the video the tag corresponding to the selected keyword to determine the time instance when the keyword is mentioned in the video.
  • the user device plays the video starting at the time instance.
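  • A small sketch of blocks 806-810, assuming a player object with a hypothetical seek_to(seconds) method and the tag metadata described above.

```python
def on_keyword_selected(keyword: str, metadata: dict, player) -> None:
    """When the viewer selects a suggested keyword, look up its tag and start
    playback at the associated time instance (player.seek_to/play are hypothetical)."""
    for tag in metadata.get("tags", []):
        if tag["keyword"].lower() == keyword.lower():
            player.seek_to(tag["timestamp"])
            player.play()
            return
    # No tag found for the selected keyword: leave playback unchanged.
```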
  • FIGS. 9A-9B illustrate an example UI 900 of a user device that displays suggested keywords to the user, according to one embodiment.
  • the video may be an advertisement promoting different sports equipment (e.g., object 932 A, object 932 B, object 932 C in current frame 920 A).
  • the user device can suggest keywords related to the objects that may be of interest to the user. As noted, these keywords may have been generated by the keyword generator 206 , based on a transcription of the audio content in the video, and transmitted along with audio/visual data of the video.
  • the UI 900 displays keywords 980 A, 980 B, . . . , 980 K in a keyword map 950 .
  • the keyword map 950 can be displayed in a panel (e.g., navigational panel 142 ), as a pop-up graphic (e.g., pop-up graphic 150 ), in the current frame (e.g., current frame 920 A), etc.
  • Each keyword 980 acts as a navigational anchor to the time instance in the video associated with the keyword.
  • As shown in FIG. 9B , after the user selects keyword 980 A for “basketball size,” the user is taken to current frame 920 B in which “basketball size” is mentioned.
  • aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.”
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Embodiments of the invention may be provided to end users through a cloud computing infrastructure.
  • Cloud computing generally refers to the provision of scalable computing resources as a service over a network.
  • Cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
  • cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.
  • cloud computing resources are provided to a user on a pay-per-use basis, where users are charged only for the computing resources actually used (e.g. an amount of storage space consumed by a user or a number of virtualized systems instantiated by the user).
  • a user can access any of the resources that reside in the cloud at any time, and from anywhere across the Internet.
  • a user may access applications (e.g., the video distribution network, the transcription computing service) or related data available in the cloud.
  • the video distribution network could execute on a computing system in the cloud and distribute videos with embedded tags to the user devices.
  • the transcription computing service could execute on a computing system in the cloud and distribute transcriptions of audio content in a video. Doing so allows a user to access this information from any computing system attached to a network connected to the cloud (e.g., the Internet). A sketch of how embedded tags might be carried in distributed video metadata follows.
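The following is a hypothetical example of how tags could be serialized into the metadata that the video distribution network sends to user devices. The JSON field names and values are assumptions for illustration; this disclosure does not define a particular serialization format.

```python
# Hypothetical serialization of tags into distributed video metadata; the JSON
# field names and values are assumptions, not a format defined by this disclosure.
import json

video_metadata = {
    "video_id": "example-video-123",   # illustrative identifier
    "duration_seconds": 120.0,
    "tags": [
        {"product_id": "PRODUCT-123", "keyword": "basketball size", "time_instance": 42.0},
        {"product_id": "PRODUCT-123", "keyword": "indoor vs outdoor", "time_instance": 75.5},
    ],
}

payload = json.dumps(video_metadata)            # shipped alongside the audio/visual data
received = json.loads(payload)                  # parsed by the user device
print([tag["keyword"] for tag in received["tags"]])
```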


Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/437,649 US10657176B1 (en) 2019-06-11 2019-06-11 Associating object related keywords with video metadata
PCT/US2020/036908 WO2020251967A1 (fr) 2019-06-11 2020-06-10 Associating object related keywords with video metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/437,649 US10657176B1 (en) 2019-06-11 2019-06-11 Associating object related keywords with video metadata

Publications (1)

Publication Number Publication Date
US10657176B1 (en) 2020-05-19

Family

ID=70736454

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/437,649 Active US10657176B1 (en) 2019-06-11 2019-06-11 Associating object related keywords with video metadata

Country Status (2)

Country Link
US (1) US10657176B1 (fr)
WO (1) WO2020251967A1 (fr)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101644789B1 (ko) * 2009-04-10 2016-08-04 Samsung Electronics Co., Ltd. Apparatus and method for providing broadcast program related information
US10331661B2 (en) * 2013-10-23 2019-06-25 At&T Intellectual Property I, L.P. Video content search using captioning data
US10108617B2 (en) * 2013-10-30 2018-10-23 Texas Instruments Incorporated Using audio cues to improve object retrieval in video
US20170083620A1 (en) * 2015-09-18 2017-03-23 Sap Se Techniques for Exploring Media Content
US20190080207A1 (en) * 2017-07-06 2019-03-14 Frenzy Labs, Inc. Deep neural network visual product recognition system
US10176846B1 (en) * 2017-07-20 2019-01-08 Rovi Guides, Inc. Systems and methods for determining playback points in media assets

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990692B2 (en) * 2009-03-26 2015-03-24 Google Inc. Time-marked hyperlinking to video content
US20120323897A1 (en) * 2011-06-14 2012-12-20 Microsoft Corporation Query-dependent audio/video clip search result previews
US20160342590A1 (en) * 2015-05-20 2016-11-24 Fti Consulting, Inc. Computer-Implemented System And Method For Sorting, Filtering, And Displaying Documents
US9824691B1 (en) * 2017-06-02 2017-11-21 Sorenson Ip Holdings, Llc Automated population of electronic records

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
U.S. Appl. No. 16/432,841, "Generating Video Segments Based on Video Metadata," filed Jun. 5, 2019.

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11977711B1 (en) * 2015-03-17 2024-05-07 Amazon Technologies, Inc. Resource tagging and grouping
CN114449333A (zh) * 2020-10-30 2022-05-06 Video note generation method and electronic device
CN114449333B (zh) * 2020-10-30 2023-09-01 Huawei Device Co., Ltd. Video note generation method and electronic device
CN116915925A (zh) * 2023-06-20 2023-10-20 Tianyi iMusic Culture & Technology Co., Ltd. Video generation method, system, electronic device, and medium based on video template
CN116915925B (zh) * 2023-06-20 2024-02-23 Tianyi iMusic Culture & Technology Co., Ltd. Video generation method, system, electronic device, and medium based on video template

Also Published As

Publication number Publication date
WO2020251967A1 (fr) 2020-12-17

Similar Documents

Publication Publication Date Title
US11120490B1 (en) Generating video segments based on video metadata
US11100096B2 (en) Video content search using captioning data
US11310562B2 (en) User interface for labeling, browsing, and searching semantic labels within video
US10846752B2 (en) Systems and methods for managing interactive features associated with multimedia
US10070170B2 (en) Content annotation tool
US9690768B2 (en) Annotating video intervals
US20180253173A1 (en) Personalized content from indexed archives
US8799300B2 (en) Bookmarking segments of content
US10545954B2 (en) Determining search queries for obtaining information during a user experience of an event
US20190130185A1 (en) Visualization of Tagging Relevance to Video
US20160014482A1 (en) Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
US10657176B1 (en) Associating object related keywords with video metadata
US20150293995A1 (en) Systems and Methods for Performing Multi-Modal Video Search
US9721564B2 (en) Systems and methods for performing ASR in the presence of heterographs
US9946438B2 (en) Maximum value displayed content feature
US20110106809A1 (en) Information presentation apparatus and mobile terminal
US20170242554A1 (en) Method and apparatus for providing summary information of a video
CN113746875B (zh) Voice packet recommendation method, apparatus, device, and storage medium
CN111263186A (zh) Video generation, playback, search, and processing method, apparatus, and storage medium
US11968428B2 (en) Navigating content by relevance
US20230403432A1 (en) Systems and methods for restricting video content
US10699750B1 (en) Tagging tracked objects in a video with metadata
US20200037009A1 (en) System device and methods for presenting media contents
WO2023220274A1 (fr) Entity cards including descriptive content relating to entities from a video

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4