US20220075829A1 - Voice searching metadata through media content - Google Patents

Voice searching metadata through media content

Info

Publication number
US20220075829A1
Authority
US
United States
Prior art keywords
media content
content files
metadata
scene
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/528,842
Inventor
Jing X. Wang
Mark Arana
Edward Drake
Alexander C. Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Disney Enterprises Inc
Original Assignee
Disney Enterprises Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Disney Enterprises Inc
Priority to US17/528,842
Assigned to DISNEY ENTERPRISES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHEN, ALEXANDER C., DRAKE, EDWARD, WANG, JING X., ARANA, MARK
Publication of US20220075829A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/433 Query formulation using audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G06F16/4387 Presentation of query results by the use of playlists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/44 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4826 End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4828 End-user interface for program selection for searching program descriptors
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present disclosure relates generally to media content playback and interaction.
  • DVD: digital video disk
  • VCR: video cassette recorder
  • a user may fast forward or rewind through portions of the media content, e.g., scenes of a movie, to achieve playback of a particular portion of the media content that the user wishes to view or experience.
  • Media interaction on devices such as smart phones, laptop personal computers (PCs), and the like mimics such controls during playback of media content being streamed or downloaded to the device.
  • the operation also includes searching metadata associated with a plurality of media content files to identify a subset of the plurality of media content files.
  • the subset of the plurality of media content files includes one or more media content files of the plurality of media content files that include one or more scenes that match the search criteria.
  • the operation also includes providing, to the user device, search results identifying the subset of the plurality of media content files.
  • the metadata is generated by a respective originator of each media content file of the plurality of media content files and describes each scene of the plurality of media content files.
  • FIG. 1 illustrates an example environment in which various embodiments may be implemented.
  • FIG. 2 is an operational flow diagram illustrating an example process for voice searching through a video file in accordance with various embodiments.
  • FIG. 3 illustrates an example user interface for performing voice searching in accordance with various embodiments.
  • FIG. 4A illustrates an example simple user interface for performing voice searching and displaying search results in accordance with one embodiment.
  • FIG. 4B illustrates an example advanced user interface for performing voice searching and displaying search results in accordance with another embodiment.
  • FIG. 5 illustrates an example user interface for presenting search results in accordance with one embodiment.
  • FIG. 6 is an example computing module that may be used to implement various features of embodiments described in the present disclosure.
  • traditional methods of interacting with media may involve a user fast forwarding or rewinding through media content to achieve playback of a particular portion of the media content.
  • a user that wishes to view a particular scene in a movie generally fast forwards and rewinds the movie during playback until the desired scene is reached.
  • a user may skip to a particular “chapter” of the movie.
  • the level of granularity that can be achieved with conventional interaction methods is often rough or imprecise.
  • various embodiments described in the present disclosure provide systems and methods that allow a user to use voice commands or inputs to search for one or more portions (e.g., one or more scenes) of media content (e.g., one or more movies) that are of interest to the user.
  • Media content in the context of the present disclosure can be any type of media content, such as movies, music, audio books, and the like.
  • a user is not limited to searching for a particular portion of a single media content during playback via voice commands or input.
  • a user may search for content in one or more content repositories, digital libraries, or databases.
  • truncated versions of media can be accessed, generated, and/or presented, e.g., storylines, relevant scenes that are stitched together, etc.
  • FIG. 1 is a diagram illustrating an example environment in which various embodiments can be implemented.
  • FIG. 1 illustrates a system 100 for providing voice searching of media content.
  • system 100 can include a user device 102.
  • User device 102 may include a processor 104 and a memory unit 106, and can be configured to receive digital media content for presentation on a display 108.
  • User device 102 may further be configured to access a list of media content stored on a content database or repository such as an electronic program guide, an online media store, etc.
  • device 102 may be a tablet PC, a smart phone, a laptop PC, etc.
  • System 100 may further include a media server 112 , which may be operated by a content provider, such as a cable provider (e.g., COMCAST®), YouTube®, a digital media content distributor, such as Amazon®, iTunes®, Netflix® or other third-party distributor.
  • Media server 112 may include a content database 114 on which digital media content can be stored.
  • Media server 112 may further include a search engine 116 for performing searches of media content or portions of media content based on the user's voice commands or input.
  • Search engine 116 may include a voice recognition/speech-to-text engine 118 (or other translation engine) for receiving and analyzing/translating the user's voice commands or input into search instructions that can be understood and followed by search engine 116.
  • system 100 may include a third-party content provider 120, which may include and/or control its own content database 122.
  • third-party content provider 120 may provide content from media server 112 (e.g., by accessing content database 114 and forwarding media to user device 102). It should be noted that system 100 may include more or fewer media servers, content providers, and/or user devices.
  • Network 110 may be any communications network such as a cellular or data network, a satellite network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a personal area network (PAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), or any combination thereof.
  • network 110 may employ various communication media, such as a coaxial cable, fiber optic cable system, Ethernet, radio waves, etc.
  • metadata can be 1) included in a media content file by an originator, such as the producer, or editor, 2) automatically generated by a computer during production or editing of the media content file (scene description, time, location, characters), and/or 3) generated by one or more users.
  • metadata can include user-inserted keywords, tags, titles, annotations, and the like.
  • metadata may include frame information, indexing information, links to enhanced or supplemental content, etc.
  • types and/or amount of metadata in various types of media content can differ.
  • computer animated media content may have large amounts of metadata associated with it (e.g., metadata about objects) as a result of the content itself being computer-generated.
  • Metadata can be associated with media content at any time, e.g., during production, or subsequent to viewing by a user.
  • users that have viewed or experienced a particular piece of media content may provide feedback or ‘third-party’ metadata that can be accessed, mined, aggregated, etc., from fan websites or social media outlets and services.
  • third-party metadata can then be associated with the media content and subsequently indexed.
  • metadata as described herein may further include temporal metadata that can provide time-based information and/or access to one or more portions of media content on its own or in conjunction with other types of metadata.
  • temporal metadata can be included that represents mood on a media content timeline, where users can search for a particular chapter, scene, or shot by mood or, e.g., skip depressing portions of the media content.
  • Metadata can be associated with a particular media content file, or a specific scene or camera shot angle in a movie (group of frames) as embedded metadata, linked metadata, etc.
  • a scene can be a sequence of frames with a start frame and an end frame, where the frames relate to an event, part, or location of the story.
  • Metadata can include, but is not limited to, the following: actor(s)/actress(es) name (actual name and character role name); song lyrics of a movie soundtrack song; movie dialog; song title; scene title; scene description; film location; shooting location; story location; product shown or included in a particular scene; emotions; objects; actions; acoustic or audio fingerprints; keywords; and/or any other indicia that may be associated with one or more portions of the media content.
  • subtitles can be leveraged as a basis for media content searching.
  • media server 112 can pre-process media content by searching or parsing any metadata included or otherwise associated with a media content file.
  • voice recognition/speech-to-text engine 118 can analyze the voice command or input to determine what a user of user device 102 is searching for. Voice recognition/speech-to-text engine 118 can then translate the voice command or input into a format that search engine 116 can utilize to search for any pre-processed metadata of the relevant media content file(s) stored in, e.g., content database 114, that matches or meets the search criteria identified in the voice command or input.
  • any relevant media content or portions of media content (such as a scene or group of related scenes) can be transmitted, presented, or identified on user device 102 .
  • Pre-processing of the metadata may include considering one or more ‘associative’ or ‘thematic’ aspects of media content.
  • the metadata can be utilized to identify one or more scenes rather than mere frames of media content. That is, one or more scenes considered together can be used to present, e.g., plot themes, plot points, one or more groups of pictures (GOPs), etc.
  • content database 114 may further include the pre-processed metadata which can be linked with (such as through the use of pointers) or otherwise associated with media content.
  • content database 114 may be partitioned into a pre-processed metadata portion and a portion in which media content is stored.
  • additional databases or data repositories for storing metadata may be implemented in media server 112 or can be remotely accessed by media server 112, where, e.g., pointers or other associative mechanisms can link media content stored in content database 114 and the pre-processed metadata.
  • voice recognition/speech-to-text engine 118 at the server 112 can provide a more accurate interpretation or translation of a user's voice command(s) or input(s), as more intensive processing and analysis can be performed on media server 112.
  • voice recognition can be performed locally on user device 102.
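  • By way of illustration only, the metadata matching described above might be sketched as follows. The sketch assumes the voice input has already been converted to text; the names Scene, normalize, and search_scenes, the keyword-set representation of metadata, and the sample index are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A run of frames (start to end) plus descriptive metadata keywords."""
    media_id: str
    start_frame: int
    end_frame: int
    metadata: set[str] = field(default_factory=set)  # lower-cased keywords


def normalize(transcript: str) -> set[str]:
    """Turn transcribed speech into a set of lower-cased search terms."""
    words = {word.strip(".,!?\"'").lower() for word in transcript.split()}
    return words - {""}


def search_scenes(transcript: str, index: list[Scene]) -> list[Scene]:
    """Return scenes whose metadata shares at least one term with the query."""
    terms = normalize(transcript)
    return [scene for scene in index if scene.metadata & terms]


# Hypothetical pre-processed metadata index (assumption, for illustration).
index = [
    Scene("movie-1", 0, 1200, {"sword", "castle", "battle"}),
    Scene("movie-1", 1201, 2400, {"song", "forest"}),
    Scene("movie-2", 0, 900, {"sword", "training"}),
]

hits = search_scenes("Show me the sword!", index)
for scene in hits:
    print(scene.media_id, scene.start_frame, scene.end_frame)
```

  • A production search engine would likely use an inverted index and natural language processing rather than plain keyword intersection; the set-based match is used here only to keep the sketch short.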
  • FIG. 2 is an operational flow chart illustrating example operations that may be performed by a media server, e.g., media server 112 , for providing the above-described voice searching functionality in accordance with various embodiments.
  • vocal input is received from a user device.
  • a user may use a device such as a smart phone to input a voice command representative of a search for one or more portions of media content while the user is watching, listening, or otherwise experiencing the media content.
  • at least one portion of media content is searched for based on the vocal user input. That is, media server 112 may search for metadata associated with one or more parts (frames, GOPs, scenes, etc.) in a media content file that matches or meets the search criteria identified in the vocal user input.
  • media server 112 can search the movie media content file for scenes in which the associated metadata or subtitle(s) reference or include the famous weapon.
  • the media server 112 may search for scenes presenting how super hero X obtained his/her super powers.
  • the scenes may be contiguous (e.g., scenes following each other chronologically), or the scenes may be non-contiguous.
  • the media server 112 may stitch the non-contiguous scenes together.
  • media server 112 can instruct, e.g., a media player application on user device 102, to present a modified progress bar in which the relevant scenes are highlighted or otherwise indicated.
  • other portions of media content besides the currently-experienced media content
  • the user may engage in voice-based searching in the context of content discovery.
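  • As a non-limiting sketch of the stitching and progress-bar highlighting described above, matching scenes may be treated as frame ranges: ranges that follow each other directly are merged into one segment, and each resulting segment is expressed as a fraction of the total running length so a player can highlight it. The function names stitch and highlight_fractions, and the inclusive frame-range convention, are assumptions made for illustration.

```python
def stitch(scenes: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge inclusive (start_frame, end_frame) ranges that touch or overlap.

    The result is in timeline order; non-contiguous scenes stay separate
    segments that a player could present back to back.
    """
    merged: list[tuple[int, int]] = []
    for start, end in sorted(scenes):
        if merged and start <= merged[-1][1] + 1:
            # Contiguous with (or overlapping) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight_fractions(
    segments: list[tuple[int, int]], total_frames: int
) -> list[tuple[float, float]]:
    """Express each segment as (begin, end) fractions of the progress bar."""
    return [(start / total_frames, (end + 1) / total_frames)
            for start, end in segments]


segments = stitch([(300, 399), (0, 99), (100, 199)])
print(segments)
print(highlight_fractions(segments, total_frames=400))
```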
  • FIG. 3 is an example of a graphical user interface (GUI) of a media player application implemented on a smart phone 300, which may be one embodiment of user device 102.
  • smart phone 300 may include a display 302 on which media content, such as a streamed or downloaded movie file, can be displayed via a media player application.
  • a user can, e.g., swipe or otherwise activate voice command button 304.
  • voice command button the user may speak a command requesting a search for one or more portions (e.g., “show me all the action scenes”) or aspects of interest regarding the streamed/downloaded movie file.
  • the user may wish to view a scene in the streamed movie file during which a particular song is played.
  • the user may speak the name of the song, hum or sing lyrics of the song, etc.
  • Smart phone 300 may digitize and process the speech/singing of the user for transmission to media server 112 via network 110.
  • voice recognition/speech-to-text engine 118 may analyze or translate the speech/singing, and search engine 116 may perform the requisite search.
  • media server 112 may instruct the media player of smart phone 300 to display the scene in which the desired song is played.
  • the media player GUI may present a cursor or other indicator on a progress bar 306 indicating where the user can skip to in order to view the relevant scene.
  • the media player GUI may display a “heat map” on or associated with progress bar 306.
  • This can be useful when, e.g., multiple scenes or portions of media content may potentially be relevant to the user's search.
  • one or more markers 308a, 308b, 308c, etc. may be displayed on progress bar 306.
  • the one or more markers may be distinguished using, e.g., varying degrees of color.
  • the distinguishing colors can be representative of a relevance score (which can be calculated by search engine 116). That is, search engine 116 may complete a search and determine that multiple scenes could potentially meet the search criteria spoken by the user.
  • search engine 116 may determine, e.g., by the amount of matching metadata or subtitles in a scene, a potential relevance to the search criteria. The user may then touch/interact with the heat map and/or use playback buttons 310 to view the relevant scenes indicated by the heat map. Moreover, and instead of the one or more markers, relevant portions of media content can be identified using, e.g., representative thumbnail images overlaid on progress bar 306.
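  • One possible way to derive such heat-map markers is sketched below. The scoring rule (the fraction of query terms found in a scene's metadata or subtitles), the red color whose opacity grows with the score, and the names relevance, marker_color, and build_markers are illustrative assumptions; the disclosure does not prescribe a particular formula or color scheme.

```python
def relevance(scene_terms: set[str], query_terms: set[str]) -> float:
    """Fraction of query terms found in the scene's metadata (0.0 to 1.0)."""
    if not query_terms:
        return 0.0
    return len(scene_terms & query_terms) / len(query_terms)


def marker_color(score: float) -> str:
    """Map a score to a red marker whose opacity grows with relevance."""
    alpha = round(max(0.0, min(1.0, score)), 2)
    return f"rgba(255, 0, 0, {alpha})"


def build_markers(scenes, query_terms, total_frames):
    """Return one marker per matching scene, positioned along the bar."""
    markers = []
    for start_frame, terms in scenes:
        score = relevance(terms, query_terms)
        if score > 0:
            markers.append({
                "position": start_frame / total_frames,  # 0.0 = start of bar
                "color": marker_color(score),
            })
    return markers


# Hypothetical scenes: (start_frame, metadata keywords).
scenes = [(0, {"sword"}), (500, {"song"}), (750, {"sword", "dragon"})]
markers = build_markers(scenes, {"sword", "dragon"}, total_frames=1000)
print(markers)
```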
  • various embodiments are not limited to a linear single point searching experience, as can be the case with conventional systems and methods of media content interaction. Instead, and as described above, various embodiments can present a user with entire scenes, shots, or portions of media content (whether the media content is a movie, a song, an audio book, or other media type). Moreover, the user can be presented with multiple options for viewing the one or more portions of media content, e.g., selecting where to begin viewing the relevant portions of media content, etc. Moreover, the media server 112 can stitch together derivative media content such as story lines or relevant portions of media content or multiple scenes and provide them to the user device.
  • the user can search for media content that has not yet been displayed or experienced, which can achieve enhanced methods of content discovery.
  • a user may employ voice-based searching for media content of interest based on a myriad of indicia/metadata such as those described previously.
  • FIG. 4A illustrates one embodiment of the present disclosure in which a ‘simple’ search GUI may be presented to a user.
  • FIG. 4A illustrates a smart phone 400, which may be one embodiment of user device 102.
  • smart phone 400 may include a display 402 on which a voice-based search GUI 404 A can be presented.
  • Voice-based GUI 404 can include a scene request prompt mechanism that the user may actuate in order to input one or more keywords or natural language search terms.
  • a search result 406 A can be presented to the user.
  • a single result can be presented.
  • the single result can be, as previously described, a stitching together of relevant scenes from a single instance of media content.
  • FIG. 4B illustrates another embodiment of a voice-based GUI 404 B that can represent a more ‘advanced’ embodiment of voice-based media content searching.
  • the search results 406 B that can be returned may include various portions of media content that are relevant to the voice-based search. This can include, for example, relevant scenes that include a particular object, a particular character(s), scenes that are relevant from a thematic or plot perspective, as well as additional media content, whether it be derivative content, other or alternative media content, etc.
  • the user interface may be designed to be easy to use and present the found scene(s) in a desirable, unique, and memorable way.
  • search mechanisms or algorithms utilized in accordance with various embodiments can be configured or adapted as needed or desired.
  • closed captioning or subtitle metadata can be used as an initial search reference to identify potentially relevant portions of media content.
  • more refined or complex camera shot or character recognition algorithms or methods can be used to further refine the search to increase the potential relevancy of search results returned to a user.
  • FIG. 5 illustrates an example of a search result GUI in accordance with yet another embodiment of the present disclosure.
  • FIG. 5 illustrates a smart phone 500, which may be one embodiment of user device 102.
  • smart phone 500 may include a display 502 on which a search results GUI 504 can be presented to a user.
  • Search results GUI 504 can present a ‘most relevant’ search result along with less relevant results that may still be of potential interest to the user.
  • the user may initiate a voice-based search requesting love scenes in a movie between a character and the name of the actress portraying the character's love interest.
  • Search results GUI 504 may therefore display an icon 504 A representative of the scene(s) relevant to the voice-based search at the forefront.
  • related scenes such as action scenes involving the character and the actress may be presented in the background as another representative icon 504 B. Additionally still, related scenes such as scenes involving the character and other characters/actors can also be presented in the background via yet another representative icon 504 C. It should be noted that the relevant and related scenes or media content can also be presented using relative sizing of the icons to represent a probability ‘score’ reflective of its relevance relative to the voice-based search and/or ‘most-relevant’ search result(s).
  • the relevance of search results can be based on a plurality of various sources.
  • the pre-processed metadata can originate from, e.g., third-party sources, such as social media outlets, fan websites, and the like. That is, the relevancy of search results can be based on, e.g., crowd-sourced information or previous actions by the user.
  • a user can limit a voice search for a scene to the user's collection of purchased movies, i.e., the user's digital library.
  • media server 112 of FIG. 2 may access the user's personal library of content (movies).
  • the user's collection can include video clips of favorite scenes in movies referred to as ‘snippets.’
  • media server 112 may skew or customize the search results returned to the user based upon what the user has previously deemed to be of interest, how the user has classified or categorized previously clipped snippets, etc. Examples of snippet technology that various embodiments of the present disclosure can leverage are described in U.S.
  • the search can still be performed by accessing, e.g., an electronic thesaurus or other third-party information source.
  • a user may request a search for scenes in a movie where an actor experiences a “hiccup.”
  • the term hiccup may not.
  • a media server e.g., media server 112 of FIG. 2
  • the search may progress based on a metadata search related to “bodily function.” If such a search fails to produce any results, free-form searching or a “best-guess” search can be performed. Accordingly, hierarchical searching can be utilized in accordance with various embodiments.
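  • The hierarchical fallback described above might be sketched as follows, purely as an example. The thesaurus mapping BROADER_TERMS, the keyword-to-scene index, and the substring-based "best-guess" step are hypothetical stand-ins; an actual system could consult an electronic thesaurus or other third-party source and use a more sophisticated free-form match.

```python
# Hypothetical thesaurus: maps a term to a broader category (assumption).
BROADER_TERMS = {"hiccup": "bodily function"}


def hierarchical_search(term: str, index: dict[str, list[str]]):
    """Search a keyword -> scene-id index, falling back level by level.

    Returns a (level, scene_ids) pair so the caller can tell which
    stage of the hierarchy produced the results.
    """
    # Level 1: the term itself appears as a metadata keyword.
    if term in index:
        return "exact", index[term]

    # Level 2: a broader term from the thesaurus appears in the metadata.
    broader = BROADER_TERMS.get(term)
    if broader is not None and broader in index:
        return "broader", index[broader]

    # Level 3: "best-guess" search, here a simple substring comparison.
    guesses = [scene_id
               for keyword, scene_ids in index.items()
               if term in keyword or keyword in term
               for scene_id in scene_ids]
    return "best-guess", guesses


index = {"bodily function": ["scene-7"], "sword fight": ["scene-2"]}
print(hierarchical_search("hiccup", index))
print(hierarchical_search("sword", index))
```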
  • the user may further refine the search results by selecting (e.g., by voice-based input, touch-based input, etc.) a first aspect of the initial search results, drilling down the first aspect, and so on.
  • a voice-based search input may be, “Show me all Disney® movies.”
  • Upon voice-based GUI 404 B returning a list of all known Disney® movies, the user may utilize voice-based GUI 404 B to then input the following search, “Show me all animated movies.” Further still, the user may again utilize voice-based GUI 404 B to initiate yet another narrowing search, “Show me all G-rated movies.”
  • voice-based searching in accordance with various embodiments can also be used to eliminate one or more aspects of media content that a user may wish to exclude from the search results.
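  • The successive narrowing and exclusion described above may be illustrated, under stated assumptions, by applying each new voice request as a filter over the previous result set. The catalog entries, tag names, and the helper functions narrow and exclude below are hypothetical.

```python
def narrow(results: list[dict], tag: str) -> list[dict]:
    """Keep only the results that carry the requested tag."""
    return [item for item in results if tag in item["tags"]]


def exclude(results: list[dict], tag: str) -> list[dict]:
    """Drop the results that carry a tag the user wants to avoid."""
    return [item for item in results if tag not in item["tags"]]


# Hypothetical catalog with placeholder titles and tags (assumption).
catalog = [
    {"title": "Title A", "tags": {"studio-x", "animated", "g-rated"}},
    {"title": "Title B", "tags": {"studio-x", "live-action", "pg-rated"}},
    {"title": "Title C", "tags": {"studio-x", "animated", "pg-rated"}},
    {"title": "Title D", "tags": {"studio-y", "animated", "g-rated"}},
]

# Each spoken request narrows the previous result list.
by_studio = narrow(catalog, "studio-x")
animated = narrow(by_studio, "animated")
g_rated = narrow(animated, "g-rated")
print([item["title"] for item in g_rated])

# Exclusion removes an unwanted aspect instead of requiring one.
no_pg = exclude(by_studio, "pg-rated")
print([item["title"] for item in no_pg])
```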
  • search options and/or results can be monetized in accordance with various embodiments. For example, simple searching can be provided to a user free of charge. However, should the user wish to perform more comprehensive searching, the user may be required to pay a fee to access such comprehensive search options.
  • a user may perform a voice-based search requesting a certain fight scene of a movie. For a nominal fee (that may be less than the charge for a complete instance of the full media content), the user can receive only the requested fight scene or derivative media content in the form of, e.g., stitched scenes having a common theme or plot perspective from multiple media content instances in accordance with the user's voice-based search request.
  • scene-stitching in accordance with various embodiments need not be limited solely to combining existing portions of media content. That is, various embodiments contemplate ‘creating’ new media content by, e.g., stitching together requested dialog. For example, the user may request media content that includes instances of an actor or character in which particular words or dialogue are stitched together.
  • a user can search for a scene in movies not owned by the user.
  • media server 112 may a) show the entire scene, or b) just a preview (e.g., thumbnail image) of the scene to the user, and thereafter may offer to i) sell the movie to the user, or ii) sell just the scene to the user (e.g., for $1 or $2).
  • a user can limit a search for a scene to a single movie or to multiple movies.
  • a user can select different ways to stitch together non-contiguous scenes, e.g., by story, timeline, by relevance; or “FastPlay” all scenes.
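  • A minimal sketch of user-selectable stitching orders follows. It assumes each found scene carries a timeline position (start_frame), a position in the story (story_order, which may differ from the timeline, e.g., for flashbacks), and a relevance score; these field names and the ORDERINGS table are illustrative assumptions only.

```python
from operator import itemgetter

# Each mode maps to (sort key, descending?). Field names are hypothetical.
ORDERINGS = {
    "timeline": (itemgetter("start_frame"), False),
    "story": (itemgetter("story_order"), False),
    "relevance": (itemgetter("score"), True),
}


def order_for_stitching(scenes: list[dict], mode: str) -> list[dict]:
    """Return the scenes in the order in which they would be stitched."""
    key, descending = ORDERINGS[mode]
    return sorted(scenes, key=key, reverse=descending)


scenes = [
    {"id": "a", "start_frame": 0, "story_order": 2, "score": 0.4},
    {"id": "b", "start_frame": 500, "story_order": 1, "score": 0.9},
    {"id": "c", "start_frame": 900, "story_order": 3, "score": 0.6},
]

for mode in ORDERINGS:
    print(mode, [s["id"] for s in order_for_stitching(scenes, mode)])
```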
  • a user may save a found scene as a favorite, i.e., as a Snippet.
  • FIG. 6 illustrates an example computing module that may be used to implement various features of the system and methods disclosed herein.
  • As used herein, the term “module” might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present application.
  • a module might be implemented utilizing any form of hardware, software, or a combination thereof.
  • processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a module.
  • the various modules described herein might be implemented as discrete modules or the functions and features described can be shared in part or in total among one or more modules.
  • computing module 600 may represent, for example, computing or processing capabilities found within a self-adjusting display, desktop, laptop, notebook, and tablet computers; hand-held computing devices (tablets, PDAs, smart phones, cell phones, palmtops, etc.); workstations or other devices with displays; servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment.
  • computing module 600 may be one embodiment of user device 102 , media server 112 , and/or one or more functional elements thereof.
  • Computing module 600 might also represent computing capabilities embedded within or otherwise available to a given device.
  • a computing module might be found in other electronic devices such as, for example navigation systems, portable computing devices, and other electronic devices that might include some form of processing capability.
  • Computing module 600 might include, for example, one or more processors, controllers, control modules, or other processing devices, such as a processor 604 .
  • Processor 604 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic.
  • processor 604 is connected to a bus 602 , although any communication medium can be used to facilitate interaction with other components of computing module 600 or to communicate externally.
  • Computing module 600 might also include one or more memory modules, simply referred to herein as main memory 608 .
  • Main memory 608, preferably random access memory (RAM) or other dynamic memory, might be used for storing information and instructions to be executed by processor 604.
  • Main memory 608 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604 .
  • Computing module 600 might likewise include a read only memory (“ROM”) or other static storage device coupled to bus 602 for storing static information and instructions for processor 604 .
  • the computing module 600 might also include one or more various forms of information storage mechanism 610 , which might include, for example, a media drive 612 and a storage unit interface 620 .
  • the media drive 612 might include a drive or other mechanism to support fixed or removable storage media 614 .
  • a hard disk drive, a solid state drive, a magnetic tape drive, an optical disk drive, a compact disc (CD) or digital video disc (DVD) drive (R or RW), or other removable or fixed media drive might be provided.
  • storage media 614 might include, for example, a hard disk, an integrated circuit assembly, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 612 .
  • the storage media 614 can include a computer usable storage medium having stored therein computer software or data.
  • information storage mechanism 610 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing module 600 .
  • Such instrumentalities might include, for example, a fixed or removable storage unit 622 and an interface 620 .
  • Examples of such storage units 622 and interfaces 620 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 622 and interfaces 620 that allow software and data to be transferred from the storage unit 622 to computing module 600 .
  • Computing module 600 might also include a communications interface 624 .
  • Communications interface 624 might be used to allow software and data to be transferred between computing module 600 and external devices.
  • Examples of communications interface 624 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX or other interface), a communications port (such as, for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface.
  • Software and data transferred via communications interface 624 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 624 . These signals might be provided to communications interface 624 via a channel 628 .
  • This channel 628 might carry signals and might be implemented using a wired or wireless communication medium.
  • Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.
  • The terms “computer program medium” and “computer usable medium” are used to generally refer to transitory or non-transitory media such as, for example, memory 608, storage unit 622, media 614, and channel 628.
  • These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution.
  • Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing module 600 to perform features or functions of the present application as discussed herein.
  • The use of the term “module” does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Techniques for searching metadata through media content. User input identifying a search criteria is received from a user device. Metadata associated with media content files is searched to identify a subset of the media content files. Search results identifying the subset of the media content files are provided to the user device. The metadata is generated by an originator of each media content file and describes each scene.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of co-pending U.S. patent application Ser. No. 14/568,083, filed Dec. 11, 2014, which claims benefit of U.S. provisional patent application 62/059,703, filed Oct. 3, 2014. Each of the aforementioned related patent applications is herein incorporated by reference in its entirety.
  • BACKGROUND
  • The present disclosure relates generally to media content playback and interaction.
  • Traditional methods of interacting with media content via a digital video disc (DVD) or video cassette recorder (VCR) generally rely on actuating playback buttons or controls. For example, a user may fast forward or rewind through portions of the media content, e.g., scenes of a movie, to achieve playback of a particular portion of the media content that the user wishes to view or experience. Media interaction on devices such as smart phones, laptop personal computers (PCs), and the like mimics such controls during playback of media content being streamed or downloaded to the device.
  • SUMMARY
  • A computer-implemented method, a non-transitory computer-readable medium, and a system are provided to perform an operation that includes receiving, from a user device, user input identifying a search criteria. The operation also includes searching metadata associated with a plurality of media content files to identify a subset of the plurality of media content files. The subset of the plurality of media content files includes one or more media content files of the plurality of media content files that include one or more scenes that match the search criteria. The operation also includes providing, to the user device, search results identifying the subset of the plurality of media content files. The metadata is generated by a respective originator of each media content file of the plurality of media content files and describes each scene of the plurality of media content files.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or example embodiments.
  • FIG. 1 illustrates an example environment in which various embodiments may be implemented.
  • FIG. 2 is an operational flow diagram illustrating an example process for voice searching through a video file in accordance with various embodiments.
  • FIG. 3 illustrates an example user interface for performing voice searching in accordance with various embodiments.
  • FIG. 4A illustrates an example simple user interface for performing voice searching and displaying search results in accordance with one embodiment.
  • FIG. 4B illustrates an example advanced user interface for performing voice searching and displaying search results in accordance with another embodiment.
  • FIG. 5 illustrates an example user interface for presenting search results in accordance with one embodiment.
  • FIG. 6 is an example computing module that may be used to implement various features of embodiments described in the present disclosure.
  • The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed.
  • DETAILED DESCRIPTION
  • As previously described, traditional methods of interacting with media may involve a user fast forwarding or rewinding through media content to achieve playback of a particular portion of the media content. In the case of a DVD, a user that wishes to view a particular scene in a movie generally fast forwards and rewinds the movie during playback until the desired scene is reached. Alternatively, a user may skip to a particular “chapter” of the movie. However, the level of granularity that can be achieved with conventional interaction methods is often rough or imprecise.
  • The use of smart phones or tablet PCs that have small displays (relative to conventional TVs or monitors) can often exacerbate the imprecise nature of conventional media interaction. This is because the playback controls or mechanisms on such devices are commensurately small as well.
  • Moreover, conventional methods of searching media content rely on text-only searching, and often only retrieve complete versions of media content, or retrieve specific ‘frames’ in the context of movie media based upon text-only metadata such as subtitle information.
  • Accordingly, various embodiments described in the present disclosure provide systems and methods that allow a user to use voice commands or inputs to search for one or more portions (e.g., one or more scenes) of media content (e.g., one or more movies) that are of interest to the user. Media content in the context of the present disclosure can be any type of media content, such as movies, music, audio books, and the like. A user is not limited to searching for a particular portion of a single media content during playback via voice commands or input. For example, a user may search for content in one or more content repositories, digital libraries, or databases. Further still, and based upon the particular voice commands issued by the user, truncated versions of media can be accessed, generated, and/or presented, e.g., storylines, relevant scenes that are stitched together, etc.
  • FIG. 1 is a diagram illustrating an example environment in which various embodiments can be implemented. FIG. 1 illustrates a system 100 for providing voice searching of media content. As illustrated in FIG. 1, system 100 can include a user device 102. User device 102 may include a processor 104 and a memory unit 106, and can be configured to receive digital media content for presentation on a display 108. User device 102 may further be configured to access a list of media content stored on a content database or repository such as an electronic program guide, an online media store, etc. As alluded to previously, device 102 may be a tablet PC, a smart phone, a laptop PC, etc.
  • System 100 may further include a media server 112, which may be operated by a content provider, such as a cable provider (e.g., COMCAST®), YouTube®, or a digital media content distributor, such as Amazon®, iTunes®, Netflix®, or other third-party distributor. Media server 112 may include a content database 114 on which digital media content can be stored. Media server 112 may further include a search engine 116 for performing searches of media content or portions of media content based on the user's voice commands or input. Search engine 116 may include a voice recognition/speech-to-text engine 118 (or other translation engine) for receiving and analyzing/translating the user's voice commands or input into search instructions that can be understood and followed by search engine 116. Further still, system 100 may include a third-party content provider 120, which may include and/or control its own content database 122. In certain scenarios, third-party content provider 120 may provide content from media server 112 (e.g., by accessing content database 114 and forwarding media to user device 102). It should be noted that system 100 may include more or fewer media servers, content providers, and/or user devices.
  • Communications between one or more of media server 112, third-party content provider 120, and/or user device 102 can be effectuated over a network 110. Network 110 may be any communications network such as a cellular or data network, a satellite network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a personal area network (PAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), or any combination thereof. Accordingly, network 110 may employ various communication media, such as a coaxial cable, fiber optic cable system, Ethernet, radio waves, etc.
  • In accordance with various embodiments, searching for or through media content can be accomplished using metadata. That is, metadata can be 1) included in a media content file by an originator, such as the producer, or editor, 2) automatically generated by a computer during production or editing of the media content file (scene description, time, location, characters), and/or 3) generated by one or more users. In the case of YouTube® media content, for example, metadata can include user-inserted keywords, tags, titles, annotations, and the like. In the case of studio-produced media content, metadata may include frame information, indexing information, links to enhanced or supplemental content, etc. It should be noted that the types and/or amount of metadata in various types of media content can differ. For example, computer animated media content may have large amounts of metadata associated with it (e.g., metadata about objects) as a result of the content itself being computer-generated.
  • Moreover, metadata can be associated with media content at any time, e.g., during production, or subsequent to viewing by a user. For example, users that have viewed or experienced a particular piece of media content may provide feedback or ‘third-party’ metadata that can be accessed, mined, aggregated, etc., from fan websites or social media outlets and services. Such third-party metadata can then be associated with the media content and subsequently indexed. Additionally still, metadata as described herein may further include temporal metadata that can provide time-based information and/or access to one or more portions of media content on its own or in conjunction with other types of metadata. For example, temporal metadata can be included that represents mood on a media content timeline, where users can search for a particular chapter, scene, shot by mood or, e.g., skip depressing portions of the media content.
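  • By way of illustration only, temporal metadata representing mood on a media content timeline might be used to skip portions of the content as sketched below. The function name `segments_to_play`, the tuple layout, and the mood labels are hypothetical and are not part of the disclosure.

```python
def segments_to_play(mood_timeline, skip_moods):
    """Return the (start_sec, end_sec) segments whose mood is not in skip_moods."""
    return [(start, end) for start, end, mood in mood_timeline if mood not in skip_moods]


# Hypothetical temporal metadata: (start second, end second, mood label).
mood_timeline = [
    (0, 300, "cheerful"),
    (300, 480, "depressing"),
    (480, 900, "exciting"),
]

# Skip the depressing portion of the timeline.
playable = segments_to_play(mood_timeline, {"depressing"})
print(playable)  # [(0, 300), (480, 900)]
```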
  • Such metadata can be associated with a particular media content file, or a specific scene or camera shot angle in a movie (group of frames) as embedded metadata, linked metadata, etc. A scene can be a sequence of frames with a start frame and an end frame, where the frames relate to an event, part, or location of the story. Metadata can include, but is not limited to, the following: actor(s)/actress(es) name (actual name and character role name); song lyrics of a movie soundtrack song; movie dialog; song title; scene title; scene description; film location; shooting location; story location; product shown or included in a particular scene; emotions; objects; actions; acoustic or audio fingerprints; keywords; and/or any other indicia that may be associated with one or more portions of the media content. Alternatively or in addition to metadata, subtitles can be leveraged as a basis for media content searching.
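  • As a minimal sketch only, a scene defined by a start frame and an end frame, together with its associated metadata, might be modeled as follows. The class names `Scene` and `MediaContentFile`, the field names, and the sample values are assumptions made for illustration; the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scene:
    """A sequence of frames with a start frame and an end frame (hypothetical model)."""
    title: str
    start_frame: int
    end_frame: int
    # Metadata keyed by category, e.g. "actions", "objects", "emotions".
    metadata: Dict[str, List[str]] = field(default_factory=dict)

    def frame_count(self) -> int:
        # Both the start frame and the end frame belong to the scene.
        return self.end_frame - self.start_frame + 1


@dataclass
class MediaContentFile:
    """A media content file holding its scenes (hypothetical model)."""
    title: str
    scenes: List[Scene] = field(default_factory=list)


duel = Scene(
    title="Final duel",
    start_frame=1200,
    end_frame=1499,
    metadata={"actions": ["fight"], "objects": ["sword"], "emotions": ["tense"]},
)
movie = MediaContentFile(title="Example Movie", scenes=[duel])
print(duel.frame_count())  # 300
```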
  • Prior to sending the media content, media server 112 can pre-process media content by searching or parsing any metadata included or otherwise associated with a media content file. Upon receiving a voice command or input from user device 102, voice recognition/speech-to-text engine 118 can analyze the voice command or input to determine what a user of user device 102 is searching for. Voice recognition/speech-to-text engine 118 can then translate the voice command or input into a format that search engine 116 can utilize to search for any pre-processed metadata of the relevant media content file(s) stored in, e.g., content database 114, that matches or meets the search criteria identified in the voice command or input. Upon completing the search, any relevant media content or portions of media content (such as a scene or group of related scenes) can be transmitted, presented, or identified on user device 102.
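  • The pre-processing and matching steps described above might be sketched as follows, assuming the voice command has already been translated into text. The inverted-index approach, the function names `build_index` and `search`, and the sample scene identifiers are illustrative assumptions, not the search mechanism of any particular embodiment.

```python
from collections import defaultdict


def build_index(scene_metadata):
    """Pre-process: map each lowercase keyword to the scene ids whose metadata contains it."""
    index = defaultdict(set)
    for scene_id, keywords in scene_metadata.items():
        for keyword in keywords:
            index[keyword.lower()].add(scene_id)
    return index


def search(index, query_text):
    """Return the ids of scenes whose metadata matches every word of the translated query."""
    terms = query_text.lower().split()
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Hypothetical per-scene metadata keywords.
scene_metadata = {
    "scene-1": ["sword", "fight", "castle"],
    "scene-2": ["song", "castle"],
    "scene-3": ["sword", "training"],
}
index = build_index(scene_metadata)
print(sorted(search(index, "sword")))  # ['scene-1', 'scene-3']
```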
  • Pre-processing of the metadata may include considering one or more ‘associative’ or ‘thematic’ aspects of media content. For example, and in accordance with some embodiments, the metadata can be utilized to identify one or more scenes rather than mere frames of media content. That is, one or more scenes considered together can be used to present, e.g., plot themes, plot points, one or more groups of pictures (GOPs), etc.
  • Hence, content database 114 may further include the pre-processed metadata which can be linked with (such as through the use of pointers) or otherwise associated with media content. For example, content database 114 may be partitioned into a pre-processed metadata portion and a portion in which media content is stored. Alternatively, additional databases or data repositories for storing metadata (not shown) may be implemented in media server 112 or can be remotely accessed by media server 112, where, e.g., pointers or other associative mechanisms, can link media content stored in content database 114 and the pre-processed metadata.
  • It should be noted that use of voice recognition/speech-to-text engine 118 at the server 112 can provide a more accurate interpretation or translation of a user's voice command(s) or input(s), as more intensive processing and analysis can be performed on media server 112. However, in accordance with other embodiments, voice recognition can be performed locally on user device 102.
  • FIG. 2 is an operational flow chart illustrating example operations that may be performed by a media server, e.g., media server 112, for providing the above-described voice searching functionality in accordance with various embodiments. At operation 200, vocal input is received from a user device. As described above, a user may use a device such as a smart phone to input a voice command representative of a search for one or more portions of media content while the user is watching, listening, or otherwise experiencing the media content. At operation 202, at least one portion of media content is searched for based on the vocal user input. That is, media server 112 may search for metadata associated with one or more parts (frames, GOPs, scenes, etc.) in a media content file that matches or meets the search criteria identified in the vocal user input. For example, if the user's voice command or input requests scenes within a movie in which a famous weapon is shown, media server 112 can search the movie media content file for scenes in which the associated metadata or subtitle(s) reference or include the famous weapon. As another example, and in response to a user requesting to be shown “the origin of super hero X,” the media server 112 may search for scenes presenting how super hero X obtained his/her super powers.
  • The scenes may be contiguous (e.g., scenes following each other chronologically), or the scenes may be non-contiguous. In the case of non-contiguous scenes, the media server 112 may stitch the non-contiguous scenes together.
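  • One possible way to stitch scenes together is sketched below: matching scenes are ordered chronologically, and scenes that directly follow one another are merged into a single playback segment. The function name `stitch_scenes` and the representation of a scene as a (start frame, end frame) pair are assumptions for illustration only.

```python
def stitch_scenes(ranges):
    """Order (start_frame, end_frame) ranges chronologically and merge ranges that touch or overlap."""
    playlist = []
    for start, end in sorted(ranges):
        if playlist and start <= playlist[-1][1] + 1:
            # Contiguous with (or overlapping) the previous segment: extend it.
            playlist[-1] = (playlist[-1][0], max(playlist[-1][1], end))
        else:
            # Non-contiguous: start a new segment to be played after the previous one.
            playlist.append((start, end))
    return playlist


# Two contiguous scenes and one non-contiguous scene, given out of order.
print(stitch_scenes([(500, 599), (0, 99), (100, 199)]))  # [(0, 199), (500, 599)]
```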
  • At operation 204, access to the at least one portion of media content is provided via the user device. Following the above example, media server 112 can instruct, e.g., a media player application on user device 102, to present a modified progress bar in which the relevant scenes are highlighted or otherwise indicated. In accordance with still other embodiments, as will be described in greater detail below, other portions of media content (besides the currently-experienced media content) that have some relevance based on the user's vocal input may be returned to the user as search results. That is, the user may engage in voice-based searching in the context of content discovery.
  • FIG. 3 is an example of a graphical user interface (GUI) of a media player application implemented on a smart phone 300, which may be one embodiment of user device 102. As illustrated in FIG. 3, smart phone 300 may include a display 302 on which media content, such as a streamed or downloaded movie file, can be displayed via a media player application. At any point during playback of the streamed movie file, a user can, e.g., swipe or otherwise activate voice command button 304. Upon activating voice command button 304, the user may speak a command requesting a search for one or more portions (e.g., “show me all the action scenes”) or aspects of interest regarding the streamed/downloaded movie file. For example, the user may wish to view a scene in the streamed movie file during which a particular song is played. The user may speak the name of the song, hum or sing lyrics of the song, etc. Smart phone 300 may digitize and process the speech/singing of the user for transmission to media server 112 via network 110. As described above, voice recognition/speech-to-text engine 118 may analyze or translate the speech/singing, and search engine 116 may perform the requisite search. Upon finding one or more matches to the speech/singing, media server 112 may instruct the media player of smart phone 300 to display the scene in which the desired song is played. Alternatively, the media player GUI may present a cursor or other indicator on a progress bar 306 indicating where the user can skip to in order to view the relevant scene.
  • Alternatively still, and as illustrated in FIG. 3, the media player GUI may display a “heat map” on or associated with progress bar 306. This can be useful when, e.g., multiple scenes or portions of media content may potentially be relevant to the user's search. For example, one or more markers 308 a, 308 b, 308 c, etc., may be displayed on progress bar 306. The one or more markers may be distinguished using, e.g., varying degrees of color. The distinguishing colors can be representative of a relevance score (which can be calculated by search engine 116). That is, search engine 116 may complete a search and determine that multiple scenes could potentially meet the search criteria spoken by the user. In such a scenario, search engine 116 may determine, e.g., by the amount of matching metadata or subtitles in a scene, a potential relevance to the search criteria. The user may then touch/interact with the heat map and/or use playback buttons 310 to view the relevant scenes indicated by the heat map. Moreover, and instead of the one or more markers, relevant portions of media content can be identified using, e.g., representative thumbnail images overlaid on progress bar 306.
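  • A relevance score based on the amount of matching metadata or subtitles in a scene might, for instance, be computed as the fraction of search terms found in that scene, as sketched below. The scoring formula, the function names `relevance_score` and `heat_map`, and the sample data are assumptions; the disclosure leaves the actual calculation to search engine 116.

```python
def relevance_score(query_terms, scene_keywords):
    """Fraction of query terms found in a scene's metadata or subtitles (0.0 to 1.0)."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    keywords = {k.lower() for k in scene_keywords}
    return len(terms & keywords) / len(terms)


def heat_map(query_terms, scenes):
    """Return (start_frame, score) markers for scenes with a non-zero score, in timeline order."""
    markers = []
    for start_frame, keywords in scenes:
        score = relevance_score(query_terms, keywords)
        if score > 0:
            markers.append((start_frame, score))
    return sorted(markers)


# Hypothetical scenes: (start frame, metadata/subtitle keywords).
scenes = [
    (900, ["song", "dance"]),
    (100, ["song", "piano", "duet"]),
    (400, ["fight"]),
]
print(heat_map(["song", "duet"], scenes))  # [(100, 1.0), (900, 0.5)]
```

  • Each returned score could then be mapped to a marker color or thumbnail on progress bar 306; that mapping is omitted here.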
  • It should be noted that various embodiments are not limited to a linear single point searching experience, as can be the case with conventional systems and methods of media content interaction. Instead, and as described above, various embodiments can present a user with entire scenes, shots, or portions of media content (whether the media content is a movie, a song, an audio book, or other media type). Moreover, the user can be presented with multiple options for viewing the one or more portions of media content, e.g., selecting where to begin viewing the relevant portions of media content, etc. Moreover, the media server 112 can stitch together derivative media content such as story lines or relevant portions of media content or multiple scenes and provide them to the user device.
  • Further still, the user can search for media content that has not yet been displayed or experienced, which can achieve enhanced methods of content discovery. For example, instead of searching for desired media content using conventional methods of textual-based searching, a user may employ voice-based searching for media content of interest based on a myriad of indicia/metadata such as those described previously.
  • Various GUIs may also be presented to the user through which a voice-based search can be conducted and media content search results can be presented. FIG. 4A illustrates one embodiment of the present disclosure in which a ‘simple’ search GUI may be presented to a user. FIG. 4A illustrates a smart phone 400, which may be one embodiment of user device 102. As illustrated in FIG. 4A, smart phone 400 may include a display 402 on which a voice-based search GUI 404A can be presented. Voice-based GUI 404A can include a scene request prompt mechanism that the user may actuate in order to input one or more keywords or natural language search terms. In response to the input, a search result 406A can be presented to the user. In the case of this particular voice-based GUI 404A, which may be appropriate, e.g., for younger users, a single result can be presented. The single result can be, as previously described, a stitching together of relevant scenes from a single instance of media content.
  • FIG. 4B illustrates another embodiment of a voice-based GUI 404B that can represent a more ‘advanced’ embodiment of voice-based media content searching. As previously described, the search results 406B that can be returned may include various portions of media content that are relevant to the voice-based search. This can include, for example, relevant scenes that include a particular object, a particular character(s), scenes that are relevant from a thematic or plot perspective, as well as additional media content, whether it be derivative content, other or alternative media content, etc.
  • The user interface may be designed to be easy to use and present the found scene(s) in a desirable, unique, and memorable way.
  • It should be noted that the search mechanisms or algorithms utilized in accordance with various embodiments can be configured or adapted as needed or desired. For example, the use of closed captioning or subtitle metadata can be used as an initial search reference to identify potentially relevant portions of media content. Subsequently or in conjunction with such search methods, more refined or complex camera shot or character recognition algorithms or methods can be used to further refine the search to increase the potential relevancy of search results returned to a user.
  • FIG. 5 illustrates an example of a search result GUI in accordance with yet another embodiment of the present disclosure. FIG. 5 illustrates a smart phone 500, which may be one embodiment of user device 102. As illustrated in FIG. 5, smart phone 500 may include a display 502 on which a search results GUI 504 can be presented to a user. Search results GUI 504 can present a ‘most relevant’ search result along with less relevant results that may still be of potential interest to the user. For example, the user may initiate a voice-based search requesting love scenes in a movie between a character and the name of the actress portraying the character's love interest. Search results GUI 504 may therefore display an icon 504A representative of the scene(s) relevant to the voice-based search at the forefront. Additionally, related scenes such as action scenes involving the character and the actress may be presented in the background as another representative icon 504B. Additionally still, related scenes such as scenes involving the character and other characters/actors can also be presented in the background via yet another representative icon 504C. It should be noted that the relevant and related scenes or media content can also be presented using relative sizing of the icons to represent a probability ‘score’ reflective of their relevance relative to the voice-based search and/or ‘most-relevant’ search result(s).
  • The relevance of search results can be based on a plurality of various sources. As alluded to above, the pre-processed metadata can originate from, e.g., third-party sources, such as social media outlets, fan websites, and the like. That is, the relevancy of search results can be based on, e.g., crowd-sourced information or previous actions by the user.
  • A user can limit a voice search for a scene to the user's collection of purchased movies, i.e., a digital library. In conducting a search, media server 112 of FIG. 2 may access the user's personal library of content (movies). The user's collection can include video clips of favorite scenes in movies referred to as ‘snippets.’ Based upon the content of such snippets, media server 112 may skew or customize the search results returned to the user based upon what the user has previously deemed to be of interest, how the user has classified or categorized previously clipped snippets, etc. Examples of snippet technology that various embodiments of the present disclosure can leverage are described in U.S. patent application Ser. No. 14/189,908, which is incorporated herein by reference in its entirety.
  • It should be further noted that in the event that pre-processed metadata does not match, e.g., some voice-based keyword input by the user, the search can still be performed by accessing, e.g., an electronic thesaurus or other third-party information source. For example, a user may request a search for scenes in a movie where an actor experiences a “hiccup.” Unlike a search for “love scenes,” which is likely to have relevant metadata, the term “hiccup” may not. Accordingly, a media server (e.g., media server 112 of FIG. 2) may access the aforementioned third-party information source to determine that hiccup relates to a bodily function. The search may then progress based on a metadata search related to “bodily function.” If such a search fails to produce any results, free-form searching or a “best-guess” search can be performed. In this way, hierarchical searching can be utilized in accordance with various embodiments.
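  • The hierarchical search described above can be sketched as three tiers tried in order. This is a minimal sketch under stated assumptions: the in-memory dictionaries stand in for the pre-processed metadata and the electronic thesaurus, and the word-overlap rule used for the “best-guess” tier is an assumption, since the disclosure does not define how that tier works.

```python
from typing import Dict, List, Set

# Hypothetical pre-processed metadata: scene id -> set of descriptive tags.
SCENE_METADATA: Dict[str, Set[str]] = {
    "scene_01": {"love scene", "beach"},
    "scene_02": {"bodily function", "comedy"},
    "scene_03": {"fight scene"},
}

# Hypothetical stand-in for an electronic thesaurus or third-party source.
THESAURUS: Dict[str, List[str]] = {
    "hiccup": ["bodily function"],
}


def metadata_search(term: str) -> List[str]:
    """Return ids of scenes that carry the term as an exact tag."""
    return sorted(sid for sid, tags in SCENE_METADATA.items() if term in tags)


def best_guess_search(term: str) -> List[str]:
    """Free-form fallback: match any tag sharing a word with the term."""
    words = set(term.split())
    return sorted(
        sid
        for sid, tags in SCENE_METADATA.items()
        if any(words & set(tag.split()) for tag in tags)
    )


def hierarchical_search(term: str) -> List[str]:
    term = term.lower()
    # Tier 1: direct match against pre-processed metadata.
    results = metadata_search(term)
    if results:
        return results
    # Tier 2: map the term to broader concepts and search on those.
    for broader in THESAURUS.get(term, []):
        results = metadata_search(broader)
        if results:
            return results
    # Tier 3: free-form or "best-guess" search.
    return best_guess_search(term)
```

  • With this sample data, a search for “hiccup” finds nothing in tier 1, is mapped to “bodily function” in tier 2, and returns the scene tagged accordingly.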
  • Referring back to FIG. 4B, it should be appreciated that the user may further refine the search results by selecting (e.g., by voice-based input, touch-based input, etc.) a first aspect of the initial search results, drilling down the first aspect, and so on. For example, a voice-based search input may be, “Show me all Disney® movies.” Upon voice-based GUI 404B returning a list of all known Disney® movies, the user may utilize voice-based GUI 404B to then input the following search, “Show me all animated movies.” Further still, the user may again utilize voice-based GUI 404B to initiate yet another narrowing search, “Show me all G-rated movies.” It should be noted that voice-based searching in accordance with various embodiments can also be used to eliminate one or more aspects of media content that a user may wish to exclude from the search results.
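  • The successive narrowing described above amounts to applying one filter after another to the previous result set. The sketch below assumes each voice query has already been converted into a predicate; the `Movie` fields, the placeholder catalog, and the `refine` and `exclude` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Movie:
    title: str
    studio: str
    animated: bool
    rating: str


Predicate = Callable[[Movie], bool]


def refine(results: Iterable[Movie], keep: Predicate) -> List[Movie]:
    """Narrow a result set to the movies that satisfy the predicate."""
    return [m for m in results if keep(m)]


def exclude(results: Iterable[Movie], drop: Predicate) -> List[Movie]:
    """Remove movies that satisfy the predicate from a result set."""
    return [m for m in results if not drop(m)]


# Placeholder catalog; titles and attributes are invented for the example.
CATALOG = [
    Movie("Title A", "Disney", True, "G"),
    Movie("Title B", "Disney", True, "PG"),
    Movie("Title C", "Disney", False, "G"),
    Movie("Title D", "Other", True, "G"),
]

# "Show me all Disney movies." -> "...animated movies." -> "...G-rated movies."
step1 = refine(CATALOG, lambda m: m.studio == "Disney")
step2 = refine(step1, lambda m: m.animated)
step3 = refine(step2, lambda m: m.rating == "G")

# Exclusion works the same way, e.g. dropping PG-rated titles from step1.
no_pg = exclude(step1, lambda m: m.rating == "PG")
```

  • Each step operates on the output of the previous one, so the user's three queries reduce the four-movie catalog to a single title.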
  • Moreover, search options and/or results can be monetized in accordance with various embodiments. For example, simple searching can be provided to a user free of charge. However, should the user wish to perform more comprehensive searching, the user may be required to pay a fee to access such comprehensive search options. Additionally, with respect to search results and in the context of content discovery, a user may perform a voice-based search requesting a certain fight scene of a movie. For a nominal fee (that may be less than the charge for a complete instance of the full media content), the user can receive only the requested fight scene or derivative media content in the form of, e.g., stitched scenes having a common theme or plot perspective from multiple media content instances in accordance with the user's voice-based search request.
  • It should be noted that scene-stitching in accordance with various embodiments need not be limited solely to combining existing portions of media content. That is, various embodiments contemplate ‘creating’ new media content by, e.g., stitching together requested dialog. For example, the user may request media content that includes instances of an actor or character in which particular words or dialogue are stitched together.
  • Furthermore, a user can search for a scene in movies not owned by the user. Upon media server 112 finding the scene, media server 112 may a) show the entire scene, or b) show just a preview (e.g., thumbnail image) of the scene to the user, and thereafter may offer to i) sell the movie to the user, or ii) sell just the scene to the user (e.g., for $1 or $2).
  • A user can limit a search for a scene to a single movie or to multiple movies.
  • A user can select different ways to stitch together non-contiguous scenes, e.g., by story, by timeline, or by relevance; or can “FastPlay” all scenes.
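  • The stitching options named above can be pictured as different sort keys applied to the same set of found scenes. This is a sketch only: the `Scene` fields (including a `story_order` value assumed to be available in the metadata) and the `stitch_order` function are hypothetical, and the “FastPlay” option is not modeled because the text does not describe its behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Scene:
    scene_id: str
    start_seconds: float  # position within the source movie's running time
    story_order: int      # position in the narrative chronology (assumed metadata)
    relevance: float      # search score; higher means more relevant


# One sort key per stitching option.
SORT_KEYS: Dict[str, Callable[[Scene], float]] = {
    "timeline": lambda s: s.start_seconds,
    "story": lambda s: s.story_order,
    "relevance": lambda s: -s.relevance,  # negate so the best match comes first
}


def stitch_order(scenes: Iterable[Scene], option: str) -> List[str]:
    """Return scene ids in the order they would be stitched together."""
    if option not in SORT_KEYS:
        raise ValueError(f"unknown stitching option: {option}")
    return [s.scene_id for s in sorted(scenes, key=SORT_KEYS[option])]


found = [
    Scene("a", start_seconds=600.0, story_order=1, relevance=0.8),
    Scene("b", start_seconds=120.0, story_order=3, relevance=0.9),
    Scene("c", start_seconds=300.0, story_order=2, relevance=0.2),
]
```

  • The same three scenes produce a different playback order for each option, which is the only thing the stitching step would need in order to concatenate the clips.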
  • There may be different ways to search and rank metadata in scenes/frames in a movie in accordance with various embodiments.
  • A user may save a found scene as a favorite, i.e., as a Snippet.
  • It should be noted that although various embodiments presented herein have been described in the context of video/visual-based media content, other embodiments can be adapted for use in other contexts, such as radio broadcasting, podcasting, etc. Moreover, the systems and methods described herein can be adapted for use in allowing users/consumers to purchase/rent or access previously purchased/rented “full access” versions of “limited access” games, applications, and other such content.
  • FIG. 6 illustrates an example computing module that may be used to implement various features of the system and methods disclosed herein.
  • As used herein, the term module might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present application. As used herein, a module might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a module. In implementation, the various modules described herein might be implemented as discrete modules or the functions and features described can be shared in part or in total among one or more modules. In other words, as would be apparent to one of ordinary skill in the art after reading this description, the various features and functionality described herein may be implemented in any given application and can be implemented in one or more separate or shared modules in various combinations and permutations. Even though various features or elements of functionality may be individually described or claimed as separate modules, one of ordinary skill in the art will understand that these features and functionality can be shared among one or more common software and hardware elements, and such description shall not require or imply that separate hardware or software components are used to implement such features or functionality.
  • Where components or modules of the application are implemented in whole or in part using software, in one embodiment, these software elements can be implemented to operate with a computing or processing module capable of carrying out the functionality described with respect thereto. One such example computing module is shown in FIG. 6. Various embodiments are described in terms of this example computing module 600. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the application using other computing modules or architectures.
  • Referring now to FIG. 6, computing module 600 may represent, for example, computing or processing capabilities found within a self-adjusting display, desktop, laptop, notebook, and tablet computers; hand-held computing devices (tablets, PDAs, smart phones, cell phones, palmtops, etc.); workstations or other devices with displays; servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment. For example, computing module 600 may be one embodiment of user device 102, media server 112, and/or one or more functional elements thereof. Computing module 600 might also represent computing capabilities embedded within or otherwise available to a given device. For example, a computing module might be found in other electronic devices such as, for example, navigation systems, portable computing devices, and other electronic devices that might include some form of processing capability.
  • Computing module 600 might include, for example, one or more processors, controllers, control modules, or other processing devices, such as a processor 604. Processor 604 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic. In the illustrated example, processor 604 is connected to a bus 602, although any communication medium can be used to facilitate interaction with other components of computing module 600 or to communicate externally.
  • Computing module 600 might also include one or more memory modules, simply referred to herein as main memory 608. For example, preferably random access memory (RAM) or other dynamic memory might be used for storing information and instructions to be executed by processor 604. Main memory 608 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604. Computing module 600 might likewise include a read only memory (“ROM”) or other static storage device coupled to bus 602 for storing static information and instructions for processor 604.
  • The computing module 600 might also include one or more various forms of information storage mechanism 610, which might include, for example, a media drive 612 and a storage unit interface 620. The media drive 612 might include a drive or other mechanism to support fixed or removable storage media 614. For example, a hard disk drive, a solid state drive, a magnetic tape drive, an optical disk drive, a compact disc (CD) or digital video disc (DVD) drive (R or RW), or other removable or fixed media drive might be provided. Accordingly, storage media 614 might include, for example, a hard disk, an integrated circuit assembly, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 612. As these examples illustrate, the storage media 614 can include a computer usable storage medium having stored therein computer software or data.
  • In alternative embodiments, information storage mechanism 610 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing module 600. Such instrumentalities might include, for example, a fixed or removable storage unit 622 and an interface 620. Examples of such storage units 622 and interfaces 620 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 622 and interfaces 620 that allow software and data to be transferred from the storage unit 622 to computing module 600.
  • Computing module 600 might also include a communications interface 624. Communications interface 624 might be used to allow software and data to be transferred between computing module 600 and external devices. Examples of communications interface 624 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX or other interface), a communications port (such as, for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface. Software and data transferred via communications interface 624 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 624. These signals might be provided to communications interface 624 via a channel 628. This channel 628 might carry signals and might be implemented using a wired or wireless communication medium. Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.
  • In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to transitory or non-transitory media such as, for example, memory 608, storage unit 622, media 614, and channel 628. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing module 600 to perform features or functions of the present application as discussed herein.
  • Although described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the application, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the present application should not be limited by any of the above-described exemplary embodiments.
  • Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms “a” or “an” should be read as meaning “at least one,” “one or more” or the like; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future.
  • The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent. The use of the term “module” does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.
  • Additionally, the various embodiments set forth herein are described in terms of exemplary block diagrams, flow charts and other illustrations. As will become apparent to one of ordinary skill in the art after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated examples. For example, block diagrams and their accompanying description should not be construed as mandating a particular architecture or configuration.

Claims (21)

What is claimed is:
1. A computer-implemented method, comprising:
receiving, from a user device, user input identifying a search criteria;
searching, by operation of one or more computer processors, metadata associated with a plurality of media content files to identify a subset of the plurality of media content files, the subset of the plurality of media content files comprising one or more media content files of the plurality of media content files that includes one or more scenes that match the search criteria; and
providing, to the user device, search results identifying the subset of the plurality of media content files,
wherein the metadata is generated by a respective originator of each media content file of the plurality of media content files and describes each scene of the plurality of media content files.
2. The computer-implemented method of claim 1, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files.
3. The computer-implemented method of claim 1, wherein the metadata comprises audio metadata describing a dialog or a song associated with each scene of the plurality of media content files.
4. The computer-implemented method of claim 1, wherein the metadata comprises subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
5. The computer-implemented method of claim 1, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files, and
wherein the metadata further comprises at least one of:
audio metadata describing a dialog or a song associated with each scene of the plurality of media content files, or
subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
6. The computer-implemented method of claim 1, wherein the user input comprises vocal user input, and wherein the computer-implemented method further comprises:
initiating a speech-to-text recognition process to ascertain the search criterion from the vocal user input.
7. The computer-implemented method of claim 1, further comprising:
generating derivative media content by stitching together the search results into a single media content file based on a stitching option.
8. A non-transitory computer-readable medium containing a program executable to perform an operation comprising:
receiving, from a user device, user input identifying a search criteria;
searching, by one or more computer processors when executing the program, metadata associated with a plurality of media content files to identify a subset of the plurality of media content files, the subset of the plurality of media content files comprising one or more media content files of the plurality of media content files that includes one or more scenes that match the search criteria; and
providing, to the user device, search results identifying the subset of the plurality of media content files,
wherein the metadata is generated by a respective originator of each media content file of the plurality of media content files and describes each scene of the plurality of media content files.
9. The non-transitory computer-readable medium of claim 8, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files.
10. The non-transitory computer-readable medium of claim 8, wherein the metadata comprises audio metadata describing a dialog or a song associated with each scene of the plurality of media content files.
11. The non-transitory computer-readable medium of claim 8, wherein the metadata comprises subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
12. The non-transitory computer-readable medium of claim 8, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files, and
wherein the metadata further comprises at least one of:
audio metadata describing a dialog or a song associated with each scene of the plurality of media content files, or
subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
13. The non-transitory computer-readable medium of claim 8, wherein the user input comprises vocal user input, and wherein the operation further comprises:
initiating a speech-to-text recognition process to ascertain the search criterion from the vocal user input.
14. The non-transitory computer-readable medium of claim 8, wherein the operation further comprises:
generating derivative media content by stitching together the search results into a single media content file based on a stitching option.
15. A system comprising:
one or more computer processors;
a memory containing a program executable by the one or more computer processors to perform an operation comprising:
receiving, from a user device, user input identifying a search criteria;
searching metadata associated with a plurality of media content files to identify a subset of the plurality of media content files, the subset of the plurality of media content files comprising one or more media content files of the plurality of media content files that includes one or more scenes that match the search criteria; and
providing, to the user device, search results identifying the subset of the plurality of media content files,
wherein the metadata is generated by a respective originator of each media content file of the plurality of media content files and describes each scene of the plurality of media content files.
16. The system of claim 15, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files.
17. The system of claim 15, wherein the metadata comprises audio metadata describing a dialog or a song associated with each scene of the plurality of media content files.
18. The system of claim 15, wherein the metadata comprises subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
19. The system of claim 15, wherein the metadata comprises visual metadata describing an actor, an actress, a character, an object, a location, an emotion, an action, a theme, or a plot point associated with each scene of the plurality of media content files, and
wherein the metadata further comprises at least one of:
audio metadata describing a dialog or a song associated with each scene of the plurality of media content files, or
subtitle metadata describing a subtitle associated with each scene of the plurality of media content files.
20. The system of claim 15, wherein the user input comprises vocal user input, and wherein the operation further comprises:
initiating a speech-to-text recognition process to ascertain the search criterion from the vocal user input.
21. The system of claim 15, wherein the operation further comprises:
generating derivative media content by stitching together the search results into a single media content file based on a stitching option.
US17/528,842 2014-10-03 2021-11-17 Voice searching metadata through media content Pending US20220075829A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/528,842 US20220075829A1 (en) 2014-10-03 2021-11-17 Voice searching metadata through media content

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201462059703P 2014-10-03 2014-10-03
US14/568,083 US11182431B2 (en) 2014-10-03 2014-12-11 Voice searching metadata through media content
US17/528,842 US20220075829A1 (en) 2014-10-03 2021-11-17 Voice searching metadata through media content

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/568,083 Continuation US11182431B2 (en) 2014-10-03 2014-12-11 Voice searching metadata through media content

Publications (1)

Publication Number Publication Date
US20220075829A1 true US20220075829A1 (en) 2022-03-10

Family

ID=55633215

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/568,083 Active US11182431B2 (en) 2014-10-03 2014-12-11 Voice searching metadata through media content
US17/528,842 Pending US20220075829A1 (en) 2014-10-03 2021-11-17 Voice searching metadata through media content

Country Status (3)

Country Link
US (2) US11182431B2 (en)
CN (2) CN114996485A (en)
HK (1) HK1223697A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157618B2 (en) 2013-05-02 2018-12-18 Xappmedia, Inc. Device, system, method, and computer-readable medium for providing interactive advertising
US11182431B2 (en) * 2014-10-03 2021-11-23 Disney Enterprises, Inc. Voice searching metadata through media content
US20160180722A1 (en) * 2014-12-22 2016-06-23 Intel Corporation Systems and methods for self-learning, content-aware affect recognition
US9978366B2 (en) 2015-10-09 2018-05-22 Xappmedia, Inc. Event-based speech interactive media player
US20170142549A1 (en) * 2015-11-12 2017-05-18 Trackr, Inc. System and method for tracking items within a defined area
US9990176B1 (en) * 2016-06-28 2018-06-05 Amazon Technologies, Inc. Latency reduction for content playback
US10182114B2 (en) * 2016-07-04 2019-01-15 Novatek Microelectronics Corp. Media content sharing method and server
US10362365B2 (en) * 2016-11-03 2019-07-23 Ravi Guides, Inc. Systems and methods for managing an interactive session in an interactive media guidance application
TWI617197B (en) * 2017-05-26 2018-03-01 和碩聯合科技股份有限公司 Multimedia apparatus and multimedia system
US10970334B2 (en) 2017-07-24 2021-04-06 International Business Machines Corporation Navigating video scenes using cognitive insights
US10902050B2 (en) 2017-09-15 2021-01-26 International Business Machines Corporation Analyzing and weighting media information
US11144584B2 (en) 2017-10-03 2021-10-12 Google Llc Coordination of parallel processing of audio queries across multiple devices
US10777203B1 (en) 2018-03-23 2020-09-15 Amazon Technologies, Inc. Speech interface device with caching component
US10984799B2 (en) 2018-03-23 2021-04-20 Amazon Technologies, Inc. Hybrid speech interface device
US11295783B2 (en) * 2018-04-05 2022-04-05 Tvu Networks Corporation Methods, apparatus, and systems for AI-assisted or automatic video production
US10733984B2 (en) 2018-05-07 2020-08-04 Google Llc Multi-modal interface in a voice-activated network
CN109325097B (en) * 2018-07-13 2022-05-27 海信集团有限公司 Voice guide method and device, electronic equipment and storage medium
JP7409370B2 (en) * 2019-03-27 2024-01-09 ソニーグループ株式会社 Video processing device and video processing method
US11133005B2 (en) * 2019-04-29 2021-09-28 Rovi Guides, Inc. Systems and methods for disambiguating a voice search query
US11689766B2 (en) * 2020-08-28 2023-06-27 Disney Enterprises, Inc. Techniques for personalizing the playback of a media title based on user interactions with an internet of things device
WO2022065537A1 (en) * 2020-09-23 2022-03-31 주식회사 파이프랩스 Video reproduction device for providing subtitle synchronization and method for operating same
US20220345778A1 (en) * 2021-04-26 2022-10-27 At&T Intellectual Property I, L.P. Method and system for enhancing media content consumption experiences

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551997B (en) * 2009-02-25 2012-07-04 北京派瑞根科技开发有限公司 Assisted learning system of music
KR20140089861A (en) * 2013-01-07 2014-07-16 삼성전자주식회사 display apparatus and method for controlling the display apparatus

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030184598A1 (en) * 1997-12-22 2003-10-02 Ricoh Company, Ltd. Television-based visualization and navigation interface
US6397181B1 (en) * 1999-01-27 2002-05-28 Kent Ridge Digital Labs Method and apparatus for voice annotation and retrieval of multimedia data
US20010047377A1 (en) * 2000-02-04 2001-11-29 Sincaglia Nicholas William System for distributed media network and meta data server
US6907397B2 (en) * 2002-09-16 2005-06-14 Matsushita Electric Industrial Co., Ltd. System and method of media file access and retrieval using speech recognition
US7988560B1 (en) * 2005-01-21 2011-08-02 Aol Inc. Providing highlights of players from a fantasy sports team
US20070011133A1 (en) * 2005-06-22 2007-01-11 Sbc Knowledge Ventures, L.P. Voice search engine generating sub-topics based on recognitiion confidence
US20070027844A1 (en) * 2005-07-28 2007-02-01 Microsoft Corporation Navigating recorded multimedia content using keywords or phrases
US20090287650A1 (en) * 2006-06-27 2009-11-19 Lg Electronics Inc. Media file searching based on voice recognition
US8132103B1 (en) * 2006-07-19 2012-03-06 Aol Inc. Audio and/or video scene detection and retrieval
US20130013305A1 (en) * 2006-09-22 2013-01-10 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
US20080154870A1 (en) * 2006-12-26 2008-06-26 Voice Signal Technologies, Inc. Collection and use of side information in voice-mediated mobile search
US20080249770A1 (en) * 2007-01-26 2008-10-09 Samsung Electronics Co., Ltd. Method and apparatus for searching for music based on speech recognition
US20090150159A1 (en) * 2007-12-06 2009-06-11 Sony Ericsson Mobile Communications Ab Voice Searching for Media Files
US20090190899A1 (en) * 2008-01-25 2009-07-30 At&T Labs System and method for digital video retrieval involving speech recognition
US20100010814A1 (en) * 2008-07-08 2010-01-14 International Business Machines Corporation Enhancing media playback with speech recognition
US20100076763A1 (en) * 2008-09-22 2010-03-25 Kabushiki Kaisha Toshiba Voice recognition search apparatus and voice recognition search method
US20100262618A1 (en) * 2009-04-14 2010-10-14 Disney Enterprises, Inc. System and method for real-time media presentation using metadata clips
US20110015932A1 (en) * 2009-07-17 2011-01-20 Su Chen-Wei method for song searching by voice
US20140164371A1 (en) * 2012-12-10 2014-06-12 Rawllin International Inc. Extraction of media portions in association with correlated input
US20140188997A1 (en) * 2012-12-31 2014-07-03 Henry Will Schneiderman Creating and Sharing Inline Media Commentary Within a Network
US20160098998A1 (en) * 2014-10-03 2016-04-07 Disney Enterprises, Inc. Voice searching metadata through media content
US11182431B2 (en) * 2014-10-03 2021-11-23 Disney Enterprises, Inc. Voice searching metadata through media content

Also Published As

Publication number Publication date
CN105488094A (en) 2016-04-13
HK1223697A1 (en) 2017-08-04
CN114996485A (en) 2022-09-02
US20160098998A1 (en) 2016-04-07
US11182431B2 (en) 2021-11-23

Similar Documents

Publication Publication Date Title
US20220075829A1 (en) Voice searching metadata through media content
US10031649B2 (en) Automated content detection, analysis, visual synthesis and repurposing
US8798777B2 (en) System and method for using a list of audio media to create a list of audiovisual media
US9799375B2 (en) Method and device for adjusting playback progress of video file
US8831953B2 (en) Systems and methods for filtering objectionable content
US9213705B1 (en) Presenting content related to primary audio content
US10430024B2 (en) Media item selection using user-specific grammar
US8799300B2 (en) Bookmarking segments of content
US7765245B2 (en) System and methods for enhanced metadata entry
US10560734B2 (en) Video segmentation and searching by segmentation dimensions
US10116981B2 (en) Video management system for generating video segment playlist using enhanced segmented videos
US20180035171A1 (en) System and method for content-based navigation of live and recorded tv and video programs
US8812498B2 (en) Methods and systems for providing podcast content
US20140143218A1 (en) Method for Crowd Sourced Multimedia Captioning for Video Content
US9558784B1 (en) Intelligent video navigation techniques
KR20140139859A (en) Method and apparatus for user interface for multimedia content search
US20220150587A1 (en) Metrics-based timeline of previews
US9564177B1 (en) Intelligent video navigation techniques
US12086503B2 (en) Audio segment recommendation
US9635337B1 (en) Dynamically generated media trailers
US20160212487A1 (en) Method and system for creating seamless narrated videos using real time streaming media
US20150261425A1 (en) Optimized presentation of multimedia content
JP6602423B6 (en) Content providing server, content providing terminal, and content providing method
CN110209870B (en) Music log generation method, device, medium and computing equipment
JP7128222B2 (en) Content editing support method and system based on real-time generation of synthesized sound for video content

Legal Events

Date Code Title Description
AS Assignment Owner name: DISNEY ENTERPRISES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, JING X.;ARANA, MARK;DRAKE, EDWARD;AND OTHERS;SIGNING DATES FROM 20141120 TO 20141211;REEL/FRAME:058141/0838
STPP Information on status: patent application and granting procedure in general Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general Free format text: NON FINAL ACTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP Information on status: patent application and granting procedure in general Free format text: FINAL REJECTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general Free format text: NON FINAL ACTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP Information on status: patent application and granting procedure in general Free format text: FINAL REJECTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER
STPP Information on status: patent application and granting procedure in general Free format text: ADVISORY ACTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general Free format text: NON FINAL ACTION MAILED