US20180160200A1 - Methods and systems for identifying, incorporating, streamlining viewer intent when consuming media - Google Patents
- Publication number
- US20180160200A1 (application US15/786,077)
- Authority
- US
- United States
- Prior art keywords
- video
- viewer
- electronic device
- multimedia
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8545—Content authoring for generating interactive applications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/74—Browsing; Visualisation therefor
- G06F16/745—Browsing; Visualisation therefor the internal structure of a single video sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G06F17/30858—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234336—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/26603—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/27—Server based end-user applications
- H04N21/278—Content descriptor database or directory service for end-user access
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/4508—Management of client data or end-user data
- H04N21/4532—Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4828—End-user interface for program selection for searching program descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8126—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
- H04N21/8133—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
- H04N21/8405—Generation or processing of descriptive data, e.g. content descriptors represented by keywords
Definitions
- FIG. 5 is an example diagram illustrating a method for establishing a video context, according to embodiments as disclosed herein.
- the video context can be established using a matrix of machine learned contextual data models.
- the machine learnt contextual data models can be configured to develop new intelligent data from the extracted video content. As depicted in FIG. 5 , establishing the video context results in building a matrix of data models and generating necessary inputs for the sequence of steps to generate new data that can be subsequently used to build the context of the video.
- the matrix of data models can be built using inputs like: uniquely tagged audio, video and metadata content extracted from the online video using video content extraction and using separately built machine learnt multi domain data corpus.
- the multi domain data corpus represents different fields of information categorized into different domains. The domain categorization helps to impose relevant classification of context to the selected video.
- the matrix represents the arrangement of different types of machine learnt data models carrying the context of the video, which can be used by different methods in their independent capacities to produce outputs in the form of new data.
- This form of data captures the relevant context of the video and can be used to develop data/content elements, which can be used by the viewer to validate the viewing intent.
- the speech-to-text decoder comprises methods which uniquely automate the process of selecting the desired data models from the matrix, such as speech acoustic models, domain language models, a lexicon, and so on. Further, the selected data models can be configured to decode the audio content of the video into a textual summary.
- the textual summary can be fed into a transcript generator to develop it into structured text using different data models from the matrix.
- the structured text represents clean, punctuated data, which is paraphrased using domain-centric data models. This method also uses these data models to identify new contextual titles and subtitles from the structured text, which get developed into an index table for the contents of the video.
- the structured text is taken as an input by a keyword generator, which uses a different contextual data model from the matrix and identifies keywords which represent the context of the video. These keywords are used further by several other methods as a technique to preserve the video context and generate new data elements, which help to extend the context beyond the title of the video. A sketch of the model-selection step is given below.
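The "matrix" of machine-learnt contextual data models is described only abstractly. The sketch below is an assumption of how the selection step might look: a dictionary keyed by domain that names the acoustic model, language model and lexicon the decoder should load; the model identifiers and the decode step are placeholders, not the patent's own components.

```python
# Hypothetical matrix of contextual data models, keyed by domain.
MODEL_MATRIX = {
    "finance":   {"acoustic": "am-general-v1", "language": "lm-finance-v2", "lexicon": "lex-finance"},
    "education": {"acoustic": "am-general-v1", "language": "lm-education-v1", "lexicon": "lex-edu"},
}

def select_models(domain: str) -> dict:
    """Pick the acoustic model, language model and lexicon for a classified domain."""
    return MODEL_MATRIX.get(domain, MODEL_MATRIX["education"])  # fall back to a default

def decode_audio(audio_path: str, domain: str) -> str:
    """Placeholder decode step: a real system would run ASR with the selected models."""
    models = select_models(domain)
    return f"<transcript of {audio_path} decoded with {models['language']}>"

if __name__ == "__main__":
    print(decode_audio("lecture_audio.wav", "finance"))
```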
- FIG. 6 is an example block diagram illustrating a text summarizer, according to embodiments as disclosed herein.
- the text summarizer 102 takes long structured text and video frames as inputs and generates information in a more condensed form, which provides a gist/summary of the video content to the viewer.
- The generated textual summary is presented in both text and video forms.
- This method uses both extractive and abstractive techniques to build the textual summary. While the extractive technique uses text phrases from the original text, the abstractive technique generates new sentences with the help of data models. Sentences generated from both the extractive and abstractive techniques receive scores on the scale of the domain of the video content, and the ranking engine then selects a set of sentences, which are presented as the summary data. Further, the textual summary output is fed into a video frames handler, which aligns video frames with the summary text. A video summary re-ranker considers these aligned video frames and other video frames as inputs, which are classified and tagged using domain intelligence. With these inputs, the video re-ranker generates a new video summary. A simplified sketch of the extractive step follows.
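The extractive half of this summarizer can be illustrated with a classic frequency-based sentence scorer; the scoring scheme below is an assumption rather than the patent's own, and the abstractive pass and video re-ranking, which need trained models, are only noted in comments.

```python
import re
from collections import Counter

def extractive_summary(text: str, max_sentences: int = 3) -> str:
    """Score sentences by the frequency of their (non-trivial) words and keep the top ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude stop-word filter by length

    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # An abstractive pass (generating new sentences) and the video-summary
    # re-ranking described above would require trained sequence models; they
    # are omitted from this sketch.
    return " ".join(s for s in sentences if s in ranked)  # keep original order

if __name__ == "__main__":
    text = ("Net present value discounts future cash flows. "
            "Uneven cash flows are discounted period by period. "
            "The discount rate reflects the cost of capital. "
            "A positive net present value means the project adds value.")
    print(extractive_summary(text, 2))
```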
- FIG. 7 is an example block diagram illustrating a mechanism to search inside a video, according to embodiments as disclosed herein.
- the embodiments herein enable the viewer to find relevant sections inside the video.
- the electronic device 100 can be configured to receive a query phrase as an input in the form of text.
- the text query phrase is fed into a video frame parser and a text parser.
- the video frame parser determines relevant frames from the video which match the description of the query phrase.
- the text parser determines relevant sections inside the videos based on the textual summary.
- Outputs from both parsers are fed into an intelligent response builder.
- the intelligent response builder can be configured to use domain intelligence as one of the criteria to rank video frames and texts to identify relevant sections of the video. Finally, these sections of the video are presented as the final response to the viewer's query. A minimal sketch of the text-side matching appears below.
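The text-parser side of this search can be sketched as a term-overlap score over timestamped transcript sections, with the best-scoring sections returned as candidate answers. The data and scoring below are illustrative; the video-frame parser and the domain-intelligence ranking described above would be combined with this output in a full system.

```python
def search_inside_video(query: str, sections, top_k: int = 3):
    """Rank timestamped transcript sections by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for sec in sections:  # each sec: {"start_s", "end_s", "text"}
        overlap = sum(1 for t in terms if t in sec["text"].lower())
        if overlap:
            scored.append((overlap, sec))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sec for _, sec in scored[:top_k]]

if __name__ == "__main__":
    sections = [
        {"start_s": 0,  "end_s": 40,  "text": "Welcome to business finance."},
        {"start_s": 40, "end_s": 130, "text": "Net present value with uneven cash flows."},
    ]
    for hit in search_inside_video("uneven cash flows", sections):
        print(hit["start_s"], hit["end_s"], hit["text"])
```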
- the embodiments herein provide a mechanism to summarize the multimedia, wherein the multimedia can include, but is not limited to, at least one of text, audio or video.
- the generated multimedia summarization enables the user to identify and categorize the multimedia into different categories, for example, mass media (for example, news or the like), sports content, film content, print matter, educational videos or the like.
- the embodiments herein also provide video thumbnails and provide a mechanism to perform a concept based search (like keyword search).
- the embodiments herein help in generating viewership patterns and associated analyses for internal communications.
- the embodiments herein help in corporate trainings for absorbing training material and passing assessments.
- the embodiments herein facilitate searching within and across all media and generate relationships and links between different categories—cross-search using keywords and key phrases.
- the embodiments herein also help in collation and consolidation of material for easy consumption.
- the embodiments herein help in law enforcement by generating a surveillance summary. Further, the embodiments herein aggregate data using search terms and relationships.
- the embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing functions to control the at least one hardware device.
- the electronic device 100 shown in FIG. 1 includes blocks, which can be at least one of a hardware sub-module, or a combination of hardware sub-modules and software module.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Library & Information Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Software Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Psychiatry (AREA)
- Hospice & Palliative Care (AREA)
- Child & Adolescent Psychology (AREA)
- Computer Security & Cryptography (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments herein provide systems and methods for identifying viewer intent for multimedia on an electronic device, wherein a method includes generating a textual summary from at least one multimedia item. Further, the method includes determining and displaying at least one of one or more keywords and one or more key phrases associated with the textual summary. Further, the method includes generating one or more paragraphs from the extracted textual summary to generate one or more chapters based on the at least one of the one or more keywords and one or more key phrases appearing at a time stamp associated with the textual summary. Further, the method includes generating one or more index tables for the generated one or more chapters to enable a user to search inside the multimedia.
Description
- This application is based on and derives the benefit of Indian Provisional Application 201641041399 filed on Dec. 3, 2016, the contents of which are incorporated herein by reference.
- The embodiments herein relate to electronic devices and, more particularly, to identifying a viewer's intent vis-à-vis multimedia content, both facilitating productive viewing/listening of the content and incorporating that intent.
- Generally, videos are used to share information and messages on the Internet. It is now easy to make videos and capture one's thought process, and it is possible to reach large audiences using video as the medium of communication. This proliferation has resulted in too many videos with duplicate messages on the Internet today. There is no intelligent system that helps to choose the right video to view, and the one that can be understood most effectively. Additionally, there are a few other unsolved problems in the video space today, such as selecting a video that caters to the user's viewing context (what is my situation) and viewing intent (what I want to achieve). The viewer's short attention span means that there should be a way to help the viewer quickly get a gist of the important concepts described in the video. Another challenge is that there is no way to validate whether the viewer understood what is described inside the video. A further challenge is that the viewing context is typically limited to the title of the video. There is currently no way to provide the viewer with additional relevant content that would help to broaden the viewer's perspective on the main topics discussed in the video.
- Too much online video content has made accessing information a stressful effort. Establishing a relevant context in which to watch a video is time-consuming and often leads to surfacing unnecessary information and a waste of time and effort. For example, there are several different online videos available on any given topic, but it is very difficult to analyze and understand them using the visual and spoken context of the video. In this situation, the context can only be established after watching the video, and there is a very high chance that this context is different from what the viewer desires. It is not surprising that content providers find a gap between the expected and actual viewership of their online video content.
- For example, if a viewer wants to watch a video, the viewer has to identify the right video to watch. Assume that the viewer wants to watch a video that discusses how to calculate Net Present Value (NPV) when cash flows are uneven. Thus, how to calculate NPV when cash flows are uneven acts as the viewing intent, and this process is called "setting the viewing intent". Based on this viewing intent, the viewer goes through the process of searching for videos online. This search results in the viewer being presented with several different videos titled Finance and Accounting, Business Finance, Net Present Value, or the like. Now, the viewer can select the video whose title most closely matches the set viewing intent. It is likely that the viewer may start watching the video based on the matching title of Net Present Value. However, the viewer may not be sure whether the selected video actually contains the relevant content that satisfies the viewer intent. If the video does not contain the relevant information, the viewer might continue to watch the selected filtered videos until an exact match for the desired viewing intent is found. The entire process is time consuming and tedious. Sometimes, the viewer can use the manually generated tags associated with the videos to choose a video based on the desired viewing intent. These tags are often inappropriate and/or misleading to the viewer.
- Existing solutions disclose methods that cater to the viewer's context by providing an index of the video content and transcripts of the speech spoken inside the video, but these methods are manual, tedious and too limited to meet the scale of online video content. They are clearly limited by the technical barriers posed by the expectations of such a method, which should automatically create the viewer's context when watching a video and should do so with significantly less time and effort than online video content creation itself.
- The embodiments disclosed herein will be better understood from the following detailed description with reference to the drawings, in which:
-
- FIG. 1 is a block diagram illustrating various units of an electronic device for identifying viewer intent related to a multimedia, according to embodiments as disclosed herein;
- FIG. 2 is a flow diagram illustrating a method for identifying viewer intent related to a multimedia, according to embodiments as disclosed herein;
- FIG. 3 is a block diagram illustrating a method of generating contextual data elements, according to embodiments as disclosed herein;
- FIG. 4 is an example diagram illustrating a method for extracting content from a video, according to embodiments as disclosed herein;
- FIG. 5 is an example diagram illustrating a method for establishing a video context, according to embodiments as disclosed herein;
- FIG. 6 is an example block diagram illustrating a text summarizer, according to embodiments as disclosed herein; and
- FIG. 7 is an example diagram illustrating a mechanism to search inside a video, according to embodiments as disclosed herein.
- The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
- The embodiments herein provide systems and methods for identifying viewer intent in a multimedia, wherein a method includes generating a textual summary from at least one multimedia item. Further, the method includes determining and displaying at least one of one or more keywords and one or more key phrases associated with the textual summary. Further, the method includes generating one or more paragraphs from the extracted textual summary to generate one or more chapters based on the at least one of the one or more keywords and one or more key phrases appearing at a time stamp associated with the textual summary. Further, the method includes generating one or more index tables for the generated one or more chapters to enable a user to search inside the multimedia.
-
- FIG. 1 is a block diagram illustrating various units of an electronic device 100 for identifying viewer intent in a multimedia, according to embodiments as disclosed herein.
- In an embodiment, the electronic device 100 can be at least one of, but not restricted to, a mobile phone, a smartphone, a tablet, a phablet, a personal digital assistant (PDA), a laptop, a computer, a wearable computing device, a smart TV, a wearable device (for example, a smart watch or a smart band), or any other electronic device which has the capability of playing multimedia content or accessing an application (such as a browser) which can access and display multimedia content. The electronic device includes a text summarizer 102, a keyword and key phrase extractor 104, a paragraph generator 106, an index table generator 108, a communication interface unit 110 and a memory 112.
- The text summarizer 102 can be configured to generate a textual summary from at least one multimedia content. This content can be, for example, in the form of video, audio and video, textual content present in video format, animations with audio, text, or the like. The keyword and key phrase extractor 104 can be configured to determine and display at least one of one or more keywords and one or more key phrases associated with the textual summary. The paragraph generator 106 can be configured to generate one or more paragraphs from the extracted textual summary to generate one or more chapters based on at least one of the one or more keywords and one or more key phrases appearing at a time stamp associated with the textual summary. The index table generator 108 can be configured to generate one or more index tables for the generated one or more chapters to enable a viewer to search inside the multimedia; it is also used for creating titles of the said paragraphs and for estimating the duration of the chapters/paragraphs. Further, the method includes receiving an input from the viewer (wherein the input can comprise at least one of a keyword and a key phrase) to play the multimedia associated with the at least one of the keyword and key phrase. Further, the method includes receiving an input from the viewer on the generated at least one table of contents to play a portion of the multimedia of interest. Further, the method includes receiving the viewer search intent as an input to identify matching content in the multimedia corresponding to the viewer search intent. The viewer search intent can be an audio input, a video/image input or a textual input. For example, the viewer search intent can be at least one of a keyword, a key phrase, a sentence, or the like. A minimal sketch of resolving such a viewer input against an index table is given below.
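The paragraph above describes resolving a viewer's keyword, key phrase, or table-of-contents selection to the matching portion of the multimedia. The patent does not give an implementation, so the following is a minimal sketch under stated assumptions: the index table is a flat list of hypothetical entries produced by the index table generator 108, and matching is a simple substring lookup.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexEntry:
    title: str          # chapter title or keyword
    start_s: float      # where the matching content begins in the video
    duration_s: float   # estimated chapter duration

# Hypothetical index table; in the described system it would be produced
# by the index table generator 108 from the textual summary.
INDEX = [
    IndexEntry("Net Present Value", 12.0, 95.0),
    IndexEntry("uneven cash flows", 107.0, 240.0),
]

def resolve_viewer_input(query: str, index=INDEX) -> Optional[float]:
    """Return a seek position (seconds) for a viewer keyword or TOC selection."""
    q = query.lower()
    for entry in index:
        if q in entry.title.lower():
            return entry.start_s
    return None  # no matching chapter; the viewer may pick another video

if __name__ == "__main__":
    print(resolve_viewer_input("net present value"))  # -> 12.0
```

In the described system the returned start time would be handed to the player so it can seek to the matching chapter.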
- The communication interface unit 110 can be configured to establish communication between the electronic device 100 and a network.
- The memory 112 can be configured to store the multimedia, the textual summary, keywords and key phrases, and the paragraphs and chapters generated from the respective multimedia. The memory 112 may include one or more computer-readable storage media. The memory 112 may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the memory 112 may, in some examples, be considered a non-transitory storage medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term "non-transitory" should not be interpreted to mean that the memory 112 is non-movable. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).
- FIG. 1 shows exemplary units of the electronic device 100, but it is to be understood that other embodiments are not limited thereto. In other embodiments, the electronic device 100 may include a smaller or larger number of units. Further, the labels or names of the units are used only for illustrative purposes and do not limit the scope of the embodiments herein. One or more units can be combined to perform the same or a substantially similar function in the electronic device 100.
- The embodiments herein provide an electronic device 100 configured to use a combination of image processing, speech recognition, natural language processing, machine learning and neural networks to determine what is inside the multimedia (for example, a video) by generating a summary of the video using a combination of video frame analysis and text summarization, wherein the text summarization includes extractive and abstractive summaries. Further, the electronic device 100 can be configured to use recurrent neural network techniques and domain classification on the text summarization to extract relevant keywords, key phrases and a table of contents. Further, the electronic device 100 can be configured to generate pointers to additional information in the form of text and visual content using a combination of the generated relevant keywords, key phrases and table of contents and the most recently captured internet browsing preferences of the viewer.
- The embodiments herein extend video viewing beyond the title of the video. This provides the viewer a way to know what is inside the video prior to watching it. Reading the summary of the video content enables the viewer to estimate how likely it is that the viewer's intent of viewing will be covered by the video. The embodiments herein also enable the viewer to pursue the intent of "information gathering" by searching inside the video. Thus, the embodiments herein extend beyond the title of the video, which helps the viewer to build and match the relevant intent of information gathering, resulting in a more enhanced and engaged experience. This can be implemented in various domains such as online education, video surveillance, online advertising, retail, product introduction, skill development programs, the judiciary, banking, the broadcasting industry, or the like.
- The embodiments herein disclose methods and systems for identifying viewer intent in a multimedia of an electronic device. Referring now to the drawings, and more particularly to
FIGS. 1 through 8 , where similar reference characters denote corresponding features consistently throughout the figures, there are shown embodiments. -
FIG. 2 is a flow diagram illustrating a method for identifying viewer intent in a multimedia, according to embodiments as disclosed herein. - At
step 202, the method includes generating a textual summary from at least one multimedia content. The method allows thetext summarizer 102 to generate the textual summary from at least one multimedia content. - At
step 204, the method includes determining and displaying at least one of at least one keyword and at least one key phrase associated with the textual summary. The method allows the keyword andkey phase extractor 104 to determining and displaying at least one of at least one keyword and at least one key phrase associated with the textual summary. - At
step 206, the method includes generating at least one paragraph from the extracted textual summary to generate at least one chapter based on the at least one of at least one keyword and at least one key phrase appeared in a time stamp associated with the textual summary. The method allows theparagraph generator 106 to generate at least one paragraph from the extracted textual summary to generate at least one chapter based on the at least one of at least one keyword and at least one key phrase appeared in a time stamp associated with the textual summary. - At
step 208, the method includes generating at least one index table for the generated at least one chapter to enable a user to search inside the multimedia. The method allows theIndex table generator 108 to generate at least one index table for the generated at least one chapter to enable a user to search inside the multimedia - The various actions, acts, blocks, steps, or the like in the method and the flow diagram 200 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some of the actions, acts, blocks, steps, or the like may be omitted, added, modified, skipped, or the like without departing from the scope of the invention.
- The embodiments herein can establish a contextual relation between the content of the multimedia and a learning intent of the viewer/user. The contextual relation can be established with the help of the textual summary, the keyword(s) and key phrase(s), one or more paragraphs generated from the extracted textual summary, one or more chapters generated based on the keyword(s) and key phrase(s) appearing at a time stamp associated with the textual summary, and the generated index table(s).
-
- FIG. 3 is a diagram illustrating a method of generating contextual data elements, according to embodiments as disclosed herein.
- The multimedia may have audio and/or video; the audio/speech content and the visual content are extracted. Further, frames of the visual content are extracted and classified for further processing of the images comprising the frames and for identifying the context. The extracted/separated audio/speech content is analyzed and converted to a suitable textual summary/transcript through an automatic speech recognition (ASR) engine. Necessary and pertinent post-processing is then performed on the extracted transcript to clean up errors and make other corrections. This transcript is automatically punctuated appropriately and processed to identify keywords which represent the context of the matter presented/spoken in the audio content. These elements, viz., the classification of the video and its frames, the identified keywords, the textual summary, the gathering and collating of information relevant to the content being processed with the help of the keywords, along with the interface for searching the video for the generated keywords and presenting the search results in the context of the video, comprise the overall information incorporated as a viewing/visual element in the context of the original multimedia. A simplified sketch of the transcription and clean-up steps is given below.
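The FIG. 3 flow (separate the speech, run it through an ASR engine, then clean and punctuate the transcript) can be sketched as follows. The ASR call is a placeholder, since any engine returning timestamped text could be substituted, and the clean-up shown is deliberately simple: filler-word removal, whitespace normalisation and crude end-of-segment punctuation.

```python
import re

def asr_transcribe(audio_path):
    """Placeholder for an ASR engine; assumed to return timestamped raw text segments."""
    return [
        {"start_s": 0.0, "text": "uh  welcome to  business finance"},
        {"start_s": 4.2, "text": "today we um look at net present value"},
    ]

FILLERS = re.compile(r"\b(uh|um|er|you know)\b", re.IGNORECASE)

def clean_segment(text: str) -> str:
    text = FILLERS.sub("", text)              # drop common filler words
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace left behind
    return text.capitalize() + "."            # crude automatic punctuation

def build_transcript(audio_path):
    return [{"start_s": seg["start_s"], "text": clean_segment(seg["text"])}
            for seg in asr_transcribe(audio_path)]

if __name__ == "__main__":
    for seg in build_transcript("lecture_audio.wav"):
        print(seg)
```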
- The contextual data elements are information entities generated from the multimedia. The data elements represent the context of the multimedia and enable a viewer to match and validate the intent of viewing against the content of that video. The contextual data elements help the viewer to identify the viewing intent beyond the title of the video. The embodiments herein enable the electronic device 100 to generate the following contextual data elements:
electronic device 100 to generate the following contextual data elements: - Video Summary:
- The video summary is an automatic summarization of a video's audio and/or visual content into a textual summary, which represents the most relevant context of the entire video in a condensed form. The video summary is quick to read and view. The video summary helps the viewer to get accurate information about what has been spoken or visualized inside the video. A viewer can use the video summary to validate the viewer's intent in viewing the video. If there is a match between the content and the viewer's intent, the viewer can continue to watch the video; otherwise the viewer can decide to select some other video. This data element allows the viewer to spend his or her time much more productively while searching for relevant content to view.
- Search Inside the Video:
- The embodiments herein provide a contextual data element that uniquely enables the viewer's ability to search inside the video and find a more relevant match to his or her intent in watching the video. This enables the user to find out what is inside the video and where in the video the content related to the viewer's intent occurs. This answers the important question of whether the video has content that matches the viewer intent.
- Retrieving Other Relevant Information:
- Information retrieval data elements are used to enhance the effectiveness of information gathering for the viewer by seamlessly enabling access to other related information. The related information is assembled using a combination of Natural Language Processing (NLP) and Machine Learning (ML) techniques. These techniques are applied to a combination of inputs that are in the context of the video, together with an analysis of recently captured online behavior traits of the viewer. The technical methods used to generate this data element present additional information in the form of relevant text, audio, and video sources. Presenting this data element within the video significantly supports the viewer's intent to look for related information and learn more, thereby broadening the task of information gathering.
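As a toy illustration of assembling such related information, the sketch below ranks a pool of candidate items purely by keyword overlap (Jaccard similarity) with the video's contextual keywords; the NLP/ML techniques and viewer-behavior signals described above are not modelled, and all names and data are made up.

```python
# Rank candidate related-information items by keyword overlap with the video.
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0


def rank_related(video_keywords: Set[str],
                 candidates: Dict[str, Set[str]],
                 top_k: int = 5) -> List[str]:
    """Return candidate titles ordered by keyword similarity to the video."""
    scored = sorted(candidates.items(),
                    key=lambda item: jaccard(video_keywords, item[1]),
                    reverse=True)
    return [title for title, _ in scored[:top_k]]


# Example with made-up data:
related = rank_related(
    {"neural", "network", "training"},
    {"Intro to backpropagation": {"neural", "network", "gradient"},
     "Cooking pasta at home": {"pasta", "boil", "sauce"}})
```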
- Contextual Keywords:
- The contextual data elements help to automate the creation of the right descriptions and attributes. Additionally, the contextual keywords data elements help to accurately classify the context of the information being searched for. Using these keywords, the viewer can establish a match between his or her viewing intent and the content of the video. Extraction of the contextual keywords data element uses a combination of natural language processing, machine learning, and neural network techniques to establish the relevance of the keywords for a given context of the video. An accurate representation of the context of the video with relevant keywords greatly improves the searchability of the video itself.
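One simple way such keyword relevance could be scored is sketched below using plain TF-IDF against a small domain corpus; the machine learning and neural network components mentioned above are not reproduced, and the helper names are illustrative assumptions.

```python
# Score transcript terms by TF-IDF against a domain corpus and keep the top ones.
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return [t.strip(".,!?;:").lower() for t in text.split() if t.strip(".,!?;:")]


def contextual_keywords(transcript: str,
                        domain_docs: List[str],
                        top_k: int = 10) -> List[str]:
    tf = Counter(tokenize(transcript))
    total_terms = sum(tf.values())
    n_docs = len(domain_docs) + 1
    scores: Dict[str, float] = {}
    for term, count in tf.items():
        doc_freq = 1 + sum(term in tokenize(doc) for doc in domain_docs)
        scores[term] = (count / total_terms) * math.log(n_docs / doc_freq)
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```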
- Automatic Generation of Table of Contents:
- A table of the video's contents helps the viewer to navigate through the video in a structured manner. Embodiments herein disclose methods to generate this index automatically from the content of the video, synced with the occurrence of that content on the time scale of the video.
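The sketch below illustrates the idea in its simplest form: timestamped transcript segments are scanned for the first occurrence of each contextual keyword, and each hit becomes a table-of-contents entry synced to the video time scale. The Segment structure and the keyword-as-title rule are illustrative assumptions, not the disclosed domain-model-driven titling.

```python
# Build a time-synced table of contents from timestamped transcript segments.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start_sec: float
    text: str


def build_toc(segments: List[Segment], keywords: List[str]) -> List[Dict]:
    toc, seen = [], set()
    for seg in segments:
        for kw in keywords:
            if kw.lower() in seg.text.lower() and kw not in seen:
                seen.add(kw)
                toc.append({"title": kw.title(), "start_sec": seg.start_sec})
    return toc


# Example:
# build_toc([Segment(0.0, "intro"), Segment(42.5, "now the gradient descent step")],
#           ["gradient descent"])
# -> [{"title": "Gradient Descent", "start_sec": 42.5}]
```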
-
FIG. 4 is an example diagram illustrating a method for extracting content from a video, according to embodiments as disclosed herein. - The first step of extracting video content involves crawling the online video and related information. The end result of the crawling can include a multi-domain data corpus. The data corpus contains documents, metadata, and tagged audio and video content classified into different domains.
- The content extraction from the video includes data extraction, noise elimination, and data classification, wherein the data extraction includes extracting content from the selected online video. The content extraction engine extracts the content from the video, wherein the extracted content includes video frames, audio streams, and video metadata. The content extraction also includes converting the extracted data into appropriate formats. For instance, audio speech data is converted into a single-channel, 16-bit, 16 kHz format.
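As an example of that formatting step, the following sketch converts an extracted audio stream to a single-channel, 16-bit, 16 kHz WAV file, assuming the ffmpeg command-line tool is available; the disclosure does not prescribe any particular tool.

```python
# Normalise extracted speech to mono, 16-bit PCM, 16 kHz for ASR processing.
import subprocess


def to_asr_format(video_path: str, wav_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",                   # drop the video stream
         "-ac", "1",              # single channel
         "-ar", "16000",          # 16 kHz sample rate
         "-acodec", "pcm_s16le",  # 16-bit PCM samples
         wav_path],
        check=True)
```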
- Further, the content extraction includes noise elimination. Once the extracted content is converted to the appropriate format, the noise elimination unit filters out irrelevant data or data that cannot be utilized for further processing. For instance, identifying audio speech with multiple high and low frequencies compressed at very low bit rates constitutes an input to the elimination phase.
- Further, the content extraction includes data classification. Once the irrelevant data has been filtered from the extracted content, the data classification unit can be configured to classify the extracted data into different categories. For instance, the data classification unit extracts the video frames and groups them into different categories based on the images, changes in scenes, identification of text, or the like. Further, the audio stream is categorized based on the spoken accent, dialect, and gender. Appropriate categorization is applied to the metadata of the video as well. Together, these categories highlight features that are uniquely tagged to help generate a useful context for the video during data processing.
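A simplified illustration of grouping frames by scene change is sketched below, assuming OpenCV is installed: a new group starts whenever the grey-level histogram correlation between consecutive frames drops below a threshold. Accent, dialect, and gender categorization of the audio stream are outside this sketch.

```python
# Group frame indices into scenes using grey-level histogram correlation.
import cv2


def group_frames_by_scene(video_path: str, threshold: float = 0.7):
    cap = cv2.VideoCapture(video_path)
    groups, current, prev_hist, idx = [], [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.normalize(cv2.calcHist([grey], [0], None, [64], [0, 256]), None)
        if prev_hist is not None and cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            groups.append(current)   # histogram changed sharply: start a new scene
            current = []
        current.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    if current:
        groups.append(current)
    return groups
```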
-
FIG. 5 is an example diagram illustrating a method for establishing a video context, according to embodiments as disclosed herein. - The video context can be established using a matrix of machine learned contextual data models. The machine learnt contextual data models can be configured to develop new intelligent data from the extracted video content. As depicted in
FIG. 5, establishing the video context results in building a matrix of data models and generating the necessary inputs for the sequence of steps that produce new data, which can subsequently be used to build the context of the video. - The matrix of data models can be built using inputs such as the uniquely tagged audio, video, and metadata content extracted from the online video through video content extraction, together with a separately built, machine-learnt, multi-domain data corpus. The multi-domain data corpus represents different fields of information categorized into different domains. The domain categorization helps to impose a relevant classification of context on the selected video.
- The matrix represents the arrangement of different types of machine-learnt data models carrying the context of the video, which can be used by different methods in their independent capacities to produce outputs in the form of new data. This data captures the relevant context of the video and can be used to develop data/content elements, which can be used by the viewer to validate the viewing intent.
- The speech-to-text decoder comprises methods that automate the process of selecting the desired data models from the matrix, such as speech acoustic models, domain language models, lexicons, and so on. Further, the selected data models can be configured to decode the audio content of the video into a textual summary.
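One way to picture this model-selection step is the sketch below, where the matrix is treated as a lookup keyed by (domain, model type) and the decoder pulls the acoustic model, language model, and lexicon for the detected domain; the keying scheme and the stand-in model objects are illustrative assumptions.

```python
# Select decoding models from a (domain, model_type)-keyed "matrix".
from typing import Any, Dict, Tuple

ModelMatrix = Dict[Tuple[str, str], Any]  # (domain, model_type) -> model object


def select_decoding_models(matrix: ModelMatrix, domain: str) -> Dict[str, Any]:
    """Pick the models needed to decode audio for a given domain."""
    return {model_type: matrix[(domain, model_type)]
            for model_type in ("acoustic", "language", "lexicon")
            if (domain, model_type) in matrix}


# Example usage with stand-in model objects:
matrix = {("medical", "acoustic"): "am-med",
          ("medical", "language"): "lm-med",
          ("medical", "lexicon"): "lex-med"}
models = select_decoding_models(matrix, "medical")
```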
- Further, the textual summary can be fed into a transcript generator to develop the textual summary into structured text using different data models from the matrix. The structured text represents clean, punctuated data that is paraphrased using domain-centric data models. This method also uses these data models to identify new contextual titles and subtitles from the structured text, which are developed into an index table for the contents of the video.
- The structured text is taken as an input by a keyword generator, which uses a different contextual data model from the matrix and identifies keywords that represent the context of the video. These keywords are further used by several other methods as a technique to preserve the video context and generate new data elements, which help to extend the context beyond the title of the video.
-
FIG. 6 is an example block diagram illustrating a text summarizer, according to embodiments as disclosed herein. - The
text summarizer 102 takes long structured text and video frames as inputs and generates information in a more condensed form, which provides a gist/summary of the video content to the viewer. The generated textual summary is presented in both text and video forms. This method uses both extractive and abstractive techniques to build the textual summary. While the extractive technique uses the text phrases from the original text, the abstractive technique generates new sentences with the help of data models. Sentences generated by both the extractive and abstractive techniques receive scores on the scale of the domain of the video content, and the ranking engine then selects a set of sentences, which is presented as the summary data. Further, the textual summary output is fed into a video frames handler, which aligns video frames with the summary text. A video summary re-ranker considers these aligned video frames and other video frames as inputs, which are classified and tagged using domain intelligence. With these inputs, the video re-ranker generates a new video summary. -
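The extractive half of this flow can be illustrated with the short sketch below, which scores sentences by how many domain keywords they contain and keeps the top-ranked ones in their original order; the abstractive generation, domain-scale scoring, and video-frame re-ranking are not reproduced here.

```python
# Pick the highest-scoring sentences as an extractive summary, preserving order.
from typing import List


def extractive_summary(text: str, keywords: List[str], max_sentences: int = 3) -> str:
    sentences = [s.strip()
                 for s in text.replace("?", ".").replace("!", ".").split(".")
                 if s.strip()]
    scored = [(sum(kw.lower() in s.lower() for kw in keywords), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda t: -t[0])[:max_sentences]
    ordered = sorted(top, key=lambda t: t[1])       # restore original sentence order
    return ". ".join(s for _, _, s in ordered) + "."
```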
FIG. 7 is an example block diagram illustrating a mechanism to search inside a video, according to embodiments as disclosed herein. - The embodiments herein enable the viewer to find relevant sections inside the video. The
electronic device 100 can be configured to receive a query phrase as an input in the form of text. The text query phrase is fed into a video frame parser and a text parser. The video frame parser determines relevant frames from the video that match the description of the query phrase. Similarly, the text parser determines relevant sections inside the video based on the textual summary. Outputs from both parsers are fed into an intelligent response builder. The intelligent response builder can be configured to use domain intelligence as one of the criteria to rank video frames and texts and to identify relevant sections of the video. Finally, these sections of the video are presented as a final response to the query from the viewer. - The embodiments herein provide a mechanism to summarize the multimedia, wherein the multimedia can include, but is not limited to, at least one of text, audio, or video. The generated multimedia summarization enables the user to identify and categorize the multimedia into different categories, for example, mass media (for example, news or the like), sports content, film content, print matter, educational videos, or the like. The embodiments herein also provide video thumbnails and provide a mechanism to perform a concept-based search (like a keyword search).
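Returning to the search mechanism of FIG. 7, the text-parser side can be pictured with the sketch below, which matches a query phrase against timestamped transcript segments and returns the best-matching sections as jump points; the video frame parser and the domain-aware intelligent response builder are not modelled, and the segment format is an assumption.

```python
# Return the transcript segments that best match the query phrase, as jump points.
from typing import List, Tuple


def search_inside_video(query: str,
                        segments: List[Tuple[float, str]],   # (start_sec, text)
                        top_k: int = 3) -> List[Tuple[float, str]]:
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(text.lower().split())), start, text)
              for start, text in segments]
    hits = [(start, text)
            for score, start, text in sorted(scored, key=lambda t: -t[0])
            if score > 0]
    return hits[:top_k]


# Example:
# search_inside_video("gradient descent",
#     [(0.0, "welcome to the course"), (95.0, "now we derive gradient descent")])
# -> [(95.0, "now we derive gradient descent")]
```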
- The embodiments herein help in generating viewership patterns and associated analyses for internal communications. The embodiments herein also help in corporate training, for absorbing training material and passing assessments.
- The embodiments herein facilitate searching within and across all media and generate relationships and links between different categories, enabling cross-search using keywords and key phrases. The embodiments herein also help in the collation and consolidation of material for easy consumption.
- The embodiments herein help in law enforcement by generating a surveillance summary. Further, the embodiments herein aggregate data using search terms and relationships.
- The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing functions to control the at least one hardware device. The
electronic device 100 shown in FIG. 1 includes blocks, which can be at least one of a hardware sub-module or a combination of hardware sub-modules and software modules. - The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiments for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of embodiments and examples, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the claims as described herein.
Claims (10)
1. A method for identifying a viewer intent for a multimedia of an electronic device (100), the method comprising:
generating, by the electronic device (100), a textual summary from at least one multimedia;
determining and displaying, by the electronic device (100), at least one of at least one keyword and at least one key phrase associated with the textual summary, wherein the at least one of at least one keyword and at least one key phrase is contextually associated with content of the multimedia;
generating, by the electronic device (100), at least one paragraph from the extracted textual summary to generate at least one chapter based on the at least one of at least one keyword and at least one key phrase appearing at a time stamp associated with the textual summary; and
generating, by the electronic device (100), at least one index table for the generated at least one chapter.
2. The method of claim 1 , wherein the method further comprises receiving, by the electronic device (100), an input from a viewer on the displayed at least one of at least one keyword and key phrase to play the multimedia associated with the at least one of at least one keyword and key phrase.
3. The method of claim 1, wherein the method further comprises receiving, by the electronic device (100), an input from the viewer on the generated at least one index table to play an interested portion of the multimedia.
4. The method of claim 1 , wherein the generated textual summary is at least one of an extractive summary and an abstractive summary.
5. The method of claim 1 , wherein the method further includes receiving, by the electronic device (100), a viewer search intent as an input to identify a matching content in the multimedia corresponding to the viewer search intent.
6. An electronic device (100) for identifying a viewer intent for a multimedia, the electronic device (100) comprising:
a text summarizer (102) configured to generate a textual summary from at least one multimedia;
a keyword and key phrase extractor (104) configured to determine and display at least one of at least one keyword and at least one key phrase associated with the textual summary, wherein the at least one of at least one keyword and at least one key phrase are contextually associated with content of the multimedia;
a paragraph generator (106) configured to generate at least one paragraph from the extracted textual summary to generate at least one chapter based on the at least one of at least one keyword and at least one key phrase appearing at a time stamp associated with the textual summary; and
an index table generator (108) configured to generate at least one index table for the generated at least one chapter.
7. The electronic device (100) of claim 6 , wherein the electronic device (100) is further configured to receive an input from a viewer on the displayed at least one of at least one keyword and key phrase to play the multimedia associated with the at least one of at least one keyword and key phrase.
8. The electronic device (100) of claim 6 , wherein the electronic device (100) is further configured to receive an input from the viewer on the generated at least one index table to play an interested portion of the multimedia.
9. The electronic device (100) of claim 6 , wherein the generated textual summary is at least one of an extractive summary and an abstractive summary.
10. The electronic device (100) of claim 6 , wherein the electronic device (100) is further configured to receive a viewer search intent as an input to identify a matching content in the multimedia corresponding to the viewer search intent.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/422,658 US10911840B2 (en) | 2016-12-03 | 2019-05-24 | Methods and systems for generating contextual data elements for effective consumption of multimedia |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN201641041399 | 2016-12-03 | ||
| IN201641041399 | 2016-12-03 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/422,658 Continuation-In-Part US10911840B2 (en) | 2016-12-03 | 2019-05-24 | Methods and systems for generating contextual data elements for effective consumption of multimedia |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20180160200A1 true US20180160200A1 (en) | 2018-06-07 |
Family
ID=62243641
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/786,077 Abandoned US20180160200A1 (en) | 2016-12-03 | 2017-10-17 | Methods and systems for identifying, incorporating, streamlining viewer intent when consuming media |
| US16/422,658 Active 2038-01-08 US10911840B2 (en) | 2016-12-03 | 2019-05-24 | Methods and systems for generating contextual data elements for effective consumption of multimedia |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/422,658 Active 2038-01-08 US10911840B2 (en) | 2016-12-03 | 2019-05-24 | Methods and systems for generating contextual data elements for effective consumption of multimedia |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US20180160200A1 (en) |
Cited By (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108873829A (en) * | 2018-05-28 | 2018-11-23 | 上海新增鼎数据科技有限公司 | A kind of phosphoric acid production parameter control method promoting decision tree based on gradient |
| US20190258704A1 (en) * | 2018-02-20 | 2019-08-22 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US20190362020A1 (en) * | 2018-05-22 | 2019-11-28 | Salesforce.Com, Inc. | Abstraction of text summarizaton |
| CN110516030A (en) * | 2019-08-26 | 2019-11-29 | 北京百度网讯科技有限公司 | Determination method, apparatus, device and computer-readable storage medium of intent word |
| CN110598651A (en) * | 2019-09-17 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Information processing method, device and storage medium |
| US10558761B2 (en) * | 2018-07-05 | 2020-02-11 | Disney Enterprises, Inc. | Alignment of video and textual sequences for metadata analysis |
| US10657954B2 (en) | 2018-02-20 | 2020-05-19 | Dropbox, Inc. | Meeting audio capture and transcription in a collaborative document context |
| US20200175972A1 (en) * | 2018-11-29 | 2020-06-04 | International Business Machines Corporation | Voice message categorization and tagging |
| US20200221190A1 (en) * | 2019-01-07 | 2020-07-09 | Microsoft Technology Licensing, Llc | Techniques for associating interaction data with video content |
| US10869091B1 (en) * | 2019-08-06 | 2020-12-15 | Amazon Technologies, Inc. | Latency detection for streaming video |
| JP2021103575A (en) * | 2020-04-10 | 2021-07-15 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Content search method and device for movie and tv drama |
| US11227593B2 (en) * | 2019-06-28 | 2022-01-18 | Rovi Guides, Inc. | Systems and methods for disambiguating a voice search query based on gestures |
| US11288454B2 (en) * | 2018-05-15 | 2022-03-29 | Beijing Sankuai Online Technology Co., Ltd | Article generation |
| US20220129502A1 (en) * | 2020-10-26 | 2022-04-28 | Dell Products L.P. | Method and system for performing a compliance operation on video data using a data processing unit |
| US20220180869A1 (en) * | 2017-11-09 | 2022-06-09 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning |
| US11361759B2 (en) * | 2019-11-18 | 2022-06-14 | Streamingo Solutions Private Limited | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media |
| US20220237375A1 (en) * | 2021-01-25 | 2022-07-28 | Kyndryl, Inc. | Effective text parsing using machine learning |
| US11409791B2 (en) | 2016-06-10 | 2022-08-09 | Disney Enterprises, Inc. | Joint heterogeneous language-vision embeddings for video tagging and search |
| US11488602B2 (en) | 2018-02-20 | 2022-11-01 | Dropbox, Inc. | Meeting transcription using custom lexicons based on document history |
| US11514949B2 (en) | 2020-10-26 | 2022-11-29 | Dell Products L.P. | Method and system for long term stitching of video data using a data processing unit |
| US11675827B2 (en) | 2019-07-14 | 2023-06-13 | Alibaba Group Holding Limited | Multimedia file categorizing, information processing, and model training method, system, and device |
| US11689379B2 (en) | 2019-06-24 | 2023-06-27 | Dropbox, Inc. | Generating customized meeting insights based on user interactions and meeting media |
| CN117556084A (en) * | 2023-12-27 | 2024-02-13 | 环球数科集团有限公司 | Video emotion analysis system based on multiple modes |
| US11916908B2 (en) | 2020-10-26 | 2024-02-27 | Dell Products L.P. | Method and system for performing an authentication and authorization operation on video data using a data processing unit |
Families Citing this family (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10818033B2 (en) * | 2018-01-18 | 2020-10-27 | Oath Inc. | Computer vision on broadcast video |
| KR102660124B1 (en) * | 2018-03-08 | 2024-04-23 | 한국전자통신연구원 | Method for generating data for learning emotion in video, method for determining emotion in video, and apparatus using the methods |
| US11392791B2 (en) * | 2018-08-31 | 2022-07-19 | Writer, Inc. | Generating training data for natural language processing |
| US11468071B2 (en) * | 2018-11-30 | 2022-10-11 | Rovi Guides, Inc. | Voice query refinement to embed context in a voice query |
| KR102592833B1 (en) * | 2018-12-14 | 2023-10-23 | 현대자동차주식회사 | Control system and method of interlocking control system of voice recognition function of vehicle |
| KR102345625B1 (en) * | 2019-02-01 | 2021-12-31 | 삼성전자주식회사 | Caption generation method and apparatus for performing the same |
| US11800202B2 (en) * | 2019-09-10 | 2023-10-24 | Dish Network L.L.C. | Systems and methods for generating supplemental content for a program content stream |
| US11263407B1 (en) | 2020-09-01 | 2022-03-01 | Rammer Technologies, Inc. | Determining topics and action items from conversations |
| US11093718B1 (en) * | 2020-12-01 | 2021-08-17 | Rammer Technologies, Inc. | Determining conversational structure from speech |
| GB202104299D0 (en) * | 2021-03-26 | 2021-05-12 | Polkinghorne Ben | Video content item selection |
| US12062367B1 (en) * | 2021-06-28 | 2024-08-13 | Amazon Technologies, Inc. | Machine learning techniques for processing video streams using metadata graph traversal |
| CN113269173B (en) * | 2021-07-20 | 2021-10-22 | 佛山市墨纳森智能科技有限公司 | Method and device for establishing emotion recognition model and recognizing human emotion |
| US20230359670A1 (en) * | 2021-08-31 | 2023-11-09 | Jio Platforms Limited | System and method facilitating a multi mode bot capability in a single experience |
| JP2023043782A (en) * | 2021-09-16 | 2023-03-29 | ヤフー株式会社 | Information processing device, information processing method and information processing program |
| US12142273B2 (en) * | 2021-11-09 | 2024-11-12 | Honda Motor Co., Ltd. | Creation of notes for items of interest mentioned in audio content |
| US11302314B1 (en) | 2021-11-10 | 2022-04-12 | Rammer Technologies, Inc. | Tracking specialized concepts, topics, and activities in conversations |
| US11599713B1 (en) | 2022-07-26 | 2023-03-07 | Rammer Technologies, Inc. | Summarizing conversational speech |
| US12321701B2 (en) * | 2022-11-04 | 2025-06-03 | Microsoft Technology Licensing, Llc | Building and using target-based sentiment models |
| KR102834401B1 (en) * | 2022-12-27 | 2025-07-15 | 한국과학기술원 | Taxonomy and computational classification pipeline of information types in instructional videos |
| KR102749990B1 (en) * | 2023-02-16 | 2025-01-03 | 쿠팡 주식회사 | Method and electronic device for generating tag information corresponding to image content |
| US20240303442A1 (en) * | 2023-03-10 | 2024-09-12 | Microsoft Technology Licensing, Llc | Natural language processing based dominant item detection in videos |
| US20250006191A1 (en) * | 2023-06-28 | 2025-01-02 | Disctopia L.L.C. | System for distribution of content and analysis of content engagement |
| US20250005276A1 (en) * | 2023-06-30 | 2025-01-02 | Salesforce, Inc. | Systems and methods for selecting neural network models for building a custom artificial intelligence stack |
Family Cites Families (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6763148B1 (en) * | 2000-11-13 | 2004-07-13 | Visual Key, Inc. | Image recognition methods |
| US20020083473A1 (en) * | 2000-12-21 | 2002-06-27 | Philips Electronics North America Corporation | System and method for accessing a multimedia summary of a video program |
| US7260257B2 (en) * | 2002-06-19 | 2007-08-21 | Microsoft Corp. | System and method for whiteboard and audio capture |
| US20060212897A1 (en) * | 2005-03-18 | 2006-09-21 | Microsoft Corporation | System and method for utilizing the content of audio/video files to select advertising content for display |
| JP4746397B2 (en) * | 2005-10-04 | 2011-08-10 | 株式会社東芝 | Advertisement display processing method and apparatus related to playback title |
| US20100153885A1 (en) * | 2005-12-29 | 2010-06-17 | Rovi Technologies Corporation | Systems and methods for interacting with advanced displays provided by an interactive media guidance application |
| US8539359B2 (en) * | 2009-02-11 | 2013-09-17 | Jeffrey A. Rapaport | Social network driven indexing system for instantly clustering people with concurrent focus on same topic into on-topic chat rooms and/or for generating on-topic search results tailored to user preferences regarding topic |
| US9244923B2 (en) * | 2012-08-03 | 2016-01-26 | Fuji Xerox Co., Ltd. | Hypervideo browsing using links generated based on user-specified content features |
| US9374411B1 (en) * | 2013-03-21 | 2016-06-21 | Amazon Technologies, Inc. | Content recommendations using deep data |
| US20150142891A1 (en) * | 2013-11-19 | 2015-05-21 | Sap Se | Anticipatory Environment for Collaboration and Data Sharing |
| US9286290B2 (en) * | 2014-04-25 | 2016-03-15 | International Business Machines Corporation | Producing insight information from tables using natural language processing |
| US20160021249A1 (en) * | 2014-07-18 | 2016-01-21 | Ebay Inc. | Systems and methods for context based screen display |
| RU2579899C1 (en) * | 2014-09-30 | 2016-04-10 | Общество с ограниченной ответственностью "Аби Девелопмент" | Document processing using multiple processing flows |
| US10210906B2 (en) * | 2015-06-08 | 2019-02-19 | Arris Enterprises Llc | Content playback and recording based on scene change detection and metadata |
| US10013404B2 (en) * | 2015-12-03 | 2018-07-03 | International Business Machines Corporation | Targeted story summarization using natural language processing |
| CN107241622A (en) * | 2016-03-29 | 2017-10-10 | 北京三星通信技术研究有限公司 | video location processing method, terminal device and cloud server |
| US9972360B2 (en) * | 2016-08-30 | 2018-05-15 | Oath Inc. | Computerized system and method for automatically generating high-quality digital content thumbnails from digital video |
| US11033216B2 (en) * | 2017-10-12 | 2021-06-15 | International Business Machines Corporation | Augmenting questionnaires |
| US10467335B2 (en) * | 2018-02-20 | 2019-11-05 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US10657954B2 (en) * | 2018-02-20 | 2020-05-19 | Dropbox, Inc. | Meeting audio capture and transcription in a collaborative document context |
| US10558761B2 (en) * | 2018-07-05 | 2020-02-11 | Disney Enterprises, Inc. | Alignment of video and textual sequences for metadata analysis |
| US20200221190A1 (en) * | 2019-01-07 | 2020-07-09 | Microsoft Technology Licensing, Llc | Techniques for associating interaction data with video content |
-
2017
- 2017-10-17 US US15/786,077 patent/US20180160200A1/en not_active Abandoned
-
2019
- 2019-05-24 US US16/422,658 patent/US10911840B2/en active Active
Cited By (37)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11409791B2 (en) | 2016-06-10 | 2022-08-09 | Disney Enterprises, Inc. | Joint heterogeneous language-vision embeddings for video tagging and search |
| US12014737B2 (en) * | 2017-11-09 | 2024-06-18 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning |
| US20220180869A1 (en) * | 2017-11-09 | 2022-06-09 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning |
| US10943060B2 (en) | 2018-02-20 | 2021-03-09 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US11488602B2 (en) | 2018-02-20 | 2022-11-01 | Dropbox, Inc. | Meeting transcription using custom lexicons based on document history |
| US10657954B2 (en) | 2018-02-20 | 2020-05-19 | Dropbox, Inc. | Meeting audio capture and transcription in a collaborative document context |
| US10467335B2 (en) * | 2018-02-20 | 2019-11-05 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US11275891B2 (en) | 2018-02-20 | 2022-03-15 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US20190258704A1 (en) * | 2018-02-20 | 2019-08-22 | Dropbox, Inc. | Automated outline generation of captured meeting audio in a collaborative document context |
| US11288454B2 (en) * | 2018-05-15 | 2022-03-29 | Beijing Sankuai Online Technology Co., Ltd | Article generation |
| US20190362020A1 (en) * | 2018-05-22 | 2019-11-28 | Salesforce.Com, Inc. | Abstraction of text summarizaton |
| US10909157B2 (en) * | 2018-05-22 | 2021-02-02 | Salesforce.Com, Inc. | Abstraction of text summarization |
| CN108873829A (en) * | 2018-05-28 | 2018-11-23 | 上海新增鼎数据科技有限公司 | A kind of phosphoric acid production parameter control method promoting decision tree based on gradient |
| US10558761B2 (en) * | 2018-07-05 | 2020-02-11 | Disney Enterprises, Inc. | Alignment of video and textual sequences for metadata analysis |
| US20200175972A1 (en) * | 2018-11-29 | 2020-06-04 | International Business Machines Corporation | Voice message categorization and tagging |
| US11011166B2 (en) * | 2018-11-29 | 2021-05-18 | International Business Machines Corporation | Voice message categorization and tagging |
| US20200221190A1 (en) * | 2019-01-07 | 2020-07-09 | Microsoft Technology Licensing, Llc | Techniques for associating interaction data with video content |
| US11689379B2 (en) | 2019-06-24 | 2023-06-27 | Dropbox, Inc. | Generating customized meeting insights based on user interactions and meeting media |
| US12040908B2 (en) | 2019-06-24 | 2024-07-16 | Dropbox, Inc. | Generating customized meeting insights based on user interactions and meeting media |
| US11227593B2 (en) * | 2019-06-28 | 2022-01-18 | Rovi Guides, Inc. | Systems and methods for disambiguating a voice search query based on gestures |
| US12322385B2 (en) | 2019-06-28 | 2025-06-03 | Adeia Guides Inc. | Systems and methods for disambiguating a voice search query based on gestures |
| US11675827B2 (en) | 2019-07-14 | 2023-06-13 | Alibaba Group Holding Limited | Multimedia file categorizing, information processing, and model training method, system, and device |
| US10869091B1 (en) * | 2019-08-06 | 2020-12-15 | Amazon Technologies, Inc. | Latency detection for streaming video |
| CN110516030A (en) * | 2019-08-26 | 2019-11-29 | 北京百度网讯科技有限公司 | Determination method, apparatus, device and computer-readable storage medium of intent word |
| CN110598651A (en) * | 2019-09-17 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Information processing method, device and storage medium |
| US11361759B2 (en) * | 2019-11-18 | 2022-06-14 | Streamingo Solutions Private Limited | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media |
| JP2021103575A (en) * | 2020-04-10 | 2021-07-15 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Content search method and device for movie and tv drama |
| US11570527B2 (en) | 2020-04-10 | 2023-01-31 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for retrieving teleplay content |
| JP7228615B2 (en) | 2020-04-10 | 2023-02-24 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Movie/TV drama content search method and device |
| EP3823296A3 (en) * | 2020-04-10 | 2021-09-15 | Beijing Baidu Netcom Science and Technology Co., Ltd | Method and apparatus for retrieving teleplay content |
| US11514949B2 (en) | 2020-10-26 | 2022-11-29 | Dell Products L.P. | Method and system for long term stitching of video data using a data processing unit |
| US11599574B2 (en) * | 2020-10-26 | 2023-03-07 | Dell Products L.P. | Method and system for performing a compliance operation on video data using a data processing unit |
| US20220129502A1 (en) * | 2020-10-26 | 2022-04-28 | Dell Products L.P. | Method and system for performing a compliance operation on video data using a data processing unit |
| US11916908B2 (en) | 2020-10-26 | 2024-02-27 | Dell Products L.P. | Method and system for performing an authentication and authorization operation on video data using a data processing unit |
| US11574121B2 (en) * | 2021-01-25 | 2023-02-07 | Kyndryl, Inc. | Effective text parsing using machine learning |
| US20220237375A1 (en) * | 2021-01-25 | 2022-07-28 | Kyndryl, Inc. | Effective text parsing using machine learning |
| CN117556084A (en) * | 2023-12-27 | 2024-02-13 | 环球数科集团有限公司 | Video emotion analysis system based on multiple modes |
Also Published As
| Publication number | Publication date |
|---|---|
| US20190294668A1 (en) | 2019-09-26 |
| US10911840B2 (en) | 2021-02-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20180160200A1 (en) | Methods and systems for identifying, incorporating, streamlining viewer intent when consuming media | |
| JP5781601B2 (en) | Enhanced online video through content detection, search, and information aggregation | |
| CN109582945B (en) | Article generating method, device and storage medium | |
| CN113038153B (en) | Financial live broadcast violation detection method, device, equipment and readable storage medium | |
| US10896444B2 (en) | Digital content generation based on user feedback | |
| CN111935529B (en) | Education audio and video resource playing method, equipment and storage medium | |
| WO2016186856A1 (en) | Contextualizing knowledge panels | |
| Basu et al. | Videopedia: Lecture video recommendation for educational blogs using topic modeling | |
| CN112804580B (en) | Video dotting method and device | |
| US20190082236A1 (en) | Determining Representative Content to be Used in Representing a Video | |
| Furini | On introducing timed tag-clouds in video lectures indexing | |
| Yamamoto et al. | Video scene annotation based on web social activities | |
| Baidya et al. | LectureKhoj: Automatic tagging and semantic segmentation of online lecture videos | |
| CN110888896A (en) | Data searching method and data searching system thereof | |
| CN112382295A (en) | Voice recognition method, device, equipment and readable storage medium | |
| Carta et al. | VSTAR: visual semantic thumbnails and tAgs revitalization | |
| Salim et al. | An approach for exploring a video via multimodal feature extraction and user interactions | |
| Wartena | Comparing segmentation strategies for efficient video passage retrieval | |
| CN117076710A (en) | News automatic cataloging method based on multi-mode information fusion | |
| Baraldi et al. | NeuralStory: An interactive multimedia system for video indexing and re-use | |
| Kravvaris et al. | Automatic point of interest detection for open online educational video lectures | |
| Lian | Innovative Internet video consuming based on media analysis techniques | |
| Carmichael et al. | Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data | |
| Hürst et al. | Searching in recorded lectures | |
| Salim et al. | An alternative approach to exploring a video |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: STREAMINGO SOLUTIONS PRIVATE LIMITED, INDIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOEL, VAIBHAV;MANJUNATH, SHARATH;V, VIDHYA T;AND OTHERS;REEL/FRAME:043885/0402 Effective date: 20170821 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |