WO2015007321A1 - Identifying stories in media content

Info

Publication number: WO2015007321A1
Authority: WIPO (PCT)
Application number: PCT/EP2013/065232
Other languages: French (fr)
Inventors: Abigail BETLEY, Unai AYO ARESTI, David TOONE
Original assignee: Longsand Limited
Prior art keywords: media, segment, story, concepts, media segment
Related applications: US 14/905,487 (US9734408B2); PCT/EP2013/065232 (WO2015007321A1); CN201380078901.6A (CN105474201A); EP13747986.1A (EP3022663A1)

Classifications

    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • G06V20/47 Detecting features for summarising video content
    • G06V20/48 Matching video sequences
    • G06V20/44 Event detection
    • G06F16/71 Indexing; Data structures therefor; Storage structures (information retrieval of video data)
    • G06F16/738 Presentation of query results (information retrieval of video data)
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/28 Indexing by using information signals recorded by the same method as the main recording
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/8541 Content authoring involving branching, e.g. to different story endings

Abstract

Techniques associated with identifying stories in media content are described in various implementations. In one example implementation, a method may include receiving media content, and segmenting the media content into a plurality of media segments based on auditory indicators included in the media content. The method may also include analyzing a first media segment to determine a first set of concepts associated with the first media segment, and analyzing a second media segment to determine a second set of concepts associated with the second media segment. The method may further include comparing the first set of concepts to the second set of concepts to determine a conceptual similarity between the first set of concepts and the second set of concepts, and in response to the conceptual similarity exceeding a similarity threshold, identifying a story that includes the first media segment and the second media segment.

Description

IDENTIFYING STORIES IN MEDIA CONTENT
BACKGROUND
[0001] In today's always-connected society, media content in the form of live, pre-recorded, or on-demand programming is nearly ubiquitous. For example, 24x7 news programs offer a continuous stream of live information throughout the day, and countless pre-recorded media sources are accessible, e.g., via the Internet, at any given time.
[0002] Media content may be broadcast, streamed, or otherwise delivered via any of a number of communication channels, using a number of different technologies. For example, delivery of streaming video media on the Internet generally includes encoding the video content into one or more streaming video formats, and efficiently delivering the encoded video content to end users.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] FIG. 1 is a conceptual diagram of an example story recognition environment in accordance with implementations described herein.
[0004] FIG. 2 is a conceptual diagram of an example process of identifying stories from media content in accordance with implementations described herein.
[0005] FIG. 3 is a flow diagram of an example process of identifying stories from media content in accordance with implementations described herein.
[0006] FIG. 4 is a block diagram of an example computer system to identify stories from media content in accordance with implementations described herein.
DETAILED DESCRIPTION
[0007] The vast amount of available media content may easily lead to information overload, especially if the media content is not well organized or otherwise structured in a manner that allows users to easily identify specific content that is of interest. As a result, certain content providers, content aggregators, or end users may manually tag or otherwise classify the media content, e.g., by associating metadata with the content. Such manual classification, while fairly accurate, may be relatively inefficient, expensive, and/or time-consuming.
[0008] Described herein are techniques for identifying stories in media content, even if the stories have not been previously classified as such. As used herein, the term "story" generally refers to a portion of media content that relates to a particular topic or to a set of consistent concepts. For example, during the course of a nightly news program, one story may describe the outcome of a recent criminal trial, while another story may discuss the success of a local business, and yet another story may relate to the weather. According to the techniques described here, the media content (e.g., the news program) is separated into conceptual stories (e.g., three conceptually distinct stories including the legal story, the business story, and the weather story). Once the stories have been identified using the described techniques, additional useful processing may be performed - e.g., to summarize or classify the stories, or to isolate (e.g., clip) stories from the media content for more convenient access or delivery. These or other appropriate processing techniques may be applied to stories, after they have been identified, to generally make the stories more accessible and/or consumable by end users.
[0009] FIG. 1 is a conceptual diagram of an example story recognition environment 100 in accordance with implementations described herein. As shown, environment 100 includes a computing system 110 that is configured to execute a story recognition engine 112. The story recognition engine 112 may generally operate to analyze incoming media content 102, and to identify various stories 114a, 114b, and 114c included in the media content 102. As described in further detail below, the story recognition engine 112 may generally identify stories by dividing the media content 102 into segments, analyzing the segments to determine concepts associated with the respective segments, comparing concepts across different segments to determine conceptual similarity of the different segments, and merging segments that are conceptually similar into stories.
[0010] The example topology of environment 100 may be representative of various story recognition environments. However, it should be understood that the example topology of environment 100 is shown for illustrative purposes only, and that various modifications may be made to the configuration. For example, environment 100 may include different or additional components, or the components may be implemented in a different manner than is shown. Also, while computing system 110 is generally illustrated as a standalone server, it should be understood that computing system 110 may, in practice, be any appropriate type of computing device, such as a server, a mainframe, a laptop, a desktop, a workstation, or other device. Computing system 110 may also represent a group of computing devices, such as a server farm, a server cluster, or other group of computing devices operating individually or together to perform the functionality described herein.
[0011] Media content 102 may be in the form of any appropriate media type, and may be provided from any appropriate media source. Examples of media types that may be processed as described herein include, but are not limited to, audio information (e.g., radio broadcasts, telephone conversations, streaming audio, etc.), video information (e.g., television broadcasts, webcasts, streaming video, etc.), and/or multimedia information (e.g., combinations of audio, video, graphics, and/or other appropriate content). Examples of media sources include, but are not limited to, broadcast media sources, streaming media sources, online media repositories, standalone physical media (e.g., Blu-Ray discs, DVDs, compact discs, etc.), or the like.
[0012] Computing system 110 may include a processor 122, a memory 124, an interface 126, a segmentation module 128, a content analysis module 130, and a segment merge module 132. It should be understood that the components shown are for illustrative purposes only, and that in some cases, the functionality being described with respect to a particular module or component of computing system 110 may be performed by one or more different or additional modules or components, e.g., of computing system 110 or of another appropriate computing system. Similarly, it should be understood that portions or all of the functionality may be combined into fewer modules or components than are shown.
[0013] Processor 122 may be configured to process instructions for execution by computing system 110. The instructions may be stored on a non-transitory, tangible computer-readable storage medium, such as in memory 124 or on a separate storage device (not shown), or on any other type of volatile or nonvolatile memory that stores instructions to cause a programmable processor to perform the techniques described herein. Alternatively or additionally, computing system 110 may include dedicated hardware, such as one or more integrated circuits, Application Specific Integrated Circuits (ASICs), Application Specific Special Processors (ASSPs), Field Programmable Gate Arrays (FPGAs), or any combination of the foregoing examples of dedicated hardware, for performing the techniques described herein. In some implementations, multiple processors may be used, as appropriate, along with multiple memories and/or types of memory.
[0014] Interface 126 may be implemented in hardware and/or software, and may be configured, for example, to receive the media content 102 from an appropriate media source (not shown). In some implementations, interface 126 may be configured to locate and/or request media content 102 from one or more media sources. For example, interface 126 may be configured to capture news feeds from different news channels or outlets on a recurring, scheduled, and/or ad hoc basis, and to provide the media content 102 for processing by the story recognition engine 112. Interface 126 may also be configured to output processed stories, e.g., stories 114a, 114b, and/or 114c for consumption by end users or other appropriate computing systems, such as a search engine or other appropriate system.
[0015] In some implementations, interface 126 may also include one or more user interfaces that allow a user (e.g., a system administrator) to interact directly with the computing system 110, e.g., to manually define or modify settings or options associated with the story recognition engine 112. Such settings or options may be stored in a database (not shown), and may be used by the story recognition engine 112 to adjust one or more processing parameters associated with story recognition functionality as described herein. Example user interfaces may include touchscreen devices, pointing devices, keyboards, voice input interfaces, visual input interfaces, or the like.
[0016] Segmentation module 128 may execute on one or more processors, e.g., processor 122, and may segment received media content 102 into a plurality of media segments based on auditory indicators included in the media content 102. For example, segmentation module 128 may analyze the audio portion of media content 102 to identify certain auditory tokens (e.g., silence of a given length, or particular types of auditory signals, such as music or specific tones) to identify logical breaks in the media content 102. In the example of a news program, segmentation based on silence or pauses in the audio portion of the media content 102 may lead to segments that align with sentences and/or paragraphs, as speakers may generally pause for a brief moment between sentences and/or paragraphs. Similarly, news programs may include musical jingles, a series of tones, or other auditory signals that indicate logical breaks between portions of the program. These and/or other appropriate auditory indicators may be used to allow segmentation module 128 to segment the media content 102.
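By way of a purely illustrative, non-limiting sketch, the following Python fragment shows one way such silence-based auditory tokens might be located. The frame size, root-mean-square energy threshold, and minimum silence duration are assumed tuning parameters rather than values drawn from this disclosure.

```python
# Illustrative sketch only: split an audio track at sustained silences.
# Assumes mono float PCM samples in a NumPy array; frame_ms, silence_rms,
# and min_silence_s are hypothetical tuning parameters.
import numpy as np

def segment_on_silence(samples, sample_rate,
                       frame_ms=20, silence_rms=0.01, min_silence_s=0.8):
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Per-frame root-mean-square energy.
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n_frames)
    ])
    silent = rms < silence_rms
    min_silent_frames = int(min_silence_s * 1000 / frame_ms)

    boundaries = [0.0]
    run_start = None
    for i, is_silent in enumerate(silent):
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_silent_frames:
                # Put the segment boundary in the middle of the silent run.
                boundaries.append((run_start + i) / 2 * frame_ms / 1000)
            run_start = None
    boundaries.append(len(samples) / sample_rate)
    # (start, end) times, in seconds, of the candidate media segments.
    return list(zip(boundaries[:-1], boundaries[1:]))
```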
[0017] In some implementations, segmentation module 128 may also or alternatively use visual indicators to segment the received media content 102. For example, segmentation module 128 may analyze the video portion of media content 102 to identify certain visual tokens (e.g., key frames that indicate substantial differences between successive frames of video, black frames, or other appropriate visual indicators) that may also or alternatively be used to identify logical breaks in the media content 102. When taken together, auditory indicators such as silence in combination with video indicators such as key frames may be used to accurately and consistently segment the media content 102 into appropriate media segments.
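Again as a hedged, non-limiting sketch, the fragment below flags candidate visual tokens (abrupt shot changes and near-black frames) by comparing grayscale histograms of successive, already-decoded frames; the frame representation and both thresholds are assumptions, and a production system might instead rely on an existing key-frame detector.

```python
# Illustrative sketch only: flag visual tokens (hard cuts and near-black
# frames) from already-decoded grayscale frames (H x W arrays, values 0-255).
# The histogram distance threshold and black level are assumptions.
import numpy as np

def visual_break_indices(frames, cut_threshold=0.5, black_level=16):
    breaks = []
    prev_hist = None
    for idx, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 255))
        hist = hist / max(hist.sum(), 1)
        if frame.mean() < black_level:
            breaks.append(idx)  # near-black frame
        elif prev_hist is not None and np.abs(hist - prev_hist).sum() > cut_threshold:
            breaks.append(idx)  # abrupt change between successive frames
        prev_hist = hist
    return breaks
```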
[0018] Segmentation module 128 may also use other appropriate indicators to cause or fine-tune the segmentation of media content 102 into a plurality of media segments. For example, speech-to-text processing of the audio portion of media content 102 may provide a transcript, which may be used, e.g., in association with the auditory and/or visual indicators as described above, to determine appropriate breaks for the segments (e.g., based on periods or other punctuation in the transcript). Similarly, closed-captioning information associated with the video portion of media content 102 may be used as an input to determine or confirm breaks for the segments.
[0019] After the media content 102 has been segmented, the respective media segments may be analyzed by content analysis module 130. Content analysis module 130 may execute on one or more processors, e.g., processor 122, and may analyze the media segments to determine a set of one or more key terms and/or concepts associated with the respective segments. In some implementations, analyzing a media segment may include generating a transcript of the audio portion of the media segment (e.g., using speech-to-text processing), and providing the transcript to a conceptual analysis engine, which in turn may provide the set of one or more concepts associated with the media segment. The conceptual analysis engine may also return key terms from the media segment, e.g., by removing any common terms or stop words that are unlikely to add any conceptual information about the particular segment. In some implementations, the content analysis module 130 may be configured to analyze the media segments natively (e.g., without converting the audio, video, or multimedia information to text) to determine the concepts associated with the media segments.
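The following is a deliberately minimal sketch of the transcript-based path: it approximates a "set of concepts" by removing stop words from a speech-to-text transcript and keeping the most frequent remaining terms. The stop-word list and term count are assumptions; an actual conceptual analysis engine would be considerably richer.

```python
# Illustrative sketch only: approximate a segment's "concepts" by removing
# stop words from its transcript and keeping the most frequent remaining
# terms. STOP_WORDS and max_terms are assumptions, not part of the disclosure.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "that", "this", "for", "on", "with", "as", "it", "was", "be"}

def concepts_from_transcript(transcript, max_terms=10):
    words = re.findall(r"[a-z']+", transcript.lower())
    terms = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    return {term for term, _ in Counter(terms).most_common(max_terms)}
```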
[0020] Segment merge module 132 may execute on one or more processors, e.g., processor 122, and may compare the concepts identified by content analysis module 130 to determine the conceptual similarity between one or more media segments, and may merge the media segments into a story if the conceptual similarity indicates that the media segments are sufficiently related. For example, in some implementations, the segment merge module 132 may compare a first set of concepts associated with a first media segment to a second set of concepts associated with a second media segment to determine a conceptual similarity between the first set of concepts and the second set of concepts. In some implementations, the conceptual similarity may be expressed as a numerical similarity score, or may otherwise be expressed as an objective conceptual similarity between the two segments.
[0021] Then, if the conceptual similarity exceeds a particular similarity threshold (e.g., that may be configured according to implementation-specific considerations), the segment merge module 132 may identify a story as including both of the media segments. In some implementations, the segment merge module 132 may be configured to analyze a certain number of nearby segments (e.g., the three preceding media segments) for conceptually similar segments. The number of nearby segments to be analyzed in such implementations may be configurable, e.g., by an administrator.
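The look-back behavior can be sketched, under the same caveats, as a loop that compares each segment against a configurable number of preceding segments using a caller-supplied similarity function; one assumed form of that function is sketched after the following paragraph.

```python
# Illustrative sketch only: compare each segment's concepts against a
# configurable number of preceding segments ("lookback"), recording the index
# pairs that clear the threshold. The similarity callable is supplied by the
# caller; lookback=3 mirrors the "three preceding media segments" example.
def similar_pairs(segment_concepts, similarity, lookback=3, threshold=0.25):
    pairs = []
    for i, current in enumerate(segment_concepts):
        for j in range(max(0, i - lookback), i):
            if similarity(segment_concepts[j], current) > threshold:
                pairs.append((j, i))  # segment j and segment i look related
    return pairs
```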
[0022] Conceptual similarity and the similarity threshold as described above may be defined in any appropriate manner to achieve the desired story identification results for a given implementation. For example, conceptual similarity may be determined based on matching concepts and/or key terms, or may be determined based on conceptual distances between concepts and/or key terms, or may be determined based on other appropriate techniques or combinations of techniques. In the case of matching concepts, the similarity threshold may be based on a percentage of the concepts and/or terms that match from one segment to another (e.g., 25% or greater, 50% or greater, etc.), or may be based on a number of matching or otherwise overlapping concepts (e.g., one or more, more than one, more than two, etc.). In the case of conceptual distances, the similarity threshold may be based on nearest conceptual distances, furthest conceptual distances, average conceptual distances, and/or other appropriate measures or combinations of measures. The similarity threshold may be configurable, e.g., by an administrator to achieve a desired level of consistency within the stories. For example, to generate more consistent stories, the similarity threshold may be increased.
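As one assumed, non-authoritative instantiation of the matching-concepts approach, the similarity score below is the fraction of the smaller concept set that overlaps the other set, compared against a configurable threshold such as 0.25 for "25% or greater".

```python
# Illustrative sketch only: similarity as the fraction of overlapping concepts,
# with a configurable threshold. Not the authoritative definition from the
# disclosure; the 0.25 default is an assumption.
def conceptual_similarity(concepts_a, concepts_b):
    """Return the share of the smaller concept set that also appears in the other."""
    if not concepts_a or not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / min(len(concepts_a), len(concepts_b))

def sufficiently_related(concepts_a, concepts_b, similarity_threshold=0.25):
    """True if two segments appear related enough to belong to the same story."""
    return conceptual_similarity(concepts_a, concepts_b) > similarity_threshold
```

Raising similarity_threshold in such a sketch would, as the preceding paragraph notes, tend to produce more internally consistent (but possibly more fragmented) stories.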
[0023] In some implementations, segment merge module 132 may merge not only the segments that are determined to be conceptually similar as described above, but may also merge intervening media segments that are temporally situated between the segments that are to be merged into a story. Continuing the example above, if the first media segment and the second media segment are separated by three intervening media segments, the segment merge module 132 may merge the five media segments - bookended by the first and second media segments and including the three intervening media segments - into a single story even if the intervening media segments were not necessarily identified as being conceptually similar to either of the first or second media segments.
[0024] In some implementations, segment merge module 132 may exclude certain of the intervening media segments from being merged into the story if the particular intervening media segments are identified as being conceptually unrelated to the story. In the example above, if one of the three intervening media segments was identified as being unrelated (as opposed to simply not being identified as specifically related), the segment merge module 132 may merge four of the five media segments, again bookended by the first and second media segments, into a single story such that the story excludes the unrelated media segment. Such exclusion, for example, may ensure that advertisements or other completely separate media segments are not included as part of the story.
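A hedged sketch of this merge behavior follows; segments are modeled as dictionaries with an "id", a "concepts" set, and an optional "unrelated" flag, all of which are hypothetical rather than part of the described implementations.

```python
# Illustrative sketch only: merge a conceptually similar pair of segments into
# one story together with the intervening segments, skipping any intervening
# segment explicitly flagged as unrelated (e.g., an advertisement). The dict
# fields ("id", "unrelated") are a hypothetical data model.
def merge_into_story(segments, first_idx, second_idx):
    story_segment_ids = []
    for segment in segments[first_idx:second_idx + 1]:
        if segment.get("unrelated"):
            continue  # keep ads and other unrelated segues out of the story
        story_segment_ids.append(segment["id"])
    return story_segment_ids

# Example: if segments 0 and 4 matched and segment 2 was flagged unrelated,
# the resulting story contains segments [0, 1, 3, 4].
```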
[0025] After the stories have been identified as described above, the story recognition engine 112 may perform post-identification processing on the stories. For example, the story recognition engine 112 may analyze any identified stories to generate summaries of the respective stories, or to divide the media content according to the stories, or to perform other appropriate processing. In such a manner, the stories from various media content may be made more accessible and/or consumable by users.
[0026] FIG. 2 is a conceptual diagram of an example process 200 of identifying stories from media content in accordance with implementations described herein. The process 200 may be performed, for example, by a story recognition processing system such as the story recognition engine 112 illustrated in FIG. 1. For clarity of presentation, the description that follows uses the story recognition engine 112 illustrated in FIG. 1 as the basis of an example for describing the process. However, it should be understood that another system, or combination of systems, may be used to perform the process or various portions of the process.
[0027] In stage 210, media content 212 is received by the story recognition engine 112. The media content 212 may generally be in the form of a single, continuous block of media, such as a thirty-minute news program with two advertisement breaks included during the program. The media content 212 is shown with dotted lines that are intended to represent the auditory and/or visual indicators as described above.
[0028] In stage 220, the media content 212 has been broken down into multiple media segments - Segment A 222, Segment B 224, Segment C 226, and Segment D 228. The story recognition engine 112 may use auditory and/or visual indicators included in the media content 212 to segment the content, e.g., according to sentences or other logical breaks in the content.
[0029] In stage 230, Segment A 222 has been analyzed to determine the set of concepts 232 associated with Segment A 222. Similarly, Segment B 224 has been analyzed to determine the set of concepts 234 associated with Segment B 224, Segment C 226 has been analyzed to determine the set of concepts 236 associated with Segment C 226, and Segment D 228 has been analyzed to determine the set of concepts 238 associated with Segment D 228.
[0030] In stage 240, Segment A 222 and Segment B 224 have been merged into Candidate Story A 242, and Segment C 226 and Segment D 228 have been merged into Candidate Story B 244. Such merging may be based on a comparison of the concepts across segments, and a determination that the concepts 232 of Segment A 222 were conceptually similar to the concepts 234 of Segment B 224. Story recognition engine 112 may also have compared the concepts 236 of Segment C 226 and/or the concepts 238 of Segment D 228 to those in the previous segments and determined that there was insufficient conceptual similarity to merge the segments. Similarly, story recognition engine 112 may have merged Segment C 226 and Segment D 228 based on determining that the concepts 236 and 238 were conceptually similar enough that they were likely part of the same story.
[0031] In stage 250, Candidate Story A 242 has been identified as a non-story 252, and Candidate Story B 244 has been identified as a story 254. Non-stories, such as non-story 252, may be identified, e.g., in cases where the length of the story is less than a configurable minimum story length (e.g., less than thirty seconds), or in cases where the concepts are determined to be inconsequential (e.g., advertisements or non-story segues between stories), or under other appropriate circumstances. After a story has been identified as such, e.g., story 254, post-processing may also be performed. For example, in the case of story 254, the story has been summarized, e.g., based on the content of the story and/or the determined concepts associated with the story.
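A minimal sketch of such non-story filtering is given below, assuming candidate stories carry start/end times and a concept set; the field names, the thirty-second default, and the list of "inconsequential" concepts are all illustrative assumptions.

```python
# Illustrative sketch only: keep candidate stories that are long enough and
# whose concepts are not flagged as inconsequential. Field names, the
# thirty-second default, and the INCONSEQUENTIAL set are assumptions.
INCONSEQUENTIAL = {"advertisement", "sponsor", "station ident"}

def filter_candidate_stories(candidates, min_length_s=30.0):
    stories = []
    for candidate in candidates:
        duration = candidate["end_s"] - candidate["start_s"]
        if duration < min_length_s:
            continue  # too short: treat as a non-story
        if candidate["concepts"] & INCONSEQUENTIAL:
            continue  # inconsequential content such as ads or segues
        stories.append(candidate)
    return stories
```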
[0032] FIG. 3 is a flow diagram of an example process 300 of identifying stories from media content in accordance with implementations described herein. The process 300 may be performed, for example, by a story recognition processing system such as the story recognition engine 112 illustrated in FIG. 1. For clarity of presentation, the description that follows uses the story recognition engine 112 illustrated in FIG. 1 as the basis of an example for describing the process. However, it should be understood that another system, or combination of systems, may be used to perform the process or various portions of the process.
[0033] Process 300 begins at block 310 when media content is received. In some implementations, the media content may be provided directly to the story recognition engine 112 (e.g., by a user or a content provider). In other implementations, the story recognition engine 112 may actively locate and/or request media content for processing. For example, the story recognition engine 112 may actively monitor a particular news feed (e.g., streaming video content, or broadcast news channel) to gather appropriate media content for processing.
[0034] At block 320, the media content is segmented based on auditory indicators, visual indicators, or a combination of auditory and visual indicators. For example, the story recognition engine 112 may identify auditory tokens, video tokens, or other appropriate indicators of a logical break in the media content, and may segment the media content into media segments accordingly.
[0035] At block 330, the segments are analyzed to determine concepts associated with the respective segments. For example, the story recognition engine 112 may include a conceptual analysis engine or utilize a separate conceptual analysis engine to determine a set of one or more key terms and/or concepts associated with the respective segments. In some implementations, analyzing a media segment may include generating a transcript of the audio portion of the media segment (e.g., using speech-to-text processing), and providing the transcript to a conceptual analysis engine, which in turn may provide the set of one or more concepts associated with the media segment. The conceptual analysis engine may also return key terms from the media segment. In some implementations, the conceptual analysis engine may be configured to analyze the media segments natively (e.g., without converting the audio, video, or multimedia information to text).
[0036] At block 340, the determined concepts are compared across segments to determine conceptual similarity of the segments. For example, the story recognition engine 112 may compare a first set of concepts associated with a first media segment to a second set of concepts associated with a second media segment to determine a conceptual similarity between the first set of concepts and the second set of concepts.
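One simple, illustrative way to quantify the comparison of block 340 is the Jaccard overlap between the two concept sets, sketched below. The claims also contemplate conceptual distances between concepts (e.g., taxonomy or embedding distances), which this set-overlap measure does not capture.

```python
# Sketch of the concept comparison in block 340 using Jaccard overlap:
# the ratio of concepts shared by two segments to all concepts in either.

def jaccard_similarity(concepts_a, concepts_b):
    """Return a similarity score in [0, 1] for two concept sets."""
    if not concepts_a and not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)


if __name__ == "__main__":
    segment_1 = {"election", "results", "voters", "turnout"}
    segment_2 = {"election", "results", "certification", "officials"}
    segment_3 = {"weather", "forecast", "rain"}

    print(jaccard_similarity(segment_1, segment_2))  # 0.333... -> likely related
    print(jaccard_similarity(segment_1, segment_3))  # 0.0      -> unrelated
```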
[0037] At block 350, stories are identified based on the conceptually similar segments. For example, the story recognition engine 112 may identify a story made up of two media segments if it determines that the two media segments are conceptually similar. In some implementations, the story recognition engine 112 may merge multiple conceptually similar media segments into a story, e.g., if the comparison of block 340 indicates that the media segments are sufficiently related.
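Tying the previous sketches together, block 350 might be prototyped, again purely as an illustration, with a greedy pass that extends the current story while consecutive segments remain conceptually similar. The segment representation, the similarity function, and the threshold value are assumptions made for the sketch.

```python
# Illustrative greedy merge for block 350: walk the segments in temporal
# order and extend the current story while each next segment's concepts are
# similar enough to the story's accumulated concepts.

def jaccard_similarity(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def merge_into_stories(segments, similarity=jaccard_similarity, threshold=0.3):
    """segments: list of (start, end, concept_set) in temporal order.
    Returns a list of stories as (start, end, merged_concept_set)."""
    stories = []
    for start, end, concepts in segments:
        if stories:
            story_start, _, story_concepts = stories[-1]
            if similarity(story_concepts, concepts) >= threshold:
                # Conceptually similar: extend the current story.
                stories[-1] = (story_start, end, story_concepts | concepts)
                continue
        # Otherwise begin a new candidate story.
        stories.append((start, end, set(concepts)))
    return stories


if __name__ == "__main__":
    segments = [
        (0, 40, {"election", "results", "voters"}),
        (40, 90, {"election", "results", "officials"}),
        (90, 120, {"weather", "forecast"}),
        (120, 150, {"weather", "rain", "forecast"}),
    ]
    for story in merge_into_stories(segments):
        print(story)   # two stories: one election story, one weather story
```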
[0038] FIG. 4 is a block diagram of an example computer system 400 to identify stories from media content in accordance with implementations described herein. The system 400 includes story recognition machine-readable instructions 402, which may include or be implemented by certain of the various modules depicted in FIG. 1. The story recognition machine-readable instructions 402 may be loaded for execution on a processor or processors 404. As used herein, a processor may include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device. The processor(s) 404 may be coupled to a network interface 406 (to allow the system 400 to perform communications over a data network) and/or to a storage medium (or storage media) 408.
[0039] The storage medium 408 may be implemented as one or multiple computer-readable or machine-readable storage media. The storage media may include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs), and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other appropriate types of storage devices.
[0040] Note that the instructions discussed above may be provided on one computer-readable or machine-readable storage medium, or alternatively, may be provided on multiple computer-readable or machine-readable storage media distributed in a system having plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture may refer to any appropriate manufactured component or multiple components. The storage medium or media may be located either in the machine running the machine-readable instructions, or located at a remote site, e.g., from which the machine-readable instructions may be downloaded over a network for execution.
[0041] Although a few implementations have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures may not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows. Similarly, other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method of identifying stories in media content, the method comprising:
receiving, at a computing system, media content;
segmenting, using the computing system, the media content into a plurality of media segments based on auditory indicators included in the media content;
analyzing, using the computing system, a first media segment from among the plurality of media segments to determine a first set of concepts associated with the first media segment;
analyzing, using the computing system, a second media segment from among the plurality of media segments to determine a second set of concepts associated with the second media segment;
comparing, using the computing system, the first set of concepts to the second set of concepts to determine a conceptual similarity between the first set of concepts and the second set of concepts; and
in response to the conceptual similarity exceeding a similarity threshold, identifying a story that includes the first media segment and the second media segment.
2. The computer-implemented method of claim 1, wherein segmenting the media content is further based on visual indicators included in the media content.
3. The computer-implemented method of claim 1, wherein the conceptual similarity is determined based on matching concepts included in the first set to concepts included in the second set.
4. The computer-implemented method of claim 1, wherein the conceptual similarity is determined based on conceptual distances between concepts included in the first set and concepts included in the second set.
5. The computer-implemented method of claim 1, wherein the media content comprises a multimedia content stream.
6. The computer-implemented method of claim 1, wherein identifying the story comprises merging the first media segment, the second media segment, and intervening media segments between the first media segment and the second media segment.
7. The computer-implemented method of claim 6, wherein identifying the story comprises excluding intervening media segments that are identified as being conceptually unrelated to the story.
8. A story recognition system comprising:
one or more processors;
a segmentation module, executing on at least one of the one or more processors, that segments media content into a plurality of media segments based on auditory indicators included in the media content;
a content analysis module, executing on at least one of the one or more processors, that analyzes the media segments to determine concepts associated with the respective media segments; and
a segment merge module, executing on at least one of the one or more processors, that compares the concepts associated with a first media segment to the concepts associated with a second media segment to determine conceptual similarity between the first media segment and the second media segment, and that merges the first media segment and the second media segment into a story in response to the conceptual similarity indicating the first media segment and the second media segment are conceptually related.
9. The story recognition system of claim 8, wherein the segmentation module segments the media content further based on visual indicators included in the media content.
10. The story recognition system of claim 8, wherein the conceptual similarity is determined based on matching the concepts associated with the first media segment to the concepts associated with the second media segment.
11. The story recognition system of claim 8, wherein the conceptual similarity is determined based on conceptual distances between the concepts associated with the first media segment and the concepts associated with the second media segment.
12. The story recognition system of claim 8, wherein the segment merge module further merges into the story intervening media segments between the first media segment and the second media segment.
13. The story recognition system of claim 12, wherein the segment merge module excludes from merging into the story an unrelated intervening media segment, from among the intervening media segments, in response to determining that the unrelated media segment is conceptually unrelated to the story.
14. The story recognition system of claim 8, wherein the media content comprises a multimedia content stream.
15. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
segment media content into a plurality of media segments based on auditory indicators included in the media content;
analyze a first media segment from among the plurality of media segments to determine a first set of concepts associated with the first media segment;
analyze a second media segment from among the plurality of media segments to determine a second set of concepts associated with the second media segment;
compare the first set of concepts to the second set of concepts to determine a conceptual similarity between the first set of concepts and the second set of concepts; and
merge the first media segment and the second media segment into a story in response to the conceptual similarity exceeding a similarity threshold.

Priority Applications (4)

Application Number Priority Date Filing Date Title
US14/905,487 US9734408B2 (en) 2013-07-18 2013-07-18 Identifying stories in media content
PCT/EP2013/065232 WO2015007321A1 (en) 2013-07-18 2013-07-18 Identifying stories in media content
CN201380078901.6A CN105474201A (en) 2013-07-18 2013-07-18 Identifying stories in media content
EP13747986.1A EP3022663A1 (en) 2013-07-18 2013-07-18 Identifying stories in media content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2013/065232 WO2015007321A1 (en) 2013-07-18 2013-07-18 Identifying stories in media content

Publications (1)

Publication Number Publication Date
WO2015007321A1 true WO2015007321A1 (en) 2015-01-22

Family

ID=48979714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/065232 WO2015007321A1 (en) 2013-07-18 2013-07-18 Identifying stories in media content

Country Status (4)

Country Link
US (1) US9734408B2 (en)
EP (1) EP3022663A1 (en)
CN (1) CN105474201A (en)
WO (1) WO2015007321A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102047703B1 (en) * 2013-08-09 2019-11-22 엘지전자 주식회사 Mobile terminal and controlling method thereof
US10219029B1 (en) * 2014-03-12 2019-02-26 Google Llc Determining online content insertion points in an online publication
US20160189712A1 (en) * 2014-10-16 2016-06-30 Veritone, Inc. Engine, system and method of providing audio transcriptions for use in content resources
US10645467B2 (en) * 2015-11-05 2020-05-05 Adobe Inc. Deconstructed video units
FR3053497B1 (en) * 2016-06-29 2019-09-13 4T Sa METHOD FOR ENHANCING THE SECURITY OF A PEACE-BASED TELEVISION SYSTEM BASED ON PERIODIC PERIODIC RETRO-COMMUNICATION
US10448065B2 (en) * 2017-05-12 2019-10-15 Comcast Cable Communications, Llc Conditioning segmented content
EP3777239A4 (en) 2018-04-05 2021-12-22 Cochlear Limited Advanced hearing prosthesis recipient habilitation and/or rehabilitation
US11064252B1 (en) * 2019-05-16 2021-07-13 Dickey B. Singh Service, system, and computer-readable media for generating and distributing data- and insight-driven stories that are simultaneously playable like videos and explorable like dashboards
EP3944234A1 (en) * 2020-07-24 2022-01-26 Atos Global IT Solutions and Services Private Limited Method for processing a video file comprising audio content and visual content comprising text content

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6295092B1 (en) * 1998-07-30 2001-09-25 Cbs Corporation System for analyzing television programs
US6744922B1 (en) * 1999-01-29 2004-06-01 Sony Corporation Signal processing method and video/voice processing device
US6697821B2 (en) 2000-03-15 2004-02-24 Süccesses.com, Inc. Content development management system and method
US6580437B1 (en) * 2000-06-26 2003-06-17 Siemens Corporate Research, Inc. System for organizing videos based on closed-caption information
US20020108115A1 (en) 2000-12-11 2002-08-08 The Associated Press News and other information delivery system and method
US20030131362A1 (en) * 2002-01-09 2003-07-10 Koninklijke Philips Electronics N.V. Method and apparatus for multimodal story segmentation for linking multimedia content
US8872979B2 (en) * 2002-05-21 2014-10-28 Avaya Inc. Combined-media scene tracking for audio-video summarization
US20070245379A1 (en) * 2004-06-17 2007-10-18 Koninklijke Phillips Electronics, N.V. Personalized summaries using personality attributes
JP4305921B2 (en) 2004-11-02 2009-07-29 Kddi株式会社 Video topic splitting method
WO2006103633A1 (en) * 2005-03-31 2006-10-05 Koninklijke Philips Electronics, N.V. Synthesis of composite news stories
US20100005485A1 (en) 2005-12-19 2010-01-07 Agency For Science, Technology And Research Annotation of video footage and personalised video generation
US8503523B2 (en) * 2007-06-29 2013-08-06 Microsoft Corporation Forming a representation of a video item and use thereof
US8417037B2 (en) * 2007-07-16 2013-04-09 Alexander Bronstein Methods and systems for representation and matching of video content
US10116902B2 (en) * 2010-02-26 2018-10-30 Comcast Cable Communications, Llc Program segmentation of linear transmission
US9116995B2 (en) 2011-03-30 2015-08-25 Vcvc Iii Llc Cluster-based identification of news stories
US10134440B2 (en) * 2011-05-03 2018-11-20 Kodak Alaris Inc. Video summarization using audio and visual cues
WO2015038749A1 (en) * 2013-09-13 2015-03-19 Arris Enterprises, Inc. Content based video content segmentation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070201558A1 (en) * 2004-03-23 2007-08-30 Li-Qun Xu Method And System For Semantically Segmenting Scenes Of A Video Sequence
US20080316307A1 (en) * 2007-06-20 2008-12-25 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Automated method for temporal segmentation of a video into scenes with taking different types of transitions between frame sequences into account

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIENHART R ET AL: "Scene determination based on video and audio features", MULTIMEDIA COMPUTING AND SYSTEMS, 1999. IEEE INTERNATIONAL CONFERENCE ON FLORENCE, ITALY 7-11 JUNE 1999, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, vol. 1, 7 June 1999 (1999-06-07), pages 685 - 690, XP010342829, ISBN: 978-0-7695-0253-3, DOI: 10.1109/MMCS.1999.779282 *
SARACENO C ET AL: "Identification of story units in audio-visual sequences by joint audio and video processing", IMAGE PROCESSING, 1998. ICIP 98. PROCEEDINGS. 1998 INTERNATIONAL CONFERENCE ON CHICAGO, IL, USA 4-7 OCT. 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 1, 4 October 1998 (1998-10-04), pages 363 - 367, XP010308744, ISBN: 978-0-8186-8821-8, DOI: 10.1109/ICIP.1998.723500 *
YAO WANG ET AL: "Multimedia Content Analysis - Using Both Audio and Visual Clues", IEEE SIGNAL PROCESSING MAGAZINE, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 17, no. 6, November 2000 (2000-11-01), pages 12 - 36, XP011089877, ISSN: 1053-5888, DOI: 10.1109/79.888862 *

Also Published As

Publication number Publication date
EP3022663A1 (en) 2016-05-25
US9734408B2 (en) 2017-08-15
US20160155001A1 (en) 2016-06-02
CN105474201A (en) 2016-04-06

Similar Documents

Publication Publication Date Title
US9734408B2 (en) Identifying stories in media content
US9369780B2 (en) Methods and systems for detecting one or more advertisement breaks in a media content stream
EP2592575B1 (en) Content descriptor
US9420349B2 (en) Methods and systems for monitoring a media stream and selecting an action
CN112511854B (en) Live video highlight generation method, device, medium and equipment
KR102441927B1 (en) Method and apparatus for identifying local ad placement opportunities
Kotsakis et al. Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification
US11057457B2 (en) Television key phrase detection
US11342003B1 (en) Segmenting and classifying video content using sounds
WO2013097101A1 (en) Method and device for analysing video file
US10141010B1 (en) Automatic censoring of objectionable song lyrics in audio
US20210050015A1 (en) Systems and methods for a two pass diarization, automatic speech recognition, and transcript generation
US8249872B2 (en) Skipping radio/television program segments
US8606585B2 (en) Automatic detection of audio advertisements
US11120839B1 (en) Segmenting and classifying video content using conversation
US11922967B2 (en) System and method for podcast repetitive content detection
CN113378000B (en) Video title generation method and device
Koolagudi et al. Advertisement detection in commercial radio channels
US20220215835A1 (en) Evaluating user device activations
US20130297311A1 (en) Information processing apparatus, information processing method and information processing program
US20240020977A1 (en) System and method for multimodal video segmentation in multi-speaker scenario
US11528525B1 (en) Automated detection of repeated content within a media series
Lopez-Otero et al. MultiBIC: an improved speaker segmentation technique for TV shows.
VU et al. AUTOMATIC QUESTION EXTRACTION FROM MEETING AND DIALOG RECORDING

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase
Ref document number: 201380078901.6
Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 13747986
Country of ref document: EP
Kind code of ref document: A1
REEP Request for entry into the european phase
Ref document number: 2013747986
Country of ref document: EP
WWE Wipo information: entry into national phase
Ref document number: 14905487
Country of ref document: US
Ref document number: 2013747986
Country of ref document: EP
NENP Non-entry into the national phase
Ref country code: DE