WO2021058116A1 - Mood based multimedia content synthesis - Google Patents

Mood based multimedia content synthesis

Info

Publication number
WO2021058116A1
WO2021058116A1 (PCT application PCT/EP2019/076266)
Authority
WO
WIPO (PCT)
Prior art keywords
full length movie
mood based time
mood
Prior art date
Application number
PCT/EP2019/076266
Other languages
English (en)
Inventor
Tarik CHOWDHURY
Jian Tang
Declan O’SULLIVAN
Owen Conlan
Jeremy DEBATTISTA
Fabrizio Orlandi
Majid LATIFI
Matthew Nicholson
Islam HASSAN
Killian MCCABE
Declan MCKIBBEN
Daniel Turner
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to CN201980092247.1A priority Critical patent/CN113795882B/zh
Priority to PCT/EP2019/076266 priority patent/WO2021058116A1/fr
Publication of WO2021058116A1 publication Critical patent/WO2021058116A1/fr

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105 Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording

Definitions

  • The present invention, in some embodiments thereof, relates to generating summary videos for multimedia content, and, more specifically, but not exclusively, to generating mood based summary videos for multimedia content, specifically for full-length movies.
  • Multimedia content, for example, video content, audio content and/or the like, is constantly increasing in giant leaps, offering endless options for consuming this content.
  • Much of the multimedia content, for example, movies, television series, television shows and/or the like, may be significantly long in time duration.
  • Such applications may include, for example, supporting users in selecting multimedia to consume, categorization of the multimedia in categories and/or classes (based on genre, narrative, etc.) and/or the like.
  • One of the major challenges in creating these summary videos is to create an efficient summary video which delivers the narrative of the multimedia content, for example, plot, progress, main facts, key moments and/or the like, in a concise and coherent manner such that users (spectators) watching the summary video may accurately understand and comprehend the narrative of the multimedia content.
  • An objective of the embodiments of the disclosure is to provide a solution which mitigates or solves the drawbacks and problems of conventional solutions.
  • The above and further objectives are solved by the subject matter of the independent claims. Further advantageous embodiments can be found in the dependent claims.
  • The disclosure aims at providing a solution for creating a summary video summarizing multimedia content, in particular full-length movies, in a coherent, concise and accurate manner.
  • A method of generating a mood based summary video for a full-length movie, comprising:
  • The KG comprises an annotated model of the full-length movie generated by annotating features extracted for the full-length movie.
  • Generating a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold.
  • A system for generating a mood based summary video for a full-length movie, comprising one or more processors configured to execute a code, the code comprising:
  • Code instructions to receive a KG created for the full-length movie, the KG comprising an annotated model of the full-length movie generated by annotating features extracted for the full-length movie.
  • A computer readable storage medium comprising computer program code instructions, executable by a computer, for performing the above identified method.
  • Each of the annotated features in the KG annotated model is associated with a timestamp indicating a temporal location of the respective feature in a timeline of the full-length movie.
  • The timestamps may map each of the features expressed by the KG annotation model to its time of occurrence along the timeline of the full-length movie. Mapping the features along the timeline may be essential for accurately associating the features with their time in order to identify the plurality of mood based time intervals and apply the metrics used to compute the scores for each of the mood based time intervals, as illustrated in the sketch below.
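As an illustration only (not part of the claimed subject matter), the following Python sketch shows one way such timestamped annotated features could be represented and mapped onto interval boundaries; the AnnotatedFeature type, the locate helper and the example labels are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class AnnotatedFeature:
    """One annotated feature uplifted into the KG annotated model."""
    label: str        # e.g. "mood:joy" or "character:Alice" (illustrative labels)
    timestamp: float  # seconds from the start of the full-length movie

def locate(feature: AnnotatedFeature, intervals: list) -> int:
    """Return the index of the mood based time interval containing the feature."""
    for idx, (start, end) in enumerate(intervals):
        if start <= feature.timestamp < end:
            return idx
    raise ValueError("feature timestamp outside the movie timeline")

# Example: a feature stamped at 95 s falls into the second interval.
intervals = [(0.0, 60.0), (60.0, 180.0)]
print(locate(AnnotatedFeature("mood:joy", 95.0), intervals))  # -> 1
```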
  • Segmenting the full-length movie into the plurality of mood based time intervals is done according to the analysis of the KG, according to one or more features expressing one or more of: a background music, a semantic content of speech, a mood indicative facial expression of a character and a mood indicative gesture of a character. These features may be highly indicative of the moods expressed in the respective mood based time interval, in particular the dominant mood.
  • The plurality of metrics comprises: a number of main characters appearing during a certain mood based time interval, a duration of appearance of each main character during the certain mood based time interval and a number of actions relating to each main character during the certain mood based time interval.
  • At least some of the mood based time intervals of the subset are selected according to a score of a diversity metrics computed for each of the plurality of mood based time intervals, the diversity metrics expressing a difference of each mood based time interval compared to its adjacent mood based time intervals with respect to one or more interval attributes, each a member of a group consisting of: characters appearing in the mood based time intervals, a dominant mood of the mood based time intervals and actions seen in the mood based time intervals.
  • Selecting the mood based time intervals according to the diversity score may lead to selection of a diverse collection of the mood based time intervals which may convey an elaborate, wide and/or comprehensive scope of the narrative.
  • Using the diversity score may further serve to avoid selecting redundant mood based time intervals which may present little and/or insignificant difference compared to other selected mood based time intervals.
  • The subset of mood based time intervals is selected according to a time length defined for the mood based summary video. Adjusting the selection of the mood based time intervals according to the predefined duration (length) of the summary video may enable high flexibility in selecting the mood based time intervals which best deliver, present and/or convey the narrative within the time constraints applicable for the summary video.
  • The KG annotated model is created by automatically uplifting features extracted from one or more of: a video content of the full-length movie, an audio content of the full-length movie, a speech content of the full-length movie, one or more subtitles records associated with the full-length movie and a metadata record associated with the full-length movie.
  • The KG annotation model may be a powerful tool providing highly rich, extensive and precise information describing the full-length movie, which may be used for extracting accurate and extensive features for segmenting the full-length movie into the plurality of mood based time intervals, for computing the score for the mood based time intervals and/or the like.
  • The KG annotated model uses one or more manually annotated features extracted for the full-length movie.
  • Manual annotation may serve to enhance the KG annotated model where automated tools may be somewhat limited.
  • Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
  • For example, hardware for performing selected tasks could be implemented as a data processor, such as a computing platform for executing a plurality of instructions.
  • Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data.
  • Optionally, a network connection is provided as well.
  • A display and/or a user input device such as a keyboard or mouse are optionally provided as well.
  • FIG. 1 is a flowchart of an exemplary process of generating a mood based summary video for a full-length movie, according to some embodiments of the present invention.
  • FIG. 2 is a schematic illustration of an exemplary system for generating a mood based summary video for a full-length movie, according to some embodiments of the present invention.
  • FIG. 3 is a schematic illustration of an exemplary mood based segmentation of a full-length video, according to some embodiments of the present invention.
  • FIG. 4 is a chart of the distribution of experiment scores provided by users presented with mood based summary videos to rank their understanding of the narrative of the full-length movie, according to some embodiments of the present invention.
  • FIG. 5 presents charts of results of experiments conducted to evaluate mood based summary videos created for three full-length movies, according to some embodiments of the present invention.
  • The present invention, in some embodiments thereof, relates to generating summary videos for multimedia content, and, more specifically, but not exclusively, to generating mood based summary videos for multimedia content, specifically for full-length movies.
  • Creating summary videos to visually (as opposed to textually) summarize multimedia content, in particular full-length movies, may be highly desirable and beneficial for a plurality of applications, services, purposes and goals, for example, supporting users in selecting multimedia to consume, categorization of the multimedia in categories and/or classes (based on genre, narrative, etc.) and/or the like.
  • The video summary, which may be significantly shorter in (time) duration, should be concise and coherent while conveying (delivering) the narrative of the full-length movie, for example, plot, progress, main facts, key moments and/or the like.
  • The summary video may therefore be created by selecting and concatenating together a plurality of segments (time intervals) extracted from the full-length movie which, combined together, may have a total duration (length) of, for example, approximately 15% to 25% of the original full-length movie.
  • The time intervals should be short enough to provide sufficient granularity and localization, thus allowing selection of multiple time intervals which may reliably and accurately convey the narrative of the full-length movie.
  • At the same time, the time intervals should be sufficiently long to create a coherent summary video in which the selected time intervals logically and effectively connect to each other.
  • The mood based time intervals, which are typically several minutes long, are sufficiently short (in duration) to serve as an efficient segmentation (split) unit for segmenting full-length movies into a plurality of high resolution time intervals.
  • At the same time, the mood based time intervals are sufficiently long to convey a substantial aspect of the full-length movie's narrative in a reliable, coherent and concise manner.
  • The mood based summary video is therefore created for the full-length movie by concatenating together a subset of mood based time intervals selected from a plurality of mood based time intervals created by segmenting the full-length movie based on the mood expressed in each of the mood based time intervals. Moreover, since one or more of the mood based time intervals may express multiple moods, the full-length movie may be segmented into the plurality of mood based time intervals according to a dominant mood expressed in each of the mood based time intervals.
  • Segmenting the full-length video is done based on an analysis of a Knowledge Graph (KG) annotation model created by manually and/or automatically uplifting features extracted from the full-length video.
  • The KG annotated model, which is outside the scope of the present invention, may be created for the full-length movie by uplifting and annotating features extracted from one or more data sources relating to the full-length movie, for example, the video (visual) content, the audio content, the speech content, a subtitles record associated with the full-length movie, a metadata record associated with the full-length movie, a textual description and/or summary of the full-length movie, an actors list, a characters list and/or the like.
  • The KG annotated model may therefore provide a highly rich, extensive and precise source of information describing the full-length movie, scenes of the full-length movie and/or features extracted from the full-length movie.
  • The full-length video may be segmented into the mood based time intervals according to one or more mood indicative features extracted from the KG annotation model, for example, a background music, a semantic content of the speech, a mood indicative facial expression of a character, a mood indicative gesture of a character and/or the like.
  • The summary video may be created by concatenating together a subset of the mood based time intervals which are selected to best convey, present and/or deliver the narrative of the full-length movie.
  • A set of metrics was defined to enable evaluating a level of relevance of each mood based time interval to the narrative, specifically, to evaluate a contribution of each mood based time interval to an understanding and comprehension of the narrative by users (spectators) who watch the summary video.
  • The newly defined metrics may include relevance metrics, for example, a number of main characters appearing during a respective mood based time interval, a duration of appearance of each main character during the respective mood based time interval, a number of actions relating to each main character during the respective mood based time interval and/or the like.
  • The main characters appearing in the full-length movie may have a major correlation, contribution and/or impact on the narrative, plot and/or progress of the full-length movie, in particular compared to other characters, for example, supporting characters, side characters, extra characters and/or the like.
  • The number of main characters which appear in a mood based time interval of the full-length movie and the (time) duration of their appearance in the mood based time interval may therefore be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the mood based time interval to the narrative of the full-length movie.
  • Actions relating to the main characters (e.g. conducted by, inflicted on, involving, etc.) which are detected in the mood based time interval may also be highly indicative of the level of relevance of the respective mood based time interval to the narrative of the full-length movie.
  • A relevance score may be computed for each of the mood based time intervals. Moreover, the relevance score computed for one or more of the mood based time intervals may be based on an aggregation of the relevance scores computed according to multiple relevance metrics.
  • The defined metrics may include diversity metrics defined to express a difference between each mood based time interval and one or more of its adjacent mood based time intervals, i.e., a preceding mood based time interval and a subsequent mood based time interval.
  • The diversity metrics may be defined by one or more interval attributes relating to each mood based time interval with respect to its adjacent mood based time interval(s).
  • The diversity metrics may include, for example, a difference in the (identity of) characters appearing in a mood based time interval compared to the adjacent interval(s), a difference between the mood, specifically the dominant mood, expressed in a mood based time interval and the mood expressed in the adjacent interval(s), a difference in actions depicted in a mood based time interval and actions depicted in the adjacent interval(s) and/or the like.
  • A diversity score may be computed for each of the mood based time intervals according to the diversity metrics, typically by aggregating the diversity scores computed for each of the diversity metrics relating to one or more interval attributes.
  • The diversity score therefore expresses how different the respective mood based time interval is from its adjacent mood based time interval(s). Identifying the diversity and difference between each of the mood based time intervals and its adjacent intervals may enable selecting a diverse set of mood based time intervals encompassing a wide scope of the narrative of the full-length movie while avoiding selecting similar mood based time intervals which may be redundant.
  • The subset of mood based time intervals selected for the summary video may therefore include mood based time intervals selected according to their score, for example, an aggregation of their relevance score and diversity score. For example, each mood based time interval having a score exceeding a certain predefined threshold may be selected for the subset used to create the summary video. In another example, a certain number of mood based time intervals having the highest scores may be selected for the subset used to create the summary video.
  • The mood based time intervals and/or their number are selected for the subset constituting the summary video according to one or more timing parameters, for example, an overall duration defined for the summary video, a duration of one or more of the mood based time intervals and/or the like, as illustrated in the sketch below.
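A minimal sketch of this selection step follows, assuming precomputed per-interval scores and durations; the function name select_intervals, the threshold value and the duration budget are illustrative, not taken from the disclosure:

```python
def select_intervals(intervals, scores, durations, threshold=None, max_length=None):
    """Select mood based time intervals for the summary video.

    intervals:  interval identifiers, in timeline order
    scores:     aggregated (relevance + diversity) score per interval
    durations:  duration of each interval in seconds
    threshold:  keep every interval scoring above this value, or
    max_length: keep the highest scoring intervals fitting this time budget
    """
    if threshold is not None:
        return [iv for iv, s in zip(intervals, scores) if s > threshold]
    ranked = sorted(zip(intervals, scores, durations), key=lambda t: t[1], reverse=True)
    chosen, total = [], 0.0
    for interval, _, duration in ranked:
        if max_length is None or total + duration <= max_length:
            chosen.append(interval)
            total += duration
    chosen.sort(key=intervals.index)  # restore timeline order for coherence
    return chosen

# Three intervals, a 240 s budget: the two highest scoring intervals fit.
print(select_intervals(["S0", "S1", "S2"], [0.4, 0.9, 0.7], [60, 90, 120], max_length=240))
```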
  • The selected subset of mood based time intervals may be concatenated to produce the mood based video summary of the full-length movie, which may be output for presentation to one or more users (spectators).
  • The mood based summary videos may present major advantages and benefits compared to existing methods and systems for creating video summaries.
  • Automatically creating the summary videos may significantly reduce the effort and/or time required for manually creating the summary videos, as may be done at least partially by some of the existing methods.
  • Some of the existing methods may use one or more video inference and/or interpretation tools, algorithms and/or techniques for automatically (at least partially) creating the summary videos. These methods, however, may typically process the entire full-length movie, which may require major and even extreme computing resources (e.g. processing resources, storage resources, etc.) and/or computing time.
  • The mood based video summary, on the other hand, is based on processing limited length time intervals of the full-length movie, thus significantly reducing the computing resources and/or computing time required for creating the summary videos.
  • A major challenge in creating the summary videos is to create the summary video such that it is highly representative of the full-length movie's narrative while maintaining coherence in a significantly (predefined) shorter duration compared to the original full-length movie.
  • Some of the existing methods for automatically creating the summary videos may include in the summary video short time sections extracted from the full-length movie, typically action related sections. These short time sections may fail to accurately, logically and/or coherently deliver the narrative of the full-length movie.
  • The mood based time intervals may define efficient segmentation units which are long enough to contain substantial sections of the full-length movie to convey its narrative in a coherent manner, while sufficiently short to allow selection of a large number of time intervals presenting a diverse collection of sections (different in content and/or context) of the full-length movie, thus conveying an accurate and extensive summary of the narrative.
  • Introducing the new metrics, specifically the relevance metrics, may allow for efficient automated selection of the significant and important mood based time intervals, thus further reducing the computing resources and/or computing time required to create the summary video.
  • Applying the newly introduced diversity metrics for automatically selecting the mood based time intervals may serve to select a wide and diverse collection of the mood based time intervals which may be highly representative of the narrative of the full-length movie, in particular for complex narrative movies in which major aspects of the narrative may be distributed across many sections of the movie.
  • Using the diversity score may further serve to avoid selecting redundant mood based time intervals which may present little and/or insignificant difference compared to other selected mood based time intervals.
  • The present invention may be a system, a method, and/or a computer program product.
  • The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • A computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
  • The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • The functions noted in the blocks may occur out of the order noted in the figures.
  • Two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
  • FIG. 1 is a flowchart of an exemplary process of generating a mood based summary video for a full-length movie, according to some embodiments of the present invention.
  • An exemplary process 100 may be executed to generate a mood based summary video summarizing multimedia content, specifically a full-length movie.
  • The mood based summary video, which may be significantly shorter than the full-length movie, may consist of one or more segments of the full-length movie which are highly relevant to the overall narrative (ontology) of the full-length movie, thus reliably conveying the summary of the full-length movie.
  • The generated mood based summary video may be presented to one or more users (spectators) for one or more goals, purposes and/or applications, for example, movie selection, movie categorization and/or the like.
  • FIG. 2 is a schematic illustration of an exemplary system for generating a mood based summary video for a full-length movie, according to some embodiments of the present invention.
  • An exemplary video summarization system 200 may execute a process such as the process 100 to generate (create) a mood based summary video for one or more full-length movies.
  • The video summarization system 200 may include an I/O interface 210, a processor(s) 212 for executing the process 100 and a storage 214 for storing code (program store) and/or data.
  • The I/O interface 210 may include one or more interfaces, ports and/or interconnections for exchanging data with one or more external resources.
  • For example, the I/O interface 210 may include one or more network interfaces for connecting to one or more wired and/or wireless networks, for example, a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), a cellular network, the Internet and/or the like.
  • Via the network interface(s), the video summarization system 200 may communicate with one or more remote networked resources, for example, a device, a computer, a server, a computing node, a cluster of computing nodes, a system, a service, a storage resource, a cloud system, a cloud service, a cloud platform and/or the like.
  • The I/O interface 210 may include one or more interfaces and/or ports, for example, a Universal Serial Bus (USB) port, a serial port and/or the like, configured to connect to one or more attachable media devices, for example, a storage medium (e.g. flash drive, memory stick, etc.), a mobile device (e.g. laptop, smartphone, tablet, etc.) and/or the like.
  • The video summarization system 200 may receive, via the I/O interface 210, one or more full-length movies 250, for example, fiction movies, documentary movies, educational movies, a series comprising multiple episodes and/or the like.
  • The video summarization system 200 may further receive, via the I/O interface 210, a KG annotation model 255 created for each of the full-length movies 250.
  • The KG annotated model 255, which is outside the scope of the present invention, may be created for each full-length movie 250 by uplifting and annotating features extracted from a video content of the full-length movie 250, an audio content of the full-length movie 250, a speech content of the full-length movie 250, at least one subtitles record associated with the full-length movie 250, a metadata record associated with the full-length movie 250 and/or the like.
  • The KG annotated model 255, which may be created automatically, manually and/or in a combination thereof, may therefore provide a highly rich, extensive and precise source of information describing the full-length movie 250, scenes of the full-length movie 250 and/or features extracted from the full-length movie 250.
  • The video summarization system 200 may output, via the I/O interface 210, a video summary 260 summarizing the full-length movie 250 in a significantly shorter time compared to the full-length movie 250, for example, 15%, 20%, 25%, etc. of the length of the full-length movie 250.
  • The processor(s) 212 may include one or more processors, homogenous or heterogeneous, each comprising one or more processing nodes arranged for parallel processing, as clusters and/or as one or more multi-core processor(s).
  • The processor(s) 212 may execute one or more software (code) modules, for example, a process, an application, an agent, a utility, a tool, a script and/or the like, each comprising a plurality of program instructions stored in a non-transitory medium such as the storage 214 and executed by one or more processors such as the processor(s) 212.
  • Specifically, the processor(s) 212 may execute a video summarizer 220 implementing the process 100.
  • The video summarization system 200 may further include one or more hardware components to support execution of the video summarizer 220, for example, a circuit, an Integrated Circuit (IC), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a Graphics Processing Unit (GPU) and/or the like.
  • The video summarizer 220 may therefore be executed, utilized and/or implemented by one or more software modules, one or more of the hardware components and/or a combination thereof.
  • The storage 214 used for storing data and/or code may include one or more non-transitory memory devices, for example, a persistent non-volatile device such as a ROM, a Flash array, a hard drive, a solid state drive (SSD), a magnetic disk and/or the like.
  • The storage 214 may typically also include one or more volatile devices, for example, a Random Access Memory (RAM) device, a cache memory and/or the like.
  • Optionally, the storage 214 further comprises one or more network storage resources, for example, a storage server, a Network Attached Storage (NAS), a network drive and/or the like, accessible to the video summarizer 220 via the I/O interface 210.
  • Optionally, the video summarization system 200 and/or the video summarizer 220 are provided, executed and/or utilized, at least partially, by one or more cloud computing services, for example, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and/or the like, provided by one or more cloud infrastructures and/or services such as, for example, Amazon Web Service (AWS), Google Cloud, Microsoft Azure and/or the like.
  • The video summarization system 200 may further execute one or more applications, services and/or hosts to enable one or more users, for example, a content expert, a knowledge engineer and/or the like, to interact with the video summarizer 220 in order to access, evaluate, define, adjust and/or control the process 100 and/or part thereof.
  • The user(s) may access, for example, the video summarizer 220, the full-length movie 250, the KG annotation model 255, the summary video 260 and/or one or more temporary products of the process 100.
  • Access to the video summarization system 200 may be implemented in one or more architectures, deployments and/or methods.
  • For example, the video summarization system 200 may execute one or more host applications, web applications and/or the like providing access to the video summarizer 220 for one or more remote users.
  • The remote user(s) may use a client device(s), for example, a computer, a server, a mobile device and/or the like which executes an access application, for example, a browser, a local agent and/or the like for accessing the video summarization system 200 via one or more networks to which the video summarization system 200 is connected via the I/O interface 210.
  • In another example, one or more of the users may be local users who may access the video summarization system 200 via one or more Human Machine Interfaces (HMI) provided by the I/O interface 210, for example, a keyboard, a pointing device (e.g. mouse, trackball, etc.), a display, a touch screen and/or the like.
  • In another example, access to one or more of the video summarizer 220, the full-length movie 250, the KG annotation model 255, the summary video 260 and/or to a temporary product of the process 100 may be done via one or more databases, applications and/or interfaces.
  • For example, one or more databases, for example, a SPARQL database, may be deployed in association with the video summarization system 200, specifically with the video summarizer 220.
  • The user(s) may therefore issue one or more database queries and/or one or more update instructions to interact with the video summarizer 220 in order to evaluate, define, adjust and/or control the process 100 and/or part thereof, for example as sketched below.
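Purely as an illustration of such a query-based interaction, the following Python fragment runs a SPARQL query against a KG loaded with the rdflib library; the file name movie_kg.ttl and the ex: predicates are hypothetical, since the disclosure does not define a schema:

```python
from rdflib import Graph

g = Graph()
g.parse("movie_kg.ttl", format="turtle")  # hypothetical dump of the KG annotated model

# Hypothetical schema: fetch character features stamped inside a given time window.
query = """
PREFIX ex: <http://example.org/movie#>
SELECT ?character ?timestamp WHERE {
    ?feature ex:type "character" ;
             ex:label ?character ;
             ex:timestamp ?timestamp .
    FILTER(?timestamp >= 60 && ?timestamp < 180)
}
"""
for row in g.query(query):
    print(row.character, row.timestamp)
```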
  • The process 100 starts with the video summarizer 220 receiving a full-length movie 250.
  • The full-length movie 250, for example, a fiction movie, a documentary movie, an educational movie and/or the like, may typically comprise a significantly long video stream, for example, 90 minutes, 120 minutes, 180 minutes and/or the like.
  • The full-length movie 250 may further include a series, a mini-series and/or the like comprising a plurality of episodes.
  • The video summarizer 220 receives a KG annotation model 255 created for the full-length movie 250.
  • The KG annotated model 255, which is outside the scope of the present invention, may be created for the full-length movie 250 in order to create an enhanced feature set for the full-length movie 250, providing rich, extensive and precise information describing the full-length movie 250, one or more scenes of the full-length movie 250, a narrative of the full-length movie 250, an ontology of the full-length movie 250 and/or the like.
  • The KG annotated model 255 may be created by uplifting and annotating features extracted from the full-length movie 250 and/or from one or more data sources and/or data records associated with and/or corresponding to the full-length movie 250.
  • For example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the video (visual) content of the full-length movie 250.
  • In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the audio content of the full-length movie 250.
  • In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the speech content of the full-length movie 250.
  • In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from one or more subtitles records associated with the full-length movie 250. In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from one or more metadata records associated with the full-length movie 250.
  • The KG annotation model 255 may be created manually, automatically and/or by a combination thereof.
  • For example, one or more Natural Language Processing (NLP) methods, algorithms and/or tools may be applied to the full-length movie 250, specifically to features extracted from the full-length movie 250, in order to annotate these features, thus uplifting, enhancing and/or enriching the extracted features.
  • One or more Machine Learning (ML) models, in particular NLP ML model(s), for example, a neural network, a deep neural network, a Support Vector Machine and/or the like, may be trained using one or more training datasets comprising sample features extracted, simulated and/or manipulated for a plurality of full-length movies, optionally of the same genre as the received full-length movie 250.
  • Each of the annotated features described by the KG annotation model 255 may be associated (assigned) with a timestamp which temporally maps the respective annotated feature to a temporal location in a timeline of the full-length movie 250.
  • Based on the timestamp, the video summarizer 220 may therefore identify the temporal location of the respective annotated feature in the timeline of the full-length movie 250.
  • The video summarizer 220 may identify a single dominant mood which is estimated to be expressed with higher intensity, force, magnitude and/or the like in each of the mood based time intervals compared to other moods which may be expressed in the respective mood based time interval.
  • The video summarizer 220 may identify the plurality of mood based time intervals by analyzing one or more mood indicative annotated features described in the KG annotation model 255. As each of the annotated features is associated with a respective timestamp, the video summarizer 220 may accurately map the annotated features to their temporal locations in the timeline of the full-length movie 250 in order to set the start and end times for each of the plurality of mood based time intervals.
  • The mood indicative annotated features may include, for example, features reflecting a background music (sound track) played during one or more of the mood based time intervals. For example, dramatic music may be highly indicative of a dramatic scene which may express moods such as depression, romance and/or the like.
  • In another example, romantic music may be highly indicative of a romantic scene which may express moods such as happiness, joy, lightheartedness and/or the like.
  • In another example, rhythmic music may be highly indicative of an action scene which may express moods such as anxiety, excitement, fear and/or the like.
  • The mood indicative annotated features may further include features expressing semantic content of speech detected during one or more of the mood based time intervals, for example, key words, contextual words and/or the like.
  • For example, words expressing love and/or affection may be highly indicative of a romantic scene, which may express moods such as happiness, joy, lightheartedness and/or the like.
  • In another example, words expressing weapons, cars and/or violence may be highly indicative of an action and/or battle scene, which may express moods such as anxiety, excitement, fear and/or the like.
  • The mood indicative annotated features may further include features expressing tone, intonation and/or volume, which may be coupled with corresponding semantic content expressing features such that the semantic content may be associated with the tone, intonation and/or volume.
  • For example, key words spoken in low volume and/or whispered in a soft intonation may be highly indicative of a romantic and/or a dramatic scene, which may express moods such as sadness, depression, romance, happiness, joy, lightheartedness and/or the like.
  • In another example, key words spoken in high volume in a sharp intonation may be highly indicative of an action and/or battle scene, which may express moods such as anxiety, excitement, fear and/or the like.
  • The mood indicative annotated features may include features expressing one or more mood indicative (expressive) facial expressions of one or more characters identified in one or more of the mood based time intervals.
  • The mood indicative facial expressions may reflect, for example, anger, happiness, romance, anxiety, excitement, fear, lightheartedness, love and/or the like.
  • The mood indicative annotated features may include features expressing one or more mood indicative (expressive) gestures made by one or more characters identified in one or more of the mood based time intervals.
  • The mood indicative gestures may include, for example, hugging, kissing, fighting, running, driving and/or the like which may be highly indicative of the moods of the characters, for example, love, anxiety, excitement, fear and/or the like.
  • The video summarizer 220 may further aggregate a plurality of mood indicative annotated features in order to estimate the dominant mood expressed in each of one or more of the mood based time intervals and segment the full-length movie 250 accordingly, for example as in the sketch below.
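In rough outline, this aggregation and segmentation step could look like the following sketch, which votes over timestamped mood indicative features in fixed windows and merges windows sharing the same dominant mood; the window length, the mood labels and the voting scheme are illustrative assumptions, not the disclosed method:

```python
from collections import Counter

def dominant_mood(features, start, end):
    """Pick the mood expressed by most mood indicative features in [start, end)."""
    votes = Counter(mood for mood, ts in features if start <= ts < end)
    return votes.most_common(1)[0][0] if votes else None

def segment_by_mood(features, movie_length, window=30.0):
    """Merge fixed windows sharing the same dominant mood into time intervals."""
    intervals, t = [], 0.0
    while t < movie_length:
        mood = dominant_mood(features, t, t + window)
        if intervals and intervals[-1][2] == mood:
            # same dominant mood as the previous window: extend that interval
            intervals[-1] = (intervals[-1][0], t + window, mood)
        else:
            intervals.append((t, min(t + window, movie_length), mood))
        t += window
    return intervals

# (mood, timestamp-in-seconds) pairs for a 120 s clip.
features = [("joy", 10.0), ("joy", 40.0), ("fear", 70.0), ("fear", 95.0)]
print(segment_by_mood(features, 120.0))  # -> [(0.0, 60.0, 'joy'), (60.0, 120.0, 'fear')]
```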
  • The video summarizer 220 may compute a score, specifically a relevance score, for each of the plurality of mood based time intervals, which expresses the relevance of the respective mood based time interval to a narrative and/or ontology of the full-length movie 250.
  • The video summarizer 220 may compute the relevance score according to one or more relevance metrics specifically defined and applied for estimating the relevance of time intervals extracted from the full-length movie 250 to the narrative and/or ontology of the full-length movie 250.
  • The metrics are defined and applied to reflect a level (degree) of relevance, i.e., the level of correlation, expressiveness, agreement and/or alignment of the extracted time interval with respect to the narrative of the full-length movie 250.
  • The relevance metrics may relate to characters in the full-length movie 250 and actions relating to the characters.
  • The video summarizer 220 may therefore analyze the KG annotation model 255, specifically the annotated features of the KG annotation model 255, to identify the characters presented throughout the full-length movie 250.
  • The video summarizer 220 may identify which of these characters are main characters, i.e. leading characters playing a major part in the full-length movie 250, and which of these characters are supporting characters, side characters and/or extra characters.
  • The video summarizer 220 may further identify actions relating to each of the characters throughout the full-length movie 250, for example, actions conducted by a character, an action inflicted on a character, an action involving a character and/or the like.
  • The video summarizer 220 associates the characters, main characters and actions with the plurality of mood based time intervals according to the timestamps of the features expressing the time of appearance of the characters and/or the actions along the timeline of the full-length movie 250.
  • Thus, each of the mood based time intervals is associated with one or more characters seen during the respective mood based time interval and one or more actions relating to the character(s) during the respective mood based time interval.
  • FIG. 3 is a schematic illustration of an exemplary mood based segmentation of a full-length video, according to some embodiments of the present invention.
  • A video summarizer such as the video summarizer 220 may analyze a KG annotation model such as the KG annotation model 255 created for a full-length movie such as the full-length movie 250 comprising a plurality of scenes, to segment the full-length movie 250 into a plurality of mood based time intervals.
  • A total (overall) duration of the full-length movie 250 is T_n.
  • Based on the analysis of the KG annotation model 255, the video summarizer 220 may identify a certain scene starting at time T_{n-50} and ending at time T_{n-10}. Based on the analysis of the KG annotation model 255, the video summarizer 220 may further identify a mood based time interval expressing a dominant mood MOOD_q which starts at time T_{n-47} and ends at time T_{n-32}.
  • The video summarizer 220 may also identify, based on the analysis of the KG annotation model 255, one or more actions seen in the scene, for example, an action A_w starting at T_{n-38} and ending at T_{n-33}, an action A_x starting at T_{n-33} and ending at T_{n-28}, an action A_y starting at T_{n-28} and ending at T_{n-25} and an action A_z starting at T_{n-25} and ending at T_{n-19}.
  • The video summarizer 220 may associate each of the actions with one or more of the characters identified during the time of the actions and may further associate each of the actions with a respective mood based time interval according to the time of detection of the respective action along the timeline of the full-length movie 250, identified by the timestamp(s) assigned to the feature(s) expressing the respective action. Based on the analysis of the KG annotation model 255, the video summarizer 220 thus associates each of the mood based time intervals with one or more characters, main character(s) or other character(s), and actions relating to the identified characters. Once the association is accomplished, the relevance metrics may be applied.
  • The relevance metrics may include, for example, a number of main characters appearing during the certain mood based time interval.
  • The main characters may have a major correlation, contribution and/or impact on the narrative, plot and/or progress of the full-length movie 250.
  • The relevance, e.g. the correlation, the contribution and/or the impact, of the main characters to the full-length movie 250 may be significantly higher compared to other characters, for example, supporting characters, side characters, extra characters and/or the like depicted in the full-length movie 250.
  • The number of main characters which are depicted in a certain time interval of the full-length movie 250 may therefore be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval to the narrative of the full-length movie 250.
  • For example, time intervals showing a large number of main characters of the full-length movie 250 may be highly correlated to the narrative of the full-length movie 250, while time intervals depicting a small number of main characters, for example, one main character, may have low correlation to the narrative of the full-length movie 250.
  • Time intervals of the full-length movie in which none of the main characters appears may have low and potentially insignificant correlation, contribution and/or impact to the narrative, ontology, plot and/or progress of the full-length movie 250.
  • The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the main character metrics based on the number of main characters identified in the respective mood based time interval.
  • The video summarizer 220 may compute a main characters score ImpChar(S_i) of the respective mood based time interval S_i according to an exemplary formulation presented in equation 1 below, which indicates how many of the overall characters depicted in the respective mood based time interval are main characters. Equation 1:

ImpChar(S_i) = |{C_n ∈ S_i : Main(C_n)}| / |{C_n ∈ S_i}|
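Read this way, equation 1 can be sketched in a few lines of Python (a hedged illustration, with plain sets standing in for the characters identified in S_i):

```python
def imp_char(characters: set, main_characters: set) -> float:
    """Equation 1: fraction of the characters seen in interval S_i that are main."""
    if not characters:
        return 0.0
    return len(characters & main_characters) / len(characters)

# Interval shows Alice, Bob and an extra; Alice and Bob are main characters.
print(imp_char({"Alice", "Bob", "Extra"}, {"Alice", "Bob"}))  # -> 0.666...
```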
  • The relevance metrics may further include a time (duration) of appearance of each of the main characters depicted in a time interval. Since the main characters may have a major relevance to the narrative, plot and/or progress of the full-length movie 250, the time of appearance of these main characters during a certain time interval may also be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval with respect to the narrative of the full-length movie 250.
  • This metric may be computed to express, for example, the time duration of appearance of each of the main characters seen in the respective mood based time interval in relation to (e.g. as a fraction of) the total (overall) time duration of the respective mood based time interval.
  • The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the main character time duration metrics for one or more of the main characters appearing in the respective mood based time interval. Moreover, in case a plurality of main characters appear in the respective mood based time interval, the video summarizer 220 may further compute and/or adjust the relevance score by aggregating the main character time duration metrics for the plurality of main characters shown in the respective mood based time interval.
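One plausible formulation of this metric, with the aggregation over several main characters assumed to be a capped sum (the disclosure does not fix the aggregation), is sketched below:

```python
def main_char_duration(appearances: dict, interval_length: float,
                       main_characters: set) -> float:
    """Fraction of the interval during which main characters are on screen.

    appearances: character name -> seconds of on-screen time in the interval
    """
    total = sum(t for name, t in appearances.items() if name in main_characters)
    return min(total / interval_length, 1.0)  # assumed cap at 1.0

# Alice on screen for 50 s and Bob for 30 s of a 120 s interval -> 80/120.
print(main_char_duration({"Alice": 50.0, "Bob": 30.0, "Extra": 90.0}, 120.0,
                         {"Alice", "Bob"}))  # -> 0.666...
```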
  • Another relevance metric which may be applied to compute the relevance score of one or more of the mood based time intervals is a number of actions relating to each main character (designated important actions hereinafter) detected during the respective mood based time interval, for example, actions conducted by a main character, actions inflicted on a main character, actions involving a main character and/or the like.
  • The important actions relating to these main characters may also have major relevance to the narrative of the full-length movie 250.
  • The relevance, e.g. the correlation, the contribution and/or the impact, of the important actions relating to the main characters may be significantly higher compared to actions relating to the other characters (i.e. conducted by, inflicted on, involving).
  • The number of important actions identified in a certain time interval of the full-length movie 250 as relating to one or more of the main characters seen in the certain time interval may therefore be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval with respect to the narrative of the full-length movie 250.
  • The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the important actions metrics based on the number of actions identified for each main character identified in the respective mood based time interval.
  • The video summarizer 220 may compute an important actions score ImpAct(S_i) of the respective mood based time interval S_i according to an exemplary formulation presented in equation 2 below, which indicates how many of the overall actions detected in the respective mood based time interval relate to main characters. Equation 2:

ImpAct(S_i) = |Actions_Main(S_i)| / |Actions(S_i)|

  • where C_n designates a character identified in the respective mood based time interval S_i,
  • Main(C_n) designates a main character identified in the respective mood based time interval S_i,
  • Actions(S_i) designates the actions relating to all the characters C_n identified in the respective mood based time interval S_i and Actions_Main(S_i) designates the important actions relating to the main characters identified in the respective mood based time interval S_i.
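Equation 2 can likewise be sketched directly (a hedged illustration; actions are represented as (action, character) pairs for simplicity):

```python
def imp_act(actions: list, main_characters: set) -> float:
    """Equation 2: fraction of the actions in interval S_i relating to main characters."""
    if not actions:
        return 0.0
    important = sum(1 for _, character in actions if character in main_characters)
    return important / len(actions)

# Three actions, two of which relate to main character Alice.
actions = [("fight", "Alice"), ("drive", "Alice"), ("run", "Extra")]
print(imp_act(actions, {"Alice"}))  # -> 0.666...
```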
  • Optionally, the video summarizer 220 may compute and/or adjust the relevance score of one or more of the mood based time intervals of the full-length movie 250 by aggregating the relevance scores computed for the respective mood based time interval according to multiple metrics selected from the main character metrics, the main character time duration metrics and the important actions metrics.
  • The video summarizer 220 may compute a diversity score for one or more of the mood based time intervals expressing a difference between the respective mood based time interval and one or more of its adjacent mood based time intervals, i.e., a preceding mood based time interval and a subsequent mood based time interval.
  • The diversity score therefore expresses how different the respective mood based time interval is from its adjacent mood based time interval(s). Identifying the diversity and difference between each of the mood based time intervals and its adjacent intervals may enable selecting a diverse set of mood based time intervals encompassing a wide scope of the narrative of the full-length movie 250 while avoiding selecting similar mood based time intervals, which may be redundant.
  • The video summarizer 220 computes the diversity score according to diversity metrics defined by one or more interval attributes of the respective mood based time interval and its adjacent mood based time interval(s).
  • The interval attributes may include, for example, a character attribute reflecting the (identity of the) characters appearing in the mood based time interval, a mood attribute reflecting the moods expressed in the mood based time interval, an action attribute reflecting the actions depicted in the mood based time interval and/or the like.
  • The diversity metrics may thus express the difference between the respective mood based time interval and its adjacent mood based time interval(s) with respect to the respective interval attribute.
  • a first diversity metric may express the difference between the characters appearing in the respective mood based time interval and those appearing in the adjacent mood based time interval(s).
  • a second diversity metric may express the difference between the mood, specifically the dominant mood expressed in the respective mood based time interval and the mood(s) expressed in the adjacent mood based time interval(s).
  • a third diversity metric may express the difference between actions depicted in the respective mood based time interval and actions depicted in the adjacent mood based time interval(s).
  • the video summarizer 220 may compute a partial diversity score for each of the interval attributes between two consecutive mood based time intervals.
  • the partial diversity score may be computed as an intersection of the respective interval attribute as identified in the two subsequent mood based time intervals S_i and S_i+1 over a union of the respective interval attribute as identified in the two subsequent mood based time intervals, as presented in equation 3 below, where X stands for the respective interval attribute:

d_X(S_(i,i+1)) = |(X ∩ S_i) ∩ (X ∩ S_i+1)| / |(X ∩ S_i) ∪ (X ∩ S_i+1)|, X ∈ {C, M, A}     (equation 3)

  • C ∩ S_i is the set of characters appearing in the mood based time interval S_i and C ∩ S_i+1 is the set of characters appearing in the mood based time interval S_i+1.
  • M ∩ S_i is the set of moods expressed in the mood based time interval S_i and M ∩ S_i+1 is the set of moods expressed in the mood based time interval S_i+1, and A ∩ S_i is the set of actions seen in the mood based time interval S_i and A ∩ S_i+1 is the set of actions seen in the mood based time interval S_i+1.
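  • a minimal Python sketch of this per-attribute computation follows; representing each interval attribute as a plain set is an assumption made for the example.

    def partial_diversity(attr_i, attr_j):
        """Exemplary partial diversity score for one interval attribute
        (equation 3): intersection over union of the attribute sets of two
        consecutive mood based time intervals S_i and S_i+1."""
        set_i, set_j = set(attr_i), set(attr_j)
        union = set_i | set_j
        return len(set_i & set_j) / len(union) if union else 0.0

  • for example, partial_diversity({"Bond", "Dryden"}, {"Bond", "M"}) yields 1/3 for the character attribute. Note that this ratio grows with similarity, so an embodiment treating the partial score as a dissimilarity may use one minus this ratio instead.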
  • the video summarizer 220 may compute a relative diversity score for each two consecutive mood based time intervals by aggregating two or more of the partial diversity scores computed separately for each of the interval attributes, for example according to the exemplary formulation presented in equation 4 below, which averages the partial diversity scores of the three interval attributes:

d(S_(i,i+1)) = (d_C(S_(i,i+1)) + d_M(S_(i,i+1)) + d_A(S_(i,i+1))) / 3     (equation 4)
  • the video summarizer 220 may then compute the diversity score Div(S_i) for each of the mood based time intervals S_i based on the relative diversity scores computed for every two consecutive mood based time intervals, for example according to the exemplary formulation presented in equation 5 below, which averages the relative diversity scores of the respective mood based time interval with its preceding and its subsequent interval:

Div(S_i) = (d(S_(i-1,i)) + d(S_(i,i+1))) / 2     (equation 5)

  • since the first and the last mood based time intervals each have only a single adjacent interval, the video summarizer 220 may set the diversity score Div(S_i) for these two mood based time intervals to equal the relative diversity score computed for these two mood based time intervals, i.e., d(S_(0,1)) for the first mood based time interval and d(S_(n-1,n)) for the last mood based time interval, assuming the full-length movie 250 is segmented into mood based time intervals S_0 through S_n.
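  • a minimal Python sketch of this aggregation, reusing the partial_diversity helper above and the same hypothetical attribute keys, may look as follows; taking equation 4 as the plain mean of the partial scores is an assumption of the example.

    def diversity_scores(intervals):
        """Exemplary diversity scores Div(S_i): the relative diversity of
        each pair of consecutive intervals (equation 4) is averaged over an
        interval's two neighbouring pairs (equation 5); the first and last
        intervals keep their single relative score."""
        if len(intervals) < 2:
            return [0.0] * len(intervals)
        # relative diversity d(S_(i,i+1)) of every pair of consecutive intervals
        d = []
        for a, b in zip(intervals, intervals[1:]):
            parts = [partial_diversity(a[k], b[k])
                     for k in ("characters", "moods", "actions")]
            d.append(sum(parts) / len(parts))
        div = [d[0]]                             # first interval: d(S_(0,1))
        for i in range(1, len(intervals) - 1):
            div.append((d[i - 1] + d[i]) / 2)    # interior intervals: equation 5
        div.append(d[-1])                        # last interval: d(S_(n-1,n))
        return div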
  • the video summarizer 220 may generate a mood based summary video for the full-length movie 250 which aims to summarize the full-length movie 250 and present the narrative, plot and/or progress of the full-length movie 250 in a significantly shorter time duration compared to the time duration of the full-length movie 250, for example, 15%, 20%, 25% and/or the like.
  • the video summarizer 220 may generate the mood based summary video by concatenating together a subset of mood based time intervals selected from the plurality of mood based time intervals according to a score computed for each of the mood based time intervals.
  • the score computed by the video summarizer 220 for each of the mood based time intervals includes the relevance score computed for the respective mood based time interval and optionally the diversity score computed for the respective mood based time interval.
  • the score computed by the video summarizer 220 for each of the mood based time intervals may therefore express an aggregated score, for example, a weighted average of the scores computed according to the metrics described herein before.
  • the weighted average score is computed based on the relevance score(s) computed according to the relevance metrics, optionally combined with the diversity score computed based on the diversity metrics derived from the interval attributes.
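  • as an illustration only, such a weighted aggregation may be sketched in Python as follows; the metric names used as dictionary keys are hypothetical.

    def interval_score(relevance, diversity, weights):
        """Exemplary aggregated interval score: a weighted average of the
        per-metric relevance scores, optionally combined with the diversity
        score. The metric names are illustrative assumptions."""
        components = dict(relevance)             # per-metric relevance scores
        if diversity is not None:
            components["diversity"] = diversity
        total_weight = sum(weights[name] for name in components)
        return sum(weights[name] * value
                   for name, value in components.items()) / total_weight

  • for example, interval_score({"main_characters": 0.7, "screen_time": 0.5, "important_actions": 0.8}, 0.4, {"main_characters": 1.0, "screen_time": 0.5, "important_actions": 1.0, "diversity": 0.5}) combines three relevance metrics with the diversity score under illustrative weights.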
  • the video summarizer 220 may apply one or more methods, techniques and/or implementation modes for selecting the subset of mood based time intervals used to generate the mood based summary video.
  • the video summarizer 220 may select all the mood based time intervals having a score exceeding a certain predefined threshold value. In another example, the video summarizer 220 may select a predefined number of mood based time intervals having the highest score.
  • the video summarizer 220 may select the subset of mood based time intervals according to a duration time predefined for the mood based summary video to be created for the full-length movie 250. For example, assuming a certain time duration is predefined for the mood based summary video, the video summarizer 220 may select a certain number of the highest scoring mood based time intervals which have a combined (time) duration which is less than or equal to the predefined certain time duration.
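  • a minimal Python sketch of such a duration-constrained selection follows; the greedy highest-score-first policy and the duration field are assumptions of the example rather than the only possible implementation.

    def select_intervals(intervals, scores, max_duration):
        """Exemplary greedy selection of the subset used for the summary:
        walk the intervals from highest to lowest score, keep those that
        still fit within the predefined summary duration, then restore the
        chronological order."""
        order = sorted(range(len(intervals)),
                       key=lambda i: scores[i], reverse=True)
        chosen, total = [], 0.0
        for i in order:
            duration = intervals[i]["duration"]  # interval length, e.g. in seconds
            if total + duration <= max_duration:
                chosen.append(i)
                total += duration
        return [intervals[i] for i in sorted(chosen)]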
  • the resulting mood based summary video is therefore a sequence of the mood based time intervals selected according to the scores assigned by the metrics described herein before.
  • the mood based time intervals are shorter than scenes and longer than actions thus providing a convenient and efficient unit for creating the mood based summary video which may accurately, concisely and coherently represent the narrative, plot and progress of the full-length movie 250 in a significantly shorter time period, for example, 20 to 30 minutes.
  • the video summarizer 220 outputs the mood based summary video which may be used by one or more users for one or more purposes, objectives, goals and/or applications. For example, one or more users may watch the mood based summary video in order to determine whether they wish to watch the full-length movie 250. In another example, one or more users may watch the mood based summary video in order to categorize the full-length movie 250 in one or more categories, libraries and/or the like.
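  • for illustration, once the selected intervals have been cut into individual clip files, they may be concatenated with a short Python wrapper around ffmpeg's concat demuxer; the existence of per-interval clip files sharing the same codecs is an assumption of this sketch.

    import os
    import subprocess
    import tempfile

    def concatenate_clips(clip_paths, output_path="summary.mp4"):
        """Exemplary concatenation of the selected interval clips into the
        mood based summary video; streams are copied without re-encoding."""
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            for path in clip_paths:
                f.write("file '{}'\n".format(os.path.abspath(path)))
            list_file = f.name
        try:
            subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                            "-i", list_file, "-c", "copy", output_path],
                           check=True)
        finally:
            os.unlink(list_file)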
  • the video summarizer 220 may adjust one or more of the weights assigned to the relevance score, to the diversity score and/or to any of their components to adjust (reduce or increase) the contribution and/or the relevance of one or more of the metrics and hence of the score computed according to these metrics.
  • the video summarizer 220 may further apply one or more ML models trained over a large dataset comprising a plurality of full-length movies such as the full-length movie 250, each associated with a respective KG annotation model such as the KG annotation model 255, to adjust the weights assigned to each of the metrics.
  • the training datasets may be labeled with feedback scores assigned by users who viewed mood based summary videos created for the plurality of full-length movies.
  • the feedback scores may reflect the understanding and/or conception of the narrative, ontology and/or plot of a certain full-length movie based on watching the respective mood based summary video.
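  • as an illustration only, a simple least squares fit may map per-metric scores of previously generated summary videos to the collected feedback scores; the embodiments above may instead train a dedicated ML model, and the matrix layout here is an assumption of the example.

    import numpy as np

    def fit_metric_weights(metric_scores, feedback_scores):
        """Exemplary weight adjustment from user feedback: ordinary least
        squares mapping per-metric summary scores to feedback scores."""
        X = np.asarray(metric_scores, dtype=float)    # one row per summary video,
                                                      # one column per metric
        y = np.asarray(feedback_scores, dtype=float)  # feedback score per video
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.clip(weights, 0.0, None)            # crude non-negativity guard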
  • the computing hardware used for the experiments is selected to support the intensive computer vision processing required by the video summarizer 220 executing the process 100.
  • the computing hardware is based on a workstation comprising two Intel® Xeon® E5-2600 CPUs having 6 cores each and 15MB SmartCache operating at 2.00GHz, supported by 24GB DRAM and an nVIDIA® GeForce GTX 1080 Xtreme D5X GPU with 8GB DRAM.
  • the video summarizer 220 was implemented in Python and JavaScript programming languages and makes use of source code provided on GitHub.
  • the purpose of this evaluation aspect is to estimate the quality of the mood based summary videos, in particular to measure the user understanding after watching a mood based summary video.
  • This aspect should refer to the amount of information that a user obtains from watching the mood based summary video and, if possible, it should not reflect the subjective view of the user on the quality of the mood based summary video.
  • the purpose of this evaluation aspect is to determine the extent to which the automatically generated mood based summary videos include key events of their respective full-length movies through a comparison with human authored text summaries. These text summaries may be obtained, for example, through movie plots described on Wikipedia, IMDb, TV review websites and/or the like.
  • the movies are selected to represent a range of complexity of movie narratives (plots, characters, structure, turning points, etc.) where “Mission Impossible” may be relatively simple, “The Girl with the Dragon Tattoo” may be more complicated and “Casino Royale” may be the most complex.
  • a plurality of users who are not familiar with the three evaluated full-length movies and have not seen them before were presented with three mood based summary videos created for the evaluated full-length movies.
  • the users then filled in a questionnaire comprising questions aiming to determine the level of the users’ understanding of the narratives of the full-length movies.
  • the questionnaire was constructed according to a Likert scale as known in the art, with a scale of 1 to 7 where 1 indicates a very low understanding level and 7 indicates a very high understanding level.
  • FIG. 4 is a chart graph of distribution of experiment scores provided by users presented with mood based summary video to rank their understanding of the narrative of the full-length movie, according to some embodiments of the present invention.
  • a chart graph 400 presents an averaged score accumulated for a plurality of users ranking their understanding of the three full-length movies based on watching the mood based summary videos created for these three full-length movies. As expected, the level of understanding expressed by the users for the three full-length movies matches the complexity of the narrative of these full-length movies.
  • for example, one of the key facts extracted from the textual summary of “Casino Royale” reads: MI6 agent James Bond gains his license to kill and status as a 00 agent by assassinating the traitorous MI6 section chief Dryden at the British Embassy in Prague.
  • the evaluators had the task of checking how many of these facts extracted from textual summaries were presented in the respective mood based summary video.
  • the evaluators were presented with a list of facts extracted from the textual summaries and, after watching the mood based summary videos, they had to mark the facts that are included in the respective mood based summary videos, either fully or partially.
  • manually generated textual summaries authored by humans are available on well-known websites such as Wikipedia or IMDB, specific community-based online wikis and/or the like.
  • the textual summaries obtained from Wikipedia plots and IMDB were used as they both enforce strict guidelines on their authors and offer summaries that are quite similar in structure and granularity.
  • This evaluation strategy has the advantage that it does not require questionnaires or user-based evaluations, which may be subjective, and it provides an accurate and concrete estimate of the validity of the generated mood based summary videos. Simply counting the key facts that are included in the mood based summary videos provides a reliable estimation of the selection of the mood based time intervals and of the summarization method as described in the process 100.
  • three different summarization strategies were used for generating the mood based summary videos, each employing different metrics for computing the score of the mood based time intervals, specifically, presence of main actors, moods, types of activities and/or the like.
  • a similar total duration (length) of approximately 30 minutes was set for all mood based summary videos generated according to the different summarization strategies, such that a similar compression rate was applied to the mood based summary videos of the different strategies.
  • the duration (length) of each of the evaluated full-length movies is over 2 hours, such that the duration of the mood based summary videos is ~25% of the duration of the respective full-length movies.
  • Table 1 presents the results of the experiment conducted by the seven evaluators for the three evaluated full-length movies. It should be noted that while the experiments are conducted on a small scale and may thus lack statistical significance, these experiments may provide insights on the developed summarization algorithms.
  • Table 1

  • the first evaluation aspect of user understanding is presented in the “understand” row for each of the full-length movies and expresses the understanding of the respective evaluator of the respective full-length movie after watching the respective summary video on the scale of 1-7. As evident from table 1, the evaluators produced consistent results showing substantial agreement and similar values. The standard deviations for the percentages reported in table 1 range from ±4.1% to ±8.9%. However, as stated herein before, with this limited number of evaluators, deriving statistically significant values may be highly limited.
  • the second evaluation aspect of alignment with the textual summaries is reflected in the “%present”, “%partial” and “%missing” rows in table 1.
  • the “%present” expresses the percentage of facts found by the respective evaluator to be aligned (match) between the respective summary video and its corresponding text summary.
  • the “%partial” expresses the percentage of facts found by the respective evaluator to be partially aligned (match) between the respective summary video and its corresponding text summary.
  • the “%missing” expresses the percentage of facts found by the respective evaluator to be missing in the respective summary video compared to its corresponding text summary.
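  • for illustration, these three percentage rows may be tallied from the evaluators’ per-fact marks with a short Python helper; the 'present'/'partial'/'missing' labels mirror the rows of table 1 and the list input is an assumption of the example.

    from collections import Counter

    def fact_alignment(marks):
        """Exemplary tally of the evaluators' fact marks: one entry per
        extracted key fact, each 'present', 'partial' or 'missing'."""
        labels = ("present", "partial", "missing")
        if not marks:
            return {label: 0.0 for label in labels}
        counts = Counter(marks)
        return {label: 100.0 * counts[label] / len(marks)
                for label in labels}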
  • the average percentages (right-most column) of missing key facts range between 53% and 59% for the mood based summary videos created for the three evaluated full-length movies. Consequently, 41%-47% of the key facts were included, either partially or completely, in the mood based summary videos created for the three evaluated full-length movies. This may be a notably good result, especially considering that, based on a rough estimation, a mood based summary video which includes all the facts of a respective full-length movie, for example, “Mission Impossible”, may be approximately an hour long. Therefore, achieving relatively high compliance with the respective text summaries using a mood based summary video which is half that time (~30 minutes) may be a major improvement.
  • FIG. 5 presents graph charts of experiment results conducted to evaluate mood based summary videos created for three full-length movies, according to some embodiments of the present invention.
  • a pie graph chart 502 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “Mission Impossible” full-length movie.
  • a pie graph chart 504 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “Casino Royale” full-length movie.
  • a pie graph chart 506 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “The Girl with The Dragon Tattoo” full-length movie.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • the phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed are a method and systems for generating a mood based highlights video for a full-length movie, comprising: receiving a full-length movie; receiving an annotated knowledge graph (KG) model of the full-length movie generated by annotating features extracted for the full-length movie; segmenting the full-length movie into a plurality of mood based time intervals, each expressing a certain dominant mood based on an analysis of the KG; computing a score for each of the plurality of mood based time intervals according to one or more of a plurality of metrics expressing a relevance level of the respective mood based time interval with respect to a narrative of the full-length movie; generating a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold; and outputting the mood based summary video for presentation to at least one user.
PCT/EP2019/076266 2019-09-27 2019-09-27 Synthèse de contenu multimédia basé sur l'humeur WO2021058116A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980092247.1A CN113795882B (zh) 2019-09-27 2019-09-27 基于情绪的多媒体内容概括
PCT/EP2019/076266 WO2021058116A1 (fr) 2019-09-27 2019-09-27 Synthèse de contenu multimédia basé sur l'humeur

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/076266 WO2021058116A1 (fr) 2019-09-27 2019-09-27 Synthèse de contenu multimédia basé sur l'humeur

Publications (1)

Publication Number Publication Date
WO2021058116A1 true WO2021058116A1 (fr) 2021-04-01

Family

ID=68084846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/076266 WO2021058116A1 (fr) 2019-09-27 2019-09-27 Synthèse de contenu multimédia basé sur l'humeur

Country Status (2)

Country Link
CN (1) CN113795882B (fr)
WO (1) WO2021058116A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117807995B (zh) * 2024-02-29 2024-06-04 浪潮电子信息产业股份有限公司 一种情绪引导的摘要生成方法、系统、装置及介质

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
BR9305594A (pt) * 1992-12-17 1995-03-01 Samsung Electronics Co Ltd Meio para gravação de disco, processo e aparelho para reprodução de tal gravação e aparelho para acompanhamento da reprodução por vídeo
JP4345190B2 (ja) * 2000-03-30 2009-10-14 ソニー株式会社 磁気テープ記録装置および方法
EP1738368A1 (fr) * 2004-04-15 2007-01-03 Koninklijke Philips Electronics N.V. Procede de production d'un article de contenu multimedia ayant une influence emotionnelle specifique sur un utilisateur
US20140298364A1 (en) * 2013-03-26 2014-10-02 Rawllin International Inc. Recommendations for media content based on emotion
US20150243325A1 (en) * 2014-02-24 2015-08-27 Lyve Minds, Inc. Automatic generation of compilation videos
CN104065977B (zh) * 2014-06-06 2018-05-15 北京音之邦文化科技有限公司 音/视频文件的处理方法及装置
US20160014482A1 (en) * 2014-07-14 2016-01-14 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
US10169659B1 (en) * 2015-09-24 2019-01-01 Amazon Technologies, Inc. Video summarization using selected characteristics
US9721165B1 (en) * 2015-11-13 2017-08-01 Amazon Technologies, Inc. Video microsummarization
WO2018081751A1 (fr) * 2016-10-28 2018-05-03 Vilynx, Inc. Système et procédé d'étiquetage vidéo
US10192584B1 (en) * 2017-07-23 2019-01-29 International Business Machines Corporation Cognitive dynamic video summarization using cognitive analysis enriched feature set
CN107948732B (zh) * 2017-12-04 2020-12-01 京东方科技集团股份有限公司 视频的播放方法、视频播放装置及系统
CN109408672B (zh) * 2018-12-14 2020-09-29 北京百度网讯科技有限公司 一种文章生成方法、装置、服务器及存储介质
CN110166650B (zh) * 2019-04-29 2022-08-23 北京百度网讯科技有限公司 视频集的生成方法及装置、计算机设备与可读介质

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US20080187231A1 (en) * 2005-03-10 2008-08-07 Koninklijke Philips Electronics, N.V. Summarization of Audio and/or Visual Data
US20150139606A1 (en) * 2013-11-15 2015-05-21 Lg Electronics Inc. Video display device and method for operating the same
US20160211001A1 (en) * 2015-01-20 2016-07-21 Samsung Electronics Co., Ltd. Apparatus and method for editing content
US20190034428A1 (en) * 2016-03-15 2019-01-31 Telefonaktiebolaget Lm Ericsson (Publ) Associating metadata with a multimedia file

Non-Patent Citations (1)

Title
HANJALIC A ET AL: "Affective Video Content Representation and Modeling", IEEE TRANSACTIONS ON MULTIMEDIA, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 7, no. 1, 1 February 2005 (2005-02-01), pages 143 - 154, XP011125470, ISSN: 1520-9210, DOI: 10.1109/TMM.2004.840618 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
US20220012500A1 (en) * 2020-07-09 2022-01-13 Samsung Electronics Co., Ltd. Device and method for generating summary video
US11847827B2 (en) * 2020-07-09 2023-12-19 Samsung Electronics Co., Ltd. Device and method for generating summary video

Also Published As

Publication number Publication date
CN113795882B (zh) 2022-11-25
CN113795882A (zh) 2021-12-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19779479; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19779479; Country of ref document: EP; Kind code of ref document: A1)