WO2016157150A1 - Method and system for automatic generation of video from non-visual information - Google Patents

Method and system for automatic generation of video from non-visual information

Info

Publication number
WO2016157150A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
fact
elements
audio
entities
Prior art date
Application number
PCT/IB2016/051881
Other languages
French (fr)
Inventor
Francesco PICCOLOMINI NALDI BANDINI
Ludovico Giangiuseppe MARINI
Marco MIGLIACCIO
Alain FRANZONI
Original Assignee
Littlesea S.R.L.
Priority date
Filing date
Publication date
Application filed by Littlesea S.R.L.
Publication of WO2016157150A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G06F16/4387 Presentation of query results by the use of playlists
    • G06F16/4393 Multimedia presentations, e.g. slide shows, multimedia albums

Abstract

The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.

Description

METHOD AND SYSTEM FOR AUTOMATIC GENERATION OF VIDEO FROM
NON-VISUAL INFORMATION
Field of the Invention
The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.
Background of the Invention
In recent years the use of video and visual animation has become increasingly popular due to the development of technologies and systems in the area of computer science. In particular, visual representation of information is becoming more and more common, as it is easier to grasp and more concise than reading text.
Summary of the invention
The present invention provides a method for converting an original set of textual information into video form that retains the semantics and the coherence of the textual information, based on a user or system-defined storyboard, a filtering engine and a rendering engine.
According to the invention, the textual information coming from multiple information sources is selected, filtered and analyzed by a text-to-visual elements selector and mapper. The information relating to the selected video element is fed to an audio module that selects a corresponding audio element. Audio and video elements selected from one or more object libraries are organized by a graph generator into a graph whose arcs can represent both temporal and spatial information, on the basis of a system- or user-defined storyboard. Said graph is analyzed by a coherence module that corrects potential incoherencies of the audio/video. The corrected spatial-temporal graph is fed to a rendering engine that generates the corresponding video.
Detailed description of the invention
The present invention is a method for converting text to video and audio, comprising the following steps: the first step of mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements; the second step of generating a storyboard template; the third step of selecting a corresponding audio element for said video element; the fourth step of selecting the properties of every video and audio element; the fifth step of generating a spatio-temporal graph representing the video and audio elements in space and in time; the sixth step of selecting the parameters of global coherence of the video; the seventh step of analyzing the spatio-temporal graph and identifying potential inconsistencies; the eighth step of modifying the graph to correct said inconsistencies; and the ninth step of automatically assembling said video and corresponding audio as a configuration file.
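By way of non-limiting illustration only, the nine steps can be compressed into the minimal sketch below; every data structure, function name and numeric default in it is an assumption introduced here for readability and is not part of the disclosed implementation.

```python
# Minimal, runnable sketch of the nine-step flow; all names and defaults are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                      # "visual" or "audio"
    content: str
    props: dict = field(default_factory=dict)

def text_to_video(facts: list[str], storyboard: dict) -> dict:
    visuals = [Element("visual", f) for f in facts]                          # step 1: map text to visual elements
    template = {"scenes": storyboard.get("scenes", ["intro", "body"])}       # step 2: storyboard template
    audio = [Element("audio", "narration: " + v.content) for v in visuals]   # step 3: audio element per visual
    for e in visuals + audio:
        e.props = {"duration": 3.0}                                          # step 4: element properties
    graph = {"nodes": visuals + audio, "arcs": []}                           # step 5: spatio-temporal graph
    coherence = {"max_duration": storyboard.get("max_duration", 60)}         # step 6: global coherence parameters
    total = sum(n.props["duration"] for n in graph["nodes"])                 # step 7: detect an inconsistency
    if total > coherence["max_duration"]:                                    # step 8: correct the graph
        scale = coherence["max_duration"] / total
        for n in graph["nodes"]:
            n.props["duration"] *= scale
    return {"template": template,                                            # step 9: configuration file
            "timeline": [vars(n) for n in graph["nodes"]]}

print(text_to_video(["Revenue grew 12% in Q1"], {"max_duration": 30}))
```

Each commented line corresponds to one of the nine steps; in the embodiments described below, the steps are carried out by the dedicated modules 100-700.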
The information filtered from a text may include financial information and data, financial rating information, as well as statistical data.
The structuring of the text is performed according to the criteria of content categorization, extraction and mapping of entities and data elements, relations between entities, and relevance.
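By way of illustration, a minimal sketch of these four criteria applied to a single sentence is given below; the regular expressions, the category test and the relevance score are assumptions made here for the example and are not the criteria disclosed by the invention.

```python
# Illustrative structuring of a sentence; regexes, category test and score are assumptions.
import re

def structure_text(sentence: str) -> dict:
    data = re.findall(r"-?\d+(?:\.\d+)?%?", sentence)             # data elements
    entities = re.findall(r"\b[A-Z][a-zA-Z]+\b", sentence)        # naive entity extraction
    category = "financial" if "%" in sentence else "general"      # content categorization
    relations = [(e, d) for e in entities for d in data]          # relations between entities and data
    relevance = len(entities) + len(data)                         # crude relevance score
    return {"category": category, "entities": entities, "data": data,
            "relations": relations, "relevance": relevance}

print(structure_text("Acme revenue grew 12% while Beta fell 3%"))
```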
The storyboard generation is performed according to the criteria of content categorization, extraction and mapping of entities and data elements, and relations between entities.
Said visual element characteristics are selected from among the size, the colors and fonts, the look and feel, and the type of visual representation. The element is selected based on the following rules: rules for different types of content, as decided in the storyboard; priority for entity and element type; variety of entity and element types; and timing rules.
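A possible, purely illustrative reading of these selection rules is sketched below; the rule table, the priority order and the variety constraint are hypothetical and do not reproduce the actual rule set of the storyboard.

```python
# Hypothetical selection rules: content-type rules, priority order, variety and timing.
RULES = {
    "financial": {"priority": ["bar_chart", "line_chart", "icon"], "min_duration": 4.0},
    "general":   {"priority": ["icon", "photo"],                   "min_duration": 2.0},
}

def select_visual(category: str, already_used: list[str]) -> dict:
    rule = RULES.get(category, RULES["general"])
    chosen = rule["priority"][0]
    for candidate in rule["priority"]:                 # priority for element type
        if not already_used or candidate != already_used[-1]:
            chosen = candidate                         # variety: avoid repeating the last type
            break
    return {"type": chosen, "duration": rule["min_duration"]}   # timing rule

print(select_visual("financial", already_used=["bar_chart"]))
```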
The selection of the video coherence parameters is based on rules for the different types of content, as decided in the storyboard; the priority in time and in space for entity and element type; the variety of entity and element types; and the timing rules.
The selection of the audio characteristics is made in accordance with the video properties and may include narration, sound effects, a music track, or any combination of the above.
The configuration file is rendered into video on the fly upon a user's request.
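This on-the-fly behaviour can be illustrated with the following sketch, in which render_video is a hypothetical stand-in for the rendering engine and the caching policy is an assumption of the sketch only.

```python
# Sketch of on-demand rendering: the configuration file is turned into video only
# when first requested; render_video is a hypothetical placeholder for the engine.
import hashlib
import json
from functools import lru_cache

def render_video(config_json: str) -> bytes:
    return b"VIDEO:" + hashlib.sha1(config_json.encode()).digest()  # fake payload

@lru_cache(maxsize=128)
def get_video(config_json: str) -> bytes:
    return render_video(config_json)      # rendered lazily, reused on later requests

config = json.dumps({"timeline": [{"type": "bar_chart", "duration": 4.0}]}, sort_keys=True)
print(len(get_video(config)), "bytes rendered on demand")
```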
Drawings and Example
Figure 1 shows a schematic view of the invention.
In an embodiment of the invention, textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like. A content analyzer 700 is fed with said textual information, and a storyboard 600 is generated as a result of such analysis. A text-to-visual-element mapper 100 in Figure 1 filters said textual information by means of the text filter 101 and according to the filtering rules 104. The text originates from various sources in various formats, including JSON, CSV and XLS. Such information may not completely fulfill the storyboard; in such a case the system can optionally request further information via Application Programming Interfaces (APIs).
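A minimal sketch of this ingestion and filtering stage is shown below; the field names, the filtering rules and the fetch_missing placeholder standing in for the external APIs are assumptions introduced here for illustration.

```python
# Sketch of text ingestion and filtering; field names, rules and the API fallback are assumptions.
import csv
import io
import json

FILTER_RULES = {"required": ["company", "revenue"]}   # hypothetical filtering rules 104

def load_records(payload: str, fmt: str) -> list[dict]:
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError("unsupported format: " + fmt)

def fetch_missing(record: dict, field: str) -> str:
    return "N/A"   # placeholder for a call to an external API when the storyboard is not fulfilled

def apply_filter(records: list[dict]) -> list[dict]:
    for record in records:
        for field in FILTER_RULES["required"]:
            record.setdefault(field, fetch_missing(record, field))
    return records

print(apply_filter(load_records('[{"company": "Acme"}]', "json")))
```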
The resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector and mapper 103 to select, from one or more visual object libraries 105, one or more visual elements as the best visual representation of the extracted and semantically analyzed piece of content and information. The process is carried out for all the pieces of content and information extracted from the selected text. The information relating to the selected visual elements is fed to the audio module 200, which by means of an audio selector and mapper 201 provides a corresponding audio element selected from one or more libraries of audio elements 202. Video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600. Each node of the graph can be either an audio or a visual element. Arcs among elements express the spatial and temporal relationships among audiovisual elements. Said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatio-temporal relations among objects satisfy the set of rules 402. In case of failure, the arcs of the graph are modified by a graph modifier module 401 that re-establishes coherence in the audio-visual representation.
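The spatio-temporal graph and the coherence pass can be pictured with the following sketch; the node fields, the single spatial slot attribute and the one overlap rule are assumptions introduced for illustration and are not the rule set 402 itself.

```python
# Sketch of a spatio-temporal graph and a coherence check; fields and the single
# "no overlap in the same spatial slot" rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Node:
    kind: str        # "visual" or "audio"
    start: float     # seconds
    duration: float
    slot: str        # spatial position, e.g. "center" or "lower-third"

def overlaps(a: Node, b: Node) -> bool:
    return (a.slot == b.slot
            and a.start < b.start + b.duration
            and b.start < a.start + a.duration)

def enforce_coherence(nodes: list[Node]) -> list[Node]:
    nodes = sorted(nodes, key=lambda n: n.start)
    for prev, cur in zip(nodes, nodes[1:]):
        if overlaps(prev, cur):                       # rule violated
            cur.start = prev.start + prev.duration    # graph modifier: shift the later node
    return nodes

graph = [Node("visual", 0.0, 4.0, "center"), Node("visual", 2.0, 3.0, "center")]
print(enforce_coherence(graph))
```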
In an embodiment of the application the resulting modified graph is rendered into video and audio by a rendering engine 500.
In an embodiment of the invention, the visual objects extracted from different libraries that will appear in different spatio-temporal positions in the video are kept in computer memory for efficiency during the rendering process, avoiding repeated searches for the same object.
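The reuse of library objects during rendering can be sketched as a simple in-memory cache; the loader below is a hypothetical stand-in for a look-up in the visual object libraries 105.

```python
# Sketch of keeping library objects in memory during rendering; the loader is hypothetical.
from typing import Callable, Dict

class ObjectCache:
    def __init__(self, loader: Callable[[str], bytes]):
        self._loader = loader
        self._store: Dict[str, bytes] = {}

    def get(self, object_id: str) -> bytes:
        if object_id not in self._store:
            self._store[object_id] = self._loader(object_id)   # searched in the library once
        return self._store[object_id]                          # reused from memory afterwards

cache = ObjectCache(loader=lambda oid: ("asset:" + oid).encode())
cache.get("bar_chart")   # loaded from the library
cache.get("bar_chart")   # served from memory
print(len(cache._store), "library look-up performed")
```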
Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.
Figure 2 shows an example of the first step of the invention, in which textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like. Figure 3 shows an example of the second step of the invention, in which a content analyzer 700 is fed with said textual information.
Figure 4 shows an example of the third step of the invention, in which the storyboard 600 is generated as a result of such analysis.
Figure 5 shows an example of the fourth step of the invention, in which a text-to-visual-element mapper 100 filters said textual information by means of the text filter 101 and according to the filtering rules 104.
Figure 6 shows an example of the fifth step of the invention, in which the resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector.
Figure 7 shows an example of the sixth step of the invention, in which said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatio-temporal relations among objects satisfy the set of rules 402. Figure 8 shows an example of the seventh step of the invention, in which the mapper 103 selects, from one or more visual object libraries 105, one or more visual elements as the best visual representation of the extracted and semantically analyzed piece of content and information.
Figure 9 shows an example of the eighth step of the invention, in which the video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600.
Figure 10 shows an example of the ninth step of the invention, in which the resulting modified graph is rendered into video and audio by a rendering engine 500.

Claims

1. A method for converting text to video and audio, characterized by the fact of comprising the steps of:
a. mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements;
b. generating a storyboard template;
c. selecting a corresponding audio element for said video element;
d. selecting the properties of every video and audio element;
e. generating a spatio-temporal graph representing the video and audio elements in space and in time;
f. selecting the parameters of global coherence of the video;
g. analyzing the spatio-temporal graph and identifying potential inconsistencies;
h. modifying the graph to correct said inconsistencies; and
i. automatically assembling said video and corresponding audio as a configuration file.
2. A method according to claim 1, characterized by the fact that the information filtered from a text may include:
a. financial information and data;
b. financial rating information;
c. statistical data.
3. A method according to claim 1, characterized by the fact that the structuring of the text is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping;
c. relations between entities;
d. relevance.
4. A method according to claim 1, characterized by the fact that the storyboard generation is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping;
c. relations between entities.
5. A method according to claim 1, characterized by the fact that said visual element characteristics are selected from among:
a. size;
b. colors and fonts;
c. look and feel;
d. type of visual representation.
6. A method according to claim 1, characterized by the fact that selecting the element is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority for entity and elements type;
c. variety of entities and elements types; and
d. timing rules.
7. A method according to claim 1, characterized by the fact that selecting the video coherence parameters is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority in time and in space for entity and elements type;
c. variety of entities and elements types; and
d. timing rules.
8. A method according to claim 1, characterized by the fact that the audio characteristics are selected according to the video properties and may include:
a. narration;
b. sound effects;
c. music track;
d. or any combination of the above.
9. A method according to claim 1, characterized by the fact that the configuration file is rendered into video.
10. A method according to claim 1, characterized by the fact that the configuration file is rendered on the fly upon a user's request.
PCT/IB2016/051881 2015-04-02 2016-04-01 Method and system for automatic generation of video from non-visual information WO2016157150A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITMI20150474 2015-04-02
ITMI2015A000474 2015-04-02

Publications (1)

Publication Number Publication Date
WO2016157150A1 true WO2016157150A1 (en) 2016-10-06

Family

ID=53052988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/051881 WO2016157150A1 (en) 2015-04-02 2016-04-01 Method and system for automatic generation of video from non-visual information

Country Status (1)

Country Link
WO (1) WO2016157150A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100238180A1 (en) * 2009-03-17 2010-09-23 Samsung Electronics Co., Ltd. Apparatus and method for creating animation from web text
WO2014091479A1 (en) * 2012-12-10 2014-06-19 Wibbitz Ltd. A method for automatically transforming text into video

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110521213A (en) * 2017-03-23 2019-11-29 韩国斯诺有限公司 Story making video method and system
CN110521213B (en) * 2017-03-23 2022-02-18 韩国斯诺有限公司 Story image making method and system
US11704355B2 (en) 2017-03-23 2023-07-18 Snow Corporation Method and system for producing story video
US11954142B2 (en) 2017-03-23 2024-04-09 Snow Corporation Method and system for producing story video
US10783928B2 (en) 2018-09-20 2020-09-22 Autochartis Limited Automated video generation from financial market analysis
US11322183B2 (en) 2018-09-20 2022-05-03 Autochartist Limited Automated video generation from financial market analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1)