WO2016157150A1 - Method and system for automatic generation of video from non-visual information - Google Patents
- Publication number
- WO2016157150A1 (PCT/IB2016/051881)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/438—Presentation of query results
- G06F16/4387—Presentation of query results by the use of playlists
- G06F16/4393—Multimedia presentations, e.g. slide shows, multimedia albums
Abstract
The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.
Description
METHOD AND SYSTEM FOR AUTOMATIC GENERATION OF VIDEO FROM
NON-VISUAL INFORMATION
Field of the Invention
The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.
Background of the Invention
In recent years, the use of video and visual animation has become increasingly popular due to the development of technologies and systems in the area of computer science. In particular, visual representation of information is becoming more and more common, as it is easier to grasp and more concise than reading text.
Summary of the invention
The present invention provides a method for converting an original set of textual information into video form that retains the semantics and the coherence of the textual information, based on a user or system-defined storyboard, a filtering engine and a rendering engine.
According to the invention, the textual information coming from multiple information sources is selected, filtered and analyzed by a text-to-visual elements selector and mapper. The information relative to the selected video element is fed to an audio module that selects a corresponding audio element. Audio and video elements selected from one or more object libraries are organized by a graph generator into a graph whose arcs can represent both temporal and spatial information, on the basis of a system- or user-defined storyboard. Said graph is analyzed by a coherence module that corrects potential incoherencies of the audio/video. The corrected spatial-temporal graph is fed to a rendering engine that generates the corresponding video.
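As a concrete illustration of the dataflow described above, the stages can be sketched in code. Every name, library table and data structure below is an assumption made for illustration only and is not part of the disclosed system:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str      # "visual" or "audio"
    content: str   # identifier of an object taken from a library

@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    arcs: list = field(default_factory=list)   # (src, dst, relation)

# Hypothetical object libraries standing in for the visual/audio libraries.
VISUAL_LIBRARY = {"revenue": "bar_chart", "rating": "star_icon"}
AUDIO_LIBRARY = {"bar_chart": "narration_growth", "star_icon": "chime"}

def generate_video_config(facts, storyboard_order):
    """Map facts to visual/audio elements and build a spatial-temporal graph."""
    # The storyboard fixes the temporal order of the facts.
    facts = sorted(
        facts,
        key=lambda f: storyboard_order.index(f)
        if f in storyboard_order else len(storyboard_order),
    )
    graph = Graph()
    for fact in facts:
        visual = Element("visual", VISUAL_LIBRARY.get(fact, "generic_slide"))
        audio = Element("audio", AUDIO_LIBRARY.get(visual.content, "silence"))
        graph.nodes += [visual, audio]
        # Spatial-temporal arc: the audio plays while its visual is on screen.
        graph.arcs.append((visual, audio, "simultaneous"))
    # Temporal arcs between consecutive visuals, per the storyboard order.
    visuals = [n for n in graph.nodes if n.kind == "visual"]
    for a, b in zip(visuals, visuals[1:]):
        graph.arcs.append((a, b, "before"))
    return graph
```

The resulting graph object plays the role of the configuration handed to the rendering engine; the real system's graph, libraries and storyboard representation are not specified at this level of detail.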
Detailed description of the invention
The present invention is a method for converting text to video and audio, comprising the following steps: the first step of mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements; the second step of generating a storyboard template; the third step of selecting a corresponding audio element for said video element; the fourth step of selecting the properties of every video and audio element; the fifth step of generating a spatial-temporal graph representing the video and audio elements in space and in time; the sixth step of selecting the parameters of global coherence of the video; the seventh step of analyzing the spatial-temporal graph and identifying potential inconsistencies; the eighth step of modifying the graph to correct said inconsistencies; and the ninth step of automatically assembling said video and corresponding audio as a configuration file.
The information filtered from a text may include financial information and data, financial rating information, as well as statistical data.
The structuring of the text is performed according to the criteria of content categorization, entities and data elements extraction and mapping, relations between entities, and relevance.
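A minimal sketch of this structuring step follows; the category table, keyword matching and relevance heuristic are illustrative assumptions, not the patented criteria:

```python
import re

# Hypothetical keyword table; both the categories and the keywords
# are assumptions for illustration.
CATEGORIES = {
    "financial": ("revenue", "profit", "rating"),
    "statistical": ("average", "percent", "median"),
}

def structure_text(sentence):
    """Categorize a sentence and extract its data elements (numbers),
    per the criteria above: categorization, element extraction, relevance."""
    lower = sentence.lower()
    category = next(
        (cat for cat, kws in CATEGORIES.items() if any(k in lower for k in kws)),
        "other",
    )
    # Data elements: plain numbers and percentages found in the sentence.
    data_elements = re.findall(r"-?\d+(?:\.\d+)?%?", sentence)
    relevance = len(data_elements)  # crude proxy: more data, more relevant
    return {"category": category, "elements": data_elements, "relevance": relevance}
```

A real implementation would also extract entities and the relations between them, which this sketch omits.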
The storyboard generation is performed according to the criteria of content categorization, entities and data elements extraction and mapping, and relations between entities.
Said visual element characteristics are selected from among the size, the colors and fonts, the look and feel, and the type of visual representation. The element is selected based on the following rules: rules for different types of content, as decided in the storyboard; priority for entity and element types; variety of entity and element types; and timing rules.
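The storyboard-driven selection of characteristics can be sketched as a rule table with storyboard overrides taking priority. The rule entries below are hypothetical placeholders, not values from the disclosure:

```python
# Hypothetical rule table mapping a content type to visual characteristics
# (size, colors, fonts, type of representation); all entries are assumptions.
STYLE_RULES = {
    "financial":   {"size": "large",  "color": "#1a6b3c", "font": "sans", "repr": "chart"},
    "statistical": {"size": "medium", "color": "#3366cc", "font": "sans", "repr": "infographic"},
}
DEFAULT_STYLE = {"size": "small", "color": "#000000", "font": "serif", "repr": "text_card"}

def select_characteristics(content_type, storyboard_overrides=None):
    """Pick visual characteristics by content-type rule; the storyboard
    has priority and may override any individual characteristic."""
    style = dict(STYLE_RULES.get(content_type, DEFAULT_STYLE))
    if storyboard_overrides:
        style.update(storyboard_overrides)
    return style
```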
The selection of the video coherence parameters is based on the rules for the different types of content, as decided in the storyboard, the priority in time and in space for entity and element types, the variety of entity and element types, and the timing rules.
The selection of the audio characteristics is in accordance with the video properties and may include narration, sound effects, a music track, or any combination of the above.
The configuration file is rendered into video on the fly upon a user's request.
Drawings and Example
Figure 1 shows a schematic view of the invention.
In an embodiment of the invention, textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like. A content analyzer 700 is fed with said textual information and a storyboard 600 is generated as a result of such analysis. A text to visual element mapper 100 in Figure 1 filters said textual information by means of the text filter 101 and according to the filtering rules 104. The text originates from various sources in various formats, including JSON, CSV and XLS. Such information may not completely fulfill the storyboard; in that case the system can optionally request further information via Application Programming Interfaces (APIs).
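The filtering-plus-API-fallback behavior described above can be sketched as follows; the field names and the callback standing in for the API are assumptions for illustration:

```python
def filter_text(records, filtering_rules):
    """Keep only the fields admitted by the filtering rules
    (the role played by text filter 101 and rules 104 above)."""
    return {key: value for key, value in records.items() if key in filtering_rules}

def fulfill_storyboard(filtered, required_fields, fetch_via_api):
    """When the filtered text does not cover every field the storyboard
    needs, request the missing ones through a callback that stands in
    for the API requests mentioned above."""
    data = dict(filtered)
    for name in required_fields:
        if name not in data:
            data[name] = fetch_via_api(name)
    return data
```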
The resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector and mapper 103 to select, from one or more visual object libraries 105, one or more visual elements as the best visual representative of the extracted and semantically analyzed piece of content and information. The process is carried out for all the pieces of content and information extracted from the selected text. The information relative to the selected visual elements is fed to the audio module 200, which by means of an audio selector and mapper 201 provides a corresponding audio element selected from one or more libraries of audio elements 202. Video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600. Each node of the graph can be either an audio or a visual element. Arcs among elements express the spatial and temporal relationships among audiovisual elements. Said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatial-temporal relations among objects satisfy the set of rules 402. In case of failure, the arcs of the graph are modified by a graph modifier module that re-establishes coherence in the audio-visual representation.
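One simple coherence rule such a module might enforce is that the temporal "before" relation contains no cycles; the sketch below checks that rule and repairs the graph by dropping arcs. Both the rule and the repair strategy are assumptions, not the rules 402 of the disclosure:

```python
def find_temporal_cycle(arcs):
    """Return True if the 'before' arcs contain a cycle (an incoherence)."""
    succ = {}
    for src, dst, rel in arcs:
        if rel == "before":
            succ.setdefault(src, []).append(dst)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in succ.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in succ if n not in visited)

def restore_coherence(arcs):
    """Drop temporal arcs until the order is acyclic (the graph
    modifier's job); a real system would apply its full rule set."""
    arcs = list(arcs)
    while find_temporal_cycle(arcs):
        # Remove the last-added 'before' arc as a naive repair policy.
        idx = max(i for i, (_, _, rel) in enumerate(arcs) if rel == "before")
        arcs.pop(idx)
    return arcs
```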
In an embodiment of the invention, the resulting modified graph is rendered into video and audio by a rendering engine 500.
In an embodiment of the invention, the visual objects extracted from different libraries that will appear in different spatial-temporal positions in the video are kept in computer memory during the rendering process for efficiency, avoiding repeated searches for the same object.
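This in-memory reuse amounts to memoizing library lookups; a minimal sketch under that assumption (the class and its libraries are illustrative, not part of the disclosure):

```python
class ObjectCache:
    """Keep visual objects fetched from libraries in memory so the
    renderer never searches the libraries twice for the same object."""

    def __init__(self, libraries):
        self._libraries = libraries   # list of dict-like object libraries
        self._cache = {}
        self.lookups = 0              # counts actual library searches

    def get(self, name):
        if name not in self._cache:
            self.lookups += 1
            for lib in self._libraries:
                if name in lib:
                    self._cache[name] = lib[name]
                    break
            else:
                raise KeyError(name)
        return self._cache[name]
```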
Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.
Figure 2 shows an example of the first step of the invention, in which textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like.
Figure 3 shows an example of the second step of the invention, in which a content analyzer 700 is fed with said textual information.
Figure 4 shows an example of the third step of the invention, in which the storyboard 600 is generated as a result of such analysis.
Figure 5 shows an example of the fourth step of the invention, in which a text to visual element mapper 100 filters said textual information by means of the text filter 101 and according to the filtering rules 104.
Figure 6 shows an example of the fifth step of the invention, in which the resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector.
Figure 7 shows an example of the sixth step of the invention, in which said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatial-temporal relations among objects satisfy the set of rules 402.
Figure 8 shows an example of the seventh step of the invention, in which the mapper 103 selects, from one or more visual object libraries 105, one or more visual elements as the best visual representative of the extracted and semantically analyzed piece of content and information.
Figure 9 shows an example of the eighth step of the invention, in which the video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600.
Figure 10 shows an example of the ninth step of the invention, in which the resulting modified graph is rendered into video and audio by a rendering engine 500.
Claims
1. A method for converting text to video and audio, characterized by the fact of comprising the steps of:
a. mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements;
b. generating a storyboard template;
c. selecting a corresponding audio element for said video element;
d. selecting the properties of every video and audio element;
e. generating a spatio-temporal graph representing the video and audio elements in space and in time;
f. selecting the parameters of global coherence of the video;
g. analyzing the spatio-temporal graph and identifying potential inconsistencies;
h. modifying the graph to correct said inconsistencies; and
i. automatically assembling said video and corresponding audio as a configuration file.
2. A method according to claim 1, characterized by the fact that the information filtered from a text may include:
a. financial information and data;
b. financial rating information; and
c. statistical data.
3. A method according to claim 1, characterized by the fact that the structuring of the text is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping;
c. relations between entities; and
d. relevance.
4. A method according to claim 1, characterized by the fact that the storyboard generation is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping; and
c. relations between entities.
5. A method according to claim 1, characterized by the fact that said visual element characteristics are selected from among:
a. size;
b. colors and fonts;
c. look and feel; and
d. type of visual representation.
6. A method according to claim 1, characterized by the fact that selecting the element is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority for entity and elements type;
c. variety of entities and elements types; and
d. timing rules.
7. A method according to claim 1, characterized by the fact that selecting the video coherence parameters is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority in time and in space for entity and element types;
c. variety of entity and element types; and
d. timing rules.
8. A method according to claim 1, characterized by the fact that the audio characteristics are selected according to the video properties and may include:
a. narration;
b. sound effects;
c. a music track; or
d. any combination of the above.
9. A method according to claim 1, characterized by the fact that the configuration file is rendered into video.
10. A method according to claim 1, characterized by the fact that the configuration file is rendered on the fly upon a user's request.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| ITMI20150474 | 2015-04-02 | | |
| ITMI2015A000474 | 2015-04-02 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016157150A1 (en) | 2016-10-06 |
Family
ID=53052988
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2016/051881 | Method and system for automatic generation of video from non-visual information | 2015-04-02 | 2016-04-01 |
Country Status (1)
| Country | Link |
|---|---|
| WO | WO2016157150A1 (en) |
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100238180A1 | 2009-03-17 | 2010-09-23 | Samsung Electronics Co., Ltd. | Apparatus and method for creating animation from web text |
| WO2014091479A1 | 2012-12-10 | 2014-06-19 | Wibbitz Ltd. | A method for automatically transforming text into video |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110521213A | 2017-03-23 | 2019-11-29 | Snow Corporation | Story making video method and system |
| CN110521213B | 2017-03-23 | 2022-02-18 | Snow Corporation | Story image making method and system |
| US11704355B2 | 2017-03-23 | 2023-07-18 | Snow Corporation | Method and system for producing story video |
| US11954142B2 | 2017-03-23 | 2024-04-09 | Snow Corporation | Method and system for producing story video |
| US10783928B2 | 2018-09-20 | 2020-09-22 | Autochartist Limited | Automated video generation from financial market analysis |
| US11322183B2 | 2018-09-20 | 2022-05-03 | Autochartist Limited | Automated video generation from financial market analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1 |