WO2016157150A1 - Method and system for automatic generation of video from non-visual information - Google Patents

Method and system for automatic generation of video from non-visual information

Info

Publication number
WO2016157150A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
fact
elements
audio
entities
Prior art date
Application number
PCT/IB2016/051881
Other languages
French (fr)
Inventor
Francesco PICCOLOMINI NALDI BANDINI
Ludovico Giangiuseppe MARINI
Marco MIGLIACCIO
Alain FRANZONI
Original Assignee
Littlesea S.R.L.
Priority date
Filing date
Publication date
Application filed by Littlesea S.R.L.
Publication of WO2016157150A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G06F16/4387 Presentation of query results by the use of playlists
    • G06F16/4393 Multimedia presentations, e.g. slide shows, multimedia albums

Abstract

The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.

Description

METHOD AND SYSTEM FOR AUTOMATIC GENERATION OF VIDEO FROM
NON-VISUAL INFORMATION
Field of the Invention
The invention relates to computer systems. More specifically, the invention relates to a method for converting non-visual information, such as text or other content, into video.
Background of the Invention
In recent years the use of video and visual animation has become increasingly popular due to the development of technologies and systems in the area of computer science. In particular, visual representation of information is becoming more and more common, as it is easier to grasp and more concise than reading text.
Summary of the invention
The present invention provides a method for converting an original set of textual information into video form that retains the semantics and the coherence of the textual information, based on a user or system-defined storyboard, a filtering engine and a rendering engine.
According to the invention, the textual information coming from multiple information sources is selected, filtered and analyzed by a text-to-visual elements selector and mapper. The information relating to the selected video element is fed to an audio module that selects a corresponding audio element. Audio and video elements selected from one or more object libraries are organized by a graph generator into a graph whose arcs can represent both temporal and spatial information, on the basis of a system- or user-defined storyboard. Said graph is analyzed by a coherence module that corrects potential incoherencies of the audio/video. The corrected spatial-temporal graph is fed to a rendering engine that generates the corresponding video.
Detailed description of the invention
The present invention is a method for converting text to video and audio, comprising the following steps: the first step of mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements; the second step of generating a storyboard template; the third step of selecting a corresponding audio element for said video element; the fourth step of selecting the properties of every video and audio element; the fifth step of generating a spatio-temporal graph representing the video and audio elements in space and in time; the sixth step of selecting the parameters of global coherence of the video; the seventh step of analyzing the spatio-temporal graph and identifying potential inconsistencies; the eighth step of modifying the graph to correct said inconsistencies; and the ninth step of automatically assembling said video and corresponding audio as a configuration file.
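By way of non-limiting illustration only, the nine steps can be compressed into the minimal sketch below; every data structure, function name and numeric default in it is an assumption introduced here for readability and is not part of the disclosed implementation.

```python
# Minimal, runnable sketch of the nine-step flow; all names and defaults are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                      # "visual" or "audio"
    content: str
    props: dict = field(default_factory=dict)

def text_to_video(facts: list[str], storyboard: dict) -> dict:
    visuals = [Element("visual", f) for f in facts]                          # step 1: map text to visual elements
    template = {"scenes": storyboard.get("scenes", ["intro", "body"])}       # step 2: storyboard template
    audio = [Element("audio", "narration: " + v.content) for v in visuals]   # step 3: audio element per visual
    for e in visuals + audio:
        e.props = {"duration": 3.0}                                          # step 4: element properties
    graph = {"nodes": visuals + audio, "arcs": []}                           # step 5: spatio-temporal graph
    coherence = {"max_duration": storyboard.get("max_duration", 60)}         # step 6: global coherence parameters
    total = sum(n.props["duration"] for n in graph["nodes"])                 # step 7: detect an inconsistency
    if total > coherence["max_duration"]:                                    # step 8: correct the graph
        scale = coherence["max_duration"] / total
        for n in graph["nodes"]:
            n.props["duration"] *= scale
    return {"template": template,                                            # step 9: configuration file
            "timeline": [vars(n) for n in graph["nodes"]]}

print(text_to_video(["Revenue grew 12% in Q1"], {"max_duration": 30}))
```

Each commented line corresponds to one of the nine steps; in the embodiments described below, the steps are carried out by the dedicated modules 100-700.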
The information filtered from a text may include financial information and data, financial rating information, as well as statistical data.
The structuring of the text is performed according to the criteria of content categorization, extraction and mapping of entities and data elements, relations between entities, and relevance.
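By way of illustration, a minimal sketch of these four criteria applied to a single sentence is given below; the regular expressions, the category test and the relevance score are assumptions made here for the example and are not the criteria disclosed by the invention.

```python
# Illustrative structuring of a sentence; regexes, category test and score are assumptions.
import re

def structure_text(sentence: str) -> dict:
    data = re.findall(r"-?\d+(?:\.\d+)?%?", sentence)             # data elements
    entities = re.findall(r"\b[A-Z][a-zA-Z]+\b", sentence)        # naive entity extraction
    category = "financial" if "%" in sentence else "general"      # content categorization
    relations = [(e, d) for e in entities for d in data]          # relations between entities and data
    relevance = len(entities) + len(data)                         # crude relevance score
    return {"category": category, "entities": entities, "data": data,
            "relations": relations, "relevance": relevance}

print(structure_text("Acme revenue grew 12% while Beta fell 3%"))
```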
The storyboard generation is performed according to the criteria of content categorization, extraction and mapping of entities and data elements, and relations between entities.
Said visual element characteristics are selected from among the size, the colors and fonts, the look and feel, and the type of visual representation. The element is selected based on the following rules: rules for different types of content, as decided in the storyboard; priority for entity and element type; variety of entity and element types; and timing rules.
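A possible, purely illustrative reading of these selection rules is sketched below; the rule table, the priority order and the variety constraint are hypothetical and do not reproduce the actual rule set of the storyboard.

```python
# Hypothetical selection rules: content-type rules, priority order, variety and timing.
RULES = {
    "financial": {"priority": ["bar_chart", "line_chart", "icon"], "min_duration": 4.0},
    "general":   {"priority": ["icon", "photo"],                   "min_duration": 2.0},
}

def select_visual(category: str, already_used: list[str]) -> dict:
    rule = RULES.get(category, RULES["general"])
    chosen = rule["priority"][0]
    for candidate in rule["priority"]:                 # priority for element type
        if not already_used or candidate != already_used[-1]:
            chosen = candidate                         # variety: avoid repeating the last type
            break
    return {"type": chosen, "duration": rule["min_duration"]}   # timing rule

print(select_visual("financial", already_used=["bar_chart"]))
```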
The selection of the video coherence parameters is based on rules for the different types of content, as decided in the storyboard; the priority in time and in space for entity and element type; the variety of entity and element types; and the timing rules.
The selection of the audio characteristics is made in accordance with the video properties and may include narration, sound effects, a music track, or any combination of the above.
The configuration file is rendered into video on the fly upon a user's request.
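This on-the-fly behaviour can be illustrated with the following sketch, in which render_video is a hypothetical stand-in for the rendering engine and the caching policy is an assumption of the sketch only.

```python
# Sketch of on-demand rendering: the configuration file is turned into video only
# when first requested; render_video is a hypothetical placeholder for the engine.
import hashlib
import json
from functools import lru_cache

def render_video(config_json: str) -> bytes:
    return b"VIDEO:" + hashlib.sha1(config_json.encode()).digest()  # fake payload

@lru_cache(maxsize=128)
def get_video(config_json: str) -> bytes:
    return render_video(config_json)      # rendered lazily, reused on later requests

config = json.dumps({"timeline": [{"type": "bar_chart", "duration": 4.0}]}, sort_keys=True)
print(len(get_video(config)), "bytes rendered on demand")
```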
Drawings and Example
Figure 1 shows a schematic view of the invention.
In an embodiment of the invention, textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like. A content analyzer 700 is fed with said textual information, and a storyboard 600 is generated as a result of such analysis. A text-to-visual-element mapper 100 in Figure 1 filters said textual information by means of the text filter 101 and according to the filtering rules 104. The text originates from various sources in various formats, including JSON, CSV and XLS. Such information may not completely fulfill the storyboard; in such a case the system can optionally request further information via Application Programming Interfaces (APIs).
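A minimal sketch of this ingestion and filtering stage is shown below; the field names, the filtering rules and the fetch_missing placeholder standing in for the external APIs are assumptions introduced here for illustration.

```python
# Sketch of text ingestion and filtering; field names, rules and the API fallback are assumptions.
import csv
import io
import json

FILTER_RULES = {"required": ["company", "revenue"]}   # hypothetical filtering rules 104

def load_records(payload: str, fmt: str) -> list[dict]:
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError("unsupported format: " + fmt)

def fetch_missing(record: dict, field: str) -> str:
    return "N/A"   # placeholder for a call to an external API when the storyboard is not fulfilled

def apply_filter(records: list[dict]) -> list[dict]:
    for record in records:
        for field in FILTER_RULES["required"]:
            record.setdefault(field, fetch_missing(record, field))
    return records

print(apply_filter(load_records('[{"company": "Acme"}]', "json")))
```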
The resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector and mapper 103 to select, from one or more visual object libraries 105, one or more visual elements as the best visual representation of the extracted and semantically analyzed piece of content and information. The process is carried out for all the pieces of content and information extracted from the selected text. The information relating to the selected visual elements is fed to the audio module 200, which by means of an audio selector and mapper 201 provides a corresponding audio element selected from one or more libraries of audio elements 202. Video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600. Each node of the graph can be either an audio or a visual element. Arcs among elements express the spatial and temporal relationships among audiovisual elements. Said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatio-temporal relations among objects satisfy the set of rules 402. In case of failure, the arcs of the graph are modified by a graph modifier module 401 that re-establishes coherence in the audio-visual representation.
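The spatio-temporal graph and the coherence pass can be pictured with the following sketch; the node fields, the single spatial slot attribute and the one overlap rule are assumptions introduced for illustration and are not the rule set 402 itself.

```python
# Sketch of a spatio-temporal graph and a coherence check; fields and the single
# "no overlap in the same spatial slot" rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Node:
    kind: str        # "visual" or "audio"
    start: float     # seconds
    duration: float
    slot: str        # spatial position, e.g. "center" or "lower-third"

def overlaps(a: Node, b: Node) -> bool:
    return (a.slot == b.slot
            and a.start < b.start + b.duration
            and b.start < a.start + a.duration)

def enforce_coherence(nodes: list[Node]) -> list[Node]:
    nodes = sorted(nodes, key=lambda n: n.start)
    for prev, cur in zip(nodes, nodes[1:]):
        if overlaps(prev, cur):                       # rule violated
            cur.start = prev.start + prev.duration    # graph modifier: shift the later node
    return nodes

graph = [Node("visual", 0.0, 4.0, "center"), Node("visual", 2.0, 3.0, "center")]
print(enforce_coherence(graph))
```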
In an embodiment of the application the resulting modified graph is rendered into video and audio by a rendering engine 500.
In an embodiment of the invention, the visual objects extracted from different libraries that will appear in different spatio-temporal positions in the video are kept in computer memory for efficiency during the rendering process, avoiding repeated searches for the same object.
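The reuse of library objects during rendering can be sketched as a simple in-memory cache; the loader below is a hypothetical stand-in for a look-up in the visual object libraries 105.

```python
# Sketch of keeping library objects in memory during rendering; the loader is hypothetical.
from typing import Callable, Dict

class ObjectCache:
    def __init__(self, loader: Callable[[str], bytes]):
        self._loader = loader
        self._store: Dict[str, bytes] = {}

    def get(self, object_id: str) -> bytes:
        if object_id not in self._store:
            self._store[object_id] = self._loader(object_id)   # searched in the library once
        return self._store[object_id]                          # reused from memory afterwards

cache = ObjectCache(loader=lambda oid: ("asset:" + oid).encode())
cache.get("bar_chart")   # loaded from the library
cache.get("bar_chart")   # served from memory
print(len(cache._store), "library look-up performed")
```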
Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.
Figure 2 shows an example of the first step of the invention, in which textual information is gathered from a multitude of sources including financial reports, web sites, machine-readable data, databases and the like. Figure 3 shows an example of the second step of the invention, in which a content analyzer 700 is fed with said textual information.
Figure 4 shows an example of the third step of the invention, in which the storyboard 600 is generated as a result of such analysis.
Figure 5 shows an example of the fourth step of the invention, in which a text-to-visual-element mapper 100 filters said textual information by means of the text filter 101 and according to the filtering rules 104.
Figure 6 shows an example of the fifth step of the invention, in which the resulting filtered text is analyzed by the semantic text analyzer 102, which extracts content and information that is used by the visual element selector.
Figure 7 shows an example of the sixth step of the invention, in which said graph is fed to a coherence module 400 that analyzes said graph by means of the graph analyzer 401, which checks that the spatio-temporal relations among objects satisfy the set of rules 402. Figure 8 shows an example of the seventh step of the invention, in which the mapper 103 selects, from one or more visual object libraries 105, one or more visual elements as the best visual representation of the extracted and semantically analyzed piece of content and information.
Figure 9 shows an example of the eighth step of the invention, in which the video and audio elements are fed to an audio-video spatial-temporal graph generator 300 together with the storyboard 600.
Figure 10 shows an example of the ninth step of the invention, in which the resulting modified graph is rendered into video and audio by a rendering engine 500.

Claims

1. A method for converting text to video and audio, characterized by the fact of comprising the steps of:
a. mapping a filtered and analyzed text to corresponding visual elements and selecting an instance of said representation from a library of visual elements;
b. generating a storyboard template;
c. selecting a corresponding audio element for said video element;
d. selecting the properties of every video and audio element;
e. generating a spatio-temporal graph representing the video and audio elements in space and in time;
f. selecting the parameters of global coherence of the video;
g. analyzing the spatio-temporal graph and identifying potential inconsistencies;
h. modifying the graph to correct said inconsistencies; and
i. automatically assembling said video and corresponding audio as a configuration file.
2. A method according to claim 1, characterized by the fact that the information filtered from a text may include:
a. financial information and data;
b. financial rating information;
c. statistical data.
3. A method according to claim 1, characterized by the fact that the structuring of the text is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping;
c. relations between entities;
d. relevance.
4. A method according to claim 1, characterized by the fact that the storyboard generation is performed according to the criteria of:
a. content categorization;
b. entities and data elements extraction and mapping;
c. relations between entities.
5. A method according to claim 1, characterized by the fact that said visual element characteristics are selected from among:
a. size;
b. colors and fonts;
c. look and feel;
d. type of visual representation.
6. A method according to claim 1, characterized by the fact that selecting the element is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority for entity and elements type;
c. variety of entities and elements types; and
d. timing rules.
7. A method according to claim 1, characterized by the fact that selecting the video coherence parameters is based on the rules of:
a. rules for different types of content, as decided in the storyboard;
b. priority in time and in space for entity and elements type;
c. variety of entities and elements types; and
d. timing rules.
8. A method according to claim 1, characterized by the fact that the audio characteristics are selected according to the video properties and may include:
a. narration;
b. sound effects;
c. music track;
d. or any combination of the above.
9. A method according to claim 1, characterized by the fact that the configuration file is rendered into video.
10. A method according to claim 1, characterized by the fact that the configuration file is rendered on the fly upon a user's request.
PCT/IB2016/051881 2015-04-02 2016-04-01 Method and system for automatic generation of video from non-visual information WO2016157150A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITMI20150474 2015-04-02
ITMI2015A000474 2015-04-02

Publications (1)

Publication Number Publication Date
WO2016157150A1 true WO2016157150A1 (en) 2016-10-06

Family

ID=53052988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/051881 WO2016157150A1 (en) 2015-04-02 2016-04-01 Method and system for automatic generation of video from non-visual information

Country Status (1)

Country Link
WO (1) WO2016157150A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100238180A1 (en) * 2009-03-17 2010-09-23 Samsung Electronics Co., Ltd. Apparatus and method for creating animation from web text
WO2014091479A1 (en) * 2012-12-10 2014-06-19 Wibbitz Ltd. A method for automatically transforming text into video

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110521213A (en) * 2017-03-23 2019-11-29 韩国斯诺有限公司 Story making video method and system
CN110521213B (en) * 2017-03-23 2022-02-18 韩国斯诺有限公司 Story image making method and system
US11704355B2 (en) 2017-03-23 2023-07-18 Snow Corporation Method and system for producing story video
US11954142B2 (en) 2017-03-23 2024-04-09 Snow Corporation Method and system for producing story video
US10783928B2 (en) 2018-09-20 2020-09-22 Autochartis Limited Automated video generation from financial market analysis
US11322183B2 (en) 2018-09-20 2022-05-03 Autochartist Limited Automated video generation from financial market analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16719519; Country of ref document: EP; Kind code of ref document: A1)