WO2021023397A1

WO2021023397A1 - Method and device for enriching multimedia content through metainformation

Info

Publication number: WO2021023397A1
Application number: PCT/EP2020/025354
Authority: WO
Inventors: Guillaume DORET; Alexis KOFMAN
Original assignee: Synchronized
Priority date: 2019-08-03
Filing date: 2020-07-31
Publication date: 2021-02-11
Also published as: FR3099674A1; FR3099674B1

Abstract

The present invention is a system capable of automating the sequencing and the enrichment of a linear audiovisual program or stream of images of any kind (television programs, films, documentaries, series, or any other audiovisual or educational programs). The method and the device consist of a smart video platform giving users access to additional content, services and functionalities for content navigation and layout, searching, sharing and e-commerce in relation to the content of the video. The audiovisual content and the user experience are thus enriched. A powerful analysis capability allows this system to collect a large amount of metadata, on the basis of which the video content can be enriched and sequenced. These metadata are synchronized with the video, so additional content and sequence boundaries can be positioned with very high accuracy. The software, called the Editor, makes it possible to step through a video stream frame by frame, to adjust the automatically generated metadata, to add data and/or functional objects, and to place them at chosen times. The interface of this system has two parts: the first is designed for adding, editing and managing content; the second is used to create and label sequences (chapters), marking points and/or functional objects, and also to place content on the timeline. The representation of the data is therefore generated either with this Editor or through automatic detection. Finally, a video player makes it possible to play the resulting video enriched with its associated data and functions. The player is a module that can be embedded in a third-party application offering consumers an enriched audiovisual experience. It either runs stand-alone or remains within the universe and ecosystem of audiovisual rights holders and broadcasters.

Description


Method and apparatus for enriching multimedia content with meta-information

Technical field

The present invention is a system capable of automating the sequencing and enrichment of a linear audiovisual program or stream of images of any kind (television shows, films, documentaries, series, or any other audiovisual or educational programs). The method and the device consist of a Smart-Video platform giving users access to additional content, services and features for arranging and browsing content, search, sharing and e-commerce relating to the content of the video.

The platform is made up of 3 pillars:

1) An editing tool, called the Editor, for generating a temporal description file, data content and interface models for users.

2) Files representing the media generated from this Editor.

3) A video player, or "player", for using the video and associated data.

Thanks to these 3 pillars, the use of the platform generates a Smart-Video: a video whose content, functions and navigation mode are enhanced, allowing users to personalize their journey and experience within the same video. By its modular nature, the video player can benefit all third-party applications, OTT platforms (such as My Canal, BBC Player, MyTF1), TV OS, or websites, but can also be used stand-alone.

State of the prior art

The state of the art of the interactive approach described above is known. We can cite two examples:

Patent WO2009115695, filed February 25, 2009 and published September 24, 2009 by MAIM ENRICO (France), and patent WO2013079768, filed October 17, 2012 and published June 6, 2013 by NOKIA Corp (US).

The first patent concerns a process for enriching a data source by creating new data from the sources and from text analyses, but does not specify that these are audiovisual sources. The second relates to a method and apparatus for enriching multimedia content with meta-information. It adds metadata about the original media content. The media are physical, and the invention relates to a methodology for the manual use of metadata.

As it stands, there is no automated editing process that enriches video media and divides them into functional sequences and/or events, or that offers an integrated back-office solution and a video player enriched with content, features and services (enriched video and user experience) intended for all audiovisual rights holders and their broadcasters.

Presentation of the invention

The present invention, endowed with a strong analytical capacity, automatically detects a large number of determining elements, allowing the collection of metadata relating to the content of the moment analyzed. The metadata is synchronized with the content. For this, all the frames of a video are analyzed; 1 second of video corresponds to 25 to 30 frames. For example, if it is detected that a personality appears on the 23rd frame of the 5th second of the video, it is then possible to attach additional information about this personality from timecode 00:00:05:23.
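The frame-to-timecode arithmetic behind this example can be sketched as follows. This is a minimal illustration only, assuming a constant integer frame rate (25 here), non-drop-frame HH:MM:SS:FF timecodes and zero-based frame counting; the function names are hypothetical and are not taken from the described platform.

```python
def frame_to_timecode(frame_index: int, fps: int = 25) -> str:
    """Convert a zero-based absolute frame index to an HH:MM:SS:FF timecode.

    Assumes a constant integer frame rate and non-drop-frame counting.
    """
    if frame_index < 0 or fps <= 0:
        raise ValueError("frame_index must be >= 0 and fps must be > 0")
    total_seconds, frames = divmod(frame_index, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frame(timecode: str, fps: int = 25) -> int:
    """Convert an HH:MM:SS:FF timecode back to a zero-based frame index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field must be smaller than the frame rate")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Frame 148 at 25 fps is 5 full seconds (125 frames) plus 23 frames.
print(frame_to_timecode(148, fps=25))
```

The same timecode maps to a different absolute frame at 30 frames per second, which is why the frame rate has to be stored alongside any synchronized metadata.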

The present invention relates to the first pillar of the platform, the objective of which, by enriching metadata, is to simplify, by automating as much as possible, the addition of content and / or interactive elements to an audio-visual program.

This automated system is capable of analyzing and then transforming a linear audiovisual program or image stream of any kind (TV shows, films, documentaries, etc.) into an interactive audiovisual program that can be viewed on a mobile phone, any screen or connected device, in particular a tablet, Apple TV, Android TV and/or smart TV.

Its interface, which browses a video stream frame by frame using extraction algorithms, is composed of two parts. The first is intended for adding, editing and managing the original content, whatever its source (internal or external), enriched with information or interactive features. The second is used for creating and labeling chapters, event sequences or marking points that constitute contextual and relevant editorial content. It is also used for placing this content and/or functional objects and for enriching the metadata. Audio or visual cues, as metadata, make it possible to organize the sequencing: a first cut follows the detection of audio or visual cues in the video, and the cuts are then adjusted and refined from the cues or scenes of the video to achieve thematic or partial interactivity.

The Editor then allows you to step through a video stream frame by frame, check the results of automatic analysis, add data and place it at selected times.

This enriched content is integrated into the initial video and reformatted to achieve a personalized user experience.

Chapter creation consists of indexing the content of a video so that it can be browsed in a non-linear way, as is possible with the chapter cuts provided on DVD media, or by creating a playlist that groups only the relevant sequences of one or more videos under a pop-up list. This also makes it possible to locate oneself and navigate in time during playback of the stream. In addition, adding metadata to these chapters makes it possible to run search queries within the video itself and not just on the titles of the files.
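A chapter index of this kind could be sketched as below. This is an illustrative sketch, not the platform's actual data model: the `Chapter` fields, the keyword-matching rule and all function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Chapter:
    """One indexed chapter of a video (hypothetical structure)."""
    title: str
    start_s: float
    end_s: float
    keywords: Set[str] = field(default_factory=set)


def search_chapters(chapters: List[Chapter], query: str) -> List[Chapter]:
    """Return chapters whose title words or keywords contain every query term."""
    terms = {term.lower() for term in query.split()}
    if not terms:
        return []
    hits = []
    for chapter in chapters:
        haystack = {k.lower() for k in chapter.keywords}
        haystack |= set(chapter.title.lower().split())
        if terms <= haystack:
            hits.append(chapter)
    return hits


def chapter_at(chapters: List[Chapter], time_s: float) -> Optional[Chapter]:
    """Locate the chapter being played at a given time, if any."""
    for chapter in chapters:
        if chapter.start_s <= time_s < chapter.end_s:
            return chapter
    return None


def build_playlist(chapters: List[Chapter], titles: List[str]) -> List[Tuple[float, float]]:
    """Build a non-linear playlist of (start, end) segments for selected chapters."""
    wanted = set(titles)
    ordered = sorted(chapters, key=lambda c: c.start_s)
    return [(c.start_s, c.end_s) for c in ordered if c.title in wanted]
```

Searching on chapter metadata rather than file titles is what lets a query such as a guest's name land directly on the relevant sequence.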

This indexing and sequencing process is illustrated in [Fig. 1].

This technology makes it possible to analyze and transform videos accessible to the public, live or in deferred time, coming from any hypertext or search engine, from any content whatever its origin, and from databases of any type. The content to be added can be of different types and is defined during the creation of the models that correspond to the desired user experience, which can differ depending on the format (film, magazine, documentary) but also on the program itself. Examples include biographies, video extracts (trailer, music video extract, archive of programs already broadcast, etc.), or even purchasing features for the sale of concert tickets, books, or other items.

The result can then be viewed from an application (mobile phone, tablet, smart TV, TV OS, website or any screen connected to the Internet) allowing users to benefit from the enriched experience and consult the added data.

The device consists of a Smart-Video platform which includes an editor for editing, publishing, sharing and providing access to enriched content, a temporal description file of synchronized metadata generated by that editor, and a video player for using the Smart-Video.

The metadata are accessible through code and functional interfaces (API, SDK), making the video player replaceable and the device interoperable with any other player or technological platform.

Under these conditions, the method automates as much as possible the actions which were until then carried out by the intervention of a human being capable of visualizing and understanding the editorial meaning of a video. This is the case for the following actions:

- viewing of a video stream on a “timeline”;

- identification of a passage that can be augmented, that is, enriched with information or interactive features;

- addition of content and data related to the identified passage;

- division into chapters, sequencing, marking;

- placement of a temporal model of content and of interactive and/or functional “video event” objects;

- association of a “video event” model with one of the contents managed by the database.
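The last two actions, placing a "video event" on the timeline and associating it with a managed content item, could be represented as in the sketch below. The JSON layout, the field names and the model name `biography_card` are purely hypothetical; the source does not specify the format of the temporal description file.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class VideoEvent:
    """A "video event" placed on the timeline (hypothetical schema)."""
    model: str        # interface model used to display the event
    content_id: str   # key of the associated content in the database
    start_frame: int  # first frame on which the event is shown
    end_frame: int    # first frame on which it is no longer shown


def active_events(events: List[VideoEvent], frame: int) -> List[VideoEvent]:
    """Return the events a player would display at a given frame."""
    return [e for e in events if e.start_frame <= frame < e.end_frame]


def to_description_file(video_id: str, fps: int, events: List[VideoEvent]) -> str:
    """Serialize the placed events as a JSON temporal description."""
    ordered = sorted(events, key=lambda e: e.start_frame)
    document = {
        "video_id": video_id,
        "fps": fps,
        "events": [asdict(e) for e in ordered],
    }
    return json.dumps(document, indent=2)
```

Keeping the description separate from the video itself is one way a replaceable player could read the same metadata through an API or SDK.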

In addition, the platform is capable of automatically detecting chapters and video, audio or multimedia events, using deep learning solutions or neural networks as well as signal-processing algorithms, and of creating a “video event” from the initial video on its own. It can also add “video events” of other types depending on the context, content and narration detected.

The automatic detection solution, more commonly known as the “automation engine”, is divided into 2 blocks: one intended for splitting the video into chapters, the other for placing interactive content.

It can be represented according to the diagram in [Fig. 2].

Each block is based on a set of already known techniques and algorithms meeting the specific needs of the platform: the detection of “video events” and their placements corresponding to types of content.

Individualized sequencing offers the user the possibility of selecting all or part of the chapters of a video to be viewed with a time mark allowing them to position themselves during viewing. For the automatic detection of timecodes allowing sequencing in chapters and the placement of video events, a set of algorithms makes it possible to reduce the search window for these as much as possible. This set of algorithms is then adapted according to the program and the elements sought.

[Fig. 3] illustrates an example of steps in the case where the video includes audio or visual cues making it possible to identify the cut between two chapters:

1) First split following the detection of audio cues using the so-called cross-correlation method. Comparing two signals reveals similarities between them. Each similarity found is accompanied by a probability score; the closer this score is to 1, the stronger the similarity;

2) Adjustment through the detection of visual cues in the segments of the previous cut;

3) Refinement based on the detection of scenes in the video edit.
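The first step can be illustrated with a normalized cross-correlation between a short reference cue (for example a jingle) and the audio track. This is a pure-Python sketch under stated assumptions: the signals are short lists of samples, the score is a zero-mean correlation coefficient in [-1, 1] rather than a true probability, and the 0.9 threshold and function names are arbitrary choices for the example. A production system would more likely rely on an FFT-based routine from a numerical library.

```python
import math
from typing import List, Sequence, Tuple


def normalized_xcorr(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide the template over the signal and return one score per lag.

    Each score is the zero-mean normalized correlation between the template
    and the signal window starting at that lag; 1.0 means a perfect match.
    """
    n = len(template)
    if n == 0 or len(signal) < n:
        raise ValueError("template must be non-empty and no longer than signal")
    t_mean = sum(template) / n
    t_centered = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_centered))
    scores = []
    for lag in range(len(signal) - n + 1):
        window = signal[lag:lag + n]
        w_mean = sum(window) / n
        w_centered = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_centered))
        denom = t_norm * w_norm
        if denom == 0.0:
            scores.append(0.0)  # flat window or flat template: no information
        else:
            dot = sum(a * b for a, b in zip(w_centered, t_centered))
            scores.append(dot / denom)
    return scores


def find_cues(signal: Sequence[float], template: Sequence[float],
              threshold: float = 0.9) -> List[Tuple[int, float]]:
    """Return (lag, score) pairs where the cue is detected."""
    scores = normalized_xcorr(signal, template)
    return [(lag, score) for lag, score in enumerate(scores) if score >= threshold]
```

Each detected lag, converted to a timecode, gives a candidate chapter boundary that the visual and scene-detection steps can then adjust.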

The present invention makes it possible to describe in the form of rules, as a function of the program, the sequence of the algorithms to be used in order to detect the relevant timecodes.
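One way to picture such rules is as an ordered list of detectors per program type, each stage narrowing the search windows produced by the previous one. The sketch below is an assumption-laden illustration: the detectors are stand-ins that return windows around hard-coded hit times, and the rule table, program types and names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Tuple[float, float]                 # (start_s, end_s) search window
Detector = Callable[[Window], List[Window]]  # refines one window into sub-windows


def cue_detector(hit_times: Sequence[float], margin_s: float) -> Detector:
    """Build a stand-in detector from known hit times (for illustration only).

    A real detector would analyze audio or video inside the window.
    """
    def detect(window: Window) -> List[Window]:
        start, end = window
        return [
            (max(start, t - margin_s), min(end, t + margin_s))
            for t in hit_times
            if start <= t <= end
        ]
    return detect


def run_rule(stages: Sequence[Detector], initial: Window) -> List[Window]:
    """Apply each stage in order, searching only inside the surviving windows."""
    windows = [initial]
    for stage in stages:
        windows = [refined for w in windows for refined in stage(w)]
    return windows


# Hypothetical detectors, from coarse (audio) to fine (scene cuts).
audio = cue_detector([60.0, 300.0], margin_s=2.0)
visual = cue_detector([60.4, 120.0, 300.2], margin_s=0.5)
scenes = cue_detector([60.5, 300.1, 400.0], margin_s=0.04)

# Hypothetical rule table: the algorithm sequence depends on the program type.
RULES: Dict[str, List[Detector]] = {
    "magazine": [audio, visual, scenes],
    "film": [scenes],
}


def detect_timecodes(program_type: str, duration_s: float) -> List[Window]:
    """Run the rule for a program type over the whole video."""
    return run_rule(RULES[program_type], (0.0, duration_s))
```

Because each stage only looks inside the windows kept by the previous one, the expensive fine-grained detectors run on a small fraction of the video.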

The list of algorithms available is as follows:

- Detection of audio similarities by "cross-correlation";

- Detection and recognition of an audio signal from the YouTube-8M convolutional neural network model;

- Detection of similarities of a visual part by SSIM (structural similarity index);

- Face detection and recognition based on an implementation of the Eigenfaces algorithm;

- Face detection and recognition from different convolutional neural network models: ResNet50, InceptionV3, DenseNet, YOLOv2, MobileNet-SSD, MTCNN;

- Classification of images from different convolutional neural network models: ResNet50, InceptionV3, MobileNet.
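As an illustration of the SSIM entry in this list, the sketch below computes a single-window (global) SSIM between two grayscale patches given as flat lists of pixel values. This is a simplification: standard SSIM implementations average the index over local sliding windows, and the source does not say which variant is used. The constants follow the usual defaults, C1 = (0.01 L)^2 and C2 = (0.03 L)^2, with L the dynamic range.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale patches.

    Returns 1.0 for identical patches; lower values mean less similarity.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing a reference visual cue (such as a program logo) with the same region of each frame and keeping the frames whose score is close to 1 is one plausible use in this context.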

The automatic placement of interactive elements also relies on cues such as text embedded in the video or keywords resulting from the transcription of the audio track of the stream into text.
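Keyword-driven placement could look like the sketch below. It assumes the audio transcription is already available as timestamped text segments and that a mapping from single-word keywords to content identifiers exists; both structures, and the rule of placing each content item only at its first mention, are assumptions made for the example.

```python
import re
from typing import Dict, List, Tuple


def place_events(transcript: List[Tuple[float, str]],
                 keyword_to_content: Dict[str, str]) -> List[Tuple[float, str, str]]:
    """Return (time_s, keyword, content_id) placements from a transcript.

    Each content item is placed once, at the first segment mentioning its
    keyword. Only single-word keywords are handled in this sketch.
    """
    placements = []
    already_placed = set()
    for start_s, text in transcript:
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        for keyword, content_id in keyword_to_content.items():
            if keyword.lower() in words and content_id not in already_placed:
                already_placed.add(content_id)
                placements.append((start_s, keyword, content_id))
    return placements
```

The same function could be fed with text recognized on screen instead of the audio transcript, since both cues reduce to timestamped words.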

Claims


1. Automated system capable of analyzing and then transforming a linear audiovisual program or image stream of any kind (TV shows, films, documentaries, etc.) into an interactive audiovisual program that can be viewed on a mobile phone, any screen or connected device, in particular a tablet, Apple TV, Android TV and/or smart TV, characterized in that its interface, while browsing and analyzing the linear audiovisual program or the original video stream frame by frame, using algorithms proceeding by extraction, serves to detect and label, on the basis of the desired user experience defined beforehand, chapters, event sequences or marking points being contextual and relevant editorial content, as well as to add and place this content and/or functional objects at any point in the linear audiovisual program or the original video stream, these additions and placements being automatically deduced from the desired user experience.

2. Automated system capable of analyzing and then transforming a linear audiovisual program or image stream of any kind into an interactive audiovisual program that can be viewed on a mobile phone, screen or connected device, according to claim 1, characterized in that the audio or visual cues, as metadata, make it possible to organize the sequencing and the cut between two chapters, a first cut following the detection of audio or visual cues in the video, the cuts then being adjusted and refined from the cues or scenes of the video to achieve thematic or partial interactivity in accordance with the desired user experience.

3. Automated system capable of analyzing and then transforming a linear audiovisual program or image stream of any kind into an interactive audiovisual program that can be viewed on a mobile phone, screen or connected device, according to claims 1 and 2, characterized in that the additional content and/or functional objects are integrated into the initial video and reformatted for the desired user experience.

4. Automated system capable of analyzing and then transforming a linear audiovisual program or image stream of any kind into an interactive audiovisual program that can be viewed on a mobile phone, screen or connected device, according to claims 1 to 3, characterized in that this technology makes it possible to analyze and transform any content whatever its origin, including videos accessible to the public, live or in deferred time, coming from any hypertext or search engine, and databases of every type.

5. Automated system capable of analyzing and then transforming a linear audiovisual program or image stream of any kind into an interactive audiovisual program that can be viewed on a mobile phone, screen or connected device, according to claims 1 to 3, characterized in that the system consists of a Smart-Video platform device comprising, on the one hand, an editor for editing, publishing, sharing and providing access to enriched content, on the other hand a temporal description file of synchronized metadata, generated via the video editor, and finally a video player for the use of Smart-Video and associated data by the public.

6. Smart-Video platform device, according to claim 5, characterized in that the metadata are accessible from codes and functionalities (API, SDK) making the video player replaceable and the reading of interactive programs interoperable with any other player, technological platform or any other digital device.