FR3052007A1 - METHOD AND DEVICE FOR RECEIVING AUDIOVISUAL CONTENT AND CORRESPONDING COMPUTER PROGRAM

Info

Publication number: FR3052007A1
Application number: FR1654887A
Authority: FR
Inventors: Santos Martinho Dos; Chantal Guionnet
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2017-12-01

Abstract

The invention relates to a method for receiving audiovisual content from an audiovisual stream comprising a video component and an audio component, a subtitling component being generated in association with the audio component and containing a plurality of words representative of the audio data of the audio component, said method being characterized in that it implements a rendering of the content during which, between two successive rendering instants of a current video scene of the video component, in the case where the current audio data delivered between said two successive instants are representative of a plurality of K words, with K≥2, said K words are rendered between said two successive instants, respectively at K successive instants, a word rendered at a given instant replacing the word rendered at the instant immediately preceding said given instant.

Description

Method and device for receiving audiovisual content and corresponding computer program

Field of the invention

The field of the invention is that of the rendering of audiovisual content by a terminal, from an audiovisual stream comprising a video component and an audio component, a subtitling component being generated in association with the audio component and containing a sequence of words representative of the audio data of the audio component. The invention applies both to audiovisual content broadcast in real time to a user terminal and to audiovisual content previously recorded on that terminal.

More particularly, the invention applies to the way in which the words of the subtitling component are rendered when the content is rendered. The invention can in particular be implemented in a terminal equipped with a user interface and a graphical interface, for example a tablet, a mobile phone, a smartphone, a personal computer, a television connected to a communication network, etc.

Presentation of the prior art

Today, techniques for subtitling audiovisual content allow a user, at a current instant of rendering of said content, to view on a display screen, at the same time as the content, one or more sentences transcribing the dialogue spoken at that instant by the character or characters appearing in the content. Such transcription sentences are displayed on one or more lines, depending on their length, and are generally placed at the bottom of the display screen.

One disadvantage of such a method is that the user's viewing comfort is unsatisfactory. The user constantly moves his or her eyes from left to right, a physical constraint that disrupts the following of audiovisual content, for example a film, and causes additional eye strain. Displaying subtitles as sentences also imposes a maximum font size, which can make them difficult to read for users with visual impairments.

Another disadvantage of current subtitling techniques is that the user reads the subtitle sentence or sentences displayed on the screen at his or her own pace, which can cause desynchronization and a loss of rhythm between the sentences being read and the words actually spoken.

Furthermore, when a sentence is not spoken at a constant pace, the speaker for example pronouncing the beginning of the sentence very quickly and the end more slowly, the user will not be able to perceive this change of rhythm by reading the subtitled sentence.

In addition, some dialogues are not always transcribed faithfully, and are sometimes transcribed or translated using more concise sentences in order to fit the space available for subtitles.

All these disadvantages therefore prevent the user from experiencing a real feeling of immersion in the content rendered to him or her, because current subtitling in the form of whole sentences tends to distort the original content.

Object and summary of the invention

One of the aims of the invention is to remedy disadvantages of the aforementioned prior art. To this end, one object of the present invention relates to a method for receiving audiovisual content from an audiovisual stream comprising a video component and an audio component, a subtitling component being generated in association with the audio component and containing a sequence of words representative of the audio data of the audio component.

Such a method is remarkable in that it implements a rendering of the content during which, between two successive rendering instants of a current video scene of the video component, in the case where the current audio data delivered between the two successive instants are representative of a plurality of K words, with K≥2, the K words are rendered between the two successive instants, respectively at K successive instants, a word rendered at a given instant replacing the word rendered at the instant immediately preceding said given instant.

Such an arrangement advantageously makes it possible, as a given content is rendered, to render each spoken word textually on the screen, one by one, each word being rendered in place of the word rendered just before it. The user's viewing comfort is markedly improved, since he or she no longer has to keep moving his or her eyes from left to right to read subtitle sentences.

In addition, such an arrangement allows the display rhythm of the words to adapt dynamically to the actual pace of the speech of the speakers appearing in the content.
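By way of illustration only, the word-by-word replacement described above can be sketched as a lookup over a list of timed words. The function name `word_on_screen`, the representation of the subtitling component as `(instant, word)` pairs and the sample values are assumptions made for this sketch; they are not taken from the invention.

```python
from bisect import bisect_right

def word_on_screen(timed_words, t):
    """Return the single word visible at time t (seconds), or None before the first word.

    timed_words is a list of (instant, word) pairs sorted by instant (assumed layout).
    """
    instants = [instant for instant, _ in timed_words]
    # Index of the first word whose instant is strictly after t.
    i = bisect_right(instants, t)
    # The word rendered most recently replaces every word rendered before it.
    return timed_words[i - 1][1] if i else None

# Hypothetical K = 3 words delivered between two successive rendering instants of a scene.
timed_words = [(10.0, "Hello"), (10.4, "my"), (10.9, "friend")]
print(word_on_screen(timed_words, 10.5))
```

Because only the most recent word is returned, at most one word is on screen at any time, and the spacing of the instants carries the speaker's rhythm.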

The user can thus also become aware of each speaker's speed of speech, and of the pauses and hesitations in the spoken sentences, which allows him or her to better sense the character and behavior of each character, the lived reality of the scenes, etc. In particular, the user's feeling of immersion in the content is markedly stronger than with the content rendering methods of the prior art.

Rendering means either viewing content, listening to content, or both at the same time.

A video scene means a sequence of several images that follow one another over a given time range.

The reception method according to the invention is for example implemented in a terminal such as a set-top box, or in a terminal connected to the set-top box, such as a tablet, a television, etc.

According to a particular embodiment, the subtitling component being generated prior to the rendering of the content, and each of the K words, spoken at the K successive instants, having been associated beforehand with an identifier representative of the speaker who speaks the word, the rendering of the content at a considered instant among the K instants is implemented:
- by a prior analysis of the identifier corresponding to the considered instant,
- by triggering a command to display, at the considered instant, the word associated with the analyzed identifier, in visual correspondence with the speaker associated with the analyzed identifier.

Such an arrangement advantageously allows the user to easily distinguish on the screen which speaker is speaking the word displayed at the considered instant, in particular when several speakers speak almost at the same time or when several speakers appear on the screen at the same time.
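A minimal sketch of this embodiment, assuming that "visual correspondence" is obtained by placing the word at a screen position attached to the speaker. The `TimedWord` type, the `display_command` function, the identifiers `spk1` and `spk2` and the pixel coordinates are all hypothetical; the text does not specify how the correspondence is rendered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimedWord:
    instant: float   # rendering instant, in seconds
    text: str
    speaker_id: str  # in this embodiment every word carries a speaker identifier

def display_command(word, speaker_positions):
    """Build a display command placing the word next to its speaker."""
    # "Prior analysis" of the identifier: look up where that speaker is on screen.
    x, y = speaker_positions[word.speaker_id]
    return {"at": word.instant, "text": word.text, "x": x, "y": y}

# Hypothetical on-screen anchor points (pixels) for two speakers.
speaker_positions = {"spk1": (200, 300), "spk2": (900, 280)}
words = [TimedWord(5.0, "Where", "spk1"), TimedWord(5.2, "Here", "spk2")]
commands = [display_command(w, speaker_positions) for w in words]
print(commands)
```

Other forms of visual correspondence (a color per speaker, for instance) would fit the same structure by changing what the lookup table stores.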

According to yet another particular embodiment, the subtitling component being generated prior to the rendering of the content and the plurality of K words being spoken by the same speaker at the K successive instants, the first word of the plurality of K words having furthermore been associated beforehand with an identifier representative of that same speaker, the rendering of the content is implemented:
- at the first instant among the K instants, by a prior analysis of the identifier corresponding to the first word of the plurality of K words,
- by triggering a command to display the K words successively, respectively at the K instants, in visual correspondence with said same speaker associated with the analyzed identifier.

Such an arrangement advantageously allows the user to easily distinguish on the screen which speaker is speaking the word displayed at the current instant, in the case where several people are displayed on the screen.

In addition, not systematically associating a speaker identifier with each of the words spoken consecutively by the same speaker greatly reduces the signaling cost of the audiovisual stream.

According to yet another particular embodiment, the subtitling component being generated prior to the rendering of the content and the plurality of K words consisting of:
- a first plurality of J words spoken respectively at J instants by a first speaker,
- at least a second plurality of L words spoken respectively at L instants by a second speaker, following the J words spoken by the first speaker,
the first word of the plurality of J words having been associated beforehand with a first identifier representative of the first speaker and the first word of the plurality of L words having been associated beforehand with a second identifier representative of the second speaker, the rendering of said content is implemented:
- at the first instant of the plurality of J instants, by a prior analysis of the first identifier associated with the first word of the plurality of J words,
- by triggering a command to display the J words successively, respectively at the J instants, in visual correspondence with the first speaker associated with the analyzed first identifier,
- at the first instant of the plurality of L instants, by a prior analysis of the second identifier associated with the first word of the plurality of L words,
- by triggering a command to display the L words successively, respectively at the L instants, in visual correspondence with the second speaker associated with the analyzed second identifier.

Such an arrangement allows the user, when at least two speakers speak in succession, to identify the change of speaker clearly and easily.
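The embodiments in which only the first word of each run carries a speaker identifier can be sketched as a carry-forward pass over the words. The tuple layout `(instant, text, speaker_id_or_None)`, the function name `resolve_speakers` and the sample dialogue are assumptions made for this sketch.

```python
def resolve_speakers(words):
    """Attach a speaker to every word when only the first word of each run is tagged.

    words: list of (instant, text, speaker_id_or_None), in rendering order (assumed layout).
    Returns a list of (instant, text, speaker_id, speaker_changed).
    """
    current = None
    resolved = []
    for instant, text, speaker_id in words:
        # An identifier is analyzed only where one is present, i.e. at the start of a run.
        changed = speaker_id is not None and speaker_id != current
        if speaker_id is not None:
            current = speaker_id
        resolved.append((instant, text, current, changed))
    return resolved

# Hypothetical run of J = 2 words by speaker "A" followed by L = 3 words by speaker "B".
words = [(0.0, "Are", "A"), (0.3, "you", None),
         (0.9, "Yes", "B"), (1.2, "I", None), (1.5, "am", None)]
for row in resolve_speakers(words):
    print(row)
```

In this example two identifiers are carried for five words, which reflects the reduced signaling cost noted above; the `speaker_changed` flag marks the instants at which the display would switch to the other speaker.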

According to yet another particular embodiment, the subtitling component being generated prior to the rendering of the content and the plurality of K words being spoken by the same speaker at the aforementioned K successive instants, at a given speed, the given speed having furthermore been associated beforehand with an identifier representative of the value of the given speed, the rendering of the content is implemented:
- by a prior analysis of the identifier representative of the value of the given speed,
- by a calculation, as a function of the value of the speed associated with the analyzed identifier, of each of the K instants of pronunciation of the K words,
- by triggering a command to display the K words successively, respectively at the K calculated instants.

Such an arrangement advantageously makes it possible to keep the structure of existing subtitling components, that is, components that take the form of successive groups of sentences, while rendering each spoken word textually on the screen, one by one, as a given content is rendered, each word rendered at a considered instant replacing the word rendered at the instant immediately preceding the considered instant.

Such a feature is made possible by the fact that the sentences of a conventional subtitling component are advantageously divided beforehand into sub-sentences corresponding respectively to different speaking speeds.

In addition, each word of a spoken sentence is advantageously displayed at the actual rhythm at which that sentence is spoken, which markedly improves the user's feeling of immersion in the content.
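The calculation of the K instants from a speed value can be illustrated as follows. The text does not give the formula or the unit of the speed, so this sketch assumes a speed expressed in words per second, a known start instant for the sub-sentence and evenly spaced words; `word_instants` and the sample values are hypothetical.

```python
def word_instants(start, words, words_per_second):
    """Compute one display instant per word of a sub-sentence spoken at a constant speed.

    Assumes evenly spaced words: word k is displayed at start + k / words_per_second.
    """
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    step = 1.0 / words_per_second
    return [(start + k * step, word) for k, word in enumerate(words)]

# Hypothetical sub-sentence starting at 12.0 s and spoken at 2 words per second.
print(word_instants(12.0, ["I", "am", "here"], 2.0))
```

A sentence spoken quickly at first and more slowly afterwards would then be represented as two sub-sentences, each with its own speed value, and the function would be called once per sub-sentence.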

According to yet another particular embodiment, the subtitling component is generated simultaneously with an analysis of the content, by implementing the following:
- identification, at a considered instant among the K instants of pronunciation of the K words:
  • of one or more audio data items of the audio component delivered at the considered instant and representative of one of the K words,
  • of the corresponding video scene synchronized with the delivered audio data item or items,
- textual conversion into a word of the audio data item or items delivered at the considered instant,
- triggering of a command for synchronized rendering of the audio data item or items, the corresponding video scene and the word, in association with the considered instant.

Such an arrangement avoids transmitting a subtitling component in the audiovisual stream, which greatly reduces the signaling cost of the audiovisual stream, while advantageously making it possible, at a considered rendering instant of a given content, to render textually on the screen one spoken word at a time, in synchronization with the sound and the image delivered at that rendering instant.

According to yet another particular embodiment, the analysis of the content comprises, at the considered instant among the K instants of pronunciation of the K words:
- an identification of the voice frequency associated with the corresponding delivered audio data item or items representative of one of the K words spoken by a considered speaker,
- a matching of the identified voice frequency with information relating to the considered speaker that has been associated beforehand with the voice frequency,
the reception method implementing, at the considered rendering instant, the triggering of a command to display the word corresponding to the delivered audio data item or items, in visual correspondence with the considered speaker.

Such an arrangement advantageously allows the user, when the subtitling component is generated in real time, to easily distinguish on the screen which speaker is speaking the word displayed at the current instant, in particular when several speakers speak almost at the same time or when several speakers are displayed on the screen.
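The matching of a voice frequency with previously stored speaker information can be sketched as a nearest-value lookup. This is only one possible reading: it assumes that the voice frequency is a single value in hertz, that each known speaker is stored with one reference frequency, and that a tolerance decides whether a match is accepted. The names `match_speaker`, `known_speakers`, `tolerance_hz` and the sample speakers are hypothetical.

```python
def match_speaker(frequency_hz, known_speakers, tolerance_hz=15.0):
    """Return the known speaker whose stored frequency is closest, or None if none is close enough.

    known_speakers maps a speaker name to a reference voice frequency in Hz (assumed layout).
    """
    best = min(known_speakers,
               key=lambda name: abs(known_speakers[name] - frequency_hz),
               default=None)
    if best is None or abs(known_speakers[best] - frequency_hz) > tolerance_hz:
        return None
    return best

# Hypothetical speakers registered beforehand with their reference frequencies.
known_speakers = {"Alice": 210.0, "Bob": 120.0}
print(match_speaker(125.0, known_speakers))
```

The returned speaker would then be used to place the converted word in visual correspondence with that speaker, as in the embodiments where the identifier is carried in the subtitling component.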

The various embodiments or features mentioned above may be added, independently or in combination with one another, to the reception method defined above.

The invention also relates to a device for receiving audiovisual content from an audiovisual stream comprising a video component and an audio component, a subtitling component being generated in association with the audio component and containing a plurality of K words representative of the audio data of the audio component (K≥2).

Such a reception device is remarkable in that it comprises a processor arranged to implement a rendering of the content during which, between two successive rendering instants of a current video scene of the video component, in the case where the current audio data delivered between the two successive instants are representative of a plurality of K words, with K≥2, the K words are rendered between the two successive instants, respectively at K successive instants, a word rendered at a given instant replacing the word rendered at the instant immediately preceding the given instant.

The invention further relates to a computer program comprising instructions for implementing the reception method according to the invention when it is executed on a terminal or, more generally, on a computer.

Each of these programs may use any programming language and be in the form of source code, object code, or code intermediate between source code and object code, such as a partially compiled form, or in any other desirable form.

The invention also relates to a computer-readable recording medium on which a computer program is recorded, this program comprising instructions suitable for implementing the reception method according to the invention, as described above.

Such a recording medium may be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, a USB key, or a magnetic recording means, for example a hard disk. Such a recording medium may also be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by radio or by other means. In particular, the program according to the invention can be downloaded over an Internet-type network.

Alternatively, the recording medium may be an integrated circuit in which the program is incorporated, the circuit being adapted to execute, or to be used in the execution of, the aforementioned reception method.

The aforementioned receiving device, computer program and corresponding recording medium have at least the same advantages as those conferred by the reception method according to the present invention.

List of figures

Other advantages and characteristics of the invention will appear more clearly on reading the following description of several particular embodiments of the invention, given as simple illustrative and non-limiting examples, and the accompanying drawings, among which:
- Figure 1 schematically shows an architecture in which the method for receiving audiovisual content according to the invention is implemented;
- Figure 2 schematically shows the steps of a method for generating an audiovisual stream according to a first embodiment of the invention;
- Figure 3 shows the structure of the audiovisual stream generated by the method of Figure 2;
- Figures 4A to 4C show different variants of the subtitling component of the audiovisual stream generated by the method of Figure 2;
- Figure 5 schematically shows the steps of a method for generating an audiovisual stream according to a second embodiment of the invention;
- Figure 6 shows the structure of the audiovisual stream generated by the method of Figure 5;
- Figures 7A to 7D show different variants of the subtitling component of the audiovisual stream generated by the method of Figure 5;
- Figure 8 schematically shows the steps of a method for generating an audiovisual stream according to a third embodiment of the invention;
- Figure 9 shows the structure of the audiovisual stream generated by the method of Figure 8;
- Figure 10 shows the simplified structure of a device for receiving an audiovisual stream according to one embodiment of the invention;
- Figure 11 schematically shows the steps of a method for receiving an audiovisual stream according to a first embodiment of the invention;
- Figures 12A and 12B respectively show two examples of subtitled video scenes as rendered by two different variants of the reception method of Figure 11;
- Figure 13 schematically shows the steps of a method for receiving an audiovisual stream according to a second embodiment of the invention;
- Figures 14A and 14B respectively show two examples of subtitled video scenes as rendered by two different variants of the reception method of Figure 13;
- Figure 15 schematically shows the steps of a method for receiving an audiovisual stream according to a third embodiment of the invention.

Description of particular embodiments of the invention

With reference to Figure 1, an architecture is presented in which the method for receiving audiovisual content according to the invention is implemented.

Such an architecture comprises a terminal TER for accessing content offered by a service platform PFS, via a communication network RC, for example of the IP ("Internet Protocol") type. The service platform PFS offers various content to the user UT of the terminal TER, such as for example:
- television content, in particular content scheduled for broadcast in a program grid, such as films, sports events, shows, television news, etc.,
- VOD catalogs,
- video catalogs,
- music catalogs (clips, concerts, etc.),
- podcast catalogs,
- digital book catalogs,
- catalogs of applications and/or services.

The aforementioned architecture allows the user UT of the terminal TER to access the content offered both in a mobile situation and in a stationary situation.

In a mobile situation, the terminal TER is for example a mobile phone, a smartphone, a tablet, a laptop, etc.

In a stationary situation, the terminal TER could be a PC-type personal computer.

Still in a stationary situation, and as shown in Figure 1, the terminal TER consists for example of:
- an access terminal STB, which is able to receive and process the content coming from the platform PFS,
- a rendering terminal, for example a television TLV as shown in Figure 1, able to render to the user UT the content processed by the access terminal STB.

In one exemplary embodiment, the access terminal and the rendering terminal are combined into a single terminal. This could for example be a television containing a set-top-box type decoder. In another example, the access terminal STB is a set-top box and the terminal TER is a tablet acting as a rendering terminal, connected to the set-top box by means of a local network, for example a wireless one, in particular of the WiFi type, or of the PLC type (power-line communication, French abbreviation CPL). According to other examples not shown, the terminal TER could be a mobile phone, a smartphone, the television TLV, or a radio connected to a communication network, etc.

The user UT can interact with the access terminal STB using a conventional remote control or using the terminal TER, which for this purpose includes a suitable remote-control software application. The terminal TER can then display an interface containing keys dedicated to prerecorded commands. The terminal TER thus offers the same functions as a conventional television remote control. For example, the user can request the selection of content received from the service platform PFS simply by pressing the directional keys "←", "→", "↑", "↓" in a menu associated with viewing and/or listening to the received content. The user can also validate the selected content by pressing the "OK" key. When the user activates a key on the remote control, a message comprising the command associated with that key is sent to the access terminal STB according to a communication protocol suited to the local network used.

The access terminal STB, like the terminal TER, further comprises means of connection to the communication network RC, which may be, for example, of the x-DSL, fiber, or 3G and 4G type.

With reference to Figure 2, the steps of the method for generating an audiovisual stream F1 are now presented according to a first embodiment. The stream F1 is representative of an audiovisual content C and can be received by the access terminal STB of Figure 1 via the communication network RC.

Such audiovisual content C, of duration T, consists for example of a film, a television program, the broadcast of a speech, a concert, a documentary, a sports event such as, in particular, a football match, etc.

The method for generating the audiovisual stream F1 comprises an editing step ED1 during which the stream F1 is built by inserting:
- a video component CV1, which contains all the video data contributing to the rendering of the images of the audiovisual content C,
- at least one audio component CA1, which contains all the audio data contributing to the complete sound rendering of the audiovisual content C.

The video component CV1 and the audio component CA1 are inserted in a synchronized manner so that, at a current instant of rendering of the audiovisual content, the sound emitted corresponds to the image rendered on the screen.

As shown in Figure 3:
- the video component CV1 contains a sequence of M video scenes SV1, SV2, ..., SVu, ..., SVM which play over the entire duration T of the content C, respectively at different successive instants t1, t2, ..., tu, ..., tM,
- the audio component CA1 contains a sequence of M audio data sets D1, D2, ..., Du, ..., DM which follow one another over the entire duration T of the content C and which are synchronized respectively with the M video scenes of the video component CV1 at the different successive instants t1, t2, ..., tu, ..., tM, which constitute temporal synchronization indicators between the data of the video component CV1 and the audio data of the audio component CA1.
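
The structure of Figure 3 can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class name `Stream`, its field names, the use of seconds as the time unit and the sample values are all hypothetical and do not come from the patent. It shows how the synchronization instants t1, ..., tM select the scene SVu and the audio data set Du that are current at a rendering time t.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical in-memory view of a stream such as F1."""
    sync_instants: list  # t1..tM, ascending, in seconds (assumed unit)
    video_scenes: list   # SV1..SVM
    audio_sets: list     # D1..DM, same length as video_scenes
    duration: float      # T

    def at(self, t):
        """Return (scene, audio set) current at time t, or None outside [t1, T)."""
        if not (self.sync_instants[0] <= t < self.duration):
            return None
        # Index u of the last synchronization instant that is <= t.
        u = bisect_right(self.sync_instants, t) - 1
        return self.video_scenes[u], self.audio_sets[u]


# Three scenes starting at 0 s, 4 s and 9.5 s in a 15 s content (made-up values).
f1 = Stream([0.0, 4.0, 9.5], ["SV1", "SV2", "SV3"], ["D1", "D2", "D3"], 15.0)
current = f1.at(4.0)  # scene and audio set that start at the second instant
```

Because the scene and the audio set share the same index u, they stay paired at every rendering time, which is the synchronization property the description relies on.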

Each of the M video scenes consists of a sequence of several images played at a given rate (for example 24 images per second). In addition, the number of images per video scene is not necessarily the same from one video scene to another.

Each of said M audio data sets, or only some of them, contains audio data associated with one or more words uttered by one or more speakers taking part in the content C. A speaker taking part in the content C means either a person who utters the word or words while visible in the corresponding image, or a person who utters them without being visible in the corresponding image (voice-over).

During a signaling step SIG1 shown in Figure 2, a plurality of items of information allowing the stream F1 to be analyzed by the terminal TER or the access terminal STB of the user is also inserted, in a conventional manner, into a signaling sub-stream SF1 associated with the stream F1. Such information is in particular metadata, such as for example an identifier ID of the audiovisual content C, description information DESC of the audiovisual content C, such as its genre (film, documentary, sport, etc.), the names of the persons associated with the audiovisual content C (director, actors, actresses, athletes, etc.), and associated temporal information, such as the duration T of the audiovisual content C, the start and end times of the broadcast of the content C if the latter is broadcast in real time, etc.
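
As an illustration of the kind of metadata carried by the signaling sub-stream, here is a minimal Python sketch. The patent does not specify any encoding, so the class `SignalingInfo`, its field names and the sample values below are purely hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignalingInfo:
    """Hypothetical container for the metadata listed for sub-stream SF1."""
    content_id: str                              # identifier ID of the content C
    genre: str                                   # part of the description DESC
    people: dict = field(default_factory=dict)   # role -> list of names
    duration_s: float = 0.0                      # duration T, in seconds (assumed unit)
    broadcast_start: Optional[str] = None        # set only for real-time broadcast
    broadcast_end: Optional[str] = None

    def is_real_time(self):
        """Start and end times are only signaled for content broadcast in real time."""
        return self.broadcast_start is not None and self.broadcast_end is not None


# Made-up example values.
sf1 = SignalingInfo(
    content_id="C-0001",
    genre="documentary",
    people={"director": ["A. Example"]},
    duration_s=3120.0,
)
```

A receiver could read such a record before decoding the components, for example to decide how to present the content in a menu.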

Steps ED1 and SIG1 are known as such.

According to the first embodiment of the invention, the method for generating the audiovisual stream further comprises a step ST1 of subtitling the audiovisual content C, during which a subtitling component CS1 is generated in association with the audio component CA1. In a manner known per se, the subtitling component CS1 carries the textual transcription of the sentences uttered over the entire duration T of the audiovisual content C.

According to the first embodiment of the invention, as shown in Figure 3, consider a current instant tu at which a video scene SVu appears, up to the following instant tu+1 at which the next video scene SVu+1 appears. In the case where the corresponding audio data set Du of the audio component CA1 contains audio data representative of a plurality of K words mk1, mk2, ..., mj, ..., mK forming, in sequence, one or more sentences, then during the subtitling step ST1 of Figure 2 the K words are time-indexed one by one, respectively at K successive instants tk1, tk2, ..., tK. They can thus later be rendered on the screen, one by one, between the two successive instants tu and tu+1 of appearance of the video scene SVu, a word rendered at a given instant among the K instants replacing the word rendered at the immediately preceding instant. Step ST1 is repeated in this way for each time interval [t1, t2], [t2, t3], and so on up to the last time interval [tM, T].
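
The word-by-word replacement described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `TimedWord` and `word_on_screen`, the use of seconds, and the sample instants are assumptions. It assumes the K words are already time-indexed in ascending order, and returns the single word that should be on screen at a time t within the interval of the current scene.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    instant: float  # one of the K indexing instants, in seconds (assumed unit)
    text: str


def word_on_screen(words, t_u, t_next, t):
    """Return the one word displayed at time t within [t_u, t_next), or None.

    `words` must be sorted by instant. The word shown is the last one whose
    instant is <= t, so each word replaces the one rendered just before it.
    """
    if not (t_u <= t < t_next):
        return None
    instants = [w.instant for w in words]
    i = bisect_right(instants, t)  # number of words whose instant is <= t
    return words[i - 1].text if i > 0 else None


# K = 3 words indexed inside a scene spanning [10.0, 12.0) (made-up values).
scene_words = [TimedWord(10.0, "Hello"), TimedWord(10.4, "to"), TimedWord(10.9, "all")]
shown = word_on_screen(scene_words, 10.0, 12.0, 10.5)
```

Only one word is ever returned for a given time, which reflects the rule that a word rendered at a given instant replaces the word rendered at the preceding instant rather than being appended to it.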

According to a first variant of this first embodiment, shown in Figure 4A, the K words mk1, mk2, ..., mK are respectively associated, in the subtitling component CS1, with K identifiers Ik1, Ik2, ..., IK, such that:
- the identifier Ik1 is representative of the speaker who utters the word mk1,
- the identifier Ik2 is representative of the speaker who utters the word mk2,
- ...
- the identifier IK is representative of the speaker who utters the word mK.

Such an identifier consists for example of:
- the name of the speaker, and/or
- a particular display color for the word associated with said identifier, and/or
- the coordinates of a zone of the video scene SVu in which the word associated with said identifier is intended to be displayed.

In the case where the speaker who utters one of the K words, for example the word mk1, is visible in the video scene SVu, either alone or with other people, the coordinates of said display zone of the word mk1 are determined so that the word mk1 is displayed on the screen in a zone of the video located close to said speaker.

In the case where the speaker who utters one of the K words, for example the word mk1, is not visible in the video scene SVu (voice-over, for example), the coordinates of said display zone of the word mk1 are determined so that the word mk1 is displayed on the screen in a neutral zone of the image, for example at the bottom of the image.
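
The choice of display zone in the two cases above can be sketched as follows. The patent only states that the zone is close to a visible speaker, or neutral (for example at the bottom of the image) otherwise, so everything concrete here is an assumption: the function name `display_zone`, the `(x, y, width, height)` pixel convention, the placement just below the speaker's bounding box and the default zone height.

```python
def display_zone(speaker_box, frame_w, frame_h, zone_h=40):
    """Return the (x, y, w, h) zone, in pixels, where a word is displayed.

    speaker_box is the (x, y, w, h) bounding box of the speaker in the scene,
    or None when the speaker is not visible (voice-over).
    """
    if speaker_box is None:
        # Speaker not visible: neutral zone spanning the bottom of the image.
        return (0, frame_h - zone_h, frame_w, zone_h)
    x, y, w, h = speaker_box
    # Speaker visible: zone just below the speaker, kept inside the frame.
    zone_y = min(y + h, frame_h - zone_h)
    return (x, zone_y, w, zone_h)


# Made-up 1920x1080 frame.
off_screen = display_zone(None, 1920, 1080)
near_speaker = display_zone((100, 200, 300, 400), 1920, 1080)
```

Placing the zone below the bounding box is just one possible reading of "close to the speaker"; a zone beside or above the speaker would satisfy the description equally well.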

According to a second variant of this first embodiment, shown in Figure 4B, in the case where the K words are all uttered by the same speaker, only the first word mk1 among said K words is associated with an identifier Ik1 of the speaker who utters the K words.

According to a third variant of this first embodiment, shown in Figure 4C, consider the case where the plurality of K words consists of a first plurality of J words mj1, mj2, ..., mJ uttered by a first speaker respectively at J successive instants tj1, tj2, ..., tJ, and of at least a second plurality of L words ml1, ml2, ..., mL uttered by a second speaker, after the first speaker, respectively at L successive instants tl1, tl2, ..., tL. In that case:
- at least the first word mj1 among said J words is associated with an identifier Ij1 of the first speaker who utters the J words,
- at least the first word ml1 among said L words is associated with an identifier Il1 of the second speaker who utters the L words.

In order to provide for the situation where a user accesses the content within the time interval [tj1, tJ] (respectively [tl1, tL]), but after the instant tj1 (respectively tl1), each of the following words mj2 to mJ (respectively ml2 to mL) is in addition associated with the identifier Ij1 of the first speaker who utters the J words (respectively the identifier Il1 of the second speaker who utters the L words).

At the end of the subtitling step ST1 of Figure 2, the stream F1 is generated, comprising the video component CV1, the audio component CA1 and the subtitling component CS1, in association with the sub-stream SF1. The stream F1 contains the temporal synchronization indicators t1, t2, ..., tu, ..., tM, as well as all the instants at which the words are uttered.
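
The reason for attaching the speaker identifier to every word, rather than only to the first word of each speaker, can be illustrated with a short Python sketch. The function `speaker_at`, the `(word, identifier)` tuple layout and the sample names are hypothetical. The sketch models a receiver that starts receiving at word index `joined_at` and tries to find out who utters a later word.

```python
def speaker_at(words, joined_at, index):
    """Resolve the speaker of words[index] for a receiver that only received
    words[joined_at:]. Each entry is (text, identifier or None).

    The receiver walks back from the word to the nearest identifier it has
    actually received; None means the speaker cannot be determined.
    """
    for i in range(index, joined_at - 1, -1):
        _text, identifier = words[i]
        if identifier is not None:
            return identifier
    return None


# Identifier on the first word of each speaker only (made-up words and names).
sparse = [("I", "Alice"), ("agree", None), ("Why", "Bob"), ("not", None)]
# Identifier repeated on every word, as provided for mid-interval access.
dense = [("I", "Alice"), ("agree", "Alice"), ("Why", "Bob"), ("not", "Bob")]

late_sparse = speaker_at(sparse, joined_at=1, index=1)  # first word was missed
late_dense = speaker_at(dense, joined_at=1, index=1)
```

With identifiers on the first word only, a receiver that joins after that word has no way to attribute the following words, whereas repeating the identifier keeps every word self-describing at the cost of some redundancy in the subtitling component.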

The stream F1 is then either transmitted to the terminal TER or to the access terminal STB of Figure 1, via the network RC, or stored on a medium for later transmission.

It is recalled that the invention can apply to audiovisual streams transmitted or broadcast live as well as to downloaded streams.

A second embodiment of a method for transmitting an audiovisual stream according to the invention will now be described with reference to Figure 5.

As in the first embodiment shown in Figure 2, the method for generating an audiovisual stream according to the second embodiment comprises an editing step ED10 during which a stream F'1 is built by inserting:
- a video component CV10, which contains all the video data contributing to the rendering of the images of the audiovisual content C,
- at least one audio component CA10, which contains all the audio data contributing to the complete sound rendering of the audiovisual content C.

The video component CV10 and the audio component CA10 are inserted in a synchronized manner, so that at any current rendering instant of the audiovisual content, the sound emitted matches the image displayed on the screen.

As shown in FIG. 6: - the video component CV10 contains a sequence of M video scenes SV1, SV2, ..., SVu, ..., SVM which run over the entire duration T of the content C, respectively at successive instants t1, t2, ..., tu, ..., tM, - the audio component CA10 contains a sequence of M audio data sets D1, D2, ..., Du, ..., DM which follow one another over the entire duration T of the content C and which are synchronized respectively with the M video scenes of the video component CV10 at the successive instants t1, t2, ..., tu, ..., tM, which constitute temporal indicators of synchronization between the video data of the video component CV10 and the audio data of the audio component CA10.

Similarly to the first embodiment shown in FIG. 2, the transmission method according to the second embodiment further comprises a signaling step SIG10, shown in FIG. 5, during which a plurality of items of information enabling the stream F'1 to be analyzed by the user's access terminal STB are conventionally inserted into a signaling sub-stream SF'1 associated with the stream F'1. This information is identical to the information signaled in the first embodiment.

The stream generation method according to the second embodiment further comprises a step ST10 of subtitling the audiovisual content C, during which a subtitling component CS10 is generated in association with the audio component CA10 in the same way as in the prior art. To this end, as shown in FIG. 6, from a current instant tu at which a video scene SVu appears until the next instant tu+1 at which the following video scene SVu+1 appears, if the corresponding audio data set Du of the audio component CA10 contains audio data representative of one or more words, this word or these words are transcribed as text in the form of one or more sentences, denoted PHu in FIG. 6, between the instant tu and the instant tu+1. By way of example, a single sentence PHu is pronounced between the instant tu and the instant tu+1, said sentence containing K consecutive words, denoted respectively mk1, mk2, ..., mj, ..., mK.

In accordance with the invention, the speed at which the words of the sentence PHu are pronounced is evaluated, that is to say the time needed to pronounce the K words.

An identifier IVu representative of the value of this speed is then associated with the sentence PHu in the subtitling component CS10, so that when they are rendered on the screen, the K words of the sentence PHu each appear and then disappear in turn at the speed associated with the identifier IVu, between the instant tu and the instant tu+1, respectively at the K successive instants tk1, tk2, ..., tK.

According to a first example, the indicator IVu is itself associated with a single appearance-disappearance time per word, which is the same for all K words.

According to a second example, the indicator IVu is associated with an appearance-disappearance time per word that is modulated according to the length of the word considered in the sentence, for example according to the number of syllables the word contains. The subtitling step ST10 of FIG. 5 is thus repeated for each time interval [t1, t2], [t2, t3], and so on up to the last time interval [tM, T].
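The two examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: it assumes that the K words fill the whole interval between tu and tu+1, and the function names `count_syllables` and `word_instants`, as well as the crude vowel-group syllable estimate, are hypothetical.

```python
from typing import List, Tuple


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups, at least 1."""
    vowels = "aeiouyàâäéèêëîïôöùûü"
    groups = 0
    previous_was_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not previous_was_vowel:
            groups += 1
        previous_was_vowel = is_vowel
    return max(groups, 1)


def word_instants(words: List[str], t_start: float, t_end: float,
                  by_syllables: bool = False) -> List[Tuple[float, str]]:
    """Return (display instant, word) pairs spread over [t_start, t_end).

    Each word replaces the previous one on screen at its display instant.
    With by_syllables=False every word gets the same share of the interval
    (first example); with by_syllables=True the share is proportional to
    the estimated syllable count (second example).
    """
    if not words:
        return []
    weights = [count_syllables(w) if by_syllables else 1 for w in words]
    total = sum(weights)
    span = t_end - t_start
    schedule = []
    elapsed = 0
    for word, weight in zip(words, weights):
        schedule.append((t_start + span * elapsed / total, word))
        elapsed += weight
    return schedule


uniform = word_instants(["a", "b", "c", "d"], 10.0, 12.0)
weighted = word_instants(["bonjour", "a", "tous"], 0.0, 4.0, by_syllables=True)
```

In the weighted case, "bonjour" is estimated at two syllables and therefore stays on screen twice as long as each of the other two words.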

According to a first variant of the second embodiment, shown in FIG. 7A, if the sentence PHu is not pronounced at constant speed, the sentence PHu is broken down into as many sub-sentences as there are different pronunciation speeds.

In the example of FIG. 7A, the sentence PHu is for example broken down into: - a first sub-sentence PHu1 containing P words mk1, mk2, ..., mkP, with which is associated a first identifier IVu1 representative of the pronunciation speed of the words of the first sub-sentence PHu1, - and a second sub-sentence PHu2 containing K-P words mkP+1, mkP+2, ..., mK, with which is associated a second identifier IVu2 representative of the pronunciation speed of the words of the second sub-sentence PHu2.

According to a second variant of the second embodiment, shown in FIG. 7B, if the plurality of K words consists of a first plurality of Q words mq1, mq2, ..., mQ pronounced by a first speaker at a first given speed V1, and of at least a second plurality of R words mr1, mr2, ..., mR pronounced by a second speaker, after the first speaker, at a second given speed V2: - a first identifier IVu1 representative of the first pronunciation speed of the Q words is associated, in the subtitling component CS10, with the first plurality of Q words mq1, mq2, ..., mQ, - a second identifier IVu2 representative of the second pronunciation speed of the R words is associated, in the subtitling component CS10, with the second plurality of R words mr1, mr2, ..., mR.

Of course, if the first plurality of Q words or the second plurality of R words is not pronounced at constant speed, that plurality is in turn broken down into as many sub-sentences as there are different pronunciation speeds, as in the first variant of the second embodiment shown in FIG. 7A.
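The decomposition into sub-sentences, each with its own speed identifier, can be sketched as follows. The representation is hypothetical: each sub-sentence is modelled as a list of words paired with the per-word display time (in seconds) that its speed identifier is assumed to stand for, and `schedule_sub_sentences` is an illustrative name.

```python
from typing import List, Tuple

# Hypothetical representation: (words of the sub-sentence, seconds per word).
SubSentence = Tuple[List[str], float]


def schedule_sub_sentences(t_start: float,
                           sub_sentences: List[SubSentence]
                           ) -> List[Tuple[float, str]]:
    """Return (display instant, word) pairs for consecutive sub-sentences.

    Words of one sub-sentence are spaced by that sub-sentence's own
    per-word time; the next sub-sentence starts where the previous ended.
    """
    schedule = []
    t = t_start
    for words, seconds_per_word in sub_sentences:
        for word in words:
            schedule.append((t, word))
            t += seconds_per_word
    return schedule


# First sub-sentence spoken slowly, second one twice as fast.
plan = schedule_sub_sentences(0.0, [(["a", "b"], 0.5), (["c", "d"], 0.25)])
```

The same structure covers the two-speaker case of FIG. 7B if each tuple is taken to hold the words of one speaker.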

According to a third variant of the second embodiment, shown in FIG. 7C, similarly to the embodiment shown in FIG. 4B, given that one and the same speaker pronounces the sentences PHu1 and PHu2 in succession, an identifier Ik1 representative of this speaker is inserted, in the subtitling component CS10, in association with at least the first word mk1 among said K words.

According to a fourth variant of the second embodiment, shown in FIG. 7D, similarly to the embodiment shown in FIG. 4C, given that a first speaker pronounces the sentence PHu1 and a second speaker then pronounces the sentence PHu2: - a first identifier Iq1 representative of the first speaker is inserted, in the subtitling component CS10, in association with at least the first word mq1 among said Q words of the sentence PHu1, - a second identifier Ir1 representative of the second speaker is inserted, in the subtitling component CS10, in association with at least the first word mr1 among said R words of the sentence PHu2. At the end of the subtitling step ST10 of FIG. 5, the stream F'1 is generated; it comprises the video component CV10, the audio component CA10 and the subtitling component CS10, in association with the sub-stream SF'1. The stream F'1 contains the temporal synchronization indicators t1, t2, ..., tu, ..., tM, as well as the word pronunciation speed indicators and, where applicable, the indicators representative of the speakers who pronounce the words.

The stream F'1 is then either transmitted to the terminal TER or to the access terminal STB of FIG. 1 via the network RC, or stored on a medium for later transmission.

A third embodiment of a method for generating an audiovisual stream according to the invention will now be described with reference to FIG. 8.

This third embodiment differs from the embodiments shown in FIG. 2 and FIG. 5 mainly in that it does not include a subtitling step.

As will be described later in the description, subtitling is here carried out on reception of the audiovisual stream transmitted via the communication network RC of FIG. 1.

Similarly to the embodiments shown in FIG. 2 and FIG. 5, the stream generation method according to the third embodiment comprises an editing step ED100 during which a stream F"1 is built by inserting: - a video component CV100, which contains all the video data contributing to the rendering of the images of the audiovisual content C, - at least one audio component CA100, which contains all the audio data contributing to the complete sound rendering of the audiovisual content C.

The video component CV100 and the audio component CA100 are inserted in a synchronized manner, so that at any current rendering instant of the audiovisual content, the sound emitted matches the image displayed on the screen.

As shown in FIG. 9: - the video component CV100 contains a sequence of M video scenes SV1, SV2, ..., SVu, ..., SVM which run over the entire duration T of the content C, respectively at successive instants t1, t2, ..., tu, ..., tM, - the audio component CA100 contains a sequence of M audio data sets D1, D2, ..., Du, ..., DM which follow one another over the entire duration T of the content C and which are synchronized respectively with the M video scenes of the video component CV100 at the successive instants t1, t2, ..., tu, ..., tM, which constitute temporal indicators of synchronization between the video data of the video component CV100 and the audio data of the audio component CA100.

Similarly to the embodiments shown in FIG. 2 and FIG. 5, the stream generation method according to the third embodiment further comprises a signaling step SIG100, shown in FIG. 8, during which a plurality of items of information enabling the stream F"1 to be analyzed by the terminal TER or by the user's access terminal STB are conventionally inserted into a signaling sub-stream SF"1 associated with the stream F"1.

With reference to FIG. 9, which shows the sub-stream SF"1, and as in the first and second embodiments, this information consists in particular of metadata, such as an identifier ID of the audiovisual content C; description information DESC for the audiovisual content C, for example its genre (film, documentary, sport, etc.) and the first and last names of the people associated with the audiovisual content C (director, actors, actresses, athletes, etc.); and associated temporal information, such as the duration T of the audiovisual content C and the start and end times of the broadcast of the content C if the latter is broadcast in real time.

In accordance with the third embodiment, in order to compensate for the absence of a subtitling component in the audiovisual stream to be transmitted to the terminal TER or to the access terminal STB of FIG. 1, the signaling step SIG100 is supplemented by the insertion of information relating to the speaker or speakers involved in the content C. To this end, for a plurality of W speakers LOC1, LOC2, ..., LOCi, ..., LOCW identified in the content C: - the speaker LOC1 is associated with an item of information INFV1 representative of that speaker's voice frequency, and with at least one morphological characteristic CM1 associated with said speaker LOC1, - the speaker LOC2 is associated with an item of information INFV2 representative of that speaker's voice frequency, and with at least one morphological characteristic CM2 associated with said speaker LOC2, - ...,

- the speaker LOCi is associated with an item of information INFVi representative of that speaker's voice frequency, and with at least one morphological characteristic CMi associated with said speaker LOCi, - ...,

- the speaker LOCW is associated with an item of information INFVW representative of that speaker's voice frequency, and with at least one morphological characteristic CMW associated with said speaker LOCW.

Thus, as shown in FIG. 9, the following are inserted into the sub-stream SF"1: - a header LOC1, followed by the data fields INFV1 and CM1, - a header LOC2, followed by the data fields INFV2 and CM2, - ...,

- a header LOCi, followed by the data fields INFVi and CMi, - ...,

- a header LOCW, followed by the data fields INFVW and CMW.

A speaker may for example be: - a person visible in the current video scene (e.g. an actor or actress in the case of a film), - an off-screen person (e.g. a voice-over in the case of a wildlife documentary, a tennis match, etc.), - etc.

If a given speaker LOCi appears in the current video scene at the instant when that speaker pronounces a word, the morphological characteristic CMi consists of image data of a part of the speaker's body, for example the face.

If a given speaker LOCi does not appear in the current image at the instant when that speaker pronounces a word, the data field CMi is empty. The speaker LOCi is then associated with an identifier ZIi representative of the coordinates of a current video zone suitable for displaying each word pronounced by the speaker LOCi. Such coordinates may for example be those of a rectangle placed at the bottom center of the current rendered image. At the end of the signaling step SIG100 of FIG. 8, the stream F"1 is then generated; it comprises the video component CV100 and the audio component CA100, in association with the sub-stream SF"1. The stream F"1 also contains the temporal synchronization indicators t1, t2, ..., tu, ..., tM.

The stream F"1 is then either transmitted to the terminal TER or to the access terminal STB of FIG. 1 via the network RC, or stored on a medium for later transmission.

According to a first variant, for a given speaker LOCi: - the data field INFVi directly contains the value of the voice frequency of said speaker, - the data field CMi directly contains the image data of a part of the body of said speaker.

According to a second variant, for a given speaker LOCi: - the data field INFVi contains an access address, for example a URL (Uniform Resource Locator) link, to a database BDV as shown in FIG. 1, in which the voiceprint of said speaker has previously been stored in association with the identifier LOCi, - the data field CMi contains an access address, for example a URL link, to a database BDI as shown in FIG. 1, in which the image data of said speaker have previously been stored in association with the identifier LOCi.
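The per-speaker signaling entries and their two variants can be pictured with the following Python sketch. The class `SpeakerEntry`, its field names, the helper `default_zone`, the numeric values and the example URL are all assumptions introduced for illustration; the patent does not specify a concrete encoding of the sub-stream.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Zone = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class SpeakerEntry:
    """Hypothetical in-memory form of one LOCi header and its fields."""
    loc_id: str                          # header, e.g. "LOC1"
    infv: Union[float, str]              # voice frequency value, or URL into BDV
    cm: Union[bytes, str, None] = None   # image data, URL into BDI, or empty
    zone: Optional[Zone] = None          # ZIi, used when cm is empty

    def is_visible(self) -> bool:
        """True when the CM field is filled, i.e. the speaker is on screen."""
        return bool(self.cm)

    def infv_is_reference(self) -> bool:
        """True for the second variant, where INFV holds an access address."""
        return isinstance(self.infv, str)


def default_zone(frame_w: int, frame_h: int, zone_w: int, zone_h: int) -> Zone:
    """Rectangle placed at the bottom center of the rendered image."""
    return ((frame_w - zone_w) // 2, frame_h - zone_h, zone_w, zone_h)


# First variant: values carried directly in the fields (illustrative data).
on_screen = SpeakerEntry("LOC1", infv=180.0, cm=b"\x89PNG")
# Second variant, off-screen speaker: INFV is a link, CM is empty, ZI is set.
off_screen = SpeakerEntry(
    "LOC2",
    infv="https://example.invalid/bdv/LOC2",  # placeholder address
    zone=default_zone(1920, 1080, 800, 100),
)
```

A receiver could test `is_visible()` to decide whether to look for the speaker in the image or to fall back on the display zone.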

In the example of FIG. 1, the databases BDV and BDI are both contained in a server SER connected to the communication network RC. As a variant, the databases BDV and BDI could be contained respectively in two separate servers connected to the communication network RC.

According to yet another variant, the databases BDV and BDI could both be contained in the service platform PFS.

With reference to FIG. 10, the simplified structure of a device REC for receiving audiovisual content will now be considered, such a device being contained in the content rendering terminal TER of FIG. 1, according to an exemplary embodiment of the invention. Such a device for receiving audiovisual content is suitable for implementing the method for receiving audiovisual content according to the invention, which is described below.

For example, the device REC comprises hardware and/or software resources, in particular a processing circuit CT for implementing the method for receiving audiovisual content according to the invention, the processing circuit CT containing a processor PROC controlled by a computer program PG. On initialization, the code instructions of the computer program PG are for example loaded into a RAM memory, denoted MR, before being executed by the processing circuit CT.

In accordance with the invention, for an audiovisual content C to be rendered by the terminal TER, the processing circuit CT is arranged to implement: - the reception, via a reception interface RCV, of an audiovisual stream F1, F'1 or F"1 corresponding to the content C to be rendered, as transmitted for example by the platform PFS, - the analysis of the received stream, via an analyzer ANA, - the calculation, by a calculator CAL, of the rendering instant, in the content C, of each word to be rendered, - the generation of a subtitling component, if the received audiovisual stream does not contain one, via a subtitling component generator GCS, - the sending to the terminal TER, via a command interface COM, of a command to render the content C associated with the analyzed stream.
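The chain of functions listed above can be summarized by a short Python sketch. It is only a schematic under stated assumptions: the dictionary layout of the stream, the function name `receive_content` and the two callables passed in are hypothetical stand-ins for the analyzer ANA, the generator GCS, the calculator CAL and the command interface COM. The sketch generates subtitles before computing instants, which is one plausible ordering when the stream carries no subtitling component.

```python
from typing import Callable, Dict, List, Tuple


def receive_content(
    stream: Dict,
    generate_subtitles: Callable[[Dict], List[dict]],
    compute_instants: Callable[[List[dict]], List[Tuple[float, str]]],
) -> Dict:
    """Process a received stream and build a rendering command.

    `stream` is assumed to already be the output of the reception step.
    """
    # Analysis (ANA): read the signaling information of the stream.
    signaling = stream.get("signaling", {})
    # Generation (GCS): only when the stream carries no subtitling component.
    subtitles = stream.get("subtitles")
    if subtitles is None:
        subtitles = generate_subtitles(stream)
    # Calculation (CAL): rendering instant of each word.
    schedule = compute_instants(subtitles)
    # Command (COM): what would be sent to the terminal for rendering.
    return {"content_id": signaling.get("ID"), "schedule": schedule}


# Illustrative use with trivial stand-in callables.
command = receive_content(
    {"signaling": {"ID": "C"}, "subtitles": None},
    generate_subtitles=lambda s: [{"word": "hi"}],
    compute_instants=lambda subs: [(0.0, subs[0]["word"])],
)
```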

With reference to FIG. 11, the steps of the method for receiving audiovisual content according to a first embodiment of the invention are now presented.

In this first embodiment, said reception method is implemented by the device REC of FIG. 10.

During a step E1 shown in FIG. 11, the interface RCV of FIG. 10 receives the stream F1 as generated according to the method of FIG. 2 and then transmitted via the communication network RC of FIG. 1, said stream F1 corresponding to a content C to be rendered by a terminal TER of the user UT, such as a tablet.

During a step E2 shown in FIG. 11, the analyzer ANA of FIG. 10 reads, in the sub-stream SF1 associated with the stream F1, the information representative of the audiovisual content C that was inserted during the signaling step SIG1 of FIG. 2. Such information is in particular metadata, such as an identifier ID of the audiovisual content C and description information DESC.

During a step E3 shown in FIG. 11, the analyzer ANA of FIG. 10 analyzes, one after the other, the time intervals [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T], in each of which a video scene of the video component CV1 is synchronized with:
- one or more corresponding delivered audio data items inserted in the associated audio component CA1,
- one or more words transcribing the corresponding delivered audio data.

According to the first embodiment of the reception method, for a current time interval [t_u, t_{u+1}] being analyzed, during which a current video scene SV_u appears, and in the case where the corresponding set D_u of audio data of the audio component CA1 contains audio data representative of a plurality of K words m_k1, m_k2, ..., m_j, ..., m_K (K ≥ 2) forming in succession one or more sentences, during a step E4 shown in FIG. 11 the analyzer ANA successively identifies K words indexed respectively at K instants t_k1, t_k2, ..., t_K. Step E4 is repeated for each time interval considered, [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T].

During a step E5 shown in FIG. 11, at each of the aforementioned K instants, a command for the textual rendering of the corresponding indexed word is sent to the terminal TER by means of the communication interface COM of FIG. 10.

Thus, the aforementioned K words are advantageously rendered as text, one by one, on the screen of the terminal TER, between the two successive instants t_u and t_{u+1} of appearance of the current video scene SV_u, a word rendered at a given instant among the K instants replacing the word rendered at the immediately preceding instant. Step E5 is repeated for each time interval [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T].
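
The one-word-at-a-time behaviour of steps E4 and E5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RenderCommand` type, the function names, and the use of seconds for the instants are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderCommand:
    instant: float  # playback time (assumed in seconds) at which the word appears
    word: str       # the single word shown from this instant on


def build_commands(indexed_words):
    """Turn the (instant, word) pairs identified in step E4 into
    time-ordered rendering commands for step E5."""
    return [RenderCommand(t, w) for t, w in sorted(indexed_words)]


def word_on_screen(commands, now):
    """Return the word visible at time `now`, or None before the first word.

    Each command replaces the word of the previous one, so the visible
    word is the one carried by the latest command already issued.
    """
    visible = None
    for cmd in commands:
        if cmd.instant <= now:
            visible = cmd.word
        else:
            break
    return visible
```

For instance, with commands built from `[(10.0, "Fais"), (10.4, "le"), (10.8, "mariole")]`, only "le" is visible at time 10.5.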

In this way, the user's viewing comfort is markedly improved compared with prior-art subtitling, which displays the K words simultaneously in the current time interval [t_u, t_{u+1}]. Indeed, thanks to the invention, instead of having to keep moving their eyes from left to right to read the subtitle sentences in the current time interval [t_u, t_{u+1}], the user sees and reads only one word at a time.

FIG. 12A shows a first example of rendering, between two successive instants t_u and t_{u+1}, of a subtitled video scene SV_u.

In the example illustrated, twelve words are spoken in succession in the current time interval [t_u, t_{u+1}] by the actress present in the video scene SV_u. These twelve words form, for example, the following French sentence: "Fais le mariole et tu plongeras si vite que tu remonteras jamais !" (roughly: "Play the fool and you will go under so fast that you will never come back up!").

According to the invention:
- at t_k1, the word m_k1 "Fais" is rendered as text,
- at t_k2, the word m_k2 "le" is rendered as text and replaces the word "Fais",
- and so on until, at t_K, the word m_K "jamais !" is rendered as text and replaces the word "remonteras".

The twelve words are, for example, rendered successively at the bottom of the corresponding image and centered in it.

According to a first variant of this first embodiment, in the case where the stream F1 was generated as shown in FIG. 4A, during step E4 of FIG. 11 the analyzer ANA further identifies, in the subtitle component CS1, K identifiers I_k1, I_k2, ..., I_K associated respectively with the K words m_k1, m_k2, ..., m_K.

As already explained with reference to FIG. 4A, the K identifiers are such that:
- the identifier I_k1 is representative of the speaker who speaks the word m_k1,
- the identifier I_k2 is representative of the speaker who speaks the word m_k2,
- and so on up to the identifier I_K, which is representative of the speaker who speaks the word m_K.

During step E5, the command for the textual rendering of a word that is sent to the terminal TER then contains the identifier of the speaker who speaks said word, so that the word is displayed in visual correspondence with the speaker.

In the case where the speaker is not visible on the screen, the word is displayed in a neutral zone of the image displayed on the screen of the terminal TER, for example at the bottom of the image.
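
The placement rule of this first variant, including the neutral-zone fallback, might be sketched as below. The patent does not say how the terminal locates a speaker on screen; the `visible_speakers` mapping from identifier to pixel position is a hypothetical input, as are the function name and the choice of "bottom centre" as the neutral zone.

```python
def caption_position(speaker_id, visible_speakers, frame_size):
    """Return the (x, y) pixel position at which a word should be drawn.

    speaker_id: identifier carried by the rendering command.
    visible_speakers: dict mapping identifiers of on-screen speakers to
                      their (x, y) position (assumed to be known).
    frame_size: (width, height) of the displayed image in pixels.
    """
    width, height = frame_size
    if speaker_id in visible_speakers:
        # Speaker is on screen: draw the word next to them.
        return visible_speakers[speaker_id]
    # Speaker not visible: fall back to a neutral zone, here the bottom
    # centre of the image (an assumption for this sketch).
    return (width // 2, height * 9 // 10)
```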

FIG. 12B shows an example of rendering, between two successive instants t_u and t_{u+1}, of a subtitled current video scene SV_u according to this first variant.

This example is the same as that of FIG. 12A, except that each of the twelve spoken words is rendered as text on the screen in visual correspondence with the actress present in the displayed image.

Such an arrangement allows the user to identify on the screen the speaker who is speaking the words, here the actress and not the actor.

According to a second variant of this first embodiment, in the case where the stream F1 was generated as shown in FIG. 4B, during step E4 of FIG. 11 the analyzer ANA of FIG. 10 identifies in the subtitle component CS1:
- at the current instant t_k1, an identifier I_k1 associated at least with the first word m_k1,
- at the following current instants t_k2, ..., t_K, respectively the words m_k2, ..., m_K, which have no corresponding identifiers.

During step E5, the command for the textual rendering of the first word m_k1 that is sent to the terminal TER at the instant t_k1 then contains the identifier I_k1 of the speaker who speaks said word, so that the word is displayed in visual correspondence with the speaker.

Then, at the following current instants t_k2, ..., t_K, the command for the textual rendering of the words m_k2, ..., m_K that is sent to the terminal TER does not contain an identifier. These words are then rendered as text in visual correspondence with the speaker associated with the identifier I_k1 identified in step E4.

According to a third variant of this first embodiment, the stream F1 that was generated is the one shown in FIG. 4C. To this end, between two successive current instants t_u and t_{u+1} in the subtitle component CS1, the plurality of K words has been broken down into a first plurality of J words m_j1, m_j2, ..., m_J spoken by a first speaker respectively at J successive instants t_j1, t_j2, ..., t_J, and into at least a second plurality of L words m_l1, m_l2, ..., m_L spoken by a second speaker, after the first speaker, respectively at L successive instants t_l1, t_l2, ..., t_L.

At least the first word m_j1 among said J words is associated with an identifier I_j1 of the first speaker, who speaks the J words.

At least the first word m_l1 among said L words is associated with an identifier I_l1 of the second speaker, who speaks the L words.

Consequently, during step E4 of FIG. 11, the analyzer ANA identifies in the subtitle component CS1:
- at the current instant t_j1, the identifier I_j1 of the first speaker, associated at least with the first word m_j1,
- at the following current instants t_j2, ..., t_J, respectively the words m_j2, ..., m_J, which have no corresponding identifiers,
- at the current instant t_l1, the identifier I_l1 of the second speaker, associated at least with the first word m_l1,
- at the following current instants t_l2, ..., t_L, respectively the words m_l2, ..., m_L, which have no corresponding identifiers.

During step E5, the command for the textual rendering of the first word m_j1 that is sent to the terminal TER at the instant t_j1 then contains the identifier I_j1 of the first speaker who speaks said word, so that the word is displayed in visual correspondence with this first speaker.

Then, at the following current instants t_j2, ..., t_J, the command for the textual rendering of the words m_j2, ..., m_J that is sent to the terminal TER does not contain an identifier. These words are then rendered as text in visual correspondence with the first speaker associated with the identifier I_j1 identified in step E4.

Next, during step E5, the command for the textual rendering of the first word m_l1 that is sent to the terminal TER at the instant t_l1 then contains the identifier I_l1 of the second speaker who speaks said word, so that the word is displayed in visual correspondence with this second speaker.

Then, at the following current instants t_l2, ..., t_L, the command for the textual rendering of the words m_l2, ..., m_L that is sent to the terminal TER does not contain an identifier. These words are then rendered as text in visual correspondence with the second speaker associated with the identifier I_l1 identified in step E4.

Such an arrangement allows the user to identify on the screen which speaker is speaking the current word when two or more people are present on the screen.
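
In the second and third variants only the first word of each speaker's run carries an identifier, and the following words inherit it. One way to picture that inheritance is the carry-forward sketch below; the tuple layout `(instant, word, speaker_id_or_None)` and the function name are assumptions made for illustration, and the patent does not state which entity performs this resolution.

```python
def resolve_speakers(entries):
    """Attach a speaker to every word by carrying the last identifier forward.

    entries: list of (instant, word, speaker_id) tuples in time order, where
             speaker_id is None for words sent without an identifier.
    Returns a list of (instant, word, speaker_id) with no gaps after the
    first identified word.
    """
    resolved = []
    current = None  # identifier of the speaker of the current run of words
    for instant, word, speaker_id in entries:
        if speaker_id is not None:
            # A new identifier marks the start of a new speaker's run.
            current = speaker_id
        resolved.append((instant, word, current))
    return resolved
```

With a single identifier on the first word this reproduces the second variant; a second identifier later in the interval switches the placement to the second speaker, as in the third variant.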

A second embodiment of a method for receiving audiovisual content according to the invention will now be described with reference to FIG. 13.

In this second embodiment, said reception method is implemented by the device REC of FIG. 10.

As in the first embodiment shown in FIG. 11, the method for receiving audiovisual content according to the second embodiment comprises a step E10 shown in FIG. 13, during which the interface RCV of FIG. 10 receives the stream F'1 as generated according to the method of FIG. 5 and then transmitted via the communication network RC of FIG. 1, said stream F'1 corresponding to a content C to be rendered by a terminal TER of the user UT, such as a tablet.

As in the first embodiment shown in FIG. 11, the method for receiving audiovisual content according to the second embodiment comprises a step E20 shown in FIG. 13, during which the analyzer ANA reads, in the sub-stream SF'1 associated with the stream F'1, the information representative of the audiovisual content C that was inserted during the signaling step SIG10 of FIG. 5. Such information is in particular metadata, such as an identifier ID of the audiovisual content C and description information DESC.

As in the first embodiment shown in FIG. 11, the method for receiving audiovisual content according to the second embodiment comprises a step E30 shown in FIG. 13, during which the analyzer ANA analyzes, one after the other, the time intervals [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T], in each of which a video scene of the video component CV10 is synchronized with:
- one or more corresponding delivered audio data items inserted in the associated audio component CA10,
- one or more words transcribing the corresponding delivered audio data.

As in the first embodiment of the reception method, for a current time interval [t_u, t_{u+1}] being analyzed, during which a current video scene SV_u appears, and in the case where the corresponding set D_u of audio data of the audio component CA10 contains audio data representative of a plurality of K words m_k1, m_k2, ..., m_j, ..., m_K (K ≥ 2) forming in succession one or more sentences, during a step E40 shown in FIG. 13 the analyzer ANA successively identifies K words.

In the example shown, the set of K words forms a current sentence PH_u spoken between the instants t_u and t_{u+1}.

According to the second embodiment, during step E40 the analyzer ANA further identifies, in the subtitle component CS10, an identifier IV_u representative of the value of the speed at which the K words are spoken. Step E40 is repeated for each time interval [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T].

During a step E50 shown in FIG. 13, a calculator CAL of the reception device REC of FIG. 10 calculates the K corresponding instants of appearance of the K words as a function of the value of the speed associated with the identifier IV_u identified in step E40.

During a step E60 shown in FIG. 13, at each of the K calculated instants, a command for the textual rendering of the corresponding identified word is sent to the terminal TER by means of the communication interface COM of FIG. 10, each word being rendered at the speed whose value is associated with the identifier IV_u, at the instant of appearance of the identified word calculated in step E50.

Thus, thanks to the presence in the stream F'1 of the speed identifier IV_u associated with the sentence PH_u, the subtitle component CS10 used can keep the same structure as prior-art subtitle components while allowing, between two successive instants t_u and t_{u+1} of appearance of a current video scene SV_u, a textual rendering of the aforementioned K words, one by one, at the same pace as the actual pace at which said K words are spoken, a word rendered at a given instant replacing the word rendered at the immediately preceding instant. Step E60 is repeated for each time interval [t_1, t_2], [t_2, t_3], and so on up to the last time interval [t_M, T].

FIG. 14A shows a first example of rendering, between two successive instants t_u and t_{u+1}, of a subtitled current video scene SV_u according to the second embodiment of the method for receiving audiovisual content according to the invention.

The current video scene SV_u is the same as that shown in FIG. 12A. The words spoken between the instants t_u and t_{u+1} are also the same as those of FIG. 12A.

In the rendering example of FIG. 14A, the identifier IV_u is associated with a constant speaking-speed value that is the same for all K words. As a result, whatever the length of a word, all the words are rendered as text at the same speed, and the appearance-disappearance time intervals [t_k1, t_k2], [t_k2, t_k3], ..., [t_{K-1}, t_K] of the words are all equal.
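
For the constant-speed case of FIG. 14A, the calculation of step E50 could look like the sketch below. The patent does not specify the unit of the speed value; words per second is assumed here, and the function name is hypothetical.

```python
def instants_constant_speed(t_start, word_count, words_per_second):
    """Return equally spaced appearance instants for `word_count` words.

    t_start: instant t_u at which the first word appears (assumed in seconds).
    words_per_second: assumed unit for the speed value tied to IV_u.
    """
    if words_per_second <= 0:
        raise ValueError("speed must be positive")
    step = 1.0 / words_per_second  # identical display duration for every word
    return [t_start + k * step for k in range(word_count)]
```

With `t_start=10.0`, four words and a speed of 2 words per second, each word stays on screen for half a second before being replaced by the next.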

In the rendering example of FIG. 14B, the identifier IV_u is associated with a speaking-speed value for a word that is modulated according to the length of the word considered in the sentence, for example according to the number of syllables contained in the word.

Thus, in FIG. 14B, the appearance-disappearance time of, for example, the third word m_k3 "mariole", which contains three syllables, is longer than the appearance-disappearance time of, for example, the first word m_k1 "Fais", which contains a single syllable.
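
The syllable-dependent timing of FIG. 14B can be illustrated as follows. This is a sketch under assumptions: the speed is taken to be expressed in syllables per second, the syllable counts are supplied by the caller rather than computed, and the function name is hypothetical.

```python
def instants_by_syllables(t_start, syllable_counts, syllables_per_second):
    """Return one appearance instant per word, each word staying on screen
    for a duration proportional to its number of syllables.

    syllable_counts: number of syllables of each word, in spoken order.
    syllables_per_second: assumed unit for the speed value tied to IV_u.
    """
    if syllables_per_second <= 0:
        raise ValueError("speed must be positive")
    instants = []
    t = t_start
    for count in syllable_counts:
        instants.append(t)
        # A longer word delays the appearance of the next word further.
        t += count / syllables_per_second
    return instants
```

For the first four words of the example sentence ("Fais", "le", "mariole", "et", with 1, 1, 3 and 1 syllables) at 4 syllables per second, "mariole" stays on screen three times as long as "Fais".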

According to a first variant of the second embodiment, the stream F'1 that was generated is the one shown in FIG. 7A. To this end, the sentence PH_u spoken between the instants t_u and t_{u+1} is, for example, broken down into:
- a first sub-sentence PH_u1 containing P words m_k1, m_k2, ..., m_kP, with which is associated a first identifier IV_u1 representative of the speed at which the words of the first sub-sentence PH_u1 are spoken,
- and a second sub-sentence PH_u2 containing K-P words m_k(P+1), m_k(P+2), ..., m_K, with which is associated a second identifier IV_u2 representative of the speed at which the words of the second sub-sentence PH_u2 are spoken.

Consequently, during step E40 of FIG. 13, the analyzer ANA further identifies in the subtitle component CS10:
- the first identifier IV_u1 representative of the value of the speed at which the P words of the first sub-sentence PH_u1 are spoken,
- the second identifier IV_u2 representative of the value of the speed at which the K-P words of the second sub-sentence PH_u2 are spoken.

Then, during step E50 of FIG. 13:
- the P respective appearance instants of the P words are calculated as a function of the speed value associated with the first identifier IVu1 identified in step E40,
- the K-P respective appearance instants of the K-P words are calculated as a function of the speed value associated with the second identifier IVu2 identified in step E40.

Then, during step E60 of FIG. 13, a command for the textual rendering of each of the P words of the first sub-sentence PHu1, at the speed corresponding to the identifier IVu1, is sent to the terminal TER via the communication interface COM of FIG. 10. A command for the textual rendering of each of the K-P words of the second sub-sentence PHu2, at the speed corresponding to the identifier IVu2, is then sent to the terminal TER via the same communication interface COM.
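
As an illustration only, steps E50 and E60 of this first variant can be sketched as follows. The text states that the appearance instants depend on the speed value but gives no formula, so the sketch assumes that each speed identifier resolves to a rate in words per second and that words are spaced uniformly at 1/speed. The names `SubSentence`, `speed_wps` and `appearance_instants` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubSentence:
    words: List[str]   # words of the sub-sentence, in spoken order
    speed_wps: float   # ASSUMPTION: the speed identifier resolves to words per second


def appearance_instants(start: float, parts: List[SubSentence]) -> List[Tuple[float, str]]:
    """Return (instant, word) pairs, spacing words by 1/speed within each sub-sentence."""
    schedule: List[Tuple[float, str]] = []
    t = start
    for part in parts:
        if part.speed_wps <= 0:
            raise ValueError("speed must be positive")
        step = 1.0 / part.speed_wps
        for word in part.words:
            schedule.append((t, word))
            t += step
    return schedule


# Two sub-sentences of the same sentence, pronounced at different speeds.
first_part = SubSentence(["the", "train", "leaves"], speed_wps=2.0)
second_part = SubSentence(["right", "now"], speed_wps=4.0)
schedule = appearance_instants(10.0, [first_part, second_part])
```

The resulting list could then be turned into one rendering command per word; a real receiver would also have to keep the last instant inside the interval between the two successive video instants.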

According to a second variant of the second embodiment, the generated stream F’i is the one shown in FIG. 7B. To this end, the plurality of K words pronounced between the instants tu and tu+1 is made up of a first plurality of Q words mq1, mq2, ..., mQ pronounced by a first speaker at a first given speed V1, respectively at Q successive instants tq1, tq2, ..., tQ, and of at least a second plurality of R words mr1, mr2, ..., mR pronounced by a second speaker, after the first speaker, at a second given speed V2, respectively at R successive instants tr1, tr2, ..., tR.

A first identifier IVu1 representative of the first pronunciation speed of the Q words is associated, in the subtitling component CS10, with the first plurality of Q words mq1, mq2, ..., mQ.

A second identifier IVu2 representative of the second pronunciation speed of the R words is associated, in the subtitling component CS10, with the second plurality of R words mr1, mr2, ..., mR.

Consequently, during step E40 of FIG. 13, the analyzer ANA identifies in the subtitling component CS10:
- the first identifier IVu1 representative of the value of the pronunciation speed of the Q words pronounced by the first speaker,
- the second identifier IVu2 representative of the value of the pronunciation speed of the R words pronounced by the second speaker.

Then, during step E50 of FIG. 13:
- the Q respective appearance instants of the Q words are calculated as a function of the speed value associated with the first identifier IVu1 identified in step E40,
- the R respective appearance instants of the R words are calculated as a function of the speed value associated with the second identifier IVu2 identified in step E40.

Then, during step E60 of FIG. 13, a command for the textual rendering of each of the Q words, at the speed V1 corresponding to the identifier IVu1, is sent to the terminal TER via the communication interface COM of FIG. 10. A command for the textual rendering of each of the R words, at the speed V2 corresponding to the identifier IVu2, is then sent to the terminal TER via the same communication interface COM.

According to a third variant of the second embodiment, the generated stream F’i is the one shown in FIG. 7C. To this end, the subtitling component CS10 of FIG. 7A additionally contains an identifier Ik1 representative of the speaker who pronounces the sentence PHu, previously split into the sub-sentences PHu1 and PHu2. The identifier Ik1 is associated at least with the first word mk1 among the K words that make up the sub-sentences PHu1 and PHu2.

Consequently, during step E40 of FIG. 13, the analyzer ANA additionally identifies in the subtitling component CS10:
- the first identifier IVu1 representative of the value of the pronunciation speed of the P words of the first sub-sentence PHu1,
- the identifier Ik1 associated with the first word mk1 of the first sub-sentence PHu1 and representative of the single speaker who pronounces the two sub-sentences PHu1 and then PHu2 in succession,
- the second identifier IVu2 representative of the value of the pronunciation speed of the K-P words of the second sub-sentence PHu2.

Then, during step E60 of FIG. 13, a command for the textual rendering of each of the P words of the first sub-sentence PHu1, at the speed corresponding to the identifier IVu1, is sent to the terminal TER via the communication interface COM of FIG. 10. The command for the textual rendering of the first word mk1, which is sent to the terminal TER at the speed corresponding to the identifier IVu1, then contains the identifier Ik1 of said single speaker who pronounces the sentence PHu, so that the word is displayed in visual correspondence with that speaker, at an instant tk1 calculated as a function of the speed value corresponding to the identifier IVu1.

Then, in step E60, the command for the textual rendering of each of the following words mk2, ..., mkP of the sub-sentence PHu1 that is sent to the terminal TER contains no speaker identifier. These words are then rendered as text in visual correspondence with the single speaker associated with the identifier Ik1 identified in step E40, at the speed corresponding to the identifier IVu1 identified in step E40, respectively at the instants tk2, ..., tkP calculated in step E50 as a function of the speed value corresponding to the identifier IVu1.

Next, the command for the textual rendering of each of the following words mkP+1, ..., mK of the sub-sentence PHu2 that is sent to the terminal TER likewise contains no speaker identifier. These words are then rendered as text in visual correspondence with the single speaker associated with the identifier Ik1 identified in step E40, at the speed corresponding to the identifier IVu2 identified in step E40, respectively at the instants tkP+1, ..., tK calculated in step E50 as a function of the speed value corresponding to the identifier IVu2.
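
One way to picture the commands of this third variant is shown below: only the command for the first word carries the speaker identifier, and every later command omits it. This is a sketch under assumptions. The dictionary layout and the name `build_commands` are hypothetical, and the text does not specify any wire format for the commands sent over the interface COM.

```python
from typing import Any, Dict, List


def build_commands(words: List[str], instants: List[float], speaker_id: str) -> List[Dict[str, Any]]:
    """Build one rendering command per word; only the first one names the speaker."""
    if len(words) != len(instants):
        raise ValueError("one appearance instant is needed per word")
    commands: List[Dict[str, Any]] = []
    for index, (word, instant) in enumerate(zip(words, instants)):
        command: Dict[str, Any] = {"word": word, "instant": instant}
        if index == 0:
            # Hypothetical field name; stands for the speaker identifier of the first word.
            command["speaker"] = speaker_id
        commands.append(command)
    return commands


commands = build_commands(["we", "are", "late"], [0.0, 0.5, 1.0], speaker_id="Ik1")
```

Sending the identifier once keeps the per-word commands small, at the cost of requiring the terminal to remember the last speaker it was given.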

According to a fourth variant of the second embodiment, the generated stream F’i is the one shown in FIG. 7D. To this end, the subtitling component CS10 of FIG. 7B additionally contains:
- an identifier IVu1 representative of the first pronunciation speed of the first plurality of Q words mq1, mq2, ..., mQ,
- the identifier Iq1 associated with the first word mq1 of the first sentence PHu1 and representative of the first speaker who pronounces the sentence PHu1,
- an identifier IVu2 representative of the second pronunciation speed of the second plurality of R words mr1, mr2, ..., mR,
- the identifier Ir1 associated with the first word mr1 of the second sentence PHu2 and representative of the second speaker who pronounces the sentence PHu2.

Then, during step E60 of FIG. 13, a command for the textual rendering of each of the Q words of the first sentence PHu1, at the speed corresponding to the identifier IVu1, is sent to the terminal TER via the communication interface COM of FIG. 10. The command for the textual rendering of the first word mq1, which is sent to the terminal TER at the speed corresponding to the identifier IVu1, then contains the identifier Iq1 of the first speaker who pronounces the sentence PHu1, so that the word is displayed in visual correspondence with that first speaker, at an instant tq1 calculated as a function of the speed value corresponding to the identifier IVu1.

Then, the command for the textual rendering of each of the following words mq2, ..., mQ of the sentence PHu1 that is sent to the terminal TER contains no speaker identifier. These words are then rendered as text in visual correspondence with the first speaker associated with the identifier Iq1 identified in step E40, at the speed corresponding to the identifier IVu1 identified in step E40, respectively at the instants tq2, ..., tQ calculated in step E50 as a function of the speed value corresponding to the identifier IVu1.

Next, the command for the textual rendering of each of the R following words mr1, ..., mR of the sentence PHu2, at the speed corresponding to the identifier IVu2, is sent to the terminal TER. The command for the textual rendering of the first word mr1, which is sent to the terminal TER at the speed corresponding to the identifier IVu2, then contains the identifier Ir1 of the second speaker who pronounces the sentence PHu2, so that the word is displayed in visual correspondence with that second speaker, at an instant tr1 calculated as a function of the speed value corresponding to the identifier IVu2.

Then, the command for the textual rendering of each of the following words mr2, ..., mR of the sentence PHu2 that is sent to the terminal TER contains no speaker identifier. These words are then rendered as text in visual correspondence with the second speaker associated with the identifier Ir1 identified in step E40, at the speed corresponding to the identifier IVu2 identified in step E40, respectively at the instants tr2, ..., tR calculated in step E50 as a function of the speed value corresponding to the identifier IVu2.
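
On the terminal side, the behaviour described for this fourth variant amounts to keeping the most recently received speaker identifier until a command brings a new one. The following is a minimal sketch of that rule, assuming commands are dictionaries with an optional `speaker` key. The layout and the name `resolve_speakers` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_speakers(commands: List[Dict[str, Any]]) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with the speaker it should be displayed next to."""
    current: Optional[str] = None
    resolved: List[Tuple[str, Optional[str]]] = []
    for command in commands:
        # A command without a speaker inherits the last identifier seen.
        current = command.get("speaker", current)
        resolved.append((command["word"], current))
    return resolved


received = [
    {"word": "ready", "speaker": "Iq1"},   # first word of the first speaker
    {"word": "now"},
    {"word": "not", "speaker": "Ir1"},     # first word of the second speaker
    {"word": "yet"},
]
resolved = resolve_speakers(received)
```

A word received before any identifier is paired with `None` here; how a real terminal should place such a word is not covered by the text.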

A third embodiment of a method for receiving audiovisual content according to the invention will now be described with reference to FIG. 15.

In this third embodiment, said reception method is implemented by the device REC of FIG. 10.

During a step E100 shown in FIG. 15, the interface RCV of FIG. 10 receives the stream F”i as generated by the method of FIG. 8 and then transmitted over the communication network RC of FIG. 1, said stream F”i corresponding to a content C to be rendered by a terminal TER of the user UT, such as a tablet.

During a step E200 shown in FIG. 15, the analyzer ANA of FIG. 10 reads, in the sub-stream SF”i associated with the stream F”i, the information representative of the audiovisual content C that was inserted during the signaling step SIG100 of FIG. 8. In a manner known per se, such information notably consists of metadata, such as an identifier IDi of the audiovisual content C and description information DESCi.

In addition, in accordance with the third embodiment of the invention, the analyzer ANA identifies in the sub-stream SF”i:
- in association with the field LOC1 of data relating to a first speaker taking part in the content, the information INFV1 representative of that speaker's voice frequency, at least one morphological characteristic CM1 associated with said first speaker, and an identifier ZI1 of a corresponding image zone (if the first speaker does not appear on screen),
- in association with the field LOC2 of data relating to a second speaker taking part in the content, the information INFV2 representative of that speaker's voice frequency, at least one morphological characteristic CM2 associated with said second speaker, and an identifier ZI2 of a corresponding image zone (if the second speaker does not appear on screen),
- ...,
- in association with the field LOCi of data relating to an i-th speaker taking part in the content, the information INFVi representative of that speaker's voice frequency, at least one morphological characteristic CMi associated with said i-th speaker, and an identifier ZIi of a corresponding image zone (if the i-th speaker does not appear on screen),
- ...,
- in association with the field LOCW of data relating to a W-th speaker taking part in the content, the information INFVW representative of that speaker's voice frequency, at least one morphological characteristic CMW associated with said W-th speaker, and an identifier ZIW of a corresponding image zone (if the W-th speaker does not appear on screen).
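
The per-speaker information listed above can be modelled, for illustration, as one record per speaker field, indexed for lookup during rendering. This is only a sketch: the class `SpeakerEntry`, its attribute names, the choice of hertz for the voice frequency and the use of plain strings for the morphological characteristic and image zone are all assumptions, since the text does not define the encoding of the sub-stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SpeakerEntry:
    loc: str                          # speaker field, e.g. "LOC1"
    voice_frequency_hz: float         # voice frequency information (ASSUMED to be in Hz)
    morphology_ref: str               # morphological characteristic, kept as an opaque reference
    image_zone: Optional[str] = None  # image zone identifier, absent when not signalled


def index_by_loc(entries: Iterable[SpeakerEntry]) -> Dict[str, SpeakerEntry]:
    """Index the speaker records by their field name, rejecting duplicates."""
    index: Dict[str, SpeakerEntry] = {}
    for entry in entries:
        if entry.loc in index:
            raise ValueError(f"duplicate speaker field {entry.loc}")
        index[entry.loc] = entry
    return index


speakers = index_by_loc([
    SpeakerEntry("LOC1", 120.0, "face-1"),
    SpeakerEntry("LOC2", 210.0, "face-2", image_zone="ZI2"),
])
```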

During a step E300 shown in FIG. 15, at a current instant, the analyzer ANA of FIG. 10 identifies:
- in the audio component CA100 of the stream F”i, one or more audio data items delivered at said current instant and representative of a word pronounced by a speaker taking part in the content,
- in the video component CV100 of the stream F”i, the corresponding video scene, synchronized with the audio data delivered at said current instant.

During a step E400 shown in FIG. 15, at said current instant, the subtitling component generator GCS of FIG. 10 converts said audio data delivered at said current instant into text, yielding a current word.

According to one exemplary embodiment, the generator GCS consists of a speech recognition algorithm that is well known per se.

During a step E500 shown in FIG. 15, at said current instant, the analyzer ANA identifies the current speaker who pronounces this word, for example the i-th speaker, by reading in the sub-stream SF”i, at said current instant, the field LOCi of data relating to the i-th speaker, in association with the information INFVi representative of that speaker's voice frequency, the morphological characteristic CMi of said i-th speaker, and an identifier ZIi of a corresponding image zone (if the i-th speaker does not appear on screen).

During a step E600 shown in FIG. 15, at said current instant, the reception device REC triggers the sending to the terminal TER, via the communication interface COM, of a command for the synchronized rendering of said audio data, of the corresponding video scene and of said word, the rendering of said word depending on the information INFVi, CMi, ZIi read in the sub-stream SF”i.

Steps E300 to E600 are repeated over the entire duration T of the audiovisual content C.
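
The repeated sequence E300 to E600 can be summarized as a loop over synchronized audio and video samples. In the sketch below the speech recognizer, the speaker lookup and the sending of the command are passed in as plain callables, because the text does not describe their interfaces. Every name here (`reception_loop`, `recognize`, `identify_speaker`, `send`) is hypothetical, and the stand-in callables in the example perform no real recognition.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Sample = Tuple[float, bytes, str]  # (current instant, audio data, video scene label)


def reception_loop(
    samples: Iterable[Sample],
    recognize: Callable[[bytes], str],
    identify_speaker: Callable[[float], str],
    send: Callable[[Dict[str, Any]], None],
) -> None:
    """Run one E300-E600 pass per sample, for the whole duration of the content."""
    for instant, audio, scene in samples:    # E300: audio data and matching scene
        word = recognize(audio)              # E400: audio converted to the current word
        speaker = identify_speaker(instant)  # E500: current speaker read from the sub-stream
        send({                               # E600: synchronized rendering command
            "instant": instant,
            "scene": scene,
            "audio": audio,
            "word": word,
            "speaker": speaker,
        })


sent: List[Dict[str, Any]] = []
reception_loop(
    [(0.0, b"hello", "scene-1"), (0.4, b"world", "scene-1")],
    recognize=lambda audio: audio.decode("ascii"),            # stand-in, not real recognition
    identify_speaker=lambda t: "LOC1" if t < 0.2 else "LOC2",  # stand-in lookup
    send=sent.append,
)
```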

To cover the situation where a user accesses the content a few moments after the current instant, the information INFVi, CMi, ZIi is refreshed regularly for a predetermined duration.

With this third embodiment, the subtitling component is generated at the same time as the audio component CA100 and the video component CV100 are analyzed, the words being rendered as text on screen one after another, each in visual correspondence with the speaker who pronounces the current word, a word rendered at a given instant replacing the word rendered at the instant immediately preceding said given instant.
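
The replacement rule stated above, where each word takes the place of the previous one, means that at any playback time exactly one word is visible: the last one whose appearance instant has been reached. A minimal sketch of that lookup follows, assuming a schedule of (instant, word) pairs sorted by instant. The function name `visible_word` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def visible_word(schedule: List[Tuple[float, str]], t: float) -> Optional[str]:
    """Return the word on screen at time t, or None before the first word appears."""
    instants = [instant for instant, _ in schedule]
    position = bisect_right(instants, t)  # number of words whose instant is <= t
    if position == 0:
        return None
    return schedule[position - 1][1]


schedule = [(0.0, "one"), (0.5, "word"), (1.0, "only")]
on_screen = visible_word(schedule, 0.7)
```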

According to a first variant, during step E500, the analyzer ANA reads, in the data field INFVi, the value of the voice frequency of the current i-th speaker and, in the data field CMi, the image data of a part of said speaker's body, for example the image data of the current speaker's face.

Image recognition is then carried out on the video scene associated with the audio data analyzed in step E300, in order to identify a video zone that is close to the current speaker's face and in which the current word is to be rendered as text.
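
Once image recognition has located the speaker's face, choosing the zone in which to render the word is a small geometry problem. The sketch below assumes that the recognition step yields a face bounding box in pixels, and it uses one possible placement rule that is not taken from the text: center the word under the face, and move it above the face when it would leave the frame. The function `text_zone` and its parameters are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels, origin at top left


def text_zone(face: Box, frame_w: int, frame_h: int,
              text_w: int, text_h: int, margin: int = 8) -> Box:
    """Pick a zone close to the face: below it if it fits, otherwise above it."""
    face_x, face_y, face_w, face_h = face
    # Center the text horizontally on the face, then clamp it inside the frame.
    x = face_x + face_w // 2 - text_w // 2
    x = max(0, min(x, frame_w - text_w))
    y = face_y + face_h + margin
    if y + text_h > frame_h:
        # Not enough room under the face: place the text above it instead.
        y = max(0, face_y - margin - text_h)
    return (x, y, text_w, text_h)


zone = text_zone((100, 50, 80, 80), frame_w=640, frame_h=360, text_w=120, text_h=30)
```

For a face near the bottom right corner, the same call moves the word above the face and clamps it to the right edge of the frame.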

According to a second variant, during step E500, the analyzer ANA:
- reads, in the data field INFVi, an access address, for example a URL link, to the database BDV of FIG. 1, in which the voiceprint of said current i-th speaker has previously been stored in association with the information LOCi,
- reads, in the data field CMi, an access address, for example a URL link, to the database BDI of FIG. 1, in which the image data of said i-th speaker has previously been stored in association with the information LOCi, such image data being, for example, the data of an image of the current i-th speaker's face.
The communication interface COM, or any other suitable communication interface (not shown) of the reception device REC of FIG. 10, then sends:
- a request to the database BDV for the voiceprint of the current i-th speaker,
- a request to the database BDI for the image data of the current i-th speaker.

Once the voiceprint and the image data of the current i-th speaker have been received by the reception device REC in response to said requests, image recognition is carried out on the video scene associated with the audio data analyzed in step E300, in order to identify a video zone that is close to the face of the current i-th speaker and in which the current word is to be rendered as text.

It goes without saying that the embodiments described above have been given purely by way of indication and are in no way limiting, and that many modifications can easily be made by those skilled in the art without departing from the scope of the invention.

Claims

1. A method for receiving audiovisual content from an audiovisual stream that includes a video component and an audio component, a subtitle component being generated in association with the audio component and containing a sequence of words representative of the audio data of the audio component, said method being characterized in that it implements a restitution of the content, during which, between two successive instants of restitution of a current video scene of the video component, in the case where the current audio data delivered between said two successive instants are representative of a plurality of K words (mk1, mk2, ..., mK), such that K ≥ 2, said K words are rendered between said two successive instants, respectively at K successive instants, a word rendered at a given instant replacing the word rendered at the instant immediately preceding said given instant.
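By way of purely illustrative and non-limiting example, the word-by-word restitution between two successive instants can be sketched as follows. The even spacing of the K instants and the names `word_schedule` and `word_at` are assumptions made for the illustration; the method only requires K successive instants, each word replacing the previous one.

```python
from bisect import bisect_right


def word_schedule(words, t_start, t_end):
    """Assign each of the K words an instant between t_start and t_end.

    Even spacing is an assumption; any K successive instants would do.
    """
    if len(words) < 2:
        raise ValueError("the word-by-word restitution applies when K >= 2")
    step = (t_end - t_start) / len(words)
    return [(t_start + k * step, word) for k, word in enumerate(words)]


def word_at(schedule, t):
    """Return the single word visible at time t, or None before the first one."""
    instants = [instant for instant, _ in schedule]
    index = bisect_right(instants, t) - 1
    return schedule[index][1] if index >= 0 else None


schedule = word_schedule(["the", "quick", "brown", "fox"], 0.0, 2.0)
print(word_at(schedule, 0.6))  # quick
```

Because `word_at` returns exactly one word for any time, a word shown at a given instant necessarily replaces the one shown at the preceding instant.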

2. Reception method according to claim 1, wherein, the subtitling component being generated prior to the restitution of the content and each of said K words, pronounced at said K successive instants (tk1, tk2, ..., tK), having previously been associated with an identifier representative of the speaker who pronounces said word, the restitution of said content at a considered instant among the K instants is implemented:
- by a prior analysis of the identifier corresponding to said considered instant,
- by a triggering of a command for display, at the considered instant, of the word associated with said analyzed identifier, in visual correspondence with the speaker associated with said analyzed identifier.

3. Reception method according to claim 1, wherein, the subtitling component being generated prior to the restitution of the content and the plurality of K words being pronounced by the same speaker at said K successive instants (tk1, tk2, ..., tK), the first word of said plurality of K words having also been previously associated with an identifier representative of said same speaker, the restitution of said content is implemented:
- at the first instant among said K instants, by a prior analysis of the identifier corresponding to said first word of the plurality of K words,
- by a triggering of a command for successive display of said K words, respectively at the K instants, in visual correspondence with said same speaker associated with said analyzed identifier.

4. Reception method according to claim 1, wherein, the subtitling component being generated prior to the restitution of the content and the plurality of K words consisting of:
- a first plurality of J words (mj1, mj2, ..., mJ) pronounced respectively at J instants (tj1, tj2, ..., tJ) by a first speaker,
- at least a second plurality of L words (ml1, ml2, ..., mL) pronounced respectively at L instants (tl1, tl2, ..., tL) by a second speaker, following the J words pronounced by the first speaker,
the first word of said plurality of J words having previously been associated with a first identifier representative of the first speaker and the first word of said plurality of L words having previously been associated with a second identifier representative of the second speaker, the restitution of said content is implemented:
- at the first instant of the plurality of J instants, by a prior analysis of the first identifier associated with said first word of the plurality of J words,
- by a triggering of a command for successive display of said J words, respectively at the J instants, in visual correspondence with the first speaker associated with said analyzed first identifier,
- at the first instant of the plurality of L instants, by a prior analysis of the second identifier associated with said first word of the plurality of L words,
- by a triggering of a command for successive display of said L words, respectively at the L instants, in visual correspondence with the second speaker associated with said analyzed second identifier.
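By way of purely illustrative and non-limiting example, the case in which only the first word of each speaker's run carries an identifier can be sketched as follows. The pair representation `(word, identifier or None)` and the function `assign_speakers` are assumptions introduced for the illustration.

```python
def assign_speakers(tagged_words):
    """Propagate each speaker identifier to the words that follow it.

    tagged_words: list of (word, identifier) pairs, where the identifier is
    present only on the first word pronounced by a speaker and None otherwise.
    """
    current = None
    result = []
    for word, identifier in tagged_words:
        if identifier is not None:
            # A new identifier is analyzed at the first instant of a run.
            current = identifier
        if current is None:
            raise ValueError("the first word must carry a speaker identifier")
        result.append((word, current))
    return result


words = [("hello", "LOC1"), ("there", None), ("hi", "LOC2"), ("you", None)]
print(assign_speakers(words))
```

Each word can then be displayed in visual correspondence with the speaker returned for it, without the identifier having to be repeated on every word.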

5. Reception method according to claim 1 or claim 2, wherein, the subtitling component being generated prior to the restitution of the content and the plurality of K words being pronounced by the same speaker at said K successive instants (tk1, tk2, ..., tK), at a given speed, said speed having also been previously associated with an identifier representative of the value of said given speed, the restitution of said content is implemented:
- by a prior analysis of the identifier representative of the value of said given speed,
- by a calculation, as a function of the value of the speed associated with the analyzed identifier, of each of the K instants at which the K words are to be rendered,
- by a triggering of a command for successive display of said K words, respectively at the K calculated instants.
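By way of purely illustrative and non-limiting example, the calculation of the K instants from a speech speed can be sketched as follows. The unit (words per second), the constant spacing and the function `instants_from_speed` are assumptions; the claim does not fix how the speed value is expressed.

```python
def instants_from_speed(t_first, k_words, words_per_second):
    """Compute K display instants from the instant of the first word.

    Assumes a constant speed expressed in words per second, so that
    consecutive words are separated by 1 / words_per_second seconds.
    """
    if words_per_second <= 0:
        raise ValueError("speed must be positive")
    interval = 1.0 / words_per_second
    return [t_first + k * interval for k in range(k_words)]


print(instants_from_speed(10.0, 4, 2.0))  # [10.0, 10.5, 11.0, 11.5]
```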

6. Reception method according to claim 1, wherein the subtitling component is generated simultaneously with an analysis of the content by implementing the following:
- identification, at a considered instant among the K instants of pronunciation of the K words:
• of one or more audio data items of the audio component delivered at said instant and representative of one of the K words,
• of the corresponding video scene synchronized with the one or more audio data items,
- textual conversion of the audio data item or items delivered at said instant into a word,
- triggering of a command for synchronized restitution of the one or more audio data items, of the corresponding video scene and of said word, in association with said considered instant.

7. Reception method according to claim 6, wherein the analysis of the content comprises, at the considered instant among the K instants of pronunciation of the K words:
- an identification of the voice frequency associated with the corresponding audio data item or items delivered and representative of one of said K words pronounced by a considered speaker,
- a matching of said identified voice frequency with information relating to said considered speaker which has previously been associated with said voice frequency,
said reception method implementing, at said considered instant, a triggering of a command for display of said word corresponding to the audio data item or items delivered, in visual correspondence with the considered speaker.
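By way of purely illustrative and non-limiting example, the matching of an identified voice frequency with previously associated speaker information can be sketched as follows. The sketch assumes that each speaker is registered with a single reference frequency in hertz and that a nearest-value comparison within a tolerance is sufficient; the function `match_speaker`, the registry layout and the tolerance are assumptions, and a practical voiceprint comparison would be considerably richer.

```python
def match_speaker(frequency_hz, registry, tolerance_hz=20.0):
    """Return the speaker whose reference frequency is closest, or None.

    registry: dict mapping speaker information (e.g. "LOC1") to a reference
    voice frequency in hertz, associated beforehand.
    """
    if not registry:
        return None
    speaker, reference = min(registry.items(),
                             key=lambda item: abs(item[1] - frequency_hz))
    # Reject the match when even the closest reference is too far away.
    return speaker if abs(reference - frequency_hz) <= tolerance_hz else None


registry = {"LOC1": 120.0, "LOC2": 210.0}
print(match_speaker(125.0, registry))  # LOC1
```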

8. A device (REC) for receiving audiovisual content from an audiovisual stream that includes a video component and an audio component, a subtitle component being generated in association with the audio component and containing a sequence of words representative of the audio data of the audio component, the device being characterized in that it comprises a processor which is arranged to implement a restitution of the content, during which, between two successive instants of restitution of a current video scene of the video component, in the case where the current audio data delivered between said two successive instants are representative of a plurality of K words (mk1, mk2, ..., mK), such that K ≥ 2, said K words are rendered between said two successive instants, respectively at K successive instants, a word rendered at a given instant replacing the word rendered at the instant immediately preceding said given instant.

9. A computer program comprising program code instructions for performing the steps of the method for receiving audiovisual content according to any one of claims 1 to 7, when said program is executed on a computer.

10. A computer-readable recording medium on which is recorded a computer program comprising program code instructions for performing the steps of the method for receiving audiovisual content according to any one of claims 1 to 7, when said program is executed by a computer.