EP1845521A1 - System for movie dubbing - Google Patents
- Publication number
- EP1845521A1 (EP07105937A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- replicas
- markers
- rhythm band
- replica
- band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
Definitions
- The present invention relates to a film dubbing system.
- Dubbing a film consists in adapting its dialogue to a language other than the original. To do this, the original soundtrack must be replaced by a new one. The ultimate goal is for the lip movements of the film's characters to match the words spoken in the new language as closely as possible.
- Generally, producing a dubbed version involves a screen on which are projected both the images of the film and a rhythm band (also called a rythmo band or dialogue band) containing the text of the characters' replicas in the target language.
- This rhythm band, generally placed below the images, scrolls continuously and in sync with them. Actors can then read the scrolling text aloud as it appears, and their performances are recorded at the same time to ensure synchronism with the lip movements of the characters on screen.
- Document FR 2 765 354 describes such a dubbing system.
- A disadvantage of this type of method is that it is very difficult for the person in charge of adapting the dialogue to the images to get a concrete idea of the quality of this work relative to the images.
- Thus, to verify that the proposed rhythm band text fits the expressions and lip movements of the film's characters, and to be sure that the actor will be able to perform the text properly, this person generally has to wait for the performance of the actor(s) in charge of the dubbing.
- Such a working method can therefore take a great deal of time to produce a final soundtrack of good quality, and is thus very expensive.
- An object of the present invention is to overcome the disadvantages of the aforementioned prior art.
- In particular, an object of the present invention is to facilitate the work of the person in charge of adapting the film's dialogue.
- Another object of the present invention is to reduce the costs associated with producing a dubbed version.
- A further object of the present invention is to ensure the best possible quality of the dubbed soundtrack with respect to the images of the film.
- The present invention provides a computer-assisted film dubbing system comprising display means for displaying in parallel, on at least one screen, images and at least one rhythm band containing the text of the replicas and markers adapted to allow synchronization of the replicas with respect to the images, said system further comprising a speech synthesis and transformation engine capable of rendering an intelligible sound signal from said rhythm band to form dubbed replicas.
- Said speech synthesis engine generates dubbed replicas in accordance with the markers, so as to synchronize the replicas it generates with the markers of the rhythm band.
- Said markers of said rhythm band comprise the beginning and end of a replica, the beginnings and ends of sentences, breaths, pauses, labials, semi-labials, etc.
- Said system may comprise adjustment means able to adapt the speech rate of each dubbed replica in accordance with the rhythm band.
- Said adjustment means make it possible to slow down and/or speed up the delivery of each dubbed replica, or part of a dubbed replica, without modifying the vocal timbre.
- Said system comprises means for modifying said rhythm band, making it possible to add, shift and/or delete markers and/or to modify the text of the replicas, so that the dubbed replicas generated by said speech synthesis engine are modified correspondingly.
- The rhythm band comprises at least one graphic illustration strip able to represent schematically the markers of each replica.
- Fig. 1 is a flowchart illustrating the general operating principle of a film dubbing system according to the present invention.
- Such a dubbing system generally comprises display means and means for locating the text of the replicas.
- The display means are intended to display, broadcast or project, on at least one screen, a sequence of images (video sequence) and at least one strip showing the text of the replicas (rhythm band).
- Such display means comprise for example a projector, a monitor, a television, a VCR-TV set, a computer, etc.
- These devices can broadcast the video sequence and the rhythm band on one or more screens.
- The particular arrangement of the screen(s) should advantageously allow the scrolling content of the rhythm band to be viewed clearly alongside the images of the video sequence.
- To this end, the rhythm band advantageously scrolls in parallel with the video sequence, preferably below it, so that its content can easily be compared with that of the video sequence.
- The locating means, for their part, make it possible to place on said rhythm band markers (in particular temporal ones) relating to said replicas.
- Such markers may for example indicate the beginning and end of a sentence, the beginning or end of a breath or of a pseudo-verbal sequence, labials and semi-labials. Other indications are also possible.
- Usually, this marking work can be done manually by annotating a film strip or by using computer software.
- Such marking then generally leads to the division of the script into short scenes, commonly called loops in the trade, which will subsequently be dubbed, that is, read and performed aloud while being recorded.
- Such markers are intended to synchronize the text of the replicas precisely with the images. In other words, they make it possible to synchronize the replicas of the dubbing actors with the lip movements of the characters in the film.
- An example of a film dubbing system having particularly advantageous display and locating means is given in document FR 2 765 354, which is cited as a reference for such a system.
- This system includes a computer connected to a video recorder, speakers, a keyboard, a display screen and possibly a mouse.
- The computer includes a central processing unit associated with a program memory for driving a video card, a sound card and the screen interface, so as to generate the synchronized scrolling of a rhythm band containing a plurality of superimposed tracks.
- A set of keys is provided for entering the different markers relating to the replicas of said rhythm band. These markers are represented by symbols and/or codes.
- These markers make it possible, for example, to locate loop changes, shot changes, the beginning and end of open-mouth sentences, the beginning and end of closed-mouth sentences, the presence of the labials "M", "P", "B", the presence of the semi-labials "F", "V", "R", the presence of "W", "EU", "Q", the beginning and end of "Ah", "Oh", or the beginning and end of various expressions such as moans, tears or groans.
- In accordance with the present invention, the film dubbing system further incorporates a speech synthesis and transformation engine.
- Such an engine is capable of rendering an intelligible sound signal from the text and markers of the rhythm band to form dubbed replicas.
- This principle of going from text to a synthetic voice is generally called "TTS", or "text to speech".
- The benefit of such a speech synthesis and transformation engine for film dubbing is twofold. First, with an engine coupled to the system, the person in charge of adapting the text to the images can listen directly to the result of this work, by generating a synthetic voice that performs the replica in sync with the video sequence.
- Second, the voice generated by said engine may, besides serving as an aid to adaptation, directly serve as the final voice for dubbing the video sequence.
- The audio stream generated by the speech synthesis and transformation engine can be routed either to a loudspeaker, as indicated in FIG. 1, or to a storage device capable of storing this stream.
- The synthesis technique used can be of various types and can be set in the advanced settings of the tool.
- The video sequence, the rhythm band and the audio stream generated by the speech synthesis and transformation engine can all be synchronized using a master clock system.
- Such a system thus makes it possible to align said video sequence, said rhythm band and said audio stream precisely, in particular so as to synchronize their triggering.
- FIG. 2 illustrates, in the form of a graphical representation, an audio signal generated from the phrase "you are beautiful".
- The synthesized voice must be perfectly synchronous with the rhythm band.
- Consequently, said speech synthesis and transformation engine advantageously generates dubbed replicas in accordance with the rhythm band, so as to synchronize the replicas it generates with the markers.
- In other words, the information on the rhythm band, and in particular the markers placed on it, drives said engine.
- The system may further advantageously comprise means capable of adapting the speech rate of each dubbed replica in accordance with the beginning-of-sentence and end-of-sentence markers of the rhythm band.
- For example, said means advantageously make it possible to slow down and/or speed up the delivery of each dubbed replica without modifying the vocal timbre.
- Such a deformation, acting on the rate of the speech generated by said speech synthesis and transformation engine, can be carried out using an algorithm such as PSOLA or a vocoder. It should be noted that most conventional speech synthesis and transformation engines, such as AT&T Natural Voice 1.4, allow speech rate control, especially when they are compatible with certain standards (VoiceXML, SSML, JSML). More concretely, to achieve such a deformation, the user generally uses the locating means of the rhythm band to place on said band the various markers relating to the replicas. The rhythm band is thus cut into several parts according to the markers placed. Each piece or part of a sequence is then synthesized so that the duration of the signal equals the duration between the two markers framing it. A concrete example of such a deformation will now be described with reference to FIGS. 3 and 4.
- The person in charge of the adaptation first positions the markers of the replicas on the rhythm band, based on reading the lip movements of the character in the film.
- These markers, represented in the figure by vertical lines topped with small figures, subdivide the rhythm band.
- A marker generally corresponds to a specific labial or a specific reaction of the character in the film that must be distinctly heard in the new soundtrack produced by said speech synthesis and transformation engine.
- The speech synthesis and transformation engine will therefore respect the positions of the various markers placed on the rhythm band and will produce a corresponding audio stream.
- The person in charge of the adaptation can move the markers on the rhythm band to fit the image precisely, in particular the expressions and lip movements of the character(s).
- The dubbing system advantageously comprises means for modifying said rhythm band, making it possible to add, shift and/or delete markers and/or to modify the text of the replicas, so that the dubbed replicas generated by the speech synthesis and transformation engine are modified correspondingly.
- When the person in charge of the adaptation moves a marker on the rhythm band, the audio signal is deformed accordingly.
- In the schematic representation of the audio signal, it can be seen in this example that there is no change at the beginning of the phrase "You are", while the synthetic speaker lingers longer on the end of the sentence, corresponding to the pronunciation of the word "beautiful".
- An advantage of the proposed film dubbing system is that the person in charge of the adaptation can directly replace the text "C'est beau" with, for example, the expression "C'est classe".
- This person then finds that the lips of the character in the film coincide perfectly with the newly proposed text.
- It is still possible for this person to refine the work by adding markers on the rhythm band, for example at the moment when the mouth of the character in the film opens to pronounce the word "beau", and thus make the word "classe" of the dubbed replica coincide precisely.
- The dubbed replica generated by the speech synthesis and transformation engine can then be used as a pronunciation aid for the actor(s) in charge of the dubbing, or can be used as the final voice on the new soundtrack.
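The marker move described for FIGS. 3 and 4 can be illustrated with a small sketch. The marker times below are hypothetical values chosen for illustration, not figures taken from the patent, and the function names are assumptions. The sketch only shows which segment durations change when one marker is shifted; it does not synthesize or stretch any audio.

```python
def segment_durations(marker_times):
    """Duration in seconds of each segment between consecutive markers."""
    return [b - a for a, b in zip(marker_times, marker_times[1:])]


def move_marker(marker_times, index, new_time):
    """Return a new marker list with one marker moved to new_time.

    Markers must remain in strictly increasing time order.
    """
    moved = list(marker_times)
    moved[index] = new_time
    if any(b <= a for a, b in zip(moved, moved[1:])):
        raise ValueError("markers must stay in increasing time order")
    return moved


# Hypothetical band: "You are" spans 0.0-0.5 s, "beautiful" spans 0.5-1.0 s.
markers = [0.0, 0.5, 1.0]
before = segment_durations(markers)

# Shift the end marker later: only the last segment gets longer.
after = segment_durations(move_marker(markers, 2, 1.5))
print(before, after)
```

In this example the first segment keeps its duration while the second is lengthened, which matches the behaviour described for the phrase "You are beautiful".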
Abstract
Description
The present invention relates to a film dubbing system.
Dubbing a film consists in adapting the dialogue of the film to a language other than the original. To do this, the original soundtrack must be replaced by a new soundtrack. The ultimate goal is for the lip movements of the film's characters to match the words spoken in the new language as closely as possible. Generally, producing a dubbed version involves a screen on which are projected both the images of the film and a rhythm band, also called a rythmo band or dialogue band, containing the text of the characters' replicas in the target language. This rhythm band, generally placed below the images, scrolls continuously and in sync with the images. Actors can then read the scrolling text aloud as it appears, and their performances are recorded at the same time to ensure synchronism with the lip movements of the characters on screen. Document FR 2 765 354 describes such a dubbing system.
A disadvantage of this type of method is that it is very difficult for the person in charge of adapting the dialogue to the images to get a concrete idea of the quality of this work relative to the images. Thus, to verify that the proposed rhythm band text fits the expressions and lip movements of the film's characters, and to be sure that the actor will be able to perform the text properly, this person generally has to wait for the performance of the actor(s) in charge of the dubbing. This often results in a number of errors that have to be corrected after the recording. These errors then require the person in charge of the adaptation to rethink the proposed text, followed by a new performance by the actor(s). In the trade these corrections are commonly called "retakes". Such a working method can therefore take a great deal of time to produce a final soundtrack of good quality, and is thus very expensive.
An object of the present invention is to overcome the disadvantages of the aforementioned prior art.
In particular, an object of the present invention is to facilitate the work of the person in charge of adapting the film's dialogue. Another object of the present invention is to reduce the costs associated with producing a dubbed version.
A further object of the present invention is to ensure the best possible quality of the dubbed soundtrack with respect to the images of the film.
To achieve these aims, the present invention provides a computer-assisted film dubbing system comprising display means for displaying in parallel, on at least one screen, images and at least one rhythm band containing the text of the replicas and markers adapted to allow synchronization of the replicas with respect to the images, said system further comprising a speech synthesis and transformation engine capable of rendering an intelligible sound signal from said rhythm band to form dubbed replicas.
Advantageously, said speech synthesis engine generates dubbed replicas in accordance with the markers, so as to synchronize the replicas it generates with the markers of the rhythm band.
Advantageously, said markers of said rhythm band comprise the beginning and end of a replica, the beginnings and ends of sentences, breaths, pauses, labials, semi-labials, etc.
Advantageously, said system may comprise adjustment means able to adapt the speech rate of each dubbed replica in accordance with the rhythm band.
Advantageously, said adjustment means make it possible to slow down and/or speed up the delivery of each dubbed replica, or part of a dubbed replica, without modifying the vocal timbre.
Advantageously, said system comprises means for modifying said rhythm band, making it possible to add, shift and/or delete markers and/or to modify the text of the replicas, so that the dubbed replicas generated by said speech synthesis engine are modified correspondingly.
Advantageously, the rhythm band comprises at least one graphic illustration strip able to represent schematically the markers of each replica.
The invention will now be described more fully with reference to the accompanying drawings, which show, by way of non-limiting example, an advantageous embodiment of the invention.
In the figures:
- FIG. 1 is a diagram of the general operating principle of a film dubbing system according to the present invention,
- FIG. 2 is a schematic representation of an audio signal,
- FIG. 3 is a schematic view of the system associating the movements of the mouth in the images, the texts of the replicas as they would appear on the rhythm band, the markers and the graphic representation of the audio signal generated from the rhythm band,
- FIG. 4 is a view similar to FIG. 3, showing a reworked rhythm band, as well as the resulting modifications to the sound signal.
FIG. 1 is a flowchart illustrating the general operating principle of a film dubbing system according to the present invention. Such a dubbing system generally comprises display means and means for locating the text of the replicas.
The display means are intended to display, broadcast or project, on at least one screen, a sequence of images (video sequence) and at least one strip showing the text of the replicas (rhythm band). Such display means comprise for example a projector, a monitor, a television, a VCR-TV set, a computer, etc. These devices can broadcast the video sequence and the rhythm band on one or more screens. The particular arrangement of the screen(s) should advantageously allow the scrolling content of the rhythm band to be viewed clearly alongside the images of the video sequence. To this end, the rhythm band advantageously scrolls in parallel with the video sequence, preferably below it, so that its content can easily be compared with that of the video sequence.
The locating means, for their part, make it possible to place on said rhythm band markers (in particular temporal ones) relating to said replicas. Such markers may for example indicate the beginning and end of a sentence, the beginning or end of a breath or of a pseudo-verbal sequence, labials and semi-labials. Other indications are also possible. Usually, this marking work can be done manually by annotating a film strip or by using computer software. Such marking then generally leads to the division of the script into short scenes, commonly called loops in the trade, which will subsequently be dubbed, that is, read and performed aloud while being recorded. Such markers are intended to synchronize the text of the replicas precisely with the images. In other words, they make it possible to synchronize the replicas of the dubbing actors with the lip movements of the characters in the film.
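As an illustration only, the marker types listed above could be represented in software along the following lines. The patent does not specify any data format; the class names, field names and time values here are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class MarkerKind(Enum):
    # Kinds named in the text; the identifiers themselves are illustrative.
    SENTENCE_START = "sentence_start"
    SENTENCE_END = "sentence_end"
    BREATH_START = "breath_start"
    BREATH_END = "breath_end"
    LABIAL = "labial"
    SEMI_LABIAL = "semi_labial"


@dataclass(frozen=True)
class Marker:
    time_s: float     # position on the band, in seconds from the start of the loop
    kind: MarkerKind


def sorted_markers(markers):
    """Return the markers ordered by time, as they would appear along the band."""
    return sorted(markers, key=lambda m: m.time_s)


# Hypothetical markers entered in arbitrary order by the adapter.
band = sorted_markers([
    Marker(1.8, MarkerKind.SENTENCE_END),
    Marker(0.4, MarkerKind.SENTENCE_START),
    Marker(1.1, MarkerKind.LABIAL),
])
print([m.kind.name for m in band])
```

Keeping the markers as time-stamped records sorted by time makes it straightforward to cut a replica into the segments lying between consecutive markers.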
An example of a film dubbing system comprising particularly advantageous display and locating means is given in document FR 2 765 354, which is cited as a reference for such a system.
In accordance with the present invention, the film dubbing system further incorporates a speech synthesis and transformation engine. Such an engine is capable of rendering an intelligible sound signal from the text and markers of the rhythm band to form dubbed replicas. This principle of going from text to a synthetic voice is generally called "TTS", or "text to speech". The benefit of such a speech synthesis and transformation engine for film dubbing is twofold. First, with an engine coupled to the system, the person in charge of adapting the text to the images can listen directly to the result of this work, by generating a synthetic voice that performs the replica in sync with the video sequence. This person can therefore assess the quality of the adaptation work more concretely than by simply reading the text aloud, and more efficiently than by waiting for the performance of the actors in charge of the dubbing. The adapter can thus check directly that the text is in sync with the labials of the characters appearing on screen. Second, the voice generated by said engine may, besides serving as an aid to adaptation, directly serve as the final voice for dubbing the video sequence.
The audio stream generated by the voice synthesis and transformation engine can be routed either to a loudspeaker, as shown in FIG. 1, or to a storage device capable of recording the stream. Different synthesis techniques can be used, and the technique can be selected in the advanced settings of the tool.
As shown in FIG. 1, the video sequence, the rhythm band and the audio stream generated by the voice synthesis and transformation engine can all be synchronized by means of a master clock system. Such a system makes it possible to align said video sequence, said rhythm band and said audio stream precisely, in particular so that their playback is triggered in synchrony.
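The master clock idea can be sketched as a single time source from which the video position, the rhythm band scroll offset and the audio position are all derived. This is only an illustrative sketch: the class `MasterClock`, the helper functions and the default values (25 frames per second, 100 pixels per second, 48000 Hz) are assumptions made for the example and do not come from the patent.

```python
import time


class MasterClock:
    """Single time reference shared by video, rhythm band and audio (hypothetical)."""

    def __init__(self, time_source=time.monotonic):
        self._now = time_source      # injectable so the clock can be tested
        self._started_at = None

    def start(self):
        self._started_at = self._now()

    def position(self):
        """Seconds elapsed since start(); 0.0 if playback has not started."""
        if self._started_at is None:
            return 0.0
        return self._now() - self._started_at


def video_frame(clock, fps=25):
    """Index of the video frame to display at the current clock position."""
    return int(clock.position() * fps)


def band_offset_px(clock, px_per_second=100):
    """Horizontal scroll offset of the rhythm band, in pixels."""
    return clock.position() * px_per_second


def audio_sample(clock, sample_rate=48000):
    """Index of the audio sample to play at the current clock position."""
    return int(clock.position() * sample_rate)


# Usage with a fake time source, so the result is deterministic.
fake_time = [10.0]
clock = MasterClock(time_source=lambda: fake_time[0])
clock.start()
fake_time[0] = 12.0  # two seconds of playback have elapsed
print(video_frame(clock), band_offset_px(clock), audio_sample(clock))
```

Because every consumer reads the same position, none of them can drift relative to the others, which is the property the paragraph above attributes to the master clock.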
Referring now to FIG. 2, this figure shows, as a graphical representation, an audio signal generated from the phrase "vous êtes belle" ("you are beautiful"). The synthesized voice must be perfectly synchronous with the rhythm band. Accordingly, said voice synthesis and transformation engine advantageously generates the dubbed lines in accordance with the rhythm band, so that the lines it generates are synchronized with the markers. In other words, the information on the rhythm band, and in particular the markers placed on it, drives said engine. The system may further advantageously comprise means for adapting the speech rate of each dubbed line to the start-of-sentence and end-of-sentence markers of the rhythm band. For example, said means advantageously make it possible to slow down and/or speed up the delivery of each dubbed line without altering the vocal timbre. Such a deformation of the speech rate produced by said engine can be carried out with an algorithm such as PSOLA or a vocoder. It should be noted that most conventional voice synthesis and transformation engines, such as AT&T Natural Voice 1.4, allow the speech rate to be controlled, in particular when they are compatible with certain standards (VoiceXML, SSML, JSML). More concretely, to carry out such a deformation, the user generally uses the marking means of the rhythm band to place on said band the various markers relating to the lines. The rhythm band is thus divided into several parts according to the markers placed. Each piece or part of a sequence is then synthesized so that the duration of its signal equals the time between the two markers that frame it.
A concrete example of such a deformation will now be described with reference to FIGS. 3 and 4.
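The rule that each piece must last exactly as long as the interval between its two framing markers can be sketched as follows. The sketch computes, for each segment, the ratio between the target duration and the duration the engine would produce unconstrained (a factor that a time-scaling algorithm such as PSOLA could apply), and also emits an SSML document that requests each duration through the `duration` attribute of the `prosody` element. The names `Segment`, `split_by_markers`, `stretch_factor` and `to_ssml`, the marker times and the natural durations are all hypothetical, and support for `prosody duration` varies between engines.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Segment:
    text: str
    start: float             # marker opening the segment, in seconds
    end: float               # marker closing the segment, in seconds
    natural_duration: float  # duration the engine would produce unconstrained


def split_by_markers(markers, texts, natural_durations):
    """Pair each text piece with the two consecutive markers that frame it."""
    if len(markers) != len(texts) + 1 or len(texts) != len(natural_durations):
        raise ValueError("need exactly one more marker than text pieces")
    if any(b <= a for a, b in zip(markers, markers[1:])):
        raise ValueError("markers must be strictly increasing")
    return [
        Segment(t, a, b, d)
        for t, a, b, d in zip(texts, markers, markers[1:], natural_durations)
    ]


def stretch_factor(segment):
    """Target duration / natural duration: > 1 slows speech, < 1 speeds it up."""
    return (segment.end - segment.start) / segment.natural_duration


def to_ssml(segments, lang="fr-FR"):
    """Request each segment's target duration via SSML <prosody duration>."""
    parts = []
    for s in segments:
        millis = round((s.end - s.start) * 1000)
        parts.append(f'<prosody duration="{millis}ms">{escape(s.text)}</prosody>')
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">' + " ".join(parts) + "</speak>"
    )


# Hypothetical values: three markers frame the two pieces of the phrase.
segments = split_by_markers(
    markers=[0.0, 0.5, 1.25],
    texts=["vous êtes", "belle"],
    natural_durations=[0.25, 0.5],
)
factors = [stretch_factor(s) for s in segments]
ssml = to_ssml(segments)
print(factors)
print(ssml)
```

Whether the duration constraint is passed to the engine (SSML) or applied afterwards to the synthesized signal (time scaling) is a design choice that the description leaves open; the sketch shows the arithmetic common to both.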
Referring to FIG. 3, the person in charge of the adaptation first positions the markers of the lines on the rhythm band, based on reading the lip movements of the film character. These markers, shown in the figure as vertical lines topped with small figures, subdivide the rhythm band. Each marker generally corresponds to a specific labial or a specific reaction of the film character that must be distinctly heard in the new soundtrack produced by said voice synthesis and transformation engine. The engine will therefore respect the positions of the various markers placed on the rhythm band and will produce a corresponding audio stream.
Referring to FIG. 4, the person in charge of the adaptation can move the markers on the rhythm band in order to match the image exactly, in particular the expressions and lip movements of the character(s). To this end, the dubbing system advantageously comprises means for modifying said rhythm band that make it possible to add, shift and/or delete markers and/or to modify the text of the lines, so that the dubbed lines generated by the voice synthesis and transformation engine are modified correspondingly. In FIG. 4, the person in charge of the adaptation moves one marker of the rhythm band, and the audio signal is deformed accordingly. As can be seen in the schematic representation of the audio signal, in this example the beginning of the sentence, "Vous êtes", is unchanged, while the artificial speaker lingers longer on the end of the sentence, corresponding to the pronunciation of the word "belle".
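The effect of moving a single marker can be illustrated with a few lines of arithmetic: only the segments adjacent to the moved marker change duration. The functions `segment_durations` and `move_marker` and the marker times below are hypothetical, chosen so that the result mirrors the behaviour described for FIG. 4 (the first piece unchanged, the last piece lengthened).

```python
def segment_durations(markers):
    """Duration of each segment framed by two consecutive markers."""
    return [b - a for a, b in zip(markers, markers[1:])]


def move_marker(markers, index, new_time):
    """Return a new marker list with one marker shifted, keeping the order valid."""
    lower = markers[index - 1] if index > 0 else float("-inf")
    upper = markers[index + 1] if index + 1 < len(markers) else float("inf")
    if not lower < new_time < upper:
        raise ValueError("a marker cannot cross its neighbours")
    moved = list(markers)
    moved[index] = new_time
    return moved


# Hypothetical marker times, in seconds: "Vous êtes" | "belle"
before = [0.0, 0.5, 1.0]
after = move_marker(before, 2, 1.5)   # push the closing marker later

print(segment_durations(before))  # [0.5, 0.5]
print(segment_durations(after))   # [0.5, 1.0]
```

The first duration stays at 0.5 s while the second grows from 0.5 s to 1.0 s, so a synthesis constrained by these markers would leave "Vous êtes" untouched and stretch "belle".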
Another example of the use of this voice synthesis and transformation engine will now be described. In this example, assume that the original video sequence shows a character exclaiming "This is nice". The person in charge of the adaptation places markers on the rhythm band to indicate the start and end of the line that must be respected in the sentence to be spoken. This person then translates "This is nice" as "C'est beau". However, on rewinding the video and watching it while listening to the dubbed line generated by the voice synthesis and transformation engine, said person may detect a flaw in the adaptation. In particular, they may be bothered by the fact that the lips of the film character do not close on the labial "b" of "C'est beau". This labial is indeed absent from the original English line. The advantage of the proposed film dubbing system is that the person in charge of the adaptation can then directly replace the text "C'est beau" with, for example, the expression "C'est classe". On viewing the dubbed line again, this person finds that the lips of the film character match the new text perfectly. Should an offset remain between the dubbed line and the lip movements of the film character, it is still possible to refine the work by adding markers on the rhythm band, for example at the moment when the character's mouth opens to pronounce the word "nice", so that the word "classe" of the dubbed line coincides with it exactly.
The dubbed line generated by the voice synthesis and transformation engine can then be used as a pronunciation aid for the actor(s) in charge of the dubbing, or as the final voice intended to appear on the new soundtrack.
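The mismatch in this example can be approximated by a very crude spelling-based check that flags candidate translations containing closed-lip (bilabial) letters when the original line has none. This is an assumption-laden sketch and not something the patent describes: the functions `closed_lip_letters` and `introduces_labial` are hypothetical, and the check works on spelling rather than pronunciation, so it will misjudge silent letters and digraphs.

```python
BILABIAL_LETTERS = set("bpm")  # letters that usually signal a closed-lip sound


def closed_lip_letters(text):
    """Return, in order, the letters of `text` that suggest a bilabial sound."""
    return [ch for ch in text.lower() if ch in BILABIAL_LETTERS]


def introduces_labial(original, candidate):
    """True if the candidate has closed-lip letters while the original has none."""
    return bool(closed_lip_letters(candidate)) and not closed_lip_letters(original)


original = "This is nice"
for candidate in ("C'est beau", "C'est classe"):
    flag = introduces_labial(original, candidate)
    print(f"{candidate!r}: introduces a labial -> {flag}")
```

With these inputs the check flags "C'est beau" (because of its "b") and accepts "C'est classe", matching the judgment made by ear and eye in the example above.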
Although the present invention has been described with reference to a particular embodiment, it is understood that a person skilled in the art may make various modifications to it without departing from the scope of the present invention as defined by the appended claims.
Claims (7)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0651300A FR2899714B1 (en) | 2006-04-11 | 2006-04-11 | FILM DUBBING SYSTEM. |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1845521A1 (en) | 2007-10-17 |
Family
ID=37441836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07105937A Withdrawn EP1845521A1 (en) | 2006-04-11 | 2007-04-11 | System for movie dubbing |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1845521A1 (en) |
FR (1) | FR2899714B1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2765354A1 (en) | 1997-06-25 | 1998-12-31 | Gregoire Parcollet | Film dubbing synchronisation system |
US5970459A (en) | 1996-12-13 | 1999-10-19 | Electronics And Telecommunications Research Institute | System for synchronization between moving picture and a text-to-speech converter |
WO2002082428A1 (en) | 2001-04-05 | 2002-10-17 | Koninklijke Philips Electronics N.V. | Time-scale modification of signals applying techniques specific to determined signal types |
- 2006-04-11: FR application FR0651300A, patent FR2899714B1 (en), status: Active
- 2007-04-11: EP application EP07105937A, patent EP1845521A1 (en), status: Withdrawn
Also Published As
Publication number | Publication date |
---|---|
FR2899714B1 (en) | 2008-07-04 |
FR2899714A1 (en) | 2007-10-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
 | AX | Request for extension of the european patent | Extension state: AL BA HR MK YU |
 | 17P | Request for examination filed | Effective date: 20080414 |
 | AKX | Designation fees paid | Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
 | 17Q | First examination report despatched | Effective date: 20080611 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
 | 18W | Application withdrawn | Effective date: 20111118 |