FR2835087A1

FR2835087A1 - CUSTOMIZING THE SOUND PRESENTATION OF SYNTHESIZED MESSAGES IN A TERMINAL

Info

Publication number: FR2835087A1
Application number: FR0200851A
Authority: FR
Inventors: Ghislain Moncomble; Philippe Passelaigue; Jean Pierre Remy
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2002-01-23
Filing date: 2002-01-23
Publication date: 2003-07-25
Anticipated expiration: 2022-01-23
Also published as: FR2835087B1; WO2003063133A8; WO2003063133A1

Abstract

To customize the sound presentation of synthesized messages in a terminal (1), acoustic characteristics (CV) describing a voice (V) are selected from a catalog of acoustic characteristics pre-stored in a server (2) and transmitted to speech synthesis equipment (3). A text message (MT), which can be selected in the terminal, is synthesized in the equipment according to the selected acoustic characteristics into a voice message (MS), which is transmitted to the terminal for listening. At least one sound effect (B) can be selected in the server to be mixed with the voice message.

Description


Customizing the sound presentation of synthesized messages in a terminal

The present invention relates to the sound presentation of messages in a terminal. The messages are initially textual, then synthesized into speech by a speech synthesis means internal or external to the terminal.

Currently, interactive voice servers, or any other speech synthesis means accessible through a server, broadcast messages resulting from the speech synthesis of text messages on the basis of artificial or natural voice models. Apart from pre-filtering and bass and treble adjustment in the audio means of most terminals, such as television receivers, radio receivers, or personal terminals of the computer, digital assistant, telephone or radiotelephone type, the users of these terminals who consult voice servers all hear the same voices for the broadcast of synthesized messages, without any personal influence over them.

Because a given voice server broadcasts messages with a single voice, the messages are not always well perceived by some users.

The present invention aims to remedy the drawbacks of the "single-voice" servers of the prior art, so that each user personalizes the acoustic context of the voice messages broadcast by the servers and thus makes the broadcast voice messages more intelligible and perceptible, and therefore more familiar.

To this end, a method for customizing the sound presentation of synthesized messages in a terminal is characterized in that it comprises the steps of selecting acoustic characteristics describing a voice from a first catalog of acoustic characteristics pre-stored in a server means in order to transmit them to a speech synthesis means, and synthesizing a text message in the speech synthesis means, according to the selected acoustic characteristics, into a voice message which is transmitted to the terminal.

Preferably, the invention supplements the selection of acoustic characteristics for composing a voice with a step of selecting acoustic characteristics describing a sound effect from a second catalog of acoustic characteristics pre-stored in the server means, in order to transmit them, together with the selected acoustic characteristics describing the voice, to the speech synthesis means so that the latter transmits the voice message mixed with the selected sound effect to the terminal.

Instead of selecting acoustic characteristics to describe and thus compose a voice and possibly a sound effect, the terminal user can directly select a voice from the first catalog and possibly a sound effect from the second catalog, or select a combination which is pre-stored in a third catalog, at least in the terminal, and which comprises at least one voice and possibly at least one sound effect, so that any text message is synthesized according to the acoustic characteristics of the combination.

The selection of a voice and possibly a sound effect is preferably accompanied by a selection of characteristics of a visual presentation, which may be a wallpaper and/or a facial animation, from a fourth catalog pre-stored in the server means, in order to transmit them to the terminal and display the visual presentation in the terminal in synchronism with the reproduction of the voice message in the terminal.

The invention also relates to a system for customizing the sound presentation of messages in a terminal, for implementing the method of the invention. The system is characterized in that it comprises a server means for storing acoustic characteristics describing voices, a speech synthesis means in which voices are described according to acoustic characteristics, and an application means in the terminal for selecting, in the server means, acoustic characteristics describing a voice so that the speech synthesis means synthesizes at least one text message, according to the selected acoustic characteristics, into a voice message transmitted to the terminal. Preferably, the server means also stores acoustic characteristics of sound effects to be selectively mixed with the described voice.

Other characteristics and advantages of the present invention will appear more clearly on reading the following description of several preferred embodiments of the invention, with reference to the corresponding appended drawings, in which:

- Figure 1 is a schematic block diagram of a system for the sound presentation of messages according to a preferred embodiment of the invention; and
- Figure 2 is an algorithm of a method for the sound presentation of messages according to the invention.

With reference to Figure 1, a system for the sound presentation of messages according to the invention essentially comprises a user terminal 1 provided with at least one loudspeaker or earpiece, a central sound server 2, and speech synthesis equipment 3. The system is based on a client-server architecture between the terminal 1, associated with the equipment 3, and the central sound server 2.

For example, the user terminal 1 is a personal computer or a personal digital assistant, or a smart television or radio receiver, or a fixed telephone or a mobile cellular radiotelephone. The speech synthesis equipment 3 is connected to the terminal 1 by a conventional link 4 of the wired or short-range radio type. As a variant, the equipment 3 is removably integrated, like a card, in the terminal 1. In addition, the terminal 1 is connected to the central server 2 through an access network 5 corresponding to the type of the terminal and a packet network 6 such as the internet.

The speech synthesis equipment 3 essentially comprises a buffer memory 30 for storing a text message MT to be synthesized, and preferably at least one test text TE to be synthesized; an analyzer 31 of the phonetics and prosody of the text to be synthesized; a speech synthesizer 32 proper; and a generator 33 generating an acoustic model as a function of acoustic characteristics CA delivered by the terminal 1 and supplied by the central server 2. The functional elements 30 to 33 represent the speech synthesis equipment schematically, for a better understanding of the invention, and may correspond to software modules.
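The data flow between elements 30 to 33 can be sketched as follows. This is only an illustrative sketch: the class and function names, the dictionary keys `pitch_hz` and `rate`, and the default values are assumptions that do not appear in the patent, and the analyzer and synthesizer are stand-ins that tag words rather than produce phonemes or audio.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AcousticModel:
    """Output of generator 33 (hypothetical fields)."""
    pitch_hz: float
    rate: str


def generate_model(ca: dict) -> AcousticModel:
    """Generator 33: build an acoustic model from characteristics CA."""
    return AcousticModel(
        pitch_hz=float(ca.get("pitch_hz", 120.0)),   # assumed default
        rate=str(ca.get("rate", "intermediate")),    # assumed default
    )


def analyze(text: str) -> List[str]:
    """Analyzer 31 stand-in: a real analyzer would output a phonetic and
    prosodic transcription; here the text is only split into words."""
    return text.split()


def synthesize(units: List[str], model: AcousticModel) -> List[Tuple[str, float, str]]:
    """Synthesizer 32 stand-in: tag each unit with the model parameters
    instead of producing audio samples."""
    return [(unit, model.pitch_hz, model.rate) for unit in units]


@dataclass
class SynthesisEquipment:
    """Speech synthesis equipment 3 with its buffer memory 30."""
    buffer: List[str] = field(default_factory=list)

    def store(self, text_message: str) -> None:
        self.buffer.append(text_message)

    def run(self, ca: dict) -> List[Tuple[str, float, str]]:
        model = generate_model(ca)
        text_message = self.buffer.pop(0)   # oldest stored message MT
        return synthesize(analyze(text_message), model)
```

For instance, storing a text message and calling `run({"pitch_hz": 200.0, "rate": "slow"})` yields one tagged unit per word, which stands in for the voice message MS.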

The invention relates more particularly to the third module 33, which defines an acoustic model according to the parameters, such as values, of the characteristics CA of a sound, such as a voice possibly mixed with a sound effect, in order to apply predetermined rules to this model to synthesize a text message MT, transcribed phonetically and prosodically in the analyzer 31, into a synthesized voice message MS transmitted by the synthesizer 32 to the terminal 1. According to another variant, the characteristics CA received in the generator 33 are used to select acoustic units which are concatenated according to predetermined rules in the synthesizer 32 in order to reproduce vocally a message analyzed phonetically and prosodically in the analyzer 31.

Whatever the type of speech synthesis implemented in the synthesizer 32, the speech synthesis is defined according to acoustic characteristics CA processed in the module 33 and selected by the terminal 1. As will be seen later in the description with reference to Figure 2, the terminal 1 supports a sound presentation personalization application 10 for selecting the acoustic characteristics CA of a sound to be composed, or for selecting a print of a sound described by acoustic characteristics in the central server 2.

The text messages MT to be stored temporarily in the memory 30 in order to be synthesized are supplied by the user of the terminal 1, either by entering them with the keyboard or by voice recognition in the terminal, or by reading them from the memory of the terminal if they have been pre-recorded in the terminal, or by downloading them from document servers through the networks 5 and 6. Preferably, the test message TE contained in the memory 30 is preselected by the user of the terminal 1, and is therefore known to the user, for testing a voice parameterized by the user and modeled in the speech synthesis equipment 3.

According to another variant of the system architecture, at least one item of speech synthesis equipment 3 is integrated in the central server 2 and shared by several user terminals 1, for which user identifiers IDU are respectively associated with test messages MI. The central server 2 is then analogous to an interactive voice server in which synthesized voices and their characteristics can be selected by users in order to listen to voice or multimedia messages. In this variant, the exchanges, in particular of acoustic characteristic commands and of voice message broadcasts, are carried out through the access network 5 and the packet network 6 between the terminal 1 and the server 2, and not also between the server 2 and the equipment 3 via the terminal 1 as in Figure 1.

According to another variant, the central sound server 2 is distributed over several central servers, in each of which one or more of the catalogs of files defined below can be consulted.

The central server 2 essentially comprises three catalogs of sound files, V(CV,AV), B(CB,AB) and C(CC,AC), from which a terminal user can draw in order to customize the sound presentation of voice messages reproduced in his terminal. All these files can be selected from the terminal 1 by means of the application 10.

The first catalog of files V(CV,AV) relates to voices whose voice prints have been recorded and analyzed in order to store the essential acoustic characteristics CV of these voices. For example, the acoustic characteristics describing a predetermined voice V and contained in a file of the first catalog concern the male or female sex; age, in the form of a period corresponding to childhood, adolescence, adulthood or old age; prosodic features such as successive durations of syllabic segments; emphasis, in particular stress placed on sentence constituents; the laryngeal and fundamental frequencies relating to the pitch of the voice; the speech rate or rhythm, which can be slow, fast or intermediate; the sound level expressed in decibels; etc.

The file of a predetermined voice V also contains attributes AV specific to each voice, which are optional and which concern the owner of the voice, such as a user, or a company or organization as a collective user; and/or restrictions on access to the voice file, so that it can be distributed to and used by predetermined users who are identified by identifiers IDU entered in a list UV of users authorized to use the predetermined voice, or characteristics defining a user profile that a user must present to gain access to the use of the voice; and/or a fee for the use of the voice, which may be free of charge; and any other characteristic contributing to the marketing of the predetermined voice.
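A voice file of the first catalog, with its characteristics CV, its optional attributes AV and its list UV of authorized users, could be represented as in the sketch below. The field names, the example values and the convention that an absent list means a publicly available voice are assumptions made for illustration; the profile-based restriction mentioned above is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class VoiceFile:
    name: str
    cv: dict                              # acoustic characteristics CV
    owner: Optional[str] = None           # attribute AV: owner of the voice
    authorized: Optional[Set[str]] = None  # list UV of identifiers IDU; None = public (assumption)
    fee: float = 0.0                      # attribute AV: fee, 0.0 meaning free of charge


def can_use(voice: VoiceFile, idu: str) -> bool:
    """A voice is usable when it is public or when the IDU is in its list UV."""
    return voice.authorized is None or idu in voice.authorized


def selectable_voices(catalog: List[VoiceFile], idu: str) -> List[str]:
    """Names of the voices a given user may select from the catalog."""
    return [voice.name for voice in catalog if can_use(voice, idu)]


catalog = [
    VoiceFile("adult_female_slow", {"sex": "female", "age": "adult", "rate": "slow"}),
    VoiceFile("company_voice", {"sex": "male", "age": "adult", "level_db": 60},
              owner="ExampleCorp", authorized={"0612345678"}, fee=1.5),
]
```

With this catalog, a user whose IDU is `"0612345678"` sees both voices, while any other user sees only the public one.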

The second catalog of files B(CB,AB) relates to sound effects B, which may be audio effects, special sounds or pieces of music, one or more of which can be selected by the user of the terminal 1 in order to be superimposed on the selected voice with which a text message is synthesized. Each sound effect B is defined, like the voices in the first catalog, by acoustic characteristics CB and, where appropriate, is associated with attributes AB and with a list UB of authorized users.
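Superimposing a sound effect B on a synthesized voice can be pictured as a sample-by-sample weighted sum. The patent does not specify how the mixing is done; the sketch below assumes floating-point samples in the range -1.0 to 1.0, a hypothetical `effect_gain` parameter, silence once the effect runs out, and hard clipping of the result.

```python
from typing import List


def mix(voice: List[float], effect: List[float], effect_gain: float = 0.3) -> List[float]:
    """Add the attenuated effect to the voice, sample by sample.

    The output has the length of the voice; a shorter effect is treated
    as silence once it ends, and sums are clipped to [-1.0, 1.0].
    """
    mixed = []
    for i, v in enumerate(voice):
        e = effect[i] if i < len(effect) else 0.0
        mixed.append(max(-1.0, min(1.0, v + effect_gain * e)))
    return mixed
```

Keeping the voice length as the output length reflects the idea that the effect accompanies the message rather than extending it.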

For practical downloading reasons, the various sound effects of the second catalog preferably consist of small sound files that are chained and looped. One or more chained sound effect files can be downloaded into the terminal.
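One way to read "chained and looped" is that the small files are played one after another and the chain restarts from the first file for as long as sound is needed. The sketch below follows that reading, with each file modeled as a plain list of samples; the function name and this representation are assumptions.

```python
from itertools import cycle, islice
from typing import List, Sequence


def looped_effect(segments: Sequence[Sequence[float]], n_samples: int) -> List[float]:
    """Chain the segments in order and loop the chain until n_samples are produced."""
    if not any(len(segment) > 0 for segment in segments):
        raise ValueError("at least one non-empty segment is required")
    chained = (sample for segment in cycle(segments) for sample in segment)
    return list(islice(chained, n_samples))
```

Only the short segments need to be downloaded; the terminal can then generate an effect of any duration, for example the duration of the voice message it accompanies.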

The third catalog of files C(CC,AC) relates to files of sound combinations, each of which results from acoustic characteristics CC combining the acoustic characteristics CV and CB of at least one voice V and possibly at least one sound effect B, or of several combinations of voice and possibly sound effect distributed over time. Each combination is thus defined by acoustic characteristics CC and associated with attributes AC and with a list UC of authorized users.

A terminal user can thus define an audio program that is divided into various periods during which combinations of voice and sound effect respectively customize portions of a text message MT to be synthesized. The attributes of a combination define in particular the lengths of the periods for the respective sound combinations, as well as the start time of these periods relative to the start time of a message.

Preferably, the central sound server 2 comprises a fourth catalog PV(CPV,APV) relating to visual presentations PV of text messages to be synthesized, each defined by characteristics CPV and attributes APV. The characteristics CPV of a visual presentation concern a wallpaper, or more or less animated images, or more particularly the face of a presenter's head, of which at least the eyes and the mouth are animated according to the pronunciation of the synthesized voice message by means of a facial animation engine implemented in the user terminal. The presenter's entire head, or elements such as eyes and mouths, can be chosen from the fourth catalog. As in the preceding catalogs, visual presentation attributes APV define the owner of the visual presentation PV, access restrictions associated for example with a list UPV of authorized users, or a fee relating to the selected visual presentation.
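The audio program described above, in which each period has a length and a start time relative to the start of the message, can be sketched as a small schedule lookup. The `Period` class, the use of seconds as the unit and the combination names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Period:
    start_s: float      # start time relative to the start of the message
    length_s: float     # length of the period
    combination: str    # name of the sound combination C applied in this period


def combination_at(program: List[Period], t: float) -> Optional[str]:
    """Return the combination active at time t, or None outside every period."""
    for period in program:
        if period.start_s <= t < period.start_s + period.length_s:
            return period.combination
    return None


program = [
    Period(start_s=0.0, length_s=5.0, combination="C1"),
    Period(start_s=5.0, length_s=10.0, combination="C2"),
]
```

Each period is treated as a half-open interval, so the instant at which one period ends belongs to the next one.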

As a variant, the four catalogs defined above are distributed among respective servers instead of being centralized in a single server 2, and/or are offered in versions, for example by geographical region, in order in particular to offer voices and sound effects adapted to regional or local customs and to reduce response time.

Reference is now made to Figure 2 to describe the main steps E1 to E14 of the method for customizing the sound presentation of synthesized messages MS from the user terminal 1. The terminal 1 is designated by an identifier IDU, which may comprise a telephone number or an IP (Internet Protocol) address, accompanied where appropriate by a confidential access code.

Depending on the type of terminal, the commands for the selections made during the method correspond to a press on a key of the terminal keypad, translated for example into a DTMF (Dual Tone MultiFrequency) code for a telephone or a radiotelephone, or into a command specific to a communication protocol or a graphical user interface; alternatively, they correspond to a voice command recognized by a voice recognition means included in the terminal 1 and/or the server 2. The various selections are preferably assisted by pages displayed in the terminal when it has a display or a display screen, the dialogue between the terminal 1 and the server 2 thus taking place in a known manner. As a variant, the dialogue between the terminal 1 and the central server 2 is conducted through an interactive voice server.

The catalogs consulted in the server 2 by the terminal 1 are presented as a tree, that is to say through successive menus and submenus with a return to a main menu. Depending on the personalization application implemented in the terminal 1, the user selects the acoustic characteristics of at least one voice V and/or one sound effect B and/or one combination of sounds C either directly in the central server 2, or after downloading into the terminal 1 the part of the catalogs relating to files made available to the public or to which access is authorized for this user.
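The tree-like presentation of the catalogs, driven by key presses, can be sketched as follows. This is only an illustrative sketch: the menu titles, the key assignments and the use of key "0" to return to the main menu are assumptions, not details given in the description.

```python
from dataclasses import dataclass, field


@dataclass
class Menu:
    """One node of the catalog tree; children are indexed by a keypad (DTMF) key."""
    title: str
    children: dict = field(default_factory=dict)


def navigate(root, keys):
    """Follow a sequence of key presses; '0' (assumed) returns to the main menu."""
    current = root
    for key in keys:
        if key == "0":
            current = root
        elif key in current.children:
            current = current.children[key]
        # Keys with no matching submenu are ignored in this sketch.
    return current


# Hypothetical catalog layout, for illustration only.
root = Menu("Main menu", {
    "1": Menu("Voices", {
        "1": Menu("Voice characteristics"),
        "2": Menu("Authorized voices"),
    }),
    "2": Menu("Sound effects"),
    "3": Menu("Favorite combinations"),
})
```

For example, pressing "1" then "1" reaches the voice-characteristics submenu, and a further "0" returns to the main menu.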

FIG. 2 illustrates a preferred example of the steps making up the personalization method according to the invention, for both of the above variants.

In a conventional manner, at step E1, the user at the terminal 1 opens a session of the application 10 for the sound presentation of messages to be synthesized, so that the terminal 1 calls the central server 2, whose IP address is stored in the terminal. If the application allows it, the user selects in a menu the catalogs, or categories of sounds within the catalogs, and downloads them from the server 2 into the terminal 1 in order to make the various selections in the following steps and build a personalized sound presentation. Otherwise, the following selection steps are carried out through a question-and-answer dialogue between the terminal 1 and the server 2, which builds a given sound presentation as the user makes selections. This second variant is the one referred to below.

At the next step E3, the application 10 asks the user whether he wishes to select at least one of his favorite combinations, determined and stored previously, possibly in association with the identifier IDU in the server 2, if any exist in the third catalog.

Otherwise, sound characteristics are selected by the user as described below in order to form a combination of sounds personalizing the voice presentation of text messages MT to be synthesized by the voice synthesis equipment 3.

At the next step E4, the user of the terminal 1 selects acoustic characteristics CV, validating their parameters, so as to constitute a personalized voice V in the first catalog of voice files. As a variant, instead of selecting acoustic characteristics of a voice V, the user selects a voice V from among several authorized voices in the first catalog, each designated by a name and a brief description of its acoustic characteristics. At step E4, the application optionally offers the user the possibility of personalizing the sound presentation of his messages even further by recording a predetermined voice print, for example a predetermined sentence spoken by the user.

If the user wishes to mix the voice defined by the selected characteristics CV, or the voice selected previously, with one or more sound effects B, the user selects at step E5, in a manner analogous to step E4, acoustic characteristics CB in the second catalog of sound effect files so as to determine one or more sound effects, or directly selects one or more authorized sound effects each defined by predetermined acoustic characteristics. After steps E4 and E5, the voice characteristics, and any sound effect characteristics, selected directly or indirectly are transmitted to the acoustic model generator 33.

A test text TE is then optionally selected at step E6 so that it serves as a trial for the voice synthesis in the synthesizer 32, which depends on an acoustic model defined in the generator 33 by the selected characteristics CV, before the choice of acoustic characteristics of the combination of voice and sound effects CS = CV + CB selected in the preceding steps E4 and E5 is definitively validated. The test text TE selected at step E6 may be a digitized text or combination of texts prerecorded in the terminal 1 or in the buffer memory 30 of the equipment 3, or entered directly by the user in the terminal 1, or downloaded from one or more servers of at least textual documents via the terminal 1 into the memory 30 of the equipment 3. The test text TE is preferably stored in the memory 30 of the equipment 3, in particular for test steps in subsequent sound presentations.
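One possible way to represent the combination CS = CV + CB before it is sent to the acoustic model generator 33 is sketched below. The description does not specify any data format, so the class name, the field names and the parameters `pitch_hz`, `rate` and `gain_db` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Combination:
    """A selected combination CS: one voice (CV) plus optional sound effects (CB)."""
    voice: dict                                   # CV: characteristics of voice V
    effects: list = field(default_factory=list)   # CB: one dict per sound effect B

    def characteristics(self):
        """Return CS = CV + CB as a single payload (copies, so callers cannot mutate CS)."""
        return {"CV": dict(self.voice), "CB": [dict(e) for e in self.effects]}


# Hypothetical parameter names and values, for illustration only.
cs = Combination(voice={"pitch_hz": 180, "rate": 1.1})
cs.effects.append({"name": "rain", "gain_db": -12})
```

Keeping the voice and the sound effects separate mirrors steps E4 and E5, where each is selected from its own catalog.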

At the next step E7, the test text TE read from the memory 30 is analyzed by the analyzer 31 and synthesized in the synthesizer 32 as a function, in particular, of the acoustic model defined by the voice acoustic characteristics CV selected at step E4, with optional mixing of the sound effect(s) B selected at step E5. The voice message MS produced by the synthesizer 32 is transmitted to the terminal 1 so that the user can listen to it.

If the user is not satisfied with the acoustic characteristics of the voice message produced, the application 10 offers him at step E8 the possibility of modifying, that is to say adding, removing or correcting, an acoustic characteristic CV of the described voice V or CB of the described sound effect(s) B selected at steps E4 and E5, by returning to step E4. Finally, after possibly one or more repetitions of steps E4 to E8, the terminal 1 and the server 2 store the acoustic characteristics [CV + CB] of the selected combination CS at step E9, preferably in association with the user identifier IDU.
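The test-and-revise loop of steps E7 to E9 can be sketched as follows. The callbacks `synthesize`, `is_satisfied` and `revise` stand in for the synthesizer 32, the user's judgment and the modification step E8; they, the `max_rounds` guard and the dictionary used as storage are assumptions made for this sketch.

```python
def personalize(cs, synthesize, is_satisfied, revise, store, user_id, max_rounds=10):
    """Repeat synthesis (E7) and revision (E8) until accepted, then store CS (E9)."""
    for _ in range(max_rounds):
        message = synthesize(cs)      # E7: synthesize the test text with CS
        if is_satisfied(message):     # the user listens and accepts
            break
        cs = revise(cs)               # E8: add, remove or correct a characteristic
    # E9: store CS in association with the user identifier IDU.
    # If max_rounds is exhausted, the latest revision is kept.
    store[user_id] = cs
    return cs


# Hypothetical stand-ins: the "voice message" is just a string here.
store = {}
result = personalize(
    {"pitch_hz": 120},
    synthesize=lambda cs: f"voice@{cs['pitch_hz']}Hz",
    is_satisfied=lambda msg: msg == "voice@180Hz",
    revise=lambda cs: {**cs, "pitch_hz": cs["pitch_hz"] + 30},
    store=store,
    user_id="IDU-1",
)
```

In this example the pitch is raised twice before the user accepts the result, and the accepted characteristics are stored under the user identifier.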

Optionally, at step E10, the application 10 offers the user a selection of characteristics CPV in the fourth catalog of visual presentation files in order to select, at step E101, a visual presentation such as a screen background and/or a facial animation.

At the next step E11, the terminal 1 and the server 2 store the visual presentation characteristics CPV possibly selected at step E101, in association with the combination of voice and sound effect acoustic characteristics [CV + CB] selected at steps E4 to E8. Then, at subsequent steps, in particular E121, E122 and E123, the application 10 invites the user of the terminal 1 to select one or more parameters, in particular temporal and/or documentary, personalizing the use of the combination of sounds CS composed in the preceding steps.

At step E121, the user indicates two broadcast dates for the selected combination CS. In practice, the user indicates a broadcast start date and/or a broadcast end date for the selected combination. One or more broadcast periods can thus be selected so that the selected combination is accessible during these periods. By associating such periods with various selected combinations, audio programs are built. The audiovisual programs thus built are preferably displayed in the terminal with their respective time positions. All the preceding and following time data are expressed in year, month, day, hour, minute and second.
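The accessibility check implied by the broadcast periods of step E121 might look like the sketch below. Representing a period as a `(start, end)` pair in which either bound may be `None` (reflecting "start date and/or end date") is an assumption of this sketch, as are the example dates.

```python
from datetime import datetime


def is_accessible(periods, now):
    """Return True if `now` falls within at least one broadcast period.

    Each period is a (start, end) pair; None means the bound is open.
    """
    for start, end in periods:
        if (start is None or start <= now) and (end is None or now <= end):
            return True
    return False


# Hypothetical periods, with second-level resolution as in the description.
periods = [
    (datetime(2002, 1, 23, 8, 0, 0), datetime(2002, 1, 31, 20, 0, 0)),
    (datetime(2002, 3, 1, 0, 0, 0), None),   # start date only, no end date
]
```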

At step E122, the application 10 offers the user the possibility of determining the instant at which the selected combination CS is introduced, as well as its duration, relative to the instant at which listening to a voice message MS synthesized in the equipment 3 begins, in order to build an audio program with other selected combinations. Optionally, the start instant and the broadcast duration of the selected combination CS are chosen randomly by the application 10. As a variant, several combinations are selected to form a series of combinations that is repeated periodically.
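The timing options of step E122 can be illustrated with two small helpers: one draws a random start instant and duration inside a message, the other finds which combination of a periodically repeated series is active at a given instant. Function names, the use of seconds as the unit and the example series are assumptions of this sketch.

```python
import random


def random_slot(message_duration, rng=random):
    """Pick a random start instant and duration (seconds) that fit inside the message."""
    start = rng.uniform(0, message_duration)
    duration = rng.uniform(0, message_duration - start)
    return start, duration


def active_combination(series, t):
    """Return the combination active t seconds after the start of the message.

    `series` is a list of (name, duration_s) pairs repeated periodically.
    """
    period = sum(duration for _, duration in series)
    t = t % period                    # fold t into one period of the series
    for name, duration in series:
        if t < duration:
            return name
        t -= duration


# Hypothetical series: CS1 for 5 s, then CS2 for 3 s, repeated.
series = [("CS1", 5.0), ("CS2", 3.0)]
```

With this series, the instant 14 s falls 6 s into the second repetition, so `active_combination(series, 14)` selects CS2.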

At step E123, the application 10 offers the user the possibility of associating predetermined documents with the selected combination CS. Each of these documents is identified by an identifier, which may be a name and/or an address such as a URL (Uniform Resource Locator) read from a website server.
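The association of step E123 amounts to a mapping from document identifiers to combinations. A minimal sketch, assuming an in-memory dictionary and hypothetical identifiers (the description does not say how associations are stored):

```python
from urllib.parse import urlparse

# Document identifier (name or URL) -> names of associated combinations.
associations = {}


def associate(identifier, combination_name):
    """Associate a document, identified by a name or a URL, with a combination."""
    associations.setdefault(identifier, []).append(combination_name)


def is_url(identifier):
    """Distinguish a web address from a plain document name."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical identifiers and combination names.
associate("https://example.com/news.html", "CS1")
associate("https://example.com/news.html", "CS2")
associate("weather bulletin", "CS1")
```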

The association of a document with one or more combinations may be made available to any user. After steps E121 and/or E122 and/or E123, or after step E3 when it has selected a so-called "favorite" combination already stored in the server 2 or the terminal 1 and accessible to the user, the application invites the user of the terminal 1 to choose a text message MT to listen to, so that it is transmitted to the voice synthesis equipment 3, which has received the acoustic characteristics CV and CB of the selected or favorite combination. The voice message MS resulting from the voice synthesis of the selected text message MT is listened to by the user simultaneously with any visual presentation, such as a facial animation, whose characteristics CPV were selected at step E10 and which is displayed in the terminal. Then, if the user wishes to modify acoustic and/or visual characteristics of the presentation of the synthesized message, as indicated at step E14, he again selects voice and/or sound effect acoustic characteristics, and possibly visual presentation characteristics, starting from step E4. Otherwise, the session of the application 10, at least between the terminal 1 and the equipment 3, is ended at step E15.

Claims

1 - Method for personalizing the sound presentation of messages synthesized in a terminal (1), characterized in that it comprises the steps of selecting (E4) acoustic characteristics (CV) describing a voice (V) in a first catalog of acoustic characteristics prestored in a server means (2) in order to transmit them to a voice synthesis means (3), and of synthesizing (E7) a text message (MT) in the voice synthesis means, depending on the selected acoustic characteristics, into a voice message (MS) which is transmitted to the terminal (1).

2 - Method according to claim 1, comprising a step (E5) of selecting acoustic characteristics (CB) describing a sound effect (B) in a second catalog of acoustic characteristics prestored in the server means (2) in order to transmit them, with the selected acoustic characteristics (CV) describing the voice (V), to the voice synthesis means (3) so that it transmits the voice message (MS) mixed with the selected sound effect to the terminal (1).

3 - Method according to claim 1 or 2, according to which the step of selecting (E4, E5) acoustic characteristics describing a voice (V) or a sound effect (B) is replaced by a step of directly selecting a voice in the first catalog or a sound effect in the second catalog.

4 - Method according to any one of claims 1 to 3, comprising a step (E6) of selecting the text message (TE) to be synthesized.

5 - Method according to claim 4, in which the text message to be synthesized is transmitted by the terminal (1) and stored in the voice synthesis means (3).

6 - Method according to claim 4, according to which the text message (TE) to be synthesized is selected in a text document server in order to be downloaded via the terminal (1) into the voice synthesis means (3).

7 - Method according to any one of claims 1 to 6, comprising, after listening (E7) to the voice message (MS) transmitted by the voice synthesis means (3) to the terminal (1), a step (E8) of adding, removing or correcting an acoustic characteristic (CV, CB) describing at least the voice.

8 - Method according to any one of claims 1 to 7, comprising a step (E3) of selecting at least one combination (C) stored in a third catalog at least in the terminal and comprising at least one voice (V) and possibly at least one sound effect (B), in order to synthesize any text message (MT) depending on the acoustic characteristics of the combination.

9 - Method according to any one of claims 1 to 8, comprising a selection (E10, E101) of characteristics of a visual presentation (CPV) in a fourth catalog stored in the server means (2) in order to transmit them to the terminal (1) and to display the visual presentation in the terminal in synchronism with the reproduction of the voice message (MS) in the terminal.

10 - Method according to any one of claims 1 to 9, further comprising a step (E121, E122, E123) of selecting at least one of the following parameters personalizing the use of at least the voice (V) described by the selected acoustic characteristics: broadcast date and period of the selected voice, instant of introduction and duration of the selected voice relative to the start instant of a voice message synthesized in the voice synthesis means (3), identifier of documents to be associated with the voice, combination of sounds including the selected voice, or series of combinations.

11 - Method according to any one of claims 1 to 10, comprising beforehand, in the server means (2), a definition of attributes specific at least to voices and relating to ownership and/or an access restriction and/or a remuneration for the use of voices.

12 - System for personalizing the sound presentation of synthesized messages in a terminal (1), for implementing the method according to any one of claims 1 to 11, characterized in that it comprises server means (2) for storing acoustic characteristics (CV) describing voices (V), voice synthesis means (3) in which voices are described depending on acoustic characteristics, and application means (10) in the terminal (1) for selecting in the server means (2) acoustic characteristics describing a voice so that the voice synthesis means synthesizes a text message (MT), depending on the selected acoustic characteristics, into a voice message (MS) transmitted to the terminal.

13 - System according to claim 12, wherein the server means (2) also stores acoustic characteristics (CB) of sound effects (B) to be mixed selectively with the described voice.

14 - System according to claim 12 or 13, wherein the voice synthesis means is an equipment (3) located near or integrated in the terminal (1).

15 - System according to claim 12 or 13, wherein the voice synthesis means (3) is