EP1460614A1

EP1460614A1 - Audio device (mobile telephone) for mixing a digital speech signal and a digital music signal

Info

Publication number: EP1460614A1
Application number: EP04290708A
Authority: EP
Inventors: Xavier Fourquin; Pierre Bonnard
Original assignee: Alcatel CIT SA; Alcatel SA
Current assignee: DRNC Holdings Inc
Priority date: 2003-03-21
Filing date: 2004-03-15
Publication date: 2004-09-22
Also published as: CN100490454C; FR2852778A1; US20040186707A1; CN1533120A; US7865360B2; FR2852778B1

Abstract

The device has an analog-to-digital converter (8) for converting an analog speech signal into a digital speech signal. A storage unit (10) stores a set of coded data representing a musical score comprising a set of notes. A synthesizer (3) extracts a digital music signal from the set of coded data. A mixing unit (4) mixes a portion of the digital speech signal with a portion of the digital music signal to produce a digital signal. An independent claim is also included for a telecommunication terminal.

Description

The present invention relates to an audio device for modifying the voice of the user of the audio device, and also relates to a telecommunication terminal for modifying the voice transmitted during a telephone call.

Even though speech transmission remains the essential element of mobile telephony, manufacturers seek to differentiate their products by offering new services that are attractive and entertaining for the consumer. Examples include games, services based on voice recognition, and the proliferation of ringtones.

These new services often add cost to the phone because they require additional software or hardware elements.

The present invention aims to provide an audio device offering a service for modifying the voice transmitted by the user of the terminal, in particular during a telephone call, this service being attractive and entertaining and being implemented simply and economically.

To this end, the present invention proposes an audio device comprising:

means for input, by the user of said audio device, of an analog speech signal,
a converter for converting said analog speech signal into a digital speech signal, said digital speech signal comprising at least one fundamental frequency,
means for storing a set of coded data representing a musical score, said musical score comprising a set of notes, each note being defined by a fundamental frequency, a duration and an instrument which plays said note,
means for extracting a digital music signal from said set of coded data,

characterized in that said audio device includes means for mixing a first portion of said digital speech signal and a first portion of said digital music signal to produce a digital signal, referred to as the sung signal.

Thanks to the invention, the voice can follow the musical score.

Advantageously, said audio device comprises a digital signal processor (DSP) comprising said means for mixing said first portions of the digital speech and music signals.

Advantageously, said means for mixing said first portions of said digital speech and music signals include means for replacing the fundamental frequency of said speech signal with the fundamental frequency associated with a note of said music signal.

Advantageously, the replacement of the fundamental frequency of said speech signal with the fundamental frequency associated with a note of said music signal is carried out for a duration substantially equal to the duration of said note.

Advantageously, said audio device comprises means for adding to said sung digital signal a second portion of said digital speech signal.

Advantageously, said audio device comprises means for adding to said sung digital signal a second portion of said digital music signal.

Advantageously, said means for mixing said first portions of said digital speech and music signals include means for replacing at least one harmonic frequency of the fundamental frequency of said speech signal with a harmonic frequency of the fundamental frequency associated with a note of said music signal.

Advantageously, said audio device comprises means for discriminating a consonant from a vowel in said digital speech signal, said discrimination means activating said means for mixing said first portions of the digital speech and music signals while said vowel is detected.

Thus, the speech and music signals are mixed after a consonant, and therefore on a vowel. This detection can be carried out using sliding-window envelope detection and spectral analysis.

Advantageously, said audio device comprises a voice activity detector controlling said means for mixing said first portions of the digital speech and music signals.

Thus, it can be decided to modify the fundamental frequency of the voice only after a decrease in the amplitude of said voice signal.

Advantageously, said audio device comprises a vocoder, said vocoder performing coding of said sung signal.

The present invention also proposes a telecommunication terminal comprising an audio device according to one of the preceding characteristics.

This service can be implemented on a telecommunication terminal simply and economically, for example by using the phone's DSP (Digital Signal Processor).

In addition, the digital speech and music signals can be mixed in real time, so that the voice is modified and then transmitted directly during a telephone call.

Advantageously, said audio device comprises means for transmitting said sung digital signal to another terminal in real time.

Other characteristics and advantages of the present invention will appear in the following description of an embodiment of the invention, given by way of illustration and in no way limiting.

In the following figure:

FIG. 1 schematically represents a telecommunication terminal according to the invention.

FIG. 1 represents a telecommunication terminal 1 according to the invention, such as a mobile telephone.

Terminal 1 includes:

a digital signal processor (DSP) 2,
a microphone 11,
a loudspeaker 12,
an analog-to-digital converter 8,
a digital-to-analog converter 9,
an element 10 for storing musical scores defined in a predetermined coding format.

The musical scores can be in a music coding format such as MIDI, Yamaha® SMAF, polyphonic EMR R5, IrDA iMelody from IrMC (Infrared Mobile Communications), or another vector description format for music.

Each note in the musical score is characterized by its pitch, i.e. its fundamental frequency, and its timbre, i.e. the harmonics of the fundamental frequency.

The coded score comprises a set of (note, duration) pairs. The notes are interpreted in duration and frequency, each note corresponding to a start time, an end time and several frequencies (the fundamental frequency and harmonic frequencies).
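The interpretation of (note, duration) pairs described above can be sketched as follows. This is only an illustration: the use of MIDI note numbers, the equal-tempered tuning with A4 at 440 Hz, the number of harmonics, and the names `NoteEvent`, `midi_to_hz` and `build_timeline` are assumptions, not details given in the description.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    start_s: float         # start time of the note, in seconds
    end_s: float           # end time of the note, in seconds
    frequencies_hz: list   # fundamental frequency followed by its harmonics


def midi_to_hz(midi_note: int) -> float:
    """Equal-tempered pitch with A4 (MIDI note 69) at 440 Hz (assumed tuning)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def build_timeline(score, n_harmonics=3):
    """Turn (midi_note, duration_s) pairs into timed note events."""
    timeline = []
    t = 0.0
    for midi_note, duration_s in score:
        f0 = midi_to_hz(midi_note)
        # Harmonic k sits at k times the fundamental frequency.
        freqs = [f0 * k for k in range(1, n_harmonics + 1)]
        timeline.append(NoteEvent(t, t + duration_s, freqs))
        t += duration_s
    return timeline


# Hypothetical three-note score: A4, B4, C5.
score = [(69, 0.5), (71, 0.25), (72, 0.25)]
timeline = build_timeline(score)
```

Each event then carries the start time, end time and frequency set that a mixer would need for the duration of the note.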

The converters 8 and 9 belong, for example, to the same CODEC 13 (coder-decoder).

Processor 2 includes:

a synthesizer 3,
signal mixing means 4,
signal adding means 5,
a vocoder 6.

The vocoder 6 is, for example, an AMR (Adaptive Multi-Rate) vocoder performing source coding of the 3GPP TS 26.071 AM type.

The sound of the voice is picked up by the microphone 11. The sound pressure is converted into an analog electrical signal in the frequency band [300-3400 Hz]. This analog signal is divided into contiguous intervals of 20 ms. Each interval is digitized by the analog-to-digital converter 8.

A digital speech signal S1 is thus obtained in the form of 20 ms frames.
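As a rough illustration of the 20 ms framing, the sketch below splits a sample stream into contiguous frames. The 8 kHz sampling rate is an assumption (a common narrowband telephony rate that is not stated in the description), and `split_into_frames` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 8000   # assumption: narrowband telephony sampling rate
FRAME_MS = 20           # frame duration given in the description


def split_into_frames(samples, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Split a sample sequence into contiguous, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 160 samples at 8 kHz
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


# 400 dummy samples give two full frames; the last 80 samples are dropped.
samples = list(range(400))
frames = split_into_frames(samples)
```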

Likewise, the synthesizer 3 extracts a digital music signal S2 in the form of 20 ms frames corresponding to a score stored in the storage element 10.

A proportion X% of the signal S1 and a proportion Y% of the signal S2 are processed by the signal mixing means 4.

The mixing means 4 thus replace the fundamental frequency and the harmonics of the voice signal with the fundamental frequency and the harmonics of each note of the music signal for the duration of that note. This substitution is done in real time as the sampled voice arrives, so that the voice follows the successive frequencies associated with the notes of the score.
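One simple way to picture this substitution is additive resynthesis: keep the amplitude measured for each voice harmonic, but place harmonic k at k times the note's fundamental instead of the voice's. The sketch below illustrates that idea only. It is not presented as the method of the mixing means 4, it assumes the harmonic amplitudes have already been estimated, and the 8 kHz rate and the name `resynthesize_at_note` are assumptions.

```python
import math


def resynthesize_at_note(harmonic_amps, note_f0_hz, n_samples=160,
                         sample_rate_hz=8000, start_sample=0):
    """Rebuild one frame with the voice's harmonic amplitudes at the note's pitch.

    harmonic_amps[k-1] is the amplitude of harmonic k of the voice (assumed known).
    start_sample keeps the phase continuous from one frame to the next.
    """
    frame = []
    for n in range(start_sample, start_sample + n_samples):
        t = n / sample_rate_hz
        frame.append(sum(
            amp * math.sin(2.0 * math.pi * k * note_f0_hz * t)
            for k, amp in enumerate(harmonic_amps, start=1)
        ))
    return frame


# A voice frame with three harmonics, re-pitched to a 500 Hz note.
frame = resynthesize_at_note([1.0, 0.5, 0.25], note_f0_hz=500.0)
```

Calling the function once per 20 ms frame, with `note_f0_hz` taken from the current note, would keep the output on the score's pitch for the duration of that note.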

Using a digital filter, the speech is broken down into a succession of noises (consonants) and sinusoidal signals (vowels), each detected as such by its waveform. At the output of this filter, a proportion Y% of a musical sinusoidal signal derived from the signal S2 is substituted for a proportion X% of a sinusoidal speech signal.

A sung digital signal S3 is thus obtained at the output of the mixing means 4.

To preserve the intelligibility of the voice, a proportion (100-X)% of the original digital voice signal S1 is kept and added to the signal S3 by the signal adding means 5.

Likewise, a proportion (100-Y)% of the original digital music signal S2 can be added to S3 via the adding means 5.
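The role of the adding means 5 can be illustrated with a sample-by-sample weighted sum. This is a sketch under assumptions: S3 is taken as already computed by the mixer, the three frames are assumed to be time-aligned and of equal length, and `add_signals` is a hypothetical name.

```python
def add_signals(s3, s1, s2, x_pct, y_pct, add_music=True):
    """Add (100-X)% of the voice S1 and, optionally, (100-Y)% of the music S2 to S3."""
    dry_voice = (100.0 - x_pct) / 100.0
    dry_music = (100.0 - y_pct) / 100.0 if add_music else 0.0
    return [
        sung + dry_voice * voice + dry_music * music
        for sung, voice, music in zip(s3, s1, s2)
    ]


# With X = 75 and Y = 50, a quarter of the voice and half of the music are added to S3.
s4 = add_signals(s3=[1.0, 1.0], s1=[2.0, 4.0], s2=[10.0, 20.0], x_pct=75, y_pct=50)
```

Lowering X keeps more of the unmodified voice in the output, which is the intelligibility trade-off the description mentions.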

The mixing means 4 and the adding means 5 are software means integrated into the processor 2.

The mixed and summed signal S4 at the output of the adding means 5 is then coded by the vocoder 6 and transmitted to the other party. The signal S1, modified to follow the score, is thus transmitted in real time.

The coded signal can also be stored in a file in a format of the AMR IETF type. This file can then be sent to another terminal, which can be, for example, a mobile terminal or a personal computer.

The signal S4 can also be sent to the digital-to-analog converter 9 and then played on the loudspeaker 12.

Other functions, not shown, can be added to the processor.

It can indeed be useful not to replace the fundamental frequency and the harmonics of the voice signal with the fundamental frequency and the harmonics of a note of the music signal when the voice is on a consonant corresponding to a "glottal" sound. The terminal can therefore include sliding-window envelope detection means for detecting a consonant in the digital speech signal. The mixing means are then activated only at the end of this consonant.

These detection means use an FFT (Fast Fourier Transform) spectral analysis function, which behaves like a filter bank. It makes it possible either to detect a power peak among the frequencies making up the line spectrum, said power peak corresponding to the fundamental frequency of a vowel, or to detect the absence of a power peak, in which case, if a signal is nevertheless present, the signal is noise corresponding to a consonant.
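The peak test described above can be sketched as follows. A naive DFT stands in for the FFT so that the example needs only the standard library; both produce the same spectrum. The peak-to-mean threshold, the silence threshold and the names `power_spectrum` and `classify_frame` are assumptions chosen for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def classify_frame(frame, peak_ratio=10.0, silence_power=1e-6):
    """Label a frame 'vowel' (clear spectral peak), 'consonant' (noise) or 'silence'."""
    power = power_spectrum(frame)[1:]       # ignore the DC bin
    total = sum(power)
    if total / len(frame) < silence_power:  # no signal present
        return "silence"
    mean = total / len(power)
    # A bin far above the average is treated as the power peak of a vowel.
    return "vowel" if max(power) > peak_ratio * mean else "consonant"


# A pure 400 Hz tone sampled at 8 kHz (160 samples) has one dominant bin.
tone = [math.sin(2.0 * math.pi * 400.0 * i / 8000.0) for i in range(160)]
label = classify_frame(tone)
```

A frame with a flat spectrum, such as a single click, has no dominant bin and is labelled as a consonant by the same test.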

In addition, the vocoder 6 of the terminal includes a voice activity detector (VAD), which makes it possible to interrupt radio transmission in the absence of a voice signal. The terminal according to the invention can advantageously use such a detector to control the mixing means. Thus, when the amplitude of the voice signal tends towards zero, the VAD can force the mixing means to move to the next note in the score. The VAD operates in an all-or-nothing manner. Thus, during a sufficiently long silence in the voice signal, a command can be sent to the mixer 4 so that one can either continue to follow the score by supplying only a part of the digital music signal ((100-Y)% of signal S2 in Figure 2) on the sung digital signal, or decide to introduce a silence on the sung digital signal and resume following the score when voice activity resumes.
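The control logic described above can be pictured as a small per-frame state machine. The sketch below is one possible reading, not the patented implementation: the all-or-nothing VAD output is modelled as one boolean per 20 ms frame, the number of frames that counts as a long silence is an arbitrary assumption, the intermediate "hold" mode for short pauses is an added simplification, and `follow_score` is a hypothetical name.

```python
def follow_score(vad_flags, n_notes, long_silence_frames=5, fill_with_music=True):
    """Return one (note_index, mode) decision per frame from all-or-nothing VAD flags."""
    decisions = []
    note = 0
    silent_run = 0
    for active in vad_flags:
        if active:
            silent_run = 0
            mode = "sung"                    # the mixer is active on the current note
        else:
            silent_run += 1
            if silent_run == 1 and note < n_notes - 1:
                note += 1                    # voice just dropped: move to the next note
            if silent_run >= long_silence_frames:
                # Long silence: either music only, or a silence on the output.
                mode = "music_only" if fill_with_music else "silence"
            else:
                mode = "hold"                # short pause: wait for the voice (assumption)
        decisions.append((note, mode))
    return decisions


flags = [True, True, False, True, False, False, False]
decisions = follow_score(flags, n_notes=3, long_silence_frames=3)
```

Setting `fill_with_music=False` selects the second option in the text, where a silence is inserted and the score is picked up again when the voice returns.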

Of course, the invention is not limited to the embodiment which has just been described.

In particular, the AMR vocoder described can be replaced by any type of vocoder using source coding, such as a vocoder performing RPE-LTP coding in accordance with the GSM 06.10 standard, or ETS 300 726 GSM EFR (Enhanced Full Rate).

Claims

Audio device comprising:

- means (11) for input by the user of said audio device (1) of an analog speech signal,

- a converter (8) for converting said analog speech signal into a digital speech signal (S1), said digital speech signal comprising at least one fundamental frequency,

- means (10) for storing a set of coded data representing a musical score, said musical score comprising a set of notes, each note being defined by a fundamental frequency, a duration and an instrument which plays said note,

- means (3) for extracting from said set of coded data a digital music signal (S2),

characterized in that said audio device (1) includes means (4) for mixing a first portion of said digital speech signal (S1) and a first portion of said digital music signal (S2) to produce a digital signal (S3), referred to as the sung signal.

Audio device according to one of the preceding claims, characterized in that said audio device comprises a digital signal processor (DSP) (2) comprising said means (4) for mixing said first portions of the digital speech and music signals (S1, S2).

Audio device according to one of the preceding claims, characterized in that said means (4) for mixing said first portions of said digital speech and music signals (S1, S2) include means for replacing the fundamental frequency of said speech signal by the fundamental frequency associated with a note of said music signal.

Audio device according to the preceding claim, characterized in that the replacement of the fundamental frequency of said speech signal by the fundamental frequency associated with a note of said music signal is carried out for a duration substantially equal to the duration of said note.

Audio device according to one of the preceding claims, characterized in that the said audio device comprises means (5) for adding to the said sung digital signal (S3) a second portion of the said digital speech signal.

Audio device according to one of the preceding claims, characterized in that the said audio device comprises means (5) for adding to the said sung digital signal a second portion of the said digital music signal.

Audio device according to one of the preceding claims, characterized in that said means (4) for mixing said first portions of said digital speech and music signals (S1, S2) include means for replacing at least one harmonic frequency of the fundamental frequency of said speech signal by a harmonic frequency of the fundamental frequency associated with a note of said music signal.

Audio device according to one of the preceding claims, characterized in that said audio device comprises means for discriminating a consonant from a vowel in said digital speech signal, said discrimination means activating said means for mixing said first portions of the digital speech and music signals during the detection of said vowel.

Audio device according to one of the preceding claims, characterized in that said audio device comprises a voice activity detector controlling said means for mixing said first portions of the digital speech and music signals.

Audio device according to one of the preceding claims, characterized in that said audio device comprises a vocoder (6), said vocoder performing coding of said sung signal.

Telecommunication terminal (1) characterized in that it comprises an audio device according to one of the preceding claims.

Telecommunication terminal (1) according to the preceding claim characterized in that said terminal (1) comprises means (2) for transmitting said sung digital signal (S3) in real time to another terminal.