EP1606792B1 - Method for analyzing the fundamental frequency, and method and device for voice conversion using it - Google Patents
Method for analyzing the fundamental frequency, and method and device for voice conversion using it
- Publication number
- EP1606792B1 (application EP04716265A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- fundamental frequency
- spectral
- voice
- samples
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Definitions
- The present invention relates to a method for analyzing fundamental frequency information contained in voice samples, and to a method and system for voice conversion using this analysis method.
- The production of speech may involve the vibration of the vocal cords, which manifests itself in the speech signal as a periodic structure whose period, called the fundamental period, has as its inverse the fundamental frequency, or "pitch".
- Auditory rendering is essential, and to obtain acceptable quality it is necessary to master the parameters related to prosody, among them the fundamental frequency.
- The method of the invention, shown in figure 1, is implemented using a database of voice samples containing natural speech sequences.
- The method begins with a step 2 of analyzing the samples grouped into frames, in order to obtain, for each sample frame, information relating to the spectrum, in particular to the spectral envelope, and information relating to the fundamental frequency.
- This analysis step 2 is based on modeling the sound signal as the sum of a harmonic signal and a noise signal, according to a model commonly called HNM (Harmonic plus Noise Model).
- The described embodiment is based on a representation of the spectral envelope by the discrete cepstrum.
- A cepstral representation makes it possible to separate, in the speech signal, the component due to the vocal tract from the component due to the source, the latter corresponding to the vibrations of the vocal cords and being characterized by the fundamental frequency.
- Step 2 of analysis comprises a substep 4 of modeling each voice signal frame as a harmonic part representing the periodic component of the signal, consisting of a sum of L harmonic sinusoids of amplitude A_l and phase φ_l, and a noise part representing the friction noise and the variation of the glottal excitation.
- h(n) thus represents the harmonic approximation of the signal s(n).
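With these definitions, the harmonic-plus-noise decomposition can be written compactly as below; this is a reconstruction from the surrounding text (the original equation is not reproduced on this page), with F_s denoting the sampling frequency and f_0 the fundamental frequency:

```latex
s(n) = h(n) + b(n), \qquad
h(n) = \sum_{l=1}^{L} A_l \cos\!\left( 2\pi\, l\, \frac{f_0}{F_s}\, n + \phi_l \right)
```

where b(n) is the noise part of the frame.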
- Step 2 then comprises a substep 5 of estimating, for each frame, frequency parameters, in particular the fundamental frequency, for example by means of an autocorrelation method.
- This HNM analysis also delivers the maximum voicing frequency.
- Alternatively, this frequency can be set arbitrarily or estimated by other known means.
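A minimal sketch of such an autocorrelation-based fundamental frequency estimate is given below; the function name, the search range and the 0.3 voicing threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def estimate_f0_autocorr(frame, fs, f0_min=50.0, f0_max=400.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns 0.0 for frames judged unvoiced (weak periodicity).
    """
    frame = frame - np.mean(frame)
    # Full autocorrelation, keeping non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0
    ac /= ac[0]  # normalize so that lag 0 has value 1
    lag_min = int(fs / f0_max)               # smallest plausible period
    lag_max = min(int(fs / f0_min), len(ac) - 1)
    if lag_min >= lag_max:
        return 0.0
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    # Simple voicing decision: require a clear periodicity peak.
    if ac[best_lag] < 0.3:
        return 0.0
    return fs / best_lag
```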
- This substep 5 is followed by a substep 6 of analysis of each frame synchronized on its fundamental frequency, which makes it possible to estimate the parameters of the harmonic part as well as the noise parameters of the signal.
- In the described embodiment, this synchronized analysis corresponds to determining the harmonic parameters by minimizing a weighted least squares criterion between the complete signal, stripped of the estimated noise signal, and its corresponding harmonic decomposition.
- w(n) is the analysis window and T_i is the fundamental period of the current frame.
- The analysis window is centered on the pitch mark of the fundamental period and its duration is twice this period.
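Under these definitions, the weighted least squares criterion of substep 6 can plausibly be written as follows, with t_i the pitch mark of frame i; this is a reconstruction under the stated assumptions, not the patent's exact notation:

```latex
E = \sum_{n = t_i - T_i}^{t_i + T_i} w^2(n)\,\bigl(s(n) - h(n)\bigr)^2
```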
- The analysis step 2 finally comprises a substep 7 of estimating the parameters of the spectral envelope of the signal by using, for example, a regularized discrete cepstrum method and a Bark-scale transformation, so as to reproduce the properties of the human ear as faithfully as possible.
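The sketch below illustrates one way to fit a regularized discrete cepstrum to the harmonic amplitudes on a Bark-warped frequency axis; the Bark approximation and the k²-weighted regularizer are common choices assumed here for illustration, not necessarily those of the patent:

```python
import numpy as np

def hz_to_bark(f_hz):
    # Traunmüller-style Bark approximation; one common choice among several.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def discrete_cepstrum(freqs_hz, amps, order=20, fs=16000.0, lam=1e-4):
    """Regularized discrete-cepstrum fit to one frame's harmonic amplitudes.

    freqs_hz, amps: harmonic frequencies (Hz) and linear amplitudes.
    Returns order+1 cepstral coefficients c_0..c_order.
    """
    # Bark-warp the harmonic frequencies and renormalize to [0, 0.5].
    bark = hz_to_bark(np.asarray(freqs_hz, dtype=float))
    bark_nyq = hz_to_bark(fs / 2.0)
    w = np.clip(0.5 * bark / bark_nyq, 0.0, 0.5)
    # Basis: log|S(w_l)| ~ c_0 + 2 * sum_k c_k cos(2*pi*k*w_l)
    k = np.arange(order + 1)
    M = np.cos(2.0 * np.pi * np.outer(w, k))
    M[:, 1:] *= 2.0
    target = np.log(np.maximum(np.asarray(amps, dtype=float), 1e-12))
    # Regularization penalizing high-order coefficients (smooth envelope).
    R = lam * np.diag(k.astype(float) ** 2)
    c = np.linalg.solve(M.T @ M + R, M.T @ target)
    return c
```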
- The analysis step 2 delivers, for each frame of rank n of speech signal samples, a scalar denoted x_n containing the fundamental frequency information and a vector denoted y_n containing the spectrum information in the form of a sequence of cepstral coefficients.
- F0_avg corresponds to the average of the fundamental frequency values over the entire analyzed database.
- This normalization (step 10) rescales the variations of the fundamental frequency scalars so as to make them coherent with the scale of variation of the cepstral coefficients.
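The text only states that each frame's fundamental frequency is normalized against F0_avg; a log-ratio scaling is one plausible realization, sketched below as an assumption:

```python
import numpy as np

def normalize_f0(f0_values):
    """Normalize per-frame F0 against the corpus mean (one plausible scheme).

    The patent only specifies normalization by the average over the database;
    a log-ratio is a common way to bring the F0 range in line with the scale
    of the cepstral coefficients.
    """
    f0 = np.asarray(f0_values, dtype=float)
    voiced = f0 > 0
    f0_avg = f0[voiced].mean()          # F0_avg over the analyzed database
    x = np.zeros_like(f0)
    x[voiced] = np.log(f0[voiced] / f0_avg)
    return x, f0_avg
```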
- The normalization step 10 is followed by a step 20 of determining a model, in the form of a Gaussian mixture model (GMM), representing the common cepstrum and fundamental frequency characteristics of all the samples analyzed.
- N(z; μ_i, Σ_i) is the probability density of the normal law with mean μ_i and covariance matrix Σ_i, and the coefficients α_i are the mixture coefficients.
- The coefficient α_i corresponds to the a priori probability that the random variable z is generated by the i-th Gaussian of the mixture.
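Gathering these definitions, the GMM models the joint variable z = (y, x) (cepstrum and normalized fundamental frequency) as a mixture of Q Gaussian densities; Q is introduced here only as notation for the number of components:

```latex
p(z) = \sum_{i=1}^{Q} \alpha_i\, \mathcal{N}(z;\, \mu_i,\, \Sigma_i),
\qquad \sum_{i=1}^{Q} \alpha_i = 1
```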
- Step 20 then comprises a substep 24 of estimating the GMM parameters (α, μ, Σ) of the density p(z).
- This estimation can be performed, for example, using a conventional algorithm of the "EM" (Expectation-Maximization) type, an iterative method yielding a maximum likelihood estimate of the model given the speech sample data.
- The initial parameters of the GMM model are obtained using a conventional vector quantization technique.
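As a sketch of this estimation step, the EM fit with a vector-quantization-style initialization could look as follows using scikit-learn's GaussianMixture; the use of scikit-learn and the 64-component choice are assumptions for illustration (the patent only specifies EM with a maximum likelihood criterion):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(x, y, n_components=64, seed=0):
    """Fit a joint GMM over (cepstrum, normalized F0) pairs with EM.

    x: (N,) normalized F0 scalars; y: (N, p) cepstral vectors.
    init_params="kmeans" plays the role of the vector-quantization
    initialization mentioned in the text.
    """
    z = np.column_stack([y, x])        # joint observation z = (y, x)
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",
        init_params="kmeans",          # VQ-style initialization
        max_iter=200,
        random_state=seed,
    )
    gmm.fit(z)                         # EM: maximum-likelihood estimation
    return gmm
```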
- The model determination step 20 thus delivers the parameters of a mixture of Gaussian densities representative of the common characteristics of the spectra, represented by the cepstral coefficients, and of the fundamental frequencies of the analyzed voice samples.
- The method then comprises a step 30 of determining, from the model and the voice samples, a function for predicting the fundamental frequency solely as a function of the spectrum information provided by the cepstrum of the signal.
- This prediction function is determined from an estimator of the realization of the fundamental frequency given the cepstrum of the voice samples, formed, in the described embodiment, by the conditional expectation.
- Step 30 comprises a substep 32 of determining the conditional expectation of the fundamental frequency given the spectrum information provided by the cepstrum.
- P_i(y) corresponds to the posterior probability that the cepstral vector y is generated by the i-th component of the Gaussian mixture of the model, defined in step 20 by its covariance matrix Σ_i and the mean μ_i of its normal law.
- The determination of the conditional expectation thus makes it possible to obtain the function for predicting the fundamental frequency from the cepstrum information.
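For a joint GMM over z = (y, x), this conditional expectation has the standard closed form below, where each mean μ_i and covariance Σ_i is partitioned into its y and x blocks; this is the usual GMM regression formula, reconstructed here rather than copied from the patent:

```latex
F(y) = \mathbb{E}[x \mid y]
     = \sum_{i=1}^{Q} P_i(y)\left[ \mu_i^{x}
       + \Sigma_i^{xy}\bigl(\Sigma_i^{yy}\bigr)^{-1}\bigl(y - \mu_i^{y}\bigr) \right],
\qquad
P_i(y) = \frac{\alpha_i\, \mathcal{N}\bigl(y;\, \mu_i^{y},\, \Sigma_i^{yy}\bigr)}
              {\sum_{j=1}^{Q} \alpha_j\, \mathcal{N}\bigl(y;\, \mu_j^{y},\, \Sigma_j^{yy}\bigr)}
```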
- Alternatively, the estimator implemented during step 30 may be a maximum a posteriori criterion, called "MAP", which corresponds to computing the expectation only for the mixture component that best represents the source vector.
- The analysis method of the invention thus makes it possible, from the model and the voice samples, to obtain a function predicting the fundamental frequency solely as a function of spectrum information, provided in the described embodiment by the cepstrum.
- Such a prediction function then makes it possible to determine the value of the fundamental frequency of a speech signal solely on the basis of spectrum information of this signal, thus allowing a relevant prediction of the fundamental frequency, especially for sounds that are not in the analyzed voice samples.
- Voice conversion consists of modifying the voice signal of a reference speaker called “source speaker” so that the signal produced appears to have been spoken by another speaker named “target speaker”.
- This method is implemented from a database of voice samples uttered by the source speaker and the target speaker.
- Such a method includes a step 50 of determining a function transforming the spectral characteristics of the source speaker's speech samples so that they resemble the spectral characteristics of the target speaker's speech samples.
- This step 50 is based on an HNM-type analysis making it possible to determine the relationships existing between the characteristics of the spectral envelopes of the speech signals of the source and target speakers.
- Step 50 comprises a substep 52 of modeling the voice samples according to an HNM model, as the sum of a harmonic signal and a noise signal.
- Substep 52 is followed by a substep 54 of alignment between the source and target signals using, for example, a conventional alignment algorithm called "DTW" (Dynamic Time Warping).
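A bare-bones DTW over per-frame cepstral vectors might look like the following; this is a textbook variant with Euclidean local cost, not the specific implementation used in the patent:

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two cepstral sequences with plain DTW (Euclidean local cost).

    src: (n, p) array, tgt: (m, p) array. Returns the list of (i, j) index
    pairs on the optimal warping path, used to pair source and target frames.
    """
    n, m = len(src), len(tgt)
    # Local distance matrix between all frame pairs.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    # Accumulated cost with the usual three predecessors.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the end of both sequences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```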
- Step 50 then comprises a substep 56 of determining a model, such as a GMM-type model, representing the common characteristics of the spectra of the speech samples of the source and target speakers.
- In the described embodiment, a 64-component GMM model and a single vector containing the cepstral parameters of the source and the target are used, so that a spectral transformation function can be defined, corresponding to an estimator of the target spectral characteristics given those of the source.
- The estimator may also be formed by a maximum a posteriori criterion.
- The function thus defined makes it possible to modify the spectral envelope of a speech signal originating from the source speaker in order to make it resemble the spectral envelope of the target speaker.
- The GMM model parameters representing the common spectral characteristics of the source and the target are initialized, for example, using a vector quantization algorithm.
- The analysis method of the invention is implemented during a step 60 of analyzing voice samples of the target speaker only.
- The analysis step 60 makes it possible to obtain, for the target speaker, a function predicting the fundamental frequency solely as a function of spectral information.
- The conversion method then comprises a step 65 of analyzing a voice signal to be converted, pronounced by the source speaker, this signal being different from the voice signals used during steps 50 and 60.
- This analysis step 65 is performed, for example, using a decomposition according to the HNM model, delivering spectrum information in the form of cepstral coefficients, fundamental frequency information, as well as phase information and the maximum voicing frequency.
- This step 65 is followed by a step 70 of transforming the spectral characteristics of the voice signal to be converted, by applying the transformation function determined in step 50 to the cepstral coefficients delivered in step 65.
- This step 70 makes it possible in particular to modify the spectral envelope of the voice signal to be converted.
- Each sample frame of the signal to be converted from the source speaker is thus associated with transformed spectral information whose characteristics are similar to the spectral characteristics of the samples of the target speaker.
- The conversion method then comprises a step 80 of predicting the fundamental frequency for the voice samples of the source speaker, by applying the prediction function determined according to the method of the invention during step 60 solely to the transformed spectral information associated with the voice signal to be converted.
- The voice samples of the source speaker being associated with transformed spectral information whose characteristics are similar to those of the target speaker, the prediction function defined in step 60 makes it possible to obtain a relevant prediction of the fundamental frequency.
- The conversion method then comprises a step 90 of synthesizing the output signal, produced in the described example by an HNM-type synthesis, which directly delivers the converted voice signal from the transformed spectral envelope information provided by step 70, the predicted fundamental frequency information from step 80, and the phase and maximum voicing frequency information output by step 65.
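Putting steps 65 to 90 together, the conversion of one utterance could be sketched as below; hnm_analyze and hnm_synthesize are hypothetical stand-ins for an HNM analysis/synthesis implementation (not a real library API), and transform_fn / predict_f0_fn are the functions produced by steps 50 and 60:

```python
def convert_voice(signal, fs, transform_fn, predict_f0_fn):
    """End-to-end sketch of the conversion pipeline (steps 65 to 90).

    hnm_analyze / hnm_synthesize are hypothetical helpers standing in for
    an HNM analysis/synthesis implementation.
    """
    # Step 65: HNM analysis of the source utterance.
    cepstra, f0, phases, max_voiced_freq = hnm_analyze(signal, fs)
    # Step 70: map the spectral envelope toward the target speaker.
    cepstra_t = [transform_fn(c) for c in cepstra]
    # Step 80: predict F0 from the *transformed* spectra only.
    f0_pred = [predict_f0_fn(c) for c in cepstra_t]
    # Step 90: HNM synthesis of the converted signal.
    return hnm_synthesize(cepstra_t, f0_pred, phases, max_voiced_freq, fs)
```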
- The conversion method implementing the analysis method of the invention thus makes it possible to obtain a voice conversion performing spectral modifications as well as a fundamental frequency prediction, so as to obtain a good quality auditory rendering.
- The efficiency of such a method can be evaluated from identical voice samples spoken by the source speaker and the target speaker.
- The voice signal spoken by the source speaker is converted using the method as described, and the similarity of the converted signal to the signal spoken by the target speaker is evaluated.
- This resemblance is calculated as the ratio between the acoustic distance separating the converted signal from the target signal and the acoustic distance separating the target signal from the source signal.
- The ratio obtained for a signal converted using the method of the invention is on the order of 0.3 to 0.5.
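In other words, the reported figure corresponds to the normalized distance below, where d is the chosen acoustic distance; a value of 0 would mean a perfect match with the target, while 1 would mean no improvement over the unconverted source:

```latex
r = \frac{d(\text{converted},\, \text{target})}{d(\text{source},\, \text{target})}
\approx 0.3 \text{ to } 0.5
```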
- Figure 3 shows a functional block diagram of a voice conversion system implementing the method described with reference to figure 2.
- This system uses as input a database 100 of voice samples uttered by the source speaker and a database 102 containing at least the same voice samples uttered by the target speaker.
- The system comprises a module 104 for determining a function transforming spectral characteristics of the source speaker into spectral characteristics of the target speaker.
- This module 104 is adapted to implement step 50 of the method as described with reference to figure 2, and thus allows the determination of a transformation function of the spectral envelope.
- The system also comprises a module 106 for determining a fundamental frequency prediction function as a function solely of information relating to the spectrum.
- The module 106 receives for this purpose the voice samples of the target speaker only, contained in the database 102.
- The module 106 is adapted to implement step 60 of the method described with reference to figure 2, corresponding to the analysis method of the invention as described with reference to figure 1.
- The transformation function delivered by the module 104 and the prediction function delivered by the module 106 are stored for later use.
- The voice conversion system receives as input a voice signal 110 corresponding to a speech signal spoken by the source speaker and intended to be converted.
- The signal 110 is fed into a signal analysis module 112 implementing, for example, an HNM-type decomposition and making it possible to extract from the signal 110 spectrum information in the form of cepstral coefficients, and fundamental frequency information.
- The module 112 also delivers phase information and the maximum voicing frequency obtained by applying the HNM model.
- The module 112 thus implements step 65 of the method described above.
- The cepstral coefficients delivered by the module 112 are then fed into a transformation module 114 adapted to apply the transformation function determined by the module 104.
- The transformation module 114 implements step 70 of the method described with reference to figure 2 and delivers transformed cepstral coefficients whose characteristics are similar to the spectral characteristics of the target speaker.
- The module 114 thus modifies the spectral envelope of the voice signal 110.
- The transformed cepstral coefficients delivered by the module 114 are then fed into a fundamental frequency prediction module 116 adapted to implement the prediction function determined by the module 106.
- The module 116 implements step 80 of the method described with reference to figure 2 and outputs predicted fundamental frequency information from the transformed spectrum information only.
- The system then comprises a synthesis module 118 receiving as input the transformed cepstral coefficients coming from the module 114 and corresponding to the spectral envelope, the predicted fundamental frequency information coming from the module 116, and the phase and maximum voicing frequency information delivered by the module 112.
- The module 118 thus implements step 90 of the method described with reference to figure 2 and provides a signal 120 corresponding to the source speaker's voice signal 110, but whose spectrum and fundamental frequency characteristics have been modified to be similar to those of the target speaker.
- The system described can be implemented in various ways, in particular with the aid of a suitable computer program connected to hardware means of sound acquisition.
- The HNM and GMM models can be replaced by other techniques and models known to those skilled in the art, such as Line Spectral Frequencies (LSF), Linear Predictive Coding (LPC), or parameters relating to the formants.
Claims (18)
- Method for analyzing fundamental frequency information contained in voice samples, characterized in that it comprises at least: - a step (2) of analyzing the voice samples grouped into frames, in order to obtain, for each sample frame, a spectral envelope representation suitable for being transformed in the course of voice conversion between two speakers, and the fundamental frequency; - a step (20) of determining a model of the joint density function of the spectral envelope representation and the fundamental frequency of all the samples; and - a step (30) of determining, on the basis of the model and the voice samples, a function for predicting the fundamental frequency solely as a function of the spectral envelope representation.
- Method according to claim 1, characterized in that the spectral envelope representation is expressed in the form of cepstral coefficients.
- Method according to claim 1 or 2, characterized in that the analysis step (2) comprises: - a substep (4) of modeling the voice samples as the sum of a harmonic signal and a noise signal; - a substep (5) of estimating frequency parameters, and at least the fundamental frequency, of the voice samples; - a substep (6) of analysis of each sample frame synchronized on its fundamental frequency; and - a substep (7) of estimating the parameters of the spectral envelope representation of each sample frame.
- Method according to any one of claims 1 to 3, characterized in that it further comprises a step (10) of normalizing the fundamental frequency of each sample frame with respect to the mean of the fundamental frequencies of the analyzed samples.
- Method according to any one of claims 1 to 4, characterized in that the step (20) of determining a model corresponds to determining a model by a mixture of Gaussian densities.
- Method according to claim 5, characterized in that the step (20) of determining a model comprises: - a substep (22) of determining a model corresponding to a mixture of Gaussian densities; and - a substep (24) of estimating the parameters of the mixture of Gaussian densities on the basis of maximum likelihood estimation between the spectral and fundamental frequency information of the samples and the model.
- Method according to any one of claims 1 to 6, characterized in that the step (30) of determining a prediction function is carried out on the basis of an estimator of the realization of the fundamental frequency given the spectral information of the samples.
- Method according to claim 7, characterized in that the step (30) of determining the fundamental frequency prediction function comprises a substep (32) of determining the conditional expectation of the realization of the fundamental frequency given the spectral information, on the basis of the posterior probability that the spectral information was obtained from the model, the conditional expectation forming the estimator.
- Method for converting a voice signal uttered by a source speaker into a converted voice signal whose characteristics resemble those of a target speaker, comprising at least: - a step (50) of determining a function transforming a spectral envelope representation of the source speaker into a spectral envelope representation of the target speaker, carried out on the basis of voice samples of the source speaker and of the target speaker; and - a step (70) of transforming the spectral information of the voice signal to be converted of the source speaker by means of the transformation function, characterized in that it further comprises: - a step (60) of determining, for the target speaker, a function for predicting the fundamental frequency solely as a function of a spectral envelope representation, the prediction function being obtained by means of an analysis method according to any one of claims 1 to 8; and - a step (80) of predicting the fundamental frequency of the voice signal to be converted by applying the fundamental frequency prediction function to the transformed spectral envelope representation of the voice signal of the source speaker.
- Method according to claim 9, characterized in that the step (50) of determining a transformation function is carried out on the basis of an estimator of the realization of the target spectral characteristics given the source spectral characteristics.
- Method according to claim 10, characterized in that the step (50) of determining a transformation function comprises: - a substep (52) of modeling the source and target voice samples according to a sum model of a harmonic signal and a noise signal; - a substep (54) of alignment between the source and target samples; and - a substep (56) of determining the transformation function on the basis of the calculation of the conditional expectation of the realization of the target spectral characteristics given the source spectral characteristics, the conditional expectation forming the estimator.
- Method according to any one of claims 9 to 11, characterized in that the transformation function is a function transforming a spectral envelope representation.
- Method according to any one of claims 9 to 12, characterized in that it further comprises a step (65) of analyzing the voice signal to be converted, suitable for providing the information relating to the spectrum and to the fundamental frequency.
- Method according to any one of claims 9 to 13, characterized in that it further comprises a synthesis step (90) enabling the formation of a converted voice signal on the basis of at least the transformed spectral information and the predicted fundamental frequency information.
- System for converting a voice signal (110) uttered by a source speaker into a converted voice signal (120) whose characteristics resemble those of a target speaker, the system comprising at least: - means (104) for determining a function transforming spectral characteristics of the source speaker into spectral characteristics of the target speaker, receiving as input voice samples of the source speaker (100) and of the target speaker (102); and - means (114) for transforming the spectral information of the voice signal (110) to be converted of the source speaker by applying the transformation function provided by the means (104), characterized in that it further comprises: - means (106) for determining, for the target speaker, a function for predicting the fundamental frequency solely as a function of information relating to the spectrum, suitable for carrying out an analysis method according to any one of claims 1 to 8 on the basis of voice samples (102) of the target speaker; and - means (116) for predicting the fundamental frequency of the voice signal (110) to be converted by applying the prediction function determined by the means (106) to the transformed spectrum information provided by the transformation means (114).
- System according to claim 15, characterized in that it further comprises: - means (112) for analyzing the voice signal (110) to be converted, suitable for providing as output information relating to the spectrum and to the fundamental frequency of the voice signal to be converted; and - means (118) for synthesis enabling the formation of a converted voice signal on the basis of at least the transformed spectrum information provided by the means (114) and the predicted fundamental frequency information provided by the means (116).
- System according to claim 15 or 16, characterized in that the means (104) for determining a transformation function are suitable for providing a spectral envelope transformation function.
- System according to any one of claims 15 to 17, characterized in that it is suitable for carrying out a voice conversion method according to any one of claims 9 to 12.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0303790 | 2003-03-27 | ||
FR0303790A FR2853125A1 (fr) | 2003-03-27 | 2003-03-27 | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse. |
PCT/FR2004/000483 WO2004088633A1 (fr) | 2003-03-27 | 2004-03-02 | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1606792A1 EP1606792A1 (de) | 2005-12-21 |
EP1606792B1 true EP1606792B1 (de) | 2008-05-14 |
Family
ID=32947218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP04716265A Expired - Lifetime EP1606792B1 (de) | 2003-03-27 | 2004-03-02 | Verfahren zur analyse der grundfrequenz, verfahren und vorrichtung zur sprachkonversion unter dessen verwendung |
Country Status (8)
Country | Link |
---|---|
US (1) | US7643988B2 (de) |
EP (1) | EP1606792B1 (de) |
JP (1) | JP4382808B2 (de) |
CN (1) | CN100583235C (de) |
AT (1) | ATE395684T1 (de) |
DE (1) | DE602004013747D1 (de) |
FR (1) | FR2853125A1 (de) |
WO (1) | WO2004088633A1 (de) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP4241736B2 (ja) * | 2006-01-19 | 2009-03-18 | 株式会社東芝 | Speech processing apparatus and method |
- CN101064104B (zh) * | 2006-04-24 | 2011-02-02 | 中国科学院自动化研究所 | Emotional speech generation method based on voice conversion |
- US20080167862A1 (en) * | 2007-01-09 | 2008-07-10 | Melodis Corporation | Pitch Dependent Speech Recognition Engine |
- JP4966048B2 (ja) * | 2007-02-20 | 2012-07-04 | 株式会社東芝 | Voice quality conversion device and speech synthesis device |
- US8131550B2 (en) * | 2007-10-04 | 2012-03-06 | Nokia Corporation | Method, apparatus and computer program product for providing improved voice conversion |
- JP4577409B2 (ja) * | 2008-06-10 | 2010-11-10 | ソニー株式会社 | Playback device, playback method, program, and data structure |
- CN102063899B (zh) * | 2010-10-27 | 2012-05-23 | 南京邮电大学 | Voice conversion method under non-parallel text conditions |
- CN102664003B (zh) * | 2012-04-24 | 2013-12-04 | 南京邮电大学 | Residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model |
- ES2432480B2 (es) * | 2012-06-01 | 2015-02-10 | Universidad De Las Palmas De Gran Canaria | Method for the clinical evaluation of the phonatory system of patients with laryngeal pathologies through an acoustic evaluation of voice quality |
- US9570087B2 (en) * | 2013-03-15 | 2017-02-14 | Broadcom Corporation | Single channel suppression of interfering sources |
- CN105551501B (zh) * | 2016-01-22 | 2019-03-15 | 大连民族大学 | Fundamental frequency estimation algorithm and device for harmonic signals |
- WO2018138543A1 (en) * | 2017-01-24 | 2018-08-02 | Hua Kanru | Probabilistic method for fundamental frequency estimation |
- CN108766450B (zh) * | 2018-04-16 | 2023-02-17 | 杭州电子科技大学 | Voice conversion method based on harmonic impulse decomposition |
- CN108922516B (zh) * | 2018-06-29 | 2020-11-06 | 北京语言大学 | Method and device for detecting tone range values |
- CN111179902B (zh) * | 2020-01-06 | 2022-10-28 | 厦门快商通科技股份有限公司 | Speech synthesis method, device and medium simulating a resonance cavity based on a Gaussian model |
- CN112750446B (zh) * | 2020-12-30 | 2024-05-24 | 标贝(青岛)科技有限公司 | Voice conversion method, device and system, and storage medium |
- CN115148225B (zh) * | 2021-03-30 | 2024-09-03 | 北京猿力未来科技有限公司 | Intonation scoring method, intonation scoring system, computing device and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993018505A1 (en) * | 1992-03-02 | 1993-09-16 | The Walt Disney Company | Voice transformation system |
- ATE277405T1 (de) * | 1997-01-27 | 2004-10-15 | Microsoft Corp | Voice conversion |
- 2003
- 2003-03-27 FR FR0303790A patent/FR2853125A1/fr active Pending
- 2004
- 2004-03-02 WO PCT/FR2004/000483 patent/WO2004088633A1/fr active IP Right Grant
- 2004-03-02 US US10/551,224 patent/US7643988B2/en not_active Expired - Fee Related
- 2004-03-02 JP JP2006505682A patent/JP4382808B2/ja not_active Expired - Fee Related
- 2004-03-02 AT AT04716265T patent/ATE395684T1/de not_active IP Right Cessation
- 2004-03-02 CN CN200480014488.8A patent/CN100583235C/zh not_active Expired - Fee Related
- 2004-03-02 EP EP04716265A patent/EP1606792B1/de not_active Expired - Lifetime
- 2004-03-02 DE DE602004013747T patent/DE602004013747D1/de not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP1606792A1 (de) | 2005-12-21 |
US20060178874A1 (en) | 2006-08-10 |
DE602004013747D1 (de) | 2008-06-26 |
JP2006521576A (ja) | 2006-09-21 |
ATE395684T1 (de) | 2008-05-15 |
JP4382808B2 (ja) | 2009-12-16 |
WO2004088633A1 (fr) | 2004-10-14 |
US7643988B2 (en) | 2010-01-05 |
CN100583235C (zh) | 2010-01-20 |
FR2853125A1 (fr) | 2004-10-01 |
CN1795491A (zh) | 2006-06-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20050921 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL LT LV MK |
|
DAX | Request for extension of the european patent (deleted) | ||
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: ROSEC, OLIVIER Inventor name: EN-NAJJARY, TAOUFIK |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 21/00 20060101ALN20070817BHEP Ipc: G10L 11/04 20060101AFI20070817BHEP |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REF | Corresponds to: |
Ref document number: 602004013747 Country of ref document: DE Date of ref document: 20080626 Kind code of ref document: P |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080825 |
|
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | ||
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FD4D |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20081014 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080814 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20090217 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080814 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
BERE | Be: lapsed |
Owner name: FRANCE TELECOM Effective date: 20090331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090331 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090331 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080815 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090302 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20081115 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20080514 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 13 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20160218 Year of fee payment: 13 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20160223 Year of fee payment: 13 Ref country code: FR Payment date: 20160219 Year of fee payment: 13 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602004013747 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20170302 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20171130 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170331 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171003 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170302 |