WO2004088633A1 - Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method - Google Patents
Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method
- Publication number
- WO2004088633A1 (PCT/FR2004/000483)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fundamental frequency
- samples
- determining
- spectrum
- function
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 56
- 238000004458 analytical method Methods 0.000 title claims abstract description 39
- 238000006243 chemical reaction Methods 0.000 title claims description 18
- 238000001228 spectrum Methods 0.000 claims abstract description 57
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 8
- 230000003595 spectral effect Effects 0.000 claims description 46
- 230000009466 transformation Effects 0.000 claims description 31
- 239000000203 mixture Substances 0.000 claims description 10
- 230000001131 transforming effect Effects 0.000 claims description 9
- 239000000523 sample Substances 0.000 claims description 7
- 238000003786 synthesis reaction Methods 0.000 claims description 7
- 230000001360 synchronised effect Effects 0.000 claims description 4
- 238000007476 Maximum Likelihood Methods 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000012512 characterization method Methods 0.000 claims description 2
- 230000006870 function Effects 0.000 description 46
- 239000013598 vector Substances 0.000 description 7
- 230000001755 vocal effect Effects 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 238000000354 decomposition reaction Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 2
- 230000000737 periodic effect Effects 0.000 description 2
- 238000013139 quantization Methods 0.000 description 2
- 210000001260 vocal cord Anatomy 0.000 description 2
- 238000004590 computer program Methods 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Definitions
- the present invention relates to a method for analyzing fundamental frequency information contained in voice samples, and to a voice conversion method and a voice conversion system implementing this analysis method.
- the production of speech, and in particular of voiced sounds, involves the vibration of the vocal cords, which manifests itself in the speech signal as a periodic structure with a fundamental period, the inverse of which is called the fundamental frequency or "pitch".
- in voice conversion, the auditory rendering is essential; to obtain an acceptable quality, it is necessary to control the parameters related to prosody and, among these, the fundamental frequency.
- the aim of the present invention is to remedy this problem, by defining a method for analyzing fundamental frequency information of vocal samples, allowing the definition of a configurable representation of the fundamental frequency.
- the subject of the present invention is a method for analyzing fundamental frequency information contained in voice samples, characterized in that it comprises at least:
- a step of determining a model representing the common characteristics of spectrum and fundamental frequency of all the samples; and
- a step of determining, from this model and the voice samples, a function for predicting the fundamental frequency as a function only of information relating to the spectrum.
- said analysis step is adapted to deliver said information relating to the spectrum in the form of cepstral coefficients;
- a sub-step of estimating the spectrum parameters of each frame of samples;
- it further includes a step of normalizing the fundamental frequency of each frame of samples with respect to the average of the fundamental frequencies of the samples analyzed;
- said step of determining a prediction function is carried out from an estimator of the realization of the fundamental frequency given the spectrum information of the samples;
- said step of determining the prediction function of the fundamental frequency comprises a sub-step of determining the conditional expectation of the realization of the fundamental frequency given the spectrum information, from the posterior probability that the spectrum information is obtained from the model, the conditional expectation forming said estimator.
- the invention also relates to a method for converting a voice signal pronounced by a source speaker into a converted voice signal whose characteristics resemble those of a target speaker, comprising at least:
- a step of determining a transformation function of spectral characteristics of the source speaker into spectral characteristics of the target speaker, carried out from voice samples of the source speaker and the target speaker; and
- a step of transforming the spectrum information of the voice signal of the source speaker to be converted using said transformation function, characterized in that it further comprises:
- said step of determining a transformation function is carried out from an estimator of the realization of the target spectral characteristics given the source spectral characteristics;
- said step of determining a transformation function includes:
- said transformation function is a transformation function of the spectral envelope
- the subject of the invention is also a system for converting a voice signal pronounced by a source speaker into a converted voice signal whose characteristics resemble those of a target speaker, system comprising at least:
- means for transforming spectrum information of the voice signal of the source speaker to be converted by the application of said transformation function delivered by the means characterized in that it further comprises:
- means for analyzing the voice signal to be converted adapted to output information relating to the spectrum and to the fundamental frequency of the voice signal to be converted;
- synthesis means making it possible to form a converted voice signal from at least the transformed spectrum information delivered by the means and predicted fundamental frequency information delivered by the means;
- Said means for determining a transformation function are adapted to deliver a transformation function for the spectral envelope
- Fig. 1 is a flow diagram of an analysis method according to the invention.
- Fig. 2 is a flow diagram of a voice conversion method implementing the analysis method of the invention.
- Fig. 3 is a functional block diagram of a voice conversion system, allowing the implementation of the method of the invention described in Fig. 2.
- the method of the invention shown in Figure 1 is implemented from a database of voice samples containing natural speech sequences.
- the method begins with a step 2 of analyzing the samples by grouping them by frame, in order to obtain for each frame of samples, information relating to the spectrum and in particular to the spectral envelope and information relating to the fundamental frequency.
- this analysis step 2 is based on the use of a model of a sound signal in the form of a sum of a harmonic signal and a noise signal, according to a model commonly called "HNM" (Harmonic plus Noise Model).
- the embodiment described is based on a representation of the spectral envelope by the discrete cepstrum.
- step 2 of analysis includes a sub-step 4 of modeling each frame of voice signal as a harmonic part representing the periodic component of the signal, consisting of a sum of L harmonic sinusoids of amplitude $A_i$ and phase $\varphi_i$, and a noisy part representing the friction noise and the variation of the glottal excitation.
- $s(n) = h(n) + b(n)$
- Step 2 then includes a sub-step 5 for estimating for each frame, frequency parameters and in particular the fundamental frequency, for example by means of an autocorrelation method.
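The text names an autocorrelation method only as one possible way to estimate the fundamental frequency in sub-step 5. The following is a minimal sketch of such an estimator, not the patent's implementation: the function names, the search range (`f_min`, `f_max`) and the `tolerance` used to prefer the shortest plausible period are all assumptions made for illustration.

```python
import math


def normalized_autocorr(x, lag):
    """Normalized autocorrelation of x at the given lag (value in [-1, 1])."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    e1 = sum(x[i] * x[i] for i in range(n))
    e2 = sum(x[i + lag] * x[i + lag] for i in range(n))
    denom = math.sqrt(e1 * e2)
    return num / denom if denom > 0.0 else 0.0


def estimate_f0_autocorr(frame, sample_rate, f_min=60.0, f_max=400.0, tolerance=0.95):
    """Return an F0 estimate in Hz, or 0.0 when no periodicity is found.

    The shortest lag that is a local peak and scores within `tolerance`
    of the best score is chosen, which limits octave errors (assumed heuristic).
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove the DC offset
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(x) - 2, int(sample_rate / f_min))
    if lag_max <= lag_min:
        return 0.0
    scores = {lag: normalized_autocorr(x, lag) for lag in range(lag_min, lag_max + 1)}
    best = max(scores.values())
    if best <= 0.0:
        return 0.0
    for lag in range(lag_min + 1, lag_max):
        s = scores[lag]
        if s >= tolerance * best and s >= scores[lag - 1] and s >= scores[lag + 1]:
            return sample_rate / lag
    return 0.0
```

For example, `estimate_f0_autocorr(frame, 8000)` applied to a 50 ms frame of a pure 200 Hz sine sampled at 8 kHz selects a lag of 40 samples.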
- this HNM analysis delivers the maximum voicing frequency.
- this frequency can be arbitrarily fixed or be estimated by other known means.
- This sub-step 5 is followed by a sub-step 6 of synchronized analysis of each frame on its fundamental frequency, which makes it possible to estimate the parameters of the harmonic part as well as the parameters of the noise of the signal.
- this synchronized analysis determines the parameters of the harmonics by minimizing a weighted least-squares criterion between the complete signal and its harmonic decomposition; in the embodiment described, the difference between the two corresponds to the estimated noise signal.
- the criterion is denoted E; in it, $w(n)$ is the analysis window and $T_i$ is the fundamental period of the current frame.
- the analysis window is centered around the mark of the fundamental period and has a duration of twice this period.
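The expression of the criterion E did not survive in this text. One form consistent with the description (a weighted least-squares error between the complete signal $s(n)$ and its harmonic part $h(n)$, over a window $w(n)$ spanning two fundamental periods around the period mark) would be the following; the summation bounds and the squared window are assumptions drawn from the usual HNM formulation rather than from the text itself.

```latex
E = \sum_{n=-T_i}^{T_i} w^2(n)\,\bigl(s(n) - h(n)\bigr)^2
```

Under this reading, the residual $s(n) - h(n)$ left after minimization is what the embodiment treats as the noise part $b(n)$.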
- Step 2 of analysis finally includes a sub-step 7 of estimating the parameters of the components of the spectral envelope of the signal, using for example a regularized discrete cepstrum method and a transformation to the Bark scale in order to reproduce the properties of the human ear as faithfully as possible.
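The text does not say which Bark-scale mapping is used in sub-step 7. As an illustration only, the sketch below uses Traunmüller's closed-form approximation, which is an assumption and may differ from the mapping used in the embodiment; the function names are hypothetical.

```python
def hz_to_bark(f_hz):
    """Map a frequency in Hz to the Bark scale (Traunmueller approximation)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark, obtained by solving the same formula for f."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def warp_frequencies(freqs_hz):
    """Return the Bark-scale positions of a list of harmonic frequencies."""
    return [hz_to_bark(f) for f in freqs_hz]
```

Warping the harmonic frequencies before fitting the cepstrum gives more resolution to low frequencies, where the ear discriminates more finely.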
- the analysis step 2 delivers, for each frame of rank n of speech signal samples, a scalar denoted $x_n$ comprising fundamental frequency information and a vector denoted $y_n$ comprising spectrum information in the form of a sequence of cepstral coefficients.
- step 2 of analysis is followed by a step 10 of normalizing the value of the fundamental frequency of each frame with respect to the average fundamental frequency, in order to replace, for each frame of voice samples, the value of the fundamental frequency by a normalized fundamental frequency value.
- $F^{\mathrm{moy}}$ denotes the average of the fundamental frequency values of the samples analyzed.
- This normalization makes it possible to modify the scale of variations of scalars of fundamental frequency in order to make it consistent with the scale of variations of cepstral coefficients.
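The normalization formula itself is not legible in this text. The sketch below assumes a log-ratio of each frame's fundamental frequency to the average, which is one common way to bring pitch values onto a scale comparable to cepstral coefficients; the log-ratio form, the function names and the assumption that every frame is voiced are all illustrative choices, not statements about the patent.

```python
import math


def normalize_f0(f0_values):
    """Return (normalized values, average F0), assuming all frames are voiced.

    Assumed form: x_n = ln(F0_n / F_moy), where F_moy is the average F0.
    """
    f_moy = sum(f0_values) / len(f0_values)
    normalized = [math.log(f / f_moy) for f in f0_values]
    return normalized, f_moy


def denormalize_f0(normalized, f_moy):
    """Invert normalize_f0 to recover fundamental frequencies in Hz."""
    return [f_moy * math.exp(x) for x in normalized]
```

The average must be kept alongside the model so that predicted normalized values can be mapped back to Hz at synthesis time.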
- the normalization step 10 is followed by a step 20 of determining a model representing the common cepstrum and fundamental frequency characteristics of all the samples analyzed.
- $N(z; \mu_i, \Sigma_i)$ is the probability density of the normal law of mean $\mu_i$ and covariance matrix $\Sigma_i$, and the coefficients $\alpha_i$ are the coefficients of the mixture.
- the coefficient $\alpha_i$ corresponds to the a priori probability that the random variable z is generated by the i-th Gaussian of the mixture.
- Step 20 then includes a sub-step 24 for estimating the GMM parameters $(\alpha, \mu, \Sigma)$ of the density p(z).
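The expression of the density p(z) is not reproduced in this text. Given the definitions above, the standard Gaussian mixture form would read as follows; the number of components $Q$ and the stacking of the cepstral vector $y_n$ and the normalized fundamental frequency $x_n$ into a joint vector $z_n$ are notational assumptions made here.

```latex
z_n = \begin{bmatrix} y_n \\ x_n \end{bmatrix},
\qquad
p(z) = \sum_{i=1}^{Q} \alpha_i \, N(z;\, \mu_i,\, \Sigma_i),
\qquad
\sum_{i=1}^{Q} \alpha_i = 1,\quad \alpha_i \ge 0
```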
- This estimation can be carried out, for example, using a conventional algorithm of the type called "EM" (Expectation-Maximization), an iterative method that yields a maximum likelihood estimator between the speech sample data and the Gaussian mixture model.
- the initial parameters of the GMM model are determined using a standard vector quantization technique.
- the model determination step 20 thus delivers the parameters of a mixture of Gaussian densities representative of the common characteristics of the spectra, represented by the cepstral coefficients, and of the fundamental frequencies of the analyzed vocal samples.
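To make the EM iteration concrete, here is a deliberately simplified sketch for a one-dimensional Gaussian mixture. The embodiment works on multi-dimensional joint vectors with full covariance matrices and initializes by vector quantization; the scalar data, the caller-supplied initial parameters, the variance floor and the function names below are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar normal law with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, weights, means, variances, n_iter=50, var_floor=1e-6):
    """Run n_iter EM iterations and return (weights, means, variances)."""
    weights, means, variances = list(weights), list(means), list(variances)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        resp = []
        for x in data:
            p = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate the parameters from the weighted samples
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / len(data)
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var_floor, var)
    return weights, means, variances
```

Each iteration cannot decrease the likelihood of the data, which is why the procedure converges to a (local) maximum likelihood estimate; the quality of the result still depends on the initial parameters.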
- the method then comprises a step 30 of determining, from the model and the voice samples, a function of prediction of the fundamental frequency as a function only of spectrum information supplied by the signal cepstrum.
- This prediction function is determined from an estimator of the realization of the fundamental frequency given the cepstrum of the voice samples, formed in the embodiment described, by the conditional expectation.
- step 30 includes a sub-step 32 for determining the conditional expectation of the fundamental frequency, knowing the information relating to the spectrum provided by the cepstrum.
- the conditional expectation is denoted $F(y)$ and is determined from the parameters of the model and the posterior probabilities of its components.
- $P_i(y)$ corresponds to the posterior probability that the cepstrum vector y is generated by the i-th component of the Gaussian mixture of the model, defined during step 20 by the covariance matrix $\Sigma_i$ and the mean $\mu_i$ of its normal law.
- the determination of the conditional expectation thus makes it possible to obtain the prediction function of the fundamental frequency from the cepstrum information.
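The formulas for the conditional expectation are not reproduced in this text. For a Gaussian mixture over a joint vector, the standard result is given below as a hedged reconstruction; the partition of $\mu_i$ and $\Sigma_i$ into spectral ($y$) and fundamental-frequency ($x$) blocks and the number of components $Q$ are notational assumptions.

```latex
\mu_i = \begin{bmatrix} \mu_i^{y} \\ \mu_i^{x} \end{bmatrix},
\qquad
\Sigma_i = \begin{bmatrix} \Sigma_i^{yy} & \Sigma_i^{yx} \\ \Sigma_i^{xy} & \Sigma_i^{xx} \end{bmatrix}

F(y) = E[x \mid y]
     = \sum_{i=1}^{Q} P_i(y)\,\Bigl[\mu_i^{x} + \Sigma_i^{xy}\bigl(\Sigma_i^{yy}\bigr)^{-1}\bigl(y - \mu_i^{y}\bigr)\Bigr]

P_i(y) = \frac{\alpha_i \, N\bigl(y;\, \mu_i^{y},\, \Sigma_i^{yy}\bigr)}
              {\sum_{j=1}^{Q} \alpha_j \, N\bigl(y;\, \mu_j^{y},\, \Sigma_j^{yy}\bigr)}
```

Each component contributes a linear regression of $x$ on $y$, and the posteriors $P_i(y)$ blend these regressions smoothly across the spectral space.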
- the estimator implemented during step 30 can alternatively be a maximum a posteriori criterion, called "MAP", in which the expectation is calculated only for the component of the model that best represents the source vector.
- the analysis method of the invention makes it possible, from the model and the voice samples, to obtain a prediction function of the fundamental frequency as a function only of spectrum information provided, in the embodiment described, by the cepstrum.
- Such a prediction function then makes it possible to determine the value of the fundamental frequency for a speech signal, only on the basis of spectrum information of this signal, thus allowing a relevant prediction of the fundamental frequency in particular for sounds which are not in the voice samples analyzed.
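The two estimators described for step 30 (conditional expectation and MAP) can be sketched as follows. To stay self-contained, the sketch reduces the spectral information to a single scalar feature `y`, whereas the embodiment uses a vector of cepstral coefficients; the `Component` structure and all function names are hypothetical.

```python
import math
from typing import List, NamedTuple


class Component(NamedTuple):
    weight: float   # mixture coefficient alpha_i
    mean_y: float   # mean of the spectral feature
    mean_x: float   # mean of the normalized fundamental frequency
    var_y: float    # variance of the spectral feature
    cov_xy: float   # covariance between x and y


def _pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(y: float, comps: List[Component]) -> List[float]:
    """P_i(y): probability that y was generated by each component."""
    likes = [c.weight * _pdf(y, c.mean_y, c.var_y) for c in comps]
    total = sum(likes)
    return [v / total for v in likes]


def _component_prediction(y: float, c: Component) -> float:
    # Linear regression of x on y inside one Gaussian component
    return c.mean_x + c.cov_xy / c.var_y * (y - c.mean_y)


def predict_f0(y: float, comps: List[Component]) -> float:
    """Conditional expectation E[x | y] over the whole mixture."""
    post = posteriors(y, comps)
    return sum(p * _component_prediction(y, c) for p, c in zip(post, comps))


def predict_f0_map(y: float, comps: List[Component]) -> float:
    """MAP variant: use only the component with the highest posterior."""
    post = posteriors(y, comps)
    best = max(range(len(comps)), key=lambda i: post[i])
    return _component_prediction(y, comps[best])
```

The conditional expectation gives a smooth prediction across components, while the MAP variant makes a hard choice and can jump at component boundaries.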
- Voice conversion consists of modifying the voice signal of a reference speaker called “source speaker” so that the signal produced seems to have been spoken by another speaker named “target speaker”.
- This method is implemented from a database of voice samples spoken by the source speaker and the target speaker.
- such a method comprises a step 50 of determining a function for transforming the spectral characteristics of the voice samples of the source speaker to make them resemble the spectral characteristics of the voice samples of the target speaker.
- this step 50 is based on an HNM type analysis making it possible to determine the relationships existing between the characteristics of the spectral envelope of the speech signals of the source and target speakers.
- Step 50 includes a sub-step 52 of modeling the voice samples according to an HNM model, of the sum of harmonic signals and of noise.
- Sub-step 52 is followed by a sub-step 54 of alignment between the source and target signals, using for example a conventional alignment algorithm called "DTW" (Dynamic Time Warping).
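As a reminder of how DTW pairs source and target frames, here is a minimal textbook sketch. The Euclidean frame distance, the unconstrained step pattern and the function name are assumptions; the text does not specify which DTW variant is used.

```python
def dtw_align(source, target):
    """Align two sequences of feature vectors.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching source frame i with target frame j.
    """
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the optimal warping path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

The resulting pairs of aligned frames are what a joint source/target model can then be trained on.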
- Step 50 then comprises a sub-step 56 of determining a model such as a GMM type model representing the common characteristics of the spectra of the voice samples of the source and target speakers.
- the estimator can be formed from a maximum a posteriori criterion.
- the function thus defined therefore makes it possible to modify the spectral envelope of a speech signal originating from the source speaker in order to make it resemble the spectral envelope of the target speaker.
- the parameters of the GMM model representing the common spectral characteristics of the source and the target are initialized, for example, using a vector quantization algorithm.
- the analysis method of the invention is implemented during a step 60 of analyzing only the vocal samples of the target speaker.
- the analysis step 60 makes it possible to obtain, for the target speaker, a function of prediction of the fundamental frequency as a function solely of information from spectra.
- the conversion method then comprises a step 65 of analyzing a voice signal to be converted pronounced by the source speaker, which signal to be converted is different from the voice signals used during steps 50 and 60.
- This analysis step 65 is carried out, for example, using a decomposition according to the HNM model making it possible to deliver spectrum information in the form of cepstral coefficients, fundamental frequency information as well as phase information and maximum voicing frequency.
- step 65 is followed by a step 70 of transformation of the spectral characteristics of the voice signal to be converted by the application of the transformation function determined in step 50, to the cepstral coefficients defined during step 65.
- This step 70 allows in particular the modification of the spectral envelope of the voice signal to be converted.
- each frame of samples of the signal to be converted from the source speaker is thus associated with transformed spectral information whose characteristics are similar to the spectral characteristics of the samples of the target speaker.
- the conversion method then comprises a step 80 of predicting the fundamental frequency for the voice samples of the source speaker, by applying the prediction function determined according to the method of the invention during step 60 solely to the transformed spectral information associated with the voice signal to be converted from the source speaker.
- the prediction function defined during step 60 makes it possible to obtain a relevant prediction of the fundamental frequency.
- the conversion method then comprises a step 90 of synthesis of the output signal carried out, in the example described, by an HNM type synthesis which directly delivers the voice signal converted from the transformed spectral envelope information delivered by step 70, predicted fundamental frequency information originating from step 80 and phase and maximum voicing frequency information delivered by step 65.
- the conversion method implementing the analysis method of the invention thus makes it possible to obtain a voice conversion carrying out modifications of spectra as well as a prediction of fundamental frequency, so as to obtain a good auditory rendering.
- the effectiveness of such a method can be evaluated from identical voice samples pronounced by the source speaker and the target speaker.
- the speech signal spoken by the source speaker is converted using the method as described and the resemblance of the converted signal with the signal spoken by the target speaker is evaluated. For example, this resemblance is calculated as a ratio between the acoustic distance separating the converted signal from the target signal and the acoustic distance separating the target signal from the source signal.
- the ratio obtained for a signal converted using the method of the invention is on the order of 0.3 to 0.5.
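The evaluation ratio described above can be sketched as follows. The text does not define the acoustic distance; the mean Euclidean distance between time-aligned feature vectors used here is an assumption, as are the function names.

```python
import math


def mean_frame_distance(a, b):
    """Mean Euclidean distance between two time-aligned feature sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(math.dist(u, v) for u, v in zip(a, b)) / len(a)


def conversion_ratio(converted, target, source):
    """Distance(converted, target) divided by distance(target, source).

    Values below 1 mean the converted signal is closer to the target
    than the original source signal was.
    """
    return mean_frame_distance(converted, target) / mean_frame_distance(target, source)
```

A ratio of 0 would indicate a converted signal indistinguishable from the target under the chosen distance, and a ratio of 1 no improvement over the unconverted source.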
- FIG. 3 shows a functional block diagram of a voice conversion system implementing the method described with reference to FIG. 2.
- This system uses as input a database 100 of voice samples spoken by the source speaker and a database 102 containing at least the same voice samples spoken by the target speaker. These two databases are used by a module 104 for determining a function for transforming spectral characteristics of the source speaker into spectral characteristics of the target speaker.
- This module 104 is suitable for the implementation of step 50 of the method as described with reference to FIG. 2 and therefore allows the determination of a transformation function of the spectral envelope.
- the system includes a module 106 for determining a fundamental frequency prediction function based solely on information relating to the spectrum.
- the module 106 therefore receives as input only the voice samples of the target speaker, contained in the database 102.
- the module 106 is suitable for implementing step 60 of the method described with reference to FIG. 2 and corresponding to the analysis method of the invention as described with reference to FIG. 1.
- the transformation function delivered by the module 104 and the prediction function delivered by the module 106 are stored for later use.
- the voice conversion system receives as input a voice signal 110 corresponding to a speech signal spoken by the source speaker and intended to be converted.
- the signal 110 is introduced into a signal analysis module 112, implementing, for example, an HNM type decomposition and making it possible to separate the spectrum information of the signal 110, in the form of cepstral coefficients, from the fundamental frequency information.
- the module 112 also delivers phase information and maximum voicing frequency information obtained by applying the HNM model.
- the module 112 therefore implements step 65 of the method described above.
- this analysis can be done beforehand and the information is stored for later use.
- the cepstral coefficients delivered by the module 112 are then introduced into a transformation module 114 adapted to apply the transformation function determined by the module 104.
- the transformation module 114 implements step 70 of the method described with reference to FIG. 2 and delivers transformed cepstral coefficients whose characteristics are similar to the spectral characteristics of the target speaker.
- the module 114 thus performs a modification of the spectral envelope of the voice signal 110.
- the transformed cepstral coefficients delivered by the module 114 are then introduced into a module 116 for predicting the fundamental frequency, suitable for implementing the prediction function determined by the module 106.
- the module 116 implements step 80 of the method described with reference to FIG. 2 and delivers as output fundamental frequency information predicted from only the transformed spectrum information.
- the system then includes a synthesis module 118 receiving as input the transformed cepstral coefficients from module 114, corresponding to the spectral envelope, the predicted fundamental frequency information from module 116, and the phase and maximum voicing frequency information delivered by module 112.
- the module 118 thus implements step 90 of the method described with reference to FIG. 2 and delivers a signal 120 corresponding to the voice signal 110 of the source speaker, but whose spectrum and fundamental frequency characteristics have been modified to be similar to that of the target speaker.
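The data flow between modules 112, 114, 116 and 118 can be summarized in a short sketch. Every name below (`AnalyzedFrame`, `convert`, and the three callables) is hypothetical; the callables stand in for the stored transformation function, the stored prediction function and the HNM synthesis, none of which is implemented here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AnalyzedFrame:
    """Output of the analysis module 112 for one frame (step 65)."""
    cepstrum: List[float]
    f0: float                 # source F0, measured but not reused for synthesis
    phase: List[float]
    max_voicing_freq: float


def convert(
    frames: List[AnalyzedFrame],
    transform_spectrum: Callable[[List[float]], List[float]],
    predict_f0: Callable[[List[float]], float],
    synthesize: Callable[[List[float], float, List[float], float], Any],
) -> List[Any]:
    """Chain steps 70, 80 and 90 for each analyzed frame."""
    output = []
    for frame in frames:
        new_cepstrum = transform_spectrum(frame.cepstrum)  # module 114, step 70
        new_f0 = predict_f0(new_cepstrum)                  # module 116, step 80
        output.append(
            synthesize(new_cepstrum, new_f0, frame.phase, frame.max_voicing_freq)
        )                                                  # module 118, step 90
    return output
```

The point the sketch makes explicit is that the fundamental frequency fed to synthesis is predicted from the transformed spectrum alone, while phase and maximum voicing frequency pass through from the analysis unchanged.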
- the system described can be implemented in various ways and in particular using a suitable computer program and connected to hardware means of sound acquisition.
- the HNM and GMM models can be replaced by other techniques and models known to those skilled in the art, such as, for example, the techniques known as LSF (Line Spectral Frequencies) or LPC (Linear Predictive Coding), or parameters relating to formants.
- LSF Line Spectral Frequencies
- LPC Linear Predictive Coding
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Machine Translation (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04716265A EP1606792B1 (fr) | 2003-03-27 | 2004-03-02 | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse |
US10/551,224 US7643988B2 (en) | 2003-03-27 | 2004-03-02 | Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method |
JP2006505682A JP4382808B2 (ja) | 2003-03-27 | 2004-03-02 | 基本周波数情報を分析する方法、ならびに、この分析方法を実装した音声変換方法及びシステム |
DE602004013747T DE602004013747D1 (de) | 2003-03-27 | 2004-03-02 | Verfahren zur analyse der grundfrequenz, verfahren und vorrichtung zur sprachkonversion unter dessen verwendung |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR03/03790 | 2003-03-27 | ||
FR0303790A FR2853125A1 (fr) | 2003-03-27 | 2003-03-27 | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse. |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004088633A1 true WO2004088633A1 (fr) | 2004-10-14 |
Family
ID=32947218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2004/000483 WO2004088633A1 (fr) | 2003-03-27 | 2004-03-02 | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse |
Country Status (8)
Country | Link |
---|---|
US (1) | US7643988B2 (fr) |
EP (1) | EP1606792B1 (fr) |
JP (1) | JP4382808B2 (fr) |
CN (1) | CN100583235C (fr) |
AT (1) | ATE395684T1 (fr) |
DE (1) | DE602004013747D1 (fr) |
FR (1) | FR2853125A1 (fr) |
WO (1) | WO2004088633A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101064104B (zh) * | 2006-04-24 | 2011-02-02 | 中国科学院自动化研究所 | 基于语音转换的情感语音生成方法 |
CN102664003A (zh) * | 2012-04-24 | 2012-09-12 | 南京邮电大学 | 基于谐波加噪声模型的残差激励信号合成及语音转换方法 |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4241736B2 (ja) * | 2006-01-19 | 2009-03-18 | 株式会社東芝 | 音声処理装置及びその方法 |
US20080167862A1 (en) * | 2007-01-09 | 2008-07-10 | Melodis Corporation | Pitch Dependent Speech Recognition Engine |
JP4966048B2 (ja) * | 2007-02-20 | 2012-07-04 | 株式会社東芝 | 声質変換装置及び音声合成装置 |
US8131550B2 (en) * | 2007-10-04 | 2012-03-06 | Nokia Corporation | Method, apparatus and computer program product for providing improved voice conversion |
JP4577409B2 (ja) * | 2008-06-10 | 2010-11-10 | ソニー株式会社 | 再生装置、再生方法、プログラム、及び、データ構造 |
CN102063899B (zh) * | 2010-10-27 | 2012-05-23 | 南京邮电大学 | 一种非平行文本条件下的语音转换方法 |
ES2432480B2 (es) * | 2012-06-01 | 2015-02-10 | Universidad De Las Palmas De Gran Canaria | Método para la evaluación clínica del sistema fonador de pacientes con patologías laríngeas a través de una evaluación acústica de la calidad de la voz |
US9570087B2 (en) * | 2013-03-15 | 2017-02-14 | Broadcom Corporation | Single channel suppression of interfering sources |
CN105551501B (zh) * | 2016-01-22 | 2019-03-15 | 大连民族大学 | 谐波信号基频估计算法及装置 |
WO2018138543A1 (fr) * | 2017-01-24 | 2018-08-02 | Hua Kanru | Procédé probabiliste pour estimation de fréquence fondamentale |
CN108766450B (zh) * | 2018-04-16 | 2023-02-17 | 杭州电子科技大学 | 一种基于谐波冲激分解的语音转换方法 |
CN108922516B (zh) * | 2018-06-29 | 2020-11-06 | 北京语言大学 | 检测调域值的方法和装置 |
CN111179902B (zh) * | 2020-01-06 | 2022-10-28 | 厦门快商通科技股份有限公司 | 基于高斯模型模拟共鸣腔的语音合成方法、设备及介质 |
CN112750446B (zh) * | 2020-12-30 | 2024-05-24 | 标贝(青岛)科技有限公司 | 语音转换方法、装置和系统及存储介质 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993018505A1 (fr) * | 1992-03-02 | 1993-09-16 | The Walt Disney Company | Systeme de transformation vocale |
WO1998035340A2 (fr) * | 1997-01-27 | 1998-08-13 | Entropic Research Laboratory, Inc. | Systeme et procede de conversion de voix |
-
2003
- 2003-03-27 FR FR0303790A patent/FR2853125A1/fr active Pending
-
2004
- 2004-03-02 CN CN200480014488.8A patent/CN100583235C/zh not_active Expired - Fee Related
- 2004-03-02 DE DE602004013747T patent/DE602004013747D1/de not_active Expired - Lifetime
- 2004-03-02 JP JP2006505682A patent/JP4382808B2/ja not_active Expired - Fee Related
- 2004-03-02 EP EP04716265A patent/EP1606792B1/fr not_active Expired - Lifetime
- 2004-03-02 AT AT04716265T patent/ATE395684T1/de not_active IP Right Cessation
- 2004-03-02 WO PCT/FR2004/000483 patent/WO2004088633A1/fr active IP Right Grant
- 2004-03-02 US US10/551,224 patent/US7643988B2/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
DOVAL B ET AL: "Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs", STATISTICAL SIGNAL AND ARRAY PROCESSING. MINNEAPOLIS, APR. 27 - 30, 1993, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 4, 27 April 1993 (1993-04-27), pages 221 - 224, XP010110214, ISBN: 0-7803-0946-4 * |
KAIN A ET AL: "Stochastic modeling of spectral adjustment for high quality pitch modification", ICASSP 2000, vol. 2, 5 June 2000 (2000-06-05), pages 949 - 952, XP010504881 * |
STYLIANOU Y ET AL: "A SYSTEM FOR VOICE CONVERSION BASED ON PROBABILISTIC CLASSIFICATION AND A HARMONIC PLUS NOISE MODEL", PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING. ICASSP '98. SEATTLE, WA, MAY 12 - 15, 1998, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, NEW YORK, NY: IEEE, US, vol. 1 CONF. 23, 12 May 1998 (1998-05-12), pages 281 - 284, XP000854570, ISBN: 0-7803-4429-4 * |
Also Published As
Publication number | Publication date |
---|---|
JP4382808B2 (ja) | 2009-12-16 |
US7643988B2 (en) | 2010-01-05 |
CN100583235C (zh) | 2010-01-20 |
CN1795491A (zh) | 2006-06-28 |
EP1606792A1 (fr) | 2005-12-21 |
DE602004013747D1 (de) | 2008-06-26 |
FR2853125A1 (fr) | 2004-10-01 |
JP2006521576A (ja) | 2006-09-21 |
EP1606792B1 (fr) | 2008-05-14 |
US20060178874A1 (en) | 2006-08-10 |
ATE395684T1 (de) | 2008-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2005106852A1 (fr) | | Improved method and system for converting a voice signal |
WO2005106853A1 (fr) | | Fast method and system for converting a voice signal |
Viikki et al. | | Cepstral domain segmental feature vector normalization for noise robust speech recognition |
McLoughlin | | Line spectral pairs |
Geiser et al. | | Bandwidth extension for hierarchical speech and audio coding in ITU-T Rec. G. 729.1 |
EP1606792B1 (fr) | | Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method |
CN114694632A (zh) | | Speech processing device |
Prasad et al. | | Bandwidth extension of speech signals: A comprehensive review |
EP1526508B1 (fr) | | Method for selecting synthesis units |
Khonglah et al. | | Speech enhancement using source information for phoneme recognition of speech with background music |
Jokinen et al. | | Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network |
Radfar et al. | | Monaural speech segregation based on fusion of source-driven with model-driven techniques |
Liu et al. | | Audio bandwidth extension based on temporal smoothing cepstral coefficients |
Al-Radhi et al. | | Continuous vocoder applied in deep neural network based voice conversion |
EP1846918B1 (fr) | | Method for estimating a voice conversion function |
Liu et al. | | Audio bandwidth extension based on ensemble echo state networks with temporal evolution |
Milivojević et al. | | Estimation of the fundamental frequency of the speech signal compressed by mp3 algorithm |
Xiao et al. | | Speech intelligibility enhancement by non-parallel speech style conversion using CWT and iMetricGAN based CycleGAN |
Gupta et al. | | A new framework for artificial bandwidth extension using H∞ filtering |
Park et al. | | Unsupervised noise reduction scheme for voice-based information retrieval in mobile environments |
Srivastava | | Fundamentals of linear prediction |
Kleijn | | Signal processing representations of speech |
Ye | | Efficient Approaches for Voice Change and Voice Conversion Systems |
Grumiaux et al. | | Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model |
Jinachitra | | Robust structured voice extraction for flexible expressive resynthesis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 2004716265 Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2006505682 Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 20048144888 Country of ref document: CN |
| | WWP | WIPO information: published in national office | Ref document number: 2004716265 Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2006178874 Country of ref document: US Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 10551224 Country of ref document: US |
| | WWP | WIPO information: published in national office | Ref document number: 10551224 Country of ref document: US |
| | WWG | WIPO information: grant in national office | Ref document number: 2004716265 Country of ref document: EP |