CN104392717A - Sound track spectrum Gaussian mixture model based rapid voice conversion system and method - Google Patents

Sound track spectrum Gaussian mixture model based rapid voice conversion system and method

Info

Publication number
CN104392717A
CN104392717A (application CN201410742549.8A; also published as CN 104392717 A)
Authority
CN
China
Prior art keywords
characteristic parameter
signal
value
parameter
weight coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410742549.8A
Other languages
Chinese (zh)
Inventor
鲍静益
徐宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Institute of Technology
Original Assignee
Changzhou Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Institute of Technology filed Critical Changzhou Institute of Technology
Priority to CN201410742549.8A priority Critical patent/CN104392717A/en
Publication of CN104392717A publication Critical patent/CN104392717A/en
Pending legal-status Critical Current

Abstract

The invention discloses a rapid voice conversion system and method based on Gaussian mixture modeling of the vocal tract spectrum. The method comprises the steps of parameter extraction and synthesis, characteristic-parameter time alignment, and characteristic-parameter training and conversion. By fixing the Gaussian means at preset mel frequencies, adjusting the Gaussian variances adaptively, and taking sample points of the log-magnitude spectrum as the weight coefficients, the computational complexity of speech parameterization is greatly reduced and the running speed is markedly improved.

Description

A rapid voice conversion system based on Gaussian mixture modeling of the vocal tract spectrum, and a method thereof
Technical field
The present invention relates to speech processing technology, and in particular to a rapid voice conversion system based on Gaussian mixture modeling of the vocal tract spectrum and to a method thereof.
Background technology
Voice conversion is accomplished in several steps: characteristic-parameter extraction, construction of the mapping relations between parameters, and real-time parameter conversion. Each step involves complex signal-processing operations, places high demands on hardware and software, and takes a long time to run, which hinders the deployment of voice conversion technology on widely used mobile and embedded devices. In particular, the characteristic-parameter extraction stage of a conventional voice conversion system usually requires transformations between the time, frequency, and cepstral domains, and its computational load is especially heavy. Moreover, on constrained hardware, an overly complex parameter-extraction algorithm can also yield inaccurate results.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum. By presetting fixed frequency values, sampling the logarithm of the spectral-envelope amplitude, and adaptively adjusting the variances applied to the vocal-tract spectral envelope, the parameterization process requires only elementary arithmetic (addition, subtraction, multiplication, and division) and no complex signal-processing machinery, which substantially reduces computational complexity and shortens running time. When this characteristic parameter is used in a voice conversion system, experimental results show performance better than that of classical methods.
To achieve the above object, the present invention adopts the following technical solution: a rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum, whose steps comprise characteristic-parameter extraction and synthesis, characteristic-parameter time alignment, and characteristic-parameter training and conversion;
The characteristic-parameter extraction decomposes the original speech signal, and the characteristic-parameter synthesis is the inverse of the extraction;
The characteristic-parameter time alignment arranges and screens the characteristic parameters of the conversion-source and conversion-target speech to obtain parameter sets that are synchronous in time;
The characteristic-parameter training learns the mapping relations between the feature-parameter sets of the conversion source and the conversion target, yielding a mapping rule; the characteristic-parameter conversion applies this mapping rule to convert the source speech into speech of the target speaker.
The characteristic-parameter extraction proceeds as follows:
(a1) frame the speech signal into 20 ms frames and estimate the fundamental frequency by a correlation method;
(a2) from the fundamental frequency, decide whether the frame is unvoiced or voiced; if the frame is voiced, set a maximum voiced frequency that divides the spectrum into a dominant harmonic region and a random region; estimate the discrete harmonic amplitudes and phases with a least-squares algorithm; interpolate the discrete harmonic amplitudes to obtain the spectral envelope;
(a3) if the frame is unvoiced, analyze it with linear-prediction analysis to obtain the linear-prediction coefficients;
(a4) warp the frequency axis of the spectral envelope nonlinearly to obtain mel frequencies; preset 24 mel-frequency values and take them as the means of the components of a Gaussian mixture model; take the logarithm of the envelope amplitude axis, sample it at the Gaussian means, and save the sampled values as the weight coefficients; following auditory-perception theory, make each weight coefficient approximately inversely proportional to the variance of its Gaussian distribution, thereby determining the variances of all mixture components in turn;
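The parameterization in step (a4) requires only interpolation and elementary arithmetic. The following Python sketch illustrates the idea; the mel-warping formula, the 50 Hz lower bound, and the proportionality constant `k` for the inverse weight-variance relation are illustrative assumptions, not values fixed by this description.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale warping of the frequency axis (assumed formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def parameterize_envelope(freqs_hz, envelope, n_mix=24, fs=16000):
    """Fixed-mean Gaussian-mixture parameterization of a spectral envelope.

    The 24 Gaussian means are preset, equally spaced on the mel axis; the
    log-magnitude envelope sampled at each mean becomes that component's
    weight coefficient, and the variance is set approximately inversely
    proportional to the weight (hypothetical constant k).
    """
    mel_means = np.linspace(hz_to_mel(50.0), hz_to_mel(fs / 2.0), n_mix)
    hz_means = mel_to_hz(mel_means)
    log_env = np.log(np.maximum(envelope, 1e-10))
    # Sample the log envelope at the fixed Gaussian means -> weights.
    weights = np.interp(hz_means, freqs_hz, log_env)
    # Variances inversely proportional to the weights (assumed k = 1).
    k = 1.0
    variances = k / np.maximum(np.abs(weights), 1e-3)
    return hz_means, weights, variances
```

Because the Gaussian means are fixed in advance, only the 24 weight coefficients vary from frame to frame, which is what keeps the per-frame cost low.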
The characteristic-parameter synthesis proceeds as follows:
(b1) apply amplitude correction to the weight-coefficient sequence; the correction scale is determined by, and is approximately proportional to, the maximum of the distribution;
(b2) interpolate the corrected weight-coefficient sequence into a spectral-envelope curve with frequency on the abscissa and amplitude on the ordinate;
(b3) sample the spectral envelope of the voiced signal at multiples of the fundamental frequency to obtain the discrete harmonic amplitudes;
(b4) use the discrete harmonic amplitudes and phases of the voiced signal as the amplitudes and phases of sinusoids and superpose them; use interpolation and phase compensation so that the reconstructed time-domain waveform is not distorted;
(b5) pass an arbitrary white-noise signal through an all-pole filter to obtain an approximate reconstruction of the unvoiced signal;
(b6) superpose the voiced and unvoiced signals to obtain the reconstructed speech signal.
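Steps (b1) to (b6) can be sketched for a single frame as follows. This is illustrative only: the frame length and sampling rate are assumed values, and the amplitude-correction and phase-compensation details of steps (b1) and (b4) are omitted.

```python
import numpy as np

def synthesize_frame(hz_means, weights, f0, phases, lpc_coefs,
                     n=320, fs=16000, voiced=True):
    """Reconstruct one frame from the weight-coefficient sequence (sketch).

    Voiced: interpolate the weights back into a log envelope, sample it
    at the harmonics of f0, and sum the harmonics as sinusoids.
    Unvoiced: pass white noise through the all-pole filter 1/A(z) given
    by the linear-prediction coefficients (lpc_coefs[0] == 1).
    """
    t = np.arange(n) / fs
    if voiced:
        n_harm = min(len(phases), int((fs / 2.0) // f0))
        harm_f = f0 * np.arange(1, n_harm + 1)
        # (b2)-(b3): weights -> log envelope, sampled at the harmonics.
        log_env = np.interp(harm_f, hz_means, weights)
        amps = np.exp(log_env)
        # (b4): superpose the sinusoids.
        frame = np.zeros(n)
        for a, f, p in zip(amps, harm_f, phases[:n_harm]):
            frame += a * np.cos(2.0 * np.pi * f * t + p)
        return frame
    # (b5): white noise through the all-pole (AR) filter.
    noise = np.random.randn(n)
    out = np.zeros(n)
    for i in range(n):
        acc = noise[i]
        for k in range(1, len(lpc_coefs)):
            if i - k >= 0:
                acc -= lpc_coefs[k] * out[i - k]
        out[i] = acc
    return out
```

Step (b6) is then simply the sample-wise sum of the voiced and unvoiced frames.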
The characteristic-parameter time alignment proceeds as follows:
(c1) for the two unequal-length characteristic-parameter sequences of the conversion-source and conversion-target speech signals, use a dynamic time warping algorithm to map the time axis of one nonlinearly onto that of the other, establishing a one-to-one matching;
(c2) during the alignment of the parameter sets, iteratively optimize a preset cumulative distortion function under a constrained search region to obtain the final time-matching function.
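A minimal dynamic-time-warping sketch of steps (c1) and (c2) is given below; it accumulates a Euclidean distortion and backtracks the minimum-cost path, but for brevity omits the search-region constraint mentioned in (c2).

```python
import numpy as np

def dtw_align(x, y):
    """Align two unequal-length feature sequences (frames x dims).

    Returns the accumulated distortion and the warping path, a plain
    O(N*M) dynamic-programming sketch with steps (i-1,j), (i,j-1),
    (i-1,j-1).
    """
    nx, ny = len(x), len(y)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((nx, ny), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(nx):
        for j in range(ny):
            if i == 0 and j == 0:
                continue
            prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + prev
    # Backtrack the minimum-distortion path.
    i, j = nx - 1, ny - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        _, (i, j) = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    return acc[-1, -1], path[::-1]
```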
The characteristic-parameter training proceeds as follows:
Join the aligned source and target speech-signal feature parameters into an augmented matrix, preset the number of mixtures N, and learn the Gaussian-mixture-model parameters iteratively by the expectation-maximization rule; the means, variances, and weight coefficients of this mixture model are the parameters to be estimated. The joint posterior probability density of the weight coefficients and the model parameters (the means and variances of the Gaussian processes) is approximated by Markov chain Monte Carlo: first assume that the weight coefficients and the model parameters are mutually independent, then estimate both probability densities progressively by iteration. In each iteration, fix one group of unknowns and sample the other, approximating its probability distribution from a large number of samples. Finally, multiply the distribution of the weight coefficients by that of the model parameters to obtain the joint posterior probability function; marginalizing the joint density yields estimates of the distributions of the weight coefficients and of the model parameters respectively, and the mixed-Gaussian random-process model structure is thereby determined.
The characteristic-parameter conversion proceeds as follows:
(d1) given the set of input observation vectors, use the structural parameters of the trained mixed-Gaussian random process to compute the posterior membership value of the current speech frame;
(d2) within the subspace of each cluster's mixture component, compute the conditional expectation of the output variable and take its mean as the conversion output;
(d3) superpose the outputs of all components, weighted by the posterior membership values, to obtain the mapped speech feature parameters.
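Steps (d1) to (d3) amount to standard mixture-based regression. The sketch below assumes diagonal covariances and a joint model already split into source-marginal and cross-covariance parts; all names and shapes are illustrative assumptions.

```python
import numpy as np

def gmm_convert(x, weights, means_x, means_y, cov_xx, cov_yx):
    """Map a source feature vector x through a trained joint mixture model.

    (d1) posterior membership of each component given x;
    (d2) conditional mean of the target in each cluster;
    (d3) posterior-weighted superposition of the component outputs.
    Diagonal covariances assumed: cov_xx and cov_yx are (n_mix, dim).
    """
    n_mix = len(means_x)
    # (d1) posterior membership values p(i | x), via log densities.
    log_p = np.empty(n_mix)
    for i in range(n_mix):
        diff = x - means_x[i]
        log_p[i] = (np.log(weights[i])
                    - 0.5 * np.sum(np.log(2.0 * np.pi * cov_xx[i]))
                    - 0.5 * np.sum(diff ** 2 / cov_xx[i]))
    post = np.exp(log_p - log_p.max())
    post /= post.sum()
    # (d2) conditional expectation E[y | x, i] per component.
    y_i = np.array([means_y[i] + cov_yx[i] * (x - means_x[i]) / cov_xx[i]
                    for i in range(n_mix)])
    # (d3) weighted superposition of all component outputs.
    return post @ y_i
```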
With the above technical solution, the present invention has at least the following advantages:
1. The voice conversion scheme is suited to large-data environments.
In a large-data environment, the data exhibit strong correlation and overlap. For Chinese speech in particular, beneath the rich surface variation the underlying phonetic metadata are limited. By building a voice conversion method with a mixture structure, the speech data can therefore be modeled in clusters, making full use of the large data set and improving system performance.
2. The voice conversion algorithm has a nonlinear mapping property and can simulate the complex data relations of real environments well.
By building the voice conversion method on Gaussian random processes, their nonlinear mapping capability can be fully exploited, which is especially suitable for highly variable signals such as complex speech.
Description of the accompanying drawing
Fig. 1 is a system block diagram of the present invention.
Embodiment
The present invention is further described below in conjunction with the accompanying drawing.
As shown in Fig. 1, a rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum comprises the steps of characteristic-parameter extraction and synthesis, characteristic-parameter time alignment, and characteristic-parameter training and conversion;
The characteristic-parameter extraction decomposes the original speech signal, and the characteristic-parameter synthesis is the inverse of the extraction;
The characteristic-parameter time alignment arranges and screens the characteristic parameters of the conversion-source and conversion-target speech to obtain parameter sets that are synchronous in time;
The characteristic-parameter training learns the mapping relations between the feature-parameter sets of the conversion source and the conversion target, yielding a mapping rule; the characteristic-parameter conversion applies this mapping rule to convert the source speech into speech of the target speaker.
Characteristic-parameter extraction comprises the following operations:
(a1) frame the speech signal into fixed-duration frames with a frame length of 20 ms and a frame shift of 10 ms; within each frame, compute the autocorrelation function of the speech, use the first side-lobe peak of the autocorrelation to approximate the pitch period, and take the reciprocal of the pitch period as the fundamental frequency;
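Step (a1) can be sketched as follows; the 60 to 400 Hz search range for the autocorrelation peak is an assumed choice, not stated in the description.

```python
import numpy as np

def estimate_f0(frame, fs=16000, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency of one 20 ms frame from the
    first side-lobe peak of its autocorrelation function."""
    frame = frame - frame.mean()
    # Non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / f0_max)
    hi = min(int(fs / f0_min), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi]))
    # The reciprocal of the pitch period is the fundamental frequency.
    return fs / lag
```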
(a2) from the fundamental-frequency value obtained in step (a1) (zero for unvoiced frames, nonzero for voiced frames), decide whether the frame is unvoiced or voiced. If voiced, set a maximum voiced frequency that divides the spectrum into a dominant harmonic region and a random region. In the band below the maximum voiced frequency, model the signal as a superposition of several sinusoids and solve for their discrete amplitudes and phases by constrained least squares; leave the band above the maximum voiced frequency unprocessed;
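The least-squares estimation of the discrete harmonic amplitudes and phases in step (a2) can be posed as an ordinary linear least-squares problem over cosine/sine basis pairs, as in the following sketch (the basis construction is an assumed formulation).

```python
import numpy as np

def harmonic_ls(frame, f0, max_voiced_hz, fs=16000):
    """Least-squares estimate of discrete harmonic amplitudes and phases
    for the harmonics of f0 below the maximum voiced frequency."""
    n_harm = max(1, int(max_voiced_hz // f0))
    t = np.arange(len(frame)) / fs
    # Design matrix of cosine/sine pairs for each harmonic.
    cols = []
    for k in range(1, n_harm + 1):
        cols.append(np.cos(2.0 * np.pi * k * f0 * t))
        cols.append(np.sin(2.0 * np.pi * k * f0 * t))
    a = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(a, frame, rcond=None)
    c, s = coef[0::2], coef[1::2]
    amps = np.hypot(c, s)
    # So that frame ~= sum_k amps[k] * cos(2*pi*k*f0*t + phases[k]).
    phases = np.arctan2(-s, c)
    return amps, phases
```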
(a3) if the frame is unvoiced, analyze it with the classical linear-prediction analysis method: establish an all-pole model and solve for the model coefficients by constrained least squares, obtaining the linear-prediction coefficients.
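Step (a3), the classical autocorrelation route to an all-pole model, can be sketched with a Levinson-Durbin recursion as follows; the prediction order of 12 is an assumed choice.

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """Linear-prediction coefficients for an unvoiced frame via the
    autocorrelation method and the Levinson-Durbin recursion.

    Returns a with a[0] == 1, so the all-pole model is 1 / A(z),
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, plus the final
    prediction-error power.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```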
(a4) warp the frequency axis of the spectral envelope nonlinearly to obtain mel frequencies; preset 24 mel-frequency values and take them as the means of the components of a Gaussian mixture model; take the logarithm of the envelope amplitude axis, sample it at the Gaussian means, and save the sampled values as the weight coefficients; following auditory-perception theory, make each weight coefficient approximately inversely proportional to the variance of its Gaussian distribution, thereby determining the variances of all mixture components in turn.
Characteristic-parameter synthesis comprises the following operations:
(b1) apply amplitude correction to the weight-coefficient sequence; the correction scale is determined by, and is approximately proportional to, the maximum of the distribution;
(b2) interpolate the corrected weight-coefficient sequence into a spectral-envelope curve with frequency on the abscissa and amplitude on the ordinate;
(b3) sample the spectral envelope of the voiced signal at multiples of the fundamental frequency to obtain the discrete harmonic amplitudes;
(b4) use the discrete harmonic amplitudes and phases of the voiced signal as the amplitudes and phases of sinusoids and superpose them; use interpolation and phase compensation so that the reconstructed time-domain waveform is not distorted;
(b5) for the unvoiced signal, pass an arbitrary white-noise signal through an all-pole filter to obtain an approximate reconstruction of the unvoiced signal;
(b6) superpose the voiced and unvoiced signals to obtain the reconstructed speech signal.
Characteristic-parameter time alignment:
(c1) for the two unequal-length characteristic-parameter sequences, use a dynamic time warping algorithm to map the time axis of one nonlinearly onto that of the other, establishing a one-to-one matching;
(c2) during the alignment of the parameter sets, iteratively optimize a preset cumulative distortion function under a constrained search region to obtain the final time-matching function.
The parameter training and conversion modules take Gaussian random processes as their theoretical foundation and extend the basic framework with a mixture structure, so that the data can be modeled in clusters and accuracy improved. At the same time, thanks to the nonlinear mapping property of Gaussian random processes, the system can realize conversion between characteristic parameters with fairly complex relations. The whole procedure comprises two stages, a training stage and a conversion stage, whose operation steps are as follows.
The characteristic-parameter training proceeds as follows:
Join the aligned source and target speech-signal feature parameters into an augmented matrix and build a Gaussian random-process model with a mixture structure. Let the number of mixtures be N, with the weight coefficient of each component denoted r_i, i = 1, 2, 3, ..., N. Then, given the sets of input and output vectors, the output vector sequence is approximately a weighted combination of N Gaussian random processes, whose inputs are the given input vector sequence. The weight coefficients and the mean and variance parameters of each Gaussian random process are the unknowns to be estimated. Their joint posterior probability density is approximated by Markov chain Monte Carlo: first assume that the weight coefficients and the model parameters (the means and variances of the Gaussian processes) are mutually independent, then estimate both probability densities progressively by iteration. In each iteration, fix one group of unknowns and sample the other, approximating its probability distribution from a large number of samples. Finally, multiply the distribution of the weight coefficients by that of the model parameters to obtain the joint posterior probability function; marginalizing the joint density yields estimates of the distributions of the weight coefficients and of the model parameters respectively. At this point the mixed-Gaussian random-process model structure is determined.
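As a simplified, testable stand-in for the training stage, the sketch below fits a diagonal-covariance Gaussian mixture to the augmented (source, target) feature matrix with plain expectation-maximization; the Markov chain Monte Carlo refinement of the weight coefficients and model parameters described above is deliberately omitted, and the deterministic initialization is an assumption.

```python
import numpy as np

def train_joint_gmm(z, n_mix=4, n_iter=50):
    """EM training of a diagonal-covariance Gaussian mixture on the
    augmented (source||target) feature matrix z (frames x dims).
    Returns the mixture weights, means, and variances."""
    n, d = z.shape
    # Deterministic initialization: means spread across the data,
    # variances set to the overall data variance.
    means = z[np.linspace(0, n - 1, n_mix).astype(int)].astype(float).copy()
    var = np.tile(z.var(axis=0) + 1e-6, (n_mix, 1))
    w = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        # E-step: responsibilities from log Gaussian densities.
        log_r = (np.log(w)[None, :]
                 - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)[None, :]
                 - 0.5 * np.sum((z[:, None, :] - means[None, :, :]) ** 2
                                / var[None, :, :], axis=2))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances.
        nk = r.sum(axis=0) + 1e-10
        w = nk / n
        means = (r.T @ z) / nk[:, None]
        var = (r.T @ (z ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return w, means, var
```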
The characteristic-parameter conversion proceeds as follows:
(d1) given the set of input observation vectors, use the structural parameters of the trained mixed-Gaussian random process to compute the membership-function value of the current speech frame, where the membership function is the ratio of normalized posterior weight coefficients;
(d2) from the membership values, determine to which Gaussian subcomponent the current speech belongs, then, within the subspace of each cluster, produce the corresponding output according to the definition of the Gaussian random process;
(d3) superpose the outputs of all components, weighted by the membership-function values, to obtain the mapped speech feature parameters.
The above is merely a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, it is not thereby limited. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or to modify it into equivalent embodiments. Any simple amendment, equivalent change, or modification made to the above embodiment according to the technical essence of the invention, insofar as it does not depart from the content of the technical solution, still falls within the scope of the technical solution of the present invention.

Claims (6)

1. A rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum, characterized in that its steps comprise characteristic-parameter extraction and synthesis, characteristic-parameter time alignment, and characteristic-parameter training and conversion;
the characteristic-parameter extraction decomposes the original speech signal, and the characteristic-parameter synthesis is the inverse of the extraction;
the characteristic-parameter time alignment arranges and screens the characteristic parameters of the conversion-source and conversion-target speech to obtain parameter sets that are synchronous in time;
the characteristic-parameter training learns the mapping relations between the feature-parameter sets of the conversion source and the conversion target, yielding a mapping rule; the characteristic-parameter conversion applies this mapping rule to convert the source speech into speech of the target speaker.
2. The rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum according to claim 1, characterized in that the characteristic-parameter extraction proceeds as follows:
(a1) frame the speech signal into 20 ms frames and estimate the fundamental frequency by a correlation method;
(a2) from the fundamental frequency, decide whether the frame is unvoiced or voiced; if the frame is voiced, set a maximum voiced frequency that divides the spectrum into a dominant harmonic region and a random region; estimate the discrete harmonic amplitudes and phases with a least-squares algorithm; interpolate the discrete harmonic amplitudes to obtain the spectral envelope;
(a3) if the frame is unvoiced, analyze it with linear-prediction analysis to obtain the linear-prediction coefficients;
(a4) warp the frequency axis of the spectral envelope nonlinearly to obtain mel frequencies; preset 24 mel-frequency values and take them as the means of the components of a Gaussian mixture model; take the logarithm of the envelope amplitude axis, sample it at the Gaussian means, and save the sampled values as the weight coefficients; following auditory-perception theory, make each weight coefficient approximately inversely proportional to the variance of its Gaussian distribution, thereby determining the variances of all mixture components in turn.
3. The rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum according to claim 2, characterized in that the characteristic-parameter synthesis proceeds as follows:
(b1) apply amplitude correction to the weight-coefficient sequence; the correction scale is determined by, and is approximately proportional to, the maximum of the distribution;
(b2) interpolate the corrected weight-coefficient sequence into a spectral-envelope curve with frequency on the abscissa and amplitude on the ordinate;
(b3) sample the spectral envelope of the voiced signal at multiples of the fundamental frequency to obtain the discrete harmonic amplitudes;
(b4) use the discrete harmonic amplitudes and phases of the voiced signal as the amplitudes and phases of sinusoids and superpose them; use interpolation and phase compensation so that the reconstructed time-domain waveform is not distorted;
(b5) pass an arbitrary white-noise signal through an all-pole filter to obtain an approximate reconstruction of the unvoiced signal;
(b6) superpose the voiced and unvoiced signals to obtain the reconstructed speech signal.
4. The rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum according to claim 3, characterized in that the characteristic-parameter time alignment proceeds as follows:
(c1) for the two unequal-length characteristic-parameter sequences of the conversion-source and conversion-target speech signals, use a dynamic time warping algorithm to map the time axis of one nonlinearly onto that of the other, establishing a one-to-one matching;
(c2) during the alignment of the parameter sets, iteratively optimize a preset cumulative distortion function under a constrained search region to obtain the final time-matching function.
5. The rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum according to claim 4, characterized in that the characteristic-parameter training proceeds as follows:
join the aligned source and target speech-signal feature parameters into an augmented matrix, preset the number of mixtures N, and learn the Gaussian-mixture-model parameters iteratively by the expectation-maximization rule; the means, variances, and weight coefficients of this mixture model are the parameters to be estimated; approximate the joint posterior probability density of the weight coefficients and the model parameters (the means and variances of the Gaussian processes) by Markov chain Monte Carlo, namely first assume that the weight coefficients and the model parameters are mutually independent, then estimate both probability densities progressively by iteration; in each iteration, fix one group of unknowns and sample the other, approximating its probability distribution from a large number of samples; finally, multiply the distribution of the weight coefficients by that of the model parameters to obtain the joint posterior probability function; marginalizing the joint density yields estimates of the distributions of the weight coefficients and of the model parameters respectively, and the mixed-Gaussian random-process model structure is thereby determined.
6. The rapid voice conversion method based on Gaussian mixture modeling of the vocal tract spectrum according to claim 5, characterized in that the characteristic-parameter conversion proceeds as follows:
(d1) given the set of input observation vectors, use the structural parameters of the trained mixed-Gaussian random process to compute the posterior membership value of the current speech frame;
(d2) within the subspace of each cluster's mixture component, compute the conditional expectation of the output variable and take its mean as the conversion output;
(d3) superpose the outputs of all components, weighted by the posterior membership values, to obtain the mapped speech feature parameters.
CN201410742549.8A 2014-12-08 2014-12-08 Sound track spectrum Gaussian mixture model based rapid voice conversion system and method Pending CN104392717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410742549.8A CN104392717A (en) 2014-12-08 2014-12-08 Sound track spectrum Gaussian mixture model based rapid voice conversion system and method


Publications (1)

Publication Number Publication Date
CN104392717A true CN104392717A (en) 2015-03-04

Family

ID=52610610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410742549.8A Pending CN104392717A (en) 2014-12-08 2014-12-08 Sound track spectrum Gaussian mixture model based rapid voice conversion system and method

Country Status (1)

Country Link
CN (1) CN104392717A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090171657A1 (en) * 2007-12-28 2009-07-02 Nokia Corporation Hybrid Approach in Voice Conversion
CN101751921A (en) * 2009-12-16 2010-06-23 南京邮电大学 Real-time voice conversion method under conditions of minimal amount of training data
KR20100108843A (en) * 2009-03-30 2010-10-08 한국과학기술원 Method of voice conversion based on gaussian mixture model using kernel principal component analysis
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN103531205A (en) * 2013-10-09 2014-01-22 常州工学院 Asymmetrical voice conversion method based on deep neural network feature mapping
CN104091592A (en) * 2014-07-02 2014-10-08 常州工学院 Voice conversion system based on hidden Gaussian random field


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105974416A (en) * 2016-07-26 2016-09-28 四川电子军工集团装备技术有限公司 Accumulation cross-correlation envelope alignment 8-core DSP on-chip parallel implementation method
CN105974416B (en) * 2016-07-26 2018-06-15 零八一电子集团有限公司 8-core DSP on-chip parallel implementation method for accumulated cross-correlation envelope alignment
CN108198566A (en) * 2018-01-24 2018-06-22 咪咕文化科技有限公司 Information processing method and device, electronic equipment and storage medium
CN109712634A (en) * 2018-12-24 2019-05-03 东北大学 Automatic voice conversion method
CN110070852A (en) * 2019-04-26 2019-07-30 平安科技(深圳)有限公司 Method, apparatus, device and storage medium for synthesizing Chinese speech
WO2020215551A1 (en) * 2019-04-26 2020-10-29 平安科技(深圳)有限公司 Chinese speech synthesizing method, apparatus and device, storage medium
CN110070852B (en) * 2019-04-26 2023-06-16 平安科技(深圳)有限公司 Method, device, equipment and storage medium for synthesizing Chinese voice
CN112116924A (en) * 2019-06-21 2020-12-22 株式会社日立制作所 Abnormal sound detection system, pseudo sound generation system, and pseudo sound generation method
CN112116924B (en) * 2019-06-21 2024-02-13 株式会社日立制作所 Abnormal sound detection system, pseudo sound generation system, and pseudo sound generation method
CN113066472A (en) * 2019-12-13 2021-07-02 科大讯飞股份有限公司 Synthetic speech processing method and related device
CN112652318A (en) * 2020-12-21 2021-04-13 北京捷通华声科技股份有限公司 Tone conversion method and device and electronic equipment
CN112652318B (en) * 2020-12-21 2024-03-29 北京捷通华声科技股份有限公司 Tone conversion method and device, and electronic equipment

Similar Documents

Publication Publication Date Title
CN104392717A (en) Sound track spectrum Gaussian mixture model based rapid voice conversion system and method
CN106653056B (en) Fundamental frequency extraction model and training method based on LSTM recurrent neural network
CN110111803B (en) Transfer learning voice enhancement method based on self-attention multi-kernel maximum mean difference
CN104091592B (en) Voice conversion system based on hidden Gaussian random field
CN102664003B (en) Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM)
CN104392718B (en) Robust speech recognition method based on acoustic model array
CN104538028A (en) Continuous voice recognition method based on deep long and short term memory recurrent neural network
CN104464725B (en) Method and apparatus for singing imitation
CN104464744A (en) Clustered voice conversion method and system based on Gaussian mixture random process
CN109767778A (en) Voice conversion method fusing Bi-LSTM and WaveNet
JP2009042716A (en) Cyclic signal processing method, cyclic signal conversion method, cyclic signal processing apparatus, and cyclic signal analysis method
CN110648684B (en) Bone conduction voice enhancement waveform generation method based on WaveNet
CN110189766B (en) Voice style transfer method based on neural network
CN113506562B (en) End-to-end voice synthesis method and system based on fusion of acoustic features and text emotional features
CN110047501A (en) Many-to-many voice conversion method based on beta-VAE
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN109300484B (en) Audio alignment method and device, computer equipment and readable storage medium
CN107785030B (en) Voice conversion method
CN112086100B (en) Quantization error entropy based urban noise identification method of multilayer random neural network
CN103886859B (en) Voice conversion method based on one-to-many codebook mapping
Wu et al. Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion.
CN105206259A (en) Voice conversion method
CN112562702B (en) Voice super-resolution method based on cyclic frame sequence gating cyclic unit network
CN115862590A (en) Text-driven speech synthesis method based on feature pyramid
CN112634914B (en) Neural network vocoder training method based on short-time spectrum consistency

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150304