CN102664003A - Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM) - Google Patents
- Publication number
- CN102664003A (application numbers CN2012101218866A / CN201210121886A)
- Authority
- CN
- China
- Prior art keywords
- frame
- signal
- voice
- harmonic
- residual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Telephonic Communication Services (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model (HNM), belonging to the field of speech signal processing. The method comprises the steps of pre-processing, voiced/unvoiced decision, harmonic parameter extraction, vocal tract spectrum parameter computation, establishment of a vocal tract spectrum conversion rule, feature parameter conversion, residual excitation prediction, speech synthesis and residual compensation. When the excitation signal is built, an appropriately scaled random component produced by HNM analysis is linearly superposed on the residual of the voiced-frame harmonic signal extracted by HNM analysis to form the predicted excitation source signal; this effectively strengthens the speaker's suprasegmental features carried by the excitation source and avoids the distortion that conventional methods introduce by manually modifying the excitation signal. In the synthesis stage, an appropriately scaled residual of the target voiced-frame harmonic signal obtained by HNM analysis is superposed frame by frame onto the synthesized speech, so that the converted speech carries more of the target speaker's individuality and the speech quality is improved.
Description
Technical field
The present invention relates to voice conversion techniques, in particular to a residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model, and belongs to the field of speech signal processing.
Background art
Voice conversion is a research branch of speech signal processing that has emerged in recent years. It builds on research in speaker recognition and speech synthesis, enriching and extending both branches, yet it does not fall entirely within either of them.
The goal of voice conversion is to change the personal characteristics in the source speaker's speech while keeping its semantic content unchanged, so that the converted speech sounds as if it were uttered by the target speaker. A voice conversion system operates in two phases: training and conversion. In the training phase, the system analyzes speech from the source and target speakers, extracts their parameters, and establishes a conversion rule. In the conversion phase, features are first extracted from the source speech and then mapped to target speech features according to the conversion rule obtained in training.
The features of a speech signal fall into two classes: segmental and suprasegmental. Segmental features describe the timbre of speech and mainly include the positions and bandwidths of the vocal tract formants, spectral tilt, fundamental frequency, and so on. Suprasegmental features describe the prosody and excitation-source information of speech; their parameters mainly include dynamic characteristics such as phoneme duration, energy, the variation contour of the pitch period, and the evolution of the spectral envelope.
The key problems of voice conversion are the extraction of the speaker's personal characteristics and the establishment of the conversion rule; two decades of development have produced a large body of research. Current research on speech feature parameters concentrates mainly on segmental features, while the suprasegmental characteristics of the excitation source are rarely addressed. The main existing approach to estimating the excitation source is residual prediction based on the linear prediction coding (LPC) model. However, when the residual signal obtained by linear prediction is used as the excitation, it carries little of the target speaker's individuality and has low energy, so the quality of the converted speech is poor. (1. Suendermann D., Bonafonte A., Ney H., Hoege H., "A Study on Residual Prediction Techniques for Voice Conversion", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 13-16, 2005. 2. Percybrooks W. S., Moore E., "Voice conversion with linear prediction residual estimation", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4673-4676, March 2008.)
In addition, existing voice conversion systems often compute the pitch scaling ratio from the mean fundamental frequency, or modify the excitation source signal manually by operations such as duration insertion and cutting. However, the suprasegmental characteristics of the excitation source depend on the speaker's momentary state and environment, so manually modifying the excitation signal cannot accurately describe the suprasegmental information of the speech and introduces distortion. (3. Xuejing Sun, "Voice quality conversion in TD-PSOLA speech synthesis", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II953-II956, 2000. 4. Wang Yuan-yuan, Yang Shun, "Speech synthesis based on PSOLA algorithm and modified pitch parameters", International Conference on Computational Problem-Solving (ICCP), pp. 296-299, 2010.)
Summary of the invention
The object of the present invention is to provide a voice conversion algorithm that, under the parallel-text condition, combines speech signal characteristics with the speaker's personal characteristics. It focuses on the extraction and prediction of the suprasegmental information of the excitation source, and enhances the target speaker's individuality in the synthesized speech and the performance of the conversion system through improvement of the excitation source signal and compensation of the converted speech.
To achieve the above object, the present invention adopts the following technical scheme:
A residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model, comprising the following concrete steps:
Step 1: pre-processing and voiced/unvoiced decision. Pre-emphasis, framing and windowing are applied to the source and target speech respectively; the short-time energy and average zero-crossing rate of each frame are computed to complete the voiced/unvoiced decision.
Step 2: extraction of the harmonic parameters. The voiced frames of the source and target speech are analyzed with the harmonic plus noise model (HNM). The fundamental frequency of each voiced frame is computed first; the HNM then decomposes the voiced frame into a harmonic signal and a wideband random signal, the number of harmonics is computed, and the amplitude, phase and frequency of each harmonic are extracted. Unvoiced frames are regarded as random noise and kept unchanged.
Step 3: computation of the vocal tract spectrum parameters. The amplitudes of the harmonics extracted from the voiced signals of the source and target speech are transformed: the squared amplitudes are taken as samples of the discrete power spectrum, the autocorrelation coefficients are obtained by the inverse fast Fourier transform (IFFT), and LPC analysis via the Levinson-Durbin algorithm yields the line spectral frequency (LSF) parameters and the corresponding residual signals of the source and target speech.
Step 4: establishment of the vocal tract spectrum conversion rule. After the LSF parameters of the source and target speech are aligned by dynamic time warping (DTW), they are fed into a Gaussian mixture model (GMM) for probabilistic modeling.
Step 5: conversion of the feature parameters. The speech to be converted is first analyzed by HNM; following the methods of Steps 2 and 3, its LSF parameters and residual signal are extracted, and the LSF parameters are converted through the GMM conversion rule established in Step 4.
Step 6: prediction of the residual excitation. For each frame, the target LSF parameter closest to the converted LSF parameter is found, and the residual signal corresponding to that target LSF parameter is linearly superposed with the frame's remaining random signal from the HNM analysis to form the residual excitation signal.
Step 7: speech synthesis and residual compensation. From the converted LSF parameters and the residual excitation signal obtained in Steps 5 and 6, each converted speech frame is generated with the LPC synthesis model; an appropriately scaled target residual signal is then superposed onto each converted frame, and the final synthesized speech is obtained after overlap-add.
Compared with the prior art, the present invention has two notable advantages. (1) When building the excitation signal, an appropriately scaled random component (the wideband random signal) produced by HNM analysis is linearly superposed on the residual of the voiced-frame harmonic signal extracted by HNM analysis to form the predicted excitation source signal; this effectively strengthens the speaker's suprasegmental features carried by the excitation source, while avoiding the distortion introduced by the manual excitation modification of conventional methods. (2) In the synthesis stage, an appropriately scaled residual of the target voiced-frame harmonic signal obtained by HNM analysis is superposed frame by frame onto the synthesized speech, so the converted speech carries more of the target speaker's individuality and the speech quality is improved.
The present invention is described in further detail below in conjunction with the accompanying drawings.
Description of drawings
Fig. 1 is a schematic diagram of the residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to the present invention;
Fig. 2 is a schematic diagram of feature parameter extraction and conversion rule establishment;
Fig. 3 is a schematic diagram of feature parameter conversion and HNM-based residual excitation prediction;
Fig. 4 is a schematic diagram of the parameter conversion and speech synthesis for the i-th voiced frame.
Embodiment
With reference to Fig. 1, the residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model proceeds as follows:
Step 1. In the training phase, pre-processing and the voiced/unvoiced decision are carried out first: pre-emphasis, framing and windowing are applied to the source and target speech respectively, the short-time energy and average zero-crossing rate of each frame are computed, and the voiced/unvoiced decision is completed. The detailed process is as follows:
(1) The source and target speech signals are pre-processed separately: the pre-emphasis factor is 0.96, the signal is split into non-overlapping 20 ms frames, and each frame is windowed with a Hamming window;
(2) The short-time zero-crossing rate of frame $i$ is
$$Z_i = \frac{1}{2}\sum_{m=1}^{N-1}\bigl|\operatorname{sgn}[x_i(m)] - \operatorname{sgn}[x_i(m-1)]\bigr|,$$
where $x_i(m)$ is the $i$-th windowed speech frame and $N$ is the frame length; together with the short-time energy, the voiced/unvoiced decision is made with the double-threshold method.
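The pre-processing and voiced/unvoiced decision above can be sketched as follows. This is a minimal illustration: the function names and the energy/zero-crossing thresholds are assumptions, and the double-threshold scheme is reduced to a single pair of thresholds.

```python
import numpy as np

def preprocess_frames(x, fs, frame_ms=20.0, alpha=0.96):
    """Pre-emphasis (factor 0.96), non-overlapping 20 ms framing, Hamming window."""
    x = np.append(x[0], x[1:] - alpha * x[:-1])   # pre-emphasis H(z) = 1 - alpha*z^-1
    n = int(fs * frame_ms / 1000)                 # samples per frame, zero overlap
    frames = x[: len(x) // n * n].reshape(-1, n)
    return frames * np.hamming(n)

def short_time_features(frames):
    """Per-frame short-time energy and zero-crossing rate Z_i."""
    energy = np.sum(frames ** 2, axis=1)
    zcr = 0.5 * np.sum(np.abs(np.diff(np.sign(frames), axis=1)), axis=1)
    return energy, zcr

def voiced_mask(frames, e_thresh, z_thresh):
    """Simplified double-threshold decision: a voiced frame has high
    short-time energy and a low zero-crossing rate (thresholds assumed)."""
    energy, zcr = short_time_features(frames)
    return (energy > e_thresh) & (zcr < z_thresh)
```

A voiced tone produces high-energy, low-ZCR frames while white noise produces the opposite, which is exactly what the double-threshold decision exploits.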
Step 2. Extraction of the harmonic parameters, as shown in Fig. 2. The voiced frames of the source and target speech are analyzed with the HNM model: the fundamental frequency of each voiced frame is computed first, the HNM then decomposes the voiced frame into a harmonic signal and a wideband random signal, the number of harmonics is computed, and the amplitude, phase and frequency of each harmonic are extracted; unvoiced frames are regarded as random noise and kept unchanged. The detailed process is as follows:
(1) The fundamental frequency $f_0$ of the current frame of the source and target speech is computed with the normalized cross-correlation method;
(2) The source and target speech are analyzed separately. If the current frame $s(n)$, $1 \le n \le N$ ($N$ the frame length), is voiced, it is decomposed into a harmonic component $s_h(n)$ and a random component $e(n)$. First the number of harmonics is determined as $L = \lfloor f_s / (2 f_0) \rfloor$, where $f_s$ is the sampling frequency. The objective function is
$$\varepsilon = \sum_{n=1}^{N} w^2(n)\,\Bigl|s(n) - \sum_{l=-L}^{L} C_l\, e^{j 2\pi l f_0 n / f_s}\Bigr|^2,$$
where $w(n)$ denotes the Hamming window. The complex amplitudes $\{C_l,\ l = -L, -L+1, \ldots, L\}$ are estimated under the least-squares criterion; the real amplitudes of the harmonic components are then $A_l = 2|C_l| = 2|C_{-l}|$ and the phases are $\varphi_l = \arg(C_l)$;
(3) Between two adjacent frames, $\{A_l\}$ and $\{\varphi_l\}$ are interpolated into time-varying quantities $\{A_l(n)\}$ and $\{\varphi_l(n)\}$; likewise the harmonic number $L$ is linearly interpolated into $\{L(n)\}$. Suppose the two adjacent frames are frame $k$ and frame $k+1$, with centers at samples $n = kN$ and $n = (k+1)N$ respectively; the amplitudes and the harmonic number are interpolated linearly and the phases with a cubic polynomial. The harmonic part of a frame can therefore be expressed as
$$s_h(n) = \sum_{l=1}^{L(n)} A_l(n)\cos\varphi_l(n),$$
and the remaining random signal as $e(n) = s(n) - s_h(n)$;
(4) If the current frame is unvoiced it is regarded as random noise; since unvoiced sounds carry little speaker information, the unvoiced signal is kept unchanged.
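The least-squares harmonic estimation of step (2) can be sketched numerically as follows. This is an illustrative implementation under the stated model: the solver choice (`numpy.linalg.lstsq`) and the Nyquist guard are assumptions, and the inter-frame interpolation of step (3) is omitted.

```python
import numpy as np

def hnm_analyze_frame(s, f0, fs):
    """Least-squares harmonic analysis of one voiced frame (HNM sketch).
    Returns harmonic amplitudes A_l, phases phi_l, the harmonic part
    s_h(n) and the residual random part e(n) = s(n) - s_h(n)."""
    N = len(s)
    n = np.arange(N)
    L = int(fs / (2 * f0))
    if L * f0 >= fs / 2:          # keep the top harmonic strictly below Nyquist
        L -= 1
    w = np.hamming(N)             # analysis weighting window
    ls = np.arange(-L, L + 1)     # complex exponential basis, l = -L..L
    E = np.exp(2j * np.pi * np.outer(n, ls) * f0 / fs)
    # weighted least squares: minimise sum_n w(n)^2 |s(n) - (E c)(n)|^2
    C, *_ = np.linalg.lstsq(E * w[:, None], s * w, rcond=None)
    A = 2 * np.abs(C[ls > 0])     # real amplitudes A_l = 2|C_l|
    phi = np.angle(C[ls > 0])     # phases phi_l = arg(C_l)
    s_h = np.real(E @ C)          # reconstructed harmonic part
    return A, phi, s_h, s - s_h
```

On a synthetic two-harmonic frame the recovered amplitudes and phases match the generating parameters and the residual is numerically zero, since the signal lies in the harmonic span.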
Step 3. Computation of the vocal tract spectrum parameters, as shown in Fig. 2. The amplitudes of the harmonics extracted from the voiced signals of the source and target speech are transformed: the squared amplitudes are taken as samples of the discrete power spectrum, the autocorrelation coefficients are obtained via the IFFT, and LPC analysis via the Levinson-Durbin algorithm yields the LSF parameters and the corresponding residual signals of the source and target speech. The detailed process (performed frame by frame) is as follows:
(1) The squares of the $L$ discrete amplitude values $A_l$ are computed and taken as samples $P(\omega_l)$ of the discrete power spectrum, where $\omega_l = 2\pi l f_0$ is the angular frequency of the $l$-th harmonic;
(2) $P(\omega_l)$ is transformed by the IFFT to obtain the autocorrelation coefficients $R(n)$; the $P$-th order LPC coefficients $\{a_j,\ j = 1, 2, \ldots, P\}$ are obtained with the Levinson-Durbin algorithm and further converted into LSF parameters;
(3) A linear prediction inverse filter is constructed from the LPC coefficients, with transfer function
$$A(z) = 1 - \sum_{j=1}^{P} a_j z^{-j};$$
passing the speech through $A(z)$ yields the residual signal of the frame after LPC analysis.
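The chain "squared harmonic amplitudes → discrete power spectrum → IFFT autocorrelation → Levinson-Durbin → LPC → inverse-filter residual" can be sketched as follows. The FFT length, model order, bin mapping and the small spectral floor (added to keep the recursion well-conditioned) are assumptions; the LPC-to-LSF conversion itself is omitted.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> LPC polynomial
    A(z) = 1 + a_1 z^-1 + ... + a_P z^-P (returned as the array a)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err   # reflection coefficient
        a[1:i + 1] += k * np.concatenate((a[1:i][::-1], [1.0]))
        err *= 1.0 - k * k
    return a

def lpc_from_harmonics(A_l, f0_bin=1, order=10, nfft=256, floor=1e-4):
    """Squared harmonic amplitudes as discrete power-spectrum samples,
    IFFT -> autocorrelation, Levinson-Durbin -> LPC coefficients.
    Harmonic l is mapped to FFT bin l*f0_bin (illustrative mapping)."""
    P = np.full(nfft, floor)          # small floor keeps r positive definite
    L = len(A_l)
    bins = f0_bin * np.arange(1, L + 1)
    P[bins] = A_l ** 2
    P[nfft - bins] = A_l ** 2         # mirror: real, even power spectrum
    r = np.real(np.fft.ifft(P))       # autocorrelation sequence
    return levinson_durbin(r, order)

def lpc_residual(s, a):
    """Residual of one frame through the inverse filter A(z)."""
    return np.convolve(s, a)[: len(s)]
```

Because the power spectrum is kept strictly positive, the recursion is guaranteed to produce a minimum-phase (stable) inverse filter, which is what the later synthesis stage relies on.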
Step 4. Establishment of the vocal tract spectrum conversion rule, as shown in Fig. 2. After the LSF parameters of the source and target speech are aligned by DTW, they are fed into the GMM for probabilistic modeling. The detailed process is as follows:
(1) The LSF parameters extracted from the voiced-frame harmonics of the source and target speech are time-aligned by DTW, and the indices of the aligned LSF vectors returned by DTW are recorded;
(2) According to these indices, the residual signals of the source voiced-frame harmonics are aligned with those of the target speech; likewise, the remaining random signals of the source and target voiced frames from the HNM analysis are aligned;
(3) The aligned source and target LSF parameters are concatenated into joint vectors and fed into the GMM to establish the vocal tract spectrum conversion function.
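A minimal DTW alignment sketch for step (1), returning the index pairs that steps (2)-(3) reuse to align residuals and build joint vectors. The quadratic-time implementation and function names are illustrative assumptions; in practice the GMM of step (3) could then be fit on the joint vectors, e.g. with scikit-learn's `GaussianMixture`.

```python
import numpy as np

def dtw_align(X, Y):
    """Dynamic time warping between two LSF sequences (frames x dim).
    Returns the list of aligned (i, j) index pairs along the optimal path."""
    nx, ny = len(X), len(Y)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # local distances
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], nx, ny            # backtrack the optimal warping path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin((D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def joint_vectors(X, Y, path):
    """Stack aligned source/target LSF pairs into joint vectors for GMM training."""
    return np.array([np.concatenate((X[i], Y[j])) for i, j in path])
```

The recorded index pairs are exactly what the method needs to align the harmonic residuals and the HNM random components frame by frame.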
Step 5. Conversion of the feature parameters, as shown in Fig. 3. The speech to be converted is first analyzed by HNM; following the methods of Steps 2 and 3, its LSF parameters and residual signal are extracted, and the LSF parameters are converted through the GMM conversion rule established in Step 4. The detailed process is as follows:
(1) The speech signal to be converted is pre-processed and framed as described above; the harmonic parameters are extracted by HNM analysis, and the vocal tract spectrum parameters are computed and converted into LSF parameters;
(2) Each frame's LSF parameters are mapped through the trained GMM rule to obtain the converted LSF parameters.
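For step (2), the conversion function of a single Gaussian component reduces to the conditional mean of a joint Gaussian over [source LSF; target LSF]; a full GMM mapping posterior-weights M such terms. A sketch under that single-component simplification, with the parameters assumed already trained:

```python
import numpy as np

def gmm_convert_single(x, mu, Sigma):
    """Conditional-mean conversion for one joint-Gaussian component:
    E[y|x] = mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x),
    where the joint vector is z = [x; y] with mean mu, covariance Sigma."""
    d = len(x)
    mu_x, mu_y = mu[:d], mu[d:]
    Sxx = Sigma[:d, :d]
    Syx = Sigma[d:, :d]
    return mu_y + Syx @ np.linalg.solve(Sxx, x - mu_x)
```

With a joint covariance encoding y = 2x (plus noise), the converted value of x = 1.5 is its conditional mean 3.0, which is the regression behaviour the GMM rule generalizes piecewise.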
Step 6. Prediction of the residual excitation, as shown in Fig. 3. For each frame, the target LSF parameter closest to the converted LSF parameter is found; the residual signal corresponding to that target LSF parameter is then linearly superposed with the frame's remaining random signal from the HNM analysis to form the residual excitation signal. The detailed process is as follows:
(1) For each converted frame, the closest target LSF parameter is found, and the residual signal corresponding to that target LSF parameter and the frame's remaining random signal from the HNM analysis are determined;
(2) The target residual signal and the remaining random signal from the HNM analysis are linearly superposed to form the residual excitation signal.
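Steps (1)-(2) amount to a nearest-neighbour lookup in the stored target LSF table followed by a linear superposition; a minimal sketch (array layouts and names are assumptions):

```python
import numpy as np

def predict_residual(lsf_conv, target_lsfs, target_residuals, frame_noise):
    """Residual-excitation prediction for one converted frame: find the
    nearest target LSF vector, take its stored harmonic residual, and
    linearly superpose the frame's own HNM random component."""
    dists = np.linalg.norm(target_lsfs - lsf_conv, axis=1)
    k = int(np.argmin(dists))
    return target_residuals[k] + frame_noise, k
```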
Step 7. Speech synthesis and residual compensation, as shown in Fig. 4. From the converted LSF parameters and the residual excitation signal obtained in Steps 5 and 6, each converted speech frame is generated with the LPC synthesis model; an appropriately scaled target residual signal is then superposed onto each converted frame, and the final synthesized speech is obtained after overlap-add. The detailed process is as follows:
(1) The converted LSF parameters obtained above are transformed back into LPC coefficients, a synthesis filter is built frame by frame from the LPC coefficients, and the predicted residual excitation signal is passed through this filter to obtain the converted speech;
(2) An appropriately scaled target residual signal is superposed onto each converted speech frame. Experimental experience shows the residual generally needs moderate amplification: during compensation the residual signal can be amplified to 2-5 times its original level. The frames are then spliced to obtain the final synthesized speech.
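The synthesis and compensation of steps (1)-(2) can be sketched as follows, with the LSF-to-LPC conversion assumed already done and a compensation gain of 3 chosen as an illustrative midpoint of the 2-5x range stated above:

```python
import numpy as np

def synth_frame(a, excitation):
    """All-pole synthesis filter 1/A(z): s(n) = e(n) - sum_j a_j * s(n-j)."""
    s = np.zeros(len(excitation))
    P = len(a) - 1
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, min(P, n) + 1):
            acc -= a[j] * s[n - j]
        s[n] = acc
    return s

def compensate_and_splice(frames, target_residuals, gain=3.0):
    """Superpose the amplified target residual on each converted frame
    (2-5x amplification per the description; gain is an assumption),
    then splice the frames into the output signal."""
    return np.concatenate([f + gain * r for f, r in zip(frames, target_residuals)])
```

Driving the filter built from A(z) = 1 - 0.5 z^{-1} with an impulse reproduces the geometric impulse response 1, 0.5, 0.25, ..., confirming the recursion implements 1/A(z).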
Claims (8)
1. A residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model, characterized by comprising the following steps:
Step 1: pre-processing and voiced/unvoiced decision — pre-emphasis, framing and windowing are applied to the source and target speech respectively, and the short-time energy and average zero-crossing rate of each frame are computed to complete the voiced/unvoiced decision;
Step 2: extraction of the harmonic parameters — the voiced frames of the source and target speech are analyzed with the HNM model; the fundamental frequency of each voiced frame is computed first, the HNM then decomposes the voiced frame into a harmonic signal and a wideband random signal, the number of harmonics is computed, and the amplitude, phase and frequency of each harmonic are extracted; unvoiced frames are regarded as random noise and kept unchanged;
Step 3: computation of the vocal tract spectrum parameters — the amplitudes of the harmonics extracted from the voiced signals of the source and target speech are transformed; the squared amplitudes are taken as samples of the discrete power spectrum, the autocorrelation coefficients are obtained via the IFFT, and LPC analysis via the Levinson-Durbin algorithm yields the LSF parameters and the corresponding residual signals of the source and target speech;
Step 4: establishment of the vocal tract spectrum conversion rule — after the LSF parameters of the source and target speech are aligned by DTW, they are fed into the GMM for probabilistic modeling;
Step 5: conversion of the feature parameters — the speech to be converted is first analyzed by HNM; following the methods of Steps 2 and 3, its LSF parameters and residual signal are extracted, and the LSF parameters are converted through the GMM conversion rule established in Step 4;
Step 6: prediction of the residual excitation — for each frame, the target LSF parameter closest to the converted LSF parameter is found, and the residual signal corresponding to that target LSF parameter is linearly superposed with the frame's remaining random signal from the HNM analysis to form the residual excitation signal;
Step 7: speech synthesis and residual compensation — from the converted LSF parameters and the residual excitation signal obtained in Steps 5 and 6, each converted speech frame is generated with the LPC synthesis model; an appropriately scaled target residual signal is then superposed onto each converted frame, and the final synthesized speech is obtained after overlap-add.
2. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the detailed process of pre-processing and the voiced/unvoiced decision is as follows:
Step 1: the source and target speech signals are pre-processed separately; the pre-emphasis factor is 0.96, the signal is split into non-overlapping 20 ms frames, and each frame is windowed with a Hamming window;
Step 2: the short-time zero-crossing rate is $Z_i = \frac{1}{2}\sum_{m=1}^{N-1}|\operatorname{sgn}[x_i(m)] - \operatorname{sgn}[x_i(m-1)]|$, where $x_i(m)$ is the $i$-th windowed speech frame and $N$ is the frame length; the voiced/unvoiced decision is made with the double-threshold method.
3. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the extraction process of the harmonic parameters is as follows:
Step 1: the fundamental frequency $f_0$ of the current frame of the source and target speech is computed with the normalized cross-correlation method;
Step 2: the source and target speech are analyzed separately; if the current frame $s(n)$, $1 \le n \le N$ ($N$ the frame length), is voiced, it is decomposed into a harmonic component $s_h(n)$ and a random component $e(n)$: first the number of harmonics is determined as $L = \lfloor f_s/(2f_0) \rfloor$, where $f_s$ is the sampling frequency; the objective function is $\varepsilon = \sum_{n} w^2(n)\,|s(n) - \sum_{l=-L}^{L} C_l e^{j 2\pi l f_0 n / f_s}|^2$, where $w(n)$ denotes the Hamming window; the complex amplitudes $\{C_l,\ l = -L, -L+1, \ldots, L\}$ are estimated under the least-squares criterion, the real amplitudes of the harmonic components are $A_l = 2|C_l| = 2|C_{-l}|$ and the phases are $\varphi_l = \arg(C_l)$; then, between adjacent frames, $\{A_l\}$ and $\{\varphi_l\}$ are interpolated into time-varying quantities $\{A_l(n)\}$ and $\{\varphi_l(n)\}$, and likewise the harmonic number $L$ is linearly interpolated into $\{L(n)\}$; the remaining random signal is then expressed as $e(n) = s(n) - s_h(n)$, with $s_h(n) = \sum_{l=1}^{L(n)} A_l(n)\cos\varphi_l(n)$;
Step 3: if the current frame is unvoiced it is regarded as random noise; since unvoiced sounds carry little speaker information, the unvoiced signal is kept unchanged.
4. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the frame-by-frame computation of the vocal tract spectrum parameters is as follows:
Step 1: the squares of the $L$ discrete amplitude values $A_l$ are computed and taken as samples $P(\omega_l)$ of the discrete power spectrum, where $\omega_l = 2\pi l f_0$ is the angular frequency of the $l$-th harmonic;
Step 2: $P(\omega_l)$ is transformed by the IFFT to obtain the autocorrelation coefficients $R(n)$; the $P$-th order LPC coefficients $\{a_j,\ j = 1, 2, \ldots, P\}$ are obtained through the Levinson-Durbin algorithm and further converted into LSF parameters.
5. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the detailed process of establishing the vocal tract spectrum conversion rule is as follows:
Step 1: the LSF parameters extracted from the voiced-frame harmonics of the source and target speech are time-aligned by DTW, and the indices of the aligned LSF vectors returned by DTW are recorded;
Step 2: according to these indices, the residual signals of the source voiced-frame harmonics are aligned with those of the target speech; likewise, the remaining random signals of the source and target voiced frames from the HNM analysis are aligned;
Step 3: the aligned source and target LSF parameters are concatenated into joint vectors and fed into the GMM to establish the vocal tract spectrum conversion function.
6. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the detailed process of feature parameter conversion is as follows:
Step 1: the speech signal to be converted is pre-processed and framed; the harmonic parameters are extracted by HNM analysis, and the vocal tract spectrum parameters are computed and converted into LSF parameters;
Step 2: each frame's LSF parameters are mapped through the trained GMM rule to obtain the converted LSF parameters.
7. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the prediction process of the residual excitation is as follows:
Step 1: for each converted frame, the closest target LSF parameter is found, and the residual signal corresponding to that target LSF parameter and the frame's remaining random signal from the HNM analysis are determined;
Step 2: the target residual signal and the remaining random signal from the HNM analysis are linearly superposed to form the residual excitation signal.
8. The residual excitation signal synthesis and voice conversion method based on the harmonic plus noise model according to claim 1, characterized in that the detailed process of speech synthesis and residual compensation is as follows:
Step 1: the converted LSF parameters are transformed into LPC coefficients, a synthesis filter is built frame by frame from the LPC coefficients, and the predicted residual excitation signal is passed through this filter to obtain the converted speech;
Step 2: an appropriately scaled target residual signal is superposed onto each converted speech frame, and the frames are spliced to obtain the final synthesized speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101218866A CN102664003B (en) | 2012-04-24 | 2012-04-24 | Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM) |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102664003A true CN102664003A (en) | 2012-09-12 |
CN102664003B CN102664003B (en) | 2013-12-04 |
Family
ID=46773469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012101218866A Expired - Fee Related CN102664003B (en) | 2012-04-24 | 2012-04-24 | Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM) |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102664003B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004088633A1 (en) * | 2003-03-27 | 2004-10-14 | France Telecom | Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method |
TW201001396A (en) * | 2008-06-26 | 2010-01-01 | National Taiwan University of Science and Technology | Method for synthesizing speech |
CN102063899A (en) * | 2010-10-27 | 2011-05-18 | Nanjing University of Posts and Telecommunications | Method for voice conversion under non-parallel text condition |
Non-Patent Citations (2)
Title |
---|
Winston S. Percybrooks et al.: "Voice Conversion with Linear Prediction Residual Estimation", IEEE International Conference on Acoustics, Speech and Signal Processing, 4 April 2008 (2008-04-04), pages 4673 - 4676 * |
易立夫 et al.: "Chinese Speech Synthesis System Based on the HNM Algorithm", Proceedings of the 6th National Conference on Modern Phonetics (Part II), 20 October 2003 (2003-10-20), pages 528 - 533 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10347275B2 (en) | 2013-09-09 | 2019-07-09 | Huawei Technologies Co., Ltd. | Unvoiced/voiced decision for speech processing |
CN105359211A (en) * | 2013-09-09 | 2016-02-24 | Huawei Technologies Co., Ltd. | Unvoiced/voiced decision for speech processing |
CN103489443A (en) * | 2013-09-17 | 2014-01-01 | Hunan University | Method and device for imitating sound |
US10734003B2 (en) | 2014-04-08 | 2020-08-04 | Huawei Technologies Co., Ltd. | Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system |
US9728195B2 (en) | 2014-04-08 | 2017-08-08 | Huawei Technologies Co., Ltd. | Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system |
US10134406B2 (en) | 2014-04-08 | 2018-11-20 | Huawei Technologies Co., Ltd. | Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system |
WO2015154397A1 (en) * | 2014-04-08 | 2015-10-15 | Huawei Technologies Co., Ltd. | Noise signal processing and generation method, encoder/decoder and encoding/decoding system |
CN106486129A (en) * | 2014-06-27 | 2017-03-08 | Huawei Technologies Co., Ltd. | Audio coding method and apparatus |
US11133016B2 (en) | 2014-06-27 | 2021-09-28 | Huawei Technologies Co., Ltd. | Audio coding method and apparatus |
US10460741B2 (en) | 2014-06-27 | 2019-10-29 | Huawei Technologies Co., Ltd. | Audio coding method and apparatus |
CN106486129B (en) * | 2014-06-27 | 2019-10-25 | Huawei Technologies Co., Ltd. | Audio coding method and apparatus |
CN106098073A (en) * | 2016-05-23 | 2016-11-09 | Soochow University | End-to-end speech encryption and decryption system based on spectrum mapping |
CN106656882A (en) * | 2016-11-29 | 2017-05-10 | Institute of Acoustics, Chinese Academy of Sciences | Signal synthesis method and system |
CN106656882B (en) * | 2016-11-29 | 2019-05-10 | Institute of Acoustics, Chinese Academy of Sciences | Signal synthesis method and system |
CN107134277A (en) * | 2017-06-15 | 2017-09-05 | Shenzhen Grandstream Networks Technology Co., Ltd. | Voice activity detection method based on a GMM model |
CN111418005A (en) * | 2017-11-29 | 2020-07-14 | Yamaha Corporation | Speech synthesis method, speech synthesis device, and program |
CN111418005B (en) * | 2017-11-29 | 2023-08-11 | Yamaha Corporation | Speech synthesis method, speech synthesis device, and storage medium |
CN108281150A (en) * | 2018-01-29 | 2018-07-13 | Shanghai Taiyige Rehabilitation Medical Technology Co., Ltd. | Voice-breaking voice conversion method based on a derivative glottal flow model |
CN108510991A (en) * | 2018-03-30 | 2018-09-07 | Xiamen University | Speaker identification method using harmonic series |
CN108766450B (en) * | 2018-04-16 | 2023-02-17 | Hangzhou Dianzi University | Voice conversion method based on harmonic impulse decomposition |
CN108766450A (en) * | 2018-04-16 | 2018-11-06 | Hangzhou Dianzi University | Voice conversion method based on harmonic impulse decomposition |
CN108899008A (en) * | 2018-06-13 | 2018-11-27 | Unit 91977 of the Chinese People's Liberation Army | Method and system for simulating interference of noise in air voice communication |
CN108899008B (en) * | 2018-06-13 | 2023-04-18 | Unit 91977 of the Chinese People's Liberation Army | Method and system for simulating interference of noise in air voice communication |
CN109065068A (en) * | 2018-08-17 | 2018-12-21 | Guangzhou Kugou Computer Technology Co., Ltd. | Audio processing method, device, and storage medium |
CN109003621A (en) * | 2018-09-06 | 2018-12-14 | Guangzhou Kugou Computer Technology Co., Ltd. | Audio processing method, device, and storage medium |
CN109712634A (en) * | 2018-12-24 | 2019-05-03 | Northeastern University | Automatic voice conversion method |
CN110444192A (en) * | 2019-08-15 | 2019-11-12 | Guangzhou Keyue Information Technology Co., Ltd. | Intelligent voice robot based on speech technology |
CN113241089A (en) * | 2021-04-16 | 2021-08-10 | Vivo Mobile Communication Co., Ltd. | Voice signal enhancement method and device, and electronic device |
CN113241089B (en) * | 2021-04-16 | 2024-02-23 | Vivo Mobile Communication Co., Ltd. | Voice signal enhancement method and device, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN102664003B (en) | 2013-12-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102664003B (en) | Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM) | |
Dave | Feature extraction methods LPC, PLP and MFCC in speech recognition | |
US20150025892A1 (en) | Method and system for template-based personalized singing synthesis | |
Song et al. | ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems | |
CN1815552B (en) | Spectrum modeling and speech enhancement method based on line spectral frequencies and their inter-order difference parameters | |
CN101527141B (en) | Method of converting whispered voice into normal voice based on radial basis function neural network | |
CN110648684B (en) | Bone conduction voice enhancement waveform generation method based on WaveNet | |
CN102184731A (en) | Method for converting emotional speech by combining rhythm parameters with tone parameters | |
CN103021418A (en) | Voice conversion method oriented to multi-time-scale prosodic features | |
CN103714822B (en) | Sub-band coding and decoding method and device based on the SILK codec | |
CN106782599A (en) | Voice conversion method based on Gaussian-process output post-filtering | |
Erro et al. | MFCC+ F0 extraction and waveform reconstruction using HNM: preliminary results in an HMM-based synthesizer | |
Oura et al. | Deep neural network based real-time speech vocoder with periodic and aperiodic inputs | |
CN105654941A (en) | Voice change method and device based on the voice-change ratio parameters of a specific target speaker | |
CN103854655B (en) | Low-bit-rate speech coder and decoder | |
CN103886859B (en) | Voice conversion method based on one-to-many codebook mapping | |
CN102231275B (en) | Embedded speech synthesis method based on weighted mixed excitation | |
Raju et al. | Application of prosody modification for speech recognition in different emotion conditions | |
Othmane et al. | Enhancement of esophageal speech using voice conversion techniques | |
CN115862590A (en) | Text-driven speech synthesis method based on characteristic pyramid | |
Xie et al. | Pitch transformation in neural network based voice conversion | |
CN114913844A (en) | Broadcast language identification method for pitch normalization reconstruction | |
Gentet et al. | Neutral to lombard speech conversion with deep learning | |
Jung et al. | Pitch alteration technique in speech synthesis system | |
Kawahara et al. | Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20131204 Termination date: 20160424 |
|