CN103345920B - Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation - Google Patents


Info

Publication number
CN103345920B
CN103345920B (granted publication of application CN201310211046.3A)
Authority
CN
China
Prior art keywords
dictionary
model
voice
ksvd
mel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310211046.3A
Other languages
Chinese (zh)
Other versions
CN103345920A (en)
Inventor
汤一彬 (Tang Yibin)
沈媛 (Shen Yuan)
朱昌平 (Zhu Changping)
周浩 (Zhou Hao)
高远 (Gao Yuan)
单鸣雷 (Shan Minglei)
姚澄 (Yao Cheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University filed Critical Changzhou Campus of Hohai University
Priority to CN201310211046.3A priority Critical patent/CN103345920B/en
Publication of CN103345920A publication Critical patent/CN103345920A/en
Application granted granted Critical
Publication of CN103345920B publication Critical patent/CN103345920B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention belongs to the field of speech signal processing and discloses a voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation. The method explicitly addresses compression of the model parameters: after the smooth power spectrum is extracted in the speech-analysis stage, the Mel-KSVD method represents the extracted smooth power-spectrum parameters with sparse coefficients, and during this sparse representation the dictionary is continuously updated through an adaptive dictionary-learning strategy while the sparse coefficients are optimized. Simulation results show that, even with fewer sparse coefficients, the synthesized speech quality of the proposed model is overall equal to or better than that of the traditional model, and for male speech it even surpasses the traditional K-SVD sparse-representation model. In addition, the model achieves better speech-synthesis quality than a Mel-frequency cepstral coefficient (MFCC) compression model.

Description

Voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation
Technical field
The invention belongs to the field of speech signal processing and relates to a voice conversion and reconstruction model, in particular to a voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation.
Background technology
Speech parameterization and reconstruction is an important and challenging problem, and the corresponding speech analysis-synthesis systems are widely used in many fields, such as speech coding and voice conversion.
In a paper on voice conversion and reconstruction based on an adaptive-interpolation weighted-spectrum model, published in April 1999, H. Kawahara et al. presented a model (STRAIGHT) that abandons the glottis/vocal-tract structure of traditional speech models and directly extracts the power spectrum of speech, achieving high-quality synthesis. It has gradually become the mainstream speech analysis-synthesis model and is widely used in speech synthesis, voice conversion, and related applications. Like a vocoder, it characterizes the speech signal with the source-filter paradigm: the signal is regarded as the output of an excitation passed through a time-varying linear filter. After the power spectrum of each frame is obtained by analysis, it is smoothed in the time-frequency domain while being oversampled along both the time and frequency axes, which guarantees high-quality reconstruction of the speech at the synthesis stage.
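As an illustration of the analysis stage only, the sketch below computes per-frame power spectra and applies a crude frequency-domain smoothing. The Hanning window, moving-average kernel, and parameter defaults are assumptions for illustration; STRAIGHT's actual pitch-adaptive time-frequency smoothing is considerably more elaborate.

```python
import numpy as np

def frame_power_spectra(signal, fs=8000, frame_ms=30, shift_ms=1, nfft=1024):
    """Per-frame power spectra with a simple spectral smoothing pass.

    Illustrative only: STRAIGHT uses pitch-adaptive time-frequency
    smoothing; here a Hanning window and a moving average stand in for it.
    """
    frame_len = int(fs * frame_ms / 1000)
    shift = max(1, int(fs * shift_ms / 1000))
    window = np.hanning(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame, nfft)) ** 2
        # crude smoothing along frequency (stand-in for STRAIGHT's smoothing)
        kernel = np.ones(5) / 5.0
        spectra.append(np.convolve(power, kernel, mode='same'))
    return np.array(spectra).T  # columns y_k form the matrix Y

```

With the paper's settings (8 kHz sampling, 30 ms frames, 1 ms shift, 1024-point FFT), each column of the returned matrix is one smooth power-spectrum vector y_k.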
In recent years, sparse representation theory has developed rapidly and has been applied in many areas, such as image denoising, blind source separation, and speech enhancement. In all of these applications the goal is to obtain the sparse coefficients in the sparse domain that characterize the internal features of the signal. The STRAIGHT model itself, however, has some defects: the smooth power-spectrum envelope parameters it extracts contain considerable redundant information, and the model deserves further refinement. Yet few researchers have focused on improving the STRAIGHT model; how to combine it with sparse representation theory and further compress the model parameters has therefore become a key issue limiting the model's further application and development.
Summary of the invention
The object of the invention is to overcome the above problems and provide a voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation which, while keeping the synthesized speech quality essentially unchanged, combines the STRAIGHT model with sparse representation theory, further compresses the model output parameters, reduces the number of transmitted parameters and the computational load of the STRAIGHT model, and thereby improves the synthesis quality of the speech.
The technical scheme of the invention starts from the following observation: STRAIGHT is a power-spectrum-based speech model, and its smooth power-spectrum parameter is a power spectrum after time-frequency compensation that contains a certain amount of redundant information. The model's output parameters are therefore compressed by the Mel-KSVD method and sparsely represented, and speech is finally synthesized from the resulting sparse coefficients, which reduces the number of transmitted parameters and the computational load of the STRAIGHT model.
Technical scheme of the present invention is as follows:
A voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation, characterized in that the Mel-KSVD method is used to sparsely represent the smooth power-spectrum parameters extracted by the STRAIGHT analysis model, comprising the following steps:
(1) The speech signal to be synthesized is input, and its smooth spectrum is extracted by the STRAIGHT analysis model: a time-frequency compensation method first extracts the power spectrum, which is then given low-band compensation and over-smoothing compensation; finally the silent frames of the power spectrum are processed to obtain the smooth power spectrum, whose parameters form a data matrix Y = [y_1, …, y_M];
(2) The extracted smooth power-spectrum parameters are passed through the Mel filterbank for dictionary training, and the Mel-KSVD algorithm then optimizes the parameters D and X in

min_{D,X} ( ||M(Y − DX)||_F^2 + λ Σ_{i=1}^{M} ||x_i||_0 ), subject to ||M(Y − DX)||_F^2 ≤ ε,

where M is the coefficient matrix of the Mel filterbank, Y = [y_1, …, y_M] is the power-spectrum parameter matrix, D = [d_1, …, d_K] is the target training dictionary, d_i denotes a dictionary atom, x_k is the sparse vector obtained by projecting y_k onto D, X = [x_1, …, x_M], ||·||_F is the Frobenius norm, and ||·||_0 is the 0-norm;
(3) Using the optimized target training dictionary D̂, the Mel filterbank, and the Mel-KSVD algorithm, the smooth-spectrum parameters of the speech to be synthesized obtained by the STRAIGHT analysis model are sparsely represented to yield the sparse vectors x_k; the resulting sparse coefficient matrix X = [x_1, …, x_M] is passed to the STRAIGHT synthesis model, which synthesizes speech from the estimated power-spectrum parameter matrix Ŷ, computed as ŷ_k = D̂ x_k, k = 1, 2, …, M.
In a further technical scheme, the algorithm of step (2) carries out the optimization of D and X in the above formula, subject to ||M(Y − DX)||_F^2 ≤ ε, as follows:
(2a) In the dictionary-training stage the target dictionary D is tied to the reconstruction error; the product MD in the objective function is regarded as a composite dictionary D_eq, and the optimization of the atom d_eq,k in D_eq reduces to

<d_eq,k, δ_k> = argmin_{d_eq,k, δ_k} ||E_eq,k − d_eq,k δ_k||_F^2,

where E_eq,k is the reconstruction error with the contribution of atom k removed, d_eq,k is the k-th column of D_eq, and δ_k is the k-th row of X;
(2b) Singular value decomposition is applied to the above problem:

E_eq,k = U Σ V^T,
d̃_eq,k = U(:,1),
δ̃_k = Σ(1,1) · V(:,1),

where U and V are unitary matrices, Σ is the diagonal matrix of singular values of E_eq,k, U(:,1) and V(:,1) are the first columns of U and V, and Σ(1,1) is the largest singular value;
The best dictionary atom is thus the updated d̃_eq,k, with the corresponding coefficient row updated to δ̃_k;
When, for all k = 1, 2, …, M, the iteration of sparse coding and dictionary updating leaves the reconstruction error essentially unchanged, the optimization of D stops; the dictionary obtained at that point is the best dictionary D̂. The sparse coefficient matrix X = [x_1, …, x_M] and the corresponding dictionary D̂ are passed to step (3); otherwise steps (2a) and (2b) are repeated.
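Step (2b) can be checked numerically: by the Eckart-Young theorem, the rank-1 product of the updated atom and coefficient row is the closest rank-1 matrix to E_eq,k in Frobenius norm. A minimal sketch, using a random matrix as a stand-in for E_eq,k:

```python
import numpy as np

def rank1_update(E):
    """Step (2b): best rank-1 factorization of the error matrix E_eq,k.

    Returns the updated atom d~_eq,k = U(:,1) and coefficient row
    delta~_k = Sigma(1,1) * V(:,1) from the SVD E = U Sigma V^T.
    """
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    return U[:, 0], S[0] * Vt[0, :]

# random stand-in for the restricted reconstruction error E_eq,k
E = np.random.default_rng(42).standard_normal((6, 9))
d_new, delta_new = rank1_update(E)
S = np.linalg.svd(E, compute_uv=False)
# residual energy equals the sum of the squared remaining singular values
residual = np.linalg.norm(E - np.outer(d_new, delta_new))
```

Because the updated atom is a left singular vector, it is automatically unit-norm, which is the normalization K-SVD requires of dictionary atoms.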
Beneficial effects achieved by the invention:
The invention combines the Mel-KSVD sparse representation method with the adaptive-interpolation weighted-spectrum (STRAIGHT) voice conversion and reconstruction model. By further compressing the smooth power-spectrum parameters extracted by the STRAIGHT analysis model, the sparse coefficients obtained through Mel-KSVD sparse representation are passed to the STRAIGHT synthesis model to reconstruct and synthesize the speech signal. Compared with the traditional K-SVD model, the proposed model achieves broadly comparable synthesized speech quality and is even better on male speech. In addition, because the smooth power-spectrum parameters are compressed after the analysis stage, the number of transmitted parameters is reduced and the computational load of the model is greatly decreased.
Accompanying drawing explanation
Fig. 1 is a block diagram of the voice conversion and reconstruction model of the adaptive-interpolation weighted spectrum based on Mel-KSVD sparse representation according to the invention;
Fig. 2 shows spectrograms of male and female speech synthesized by the invention, the first row for male speech and the second row for female speech;
Fig. 3 is a statistical chart of speech quality comparing the method of the invention with two other methods;
Fig. 4 is a statistical chart of synthesized speech quality for different numbers of dictionary atoms in the invention.
Embodiment
The voice conversion and reconstruction method of the adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation according to the invention is further elaborated below with reference to the accompanying drawings.
As shown in Fig. 1, the method first extracts the power-spectrum parameters of the training speech signal with the STRAIGHT analysis model, then uses the Mel-KSVD method to train the dictionary D adaptively while sparsely representing the power-spectrum parameters, iteratively updating the dictionary D and the sparse vectors x_k until the reconstruction error stabilizes below a given threshold, at which point the target dictionary D̂ and the sparse vectors x_k are output. The obtained target dictionary D̂ and sparse vectors x_k are then passed to the STRAIGHT synthesis model, which synthesizes the speech.
As shown in Fig. 1, the voice conversion and reconstruction method of the adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation comprises the following steps:
(1) The speech signal to be synthesized is input and its smooth spectrum is extracted by the STRAIGHT analysis model: a time-frequency compensation method first extracts the power spectrum, which is then given low-band compensation and over-smoothing compensation; finally its silent frames are processed to obtain the smooth power spectrum, whose parameters form a data matrix Y = [y_1, …, y_M].
(2) After the extracted smooth power spectrum passes through the Mel filterbank, the Mel-KSVD algorithm optimizes the parameters D and X in

min_{D,X} ( ||M(Y − DX)||_F^2 + λ Σ_{i=1}^{M} ||x_i||_0 ), subject to ||M(Y − DX)||_F^2 ≤ ε,

where M is the coefficient matrix of the Mel filterbank, Y = [y_1, …, y_M] is the power-spectrum parameter matrix, D = [d_1, …, d_K] is the target training dictionary, d_i denotes a dictionary atom, x_k is the sparse vector obtained by projecting y_k onto D, X = [x_1, …, x_M], ||·||_F is the Frobenius norm, and ||·||_0 is the 0-norm.
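The coefficient matrix M of the Mel filterbank can be built as a bank of triangular filters on the Mel scale. The patent does not specify the exact filter shapes, so the construction below (a common MFCC-style design) is an assumption:

```python
import numpy as np

def mel_filterbank(n_filters=70, nfft=1024, fs=8000):
    """Triangular Mel filterbank coefficient matrix M (n_filters x bins).

    A standard construction assumed for illustration; 70 filters matches
    the setting used in the experiments.
    """
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_bins = nfft // 2 + 1
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bin_pts = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    M = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        for b in range(left, center):        # rising edge of triangle i
            M[i - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):       # falling edge of triangle i
            M[i - 1, b] = (right - b) / max(right - center, 1)
    return M
```

Because the filters are dense at low frequencies and sparse at high ones, weighting the error by M emphasizes the low band, which is consistent with the low-band behavior reported in the experiments below.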
According to step (2) above, the Mel-KSVD algorithm carries out the optimization of D and X in the formula, subject to ||M(Y − DX)||_F^2 ≤ ε, as follows:
(2a) In the dictionary-training stage the target dictionary D is tied to the reconstruction error. The product MD in the objective function is here regarded as a composite dictionary D_eq, and the optimization of the atom d_eq,k in D_eq can be reduced to

<d_eq,k, δ_k> = argmin_{d_eq,k, δ_k} ||E_eq,k − d_eq,k δ_k||_F^2,

where E_eq,k is the reconstruction error with the contribution of atom k removed, d_eq,k is the k-th column of D_eq, and δ_k is the k-th row of X.
(2b) Singular value decomposition (SVD) is applied to the above problem:

E_eq,k = U Σ V^T,
d̃_eq,k = U(:,1),
δ̃_k = Σ(1,1) · V(:,1),

where U and V are unitary matrices, Σ is the diagonal matrix of singular values of E_eq,k, U(:,1) and V(:,1) are the first columns of U and V, and Σ(1,1) is the largest singular value.
Therefore the best dictionary atom is the updated d̃_eq,k, with the corresponding coefficient row updated to δ̃_k.
When, for all k = 1, 2, …, M, the iteration of sparse coding and dictionary updating leaves the reconstruction error almost unchanged, the optimization of the dictionary D stops; the dictionary obtained at that point is the best dictionary D̂. The sparse coefficient matrix X = [x_1, …, x_M] and the corresponding dictionary D̂ are passed to step (3); otherwise steps (2a) and (2b) are repeated.
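Putting steps (2a)-(2b) together, the loop below sketches the Mel-KSVD alternation: orthogonal matching pursuit (OMP, a standard sparse-coding choice the patent does not name explicitly) over the composite dictionary D_eq = MD, followed by the SVD atom update. Dimensions and hyperparameters are illustrative, not the patent's values:

```python
import numpy as np

def omp(D, y, k_sparse):
    """Orthogonal Matching Pursuit: sparse-code y over dictionary D."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k_sparse):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def mel_ksvd(Y, M, n_atoms=8, k_sparse=3, n_iter=10, seed=0):
    """Sketch of the Mel-KSVD loop of steps (2a)-(2b).

    Treats M @ D as the composite dictionary D_eq and alternates OMP
    sparse coding with rank-1 SVD atom updates.
    """
    rng = np.random.default_rng(seed)
    Z = M @ Y                                   # Mel-weighted observations
    Deq = rng.standard_normal((Z.shape[0], n_atoms))
    Deq /= np.linalg.norm(Deq, axis=0)
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # sparse-coding stage
        for j in range(Y.shape[1]):
            X[:, j] = omp(Deq, Z[:, j], k_sparse)
        # dictionary-update stage, steps (2a)-(2b)
        for k in range(n_atoms):
            users = np.nonzero(X[k, :])[0]      # columns that use atom k
            if users.size == 0:
                continue
            E = (Z[:, users] - Deq @ X[:, users]
                 + np.outer(Deq[:, k], X[k, users]))
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            Deq[:, k] = U[:, 0]                 # d~_eq,k = U(:,1)
            X[k, users] = S[0] * Vt[0, :]       # delta~_k = Sigma(1,1)*V(:,1)
    return Deq, X
```

In practice the loop would terminate on the stabilization of ||M(Y − DX)||_F^2 rather than on a fixed iteration count; a fixed count keeps the sketch short.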
(3) Using the target training dictionary D̂ optimized by the dictionary-training module and the sparse vectors x_k, the sparse coefficient matrix X = [x_1, …, x_M] is passed to the STRAIGHT synthesis model to synthesize the speech. During synthesis the power-spectrum parameter matrix is estimated as Ŷ, with solution formula ŷ_k = D̂ x_k, k = 1, 2, …, M.
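The synthesis-side estimate of step (3) then reduces to a single matrix product, ŷ_k = D̂ x_k. A minimal sketch (the clipping to non-negative values is an added safeguard for power-spectrum validity, not stated in the patent):

```python
import numpy as np

def reconstruct_power_spectra(D_hat, X):
    """Step (3): estimate the power-spectrum matrix as y^_k = D^ x_k.

    Column k of the result is the smooth power spectrum handed to the
    STRAIGHT synthesis model; negative values (possible with an
    unconstrained dictionary) are clipped to zero.
    """
    Y_hat = D_hat @ X
    return np.maximum(Y_hat, 0.0)
```

Only X (and the fixed dictionary D̂) needs to be transmitted, which is the parameter-compression benefit the invention claims.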
The effect of the invention can be further illustrated by the following experiment:
1) experiment condition
In this experiment speech from the TIMIT corpus is used as experimental data; the sampling rate is 8 kHz, the frame length 30 ms, the frame shift 1 ms, and spectral analysis uses a 1024-point fast Fourier transform (FFT). Matlab R2011b serves as the simulation tool, on a computer with an Intel i5-3210 CPU and 4 GB of RAM.
2) experiment content
The experiment synthesizes male and female speech with the Mel-frequency cepstral coefficient (MFCC) compression method, the K-SVD sparse representation algorithm, and the proposed Mel-KSVD sparse representation algorithm, respectively, and compares the spectrograms of the speech synthesized by these methods with the spectrogram of the original speech. The synthesized-speech-quality statistics take the speech synthesized by the original STRAIGHT model as the baseline. Finally, speech-quality statistics and comparisons are also made for the proposed Mel-KSVD algorithm under different numbers of dictionary atoms.
First, the spectrograms of the synthesized male and female speech are compared, with results shown in Fig. 2: Figs. 2(a) and (e) are the original male and female speech, Figs. 2(b) and (f) the speech synthesized by the MFCC compression method, Figs. 2(c) and (g) the speech synthesized by the K-SVD sparse representation algorithm, and Figs. 2(d) and (h) the speech synthesized by the proposed Mel-KSVD sparse representation algorithm; the number of filters for MFCC and Mel-KSVD is set to 70, and the number of dictionary atoms for K-SVD and Mel-KSVD is set to 70.
Second, the quality of male and female speech synthesized by the three methods above is compared, with the number of filters for MFCC and Mel-KSVD set to 70 and the number of dictionary atoms for K-SVD and Mel-KSVD set to 90; the results are shown in Fig. 3.
Finally, the quality of male and female speech synthesized by the proposed Mel-KSVD algorithm under different numbers of dictionary atoms is compared, with the number of Mel-KSVD filters set to 70; the results are shown in Fig. 4.
3) interpretation
As can be seen from Fig. 2, compared with the MFCC compression method, both the proposed Mel-KSVD method and the traditional K-SVD algorithm synthesize low-band speech better, as indicated by the circled regions in the figure. As Fig. 2 also shows, the low-band synthesis of the Mel-KSVD method is comparable to that of the traditional K-SVD algorithm, mainly because the Mel filters are relatively dense in the low band. However, for male speech, whose harmonics are strong and regular, the invention outperforms the traditional K-SVD algorithm in the high band; for female speech, whose harmonics vary more, enhanced harmonics can make the synthesized speech sound mechanical, so the female speech quality produced by the invention is only slightly above that of speech synthesized by the traditional K-SVD algorithm.
In the speech-quality statistics for the different synthesis methods in Fig. 3, Perceptual Evaluation of Speech Quality (PESQ) is used as the objective evaluation index. As Fig. 3 shows, compared with the original STRAIGHT model, the proposed Mel-KSVD algorithm obtains higher PESQ scores for both male and female speech, an improvement of about 0.05 in each case. The smooth spectrum extracted by the STRAIGHT model is an accurate spectrogram in which the noise part is estimated from adjacent harmonics, i.e. noise is introduced into the extracted smooth spectrum; since the goal of sparse representation is to recover the principal components of the signal while ignoring noise-like components, much as in noise reduction, the proposed algorithm handles the noise introduced into the smooth spectrum comparatively well. As shown in Fig. 3, the synthesized speech quality of the invention is also better than that of the MFCC compression method, with PESQ improvements of nearly 0.1 for male speech and nearly 0.05 for female speech. And because the invention introduces auditory-perception-based Mel filters, its synthesized speech quality is slightly better than that of the traditional K-SVD method as well.
As can be seen from Fig. 4, with different numbers of dictionary atoms (30, 50, 70, and 90), the synthesized speech quality of the Mel-KSVD-based STRAIGHT model differs, and differs between male and female speech. For male speech the synthesis quality keeps improving as the number of atoms grows: with 90 dictionary atoms used to update the dictionary and sparsely represent the power-spectrum parameters, the synthesized quality is best, nearly 0.1 higher than with 30 atoms, because the sparse representation becomes more and more accurate as the number of atoms increases. For female speech, however, Fig. 4 shows that the synthesis is best with 70 atoms, and beyond 70 adding atoms actually degrades the synthesized quality, because an insufficiently sparse representation introduces too many noise components.
The above is only a preferred embodiment of the invention. It should be pointed out that those skilled in the art can make improvements and variations without departing from the technical principles of the invention, and such improvements and variations shall also be regarded as falling within the protection scope of the invention.

Claims (1)

1. A voice conversion and reconstruction method for an adaptive-interpolation weighted-spectrum model based on Mel-KSVD sparse representation, characterized in that the Mel-KSVD method is used to sparsely represent the smooth power-spectrum parameters extracted by the STRAIGHT analysis model, comprising the following steps:
(1) the speech signal to be synthesized is input and its smooth spectrum is extracted by the STRAIGHT analysis model: a time-frequency compensation method first extracts the power spectrum, which is then given low-band compensation and over-smoothing compensation; finally the silent frames of the power spectrum are processed to obtain the smooth power spectrum, whose parameters form a data matrix Y = [y_1, …, y_M];
(2) the extracted smooth power-spectrum parameters are passed through the Mel filterbank for dictionary training, and the Mel-KSVD algorithm then optimizes the parameters D and X in

min_{D,X} ( ||M(Y − DX)||_F^2 + λ Σ_{i=1}^{M} ||x_i||_0 ), subject to ||M(Y − DX)||_F^2 ≤ ε,

where M is the coefficient matrix of the Mel filterbank, Y = [y_1, …, y_M] is the power-spectrum parameter matrix, D = [d_1, …, d_K] is the target training dictionary, d_i denotes a dictionary atom, x_k is the sparse vector obtained by projecting y_k onto D, X = [x_1, …, x_M], ε is the reconstruction-error threshold, ||·||_F is the Frobenius norm, and ||·||_0 is the 0-norm;
(3) using the optimized target training dictionary D̂, the Mel filterbank, and the Mel-KSVD algorithm, the smooth-spectrum parameters of the speech to be synthesized obtained by the STRAIGHT analysis model are sparsely represented to yield the sparse vectors x_k; the resulting sparse coefficient matrix X = [x_1, …, x_M] is passed to the STRAIGHT synthesis model, which synthesizes speech from the estimated power-spectrum parameter matrix Ŷ, computed as ŷ_k = D̂ x_k, k = 1, 2, …, M;
the algorithm of step (2) carries out the optimization of D and X in the formula, subject to ||M(Y − DX)||_F^2 ≤ ε, as follows:
(2a) in the dictionary-training stage the target dictionary D is tied to the reconstruction error; the product MD in the objective function is regarded as a composite dictionary D_eq, and the optimization of the atom d_eq,k in D_eq reduces to

<d_eq,k, δ_k> = argmin_{d_eq,k, δ_k} ||E_eq,k − d_eq,k δ_k||_F^2,

where E_eq,k is the reconstruction error with the contribution of atom k removed, d_eq,k is the k-th column of D_eq, and δ_k is the k-th row of X;
(2b) singular value decomposition is applied to the above problem:

E_eq,k = U Σ V^T,
d̃_eq,k = U(:,1),
δ̃_k = Σ(1,1) · V(:,1),

where U and V are unitary matrices, Σ is the diagonal matrix of singular values of E_eq,k, U(:,1) and V(:,1) are the first columns of U and V, and Σ(1,1) is the largest singular value;
the best dictionary atom is thus the updated d̃_eq,k, with the corresponding coefficient row updated to δ̃_k;
when, for all k = 1, 2, …, M, the iteration of sparse coding and dictionary updating leaves the reconstruction error essentially unchanged, the optimization of D stops; the dictionary obtained at that point is the best dictionary D̂; the sparse coefficient matrix X = [x_1, …, x_M] and the corresponding dictionary D̂ are passed to step (3); otherwise steps (2a) and (2b) are repeated.
CN201310211046.3A 2013-05-29 2013-05-29 Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation Expired - Fee Related CN103345920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310211046.3A CN103345920B (en) 2013-05-29 2013-05-29 Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation


Publications (2)

Publication Number Publication Date
CN103345920A CN103345920A (en) 2013-10-09
CN103345920B true CN103345920B (en) 2015-07-15

Family

ID=49280711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310211046.3A Expired - Fee Related CN103345920B (en) 2013-05-29 2013-05-29 Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation

Country Status (1)

Country Link
CN (1) CN103345920B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240717B (en) * 2014-09-17 2017-04-26 河海大学常州校区 Voice enhancement method based on combination of sparse code and ideal binary system mask
CN106782599A (en) * 2016-12-21 2017-05-31 河海大学常州校区 The phonetics transfer method of post filtering is exported based on Gaussian process
CN108766450B (en) * 2018-04-16 2023-02-17 杭州电子科技大学 Voice conversion method based on harmonic impulse decomposition
CN110853679B (en) * 2019-10-23 2022-06-28 百度在线网络技术(北京)有限公司 Speech synthesis evaluation method and device, electronic equipment and readable storage medium
CN111507418B (en) * 2020-04-21 2022-09-06 中国科学技术大学 Encaustic tile quality detection method
CN117459187B (en) * 2023-12-25 2024-03-12 深圳市迈威数字电视器材有限公司 High-speed data transmission method based on optical fiber network

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101950559A (en) * 2010-07-05 2011-01-19 李华东 Method for synthesizing continuous speech with large vocabulary and terminal equipment
CN102664021A (en) * 2012-04-20 2012-09-12 河海大学常州校区 Low-rate speech coding method based on speech power spectrum
CN102930863A (en) * 2012-10-19 2013-02-13 河海大学常州校区 Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP3515039B2 (en) * 2000-03-03 2004-04-05 沖電気工業株式会社 Pitch pattern control method in text-to-speech converter
JP5195652B2 (en) * 2008-06-11 2013-05-08 ソニー株式会社 Signal processing apparatus, signal processing method, and program


Also Published As

Publication number Publication date
CN103345920A (en) 2013-10-09

Similar Documents

Publication Publication Date Title
CN103345920B (en) Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation
Cooke et al. Intelligibility-enhancing speech modifications: the hurricane challenge.
CN105957537B (en) One kind being based on L1/2The speech de-noising method and system of sparse constraint convolution Non-negative Matrix Factorization
CN108986834A (en) The blind Enhancement Method of bone conduction voice based on codec framework and recurrent neural network
CN110648684B (en) Bone conduction voice enhancement waveform generation method based on WaveNet
CN108447495A (en) A kind of deep learning sound enhancement method based on comprehensive characteristics collection
CN103531205A (en) Asymmetrical voice conversion method based on deep neural network feature mapping
CN1815552B (en) Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter
CN105023580A (en) Unsupervised noise estimation and speech enhancement method based on separable deep automatic encoding technology
CN104900229A (en) Method for extracting mixed characteristic parameters of voice signals
CN101853661A (en) Noise spectrum estimation and voice mobility detection method based on unsupervised learning
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN104217730A (en) Artificial speech bandwidth expansion method and device based on K-SVD
CN107274887A Speaker's Further Feature Extraction method based on fusion feature MGFCC
Sadasivan et al. Joint dictionary training for bandwidth extension of speech signals
CN103093757B (en) Conversion method for conversion from narrow-band code stream to wide-band code stream
Takamichi et al. Parameter generation algorithm considering modulation spectrum for HMM-based speech synthesis
CN104240717B (en) Voice enhancement method based on combination of sparse code and ideal binary system mask
CN105679321A (en) Speech recognition method and device and terminal
CN113096680A (en) Far-field speech recognition method
Katsir et al. Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation
He et al. Spectrum enhancement with sparse coding for robust speech recognition
Lian et al. Whisper to normal speech based on deep neural networks with MCC and F0 features
Liu et al. Spectral envelope estimation used for audio bandwidth extension based on RBF neural network
Zheng et al. Bandwidth extension WaveNet for bone-conducted speech enhancement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150715

Termination date: 20200529
