CN102214463A - Imbedded voice synthesis method based on adaptive weighted spectrum interpolation coefficient - Google Patents

Imbedded voice synthesis method based on adaptive weighted spectrum interpolation coefficient

Info

Publication number
CN102214463A
Authority
CN
China
Prior art keywords
spectrum
coefficient
STRAIGHT
synthesis
vocal tract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110145478XA
Other languages
Chinese (zh)
Inventor
王朝民
那兴宇
谢湘
何娅玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YUYIN TIANXIA TECHNOLOGY Co Ltd
Original Assignee
BEIJING YUYIN TIANXIA TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING YUYIN TIANXIA TECHNOLOGY Co Ltd filed Critical BEIJING YUYIN TIANXIA TECHNOLOGY Co Ltd
Priority to CN201110145478XA priority Critical patent/CN102214463A/en
Publication of CN102214463A publication Critical patent/CN102214463A/en
Pending legal-status Critical Current


Abstract

The invention discloses an embedded speech synthesis method based on adaptive weighted spectrum interpolation (STRAIGHT) coefficients, used in an embedded operating system to convert arbitrary received text into speech output. The method comprises the following steps: at the training end, extracting the pitch-adaptive weighted spectrum interpolation (STRAIGHT) spectrum from the speech signal, extracting vocal tract spectral feature coefficients from the STRAIGHT spectrum, and modeling and training the feature coefficients with HTS; at the synthesis end, after the feature coefficient sequence is generated from the models, producing the synthetic speech with a conventional parametric synthesizer. With this method, synthetic speech quality comparable to that of the STRAIGHT synthesizer is obtained, while the STRAIGHT synthesizer is replaced at the synthesis end by a conventional parametric synthesizer, greatly increasing synthesis speed and making embedded application feasible.

Description

An embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients
Technical field
The present invention relates generally to an embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients, especially for terminal devices with limited storage and computational resources.
Background technology
With the rapid growth of the mobile Internet and the Internet of Things, embedded devices such as mobile phones and e-book terminals are steadily becoming people's most direct channels for obtaining and processing daily information, and speech is the most direct and natural means of interaction. The development of embedded speech synthesis technology is therefore an inevitable trend with urgent market demand.
The aim of speech synthesis technology is to faithfully reproduce the human voice, that is, to let machines imitate characteristics such as human timbre, speaking style, and prosody. Traditional speech synthesis was built on concatenative methods over large-scale corpora; the approach is technically simple, yields high-quality output, and was once widely adopted. However, its unit database is large, and although the footprint can be reduced through clustering, coding, and compression, the sound quality suffers and flexibility declines. In recent years, statistical parametric synthesis based on large-scale corpora has therefore been widely studied. Its basic idea is to represent a large raw speech database parametrically and model it statistically; at synthesis time, models are selected according to specified rules to form a model sequence, the parameter sequence of the target utterance is computed, and speech that meets the requirements is generated by a parametric synthesis method. Speech synthesized by statistical parametric modeling has high naturalness and intelligibility, and HMM-based speech synthesis is currently the most widely studied and adopted variant. The choice of speech feature parameters determines the quality of the synthetic speech to a large degree; the features generally include excitation parameters and vocal tract spectrum parameters. Vocal tract spectral coefficients are usually extracted from the short-time Fourier transform (STFT) spectrum, and synthesis can then be completed directly by a conventional parametric synthesizer (such as a cepstral filter or a linear prediction filter) at the synthesis end with good quality. The STRAIGHT (adaptive weighted spectrum interpolation) speech analysis-synthesis algorithm proposed in recent years removes the time-domain and frequency-domain periodicity from the STFT spectrum to obtain a smooth spectrum free of periodic disturbance, and can synthesize more natural, higher-quality speech. However, the STRAIGHT synthesizer performs vocal tract filtering by spectral convolution, which is computationally very expensive; for terminal devices with limited computation and storage it cannot meet practical requirements.
Therefore, an improved method is needed that can realize a parametric speech synthesis system with low computational cost on an embedded platform, while approaching the quality of STRAIGHT-synthesized speech.
Summary of the invention
The technical problem to be solved by this invention is to provide an embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients, which realizes a parametric speech synthesis system with low computational cost on an embedded platform while approaching the quality of STRAIGHT-synthesized speech.
To achieve the above object, the present invention provides an embedded speech synthesis method based on adaptive weighted spectrum interpolation (STRAIGHT spectrum) coefficients, used in an embedded operating system to convert arbitrary received text into speech output. It obtains synthetic speech quality comparable to that of the STRAIGHT synthesizer, while replacing the STRAIGHT synthesizer at the synthesis end with a conventional parametric synthesizer, greatly increasing synthesis speed and making embedded application feasible. A speech synthesis system using this method is divided into the following two parts:
A. Training part: first extract the STRAIGHT spectrum from the speech signal, then extract vocal tract spectral feature coefficients from the STRAIGHT spectrum, and then model and train the feature coefficients with HTS;
B. Synthesis part: after the feature coefficient sequence is computed from the models, obtain the synthetic speech with a conventional parametric synthesizer.
In the above embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients, the vocal tract spectral feature coefficient extraction process at the training end is divided into the following four steps:
A. Extract parameters from the speech signals in the training speech database: fundamental frequency, gain, and STRAIGHT spectrum;
B. Extract vocal tract spectral feature coefficients from the resulting STRAIGHT spectrum;
C. Combine the gain with the vocal tract spectral feature coefficients into new vocal tract spectral feature coefficients;
D. Train HMM models on the fundamental frequency and the new vocal tract spectral coefficients together as the feature parameter sequence.
In the above embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients, the synthesizer's speech generation process at the synthesis end is divided into the following three steps:
A. Generate the fundamental frequency and vocal tract spectral coefficient sequences from the models by the parameter generation algorithm;
B. Generate the excitation of the synthetic speech from the fundamental frequency sequence;
C. Pass the excitation and the vocal tract spectral coefficient sequence through a conventional parametric synthesizer to obtain the synthetic speech.
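Step B above, generating the excitation from the fundamental frequency sequence, can be sketched as follows. This is a minimal illustration of the usual pulse-train/noise scheme, not the patent's exact implementation: the frame length, the energy normalization, and the rule "F0 = 0 marks an unvoiced frame" are my own assumptions.

```python
import numpy as np

def make_excitation(f0_per_frame, frame_len=80, fs=16000):
    """Build a frame-by-frame excitation signal from an F0 contour.

    f0_per_frame: F0 in Hz for each frame; 0 marks an unvoiced frame.
    frame_len: samples per frame (80 samples = 5 ms at 16 kHz).
    """
    rng = np.random.default_rng(0)
    out = np.zeros(len(f0_per_frame) * frame_len)
    next_pulse = 0.0  # running phase of the pulse train, in samples
    for i, f0 in enumerate(f0_per_frame):
        start = i * frame_len
        if f0 <= 0:   # unvoiced frame: white noise excitation
            out[start:start + frame_len] = rng.standard_normal(frame_len)
            next_pulse = start + frame_len  # restart pulse phase after noise
        else:         # voiced frame: pulses spaced by one pitch period
            period = fs / f0
            while next_pulse < start + frame_len:
                out[int(next_pulse)] = np.sqrt(period)  # energy-normalized pulse
                next_pulse += period
    return out
```

The pulse amplitude sqrt(period) keeps the average excitation energy per sample roughly constant across voiced and unvoiced regions, a common convention in parametric vocoders.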
An embedded speech synthesis system built according to this method can fully realize, on an embedded platform, a parametric speech synthesis system with relatively low computational cost, while approaching the quality of STRAIGHT-synthesized speech.
The present invention is further described below in conjunction with the drawings and embodiments; the detailed description of each component of the system will better explain the steps and processes of the invention.
Description of drawings
Figure 1 is a block diagram of the HMM-based speech synthesis system.
Figure 2 is a schematic of LSP coefficient extraction from the STRAIGHT spectrum.
Figure 3 is a schematic of speech synthesis by the parametric synthesizer.
In the figures: 1. speech corpus database; 2. excitation parameter extraction; 3. HMM model training; 4. HMM model set; 5. parameter generation from HMM models; 6. text analysis; 7. excitation generation; 8. synthesis filtering; 9. vocal tract spectrum parameter extraction; 10. speech signal; 11. excitation parameters; 12. vocal tract spectrum parameters; 13. synthetic speech; 14. synthesis text; 15. training part; 16. synthesis part; 17. labeled text; 18. training-end feature parameter extraction; 19. speech signal data; 20. TANDEM-STRAIGHT analysis; 21. STRAIGHT spectrum; 22. LSP parameters; 23. gain; 24. new LSP parameters; 25. lsp[0]; 26. fundamental frequency; 27. lsp2lpc; 28. LPC filter; 29. synthesis-end parameter synthesis filtering; 30. excitation.
Embodiment
As shown in Figure 1, in an embodiment of the invention the speech synthesis system is deployed on an embedded operating system and comprises a training end and a synthesis end. The model training part is used only offline, solely to generate the compressed model set required for the speech synthesis system to operate; the synthesis part (16) runs on the device. Because the invention focuses on parameter extraction and synthesis, while text labeling (17), text analysis (6), modeling, training, and parameter generation are not its focus, the description below highlights parameter extraction at the training end and the choice of synthesis filter for parameter reconstruction at the synthesis end. This embodiment selects LSP (line spectral pair) parameters as the vocal tract spectrum parameters (12) and an LPC filter (28) as the synthesis filter; the speech data is sampled at 16 kHz.
Feature parameter extraction at the training end (see Fig. 2):
Step 1: Perform time-domain stable power spectrum estimation (the TANDEM-STRAIGHT algorithm) on the training speech data to obtain the fundamental frequency, the STRAIGHT spectrum (21), and the gain (23).
Step 2: Use a generalized cepstral analysis algorithm to extract LPC coefficients from the STRAIGHT spectrum (21), in which the mel-generalized cepstrum is used to transform the spectral coefficients. The transfer function of the generalized cepstrum is:
$$H(z) \;=\; s_\gamma^{-1}\!\left(\sum_{m=0}^{M} c_{\alpha,\gamma}(m)\,\tilde{z}^{-m}\right) \;=\;
\begin{cases}
\left(1+\gamma\displaystyle\sum_{m=0}^{M} c_{\alpha,\gamma}(m)\,\tilde{z}^{-m}\right)^{1/\gamma}, & -1\le\gamma<0;\\[6pt]
\exp\displaystyle\sum_{m=0}^{M} c_{\alpha,\gamma}(m)\,\tilde{z}^{-m}, & \gamma=0;
\end{cases}$$
where $c_{\alpha,\gamma}(m)$ are the mel-generalized cepstral coefficients, $\alpha$ represents the frequency warping, and $\gamma$ controls the representation accuracy of zeros and poles.
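The inverse generalized logarithm $s_\gamma^{-1}$ in the formula can be stated directly in code. The sketch below is my own illustration (not code from the patent) of how $\gamma$ interpolates between the all-pole, linear-prediction-style form at $\gamma=-1$ and the exponential, cepstral form at $\gamma=0$:

```python
import numpy as np

def s_gamma_inv(w, gamma):
    """Inverse generalized logarithm of the mel-generalized cepstrum:
    (1 + gamma*w)**(1/gamma) for -1 <= gamma < 0, and exp(w) for gamma = 0."""
    if gamma == 0:
        return np.exp(w)
    return (1.0 + gamma * w) ** (1.0 / gamma)
```

For example, `s_gamma_inv(0.5, -1.0)` gives the all-pole value 1/(1 - 0.5) = 2, while for small |γ| the result approaches `exp(w)`, which is why γ = 0 reduces the representation to an ordinary (mel-)cepstrum.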
Step 3: Convert the resulting LPC coefficients into LSP coefficients (22).
Step 4: Replace the 0th dimension of the LSP parameters with the gain, generating the new LSP vocal tract spectral coefficients.
Step 5: Use the new LSP vocal tract spectral coefficients together with the fundamental frequency as the feature parameters of the speech signal for HMM model training (3).
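Steps 4 and 5 amount to a simple per-frame array operation: overwrite the 0th LSP dimension (25) with the gain (23) and stack the fundamental frequency alongside. A minimal numpy sketch, under assumed array shapes (frames × order) of my own choosing:

```python
import numpy as np

def build_feature_frames(lsp, gain, f0):
    """Combine per-frame parameters into the training feature matrix.

    lsp:  (n_frames, order) LSP coefficients from the STRAIGHT spectrum
    gain: (n_frames,) frame gains
    f0:   (n_frames,) fundamental frequency values

    Returns (n_frames, order + 1): the 0th LSP dimension is replaced
    by the gain (step 4), and F0 is appended so that both streams are
    trained together as one feature parameter sequence (step 5).
    """
    new_lsp = lsp.copy()               # keep the caller's array intact
    new_lsp[:, 0] = gain               # step 4: lsp[0] := gain
    return np.column_stack([new_lsp, f0])  # step 5: joint feature vector
```

In a real HTS setup the F0 stream is modeled separately (it is undefined in unvoiced frames), so this flat concatenation is only meant to show the dimension bookkeeping.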
Choice of synthesis filter at the synthesis end (see Fig. 3):
Step 1: Convert the LSP coefficient (22) sequence obtained by parameter generation into an LPC coefficient sequence.
Step 2: Obtain the excitation (30) signal from the fundamental frequency (26) sequence.
Step 3: Pass the excitation through the LPC filter (28) to obtain the synthetic speech (13).
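Steps 1 and 3 can be sketched end to end. The LSP-to-LPC conversion below follows the standard line-spectral-frequency reconstruction, in which the roots of the symmetric and antisymmetric polynomials P(z) and Q(z) interleave on the unit circle and A(z) = (P(z) + Q(z))/2; it is my own even-order sketch, not code from the patent, and the synthesis uses a plain all-pole filter 1/A(z):

```python
import numpy as np
from scipy.signal import lfilter

def lsf_to_lpc(lsf):
    """Convert M sorted line spectral frequencies (radians, 0 < w < pi,
    M even) back to LPC coefficients a[0..M] with a[0] = 1."""
    z = np.exp(1j * np.asarray(lsf))
    # Odd-indexed LSFs build Q(z), even-indexed build P(z) (1-based count);
    # each root comes with its complex conjugate so the result is real.
    q = np.poly(np.concatenate([z[0::2], z[0::2].conj()]))
    p = np.poly(np.concatenate([z[1::2], z[1::2].conj()]))
    q = np.convolve(q, [1.0, 1.0])    # Q(z) carries the extra root at z = -1
    p = np.convolve(p, [1.0, -1.0])   # P(z) carries the extra root at z = +1
    a = 0.5 * (p + q)                  # A(z) = (P(z) + Q(z)) / 2
    return a[:-1].real                 # drop the vanishing top coefficient

def synthesize(excitation, lsf):
    """Step 3: pass the excitation through the all-pole LPC filter 1/A(z)."""
    a = lsf_to_lpc(lsf)
    return lfilter([1.0], a, excitation)
```

A useful property of this parameterization is that any strictly increasing set of LSFs in (0, π) yields a minimum-phase A(z), so the synthesis filter is guaranteed stable even after HMM parameter generation perturbs the coefficients.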
The above example is a preferred embodiment of the invention. The vocal tract spectral coefficients (12) may instead be MGC coefficients, with the MLSA filter as the corresponding synthesis filter, with equally good results; but the MLSA filter demands more computing power than the LPC filter (28), so LSP coefficients (22) are the better choice on embedded devices.
When the invention is used on an embedded device, all audio input and output can use the I/O interfaces provided by the device itself. The speech function can be enabled or disabled on the device at any time; when it is disabled, the original functions of the device are unaffected.
The invention can be applied to various embedded terminal devices. Following the main design of the invention, those of ordinary skill in the art can produce many similar or equivalent applications. Therefore, the protection of the invention shall be defined by the scope of the claims.

Claims (3)

1. An embedded speech synthesis method based on adaptive weighted spectrum interpolation (STRAIGHT spectrum) coefficients, used in an embedded operating system to convert arbitrary received text into speech output, which obtains synthetic speech quality comparable to that of the STRAIGHT synthesizer while replacing the STRAIGHT synthesizer at the synthesis end with a conventional parametric synthesizer, greatly increasing synthesis speed and making embedded application feasible, wherein a speech synthesis system using the method is divided into the following two parts:
A. training part: first extract the STRAIGHT spectrum from the speech signal, then extract vocal tract spectral feature coefficients from the STRAIGHT spectrum, and then model and train the feature coefficients with HTS;
B. synthesis part: after the feature coefficient sequence is computed from the models, obtain the synthetic speech with a conventional parametric synthesizer.
2. The embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients according to claim 1, characterized in that in step A the vocal tract spectral feature coefficient extraction process at the training end is divided into the following four steps:
A. extract parameters from the speech signals in the training speech database: fundamental frequency, gain, and STRAIGHT spectrum;
B. extract vocal tract spectral feature coefficients from the resulting STRAIGHT spectrum;
C. combine the gain with the vocal tract spectral feature coefficients into new vocal tract spectral feature coefficients;
D. train HMM models on the fundamental frequency and the new vocal tract spectral coefficients together as the feature parameter sequence.
3. The embedded speech synthesis method based on adaptive weighted spectrum interpolation coefficients according to claim 1, characterized in that in step B the synthesis-end speech generation process is divided into the following three steps:
A. generate the fundamental frequency and vocal tract spectral coefficient sequences from the models by the parameter generation algorithm;
B. generate the excitation of the synthetic speech from the fundamental frequency sequence;
C. pass the excitation and the vocal tract spectral coefficient sequence through a conventional parametric synthesizer to obtain the synthetic speech.
CN201110145478XA 2011-06-01 2011-06-01 Imbedded voice synthesis method based on adaptive weighted spectrum interpolation coefficient Pending CN102214463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110145478XA CN102214463A (en) 2011-06-01 2011-06-01 Imbedded voice synthesis method based on adaptive weighted spectrum interpolation coefficient

Publications (1)

Publication Number Publication Date
CN102214463A true CN102214463A (en) 2011-10-12

Family

ID=44745744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110145478XA Pending CN102214463A (en) 2011-06-01 2011-06-01 Imbedded voice synthesis method based on adaptive weighted spectrum interpolation coefficient

Country Status (1)

Country Link
CN (1) CN102214463A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105206259A (en) * 2015-11-03 2015-12-30 常州工学院 Voice conversion method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010021906A1 (en) * 2000-03-03 2001-09-13 Keiichi Chihara Intonation control method for text-to-speech conversion
CN1815552A (en) * 2006-02-28 2006-08-09 安徽中科大讯飞信息科技有限公司 Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter
CN101950559A (en) * 2010-07-05 2011-01-19 李华东 Method for synthesizing continuous speech with large vocabulary and terminal equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LING Zhenhua; DAI Lirong; WANG Renhua; SHUANG Zhiwei; ZHOU Bin: "Wideband speech coding algorithm based on adaptive weighted spectral interpolation", Journal of Data Acquisition and Processing (《数据采集与处理》), March 2005. *
ZHANG Zhengjun; YANG Weiying; CHEN Zan: "Voice conversion based on the STRAIGHT model and artificial neural networks", Audio Engineering (《电声技术》), September 2010. *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105206259A (en) * 2015-11-03 2015-12-30 常州工学院 Voice conversion method


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20111012