CN102184731A - Method for converting emotional speech by combining rhythm parameters with tone parameters
- Publication number: CN102184731A
- Application number: CN2011101220344A
- Authority: CN (China)
- Prior art keywords: speech, emotional, energy, signal, voice
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method for converting emotional speech by combining prosodic parameters (fundamental frequency, duration and energy) with a voice-quality parameter (the formant), which mainly comprises the following steps: 1, extracting and analyzing feature parameters from emotional speech samples of the Beihang University emotional speech database (BHUDES), comprising neutral speech and four types of emotional speech (sadness, anger, happiness and surprise); 2, formulating emotional speech conversion rules from the extracted feature parameters and defining each conversion constant; 3, extracting feature parameters from the neutral speech to be converted and performing pitch-synchronous marking; 4, setting each conversion constant according to the conversion rules of step 2, modifying the fundamental-frequency contour, duration and energy, and synthesizing the speech signal by pitch-synchronous overlap-add; and 5, performing linear predictive coding (LPC) analysis on the speech signal of step 4 and modifying the formants through the poles of the transfer function, finally obtaining emotional speech rich in expressiveness.
Description
Technical field
The present invention relates to the fields of speech signal processing and artificial intelligence, and in particular to an emotional speech conversion method that combines prosodic parameters with voice-quality parameters.
Background technology
Speech synthesis is an important component of human-computer interaction. What people want to hear is no longer a dull machine voice that is merely highly intelligible, but expressive speech that can convey emotion. Existing speech synthesis is still at the stage of converting text to speech (TTS: Text to Speech), and the emotional information in speech is not well expressed.
In addition, emotional speech can be combined with other multimedia technologies, for example pairing emotional speech with corresponding facial features so that sound and expression are synchronized; this is the currently popular "visual speech" technology.
Extracting affective features from speech signals, relating human emotion to the speech signal, and applying affective features to speech synthesis is a research topic that has arisen only recently in this field at home and abroad, and many modeling problems remain unsolved. Emotional speech synthesis lies at the intersection of affective computing and speech synthesis; speech synthesis has been studied for a long time, while affective computing is a relatively young research field.
PSOLA (Pitch Synchronous Overlap Add) is a waveform-concatenation algorithm used in speech synthesis. It differs in principle from earlier waveform concatenation: before splicing speech units, the algorithm adjusts the fundamental frequency, duration and energy of each concatenation unit, and the waveform is modified in units of pitch periods rather than of the traditional frame length, taking the integrity of the pitch period as the basic premise for keeping the waveform and spectrum smooth and continuous.
In emotional speech conversion, the application of PSOLA is not yet mature: it can only modify the prosodic parameters of the speech signal and cannot change voice-quality parameters. Proposing a more effective conversion method therefore has strong practical significance.
Summary of the invention
The present invention proposes a method that converts prosodic parameters and voice-quality parameters simultaneously to accomplish emotional speech conversion.
The main content of the invention is: extract and statistically analyze feature parameters of emotional speech samples, formulate conversion rules, modify the fundamental-frequency contour and formant positions of the speech according to the rules, and thereby convert neutral speech into four kinds of emotional speech (sadness, anger, happiness and surprise).
The specific steps of the method are as follows:
Step 1: extract and analyze the feature parameters of the emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise);
Step 2: formulate the emotional speech conversion rules according to the extracted feature parameters and define each conversion constant;
Step 3: extract the feature parameters of the neutral speech to be converted and perform pitch-synchronous marking;
Step 4: set the modification parameters according to the emotion conversion rules of step 2, modify the fundamental-frequency contour, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add;
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants by changing the poles of the transfer function.
In step 1, the chosen corpus is BHUDES (the Beihang University emotional speech database), and the extracted feature parameters include fundamental frequency, duration, energy and formants.
In step 2, the fundamental frequency, duration, energy and other feature parameters of the neutral speech and of the four kinds of emotional speech are extracted separately, and the conversion rules are derived through statistics.
On the basis of these conversion rules, the constants UP_POSITION (rise position), DOWN_POSITION (fall position), MEANf0 (overall fundamental-frequency change), DUR_POSITION (lengthening position), DUR_LEN (lengthening length) and ENERGY_SCALE (energy factor) are defined.
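As an illustration only, these constants could be collected into a per-emotion rule table. The sketch below is hypothetical Python; the patent does not disclose the numeric values, so every number here is a placeholder:

```python
# Hypothetical rule table for one emotion; all numeric values are placeholders,
# not values disclosed in the patent.
SURPRISE_RULE = {
    "UP_POSITION": 0.8,    # fraction of the utterance where the F0 tail rise begins
    "DOWN_POSITION": 0.0,  # fraction where an F0 fall begins (unused for surprise)
    "MEANf0": 1.15,        # overall fundamental-frequency scaling factor
    "DUR_POSITION": 0.5,   # position at which duration is modified
    "DUR_LEN": 0.85,       # duration scaling (surprised speech is shorter)
    "ENERGY_SCALE": 1.1,   # overall energy factor
}
```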
In step 3, the input speech signal x(n) is first subjected to speech/silence segmentation and a voiced/unvoiced decision.
Speech and silent segments are separated with a double-threshold method based on short-time energy and short-time zero-crossing rate.
The voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1. The judgment condition is: if r_1 > 0.2 and e_r > threshold, the frame is voiced; otherwise it is unvoiced. Here N is the frame length and e_r is the residual energy after linear prediction.
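A minimal sketch of this frame-level decision, assuming an autocorrelation-method LPC (Levinson-Durbin) and taking r_1 as the lag-1 normalized autocorrelation; the function name and the residual-energy threshold are illustrative, not from the patent:

```python
import numpy as np

def is_voiced(frame, er_threshold, lpc_order=12):
    """Voiced/unvoiced decision for one frame: voiced iff the first-order
    reflection coefficient r1 > 0.2 AND the linear-prediction residual
    energy e_r exceeds a threshold (the judgment condition above)."""
    n = len(frame)
    # autocorrelation r[0..lpc_order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(lpc_order + 1)])
    if r[0] <= 0.0:
        return False  # empty/silent frame
    r1 = r[1] / r[0]  # first-order reflection coefficient (lag-1 normalized autocorr.)
    # Levinson-Durbin recursion; e shrinks to the order-p residual energy e_r
    a = np.zeros(lpc_order + 1)
    a[0] = 1.0
    e = r[0]
    for p in range(1, lpc_order + 1):
        k = -np.dot(a[:p], r[p:0:-1]) / e  # p-th reflection coefficient
        a[1:p + 1] += k * a[p - 1::-1]     # update predictor polynomial A(z)
        e *= 1.0 - k * k                   # residual energy after order p
    return bool(r1 > 0.2 and e > er_threshold)
```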
Pitch marks are placed on the voiced portions; the unvoiced portions are marked equidistantly for convenience of computation. According to the conversion rules of step 2, each relevant parameter is set.
The pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a number of overlapping short-time analysis signals. A standard Hanning or Hamming window is generally used, with a window length of two pitch periods and about 50% overlap between adjacent short-time analysis signals. The accuracy and the starting position of the pitch periods are extremely important, as they strongly affect the quality of the synthesized speech. This method uses a Hamming window; the window function is given by Equation 5 (the standard Hamming window):

w(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1    (5)
The short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions ω_m(n) is:

x_m(n) = ω_m(n_m − n) × x(n)    (6)

where n_m is the pitch mark point.
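A sketch of this pitch-synchronous windowing under the stated two-pitch-period Hamming window, assuming `pitch_marks` holds the sample indices of the marks (all names are illustrative):

```python
import numpy as np

def pitch_synchronous_frames(x, pitch_marks):
    """Cut a Hamming-windowed short-time signal centred on each pitch mark
    n_m, spanning the two adjacent pitch periods, per Equation (6)."""
    frames = []
    for i in range(1, len(pitch_marks) - 1):
        left, right = pitch_marks[i - 1], pitch_marks[i + 1]  # ~2 pitch periods
        w = np.hamming(right - left)
        frames.append((pitch_marks[i], x[left:right] * w))    # (centre mark, segment)
    return frames
```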
According to the emotion conversion rules, the mapping relations of the pitch periods between the converted waveform and the original waveform are established (see Fig. 2); the mapping relations then determine the sequence of short-time signals needed for synthesis.
The short-time signal sequence is aligned with the target pitch periods and overlap-added to obtain the synthesized waveform. The synthesized speech waveform y(n) then has the desired affective characteristics.
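Continuing the previous sketch, the overlap-add step might look as follows; `mapping` encodes the Fig. 2 pitch-period correspondence (duplicating or dropping source frames changes duration, and the target mark spacing sets the new fundamental frequency). This is an illustrative reading of the PSOLA step, not code from the patent:

```python
import numpy as np

def psola_overlap_add(frames, target_marks, mapping, length):
    """Place the windowed short-time signal selected by mapping[j] at target
    pitch mark j and overlap-add everything into the output buffer."""
    y = np.zeros(length)
    for j, tm in enumerate(target_marks):
        _, seg = frames[mapping[j]]
        lo = max(tm - len(seg) // 2, 0)   # centre the segment on the target mark
        hi = min(lo + len(seg), length)
        y[lo:hi] += seg[:hi - lo]
    return y
```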
In step 4, the constants defined in step 2 are set: UP_POSITION (rise position), DOWN_POSITION (fall position), MEANf0 (overall fundamental-frequency change), DUR_POSITION (lengthening position), DUR_LEN (lengthening length) and ENERGY_SCALE (energy factor).
In step 5, the speech signal is first subjected to LPC analysis; in this method the analysis order is 12 (the flow is shown in Fig. 3). The poles of the resulting transfer function are modified so as to shift the formants in frequency.
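A minimal sketch of this pole-based formant shift, assuming an autocorrelation-method LPC and a uniform scaling of the pole angles; the scaling factor and function names are illustrative, not values from the patent:

```python
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method LPC (Levinson-Durbin); returns A(z) with a[0] = 1."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for p in range(1, order + 1):
        k = -np.dot(a[:p], r[p:0:-1]) / e
        a[1:p + 1] += k * a[p - 1::-1]
        e *= 1.0 - k * k
    return a

def shift_formants(frame, shift=1.05, order=12):
    """Order-12 LPC analysis; scale the angles of the complex poles of 1/A(z)
    to raise the formant frequencies, then re-filter the LPC residual through
    the modified synthesis filter."""
    a = lpc(frame, order)
    residual = lfilter(a, [1.0], frame)          # inverse filter: e(n) = A(z) x(n)
    poles = np.roots(a)
    ang = np.angle(poles)
    ang = np.where(np.abs(poles.imag) > 1e-8, ang * shift, ang)  # keep real poles fixed
    a_new = np.real(np.poly(np.abs(poles) * np.exp(1j * ang)))   # rebuild A(z)
    return lfilter([1.0], a_new, residual)       # synthesis with shifted formants
```

Leaving the pole radii unchanged preserves the formant bandwidths and keeps the modified synthesis filter stable.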
The advantages and positive effects of the present invention are: because the invention modifies the prosodic features (fundamental frequency, energy and duration) and the voice-quality feature (formants) of the speech simultaneously, the converted emotional speech is more natural. In addition, when the fundamental-frequency contour is modified, setting the necessary parameters gives a better prosody-modification effect.
Description of drawings
Fig. 1: Flowchart of emotional speech conversion
Fig. 2: Schematic diagram of the pitch-period mapping relations
Fig. 3: Flowchart of formant modification by LPC
Embodiment
The present invention is a new method for converting neutral speech into four kinds of emotional speech.
The main content of the invention is: extract and statistically analyze feature parameters of the chosen BHUDES emotional speech samples, formulate conversion rules, modify the fundamental-frequency contour and formant positions of the speech according to the rules, and thereby convert neutral speech into four kinds of emotional speech (sadness, anger, happiness and surprise).
To explain the purpose, technical scheme and advantages of the present invention more clearly, the conversion of neutral speech into surprised speech is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are intended only to illustrate the present invention and not to limit it.
The embodiment follows the flowchart of Fig. 1; the main steps are as follows:
Step 1: extract and analyze the feature parameters of the emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise); the extracted feature parameters include fundamental frequency, duration, energy and formants;
Step 2: according to the extracted fundamental frequency, duration, energy and other feature parameters of the neutral speech and the four kinds of emotional speech, formulate the emotional speech conversion rules and define each conversion constant;
Step 3: take the rules for surprised speech from the emotional speech conversion rules of step 2.
From these rules, surprised speech has a slightly higher mean fundamental frequency, a slightly wider fundamental-frequency range, a rising tail in the fundamental-frequency contour, slightly higher energy, slightly higher formant positions and a shorter duration.
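As a purely illustrative sketch of applying such a rule to a frame-level F0 contour (reusing the hypothetical SURPRISE_RULE table from step 2; the 20% tail rise is a placeholder, not a disclosed value):

```python
import numpy as np

def apply_surprise_f0(f0, rule=SURPRISE_RULE):
    """Raise the mean F0 and add a rising tail after UP_POSITION, as the
    surprise rule above prescribes (all numeric values are placeholders)."""
    f0 = np.asarray(f0, dtype=float) * rule["MEANf0"]   # overall F0 raise
    start = int(len(f0) * rule["UP_POSITION"])
    ramp = 1.0 + 0.2 * np.linspace(0.0, 1.0, len(f0) - start)  # linear tail rise
    f0[start:] *= ramp
    return f0
```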
The input speech signal x(n) is first subjected to speech/silence segmentation and a voiced/unvoiced decision.
Speech and silent segments are separated with a double-threshold method based on short-time energy and short-time zero-crossing rate.
The voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1. The judgment condition is: if r_1 > 0.2 and e_r > threshold, the frame is voiced; otherwise it is unvoiced. Here N is the frame length and e_r is the residual energy after linear prediction.
Pitch marks are placed on the voiced portions; the unvoiced portions are marked equidistantly for convenience of computation. According to the conversion rules of step 2, each relevant parameter is set.
The pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a number of overlapping short-time analysis signals. A standard Hanning or Hamming window is generally used, with a window length of two pitch periods and about 50% overlap between adjacent short-time analysis signals. The accuracy and the starting position of the pitch periods are extremely important, as they strongly affect the quality of the synthesized speech. This method uses a Hamming window; the window function is given by Equation 5.
The short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions ω_m(n) is:

x_m(n) = ω_m(n_m − n) × x(n)    (6)

where n_m is the pitch mark point.
According to the emotion conversion rules, the mapping relations of the pitch periods between the converted waveform and the original waveform are established; the mapping relations then determine the sequence of short-time signals needed for synthesis.
The short-time signal sequence is aligned with the target pitch periods and overlap-added to obtain the synthesized waveform. The synthesized speech waveform y(n) then has the desired affective characteristics.
Step 4: according to the emotion conversion rules of step 2, set the constants UP_POSITION (rise position), DOWN_POSITION (fall position), MEANf0 (overall fundamental-frequency change), DUR_POSITION (lengthening position), DUR_LEN (lengthening length) and ENERGY_SCALE (energy factor). Then modify the fundamental-frequency contour, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add.
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants through the poles of the transfer function, finally obtaining the target emotional speech.
Claims (6)
1. An emotional speech conversion method combining prosodic parameters (fundamental frequency, duration and energy) with a voice-quality parameter (formant), the specific steps of the method being as follows:
Step 1: extract and analyze the feature parameters of the BHUDES emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise);
Step 2: formulate the emotional speech conversion rules according to the extracted feature parameters and define each conversion constant;
Step 3: extract the feature parameters of the neutral speech to be converted and perform pitch-synchronous marking;
Step 4: set the modification parameters according to the emotion conversion rules of step 2, modify the fundamental-frequency contour, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add;
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants through the poles of the transfer function.
2. The method according to claim 1, wherein the principal feature of step 1 is the extraction of parameters from neutral speech and from the four kinds of emotional speech (sadness, anger, happiness and surprise).
3. The method according to claim 1, wherein the principal feature of step 2 is: the fundamental frequency, duration, energy and other feature parameters of the neutral speech and of the four kinds of emotional speech are extracted separately; the conversion rules are derived through statistical analysis; and on the basis of these conversion rules, the constants UP_POSITION (rise position), DOWN_POSITION (fall position), MEANf0 (overall fundamental-frequency change), DUR_POSITION (lengthening position), DUR_LEN (lengthening length) and ENERGY_SCALE (energy factor) are defined.
4. The method according to claim 1, wherein the principal features of step 3 are: the input speech signal x(n) is first subjected to speech/silence segmentation and a voiced/unvoiced decision, the speech and silent segments being separated with a double-threshold method based on short-time energy and short-time zero-crossing rate;
the voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1, the judgment condition being: if r_1 > 0.2 and e_r > threshold, the frame is voiced, otherwise it is unvoiced;
where e_r is the residual energy after linear prediction and N is the frame length;
pitch marks are placed on the voiced portions and the unvoiced portions are marked equidistantly; each relevant parameter is set according to the conversion rules of step 2; the pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a number of overlapping short-time analysis signals; this method uses a Hamming window with a window length of two pitch periods and about 50% overlap between adjacent short-time analysis signals; the window function is given by Equation 5.
The short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions ω_m(n) is:

x_m(n) = ω_m(n_m − n) × x(n)    (6)

where n_m is the pitch mark point.
5. The method according to claim 1, wherein the principal features of step 4 are: according to the emotion conversion rules, the mapping relations of the pitch periods between the converted waveform and the original waveform are established; the mapping relations then determine the sequence of short-time signals needed for synthesis; the short-time signal sequence is aligned with the target pitch periods and overlap-added to obtain the synthesized waveform.
6. The method according to claim 5, wherein the principal feature is: LPC analysis is performed on the speech signal with an analysis order of 12; the poles of the resulting transfer function are modified, thereby changing the vocal-tract transfer function and hence the formant positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101220344A CN102184731A (en) | 2011-05-12 | 2011-05-12 | Method for converting emotional speech by combining rhythm parameters with tone parameters |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102184731A true CN102184731A (en) | 2011-09-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication |
 | PB01 | Publication |
 | C10 | Entry into substantive examination |
 | SE01 | Entry into force of request for substantive examination |
 | C12 | Rejection of a patent application after its publication |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20110914