CN102184731A - Method for converting emotional speech by combining rhythm parameters with tone parameters


Info

Publication number
CN102184731A
Authority
CN
China
Legal status: Pending
Application number
CN2011101220344A
Other languages
Chinese (zh)
Inventor
毛峡
韩林
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN2011101220344A
Publication of CN102184731A


Abstract

The invention discloses a method for converting emotional speech by combining prosodic parameters (fundamental frequency, duration and energy) with a voice quality parameter (the formants). The method mainly comprises the following steps: 1. extracting and analysing feature parameters from emotional speech samples of the Beihang University emotional speech database (BHUDES), comprising neutral speech and four kinds of emotional speech (sadness, anger, happiness and surprise); 2. formulating emotional speech conversion rules according to the extracted feature parameters and defining the conversion constants; 3. performing feature parameter extraction and pitch-synchronous marking on the neutral speech to be converted; 4. setting the conversion constants according to the conversion rules of step 2, modifying the fundamental frequency curve, duration and energy, and synthesizing the speech signal by pitch-synchronous overlap-add; and 5. performing linear predictive coding (LPC) analysis on the speech signal of step 4 and modifying the formants via the poles of the transfer function, so as to finally obtain emotional speech rich in expressiveness.

Description

An emotional speech conversion method combining prosodic and voice quality parameters
Technical field
The present invention relates to the fields of speech signal processing and artificial intelligence, and in particular to an emotional speech conversion method that combines prosodic parameters with voice quality parameters.
Background technology
Speech synthesis is an important component of human-computer interaction. What people want to hear is no longer dull machine speech that is merely highly intelligible, but expressive speech that conveys emotion. Current speech synthesis is still at the stage of text-to-speech (TTS) conversion, and the emotional information in speech cannot be expressed well.
In addition, emotional speech can be combined with other multimedia technologies, for example by pairing emotional speech with the corresponding facial expressions so that sound and expression are synchronized; this is the currently popular "visual speech" technology.
Extracting emotional features from the speech signal, relating human emotion to the speech signal, and applying these features to speech synthesis form a research topic that has only emerged in recent years at home and abroad, and many modeling problems remain unsolved. Emotional speech synthesis lies at the intersection of affective computing and speech synthesis: speech synthesis has been studied for a long time, whereas affective computing is a relatively young field.
PSOLA (Pitch Synchronous Overlap Add) is a waveform concatenation algorithm used in speech synthesis. It differs in principle from earlier waveform concatenation: before the speech units are concatenated, the algorithm can adjust their fundamental frequency, duration and energy, and the waveform modification is performed in units of pitch periods rather than the traditional fixed frame length, with the integrity of the pitch period as the basic premise for keeping the waveform and spectrum smooth and continuous.
In emotional speech conversion, however, the application of PSOLA is not yet mature: it can only modify the prosodic parameters of the speech signal and cannot change the voice quality parameters. Proposing a more effective conversion method therefore has strong practical significance.
Summary of the invention
The present invention proposes a method that converts prosodic parameters and voice quality parameters simultaneously to accomplish emotional speech conversion.
The main content of the invention is as follows: extract and statistically analyse the feature parameters of the emotional speech samples, formulate conversion rules, modify the fundamental frequency curve and the formant positions of the speech according to the rules, and thereby convert neutral speech into four kinds of emotional speech (sadness, anger, happiness and surprise).
The concrete steps of the method are as follows:
Step 1: extract and analyse the feature parameters of the emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise);
Step 2: formulate the emotional speech conversion rules according to the extracted feature parameters and define the conversion constants;
Step 3: perform feature parameter extraction and pitch-synchronous marking on the neutral speech to be converted;
Step 4: set the modification parameters according to the emotion conversion rules of step 2, modify the fundamental frequency curve, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add;
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants by changing the poles of the transfer function.
In step 1, the chosen corpus is the Beihang University emotional speech database (BHUDES), and the extracted feature parameters include the fundamental frequency, duration, energy and formants.
In step 2, the fundamental frequency, duration, energy and other feature parameters of the neutral speech and of the four kinds of emotional speech are extracted respectively, and the following conversion rules are derived by statistics:
[Conversion rule table, given as a figure in the original, listing for each target emotion the direction of change of the prosodic parameters relative to neutral speech]
On the basis of the above conversion rules, the constants UP_POSITION (position of the upward bend), DOWN_POSITION (position of the downward bend), MEANf0 (overall fundamental frequency change), DUR_POSITION (position of the duration change), DUR_LEN (duration change length) and ENERGY_SCALE (energy factor) are defined.
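As an illustration only, the conversion constants named above can be grouped into a small parameter object, one instance per target emotion. The field names follow the patent; the dataclass itself is not part of the patent text, and any numeric values used with it would be assumptions. A minimal sketch in Python:

```python
from dataclasses import dataclass

@dataclass
class ConversionRule:
    """Conversion constants named in step 2; semantics paraphrased from the text."""
    up_position: float    # UP_POSITION: where the upward bend of the F0 curve begins (fraction of utterance)
    down_position: float  # DOWN_POSITION: where a downward bend begins
    mean_f0: float        # MEANf0: overall fundamental frequency change (here a multiplicative factor)
    dur_position: float   # DUR_POSITION: where the duration change is applied
    dur_len: float        # DUR_LEN: duration change factor
    energy_scale: float   # ENERGY_SCALE: energy scaling factor
```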
In step 3, the input speech signal x(n) is first segmented into speech and silence, and each frame is classified as voiced or unvoiced.
Speech and silence segments are distinguished by a double-threshold method based on short-time energy and short-time zero-crossing rate.
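A minimal sketch of such a double-threshold detector, assuming frame-wise normalised short-time energy and a simple zero-crossing rate; the frame size and all threshold values are illustrative and not taken from the patent:

```python
import numpy as np

def endpoint_detect(x, frame_len=256, hop=128, e_high=0.1, e_low=0.02, zcr_thr=0.15):
    """Double-threshold speech/silence detection: frames above the high energy
    threshold are definite speech; neighbouring frames are kept while they stay
    above the low energy threshold or the zero-crossing-rate threshold."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len]
        energy[i] = np.mean(frame ** 2)
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)   # fraction of sign changes
    energy /= energy.max() + 1e-12            # normalise so the thresholds are relative
    speech = energy > e_high                  # definite speech frames
    candidate = (energy > e_low) | (zcr > zcr_thr)
    # extend each definite speech region outwards while the frames remain candidates
    for i in np.flatnonzero(speech):
        j = i - 1
        while j >= 0 and candidate[j] and not speech[j]:
            speech[j] = True; j -= 1
        j = i + 1
        while j < n_frames and candidate[j] and not speech[j]:
            speech[j] = True; j += 1
    return speech                             # boolean mask, one value per frame
```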
The voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1; the decision criterion is: if r_1 > 0.2 and e_r > threshold, the frame is voiced; otherwise it is unvoiced.
$$r_1 = \frac{R_{ss}(1)}{R_{ss}(0)} \qquad (1)$$
$$R_{ss}(0) = \frac{1}{N}\sum_{n=1}^{N} x(n)\,x(n) \qquad (2)$$
$$R_{ss}(1) = \frac{1}{N}\sum_{n=1}^{N-1} x(n)\,x(n+1) \qquad (3)$$
where N is the frame length and e_r is the residual energy after linear prediction.
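Per frame, the decision of Equations (1) to (3) could be realised roughly as follows; the least-squares LPC used for the residual and the residual-energy threshold are assumptions, since the patent only fixes the reflection-coefficient threshold of 0.2:

```python
import numpy as np

def frame_is_voiced(frame, lpc_order=12, r1_thr=0.2, er_thr=1e-3):
    """Voiced/unvoiced decision: r1 > 0.2 and residual energy above a threshold."""
    n = len(frame)
    rss0 = np.dot(frame, frame) / n                   # Eq. (2)
    rss1 = np.dot(frame[:-1], frame[1:]) / n          # Eq. (3)
    r1 = rss1 / (rss0 + 1e-12)                        # Eq. (1)
    # residual energy after linear prediction (least-squares LPC of order lpc_order)
    X = np.column_stack([frame[lpc_order - k - 1: n - k - 1] for k in range(lpc_order)])
    y = frame[lpc_order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_r = np.mean((y - X @ a) ** 2)
    return (r1 > r1_thr) and (e_r > er_thr)
```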
The voiced segments are pitch-marked, while the unvoiced segments are marked at equal intervals for convenience of computation. The relevant parameters are then set according to the conversion rules of step 2.
The pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a set of overlapping short-time analysis signals. A standard Hanning or Hamming window is generally used, with a window length of two pitch periods, so that adjacent short-time analysis signals overlap by about 50%. The accuracy and reference position of the pitch periods are extremely important and strongly influence the quality of the synthesized speech. This method uses a Hamming window; the window function is given by Equation (5):
$$\omega(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
The short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions $\omega_m(n)$ is:
$$x_m(n) = \omega_m(n_m - n) \times x(n) \qquad (6)$$
where $n_m$ is the pitch mark point.
According to the emotion conversion rules, the mapping between the pitch periods of the converted waveform and those of the original waveform is established (see Fig. 2); from this mapping, the sequence of short-time signals needed for synthesis is determined.
The short-time signals are aligned with the target pitch periods and overlap-added to obtain the synthesized waveform. The synthesized speech waveform y(n) then carries the desired emotional characteristics.
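A much simplified sketch of the pitch-synchronous analysis and overlap-add resynthesis described above: it windows roughly two pitch periods around each analysis mark (Equations 5 and 6) and overlap-adds the resulting grains at synthesis marks spaced by the target pitch periods. The trivial one-to-one mark mapping used here merely stands in for the rule-derived mapping of Fig. 2:

```python
import numpy as np

def psola_resynthesize(x, marks, target_periods):
    """x: speech samples (numpy array); marks: analysis pitch-mark sample indices;
    target_periods: desired synthesis pitch periods in samples."""
    # analysis: one Hamming-windowed short-time signal per interior pitch mark
    grains, centres = [], []
    for i in range(1, len(marks) - 1):
        left, right = marks[i - 1], marks[i + 1]
        seg = x[left:right].copy()
        seg *= np.hamming(len(seg))            # Eq. (5): window of about two pitch periods
        grains.append(seg)
        centres.append(marks[i] - left)        # position of the mark inside the grain
    # synthesis: place the grains at the target pitch marks and overlap-add
    new_marks = np.cumsum(target_periods).astype(int)
    y = np.zeros(new_marks[-1] + len(max(grains, key=len)))
    for j, m in enumerate(new_marks):
        k = min(j, len(grains) - 1)            # naive analysis-to-synthesis mapping
        g, start = grains[k], m - centres[k]
        if start < 0:
            g = g[-start:]; start = 0
        y[start:start + len(g)] += g
    return y
```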
In step 4, the constants UP_POSITION (position of the upward bend), DOWN_POSITION (position of the downward bend), MEANf0 (overall fundamental frequency change), DUR_POSITION (position of the duration change), DUR_LEN (duration change length) and ENERGY_SCALE (energy factor) defined in step 2 are set.
In step 5, the speech signal is first subjected to LPC analysis; in this method the analysis order is 12 (the flow is shown in Fig. 3). The poles of the resulting transfer function are then modified, thereby shifting the formants in frequency.
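One way the pole modification could be realised is sketched below: order-12 LPC analysis of a frame, inverse filtering to obtain the residual, rotation of the complex pole angles (which set the formant frequencies) by a fixed ratio, and resynthesis through the modified all-pole filter. The use of librosa.lpc and the 5% shift ratio are assumptions for illustration; the patent specifies only the order-12 analysis and the modification of the poles:

```python
import numpy as np
from scipy.signal import lfilter
import librosa  # assumed available; librosa.lpc computes LPC coefficients (Burg method)

def shift_formants(frame, shift_ratio=1.05, order=12):
    """Pole-based formant shift for one frame (illustrative values only)."""
    a = librosa.lpc(frame.astype(float), order=order)   # a[0] == 1
    residual = lfilter(a, [1.0], frame)                  # inverse filtering -> LPC residual
    poles = np.roots(a)                                   # poles of the all-pole transfer function
    new_poles = []
    for p in poles:
        if abs(p.imag) > 1e-8:                            # complex pole pair -> formant
            r, theta = abs(p), np.angle(p)
            theta = np.clip(theta * shift_ratio, -np.pi, np.pi)
            new_poles.append(r * np.exp(1j * theta))
        else:
            new_poles.append(p)
    a_new = np.real(np.poly(new_poles))                   # back to predictor coefficients
    return lfilter([1.0], a_new, residual)                # resynthesis with shifted formants
```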
The advantages and positive effects of the present invention are as follows: because the invention modifies both the prosodic features of speech (fundamental frequency, energy and duration) and the voice quality feature (the formants), the converted emotional speech is more natural. Moreover, when the fundamental frequency curve is modified, setting the necessary parameters makes the prosody modification more effective.
Description of drawings
Fig. 1: flow chart of the emotional speech conversion
Fig. 2: schematic diagram of the pitch period mapping
Fig. 3: flow chart of formant modification by LPC
Embodiment
The present invention is a new method for converting neutral speech into four kinds of emotional speech.
The main content of the invention is as follows: extract and statistically analyse the feature parameters of the selected BHUDES emotional speech samples, formulate conversion rules, modify the fundamental frequency curve and the formant positions of the speech according to the rules, and thereby convert neutral speech into four kinds of emotional speech (sadness, anger, happiness and surprise).
In order to explain the purpose, technical solution and advantages of the present invention more clearly, the conversion of neutral speech into surprised speech is described below in further detail with reference to the accompanying drawings. It should be understood that the specific embodiment described here is intended only to illustrate the present invention and not to limit it.
The embodiment follows the flow chart of Fig. 1; the main steps are as follows:
Step 1: extract and analyse the feature parameters of the emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise); the extracted feature parameters include the fundamental frequency, duration, energy and formants;
Step 2: formulate the emotional speech conversion rules according to the fundamental frequency, duration, energy and other feature parameters extracted from the neutral speech and the four kinds of emotional speech, and define the conversion constants;
Step 3: extract the conversion rule for surprised speech from the emotional speech conversion rules of step 2.
The conversion rule shows that for surprise the mean fundamental frequency is slightly higher, the fundamental frequency range is slightly wider, the tail of the fundamental frequency curve rises, the energy is slightly higher, the formant positions are slightly higher, and the duration is shorter.
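Expressed with the ConversionRule sketch given earlier, the surprise rule might be instantiated as follows; the numerical values are purely illustrative, since the patent states only the qualitative directions of change:

```python
# Hypothetical parameter values realising the qualitative "surprise" rule above.
SURPRISE = ConversionRule(
    up_position=0.8,     # upward bend over the last 20% of the F0 curve (rising tail)
    down_position=1.0,   # no downward bend for surprise
    mean_f0=1.1,         # slightly higher mean fundamental frequency
    dur_position=0.0,    # shortening applied from the start of the utterance
    dur_len=0.9,         # slightly shorter duration
    energy_scale=1.1,    # slightly higher energy
)
```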
The input speech signal x(n) is first segmented into speech and silence, and each frame is classified as voiced or unvoiced.
Speech and silence segments are distinguished by a double-threshold method based on short-time energy and short-time zero-crossing rate.
The voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1; the decision criterion is: if r_1 > 0.2 and e_r > threshold, the frame is voiced; otherwise it is unvoiced.
$$r_1 = \frac{R_{ss}(1)}{R_{ss}(0)} \qquad (1)$$
$$R_{ss}(0) = \frac{1}{N}\sum_{n=1}^{N} x(n)\,x(n) \qquad (2)$$
$$R_{ss}(1) = \frac{1}{N}\sum_{n=1}^{N-1} x(n)\,x(n+1) \qquad (3)$$
where N is the frame length and e_r is the residual energy after linear prediction.
The voiced segments are pitch-marked, while the unvoiced segments are marked at equal intervals for convenience of computation. The relevant parameters are then set according to the conversion rules of step 2.
The pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a set of overlapping short-time analysis signals. A standard Hanning or Hamming window is generally used, with a window length of two pitch periods, so that adjacent short-time analysis signals overlap by about 50%. The accuracy and reference position of the pitch periods are extremely important and strongly influence the quality of the synthesized speech. This method uses a Hamming window; the window function is given by Equation (5):
$$\omega(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
The short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions $\omega_m(n)$ is:
$$x_m(n) = \omega_m(n_m - n) \times x(n) \qquad (6)$$
where $n_m$ is the pitch mark point.
According to the emotion conversion rules, the mapping between the pitch periods of the converted waveform and those of the original waveform is established; from this mapping, the sequence of short-time signals needed for synthesis is determined.
The short-time signals are aligned with the target pitch periods and overlap-added to obtain the synthesized waveform. The synthesized speech waveform y(n) then carries the desired emotional characteristics.
Step 4: set the constants UP_POSITION (position of the upward bend), DOWN_POSITION (position of the downward bend), MEANf0 (overall fundamental frequency change), DUR_POSITION (position of the duration change), DUR_LEN (duration change length) and ENERGY_SCALE (energy factor) according to the emotion conversion rules of step 2; then modify the fundamental frequency curve, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add.
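A minimal sketch of this prosody-modification step, operating on a frame-wise F0 contour, per-unit durations and a per-frame energy contour. It assumes a rule object with the fields sketched earlier, and the linear 20% tail ramp is an illustrative choice rather than a value from the patent:

```python
import numpy as np

def modify_prosody(f0, durations, energy, rule):
    """f0: frame-wise F0 contour; durations: per-unit durations; energy: per-frame energy.
    rule: any object with the ConversionRule fields sketched above."""
    f0 = np.asarray(f0, dtype=float) * rule.mean_f0          # overall F0 shift (MEANf0)
    n = len(f0)
    start = int(rule.up_position * n)
    if start < n:                                            # rising tail (UP_POSITION), e.g. for surprise
        f0[start:] *= np.linspace(1.0, 1.2, n - start)       # illustrative linear ramp up to +20%
    durations = np.asarray(durations, dtype=float).copy()
    d_start = int(rule.dur_position * len(durations))
    durations[d_start:] *= rule.dur_len                      # lengthen or shorten units (DUR_POSITION, DUR_LEN)
    energy = np.asarray(energy, dtype=float) * rule.energy_scale   # ENERGY_SCALE
    return f0, durations, energy
```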
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants via the poles of the transfer function, finally obtaining the converted emotional speech.

Claims (6)

1. An emotional speech conversion method combining prosodic parameters (fundamental frequency, duration and energy) with voice quality parameters (formants), the method comprising the following steps:
Step 1: extract and analyse the feature parameters of the BHUDES emotional speech samples (comprising neutral speech and four kinds of emotional speech: sadness, anger, happiness and surprise);
Step 2: formulate the emotional speech conversion rules according to the extracted feature parameters and define the conversion constants;
Step 3: perform feature parameter extraction and pitch-synchronous marking on the neutral speech to be converted;
Step 4: set the modification parameters according to the emotion conversion rules of step 2, modify the fundamental frequency curve, duration and energy, and synthesize the speech signal by pitch-synchronous overlap-add;
Step 5: perform LPC analysis on the speech signal of step 4 and modify the formants via the poles of the transfer function.
2. The method according to claim 1, wherein step 1 is principally characterized by feature parameter extraction from neutral speech and from the four kinds of emotional speech: sadness, anger, happiness and surprise.
3. The method according to claim 1, wherein step 2 is principally characterized in that the fundamental frequency, duration, energy and other feature parameters of the neutral speech and of the four kinds of emotional speech are extracted respectively, the conversion rules are derived by statistical analysis, and, on the basis of these rules, the constants UP_POSITION (position of the upward bend), DOWN_POSITION (position of the downward bend), MEANf0 (overall fundamental frequency change), DUR_POSITION (position of the duration change), DUR_LEN (duration change length) and ENERGY_SCALE (energy factor) are defined.
4. The method according to claim 1, wherein step 3 is principally characterized in that the input speech signal x(n) is first segmented into speech and silence and each frame is classified as voiced or unvoiced, the speech and silence segments being distinguished by a double-threshold method based on short-time energy and short-time zero-crossing rate;
the voiced/unvoiced decision combines the prediction residual energy e_r with the first-order reflection coefficient r_1, the criterion being: if r_1 > 0.2 and e_r > threshold, the frame is voiced, otherwise it is unvoiced;
$$r_1 = \frac{R_{ss}(1)}{R_{ss}(0)} \qquad (1)$$
$$R_{ss}(0) = \frac{1}{N}\sum_{n=1}^{N} x(n)\,x(n) \qquad (2)$$
$$R_{ss}(1) = \frac{1}{N}\sum_{n=1}^{N-1} x(n)\,x(n+1) \qquad (3)$$
where e_r is the residual energy after linear prediction and N is the frame length;
the voiced segments are pitch-marked and the unvoiced segments are marked at equal intervals; the relevant parameters are set according to the conversion rules of step 2; the pitch-marked speech signal is multiplied by a series of pitch-synchronous window functions to obtain a set of overlapping short-time analysis signals; this method uses a Hamming window with a window length of two pitch periods, so that adjacent short-time analysis signals overlap by about 50%, the window function being given by Equation (5):
$$\omega(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
the short-time analysis signal obtained by multiplying the original signal x(n) by the series of pitch-synchronous window functions $\omega_m(n)$ being:
$$x_m(n) = \omega_m(n_m - n) \times x(n) \qquad (6)$$
where $n_m$ is the pitch mark point.
5. The method according to claim 1, wherein step 4 is principally characterized in that, according to the emotion conversion rules, the mapping between the pitch periods of the converted waveform and those of the original waveform is established, the sequence of short-time signals needed for synthesis is determined from this mapping, and the short-time signals are aligned with the target pitch periods and overlap-added to obtain the synthesized waveform.
6. The method according to claim 5, principally characterized in that the speech signal is subjected to LPC analysis with an analysis order of 12, and the poles of the resulting transfer function are modified, thereby changing the vocal tract transfer function and hence the formant positions.
CN2011101220344A 2011-05-12 2011-05-12 Method for converting emotional speech by combining rhythm parameters with tone parameters Pending CN102184731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101220344A CN102184731A (en) 2011-05-12 2011-05-12 Method for converting emotional speech by combining rhythm parameters with tone parameters


Publications (1)

Publication Number Publication Date
CN102184731A true CN102184731A (en) 2011-09-14

Family

ID=44570897




Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1297561A (en) * 1999-03-25 2001-05-30 Matsushita Electric Industrial Co., Ltd. Speech synthesizing system and speech synthesizing method
US20020049594A1 (en) * 2000-05-30 2002-04-25 Moore Roger Kenneth Speech synthesis
CN101369423A (en) * 2007-08-17 2009-02-18 Toshiba Corporation Voice synthesizing method and device
CN101261832A (en) * 2008-04-21 2008-09-10 Beihang University Extraction and modeling method for Chinese speech sensibility information
CN101620852A (en) * 2008-07-01 2010-01-06 Zou Cairong Speech-emotion recognition method based on improved quadratic discriminant

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
China Master's Theses Full-text Database, Information Science and Technology, 15 August 2008, Ren Rui, "Analysis and Synthesis of Emotional Speech Signals Based on the Fujisaki Model" *
Signal Processing, 30 April 2007, Shao Yanqiu et al., "Research on Emotional Speech Synthesis Combining Prosodic Parameter and Spectral Envelope Modification", Vol. 23, No. 4 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592590B (en) * 2012-02-21 2014-07-02 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN102592590A (en) * 2012-02-21 2012-07-18 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN103903627B (en) * 2012-12-27 2018-06-19 中兴通讯股份有限公司 The transmission method and device of a kind of voice data
CN103903627A (en) * 2012-12-27 2014-07-02 中兴通讯股份有限公司 Voice-data transmission method and device
CN104934029A (en) * 2014-03-17 2015-09-23 陈成钧 Speech identification system based on pitch-synchronous spectrum parameter
CN104934029B (en) * 2014-03-17 2019-03-29 纽约市哥伦比亚大学理事会 Speech recognition system and method based on pitch synchronous frequency spectrum parameter
CN106688034A (en) * 2014-09-11 2017-05-17 微软技术许可有限责任公司 Text-to-speech with emotional content
CN106688034B (en) * 2014-09-11 2020-11-13 微软技术许可有限责任公司 Text-to-speech conversion with emotional content
CN105741854A (en) * 2014-12-12 2016-07-06 中兴通讯股份有限公司 Voice signal processing method and terminal
CN105654941A (en) * 2016-01-20 2016-06-08 华南理工大学 Voice change method and device based on specific target person voice change ratio parameter
CN106205600A (en) * 2016-07-26 2016-12-07 浪潮电子信息产业股份有限公司 One can Chinese text speech synthesis system and method alternately
CN110663080A (en) * 2017-02-13 2020-01-07 法国国家科研中心 Method and apparatus for dynamically modifying the timbre of speech by frequency shifting of spectral envelope formants
CN107221344A (en) * 2017-04-07 2017-09-29 南京邮电大学 A kind of speech emotional moving method
CN107103900A (en) * 2017-06-06 2017-08-29 西北师范大学 A kind of across language emotional speech synthesizing method and system
CN107919138A (en) * 2017-11-30 2018-04-17 维沃移动通信有限公司 Mood processing method and mobile terminal in a kind of voice
CN107919138B (en) * 2017-11-30 2021-01-08 维沃移动通信有限公司 Emotion processing method in voice and mobile terminal
CN108447470A (en) * 2017-12-28 2018-08-24 中南大学 A kind of emotional speech conversion method based on sound channel and prosodic features
WO2019218773A1 (en) * 2018-05-15 2019-11-21 中兴通讯股份有限公司 Voice synthesis method and device, storage medium, and electronic device
CN110556092A (en) * 2018-05-15 2019-12-10 中兴通讯股份有限公司 Speech synthesis method and device, storage medium and electronic device
CN110444192A (en) * 2019-08-15 2019-11-12 广州科粤信息科技有限公司 A kind of intelligent sound robot based on voice technology
WO2021127979A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech synthesis method and apparatus, computer device, and computer readable storage medium
CN111968626A (en) * 2020-08-31 2020-11-20 腾讯科技(深圳)有限公司 Sound changing processing method, device, equipment and readable storage medium
CN113066511A (en) * 2021-03-16 2021-07-02 云知声智能科技股份有限公司 Voice conversion method and device, electronic equipment and storage medium
CN113409762A (en) * 2021-06-30 2021-09-17 平安科技(深圳)有限公司 Emotional voice synthesis method, device, equipment and storage medium
CN113409762B (en) * 2021-06-30 2024-05-07 平安科技(深圳)有限公司 Emotion voice synthesis method, emotion voice synthesis device, emotion voice synthesis equipment and storage medium
CN113555027A (en) * 2021-07-26 2021-10-26 平安科技(深圳)有限公司 Voice emotion conversion method and device, computer equipment and storage medium
CN113555027B (en) * 2021-07-26 2024-02-13 平安科技(深圳)有限公司 Voice emotion conversion method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110914