CN107039033A - A kind of speech synthetic device - Google Patents
A kind of speech synthetic device
- Publication number
- CN107039033A
- Authority
- CN
- China
- Prior art keywords
- tone
- speech
- synthesis
- voice
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335—Pitch control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a speech synthesis device comprising a speech model building module, a receiving module, a tone processing module, a model correction module and a synthesis module. The tone processing module generates, for the received text to be synthesized and according to state information indicating an emotional state, tone information for influencing the synthesized speech. The synthesis module finally produces synthesized speech data that carries this tone, so that the synthesized speech sounds more natural and the user experience is improved.
Description
Technical field
The present invention relates to the field of speech synthesis, and in particular to a speech synthesis device.
Background art
Speech synthesis, also known as text-to-speech (TTS) technology, can convert any text into standard, fluent speech in real time and read it aloud, while giving the synthesized speech as much intelligibility and naturalness as possible. It is equivalent to fitting a machine with an artificial mouth.
Synthesized speech should convey information naturally, with stress and emotion, and should ideally show strong prosodic expressiveness. Speech can be synthesized in specific styles, such as an emotionally rich novel-reading style, a storytelling style, and various informal styles such as a humorous style. This increases the diversity of synthesized speech and meets people's differing needs.
In existing speech synthesis systems, the input text passes through a series of processing steps such as text preprocessing and word segmentation and then enters a prosodic hierarchy prediction module. An acoustic model then generates the target acoustic parameter sequence, from which the speech is finally synthesized. In a parametric synthesis system, the speech is generated by a vocoder. Because this approach does not need to splice original recorded segments together, the system can be kept small, so it is widely used on embedded devices.
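The conventional parametric pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any existing system: every function name, the rule used for phrase breaks, the F0 and duration values, and the sine-tone "vocoder" are assumptions chosen only to show the order of the stages.

```python
import math
import re

SAMPLE_RATE = 16000  # Hz; assumed value for this sketch


def preprocess(text):
    """Text preprocessing: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def segment(text):
    """Word segmentation: split into word tokens and punctuation tokens."""
    return re.findall(r"[a-z']+|[.,!?]", text)


def predict_prosody(tokens):
    """Toy prosodic prediction: a phrase break follows any word before punctuation."""
    units = []
    for tok in tokens:
        if tok in ".,!?":
            if units:
                units[-1]["break_after"] = True
        else:
            units.append({"word": tok, "break_after": False})
    return units


def acoustic_model(units, base_f0_hz=120.0):
    """Toy acoustic model: one (f0_hz, duration_ms) pair per word.

    Words before a break get a lower F0 and a longer duration.
    """
    params = []
    for u in units:
        f0 = base_f0_hz * (0.9 if u["break_after"] else 1.0)
        dur_ms = 50 * len(u["word"]) + (100 if u["break_after"] else 0)
        params.append((f0, dur_ms))
    return params


def vocoder(params):
    """Toy vocoder: render each parameter pair as a sine tone.

    No recorded speech segments are used, only generated parameters.
    """
    samples = []
    for f0, dur_ms in params:
        n = dur_ms * SAMPLE_RATE // 1000
        samples.extend(
            math.sin(2 * math.pi * f0 * i / SAMPLE_RATE) for i in range(n)
        )
    return samples


def synthesize(text):
    """Run the stages in the order given in the text."""
    return vocoder(acoustic_model(predict_prosody(segment(preprocess(text)))))
```

Because the output is computed entirely from generated parameters, nothing in this sketch needs a database of recorded segments, which is the property the text credits for the small footprint on embedded devices.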
At present, synthesized speech is mainly adjusted by rule-based methods. Such methods cannot take the details of speech, such as tone information, into account. As a result, the synthesized speech has low naturalness, which degrades the user experience.
Summary of the invention
To solve the above technical problem, the present invention provides a speech synthesis device.
The present invention is realized by the following technical solution. A speech synthesis device comprises:
a speech model building module, configured to build a speech synthesis model in advance from a large amount of collected speech data;
a receiving module, configured to receive the user's text to be synthesized;
a tone processing module, configured to generate, for the received text to be synthesized and according to state information indicating an emotional state, tone information for influencing the synthesized speech;
a model correction module, configured to correct the speech synthesis model according to the tone information for the synthesized speech;
a synthesis module, configured to perform speech synthesis using the speech with tone information as corrected by the model correction module, obtaining synthesized speech data with tone.
Preferably, the tone processing module comprises a tone parameter generating unit and a tone information conversion unit. The tone parameter generating unit generates tone influence parameters according to the state information indicating the emotional state, and the tone information conversion unit converts the tone influence parameters generated by the tone parameter generating unit into tone influence information.
Preferably, the model correction module comprises:
a tone acquisition unit, configured to obtain tone data corresponding to the tone information generated by the tone processing module;
a tone recognition unit, configured to perform tone recognition on the tone data to obtain tone recognition text;
an acoustic feature extraction unit, configured to extract acoustic features of the text to be synthesized;
a speech correction unit, configured to correct the speech synthesis model using the tone recognition text, obtaining a corrected speech synthesis model.
Preferably, the model correction module further comprises a preprocessing unit, configured to remove noise from the user's text to be synthesized that the receiving module has received.
Preferably, the tone influence information is a characteristic parameter extracted from the waveform data of speech.
Preferably, the tone influence information is a control parameter for controlling speech synthesis.
Preferably, the control parameter is used to control the volume balance and the amount of amplitude fluctuation in speech synthesis.
The beneficial effects of the present invention are as follows. After receiving the user's text to be synthesized, the device generates tone information for influencing the synthesized speech according to the state information indicating the emotional state, and then corrects the speech synthesis model according to that tone information. The synthesized speech data finally obtained carries the tone, so the synthesized speech sounds more natural and the user experience is improved.
Brief description of the drawings
Fig. 1 is a structural block diagram of the speech synthesis device of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides a speech synthesis device comprising:
a speech model building module, configured to build a speech synthesis model in advance from a large amount of collected speech data;
a receiving module, configured to receive the user's text to be synthesized;
a tone processing module, configured to generate, for the received text to be synthesized and according to state information indicating an emotional state, tone information for influencing the synthesized speech;
a model correction module, configured to correct the speech synthesis model according to the tone information for the synthesized speech;
a synthesis module, configured to perform speech synthesis using the speech with tone information as corrected by the model correction module, obtaining synthesized speech data with tone.
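The data flow between the five modules can be illustrated with a short sketch. It assumes far more than the text states: the model is reduced to two numbers, the emotion-to-tone table is invented, the tone lookup ignores the text content, and the synthesis module returns per-word target parameters instead of audio. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class SynthesisModel:
    """Stand-in for the speech synthesis model (assumed structure)."""
    base_f0_hz: float
    volume: float = 1.0


# Hypothetical mapping from emotional state to tone information.
TONE_TABLE = {
    "neutral": {"f0_scale": 1.0, "volume_scale": 1.0},
    "happy": {"f0_scale": 1.25, "volume_scale": 1.5},
    "sad": {"f0_scale": 0.75, "volume_scale": 0.5},
}


def build_model(collected_f0_hz):
    """Speech model building module: fit a toy model from collected data."""
    return SynthesisModel(base_f0_hz=mean(collected_f0_hz))


def receive_text(raw_text):
    """Receiving module: accept the user's text to be synthesized."""
    return raw_text.strip()


def process_tone(emotional_state):
    """Tone processing module: emotional state -> tone information."""
    return TONE_TABLE.get(emotional_state, TONE_TABLE["neutral"])


def correct_model(model, tone_info):
    """Model correction module: return a corrected copy of the model."""
    return replace(
        model,
        base_f0_hz=model.base_f0_hz * tone_info["f0_scale"],
        volume=model.volume * tone_info["volume_scale"],
    )


def synthesize(model, text):
    """Synthesis module: emit (word, f0_hz, volume) targets, not audio."""
    return [(word, model.base_f0_hz, model.volume) for word in text.split()]


def run_device(raw_text, emotional_state, collected_f0_hz):
    """Wire the five modules together in the order given in the text."""
    model = build_model(collected_f0_hz)
    text = receive_text(raw_text)
    tone_info = process_tone(emotional_state)
    corrected = correct_model(model, tone_info)
    return synthesize(corrected, text)
```

Returning a corrected copy rather than mutating the base model is a design choice of this sketch; it keeps the model built from the collected data reusable across requests with different emotional states.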
Preferably, the tone processing module comprises a tone parameter generating unit and a tone information conversion unit. The tone parameter generating unit generates tone influence parameters according to the state information indicating the emotional state, and the tone information conversion unit converts the tone influence parameters generated by the tone parameter generating unit into tone influence information.
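One way to picture the two units of the tone processing module is as a two-step conversion. In the sketch below, the tone influence parameters are assumed to be a pitch shift in semitones and a gain in decibels, and the tone influence information is assumed to be the corresponding linear multipliers. The text does not specify either representation; the emotion table and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EmotionState:
    """State information indicating an emotional state (assumed form)."""
    label: str
    intensity: float  # expected range 0.0 to 1.0


@dataclass
class ToneParams:
    """Tone influence parameters (assumed: perceptual units)."""
    pitch_shift_semitones: float
    gain_db: float


@dataclass
class ToneInfo:
    """Tone influence information (assumed: linear multipliers)."""
    f0_multiplier: float
    amplitude_multiplier: float


# Hypothetical full-intensity settings: (semitones, dB) per emotion label.
_FULL_INTENSITY = {
    "neutral": (0.0, 0.0),
    "happy": (2.0, 3.0),
    "sad": (-2.0, -3.0),
}


def generate_tone_params(state):
    """Tone parameter generating unit: emotional state -> parameters."""
    semitones, gain_db = _FULL_INTENSITY.get(state.label, (0.0, 0.0))
    k = min(max(state.intensity, 0.0), 1.0)  # clamp intensity to [0, 1]
    return ToneParams(semitones * k, gain_db * k)


def convert_to_tone_info(params):
    """Tone information conversion unit: parameters -> multipliers.

    Uses the standard relations: one semitone is a frequency ratio of
    2**(1/12), and a gain of g dB is an amplitude ratio of 10**(g/20).
    """
    return ToneInfo(
        f0_multiplier=2.0 ** (params.pitch_shift_semitones / 12.0),
        amplitude_multiplier=10.0 ** (params.gain_db / 20.0),
    )
```

For example, `convert_to_tone_info(generate_tone_params(EmotionState("happy", 0.5)))` raises the pitch by one semitone and the level by 1.5 dB under these assumed settings.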
Preferably, the model correction module comprises:
a tone acquisition unit, configured to obtain tone data corresponding to the tone information generated by the tone processing module;
a tone recognition unit, configured to perform tone recognition on the tone data to obtain tone recognition text;
an acoustic feature extraction unit, configured to extract acoustic features of the text to be synthesized;
a speech correction unit, configured to correct the speech synthesis model using the tone recognition text, obtaining a corrected speech synthesis model.
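The four units listed above might be arranged as in the following sketch. The text gives no algorithms, so everything concrete here is an assumption: tone data is modeled as a short F0 contour, tone recognition text as one of three labels, acoustic features as nominal per-word durations, and the model as a plain dictionary. The text also does not say how the extracted acoustic features feed into the correction, so the sketch leaves them as a separate output.

```python
def acquire_tone_data(tone_info, base_f0_hz=120.0, frames=5):
    """Tone acquisition unit: expand tone information into an F0 contour."""
    step = tone_info["f0_slope_hz_per_frame"]
    return [base_f0_hz + step * i for i in range(frames)]


def recognize_tone(contour, tolerance_hz=1.0):
    """Tone recognition unit: describe the contour with a text label."""
    delta = contour[-1] - contour[0]
    if delta > tolerance_hz:
        return "rising"
    if delta < -tolerance_hz:
        return "falling"
    return "level"


def extract_acoustic_features(text):
    """Acoustic feature extraction unit: toy per-word duration features."""
    return [{"word": w, "duration_ms": 60 * len(w)} for w in text.split()]


# Hypothetical effect of each tone label on the phrase-final F0.
_FINAL_F0_SCALE = {"rising": 1.2, "falling": 0.8, "level": 1.0}


def correct_model(model, tone_label):
    """Speech correction unit: return a corrected copy of the model."""
    corrected = dict(model)
    corrected["final_f0_scale"] = _FINAL_F0_SCALE[tone_label]
    return corrected
```

A typical call chain would be `correct_model(model, recognize_tone(acquire_tone_data(tone_info)))`, with `extract_acoustic_features(text)` computed alongside it.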
Preferably, the model correction module further comprises a preprocessing unit, configured to remove noise from the user's text to be synthesized that the receiving module has received.
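The text does not define what counts as noise in the text to be synthesized. As one plausible reading, the sketch below treats markup tags, control and invisible format characters, compatibility variants such as full-width letters, and redundant whitespace as noise. The function name and the choice of rules are assumptions.

```python
import re
import unicodedata

_TAG = re.compile(r"<[^>]+>")


def remove_text_noise(text):
    """Preprocessing unit sketch: strip common kinds of textual noise."""
    # Fold compatibility variants (for example full-width letters) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace markup tags with a space so adjacent words stay separated.
    text = _TAG.sub(" ", text)
    # Turn every whitespace character into a plain space before filtering,
    # because newlines and tabs are themselves control characters.
    text = re.sub(r"\s", " ", text)
    # Drop control, format and other "C" category characters.
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", text).strip()
```

Running the filter before synthesis keeps unpronounceable characters away from the later stages, which is presumably the purpose of the preprocessing unit.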
Preferably, the tone influence information is a characteristic parameter extracted from the waveform data of speech.
Preferably, the tone influence information is a control parameter for controlling speech synthesis.
Preferably, the control parameter is used to control the volume balance and the amount of amplitude fluctuation in speech synthesis.
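How such a control parameter could act on a waveform is sketched below. Two interpretations are assumed here and are not confirmed by the text: "volume balance" is read as an overall gain, and "amount of amplitude fluctuation" is read as the depth of a slow sinusoidal amplitude modulation. The function and parameter names are hypothetical.

```python
import math


def apply_control(samples, sample_rate, volume=1.0,
                  fluctuation_depth=0.0, fluctuation_hz=5.0):
    """Scale samples by a gain and a slow sinusoidal amplitude envelope.

    volume            overall gain (assumed meaning of "volume balance")
    fluctuation_depth modulation depth, 0.0 disables the fluctuation
    fluctuation_hz    modulation rate in Hz
    """
    out = []
    for i, s in enumerate(samples):
        lfo = math.sin(2 * math.pi * fluctuation_hz * i / sample_rate)
        envelope = volume * (1.0 + fluctuation_depth * lfo)
        out.append(s * envelope)
    return out
```

With `fluctuation_depth=0.0` the function reduces to a plain gain, so the two controls can be tuned independently.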
The above discloses only preferred embodiments of the present invention, which of course cannot limit the scope of its claims. Equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (7)
1. A speech synthesis device, characterized by comprising:
a speech model building module, configured to build a speech synthesis model in advance from a large amount of collected speech data;
a receiving module, configured to receive the user's text to be synthesized;
a tone processing module, configured to generate, for the received text to be synthesized and according to state information indicating an emotional state, tone information for influencing the synthesized speech;
a model correction module, configured to correct the speech synthesis model according to the tone information for the synthesized speech;
a synthesis module, configured to perform speech synthesis using the speech with tone information as corrected by the model correction module, obtaining synthesized speech data with tone.
2. The speech synthesis device according to claim 1, characterized in that the tone processing module comprises a tone parameter generating unit and a tone information conversion unit, the tone parameter generating unit being configured to generate tone influence parameters according to the state information indicating the emotional state, and the tone information conversion unit being configured to convert the tone influence parameters generated by the tone parameter generating unit into tone influence information.
3. The speech synthesis device according to claim 1, characterized in that the model correction module comprises:
a tone acquisition unit, configured to obtain tone data corresponding to the tone information generated by the tone processing module;
a tone recognition unit, configured to perform tone recognition on the tone data to obtain tone recognition text;
an acoustic feature extraction unit, configured to extract acoustic features of the text to be synthesized;
a speech correction unit, configured to correct the speech synthesis model using the tone recognition text, obtaining a corrected speech synthesis model.
4. The speech synthesis device according to claim 3, characterized in that the model correction module further comprises:
a preprocessing unit, configured to remove noise from the user's text to be synthesized that the receiving module has received.
5. The speech synthesis device according to claim 2, characterized in that the tone influence information is a characteristic parameter extracted from the waveform data of speech.
6. The speech synthesis device according to claim 2, characterized in that the tone influence information is a control parameter for controlling speech synthesis.
7. The speech synthesis device according to claim 6, characterized in that the control parameter is used to control the volume balance and the amount of amplitude fluctuation in speech synthesis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710248786.2A CN107039033A (en) | 2017-04-17 | 2017-04-17 | A kind of speech synthetic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710248786.2A CN107039033A (en) | 2017-04-17 | 2017-04-17 | A kind of speech synthetic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107039033A (en) | 2017-08-11 |
Family
ID=59535293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710248786.2A Pending CN107039033A (en) | 2017-04-17 | 2017-04-17 | A kind of speech synthetic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107039033A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108010512A (en) * | 2017-12-05 | 2018-05-08 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN109036370A (en) * | 2018-06-06 | 2018-12-18 | 安徽继远软件有限公司 | A kind of speaker's voice adaptive training method |
CN109599090A (en) * | 2018-10-29 | 2019-04-09 | 阿里巴巴集团控股有限公司 | A kind of method, device and equipment of speech synthesis |
CN110312161A (en) * | 2018-03-20 | 2019-10-08 | Tcl集团股份有限公司 | A kind of video dubbing method, device and terminal device |
CN110600002A (en) * | 2019-09-18 | 2019-12-20 | 北京声智科技有限公司 | Voice synthesis method and device and electronic equipment |
WO2021155662A1 (en) * | 2020-02-03 | 2021-08-12 | 华为技术有限公司 | Text information processing method and apparatus, computer device, and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5873059A (en) * | 1995-10-26 | 1999-02-16 | Sony Corporation | Method and apparatus for decoding and changing the pitch of an encoded speech signal |
CN1461463A (en) * | 2001-03-09 | 2003-12-10 | 索尼公司 | Voice synthesis device |
CN102201234A (en) * | 2011-06-24 | 2011-09-28 | 北京宇音天下科技有限公司 | Speech synthesizing method based on tone automatic tagging and prediction |
CN102496363A (en) * | 2011-11-11 | 2012-06-13 | 北京宇音天下科技有限公司 | Correction method for Chinese speech synthesis tone |
CN103117057A (en) * | 2012-12-27 | 2013-05-22 | 安徽科大讯飞信息科技股份有限公司 | Application method of special human voice synthesis technique in mobile phone cartoon dubbing |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108010512A (en) * | 2017-12-05 | 2018-05-08 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN108010512B (en) * | 2017-12-05 | 2021-04-30 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN110312161A (en) * | 2018-03-20 | 2019-10-08 | Tcl集团股份有限公司 | A kind of video dubbing method, device and terminal device |
CN110312161B (en) * | 2018-03-20 | 2020-12-11 | Tcl科技集团股份有限公司 | Video dubbing method and device and terminal equipment |
CN109036370A (en) * | 2018-06-06 | 2018-12-18 | 安徽继远软件有限公司 | A kind of speaker's voice adaptive training method |
CN109036370B (en) * | 2018-06-06 | 2021-07-20 | 安徽继远软件有限公司 | Adaptive training method for speaker voice |
CN109599090A (en) * | 2018-10-29 | 2019-04-09 | 阿里巴巴集团控股有限公司 | A kind of method, device and equipment of speech synthesis |
CN109599090B (en) * | 2018-10-29 | 2020-10-30 | 创新先进技术有限公司 | Method, device and equipment for voice synthesis |
CN110600002A (en) * | 2019-09-18 | 2019-12-20 | 北京声智科技有限公司 | Voice synthesis method and device and electronic equipment |
CN110600002B (en) * | 2019-09-18 | 2022-04-22 | 北京声智科技有限公司 | Voice synthesis method and device and electronic equipment |
WO2021155662A1 (en) * | 2020-02-03 | 2021-08-12 | 华为技术有限公司 | Text information processing method and apparatus, computer device, and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107039033A (en) | A kind of speech synthetic device | |
CN105304080B (en) | Speech synthetic device and method | |
US11295721B2 (en) | Generating expressive speech audio from text data | |
CN111201565A (en) | System and method for sound-to-sound conversion | |
CN105244026B (en) | A kind of method of speech processing and device | |
Tran et al. | Improvement to a NAM-captured whisper-to-speech system | |
Keller | The analysis of voice quality in speech processing | |
CN113129914A (en) | Cross-language speech conversion system and method | |
CN108231062A (en) | A kind of voice translation method and device | |
DE112004000187T5 (en) | Method and apparatus of prosodic simulation synthesis | |
CN109346057A (en) | A kind of speech processing system of intelligence toy for children | |
CN106504742A (en) | The transmission method of synthesis voice, cloud server and terminal device | |
CN101310315A (en) | Language learning device, method and program and recording medium | |
US20210118464A1 (en) | Method and apparatus for emotion recognition from speech | |
CN113724683B (en) | Audio generation method, computer device and computer readable storage medium | |
CN112735454A (en) | Audio processing method and device, electronic equipment and readable storage medium | |
TW201806638A (en) | Auditory training device, auditory training method and program | |
CN114121006A (en) | Image output method, device, equipment and storage medium of virtual character | |
CN113053357A (en) | Speech synthesis method, apparatus, device and computer readable storage medium | |
CN111916054A (en) | Lip-based voice generation method, device and system and storage medium | |
CN102231275B (en) | Embedded speech synthesis method based on weighted mixed excitation | |
US20230015112A1 (en) | Method and apparatus for processing speech, electronic device and storage medium | |
CN113539239B (en) | Voice conversion method and device, storage medium and electronic equipment | |
CN116129852A (en) | Training method of speech synthesis model, speech synthesis method and related equipment | |
KR102484006B1 (en) | Voice self-practice method for voice disorders and user device for voice therapy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170811 |