CN107039033A - A speech synthesis device - Google Patents

A speech synthesis device

Info

Publication number
CN107039033A
Authority
CN
China
Prior art keywords
tone
speech
synthesis
voice
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710248786.2A
Other languages
Chinese (zh)
Inventor
王军
陈翠琴
尹利平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan vocational technical college
Original Assignee
Hainan vocational technical college
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-04-17
Filing date
2017-04-17
Publication date
2017-08-11
Application filed by Hainan vocational technical college
Priority to CN201710248786.2A
Publication of CN107039033A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 Pitch control
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a speech synthesis device comprising a voice model building module, a receiving module, a tone processing module, a model correction module and a synthesis module. For the received text to be synthesized, the tone processing module generates tone information that influences the synthesized speech, according to state information indicating an emotional state. The synthesis module then synthesizes speech data carrying that tone, so that the synthesized speech sounds more natural and the user experience is improved.

Description

A speech synthesis device
Technical field
The present invention relates to the field of speech synthesis, and in particular to a speech synthesis device.
Background art
Speech synthesis, also known as text-to-speech (TTS) technology, converts any text into standard, fluent speech read aloud in real time, and aims to give the synthesized speech as high an intelligibility and naturalness as possible; it is equivalent to fitting a machine with an artificial mouth.
Synthesized speech should convey information the way a person reads naturally, with stress and emotion, and should express rhythm well. It should also be possible to synthesize speech in a specific style, such as an emotionally rich novel-reading style, a storytelling style, or informal styles such as a humorous one, so as to increase the diversity of synthesized speech and meet people's different needs.
In existing speech synthesis systems, the input text first goes through a series of processing steps such as text preprocessing and word segmentation, then enters a prosodic-hierarchy prediction module; an acoustic model then generates a sequence of target acoustic parameters, from which the speech is finally synthesized. In parametric synthesis systems the waveform is generated by a vocoder. Because this approach does not splice together fragments of original recordings, it can be made very compact, and it is therefore widely used on embedded devices.
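The parametric pipeline described above (preprocessing, segmentation, prosody prediction, acoustic model, vocoder) can be illustrated with a toy Python sketch. Every stage here is a hypothetical stand-in chosen only to show the data flow: segmentation is a whitespace split, the "acoustic model" is a fixed rule, and the "vocoder" renders sine waves. It is not the method of any particular system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AcousticFrame:
    f0_hz: float      # fundamental frequency (pitch) of the segment
    duration_ms: int  # segment length in milliseconds


def preprocess(text: str) -> str:
    """Text preprocessing: only whitespace normalization in this toy version."""
    return " ".join(text.split())


def segment(text: str) -> list[str]:
    """Word segmentation: a whitespace split stands in for a real segmenter."""
    return text.split()


def predict_prosody(words: list[str]) -> list[int]:
    """Prosodic-boundary prediction: 1 = word boundary, 3 = sentence end (toy rule)."""
    return [3 if i == len(words) - 1 else 1 for i in range(len(words))]


def acoustic_model(words: list[str], breaks: list[int]) -> list[AcousticFrame]:
    """Toy acoustic model: pitch falls across the utterance, the final word is lengthened."""
    frames = []
    for i, (word, brk) in enumerate(zip(words, breaks)):
        duration = 80 * len(word)          # 80 ms per character
        if brk == 3:
            duration = duration * 3 // 2   # phrase-final lengthening
        frames.append(AcousticFrame(f0_hz=140.0 - 10.0 * i, duration_ms=duration))
    return frames


def vocoder(frames: list[AcousticFrame], sample_rate: int = 16000) -> list[float]:
    """Toy vocoder: renders each frame as a sine wave at its F0 (no recorded audio is spliced)."""
    samples: list[float] = []
    phase = 0.0
    for frame in frames:
        step = 2 * math.pi * frame.f0_hz / sample_rate
        for _ in range(frame.duration_ms * sample_rate // 1000):
            samples.append(0.5 * math.sin(phase))
            phase += step
    return samples


def synthesize(text: str) -> list[float]:
    words = segment(preprocess(text))
    return vocoder(acoustic_model(words, predict_prosody(words)))
```

For example, `synthesize("hello world")` yields 400 ms for "hello" and 600 ms for the lengthened final word, i.e. 16000 samples at 16 kHz. Because the vocoder only needs a parameter sequence rather than a database of recordings, the footprint stays small, which is the property the paragraph above credits for the use of parametric synthesis on embedded devices.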
At present, synthesized speech is mainly adjusted with rule-based methods. These methods cannot take fine details of the speech, such as tone information, into account, so the synthesized speech has low naturalness, which in turn degrades the user experience.
Summary of the invention
To solve the above technical problem, the present invention provides a speech synthesis device.
The present invention is implemented through the following technical solution: a speech synthesis device, comprising:
Voice model building module, for building a speech synthesis model in advance from a large amount of collected speech data;
Receiving module, for receiving the user's text to be synthesized;
Tone processing module, for generating, for the received text to be synthesized, tone information that influences the synthesized speech, according to state information indicating an emotional state;
Model correction module, for correcting the speech synthesis model according to the tone information of the synthesized speech;
Synthesis module, for performing speech synthesis with the tone-bearing model corrected by the model correction module, to obtain synthesized speech data with tone.
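The five modules above can be wired together as in the following Python sketch. It is only an illustration under stated assumptions: the "model" is reduced to three averaged quantities (pitch, speaking rate, volume), the emotion-to-tone table and all numbers are hypothetical, and the synthesis module emits parameter frames instead of audio. The patent does not specify any of these details.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class SynthesisModel:
    f0_hz: float     # average pitch learned from the corpus
    rate_cps: float  # speaking rate, characters per second
    volume: float    # linear gain, 0..1


@dataclass(frozen=True)
class ToneInfo:
    pitch_scale: float
    rate_scale: float
    volume_scale: float


# Hypothetical emotion-to-tone table; the patent gives no concrete values.
EMOTION_TABLE = {
    "neutral": ToneInfo(1.0, 1.0, 1.0),
    "happy": ToneInfo(1.2, 1.1, 1.1),
    "sad": ToneInfo(0.85, 0.8, 0.8),
}


def build_model(corpus: list[tuple[float, float, float]]) -> SynthesisModel:
    """Voice model building module: averages (f0, rate, volume) measured on collected speech."""
    f0s, rates, volumes = zip(*corpus)
    return SynthesisModel(mean(f0s), mean(rates), mean(volumes))


def receive(raw: str) -> str:
    """Receiving module: accepts the user's text to be synthesized."""
    return raw.strip()


def process_tone(text: str, emotion: str) -> ToneInfo:
    """Tone processing module: maps the emotional state to tone information.
    This toy version ignores `text`; a fuller one might vary tone per sentence."""
    return EMOTION_TABLE.get(emotion, EMOTION_TABLE["neutral"])


def correct_model(model: SynthesisModel, tone: ToneInfo) -> SynthesisModel:
    """Model correction module: returns a copy of the model adjusted by the tone information."""
    return replace(
        model,
        f0_hz=model.f0_hz * tone.pitch_scale,
        rate_cps=model.rate_cps * tone.rate_scale,
        volume=min(1.0, model.volume * tone.volume_scale),
    )


def synthesize(text: str, model: SynthesisModel) -> list[dict]:
    """Synthesis module: one parameter frame per character (a real system would render audio)."""
    duration_s = 1.0 / model.rate_cps
    return [
        {"char": ch, "f0_hz": model.f0_hz, "duration_s": duration_s, "gain": model.volume}
        for ch in text
    ]
```

A typical call sequence mirrors the module order: `model = build_model(corpus)`, `text = receive(raw)`, `tone = process_tone(text, "happy")`, then `synthesize(text, correct_model(model, tone))`. Returning a corrected copy rather than mutating the base model keeps the pre-built model reusable for other emotional states.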
Preferably, the tone processing module includes a pitch-parameter generating unit and a tone-information conversion unit. The pitch-parameter generating unit generates tone-influencing parameters according to the state information indicating the emotional state, and the tone-information conversion unit converts the tone-influencing parameters generated by the pitch-parameter generating unit into tone-influence information.
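The two-stage structure of the tone processing module can be sketched as follows. The assumptions are ours, not the patent's: the emotional state is represented by valence and arousal values, the generating unit uses an arbitrary linear mapping to abstract parameters (pitch shift in semitones, pitch-range scale, energy in dB), and the conversion unit turns those into multipliers a synthesizer could apply directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    valence: float  # -1 (negative) .. +1 (positive)
    arousal: float  # 0 (calm) .. 1 (excited)


@dataclass(frozen=True)
class ToneParams:
    pitch_shift_semitones: float
    range_scale: float  # widening or narrowing of pitch excursions
    energy_db: float


@dataclass(frozen=True)
class ToneInfluence:
    f0_multiplier: float
    f0_range_multiplier: float
    gain: float


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def generate_tone_params(state: EmotionState) -> ToneParams:
    """Pitch-parameter generating unit (hypothetical linear mapping)."""
    v = clamp(state.valence, -1.0, 1.0)
    a = clamp(state.arousal, 0.0, 1.0)
    return ToneParams(
        pitch_shift_semitones=2.0 * v + 2.0 * a,  # happier or more excited: higher pitch
        range_scale=1.0 + 0.5 * a,                # more excited: livelier intonation
        energy_db=6.0 * a - 3.0,                  # calm: quieter, excited: louder
    )


def convert_to_influence(p: ToneParams) -> ToneInfluence:
    """Tone-information conversion unit: abstract parameters to synthesizer multipliers."""
    return ToneInfluence(
        f0_multiplier=2.0 ** (p.pitch_shift_semitones / 12.0),  # semitones to frequency ratio
        f0_range_multiplier=p.range_scale,
        gain=10.0 ** (p.energy_db / 20.0),                      # dB to linear amplitude
    )
```

Separating the two units keeps the emotion model (which parameters an emotion should produce) independent of the synthesizer's control interface (how those parameters are applied), so either side could be replaced without touching the other.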
Preferably, the model correction module includes:
Tone acquisition unit, for acquiring tone data corresponding to the tone information generated by the tone processing module;
Tone recognition unit, for performing tone recognition on the tone data to obtain a tone recognition text;
Acoustic feature extraction unit, for extracting acoustic features of the text to be synthesized;
Voice correction unit, for correcting the speech synthesis model according to the tone recognition text, to obtain a corrected speech synthesis model.
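The correction units are described only at a high level, so the following Python sketch is one possible reading, not the patent's method. It assumes the "tone data" is a pitch (F0) contour, that "tone recognition" labels the contour as rising, falling or level from its least-squares slope, and that "correction" moves the model's stored slope template for that label toward the observed slope. The labels, threshold and learning rate are hypothetical.

```python
from dataclasses import dataclass, field


def contour_slope(f0: list[float]) -> float:
    """Least-squares slope of an F0 contour, in Hz per frame."""
    n = len(f0)
    if n == 0:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(f0) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(f0))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0


def recognize_tone(f0: list[float], threshold: float = 1.0) -> str:
    """Tone recognition unit (hypothetical): labels the contour by its slope."""
    slope = contour_slope(f0)
    if slope > threshold:
        return "rising"
    if slope < -threshold:
        return "falling"
    return "level"


@dataclass
class ToneModel:
    # Per-label target slope (Hz per frame) used at synthesis time; placeholder values.
    slopes: dict = field(default_factory=lambda: {"rising": 2.0, "falling": -2.0, "level": 0.0})


def correct_model(model: ToneModel, f0: list[float], rate: float = 0.5) -> str:
    """Voice correction unit: nudges the template of the recognized label toward the data."""
    label = recognize_tone(f0)
    observed = contour_slope(f0)
    model.slopes[label] += rate * (observed - model.slopes[label])
    return label
```

A learning rate below 1 means a single contour adjusts the model only partially, so repeated observations gradually pull each template toward the speaker's typical contour instead of overwriting it.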
Preferably, the model correction module further includes a preprocessing unit, for removing noise from the user's text to be synthesized received by the receiving module.
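The patent does not say what counts as "noise" in the input text. Assuming it means textual noise (markup, invisible control or format characters, full-width variants, repeated punctuation), a preprocessing step could look like the Python sketch below, which uses only the standard `re` and `unicodedata` modules. The specific cleaning rules are illustrative choices.

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")                 # markup such as <b>...</b>
REPEAT_PUNCT_RE = re.compile(r"([!?,;:。])\1+")  # runs of the same punctuation mark


def clean_text(text: str) -> str:
    """Hypothetical preprocessing unit: strips textual noise before synthesis."""
    # NFKC folds full-width forms (e.g. the full-width comma) into their ASCII equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = TAG_RE.sub(" ", text)
    # Drop control/format characters (Unicode category C*) but keep ordinary whitespace.
    text = "".join(ch for ch in text if ch.isspace() or not unicodedata.category(ch).startswith("C"))
    text = REPEAT_PUNCT_RE.sub(r"\1", text)      # "!!!" becomes "!"
    return " ".join(text.split())                # collapse whitespace
```

For instance, `clean_text("<b>Hello</b>\u200b  world!!!")` returns `"Hello world!"`: the tags and the zero-width space are removed and the repeated exclamation marks are collapsed. The period is deliberately left out of the repeat rule so that an ellipsis is not reduced to a full stop.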
Preferably, the tone-influence information consists of feature parameters extracted from the waveform data of speech.
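As an illustration of "feature parameters extracted from the waveform data", the sketch below computes two common frame-level features: fundamental frequency (by picking the autocorrelation peak within a plausible pitch range) and RMS energy. The patent does not name the features; these two, the 60 to 400 Hz search range and the 400-sample frames are assumptions.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square amplitude of a frame (a loudness feature)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def estimate_f0(frame: list[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Autocorrelation pitch estimate in Hz; returns 0.0 when no positive peak is found."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def extract_features(signal: list[float], sample_rate: int,
                     frame_len: int = 400) -> list[tuple[float, float]]:
    """Splits the waveform into non-overlapping frames and returns (f0_hz, rms) per frame."""
    return [
        (estimate_f0(signal[s:s + frame_len], sample_rate), rms(signal[s:s + frame_len]))
        for s in range(0, len(signal) - frame_len + 1, frame_len)
    ]
```

On a 200 Hz sine sampled at 8 kHz, each frame reports an F0 of 200 Hz (a lag of 40 samples) and an RMS of about 0.354 for amplitude 0.5. The unnormalized autocorrelation favours the shortest matching period, which helps avoid reporting half the true pitch; production pitch trackers add normalization and voicing decisions.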
Preferably, the tone-influence information consists of control parameters for controlling speech synthesis.
Preferably, the control parameters control the volume balance and the amount of amplitude fluctuation of the synthesized speech.
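The two control parameters can be read in several ways. The Python sketch below assumes "volume balance" means scaling the waveform to a target RMS level and "amplitude fluctuation" means a periodic amplitude modulation (tremolo) with a given depth and rate; the parameter names and defaults are hypothetical.

```python
import math


def apply_controls(samples: list[float], sample_rate: int, target_rms: float = 0.1,
                   fluctuation_depth: float = 0.0, fluctuation_hz: float = 5.0) -> list[float]:
    """Applies two hypothetical control parameters to a synthesized waveform:
    volume balance, a single gain that brings the signal to `target_rms`; and
    amplitude fluctuation, a raised-cosine envelope that dips to (1 - depth)
    `fluctuation_hz` times per second.
    """
    if not samples:
        return []
    current_rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    gain = target_rms / current_rms if current_rms > 0 else 0.0
    depth = max(0.0, min(1.0, fluctuation_depth))  # keeps the envelope within [0, 1]
    out = []
    for n, x in enumerate(samples):
        t = n / sample_rate
        envelope = 1.0 - depth * 0.5 * (1.0 - math.cos(2 * math.pi * fluctuation_hz * t))
        out.append(x * gain * envelope)
    return out
```

With `fluctuation_depth=0` the envelope is constant and the output RMS equals `target_rms`; raising the depth makes the loudness pulse, which is one way an "excited" or "trembling" tone might be approximated. The gain is computed before modulation, so a non-zero depth lowers the average level slightly.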
The beneficial effects of the present invention are as follows: after receiving the user's text to be synthesized, the invention generates tone information that influences the synthesized speech according to state information indicating an emotional state, then corrects the speech synthesis model according to that tone information, and finally obtains synthesized speech data with tone. The synthesized speech is therefore more natural, which improves the user experience.
Brief description of the drawings
Fig. 1 is a structural block diagram of the speech synthesis device of the present invention.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides a speech synthesis device, comprising:
Voice model building module, for building a speech synthesis model in advance from a large amount of collected speech data;
Receiving module, for receiving the user's text to be synthesized;
Tone processing module, for generating, for the received text to be synthesized, tone information that influences the synthesized speech, according to state information indicating an emotional state;
Model correction module, for correcting the speech synthesis model according to the tone information of the synthesized speech;
Synthesis module, for performing speech synthesis with the tone-bearing model corrected by the model correction module, to obtain synthesized speech data with tone.
Preferably, the tone processing module includes a pitch-parameter generating unit and a tone-information conversion unit. The pitch-parameter generating unit generates tone-influencing parameters according to the state information indicating the emotional state, and the tone-information conversion unit converts the tone-influencing parameters generated by the pitch-parameter generating unit into tone-influence information.
Preferably, the model correction module includes:
Tone acquisition unit, for acquiring tone data corresponding to the tone information generated by the tone processing module;
Tone recognition unit, for performing tone recognition on the tone data to obtain a tone recognition text;
Acoustic feature extraction unit, for extracting acoustic features of the text to be synthesized;
Voice correction unit, for correcting the speech synthesis model according to the tone recognition text, to obtain a corrected speech synthesis model.
Preferably, the model correction module further includes a preprocessing unit, for removing noise from the user's text to be synthesized received by the receiving module.
Preferably, the tone-influence information consists of feature parameters extracted from the waveform data of speech.
Preferably, the tone-influence information consists of control parameters for controlling speech synthesis.
Preferably, the control parameters control the volume balance and the amount of amplitude fluctuation of the synthesized speech.
The above disclosure is only a preferred embodiment of the present invention and cannot be used to limit the scope of its claims; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (7)

1. A speech synthesis device, characterized by comprising:
Voice model building module, for building a speech synthesis model in advance from a large amount of collected speech data;
Receiving module, for receiving the user's text to be synthesized;
Tone processing module, for generating, for the received text to be synthesized, tone information that influences the synthesized speech, according to state information indicating an emotional state;
Model correction module, for correcting the speech synthesis model according to the tone information of the synthesized speech;
Synthesis module, for performing speech synthesis with the tone-bearing model corrected by the model correction module, to obtain synthesized speech data with tone.
2. The speech synthesis device according to claim 1, characterized in that the tone processing module includes a pitch-parameter generating unit and a tone-information conversion unit; the pitch-parameter generating unit generates tone-influencing parameters according to the state information indicating the emotional state, and the tone-information conversion unit converts the tone-influencing parameters generated by the pitch-parameter generating unit into tone-influence information.
3. The speech synthesis device according to claim 1, characterized in that the model correction module includes:
Tone acquisition unit, for acquiring tone data corresponding to the tone information generated by the tone processing module;
Tone recognition unit, for performing tone recognition on the tone data to obtain a tone recognition text;
Acoustic feature extraction unit, for extracting acoustic features of the text to be synthesized;
Voice correction unit, for correcting the speech synthesis model according to the tone recognition text, to obtain a corrected speech synthesis model.
4. The speech synthesis device according to claim 3, characterized in that the model correction module further includes a preprocessing unit, for removing noise from the user's text to be synthesized received by the receiving module.
5. The speech synthesis device according to claim 2, characterized in that the tone-influence information consists of feature parameters extracted from the waveform data of speech.
6. The speech synthesis device according to claim 2, characterized in that the tone-influence information consists of control parameters for controlling speech synthesis.
7. The speech synthesis device according to claim 6, characterized in that the control parameters control the volume balance and the amount of amplitude fluctuation of the synthesized speech.
CN201710248786.2A 2017-04-17 2017-04-17 A kind of speech synthetic device Pending CN107039033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710248786.2A CN107039033A (en) 2017-04-17 2017-04-17 A kind of speech synthetic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710248786.2A CN107039033A (en) 2017-04-17 2017-04-17 A kind of speech synthetic device

Publications (1)

Publication Number Publication Date
CN107039033A true CN107039033A (en) 2017-08-11

Family

ID=59535293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710248786.2A Pending CN107039033A (en) 2017-04-17 2017-04-17 A kind of speech synthetic device

Country Status (1)

Country Link
CN (1) CN107039033A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010512A (en) * 2017-12-05 2018-05-08 广东小天才科技有限公司 Sound effect acquisition method and recording terminal
CN109036370A (en) * 2018-06-06 2018-12-18 安徽继远软件有限公司 A kind of speaker's voice adaptive training method
CN109599090A (en) * 2018-10-29 2019-04-09 阿里巴巴集团控股有限公司 A kind of method, device and equipment of speech synthesis
CN110312161A (en) * 2018-03-20 2019-10-08 Tcl集团股份有限公司 A kind of video dubbing method, device and terminal device
CN110600002A (en) * 2019-09-18 2019-12-20 北京声智科技有限公司 Voice synthesis method and device and electronic equipment
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5873059A (en) * 1995-10-26 1999-02-16 Sony Corporation Method and apparatus for decoding and changing the pitch of an encoded speech signal
CN1461463A (en) * 2001-03-09 2003-12-10 索尼公司 Voice synthesis device
CN102201234A (en) * 2011-06-24 2011-09-28 北京宇音天下科技有限公司 Speech synthesizing method based on tone automatic tagging and prediction
CN102496363A (en) * 2011-11-11 2012-06-13 北京宇音天下科技有限公司 Correction method for Chinese speech synthesis tone
CN103117057A (en) * 2012-12-27 2013-05-22 安徽科大讯飞信息科技股份有限公司 Application method of special human voice synthesis technique in mobile phone cartoon dubbing

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010512A (en) * 2017-12-05 2018-05-08 广东小天才科技有限公司 Sound effect acquisition method and recording terminal
CN108010512B (en) * 2017-12-05 2021-04-30 广东小天才科技有限公司 Sound effect acquisition method and recording terminal
CN110312161A (en) * 2018-03-20 2019-10-08 Tcl集团股份有限公司 A kind of video dubbing method, device and terminal device
CN110312161B (en) * 2018-03-20 2020-12-11 Tcl科技集团股份有限公司 Video dubbing method and device and terminal equipment
CN109036370A (en) * 2018-06-06 2018-12-18 安徽继远软件有限公司 A kind of speaker's voice adaptive training method
CN109036370B (en) * 2018-06-06 2021-07-20 安徽继远软件有限公司 Adaptive training method for speaker voice
CN109599090A (en) * 2018-10-29 2019-04-09 阿里巴巴集团控股有限公司 A kind of method, device and equipment of speech synthesis
CN109599090B (en) * 2018-10-29 2020-10-30 创新先进技术有限公司 Method, device and equipment for voice synthesis
CN110600002A (en) * 2019-09-18 2019-12-20 北京声智科技有限公司 Voice synthesis method and device and electronic equipment
CN110600002B (en) * 2019-09-18 2022-04-22 北京声智科技有限公司 Voice synthesis method and device and electronic equipment
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium

Similar Documents

Publication Publication Date Title
CN107039033A (en) A kind of speech synthetic device
CN105304080B (en) Speech synthetic device and method
US11295721B2 (en) Generating expressive speech audio from text data
CN111201565A (en) System and method for sound-to-sound conversion
CN105244026B (en) A kind of method of speech processing and device
Tran et al. Improvement to a NAM-captured whisper-to-speech system
Keller The analysis of voice quality in speech processing
CN113129914A (en) Cross-language speech conversion system and method
CN108231062A (en) A kind of voice translation method and device
DE112004000187T5 (en) Method and apparatus of prosodic simulation synthesis
CN109346057A (en) A kind of speech processing system of intelligence toy for children
CN106504742A (en) The transmission method of synthesis voice, cloud server and terminal device
CN101310315A (en) Language learning device, method and program and recording medium
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
CN113724683B (en) Audio generation method, computer device and computer readable storage medium
CN112735454A (en) Audio processing method and device, electronic equipment and readable storage medium
TW201806638A (en) Auditory training device, auditory training method and program
CN114121006A (en) Image output method, device, equipment and storage medium of virtual character
CN113053357A (en) Speech synthesis method, apparatus, device and computer readable storage medium
CN111916054A (en) Lip-based voice generation method, device and system and storage medium
CN102231275B (en) Embedded speech synthesis method based on weighted mixed excitation
US20230015112A1 (en) Method and apparatus for processing speech, electronic device and storage medium
CN113539239B (en) Voice conversion method and device, storage medium and electronic equipment
CN116129852A (en) Training method of speech synthesis model, speech synthesis method and related equipment
KR102484006B1 (en) Voice self-practice method for voice disorders and user device for voice therapy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170811