CN110517662A - A kind of method and system of Intelligent voice broadcasting - Google Patents

A kind of method and system of Intelligent voice broadcasting

Info

Publication number
CN110517662A
CN110517662A CN201910630232.8A CN201910630232A CN110517662A CN 110517662 A CN110517662 A CN 110517662A CN 201910630232 A CN201910630232 A CN 201910630232A CN 110517662 A CN110517662 A CN 110517662A
Authority
CN
China
Prior art keywords
voice
recording
human (real person)
text
synthesis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910630232.8A
Other languages
Chinese (zh)
Inventor
贺来朋
刘露婕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd
Priority to CN201910630232.8A
Publication of CN110517662A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L2021/02082: Noise filtering the noise being echo, reverberation of the speech

Abstract

The present invention provides a method and system for intelligent voice broadcasting. The method performs the following steps. Step 1: obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part. Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording. Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point. Step 4: apply audio effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy of the two converges to a consistent level. The method processes the synthesized speech and the human recording separately, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

Description

A kind of method and system of Intelligent voice broadcasting
Technical field
The present invention relates to the field of intelligent speech technology, and in particular to a method and system for intelligent voice broadcasting.
Background
In application scenarios such as intelligent outbound calling, high-quality broadcast audio that comes close to human pronunciation is required. The current common practice is to use a human recording for the fixed sentence part of the broadcast text, to use synthesized speech for the parts of the text that change frequently (usually called slots, such as names and personal information), and then to splice the human recording and the synthesized speech in real time.
In the prior art, because of the limitations of the synthesis system, the synthesized speech differs considerably from the human recording in sound quality and other respects. As a result, the spliced speech sounds very unnatural, with an obvious jump at the splice point, which degrades the product experience.
Summary of the invention
The present invention provides a method and system for intelligent voice broadcasting, in order to improve the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.
The present invention provides a method for intelligent voice broadcasting, which performs the following steps:
Step 1: obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording;
Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
Step 4: apply audio effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy of the two converges to a consistent level.
Further, before step 1, the method also includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.
Further, between step 2 and step 3, the method also includes step 5: optimizing the text of the slot part so that the optimized slot text contains complete prosodic phrase information.
Further, between step 3 and step 4, the method also includes step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
Further, after step 4, the method also includes step 7: dynamically adjusting the synthesis parameters of the slot part according to the type of slot text.
Further, after step 4, the method also includes step 8: adding to the broadcast speech the background sound that corresponds to the application scenario.
The method for intelligent voice broadcasting provided by the embodiments of the present invention has the following beneficial effect: the synthesized speech and the human recording are processed separately, which improves the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.
The present invention also provides a system for intelligent voice broadcasting, comprising:
an acquisition module, configured to obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
an adjustment module, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;
a text analysis module, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
an audio effect processing module, configured to apply audio effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.
Preferably, the system also includes a silence removal module, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
Preferably, the system also includes a parameter adjustment module, configured to dynamically adjust the synthesis parameters of the slot part according to the type of slot text.
Preferably, the system also includes a background sound adding module, configured to add to the broadcast speech the background sound that corresponds to the application scenario.
The system for intelligent voice broadcasting provided by the embodiments of the present invention has the following beneficial effect: the adjustment module and the audio effect processing module process the generated speech and the human recording respectively, and the text analysis module performs text analysis on the whole-sentence synthesis text of the generated speech, which improves the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description, or may be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained through the structure particularly pointed out in the written description, the claims, and the drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The drawings are provided for further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a schematic flowchart of a method for intelligent voice broadcasting in an embodiment of the present invention;
Fig. 2 is a block diagram of a system for intelligent voice broadcasting in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
An embodiment of the present invention provides a method for intelligent voice broadcasting. As shown in Fig. 1, the method performs the following steps:
Step 1: obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording;
Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
Step 4: apply audio effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy of the two converges to a consistent level.
Specifically, in step 2, the acoustic feature parameters of the existing human recording and of the synthesized speech are used to adaptively adjust the model of the TTS (Text To Speech) system, so that the feature parameters of the generated speech are adjusted accordingly. Text-to-speech technology can convert arbitrary text into standard, fluent speech in real time, which amounts to giving the machine an artificial mouth. The acoustic features include acoustic parameters such as intonation, speaking rate, sound quality, fundamental frequency, and spectrum.
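The idea in step 2 of pulling the synthesized speech toward the human (real-person) recording can be illustrated on a single acoustic feature. The following is a minimal sketch, not the model adaptation described in the patent: it assumes the fundamental frequency (F0) contours have already been extracted as per-frame values in Hz (0 for unvoiced frames), and it substitutes simple mean and variance matching in the log-F0 domain for TTS model adaptation. The function names `log_f0_stats` and `match_f0` are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_hz):
    """Mean and standard deviation of log-F0 over voiced frames (f0 > 0).

    Assumes at least one voiced frame is present.
    """
    logs = [math.log(f) for f in f0_hz if f > 0]
    return mean(logs), pstdev(logs)


def match_f0(synth_f0_hz, human_f0_hz):
    """Shift and scale the synthetic log-F0 contour toward the human statistics.

    This is plain mean/variance matching, a stand-in for the adaptive
    adjustment of the TTS model mentioned in the text.
    """
    s_mean, s_std = log_f0_stats(synth_f0_hz)
    h_mean, h_std = log_f0_stats(human_f0_hz)
    scale = h_std / s_std if s_std > 0 else 1.0
    adjusted = []
    for f in synth_f0_hz:
        if f <= 0:
            adjusted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            adjusted.append(math.exp((math.log(f) - s_mean) * scale + h_mean))
    return adjusted
```

After the call, the voiced frames of the adjusted contour have the same log-F0 mean and spread as the human recording, while the shape of the synthetic contour is preserved. The same pattern could in principle be applied to other features listed above, such as speaking rate.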
Step 3 is carried out by the front-end module of the TTS system. Specifically, the whole-sentence synthesis text is input to the front-end module of the TTS system; the front-end module analyzes and processes the text to be converted, transforming the originally input text into various intermediate-state information that guides how the text is voiced.
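The benefit of analyzing the whole sentence rather than the slot text alone can be shown with a toy front end. The sketch below is an illustration under stated assumptions, not the patent's front-end module: tokens are split on whitespace, the "analysis" only records each token's neighbours and whether it is sentence-final, and the template syntax `{name}` and the functions `analyze` and `slot_labels` are hypothetical.

```python
def analyze(tokens):
    """Toy front end: label each token with its neighbours and sentence position."""
    labels = []
    for i, tok in enumerate(tokens):
        labels.append({
            "token": tok,
            "prev": tokens[i - 1] if i > 0 else None,
            "next": tokens[i + 1] if i + 1 < len(tokens) else None,
            "sentence_final": i == len(tokens) - 1,
        })
    return labels


def slot_labels(template, slot_name, slot_value):
    """Analyze the whole sentence, then return the labels of the slot tokens only."""
    tokens, slot_range = [], None
    for tok in template.split():
        if tok == "{" + slot_name + "}":
            start = len(tokens)
            tokens.extend(slot_value.split())
            slot_range = (start, len(tokens))
        else:
            tokens.append(tok)
    if slot_range is None:
        raise ValueError("slot not found in template")
    return analyze(tokens)[slot_range[0]:slot_range[1]]
```

For the template `"Hello {name} , your order has shipped"`, the slot tokens keep their left and right context and are not marked sentence-final, whereas analyzing the slot value in isolation would mark its last token as the end of a sentence and would likely give it a falling, final intonation.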
In step 4, the audio effect processing includes noise reduction and/or reverberation processing.
The working principle of the above technical solution is as follows: the feature parameters of the generated speech are adjusted using the feature parameters of the human recording and the synthesized speech; text analysis is performed on the whole-sentence synthesis text of the generated speech; audio effect processing is applied to the human recording, and the energy of the human recording and the synthesized speech is normalized to a consistent level.
The beneficial effect of the above technical solution is as follows: the synthesized speech and the human recording are processed separately, which improves the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.
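The energy normalization in step 4 can be sketched as scaling both signals to a common root-mean-square (RMS) level. This is a minimal illustration that assumes audio is held as a list of floating-point samples in the range [-1, 1]; the patent does not state which energy measure or target level is used, so RMS and the `target_rms` parameter are assumptions, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sample sequence (0.0 for empty input)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale samples so that their RMS equals target_rms, clipping to [-1, 1].

    Silent input is returned unchanged, since no gain can raise its level.
    """
    current = rms(samples)
    if current == 0.0:
        return list(samples)
    gain = target_rms / current
    return [max(-1.0, min(1.0, x * gain)) for x in samples]
```

Applying `normalize_rms` with the same `target_rms` to the human recording and to the synthesized slot audio brings both to one level before splicing. Clipping is a crude safeguard; if it triggers, the resulting RMS falls below the target.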
In one embodiment, before step 1, the method also includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.
The working principle of the above technical solution is as follows: the duration model may be a convolutional neural network, or another model with machine learning capability. The acoustic model may be a hidden Markov model, a convolutional neural network model, or another model with machine learning capability.
The beneficial effect of the above technical solution is as follows: after the duration model and the acoustic model of the generated speech are adjusted using the human recording, the generated speech matches the human recording more closely.
In one embodiment, between step 2 and step 3, the method also includes step 5: optimizing the text of the slot part so that the optimized slot text contains complete prosodic phrase information.
The working principle of the above technical solution is as follows: in step 5, the complete prosodic phrase information contained in the optimized slot text refers to the position information of prosodic words and prosodic phrases, and the like.
The beneficial effect of the above technical solution is as follows: the similarity between the synthesized speech and the human recording, and the overall consistency of the spliced speech, are further improved.
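One possible reading of step 5 is that the text handed to the synthesizer for the slot is widened until it covers a whole prosodic phrase. The sketch below follows that reading only as an assumption; the patent does not spell out the optimization. It assumes the sentence is already tokenized with an explicit prosodic phrase boundary marker (`"|"`, a hypothetical convention), and the function `expand_to_phrase` is likewise hypothetical.

```python
def expand_to_phrase(tokens, slot_start, slot_end, boundary="|"):
    """Widen the half-open range [slot_start, slot_end) to its enclosing prosodic phrase.

    The returned range stops just inside the nearest boundary markers on
    each side, or at the sentence edges when no marker is found.
    """
    if not (0 <= slot_start < slot_end <= len(tokens)):
        raise ValueError("invalid slot range")
    start = slot_start
    while start > 0 and tokens[start - 1] != boundary:
        start -= 1
    end = slot_end
    while end < len(tokens) and tokens[end] != boundary:
        end += 1
    return start, end
```

For the tokens `["Hello", "|", "dear", "Ms.", "Wang", "|", "your", "order", "shipped"]` with the slot at positions 3 to 5, the range grows to cover `"dear Ms. Wang"`, so the synthesized segment starts and ends at prosodic phrase boundaries.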
Further, between step 3 and step 4, the method also includes step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
The working principle of the above technical solution is as follows: the human recording and the synthesized speech are spliced by a large-corpus concatenation algorithm. In addition, step 6 also includes a step of smoothing the splice point between the human recording and the synthesized speech after the silent segments are removed.
The beneficial effect of the above technical solution is as follows: continuity at the splice point between the human recording and the synthesized speech is improved.
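Silence removal and smoothing at the splice point can be sketched as trimming low-amplitude samples on both sides of the joint and then cross-fading a short overlap. This is a simplified illustration, not the large-corpus concatenation algorithm named above: it assumes audio as lists of floats in [-1, 1], detects silence per sample with a fixed amplitude threshold (practical systems usually work on frame energy), and uses a linear cross-fade. All names and default values are hypothetical.

```python
def trim_trailing_silence(samples, threshold=0.01):
    """Drop samples at the end whose magnitude is below the threshold."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[:end])


def trim_leading_silence(samples, threshold=0.01):
    """Drop samples at the start whose magnitude is below the threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    return list(samples[start:])


def splice(left, right, fade_len=4, threshold=0.01):
    """Join two segments: trim silence at the joint, then cross-fade the overlap."""
    left = trim_trailing_silence(left, threshold)
    right = trim_leading_silence(right, threshold)
    n = min(fade_len, len(left), len(right))
    if n == 0:
        return left + right
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the right-hand segment rises linearly
        mixed.append(left[len(left) - n + i] * (1 - w) + right[i] * w)
    return left[:len(left) - n] + mixed + right[n:]
```

The cross-fade overlaps the last `n` samples of the left segment with the first `n` of the right one, so the output is `n` samples shorter than a plain concatenation of the trimmed segments.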
In one embodiment, after step 4, the method also includes step 7: dynamically adjusting the synthesis parameters of the slot part according to the type of slot text.
The synthesis parameters include speaking rate, rhythm, and the like.
The working principle of the above technical solution is as follows: the synthesis parameters of the slot part are adjusted dynamically in order to emphasize the important information in the text.
The beneficial effect of the above technical solution is as follows: the intelligent voice broadcast conforms more closely to the way a real person speaks.
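Choosing synthesis parameters by slot type can be sketched as a lookup with a default. The slot types, the parameter names `speed` and `pause_ms`, and all numeric values below are hypothetical and chosen only for illustration; the text states only that the parameters include speaking rate and rhythm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthParams:
    speed: float = 1.0   # relative speaking rate, 1.0 = unchanged
    pause_ms: int = 0    # extra pause around the slot, a stand-in for rhythm


DEFAULT_PARAMS = SynthParams()

# Hypothetical table: slower, more separated delivery for information
# the listener must not miss.
PARAMS_BY_SLOT_TYPE = {
    "name": SynthParams(speed=0.95, pause_ms=50),
    "amount": SynthParams(speed=0.85, pause_ms=100),
    "date": SynthParams(speed=0.9, pause_ms=80),
}


def params_for(slot_type):
    """Return the synthesis parameters for a slot type, or the default."""
    return PARAMS_BY_SLOT_TYPE.get(slot_type, DEFAULT_PARAMS)
```

A caller would look up `params_for("amount")` before synthesizing an amount slot and pass the result to the synthesizer, so that important content is spoken more slowly than surrounding material.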
In one embodiment, after step 4, and specifically after step 6, the method also includes step 8: adding to the broadcast speech the background sound that corresponds to the application scenario.
The working principle of the above technical solution is as follows: adding a background sound to the broadcast speech brings it closer to a real outbound-call scenario.
The beneficial effect of the above technical solution is as follows: the communication process of the intelligent voice broadcast feels more realistic.
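Adding a scene-specific background sound amounts to mixing a second signal into the broadcast speech at a reduced level. The following is a minimal sketch that assumes both signals are lists of floats in [-1, 1] at the same sample rate; the looping behaviour, the `gain` value, and the function name are assumptions made for illustration.

```python
def add_background(speech, background, gain=0.2):
    """Mix a looped, attenuated background signal into the speech.

    The background is repeated as needed to cover the speech, scaled by
    `gain`, and the sum is clipped to [-1, 1].
    """
    if not background:
        return list(speech)
    mixed = []
    for i, s in enumerate(speech):
        b = background[i % len(background)] * gain
        mixed.append(max(-1.0, min(1.0, s + b)))
    return mixed
```

The output has the same length as the speech, so the background never extends the broadcast. A low `gain` keeps the speech intelligible while still suggesting the environment of a real call.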
An embodiment of the present invention also provides a system for intelligent voice broadcasting. As shown in Fig. 2, the system comprises:
an acquisition module 201, configured to obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
an adjustment module 202, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;
a text analysis module 203, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
an audio effect processing module 204, configured to apply audio effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.
The audio effect processing module 204 includes a noise reduction module and a reverberation processing module. The noise reduction module applies noise reduction to the human recording, and the reverberation processing module applies reverberation processing to the human recording.
The working principle of the above technical solution is as follows: the adjustment module 202 adjusts the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech; the text analysis module 203 performs text analysis on the whole-sentence synthesis text of the generated speech; the audio effect processing module 204 applies audio effect processing to the human recording and normalizes the energy of the human recording and the synthesized speech to a consistent level.
The beneficial effect of the above technical solution is as follows: the adjustment module and the audio effect processing module process the generated speech and the human recording respectively, and the text analysis module performs text analysis on the whole-sentence synthesis text of the generated speech, which improves the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.
In one embodiment, the system also includes a slot text optimization module 205, configured to optimize the text of the slot part so that the optimized slot text contains complete prosodic phrase information.
The working principle of the above technical solution is as follows: the complete prosodic phrase information contained in the optimized slot text refers to the position information of prosodic words and prosodic phrases, and the like.
The beneficial effect of the above technical solution is as follows: the similarity between the synthesized speech and the human recording, and the overall consistency of the spliced speech, are further improved.
In one embodiment, the system also includes a silence removal module 206, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
The working principle of the above technical solution is as follows: the silence removal module 206 includes a smoothing module, configured to smooth the splice point between the human recording and the synthesized speech.
The beneficial effect of the above technical solution is as follows: continuity at the splice point between the human recording and the synthesized speech is improved.
In one embodiment, the system also includes a parameter adjustment module 207, configured to dynamically adjust the synthesis parameters of the slot part according to the type of slot text.
The synthesis parameters include speaking rate, rhythm, and the like.
The working principle of the above technical solution is as follows: the synthesis parameters of the slot part are adjusted dynamically in order to emphasize the important information in the text.
The beneficial effect of the above technical solution is as follows: the intelligent voice broadcast conforms more closely to the way a real person speaks.
In one embodiment, the system also includes a background sound adding module 208, configured to add to the broadcast speech the background sound that corresponds to the application scenario.
The working principle of the above technical solution is as follows: adding a background sound to the broadcast speech brings it closer to a real outbound-call scenario.
The beneficial effect of the above technical solution is as follows: the communication process of the intelligent voice broadcast feels more realistic.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

1. A method for intelligent voice broadcasting, characterized in that the method performs the following steps:
Step 1: obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording;
Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
Step 4: apply audio effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy of the two converges to a consistent level.
2. The method for intelligent voice broadcasting according to claim 1, characterized in that, before step 1, the method also includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.
3. The method for intelligent voice broadcasting according to claim 1, characterized in that, between step 2 and step 3, the method also includes step 5: optimizing the text of the slot part so that the optimized slot text contains complete prosodic phrase information.
4. The method for intelligent voice broadcasting according to claim 2, characterized in that, between step 3 and step 4, the method also includes step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
5. The method for intelligent voice broadcasting according to claim 1, characterized in that, after step 4, the method also includes step 7: dynamically adjusting the synthesis parameters of the slot part according to the type of slot text.
6. The method for intelligent voice broadcasting according to claim 1, characterized in that, after step 4, the method also includes step 8: adding to the broadcast speech the background sound that corresponds to the application scenario.
7. A system for intelligent voice broadcasting, characterized by comprising:
an acquisition module, configured to obtain the generated speech, which comprises the human recording used for the fixed sentence part of the text to be processed and the synthesized speech used for the slot part;
an adjustment module, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;
a text analysis module, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;
an audio effect processing module, configured to apply audio effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.
8. The system for intelligent voice broadcasting according to claim 7, characterized in that the system also includes a silence removal module, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.
9. The system for intelligent voice broadcasting according to claim 7, characterized in that the system also includes a parameter adjustment module, configured to dynamically adjust the synthesis parameters of the slot part according to the type of slot text.
10. The system for intelligent voice broadcasting according to claim 7, characterized in that the system also includes a background sound adding module, configured to add to the broadcast speech the background sound that corresponds to the application scenario.
CN201910630232.8A 2019-07-12 2019-07-12 A kind of method and system of Intelligent voice broadcasting Pending CN110517662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910630232.8A CN110517662A (en) 2019-07-12 2019-07-12 A kind of method and system of Intelligent voice broadcasting

Publications (1)

Publication Number Publication Date
CN110517662A true CN110517662A (en) 2019-11-29

Family

ID=68623049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910630232.8A Pending CN110517662A (en) 2019-07-12 2019-07-12 A kind of method and system of Intelligent voice broadcasting

Country Status (1)

Country Link
CN (1) CN110517662A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1584979A (en) * 2004-06-01 2005-02-23 安徽中科大讯飞信息科技有限公司 Method for outputting mixed with background sound and text sound in speech synthetic system
CN1584980A (en) * 2004-06-01 2005-02-23 安徽中科大讯飞信息科技有限公司 Method for synthetic output with prompting sound and text sound in speech synthetic system
CN1811913A (en) * 2005-01-28 2006-08-02 凌阳科技股份有限公司 Mixed parameter mode type speech sounds synthetizing system and method
CN1945691A (en) * 2006-10-16 2007-04-11 安徽中科大讯飞信息科技有限公司 Method for improving template sentence synthetic effect in voice synthetic system
CN101000765A (en) * 2007-01-09 2007-07-18 黑龙江大学 Speech synthetic method based on rhythm character
CN101685633A (en) * 2008-09-28 2010-03-31 富士通株式会社 Voice synthesizing apparatus and method based on rhythm reference
CN108182936A (en) * 2018-03-14 2018-06-19 百度在线网络技术(北京)有限公司 Voice signal generation method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111564153A (en) * 2020-04-02 2020-08-21 湖南声广信息科技有限公司 Intelligent broadcasting music program system of broadcasting station
CN111564153B (en) * 2020-04-02 2021-10-01 湖南声广科技有限公司 Intelligent broadcasting music program system of broadcasting station
CN112289298A (en) * 2020-09-30 2021-01-29 北京大米科技有限公司 Processing method and device for synthesized voice, storage medium and electronic equipment
CN113744716A (en) * 2021-10-19 2021-12-03 北京房江湖科技有限公司 Method and apparatus for synthesizing speech
CN113744716B (en) * 2021-10-19 2023-08-29 北京房江湖科技有限公司 Method and apparatus for synthesizing speech

Similar Documents

Publication Publication Date Title
US11295721B2 (en) Generating expressive speech audio from text data
Takamichi et al. Postfilters to modify the modulation spectrum for statistical parametric speech synthesis
JP2885372B2 (en) Audio coding method
US7739113B2 (en) Voice synthesizer, voice synthesizing method, and computer program
JP2021110943A (en) Cross-lingual voice conversion system and method
CN110517662A (en) A kind of method and system of Intelligent voice broadcasting
CN108766413A (en) Phoneme synthesizing method and system
JP2020507819A (en) Method and apparatus for dynamically modifying voice sound quality by frequency shift of spectral envelope formants
CN112735454A (en) Audio processing method and device, electronic equipment and readable storage medium
KR20230056741A (en) Synthetic Data Augmentation Using Voice Transformation and Speech Recognition Models
CN106548785A (en) A kind of method of speech processing and device, terminal unit
Hu et al. Whispered and Lombard neural speech synthesis
CN112530400A (en) Method, system, device and medium for generating voice based on text of deep learning
US20090177473A1 (en) Applying vocal characteristics from a target speaker to a source speaker for synthetic speech
WO2015025788A1 (en) Quantitative f0 pattern generation device and method, and model learning device and method for generating f0 pattern
KR102072627B1 (en) Speech synthesis apparatus and method thereof
JP2005070430A (en) Speech output device and method
CN116798405B (en) Speech synthesis method, device, storage medium and electronic equipment
JP6330069B2 (en) Multi-stream spectral representation for statistical parametric speech synthesis
WO2023116243A1 (en) Data conversion method and computer storage medium
CN114005428A (en) Speech synthesis method, apparatus, electronic device, storage medium, and program product
Tanaka et al. A vibration control method of an electrolarynx based on statistical f 0 pattern prediction
JP7179216B1 (en) VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, VOICE CONVERSION NEURAL NETWORK, PROGRAM, AND RECORDING MEDIUM
US11908447B2 (en) Method and apparatus for synthesizing multi-speaker speech using artificial neural network
Chandra et al. Towards The Development Of Accent Conversion Model For (L1) Bengali Speaker Using Cycle Consistent Adversarial Network (Cyclegan)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191129