CN110517662A

CN110517662A - Method and system for intelligent voice broadcasting

Info

Publication number: CN110517662A
Application number: CN201910630232.8A
Authority: CN
Inventors: 贺来朋; 刘露婕
Original assignee: Unisound Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2019-11-29

Abstract

The present invention provides a method and system for intelligent voice broadcasting. The method performs the following steps. Step 1: obtain the generated speech, which comprises the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part. Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording. Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech so as to retain contextual prosodic information and improve prosodic continuity at the splice point. Step 4: apply audio-effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy variation between them converges to a consistent level. The method processes the synthesized speech and the human recording separately, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

Description

Method and system for intelligent voice broadcasting

Technical field

The present invention relates to the technical field of intelligent speech, and in particular to a method and system for intelligent voice broadcasting.

Background art

In application scenarios such as intelligent outbound calling, high-quality broadcast audio that closely approaches human pronunciation is required. The current common practice is to use a human recording for the fixed-sentence part of the broadcast text and synthesized speech for the parts of the text that change frequently (usually called slots, such as names and personal information), and then to splice the human recording and the synthesized speech together in real time.
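The split between fixed-sentence parts and slots can be illustrated with a minimal sketch. It assumes the broadcast text is written as a Python format string with named fields; the template syntax, the `split_template` helper, and the segment labels are illustrative assumptions and are not taken from the patent.

```python
from string import Formatter


def split_template(template, slots):
    """Split a broadcast template into ("fixed", text) and ("slot", text) segments.

    Fixed segments would be served from human recordings, while slot
    segments would be sent to the speech synthesizer.
    """
    segments = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        if literal:
            segments.append(("fixed", literal))
        if field is not None:
            segments.append(("slot", str(slots[field])))
    return segments


segments = split_template(
    "Hello {name}, your balance is {amount} yuan.",
    {"name": "Ms. Li", "amount": 120},
)
for kind, text in segments:
    print(kind, repr(text))
```

Each "fixed" segment maps to a pre-recorded clip and each "slot" segment to a synthesis request, which is where the mismatch described in the next paragraph arises.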

In the prior art, owing to the limited performance of synthesis systems, the synthesized speech differs considerably from the human recording in aspects such as sound quality. As a result, the spliced speech sounds very unnatural, and there is an obvious jump at the splice point, which degrades the product experience.

Summary of the invention

The present invention provides a method and system for intelligent voice broadcasting, so as to improve the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

The present invention provides a method for intelligent voice broadcasting, the method performing the following steps:

Step 1: obtain the generated speech, the generated speech comprising the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part;

Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording;

Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech so as to retain contextual prosodic information and improve prosodic continuity at the splice point;

Step 4: apply audio-effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy variation between the human recording and the synthesized speech converges to a consistent level.

Further, before Step 1, the method also includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.

Further, between Step 2 and Step 3, the method also includes Step 5: optimizing the text of the slot part so that the optimized slot text contains the information of an entire prosodic phrase.

Further, between Step 3 and Step 4, the method also includes Step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

Further, after Step 4, the method also includes Step 7: dynamically adjusting the synthesis parameters of the slot part for different types of slot text.

Further, after Step 4, the method also includes Step 8: adding, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.

The method for intelligent voice broadcasting provided by the embodiments of the present invention has the following beneficial effect: the synthesized speech and the human recording are processed separately, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

The present invention also provides a system for intelligent voice broadcasting, comprising:

an acquisition module, configured to obtain the generated speech, the generated speech comprising the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part;

an adjustment module, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;

a text analysis module, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;

an audio-effect processing module, configured to apply audio-effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.

Preferably, the system for intelligent voice broadcasting further includes a silence removal module, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

Preferably, the system for intelligent voice broadcasting further includes a parameter adjustment module, configured to dynamically adjust the synthesis parameters of the slot part for different types of slot text.

Preferably, the system for intelligent voice broadcasting further includes a background sound adding module, configured to add, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.

The system for intelligent voice broadcasting provided by the embodiments of the present invention has the following beneficial effect: the adjustment module and the audio-effect processing module process the generated speech and the human recording respectively, and the text analysis module performs text analysis on the whole-sentence synthesis text of the generated speech, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description, or may be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.

Brief description of the drawings

The drawings are provided for a further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is a schematic flowchart of a method for intelligent voice broadcasting in an embodiment of the present invention;

Fig. 2 is a block diagram of a system for intelligent voice broadcasting in an embodiment of the present invention.

Detailed description of the embodiments

Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

An embodiment of the present invention provides a method for intelligent voice broadcasting. As shown in Fig. 1, the method performs Steps 1 to 4 as set out in the Summary of the invention above.

Specifically, in Step 2, the acoustic feature parameters of existing human recordings and synthesized speech are used to adaptively adjust the model of the TTS (Text To Speech) system, so that the feature parameters of the generated speech are adjusted accordingly. Text-to-speech technology can convert arbitrary text into standard, fluent speech in real time, which is equivalent to fitting the machine with an artificial mouth. The acoustic features include acoustic parameters such as intonation, speech rate, sound quality, fundamental frequency, and spectrum.
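As a rough illustration of bringing one acoustic feature of the synthesized speech closer to the human recording, the sketch below matches the mean and spread of the log fundamental frequency. This is a simplified post-hoc statistic matching and is not the adaptive TTS model adjustment described above; the `match_log_f0` helper, the frame-level F0 lists, and the convention that 0 marks an unvoiced frame are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def match_log_f0(synth_f0, target_f0):
    """Shift and scale the voiced log-F0 of synth_f0 to match target_f0.

    Both inputs are per-frame F0 values in Hz; 0 marks an unvoiced frame
    and is passed through unchanged.
    """
    synth_log = [math.log(f) for f in synth_f0 if f > 0]
    target_log = [math.log(f) for f in target_f0 if f > 0]
    if not synth_log or not target_log:
        return list(synth_f0)  # nothing to match against

    mu_s, mu_t = mean(synth_log), mean(target_log)
    sd_s, sd_t = pstdev(synth_log), pstdev(target_log)
    ratio = sd_t / sd_s if sd_s > 0 else 1.0

    return [
        math.exp((math.log(f) - mu_s) * ratio + mu_t) if f > 0 else 0.0
        for f in synth_f0
    ]


adjusted = match_log_f0([100.0, 0.0, 120.0, 140.0], [200.0, 220.0, 0.0, 260.0])
print([round(f, 1) for f in adjusted])
```

After the transform, the voiced frames of the synthesized contour have the same log-F0 mean and standard deviation as the recording, while its overall shape is preserved.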

Step 3 is performed by the front-end module of the TTS system. Specifically, the whole-sentence synthesis text is input to the front-end module of the TTS system, which analyzes and processes the text to be converted and transforms the originally input text into various kinds of intermediate-state information used to guide the vocalization of the text.
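The benefit of analyzing the whole sentence rather than the slot text alone can be shown with a toy front end. The sketch below is illustrative only: the feature names, the `front_end` function, and the externally supplied `phrase_breaks` set are assumptions, and a real TTS front end produces far richer intermediate information.

```python
def front_end(words, phrase_breaks):
    """Toy front end: attach simple prosodic position features to each word.

    words: tokens of the text being analyzed.
    phrase_breaks: set of word indices after which a prosodic phrase ends
    (assumed to come from an upstream prosody predictor).
    """
    features = []
    position = 0
    last = len(words) - 1
    for i, word in enumerate(words):
        features.append({
            "word": word,
            "position_in_phrase": position,
            "phrase_final": i in phrase_breaks or i == last,
            "sentence_final": i == last,
        })
        position = 0 if i in phrase_breaks else position + 1
    return features


def slot_context_features(words, phrase_breaks, slot_start, slot_end):
    """Analyze the whole sentence, then keep only the slot's features."""
    return front_end(words, phrase_breaks)[slot_start:slot_end]


sentence = ["Hello", "Ms", "Li", "your", "bill", "is", "due"]
in_context = slot_context_features(sentence, {2}, 1, 3)
in_isolation = front_end(["Ms", "Li"], set())
print(in_context[1]["sentence_final"], in_isolation[1]["sentence_final"])
```

Analyzed in isolation, the last slot word looks sentence-final and would typically receive a falling, closing intonation; analyzed in context, it is only phrase-final, which is the information needed for a smooth continuation into the following recording.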

In Step 4, the audio-effect processing includes noise reduction and/or reverberation processing.
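The energy normalization in Step 4 could, for instance, be approximated by scaling both signals to a common root-mean-square level. The sketch below is a minimal illustration under that assumption; the patent does not specify the normalization method, and the `target_rms` value and function names are hypothetical. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_energy(recording, synthesized, target_rms=0.1, peak=1.0):
    """Scale both signals to the same RMS level, clipping to [-peak, peak]."""

    def scale(samples):
        level = rms(samples)
        if level == 0.0:
            return list(samples)  # pure silence is left untouched
        gain = target_rms / level
        return [max(-peak, min(peak, s * gain)) for s in samples]

    return scale(recording), scale(synthesized)


recording = [0.5 * math.sin(2 * math.pi * i / 20) for i in range(200)]
synthesized = [0.05 * math.sin(2 * math.pi * i / 25) for i in range(200)]
rec_norm, syn_norm = normalize_energy(recording, synthesized)
print(round(rms(rec_norm), 3), round(rms(syn_norm), 3))
```

A single global gain per clip is the simplest choice; a production system would more likely normalize over short windows or use a perceptual loudness measure so that energy variation within the utterance is also matched.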

The working principle of the above technical solution is as follows: the feature parameters of the generated speech are adjusted using the feature parameters of the human recording and the synthesized speech; text analysis is performed on the whole-sentence synthesis text of the generated speech; audio-effect processing is applied to the human recording, and the energy of the human recording and the synthesized speech is normalized to a consistent level.

The beneficial effect of the above technical solution is as follows: the synthesized speech and the human recording are processed separately, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

In one embodiment, before Step 1, the method also includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.

The working principle of the above technical solution is as follows: the duration model may be a convolutional neural network, or may be another model with machine learning capability. The acoustic model may be a hidden Markov model, a convolutional neural network model, or another model with machine learning capability.

The beneficial effect of the above technical solution is as follows: after the duration model and the acoustic model of the generated speech have been adjusted using the human recording, the generated speech matches the human recording more closely.

In one embodiment, between Step 2 and Step 3, the method also includes Step 5: optimizing the text of the slot part so that the optimized slot text contains the information of an entire prosodic phrase.

The working principle of the above technical solution is as follows: in Step 5, the entire prosodic phrase information contained in the optimized slot text refers to information such as the positions of prosodic words and prosodic phrases.
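One possible reading of Step 5 is that the span handed to the synthesizer is widened until it covers a complete prosodic phrase, so that the splice falls on a phrase boundary. The sketch below illustrates that reading only; the `expand_slot_to_phrase` helper and the representation of phrases as word counts are assumptions, not details given in the patent.

```python
def expand_slot_to_phrase(phrase_lengths, slot_start, slot_end):
    """Widen a slot span so that it starts and ends on prosodic phrase boundaries.

    phrase_lengths: number of words in each prosodic phrase of the sentence.
    slot_start, slot_end: half-open word-index span [slot_start, slot_end).
    Returns the widened (start, end) span.
    """
    boundaries = [0]
    for length in phrase_lengths:
        boundaries.append(boundaries[-1] + length)

    if not 0 <= slot_start < slot_end <= boundaries[-1]:
        raise ValueError("slot span lies outside the sentence")

    new_start = max(b for b in boundaries if b <= slot_start)
    new_end = min(b for b in boundaries if b >= slot_end)
    return new_start, new_end


# Sentence with prosodic phrases of 3, 4 and 2 words; the slot is word 4 only.
print(expand_slot_to_phrase([3, 4, 2], 4, 5))
```

With the span widened in this way, the words surrounding the slot inside the same prosodic phrase are synthesized together with it rather than taken from the recording.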

The beneficial effect of the above technical solution is as follows: the similarity between the synthesized speech and the human recording, and the overall consistency of the spliced speech, are further improved.

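In one embodiment, between Step 3 and Step 4, the method also includes Step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

The working principle of the above technical solution is as follows: the human recording and the synthesized speech are spliced by a large-corpus concatenation algorithm. In addition, Step 6 further includes a step of smoothing the splice point between the human recording and the synthesized speech after the silent segments have been removed.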
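Silence removal followed by smoothing at the splice point can be sketched as trimming low-amplitude samples on both sides of the join and then cross-fading a short overlap. This is an illustrative stand-in: the patent names neither the silence detector nor the smoothing method, and the amplitude threshold, fade length, and function names below are assumptions. A practical system would detect silence on frame energy rather than on single samples.

```python
def trim_trailing_silence(samples, threshold=0.01):
    """Drop samples at the end whose magnitude is below the threshold."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[:end]


def trim_leading_silence(samples, threshold=0.01):
    """Drop samples at the start whose magnitude is below the threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    return samples[start:]


def splice(left, right, threshold=0.01, fade_len=4):
    """Join two clips: trim silence at the joint, then linearly cross-fade."""
    left = trim_trailing_silence(left, threshold)
    right = trim_leading_silence(right, threshold)
    n = min(fade_len, len(left), len(right))
    if n == 0:
        return left + right

    overlap = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the right-hand clip rises over the fade
        overlap.append(left[len(left) - n + i] * (1 - w) + right[i] * w)
    return left[:len(left) - n] + overlap + right[n:]


recording = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0]
synthesized = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
print(len(splice(recording, synthesized, fade_len=2)))
```

The cross-fade shortens the output by the overlap length, so any timing information tied to sample positions has to be adjusted accordingly.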

The beneficial effect of the above technical solution is as follows: continuity at the joint between the human recording and the synthesized speech is improved.

In one embodiment, after Step 4, the method also includes Step 7: dynamically adjusting the synthesis parameters of the slot part for different types of slot text.

The synthesis parameters include speech rate, rhythm, and the like.

The working principle of the above technical solution is as follows: the synthesis parameters of the slot part are adjusted dynamically so as to emphasize the important information in the text.
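Choosing synthesis parameters by slot type can be as simple as a lookup table with a default. The sketch below is hypothetical throughout: the slot type names, the `SynthesisParams` fields, and the numeric values are invented for illustration, with rhythm modeled crudely as a pause inserted around the slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisParams:
    speech_rate: float  # 1.0 is the synthesizer's default speed
    pause_ms: int       # pause inserted around the slot, in milliseconds


DEFAULT_PARAMS = SynthesisParams(speech_rate=1.0, pause_ms=0)

# Hypothetical settings: slow down and set off the information that matters most.
PARAMS_BY_SLOT_TYPE = {
    "name": SynthesisParams(speech_rate=0.95, pause_ms=50),
    "amount": SynthesisParams(speech_rate=0.85, pause_ms=100),
    "date": SynthesisParams(speech_rate=0.9, pause_ms=80),
}


def params_for_slot(slot_type):
    """Return the synthesis parameters for a slot type, or the default."""
    return PARAMS_BY_SLOT_TYPE.get(slot_type, DEFAULT_PARAMS)


print(params_for_slot("amount"))
print(params_for_slot("address"))
```

Keeping the table as data rather than code makes it straightforward to tune the values per deployment without touching the synthesis pipeline.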

The beneficial effect of the above technical solution is as follows: the intelligent voice broadcast conforms more closely to the way a human speaks.

In one embodiment, after Step 4 (specifically, after Step 6), the method also includes Step 8: adding, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.

The working principle of the above technical solution is as follows: adding background sound to the broadcast speech brings it closer to a real outbound-call scenario.
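Adding scenario background sound amounts to mixing an attenuated background track under the spliced speech. The following is a minimal sketch assuming float samples in the range -1.0 to 1.0 at a shared sample rate; the `mix_background` name, the looping behavior, and the default gain are assumptions made for the example.

```python
def mix_background(speech, background, gain=0.2):
    """Mix a looped, attenuated background track under the speech.

    The background is repeated as needed to cover the speech, scaled by
    `gain`, and the sum is clipped to [-1.0, 1.0].
    """
    if not background:
        return list(speech)

    mixed = []
    for i, sample in enumerate(speech):
        bed = background[i % len(background)] * gain
        mixed.append(max(-1.0, min(1.0, sample + bed)))
    return mixed


speech = [0.5, 0.5, 0.5]
office_noise = [0.1, -0.1]
print([round(s, 2) for s in mix_background(speech, office_noise, gain=0.5)])
```

The background track for each application scenario would be selected upstream, for example by a lookup keyed on the scenario name.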

The beneficial effect of the above technical solution is as follows: the communication process of the intelligent voice broadcast feels more realistic.

An embodiment of the present invention also provides a system for intelligent voice broadcasting which, as shown in Fig. 2, comprises:

an acquisition module 201, configured to obtain the generated speech, the generated speech comprising the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part;

an adjustment module 202, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;

a text analysis module 203, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;

an audio-effect processing module 204, configured to apply audio-effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.

The audio-effect processing module 204 includes a noise-reduction module and a reverberation module. The noise-reduction module is configured to apply noise reduction to the human recording, and the reverberation module is configured to apply reverberation processing to the human recording.

The working principle of the above technical solution is as follows: the adjustment module 202 adjusts the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech; the text analysis module 203 performs text analysis on the whole-sentence synthesis text of the generated speech; the audio-effect processing module 204 applies audio-effect processing to the human recording and normalizes the energy of the human recording and the synthesized speech to a consistent level.

The beneficial effect of the above technical solution is as follows: the adjustment module and the audio-effect processing module process the generated speech and the human recording respectively, and the text analysis module performs text analysis on the whole-sentence synthesis text of the generated speech, improving the similarity between the synthesized speech and the human recording and the overall consistency of the spliced speech.

In one embodiment, the system for intelligent voice broadcasting further includes a slot text optimization module 205, configured to optimize the text of the slot part so that the optimized slot text contains the information of an entire prosodic phrase.

The working principle of the above technical solution is as follows: the entire prosodic phrase information contained in the optimized slot text refers to information such as the positions of prosodic words and prosodic phrases.

In one embodiment, the system for intelligent voice broadcasting further includes a silence removal module 206, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

The working principle of the above technical solution is as follows: the silence removal module 206 includes a smoothing module, configured to smooth the splice point between the human recording and the synthesized speech.

The beneficial effect of the above technical solution is as follows: continuity at the joint between the human recording and the synthesized speech can be improved.

In one embodiment, the system for intelligent voice broadcasting further includes a parameter adjustment module 207, configured to dynamically adjust the synthesis parameters of the slot part for different types of slot text.

The synthesis parameters include speech rate, rhythm, and the like.

In one embodiment, the system for intelligent voice broadcasting further includes a background sound adding module 208, configured to add, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A method for intelligent voice broadcasting, characterized in that the method performs the following steps:

Step 1: obtain the generated speech, the generated speech comprising the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part;

Step 2: using feature parameters extracted from the human recording and the synthesized speech, adjust the feature parameters of the generated speech so as to improve the consistency between the synthesized speech and the human recording;

Step 3: perform text analysis on the whole-sentence synthesis text of the generated speech so as to retain contextual prosodic information and improve prosodic continuity at the splice point;

Step 4: apply audio-effect processing to the human recording, and normalize the energy of the human recording and the synthesized speech so that the energy variation between the human recording and the synthesized speech converges to a consistent level.

2. The method for intelligent voice broadcasting according to claim 1, characterized in that, before Step 1, the method further includes a step of adjusting the duration model and the acoustic model of the generated speech using the human recording.

3. The method for intelligent voice broadcasting according to claim 1, characterized in that, between Step 2 and Step 3, the method further includes Step 5: optimizing the text of the slot part so that the optimized slot text contains the information of an entire prosodic phrase.

4. The method for intelligent voice broadcasting according to claim 2, characterized in that, between Step 3 and Step 4, the method further includes Step 6: removing the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

5. The method for intelligent voice broadcasting according to claim 1, characterized in that, after Step 4, the method further includes Step 7: dynamically adjusting the synthesis parameters of the slot part for different types of slot text.

6. The method for intelligent voice broadcasting according to claim 1, characterized in that, after Step 4, the method further includes Step 8: adding, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.

7. A system for intelligent voice broadcasting, characterized by comprising:

an acquisition module, configured to obtain the generated speech, the generated speech comprising the human recording used for the fixed-sentence part of the text to be processed and the synthesized speech used for the slot part;

an adjustment module, configured to adjust the feature parameters of the generated speech using the feature parameters of the human recording and the synthesized speech, so as to improve the consistency of the generated speech;

a text analysis module, configured to perform text analysis on the whole-sentence synthesis text of the generated speech, so as to retain contextual prosodic information and improve prosodic continuity at the splice point;

an audio-effect processing module, configured to apply audio-effect processing to the human recording and to normalize the energy of the human recording and the synthesized speech to a consistent level.

8. The system for intelligent voice broadcasting according to claim 7, characterized in that the system further includes a silence removal module, configured to remove the silent segments at the splice point between the human recording and the synthesized speech, so as to improve continuity at the splice point.

9. The system for intelligent voice broadcasting according to claim 7, characterized in that the system further includes a parameter adjustment module, configured to dynamically adjust the synthesis parameters of the slot part for different types of slot text.

10. The system for intelligent voice broadcasting according to claim 7, characterized in that the system further includes a background sound adding module, configured to add, for different application scenarios, the background sound of the corresponding scenario to the broadcast speech.