CN107437412A - Acoustic model processing method, speech synthesis method, apparatus and related device - Google Patents
- Publication number
- CN107437412A (application number CN201610353978.5A)
- Authority
- CN
- China
- Prior art keywords
- amplitude spectrum
- model
- spectral model
- processing
- spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of data processing and discloses an acoustic model processing method, a speech synthesis method, apparatuses and related devices, to solve the technical problem in the prior art that synthesized speech has poor sound quality. The method includes: obtaining the preset parameters of the spectral model in a speech model; converting the preset parameters of the spectral model into an amplitude spectrum; performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum; and converting the processed amplitude spectrum back into the preset parameters of the spectral model, thereby obtaining the processed spectral model. This achieves the technical effect of improving speech synthesis quality.
Description
Technical field
The present invention relates to the field of data processing, and in particular to an acoustic model processing method, a speech synthesis method, apparatuses and related devices.
Background
For offline speech synthesis systems, the current mainstream approach is parametric speech synthesis based on the HMM (Hidden Markov Model). This approach first requires training a speech model and then synthesizes speech through that model. Referring to Fig. 1, establishing the speech model comprises the following steps:
Step S101: Obtain a corpus;
Step S102: Extract acoustic parameters from the speech material in the corpus;
Step S103: Perform context-dependent HMM-GMM modeling on the acoustic parameters and the corresponding prosodic text in the corpus, thereby obtaining the speech model, where the modeled objects include spectrum, fundamental frequency and duration.
After the speech model is established, referring to Fig. 2, speech can be synthesized as follows:
Step S201: Obtain the text to be synthesized;
Step S202: Parse context information from the text to be synthesized;
Step S203: Use the speech model to perform model prediction on the context information and obtain the corresponding acoustic parameters, which include spectrum and fundamental frequency information;
Step S204: Synthesize speech from the acoustic parameters with a vocoder.
Speech synthesized by this scheme has poor sound quality, which degrades the user experience.
Summary of the invention
The present invention provides an acoustic model processing method, a speech synthesis method, apparatuses and related devices, to solve the technical problem in the prior art that synthesized speech has poor sound quality.
In a first aspect, an embodiment of the present invention provides an acoustic model processing method, including:
Obtaining the preset parameters of the spectral model in a speech model;
Converting the preset parameters of the spectral model into an amplitude spectrum;
Performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum;
Converting the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining the processed spectral model.
Optionally, converting the preset parameters of the spectral model into an amplitude spectrum includes:
Converting the static parameters of the mean part of the spectral model into the amplitude spectrum.
Optionally, performing adaptive post-processing on the amplitude spectrum to obtain the processed amplitude spectrum includes: performing adaptive post-processing on the amplitude spectrum by the following formula:
[formula not reproduced in the source text]
Where S_new(z) denotes the amplitude spectrum after processing;
S_ori(z) denotes the amplitude spectrum before processing;
S_ori(z/β) denotes the amplitude spectrum obtained by rescaling S_ori(z) on the z-plane to β times its previous scale;
S_ori(z/α) denotes the amplitude spectrum obtained by rescaling S_ori(z) on the z-plane to α times its previous scale.
Optionally, performing adaptive post-processing on the amplitude spectrum to obtain the processed amplitude spectrum further includes:
For each amplitude spectrum before processing, determining whether the processed amplitude spectrum obtained by calculation lies within the range between a preset maximum value and a preset minimum value;
When the processed amplitude spectrum is less than the preset minimum value, using the preset minimum value as the processed amplitude spectrum;
When the processed amplitude spectrum is greater than the preset maximum value, using the preset maximum value as the processed amplitude spectrum.
Optionally, after performing adaptive post-processing on the amplitude spectrum to obtain the processed amplitude spectrum, the method further includes:
Performing spectral energy normalization on the processed amplitude spectrum, so that the spectral energy stays consistent;
Converting the processed amplitude spectrum into the preset parameters of the spectral model then includes:
Converting the amplitude spectrum that has undergone spectral energy normalization into the preset parameters of the spectral model.
Optionally, the method further includes:
Obtaining the text to be synthesized for speech synthesis;
Determining the acoustic parameters of the text to be synthesized based on the speech model;
Synthesizing the speech data of the text to be synthesized from the acoustic parameters.
In a second aspect, an embodiment of the present invention provides a speech synthesis method, including:
Obtaining the text to be synthesized for speech synthesis;
Determining the spectral parameters of the text to be synthesized based on the spectral model in a speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing comprises the following steps: converting the preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model;
Synthesizing the speech data of the text to be synthesized from the spectral parameters.
Optionally, before determining the spectral parameters of the text to be synthesized based on the spectral model in the speech model, the method further includes:
Performing the adaptive post-processing of the spectral model locally on the client device; and/or
Receiving the adaptively post-processed spectral model from a server.
In a third aspect, an embodiment of the present invention provides an acoustic model processing apparatus, including:
An obtaining module, for obtaining the preset parameters of the spectral model in a speech model;
A first conversion module, for converting the preset parameters of the spectral model into an amplitude spectrum;
A first processing module, for performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum;
A second conversion module, for converting the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining the processed spectral model.
In a fourth aspect, an embodiment of the present invention provides a speech synthesis apparatus, including:
A third obtaining module, for obtaining the text to be synthesized for speech synthesis;
A second determining module, for determining the spectral parameters of the text to be synthesized based on the spectral model in a speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing comprises the following steps: converting the preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model;
A second synthesis module, for synthesizing the speech data of the text to be synthesized from the spectral parameters.
In a fifth aspect, an embodiment of the present invention provides a processing device that includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
Obtaining the preset parameters of the spectral model in a speech model;
Converting the preset parameters of the spectral model into an amplitude spectrum;
Performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum;
Converting the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining the processed spectral model.
In a sixth aspect, an embodiment of the present invention provides a processing device that includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
Obtaining the text to be synthesized for speech synthesis;
Determining the spectral parameters of the text to be synthesized based on the spectral model in a speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing comprises the following steps: converting the preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model;
Synthesizing the speech data of the text to be synthesized from the spectral parameters.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, the speech model is processed as follows: the preset parameters of the spectral model in the speech model are obtained; the preset parameters of the spectral model are converted into an amplitude spectrum; adaptive post-processing is performed on the amplitude spectrum to obtain a processed amplitude spectrum; and the processed amplitude spectrum is converted back into the preset parameters of the spectral model, thereby obtaining the processed spectral model. Because adaptive post-processing is applied to the preset parameters in the spectral model, that is, the useful signal in the spectral model is enhanced and the interfering signal is reduced, the quality of the synthesized speech can be improved when speech data is subsequently generated with the speech model;
In addition, the object of the adaptive post-processing in this scheme is the amplitude spectrum. The amplitude spectrum is a general-purpose spectral representation into which all kinds of spectral parameters can be converted, so the scheme applies to any spectral parameter and does not need a different adaptive post-processing method for each kind of spectral parameter (for example, line spectral pairs or Mel cepstrum). The scheme is therefore more broadly compatible in its adaptive post-processing of spectral parameters;
Furthermore, the scheme applies adaptive post-processing to the spectral model in the speech model in advance, so no adaptive post-processing is needed after the acoustic parameters are generated, which reduces the time needed to synthesize speech data with the speech model.
Brief description of the drawings
Fig. 1 is a flow chart of establishing a speech model in the prior art;
Fig. 2 is a flow chart of synthesizing speech data in the prior art;
Fig. 3 is a flow chart of the acoustic model processing method of the first aspect of the embodiments of the present invention;
Fig. 4 is a flow chart of synthesizing speech data in the acoustic model processing method of the first aspect of the embodiments of the present invention;
Fig. 5 is a flow chart of the speech synthesis method of the second aspect of the embodiments of the present invention;
Fig. 6 is a structural diagram of the acoustic model processing apparatus of the third aspect of the embodiments of the present invention;
Fig. 7 is a structural diagram of the speech synthesis apparatus of the fourth aspect of the embodiments of the present invention;
Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment;
Fig. 9 is a schematic structural diagram of a server in the embodiments of the present invention.
Detailed description of the embodiments
The present invention provides an acoustic model processing method, a speech synthesis method, apparatuses and related devices, to solve the technical problem in the prior art that synthesized speech has poor sound quality.
To solve the above technical problem, the general idea of the technical solutions in the embodiments of the present application is as follows:
The speech model is processed as follows: the preset parameters of the spectral model in the speech model are obtained; the preset parameters of the spectral model are converted into an amplitude spectrum; adaptive post-processing is performed on the amplitude spectrum to obtain a processed amplitude spectrum; and the processed amplitude spectrum is converted back into the preset parameters of the spectral model, thereby obtaining the processed spectral model. Because adaptive post-processing is applied to the preset parameters in the spectral model, that is, the useful signal in the spectral model is enhanced and the interfering signal is reduced, the quality of the synthesized speech can be improved when speech data is subsequently generated with the speech model;
In addition, the object of the adaptive post-processing in this scheme is the amplitude spectrum. The amplitude spectrum is a general-purpose spectral representation into which all kinds of spectral parameters can be converted, so the scheme applies to any spectral parameter and does not need a different adaptive post-processing method for each kind of spectral parameter (for example, line spectral pairs or Mel cepstrum). The scheme is therefore more broadly compatible in its adaptive post-processing of spectral parameters;
Furthermore, the scheme applies adaptive post-processing to the spectral model in the speech model in advance, so no adaptive post-processing is needed after the acoustic parameters are generated, which reduces the time needed to synthesize speech data with the speech model.
For a better understanding of the above technical solutions, the technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in them are detailed explanations of the technical solutions of the present invention rather than limitations on them, and that, where they do not conflict, the embodiments of the present invention and the technical features in them can be combined with one another.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
In a first aspect, an embodiment of the present invention provides an acoustic model processing method. Referring to Fig. 3, it includes:
Step S301: Obtain the preset parameters of the spectral model in a speech model;
Step S302: Convert the preset parameters of the spectral model into an amplitude spectrum;
Step S303: Perform adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum;
Step S304: Convert the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining the processed spectral model.
For example, the scheme can be applied to a server or to a client device; the embodiments of the present invention impose no limitation. The client device is, for example, a mobile phone, a notebook computer, a tablet computer or a PC; the embodiments of the present invention impose no limitation.
In step S301, the speech model includes, for example, a spectral model, a fundamental frequency model and a duration model. The spectral model usually includes a probability density part and a decision tree part. The probability density part includes a mean and a variance, and the mean and the variance each include static parameters and dynamic parameters. The preset parameters of the spectral model are, for example, the static parameters; they can of course also include the dynamic parameters, and the embodiments of the present invention impose no limitation.
In step S302, the preset parameters of the spectral model can be converted into an amplitude spectrum in the following ways:
When the preset parameters of the spectral model are line spectral pair parameters, suppose their form is K, l(1), l(2), ..., l(V). When V is even, the amplitude spectrum S(ω) is:
[formula not reproduced in the source text]
When V is odd, the amplitude spectrum S(ω) is:
[formula not reproduced in the source text]
When the preset parameters of the spectral model are Mel-cepstrum parameters, suppose their form is c_a(0), c_a(1), ..., c_a(V), where a is known; when the spectral model is derived from audio with a sampling rate of 16 kHz, a is usually set to 0.42. The cepstrum is first solved according to the following equation, where v denotes the dimension currently being processed:
[formula not reproduced in the source text]
A Fourier transform is then applied, followed by an exponential function with the natural constant e as its base, which yields the amplitude spectrum.
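The last two operations of the Mel-cepstrum case (a Fourier transform followed by exponentiation with base e) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: it assumes the frequency-warped coefficients have already been converted to an ordinary cepstrum c(0), ..., c(V), it assumes the one-sided convention log S(ω) = Σ c(v)·cos(ωv), and the function name and the uniform frequency grid on [0, π] are hypothetical.

```python
import math


def cepstrum_to_amplitude(cepstrum, num_bins):
    """Evaluate S(omega) = exp(sum_v c(v) * cos(omega * v)) on a uniform grid.

    Assumption: `cepstrum` is an ordinary (already unwarped) cepstrum, and the
    grid holds `num_bins` frequencies from 0 to pi inclusive.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    amplitudes = []
    for k in range(num_bins):
        omega = math.pi * k / (num_bins - 1)
        # Real part of the Fourier transform of the one-sided cepstrum.
        log_amplitude = sum(c * math.cos(omega * v) for v, c in enumerate(cepstrum))
        # Exponentiate with base e to leave the log domain.
        amplitudes.append(math.exp(log_amplitude))
    return amplitudes
```

With this convention, c(0) alone sets the overall level: a cepstrum that contains only c(0) = ln 2 gives a flat amplitude spectrum equal to 2 in every bin.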
Suppose the spectral model contains M mean sequences and the static parameters of each mean sequence are N-dimensional, so that the data to be processed for the preset parameters of the spectral model is an M×N matrix. After it is converted into Y-dimensional amplitude spectra by the scheme above, an M×Y matrix is obtained. In steps S302 to S304, only one Y-dimensional amplitude spectrum is processed at a time, so the operations are performed M times in total.
In step S303, adaptive post-processing can be performed on each Y-dimensional amplitude spectrum by the following formula:
[formula not reproduced in the source text]
Where S_new(z) denotes the amplitude spectrum after processing;
S_ori(z) denotes the amplitude spectrum before processing;
S_ori(z/β) denotes the amplitude spectrum obtained by rescaling S_ori(z) on the z-plane to β times its previous scale;
S_ori(z/α) denotes the amplitude spectrum obtained by rescaling S_ori(z) on the z-plane to α times its previous scale.
In general, α and β can be set empirically. Usually, the larger the value of β-α, the more pronounced the sound quality enhancement of the synthesized speech; however, too large a value of β-α may make the synthesis result unstable, for example by distorting the synthesized speech.
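The text does not say how a z-plane-scaled spectrum such as S_ori(z/β) is evaluated. One common reading, shown here purely as an assumption, applies when the spectrum is an all-pole model 1/A(z): substituting z/γ for z multiplies the k-th coefficient of A by γ^k, and a factor γ below 1 flattens the spectral peaks. The sketch computes a single scaled spectrum only; it does not reproduce the post-processing formula itself, and the function name and frequency grid are hypothetical.

```python
import cmath
import math


def scaled_all_pole_amplitude(lpc, gamma, num_bins):
    """Amplitude of 1 / A(z / gamma) on the unit circle.

    Assumption: A(z) = sum_k lpc[k] * z**(-k) with lpc[0] == 1, and the grid
    holds `num_bins` frequencies from 0 to pi inclusive.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    # A(z / gamma) has coefficients lpc[k] * gamma**k.
    scaled = [a * gamma ** k for k, a in enumerate(lpc)]
    amplitudes = []
    for i in range(num_bins):
        omega = math.pi * i / (num_bins - 1)
        a_value = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(scaled))
        amplitudes.append(1.0 / abs(a_value))
    return amplitudes
```

For example, with the hypothetical coefficients `[1.0, -0.9]`, the peak at ω = 0 is about 10 for `gamma = 1.0` and drops to about 1.82 for `gamma = 0.5`.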
In a specific implementation, after the adaptive post-processing described above, the range over which the amplitude spectrum may change can be limited so that the synthesis result stays stable. Performing adaptive post-processing on the amplitude spectrum to obtain the processed amplitude spectrum then further includes: for each amplitude spectrum before processing, determining whether the processed amplitude spectrum obtained by calculation lies within the range between a preset maximum value and a preset minimum value; when the processed amplitude spectrum is less than the preset minimum value, using the preset minimum value as the processed amplitude spectrum; and when the processed amplitude spectrum is greater than the preset maximum value, using the preset maximum value as the processed amplitude spectrum.
For example, the preset maximum value can be a fixed value that has been set, or a preset proportion of S_ori(z); likewise, the preset minimum value can be a fixed value that has been set, or a preset proportion of S_ori(z). The embodiments of the present invention impose no limitation.
If the preset maximum value and the preset minimum value are both preset proportions of S_ori(z), the range of change of the amplitude spectrum can be limited by the following formula:
Suppose the value of the y-th dimension of S_ori(z) is s_ori and the value of the y-th dimension of S_new(z) is s_new, where 1 ≤ y ≤ Y. Then:
[formula not reproduced in the source text]
Where mindata and maxdata can be set empirically. Usually, the larger the value of maxdata-mindata, the more pronounced the sound quality enhancement of the synthesized speech, but too large a value of maxdata-mindata may make the synthesis result unstable. The value of maxdata-mindata can be, for example, in the range of 7 to 10, such as 8, 9 or 10. In this case the synthesis result stays stable while the sound quality of the synthesized speech is still enhanced well.
If the preset maximum value and the preset minimum value are both fixed values that have been set, the range of change of the amplitude spectrum can be limited by the following formula:
Suppose the value of the y-th dimension of S_new(z) is s_new, where 1 ≤ y ≤ Y. Then:
[formula not reproduced in the source text]
Likewise, mindata and maxdata can be set empirically. Usually, the larger the value of maxdata-mindata, the more pronounced the sound quality enhancement of the synthesized speech, but too large a value of maxdata-mindata may make the synthesis result unstable. Likewise, the value of maxdata-mindata can be, for example, in the range of 7 to 10, such as 8, 9 or 10. In this case the synthesis result stays stable while the sound quality of the synthesized speech is still enhanced well.
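The two ways of limiting the range described above (fixed bounds, or bounds that are a proportion of the original amplitude) can be sketched as follows. The limiting formulas themselves are not reproduced in this text, so this is an assumed reading: each dimension is clamped independently, the proportional bounds are taken as `s_ori * min_ratio` and `s_ori * max_ratio`, amplitudes are assumed to be positive, and all function and parameter names are hypothetical.

```python
def clamp_fixed(s_new, min_value, max_value):
    """Clamp every dimension of the processed spectrum to fixed bounds."""
    return [min(max(s, min_value), max_value) for s in s_new]


def clamp_proportional(s_new, s_ori, min_ratio, max_ratio):
    """Clamp every dimension to bounds proportional to the original amplitude.

    Assumption: amplitudes are positive, so s_ori * min_ratio <= s_ori * max_ratio.
    """
    if len(s_new) != len(s_ori):
        raise ValueError("spectra must have the same number of dimensions")
    return [
        min(max(new, ori * min_ratio), ori * max_ratio)
        for new, ori in zip(s_new, s_ori)
    ]
```

A value inside the bounds passes through unchanged; only values outside the bounds are replaced by the nearest bound, which matches the two "when ... is less than / greater than" cases in the text.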
As an optional embodiment, to keep the synthesis result stable, the spectral energy before and after the adaptive post-processing also needs to stay consistent. That is, after performing adaptive post-processing on the amplitude spectrum to obtain the processed amplitude spectrum, the method further includes: performing spectral energy normalization on the processed amplitude spectrum.
The spectral energy before and after the adaptive post-processing can be kept consistent by the following formula:
[formula not reproduced in the source text]
Where S'_new(z) denotes the amplitude spectrum after spectral energy normalization;
S_new(z) denotes the amplitude spectrum before spectral energy normalization;
S_ori(z) denotes the amplitude spectrum before adaptive post-processing.
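One way to keep the spectral energy consistent is to rescale the processed spectrum by a single gain. The normalization formula is not reproduced in this text, so the sketch below rests on an assumption: energy is taken to be the sum of squared amplitudes over the Y dimensions, and the function name is hypothetical.

```python
import math


def match_spectral_energy(s_new, s_ori):
    """Scale s_new so that its energy equals the energy of s_ori.

    Assumption: energy is defined as the sum of squared amplitudes.
    """
    energy_ori = sum(s * s for s in s_ori)
    energy_new = sum(s * s for s in s_new)
    if energy_new == 0.0:
        # Nothing to scale; return a copy unchanged.
        return list(s_new)
    gain = math.sqrt(energy_ori / energy_new)
    return [s * gain for s in s_new]
```

Because the same gain is applied to every dimension, the shape of the post-processed spectrum (the enhanced peaks relative to the valleys) is preserved and only the overall level changes.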
In step S304, the amplitude spectrum can be converted into the preset parameters of the spectral model in the following ways:
When the preset parameters of the spectral model are line spectral pair parameters, the logarithm with base e is first taken of the amplitude spectrum, and the cepstrum parameters c_0(v) are then obtained by an inverse Fourier transform. Next, the generalized cepstrum parameters c_{-1}(v) are solved according to the following recursive equation, where v denotes the dimension currently being processed:
[formula not reproduced in the source text]
The LPC parameters are then obtained through gain normalization. A z-transform is applied to them and the zeros on the unit circle are solved for; the angular frequencies corresponding to these zeros are the line spectral pair parameters.
When the preset parameters of the spectral model are Mel-cepstrum parameters, the logarithm with base e is first taken of the amplitude spectrum, and the cepstrum parameters are then obtained by an inverse Fourier transform; suppose their form is c_0(0), c_0(1), ..., c_0(V). Finally, the Mel cepstrum is solved according to the following equation, where a is known (when the spectral model is derived from audio with a sampling rate of 16 kHz, a is usually set to 0.42) and v denotes the dimension currently being processed:
[formula not reproduced in the source text]
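The first two operations shared by both cases above (natural logarithm, then inverse Fourier transform to obtain cepstrum parameters) can be sketched as follows. This is an illustration under assumptions rather than the method of the patent: the amplitude spectrum is assumed to be sampled uniformly on [0, π] with both end points included, the one-sided convention log S(ω) = Σ c(v)·cos(ωv) is assumed, the later conversion to Mel-cepstrum or line spectral pairs is not covered, and the function name is hypothetical.

```python
import math


def amplitude_to_cepstrum(amplitudes, order):
    """Return c(0)..c(order) with log S(omega) ~= sum_v c(v) * cos(omega * v).

    Assumption: `amplitudes` holds positive samples of S(omega) on a uniform
    grid from 0 to pi inclusive, and order < len(amplitudes) - 1.
    """
    num_bins = len(amplitudes)
    if order >= num_bins - 1:
        raise ValueError("order must be smaller than len(amplitudes) - 1")
    log_amp = [math.log(a) for a in amplitudes]   # logarithm with base e
    n = 2 * (num_bins - 1)                        # length of the even-symmetric spectrum
    cepstrum = []
    for v in range(order + 1):
        # Inverse DFT of the even extension, written as a cosine sum.
        acc = log_amp[0] + ((-1) ** v) * log_amp[-1]
        for k in range(1, num_bins - 1):
            acc += 2.0 * log_amp[k] * math.cos(math.pi * k * v / (num_bins - 1))
        symmetric = acc / n
        # Fold the two-sided cepstrum onto non-negative indices.
        cepstrum.append(symmetric if v == 0 else 2.0 * symmetric)
    return cepstrum
```

The direct cosine sum costs O(Y·V) operations; an FFT-based inverse transform would be the usual choice for large Y, and is left out here only to keep the sketch dependency-free.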
If spectral energy normalization has not been applied to the amplitude spectrum beforehand, then in step S304 the adaptively post-processed amplitude spectrum is converted directly into the preset parameters of the spectral model. If spectral energy normalization has been applied to the amplitude spectrum beforehand, then in step S304 the amplitude spectrum that has undergone spectral energy normalization is converted into the preset parameters of the spectral model.
In a specific implementation, once the processed spectral model has been obtained in step S304, speech data can be synthesized by a speech model that contains the spectral model. Referring to Fig. 4, speech data can be synthesized by the following steps:
Step S401: obtain the text to be synthesized on which speech synthesis is to be performed;
Step S402: determine the acoustic parameters of the text to be synthesized based on the speech model;
Step S403: synthesize the speech data of the text to be synthesized from the acoustic parameters.
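The three steps can be sketched as a small pipeline. This is only an illustration: the `AcousticParams` container, the `synthesize` function, and the two callables passed to it are hypothetical names standing in for the speech model and the vocoder, neither of which is specified at this level of detail in the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AcousticParams:
    """Acoustic parameters named in the text: spectrum, fundamental frequency, duration."""
    spectrum: List[List[float]]  # one spectral parameter vector per frame
    f0: List[float]              # fundamental frequency per frame
    durations: List[int]         # duration (in frames) per unit


def synthesize(
    text: str,
    predict: Callable[[str], AcousticParams],
    vocoder: Callable[[AcousticParams], List[float]],
) -> List[float]:
    """Run steps S401 to S403 with caller-supplied (hypothetical) components."""
    if not text:
        # S401: a non-empty text to be synthesized is required.
        raise ValueError("text to be synthesized is empty")
    params = predict(text)   # S402: the speech model predicts acoustic parameters
    return vocoder(params)   # S403: the vocoder turns parameters into speech samples
```

Because the spectral model is post-processed ahead of time, nothing in this synthesis path has to run the post-filter per utterance.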
In step S401, the text to be synthesized is, for example, text input by a user, text corresponding to a prompt tone generated by a client device, e-book text, and so on. It can of course be text in any other form; the embodiments of the present invention neither enumerate these in detail nor impose any limitation.
In step S402, context analysis can first be performed on the text to be synthesized to obtain its context information. The speech model then performs model prediction on the context information to obtain the corresponding acoustic parameters, which include the spectrum, fundamental frequency information, duration, and so on.
In step S403, the acoustic parameters determined in step S402 can be synthesized by a vocoder to obtain the corresponding speech data. After the speech data has been synthesized, it can be output in various ways, for example by outputting it through the voice output device of the client device, or by sending it to another client device so that the other client device outputs it.
In a second aspect, based on the same inventive concept, an embodiment of the present invention provides a speech synthesis method which, referring to Fig. 5, includes:
Step S501: obtain the text to be synthesized on which speech synthesis is to be performed;
Step S502: determine the spectral parameters of the text to be synthesized based on the spectral model in the speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing includes the following steps: converting the preset parameter of the spectral model to an amplitude spectrum, adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum, and converting the processed amplitude spectrum to the preset parameter of the spectral model;
Step S503: synthesize the speech data of the text to be synthesized from the spectral parameters.
In step S501, the kinds of text that the text to be synthesized may be have been described above and are not repeated here.
In step S502, how the adaptively post-processed spectral model is obtained has been described in the first aspect of the present invention and is not repeated here. The adaptively post-processed spectral model can be obtained in several ways. Two of them are introduced below; in a specific implementation, the possibilities are of course not limited to these two cases.
In the first, the adaptive post-processing of the spectral model is performed locally on the client device.
In the second, the adaptively post-processed spectral model is received from a server.
In step S503, other parameters of the speech data can also be obtained from the other models included in the speech model. For example, the fundamental frequency parameters of the text to be synthesized are obtained from a fundamental frequency model, and its duration parameters are obtained from a duration model. The speech data of the text to be synthesized is then synthesized jointly from acoustic parameters such as the fundamental frequency parameters, the duration parameters and the spectral parameters.
How the speech data of the text to be synthesized is synthesized from the acoustic parameters has been described above and is not repeated here.
As the above analysis shows, in the embodiments of the present invention the preset parameter of the spectral model (for example, the static parameters of the mean part) is first converted to an amplitude spectrum, and the amplitude spectrum is then adaptively post-processed. To keep the synthesis result stable, the range over which the amplitude spectrum may change is limited, and the energy of the amplitude spectrum is adjusted so that it equals the energy of the amplitude spectrum before processing. Finally, the processed amplitude spectrum is converted back to the preset parameter of the spectral model, and the other parts of the spectral model are left unchanged. Because the preset parameter in the spectral model has been adaptively post-processed, that is, the desired signal in the spectral model has been enhanced and the interference signal reduced, the quality of the synthesized speech data is improved when speech data is synthesized based on the spectral model.
In a third aspect, based on the same inventive concept, an embodiment of the present invention provides an acoustic model processing apparatus which, referring to Fig. 6, includes:
an acquisition module 60, configured to obtain the preset parameter of the spectral model in a speech model;
a first conversion module 61, configured to convert the preset parameter of the spectral model to an amplitude spectrum;
a first obtaining module 62, configured to adaptively post-process the amplitude spectrum to obtain the processed amplitude spectrum;
a second conversion module 63, configured to convert the processed amplitude spectrum to the preset parameter of the spectral model, thereby obtaining the processed spectral model.
Optionally, the first conversion module 61 is configured to:
convert the static parameters of the mean part of the spectral model to the amplitude spectrum.
Optionally, the first obtaining module 62 is configured to adaptively post-process the amplitude spectrum by the following formula:
$$S_{new}(z) = \frac{S_{ori}(z/\beta)}{S_{ori}(z/\alpha)} \cdot S_{ori}(z), \quad 0 < \alpha < \beta < 1$$
where $S_{new}(z)$ denotes the amplitude spectrum after processing;
$S_{ori}(z)$ denotes the amplitude spectrum before processing;
$S_{ori}(z/\beta)$ denotes the amplitude spectrum obtained by rescaling $S_{ori}(z)$ on the z-plane to $\beta$ times its previous scale;
$S_{ori}(z/\alpha)$ denotes the amplitude spectrum obtained by rescaling $S_{ori}(z)$ on the z-plane to $\alpha$ times its previous scale.
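To illustrate how the ratio $S_{ori}(z/\beta)/S_{ori}(z/\alpha)$ sharpens spectral peaks, the sketch below evaluates the post-filter on the unit circle. It assumes, purely for illustration, that the amplitude spectrum comes from an all-pole model $1/A(z)$ with $A(z) = 1 + \sum_k a_k z^{-k}$, so that rescaling $z$ by a factor $\gamma$ multiplies the k-th coefficient by $\gamma^k$. The all-pole representation, the function names, and the default values of `alpha` and `beta` are assumptions; the text only requires $0 < \alpha < \beta < 1$.

```python
import cmath
import math


def allpole_amplitude(a, omega, scale=1.0):
    """Amplitude |1 / A(z/scale)| at z = exp(j*omega).

    a: coefficients a_1..a_p of A(z) = 1 + sum_k a_k z^-k (assumed model).
    scale: z-plane scaling factor; A(z/scale) has coefficients a_k * scale**k.
    """
    z_inv = cmath.exp(-1j * omega)
    acc = 1.0 + 0j
    for k, ak in enumerate(a, start=1):
        acc += ak * (scale ** k) * (z_inv ** k)
    return 1.0 / abs(acc)


def postfilter(a, omegas, alpha=0.5, beta=0.8):
    """S_new = S_ori(z/beta) / S_ori(z/alpha) * S_ori(z) at each angular frequency."""
    if not 0.0 < alpha < beta < 1.0:
        raise ValueError("expected 0 < alpha < beta < 1")
    result = []
    for w in omegas:
        s_ori = allpole_amplitude(a, w)
        ratio = allpole_amplitude(a, w, beta) / allpole_amplitude(a, w, alpha)
        result.append(ratio * s_ori)
    return result
```

For a single pole at 0.9 (`a = [-0.9]`), the value at the spectral peak (angular frequency 0) is raised and the value at the valley (angular frequency π) is lowered, which is the peak-enhancing behaviour the post-processing aims for.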
Optionally, the first obtaining module 62 includes:
a judging unit, configured to judge, for each amplitude spectrum before processing, whether the calculated amplitude spectrum after processing lies within the range between a preset maximum value and a preset minimum value;
a first determining unit, configured to take the preset minimum value as the amplitude spectrum after processing when the amplitude spectrum after processing is less than the preset minimum value;
a second determining unit, configured to take the preset maximum value as the amplitude spectrum after processing when the amplitude spectrum after processing is greater than the preset maximum value.
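The behaviour of the judging unit and the two determining units amounts to clamping each processed value to a preset range. A minimal sketch, assuming the preset minimum and maximum are fixed scalars shared by all bins (the text does not say how the bounds are chosen, and the function name is hypothetical):

```python
def clamp_spectrum(s_new, lower, upper):
    """Limit every processed amplitude value to the range [lower, upper].

    Values below `lower` are replaced by `lower`, values above `upper`
    by `upper`, and values already inside the range are kept unchanged.
    """
    if lower > upper:
        raise ValueError("preset minimum must not exceed preset maximum")
    return [min(max(x, lower), upper) for x in s_new]
```

For example, `clamp_spectrum([0.1, 1.0, 5.0], 0.5, 2.0)` leaves the middle value untouched and pulls the other two to the nearest bound.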
Optionally, the apparatus further includes:
a first processing module, configured to perform spectral energy normalization on the processed amplitude spectrum;
the second conversion module, configured to convert the amplitude spectrum that has undergone spectral energy normalization to the preset parameter of the spectral model.
Optionally, the apparatus further includes:
a second obtaining module, configured to obtain the text to be synthesized on which speech synthesis is to be performed;
a first determining module, configured to determine the acoustic parameters of the text to be synthesized based on the speech model;
a first synthesis module, configured to synthesize the speech data of the text to be synthesized from the acoustic parameters.
The acoustic model processing apparatus introduced in the third aspect of the present invention is the apparatus used to implement the acoustic model processing method introduced in the first aspect of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific structure and variations of the apparatus, so they are not repeated here. Any apparatus used to implement the acoustic model processing method introduced in the first aspect of the embodiments of the present invention falls within the scope of protection sought by the embodiments of the present invention.
In a fourth aspect, based on the same inventive concept, an embodiment of the present invention provides a speech synthesis apparatus which, referring to Fig. 7, includes:
a third obtaining module 70, configured to obtain the text to be synthesized on which speech synthesis is to be performed;
a second determining module 71, configured to determine the spectral parameters of the text to be synthesized based on the spectral model in the speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing includes the following steps: converting the preset parameter of the spectral model to an amplitude spectrum, adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum, and converting the processed amplitude spectrum to the preset parameter of the spectral model;
a second synthesis module 72, configured to synthesize the speech data of the text to be synthesized from the spectral parameters.
Optionally, the apparatus further includes:
a second processing module, configured to perform the adaptive post-processing of the spectral model locally on the client device; and/or
a receiving module, configured to receive the adaptively post-processed spectral model from a server.
The speech synthesis apparatus introduced in the fourth aspect of the present invention is the apparatus used to implement the speech synthesis method introduced in the second aspect of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific structure and variations of the apparatus, so they are not repeated here. Any apparatus used to implement the speech synthesis method introduced in the second aspect of the embodiments of the present invention falls within the scope of protection sought by the embodiments of the present invention.
In a fifth aspect, based on the same inventive concept, an embodiment of the present invention provides a processing device. The processing device may be an electronic device or a server and includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
obtaining the preset parameter of the spectral model in a speech model;
converting the preset parameter of the spectral model to an amplitude spectrum;
adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum;
converting the processed amplitude spectrum to the preset parameter of the spectral model, thereby obtaining the processed spectral model.
The electronic device introduced in the fifth aspect of the present invention is the electronic device used to implement the acoustic model processing method introduced in the first aspect of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific structure and variations of the electronic device, so they are not repeated here. Any electronic device used to implement the acoustic model processing method introduced in the first aspect of the embodiments of the present invention falls within the scope of protection sought by the embodiments of the present invention.
In a sixth aspect, based on the same inventive concept, an embodiment of the present invention provides a processing device. The processing device may be an electronic device or a server and includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
obtaining the text to be synthesized on which speech synthesis is to be performed;
determining the spectral parameters of the text to be synthesized based on the spectral model in the speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing includes the following steps: converting the preset parameter of the spectral model to an amplitude spectrum, adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum, and converting the processed amplitude spectrum to the preset parameter of the spectral model;
synthesizing the speech data of the text to be synthesized from the spectral parameters.
The electronic device introduced in the sixth aspect of the present invention is the electronic device used to implement the speech synthesis method introduced in the second aspect of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific structure and variations of the electronic device, so they are not repeated here. Any electronic device used to implement the speech synthesis method introduced in the second aspect of the embodiments of the present invention falls within the scope of protection sought by the embodiments of the present invention.
Fig. 8 is a block diagram of an electronic device 800 that implements an acoustic model processing method (or speech synthesis method) according to an exemplary embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so on.
Referring to Fig. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the above methods. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode such as a call mode, a recording mode or a speech recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a loudspeaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to, a home button, a volume button, a start button and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the electronic device 800 to complete the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on.
A non-transitory computer-readable storage medium, where the instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform an acoustic model processing method, the method including:
obtaining the preset parameter of the spectral model in a speech model;
converting the preset parameter of the spectral model to an amplitude spectrum;
adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum;
converting the processed amplitude spectrum to the preset parameter of the spectral model, thereby obtaining the processed spectral model.
A non-transitory computer-readable storage medium, where the instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a speech synthesis method, the method including:
obtaining the text to be synthesized on which speech synthesis is to be performed;
determining the spectral parameters of the text to be synthesized based on the spectral model in the speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing includes the following steps: converting the preset parameter of the spectral model to an amplitude spectrum, adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum, and converting the processed amplitude spectrum to the preset parameter of the spectral model;
synthesizing the speech data of the text to be synthesized from the spectral parameters.
Fig. 9 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
A non-transitory computer-readable storage medium, where the instructions in the storage medium, when executed by a central processing unit of a server, enable the server to perform an acoustic model processing method, the method including:
obtaining the preset parameter of the spectral model in a speech model;
converting the preset parameter of the spectral model to an amplitude spectrum;
adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum;
converting the processed amplitude spectrum to the preset parameter of the spectral model, thereby obtaining the processed spectral model.
A non-transitory computer-readable storage medium, where the instructions in the storage medium, when executed by a central processing unit of a server, enable the server to perform a speech synthesis method, the method including:
obtaining the text to be synthesized on which speech synthesis is to be performed;
determining the spectral parameters of the text to be synthesized based on the spectral model in the speech model, where the spectral model is a spectral model that has undergone adaptive post-processing, and the adaptive post-processing includes the following steps: converting the preset parameter of the spectral model to an amplitude spectrum, adaptively post-processing the amplitude spectrum to obtain the processed amplitude spectrum, and converting the processed amplitude spectrum to the preset parameter of the spectral model;
synthesizing the speech data of the text to be synthesized from the spectral parameters.
One or more embodiments of the present invention have at least the following beneficial effects:
In the embodiments of the present invention, the speech model is processed as follows: the preset parameter of the spectral model in the speech model is obtained; the preset parameter of the spectral model is then converted to an amplitude spectrum; the amplitude spectrum is adaptively post-processed to obtain the processed amplitude spectrum; and the processed amplitude spectrum is converted to the preset parameter of the spectral model, thereby obtaining the processed spectral model. Because the preset parameter in the spectral model has been adaptively post-processed, that is, the desired signal in the spectral model has been enhanced and the interference signal reduced, the quality of the synthesized speech is improved when speech data is subsequently generated by the speech model.
Furthermore, the object of the adaptive post-processing in this scheme is the amplitude spectrum. The amplitude spectrum is a general form of spectrum to which all kinds of spectral parameters can be converted, so the scheme applies to any spectral parameter, and different spectral parameters (for example, line spectral pairs or the mel-cepstrum) do not require different adaptive post-processing methods. The adaptive post-processing in this scheme is therefore more broadly compatible across spectral parameters.
Furthermore, the scheme adaptively post-processes the spectral model in the speech model in advance, so that no adaptive post-processing is needed after the acoustic parameters are subsequently generated. This reduces the time taken to synthesize speech data with the speech model.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (12)
- A kind of 1. acoustic model processing method, it is characterised in that including:Obtain the parameter preset of the spectral model in speech model;The parameter preset of the spectral model is converted into amplitude spectrum;The amplitude spectrum is adaptively post-processed, the amplitude spectrum after being handled;Amplitude spectrum after the processing is converted to the parameter preset of the spectral model, and then the frequency spectrum after being handled Model.
- 2. the method as described in claim 1, it is characterised in that the parameter preset by the spectral model is converted to amplitude Spectrum, including:The static parameter of the equal value part of the spectral model is converted into the amplitude spectrum.
- 3. the method as described in claim 1, it is characterised in that after adaptive handled is carried out to the amplitude spectrum Amplitude spectrum, including:The amplitude spectrum is adaptively post-processed by below equation:<mrow> <msub> <mi>S</mi> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>z</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <msub> <mi>S</mi> <mrow> <mi>o</mi> <mi>r</mi> <mi>i</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>z</mi> <mo>/</mo> <mi>&beta;</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>S</mi> <mrow> <mi>o</mi> <mi>r</mi> <mi>i</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>z</mi> <mo>/</mo> <mi>&alpha;</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>*</mo> <msub> <mi>S</mi> <mrow> <mi>o</mi> <mi>r</mi> <mi>i</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>z</mi> <mo>)</mo> </mrow> <mo>,</mo> <mn>0</mn> <mo><</mo> <mi>&alpha;</mi> <mo><</mo> <mi>&beta;</mi> <mo><</mo> <mn>1</mn> </mrow>Wherein, Snew(z) amplitude spectrum after expression processing;Sori(z) amplitude spectrum of before processing is represented;Sori(z/ β) represents the S on z-planeori(z) β times before change of scale arrives, the amplitude spectrum obtained from;Sori(z/ α) represents the S on z-planeori(z) α times before change of scale arrives, the amplitude spectrum obtained from.
- 4. The method according to claim 3, characterized in that performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum further comprises: for each amplitude spectrum before processing, determining whether the processed amplitude spectrum obtained by calculation lies within the range between a preset minimum value and a preset maximum value; when the processed amplitude spectrum is less than the preset minimum value, using the preset minimum value as the processed amplitude spectrum; and when the processed amplitude spectrum is greater than the preset maximum value, using the preset maximum value as the processed amplitude spectrum.
- 5. The method according to claim 1, characterized in that, after performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, the method further comprises: performing spectral energy unification processing on the processed amplitude spectrum; and converting the processed amplitude spectrum into the preset parameters of the spectral model comprises: converting the amplitude spectrum that has undergone the spectral energy unification processing into the preset parameters of the spectral model.
- 6. The method according to any one of claims 1-5, characterized in that the method further comprises: obtaining a text to be synthesized for speech synthesis; determining acoustic parameters of the text to be synthesized based on the speech model; and synthesizing speech data of the text to be synthesized from the acoustic parameters.
- 7. A speech synthesis method, characterized by comprising: obtaining a text to be synthesized for speech synthesis; determining spectral parameters of the text to be synthesized based on a spectral model in a speech model, the spectral model being a spectral model that has undergone adaptive post-processing, the adaptive post-processing comprising the following steps: converting preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model; and synthesizing speech data of the text to be synthesized from the spectral parameters.
- 8. The method according to claim 7, characterized in that, before determining the spectral parameters of the text to be synthesized based on the spectral model in the speech model, the method further comprises: performing the adaptive post-processing of the spectral model locally on a client device; and/or receiving the adaptively post-processed spectral model from a server.
- 9. An acoustic model processing apparatus, characterized by comprising: an acquisition module, configured to obtain preset parameters of a spectral model in a speech model; a first conversion module, configured to convert the preset parameters of the spectral model into an amplitude spectrum; a first obtaining module, configured to perform adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum; and a second conversion module, configured to convert the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining a processed spectral model.
- 10. A speech synthesis apparatus, characterized by comprising: a third obtaining module, configured to obtain a text to be synthesized for speech synthesis; a second determining module, configured to determine spectral parameters of the text to be synthesized based on a spectral model in a speech model, the spectral model being a spectral model that has undergone adaptive post-processing, the adaptive post-processing comprising the following steps: converting preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model; and a second synthesis module, configured to synthesize speech data of the text to be synthesized from the spectral parameters.
- 11. A processing device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations: obtaining preset parameters of a spectral model in a speech model; converting the preset parameters of the spectral model into an amplitude spectrum; performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum; and converting the processed amplitude spectrum into the preset parameters of the spectral model, thereby obtaining a processed spectral model.
- 12. A processing device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations: obtaining a text to be synthesized for speech synthesis; determining spectral parameters of the text to be synthesized based on a spectral model in a speech model, the spectral model being a spectral model that has undergone adaptive post-processing, the adaptive post-processing comprising the following steps: converting preset parameters of the spectral model into an amplitude spectrum, performing adaptive post-processing on the amplitude spectrum to obtain a processed amplitude spectrum, and converting the processed amplitude spectrum into the preset parameters of the spectral model; and synthesizing speech data of the text to be synthesized from the spectral parameters.
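
The post-processing recited in claims 3 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claims do not state: the amplitude spectrum is sampled on evenly spaced bins from 0 to the Nyquist frequency and is strictly positive; the z-plane scaling $S_{ori}(z/\gamma)$ is evaluated under a minimum-phase interpretation, in which the n-th cepstral coefficient is weighted by $\gamma^n$; and the "spectral energy unification" of claim 5 is read as rescaling the result so that its total energy equals that of the input. The function names, the `floor`/`ceil`/`keep_energy` parameters, and the default values of `alpha` and `beta` are hypothetical.

```python
import math


def min_phase_cepstrum(log_amp):
    """Fold a one-sided log-amplitude spectrum into a minimum-phase cepstrum.

    Assumes len(log_amp) >= 2, with the last entry at the Nyquist bin.
    """
    half = len(log_amp) - 1              # index of the Nyquist bin
    n_fft = 2 * half
    cep = []
    for n in range(half + 1):
        acc = log_amp[0] + (-1) ** n * log_amp[half]
        acc += 2.0 * sum(log_amp[k] * math.cos(2 * math.pi * k * n / n_fft)
                         for k in range(1, half))
        c = acc / n_fft                  # real cepstrum coefficient c[n]
        cep.append(c if n in (0, half) else 2.0 * c)
    return cep


def scaled_log_amp(cep, gamma):
    """Log-amplitude of S(z/gamma): coefficient n is weighted by gamma**n."""
    half = len(cep) - 1
    n_fft = 2 * half
    return [sum(c * gamma ** n * math.cos(2 * math.pi * k * n / n_fft)
                for n, c in enumerate(cep))
            for k in range(half + 1)]


def adaptive_postprocess(amp, alpha=0.5, beta=0.8,
                         floor=None, ceil=None, keep_energy=True):
    """Sketch of S_new = S_ori(z/beta) / S_ori(z/alpha) * S_ori(z)."""
    if not 0.0 < alpha < beta < 1.0:
        raise ValueError("require 0 < alpha < beta < 1")
    log_amp = [math.log(a) for a in amp]     # amp must be strictly positive
    cep = min_phase_cepstrum(log_amp)
    log_b = scaled_log_amp(cep, beta)
    log_a = scaled_log_amp(cep, alpha)
    new = [math.exp(s + b - a) for s, b, a in zip(log_amp, log_b, log_a)]
    # Claim 4: limit each processed value to the preset [minimum, maximum].
    if floor is not None:
        new = [max(v, floor) for v in new]
    if ceil is not None:
        new = [min(v, ceil) for v in new]
    # Claim 5 (assumed reading): restore the total energy of the input.
    if keep_energy:
        gain = math.sqrt(sum(a * a for a in amp) / sum(v * v for v in new))
        new = [v * gain for v in new]
    return new


# Usage: a toy spectrum with one broad ripple across 33 bins.
spectrum = [math.exp(0.5 * math.cos(2 * math.pi * 3 * k / 64)) for k in range(33)]
enhanced = adaptive_postprocess(spectrum, alpha=0.5, beta=0.8)
```

Under this reading, each cepstral coefficient with index n ≥ 1 is multiplied by $1 + \beta^n - \alpha^n$, which exceeds 1 because $\alpha < \beta$, so spectral peaks are raised relative to valleys while a flat spectrum is left unchanged.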
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610353978.5A CN107437412B (en) | 2016-05-25 | 2016-05-25 | Acoustic model processing method, voice synthesis method, device and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107437412A (en) | 2017-12-05 |
CN107437412B (en) | 2021-06-29 |
Family
ID=60452931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610353978.5A Active CN107437412B (en) | 2016-05-25 | 2016-05-25 | Acoustic model processing method, voice synthesis method, device and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107437412B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN102938254A (en) * | 2012-10-24 | 2013-02-20 | 中国科学技术大学 | Voice signal enhancement system and method |
US20140207460A1 (en) * | 2013-01-24 | 2014-07-24 | Huawei Device Co., Ltd. | Voice identification method and apparatus |
CN104318927A (en) * | 2014-11-04 | 2015-01-28 | 东莞市北斗时空通信科技有限公司 | Anti-noise low-bitrate speech coding method and decoding method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110580910A (en) * | 2018-06-08 | 2019-12-17 | 北京搜狗科技发展有限公司 | Audio processing method, device and equipment and readable storage medium |
CN110580910B (en) * | 2018-06-08 | 2024-04-26 | 北京搜狗科技发展有限公司 | Audio processing method, device, equipment and readable storage medium |
CN110930977A (en) * | 2019-11-12 | 2020-03-27 | 北京搜狗科技发展有限公司 | Data processing method and device and electronic equipment |
CN110931045A (en) * | 2019-12-20 | 2020-03-27 | 重庆大学 | Audio feature generation method based on convolutional neural network |
CN115798455A (en) * | 2023-02-07 | 2023-03-14 | 深圳元象信息科技有限公司 | Speech synthesis method, system, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107437412B (en) | 2021-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109801644B (en) | | Separation method, separation device, electronic equipment and readable medium for mixed sound signal |
CN110136692B (en) | | Speech synthesis method, apparatus, device and storage medium |
JP5996783B2 (en) | | Method and terminal for updating voiceprint feature model |
CN111508511A (en) | | Real-time sound changing method and device |
JP6336676B2 (en) | | Method and apparatus for synthesizing voice based on facial structure |
CN111583944A (en) | | Sound changing method and device |
CN112099628A (en) | | VR interaction method and device based on artificial intelligence, computer equipment and medium |
CN108346433A (en) | | Audio processing method, device, equipment and readable storage medium |
CN107705783A (en) | | Speech synthesis method and device |
CN107437412A (en) | | Acoustic model processing method, speech synthesis method, device and related equipment |
CN109599128A (en) | | Speech emotion recognition method, device, electronic equipment and readable medium |
JP2022145772A (en) | | Acoustic apparatus, control method of acoustic apparatus, and control program |
CN109801618B (en) | | Audio information generation method and device |
CN110097890A (en) | | Speech processing method and device, and device for speech processing |
EP4012702A1 (en) | | Internet calling method and apparatus, computer device, and storage medium |
CN108198569A (en) | | Audio processing method, device, equipment and readable storage medium |
CN110992927B (en) | | Audio generation method, device, computer readable storage medium and computing equipment |
CN111326138A (en) | | Voice generation method and device |
CN113223542B (en) | | Audio conversion method and device, storage medium and electronic equipment |
CN111210844B (en) | | Method, device and equipment for determining speech emotion recognition model and storage medium |
KR20210032875A (en) | | Voice information processing method, apparatus, program and storage medium |
CN108573306A (en) | | Method for outputting reply information, and training method and device for deep learning model |
CN110931028B (en) | | Voice processing method and device and electronic equipment |
CN112651235A (en) | | Poetry generation method and related device |
CN110830368A (en) | | Instant messaging message sending method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||