CN108053814A - Speech synthesis system and method for simulating a user's singing
- Publication number
- CN108053814A (application CN201711079095.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- basic phone
- beat
- period information
- phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0091—Means for obtaining special acoustic effects
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Abstract
The invention discloses a speech synthesis system and method for simulating a user's singing, belonging to the technical field of speech simulation. The principle is as follows: the user's normal speech, input externally, is acquired and converted into pronunciation text, and a phone sequence is formed from the pronunciation text; the original period information of each basic phone is obtained by processing the phone sequence; the original period information of each basic phone is adjusted to beat period information corresponding to the musical score information; for each basic phone, the original period information is compared with the beat period information, and the speech synthesis parameters of the basic phone are adjusted according to the comparison result; a synthesized voice simulating the user's singing is then formed from the adjusted speech synthesis parameters of the basic phones and the pronunciation text, and is output. The beneficial effects of the above technical solution are that the user's singing can be simulated without modeling, the efficiency of speech simulation is improved, near-real-time feedback is achieved, the user's timbre information is retained, and the synthesized voice is rich in detail and lifelike, thereby improving the user experience.
Description
Technical field
The present invention relates to the technical field of speech simulation, and more particularly to a speech synthesis system and method for simulating a user's singing.
Background art
With the continuous development of speech synthesis technology, more and more application software uses speech synthesis to simulate what people say, for example to simulate the content of a person's speech for the purpose of "parroting", or to simulate voices that differ from ordinary speaking scenarios, such as singing.
Specifically, in the prior art, the common practice when simulating a user's singing is to generate the song with the inherent timbre in a speech synthesis database, to model the user's timbre information, and then to apply timbre conversion on top of the song's inherent timbre to obtain a song in the user's voice. The main defects of this approach are:
1. The user's timbre information must be modeled in advance, which makes the speech synthesis process more complicated;
2. The user's voice must be converted according to the constructed model to obtain the synthesized song, so processing is slow and inefficient, and the song cannot be processed and output in real time;
3. Performing synthesis and simulation with the timbre information inherent in the synthesizer database cannot retain the user's own timbre characteristics, so the simulated result is stiff and does not match the user's actual timbre.
Summary of the invention
In view of the above problems in the prior art, a technical solution for a speech synthesis system and method for simulating a user's singing is now provided, which directly converts the speaking voice of a user's normal speech into a singing voice with a given tune. It aims to improve the efficiency of speech simulation, to give the user near-real-time singing feedback, and to retain the user's timbre information, so that the synthesized voice is rich in detail and lifelike, thereby improving the user experience.
The above technical solution specifically includes:
A speech synthesis system for simulating a user's singing, suitable for speech simulation applications, comprising:
A first acquisition unit, configured to acquire the user speech that is input externally while a user speaks normally;
A first conversion unit, connected to the first acquisition unit, configured to convert the user speech into corresponding pronunciation text and to form, from the pronunciation text, a corresponding phone sequence comprising basic phones;
A first processing unit, connected to the first conversion unit, configured to process the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit, connected to the first acquisition unit and the first processing unit respectively, configured to obtain the speech synthesis parameters of each basic phone of the user speech from the fundamental frequency information of the user speech and the original period information of each basic phone;
A second acquisition unit, configured to acquire the musical score information of a preset target song;
A second processing unit, connected to the first processing unit and the second acquisition unit respectively, configured to adjust the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within the corresponding beats of the target song;
A second synthesis unit, connected to the first synthesis unit and the second processing unit respectively, configured to compare, for each basic phone, the original period information with the beat period information, and to adjust the speech synthesis parameters of each basic phone according to the comparison result;
A speech simulation unit, connected to the second synthesis unit, the second acquisition unit and the first conversion unit respectively, configured to form, from the adjusted speech synthesis parameters of each basic phone and the pronunciation text, a synthesized voice simulating the user's singing, and to output it.
Preferably, in the speech synthesis system, the first processing unit uses the Viterbi method to obtain the original period information of each basic phone.
Preferably, in the speech synthesis system, the musical score information includes beat information for the target song, the beat information representing the time information of each beat in the target song, where one beat contains one or more basic phones;
The second processing unit then adjusts, according to the beat information, the original period information of each basic phone to beat period information that represents the time corresponding to the number of beats the basic phone covers in the target song.
Preferably, in the speech synthesis system, the second synthesis unit specifically comprises:
A judgment module, configured to compare the original period information of each basic phone with the beat period information and to output the corresponding comparison result;
A first processing module, connected to the judgment module, configured to act on the comparison result as follows:
When the comparison result indicates that the time length represented by the original period information is shorter than the time length represented by the beat period information, interpolation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone; and
When the comparison result indicates that the time length represented by the original period information is longer than the time length represented by the beat period information, decimation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone.
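The decision made by the judgment module can be sketched as a small comparison function. This is a minimal illustration, not the patent's implementation: the names `Adjustment` and `compare_periods`, the `(start, end)` tuple layout in seconds, and the `tolerance` parameter are all assumptions introduced here. The equal-length branch follows the description's remark that no adjustment is made when the two time lengths match.

```python
from enum import Enum


class Adjustment(Enum):
    INTERPOLATE = "interpolate"  # original shorter than beat period: add frames
    DECIMATE = "decimate"        # original longer than beat period: remove frames
    NONE = "none"                # equal lengths: leave the parameters unchanged


def compare_periods(original, beat, tolerance=1e-6):
    """Compare two (start, end) periods, given in seconds (assumed layout)."""
    original_length = original[1] - original[0]
    beat_length = beat[1] - beat[0]
    if abs(original_length - beat_length) <= tolerance:
        return Adjustment.NONE
    if original_length < beat_length:
        return Adjustment.INTERPOLATE
    return Adjustment.DECIMATE
```

The returned value would then select which time-domain operation the first processing module applies to the basic phone's parameter frames.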
Preferably, in the speech synthesis system, the second synthesis unit further comprises:
A second processing module, connected to the first processing module, configured to smooth the speech synthesis parameters of the basic phone after they have been adjusted.
Preferably, in the speech synthesis system, the musical score information of the target song further includes tune information for each note of the target song;
The speech simulation unit comprises:
A fundamental frequency replacement module, configured to use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
A speech simulation module, connected to the fundamental frequency replacement module, configured to form, from the replaced speech synthesis parameters and the pronunciation text, the synthesized voice simulating the user's singing, and to output it.
A speech synthesis method for simulating a user's singing, suitable for speech simulation applications, wherein the musical score information of a preset target song is obtained in advance, the method further comprising:
Step S1, acquiring the user speech that is input externally while a user speaks normally, converting the user speech into corresponding pronunciation text, and forming, from the pronunciation text, a corresponding phone sequence comprising basic phones;
Step S2, processing the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3, adjusting the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within the corresponding beats of the target song;
Step S4, comparing, for each basic phone, the original period information with the beat period information, and adjusting the speech synthesis parameters of each basic phone according to the comparison result;
Step S5, forming, from the adjusted speech synthesis parameters of each basic phone and the pronunciation text, a synthesized voice simulating the user's singing, and outputting it.
Preferably, in the speech synthesis method, the musical score information includes beat information for the target song, the beat information representing the time information of each beat in the target song, where one beat contains one or more basic phones;
In step S3, the original period information of each basic phone is then adjusted, according to the beat information, to beat period information that represents the time corresponding to the number of beats the basic phone covers in the target song.
Preferably, in the speech synthesis method, step S4 specifically comprises:
Step S41, comparing the original period information of each basic phone with the beat period information:
If the time length of the original period information of the basic phone is shorter than that of the beat period information, proceed to step S42;
If the time length of the original period information of the basic phone is longer than that of the beat period information, proceed to step S43;
Step S42, performing interpolation in the time domain on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then proceeding to step S44;
Step S43, performing decimation in the time domain on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then proceeding to step S44;
Step S44, smoothing the adjusted speech synthesis parameters, then proceeding to step S5.
Preferably, in the speech synthesis method, the musical score information of the target song further includes tune information for each note of the target song;
Step S5 then specifically comprises:
Step S51, using the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52, forming, from the replaced speech synthesis parameters and the pronunciation text, the synthesized voice simulating the user's singing, and outputting it.
The beneficial effects of the above technical solution are:
1) A speech synthesis system for simulating a user's singing is provided which, without modeling, forms a simulated singing voice from the user's normal speech. It improves the efficiency of speech simulation, gives the user near-real-time singing feedback, and retains the user's timbre information, so that the synthesized voice is rich in detail and lifelike, thereby improving the user experience.
2) A speech synthesis method for simulating a user's singing is provided, which supports the normal operation of the above speech synthesis system.
Description of the drawings
Fig. 1 is a schematic diagram of the overall structure of a speech synthesis system for simulating a user's singing in a preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of adjusting basic phones according to beat information in a preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the specific structure of the second synthesis unit, on the basis of Fig. 1, in a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of the specific structure of the speech simulation unit, on the basis of Fig. 1, in a preferred embodiment of the present invention;
Fig. 5 is a schematic overall flowchart of a speech synthesis method for simulating a user's singing in a preferred embodiment of the present invention;
Fig. 6 is a schematic flowchart of adjusting the speech synthesis parameters of the basic phones, on the basis of Fig. 5, in a preferred embodiment of the present invention;
Fig. 7 is a schematic flowchart of forming the synthesized voice, on the basis of Fig. 5, in a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The present invention is further described below with reference to the drawings and specific embodiments, which do not limit the invention.
In view of the above problems in the prior art, a speech synthesis system for simulating a user's singing is now provided. The system is suitable for speech simulation applications. A speech simulation application is an application program that achieves a "parroting" effect by simulating the user's speaking or singing; such applications are most commonly found on mobile terminals or other terminals.
In a preferred embodiment of the present invention, the speech synthesis system for simulating a user's singing is shown in Fig. 1 and includes:
A first acquisition unit 1, configured to acquire the user speech that is input externally while a user speaks normally;
A first conversion unit 2, connected to the first acquisition unit 1, configured to convert the user speech into corresponding pronunciation text and to form, from the pronunciation text, a corresponding phone sequence comprising basic phones;
A first processing unit 3, connected to the first conversion unit 2, configured to process the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit 4, connected to the first acquisition unit 1 and the first processing unit 3 respectively, configured to obtain the speech synthesis parameters of each basic phone of the user speech from the fundamental frequency information of the user speech and the original period information of each basic phone;
A second acquisition unit 5, configured to acquire the musical score information of the target song corresponding to the user's singing;
A second processing unit 6, connected to the first processing unit 3 and the second acquisition unit 5 respectively, configured to adjust the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within the corresponding beats of the target song;
A second synthesis unit 7, connected to the first synthesis unit 4 and the second processing unit 6 respectively, configured to compare, for each basic phone, the original period information with the beat period information, and to adjust the speech synthesis parameters of each basic phone according to the comparison result;
A speech simulation unit 8, connected to the second synthesis unit 7, the second acquisition unit 5 and the first conversion unit 2 respectively, configured to form, from the adjusted speech synthesis parameters of each basic phone and the pronunciation text, a synthesized voice simulating the user's singing, and to output it.
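The information passed between the units listed above can be pictured as one record per basic phone. The following is only an illustrative sketch: the class name `BasicPhone`, its field names, and the use of `(start, end)` tuples in seconds are assumptions made here and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Period = Tuple[float, float]  # (start, end) in seconds; assumed representation


@dataclass
class BasicPhone:
    symbol: str                            # phone label from the phone sequence
    original_period: Period                # start and end time in the user speech
    beat_period: Optional[Period] = None   # filled in once aligned to the score
    frames: List[dict] = field(default_factory=list)  # per-frame synthesis parameters

    @property
    def original_duration(self) -> float:
        start, end = self.original_period
        return end - start
```

In this picture, the first processing unit would fill `original_period`, the second processing unit `beat_period`, and the synthesis units would read and rewrite `frames`.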
Specifically, in this embodiment, the first acquisition unit 1 acquires the user speech input while the user speaks normally, that is, the sound captured when the user speaks normally. The first acquisition unit 1 may be connected to, for example, the sound pickup device of a mobile terminal. The pickup device collects the user's speech while the user is speaking; the speech is acquired by the first acquisition unit 1 and then sent to the first conversion unit 2. The first conversion unit 2 converts the user speech into the corresponding pronunciation text. The conversion follows prior-art speech-to-text methods and is not described in detail here.
The first conversion unit 2 then converts the pronunciation text into the corresponding phone sequence, which consists of basic phones in time order. Each character or word in the pronunciation text can be converted into its basic phones using a prior-art phone conversion table. For example, Chinese pronunciation text can be converted using a Chinese phone (phonetic symbol) lookup table, and English pronunciation text using an English phone (phonetic symbol) lookup table; the details are not repeated here.
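The table lookup described above can be sketched as follows. The table contents, the name `PHONE_TABLE`, and the assumption that the pronunciation text has already been split into syllables are all hypothetical; a real system would use a complete lookup table for the language in question.

```python
# Hypothetical excerpt of a syllable-to-phone lookup table (illustrative only).
PHONE_TABLE = {
    "ni": ["n", "i"],
    "hao": ["h", "ao"],
}


def to_phone_sequence(syllables):
    """Expand each syllable of the pronunciation text into its basic phones,
    preserving the original time order."""
    phones = []
    for syllable in syllables:
        if syllable not in PHONE_TABLE:
            raise KeyError(f"no phone entry for syllable {syllable!r}")
        phones.extend(PHONE_TABLE[syllable])
    return phones
```

Raising on an unknown syllable is a design choice of this sketch; silently skipping it would shift every later phone against the audio.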
After the phone sequence has been obtained, the Viterbi method may be applied to the phone sequence and the user speech to obtain the original period information of each basic phone in the user speech. The original period information represents the start and end times of the corresponding basic phone in the user speech, that is, the period that the basic phone occupies in the user speech. The original period information may therefore contain the start time and the end time of the basic phone, which together represent the period the basic phone occupies in the user speech.
The first processing unit 3 uses this method to obtain the original period information of each basic phone.
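The patent names the Viterbi method without giving details. The sketch below shows one generic form of Viterbi forced alignment under stated assumptions: some acoustic model (not specified here) has already produced `log_likes[t][k]`, the log-likelihood of frame `t` under the `k`-th phone of the sequence, and frames are spaced `frame_shift` seconds apart. The function name and both parameters are hypothetical.

```python
def viterbi_align(log_likes, frame_shift=0.005):
    """Return one (start, end) period per phone, in seconds.

    log_likes[t][k]: log-likelihood of frame t under the k-th phone.
    The path starts in the first phone, ends in the last phone, and may
    only stay in the current phone or advance to the next one.
    """
    n_frames, n_phones = len(log_likes), len(log_likes[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = log_likes[0][0]
    for t in range(1, n_frames):
        for k in range(n_phones):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best, back[t][k] = move, k - 1
            else:
                best, back[t][k] = stay, k
            if best > neg_inf:
                score[t][k] = best + log_likes[t][k]
    # Trace the best path back from the last phone at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse runs of identical phone indices into time periods.
    periods, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            periods.append((start * frame_shift, t * frame_shift))
            start = t
    return periods
```

Each returned pair plays the role of the original period information of one basic phone.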
In this embodiment, speech synthesis parameters are at the same time extracted from each part of the input user speech using a parameter-based speech synthesis technique. The speech synthesis parameters include the fundamental frequency information of the user's voice and the other parameter information required for speech synthesis, such as spectral envelope information, aperiodicity information, or the parameter information required by some other vocoder.
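Once the parameters have been extracted frame by frame, the frames that belong to one basic phone can be selected with that phone's original period information. A minimal sketch, assuming a fixed frame spacing (`frame_shift`, a hypothetical parameter) and frames stored in a list in time order; the function name is an assumption as well.

```python
def frames_for_period(frames, period, frame_shift=0.005):
    """Return the parameter frames that fall inside a (start, end) period.

    frames: list of per-frame parameter records in time order, for example
            dictionaries holding fundamental frequency, spectral envelope
            and aperiodicity values (layout assumed for illustration).
    """
    start, end = period
    first = int(round(start / frame_shift))
    last = int(round(end / frame_shift))
    return frames[first:last]
```

Rounding both boundaries with the same rule keeps neighbouring phones from overlapping or leaving a gap when their periods share a boundary.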
In this embodiment, the second acquisition unit 5 acquires the musical score information of a preset target song. Specifically, the target song is the song that the user wishes the simulated singing to match. A number of songs may be preset in the speech simulation application; when the user is about to input user speech into the application, one of the preset songs can be selected as the target song and used as the accompaniment. In other words, the musical score information acquired by the second acquisition unit 5 is preset in the application and requires no additional means of acquisition.
In this embodiment, the second processing unit 6 then adjusts, according to the musical score information, the original period information of each basic phone to beat period information, so that each basic phone fits the beats of the target song. The beat period information represents the start and end times of each basic phone within the corresponding beats of the target song.
Finally, in this embodiment, the second synthesis unit 7 adjusts the speech synthesis parameters of each basic phone according to its original period information and beat period information. The speech simulation unit 8 then simulates the user's singing from the adjusted speech synthesis parameters and the pronunciation text obtained by the conversion above, forming and outputting the synthesized voice and thereby achieving the purpose of simulating the user's singing.
In a preferred embodiment of the present invention, the musical score information includes beat information for the target song. The beat information represents the time information of each beat in the target song, and one beat contains one or more basic phones;
The second processing unit 6 then adjusts, according to the beat information, the original period information of each basic phone to beat period information that represents the time corresponding to the number of beats the basic phone covers in the target song.
Specifically, in this embodiment, one basic phone (that is, one pronunciation unit) in a target song may cover one beat or several beats, or only part of a beat. Conversely, one beat may contain one basic phone or several basic phones. The beat information represents the number of beats covered by each basic phone in the standard rendition of the target song. The second processing unit 6 can therefore use the beat information to adjust the original period information of each basic phone to beat period information that fits the target song. The beat period information represents the total beat time for which each basic phone lasts, and this time can be expressed as a number of beats.
Specifically, as shown in Fig. 2, the first row shows the original period information of the basic phones obtained by the first processing unit 3. Phone1 to Phone4 denote basic phones, and each square in Fig. 2 can be understood as one beat or as one element of a beat. The second row shows the beat period information of the basic phones obtained by the second processing unit 6. As can be seen from Fig. 2, after the adjustment each basic phone is aligned with the beats of the target song.
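The alignment illustrated in Fig. 2 can be sketched as laying the basic phones end to end on the beat grid. This is a simplified illustration: it assumes a constant beat length (`seconds_per_beat`), that the score supplies one beat count per basic phone (possibly fractional), and that phones follow one another without rests. The function and parameter names are hypothetical.

```python
def beat_periods(beat_counts, seconds_per_beat, song_start=0.0):
    """Turn per-phone beat counts into (start, end) beat periods in seconds.

    beat_counts: number of beats each basic phone covers, in time order;
                 a value below 1 means the phone covers part of a beat.
    """
    periods = []
    cursor = song_start
    for count in beat_counts:
        end = cursor + count * seconds_per_beat
        periods.append((cursor, end))
        cursor = end
    return periods
```

For example, with a beat length of 0.5 seconds, beat counts of 1, 0.5 and 2 give periods of 0.5, 0.25 and 1.0 seconds laid out back to back.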
In a preferred embodiment of the present invention, as shown in Fig. 3, the second synthesis unit 7 specifically comprises:
A judgment module 71, configured to compare the original period information of each basic phone with the beat period information and to output the corresponding comparison result;
A first processing module 72, connected to the judgment module 71, configured to act on the comparison result as follows:
When the comparison result indicates that the time length represented by the original period information is shorter than the time length represented by the beat period information, interpolation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone; and
When the comparison result indicates that the time length represented by the original period information is longer than the time length represented by the beat period information, decimation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone. Decimation here means deleting the surplus time frames at substantially equal time intervals.
Specifically, in this embodiment, the second synthesis unit 7 may adjust the speech synthesis parameters of a basic phone on the following principle:
The judgment module 71 first compares the original period information of the basic phone with its beat period information to determine how the period of the basic phone changes as a result of the adjustment. The first processing module 72 then adjusts the speech synthesis parameters of the basic phone according to the result from the judgment module 71. Specifically:
When the time length of the original period information of the basic phone is shorter than that of the beat period information, that is, the time occupied by the basic phone becomes longer after adjustment (such as Phone1 and Phone3 in Fig. 2), the first processing module 72 performs interpolation in the time domain on the speech synthesis parameters of the basic phone. The interpolation may simply copy some time frames so that they are reused, or it may use linear interpolation to obtain new speech synthesis parameter frames between adjacent time frames.
When the time length of the original period information of the basic phone is longer than that of the beat period information, that is, the time occupied by the basic phone becomes shorter after adjustment (such as Phone2 in Fig. 2), the first processing module 72 performs decimation in the time domain on the speech synthesis parameters of the basic phone.
Of course, when the time length of the original period information of the basic phone equals that of the beat period information, that is, the time occupied by the basic phone is unchanged by the adjustment (such as Phone4 in Fig. 2), the first processing module 72 makes no adjustment to the speech synthesis parameters of the basic phone.
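The three cases above can be sketched as a single resampling routine over a basic phone's parameter frames. This is an illustrative sketch under assumptions: each frame is a list of numeric values, the target length `n_out` is the frame count implied by the beat period, stretching uses the linear-interpolation variant, and shrinking keeps frames at nearly equal spacing and drops the rest. All function names are hypothetical.

```python
def stretch_linear(frames, n_out):
    """Lengthen a frame sequence by linear interpolation between neighbours."""
    n_in = len(frames)
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # position on the input time axis
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def decimate(frames, n_out):
    """Shorten a frame sequence by keeping n_out frames at nearly equal
    spacing and deleting the others."""
    n_in = len(frames)
    return [list(frames[(i * n_in) // n_out]) for i in range(n_out)]


def fit_to_length(frames, n_out):
    """Dispatch on the comparison of original and target frame counts."""
    if not frames or n_out < 1:
        raise ValueError("need at least one input frame and one output frame")
    if n_out > len(frames):
        return stretch_linear(frames, n_out)
    if n_out < len(frames):
        return decimate(frames, n_out)
    return [list(f) for f in frames]  # equal length: parameters unchanged
```

Linear interpolation suits continuous tracks such as spectral envelope values; for a fundamental frequency track that uses 0 to mark unvoiced frames, the frame-copying variant mentioned in the text may be the safer choice.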
In a preferred embodiment of the present invention, still as shown in Fig. 3, the second synthesis unit 7 further comprises:
A second processing module 73, connected to the first processing module 72, configured to smooth the speech synthesis parameters of the basic phone after they have been adjusted. The smoothing may follow the procedure used in parametric synthesis based on hidden Markov models, in which smooth parameters are generated from difference-based dynamic features.
Specifically, in this embodiment, the speech synthesis parameters must be smoothed after they have been adjusted, so that the adjusted basic phones are smooth in time, no basic phone sounds abrupt, and the speech synthesis parameters of all basic phones remain smooth over time.
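The text points to parameter generation with dynamic features from HMM-based synthesis for this step. The sketch below does not implement that procedure; as a plainly simpler stand-in, it smooths one parameter track with a centered moving average, which only illustrates the goal of removing abrupt jumps at phone boundaries. The function name and the `window` parameter are assumptions.

```python
def smooth_track(values, window=3):
    """Smooth one parameter track with a centered moving average.

    window must be a positive odd number; near the ends of the track the
    average is taken over the frames that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A moving average blurs fast, intentional changes along with unwanted jumps, which is one reason a dynamic-feature method may be preferred in practice.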
In a preferred embodiment of the present invention, the musical score information of the target song further includes tune information for each note of the target song;
As shown in Fig. 4, the speech simulation unit 8 then comprises:
A fundamental frequency replacement module 81, configured to use the tune information to replace the fundamental frequency information of the original user speech in the speech synthesis parameters of each basic phone;
A speech simulation module 82, connected to the fundamental frequency replacement module 81, configured to form, from the replaced speech synthesis parameters and the pronunciation text, the synthesized voice simulating the user's singing, and to output it.
Specifically, in this embodiment, the musical score information obtained by the second acquisition unit 5 includes tune information for each note in the target song. The tune information mainly represents the pitch of each note; using it to modify the rhythm and tone of the user speech makes the simulated voice fit the target song, so that it does not go out of tune.
The fundamental frequency replacement module 81 then uses the tune information to replace the user's fundamental frequency information in the original speech synthesis parameters, and the time span of each pronunciation unit in the user speech is adjusted accordingly to fit the beats in the target musical score. Because the other speech synthesis parameters, which represent the user's timbre, are not modified, the user's timbre is essentially unchanged, and what the speech simulation module 82 finally produces is the user's own singing voice.
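The fundamental frequency replacement can be sketched as follows. This is a minimal illustration under several assumptions that the text does not state: the tune information is taken to be a MIDI note number per basic phone, pitch is converted with standard equal temperament (A4 = 440 Hz), and unvoiced frames (marked here by `f0_hz == 0.0`) are left unvoiced. The names `Frame`, `midi_to_hz` and `replace_f0` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    f0_hz: float                 # fundamental frequency; 0.0 marks an unvoiced frame
    spectrum: Tuple[float, ...]  # timbre-related parameters, never modified here


def midi_to_hz(note: int) -> float:
    """Equal-temperament pitch of a MIDI note number (69 is A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def replace_f0(frames: List[Frame], midi_note: int) -> List[Frame]:
    """Overwrite the F0 of voiced frames with the note pitch; keep the spectrum."""
    target = midi_to_hz(midi_note)
    return [replace(f, f0_hz=target) if f.f0_hz > 0.0 else f for f in frames]
```

Only the `f0_hz` field changes; the `spectrum` values that carry the speaker's timbre pass through untouched, which mirrors the reasoning in the paragraph above.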
In summary, in the technical solution of the present invention, simulating the user's singing voice in the manner described above yields a synthesized voice whose pronunciation duration and pronunciation intonation are both fully consistent with the user, while the timbre of the synthesized voice is still the user's original timbre. This both preserves the independence of the user's own timbre and completes the interaction of imitating the user's singing. The simulation is also very fast and can approach real-time simulation, thereby giving the user an entirely different experience.
In a preferred embodiment of the present invention, based on the speech synthesis system described above, a speech synthesis method for simulating a user's singing voice is now provided; the method is likewise suitable for speech simulation applications.
In this method, the musical score information of the target song corresponding to the user's singing is obtained in advance, and the following steps, shown in Figure 5, are performed:
Step S1, obtain the user speech of an externally input user speaking normally, convert the user speech into the corresponding pronunciation text, and form, according to the pronunciation text, the corresponding phone sequence that includes basic phones;
Step S2, process the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3, adjust the original period information of each basic phone to the beat period information of the corresponding musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
Step S4, for each basic phone, compare the original period information with the beat period information, and adjust the speech synthesis parameters of each basic phone accordingly, based on the comparison result;
Step S5, form and output the synthesized voice simulating the user's singing according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
In a preferred embodiment of the present invention, the musical score information includes beat information of the corresponding target song; the beat information represents the time information of each beat in the corresponding target song, and one beat includes one or more basic phones;
Then in step S3, according to the beat information, the original period information of each basic phone is adjusted to beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
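To make the mapping in step S3 concrete, the sketch below turns a per-phone beat count into start and end times. It assumes a constant tempo given in beats per minute and phones laid end to end without rests; neither assumption comes from the text, and the function name `beat_periods` and its parameters are hypothetical.

```python
from typing import List, Tuple


def beat_periods(beats_per_phone: List[float], bpm: float,
                 start_s: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each basic phone.

    Assumptions: constant tempo `bpm`, consecutive phones with no rests.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    seconds_per_beat = 60.0 / bpm
    periods = []
    cursor = start_s
    for beats in beats_per_phone:
        end = cursor + beats * seconds_per_beat
        periods.append((cursor, end))  # beat period of this phone
        cursor = end                   # the next phone starts where this one ends
    return periods
```

For example, at 120 beats per minute a phone covering one beat lasts 0.5 s, and two phones sharing the next beat last 0.25 s each.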
In a preferred embodiment of the present invention, as shown in Figure 6, step S4 further includes:
Step S41, compare the original period information of each basic phone with the beat period information:
If the original period information of the basic phone is greater than the beat period information, go to step S42;
If the original period information of the basic phone is less than the beat period information, go to step S43;
Step S42, perform interpolation processing in the time domain on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then go to step S44;
Step S43, perform decimation processing in the time domain on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then go to step S44;
Step S44, smooth the adjusted speech synthesis parameters, then go to step S5.
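The time-domain adjustment can be illustrated with a single resampling routine that lengthens a parameter track by interpolation or shortens it by decimation. The sketch follows the direction stated for the first processing module in claim 4 (interpolate when the original time span is shorter than the beat time span, decimate when it is longer). Linear interpolation, a frame-based parameter track and the names `resample_track` and `fit_to_beat` are assumptions made only for this example.

```python
from typing import List, Sequence, Tuple


def resample_track(values: Sequence[float], target_len: int) -> List[float]:
    """Linearly resample a parameter track to `target_len` frames.

    A larger `target_len` interpolates new frames; a smaller one decimates.
    """
    if not values or target_len < 1:
        raise ValueError("need a non-empty track and a positive target length")
    n = len(values)
    if n == 1 or target_len == 1:
        return [float(values[0])] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def fit_to_beat(values: Sequence[float],
                original_period: Tuple[float, float],
                beat_period: Tuple[float, float]) -> List[float]:
    """Stretch or shrink a phone's track so its duration matches the beat period."""
    original = original_period[1] - original_period[0]
    beat = beat_period[1] - beat_period[0]
    if original <= 0 or beat <= 0:
        raise ValueError("periods must have positive duration")
    target_len = max(1, round(len(values) * beat / original))
    return resample_track(values, target_len)
```

Using one routine for both directions keeps the frame rate fixed; only the number of frames per phone changes, and the smoothing of step S44 would then be applied to the result.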
In a preferred embodiment of the present invention, the musical score information of the target song further includes tune information for each note of the target song;
Then, as shown in Figure 7, step S5 further includes:
Step S51, use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52, form and output the synthesized voice simulating the user's singing according to the replaced speech synthesis parameters and the pronunciation text.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the embodiments or the scope of protection of the present invention. Those skilled in the art should appreciate that all schemes obtained through equivalent substitutions and obvious changes made using the description and drawings of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A speech synthesis system for simulating a user's singing voice, suitable for speech simulation applications, characterized in that it includes:
A first acquisition unit, configured to obtain the user speech of an externally input user speaking normally;
A first conversion unit, connected to the first acquisition unit and configured to convert the user speech into corresponding pronunciation text and to form, according to the pronunciation text, a corresponding phone sequence that includes basic phones;
A first processing unit, connected to the first conversion unit and configured to process the phone sequence to obtain original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit, connected to the first acquisition unit and the first processing unit respectively, and configured to obtain the speech synthesis parameters of each basic phone of the user speech by processing according to the fundamental frequency information of the user speech and the original period information of each basic phone;
A second acquisition unit, configured to obtain the musical score information of a preset target song;
A second processing unit, connected to the first processing unit and the second acquisition unit respectively, and configured to adjust the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
A second synthesis unit, connected to the first synthesis unit and the second processing unit respectively, and configured to compare, for each basic phone, the original period information with the beat period information, and to adjust the speech synthesis parameters of each basic phone accordingly, based on the comparison result;
A speech simulation unit, connected to the second synthesis unit, the second acquisition unit and the first conversion unit respectively, and configured to form and output a synthesized voice simulating the user's singing according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
2. The speech synthesis system as claimed in claim 1, characterized in that the first processing unit uses the Viterbi algorithm to obtain the original period information of each basic phone.
3. The speech synthesis system as claimed in claim 1, characterized in that the musical score information includes beat information of the corresponding target song, the beat information represents the time information of each beat in the corresponding target song, and one beat includes one or more basic phones;
The second processing unit then adjusts, according to the beat information, the original period information of each basic phone to the beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
4. The speech synthesis system as claimed in claim 1, characterized in that the second synthesis unit specifically includes:
A judgment module, configured to compare the original period information of each basic phone with the beat period information and to output the corresponding comparison result;
A first processing module, connected to the judgment module and configured, according to the comparison result:
When the comparison result indicates that the time span represented by the original period information is shorter than the time span represented by the beat period information, to perform interpolation processing in the time domain on the speech synthesis parameters corresponding to the basic phone, so as to obtain the adjusted speech synthesis parameters of the basic phone; and
When the comparison result indicates that the time span represented by the original period information is longer than the time span represented by the beat period information, to perform decimation processing in the time domain on the speech synthesis parameters corresponding to the basic phone, so as to obtain the adjusted speech synthesis parameters of the basic phone.
5. The speech synthesis system as claimed in claim 4, characterized in that the second synthesis unit further includes:
A second processing module, connected to the first processing module and configured to smooth the speech synthesis parameters after the speech synthesis parameters of the basic phones have been adjusted.
6. The speech synthesis system as claimed in claim 1, characterized in that the musical score information of the target song further includes tune information for each note of the target song;
The speech simulation unit includes:
A fundamental frequency replacement module, configured to use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
A speech simulation module, connected to the fundamental frequency replacement module and configured to form and output the synthesized voice simulating the user's singing according to the replaced speech synthesis parameters and the pronunciation text.
7. A speech synthesis method for simulating a user's singing voice, suitable for speech simulation applications, characterized in that the musical score information of a preset target song is obtained in advance, the method further including:
Step S1, obtain the user speech of an externally input user speaking normally, convert the user speech into corresponding pronunciation text, and form, according to the pronunciation text, a corresponding phone sequence that includes basic phones;
Step S2, process the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3, adjust the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
Step S4, for each basic phone, compare the original period information with the beat period information, and adjust the speech synthesis parameters of each basic phone accordingly, based on the comparison result;
Step S5, form and output a synthesized voice simulating the user's singing according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
8. The speech synthesis method as claimed in claim 7, characterized in that the musical score information includes beat information of the corresponding target song, the beat information represents the time information of each beat in the corresponding target song, and one beat includes one or more basic phones;
Then in step S3, according to the beat information, the original period information of each basic phone is adjusted to the beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
9. The speech synthesis method as claimed in claim 7, characterized in that step S4 specifically includes:
Step S41, compare the original period information of each basic phone with the beat period information:
If the original period information of the basic phone is greater than the beat period information, go to step S42;
If the original period information of the basic phone is less than the beat period information, go to step S43;
Step S42, perform interpolation processing in the time domain on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then go to step S44;
Step S43, perform decimation processing in the time domain on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then go to step S44;
Step S44, smooth the adjusted speech synthesis parameters, then go to step S5.
10. The speech synthesis method as claimed in claim 7, characterized in that the musical score information of the target song further includes tune information for each note of the target song;
Then step S5 specifically includes:
Step S51, use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52, form and output the synthesized voice simulating the user's singing according to the replaced speech synthesis parameters and the pronunciation text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711079095.0A CN108053814B (en) | 2017-11-06 | 2017-11-06 | Speech synthesis system and method for simulating singing voice of user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108053814A true CN108053814A (en) | 2018-05-18 |
CN108053814B CN108053814B (en) | 2023-10-13 |
Family
ID=62118922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711079095.0A Active CN108053814B (en) | 2017-11-06 | 2017-11-06 | Speech synthesis system and method for simulating singing voice of user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108053814B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR960013731B1 (en) * | 1993-05-10 | 1996-10-10 | 조길완 | Method and its apparatus of electronic auto sing scoring |
JPH10288993A (en) * | 1997-04-15 | 1998-10-27 | Daiichi Kosho:Kk | Karaoke sing-along machine with vocal mimicry function |
JPH1185177A (en) * | 1997-09-01 | 1999-03-30 | Taito Corp | Automatic singing imitation system for karaoke sing-along machine |
CN1719514A (en) * | 2004-07-06 | 2006-01-11 | 中国科学院自动化研究所 | Based on speech analysis and synthetic high-quality real-time change of voice method |
CN101588322A (en) * | 2009-06-18 | 2009-11-25 | 中山大学 | Mailbox system based on speech recognition |
CN101901598A (en) * | 2010-06-30 | 2010-12-01 | 北京捷通华声语音技术有限公司 | Humming synthesis method and system |
CN102024453A (en) * | 2009-09-09 | 2011-04-20 | 财团法人资讯工业策进会 | Singing sound synthesis system, method and device |
WO2012148112A2 (en) * | 2011-04-28 | 2012-11-01 | 주식회사 티젠스 | System for creating musical content using a client terminal |
CN103035235A (en) * | 2011-09-30 | 2013-04-10 | 西门子公司 | Method and device for transforming voice into melody |
CN103915093A (en) * | 2012-12-31 | 2014-07-09 | 安徽科大讯飞信息科技股份有限公司 | Method and device for realizing voice singing |
CN106328144A (en) * | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Telephone-network-based remote voice control system |
CN106652997A (en) * | 2016-12-29 | 2017-05-10 | 腾讯音乐娱乐(深圳)有限公司 | Audio synthesis method and terminal |
Non-Patent Citations (1)
Title |
---|
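Xiao Xi; Zhou Lu: "Adaptive clustering feature extraction algorithm for speech recognition based on k-means and normalized intra-class variance", Journal of Tsinghua University (Science and Technology), no. 08, pages 75 - 79 * |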
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108831437A (en) * | 2018-06-15 | 2018-11-16 | 百度在线网络技术(北京)有限公司 | A kind of song generation method, device, terminal and storage medium |
CN108877753A (en) * | 2018-06-15 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN108877753B (en) * | 2018-06-15 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN108831437B (en) * | 2018-06-15 | 2020-09-01 | 百度在线网络技术(北京)有限公司 | Singing voice generation method, singing voice generation device, terminal and storage medium |
US10971125B2 (en) | 2018-06-15 | 2021-04-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Music synthesis method, system, terminal and computer-readable storage medium |
CN111354332A (en) * | 2018-12-05 | 2020-06-30 | 北京嘀嘀无限科技发展有限公司 | Singing voice synthesis method and device |
WO2020248388A1 (en) * | 2019-06-11 | 2020-12-17 | 平安科技(深圳)有限公司 | Method and device for training singing voice synthesis model, computer apparatus, and storage medium |
CN110600034A (en) * | 2019-09-12 | 2019-12-20 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
CN110600034B (en) * | 2019-09-12 | 2021-12-03 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
CN110838286A (en) * | 2019-11-19 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Model training method, language identification method, device and equipment |
CN111681637A (en) * | 2020-04-28 | 2020-09-18 | 平安科技(深圳)有限公司 | Song synthesis method, device, equipment and storage medium |
CN111681637B (en) * | 2020-04-28 | 2024-03-22 | 平安科技(深圳)有限公司 | Song synthesis method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108053814B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108053814A (en) | A kind of speech synthesis system and method for analog subscriber song | |
CN101578659B (en) | Voice tone converting device and voice tone converting method | |
Carlson et al. | Experiments with voice modelling in speech synthesis | |
CN105654939B (en) | A kind of phoneme synthesizing method based on sound vector text feature | |
US8386256B2 (en) | Method, apparatus and computer program product for providing real glottal pulses in HMM-based text-to-speech synthesis | |
CN106971703A (en) | A kind of song synthetic method and device based on HMM | |
JP2002328695A (en) | Method for generating personalized voice from text | |
CN102436807A (en) | Method and system for automatically generating voice with stressed syllables | |
CN106057192A (en) | Real-time voice conversion method and apparatus | |
CN109326280B (en) | Singing synthesis method and device and electronic equipment | |
CN101901598A (en) | Humming synthesis method and system | |
CN111370024A (en) | Audio adjusting method, device and computer readable storage medium | |
Rabiner et al. | Computer synthesis of speech by concatenation of formant-coded words | |
JP2002244689A (en) | Synthesizing method for averaged voice and method for synthesizing arbitrary-speaker's voice from averaged voice | |
CN109036376A (en) | A kind of the south of Fujian Province language phoneme synthesizing method | |
CN113470622B (en) | Conversion method and device capable of converting any voice into multiple voices | |
Tamaru et al. | Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking | |
JP6330069B2 (en) | Multi-stream spectral representation for statistical parametric speech synthesis | |
Lee et al. | A comparative study of spectral transformation techniques for singing voice synthesis | |
US10643600B1 (en) | Modifying syllable durations for personalizing Chinese Mandarin TTS using small corpus | |
CN112242134A (en) | Speech synthesis method and device | |
CN112185343B (en) | Method and device for synthesizing singing voice and audio | |
Aso et al. | Speakbysinging: Converting singing voices to speaking voices while retaining voice timbre | |
CN113724684A (en) | Voice synthesis method and system for air traffic control instruction | |
Gutiérrez-Arriola et al. | A new multi-speaker formant synthesizer that applies voice conversion techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||