CN108053814A - Speech synthesis system and method for simulating a user's singing voice - Google Patents

Speech synthesis system and method for simulating a user's singing voice

Info

Publication number
CN108053814A
Authority
CN
China
Prior art keywords
information
basic phone
beat
period information
phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711079095.0A
Other languages
Chinese (zh)
Other versions
CN108053814B (en
Inventor
孟猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yutou Technology Hangzhou Co Ltd
Original Assignee
Yutou Technology Hangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yutou Technology Hangzhou Co Ltd
Priority to CN201711079095.0A
Publication of CN108053814A
Application granted
Publication of CN108053814B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0091 Means for obtaining special acoustic effects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Abstract

The invention discloses a speech synthesis system and method for simulating a user's singing voice, belonging to the technical field of voice simulation. The principle is as follows: externally input speech of a user speaking normally is obtained and converted into pronunciation text, and a phone sequence is formed from the pronunciation text; the original period information of each basic phone is obtained by processing the phone sequence; the original period information of each basic phone is adjusted to beat period information corresponding to the score information; for each basic phone, the original period information is compared with the beat period information, and the speech synthesis parameters of the basic phone are adjusted according to the comparison result; and a synthesized voice simulating the user's singing is formed from the adjusted speech synthesis parameters and the pronunciation text, and is output. The beneficial effects of this technical solution are that the user's singing can be simulated without modeling, the efficiency of voice simulation is improved, near-real-time feedback is achieved, and the user's timbre information is retained, so that the synthesized voice is rich in detail and realistic, which improves the user experience.

Description

Speech synthesis system and method for simulating a user's singing voice
Technical field
The present invention relates to the technical field of voice simulation, and in particular to a speech synthesis system and method for simulating a user's singing voice.
Background
With the continuous development of speech synthesis technology, more and more application software uses speech synthesis to simulate human speech, for example to simulate what a person says for the purpose of "parroting", or to simulate voices that differ from ordinary speaking scenarios, such as a person singing.
Specifically, in the prior art, the common way of simulating a user's singing is to generate the singing voice from the inherent timbre in a speech synthesis database, to model the user's timbre information, and then to use timbre conversion to impose the user's voice on the singing voice generated with the inherent timbre. The main drawbacks of this approach are:
1. The user's timbre information must be modeled in advance, which makes the speech synthesis process more complicated;
2. The user's voice must be converted according to the constructed model before the synthesized singing voice is obtained, so processing is slow and inefficient, and the singing voice cannot be processed and output in real time;
3. Synthesizing and simulating speech from the inherent timbre information in the synthesizer database cannot retain the timbre characteristics of the user, so the simulated voice sounds stiff and does not match the user's actual timbre.
Summary of the invention
In view of the above problems in the prior art, a technical solution of a speech synthesis system and method for simulating a user's singing voice is provided. It directly converts the voice of a user speaking normally into a singing voice with a given melody, with the aim of improving the efficiency of voice simulation, achieving near-real-time feedback of the singing to the user, and retaining the user's timbre information, so that the synthesized voice is rich in detail and realistic, which improves the user experience.
The above technical solution specifically includes:
A speech synthesis system for simulating a user's singing voice, suitable for a voice simulation application, comprising:
A first acquisition unit, for obtaining the externally input user speech of a user speaking normally;
A first conversion unit, connected to the first acquisition unit, for converting the user speech into corresponding pronunciation text and for forming, from the pronunciation text, a corresponding phone sequence made up of basic phones;
A first processing unit, connected to the first conversion unit, for processing the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit, connected to the first acquisition unit and the first processing unit respectively, for obtaining the speech synthesis parameters of each basic phone of the user speech from the fundamental frequency information of the user speech and the original period information of each basic phone;
A second acquisition unit, for obtaining the score information of a preset target song;
A second processing unit, connected to the first processing unit and the second acquisition unit respectively, for adjusting the original period information of each basic phone to beat period information corresponding to the score information, the beat period information representing the start and end times of each basic phone within its corresponding beats in the target song;
A second synthesis unit, connected to the first synthesis unit and the second processing unit respectively, for comparing, for each basic phone, the original period information with the beat period information, and for adjusting the speech synthesis parameters of each basic phone according to the comparison result;
A voice simulation unit, connected to the second synthesis unit, the second acquisition unit and the first conversion unit respectively, for forming and outputting a synthesized voice that simulates the user's singing from the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
Preferably, in the speech synthesis system, the first processing unit uses the Viterbi method to obtain the original period information of each basic phone.
Preferably, in the speech synthesis system, the score information includes beat information of the target song, the beat information representing the time information of each beat in the target song, and one beat includes one or more basic phones;
The second processing unit then adjusts, according to the beat information, the original period information of each basic phone to beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
Preferably, in the speech synthesis system, the second synthesis unit specifically includes:
A judgment module, for comparing the original period information of each basic phone with the beat period information and outputting a corresponding comparison result;
A first processing module, connected to the judgment module, for acting on the comparison result as follows:
When the comparison result indicates that the duration represented by the original period information is shorter than the duration represented by the beat period information, interpolation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone; and
When the comparison result indicates that the duration represented by the original period information is longer than the duration represented by the beat period information, decimation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone.
Preferably, in the speech synthesis system, the second synthesis unit further includes:
A second processing module, connected to the first processing module, for smoothing the speech synthesis parameters of the basic phones after they have been adjusted.
Preferably, in the speech synthesis system, the score information of the target song further includes tune information for each note of the target song;
The voice simulation unit includes:
A fundamental frequency replacement module, for using the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
A voice simulation module, connected to the fundamental frequency replacement module, for forming and outputting the synthesized voice that simulates the user's singing from the replaced speech synthesis parameters and the pronunciation text.
A speech synthesis method for simulating a user's singing voice, suitable for a voice simulation application, wherein the score information of a preset target song is obtained in advance, the method further comprising:
Step S1: obtaining the externally input user speech of a user speaking normally, converting the user speech into corresponding pronunciation text, and forming, from the pronunciation text, a corresponding phone sequence made up of basic phones;
Step S2: processing the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3: adjusting the original period information of each basic phone to beat period information corresponding to the score information, the beat period information representing the start and end times of each basic phone within its corresponding beats in the target song;
Step S4: comparing, for each basic phone, the original period information with the beat period information, and adjusting the speech synthesis parameters of each basic phone according to the comparison result;
Step S5: forming and outputting a synthesized voice that simulates the user's singing from the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
Preferably, in the speech synthesis method, the score information includes beat information of the target song, the beat information representing the time information of each beat in the target song, and one beat includes one or more basic phones;
Then, in step S3, the original period information of each basic phone is adjusted, according to the beat information, to the beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
Preferably, in the speech synthesis method, step S4 specifically includes:
Step S41: comparing the original period information of each basic phone with the beat period information:
If the duration represented by the original period information of the basic phone is shorter than that represented by the beat period information, proceed to step S42;
If the duration represented by the original period information of the basic phone is longer than that represented by the beat period information, proceed to step S43;
Step S42: performing interpolation in the time domain on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then proceeding to step S44;
Step S43: performing decimation in the time domain on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone, then proceeding to step S44;
Step S44: smoothing the adjusted speech synthesis parameters, then proceeding to step S5.
Preferably, in the speech synthesis method, the score information of the target song further includes tune information for each note of the target song;
Step S5 then specifically includes:
Step S51: using the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52: forming and outputting the synthesized voice that simulates the user's singing from the replaced speech synthesis parameters and the pronunciation text.
The beneficial effects of the above technical solution are:
1) A speech synthesis system for simulating a user's singing voice is provided, which can form a simulated singing voice of the user from the user's normal speech without any modeling. It improves the efficiency of voice simulation, achieves near-real-time feedback of the singing to the user, and retains the user's timbre information, so that the synthesized voice is rich in detail and realistic, which improves the user experience.
2) A speech synthesis method for simulating a user's singing voice is provided, which supports the normal operation of the above speech synthesis system.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of a speech synthesis system for simulating a user's singing voice in a preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of adjusting basic phones according to beat information in a preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the specific structure of the second synthesis unit, on the basis of Fig. 1, in a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of the specific structure of the voice simulation unit, on the basis of Fig. 1, in a preferred embodiment of the present invention;
Fig. 5 is a schematic overall flowchart of a speech synthesis method for simulating a user's singing voice in a preferred embodiment of the present invention;
Fig. 6 is a schematic flowchart of adjusting the speech synthesis parameters of the basic phones, on the basis of Fig. 5, in a preferred embodiment of the present invention;
Fig. 7 is a schematic flowchart of forming the synthesized voice, on the basis of Fig. 5, in a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The invention is further described below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
In view of the above problems in the prior art, a speech synthesis system for simulating a user's singing voice is provided. The system is suitable for a voice simulation application, that is, an application program that achieves a "parroting" effect by simulating a user's speaking or singing. Such applications are commonly found on mobile terminals or other terminal devices.
In a preferred embodiment of the present invention, the speech synthesis system for simulating a user's singing voice is as shown in Fig. 1 and includes:
A first acquisition unit 1, for obtaining the externally input user speech of a user speaking normally;
A first conversion unit 2, connected to the first acquisition unit 1, for converting the user speech into corresponding pronunciation text and for forming, from the pronunciation text, a corresponding phone sequence made up of basic phones;
A first processing unit 3, connected to the first conversion unit 2, for processing the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit 4, connected to the first acquisition unit 1 and the first processing unit 3 respectively, for obtaining the speech synthesis parameters of each basic phone of the user speech from the fundamental frequency information of the user speech and the original period information of each basic phone;
A second acquisition unit 5, for obtaining the score information of the preset target song that the simulated singing is to match;
A second processing unit 6, connected to the first processing unit 3 and the second acquisition unit 5 respectively, for adjusting the original period information of each basic phone to beat period information corresponding to the score information, the beat period information representing the start and end times of each basic phone within its corresponding beats in the target song;
A second synthesis unit 7, connected to the first synthesis unit 4 and the second processing unit 6 respectively, for comparing, for each basic phone, the original period information with the beat period information, and for adjusting the speech synthesis parameters of each basic phone according to the comparison result;
A voice simulation unit 8, connected to the second synthesis unit 7, the second acquisition unit 5 and the first conversion unit 2 respectively, for forming and outputting a synthesized voice that simulates the user's singing from the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
Specifically, in this embodiment, the first acquisition unit 1 obtains the user speech input while the user speaks normally; the user speech is obtained by capturing the sound the user makes when speaking normally. The first acquisition unit 1 may be connected to a sound pickup device, for example that of a mobile terminal. The pickup device captures the user speech while the user is speaking, the first acquisition unit 1 acquires it, and it is then sent to the first conversion unit 2. The first conversion unit 2 converts the user speech into corresponding pronunciation text using a speech-to-text method of the prior art, which is not described further here.
The first conversion unit 2 then converts the pronunciation text into a corresponding phone sequence, which contains the basic phones in time order. Each piece of text in the pronunciation text can be converted into its corresponding basic phones using a phone conversion table of the prior art. For example, Chinese pronunciation text can be converted using a Chinese phone (phonetic symbol) lookup table and English pronunciation text using an English phone (phonetic symbol) lookup table; this is not described further here.
After the phone sequence has been obtained, the Viterbi method may be applied to the phone sequence and the user speech to obtain the original period information of each basic phone in the user speech. The original period information represents the start and end times of the corresponding basic phone in the user speech, that is, the period that the basic phone occupies in the user speech. The original period information may therefore contain the start time and end time of the basic phone, and these two times together represent the period the basic phone occupies in the user speech. The first processing unit 3 uses this method to obtain the original period information of each basic phone.
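By way of illustration only, the alignment step described above can be sketched in Python as follows. The function name `forced_align`, the per-frame log-likelihood matrix `log_lik` (which in practice would come from an acoustic model) and the `frame_shift` value are assumptions introduced for this example; the original text only states that the Viterbi method may be used.

```python
from math import inf


def forced_align(log_lik, frame_shift=0.005):
    """Monotonic Viterbi alignment of T frames to N phones.

    log_lik[t][n] is a hypothetical log-likelihood of phone n at frame t.
    Returns one (start_sec, end_sec) pair per phone, in phone order.
    """
    T, N = len(log_lik), len(log_lik[0])
    if T < N:
        raise ValueError("need at least one frame per phone")
    score = [[-inf] * N for _ in range(T)]
    advanced = [[False] * N for _ in range(T)]  # True if we entered phone n at frame t
    score[0][0] = log_lik[0][0]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else -inf
            if move > stay:
                score[t][n] = move + log_lik[t][n]
                advanced[t][n] = True
            else:
                score[t][n] = stay + log_lik[t][n]
    # Backtrace from the last phone at the last frame.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = n
        if advanced[t][n]:
            n -= 1
    # Convert per-frame phone labels into start and end times.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((start * frame_shift, t * frame_shift))
            start = t
    return segments
```

Each returned pair plays the role of the original period information of one basic phone. The sketch allows only "stay in the current phone" or "move to the next phone" transitions, which is the simplest constraint that keeps the phones in their sequence order.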
In this embodiment, speech synthesis parameters are at the same time extracted from each part of the input user speech using parameter-based speech synthesis technology. The speech synthesis parameters include the fundamental frequency information of the user's voice and the other parameter information needed for speech synthesis, such as spectral envelope information, aperiodicity information, or the parameter information required by some other vocoder.
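A minimal sketch of how per-frame parameters might be grouped by basic phone is given below. The `Frame` structure, the `split_by_phone` helper and the 5 ms `frame_shift` are hypothetical names and values chosen for illustration; the original text does not prescribe a data layout or a particular vocoder.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    f0: float                  # fundamental frequency in Hz (0.0 for unvoiced)
    envelope: List[float]      # spectral envelope coefficients
    aperiodicity: List[float]  # aperiodicity coefficients


def split_by_phone(frames: List[Frame],
                   segments: List[Tuple[float, float]],
                   frame_shift: float = 0.005) -> List[List[Frame]]:
    """Group analysis frames by phone using (start_sec, end_sec) periods."""
    per_phone = []
    for start, end in segments:
        first = round(start / frame_shift)
        last = round(end / frame_shift)
        per_phone.append(frames[first:last])
    return per_phone
```

Here `segments` stands for the original period information of the basic phones, so each inner list holds the speech synthesis parameters of one basic phone.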
In this embodiment, the second acquisition unit 5 obtains the score information of a preset target song. Specifically, the target song is the song that the user wants the simulated singing to match. A number of songs can be preset in the voice simulation application; when the user is about to input user speech into the application, one of the preset songs is selected as the target song and used as the accompaniment. In other words, the score information acquired by the second acquisition unit 5 is preset in the application and needs no additional means of acquisition.
Then, in this embodiment, the second processing unit 6 adjusts the original period information of each basic phone to beat period information according to the score information, so that each basic phone fits the beat of the target song. The beat period information represents the start and end times of each basic phone within its corresponding beats in the target song.
Finally, in this embodiment, the second synthesis unit 7 adjusts the speech synthesis parameters of each basic phone according to its original period information and beat period information. The voice simulation unit 8 then simulates the user's singing from the adjusted speech synthesis parameters and the pronunciation text formed by the conversion above, forming and outputting the synthesized voice and thereby achieving the purpose of simulating the user's singing.
In a preferred embodiment of the present invention, the score information includes beat information of the target song, the beat information representing the time information of each beat in the target song, and one beat includes one or more basic phones;
The second processing unit 6 then adjusts, according to the beat information, the original period information of each basic phone to beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
Specifically, in this embodiment, a basic phone (that is, a pronunciation unit) in a target song may cover one beat or several beats, or only part of a beat. Conversely, one beat may contain one basic phone or several basic phones. The beat information represents the number of beats that each basic phone covers in the standard rendition of the target song. The second processing unit 6 can therefore use the beat information to adjust the original period information of each basic phone to beat period information that fits the target song. The beat period information represents the total beat time that each basic phone lasts, which can be expressed as a number of beats.
Specifically, as shown in Fig. 2, the first row shows the original period information of the basic phones produced by the first processing unit 3. Phone1 to Phone4 denote basic phones, and each square in Fig. 2 can be understood as one beat, or as an element of a beat. The second row shows the beat period information of the basic phones produced by the second processing unit 6. As can be seen from Fig. 2, after the adjustment each basic phone is aligned with the beats of the target song.
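The mapping from beat counts to beat period information can be illustrated with a short Python sketch. It assumes, purely for the example, that the score supplies a tempo in beats per minute (`bpm`) and a possibly fractional number of beats for each basic phone (`beats_per_phone`); neither name appears in the original text.

```python
def beat_periods(beats_per_phone, bpm, song_start=0.0):
    """Turn per-phone beat counts into contiguous (start_sec, end_sec) periods."""
    seconds_per_beat = 60.0 / bpm
    periods = []
    t = song_start
    for beats in beats_per_phone:
        end = t + beats * seconds_per_beat
        periods.append((t, end))
        t = end
    return periods


# Three phones covering 2 beats, half a beat and 1 beat at 120 beats per minute.
periods = beat_periods([2, 0.5, 1], bpm=120)
```

Each resulting pair corresponds to one entry in the second row of Fig. 2: the phone keeps its position in the sequence but takes the duration dictated by the score.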
In a preferred embodiment of the present invention, as shown in Fig. 3, the second synthesis unit 7 specifically includes:
A judgment module 71, for comparing the original period information of each basic phone with the beat period information and outputting a corresponding comparison result;
A first processing module 72, connected to the judgment module 71, for acting on the comparison result as follows:
When the comparison result indicates that the duration represented by the original period information is shorter than the duration represented by the beat period information, interpolation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone; and
When the comparison result indicates that the duration represented by the original period information is longer than the duration represented by the beat period information, decimation in the time domain is performed on the speech synthesis parameters of the basic phone to obtain the adjusted speech synthesis parameters of the basic phone. Decimation here means deleting the surplus time frames at substantially equal time intervals.
Specifically, in this embodiment, the second synthesis unit 7 may adjust the speech synthesis parameters of a basic phone as follows:
First, the judgment module 71 compares the original period information of the basic phone with its beat period information to determine how the period of the basic phone changes with the adjustment. The first processing module 72 then adjusts the speech synthesis parameters of the basic phone according to the result from the judgment module 71. Specifically:
When the duration of the original period information of the basic phone is shorter than the duration of the beat period information, that is, the time occupied by the basic phone becomes longer after the adjustment (Phone1 and Phone3 in Fig. 2), the first processing module 72 performs interpolation in the time domain on the speech synthesis parameters of the basic phone. The interpolation may simply duplicate some time frames so that they are reused, or it may use linear interpolation to obtain new speech synthesis parameter frames between adjacent time frames.
When the duration of the original period information of the basic phone is longer than the duration of the beat period information, that is, the time occupied by the basic phone becomes shorter after the adjustment (Phone2 in Fig. 2), the first processing module 72 performs decimation in the time domain on the speech synthesis parameters of the basic phone.
Naturally, when the duration of the original period information of the basic phone equals the duration of the beat period information, that is, the time occupied by the basic phone is unchanged by the adjustment (Phone4 in Fig. 2), the first processing module 72 makes no adjustment to the speech synthesis parameters of the basic phone.
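The three cases above can be sketched in Python as a single retiming function. The name `retime`, the representation of each time frame as a list of floats, and the choice of linear interpolation (rather than frame duplication, which the text also allows) are assumptions made for this example.

```python
def retime(frames, target_len):
    """Stretch or shrink a phone's parameter frames to target_len frames."""
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("frames and target_len must be non-empty and positive")
    if target_len == n:
        # Equal durations: parameters are left unchanged.
        return [list(f) for f in frames]
    if target_len < n:
        # Decimation: keep evenly spaced frames, dropping the surplus ones.
        return [list(frames[(i * n) // target_len]) for i in range(target_len)]
    if n == 1:
        # A single frame can only be lengthened by duplication.
        return [list(frames[0]) for _ in range(target_len)]
    # Linear interpolation between adjacent time frames.
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

In such a sketch `target_len` would be derived from the beat period information, for example as the beat duration divided by the analysis frame shift. Interpolating fundamental frequency values linearly across voiced and unvoiced frames is a simplification; a practical implementation would probably treat that parameter separately.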
In a preferred embodiment of the present invention, still as shown in Fig. 3, the second synthesis unit 7 further includes:
A second processing module 73, connected to the first processing module 72, for smoothing the speech synthesis parameters after the speech synthesis parameters of the basic phones have been adjusted; the smoothing may follow the process used in parametric synthesis based on hidden Markov models, in which smooth parameters are generated from difference-based dynamic features.
Specifically, in this embodiment, to ensure that the adjusted basic phones are smooth in time and that no basic phone sounds abrupt, the speech synthesis parameters of the basic phones must be smoothed after they have been adjusted, so that the speech synthesis parameters of all basic phones remain smooth over time.
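As a deliberately simplified stand-in for the smoothing step, the sketch below applies a centered moving average to one parameter track. This is not the hidden-Markov-model parameter generation procedure that the text refers to; the function name `smooth` and the `radius` parameter are assumptions used only to show the intended effect of removing abrupt jumps at phone boundaries.

```python
def smooth(track, radius=1):
    """Centered moving average over a one-dimensional parameter track."""
    n = len(track)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = track[lo:hi]
        out.append(sum(window) / len(window))
    return out
```

Applied to the concatenated frames of all basic phones, such a filter softens the discontinuities that interpolation and decimation can introduce where two phones meet.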
In a preferred embodiment of the present invention, the score information of the target song further includes tune information for each note of the target song;
Then, as shown in Fig. 4, the voice simulation unit 8 includes:
A fundamental frequency replacement module 81, for using the tune information to replace the fundamental frequency information of the original user voice in the speech synthesis parameters of each basic phone;
A voice simulation module 82, connected to the fundamental frequency replacement module 81, for forming and outputting the synthesized voice that simulates the user's singing from the replaced speech synthesis parameters and the pronunciation text.
Specifically, in the present embodiment, the music score of Chinese operas information that above-mentioned second acquisition unit 5 obtains includes representing that target is sung The tune information of each note in song, which is mainly used for representing the accuracy in pitch of note, using tune information to user Voice carries out the modification of rhythm and tone, so that the voice simulated caters to target song, is unlikely to the feelings for detonieren occur Condition.
Then above-mentioned fundamental frequency replacement module 81 replaces the base of user in script phonetic synthesis parameter using above-mentioned tune information Frequency information and the time span for accordingly adjusting each pronunciation unit in user speech, make the beat in its fit object music score of Chinese operas, but It is since other phonetic synthesis parameters of the timbre information of expression user do not make an amendment, the tone color of user does not change substantially Become, finally by speech simulation module 82 obtain be exactly user's script song.
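The fundamental frequency replacement can be sketched as follows. The example assumes that each frame stores its fundamental frequency separately from the timbre-related parameters, that notes are given as MIDI note numbers (converted with the standard equal-temperament formula, A4 = 440 Hz), and that unvoiced frames are marked with a frequency of 0 and left untouched. None of these representation choices is stated in the patent; `Frame`, `midi_to_hz` and `replace_f0` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    f0_hz: float                 # fundamental frequency; 0.0 marks unvoiced (assumed)
    spectrum: Tuple[float, ...]  # timbre-related parameters, never modified here


def midi_to_hz(note: int) -> float:
    """Equal-temperament pitch of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def replace_f0(frames: Sequence[Frame], midi_note: int) -> List[Frame]:
    """Overwrite the F0 of voiced frames with the note's pitch.

    Only f0_hz changes; the spectrum fields are copied as they are, which
    is why the timbre of the voice is preserved.
    """
    target = midi_to_hz(midi_note)
    return [f if f.f0_hz == 0.0 else replace(f, f0_hz=target) for f in frames]
```

Keeping the pitch and the spectral parameters in separate fields makes the key property of this step explicit: the melody comes from the score, while everything that characterizes the speaker is passed through unchanged.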
In summary, in the technical solution of the present invention, simulating the user's singing in the above manner yields a synthesized voice whose pronunciation duration and intonation match the user's singing, while its timbre remains the user's original timbre. This preserves the distinctiveness of the user's own timbre and also accomplishes the interaction of imitating the user's singing. The simulation is fast and can approach real time, which gives the user a distinctly different experience.
In a preferred embodiment of the present invention, based on the speech synthesis system described above, a speech synthesis method for simulating a user's singing voice is now provided; the method is likewise suitable for speech simulation applications.
In this method, the musical score information of the target song that the user is to sing is obtained in advance, and the following steps, shown in Figure 5, are performed:
Step S1: obtain the externally input speech of the user speaking normally, convert the user speech into a corresponding pronunciation text, and form from the pronunciation text a corresponding phone sequence made up of basic phones;
Step S2: process the phone sequence to obtain the original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3: adjust the original period information of each basic phone to the beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
Step S4: for each basic phone, compare the original period information with the beat period information, and adjust the speech synthesis parameters of the basic phone according to the comparison result;
Step S5: form and output a synthesized voice that simulates the user's singing, according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
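Step S2 derives the start and end times of each basic phone, and claim 2 names the Viterbi method for this purpose. The following is a minimal sketch of a Viterbi-style forced alignment under strong assumptions: a hypothetical acoustic model has already produced `scores[t][p]`, the log-likelihood of frame `t` under the `p`-th phone of the phone sequence, and every phone must occupy at least one frame. The function name `forced_align` and the score layout are illustrative only.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def forced_align(scores: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return one (start_frame, end_frame_exclusive) span per phone.

    Finds the monotonic left-to-right path through the phone sequence
    that maximizes the summed frame log-likelihoods.
    """
    n_frames, n_phones = len(scores), len(scores[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    best = [[NEG_INF] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the path must start in the first phone
    for t in range(1, n_frames):
        for p in range(min(n_phones, t + 1)):
            stay = best[t - 1][p]                          # remain in phone p
            move = best[t - 1][p - 1] if p > 0 else NEG_INF  # advance from p - 1
            if move > stay:
                best[t][p], back[t][p] = move + scores[t][p], p - 1
            else:
                best[t][p], back[t][p] = stay + scores[t][p], p
    # Backtrack from the last phone at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse the frame-level path into per-phone spans.
    spans, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            spans.append((start, t))
            start = t
    return spans
```

Multiplying the frame indices by the analysis frame shift (for instance 5 or 10 ms, depending on the front end) would turn these spans into the start and end times that the original period information represents.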
In a preferred embodiment of the present invention, the musical score information includes beat information of the corresponding target song; the beat information represents the time information of each beat in the target song, and one beat contains one or more basic phones;
Then, in step S3, the original period information of each basic phone is adjusted, according to the beat information, to beat period information that represents the time corresponding to the number of beats the basic phone covers in the target song.
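A small worked example may help make step S3 concrete. The sketch assumes a constant tempo given in beats per minute and a score that states how many beats (possibly a fraction of a beat) each basic phone covers; the patent only says that the beat information carries the time information of each beat, so the tempo-based computation and the name `beat_periods` are assumptions.

```python
from typing import List, Sequence, Tuple


def beat_periods(
    beats_per_phone: Sequence[float], bpm: float, start_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Compute (start, end) times in seconds for consecutive basic phones.

    beats_per_phone[i] is the number of beats covered by the i-th phone;
    a value below 1 means several phones share one beat.
    """
    beat_s = 60.0 / bpm  # duration of one beat at constant tempo
    periods = []
    t = start_s
    for beats in beats_per_phone:
        end = t + beats * beat_s
        periods.append((t, end))
        t = end
    return periods
```

At 120 beats per minute one beat lasts 0.5 s, so phones covering 1, 0.5 and 2 beats receive the periods 0.0 to 0.5 s, 0.5 to 0.75 s and 0.75 to 1.75 s. Comparing each of these durations with the phone's original duration then determines whether interpolation or decimation applies in step S4.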
In a preferred embodiment of the present invention, as shown in Figure 6, step S4 further comprises:
Step S41: compare the original period information of each basic phone with its beat period information:
If the original period information of the basic phone is less than the beat period information, go to step S42;
If the original period information of the basic phone is greater than the beat period information, go to step S43;
Step S42: perform time-domain interpolation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone, then go to step S44;
Step S43: perform time-domain decimation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone, then go to step S44;
Step S44: smooth the adjusted speech synthesis parameters, then go to step S5.
In a preferred embodiment of the present invention, the musical score information of the target song further includes tune information for each note of the target song;
Then, as shown in Figure 7, step S5 further comprises:
Step S51: use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52: form and output a synthesized voice that simulates the user's singing, according to the replaced speech synthesis parameters and the pronunciation text.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit its embodiments or scope of protection. Those skilled in the art should appreciate that all schemes obtained through equivalent substitutions and obvious variations made on the basis of the description and drawings of the present invention fall within the scope of protection of the present invention.

Claims (10)

1. A speech synthesis system for simulating a user's singing voice, suitable for speech simulation applications, characterized in that it comprises:
A first acquisition unit, for obtaining externally input user speech produced when the user speaks normally;
A first converting unit, connected to the first acquisition unit, for converting the user speech into a corresponding pronunciation text and forming, according to the pronunciation text, a corresponding phone sequence comprising basic phones;
A first processing unit, connected to the first converting unit, for processing the phone sequence to obtain original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
A first synthesis unit, connected to the first acquisition unit and the first processing unit respectively, for obtaining, according to the fundamental frequency information of the user speech and the original period information of each basic phone, the speech synthesis parameters of each basic phone of the user speech;
A second acquisition unit, for obtaining the musical score information of a preset target song;
A second processing unit, connected to the first processing unit and the second acquisition unit respectively, for adjusting the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
A second synthesis unit, connected to the first synthesis unit and the second processing unit respectively, for comparing, for each basic phone, the original period information with the beat period information, and adjusting the speech synthesis parameters of each basic phone according to the comparison result;
A speech simulation unit, connected to the second synthesis unit, the second acquisition unit and the first converting unit respectively, for forming and outputting a synthesized voice that simulates the user's singing, according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
2. The speech synthesis system of claim 1, characterized in that the first processing unit uses the Viterbi method to obtain the original period information of each basic phone.
3. The speech synthesis system of claim 1, characterized in that the musical score information includes beat information of the corresponding target song, the beat information represents the time information of each beat in the corresponding target song, and one beat contains one or more basic phones;
The second processing unit then adjusts, according to the beat information, the original period information of each basic phone to the beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
4. The speech synthesis system of claim 1, characterized in that the second synthesis unit specifically includes:
A judgment module, for comparing the original period information of each basic phone with the beat period information and outputting a corresponding comparison result;
A first processing module, connected to the judgment module, for acting according to the comparison result as follows:
When the comparison result indicates that the time span represented by the original period information is shorter than the time span represented by the beat period information, performing time-domain interpolation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone; and
When the comparison result indicates that the time span represented by the original period information is longer than the time span represented by the beat period information, performing time-domain decimation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone.
5. The speech synthesis system of claim 4, characterized in that the second synthesis unit further includes:
A second processing module, connected to the first processing module, for smoothing the speech synthesis parameters after the speech synthesis parameters of the basic phone have been adjusted.
6. The speech synthesis system of claim 1, characterized in that the musical score information of the target song further includes tune information for each note of the target song;
The speech simulation unit includes:
A fundamental frequency replacement module, for using the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
A speech simulation module, connected to the fundamental frequency replacement module, for forming and outputting the synthesized voice that simulates the user's singing, according to the replaced speech synthesis parameters and the pronunciation text.
7. A speech synthesis method for simulating a user's singing voice, suitable for speech simulation applications, characterized in that the musical score information of a preset target song is obtained in advance, and the method further comprises:
Step S1: obtain externally input user speech produced when the user speaks normally, convert the user speech into a corresponding pronunciation text, and form, according to the pronunciation text, a corresponding phone sequence comprising basic phones;
Step S2: process the phone sequence to obtain original period information corresponding to each basic phone, the original period information representing the start and end times of each basic phone in the user speech;
Step S3: adjust the original period information of each basic phone to beat period information corresponding to the musical score information, the beat period information representing the start and end times of each basic phone within its corresponding beat in the target song;
Step S4: for each basic phone, compare the original period information with the beat period information, and adjust the speech synthesis parameters of each basic phone according to the comparison result;
Step S5: form and output a synthesized voice that simulates the user's singing, according to the adjusted speech synthesis parameters of each basic phone and the pronunciation text.
8. The speech synthesis method of claim 7, characterized in that the musical score information includes beat information of the corresponding target song, the beat information represents the time information of each beat in the corresponding target song, and one beat contains one or more basic phones;
Then, in step S3, the original period information of each basic phone is adjusted, according to the beat information, to the beat period information representing the time corresponding to the number of beats that the basic phone covers in the target song.
9. The speech synthesis method of claim 7, characterized in that step S4 specifically includes:
Step S41: compare the original period information of each basic phone with the beat period information:
If the original period information of the basic phone is less than the beat period information, go to step S42;
If the original period information of the basic phone is greater than the beat period information, go to step S43;
Step S42: perform time-domain interpolation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone, then go to step S44;
Step S43: perform time-domain decimation on the speech synthesis parameters corresponding to the basic phone to obtain the adjusted speech synthesis parameters of that basic phone, then go to step S44;
Step S44: smooth the adjusted speech synthesis parameters, then go to step S5.
10. The speech synthesis method of claim 7, characterized in that the musical score information of the target song further includes tune information for each note of the target song;
Then step S5 specifically includes:
Step S51: use the tune information to replace the fundamental frequency information in the speech synthesis parameters of each basic phone;
Step S52: form and output the synthesized voice that simulates the user's singing, according to the replaced speech synthesis parameters and the pronunciation text.
CN201711079095.0A 2017-11-06 2017-11-06 Speech synthesis system and method for simulating singing voice of user Active CN108053814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711079095.0A CN108053814B (en) 2017-11-06 2017-11-06 Speech synthesis system and method for simulating singing voice of user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711079095.0A CN108053814B (en) 2017-11-06 2017-11-06 Speech synthesis system and method for simulating singing voice of user

Publications (2)

Publication Number Publication Date
CN108053814A true CN108053814A (en) 2018-05-18
CN108053814B CN108053814B (en) 2023-10-13

Family

ID=62118922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711079095.0A Active CN108053814B (en) 2017-11-06 2017-11-06 Speech synthesis system and method for simulating singing voice of user

Country Status (1)

Country Link
CN (1) CN108053814B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831437A (en) * 2018-06-15 2018-11-16 百度在线网络技术(北京)有限公司 A kind of song generation method, device, terminal and storage medium
CN108877753A (en) * 2018-06-15 2018-11-23 百度在线网络技术(北京)有限公司 Music synthesis method and system, terminal and computer readable storage medium
CN110600034A (en) * 2019-09-12 2019-12-20 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN110838286A (en) * 2019-11-19 2020-02-25 腾讯科技(深圳)有限公司 Model training method, language identification method, device and equipment
CN111354332A (en) * 2018-12-05 2020-06-30 北京嘀嘀无限科技发展有限公司 Singing voice synthesis method and device
CN111681637A (en) * 2020-04-28 2020-09-18 平安科技(深圳)有限公司 Song synthesis method, device, equipment and storage medium
WO2020248388A1 (en) * 2019-06-11 2020-12-17 平安科技(深圳)有限公司 Method and device for training singing voice synthesis model, computer apparatus, and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR960013731B1 (en) * 1993-05-10 1996-10-10 조길완 Method and its apparatus of electronic auto sing scoring
JPH10288993A (en) * 1997-04-15 1998-10-27 Daiichi Kosho:Kk Karaoke sing-along machine with vocal mimicry function
JPH1185177A (en) * 1997-09-01 1999-03-30 Taito Corp Automatic singing imitation system for karaoke sing-along machine
CN1719514A (en) * 2004-07-06 2006-01-11 中国科学院自动化研究所 Based on speech analysis and synthetic high-quality real-time change of voice method
CN101588322A (en) * 2009-06-18 2009-11-25 中山大学 Mailbox system based on speech recognition
CN101901598A (en) * 2010-06-30 2010-12-01 北京捷通华声语音技术有限公司 Humming synthesis method and system
CN102024453A (en) * 2009-09-09 2011-04-20 财团法人资讯工业策进会 Singing sound synthesis system, method and device
WO2012148112A2 (en) * 2011-04-28 2012-11-01 주식회사 티젠스 System for creating musical content using a client terminal
CN103035235A (en) * 2011-09-30 2013-04-10 西门子公司 Method and device for transforming voice into melody
CN103915093A (en) * 2012-12-31 2014-07-09 安徽科大讯飞信息科技股份有限公司 Method and device for realizing voice singing
CN106328144A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Telephone-network-based remote voice control system
CN106652997A (en) * 2016-12-29 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
肖熙;周路;: "基于k均值和基于归一化类内方差的语音识别自适应聚类特征提取算法", 清华大学学报(自然科学版), no. 08, pages 75 - 79 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831437A (en) * 2018-06-15 2018-11-16 百度在线网络技术(北京)有限公司 A kind of song generation method, device, terminal and storage medium
CN108877753A (en) * 2018-06-15 2018-11-23 百度在线网络技术(北京)有限公司 Music synthesis method and system, terminal and computer readable storage medium
CN108877753B (en) * 2018-06-15 2020-01-21 百度在线网络技术(北京)有限公司 Music synthesis method and system, terminal and computer readable storage medium
CN108831437B (en) * 2018-06-15 2020-09-01 百度在线网络技术(北京)有限公司 Singing voice generation method, singing voice generation device, terminal and storage medium
US10971125B2 (en) 2018-06-15 2021-04-06 Baidu Online Network Technology (Beijing) Co., Ltd. Music synthesis method, system, terminal and computer-readable storage medium
CN111354332A (en) * 2018-12-05 2020-06-30 北京嘀嘀无限科技发展有限公司 Singing voice synthesis method and device
WO2020248388A1 (en) * 2019-06-11 2020-12-17 平安科技(深圳)有限公司 Method and device for training singing voice synthesis model, computer apparatus, and storage medium
CN110600034A (en) * 2019-09-12 2019-12-20 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN110600034B (en) * 2019-09-12 2021-12-03 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN110838286A (en) * 2019-11-19 2020-02-25 腾讯科技(深圳)有限公司 Model training method, language identification method, device and equipment
CN111681637A (en) * 2020-04-28 2020-09-18 平安科技(深圳)有限公司 Song synthesis method, device, equipment and storage medium
CN111681637B (en) * 2020-04-28 2024-03-22 平安科技(深圳)有限公司 Song synthesis method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN108053814B (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant