CN101901598A - Humming synthesis method and system - Google Patents
- Publication number: CN101901598A
- Authority: CN (China)
- Legal status: Pending
Abstract
The invention provides a humming synthesis method and system. The method comprises the following steps: receiving a text input by a user; performing text analysis to obtain a syllable sequence corresponding to the text and the syllable name of each syllable in the sequence; for each syllable in the sequence, planning corresponding duration, fundamental-frequency (F0) and spectrum parameters from a statistical parameter model according to the syllable name and context; adjusting the planned duration and fundamental-frequency parameters according to a song template selected by the user and the number of syllables in the sequence, the song template storing per-syllable duration and fundamental-frequency parameters; interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters; and synthesizing, from the duration, fundamental-frequency and spectrum parameters of the syllables in the sequence, the speech data. The method and system can output speech data that carries the rhythm and melody of a song.
Description
Technical field
The present invention relates to the field of speech synthesis, and in particular to a humming synthesis method and system.
Background art
Speech synthesis, also known as text-to-speech (TTS) technology, converts arbitrary text into fluent, standard speech that is read aloud.
Existing speech synthesis methods first record a speech corpus and then build a synthesis system on top of it. The intonation and rhythm of the synthesized speech are determined by the corpus, so the output sounds like the recorded speaker talking.
In some entertainment applications, however, the user wants to control the intonation and rhythm of the synthesized speech, for example to have a text "sung" with the melody of a song.
In short, an urgent technical problem for those skilled in the art is how to synthesize speech that carries the intonation and rhythm of a song.
Summary of the invention
The technical problem addressed by the invention is to provide a humming synthesis method and system for outputting speech data that carries the rhythm and melody of a song.
To solve the above problem, the invention discloses a humming synthesis method, comprising:
receiving a text input by a user;
performing text analysis to obtain a syllable sequence corresponding to the text, together with the syllable name of each syllable in the sequence;
for each syllable in the sequence, planning corresponding duration, fundamental-frequency and spectrum parameters from a statistical parameter model according to its syllable name and context;
adjusting the planned duration and fundamental-frequency parameters according to a song template selected by the user and the number of syllables in the sequence, wherein the song template stores per-syllable duration and fundamental-frequency parameters;
interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters;
synthesizing, from the duration, fundamental-frequency and spectrum parameters of the syllables in the sequence, speech data corresponding to the syllable sequence.
Preferably, the step of adjusting the duration and fundamental-frequency parameters comprises:
obtaining the number of syllables in the syllable sequence;
extracting from the song template duration and fundamental-frequency parameters matching that number of syllables, and overwriting the planned duration and fundamental-frequency parameters with them.
Preferably, the text analysis step comprises:
performing word segmentation on the text;
converting numeric characters in the text into words;
performing prosody prediction on the converted text according to the segmentation result;
according to the prosody prediction result, converting the text into a syllable sequence and obtaining the syllable name of each syllable from a syllable mapping table.
Preferably, the song template is generated as follows:
for a song sample, extracting the duration and fundamental-frequency parameters of each syllable;
saving the extracted duration and fundamental-frequency parameters into the song template.
Preferably, the song sample comprises an a cappella (unaccompanied) song sample.
In another aspect, the invention also discloses a humming synthesis system, comprising:
an interface module for receiving the text input by a user;
a text analysis module for performing text analysis to obtain a syllable sequence corresponding to the text, together with the syllable name of each syllable in the sequence;
a parameter planning module for planning, for each syllable in the sequence, corresponding duration, fundamental-frequency and spectrum parameters from a statistical parameter model according to its syllable name and context;
a first parameter adjustment module for adjusting the planned duration and fundamental-frequency parameters according to a song template selected by the user and the number of syllables in the sequence, wherein the song template stores per-syllable duration and fundamental-frequency parameters;
a second parameter adjustment module for interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters;
a synthesis module for synthesizing, from the duration, fundamental-frequency and spectrum parameters of the syllables in the sequence, speech data corresponding to the syllable sequence.
Preferably, the first parameter adjustment module comprises:
an acquiring unit for obtaining the number of syllables in the syllable sequence;
an adjustment unit for extracting from the song template parameter information matching that number of syllables, overwriting the planned duration and fundamental-frequency parameters with it, and interpolating the spectrum parameters according to the planned duration.
Preferably, the text analysis module comprises:
a word segmentation unit for performing word segmentation on the text;
a numeric character conversion unit for converting numeric characters in the text into words;
a prosody prediction unit for performing prosody prediction on the converted text according to the segmentation result;
a syllable conversion unit for converting the text into a syllable sequence according to the prosody prediction result and obtaining the syllable name of each syllable from a syllable mapping table.
Preferably, the system further comprises a song template generation module, which comprises:
an extraction unit for extracting, from a song sample, the duration and fundamental-frequency parameters of each syllable;
a saving unit for saving the duration and fundamental-frequency parameters into the song template.
Preferably, the song sample comprises an a cappella (unaccompanied) song sample.
Compared with the prior art, the invention has the following advantages:
The song template stores duration and fundamental-frequency parameters per syllable, and templates can be named after features that identify the melody, such as the song title. The user can therefore pick a suitable template according to personal habits, application scenario and other actual needs; the planned duration and fundamental-frequency parameters are adjusted with the template, and speech data for the input text is finally produced by parametric synthesis. Among the speech parameters, duration and fundamental frequency jointly determine rhythm and melody, while the spectrum parameters determine timbre, i.e. the voice identity of the speaker. By combining the duration and fundamental-frequency parameters of the song template with the spectrum parameters of the corpus speaker, the invention obtains a humming speech stream whose timbre is that of the corpus speaker and whose intonation and rhythm follow the song's melody.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of a humming synthesis method according to the invention;
Fig. 2 is a block diagram of an embodiment of a humming synthesis system according to the invention.
Embodiment
For above-mentioned purpose of the present invention, feature and advantage can be become apparent more, the present invention is further detailed explanation below in conjunction with the drawings and specific embodiments.
One of core idea of the embodiment of the invention is, generate the song template based on duration parameters and base frequency parameters, and, when user input text, can adjust duration and base frequency parameters that planning obtains according to described song template, utilize compositor to obtain the speech data of described text then.Because in speech parameter, duration and base frequency parameters determine the information of rhythm, melody aspect jointly, spectrum parameter decision tone color information, i.e. the characteristic voice information of speaker; Thereby above-mentioned duration with the song template, base frequency parameters combine with the spectrum parameter of sound storehouse speaker, and can access tone color is that sound storehouse speaker, tone rhythm are song and the humming voice flow that has certain melody.
Referring to Fig. 1, a flowchart of an embodiment of a humming synthesis method according to the invention is shown, which may comprise:
Step 101: receiving the text input by the user.
The input text may contain both words and numeric characters; the words may be Chinese characters, Japanese, Korean, English and so on, or a combination of several of these (for example mixed Chinese and English). The invention is not limited to a particular kind of text; the examples below mainly use Chinese characters.
The text analysis step is described below using the example text "Beijing held the grand opening ceremony of the Olympic Games on 2008-8-8" (北京在2008-8-8举行了盛大的奥运会开幕式). It may comprise the following sub-steps:
Sub-step A1: performing word segmentation on the text.
Segmentation result: Beijing / at / 2008-8-8 / held / grand / Olympic Games / opening ceremony.
Sub-step A2: converting the numeric characters in the text into words.
In this example, "2008-8-8" is converted into "August 8, 2008" (二零零八年八月八日), giving the converted text "Beijing held the grand opening ceremony of the Olympic Games on August 8, 2008".
Sub-step A3: performing prosody prediction on the converted text according to the segmentation result.
Prosody prediction result: Beijing / on August 8, 2008 / held / the grand Olympic Games opening ceremony.
Sub-step A4: according to the prosody prediction result, converting the text into a syllable sequence and obtaining the syllable name of each syllable from a syllable mapping table.
Syllable sequence: bei3 jing1 zai4 er4 ling2 ling2 ba1 nian2 ba1 yue4 ba1 ri4 ju3 xing2 le5 sheng4 da4 de5 ao4 yun4 hui4 kai1 mu4 shi4
Here the digits 1 to 5 denote the tone: the first, second, third and fourth tone of Mandarin, and the neutral tone. In practice, the syllable name of a Chinese character is obtained by looking the character up in a character-to-syllable mapping table; "bei3" in the example above is such a syllable name.
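By way of illustration only (not part of the original disclosure), the lookup in sub-step A4 can be sketched as follows; the mapping-table fragment is hypothetical and covers just the example characters, whereas a real system needs a full dictionary and polyphone handling.

```python
# Hypothetical fragment of a character-to-syllable mapping table; a real
# system would cover all characters and disambiguate polyphonic ones.
SYLLABLE_TABLE = {
    "北": "bei3", "京": "jing1", "在": "zai4", "举": "ju3", "行": "xing2",
}

def text_to_syllables(text):
    """Return the syllable name (pinyin plus tone digit) of each character."""
    return [SYLLABLE_TABLE[ch] for ch in text if ch in SYLLABLE_TABLE]

print(text_to_syllables("北京在举行"))  # ['bei3', 'jing1', 'zai4', 'ju3', 'xing2']
```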
The context mainly refers to the position of the syllable, which may be sentence-initial, sentence-medial or sentence-final. In the example above, the context of "shi4" is sentence-final, while that of "er4" is sentence-medial.
In practice, the statistical parameter model can be obtained by offline training; it stores the parameters of each syllable under different contexts.
For example, a first statistical model is trained offline for the duration parameters, a second for the fundamental-frequency parameters, and a third for the spectrum parameters; during online planning, the duration, fundamental-frequency and spectrum parameters of a syllable can then be obtained directly from these three models.
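As an illustrative sketch (the table entries below are invented placeholders, not trained model output), the online planning step amounts to looking up each (syllable name, context) pair in the offline-trained models:

```python
# Toy stand-ins for two of the three offline-trained statistical models;
# a real model generalizes rather than memorizing (syllable, context) pairs.
DURATION_MODEL = {("bei3", "sentence-initial"): 220, ("shi4", "sentence-final"): 310}  # ms
F0_MODEL = {("bei3", "sentence-initial"): 230.0, ("shi4", "sentence-final"): 180.0}    # Hz

def plan_parameters(syllable, context):
    """Look up the planned duration and fundamental frequency for a syllable."""
    duration = DURATION_MODEL.get((syllable, context))
    f0 = F0_MODEL.get((syllable, context))
    return duration, f0

print(plan_parameters("bei3", "sentence-initial"))  # (220, 230.0)
```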
In practice, the song template can be built offline by the following steps:
for a song sample, extracting the duration and fundamental-frequency parameters of each syllable;
saving the extracted duration and fundamental-frequency parameters into the song template.
Since an ordinary song consists of a vocal part and an accompaniment, and the acoustic characteristics of musical instruments differ greatly from those of the human voice, extracting parameters from an accompanied recording introduces many errors; the invention therefore preferentially uses a cappella (unaccompanied) song samples.
Among the speech parameters, the duration parameter is the length of time each syllable is voiced and can be determined from the waveform file; the fundamental-frequency parameter is the vibration frequency of the sound wave, and can be obtained by first detecting the period of the waveform and then taking its reciprocal.
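The period-then-reciprocal extraction just described can be sketched with a plain autocorrelation period detector. This is a simplified illustration only; as the text notes, mature extraction tools are far more robust.

```python
import math

def estimate_f0(frame, sample_rate):
    """Estimate F0 by locating the waveform period with autocorrelation,
    then taking the reciprocal of that period (simplified sketch)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    # Search lags corresponding to a plausible voice range of 50-500 Hz.
    lo, hi = int(sample_rate / 500), int(sample_rate / 50)
    best_lag, best_corr = lo, float("-inf")
    for lag in range(lo, min(hi, n)):
        c = sum(x[i] * x[i + lag] for i in range(n - lag))
        if c > best_corr:
            best_corr, best_lag = c, lag
    return sample_rate / best_lag

sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(320)]  # 200 Hz tone
print(round(estimate_f0(frame, sr)))  # 200
```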
In specific implementations, mature tools can be used to extract the duration and fundamental-frequency parameters from the song sample automatically; the invention is not limited to a particular extraction method.
In general, one song template is generated per song sample; the sample may be a complete song or a fragment of one. For the user's convenience in selecting, each template can be named, for example after the song title: "Sometime in Winter", "The Moon Represents My Heart", "Story of Spring", and so on.
When the user inputs text, the system can present the offline-built song templates as options, and the user selects a suitable template according to personal habits, application scenario and other actual needs.
Specifically, step 104 can be realized by the following sub-steps:
Sub-step B1: obtaining the number of syllables in the syllable sequence.
Sub-step B2: extracting from the song template duration and fundamental-frequency parameters matching that number of syllables, and overwriting the planned duration and fundamental-frequency parameters with them.
Suppose the obtained syllable sequence contains N syllables and the song template contains M syllables, with M and N natural numbers. The adjustment then falls into two cases:
Case 1: M ≥ N.
The duration and fundamental-frequency parameters of the first N syllables are simply taken from the song template.
Case 2: M < N.
The duration and fundamental-frequency parameters of the template's M syllables are reused cyclically. If the template syllables are numbered 1, 2, ..., M and, say, N > 2M, the template syllable numbers corresponding to the finally obtained duration and fundamental-frequency parameters are 1, 2, ..., M, 1, 2, ..., M, 1, 2, ... until N entries have been produced.
Here, overwriting the planned duration and fundamental-frequency parameters means replacing the originally planned values with the values from the song template.
In practice, the overwrite may be performed immediately after each syllable's duration and fundamental-frequency parameters are extracted, before moving on to the next syllable; or the parameters of all N syllables may be extracted first and overwritten afterwards. The invention is not limited to a particular order of operations.
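The two cases above (truncate when M ≥ N, cycle when M < N) both reduce to indexing the template modulo M. A minimal sketch with invented parameter values:

```python
def adjust_from_template(template_durs, template_f0s, n_syllables):
    """Take the first N template entries; if the template has fewer than N
    syllables (M < N), cycle through it until N entries are produced."""
    m = len(template_durs)
    idx = [i % m for i in range(n_syllables)]
    return [template_durs[i] for i in idx], [template_f0s[i] for i in idx]

# Template with M = 3 syllables, text with N = 5 syllables (values invented).
durs, f0s = adjust_from_template([300, 400, 500], [220.0, 240.0, 260.0], 5)
print(durs)  # [300, 400, 500, 300, 400]
print(f0s)   # [220.0, 240.0, 260.0, 220.0, 240.0]
```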
A precondition for synthesis with the synthesizer is that the fundamental-frequency and spectrum parameters correspond one to one, i.e. each fundamental-frequency value must have exactly one spectrum frame. This step therefore adjusts the spectrum parameters so that they match the fundamental-frequency parameters obtained by the adjustment of step 104, ready for the subsequent synthesis.
The adjustment is illustrated below with a concrete example:
Suppose the duration planned for a syllable in step 103 is 400 ms and the parameters are sampled 1000 times per second, i.e. at a rate of 1000 Hz; the number of fundamental-frequency and spectrum frames is then 400.
Suppose further that, after step 104 adjusts the duration according to the user-selected song template and the number of syllables, the duration becomes 500 ms, i.e. there are now 500 fundamental-frequency values.
This step then interpolates the 400 spectrum frames of step 103 into 500 spectrum frames.
Many interpolation methods exist, for example linear or non-linear interpolation, two-point or multi-point interpolation; those skilled in the art may adopt any of them as needed, and the invention is not limited in this respect.
For example, with two-point linear interpolation the formula may be Qs = (a·Q1 + b·Q2 + u1)/(a + b), where Q1 and Q2 are the spectrum values of two known points 1 and 2 (which may be original points from step 103 or new points already produced in this step), a and b are natural numbers representing the weights that points 1 and 2 contribute to the point S to be interpolated, and 0 < u1 < a + b.
In summary, this step interpolates M spectrum frames into N frames, so that each spectrum frame corresponds to one fundamental-frequency value, where M is given by step 103, N by step 104, and M and N are natural numbers.
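The M-to-N adjustment can be sketched as linear resampling of the spectrum track, using the two-point linear interpolation the text mentions. For simplicity each frame is a single number here, whereas a real spectrum frame is a vector interpolated component-wise:

```python
def resample_spectrum(frames, n_target):
    """Linearly interpolate a per-frame parameter track from M frames to N,
    so that each adjusted F0 value gets exactly one spectrum frame."""
    m = len(frames)
    out = []
    for j in range(n_target):
        # Map target index j onto the source track's [0, m-1] range.
        pos = j * (m - 1) / (n_target - 1) if n_target > 1 else 0.0
        i = int(pos)
        frac = pos - i
        nxt = frames[min(i + 1, m - 1)]
        out.append(frames[i] * (1 - frac) + nxt * frac)
    return out

print(resample_spectrum([0.0, 1.0, 2.0], 5))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```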
Parametric synthesis is widely used in speech synthesis because of its strong controllability and the malleability of the resulting voice. In practice, an LPC (linear predictive coding) filter can be adopted as the synthesizer; the invention is not limited to a particular synthesizer.
Because the duration and fundamental-frequency parameters of the song template have been incorporated, the synthesized speech data has the same melody and rhythm as the song.
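An LPC synthesizer is, at its core, an all-pole filter driven by an excitation signal. The generic sketch below (invented coefficients; a simplified illustration, not the patent's specific synthesizer) shows the recursion y[n] = e[n] - Σ a[k]·y[n-k]:

```python
def allpole_filter(excitation, a):
    """Run an excitation signal through an all-pole (LPC synthesis) filter:
    y[n] = e[n] - sum_k a[k] * y[n - k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

# An impulse through a one-pole filter with a = [-0.5] decays by halves.
print(allpole_filter([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```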
Referring to Fig. 2, a block diagram of an embodiment of a humming synthesis system according to the invention is shown, which may comprise:
a first parameter adjustment module 204 for adjusting the planned duration and fundamental-frequency parameters according to the song template selected by the user and the number of syllables in the syllable sequence, wherein the song template stores per-syllable duration and fundamental-frequency parameters;
a second parameter adjustment module 205 for interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters.
In practice, the text analysis module 202 may further comprise:
a word segmentation unit C1 for performing word segmentation on the text;
a numeric character conversion unit C2 for converting numeric characters in the text into words;
a prosody prediction unit C3 for performing prosody prediction on the converted text according to the segmentation result;
a syllable conversion unit C4 for converting the text into a syllable sequence according to the prosody prediction result and obtaining the syllable name of each syllable from a syllable mapping table.
The song template can be built offline by a song template generation module, which may comprise:
an extraction unit D1 for extracting, from a song sample, the duration and fundamental-frequency parameters of each syllable;
a saving unit D2 for saving the duration and fundamental-frequency parameters, together with the corresponding sampling rate, into the song template.
Since an ordinary song consists of a vocal part and an accompaniment, and the acoustic characteristics of musical instruments differ greatly from those of the human voice, extracting parameters from an accompanied recording introduces many errors; a cappella (unaccompanied) song samples are therefore preferred.
When the user inputs text, the system can present the offline-built song templates as options, and the user selects a suitable template according to personal habits, application scenario and other actual needs.
Specifically, the first parameter adjustment module 204 may comprise the following units:
an acquiring unit E1 for obtaining the number of syllables in the syllable sequence;
an adjustment unit E2 for extracting from the song template duration and fundamental-frequency parameters matching that number of syllables and overwriting the planned duration and fundamental-frequency parameters with them.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for the parts they share, reference between embodiments suffices. Since the system embodiments are basically similar to the method embodiments, their description is brief; for details, refer to the description of the method embodiments.
The invention can be applied to various computer terminals and mobile digital devices, converting any text received or input by the system into a speech stream carrying the rhythm and melody of a song.
The humming synthesis method and system provided by the invention have been described in detail above. Specific examples have been used to explain the principles and embodiments of the invention; the above description is intended only to help understand the method of the invention and its core ideas. A person of ordinary skill in the art may, following the ideas of the invention, make changes to the specific embodiments and application scope. In summary, the contents of this specification should not be construed as limiting the invention.
Claims (10)
1. A humming synthesis method, characterized by comprising:
receiving a text input by a user;
performing text analysis to obtain a syllable sequence corresponding to the text, together with the syllable name of each syllable in the sequence;
for each syllable in the sequence, planning corresponding duration, fundamental-frequency and spectrum parameters from a statistical parameter model according to its syllable name and context;
adjusting the planned duration and fundamental-frequency parameters according to a song template selected by the user and the number of syllables in the sequence, wherein the song template stores per-syllable duration and fundamental-frequency parameters;
interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters;
synthesizing, from the duration, fundamental-frequency and spectrum parameters of the syllables in the sequence, speech data corresponding to the syllable sequence.
2. The method of claim 1, characterized in that the step of adjusting the duration and fundamental-frequency parameters comprises:
obtaining the number of syllables in the syllable sequence;
extracting from the song template duration and fundamental-frequency parameters matching that number of syllables, and overwriting the planned duration and fundamental-frequency parameters with them.
3. The method of claim 1, characterized in that the text analysis step comprises:
performing word segmentation on the text;
converting numeric characters in the text into words;
performing prosody prediction on the converted text according to the segmentation result;
according to the prosody prediction result, converting the text into a syllable sequence and obtaining the syllable name of each syllable from a syllable mapping table.
4. The method of claim 1, characterized in that the song template is generated as follows:
for a song sample, extracting the duration and fundamental-frequency parameters of each syllable;
saving the extracted duration and fundamental-frequency parameters into the song template.
5. The method of claim 4, characterized in that the song sample comprises an a cappella (unaccompanied) song sample.
6. A humming synthesis system, characterized by comprising:
an interface module for receiving the text input by a user;
a text analysis module for performing text analysis to obtain a syllable sequence corresponding to the text, together with the syllable name of each syllable in the sequence;
a parameter planning module for planning, for each syllable in the sequence, corresponding duration, fundamental-frequency and spectrum parameters from a statistical parameter model according to its syllable name and context;
a first parameter adjustment module for adjusting the planned duration and fundamental-frequency parameters according to a song template selected by the user and the number of syllables in the sequence, wherein the song template stores per-syllable duration and fundamental-frequency parameters;
a second parameter adjustment module for interpolating the spectrum parameters of the corresponding syllables according to the adjusted duration parameters;
a synthesis module for synthesizing, from the duration, fundamental-frequency and spectrum parameters of the syllables in the sequence, speech data corresponding to the syllable sequence.
7. The system of claim 6, characterized in that the first parameter adjustment module comprises:
an acquiring unit for obtaining the number of syllables in the syllable sequence;
an adjustment unit for extracting from the song template parameter information matching that number of syllables, overwriting the planned duration and fundamental-frequency parameters with it, and interpolating the spectrum parameters according to the planned duration.
8. The system of claim 6, characterized in that the text analysis module comprises:
a word segmentation unit for performing word segmentation on the text;
a numeric character conversion unit for converting numeric characters in the text into words;
a prosody prediction unit for performing prosody prediction on the converted text according to the segmentation result;
a syllable conversion unit for converting the text into a syllable sequence according to the prosody prediction result and obtaining the syllable name of each syllable from a syllable mapping table.
9. The system of claim 6, characterized by further comprising a song template generation module, which comprises:
an extraction unit for extracting, from a song sample, the duration and fundamental-frequency parameters of each syllable;
a saving unit for saving the duration and fundamental-frequency parameters into the song template.
10. The system of claim 9, characterized in that the song sample comprises an a cappella (unaccompanied) song sample.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010102234975A | 2010-06-30 | 2010-06-30 | Humming synthesis method and system |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN101901598A (en) | 2010-12-01 |

Family ID: 43227091
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2010102234975A (pending) | Humming synthesis method and system | 2010-06-30 | 2010-06-30 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN101901598A (en) |
Cited By (13)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN103456295A | 2013-08-05 | 2013-12-18 | Method and system for generating fundamental frequency parameters in singing synthesis |
| CN103915093A | 2012-12-31 | 2014-07-09 | Method and device for realizing voice singing |
| CN104391980A | 2014-12-08 | 2015-03-04 | Song generating method and device |
| CN105740394A | 2016-01-27 | 2016-07-06 | Music generation method, terminal, and server |
| CN105788589A | 2016-05-04 | 2016-07-20 | Audio data processing method and device |
| CN107705782A | 2017-09-29 | 2018-02-16 | Method and apparatus for determining phoneme pronunciation duration |
| CN107749301A | 2017-09-18 | 2018-03-02 | Timbre sample reconstruction method and system, storage medium and terminal device |
| CN108053814A | 2017-11-06 | 2018-05-18 | Speech synthesis system and method for simulating a user's singing |
| CN108257609A | 2017-12-05 | 2018-07-06 | Audio content correction method and intelligent device therefor |
| CN108877753A | 2018-06-15 | 2018-11-23 | Music synthesis method and system, terminal and computer readable storage medium |
| CN109801618A | 2017-11-16 | 2019-05-24 | Audio information generation method and device |
| CN110047462A | 2019-01-31 | 2019-07-23 | Speech synthesis method and device, and electronic equipment |
| CN112270917A | 2020-10-20 | 2021-01-26 | Voice synthesis method and device, electronic equipment and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11184490A (en) * | 1997-12-25 | 1999-07-09 | Nippon Telegr & Teleph Corp <Ntt> | Singing synthesizing method by rule voice synthesis |
CN1661674A (en) * | 2004-01-23 | 2005-08-31 | 雅马哈株式会社 | Singing generator and portable communication terminal having singing generation function |
EP1256932B1 (en) * | 2001-05-11 | 2006-05-10 | Sony France S.A. | Method and apparatus for synthesising an emotion conveyed on a sound |
CN101308652A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | Synthesizing method of personalized singing voice |
WO2010031437A1 (en) * | 2008-09-19 | 2010-03-25 | Asociacion Centro De Tecnologias De Interaccion Visual Y Comunicaciones Vicomtech | Method and system of voice conversion |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103915093A (en) * | 2012-12-31 | 2014-07-09 | 安徽科大讯飞信息科技股份有限公司 | Method and device for realizing voice singing |
CN103915093B (en) * | 2012-12-31 | 2019-07-30 | 科大讯飞股份有限公司 | A kind of method and apparatus for realizing singing of voice |
CN103456295B (en) * | 2013-08-05 | 2016-05-18 | 科大讯飞股份有限公司 | Sing synthetic middle base frequency parameters and generate method and system |
CN103456295A (en) * | 2013-08-05 | 2013-12-18 | 安徽科大讯飞信息科技股份有限公司 | Method and system for generating fundamental frequency parameters in singing synthesis |
CN104391980B (en) * | 2014-12-08 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | The method and apparatus for generating song |
CN104391980A (en) * | 2014-12-08 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Song generating method and device |
CN105740394A (en) * | 2016-01-27 | 2016-07-06 | 广州酷狗计算机科技有限公司 | Music generation method, terminal, and server |
WO2017190674A1 (en) * | 2016-05-04 | 2017-11-09 | 腾讯科技(深圳)有限公司 | Method and device for processing audio data, and computer storage medium |
CN105788589B (en) * | 2016-05-04 | 2021-07-06 | 腾讯科技(深圳)有限公司 | Audio data processing method and device |
CN105788589A (en) * | 2016-05-04 | 2016-07-20 | 腾讯科技(深圳)有限公司 | Audio data processing method and device |
US10789290B2 (en) | 2016-05-04 | 2020-09-29 | Tencent Technology (Shenzhen) Company Limited | Audio data processing method and apparatus, and computer storage medium |
CN107749301B (en) * | 2017-09-18 | 2021-03-09 | 得理电子(上海)有限公司 | Tone sample reconstruction method and system, storage medium and terminal device |
CN107749301A (en) * | 2017-09-18 | 2018-03-02 | 得理电子(上海)有限公司 | A kind of tone color sample reconstructing method and system, storage medium and terminal device |
CN107705782A (en) * | 2017-09-29 | 2018-02-16 | 百度在线网络技术(北京)有限公司 | Method and apparatus for determining phoneme pronunciation duration |
CN108053814A (en) * | 2017-11-06 | 2018-05-18 | 芋头科技(杭州)有限公司 | A kind of speech synthesis system and method for analog subscriber song |
CN108053814B (en) * | 2017-11-06 | 2023-10-13 | 芋头科技(杭州)有限公司 | Speech synthesis system and method for simulating singing voice of user |
CN109801618A (en) * | 2017-11-16 | 2019-05-24 | 深圳市腾讯计算机系统有限公司 | A kind of generation method and device of audio-frequency information |
CN108257609A (en) * | 2017-12-05 | 2018-07-06 | 北京小唱科技有限公司 | The modified method of audio content and its intelligent apparatus |
CN108877753B (en) * | 2018-06-15 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
US10971125B2 (en) | 2018-06-15 | 2021-04-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Music synthesis method, system, terminal and computer-readable storage medium |
CN108877753A (en) * | 2018-06-15 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN110047462A (en) * | 2019-01-31 | 2019-07-23 | 北京捷通华声科技股份有限公司 | A kind of phoneme synthesizing method, device and electronic equipment |
CN110047462B (en) * | 2019-01-31 | 2021-08-13 | 北京捷通华声科技股份有限公司 | Voice synthesis method and device and electronic equipment |
CN112270917A (en) * | 2020-10-20 | 2021-01-26 | 网易(杭州)网络有限公司 | Voice synthesis method and device, electronic equipment and readable storage medium |
CN112270917B (en) * | 2020-10-20 | 2024-06-04 | 网易(杭州)网络有限公司 | Speech synthesis method, device, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101901598A (en) | Humming synthesis method and system | |
US20220013106A1 (en) | Multi-speaker neural text-to-speech synthesis | |
KR101120710B1 (en) | Front-end architecture for a multilingual text-to-speech system | |
KR20220004737A (en) | Multilingual speech synthesis and cross-language speech replication | |
EP2704092A2 (en) | System for creating musical content using a client terminal | |
Latorre et al. | New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer | |
CN101578659A (en) | Voice tone converting device and voice tone converting method | |
Santra et al. | Development of GUI for text-to-speech recognition using natural language processing | |
JP4829477B2 (en) | Voice quality conversion device, voice quality conversion method, and voice quality conversion program | |
JP2006517037A (en) | Prosodic simulated word synthesis method and apparatus | |
CN102543081A (en) | Controllable rhythm re-estimation system and method and computer program product | |
CN112102811B (en) | Optimization method and device for synthesized voice and electronic equipment | |
CN112037755B (en) | Voice synthesis method and device based on timbre clone and electronic equipment | |
Macon et al. | Concatenation-based midi-to-singing voice synthesis | |
KR20200069264A (en) | System for outputing User-Customizable voice and Driving Method thereof | |
JP2014062970A (en) | Voice synthesis, device, and program | |
JP2001034280A (en) | Electronic mail receiving device and electronic mail system | |
CN112242134A (en) | Speech synthesis method and device | |
CN115762471A (en) | Voice synthesis method, device, equipment and storage medium | |
CN113539236B (en) | Speech synthesis method and device | |
CN113724684A (en) | Voice synthesis method and system for air traffic control instruction | |
Li et al. | A lyrics to singing voice synthesis system with variable timbre | |
CN1979636B (en) | Method for converting phonetic symbol to speech | |
JP2021148942A (en) | Voice quality conversion system and voice quality conversion method | |
Kiruthiga et al. | Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20101201 |