CN108831437A - Song generation method, apparatus, terminal, and storage medium - Google Patents
Song generation method, apparatus, terminal, and storage medium
- Publication number
- CN108831437A (application number CN201810622548.8A)
- Authority
- CN
- China
- Prior art keywords
- voice signal
- acoustic feature
- information
- feature information
- song
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING › G10L13/00—Speech synthesis; text-to-speech systems › G10L13/02—Methods for producing synthetic speech; speech synthesisers
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10H—ELECTROPHONIC MUSICAL INSTRUMENTS › G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10H—ELECTROPHONIC MUSICAL INSTRUMENTS › G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing › G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use › G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Abstract
Embodiments of the invention disclose a song generation method, apparatus, terminal, and storage medium. The song generation method includes: acquiring a voice signal, corresponding to a song, recorded by a user; obtaining the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores the standard acoustic feature information of at least one song; and storing or outputting the voice signal with the updated acoustic feature information as a target voice signal. The embodiments overcome the drawbacks of existing approaches, which train an acoustic model on large amounts of data to convert speech into song and produce songs that do not contain the user's own voice, leading to low user participation and a poor experience. Without any acoustic model training, a user's speech is converted into a song that retains the user's own voice.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a song generation method, apparatus, terminal, and storage medium.
Background
Speech-to-singing conversion transforms a user's spoken voice into a corresponding song. Internet products of this kind convert the user's speech into singing, combine it with accompaniment music, and synthesize a work "sung" by the user, offering entertainment value, social value, and a certain commercial value.
Prior-art speech-to-song schemes mainly work as follows. In a model training stage, the text data (lyrics, etc.) of multiple songs by a professional singer A and the acoustic features of singer A's renditions of those songs are used to train an acoustic model of singer A. In a song generation stage, voice data of a user B singing or reading a song is acquired; the lyrics of the song are recognized from that voice data and the acoustic features of user B are extracted. The recognized lyrics are fed into singer A's acoustic model to obtain predicted acoustic features, whose fundamental frequency and duration are then replaced with the fundamental frequency and duration from user B's acoustic features, yielding modified acoustic features that contain user B's fundamental frequency and duration but singer A's spectrum. Synthesizing from these modified acoustic features with parametric statistical methods or sound-library concatenation methods produces a song with singer A's voice characteristics but user B's pitch and rhythm — in effect, singer A imitating user B's singing.
Such schemes generally require acoustic model training, demand large amounts of sample data, are complex to implement, and incur a loss in sound quality. Moreover, a song synthesized this way carries the singer's voice characteristics rather than the user's, so user participation and experience are poor.
Summary of the invention
Embodiments of the present invention provide a song generation method, apparatus, terminal, and storage medium, so that a user's speech can be converted into a song that retains the user's own voice without any acoustic model training.
In a first aspect, an embodiment of the present invention provides a song generation method, the method including:
acquiring a voice signal, corresponding to a song, recorded by a user;
obtaining the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song; and
storing or outputting the voice signal with the updated acoustic feature information as a target voice signal.
In a second aspect, an embodiment of the present invention further provides a song generation apparatus, the apparatus including:
a voice signal acquisition module, configured to acquire a voice signal, corresponding to a song, recorded by a user;
an acoustic feature information update module, configured to obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song; and
a target voice signal determination module, configured to store or output the voice signal with the updated acoustic feature information as a target voice signal.
In a third aspect, an embodiment of the present invention further provides a song generation terminal, the terminal including:
one or more processors; and
a storage device configured to store one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the song generation method described in the first aspect above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, the program implementing, when executed by a processor, the song generation method described in the first aspect above.
By acquiring a voice signal, corresponding to a song, recorded by a user, obtaining the standard acoustic feature information of that song from a pre-established acoustic feature template (which stores the standard acoustic feature information of at least one song), updating the acoustic feature information of the voice signal accordingly, and storing or outputting the voice signal with the updated acoustic feature information as a target voice signal, the embodiments of the present invention overcome the prior-art problems of training an acoustic model on large amounts of data to convert speech into song and of producing songs that do not contain the user's own voice, which lead to low user participation and a poor experience. Without any acoustic model training, the user's speech is converted into a song that retains the user's own voice, while good sound quality is also ensured.
Brief description of the drawings
Fig. 1 is a flowchart of the song generation method in embodiment one of the present invention;
Fig. 2 is a flowchart of the song generation method in embodiment two of the present invention;
Fig. 3 is a flowchart of the song generation method in embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of the song generation apparatus in embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of the song generation terminal in embodiment five of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a song generation method provided by embodiment one of the present invention. This embodiment is applicable to the case of converting a user's speech into a song. The method can be executed by a song generation apparatus, which can be implemented in software and/or hardware and can generally be integrated in a song generation terminal. As shown in Fig. 1, the method of this embodiment specifically includes:
S110. Acquire a voice signal, corresponding to a song, recorded by a user.
Here, the recorded voice signal takes the content of a specific song as its object and is produced by the user reading aloud or singing. The voice signal can carry various information, such as the lyric information and acoustic feature information of the specific song; the acoustic feature information includes fundamental frequency information reflecting pitch, energy information reflecting volume, duration information reflecting rhythm, and the like. From the acoustic feature information one can judge the gap between the level at which the user reads or sings the song and the professional standard at which a professional singer performs it.
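The three feature types named above (pitch, volume, rhythm) can be illustrated with a minimal extractor. This is a sketch, not part of the patent: the function name is invented, and the zero-crossing count is only a crude stand-in for real fundamental frequency estimation.

```python
import math

def extract_features(samples, sample_rate):
    """Crude per-segment acoustic features: duration (rhythm), mean-square
    energy (volume), and a zero-crossing pitch estimate (a rough stand-in
    for proper fundamental frequency extraction)."""
    duration = len(samples) / sample_rate
    energy = sum(s * s for s in samples) / len(samples)
    # A near-periodic tone crosses zero roughly twice per period.
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    f0 = crossings / (2.0 * duration)
    return {"f0": f0, "duration": duration, "energy": energy}

# One second of a 440 Hz sine sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
features = extract_features(tone, 8000)
```

For real recordings an autocorrelation- or cepstrum-based pitch tracker would replace the zero-crossing count, which is reliable only for clean, near-periodic segments.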
Preferably, the user can send the song generation terminal a request to record a voice signal corresponding to a song; after receiving the request, the terminal can acquire the user's recording, for example by turning on a microphone. The song generation terminal can be a standalone hardware device, such as a smart speaker or an interactive robot, or it can be a client installed on a terminal such as a mobile phone, notebook computer, or smart TV.
S120. Obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and update the acoustic feature information of the voice signal according to the standard acoustic feature information.
The acoustic feature template is obtained by extracting the acoustic feature information of at least one song recorded by a professional singer, and stores the standard acoustic feature information of at least one song. In this embodiment, after the voice signal corresponding to a specific song is acquired from the user, in order to update the acoustic feature information of that voice signal, the standard acoustic feature information corresponding to the song is preferably obtained from the pre-established acoustic feature template, and the acoustic feature information of the voice signal is updated according to it.
For example, a user who wants a song that carries both his or her own voice characteristics and a professional singer's acoustic features can sing song A into the song generation terminal. To convert the acoustic features of the user's rendition of song A into those of the professional singer, the acoustic feature template pre-saved in the terminal is used. Specifically, which song the recorded voice signal corresponds to can be determined from the lyrics of song A or from the user's selection; once the song is determined, its standard acoustic feature information is obtained from the template and used to update the acoustic feature information of the user's voice signal.
S130. Store or output the voice signal with the updated acoustic feature information as a target voice signal.
The voice signal with the updated acoustic feature information carries both the professional singer's standard acoustic feature information and the user's own voice characteristics; it is therefore preferably saved or output as the target voice signal.
In the song generation method provided by this embodiment, a voice signal, corresponding to a song, recorded by a user is acquired; the standard acoustic feature information of the song is obtained from a pre-established acoustic feature template, which stores the standard acoustic feature information of at least one song; the acoustic feature information of the voice signal is updated accordingly; and the voice signal with the updated acoustic feature information is stored or output as a target voice signal. This overcomes the prior-art problems of training an acoustic model on large amounts of data to convert speech into song and of producing songs that do not contain the user's own voice, which lead to low user participation and a poor experience. Without any acoustic model training, the user's speech is converted into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, before acquiring the voice signal, corresponding to a song, recorded by the user, the method further includes:
extracting the acoustic feature information of each of a plurality of recorded songs as the standard acoustic feature information of the corresponding song; and
saving the identification information of the plurality of songs together with the corresponding standard acoustic feature information in the acoustic feature template.
In this embodiment, the acoustic feature template is obtained from songs recorded in advance by a professional singer. Specifically, after the professional singer's recordings are obtained, the acoustic feature information of each song is extracted. Since each song corresponding to the extracted acoustic feature information was recorded by a professional singer, each extracted set of acoustic feature information can serve as the standard acoustic feature information of the corresponding song.
If only the extracted standard acoustic feature information were saved in the acoustic feature template, there would be no basis for retrieving the standard acoustic feature information corresponding to a specific song. Therefore, while each set of standard acoustic feature information is extracted, the identification information of the corresponding song is also obtained, and the identification information of each song is saved in the acoustic feature template together with the corresponding standard acoustic feature information. The identification information of a song includes its title, its lyrics, its title plus the professional singer's name, and the like. The song generation terminal can obtain the identification information of the song corresponding to the user's voice signal either by receiving the user's input or by extracting it from the acquired voice signal.
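The template construction described above can be sketched as a simple mapping from a song's identification information to its per-word standard features. All names and the tuple layout here are illustrative assumptions, not the patent's data format.

```python
def build_template(recordings):
    """Build an acoustic feature template from a professional singer's
    recordings.  `recordings` maps a song's identification info (e.g. its
    title) to a list of per-word features; each list becomes that song's
    standard acoustic feature information."""
    template = {}
    for song_id, per_word_features in recordings.items():
        # Each word carries (fundamental_frequency_hz, duration_s, energy).
        template[song_id] = [
            {"f0": f0, "duration": dur, "energy": en}
            for (f0, dur, en) in per_word_features
        ]
    return template

def lookup(template, song_id):
    """Retrieve one song's standard acoustic feature information;
    returns None when the song is not in the template."""
    return template.get(song_id)

# A two-word toy song recorded by the professional singer.
template = build_template({"Song A": [(440.0, 0.5, 0.3), (494.0, 0.4, 0.25)]})
```

A lookup miss (e.g. `lookup(template, "Song B")`) is the case the description warns about: without identification information stored alongside the features, retrieval would have no basis.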
Embodiment two
Fig. 2 is a flowchart of a song generation method provided by embodiment two of the present invention. On the basis of the above embodiments, updating the acoustic feature information of the voice signal according to the standard acoustic feature information optionally includes: obtaining the duration information corresponding to the voice signal, and performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal. Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as a target voice signal includes: storing or outputting the voice signal obtained after the time-domain audio transformation as the target voice signal. As shown in Fig. 2, the method of this embodiment specifically includes:
S210. Acquire a voice signal, corresponding to a song, recorded by a user.
S220. Obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template.
S230. Obtain the duration information corresponding to the voice signal, and perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
The voice signal can be viewed as a waveform that changes over time. Each character, word, or phrase in the voice signal corresponds to a segment of the waveform, and each segment has time information such as its start time, end time, and length. These characters, words, or phrases, together with their corresponding time information, constitute the duration information of the voice signal.
After the duration information is obtained, the time-domain audio transformation is applied to the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal. Specifically, based on the duration information, the waveform of the voice signal is transformed using the standard acoustic feature information so that, after the transformation, the duration information, fundamental frequency information, and energy information of the waveform match the standard duration information, standard fundamental frequency information, and standard energy information in the standard acoustic feature information. In other words, the standard acoustic feature information serves as the reference against which the acoustic feature information of the voice signal is adjusted.
Preferably, obtaining the duration information corresponding to the voice signal can include:
obtaining the lyric information contained in the voice signal by speech recognition, and obtaining the duration information corresponding to the voice signal from the lyric information.
Specifically, after the user's voice signal is acquired, the lyric information in it can be obtained by speech recognition. The lyric information contains characters, words, or phrases, each with its own time information, from which the duration information corresponding to the voice signal can be obtained.
S240. Store or output the voice signal obtained after the time-domain audio transformation as the target voice signal.
The voice signal obtained after the time-domain audio transformation contains both the user's own voice characteristics and the professional singer's acoustic feature information, and can therefore be stored or output as the target voice signal.
In the song generation method provided by this embodiment, a voice signal, corresponding to a song, recorded by a user is acquired; the standard acoustic feature information of the song is obtained from a pre-established acoustic feature template; the duration information of the voice signal is obtained; a time-domain audio transformation is applied to the voice signal according to the duration information and the standard acoustic feature information so as to change its acoustic feature information; and the voice signal obtained after the transformation is stored or output as the target voice signal. This overcomes the prior-art problems of training an acoustic model on large amounts of data to convert speech into song and of producing songs that do not contain the user's own voice, which lead to low user participation and a poor experience. Without any acoustic model training, the user's speech is converted, in the time domain, into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, performing the time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change its acoustic feature information, further includes:
dividing the voice signal into tones according to the duration information, and performing the time-domain audio transformation on the divided voice signal according to the standard fundamental frequency information, standard duration information, and standard energy information in the standard acoustic feature information, so that the fundamental frequency information of the transformed voice signal is consistent with the standard fundamental frequency information, its duration information is consistent with the standard duration information, and its energy information is consistent with the standard energy information.
The acoustic feature information can include the fundamental frequency information, duration information, and energy information of the voice signal: the fundamental frequency information corresponds to the pitch of the voice signal, the duration information to its rhythm, and the energy information to its volume.
In this embodiment, the voice signal is divided into tones according to its duration information; preferably, it is divided according to each word in the duration information and the time information corresponding to each word, giving one tone per word, each tone corresponding to a part of the voice signal. For example, for a song whose lyrics contain 100 words, if the time information of the 1st word is t1b-t1n, that of the 2nd word is t2b-t2n, ..., and that of the 100th word is t100b-t100n, then the part of the voice signal in the period t1b-t1n is the tone of the 1st word, the part in the period t2b-t2n is the tone of the 2nd word, ..., and the part in the period t100b-t100n is the tone of the 100th word. Each tone has its own fundamental frequency information, duration information, and energy information. Then, tone by tone, the time-domain audio transformation is applied to the divided voice signal using the standard fundamental frequency information, standard duration information, and standard energy information, so that the fundamental frequency, duration, and energy information of each transformed tone are consistent with the corresponding standard values. That is, the standard acoustic feature information of a song stores, for each word of the lyrics, the fundamental frequency, duration, and energy information of the corresponding tone, and after the time-domain audio transformation the fundamental frequency, duration, and energy information of each tone of the voice signal match the standard fundamental frequency, standard duration, and standard energy information of the corresponding tone.
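Two of the three per-tone adjustments just described — duration and energy — can be sketched directly in the time domain. The helper names are illustrative, and the linear-interpolation stretch is a crude stand-in for production techniques such as PSOLA, which can also shift the fundamental frequency independently of duration.

```python
def split_tones(samples, sample_rate, word_times):
    """Divide a voice signal into per-word tones using duration info:
    word_times is a list of (start_s, end_s) pairs, one per lyric word."""
    return [samples[int(b * sample_rate):int(e * sample_rate)]
            for (b, e) in word_times]

def stretch(tone, target_len):
    """Stretch/compress one tone to the standard duration (in samples)
    by linear interpolation."""
    if target_len <= 1 or len(tone) < 2:
        return tone[:target_len]
    step = (len(tone) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        x = i * step
        j = min(int(x), len(tone) - 2)
        frac = x - j
        out.append(tone[j] * (1 - frac) + tone[j + 1] * frac)
    return out

def match_energy(tone, target_energy):
    """Scale the tone so its mean-square energy equals the standard value."""
    current = sum(s * s for s in tone) / len(tone)
    if current == 0:
        return tone
    gain = (target_energy / current) ** 0.5
    return [s * gain for s in tone]

# Two spoken words, 1 s each, at a toy 50 Hz sampling rate.
samples = [((n % 10) - 4.5) / 4.5 for n in range(100)]
tones = split_tones(samples, 50, [(0.0, 1.0), (1.0, 2.0)])
# Bring the first word's tone to a standard duration of 80 samples
# and a standard mean-square energy of 0.2.
adjusted = match_energy(stretch(tones[0], 80), 0.2)
```

Applying these helpers word by word mirrors the tone-by-tone update above; only the fundamental frequency adjustment, which needs pitch-synchronous processing, is omitted from the sketch.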
Embodiment three
Fig. 3 is a flowchart of a song generation method provided by embodiment three of the present invention. On the basis of the above embodiments, after acquiring the voice signal, corresponding to a song, recorded by the user and before updating the acoustic feature information of the voice signal according to the standard acoustic feature information, the method optionally further includes: extracting the spectrum information of the voice signal. Updating the acoustic feature information of the voice signal according to the standard acoustic feature information then includes: obtaining the duration information corresponding to the voice signal and dividing the voice signal into tones according to the duration information; converting the divided voice signal from the time domain to the frequency domain; and updating the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information. Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as a target voice signal includes: obtaining the target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal. As shown in Fig. 3, the method of this embodiment specifically includes:
S310. Acquire a voice signal, corresponding to a song, recorded by a user.
S320. Extract the spectrum information of the voice signal.
In this embodiment, the spectrum information of the voice signal corresponds to its timbre and reflects the user's voice characteristics. To retain the user's voice characteristics during the conversion of the voice signal into a song, so that the generated song carries the user's own voice, the spectrum information of the voice signal is extracted in advance.
S330. Obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template.
S340. Obtain the duration information corresponding to the voice signal and divide the voice signal into tones according to the duration information; convert the divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.
As in the above embodiments, the duration information corresponding to the voice signal can be obtained from its time-domain waveform and its speech content, and the voice signal is then divided into tones according to that duration information.
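A minimal sketch of the tone division step, assuming the duration information has already been reduced to a list of note lengths in seconds (the patent does not specify a data format, so the names and the 16 kHz rate are illustrative):

```python
import numpy as np

def split_into_tones(signal, durations, sample_rate=16000):
    """Cut a recorded voice signal into per-tone segments, where
    `durations` lists note lengths in seconds derived from the
    time-domain waveform and recognized lyrics."""
    segments, pos = [], 0
    for d in durations:
        n = int(round(d * sample_rate))
        segments.append(signal[pos:pos + n])
        pos += n
    return segments

signal = np.zeros(16000)            # one second of (silent) audio
tones = split_into_tones(signal, [0.25, 0.5, 0.25])
print([len(s) for s in tones])      # → [4000, 8000, 4000]
```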
Besides updating the acoustic feature information of the voice signal in the time domain, the update can also be performed in the frequency domain. Specifically, taking each divided tone as a unit, the tone-divided voice signal is converted from the time domain to the frequency domain to obtain each tone's frequency-domain representation. From these representations, the acoustic feature information of the frequency-domain voice signal is determined and then updated according to the standard acoustic feature information, yielding the updated acoustic feature information.
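The per-tone conversion to the frequency domain and the reading-off of acoustic features there might be sketched as follows; the peak-picking fundamental-frequency estimate is a deliberate simplification of real pitch tracking, and the dictionary layout is an illustrative assumption:

```python
import numpy as np

def tone_features(segment, sample_rate=16000):
    """Convert one tone segment to the frequency domain and read off
    the features the patent updates there: fundamental frequency
    (strongest spectral peak, a crude estimate), duration, energy."""
    spectrum = np.fft.rfft(segment * np.hanning(len(segment)))
    mag = np.abs(spectrum)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return {
        "f0": float(freqs[np.argmax(mag[1:]) + 1]),  # skip the DC bin
        "duration": len(segment) / sample_rate,
        "energy": float(np.sum(segment ** 2)),
    }

t = np.arange(8000) / 16000.0                     # a 0.5 s segment
feats = tone_features(np.sin(2 * np.pi * 440 * t))
print(feats["f0"], feats["duration"])
```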
S350, obtaining the target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal.
The updated acoustic feature information is that of a professional singer, while the spectrum information reflects the user's vocal characteristics; the target voice signal derived from both therefore contains the user's voice as well as the professional singer's acoustic features. Once obtained, the target voice signal can be stored or output.
In the song generation method provided by this embodiment, the voice signal entered by the user for a song is obtained and its spectrum information is extracted; the standard acoustic feature information corresponding to the song is obtained from a pre-established acoustic feature template; the duration information of the voice signal is obtained and the signal is divided into tones accordingly; the tone-divided signal is converted from the time domain to the frequency domain and its acoustic feature information is updated according to the standard acoustic feature information; finally, the target voice signal is obtained from the updated acoustic feature information and the spectrum information and is stored or output. This overcomes the prior-art problems that converting voice to song requires training an acoustic model on large amounts of data and that the resulting song does not contain the user's own voice, giving low user participation and a poor experience. Without any acoustic model training, the user's voice is converted in the frequency domain into a song that retains the user's own voice, while also guaranteeing good sound quality.
On the basis of the above embodiments, further, updating the acoustic feature information of the frequency-domain voice signal obtained after the conversion according to the standard acoustic feature information includes: replacing the fundamental frequency information of the converted frequency-domain voice signal with the standard fundamental frequency information in the standard acoustic feature information, replacing its duration information with the standard duration information, and replacing its energy information with the standard energy information.
In this embodiment, taking each tone as a unit, the acoustic feature information of the frequency-domain voice signal is determined from each tone's frequency-domain representation; it includes the signal's fundamental frequency information, duration information, and energy information. The fundamental frequency, duration, and energy information of the converted frequency-domain voice signal are then replaced by the standard fundamental frequency, standard duration, and standard energy information in the standard acoustic feature information, respectively.
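Assuming each tone's features are held in a plain dictionary (an illustrative format, not the patent's), the replacement step amounts to overwriting three fields while leaving the timbre-related information untouched:

```python
def update_with_template(tone_feats, standard_feats):
    """Replace a tone's f0, duration, and energy with the standard
    (template) values, keeping everything else as recorded."""
    updated = dict(tone_feats)
    for key in ("f0", "duration", "energy"):
        updated[key] = standard_feats[key]
    return updated

user_tone = {"f0": 180.0, "duration": 0.31, "energy": 0.8, "timbre": "user"}
standard  = {"f0": 220.0, "duration": 0.50, "energy": 1.0}
result = update_with_template(user_tone, standard)
print(result)  # timbre stays "user"; f0/duration/energy follow the template
```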
On the basis of the above embodiments, further, obtaining the target voice signal from the updated acoustic feature information and the spectrum information includes: inputting the updated acoustic feature information and the spectrum information into a vocoder, and obtaining the target voice signal restored by the vocoder.
The acoustic feature information updated in the frequency domain and the previously extracted spectrum information cannot directly yield the corresponding target voice signal, so the target voice signal is preferably restored by a vocoder. A vocoder, also called a speech analysis-synthesis system or voice-band compression system, is a codec that analyzes and synthesizes speech: it restores the corresponding voice signal from the model parameters of a voice signal combined with speech synthesis techniques.
In this embodiment, the updated acoustic feature information and the previously obtained spectrum information are input into the vocoder, which restores the corresponding target voice signal from these parameters using its internal speech synthesis techniques.
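A toy stand-in for the vocoder's restoration step, synthesizing a sinusoid at the updated fundamental frequency and scaling it to the updated energy; a real vocoder (e.g. a WORLD- or STRAIGHT-style analysis-synthesis system) would additionally impose the user's spectral envelope, which this sketch omits:

```python
import numpy as np

def resynthesize(f0, duration, energy, sample_rate=16000):
    """Minimal vocoder-like resynthesis: a sinusoidal source at the
    updated fundamental frequency, rescaled to the target energy."""
    n = int(duration * sample_rate)
    t = np.arange(n) / sample_rate
    wave = np.sin(2 * np.pi * f0 * t)
    wave *= np.sqrt(energy / max(float(np.sum(wave ** 2)), 1e-12))
    return wave

out = resynthesize(f0=220.0, duration=0.5, energy=1.0)
print(len(out))  # → 8000 samples (0.5 s at 16 kHz)
```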
Embodiment four
Fig. 4 is a structural schematic diagram of a song generating apparatus in Embodiment four of the present invention. As shown in Fig. 4, the song generating apparatus of this embodiment includes:
a voice signal obtaining module 410, configured to obtain the voice signal, corresponding to a song, entered by the user;
an acoustic feature information updating module 420, configured to obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song;
a target voice signal determining module 430, configured to store or output the voice signal having the updated acoustic feature information as the target voice signal.
In the song generating apparatus of this embodiment, the voice signal obtaining module obtains the voice signal entered by the user for a song; the acoustic feature information updating module obtains the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template, which stores the standard acoustic feature information of at least one song, and updates the acoustic feature information of the voice signal accordingly; the target voice signal determining module then stores or outputs the voice signal having the updated acoustic feature information as the target voice signal. This overcomes the prior-art problems that converting voice to song requires training an acoustic model on large amounts of data and that the resulting song does not contain the user's own voice, giving low user participation and a poor experience; without any acoustic model training, the user's voice is converted into a song that retains the user's own voice while also guaranteeing good sound quality.
On the basis of the above embodiments, further, the acoustic feature information updating module 420 may include:
a first duration information obtaining unit, configured to obtain the duration information corresponding to the voice signal;
a time-domain audio transformation unit, configured to perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
The target voice signal determining module 430 may specifically include:
a first target voice signal determining unit, configured to store or output the voice signal obtained after the time-domain audio transformation as the target voice signal.
Further, the time-domain audio transformation unit may specifically be configured to: divide the voice signal into tones according to the duration information, and perform a time-domain audio transformation on the tone-divided voice signal according to the standard fundamental frequency information, standard duration information, and standard energy information in the standard acoustic feature information, so that the fundamental frequency, duration, and energy information of the transformed voice signal are consistent with the standard fundamental frequency, standard duration, and standard energy information, respectively.
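The three consistency requirements can be sketched directly in the time domain: resample toward the standard fundamental frequency, trim or pad to the standard duration, and rescale to the standard energy. All names are illustrative, and the naive resampling used here also shifts formants, which a production pitch shifter would avoid:

```python
import numpy as np

def time_domain_transform(segment, src_f0, std, sample_rate=16000):
    """Naive time-domain transformation toward standard features.
    `std` is a dict with 'f0', 'duration', 'energy' (illustrative)."""
    # Pitch: resampling by the f0 ratio raises/lowers the perceived pitch.
    ratio = std["f0"] / src_f0
    idx = (np.arange(int(len(segment) / ratio)) * ratio).astype(int)
    shifted = segment[idx]
    # Duration: trim or zero-pad to the standard length.
    n = int(std["duration"] * sample_rate)
    fixed = np.zeros(n)
    m = min(n, len(shifted))
    fixed[:m] = shifted[:m]
    # Energy: rescale to the standard energy.
    e = float(np.sum(fixed ** 2))
    if e > 0:
        fixed *= np.sqrt(std["energy"] / e)
    return fixed

seg = np.sin(2 * np.pi * 180 * np.arange(4800) / 16000.0)  # 0.3 s at 180 Hz
out = time_domain_transform(seg, 180.0,
                            {"f0": 220.0, "duration": 0.5, "energy": 1.0})
print(len(out))  # → 8000 samples, i.e. the standard 0.5 s duration
```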
Further, the apparatus may also include:
a spectrum information extraction module, configured to extract the spectrum information of the voice signal after the voice signal entered by the user for a song is obtained and before its acoustic feature information is updated according to the standard acoustic feature information.
The acoustic feature information updating module 420 may also include:
a second duration information obtaining unit, configured to obtain the duration information corresponding to the voice signal;
a frequency-domain audio conversion unit, configured to divide the voice signal into tones according to the duration information, convert the tone-divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.
The target voice signal determining module 430 may also specifically include:
a second target voice determining unit, configured to obtain the target voice signal from the updated acoustic feature information and the spectrum information, and to store or output the target voice signal.
Further, the frequency-domain audio conversion unit may specifically be configured to: replace the fundamental frequency information of the converted frequency-domain voice signal with the standard fundamental frequency information in the standard acoustic feature information, replace its duration information with the standard duration information, and replace its energy information with the standard energy information.
Further, the second target voice determining unit may specifically be configured to: input the updated acoustic feature information and the spectrum information into a vocoder, and obtain the target voice signal restored by the vocoder.
Further, both the first and the second duration information obtaining units may specifically be configured to: obtain the lyrics information contained in the voice signal through speech recognition, and obtain the duration information corresponding to the voice signal according to the lyrics information.
Further, the apparatus may also include:
a standard acoustic feature information extraction module, configured, before the voice signal entered by the user for a song is obtained, to extract the acoustic feature information of multiple recorded songs as the standard acoustic feature information of the corresponding songs;
an acoustic feature template generation module, configured to save the identification information of the multiple songs together with the corresponding standard acoustic feature information in the acoustic feature template.
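The template construction reduces to mapping each song's identification information to its extracted standard features; the schema below (song ID to a list of per-tone f0/duration/energy entries) is an illustrative assumption, not the patent's storage format:

```python
import json

def build_acoustic_template(recordings):
    """Map each song's identification info to its standard acoustic
    features extracted from a professional recording, forming the
    pre-established acoustic feature template."""
    return {
        song_id: [{"f0": f, "duration": d, "energy": e}
                  for f, d, e in tones]
        for song_id, tones in recordings.items()
    }

template = build_acoustic_template({
    "song_001": [(220.0, 0.5, 1.0), (246.9, 0.25, 0.8)],
})
print(json.dumps(template, sort_keys=True))
```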
The song generating apparatus provided by this embodiment of the present invention can execute the song generation method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects.
Embodiment five
Fig. 5 is a structural schematic diagram of the song generating terminal provided by Embodiment five of the present invention. Fig. 5 shows a block diagram of an exemplary song generating terminal 512 suitable for implementing the embodiments of the present invention; the terminal shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the song generating terminal 512 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors 516, a memory 528, and a bus 518 connecting the different system components (including the memory 528 and the processor 516).
Bus 518 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The song generating terminal 512 typically comprises a variety of computer-system-readable media. These media may be any available media accessible to the terminal 512, including volatile and non-volatile media and removable and non-removable media.
The memory 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. The song generating terminal 512 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage device 534 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a disk drive for reading and writing removable non-volatile magnetic disks (such as "floppy disks") and an optical disk drive for reading and writing removable non-volatile optical disks (such as CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 518 through one or more data media interfaces. The memory 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 540 having a set of (at least one) program modules 542 may be stored, for example, in the memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.
The song generating terminal 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, and a display 524, where the display 524 may or may not be configured according to actual needs), with one or more devices that enable users to interact with the terminal 512, and/or with any devices (such as a network card or modem) that enable the terminal 512 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 522. Moreover, the terminal 512 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 520. As shown in the figure, the network adapter 520 communicates with the other modules of the terminal 512 through the bus 518. It should be understood that, although not shown in Fig. 5, other hardware and/or software modules may be used in conjunction with the terminal 512, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage devices, and so on.
The processor 516 executes various functional applications and data processing by running the programs stored in the memory 528, for example, implementing the song generation method provided by any embodiment of the present invention.
Embodiment six
Embodiment six of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the song generation method provided by the embodiments of the present invention, the method including:
obtaining the voice signal, corresponding to a song, entered by the user;
obtaining the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song;
storing or outputting the voice signal having the updated acoustic feature information as the target voice signal.
Of course, the computer program stored on the computer-readable storage medium provided by this embodiment of the present invention is not limited to the method operations described above, and can also perform the related operations in the song generation method provided by any embodiment of the present invention.
The computer storage medium of the embodiments of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include more other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the scope of the appended claims.
Claims (11)
1. A song generation method, characterized by including:
obtaining the voice signal, corresponding to a song, entered by the user;
obtaining the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song;
storing or outputting the voice signal having the updated acoustic feature information as a target voice signal.
2. The method according to claim 1, characterized in that updating the acoustic feature information of the voice signal according to the standard acoustic feature information includes:
obtaining the duration information corresponding to the voice signal, and performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal;
correspondingly, storing or outputting the voice signal having the updated acoustic feature information as the target voice signal includes:
storing or outputting the voice signal obtained after the time-domain audio transformation as the target voice signal.
3. The method according to claim 2, characterized in that performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal, includes:
dividing the voice signal into tones according to the duration information, and performing a time-domain audio transformation on the tone-divided voice signal according to the standard fundamental frequency information, standard duration information, and standard energy information in the standard acoustic feature information, so that the fundamental frequency information of the transformed voice signal is consistent with the standard fundamental frequency information, the duration information of the transformed voice signal is consistent with the standard duration information, and the energy information of the transformed voice signal is consistent with the standard energy information.
4. The method according to claim 1, characterized in that, after the voice signal corresponding to a song entered by the user is obtained and before the acoustic feature information of the voice signal is updated according to the standard acoustic feature information, the method further includes:
extracting the spectrum information of the voice signal;
updating the acoustic feature information of the voice signal according to the standard acoustic feature information includes:
obtaining the duration information corresponding to the voice signal, and dividing the voice signal into tones according to the duration information; converting the tone-divided voice signal from the time domain to the frequency domain, and updating the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information;
correspondingly, storing or outputting the voice signal having the updated acoustic feature information as the target voice signal includes:
obtaining the target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal.
5. The method according to claim 4, characterized in that updating the acoustic feature information of the frequency-domain voice signal obtained after the conversion according to the standard acoustic feature information includes:
replacing the fundamental frequency information of the converted frequency-domain voice signal with the standard fundamental frequency information in the standard acoustic feature information, replacing its duration information with the standard duration information in the standard acoustic feature information, and replacing its energy information with the standard energy information in the standard acoustic feature information.
6. The method according to claim 4, characterized in that obtaining the target voice signal from the updated acoustic feature information and the spectrum information includes:
inputting the updated acoustic feature information and the spectrum information into a vocoder, and obtaining the target voice signal restored by the vocoder.
7. The method according to claim 2 or 4, characterized in that obtaining the duration information corresponding to the voice signal includes:
obtaining the lyrics information contained in the voice signal through speech recognition, and obtaining the duration information corresponding to the voice signal according to the lyrics information.
8. The method according to any one of claims 1 to 6, characterized in that, before the voice signal corresponding to a song entered by the user is obtained, the method further includes:
extracting, for each of multiple recorded songs, its acoustic feature information as the standard acoustic feature information of the corresponding song;
saving the identification information of the multiple songs together with the corresponding standard acoustic feature information in the acoustic feature template.
9. A song generating apparatus, characterized by including:
a voice signal obtaining module, configured to obtain the voice signal, corresponding to a song, entered by the user;
an acoustic feature information updating module, configured to obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, wherein the acoustic feature template stores the standard acoustic feature information of at least one song;
a target voice signal determining module, configured to store or output the voice signal having the updated acoustic feature information as a target voice signal.
10. A song generating terminal, characterized by including:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the song generation method according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the song generation method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810622548.8A CN108831437B (en) | 2018-06-15 | 2018-06-15 | Singing voice generation method, singing voice generation device, terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108831437A true CN108831437A (en) | 2018-11-16 |
CN108831437B CN108831437B (en) | 2020-09-01 |
Family
ID=64142414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810622548.8A Active CN108831437B (en) | 2018-06-15 | 2018-06-15 | Singing voice generation method, singing voice generation device, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108831437B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050169114A1 (en) * | 2002-02-20 | 2005-08-04 | Hosung Ahn | Digital recorder for selectively storing only a music section out of radio broadcasting contents and method thereof
CN1682278A (en) * | 2002-09-17 | 2005-10-12 | Koninklijke Philips Electronics N.V. | Method of synthesis for a steady sound signal
CN1719514A (en) * | 2004-07-06 | 2006-01-11 | Institute of Automation, Chinese Academy of Sciences | High-quality real-time voice conversion method based on speech analysis and synthesis
CN1761993A (en) * | 2003-03-20 | 2006-04-19 | Sony Corporation | Singing voice synthesizing method, singing voice synthesizing device, program, recording medium, and robot
EP1185976B1 (en) * | 2000-02-25 | 2006-08-16 | Philips Electronics N.V. | Speech recognition device with reference transformation means
TWI260582B (en) * | 2005-01-20 | 2006-08-21 | Sunplus Technology Co Ltd | Speech synthesizer with mixed parameter mode and method thereof
CN1924994A (en) * | 2005-08-31 | 2007-03-07 | Institute of Automation, Chinese Academy of Sciences | Embedded speech synthesis method and system
CN101064103A (en) * | 2006-04-24 | 2007-10-31 | Institute of Automation, Chinese Academy of Sciences | Chinese speech synthesis method and system based on syllable-prosody constraint relationships
US20130226957A1 (en) * | 2012-02-27 | 2013-08-29 | The Trustees Of Columbia University In The City Of New York | Methods, Systems, and Media for Identifying Similar Songs Using Two-Dimensional Fourier Transform Magnitudes
CN105244041A (en) * | 2015-09-22 | 2016-01-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Song audition evaluation method and device
CN105845125A (en) * | 2016-05-18 | 2016-08-10 | Baidu Online Network Technology (Beijing) Co., Ltd. | Speech synthesis method and speech synthesis device
JP2016206496A (en) * | 2015-04-24 | 2016-12-08 | Yamaha Corporation | Controller, synthetic singing sound creation device and program
CN106652997A (en) * | 2016-12-29 | 2017-05-10 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Audio synthesis method and terminal
CN106971703A (en) * | 2017-03-17 | 2017-07-21 | Northwest Normal University | HMM-based song synthesis method and device
CN107863095A (en) * | 2017-11-21 | 2018-03-30 | Guangzhou Kugou Computer Technology Co., Ltd. | Acoustic signal processing method, device and storage medium
CN108053814A (en) * | 2017-11-06 | 2018-05-18 | Yutou Technology (Hangzhou) Co., Ltd. | Speech synthesis system and method for simulating a user's singing
- 2018-06-15 CN CN201810622548.8A patent granted as CN108831437B (status: Active)
Non-Patent Citations (3)
Title |
---|
JAMES P. KIRBY: "Onset pitch perturbations and the cross-linguistic implementation of voicing: Evidence from tonal and non-tonal languages", Journal of Phonetics *
ZHANG DANFENG et al.: "A Survey of the Development and Research Status of Speech Synthesis Technology", Electronic Information *
YANG NAN: "An Automatic Pitch Correction System Based on Spectral Modeling Synthesis Technology", Computer and Digital Engineering *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111354332A (en) * | 2018-12-05 | 2020-06-30 | Beijing Didi Infinity Technology and Development Co., Ltd. | Singing voice synthesis method and device
CN109920449B (en) * | 2019-03-18 | 2022-03-04 | Guangzhou Baiguoyuan Network Technology Co., Ltd. | Beat analysis method, audio processing method, device, equipment and medium
CN109920449A (en) * | 2019-03-18 | 2019-06-21 | Guangzhou Baiguoyuan Network Technology Co., Ltd. | Beat analysis method, audio processing method and device, equipment, medium
CN112420008A (en) * | 2019-08-22 | 2021-02-26 | Beijing Fengqu Internet Information Service Co., Ltd. | Method and device for recording songs, electronic equipment and storage medium
CN110738980A (en) * | 2019-09-16 | 2020-01-31 | Ping An Technology (Shenzhen) Co., Ltd. | Singing voice synthesis model training method and system and singing voice synthesis method
CN112837668B (en) * | 2019-11-01 | 2023-04-28 | Beijing Sogou Technology Development Co., Ltd. | Voice processing method and device for processing voice
CN112837668A (en) * | 2019-11-01 | 2021-05-25 | Beijing Sogou Technology Development Co., Ltd. | Voice processing method and device for processing voice
CN111091807A (en) * | 2019-12-26 | 2020-05-01 | Guangzhou Kugou Computer Technology Co., Ltd. | Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN111429881A (en) * | 2020-03-19 | 2020-07-17 | Beijing ByteDance Network Technology Co., Ltd. | Sound reproduction method, device, readable medium and electronic equipment
CN111429881B (en) * | 2020-03-19 | 2023-08-18 | Beijing ByteDance Network Technology Co., Ltd. | Speech synthesis method and device, readable medium and electronic equipment
CN111445892A (en) * | 2020-03-23 | 2020-07-24 | Beijing ByteDance Network Technology Co., Ltd. | Song generation method and device, readable medium and electronic equipment
CN111477210A (en) * | 2020-04-02 | 2020-07-31 | Beijing ByteDance Network Technology Co., Ltd. | Speech synthesis method and device
CN112289300A (en) * | 2020-10-28 | 2021-01-29 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio processing method and device, electronic equipment and computer readable storage medium
CN112289300B (en) * | 2020-10-28 | 2024-01-09 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio processing method and device, electronic equipment and computer readable storage medium
CN112712783A (en) * | 2020-12-21 | 2021-04-27 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for generating music, computer device and medium
CN112712783B (en) * | 2020-12-21 | 2023-09-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for generating music, computer equipment and medium
CN113593520A (en) * | 2021-09-08 | 2021-11-02 | Guangzhou Huya Technology Co., Ltd. | Singing voice synthesis method and device, electronic equipment and storage medium
CN113593520B (en) * | 2021-09-08 | 2024-05-17 | Guangzhou Huya Technology Co., Ltd. | Singing voice synthesis method and device, electronic equipment and storage medium
Also Published As
Publication number | Publication date |
---|---|
CN108831437B (en) | 2020-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108831437A (en) | A kind of song generation method, device, terminal and storage medium | |
CN106898340B (en) | Song synthesis method and terminal | |
CN111048064B (en) | Voice cloning method and device based on single speaker voice synthesis data set | |
CN110364140B (en) | Singing voice synthesis model training method, singing voice synthesis model training device, computer equipment and storage medium | |
JP2007249212A (en) | Method, computer program and processor for text speech synthesis | |
CN100585663C (en) | Language studying system | |
JP2021110943A (en) | Cross-lingual voice conversion system and method | |
CN108573694A (en) | Language material expansion and speech synthesis system construction method based on artificial intelligence and device | |
CN109346043B (en) | Music generation method and device based on generation countermeasure network | |
CN109599090B (en) | Method, device and equipment for voice synthesis | |
CN112102811B (en) | Optimization method and device for synthesized voice and electronic equipment | |
CN113724683B (en) | Audio generation method, computer device and computer readable storage medium | |
JP7497523B2 (en) | Method, device, electronic device and storage medium for synthesizing custom timbre singing voice | |
CN112185340B (en) | Speech synthesis method, speech synthesis device, storage medium and electronic equipment | |
CN112289300B (en) | Audio processing method and device, electronic equipment and computer readable storage medium | |
US20120109654A1 (en) | Methods and apparatuses for facilitating speech synthesis | |
CN113948062A (en) | Data conversion method and computer storage medium | |
CN114333758A (en) | Speech synthesis method, apparatus, computer device, storage medium and product | |
JP5706368B2 (en) | Speech conversion function learning device, speech conversion device, speech conversion function learning method, speech conversion method, and program | |
CN112164387A (en) | Audio synthesis method and device, electronic equipment and computer-readable storage medium | |
JP2006139162A (en) | Language learning system | |
CN113421544B (en) | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium | |
CN114299910B (en) | Training method, using method, device, equipment and medium of speech synthesis model | |
CN114822492B (en) | Speech synthesis method and device, electronic equipment and computer readable storage medium | |
CN116825090B (en) | Training method and device for speech synthesis model and speech synthesis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||