CN108831437A - Song generation method, device, terminal and storage medium - Google Patents

Song generation method, device, terminal and storage medium

Info

Publication number
CN108831437A
CN108831437A (application CN201810622548.8A; granted as CN108831437B)
Authority
CN
China
Prior art keywords
voice signal
acoustic feature
information
feature information
song
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810622548.8A
Other languages
Chinese (zh)
Other versions
CN108831437B (en)
Inventor
李�昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810622548.8A
Publication of CN108831437A
Application granted
Publication of CN108831437B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H7/00 Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

Embodiments of the invention disclose a song generation method, device, terminal, and storage medium. The song generation method includes: obtaining a voice signal, corresponding to a song, recorded by a user; obtaining standard acoustic feature information of the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores the standard acoustic feature information of at least one song; and storing or outputting the voice signal with the updated acoustic feature information as a target voice signal. The embodiments overcome the problems of the existing approach, in which an acoustic model must be trained on a large amount of data to convert speech into song and the resulting song does not contain the user's own voice, leading to low user participation and a poor experience. Without any acoustic model training, the user's speech is converted into a song that retains the user's own voice.

Description

Song generation method, device, terminal and storage medium
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a song generation method, device, terminal, and storage medium.
Background art
Speech-to-song conversion refers to transforming a user's speech into a corresponding song. Internet products of this kind convert the user's speech into a song and, combined with accompanying music, synthesize a work sung by the user, which has entertainment value, social value, and a certain market value.
The prior-art scheme for converting speech into song is mainly as follows. In the model training stage, the text data (including lyrics) of multiple songs by a professional singer A, together with the acoustic features of singer A singing those songs, are used to train an acoustic model of singer A. In the song generation stage, voice data of a user B singing or reading a song is obtained; the lyrics of the song are recognized from the voice data, and the acoustic features of user B are extracted. The recognized lyrics are fed into singer A's acoustic model to obtain predicted acoustic features, and the fundamental frequency and duration in the predicted acoustic features are replaced with the fundamental frequency and duration from user B's acoustic features. The resulting modified acoustic features thus contain user B's fundamental frequency and duration but singer A's spectrum. Synthesizing from the modified acoustic features with statistical parametric methods or waveform concatenation methods yields a song with singer A's voice characteristics and user B's pitch and rhythm, that is, the effect of singer A imitating user B's singing.
The above scheme generally requires acoustic model training, which demands a large amount of sample data, is complex to implement, and introduces a loss of sound quality. In addition, a song synthesized in this way carries the singer's voice characteristics rather than the user's, so user participation and experience are poor.
Summary of the invention
Embodiments of the present invention provide a song generation method, device, terminal, and storage medium, so that the user's speech can be converted into a song that retains the user's own voice without any acoustic model training.
In a first aspect, an embodiment of the invention provides a song generation method, the method including:
obtaining a voice signal, corresponding to a song, recorded by a user;
obtaining standard acoustic feature information of the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores the standard acoustic feature information of at least one song; and
storing or outputting the voice signal with the updated acoustic feature information as a target voice signal.
In a second aspect, an embodiment of the invention further provides a song generation device, the device including:
a voice signal acquisition module, configured to obtain a voice signal, corresponding to a song, recorded by a user;
an acoustic feature information update module, configured to obtain standard acoustic feature information of the song from a pre-established acoustic feature template and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores the standard acoustic feature information of at least one song; and
a target voice signal determination module, configured to store or output the voice signal with the updated acoustic feature information as a target voice signal.
In a third aspect, an embodiment of the invention further provides a song generation terminal, the terminal including:
one or more processors; and
a storage device for storing one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the song generation method of the first aspect.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the song generation method of the first aspect.
In embodiments of the present invention, a voice signal, corresponding to a song, recorded by a user is obtained; standard acoustic feature information of the song is obtained from a pre-established acoustic feature template, which stores the standard acoustic feature information of at least one song; the acoustic feature information of the voice signal is updated according to the standard acoustic feature information; and the voice signal with the updated acoustic feature information is stored or output as a target voice signal. This overcomes the problems of the prior art, in which a large amount of data is used to train an acoustic model for speech-to-song conversion and the resulting song does not contain the user's own voice, leading to low user participation and a poor experience. Without any acoustic model training, the user's speech is converted into a song that retains the user's own voice, while good sound quality is also ensured.
Brief description of the drawings
Fig. 1 is a flowchart of the song generation method in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the song generation method in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the song generation method in Embodiment 3 of the present invention;
Fig. 4 is a structural diagram of the song generation device in Embodiment 4 of the present invention;
Fig. 5 is a structural diagram of the song generation terminal in Embodiment 5 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of the song generation method provided by Embodiment 1 of the present invention. This embodiment is applicable to converting a user's speech into a song. The method may be executed by a song generation device, which may be implemented in software and/or hardware and is generally integrated in a song generation terminal. As shown in Fig. 1, the method of this embodiment specifically includes the following steps.
S110: obtain a voice signal, corresponding to a song, recorded by a user.
The voice signal is generated by the user reading aloud or singing the content of a specific song. The voice signal may contain various information, such as the lyrics information of the song and acoustic feature information; the acoustic feature information includes fundamental frequency information reflecting pitch, energy information reflecting volume, duration information reflecting rhythm, and so on. The acoustic feature information can be used to judge the gap between the level at which the user reads or sings the song and the professional level at which a professional singer sings it.
Preferably, the user sends the song generation terminal a request to record a voice signal corresponding to a song; after receiving the request, the terminal can obtain the voice signal recorded by the user, for example by turning on a microphone. The song generation terminal may be an independent hardware device, such as a smart speaker or an interactive robot, or a client installed on a terminal such as a mobile phone, a notebook computer, or a smart TV.
S120: obtain the standard acoustic feature information of the song from a pre-established acoustic feature template, and update the acoustic feature information of the voice signal according to the standard acoustic feature information.
The acoustic feature template is obtained by extracting the acoustic feature information of at least one song recorded by a professional singer, and stores the standard acoustic feature information of at least one song. In this embodiment, after the voice signal corresponding to a specific song recorded by the user is obtained, in order to update its acoustic feature information, the standard acoustic feature information corresponding to that song is preferably obtained from the pre-established acoustic feature template, and the acoustic feature information of the voice signal is updated accordingly.
For example, a user who wants a song with his or her own voice characteristics but the acoustic features of a professional singer can record song A to the song generation terminal by singing it. To convert the acoustic features of the user's rendition of song A into those of the professional singer, the acoustic feature template pre-saved in the terminal is used. Specifically, which song the recorded voice signal corresponds to can be determined from the lyrics of song A or from the user's selection; once the song is determined, its standard acoustic feature information is obtained from the template and used to update the acoustic feature information of the user's voice signal.
S130: store or output the voice signal with the updated acoustic feature information as a target voice signal.
The voice signal with the updated acoustic feature information carries both the professional singer's standard acoustic feature information and the user's own voice characteristics; it is therefore preferably stored or output as the target voice signal.
The song generation method provided in this embodiment obtains a voice signal, corresponding to a song, recorded by a user; obtains the standard acoustic feature information of the song from a pre-established acoustic feature template, which stores the standard acoustic feature information of at least one song; updates the acoustic feature information of the voice signal accordingly; and stores or outputs the voice signal with the updated acoustic feature information as a target voice signal. This overcomes the prior-art problems that a large amount of data is needed to train an acoustic model for speech-to-song conversion and that the resulting song does not contain the user's own voice, which leads to low user participation and a poor experience. Without acoustic model training, the user's speech is converted into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, before obtaining the voice signal, corresponding to a song, recorded by the user, the method further includes:
extracting the acoustic feature information of each of multiple recorded songs as the standard acoustic feature information of the corresponding song; and
saving the identification information of the multiple songs, together with the corresponding standard acoustic feature information, in the acoustic feature template.
In this embodiment, the acoustic feature template is built from song data recorded in advance by professional singers. Specifically, after the song data is obtained, the acoustic feature information of each song is extracted; since each song was recorded by a professional singer, each extracted set of acoustic feature information can serve as the standard acoustic feature information of the corresponding song.
If only the extracted standard acoustic feature information were saved in the template, there would be no basis for retrieving the standard acoustic feature information of a particular song. Therefore, while extracting each set of standard acoustic feature information, the identification information of the corresponding song is also obtained, and the identification information of each song is saved in the template together with its standard acoustic feature information. The identification information of a song may include the song title, the lyrics, or the title plus the professional singer's name. The song generation terminal may obtain the identification information of the song corresponding to the user's voice signal either from the user's input or by extracting it from the voice signal.
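As an illustration of the template construction and lookup described above, the following sketch stores per-word standard features keyed by a song's identification information. The data layout, field names, and the restriction to duration and energy are assumptions for the sketch (a real system would also run a pitch tracker for fundamental frequency), not the patent's implementation:

```python
import math

def extract_standard_features(song_segments):
    """Extract per-word standard acoustic features from a professional
    recording. song_segments: list of (start_s, end_s, samples) per lyric
    word. Only duration and RMS energy are computed here; fundamental
    frequency (F0) extraction would require a pitch tracker."""
    feats = []
    for start, end, samples in song_segments:
        rms = math.sqrt(sum(x * x for x in samples) / len(samples))
        feats.append({"duration": end - start, "energy": rms})
    return feats

def build_template(recordings):
    """Build the acoustic feature template: standard features keyed by the
    song's identification information (e.g. title, or title + singer)."""
    return {song_id: features for song_id, features in recordings}

def lookup_standard_features(template, song_id):
    """Retrieve the standard acoustic feature information for one song;
    returns None if the song is not in the template."""
    return template.get(song_id)
```

A lookup that returns None corresponds to the case the text warns about: without identification information saved alongside the features, there is no basis for retrieval.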
Embodiment 2
Fig. 2 is a flowchart of the song generation method provided by Embodiment 2 of the present invention. On the basis of the above embodiments, in this embodiment, updating the acoustic feature information of the voice signal according to the standard acoustic feature information optionally includes: obtaining the duration information of the voice signal, and performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal. Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as a target voice signal includes: storing or outputting the voice signal obtained after the time-domain audio transformation as the target voice signal. As shown in Fig. 2, the method of this embodiment specifically includes the following steps.
S210: obtain a voice signal, corresponding to a song, recorded by a user.
S220: obtain the standard acoustic feature information of the song from a pre-established acoustic feature template.
S230: obtain the duration information of the voice signal, and perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
A voice signal can be viewed as a waveform varying over time. Each character, word, or phrase in the voice signal corresponds to a section of the waveform, and each section has its start time, end time, and length. These characters, words, or phrases, together with their time information, constitute the duration information of the voice signal.
After the duration information is obtained, the time-domain audio transformation is applied to the voice signal according to the duration information and the standard acoustic feature information. Specifically, based on the duration information, the waveform of the voice signal is transformed using the standard acoustic feature information so that, after the transformation, the duration information, fundamental frequency information, and energy information of the waveform match the standard duration information, standard fundamental frequency information, and standard energy information, respectively. In other words, the standard acoustic feature information serves as the reference against which the acoustic feature information of the voice signal is adjusted.
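The duration and energy parts of this time-domain adjustment can be sketched as follows. This is a minimal illustration under assumptions: per-tone segments are plain sample lists, linear-interpolation resampling stands in for a pitch-preserving stretch (real systems use methods such as PSOLA or WSOLA), and the fundamental-frequency adjustment is omitted. It is not the patent's implementation:

```python
import math

def rms(samples):
    """Root-mean-square energy of a waveform segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def time_stretch(samples, target_len):
    """Resample a segment to target_len samples by linear interpolation.
    Note: plain resampling also shifts pitch; a pitch-preserving method
    would be used in practice."""
    n = len(samples)
    if n == 0 or target_len <= 1:
        return list(samples[:max(target_len, 0)])
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

def match_duration_and_energy(segment, std_duration_s, std_energy, sample_rate):
    """Warp one tone so its duration and RMS energy match the standard
    acoustic features taken from the template."""
    target_len = max(1, int(round(std_duration_s * sample_rate)))
    stretched = time_stretch(segment, target_len)
    cur = rms(stretched)
    gain = std_energy / cur if cur > 0 else 1.0
    return [x * gain for x in stretched]
```

Applying `match_duration_and_energy` tone by tone is the sense in which the standard features act as the adjustment reference: each tone is warped toward the template's values rather than modeled from scratch.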
Preferably, obtaining the duration information of the voice signal may include:
obtaining the lyrics information contained in the voice signal through speech recognition, and obtaining the duration information of the voice signal according to the lyrics information.
Specifically, after the user's voice signal is obtained, the lyrics information in it can be extracted with a speech recognition method. The lyrics information contains characters, words, or phrases, each with its corresponding time information, from which the duration information of the voice signal can be derived.
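As a sketch of this step, assuming the recognizer returns word-level timestamps as (word, start, end) triples (the actual ASR interface is not specified by the patent):

```python
def duration_info_from_asr(word_timestamps):
    """Derive duration information from a speech recognizer's word-level
    time alignment. word_timestamps: list of (word, start_s, end_s)
    triples; this input format is an assumption for illustration."""
    return [{"word": w, "start": s, "end": e, "duration": e - s}
            for (w, s, e) in word_timestamps]
```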
S240: store or output the voice signal obtained after the time-domain audio transformation as the target voice signal.
The voice signal obtained after the time-domain audio transformation contains both the user's own voice characteristics and the professional singer's acoustic feature information; it is therefore stored or output as the target voice signal.
The song generation method provided in this embodiment obtains a voice signal, corresponding to a song, recorded by a user; obtains the standard acoustic feature information of the song from a pre-established acoustic feature template; obtains the duration information of the voice signal; performs a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information to change its acoustic feature information; and stores or outputs the transformed voice signal as a target voice signal. This overcomes the prior-art problems that a large amount of data is needed to train an acoustic model for speech-to-song conversion and that the resulting song does not contain the user's own voice, which leads to low user participation and a poor experience. Without acoustic model training, the user's speech is converted in the time domain into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, performing the time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal, further includes:
dividing the voice signal into tones according to the duration information, and performing the time-domain audio transformation on the divided voice signal according to the standard fundamental frequency information, standard duration information, and standard energy information in the standard acoustic feature information, so that the fundamental frequency information of the transformed voice signal is consistent with the standard fundamental frequency information, the duration information of the transformed voice signal is consistent with the standard duration information, and the energy information of the transformed voice signal is consistent with the standard energy information.
The acoustic feature information may include the fundamental frequency information, duration information, and energy information of the voice signal: the fundamental frequency corresponds to pitch, the duration corresponds to rhythm, and the energy corresponds to volume.
In this embodiment, the voice signal is divided into tones according to its duration information. Preferably, the division follows each character and its corresponding time information, yielding one tone per character, with each tone corresponding to a part of the voice signal. For example, for a song whose lyrics contain 100 characters, the time information of the 1st character is t1b-t1n, that of the 2nd character is t2b-t2n, ..., and that of the 100th character is t100b-t100n; then the part of the voice signal in the period t1b-t1n is the tone of the 1st character, the part in t2b-t2n is the tone of the 2nd character, ..., and the part in t100b-t100n is the tone of the 100th character. Each tone has its own fundamental frequency information, duration information, and energy information. The time-domain audio transformation is then performed tone by tone using the standard fundamental frequency information, standard duration information, and standard energy information, so that, for each tone of the transformed voice signal, its fundamental frequency, duration, and energy information are consistent with the standard fundamental frequency, standard duration, and standard energy information of the corresponding tone in the standard acoustic feature information. That is, the standard acoustic feature information of a song stores, for each character of the lyrics, the fundamental frequency information, duration information, and energy information of the corresponding tone.
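The tone division in the 100-character example above can be sketched directly: each character's time interval selects its slice of the waveform. The function name and the duration-info layout (dicts with 'word', 'start', and 'end' keys) are illustrative assumptions:

```python
def divide_into_tones(samples, sample_rate, duration_info):
    """Split a recorded waveform into per-character 'tones' using the
    duration information, mirroring the t1b-t1n ... t100b-t100n example:
    the slice of the signal inside each character's [start, end) interval
    is that character's tone."""
    tones = []
    for d in duration_info:
        b = int(round(d["start"] * sample_rate))
        n = int(round(d["end"] * sample_rate))
        tones.append({"word": d["word"], "samples": samples[b:n]})
    return tones
```

Each returned tone can then be transformed independently against the corresponding entry in the standard acoustic feature information.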
Embodiment 3
Fig. 3 is a flowchart of the song generation method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, in this embodiment, after obtaining the voice signal, corresponding to a song, recorded by the user, and before updating the acoustic feature information of the voice signal according to the standard acoustic feature information, the method further includes: extracting the spectrum information of the voice signal. Updating the acoustic feature information of the voice signal according to the standard acoustic feature information includes: obtaining the duration information of the voice signal, and dividing the voice signal into tones according to the duration information; and converting the divided voice signal from the time domain to the frequency domain, and updating the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information. Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as a target voice signal includes: obtaining the target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal. As shown in Fig. 3, the method of this embodiment specifically includes the following steps.
S310: obtain a voice signal, corresponding to a song, recorded by a user.
S320: extract the spectrum information of the voice signal.
In this embodiment, the spectrum information of the voice signal corresponds to its timbre and reflects the user's voice characteristics. To preserve the user's voice characteristics during the conversion, so that the generated song carries the user's own voice, the spectrum information of the voice signal is extracted in advance.
S330: obtain the standard acoustic feature information of the song from a pre-established acoustic feature template.
S340: obtain the duration information of the voice signal, and divide the voice signal into tones according to the duration information; convert the divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.
The duration information is obtained and the tones are divided as described in the above embodiments: the duration information can be obtained in the time domain from the waveform and the speech content of the voice signal, and the voice signal is then divided into tones accordingly.
Besides updating the acoustic feature information in the time domain, it can also be updated in the frequency domain. Specifically, taking each divided tone as a unit, the divided voice signal is converted from the time domain to the frequency domain to obtain the representation of each tone in the frequency domain. The acoustic feature information of the frequency-domain voice signal is determined from these representations and updated according to the standard acoustic feature information, yielding the updated acoustic feature information.
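A minimal sketch of the per-tone time-to-frequency conversion and frequency-domain update follows. It assumes the update only rescales the spectrum's overall energy toward the standard value; a real system would also adjust per-bin content for pitch, and would use an FFT rather than this O(n²) DFT. None of this is the patent's implementation:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def update_tone_in_frequency_domain(tone_samples, std_energy):
    """Convert one tone to the frequency domain, rescale its spectrum so
    the tone's RMS energy matches the standard energy, and convert back.
    A uniform gain changes only the energy, not the shape of the spectrum
    (which carries the user's timbre)."""
    X = dft(tone_samples)
    # By Parseval's theorem, the time-domain RMS equals
    # sqrt(sum_k |X_k|^2) / n for an n-point DFT.
    cur = math.sqrt(sum(abs(c) ** 2 for c in X)) / len(X)
    gain = std_energy / cur if cur > 0 else 1.0
    return idft([c * gain for c in X])
```

Keeping the relative magnitudes of the bins untouched is one way to read the requirement that the user's spectrum information (timbre) survives the update.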
S350: a target voice signal is obtained from the updated acoustic feature information and the spectrum information, and the target voice signal is stored or output.
The updated acoustic feature information is that of a professional singer, while the spectrum information reflects the sound characteristics of the user. The target voice signal obtained from the updated acoustic feature information and the spectrum information therefore contains both the user's sound characteristics and the professional singer's acoustic feature information. After the target voice signal is obtained, it can be stored or output.
In the song generation method provided by this embodiment, the voice signal entered by the user and corresponding to a song is obtained and its spectrum information is extracted; the standard acoustic feature information corresponding to the song is obtained from the pre-established acoustic feature template; the duration information corresponding to the voice signal is obtained and the voice signal is divided into tones according to it; the tone-divided voice signal is converted from the time domain to the frequency domain and the acoustic feature information of the resulting frequency-domain voice signal is updated according to the standard acoustic feature information; finally, a target voice signal is obtained from the updated acoustic feature information and the spectrum information and is stored or output. This overcomes the problems of the prior art, where acoustic model training on large amounts of data is needed to convert speech into song and the finally formed song does not contain the user's own voice, resulting in low user participation and a poor experience. Without any acoustic model training, the user's voice can be converted in the frequency domain into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, further, updating the acoustic feature information of the frequency-domain voice signal obtained after conversion according to the standard acoustic feature information includes:
replacing the fundamental frequency information of the frequency-domain voice signal obtained after conversion with the standard fundamental frequency information in the standard acoustic feature information, replacing its duration information with the standard duration information in the standard acoustic feature information, and replacing its energy information with the standard energy information in the standard acoustic feature information.
In the present embodiment, taking each tone as a unit, the acoustic feature information of the frequency-domain voice signal is determined from each tone's frequency-domain representation; this acoustic feature information includes the fundamental frequency information, duration information and energy information of the voice signal. Afterwards, the fundamental frequency, duration and energy information of the frequency-domain voice signal obtained after conversion are replaced with the standard fundamental frequency, standard duration and standard energy information in the standard acoustic feature information, respectively.
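The replacement step can be illustrated with a small sketch. It assumes each tone's acoustic features are held in a dictionary with keys 'f0', 'duration' and 'energy' (hypothetical names — the patent does not specify a data layout), and shows that the user-specific spectrum is left untouched while the three acoustic features take the template's standard values.

```python
def update_acoustic_features(tone_features, standard_features):
    """Replace each tone's fundamental frequency, duration and energy
    with the standard values from the acoustic feature template, while
    keeping any user-specific fields (e.g. the spectrum) unchanged."""
    updated = []
    for tone, standard in zip(tone_features, standard_features):
        new_tone = dict(tone)                  # copy, keeping e.g. the spectrum
        new_tone['f0'] = standard['f0']        # standard fundamental frequency
        new_tone['duration'] = standard['duration']
        new_tone['energy'] = standard['energy']
        updated.append(new_tone)
    return updated

user = [{'f0': 180.0, 'duration': 0.4, 'energy': 0.2, 'spectrum': 'user-A'}]
template = [{'f0': 220.0, 'duration': 0.5, 'energy': 0.3}]
result = update_acoustic_features(user, template)
# result[0] keeps 'spectrum': 'user-A' but takes the template's f0/duration/energy.
```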
On the basis of the above embodiments, further, obtaining the target voice signal from the updated acoustic feature information and the spectrum information includes:
inputting the updated acoustic feature information and the spectrum information into a vocoder, and obtaining the target voice signal restored by the vocoder.
The updated frequency-domain acoustic feature information and the previously obtained spectrum information cannot directly yield the corresponding target voice signal. It is therefore preferred to restore the target voice signal with a vocoder. A vocoder, also known as a speech analysis-synthesis system or a voice-band compression system, is a codec that analyses and synthesizes speech: combined with speech synthesis technology, it can restore the corresponding voice signal from the model parameters of a voice signal.
In the present embodiment, the updated acoustic feature information and the pre-obtained spectrum information can be input into the vocoder, which restores the corresponding target voice signal from the input parameters using its internal speech synthesis technology.
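As a rough illustration of the parameters-to-waveform direction, the sketch below resynthesizes a waveform from per-tone fundamental frequency, duration and energy with plain sinusoids. This is a toy stand-in, not the patent's vocoder: a real vocoder (e.g. WORLD or STRAIGHT) would also use the spectrum information to restore the user's timbre.

```python
import numpy as np

def simple_resynthesis(tones, sr=16000):
    """Greatly simplified vocoder stand-in: build a waveform from
    per-tone fundamental frequency ('f0', Hz), 'duration' (seconds)
    and 'energy' (peak amplitude). Key names are illustrative."""
    pieces = []
    for tone in tones:
        n = int(tone['duration'] * sr)
        t = np.arange(n) / sr
        # One sinusoid at the (standard) fundamental, scaled by the energy.
        pieces.append(tone['energy'] * np.sin(2 * np.pi * tone['f0'] * t))
    return np.concatenate(pieces)

wave = simple_resynthesis([{'f0': 220.0, 'duration': 0.5, 'energy': 0.3},
                           {'f0': 247.0, 'duration': 0.5, 'energy': 0.3}])
# wave holds 16000 samples (two 0.5 s tones at 16 kHz).
```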
Embodiment four
Fig. 4 is a structural schematic diagram of a song generating apparatus according to embodiment four of the present invention. As shown in Fig. 4, the song generating apparatus of this embodiment includes:
a voice signal obtaining module 410, configured to obtain the voice signal entered by the user and corresponding to a song;
an acoustic feature information updating module 420, configured to obtain the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template, and to update the acoustic feature information of the voice signal according to the standard acoustic feature information; wherein the standard acoustic feature information of at least one song is stored in the acoustic feature template; and
a target voice signal determining module 430, configured to store or output the voice signal with the updated acoustic feature information as the target voice signal.
In the song generating apparatus provided by this embodiment, the voice signal obtaining module obtains the voice signal entered by the user and corresponding to a song; the acoustic feature information updating module obtains the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template, in which the standard acoustic feature information of at least one song is stored, and updates the acoustic feature information of the voice signal according to it; and the target voice signal determining module stores or outputs the voice signal with the updated acoustic feature information as the target voice signal. This overcomes the problems of the prior art, where acoustic model training on large amounts of data is needed to convert speech into song and the finally formed song does not contain the user's own voice, resulting in low user participation and a poor experience. Without any acoustic model training, the user's voice can be converted into a song that retains the user's own voice, while good sound quality is also ensured.
On the basis of the above embodiments, further, the acoustic feature information updating module 420 may include:
a first duration information obtaining unit, configured to obtain the duration information corresponding to the voice signal; and
a time-domain audio transformation unit, configured to perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
The target voice signal determining module 430 may specifically include:
a first target voice signal determining unit, configured to store or output the voice signal obtained after the time-domain audio transformation as the target voice signal.
Further, the time-domain audio transformation unit may specifically be configured to:
divide the voice signal into tones according to the duration information, and perform a time-domain audio transformation on the tone-divided voice signal according to the standard fundamental frequency information, standard duration information and standard energy information in the standard acoustic feature information, so that the fundamental frequency information of the transformed voice signal is consistent with the standard fundamental frequency information, the duration information of the transformed voice signal is consistent with the standard duration information, and the energy information of the transformed voice signal is consistent with the standard energy information.
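How such a time-domain transformation could make the duration and energy of one tone consistent with the standard values can be sketched as follows. This is an illustration, not the patent's implementation; matching the standard fundamental frequency in the time domain would additionally require a pitch-synchronous method such as TD-PSOLA, which is beyond this sketch.

```python
import numpy as np

def match_duration_and_energy(segment, target_duration, target_rms, sr=16000):
    """Stretch one tone segment to the standard duration by
    linear-interpolation resampling, then rescale it to the standard
    energy (measured here as RMS)."""
    n_target = int(target_duration * sr)
    # Resample onto n_target evenly spaced points (changes the duration).
    x_old = np.linspace(0.0, 1.0, num=len(segment))
    x_new = np.linspace(0.0, 1.0, num=n_target)
    stretched = np.interp(x_new, x_old, segment)
    # Scale to the standard energy.
    rms = np.sqrt(np.mean(stretched ** 2))
    return stretched * (target_rms / rms) if rms > 0 else stretched

seg = np.sin(2 * np.pi * 200 * np.arange(8000) / 16000)  # a 0.5 s tone
out = match_duration_and_energy(seg, target_duration=0.75, target_rms=0.1)
# out is 12000 samples long (0.75 s) with RMS equal to 0.1.
```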
Further, the apparatus may also include:
a spectrum information extraction module, configured to extract the spectrum information of the voice signal after the voice signal entered by the user and corresponding to a song is obtained and before the acoustic feature information of the voice signal is updated according to the standard acoustic feature information.
The acoustic feature information updating module 420 may also include:
a second duration information obtaining unit, configured to obtain the duration information corresponding to the voice signal; and
a frequency-domain audio transformation unit, configured to divide the voice signal into tones according to the duration information, convert the tone-divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.
The target voice signal determining module 430 may specifically also include:
a second target voice determining unit, configured to obtain the target voice signal from the updated acoustic feature information and the spectrum information, and to store or output the target voice signal.
Further, the frequency-domain audio transformation unit may specifically be configured to:
replace the fundamental frequency information of the frequency-domain voice signal obtained after conversion with the standard fundamental frequency information in the standard acoustic feature information, replace its duration information with the standard duration information in the standard acoustic feature information, and replace its energy information with the standard energy information in the standard acoustic feature information.
Further, the second target voice determining unit may specifically be configured to:
input the updated acoustic feature information and the spectrum information into a vocoder, and obtain the target voice signal restored by the vocoder.
Further, the first duration information obtaining unit and the second duration information obtaining unit may both specifically be configured to:
obtain the lyrics information contained in the voice signal by speech recognition, and obtain the duration information corresponding to the voice signal according to the lyrics information.
Further, the apparatus may also include:
a standard acoustic feature information extraction module, configured to, before the voice signal entered by the user and corresponding to a song is obtained, respectively extract the acoustic feature information of a plurality of recorded songs as the standard acoustic feature information of the corresponding songs; and
an acoustic feature template generating module, configured to store the identification information of the plurality of songs together with the corresponding standard acoustic feature information in the acoustic feature template.
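The template built by these two modules can be pictured as a simple mapping from song identification information to standard acoustic feature information. The sketch below is illustrative only: extract_features is a placeholder for the real per-song analysis (fundamental frequency, duration and energy per tone), and the song IDs are made up.

```python
def build_acoustic_feature_template(recordings):
    """Map each song's identification information to its extracted
    standard acoustic feature information."""
    def extract_features(audio):
        # Placeholder: a real implementation would analyse the recording
        # and return per-tone f0, duration and energy values.
        return {'f0': [], 'duration': [], 'energy': []}

    template = {}
    for song_id, audio in recordings.items():
        template[song_id] = extract_features(audio)
    return template

template = build_acoustic_feature_template({'song-001': None, 'song-002': None})
# Lookup at conversion time is then a plain dictionary access:
standard = template['song-001']
```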
The song generating apparatus provided by the embodiment of the present invention can execute the song generation method provided by any embodiment of the present invention, and has the functional modules corresponding to the executed method as well as its beneficial effects.
Embodiment five
Fig. 5 is a structural schematic diagram of a song generating terminal provided by embodiment five of the present invention. Fig. 5 shows a block diagram of an exemplary song generating terminal 512 suitable for implementing the embodiments of the present invention. The song generating terminal 512 shown in Fig. 5 is only an example and should not impose any restriction on the functions and usage scope of the embodiments of the present invention.
As shown in Fig. 5, the song generating terminal 512 takes the form of a general-purpose computing device. The components of the song generating terminal 512 may include, but are not limited to: one or more processors 516, a memory 528, and a bus 518 connecting the different system components (including the memory 528 and the processors 516).
The bus 518 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The song generating terminal 512 typically comprises a variety of computer-system-readable media. These media can be any usable media accessible by the song generating terminal 512, including volatile and non-volatile media, and removable and non-removable media.
The memory 528 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 530 and/or a cache memory 532. The song generating terminal 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage device 534 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard disk drive"). Although not shown in Fig. 5, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media) can be provided. In these cases, each drive can be connected to the bus 518 through one or more data media interfaces. The memory 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 540 having a set of (at least one) program modules 542 can be stored, for example, in the memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 542 generally execute the functions and/or methods of the embodiments described in the present invention.
The song generating terminal 512 can also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a display 524, etc., where whether the display 524 is configured can be decided according to actual needs), with one or more devices that enable a user to interact with the song generating terminal 512, and/or with any devices (such as a network card, a modem, etc.) that enable the song generating terminal 512 to communicate with one or more other computing devices. Such communication can be carried out through an input/output (I/O) interface 522. Moreover, the song generating terminal 512 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through a network adapter 520. As shown in the figure, the network adapter 520 communicates with the other modules of the song generating terminal 512 through the bus 518. It should be understood that, although not shown in Fig. 5, other hardware and/or software modules can be used in conjunction with the song generating terminal 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage devices, etc.
The processor 516 executes various functional applications and data processing by running the programs stored in the memory 528, for example implementing the song generation method provided by any embodiment of the present invention.
Embodiment six
Embodiment six of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the song generation method provided by the embodiments of the present invention, the method including:
obtaining the voice signal entered by the user and corresponding to a song;
obtaining the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information; wherein the standard acoustic feature information of at least one song is stored in the acoustic feature template; and
storing or outputting the voice signal with the updated acoustic feature information as the target voice signal.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiment of the present invention is not limited to the method operations described above, and can also perform the related operations in the song generation method provided by any embodiment of the present invention.
The computer storage medium of the embodiment of the present invention can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program used by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted with any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
The computer program code for executing the operations of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments only; without departing from the inventive concept, it may also include many other equivalent embodiments, and the scope of the invention is determined by the scope of the appended claims.

Claims (11)

1. A song generation method, characterized by comprising:
obtaining a voice signal entered by a user and corresponding to a song;
obtaining standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating acoustic feature information of the voice signal according to the standard acoustic feature information; wherein standard acoustic feature information of at least one song is stored in the acoustic feature template; and
storing or outputting the voice signal with the updated acoustic feature information as a target voice signal.
2. The method according to claim 1, characterized in that updating the acoustic feature information of the voice signal according to the standard acoustic feature information comprises:
obtaining duration information corresponding to the voice signal, and performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal;
and correspondingly, storing or outputting the voice signal with the updated acoustic feature information as the target voice signal comprises:
storing or outputting the voice signal obtained after the time-domain audio transformation as the target voice signal.
3. The method according to claim 2, characterized in that performing the time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal, comprises:
dividing the voice signal into tones according to the duration information, and performing a time-domain audio transformation on the tone-divided voice signal according to standard fundamental frequency information, standard duration information and standard energy information in the standard acoustic feature information, so that fundamental frequency information of the transformed voice signal is consistent with the standard fundamental frequency information, duration information of the transformed voice signal is consistent with the standard duration information, and energy information of the transformed voice signal is consistent with the standard energy information.
4. The method according to claim 1, characterized in that, after obtaining the voice signal entered by the user and corresponding to the song, and before updating the acoustic feature information of the voice signal according to the standard acoustic feature information, the method further comprises: extracting spectrum information of the voice signal;
updating the acoustic feature information of the voice signal according to the standard acoustic feature information comprises:
obtaining duration information corresponding to the voice signal, and dividing the voice signal into tones according to the duration information; converting the tone-divided voice signal from the time domain to the frequency domain, and updating acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information;
and correspondingly, storing or outputting the voice signal with the updated acoustic feature information as the target voice signal comprises:
obtaining a target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal.
5. The method according to claim 4, characterized in that updating the acoustic feature information of the frequency-domain voice signal obtained after conversion according to the standard acoustic feature information comprises:
replacing fundamental frequency information of the frequency-domain voice signal obtained after conversion with standard fundamental frequency information in the standard acoustic feature information, replacing duration information of the frequency-domain voice signal with standard duration information in the standard acoustic feature information, and replacing energy information of the frequency-domain voice signal with standard energy information in the standard acoustic feature information.
6. The method according to claim 4, characterized in that obtaining the target voice signal from the updated acoustic feature information and the spectrum information comprises:
inputting the updated acoustic feature information and the spectrum information into a vocoder, and obtaining the target voice signal restored by the vocoder.
7. The method according to claim 2 or 4, characterized in that obtaining the duration information corresponding to the voice signal comprises:
obtaining lyrics information contained in the voice signal by speech recognition, and obtaining the duration information corresponding to the voice signal according to the lyrics information.
8. The method according to any one of claims 1 to 6, characterized by, before obtaining the voice signal entered by the user and corresponding to the song, further comprising:
respectively extracting acoustic feature information of a plurality of recorded songs as standard acoustic feature information of the corresponding songs; and
storing identification information of the plurality of songs together with the corresponding standard acoustic feature information in the acoustic feature template.
9. A song generating apparatus, characterized by comprising:
a voice signal obtaining module, configured to obtain a voice signal entered by a user and corresponding to a song;
an acoustic feature information updating module, configured to obtain standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and to update acoustic feature information of the voice signal according to the standard acoustic feature information; wherein standard acoustic feature information of at least one song is stored in the acoustic feature template; and
a target voice signal determining module, configured to store or output the voice signal with the updated acoustic feature information as a target voice signal.
10. A song generating terminal, characterized by comprising:
one or more processors; and
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the song generation method according to any one of claims 1 to 8.
11. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the song generation method according to any one of claims 1 to 8.
CN201810622548.8A 2018-06-15 2018-06-15 Singing voice generation method, singing voice generation device, terminal and storage medium Active CN108831437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810622548.8A CN108831437B (en) 2018-06-15 2018-06-15 Singing voice generation method, singing voice generation device, terminal and storage medium


Publications (2)

Publication Number Publication Date
CN108831437A true CN108831437A (en) 2018-11-16
CN108831437B CN108831437B (en) 2020-09-01

Family

ID=64142414


Country Status (1)

Country Link
CN (1) CN108831437B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920449A (en) * 2019-03-18 2019-06-21 广州市百果园网络科技有限公司 Beat analysis method, audio-frequency processing method and device, equipment, medium
CN110738980A (en) * 2019-09-16 2020-01-31 平安科技(深圳)有限公司 Singing voice synthesis model training method and system and singing voice synthesis method
CN111091807A (en) * 2019-12-26 2020-05-01 广州酷狗计算机科技有限公司 Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN111354332A (en) * 2018-12-05 2020-06-30 北京嘀嘀无限科技发展有限公司 Singing voice synthesis method and device
CN111429881A (en) * 2020-03-19 2020-07-17 北京字节跳动网络技术有限公司 Sound reproduction method, device, readable medium and electronic equipment
CN111445892A (en) * 2020-03-23 2020-07-24 北京字节跳动网络技术有限公司 Song generation method and device, readable medium and electronic equipment
CN111477210A (en) * 2020-04-02 2020-07-31 北京字节跳动网络技术有限公司 Speech synthesis method and device
CN112289300A (en) * 2020-10-28 2021-01-29 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN112420008A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Method and device for recording songs, electronic equipment and storage medium
CN112712783A (en) * 2020-12-21 2021-04-27 北京百度网讯科技有限公司 Method and apparatus for generating music, computer device and medium
CN112837668A (en) * 2019-11-01 2021-05-25 北京搜狗科技发展有限公司 Voice processing method and device for processing voice
CN113593520A (en) * 2021-09-08 2021-11-02 广州虎牙科技有限公司 Singing voice synthesis method and device, electronic equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050169114A1 (en) * 2002-02-20 2005-08-04 Hosung Ahn Digital recorder for selectively storing only a music section out of radio broadcasting contents and method thereof
CN1682278A (en) * 2002-09-17 2005-10-12 皇家飞利浦电子股份有限公司 Method of synthesis for a steady sound signal
CN1719514A (en) * 2004-07-06 中国科学院自动化研究所 High-quality real-time voice conversion method based on speech analysis and synthesis
CN1761993A (en) * 2003-03-20 2006-04-19 索尼株式会社 Singing voice synthesizing method, singing voice synthesizing device, program, recording medium, and robot
EP1185976B1 (en) * 2000-02-25 2006-08-16 Philips Electronics N.V. Speech recognition device with reference transformation means
TWI260582B (en) * 2005-01-20 2006-08-21 Sunplus Technology Co Ltd Speech synthesizer with mixed parameter mode and method thereof
CN1924994A (en) * 2005-08-31 中国科学院自动化研究所 Embedded speech synthesis method and system
CN101064103A (en) * 2006-04-24 中国科学院自动化研究所 Chinese speech synthesis method and system based on syllable prosody constraint relationships
US20130226957A1 (en) * 2012-02-27 2013-08-29 The Trustees Of Columbia University In The City Of New York Methods, Systems, and Media for Identifying Similar Songs Using Two-Dimensional Fourier Transform Magnitudes
CN105244041A (en) * 2015-09-22 2016-01-13 百度在线网络技术(北京)有限公司 Song audition evaluation method and device
CN105845125A (en) * 2016-05-18 2016-08-10 百度在线网络技术(北京)有限公司 Speech synthesis method and speech synthesis device
JP2016206496A (en) * 2015-04-24 2016-12-08 ヤマハ株式会社 Controller, synthetic singing sound creation device and program
CN106652997A (en) * 2016-12-29 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal
CN106971703A (en) * 2017-03-17 西北师范大学 HMM-based song synthesis method and device
CN107863095A (en) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Acoustic signal processing method, device and storage medium
CN108053814A (en) * 2017-11-06 芋头科技(杭州)有限公司 Speech synthesis system and method for simulating a user's singing voice

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JAMES P. KIRBY, "Onset pitch perturbations and the cross-linguistic implementation of voicing: Evidence from tonal and non-tonal languages", Journal of Phonetics *
ZHANG Danfeng et al., "A Review of the Development and Current Research Status of Speech Synthesis Technology" (in Chinese), 《电子信息》 (Electronic Information) *
YANG Nan, "An Automatic Pitch Correction System Based on Spectral Modeling Synthesis" (in Chinese), 《计算机与数字工程》 (Computer & Digital Engineering) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354332A (en) * 2018-12-05 2020-06-30 北京嘀嘀无限科技发展有限公司 Singing voice synthesis method and device
CN109920449B (en) * 2019-03-18 2022-03-04 广州市百果园网络科技有限公司 Beat analysis method, audio processing method, device, equipment and medium
CN109920449A (en) * 2019-03-18 2019-06-21 广州市百果园网络科技有限公司 Beat analysis method, audio processing method and device, equipment, and medium
CN112420008A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Method and device for recording songs, electronic equipment and storage medium
CN110738980A (en) * 2019-09-16 2020-01-31 平安科技(深圳)有限公司 Singing voice synthesis model training method and system and singing voice synthesis method
CN112837668B (en) * 2019-11-01 2023-04-28 北京搜狗科技发展有限公司 Voice processing method and device for processing voice
CN112837668A (en) * 2019-11-01 2021-05-25 北京搜狗科技发展有限公司 Voice processing method and device for processing voice
CN111091807A (en) * 2019-12-26 2020-05-01 广州酷狗计算机科技有限公司 Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN111429881A (en) * 2020-03-19 2020-07-17 北京字节跳动网络技术有限公司 Sound reproduction method, device, readable medium and electronic equipment
CN111429881B (en) * 2020-03-19 2023-08-18 北京字节跳动网络技术有限公司 Speech synthesis method and device, readable medium and electronic equipment
CN111445892A (en) * 2020-03-23 2020-07-24 北京字节跳动网络技术有限公司 Song generation method and device, readable medium and electronic equipment
CN111477210A (en) * 2020-04-02 2020-07-31 北京字节跳动网络技术有限公司 Speech synthesis method and device
CN112289300A (en) * 2020-10-28 2021-01-29 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN112289300B (en) * 2020-10-28 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN112712783A (en) * 2020-12-21 2021-04-27 北京百度网讯科技有限公司 Method and apparatus for generating music, computer device and medium
CN112712783B (en) * 2020-12-21 2023-09-29 北京百度网讯科技有限公司 Method and device for generating music, computer equipment and medium
CN113593520A (en) * 2021-09-08 2021-11-02 广州虎牙科技有限公司 Singing voice synthesis method and device, electronic equipment and storage medium
CN113593520B (en) * 2021-09-08 2024-05-17 广州虎牙科技有限公司 Singing voice synthesizing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108831437B (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN108831437A (en) A kind of song generation method, device, terminal and storage medium
CN106898340B (en) Song synthesis method and terminal
CN111048064B (en) Voice cloning method and device based on single speaker voice synthesis data set
CN110364140B (en) Singing voice synthesis model training method, singing voice synthesis model training device, computer equipment and storage medium
JP2007249212A (en) Method, computer program and processor for text speech synthesis
CN100585663C (en) Language learning system
JP2021110943A (en) Cross-lingual voice conversion system and method
CN108573694A (en) Artificial-intelligence-based corpus expansion and speech synthesis system construction method and device
CN109346043B (en) Music generation method and device based on generative adversarial networks
CN109599090B (en) Method, device and equipment for voice synthesis
CN112102811B (en) Optimization method and device for synthesized voice and electronic equipment
CN113724683B (en) Audio generation method, computer device and computer readable storage medium
JP7497523B2 (en) Method, device, electronic device and storage medium for synthesizing custom timbre singing voice
CN112185340B (en) Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN112289300B (en) Audio processing method and device, electronic equipment and computer readable storage medium
US20120109654A1 (en) Methods and apparatuses for facilitating speech synthesis
CN113948062A (en) Data conversion method and computer storage medium
CN114333758A (en) Speech synthesis method, apparatus, computer device, storage medium and product
JP5706368B2 (en) Speech conversion function learning device, speech conversion device, speech conversion function learning method, speech conversion method, and program
CN112164387A (en) Audio synthesis method and device, electronic equipment and computer-readable storage medium
JP2006139162A (en) Language learning system
CN113421544B (en) Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN114299910B (en) Training method, using method, device, equipment and medium of speech synthesis model
CN114822492B (en) Speech synthesis method and device, electronic equipment and computer readable storage medium
CN116825090B (en) Training method and device for speech synthesis model and speech synthesis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant