CN108806655A

CN108806655A - Song automatically generates

Info

Publication number: CN108806655A
Application number: CN201710284144.8A
Authority: CN
Inventors: 廖勤樱; 杨南; 栾剑; 韦福如; 刘震; 杨子奇; 黄斌
Original assignee: Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2017-04-26
Filing date: 2017-04-26
Publication date: 2018-11-13
Anticipated expiration: 2037-04-26
Also published as: CN108806655B; WO2018200268A1

Abstract

According to the realization of the disclosure, a kind of scheme for supporting the automatic song of machine to generate is provided.In this scenario, the input of user is used for determining creation intention of the user about song to be generated.The template for song is generated based on creation intention, which indicates the distribution of the melody and the lyrics of song relative to melody.Then, it is at least partially based on the template, generates the lyrics of song.Thus, it is possible to automatically create the melody and the lyrics for meeting user's creation intention and being mutually matched.

Description

Song automatically generates

Background technology

Song is people's appreciation and a kind of favorite art form, has been deeply infiltrated into people's lives.However, song Song creation is still a complicated process.Generally, song creation process include write words (that is, generate lyrics) and wrirte music (that is, Generate melody) two major parts.Tradition composition needs composer to have certain music theory knowledge, and combines inspiration and creation Experience creates complete song lyric.Create melodious melody has more requirement in music theory, such as requires to ensure melody With rhythm unification, the combination etc. be capable of Behaviour theme, embody various music styles or style.In addition, weight of the lyrics as song Component part is wanted, is also required to express the meaning, agree with theme and match with song lyric.Therefore, it to create with specific wind Lattice and emotion and the song for showing specific subject require the music theory of creator very high.

Invention content

According to the realization of the disclosure, a kind of scheme for supporting the automatic song of machine to generate is provided.In this scenario, user Input be used for determining creation intention of the user about song to be generated.The template for song is generated based on creation intention, The template indicates the distribution of the melody and the lyrics of song relative to melody.Then, it is at least partially based on the template, generates song The lyrics.Thus, it is possible to automatically create the melody and the lyrics for meeting user's creation intention and being mutually matched.

It is the specific implementation below in order to which simplified form introduces the mark to concept to provide Summary It will be further described in mode.Summary is not intended to identify the key feature or main feature of claimed theme, Also it is not intended to limit the range of claimed theme.

Description of the drawings

Fig. 1 shows the block diagram of the computing environment for the multiple realizations that can be implemented within the disclosure；

Fig. 2 shows the block diagrams that system is generated according to some automatic songs realized of the disclosure；

Fig. 3 shows the schematic diagram analyzed creation intention input by user realized according to some of the disclosure；

Fig. 4 shows that the automatic song realized according to other of the disclosure generates the block diagram of system；And

Fig. 5 shows the flow chart for the process that the song realized according to some of the disclosure generates.

In these attached drawings, same or similar reference mark is for indicating same or similar element.

Specific implementation mode

The disclosure is discussed now with reference to several example implementations.It is realized it should be appreciated that discussing these merely to making It obtains those of ordinary skill in the art and better understood when and therefore realize the disclosure, rather than imply to the range of this theme Any restrictions.

As it is used herein, term " comprising " and its variant will be read as the opening for meaning " to include but not limited to " Formula term.Term "based" will be read as " being based at least partially on ".Term " realization " and " a kind of realization " will be solved It reads to be " at least one realization ".Term " another realization " will be read as " at least one other realization ".Term " first ", " second " etc. may refer to different or identical object.Hereafter it is also possible that other specific and implicit definition.

As discussed above, there are many requirements for the melody of song and/or the lyrics during song creation, these are wanted Seek the possibility for limiting ordinary people or organizing individual characteristic of creating song.In many situations, ordinary people or tissue if it is intended to The song for obtaining customization generally requires to seek help from the people with professional ability of writing words and wrirte music or tissue.With computer age Arrival, in particular with being constantly progressive for artificial intelligence, it is desired to be able to automatically generate desired song, such as generate song Melody and/or the lyrics.

According to some realizations of the disclosure, provide a kind of by the computer-implemented scheme for automatically generating song.At this In scheme, the input of user, image, word, video and/or audio etc. are used for determining user about generation song Creation intention.The creation intention of such input user further be used to instruct the generation of the template of song so that be generated Template instruction song the distribution relative to melody of melody and the lyrics.Further, based on the melody and song indicated by template The distribution of word can generate the lyrics of song.By the scheme of the disclosure, the lyrics generated have been matched with the template of song In melody, therefore the song that can be sung can be directly combined into together with the melody.In addition, the input life based on user At the lyrics, melody and/or song can embody the creation intention of user, enabling provide to the user personalized and high-quality Song, the lyrics and/or the melody of amount.

Below with reference to attached drawing come the basic principle for illustrating the disclosure and several example implementations.

Example context

Fig. 1 shows the block diagram of the computing environment 100 for the multiple realizations that can be implemented within the disclosure.It should be appreciated that Computing environment 100 shown in figure 1 is only exemplary, without should constitute to the function realized described in the disclosure and Any restrictions of range.As shown in Figure 1, computing environment 100 includes the computing device 102 of universal computing device form.Calculating is set Standby 102 component can include but is not limited to one or more processors or processing unit 110, memory 120, storage device 130, one or more communication units 140, one or more input equipments 150 and one or more output equipments 160.

In some implementations, computing device 102 may be implemented as various user terminals or service terminal.Service terminal can To be server, the mainframe computing devices etc. of various service providers offers.User terminal is all any kind of mobile whole in this way End, fixed terminal or portable terminal, including cell phone, multimedia computer, multimedia tablet, internet node, communication Device, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, PCS Personal Communications System (PCS) equipment, personal navigation equipment, personal digital assistant (PDA), audio/video player, digital camera/video camera, positioning Equipment, television receiver, radio broadcast receiver, electronic book equipment, game station or its it is arbitrary combine, including these set Standby accessory and peripheral hardware or its arbitrary combination.It is also foreseeable that computing device 102 can support any kind of be directed to The interface (" wearable " circuit etc.) of user.

Processing unit 110 can be reality or virtual processor and can according to the program stored in memory 120 come Execute various processing.In a multi-processor system, multiple processing unit for parallel execution computer executable instructions are calculated with improving The parallel processing capability of equipment 102.Processing unit 110 can also be referred to as central processing unit (CPU), microprocessor, control Device, microcontroller.

Computing device 102 generally includes multiple computer storage medias.Such medium, which can be computing device 102, to visit Any medium that can be obtained asked, including but not limited to volatile and non-volatile media, removable and non-removable media. Memory 120 can be volatile memory (such as register, cache, random access storage device (RAM)), non-volatile Memory (for example, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory) or its certain group It closes.Memory 120 may include one or more program modules 122, these program modules are configured as executing described herein Various realizations function.Module 122 can be accessed and run by processing unit 110, to realize corresponding function.Storage device 130 can be detachable or non-removable medium, and may include machine readable media, and it can be used to store information And/or it data and can be accessed in computing device 102.

The function of the component of computing device 102 can realize with single computing cluster or multiple computing machines, these meters Calculating machine can be by being communicated.Therefore, computing device 102 can use and other one or more services The logical connection of device, personal computer (PC) or another general networking node is operated in networked environment.It calculates Equipment 102 can also be communicated as desired by communication unit 140 with one or more external equipment (not shown), external Equipment database 170, other storage devices, server, display equipment etc., with one or more so that user sets with calculating The equipment of standby 102 interaction are communicated, or with make appointing for computing device 102 and other one or more computing device communications What equipment (for example, network interface card, modem etc.) is communicated.Such communication can be via input/output (I/O) interface (not shown) executes.

Input equipment 150 can be one or more various input equipments, such as the input of mouse, keyboard, trackball, voice Equipment, camera etc..Output equipment 160 can be one or more output equipments, such as display, loud speaker, printer etc..? In the realization that some automatic songs generate, input equipment 150 receives the input 104 of user.It is expected in input depending on user The type of appearance, different types of input equipment 150 can be used to receive input 104.Input 104 is provided to module 122, with Make module 122 based on input 104 determination users about song creation intention and thus generate corresponding song melody and/ Or the lyrics.In some implementations, module 122 is using the lyrics of generation, melody and/or the song that is made of the lyrics and melody as defeated Going out 106 is supplied to output equipment 160 for output.Output equipment 160 can be with one kind such as word, image, audio and/or videos Or diversified forms provide output 106.

The example implementation discussed more fully below that the lyrics, melody and song are automatically generated in module 122.

The generation of melody and the lyrics

Fig. 2 shows the block diagrams that some automatic songs realized according to the disclosure generate system.In some implementations, this is System can be implemented as the module 122 in computing device 102.In the realization of Fig. 2, module 122 be used to realize automatic melody life It is generated at the lyrics.As shown, module 122 includes creation intention analysis module 210, lyrics generation module 220 and template life At module 230.According to the realization of the disclosure, creation intention analysis module 210 is configured as receiving the input 104 of user, and Creation intention 202 of the user about song to be generated is determined based on the input 104.Input 104 can be via computing device 102 Input equipment 150 is received from user and is provided to creation intention analysis module 210.

In some implementations, creation intention analysis module 210 can be based on certain types of input 104 or a variety of differences Creation intention 202 is analyzed and is determined in the input 104 of type.The example of input 104 can be word, such as pass input by user Dialogue, label between key word, personage, the various documents comprising word etc..Alternatively or additionally, inputting 104 may include The video and/or audio etc. of the image of various formats, various length and format.The use that can be provided via input equipment 150 Family interface receives the input of user.Therefore, according to the realization of the disclosure, user can be allowed to be waited for by being simply input to control The song (lyrics and/or melody that include song) of generation, without require user to have more music theory knowledge guide the lyrics, The generation of melody and/or song.

User refers to that the user being embodied in input 104 it is expected that the song generated can about the creation intention of song The one or more features given expression to, including the theme of song, emotion, keynote, style, key element etc..For example, if defeated Enter 104 be a good fortune of the whole family and the kinsfolk in image expression it is all very happy, then creation intention analysis module 210 can Creation intention to analyze user is that the song title generated to be made to be " family " and integrally to give expression to " joy " Emotion etc..

Depending on the type of input 104, creation intention analysis module 210 may be used different analytical technologies and come from input Creation intention 202 is extracted in 104.For example, if input 104 is word, nature may be used in creation intention analysis module 210 Language Processing or text analysis technique analyze theme described in the word of input, emotion, key element etc..

In another example, if input 104 is image, image knowledge may be used in creation intention analysis module 210 Not, the image analysis technologies such as recognition of face, gesture recognition, expression detection, gender and age detection include to analyze in image The information such as object and personage and the expression of these objects and personage, posture and emotion, and thereby determine that image integrally shows Theme, emotion, the key element (such as image includes people, object, environment, event etc.) gone out.

Alternatively or additionally, creation intention analysis module 210 can also obtain other features associated with image, all The size of such as image, format, type (such as oil painting, stick figure, clip art, black white image), integral color, associated mark Label (can be added by user or automatically be added) and metadata etc..Then, analyze and determine creation based on the information of acquisition It is intended to 202.

Fig. 3 shows the schematic diagram of the creation intention analysis to input 104, and it is image to input 104 in this instance.It is receiving To after image 104, recognition of face and gesture recognition technology may be used to determine in image 104 in creation intention analysis module 210 Including multiple personages, and thereby determine that the classification of image 104 belongs to " crowd ", as indicated by the label 302 in Fig. 3.Into One step, creation intention analysis module 210 can also be analyzed every in image 104 by gender and age detection and recognition of face etc. The age of a personage and gender (as indicated by label 304) and be also based on the age, gender and other information (such as Human face similarity degree etc.) determine that crowd that image 104 includes is one family.

In addition, passing through expression detection technique, image recognition technology, image analysis technology etc., it may be determined that people in image 104 The emotion of object is cheerful and light-hearted, and in outdoor environment.Therefore, creation intention analysis module 210 can determine the wound of user Make to be intended to may be to create the happy song for eulogizing family, the members such as " open air ", " close ", " individual " can occur in this song song Element.Certainly, creation intention analysis module 210 can also continue to determine that the information such as type, format, the size of image 104 are come into one Step ground auxiliary determines creation intention.

In other examples, if input 104 includes audio and/or video, creation intention analysis module 210 may be used Speech analysis (being directed to audio and video) and image analysis (for video) technology include to determine in input audio and/or video Particular content.For example, can be by the way that the voice in audio and/or video is converted to word, and use carry above in turn And natural language processing or text analysis technique analyzed.Above-mentioned image analysis technology may be used to video One or more frames are analyzed.Further, it is also possible to be analyzed the spectral characteristic of the voice in audio and/or video to come really The emotion of personage that is showed in accordatura frequency and/or video identifies theme etc. involved by voice.

It should be appreciated that may be used existing or leaved for development various to word, image, audio and/or video in the future In analytical technology come execute creation intention analysis task, so long as technology can be analyzed from the input of respective type The one or more aspects of song creation can be influenced by going out.In some implementations, input 104 can include a plurality of types of Input, and corresponding analytical technology therefore may be used for each type of input and analyze.From different types of The analysis result that input obtains can be combined for determining creation intention 202.In some implementations, if input 104 Include certain keys of instruction of specific creation intention, such as the style of instruction song, emotion etc. or the instruction lyrics Element or a part of melody and/or the lyrics distribution for indicating song, then can extract these specific creation from input 104 It is intended to.Although listing the example of some creation intentions, however, it is to be appreciated that influence can also be analyzed from the input of user Other aspects of the feature of song, the scope of the present disclosure are not limited in this respect.

With continued reference to Fig. 2, the creation intention 202 that creation intention analysis module 210 determines can be passed as keyword To template generation module 230.Template generation module 230 is configured as generating the template for song based on creation intention 202 (template)204.The template 204 of song can at least indicate the melody of song, and melody can be represented as holding for phoneme Continuous time, pitch track, loudness of a sound track and other various parameters for generating melody.In addition, the template 204 of song may be used also To indicate distribution of the lyrics relative to melody, include the lyrics number of words of each trifle, the duration of each phoneme of each word, Pitch track and loudness of a sound track etc..Therefore, the lyrics distribution in template 204 matches with melody so that the song thus generated The song of word and melody composition can easily be sung.

In some realizations, may be implemented to determine and store multiple predefined song templates, referred to as " candidate template ".This When, template generation module 230 can be configured as based on creation intention 202 selected from this multiple candidate template template 204 with Generation for current song.Multiple candidate templates can be obtained from existing song.For example, can be by the rotation of existing song The lyrics of rule and existing song relative to melody distribution directly or through being determined as one or more after manually adjusting Candidate template.In other examples, one or more candidate templates can be by having the people of music theory knowledge creation.In addition, one A or multiple candidate templates can also be provided by user, such as created by user or obtained from other sources.Multiple candidate templates It can be obtained ahead of time and be stored in storage device for using.For example, multiple candidate templates can be stored in calculating The storage device 130 of equipment 102 is used as local data, and/or can be stored in 102 addressable external data of computing device In library 170.

Music style, tune, rhythm, the emotion of candidate template are known, and can for example pass through the form of label It is recorded.Template generation module 230 can be based on letters such as theme, emotion, elements included by creation intention 202 as a result, Breath selects the candidate template to match as template 204 from multiple candidate templates.Template generation module 230 can be based on and time The associated label information of modeling plate (having recorded music style, tune, rhythm, emotion of candidate template etc.) and creation intention Template 204 to be used is selected in 202 comparison.For example, if creation intention 202 indicates that the theme of song to be generated is " family Front yard " and emotion will show " joy ", then the candidate mould that emotion is more happy, tune and rhythm are more active can be selected Plate.In some implementations, two or more candidate blocks can be determined for selection by the user based on creation intention 202, and led to The user received is crossed to select to determine template 204 to be used.

It substitutes predefined candidate template or as supplement, in other realization, template generation module 230 may be used also To generate template 204 to be used in real time based on creation intention 202.Specifically, template generation module 230 can be by one Or multiple existing song lyrics are divided into multiple melody segments in advance.The division of such melody segment can be with the one of melody Based on a or multiple syllables, and there can be any identical or different length.By professional to being carried out by song Artificial division is also feasible.It divides obtained multiple melody segments and is used as the basis that follow-up melody generates, and can Be partly or entirely stored in computing device 102 local memory device 130 and/or addressable external equipment, it is all in full According in library 170.After creation intention analysis module 210 receives creation intention 202, template generation module 230 can be based on Creation intention 202 selects melody segment forming complete melody.When combining melody segment, not only to enable melody Enough meet creation intention 202, but also to allow the transitions smooth between melody segment, so that whole melody sounds more happy Ear.Standard about " smooth " and judgement will specifically describe below.

Specifically, template generation module 230 can select two or more candidate's rotations from the melody segment divided in advance Restrain segment, and the smoothness being then based between candidate melody segment is by therein at least two candidate melody fragment assemblies For melody.The selection of candidate melody segment can be based on creation intention 202, so that selected one or more times Select melody segment can individually and/or combine show the creation intention 202.For example, if the instruction of creation intention 202 waits for The emotion of the song of generation is " joy ", then can select that happy mood can be expressed from the melody segment divided in advance Melody segment is as candidate melody segment.If creation intention 202 also indicates other aspects for influencing song creation, can also phase Melody segment is selected accordingly with answering.

In some implementations, the melody segment that divided in advance can be classified and be labelled, and then can be with base Candidate melody segment is determined in classification and the comparison of label and creation intention 202.It, can also be in advance in other realization Definition trains preselected model to execute the selection of candidate melody segment.The preselected model can be trained to can basis The corresponding candidate melody segment of creation intention 202 (for example, in the form of keyword) selection of input.Different instructions may be used Practice the melody segment that creation intention and known and these creation intentions match and trains the preselected mould as training data Type.Further, it is also possible to some negative samples (that is, some creation intentions and with the unmatched melody segment of these creation intentions) The model is trained, so that the model has the ability of correct judgment and error result.The preselected model can be by part Or it is stored entirely in the local memory device 130 and/or addressable external equipment, such as database 170 of computing device 102 In.

As described above, seamlessly transitting between melody segment is important for the song quality created.It is waiting Among selecting melody segment, template generation module 230 can determine the smoothness between each two candidate's melody segment to determine this Whether two candidate melody segments can be spliced together.Smoothness between neighboring candidate melody segment can use various Technology determines that the example of such technology includes but not limited to：By the pitch track, right for analyzing the melody in melody segment The aspect of the perception of continuity and/or other influences attentive listener between the pitch track answered is measured.

In some implementations, template generation module 230 can determine two using pre-defined smoothness judgment models Whether a candidate's melody segment is acoustically being smoothed transition.The smoothness judgment models can be designed as the rotation based on input Restrain various parameters,acoustics (frequency spectrum, frequency, loudness, duration etc.) output smoothness of segment.Output can be to locate In a certain range of smoothness metric or it can indicate the whether smooth instructions of melody segments (such as 1 or 0 of two inputs Value).Training data for smoothness judgment models as training may include that the adjacent melody segment having in song (is made For positive sample) and the melody segment selected at random from multiple segments of existing song (as negative sample).In some instances, Such model for example can the various models (model that (LSTM) is such as remembered based on DNN or shot and long term) based on neural network Or any other can complete the model etc. of smoothness judgement.Template generation module 230 can be by two candidate melody pieces Section is input to smoothness judgment models, and (or whether result refers to for the comparison of the result based on model output and predetermined threshold Show smooth) to determine two, whether candidate melody segment is smooth and it is thus determined that whether they can be spliced.

Alternatively or additionally, template generation module 230 can also pass through Viterbi search (viterbi searching) Plan the stitching path of candidate melody segment, i.e. candidate melody segment puts in order.Therefore, 230 base of template generation module In smoothness and/or the result of Viterbi search, it may be determined that the candidate melody segment of two or more to be spliced and it Splicing sequence.The candidate melody segment of these splicings forms the melody indicated by template 204.

Further, in some implementations, template generation module 230 is also based on generated melody to determine by mould The distribution of the lyrics indicated by plate 204.In some implementations, since the melody segment of composition melody is divided from existing song It obtains, template generation module 230 can analyze the lyrics in the song corresponding to the candidate melody segment being spliced, really with this Lyrics distribution indicated by solid plate.It will be understood that, can be considered as each other between the lyrics and melody segment in existing song It is matched.Therefore, the lyrics distribution that the candidate melody segment that easily can be analyzed and be spliced matches.In other reality In existing, distribution of the lyrics relative to melody can also be determined based on creation intention 202 and established melody.Determining melody And after the lyrics are relative to the distribution of melody, template generation module 230 can obtain corresponding template 204.

In some implementations, clearly refer to about what melody and/or the lyrics were distributed if creation intention 202 includes user Show, template generation module 230 also takes into account these when generating template, can clearly embody these creation intentions to obtain Template 204.In order to further enhance user experience, the template for being selected or being generated based on creation intention 202 can be first as in Between template be presented to user.Then template generation module 230 receives user and the melody and/or the lyrics of intermediate die plate is distributed Modification, and obtain final template 204 based on these modifications.

The template 204 determined by template generation module 230 be used to instruct the lyrics of lyrics generation module 220 to generate.Tool Body, lyrics generation module 220 is configured as generating the lyrics of song based on template 204.Since template 204 indicates the lyrics Relative to the distribution of melody, therefore the lyrics can be generated as matching with the distribution by lyrics generation module 220.For example, the lyrics In each trifle lyrics number of words, duration, pitch track and the loudness of a sound track of each phoneme of each word and this point Matching indicated by cloth, so that the lyrics and melody that are generated can make up the song that can be sung.In addition, the lyrics Generation module 220 can also obtain creation intention 202 from creation intention analysis module 210, and be based further on creation intention 202 generate the lyrics.The lyrics that creation intention can instruct lyrics generation module 220 to generate also show corresponding theme, emotion And/or various key elements.

In some implementations, lyrics generation module 220 can indicate one or more existing lyrics with template 204 Distribution be compared.The existing lyrics may include the lyrics for including in various existing songs, or the poem etc. write The text that can be sung.If some existing lyrics matches with the distribution indicated by template 204, the song can be selected Word.In some cases, one or more existing lyrics can also be divided into multiple lyrics pieces by lyrics generation module 220 Section, and determine whether corresponding lyrics segment matches with part distribution is indicated in template.Then, by will be matched more A lyrics fragment combination at song the lyrics.When in addition considering creation intention 202, lyrics generation module 220 is also based on Creation intention 202 selects lyrics segment, so that the lyrics segment of selection individually or combines and embodies the creation intention 202 One or more aspects.

In other realization, lyrics generation module 220 can generate model using the predefined lyrics to generate song The generation of word.Such lyrics generate model and can be trained to have the different templates according to song (for example, the different lyrics Distribution) generate the abilities of the different lyrics.Model is generated using such lyrics, the lyrics indicated with template 204 can be obtained It is distributed the lyrics to match.For example, the lyrics number of words of each trifle in the lyrics, the duration of each phoneme of each word, Pitch track and loudness of a sound track all with matching indicated by the distribution so that the lyrics and melody that are generated can make up The song that can be sung.

Alternatively or additionally, the lyrics, which generate model, can also be trained to many not Tongfangs based on creation intention 202 The input in face generates the corresponding lyrics so that the lyrics can embody the one or more aspects of creation intention, such as meet phase The song title answered gives expression to song emotion and/or comprising certain key elements.In some implementations, if the lyrics generate mould Block 220 is not covered with the whole that the lyrics generate required by model from the creation intention 202 that creation intention analysis module 210 obtains and creates Make the aspect (because limited input 104 of user) being intended to, then it is empty otherwise value can be set to, so that song Word generation module 220 can generate the defeated of model using limited creation intention 202 (and template 204 of song) as the lyrics Enter to generate the lyrics.It should be appreciated that in some implementations, if creation intention 202 includes user clearly referring to about the lyrics Showing, such as the key element or word that the lyrics include, lyrics generation module 220 also takes into account these when generating template, To obtain the lyrics that can clearly embody these creation intentions.

In some instances, the lyrics generate the model that model can be based on neural network, such as Recognition with Recurrent Neural Network (RNN) Or other learning models and be fabricated.The lyrics are generated model and can be trained to using multiple existing lyrics.It is existing The lyrics may include the text that the lyrics for including or the poem write etc. can be sung in various existing songs.In training When, the existing lyrics can be classified as different themes, style and/or content.Lyrics generation model, which is trained to work as, to be connect When receiving specific template and/or creation intention, the corresponding lyrics can be generated.Therefore, specific template and creation intention As the training data of lyrics generation module, can learn from training data to for specific mould so that the lyrics generate model Plate and/or creation intention generate the ability of the lyrics.The lyrics trained, which generate model, to be partly or entirely stored in calculating The local memory device 130 and/or addressable external equipment of equipment 102, such as database 170.It should be appreciated that may be used It is various known and/or model structure and/or training method leaved for development obtain lyrics generation module, the model of the disclosure in the future It encloses and is not limited in this respect.

By from it is being selected in the existing lyrics and/or by the lyrics generate model generate the lyrics after, in some implementations, Lyrics generation module 220 can directly provide the lyrics as output 106.It is alternatively possible to provide to the user to giving birth to automatically At the lyrics modification.Lyrics generation module 220 can first will be selecting and/or by the lyrics generate model from the existing lyrics The lyrics of generation are exported as the candidate lyrics to user, such as are shown with text via output equipment 160 and/or played with audio To user.The modification that user can be inputted by input equipment 150 to the candidate lyrics indicates 206.Such modification instruction 206 can To indicate the adjustment to one or more of candidate lyrics word, such as replaces these words with other words or change word The sequence of language.After receiving input of the user about the modification instruction 206 of the lyrics, lyrics generation module 220 is based on input The 206 candidate lyrics of modification of modification instruction to obtain the lyrics 106 of song for output.

The lyrics 106 can be provided to the output equipment 160 of computing device 102, and can be with word and/or audio Form is exported to user.In some implementations, the melody in the template 204 that template generation module 230 generates can also be carried Supply output equipment 160 using as output 106.For example, melody 106 can be composed it is defeated for the form of numbered musical notation and/or staff Go out to user.

Automatic melody and the lyrics discussed above generate.In some alternative realizations, the lyrics may be utilized for and template The melody of 204 instructions is combined to generate song.Such song can also be played to user.Automatic song discussed more fully below The example implementation of Qu Hecheng.

The synthesis of song

Fig. 4 shows the block diagram of the module 122 of the realization synthesized according to automatic song.In the example of fig. 4, in addition to automatic Except the lyrics generate, module 122 can be also used for realizing and be synthesized based on the automatic song of the lyrics and melody.As shown in figure 4, module 122 further comprise song synthesis module 410.Song synthesis module 410 from lyrics generation module 220 receives the lyrics and from mould Plate generation module 230 receives the melody of template instruction, and then can be by generate by the lyrics received and melody combination The song sung.

It should be appreciated that song synthesis module 410 shown in Fig. 4 is optional.In some cases, module 122 can be with As shown in Figure 2 is only the lyrics and/or melody for providing separation.In the case of in addition, it can be automatically, or in response to use The input (such as being indicated for the user for synthesizing song) at family and is synthesized the lyrics of generation and melody by song synthesis module 410 For song.

It in some implementations, then will song together with song synthesis module 410 can simply match the lyrics with melody 106 output of song is to user.For example, melody is composed and shown on the display device in the form of numbered musical notation or staff, and will The lyrics are displayed in association with melody.User can be given song recitals by identifying melody and the lyrics.

In other realization, song synthesis module 410 can also be the sound that song determines corresponding chanteur, from And song 106 is directly played.Specifically, lyrics synthesis module 410 can obtain the sound that can indicate chanteur The sound model of sound feature, and then using the lyrics as the input of the sound model, to generate the sound spectrum track of the lyrics. In this way, the lyrics can be chanted by the chanteur represented by the sound model.In order to enable chanteur reads aloud the lyrics Read that there is certain rhythm, the melody of sound spectrum track and template instruction is further synthesized song by lyrics synthesis module 410 Bent performance waveform, the performance waveform are the singing songs indicated with melody matching.

In some implementations, lyrics synthesis module 410 can using vocoder (vocoder) come by sound spectrum track and Melody is synthesized together.It is obtained sing waveform and can be provided to the output equipment 160 of computing device 102 (such as raise one's voice Device) for broadcasting song.Alternatively, singing waveform can also be supplied to other external equipments to play by computing device 102 Song.

The sound model for the sound spectrum track that song synthesis module 410 is used to generate the lyrics can be pre-defined sound Sound model, the sound model can be trained using several sound clips, be generated so as to word or the lyrics that can be based on input Corresponding sound spectrum track.Sound model can be based on such as Hidden Markov (HMM) model or various based on nerve net Model (model that (LSTM) is such as remembered based on DNN or shot and long term) of network etc. is constructed.In some implementations, the sound mould Type can be trained using multiple sound clips of some chanteur.In other realization, which can use The sound clip of multiple and different chanteurs is trained, so that sound model can show the average speech of these chanteurs Feature.Such sound model can also be referred to as average Voice model.These predefined sound models can be by part Or it is stored entirely in the local memory device 130 of computing device 102 and/or addressable external equipment, such as database 170 In.

In some cases, user may expect that song can be given song recitals by personalized sound.Therefore, at some In realization, song synthesis module 410 can receive one or more sound clips 402 of specific chanteur input by user, and And sound model is trained based on the sound clip.In general, sound clip input by user may be limited, it is not enough to be used for Train the sound model that can be worked.Therefore, song synthesis module 410 can be adjusted using the sound clip 402 received Predefined average Voice model, so that the average Voice model after adjustment can also indicate singing in sound clip 402 The sound characteristic of person.Certainly, in other implementations, it may also require that user inputs the enough of one or more specific chanteurs Sound clip so that the voice training for this or these chanteur goes out corresponding sound model.

Instantiation procedure

Fig. 5 shows the flow chart for the process 500 that some automatic songs realized generate according to the disclosure.Process 500 can To be realized by computing device 102, such as can be implemented in the module 122 of computing device 102.

510, computing device 102 determines user about song to be generated in response to receiving the input of user based on input Bent creation intention.520, computing device 102 generates the template for song based on creation intention.The template indicates song The distribution of melody and the lyrics relative to melody.530, computing device 102 is based at least partially on the lyrics of template generation song. Further, in some implementations, computing device 102 can be based further on creation intention to generate the lyrics.

In some implementations, process 500, which may further include, combines the melody of the lyrics and template instruction to generate song It is bent.

In some implementations, process 500 may further include the sound model for obtaining and indicating the sound characteristic of chanteur； The sound spectrum track of the lyrics is generated using sound model；The melody that sound spectrum track and template indicate is synthesized into song Sing waveform；And play song based on waveform is sung.

In some implementations, obtaining sound model includes：Receive the sound clip of chanteur；And by using receiving Sound clip adjust predefined average Voice model to obtain sound model, average Voice model is sung using multiple and different The sound clip of person and obtain.

In some implementations, generating template based on creation intention includes：Based on creation intention, selected from multiple candidate templates Select template.

In some implementations, generating template based on creation intention includes：At least an existing song lyric is divided into Multiple melody segments；Multiple candidate melody segments are selected from multiple melody segments based on creation intention；Based on multiple candidate rotations The smoothness between segment is restrained, by least two candidate melody fragment assemblies in multiple candidate melody segments, to form template The melody of instruction；And by analyzing the lyrics in the corresponding song of be spliced at least two candidate's melody segments, to determine Distribution of the lyrics of template instruction relative to melody.

In some implementations, generating the lyrics includes：It is based at least partially on template generation candidate's lyrics；And based on reception To user input and change the candidate lyrics to obtain the lyrics.

In some implementations, generating the lyrics includes：It obtains the predefined lyrics and generates model, lyrics generation model utilizes more A existing lyrics and obtain；And model is generated using the lyrics, it is based on the template generation lyrics.

In some implementations, the input of user includes at least one of following：Image, word, video or audio.

Sample implementation

It is listed below some sample implementations of the disclosure.

On the one hand, present disclose provides a kind of the method implemented by computer, including：In response to receiving the defeated of user Enter, creation intention of the user about song to be generated is determined based on the input；It is used for based on creation intention generation The template of the song, the template indicate the distribution of the melody and the lyrics of the song relative to the melody；And at least It is based in part on the lyrics of song described in the template generation.

In some implementations, generating the lyrics further includes：The creation intention is based further on to generate the lyrics.

In some implementations, this method further comprises：The melody that the lyrics and the template indicate is combined with life At the song.

In some implementations, this method further comprises：Obtain the sound model for indicating the sound characteristic of chanteur；It utilizes The sound model generates the sound spectrum track of the lyrics；The melody that the sound spectrum track and the template are indicated Synthesize the performance waveform of the song；And the song is played based on the performance waveform.

In some implementations, obtaining the sound model includes：Receive the sound clip of chanteur；And by using connecing The sound clip received adjusts predefined average Voice model to obtain the sound model, the average Voice model It is obtained using the sound clip of multiple and different chanteurs.

In some implementations, generating the template based on the creation intention includes：Based on the creation intention, from multiple The template is selected in candidate template.

In some implementations, generating the template based on the creation intention includes：It will at least an existing song rotation Rule is divided into multiple melody segments；Multiple candidate melody pieces are selected from the multiple melody segment based on the creation intention Section；Based on the smoothness between the multiple candidate melody segment, by least two times in the multiple candidate melody segment Melody fragment assembly is selected, to form the melody of the template instruction；And by analyzing be spliced described at least two The lyrics in the corresponding song of candidate melody segment, to determine the lyrics of the template instruction relative to described in the melody points Cloth.

In some implementations, generating the lyrics includes：It is based at least partially on the template generation candidate lyrics；And It is inputted based on the user received to change the candidate lyrics to obtain the lyrics.

In some implementations, generating the lyrics includes：It obtains the predefined lyrics and generates model, the lyrics generate mould Type is obtained using multiple existing lyrics；And model is generated using the lyrics, based on the lyrics described in the template generation.

In some implementations, the input includes at least one of following：Image, word, video or audio.

On the other hand, present disclose provides a kind of equipment.The equipment includes：Processing unit；And memory, it is coupled to The processing unit and include the instruction being stored thereon, described instruction makes the equipment when being executed by the processing unit Execute following action：In response to receiving the input of user, determine the user about song to be generated based on the input Creation intention；Based on the creation intention generate for the song template, the template indicate the song melody and Distribution of the lyrics relative to the melody；And it is based at least partially on the lyrics of song described in the template generation.

In some implementations, the action further comprises：The melody that the lyrics and the template are indicated combine with Generate the song.

In some implementations, the action further comprises：Obtain the sound model for indicating the sound characteristic of chanteur；Profit The sound spectrum track of the lyrics is generated with the sound model；The rotation that the sound spectrum track and the template are indicated Rule synthesizes the performance waveform of the song；And the song is played based on the performance waveform.

On the other hand, present disclose provides a kind of computer program products, and the computer program product is by visibly It is stored in non-transitory, computer storage medium and includes machine-executable instruction, the machine-executable instruction is by equipment Make the equipment when execution：In response to receiving the input of user, determine the user about song to be generated based on the input Bent creation intention；The template for the song is generated based on the creation intention, the template indicates the rotation of the song Rule and distribution of the lyrics relative to the melody；And it is based at least partially on the lyrics of song described in the template generation.

In some implementations, the machine-executable instruction further makes the equipment when being executed by equipment：Further The lyrics are generated based on the creation intention.

In some implementations, the machine-executable instruction further makes the equipment when being executed by equipment：It will be described The lyrics and the melody of template instruction are combined to generate the song.

In some implementations, the machine-executable instruction further makes the equipment when being executed by equipment：Obtain table Show the sound model of the sound characteristic of chanteur；The sound spectrum track of the lyrics is generated using the sound model；By institute The melody for stating sound spectrum track and template instruction synthesizes the performance waveform of the song；And it is based on the performance wave Shape plays the song.

In some implementations, the machine-executable instruction makes the equipment when being executed by equipment：Receive chanteur's Sound clip；And predefined average Voice model is adjusted to obtain the sound by using the sound clip received Sound model, the average Voice model are obtained using the sound clip of multiple and different chanteurs.

In some implementations, the machine-executable instruction makes the equipment when being executed by equipment：Based on the creation It is intended to, the template is selected from multiple candidate templates.

In some implementations, the machine-executable instruction makes the equipment when being executed by equipment：By it is at least one Some song lyrics are divided into multiple melody segments；Based on the creation intention multiple times are selected from the multiple melody segment Select melody segment；Based on the smoothness between the multiple candidate melody segment, by the multiple candidate melody segment extremely Few two candidate melody fragment assemblies, to form the melody of the template instruction；And by analysis be spliced described in The lyrics in the corresponding song of at least two candidate's melody segments, to determine the lyrics of the template instruction relative to the melody The distribution.

In some implementations, the machine-executable instruction makes the equipment when being executed by equipment：At least partly ground In the template generation candidate lyrics；And it is inputted based on the user received to change the candidate lyrics to obtain the song Word.

In some implementations, the machine-executable instruction makes the equipment when being executed by equipment：It obtains predefined The lyrics generate model, and the lyrics are generated model and obtained using multiple existing lyrics；And generate mould using the lyrics Type, based on the lyrics described in the template generation.

Function described herein can be executed by one or more hardware logic components at least partly.Example Such as, without limitation, the hardware logic component for the exemplary type that can be used includes：Field programmable gate array (FPGA), specially With integrated circuit (ASIC), Application Specific Standard Product (ASSP), system on chip (SOC), complex programmable logic equipment (CPLD) etc. Deng.

Any combinations that one or more programming languages may be used in program code for implementing disclosed method are come It writes.These program codes can be supplied to the place of all-purpose computer, special purpose computer or other programmable data processing units Manage device or controller so that program code makes defined in flowchart and or block diagram when by processor or controller execution Function/operation is carried out.Program code can execute completely on machine, partly execute on machine, as stand alone software Is executed on machine and partly execute or executed on remote machine or server completely on the remote machine to packet portion.

In the context of the disclosure, machine readable media can be tangible medium, can include or be stored for The program that instruction execution system, device or equipment are used or is used in combination with instruction execution system, device or equipment.Machine can It can be machine-readable signal medium or machine-readable storage medium to read medium.Machine readable media can include but is not limited to electricity Son, magnetic, optical, electromagnetism, infrared or semiconductor system, device or equipment or the above any conjunction Suitable combination.The more specific example of machine readable storage medium will include being electrically connected of line based on one or more, portable meter Calculation machine disk, hard disk, random access memory (RAM), read-only memory (ROM), Erasable Programmable Read Only Memory EPROM (EPROM Or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage facilities or Any appropriate combination of the above.

Although in addition, depicting each operation using certain order, this should be understood as requirement operation in this way with shown The certain order that goes out executes in sequential order, or requires the operation of all diagrams that should be performed to obtain desired result. Under certain environment, it may be advantageous for multitask and parallel processing.Similarly, although containing several tools in being discussed above Body realizes details, but these are not construed as the limitation to the scope of the present disclosure.In the context individually realized Certain features of description can also be realized in combination in single realize.On the contrary, described in the context individually realized Various features can also individually or in any suitable subcombination be realized in multiple realizations.

Although having used specific to this theme of the language description of structure feature and/or method logical action, answer When understanding that the theme defined in the appended claims is not necessarily limited to special characteristic described above or action.On on the contrary, Special characteristic described in face and action are only to realize the exemplary forms of claims.

Claims

1. a kind of the method implemented by computer, including：

In response to receiving the input of user, creation intention of the user about song to be generated is determined based on the input；

The template for the song is generated based on the creation intention, the template indicates the melody and lyrics phase of the song Distribution for the melody；And

It is based at least partially on the lyrics of song described in the template generation.

2. according to the method described in claim 1, wherein generating the lyrics and further including：

The creation intention is based further on to generate the lyrics.

3. according to the method described in claim 1, further comprising：

The melody that the lyrics and the template indicate is combined to generate the song.

4. according to the method described in claim 1, further comprising：

Obtain the sound model for indicating the sound characteristic of chanteur；

The sound spectrum track of the lyrics is generated using the sound model；

The melody that the sound spectrum track and the template indicate is synthesized to the performance waveform of the song；And

The song is played based on the performance waveform.

5. according to the method described in claim 4, wherein obtaining the sound model and including：

Receive the sound clip of chanteur；And

Predefined average Voice model is adjusted by using the sound clip received to obtain the sound model, institute Average Voice model is stated to obtain using the sound clip of multiple and different chanteurs.

6. according to the method described in claim 1, wherein including based on the creation intention generation template：

Based on the creation intention, the template is selected from multiple candidate templates.

7. according to the method described in claim 1, wherein including based on the creation intention generation template：

At least an existing song lyric is divided into multiple melody segments；

Multiple candidate melody segments are selected from the multiple melody segment based on the creation intention；

Based on the smoothness between the multiple candidate melody segment, by least two times in the multiple candidate melody segment Melody fragment assembly is selected, to form the melody of the template instruction；And

The lyrics in the corresponding song of at least two candidates melody segment being spliced by analysis, to determine the template The distribution of the lyrics of instruction relative to the melody.

8. according to the method described in claim 1, wherein generating the lyrics and including：

It is based at least partially on the template generation candidate lyrics；And

It is inputted based on the user received to change the candidate lyrics to obtain the lyrics.

9. according to the method described in claim 1, wherein generating the lyrics and including：

It obtains the predefined lyrics and generates model, the lyrics are generated model and obtained using multiple existing lyrics；And

Model is generated using the lyrics, based on the lyrics described in the template generation.

10. according to the method described in claim 1, the wherein described input includes at least one of following：Image, word, video or Audio.

11. a kind of equipment, including：

Processing unit；And

Memory is coupled to the processing unit and includes the instruction being stored thereon, and described instruction is single by the processing Member makes the equipment execute following action when executing：

12. equipment according to claim 11, wherein generating the lyrics and further including：

The creation intention is based further on to generate the lyrics.

13. equipment according to claim 11, wherein the action further comprises：

14. equipment according to claim 11, wherein the action further comprises：

Obtain the sound model for indicating the sound characteristic of chanteur；

The sound spectrum track of the lyrics is generated using the sound model；

The song is played based on the performance waveform.

15. equipment according to claim 14, wherein obtaining the sound model and including：

Receive the sound clip of chanteur；And

16. equipment according to claim 11, wherein including based on the creation intention generation template：

17. equipment according to claim 11, wherein including based on the creation intention generation template：

At least an existing song lyric is divided into multiple melody segments；

18. equipment according to claim 11, wherein generating the lyrics and including：

19. equipment according to claim 11, wherein generating the lyrics and including：

20. a kind of computer program product, the computer program product is tangibly stored in non-transient computer storage and is situated between In matter and include machine-executable instruction, the machine-executable instruction makes the equipment when being executed by equipment：