CN108806655A - Automatic song generation - Google Patents
- Publication number
- CN108806655A (application number CN201710284144.8A)
- Authority
- CN
- China
- Prior art keywords
- lyrics
- melody
- song
- template
- sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
- G10H2210/125—Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
- G10H2210/151—Music Composition or musical creation using templates, i.e. incomplete musical sections, as a basis for composing
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/005—Non-interactive screen display of musical or status data
- G10H2220/011—Lyrics displays, e.g. for karaoke applications
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/441—Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Auxiliary Devices For Music (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
According to implementations of the present disclosure, a scheme is provided that supports automatic song generation by a machine. In this scheme, a user's input is used to determine the user's creation intention for a song to be generated. A template for the song is generated based on the creation intention; the template indicates the melody of the song and the distribution of the lyrics relative to the melody. Then, based at least in part on the template, the lyrics of the song are generated. In this way, a melody and lyrics that satisfy the user's creation intention and match each other can be created automatically.
Description
Background
Songs are an art form that people appreciate and love, and they are deeply woven into people's lives. Song creation, however, remains a complicated process. Generally, it comprises two major parts: writing the lyrics and composing the melody. Traditional composition requires the composer to have a certain amount of music theory knowledge and to combine it with inspiration and creative experience in order to create a complete song melody. Creating a pleasant melody imposes further requirements in music theory, such as ensuring that the melody and rhythm are unified and that their combination can express a theme and embody a particular musical genre or style. The lyrics, as an important component of a song, must likewise be expressive, fit the theme, and match the melody. Creating a song with a specific style and emotion that expresses a specific subject therefore demands a high level of music theory from the creator.
Summary
According to implementations of the present disclosure, a scheme is provided that supports automatic song generation by a machine. In this scheme, a user's input is used to determine the user's creation intention for a song to be generated. A template for the song is generated based on the creation intention; the template indicates the melody of the song and the distribution of the lyrics relative to the melody. Then, based at least in part on the template, the lyrics of the song are generated. In this way, a melody and lyrics that satisfy the user's creation intention and match each other can be created automatically.
This Summary is provided to introduce, in simplified form, a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
Brief Description of the Drawings
Fig. 1 shows a block diagram of a computing environment in which multiple implementations of the disclosure can be implemented;
Fig. 2 shows a block diagram of an automatic song generation system according to some implementations of the disclosure;
Fig. 3 shows a schematic diagram of the analysis of a user's creation intention according to some implementations of the disclosure;
Fig. 4 shows a block diagram of an automatic song generation system according to other implementations of the disclosure; and
Fig. 5 shows a flowchart of a song generation process according to some implementations of the disclosure.
Throughout the drawings, the same or similar reference signs denote the same or similar elements.
Detailed Description
The disclosure is now discussed with reference to several example implementations. It should be appreciated that these implementations are discussed only to enable those of ordinary skill in the art to better understand and thereby implement the disclosure, and not to imply any limitation on the scope of the subject matter.
As used herein, the term "comprising" and its variants are to be read as open-ended terms meaning "including but not limited to". The term "based on" is to be read as "based at least in part on". The terms "an implementation" and "one implementation" are to be read as "at least one implementation". The term "another implementation" is to be read as "at least one other implementation". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As discussed above, song creation places many requirements on the melody and/or lyrics of a song, and these requirements limit the ability of ordinary people or organizations to create personalized songs. In many situations, an ordinary person or organization that wants a customized song must turn to people or organizations with professional songwriting and composing skills. With the arrival of the computer age, and especially with the continuous progress of artificial intelligence, it is desirable to be able to generate a desired song automatically, for example to generate its melody and/or lyrics.
According to some implementations of the disclosure, a computer-implemented scheme for automatically generating a song is provided. In this scheme, user input such as an image, text, video, and/or audio is used to determine the user's creation intention for the song to be generated. The determined creation intention is further used to guide the generation of a template for the song, such that the generated template indicates the melody of the song and the distribution of the lyrics relative to the melody. Based on the melody and lyrics distribution indicated by the template, the lyrics of the song can then be generated. With this scheme, the generated lyrics already match the melody in the song template and can therefore be directly combined with the melody into a singable song. Moreover, because the lyrics, melody, and/or song are generated from the user's input, they can embody the user's creation intention, making it possible to provide personalized, high-quality songs, lyrics, and/or melodies to the user.
The basic principles and several example implementations of the disclosure are described below with reference to the drawings.
Example Environment
Fig. 1 shows a block diagram of a computing environment 100 in which multiple implementations of the disclosure can be implemented. It should be appreciated that the computing environment 100 shown in Fig. 1 is merely exemplary and does not constitute any limitation on the functionality or scope of the implementations described in the disclosure. As shown in Fig. 1, the computing environment 100 includes a computing device 102 in the form of a general-purpose computing device. Components of the computing device 102 may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.
In some implementations, the computing device 102 may be implemented as various user terminals or service terminals. A service terminal may be a server or a large-scale computing device provided by a service provider. A user terminal is, for example, any type of mobile, fixed, or portable terminal, including a cell phone, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, e-book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. It is also foreseeable that the computing device 102 can support any type of user-facing interface (such as "wearable" circuitry).
The processing unit 110 may be a real or virtual processor and can perform various processing according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the computing device 102. The processing unit 110 may also be referred to as a central processing unit (CPU), microprocessor, controller, or microcontroller.
The computing device 102 typically includes multiple computer storage media. Such media may be any available media accessible to the computing device 102, including but not limited to volatile and non-volatile media and removable and non-removable media. The memory 120 may be volatile memory (such as registers, cache, or random access memory (RAM)), non-volatile memory (such as read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or flash memory), or some combination thereof. The memory 120 may include one or more program modules 122 configured to perform the functions of the various implementations described herein. The modules 122 can be accessed and run by the processing unit 110 to realize the corresponding functions. The storage device 130 may be removable or non-removable media and may include machine-readable media that can be used to store information and/or data and that can be accessed within the computing device 102.
The functions of the components of the computing device 102 can be realized with a single computing cluster or with multiple computing machines that can communicate with each other. The computing device 102 can therefore operate in a networked environment using logical connections to one or more other servers, personal computers (PCs), or other general network nodes. As needed, the computing device 102 can also communicate through the communication unit 140 with one or more external devices (not shown), such as a database 170, other storage devices, servers, or display devices; with one or more devices that enable users to interact with the computing device 102; or with any device (such as a network card or modem) that enables the computing device 102 to communicate with one or more other computing devices. Such communication can be performed via input/output (I/O) interfaces (not shown).
The input device 150 may be one or more of various input devices, such as a mouse, keyboard, trackball, voice input device, or camera. The output device 160 may be one or more output devices, such as a display, loudspeaker, or printer. In some implementations of automatic song generation, the input device 150 receives an input 104 from a user. Depending on the type of content the user wishes to enter, different types of input devices 150 can be used to receive the input 104. The input 104 is provided to the module 122, which determines the user's creation intention for a song based on the input 104 and accordingly generates the corresponding melody and/or lyrics. In some implementations, the module 122 supplies the generated lyrics, the melody, and/or the song composed of the lyrics and the melody as an output 106 to the output device 160. The output device 160 can provide the output 106 in one or more forms such as text, image, audio, and/or video.
Example implementations in which the module 122 automatically generates the lyrics, melody, and song are discussed in more detail below.
Generation of Melody and Lyrics
Fig. 2 shows a block diagram of an automatic song generation system according to some implementations of the disclosure. In some implementations, this system can be implemented as the module 122 in the computing device 102. In the implementation of Fig. 2, the module 122 is used to realize automatic melody and lyrics generation. As shown, the module 122 includes a creation intention analysis module 210, a lyrics generation module 220, and a template generation module 230. According to implementations of the disclosure, the creation intention analysis module 210 is configured to receive the input 104 of a user and to determine, based on the input 104, the user's creation intention 202 for a song to be generated. The input 104 can be received from the user via the input device 150 of the computing device 102 and provided to the creation intention analysis module 210.
In some implementations, the creation intention analysis module 210 can analyze and determine the creation intention 202 based on a specific type of input 104 or on multiple different types of input 104. Examples of the input 104 include text, such as keywords entered by the user, dialogue between people, tags, and various documents containing text. Alternatively or additionally, the input 104 may include images of various formats as well as video and/or audio of various lengths and formats. The user's input can be received via a user interface provided by the input device 150. Thus, according to implementations of the disclosure, the user can control the song to be generated (including its lyrics and/or melody) through simple input, without being required to have much music theory knowledge to guide the generation of the lyrics, melody, and/or song.
The user's creation intention for the song refers to one or more characteristics, embodied in the input 104, that the user expects the generated song to express, including the theme, emotion, key, style, and key elements of the song. For example, if the input 104 is an image of a happy family in which every family member appears joyful, the creation intention analysis module 210 may determine that the user's creation intention is for the generated song to have the theme "family" and to express an overall emotion of "joy".
Depending on the type of the input 104, the creation intention analysis module 210 may use different analysis techniques to extract the creation intention 202 from the input 104. For example, if the input 104 is text, the creation intention analysis module 210 may use natural language processing or text analysis techniques to analyze the theme, emotion, key elements, and so on described in the input text.
In another example, if the input 104 is an image, the creation intention analysis module 210 may use image analysis techniques such as image recognition, face recognition, posture recognition, expression detection, and gender and age detection to analyze information such as the objects and people contained in the image, along with their expressions, postures, and emotions, and thereby determine the theme, emotion, and key elements that the image shows as a whole (for example, the people, objects, environment, and events it contains).
Alternatively or additionally, the creation intention analysis module 210 can also obtain other features associated with the image, such as its size, format, type (for example oil painting, line drawing, clip art, or black-and-white image), overall color, associated tags (added by the user or automatically), and metadata. The creation intention 202 is then analyzed and determined based on the obtained information.
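One plausible way to fold such auxiliary image features into the analysis is to let them refine the intention fields already derived from content analysis. The sketch below is an assumption for illustration only; the specific rules (warm colors suggesting a bright mood, user tags overriding guesses) are not stated in the patent.

```python
def refine_with_metadata(intention: dict, metadata: dict) -> dict:
    # Hypothetical refinement: auxiliary image features (overall color,
    # user-added tags) adjust the intention derived from image content.
    refined = dict(intention)
    # Assumed heuristic: a warm/bright overall color suggests a happy mood
    # when content analysis produced no emotion of its own.
    if metadata.get("overall_color") in ("warm", "bright") and not refined.get("emotion"):
        refined["emotion"] = "joy"
    # Assumed rule: explicit user tags are authoritative hints and override guesses.
    for tag in metadata.get("tags", []):
        if tag in ("happy", "sad", "calm", "excited"):
            refined["emotion"] = tag
    return refined

print(refine_with_metadata({"theme": "family"}, {"overall_color": "warm"}))
# {'theme': 'family', 'emotion': 'joy'}
```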
Fig. 3 shows a schematic diagram of creation intention analysis on an input 104, which in this example is an image. After receiving the image 104, the creation intention analysis module 210 may use face recognition and posture recognition techniques to determine that the image 104 contains multiple people, and thereby determine that the category of the image 104 is "crowd", as indicated by the label 302 in Fig. 3. Further, the creation intention analysis module 210 can analyze the age and gender of each person in the image 104 through gender and age detection, face recognition, and the like (as indicated by the label 304), and can also determine, based on age, gender, and other information (such as facial similarity), that the crowd contained in the image 104 is a family.
In addition, through expression detection, image recognition, and image analysis techniques, it may be determined that the emotion of the people in the image 104 is cheerful and that they are in an outdoor environment. The creation intention analysis module 210 can therefore determine that the user's creation intention may be to create a happy song in praise of family, in whose lyrics elements such as "outdoors", "closeness", and "people" may appear. Of course, the creation intention analysis module 210 can also go on to determine information such as the type, format, and size of the image 104 to further assist in determining the creation intention.
In other examples, if the input 104 includes audio and/or video, the creation intention analysis module 210 may use speech analysis techniques (for audio and video) and image analysis techniques (for video) to determine the specific content contained in the input audio and/or video. For example, the speech in the audio and/or video can be converted to text, which is then analyzed using the natural language processing or text analysis techniques mentioned above. The image analysis techniques described above can be used to analyze one or more frames of the video. Furthermore, the spectral characteristics of the speech in the audio and/or video can be analyzed to determine the emotion of the people appearing in the audio and/or video, to identify the theme of the speech, and so on.
It should be appreciated that various existing analysis techniques for text, images, audio, and/or video, or techniques developed in the future, may be used to perform the task of creation intention analysis, as long as the technique can extract from the input of the respective type one or more aspects that can influence song creation. In some implementations, the input 104 can include multiple types of input, and a corresponding analysis technique can accordingly be used for each type. The analysis results obtained from the different types of input can be combined to determine the creation intention 202. In some implementations, if the input 104 contains an indication of a specific creation intention, such as an indication of the song's style or emotion, key elements of the lyrics, or the distribution of a part of the melody and/or lyrics of the song, that specific creation intention can be extracted from the input 104. Although some examples of creation intentions are listed here, it should be appreciated that other aspects influencing the characteristics of the song can also be derived from the user's input; the scope of the disclosure is not limited in this respect.
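The combination of per-type analysis results described above can be sketched as a simple merge. The merge policy (later results fill gaps, element lists are unioned) is an assumption for illustration; the patent does not specify one.

```python
def merge_intentions(results: list[dict]) -> dict:
    # Combine the analysis results obtained from different types of input 104
    # into a single creation intention 202 (sketch). Assumed policy:
    # element lists are unioned; scalar fields keep the first value seen.
    merged: dict = {"elements": []}
    for result in results:
        for key, value in result.items():
            if key == "elements":
                merged["elements"].extend(
                    v for v in value if v not in merged["elements"])
            elif merged.get(key) is None:
                merged[key] = value
    return merged

combined = merge_intentions([
    {"theme": "family", "elements": ["outdoor"]},            # from an image
    {"emotion": "joy", "elements": ["outdoor", "togetherness"]},  # from audio
])
print(combined)
# {'elements': ['outdoor', 'togetherness'], 'theme': 'family', 'emotion': 'joy'}
```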
With continued reference to Fig. 2, the creation intention 202 determined by the creation intention analysis module 210 can be passed, for example as keywords, to the template generation module 230. The template generation module 230 is configured to generate a template 204 for the song based on the creation intention 202. The template 204 of the song can at least indicate the melody of the song, and the melody can be represented as phoneme durations, a pitch track, a loudness track, and various other parameters used to generate the melody. In addition, the template 204 can indicate the distribution of the lyrics relative to the melody, including the number of lyric words in each bar and the duration, pitch track, and loudness track of each phoneme of each word. The lyrics distribution in the template 204 thus matches the melody, so that the song composed of the lyrics and melody generated in this way can easily be sung.
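A minimal data-structure sketch of such a template, under the assumption that pitch and loudness contours are stored as sampled value lists, might look like this. The class and field names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class PhonemeSlot:
    # One phoneme position in the melody: its duration (seconds) plus
    # pitch (Hz) and loudness contours sampled over that duration.
    duration: float
    pitch_track: list[float]
    loudness_track: list[float]

@dataclass
class Bar:
    # Lyrics distribution for one bar: how many lyric words it carries
    # and the phoneme slots those words must fill.
    word_count: int
    slots: list[PhonemeSlot] = field(default_factory=list)

@dataclass
class SongTemplate:
    # Template 204: the melody together with the lyrics distribution over it.
    bars: list[Bar] = field(default_factory=list)

    def total_duration(self) -> float:
        return sum(slot.duration for bar in self.bars for slot in bar.slots)

tpl = SongTemplate(bars=[Bar(word_count=4,
                             slots=[PhonemeSlot(0.5, [440.0], [0.8])] * 8)])
print(tpl.total_duration())  # 4.0
```

Because every lyric word is pinned to phoneme slots with known durations and contours, lyrics generated against this structure are guaranteed to fit the melody, which is the matching property the text emphasizes.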
In some realizations, may be implemented to determine and store multiple predefined song templates, referred to as " candidate template ".This
When, template generation module 230 can be configured as based on creation intention 202 selected from this multiple candidate template template 204 with
Generation for current song.Multiple candidate templates can be obtained from existing song.For example, can be by the rotation of existing song
The lyrics of rule and existing song relative to melody distribution directly or through being determined as one or more after manually adjusting
Candidate template.In other examples, one or more candidate templates can be by having the people of music theory knowledge creation.In addition, one
A or multiple candidate templates can also be provided by user, such as created by user or obtained from other sources.Multiple candidate templates
It can be obtained ahead of time and be stored in storage device for using.For example, multiple candidate templates can be stored in calculating
The storage device 130 of equipment 102 is used as local data, and/or can be stored in 102 addressable external data of computing device
In library 170.
The music style, tune, rhythm, and emotion of each candidate template are known and can be recorded, for example, in the form of labels. Accordingly, the template generation module 230 can select a matching candidate template as the template 204 from the multiple candidate templates based on information such as the theme, emotion, and elements included in the creation intention 202. The template generation module 230 can select the template 204 to be used by comparing the creation intention 202 with the label information associated with the candidate templates (which records the music style, tune, rhythm, emotion, and so on of each candidate template). For example, if the creation intention 202 indicates that the theme of the song to be generated is "family" and the emotion to be expressed is "joy", a candidate template with a more joyful emotion and a more lively tune and rhythm can be selected. In some implementations, two or more candidate templates can be determined based on the creation intention 202 and presented for selection by the user, and the template 204 to be used is then determined from the received user selection.
As an alternative or supplement to predefined candidate templates, in other implementations the template generation module 230 can also generate the template 204 to be used in real time based on the creation intention 202. Specifically, the template generation module 230 can divide the melodies of one or more existing songs into multiple melody segments in advance. Such division can be based on one or more syllables of the melody, and the segments can have the same or different lengths. Manual division of the songs by professionals is also feasible. The melody segments obtained by the division serve as the basis for subsequent melody generation, and can be partially or entirely stored in the local storage device 130 of the computing device 102 and/or in an accessible external device, such as the database 170. After the creation intention analysis module 210 receives the creation intention 202, the template generation module 230 can select melody segments based on the creation intention 202 and combine them into a complete melody. When combining melody segments, the melody should not only satisfy the creation intention 202 but also transition smoothly between segments, so that the whole melody sounds more pleasing. The criterion of "smoothness" and its judgment will be described in detail below.
Specifically, the template generation module 230 can select two or more candidate melody segments from the pre-divided melody segments, and then splice at least two of the candidate melody segments into a melody based on the smoothness between them. The selection of candidate melody segments can be based on the creation intention 202, so that the selected candidate melody segments, individually and/or in combination, express the creation intention 202. For example, if the creation intention 202 indicates that the emotion of the song to be generated is "joy", melody segments that can express a happy mood can be selected from the pre-divided melody segments as candidate melody segments. If the creation intention 202 also indicates other aspects that influence the song creation, melody segments can be selected accordingly as well.
In some implementations, the pre-divided melody segments can be classified and labeled, and candidate melody segments can then be determined by comparing the classes and labels with the creation intention 202. In other implementations, a pre-selection model can be defined and trained in advance to perform the selection of candidate melody segments. The pre-selection model can be trained to select candidate melody segments corresponding to an input creation intention 202 (for example, in the form of keywords). The pre-selection model can be trained on training data consisting of different training creation intentions and the melody segments known to match those intentions. Furthermore, some negative samples (that is, creation intentions paired with melody segments that do not match them) can also be used to train the model, so that the model gains the ability to distinguish correct from incorrect results. The pre-selection model can be partially or entirely stored in the local storage device 130 of the computing device 102 and/or in an accessible external device, such as the database 170.
As described above, smooth transitions between melody segments are important to the quality of the created song. Among the candidate melody segments, the template generation module 230 can determine the smoothness between each pair of candidate melody segments to decide whether those two candidate melody segments can be spliced together. The smoothness between adjacent candidate melody segments can be determined using various techniques; examples of such techniques include, but are not limited to, measuring the pitch trajectories of the melody within the segments, the continuity between the corresponding pitch trajectories, and/or other aspects that affect a listener's perception.
In some implementations, the template generation module 230 can use a predefined smoothness judgment model to determine whether two candidate melody segments transition smoothly in terms of sound. The smoothness judgment model can be designed to output a smoothness value based on various acoustic parameters (spectrum, frequency, loudness, duration, and so on) of the input melody segments. The output can be a smoothness metric within a certain range, or an indication of whether the two input melody segments are smooth (for example, a value of 1 or 0). The training data for the smoothness judgment model may include adjacent melody segments from existing songs (as positive samples) and melody segments selected at random from multiple segments of existing songs (as negative samples). In some examples, such a model can be based on various neural network models (such as a model based on a DNN or on long short-term memory (LSTM)), or on any other model capable of performing the smoothness judgment. The template generation module 230 can input two candidate melody segments to the smoothness judgment model, and determine whether the two candidate melody segments are smooth, and thus whether they can be spliced, by comparing the result output by the model with a predetermined threshold (or checking whether the result indicates smoothness).
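The patent's smoothness judgment uses a trained model (for example a DNN or LSTM); as a much simpler hedged stand-in, the sketch below scores a splice point only by the pitch gap at the boundary between two segments. The MIDI pitch representation and the threshold value are illustrative assumptions:

```python
# Stand-in for the smoothness judgment model: a large pitch leap at the
# joint between two melody segments is treated as "not smooth".

def splice_smoothness(segment_a, segment_b):
    """Segments are lists of MIDI pitch numbers; a smaller gap is smoother."""
    gap = abs(segment_a[-1] - segment_b[0])
    return 1.0 / (1.0 + gap)  # 1.0 for a seamless joint, falling toward 0

def can_splice(segment_a, segment_b, threshold=0.2):
    """Mimics comparing the model output with a predetermined threshold."""
    return splice_smoothness(segment_a, segment_b) >= threshold

seg1 = [60, 62, 64]   # C4 D4 E4
seg2 = [65, 67, 65]   # F4 G4 F4: one semitone above seg1's last note
seg3 = [79, 81, 83]   # far above: a jarring 15-semitone leap from seg1
```

A trained model would additionally weigh spectrum, loudness, and duration continuity, which this single-feature heuristic ignores.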
Alternatively or additionally, the template generation module 230 can plan the splicing path of the candidate melody segments, that is, the order in which they are arranged, through Viterbi searching. Based on the smoothness and/or the result of the Viterbi search, the template generation module 230 can thus determine the two or more candidate melody segments to be spliced and their splicing order. These spliced candidate melody segments form the melody indicated by the template 204.
Further, in some implementations, the template generation module 230 can also determine the distribution of the lyrics indicated by the template 204 based on the generated melody. In some implementations, since the melody segments composing the melody are obtained by dividing existing songs, the template generation module 230 can analyze the lyrics corresponding to the spliced candidate melody segments in those songs to determine the lyrics distribution indicated by the template. It will be understood that the lyrics and melody segments in an existing song can be considered to match each other. Therefore, a lyrics distribution that matches the spliced candidate melody segments can easily be obtained by analysis. In other implementations, the distribution of the lyrics relative to the melody can also be determined based on the creation intention 202 and the established melody. After determining the melody and the distribution of the lyrics relative to the melody, the template generation module 230 can obtain the corresponding template 204.
In some implementations, if the creation intention 202 includes explicit instructions from the user about the melody and/or the lyrics distribution, the template generation module 230 also takes these into account when generating the template, so as to obtain a template 204 that clearly embodies those creation intentions. To further enhance the user experience, the template selected or generated based on the creation intention 202 can first be presented to the user as an intermediate template. The template generation module 230 then receives the user's modifications to the melody and/or lyrics distribution of the intermediate template, and obtains the final template 204 based on these modifications.
The template 204 determined by the template generation module 230 is used to guide the lyrics generation of the lyrics generation module 220. Specifically, the lyrics generation module 220 is configured to generate the lyrics of the song based on the template 204. Since the template 204 indicates the distribution of the lyrics relative to the melody, the lyrics generation module 220 can generate lyrics that match that distribution. For example, the number of words in each bar of the lyrics, and the duration, pitch trajectory, and loudness trajectory of each phoneme of each word, all match what the distribution indicates, so that the generated lyrics and melody can make up a song that can be sung. In addition, the lyrics generation module 220 can also obtain the creation intention 202 from the creation intention analysis module 210, and further generate the lyrics based on the creation intention 202. The creation intention can guide the lyrics generation module 220 so that the generated lyrics also express the corresponding theme, emotion, and/or various key elements.
In some implementations, the lyrics generation module 220 can compare one or more existing lyrics with the distribution indicated by the template 204. The existing lyrics may include lyrics contained in various existing songs, or singable texts such as written poems. If some existing lyrics match the distribution indicated by the template 204, those lyrics can be selected. In some cases, the lyrics generation module 220 can also divide one or more existing lyrics into multiple lyric fragments, and determine whether each lyric fragment matches a corresponding part of the distribution indicated in the template. The lyrics of the song are then formed by combining the multiple matching lyric fragments. When the creation intention 202 is additionally considered, the lyrics generation module 220 can also select lyric fragments based on the creation intention 202, so that the selected lyric fragments, individually or in combination, embody one or more aspects of the creation intention 202.
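The fragment-matching step described above can be sketched with a deliberately reduced notion of "distribution": only the word count per bar. The fragment texts and per-bar counts are invented, and the patent's real matching also considers phoneme durations and the pitch and loudness trajectories:

```python
# Match candidate lyric fragments against the per-bar word counts a
# template indicates. Each fragment is a list of lines, one line per bar.

TEMPLATE_WORDS_PER_BAR = [4, 4, 5]

LYRIC_FRAGMENTS = [
    ["the sun is warm", "the wind is soft", "we sing through the night"],
    ["hello world", "again", "short"],
]

def matches_template(fragment_lines, words_per_bar):
    """True if the fragment has one line per bar with the required word count."""
    return (
        len(fragment_lines) == len(words_per_bar)
        and all(len(line.split()) == n
                for line, n in zip(fragment_lines, words_per_bar))
    )

selected = [f for f in LYRIC_FRAGMENTS
            if matches_template(f, TEMPLATE_WORDS_PER_BAR)]
```

Only the first fragment fits the 4/4/5 pattern, so it alone survives the filter; a fuller version would score partial matches instead of requiring exact counts.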
In other implementations, the lyrics generation module 220 can use a predefined lyrics generation model to generate the lyrics. Such a lyrics generation model can be trained to generate different lyrics according to different song templates (for example, different lyrics distributions). Using such a lyrics generation model, lyrics matching the lyrics distribution indicated by the template 204 can be obtained. For example, the number of words in each bar of the lyrics, and the duration, pitch trajectory, and loudness trajectory of each phoneme of each word, all match what the distribution indicates, so that the generated lyrics and melody can make up a song that can be sung.
Alternatively or additionally, the lyrics generation model can be trained to generate corresponding lyrics based on inputs covering many different aspects of the creation intention 202, so that the lyrics embody one or more aspects of the creation intention, such as conforming to the corresponding song theme, expressing the song's emotion, and/or containing certain key elements. In some implementations, if the creation intention 202 that the lyrics generation module 220 obtains from the creation intention analysis module 210 does not cover all of the aspects of creation intention required by the lyrics generation model (because of the limited input 104 of the user), the missing values can be set to empty, so that the lyrics generation module 220 can still use the limited creation intention 202 (and the template 204 of the song) as input to the lyrics generation model to generate the lyrics. It should be appreciated that in some implementations, if the creation intention 202 includes explicit instructions from the user about the lyrics, such as key elements or words the lyrics should contain, the lyrics generation module 220 also takes these into account when generating the lyrics, so as to obtain lyrics that clearly embody those creation intentions.
In some examples, the lyrics generation model can be built on a neural network model, such as a recurrent neural network (RNN), or on other learning models. The lyrics generation model can be trained using multiple existing lyrics. The existing lyrics may include lyrics contained in various existing songs, or singable texts such as written poems. During training, the existing lyrics can be classified into different themes, styles, and/or contents. The lyrics generation model is trained such that, upon receiving a specific template and/or creation intention, it can generate the corresponding lyrics. Thus, specific templates and creation intentions serve as training data for the lyrics generation model, so that the model can learn from the training data the ability to generate lyrics for a specific template and/or creation intention. The trained lyrics generation model can be partially or entirely stored in the local storage device 130 of the computing device 102 and/or in an accessible external device, such as the database 170. It should be appreciated that various known and/or future-developed model structures and/or training methods may be used to obtain the lyrics generation model; the scope of the present disclosure is not limited in this respect.
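The patent builds its lyrics generation model on neural networks such as an RNN; training one is beyond a short example, so the sketch below plainly substitutes a bigram Markov chain to illustrate the core idea of learning word transitions from multiple existing lyrics and then generating a new line. The tiny corpus is invented:

```python
# Stand-in for a lyrics generation model: learn bigram transitions from
# existing lyrics, then sample a line of a requested word length.
import random

CORPUS = [
    "the moon rises over the sea",
    "the sea sings under the moon",
]

def train_bigrams(lines):
    """Map each word to the list of words observed to follow it."""
    table = {}
    for line in lines:
        words = line.split()
        for a, b in zip(words, words[1:]):
            table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=0):
    """Walk the bigram table from `start` until `length` words are emitted."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and out[-1] in table:
        out.append(rng.choice(table[out[-1]]))
    return " ".join(out)

table = train_bigrams(CORPUS)
line = generate(table, "the", 5)
```

An RNN or LSTM replaces the bigram table with learned hidden state, and a real system would additionally condition generation on the template's distribution and the creation intention.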
After the lyrics are selected from existing lyrics and/or generated by the lyrics generation model, in some implementations the lyrics generation module 220 can directly provide them as the output 106. Alternatively, the user can be allowed to modify the automatically generated lyrics. The lyrics generation module 220 can first output the lyrics selected from existing lyrics and/or generated by the lyrics generation model to the user as candidate lyrics, for example displayed as text and/or played as audio via the output device 160. The user can input a modification indication 206 for the candidate lyrics through the input device 150. Such a modification indication 206 can indicate adjustments to one or more words in the candidate lyrics, such as replacing those words with other words or changing the order of the words. After receiving the user's input of the modification indication 206 for the lyrics, the lyrics generation module 220 modifies the candidate lyrics based on the input modification indication 206 to obtain the lyrics 106 of the song for output.
The lyrics 106 can be provided to the output device 160 of the computing device 102, and can be output to the user in the form of text and/or audio. In some implementations, the melody in the template 204 generated by the template generation module 230 can also be provided to the output device 160 as the output 106. For example, the melody 106 can be notated in the form of numbered musical notation and/or staff notation and output to the user.
The above discussed automatic melody and lyrics generation. In some alternative implementations, the lyrics may also be combined with the melody indicated by the template 204 to generate a song. Such a song can also be played to the user. An example implementation of automatic song synthesis is described in detail below.
Song Synthesis
Fig. 4 shows a block diagram of the module 122 according to an implementation of automatic song synthesis. In the example of Fig. 4, in addition to automatic lyrics generation, the module 122 can also be used to implement automatic song synthesis based on the lyrics and the melody. As shown in Fig. 4, the module 122 further comprises a song synthesis module 410. The song synthesis module 410 receives the lyrics from the lyrics generation module 220 and the melody indicated by the template from the template generation module 230, and can then combine the received lyrics and melody to generate a song that can be sung.
It should be appreciated that the song synthesis module 410 shown in Fig. 4 is optional. In some cases, the module 122 can, as shown in Fig. 2, provide only the separate lyrics and/or melody. In other cases, the song synthesis module 410 can synthesize the generated lyrics and melody into a song automatically, or in response to a user input (such as a user indication to synthesize the song).
In some implementations, the song synthesis module 410 can simply pair the lyrics with the melody and then output the song 106 to the user. For example, the melody is notated in the form of numbered musical notation or staff notation and shown on the display device, and the lyrics are displayed in association with the melody. The user can sing the song by reading the melody and the lyrics.
In other implementations, the song synthesis module 410 can also determine the voice of a corresponding singer for the song, so that the song 106 can be played directly. Specifically, the song synthesis module 410 can obtain a sound model that represents the vocal characteristics of a singer, and then use the lyrics as input to the sound model to generate the sound spectrum trajectory of the lyrics. In this way, the lyrics can be chanted by the singer represented by the sound model. In order for the singer's recitation of the lyrics to have a certain rhythm, the song synthesis module 410 further synthesizes the sound spectrum trajectory and the melody indicated by the template into a singing waveform of the song, the singing waveform being a rendition of the song that matches the melody.
In some implementations, the song synthesis module 410 can use a vocoder to synthesize the sound spectrum trajectory and the melody together. The resulting singing waveform can be provided to the output device 160 (such as a speaker) of the computing device 102 to play the song. Alternatively, the computing device 102 can supply the singing waveform to other external devices to play the song.
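The patent synthesizes the singing waveform with a vocoder driven by a sound model. As a greatly simplified stand-in, this sketch renders a melody of (MIDI note, duration) pairs into raw PCM samples as plain sine tones; the sample rate and amplitude are illustrative assumptions, and no spectral envelope from a sound model is applied:

```python
# Minimal stand-in for waveform synthesis: each melody note becomes a
# sine tone at the standard MIDI frequency for its pitch.
import math

SAMPLE_RATE = 8000  # Hz, deliberately low for a compact example

def midi_to_hz(note):
    """Standard equal-temperament mapping with A4 (note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def render(melody):
    """melody: list of (midi_note, seconds) -> list of float samples."""
    samples = []
    for note, dur in melody:
        freq = midi_to_hz(note)
        n = int(SAMPLE_RATE * dur)
        for i in range(n):
            samples.append(0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
    return samples

wave = render([(69, 0.1), (72, 0.1)])  # A4 then C5
```

A vocoder would instead combine the F0 contour implied by the melody with the spectral envelope of the sound model's sound spectrum trajectory, producing a voice rather than pure tones.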
The sound model used by the song synthesis module 410 to generate the sound spectrum trajectory of the lyrics can be a predefined sound model. The sound model can be trained using a number of sound clips, so that it can generate a corresponding sound spectrum trajectory based on input text or lyrics. The sound model can be constructed based on, for example, a hidden Markov model (HMM) or various neural-network-based models (such as a model based on a DNN or on long short-term memory (LSTM)). In some implementations, the sound model can be trained using multiple sound clips of a particular singer. In other implementations, the sound model can be trained using the sound clips of multiple different singers, so that the sound model exhibits the average vocal characteristics of these singers. Such a sound model can also be referred to as an average voice model. These predefined sound models can be partially or entirely stored in the local storage device 130 of the computing device 102 and/or in an accessible external device, such as the database 170.
In some cases, the user may wish the song to be sung by a personalized voice. Therefore, in some implementations, the song synthesis module 410 can receive one or more sound clips 402 of a specific singer input by the user, and train the sound model based on those sound clips. In general, the sound clips input by the user may be too limited to train a workable sound model. Therefore, the song synthesis module 410 can use the received sound clips 402 to adapt the predefined average voice model, so that the adapted average voice model also represents the vocal characteristics of the singer in the sound clips 402. Of course, in other implementations, the user may also be required to input enough sound clips of one or more specific singers, so that corresponding sound models can be trained from the voices of this singer or these singers.
Example Process
Fig. 5 shows a flowchart of a process 500 of automatic song generation according to some implementations of the present disclosure. The process 500 can be implemented by the computing device 102, for example in the module 122 of the computing device 102.
At 510, in response to receiving an input of the user, the computing device 102 determines, based on the input, the user's creation intention regarding the song to be generated. At 520, the computing device 102 generates a template for the song based on the creation intention. The template indicates the melody of the song and the distribution of the lyrics relative to the melody. At 530, the computing device 102 generates the lyrics of the song based at least in part on the template. Further, in some implementations, the computing device 102 can further generate the lyrics based on the creation intention.
In some implementations, the process 500 may further include combining the lyrics with the melody indicated by the template to generate the song.
In some implementations, the process 500 may further include: obtaining a sound model representing the vocal characteristics of a singer; generating the sound spectrum trajectory of the lyrics using the sound model; synthesizing the sound spectrum trajectory and the melody indicated by the template into a singing waveform of the song; and playing the song based on the singing waveform.
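Steps 510 to 530 of process 500 can be sketched as a small pipeline in which every function body is a placeholder assumption standing in for the corresponding module of the patent:

```python
# Toy pipeline mirroring process 500: intention -> template -> lyrics.

def determine_intention(user_input):
    # stand-in for the creation intention analysis module 210
    return {"emotion": "joy" if "happy" in user_input else "calm"}

def generate_template(intention):
    # stand-in for the template generation module 230: a melody plus the
    # lyrics distribution, reduced here to words per bar
    return {"melody": [60, 62, 64, 65], "words_per_bar": [2, 2]}

def generate_lyrics(template, intention):
    # stand-in for the lyrics generation module 220: fill each bar's slots
    word = "la" if intention["emotion"] == "joy" else "hum"
    return [" ".join([word] * n) for n in template["words_per_bar"]]

def process_500(user_input):
    intention = determine_intention(user_input)    # step 510
    template = generate_template(intention)        # step 520
    lyrics = generate_lyrics(template, intention)  # step 530
    return template, lyrics

template, lyrics = process_500("a happy family song")
```

The real modules consult trained models and user feedback at each step; the sketch only shows how their outputs chain together.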
In some implementations, obtaining the sound model includes: receiving sound clips of a singer; and obtaining the sound model by using the received sound clips to adapt a predefined average voice model, the average voice model being obtained using the sound clips of multiple different singers.
In some implementations, generating the template based on the creation intention includes: selecting the template from multiple candidate templates based on the creation intention.
In some implementations, generating the template based on the creation intention includes: dividing the melody of at least one existing song into multiple melody segments; selecting multiple candidate melody segments from the multiple melody segments based on the creation intention; splicing at least two of the multiple candidate melody segments based on the smoothness between the candidate melody segments, to form the melody indicated by the template; and determining the distribution of the lyrics indicated by the template relative to the melody by analyzing the lyrics in the songs corresponding to the at least two spliced candidate melody segments.
In some implementations, generating the lyrics includes: generating candidate lyrics based at least in part on the template; and modifying the candidate lyrics based on a received user input to obtain the lyrics.
In some implementations, generating the lyrics includes: obtaining a predefined lyrics generation model, the lyrics generation model being obtained using multiple existing lyrics; and generating the lyrics based on the template using the lyrics generation model.
In some implementations, the input of the user includes at least one of the following: an image, text, video, or audio.
Sample Implementations
Some sample implementations of the present disclosure are listed below.
In one aspect, the present disclosure provides a computer-implemented method, including: in response to receiving an input of a user, determining, based on the input, the user's creation intention regarding a song to be generated; generating a template for the song based on the creation intention, the template indicating the melody of the song and a distribution of the lyrics relative to the melody; and generating the lyrics of the song based at least in part on the template.
In some implementations, generating the lyrics further includes: further generating the lyrics based on the creation intention.
In some implementations, the method further comprises: combining the lyrics with the melody indicated by the template to generate the song.
In some implementations, the method further comprises: obtaining a sound model representing the vocal characteristics of a singer; generating the sound spectrum trajectory of the lyrics using the sound model; synthesizing the sound spectrum trajectory and the melody indicated by the template into a singing waveform of the song; and playing the song based on the singing waveform.
In some implementations, obtaining the sound model includes: receiving sound clips of a singer; and obtaining the sound model by using the received sound clips to adapt a predefined average voice model, the average voice model being obtained using the sound clips of multiple different singers.
In some implementations, generating the template based on the creation intention includes: selecting the template from multiple candidate templates based on the creation intention.
In some implementations, generating the template based on the creation intention includes: dividing the melody of at least one existing song into multiple melody segments; selecting multiple candidate melody segments from the multiple melody segments based on the creation intention; splicing at least two of the multiple candidate melody segments based on the smoothness between the multiple candidate melody segments, to form the melody indicated by the template; and determining the distribution of the lyrics indicated by the template relative to the melody by analyzing the lyrics in the songs corresponding to the at least two spliced candidate melody segments.
In some implementations, generating the lyrics includes: generating candidate lyrics based at least in part on the template; and modifying the candidate lyrics based on a received user input to obtain the lyrics.
In some implementations, generating the lyrics includes: obtaining a predefined lyrics generation model, the lyrics generation model being obtained using multiple existing lyrics; and generating the lyrics based on the template using the lyrics generation model.
In some implementations, the input includes at least one of the following: an image, text, video, or audio.
In another aspect, the present disclosure provides a device. The device includes: a processing unit; and a memory coupled to the processing unit and containing instructions stored thereon which, when executed by the processing unit, cause the device to perform the following actions: in response to receiving an input of a user, determining, based on the input, the user's creation intention regarding a song to be generated; generating a template for the song based on the creation intention, the template indicating the melody of the song and a distribution of the lyrics relative to the melody; and generating the lyrics of the song based at least in part on the template.
In some implementations, generating the lyrics further includes: further generating the lyrics based on the creation intention.
In some implementations, the actions further comprise: combining the lyrics with the melody indicated by the template to generate the song.
In some implementations, the actions further comprise: obtaining a sound model representing the vocal characteristics of a singer; generating the sound spectrum trajectory of the lyrics using the sound model; synthesizing the sound spectrum trajectory and the melody indicated by the template into a singing waveform of the song; and playing the song based on the singing waveform.
In some implementations, obtaining the sound model includes: receiving sound clips of a singer; and obtaining the sound model by using the received sound clips to adapt a predefined average voice model, the average voice model being obtained using the sound clips of multiple different singers.
In some implementations, generating the template based on the creation intention includes: selecting the template from multiple candidate templates based on the creation intention.
In some implementations, generating the template based on the creation intention includes: dividing the melody of at least one existing song into multiple melody segments; selecting multiple candidate melody segments from the multiple melody segments based on the creation intention; splicing at least two of the multiple candidate melody segments based on the smoothness between the multiple candidate melody segments, to form the melody indicated by the template; and determining the distribution of the lyrics indicated by the template relative to the melody by analyzing the lyrics in the songs corresponding to the at least two spliced candidate melody segments.
In some implementations, generating the lyrics includes: generating candidate lyrics based at least in part on the template; and modifying the candidate lyrics based on a received user input to obtain the lyrics.
In some implementations, generating the lyrics includes: obtaining a predefined lyrics generation model, the lyrics generation model being obtained using multiple existing lyrics; and generating the lyrics based on the template using the lyrics generation model.
In some implementations, the input includes at least one of the following: an image, text, video, or audio.
In another aspect, the present disclosure provides a computer program product tangibly stored in a non-transitory computer storage medium and including machine-executable instructions which, when executed by a device, cause the device to: in response to receiving an input of a user, determine, based on the input, the user's creation intention regarding a song to be generated; generate a template for the song based on the creation intention, the template indicating the melody of the song and a distribution of the lyrics relative to the melody; and generate the lyrics of the song based at least in part on the template.
In some implementations, the machine-executable instructions, when executed by the device, further cause the device to: further generate the lyrics based on the creation intention.
In some implementations, the machine-executable instructions, when executed by the device, further cause the device to: combine the lyrics with the melody indicated by the template to generate the song.
In some implementations, the machine-executable instructions, when executed by the device, further cause the device to: obtain a sound model representing the vocal characteristics of a singer; generate the sound spectrum trajectory of the lyrics using the sound model; synthesize the sound spectrum trajectory and the melody indicated by the template into a singing waveform of the song; and play the song based on the singing waveform.
In some implementations, the machine-executable instructions, when executed by the device, cause the device to: receive sound clips of a singer; and obtain the sound model by using the received sound clips to adapt a predefined average voice model, the average voice model being obtained using the sound clips of multiple different singers.
In some implementations, the machine-executable instructions, when executed by the device, cause the device to: select the template from multiple candidate templates based on the creation intention.
In some implementations, the machine-executable instructions, when executed by a device, cause the device to: divide at least one existing song into a plurality of melody segments; select a plurality of candidate melody segments from the plurality of melody segments based on the creation intention; splice at least two of the plurality of candidate melody segments based on smoothness between the plurality of candidate melody segments, to form the melody indicated by the template; and determine the distribution of the lyrics indicated by the template relative to the melody by analyzing lyrics in the songs corresponding to the spliced at least two candidate melody segments.
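The splicing step can be illustrated with a toy smoothness criterion: treat each melody segment as a list of MIDI pitch numbers and join the ordered pair of candidates whose junction has the smallest pitch jump. The representation and criterion are illustrative assumptions only; a real system would also weigh rhythm, key, and phrase boundaries.

```python
from itertools import permutations

def boundary_jump(seg_a, seg_b):
    """Smoothness proxy: absolute pitch jump across the splice point."""
    return abs(seg_b[0] - seg_a[-1])

def splice_smoothest(candidate_segments):
    """Join the ordered pair of candidate melody segments (lists of
    MIDI pitches) whose junction is smoothest, i.e. has the smallest
    pitch jump -- a toy stand-in for the smoothness criterion above."""
    best = min(permutations(candidate_segments, 2),
               key=lambda pair: boundary_jump(*pair))
    return best[0] + best[1]

segments = [[60, 62, 64], [65, 67, 69], [50, 52, 53]]
print(splice_smoothest(segments))  # [60, 62, 64, 65, 67, 69]
```

Here the junction 64 to 65 (one semitone) beats every other ordering, so the first two segments are spliced in that order.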
In some implementations, the machine-executable instructions, when executed by a device, cause the device to: generate candidate lyrics based at least in part on the template; and modify the candidate lyrics based on received user input to obtain the lyrics.
In some implementations, the machine-executable instructions, when executed by a device, cause the device to: obtain a predefined lyrics generation model, the lyrics generation model having been obtained using a plurality of existing lyrics; and generate the lyrics based on the template using the lyrics generation model.
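One simple stand-in for a lyrics generation model trained on existing lyrics is a bigram (Markov) table; the template then dictates how many words each generated line receives. This is an illustrative sketch under that assumption, not the model the patent contemplates:

```python
import random
from collections import defaultdict

def train_lyrics_model(corpus):
    """Build a bigram table from existing lyrics (one string per line)."""
    model = defaultdict(list)
    for line in corpus:
        words = line.split()
        for a, b in zip(words, words[1:]):
            model[a].append(b)
    return model

def generate_line(model, start, length, rng):
    """Walk the bigram table to emit up to `length` words; the per-line
    word count plays the role of the slot that the template's lyric
    distribution would prescribe."""
    words = [start]
    while len(words) < length and model[words[-1]]:
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)

corpus = [
    "the night is young",
    "the night is ours",
    "young hearts run free",
]
model = train_lyrics_model(corpus)
line = generate_line(model, "the", 4, random.Random(0))
print(line)
```

With this tiny corpus every 4-word walk from "the" begins "the night is" and ends in either "young" or "ours"; a real lyrics model would of course be far richer (and would also respect rhyme and syllable counts from the template).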
In some implementations, the input includes at least one of: an image, text, video, or audio.
The functionality described herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-a-Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed merely as example forms of implementing the claims.
Claims (20)
1. A computer-implemented method, comprising:
in response to receiving an input from a user, determining, based on the input, a creation intention of the user for a song to be generated;
generating a template for the song based on the creation intention, the template indicating a melody of the song and a distribution of lyrics relative to the melody; and
generating the lyrics of the song based at least in part on the template.
2. The method according to claim 1, wherein generating the lyrics further comprises:
generating the lyrics further based on the creation intention.
3. The method according to claim 1, further comprising:
combining the lyrics with the melody indicated by the template to generate the song.
4. The method according to claim 1, further comprising:
obtaining a sound model representing vocal characteristics of a singer;
generating a spectrum trajectory of the lyrics using the sound model;
synthesizing the spectrum trajectory with the melody indicated by the template into a singing waveform of the song; and
playing the song based on the singing waveform.
5. The method according to claim 4, wherein obtaining the sound model comprises:
receiving a sound clip of the singer; and
obtaining the sound model by adjusting a predefined average voice model using the received sound clip, the average voice model having been obtained using sound clips of a plurality of different singers.
6. The method according to claim 1, wherein generating the template based on the creation intention comprises:
selecting the template from a plurality of candidate templates based on the creation intention.
7. The method according to claim 1, wherein generating the template based on the creation intention comprises:
dividing at least one existing song into a plurality of melody segments;
selecting a plurality of candidate melody segments from the plurality of melody segments based on the creation intention;
splicing at least two of the plurality of candidate melody segments based on smoothness between the plurality of candidate melody segments, to form the melody indicated by the template; and
determining the distribution of the lyrics indicated by the template relative to the melody by analyzing lyrics in songs corresponding to the spliced at least two candidate melody segments.
8. The method according to claim 1, wherein generating the lyrics comprises:
generating candidate lyrics based at least in part on the template; and
modifying the candidate lyrics based on received user input to obtain the lyrics.
9. The method according to claim 1, wherein generating the lyrics comprises:
obtaining a predefined lyrics generation model, the lyrics generation model having been obtained using a plurality of existing lyrics; and
generating the lyrics based on the template using the lyrics generation model.
10. The method according to claim 1, wherein the input includes at least one of: an image, text, video, or audio.
11. A device, comprising:
a processing unit; and
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, cause the device to perform actions including:
in response to receiving an input from a user, determining, based on the input, a creation intention of the user for a song to be generated;
generating a template for the song based on the creation intention, the template indicating a melody of the song and a distribution of lyrics relative to the melody; and
generating the lyrics of the song based at least in part on the template.
12. The device according to claim 11, wherein generating the lyrics further comprises:
generating the lyrics further based on the creation intention.
13. The device according to claim 11, wherein the actions further comprise:
combining the lyrics with the melody indicated by the template to generate the song.
14. The device according to claim 11, wherein the actions further comprise:
obtaining a sound model representing vocal characteristics of a singer;
generating a spectrum trajectory of the lyrics using the sound model;
synthesizing the spectrum trajectory with the melody indicated by the template into a singing waveform of the song; and
playing the song based on the singing waveform.
15. The device according to claim 14, wherein obtaining the sound model comprises:
receiving a sound clip of the singer; and
obtaining the sound model by adjusting a predefined average voice model using the received sound clip, the average voice model having been obtained using sound clips of a plurality of different singers.
16. The device according to claim 11, wherein generating the template based on the creation intention comprises:
selecting the template from a plurality of candidate templates based on the creation intention.
17. The device according to claim 11, wherein generating the template based on the creation intention comprises:
dividing at least one existing song into a plurality of melody segments;
selecting a plurality of candidate melody segments from the plurality of melody segments based on the creation intention;
splicing at least two of the plurality of candidate melody segments based on smoothness between the plurality of candidate melody segments, to form the melody indicated by the template; and
determining the distribution of the lyrics indicated by the template relative to the melody by analyzing lyrics in songs corresponding to the spliced at least two candidate melody segments.
18. The device according to claim 11, wherein generating the lyrics comprises:
generating candidate lyrics based at least in part on the template; and
modifying the candidate lyrics based on received user input to obtain the lyrics.
19. The device according to claim 11, wherein generating the lyrics comprises:
obtaining a predefined lyrics generation model, the lyrics generation model having been obtained using a plurality of existing lyrics; and
generating the lyrics based on the template using the lyrics generation model.
20. A computer program product, tangibly stored on a non-transitory computer storage medium and comprising machine-executable instructions that, when executed by a device, cause the device to:
in response to receiving an input from a user, determine, based on the input, a creation intention of the user for a song to be generated;
generate a template for the song based on the creation intention, the template indicating a melody of the song and a distribution of lyrics relative to the melody; and
generate the lyrics of the song based at least in part on the template.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710284144.8A CN108806655B (en) | 2017-04-26 | 2017-04-26 | Automatic generation of songs |
PCT/US2018/028044 WO2018200268A1 (en) | 2017-04-26 | 2018-04-18 | Automatic song generation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710284144.8A CN108806655B (en) | 2017-04-26 | 2017-04-26 | Automatic generation of songs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108806655A true CN108806655A (en) | 2018-11-13 |
CN108806655B CN108806655B (en) | 2022-01-07 |
Family
ID=62165623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710284144.8A Active CN108806655B (en) | 2017-04-26 | 2017-04-26 | Automatic generation of songs |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108806655B (en) |
WO (1) | WO2018200268A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684501B (en) * | 2018-11-26 | 2023-08-22 | 平安科技(深圳)有限公司 | Lyric information generation method and device |
CN109783798A (en) * | 2018-12-12 | 2019-05-21 | 平安科技(深圳)有限公司 | Method, apparatus, terminal and storage medium for adding pictures to text information |
CN109815363A (en) * | 2018-12-12 | 2019-05-28 | 平安科技(深圳)有限公司 | Method, apparatus, terminal and storage medium for generating lyric content |
CN109616090B (en) * | 2018-12-24 | 2020-12-18 | 北京达佳互联信息技术有限公司 | Multi-track sequence generation method, device, equipment and storage medium |
CN111782864B (en) * | 2020-06-30 | 2023-11-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Singing audio classification method, computer program product, server and storage medium |
CN112632906A (en) * | 2020-12-30 | 2021-04-09 | 北京达佳互联信息技术有限公司 | Lyric generation method, device, electronic equipment and computer readable storage medium |
CN113793578B (en) * | 2021-08-12 | 2023-10-20 | 咪咕音乐有限公司 | Method, device and equipment for generating tune and computer readable storage medium |
CN113792178A (en) * | 2021-08-31 | 2021-12-14 | 北京达佳互联信息技术有限公司 | Song generation method and device, electronic equipment and storage medium |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1326303A (en) * | 2000-05-25 | 2001-12-12 | 雅马哈株式会社 | Portable communication terminal device with music mixing |
US20030024376A1 (en) * | 2001-08-06 | 2003-02-06 | Yamaha Corporation | Electronic musical apparatus customizing method |
WO2007053917A2 (en) * | 2005-11-14 | 2007-05-18 | Continental Structures Sprl | Method for composing a piece of music by a non-musician |
CN101326569A (en) * | 2005-12-09 | 2008-12-17 | 索尼株式会社 | Music edit device and music edit method |
US20100162879A1 (en) * | 2008-12-29 | 2010-07-01 | International Business Machines Corporation | Automated generation of a song for process learning |
CN102024453A (en) * | 2009-09-09 | 2011-04-20 | 财团法人资讯工业策进会 | Singing sound synthesis system, method and device |
US20110231193A1 (en) * | 2008-06-20 | 2011-09-22 | Microsoft Corporation | Synthesized singing voice waveform generator |
US20120294457A1 (en) * | 2011-05-17 | 2012-11-22 | Fender Musical Instruments Corporation | Audio System and Method of Using Adaptive Intelligence to Distinguish Information Content of Audio Signals and Control Signal Processing Function |
US20130218929A1 (en) * | 2012-02-16 | 2013-08-22 | Jay Kilachand | System and method for generating personalized songs |
CN104485101A (en) * | 2014-11-19 | 2015-04-01 | 成都云创新科技有限公司 | Method for automatically generating music melody on basis of template |
CN104766603A (en) * | 2014-01-06 | 2015-07-08 | 安徽科大讯飞信息科技股份有限公司 | Method and device for building personalized singing style spectrum synthesis model |
CN105161081A (en) * | 2015-08-06 | 2015-12-16 | 蔡雨声 | APP humming composition system and method thereof |
CN105513607A (en) * | 2015-11-25 | 2016-04-20 | 网易传媒科技(北京)有限公司 | Method and apparatus for music composition and lyric writing |
CN105575393A (en) * | 2015-12-02 | 2016-05-11 | 中国传媒大学 | Personalized song recommendation method based on voice timbre |
CN106373580A (en) * | 2016-09-05 | 2017-02-01 | 北京百度网讯科技有限公司 | Singing synthesis method based on artificial intelligence and device |
WO2017058844A1 (en) * | 2015-09-29 | 2017-04-06 | Amper Music, Inc. | Machines, systems and processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptors |
Non-Patent Citations (1)
Title |
---|
TOIVANEN J. et al.: "Automatical composition of lyrical songs", The Fourth International Conference on Computational Creativity, 2013 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903743A (en) * | 2019-01-03 | 2019-06-18 | 江苏食品药品职业技术学院 | Method for automatically generating music rhythm based on a template |
CN110164412A (en) * | 2019-04-26 | 2019-08-23 | 吉林大学珠海学院 | LSTM-based automatic music synthesis method and system |
CN112185321A (en) * | 2019-06-14 | 2021-01-05 | 微软技术许可有限责任公司 | Song generation |
CN112185321B (en) * | 2019-06-14 | 2024-05-31 | 微软技术许可有限责任公司 | Song generation |
CN110808019A (en) * | 2019-10-31 | 2020-02-18 | 维沃移动通信有限公司 | Song generation method and electronic equipment |
CN111161695A (en) * | 2019-12-26 | 2020-05-15 | 北京百度网讯科技有限公司 | Song generation method and device |
CN111680185A (en) * | 2020-05-29 | 2020-09-18 | 平安科技(深圳)有限公司 | Music generation method, music generation device, electronic device and storage medium |
WO2021115311A1 (en) * | 2020-05-29 | 2021-06-17 | 平安科技(深圳)有限公司 | Song generation method, apparatus, electronic device, and storage medium |
CN112699269A (en) * | 2020-12-30 | 2021-04-23 | 北京达佳互联信息技术有限公司 | Lyric display method, device, electronic equipment and computer readable storage medium |
CN112837664A (en) * | 2020-12-30 | 2021-05-25 | 北京达佳互联信息技术有限公司 | Song melody generation method and device and electronic equipment |
CN112837664B (en) * | 2020-12-30 | 2023-07-25 | 北京达佳互联信息技术有限公司 | Song melody generation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108806655B (en) | 2022-01-07 |
WO2018200268A1 (en) | 2018-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108806655A (en) | Automatic song generation | |
CN108806656A (en) | Automatic song generation | |
CN108369799B (en) | Machines, systems, and processes for automatic music synthesis and generation with linguistic and/or graphical icon-based music experience descriptors | |
EP3803846B1 (en) | Autonomous generation of melody | |
Urbain et al. | Arousal-driven synthesis of laughter | |
Bell | The dB in the. db: Vocaloid software as posthuman instrument | |
CN109741724A (en) | Method and apparatus for creating songs, and smart audio device | |
Ben-Tal | Characterising musical gestures | |
CN113178182A (en) | Information processing method, information processing device, electronic equipment and storage medium | |
Zhu et al. | A Survey of AI Music Generation Tools and Models | |
Collins | A funny thing happened on the way to the formula: Algorithmic composition for musical theater | |
CN114974184A (en) | Audio production method and device, terminal equipment and readable storage medium | |
Mesaros | Singing voice recognition for music information retrieval | |
Kleinberger et al. | Voice at NIME: a Taxonomy of New Interfaces for Vocal Musical Expression | |
CN108922505B (en) | Information processing method and device | |
CN112382274A (en) | Audio synthesis method, device, equipment and storage medium | |
Furduj | Virtual orchestration: a film composer's creative practice | |
Thompson IV | Creating Musical Scores Inspired by the Intersection of Human Speech and Music Through Model-Based Cross Synthesis | |
Beňuš et al. | Prosody II: Intonation | |
CN116645957B (en) | Music generation method, device, terminal, storage medium and program product | |
KR102623459B1 (en) | Method, apparatus and system for providing audition event service based on user's vocal evaluation | |
Midtlyng et al. | Voice adaptation by color-encoded frame matching as a multi-objective optimization problem for future games | |
Karipidou et al. | Computer analysis of sentiment interpretation in musical conducting | |
WO2016039463A1 (en) | Acoustic analysis device | |
Wang et al. | Visual signatures for music mood and timbre |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||