CN110019919A - Method and device for generating rhymed lyrics - Google Patents
- Publication number
- CN110019919A (CN201710939775.9A)
- Authority
- CN
- China
- Prior art keywords
- lyrics
- scene
- rhymed
- image
- rhyme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
Abstract
Embodiments of the present invention disclose a method and a device for generating rhymed lyrics, used to generate rhymed lyrics automatically from input images. The method comprises: performing scene recognition on each of multiple images input in a terminal, and generating descriptive text matched to the scene corresponding to each of the multiple images; obtaining, from the descriptive text matched to the scene corresponding to each image, the pinyin (Chinese phonetic alphabet) and the rhyme corresponding to the last word of the descriptive text; and generating rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and a device for generating rhymed lyrics.
Background
Music plays an irreplaceable role in people's lives and can be divided into many genres according to rhythm. Hip-hop music (rap music or hip hop) is a style in which words are chanted (rapped) rhythmically over an accompaniment, and the accompaniment is mostly produced by music sampling. At present, music is mainly created manually; for example, hip-hop music is composed by professional hip-hop singers. People without a musical background, however, lack the ability to create music.
To enable music creation with no entry barrier, music that ordinary users can enjoy must be generated, and rhyme design is a vital step in generating it. In the prior art, rhymed text is usually written manually, and the choice of rhyme is also made manually. This approach to rhyme design is time-consuming and cannot generate rhymed lyrics automatically.
Summary of the invention
Embodiments of the present invention provide a method and a device for generating rhymed lyrics, used to generate rhymed lyrics automatically from input images.
To solve the above technical problem, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for generating rhymed lyrics, comprising:
performing scene recognition on each of multiple images input in a terminal, and generating descriptive text matched to the scene corresponding to each of the multiple images;
obtaining, from the descriptive text matched to the scene corresponding to each image, the pinyin (Chinese phonetic alphabet) and the rhyme corresponding to the last word of the descriptive text;
generating rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
In a second aspect, an embodiment of the present invention further provides a device for generating rhymed lyrics, comprising:
a scene recognition module, configured to perform scene recognition on each of multiple images input in a terminal and to generate descriptive text matched to the scene corresponding to each of the multiple images;
a rhyme obtaining module, configured to obtain, from the descriptive text matched to the scene corresponding to each image, the pinyin and the rhyme corresponding to the last word of the descriptive text;
a lyrics generation module, configured to generate rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
A third aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method described in the above aspects.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, scene recognition is first performed on each of multiple images input in a terminal, and descriptive text matched to the scene corresponding to each image is generated. The pinyin and the rhyme corresponding to the last word of the descriptive text are then obtained from the descriptive text matched to the scene of each image. Finally, rhymed lyrics corresponding to the multiple images are generated according to that pinyin and rhyme, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image. The terminal only needs to provide multiple images for image music to be generated: scene recognition is performed on the images, descriptive text suited to each scene is matched automatically, and rhyme design is then applied to that descriptive text, so that the rhymed lyrics generated suit the characteristics of music. Because the rhymed lyrics are generated from the images input in the terminal, the output image music is closely tied to the image content provided by the user, and the rhymed lyrics can be generated automatically from the input images.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them.
Fig. 1 is a schematic flow block diagram of a method for generating rhymed lyrics provided by an embodiment of the present invention;
Fig. 2 is a schematic flow block diagram of a method for generating music provided by an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of the generation of hip-hop music provided by an embodiment of the present invention;
Fig. 4-a is a schematic diagram of a client uploading multiple images provided by an embodiment of the present invention;
Fig. 4-b is a schematic flow diagram of converting rhymed lyrics into speech provided by an embodiment of the present invention;
Fig. 5-a is a schematic structural diagram of a device for generating music provided by an embodiment of the present invention;
Fig. 5-b is a schematic structural diagram of a scene recognition module provided by an embodiment of the present invention;
Fig. 5-c is a schematic structural diagram of a rhyme matching module provided by an embodiment of the present invention;
Fig. 5-d is a schematic structural diagram of a lyrics generation module provided by an embodiment of the present invention;
Fig. 5-e is a schematic structural diagram of a lyrics obtaining module provided by an embodiment of the present invention;
Fig. 5-f is a schematic structural diagram of a speech generation module provided by an embodiment of the present invention;
Fig. 6-a is a schematic structural diagram of a device for generating rhymed lyrics provided by an embodiment of the present invention;
Fig. 6-b is a schematic structural diagram of a scene recognition module provided by an embodiment of the present invention;
Fig. 6-c is a schematic structural diagram of a lyrics generation module provided by an embodiment of the present invention;
Fig. 6-d is a schematic structural diagram of a lyrics obtaining module provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal to which the method for generating rhymed lyrics provided by an embodiment of the present invention is applied.
Detailed description
Embodiments of the present invention provide a method and a device for generating rhymed lyrics, used to generate rhymed lyrics automatically from input images.
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention fall within the protection scope of the present invention.
The terms "include" and "have", and any variants of them, in the description, claims and drawings of this specification are intended to cover non-exclusive inclusion, so that a process, method, system, product or device comprising a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product or device.
Each is described in detail below.
An embodiment of the method for generating rhymed lyrics of the present invention can be applied to generating, from multiple images input by a user, rhymed lyrics that match the descriptive text of those images. Referring to Fig. 1, the method for generating rhymed lyrics provided by an embodiment of the present invention may include the following steps:
101. Perform scene recognition on each of multiple images input in a terminal, and generate descriptive text matched to the scene corresponding to each of the multiple images.
In this embodiment, the user can input into the terminal multiple images used to generate image music. Image music, as used in the embodiments of the present invention, refers to rhythmic music adapted to the multiple images input by the user. The multiple images input in the terminal may have been saved to the terminal in advance by the user, or may be captured by the user in real time with the terminal's camera; for example, the images may be captured after the terminal enters a photographing mode, or obtained from the terminal's photo album. The manner in which the multiple images are input in the terminal is not limited.
In this embodiment, scene recognition can be performed on each of the multiple images input in the terminal to identify the scene corresponding to each image. Image scenes can be classified in many ways; for example, they can be divided into four main classes: landscape, people, food and selfie. Scene recognition is performed on each picture uploaded by the user, and text describing the scene of each image is matched automatically. For example, if an image contains a blue sky and a bird, the descriptive text "bird hovers on blue sky" can be given automatically after scene recognition.
In some embodiments of the present invention, step 101 (performing scene recognition on each of the multiple images input in the terminal and generating descriptive text matched to the scene corresponding to each image) comprises:
A1. Perform scene recognition on the multiple images with a deep learning neural network model to obtain recognized image features, and determine the scenes corresponding to the multiple images according to the image features;
A2. Perform image description generation according to the recognized image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to the scene corresponding to each image.
In this embodiment, a deep learning neural network model can be used to perform scene recognition on the multiple images; this model may also be called a neural image annotation model. The model recognizes image features, and the scenes corresponding to the multiple images are determined according to those features. Image recognition refers to the technology of using a computer to process, analyze and understand images in order to recognize targets and objects of various patterns. Next, image description generation is performed according to the recognized image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to the scene corresponding to each image. In other words, the image scene is recognized with a deep learning neural network, and the related Chinese descriptive text for that scene is matched automatically. Image description generation refers to, based on computer vision, extracting image features with scene and object category information as prior knowledge, and jointly generating an image description sentence that fuses the scene and the object categories.
102. Obtain, from the descriptive text matched to the scene corresponding to each image, the pinyin (Chinese phonetic alphabet) and the rhyme corresponding to the last word of the descriptive text.
103. Generate rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
In this embodiment, scene recognition produces the descriptive text matched to the scene corresponding to each image, and this descriptive text is the basis for generating the lyrics. Rhyme design can be applied to the descriptive text matched to the scene of each image, so that rhymed lyrics can be generated for each image. Rhymed lyrics are a passage of lyrics that rhymes; the rhymed lyrics corresponding to each image may be a single line or two or more lines.
In some embodiments of the present invention, the pinyin and the rhyme corresponding to the last word of the descriptive text are obtained from the descriptive text matched to the scene corresponding to each image. Rhymed lyrics corresponding to the multiple images are then generated according to that pinyin and rhyme, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
For the descriptive text matched to the scene corresponding to each image, the pinyin and the rhyme corresponding to the last word of the descriptive text can be obtained. Fewer than 8,000 Chinese characters are in common use, so a pinyin table of common characters can be generated in advance, indexed by character and loaded into memory, and the pinyin of a character can then be looked up as needed. A lookup of the table of finals (the vowel part of a pinyin syllable) shows that there are 35 finals. All finals can be placed in an array and sorted by length from longest to shortest; the pinyin string is then compared against each final in turn to obtain the rhyme corresponding to the last word. After the pinyin and the rhyme corresponding to the last word of the descriptive text have been obtained, the rhymed lyrics corresponding to the multiple images are generated based on them. Because the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene of that image, the rhyme of the generated lyrics can be taken from the rhyme of the last word of the descriptive text. Designing the lyrics with the same rhyme ensures that the rhymed lyrics generated for the multiple images have a coordinated, uniform rhyme and read smoothly aloud.
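The lookup described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the final table is abbreviated rather than the full set of 35, the character-to-pinyin index holds a handful of hypothetical entries, and the names `FINALS`, `PINYIN_INDEX` and `rhyme_of_last_char` are not taken from the embodiment.

```python
# Abbreviated, hypothetical table of finals (the text mentions 35 in total).
FINALS = ["iang", "iong", "uang", "ang", "eng", "ing", "ong", "iao", "ian",
          "uai", "uan", "ai", "ei", "ao", "ou", "an", "en", "in", "un",
          "ia", "ie", "iu", "ua", "uo", "ui", "ue", "er",
          "a", "o", "e", "i", "u", "v"]

# Sort longest first so that "iang" is tried before "ang" and "a".
FINALS_BY_LENGTH = sorted(FINALS, key=len, reverse=True)

# Hypothetical excerpt of the preloaded character-to-pinyin index.
PINYIN_INDEX = {"天": "tian", "翔": "xiang", "赞": "zan", "好": "hao"}


def rhyme_of_last_char(text):
    """Return (pinyin, final) for the last character of text, or None."""
    if not text:
        return None
    pinyin = PINYIN_INDEX.get(text[-1])
    if pinyin is None:
        return None  # character is not in the preloaded index
    # Compare the pinyin string against each final in turn, longest first.
    for final in FINALS_BY_LENGTH:
        if pinyin.endswith(final):
            return pinyin, final
    return None
```

Sorting by length matters because a short final such as "an" is a suffix of longer finals such as "ian"; trying the long candidates first avoids a premature match.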
Further, in some embodiments of the present invention, step 103 (generating rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text) comprises:
1031. Arrange all finals from the pinyin corresponding to the last word of the descriptive text;
1032. Determine the distribution pattern of the finals according to all the arranged finals;
1033. Determine the rhyme corresponding to the last word of the descriptive text from the finals that conform to the distribution pattern;
1034. Obtain the rhymed lyrics corresponding to the multiple images from pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene, where the lyrics templates are pre-configured with lyric text corresponding to multiple scenes and multiple rhymes.
The scene corresponding to each of the multiple images is matched with descriptive text, and each descriptive text has a last word, so all possible finals can be arranged from the pinyin of these last words. For each final, descriptive texts for a number of different scenes are generated in advance as lyrics templates. Using data samples of many descriptive texts, the distribution pattern of the finals of the last word is found and the most frequent finals are identified, and the amount of template data is increased for these finals. This determines which finals to use as rhymes. The lyrics templates are then searched with the rhyme selected according to the distribution pattern, and the rhymed lyrics corresponding to the multiple images are obtained from the matching template.
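One way to picture the distribution step is as a frequency count over the finals of the last words in a sample of descriptive texts. The sketch below assumes the finals have already been extracted; the function name `top_finals` and the cutoff `k` are hypothetical choices, not part of the embodiment.

```python
from collections import Counter


def top_finals(last_word_finals, k=3):
    """Return the k most frequent finals among the sampled last words."""
    counts = Counter(last_word_finals)
    return [final for final, _ in counts.most_common(k)]
```

For a sample such as `["an", "ang", "an", "ao", "ang", "an"]`, the two most frequent finals are "an" and "ang", so these would be the ones given more template data.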
An example follows. Taking the generation of rhymed lyrics for hip-hop music as an example, corresponding hip-hop scripts can be generated as lyrics templates for different scenes and different rhymes, with more scripts generated for high-frequency rhymes so that there is more to choose from. A matching script is then selected at random according to the rhyme and the scene to generate the rhymed lyrics of the hip-hop music. For a given rhyme, the hip-hop scripts generated are the same, so when certain rhymes occur very frequently, more lyrics templates can be generated for them, and the rhymed lyrics can be generated from multiple lyrics templates.
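The random selection by scene and rhyme might look like the following sketch. The template store, its placeholder lines and the function name `pick_template` are assumptions made for illustration; the embodiment does not specify how templates are stored.

```python
import random

# Hypothetical template store keyed by (scene, rhyme); lines are placeholders.
LYRIC_TEMPLATES = {
    ("landscape", "an"): ["<landscape line 1, rhyme an>",
                          "<landscape line 2, rhyme an>",
                          "<landscape line 3, rhyme an>"],
    ("food", "ao"): ["<food line 1, rhyme ao>"],
}


def pick_template(scene, rhyme, rng=random):
    """Randomly pick one template matching the scene and rhyme, or None."""
    candidates = LYRIC_TEMPLATES.get((scene, rhyme), [])
    return rng.choice(candidates) if candidates else None
```

A high-frequency rhyme simply has a longer candidate list, which is what gives the random choice more variety.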
Further, in some embodiments of the present invention, step 1034 (obtaining the rhymed lyrics corresponding to the multiple images from pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene) comprises:
10341. Generate image description lyrics according to the descriptive text matched to the scene corresponding to each image;
10342. Obtain supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene;
10343. Combine the image description lyrics and the supplementary lyrics to obtain the rhymed lyrics.
Specifically, in the above embodiment, the descriptive text matched to the scene corresponding to each image can serve as the image description lyrics, that is, lyrics taken from the descriptive text. For example, the descriptive text may be "bird hovers on blue sky", and this text can be used as the image description lyrics. Supplementary lyrics are obtained in step 10342, and they can be generated at the same time as the image description lyrics. Supplementary lyrics are lyrics obtained from the lyrics templates, and they can have the same rhyme as the image description lyrics. Finally, the image description lyrics and the supplementary lyrics are combined to obtain the rhymed lyrics. For example, the descriptive text is supplemented with rhyming lines: if the image description lyric is "bird hovers on blue sky", supplementary lyrics with the same rhyme can be found in the lyrics templates, such as "much the same good, much the same to praise", so the rhymed lyrics finally generated in this embodiment may be: bird hovers on blue sky; much the same good, much the same to praise.
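Steps 10341 to 10343 can be sketched as a lookup followed by concatenation. The store `SUPPLEMENTS`, the rhyme key "ian" and the function name are hypothetical; the example lines are the translated ones from the text above.

```python
# Hypothetical supplementary-lyrics store keyed by (scene, rhyme).
SUPPLEMENTS = {
    ("landscape", "ian"): ["much the same good", "much the same to praise"],
}


def compose_rhymed_lyrics(description, scene, rhyme, templates=SUPPLEMENTS):
    """Place the image description lyric first, then any matching supplements."""
    lines = [description]
    lines.extend(templates.get((scene, rhyme), []))
    return lines
```

If no supplementary lyrics exist for the scene and rhyme, the sketch falls back to the image description lyric alone.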
Further, in some embodiments of the present invention, step 10342 (obtaining supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene) comprises:
determining a rhyme that satisfies double rhyming according to the image description lyrics;
obtaining the supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the double-rhyme rhyme corresponding to each scene.
When the rhyme is obtained from the image description lyrics, a rhyme that satisfies double rhyming can also be determined. Double rhyming means that the rhyme spans two words. Based on the scene and the double-rhyme rhyme, the supplementary lyrics can be obtained from the lyrics templates; generating them in this way gives the supplementary lyrics and the image description lyrics the same double rhyme.
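A double rhyme can be represented as the pair of finals of the last two characters. The sketch below assumes a small hypothetical character-to-final index (`FINAL_OF`) and treats two lines as double-rhymed when both finals match; this matching rule is an illustrative assumption.

```python
# Hypothetical character-to-final index for a few characters.
FINAL_OF = {"蓝": "an", "天": "ian", "看": "an", "见": "ian", "好": "ao"}


def double_rhyme(text):
    """Return the finals of the last two characters, or None if unavailable."""
    if len(text) < 2:
        return None
    first, second = FINAL_OF.get(text[-2]), FINAL_OF.get(text[-1])
    if first is None or second is None:
        return None
    return first, second


def is_double_rhymed(line_a, line_b):
    """True when both lines end in the same pair of finals."""
    rhyme_a = double_rhyme(line_a)
    return rhyme_a is not None and rhyme_a == double_rhyme(line_b)
```

Here "蓝天" (lan tian) and "看见" (kan jian) share the pair ("an", "ian"), whereas "看好" (kan hao) matches only on the first final.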
As the above description of the embodiments shows, scene recognition is first performed on each of multiple images input in a terminal, and descriptive text matched to the scene corresponding to each image is generated. The pinyin and the rhyme corresponding to the last word of the descriptive text are then obtained from the descriptive text matched to the scene of each image. Finally, rhymed lyrics corresponding to the multiple images are generated according to that pinyin and rhyme, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image. The terminal only needs to provide multiple images for image music to be generated: scene recognition is performed on the images, descriptive text suited to each scene is matched automatically, and rhyme design is then applied to that descriptive text, so that the rhymed lyrics generated suit the characteristics of music. Because the rhymed lyrics are generated from the images input in the terminal, the output image music is closely tied to the image content provided by the user, and the rhymed lyrics can be generated automatically from the input images.
An embodiment of the method for generating music of the present invention can be applied to generating, from multiple images input by a user, music that matches the descriptive text of those images. Referring to Fig. 2, the method for generating music provided by an embodiment of the present invention may include the following steps:
101. Perform scene recognition on each of multiple images input in a terminal, and generate descriptive text matched to the scene corresponding to each of the multiple images.
In this embodiment, the user can input into the terminal multiple images used to generate image music. Image music, as used in the embodiments of the present invention, refers to rhythmic music adapted to the multiple images input by the user. The multiple images input in the terminal may have been saved to the terminal in advance by the user, or may be captured by the user in real time with the terminal's camera; for example, the images may be captured after the terminal enters a photographing mode, or obtained from the terminal's photo album. The manner in which the multiple images are input in the terminal is not limited.
102. Obtain, from the descriptive text matched to the scene corresponding to each image, the pinyin and the rhyme corresponding to the last word of the descriptive text.
103. Generate rhymed lyrics corresponding to the multiple images according to the pinyin and the rhyme corresponding to the last word of the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
104. Convert the rhymed lyrics corresponding to the multiple images into speech.
In this embodiment, after the rhymed lyrics corresponding to the multiple images are obtained, they can be converted from text to speech. Specifically, text-to-speech (Text To Speech, TTS) can be used to convert all the rhymed lyrics obtained in step 103 into speech.
In some embodiments of the present invention, step 104 (converting the rhymed lyrics corresponding to the multiple images into speech) comprises:
C1. Perform text analysis on the rhymed lyrics corresponding to the multiple images to obtain a text analysis result;
C2. Extract linguistic features from the text analysis result;
C3. Perform phoneme-level duration prediction and adaptive duration adjustment according to the linguistic features, to obtain prosodic features and part-of-speech features matched to the rhymed lyrics;
C4. Based on the linguistic features and the prosodic and part-of-speech features matched to the rhymed lyrics, generate pronunciation with a neural network model to obtain the speech.
For the rhymed lyrics corresponding to each image, text analysis is performed first to provide information for subsequent feature extraction. The text analysis result is mainly used for pronunciation generation, prosody prediction, part-of-speech prediction and the like. After the result is obtained, linguistic features are extracted from it and converted into input vectors for the neural network model. Next, a duration model can be used for phoneme-level duration prediction and adaptive duration adjustment. Because the rhymed lyrics generated in this embodiment differ from ordinary speech in that they have a rhythm, an adaptive adjustment is applied to the duration prediction result so that each word lands on the beat while its original pronunciation is unchanged. Finally, based on the linguistic features and the prosodic and part-of-speech features matched to the rhymed lyrics, pronunciation is generated with a neural network model to obtain the speech.
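The adaptive duration adjustment is not specified in detail. One simple way to picture it, assuming one word per beat and uniform scaling of the predicted phoneme durations, is sketched below; both assumptions and the name `fit_to_beat` are illustrative only, and a real duration model may adjust phonemes non-uniformly.

```python
def fit_to_beat(phone_durations, beat_seconds):
    """Scale one word's predicted phoneme durations so they fill one beat.

    Uniform scaling keeps the relative length of each phoneme, so the word
    is stretched or compressed without changing which phonemes are spoken.
    """
    total = sum(phone_durations)
    if total <= 0:
        raise ValueError("predicted durations must sum to a positive value")
    scale = beat_seconds / total
    return [duration * scale for duration in phone_durations]
```

For example, predicted durations of 0.1 s and 0.3 s fitted to a 0.5 s beat become roughly 0.125 s and 0.375 s, preserving their 1:3 ratio.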
105, voice and preset background music are synthesized together, generate image music.
In the embodiments of the present invention, after the rhymed lyrics have been converted to voice in the preceding step, the voice contains the content of the rhymed lyrics; the voice and the background music are then combined to generate the final image music. The image music is synthesized from the background music and the rhymed lyrics written from the multiple images input by the user, so when the image music is played the user hears a piece of rhythmic music with lyrics. For example, hip-hop rhymed lyrics are written from the multiple images and then synthesized with hip-hop background music to obtain a piece of hip-hop music, thereby completing text-to-rap conversion (Text To Rap, TTR).
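As a rough illustration of synthesizing voice and background music together, the sketch below overlays two sample sequences. It assumes both tracks are already decoded to floating-point samples in [-1.0, 1.0] at the same sample rate; the function name `mix_tracks` and the `background_gain` parameter are hypothetical, and the patent does not specify how the mixing is actually performed.

```python
def mix_tracks(voice, background, background_gain=0.4):
    """Overlay a voice track on a background track, sample by sample.

    Both inputs are lists of float samples in [-1.0, 1.0] at the same sample
    rate (an assumption of this sketch). The shorter track is padded with
    silence and the sum is hard-clipped to the valid range.
    """
    length = max(len(voice), len(background))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        b = background[i] if i < len(background) else 0.0
        sample = v + background_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to avoid overflow
    return mixed


# Example with toy data: three background samples, two voice samples
image_music = mix_tracks([0.5, 0.9], [0.5, 0.5, 0.5])
```

Lowering the background gain keeps the sung lyrics intelligible; a production mixer would more likely use a limiter than hard clipping.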
As can be seen from the foregoing description of the embodiments, scene recognition is first performed on each of multiple images input in a terminal, and descriptive text matching the scene corresponding to each image is generated. Keyword-based rhyme matching is then performed on the descriptive text matched to the scene of each image, generating the rhymed lyrics corresponding to the multiple images. Next, the rhymed lyrics corresponding to the multiple images are converted to voice, and finally the voice and preset background music are synthesized together to generate image music. In the embodiments of the present invention, image music can be generated as long as the terminal provides multiple images: scene recognition is performed on the multiple images, descriptive text suited to each scene is matched automatically, and rhyme design is applied to that descriptive text, so that the generated rhymed lyrics fit the character of music. The rhymed lyrics are then converted to voice and synthesized with the background music to form a piece of image music. Because the rhymed lyrics in the image music are generated from the images input at the terminal, the output image music is closely tied to the image content provided by the user, and music that matches the descriptive text of the scenes can be generated automatically from the input images.
To facilitate better understanding and implementation of the above solutions of the embodiments of the present invention, a corresponding application scenario is described in detail below as an example.
In the embodiments of the present invention, songs can be composed by artificial intelligence (Artificial Intelligence, AI). This is a forward-looking attempt and provides a useful reference for applying AI in larger scenarios in the future. Taking the generation of hip-hop music as an example, TTR (Text To Rap) converts text into rap music: scene recognition is performed on multiple input images, a passage of description that fits each scene is provided, rhyme design is applied to the subtitle text based on the recognized image content, and the description of the scene is finally converted to voice by TTS. Background music with a specific rhythm is then added, and the background music and the text voice are joined seamlessly to complete a hip-hop song, ultimately producing a pleasant piece of music with hip-hop character. In short, TTR performs scene recognition on any input image, provides a description, and turns that description into hip-hop music through a series of processing steps.
The embodiments of the present invention are mainly based on performing scene recognition on multiple input images and finally combining those input images into an MV (music video) accompanied by hip-hop music. The user inputs multiple images from a mini program in the mobile phone client. After the images are uploaded, a deep learning neural network model recognizes the image scenes and descriptions related to each scene are matched automatically. Rhyme design is then applied to these descriptions based on the recognized image content, and finally the rhymed text is converted to voice by TTS technology.
Fig. 3 is a schematic flowchart of hip-hop music generation provided in an embodiment of the present invention. The system mainly comprises four parts:
1, the user uploads or selects multiple images from the mobile phone client. The images input by the user are obtained.
2, image scene recognition. Scene recognition is performed on the multiple input images, and related descriptions are provided.
3, rhyme design. Rhyme design is applied to the related descriptions.
4, text-to-voice conversion. The rhymed descriptions are converted to voice.
When the user submits multiple images at the mobile phone client, the input images are first identified and scene recognition is then performed on them. Descriptions suited to the images are matched automatically, and rhyme design and supplementation are carried out according to how these descriptions rhyme. This is an artificial intelligence algorithm that gives a description directly from the input picture, which in effect means directly judging what kind of scene it shows, for example a bird flying in a blue sky or a person on a beach. The text is then converted to voice by TTS, and a piece of hip-hop music is generated through subsequent processing.
Each part is described below by way of example. Fig. 4-a is a schematic diagram of a user uploading multiple images from the mobile phone client. The user takes multiple pictures, or selects multiple pictures already stored on the phone, and uploads them from the mobile phone client. Taking image upload in the mobile phone client as an example, when the user taps the "Upload pictures" button, two options appear: a "Take photo" mode and a "Select from phone album" mode. Multiple pictures can be selected for each upload.
Next come image scene recognition and automatic text matching. Image scene recognition is performed on the pictures uploaded by the user: scene recognition is performed on each image, text is matched automatically for each image, and the text corresponding to the individual images is then concatenated. To generate text annotations for an input image, a neural image annotation model is trained so as to maximize its probability of success. Whether the deep learning neural network model mentioned here has the same meaning as the annotation model is not specified. The model can also generate novel image descriptions. For example, the following annotation can be generated: a man in grey swings a stick while a man in black looks on. As another example, the following annotation can be generated: a bus "sits" next to a person.
Next, the rhyme design method for text provided in the embodiments of the present invention is described. The present invention relates to the application field of AI image description generation, and in particular to a keyword-based rhyme matching method. The main flow is as follows:
1, obtain the text information produced by image description generation, and obtain the Chinese pinyin and the rhyme of the corresponding Chinese characters.
2, list all possible finals (the vowel parts of pinyin syllables) from the Chinese pinyin and, for each final, pre-generate several descriptive texts for different scenes, each line containing a double rhyme; the second line of lyrics is supplemented in this way. The pre-generation works as follows: all finals of Chinese pinyin are listed, and for each final, rhymed lyrics are written for the four scene types "landscape", "person", "selfie" and "food".
3, find the distribution pattern of finals in descriptive text from data samples, identify the finals that occur most often, and increase the amount of data for those finals.
4, determine the scene from the image description and its text, and match rhymed data by the final of the text produced by the description.
5, finally, present the complete rhymed lyrics.
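Step 3 of the flow above amounts to a frequency count over the finals observed at the end of sample descriptions. A minimal sketch, assuming the finals have already been extracted; the function name `most_common_finals`, the `top_k` parameter and the sample data are hypothetical:

```python
from collections import Counter


def most_common_finals(finals, top_k=3):
    """Return the top_k most frequent finals, most frequent first.

    `finals` holds one final per sample description, i.e. the final of the
    last word of each description. These are the rhymes for which more
    pre-written lyric lines would be prepared.
    """
    return [final for final, _count in Counter(finals).most_common(top_k)]


# Example with made-up sample data
sample_finals = ["an", "i", "an", "e", "i", "an"]
frequent = most_common_finals(sample_finals, top_k=2)
```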
This technical solution is based on image recognition technology. The picture uploaded by the user is converted into text by image description generation technology, and a second line of lyrics is matched according to the image scene and the final of the last word of the sentence produced by image description generation, so that the rhymed lyrics are finally generated. The AI then sings the song. The user uploading pictures and the AI writing the lyrics and singing form a complete interactive process, which greatly increases interactivity and fun. Multiple candidate lyrics may be available for matching; here, the final of the last word of the first line is used to match the second line.
First, the image description is obtained: based on the photos uploaded by the user, the AI image description generation technology obtains image description information, and each picture yields a one-sentence description.
Then the Chinese pinyin is obtained. There are fewer than 8,000 commonly used Chinese characters, so a pinyin table of the common characters is generated in advance, indexed by character, and loaded into memory. When the pinyin of a character is needed, it can be retrieved quickly through the index in O(1) time.
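A hash map gives the O(1) lookup described above. The sketch below is only an illustration: the helper names `build_pinyin_index` and `lookup_pinyin` are hypothetical, and the handful of characters is a hand-picked sample rather than the actual pre-generated table. A character can have several readings, so each key maps to a list.

```python
def build_pinyin_index(pairs):
    """Build an in-memory index from (character, pinyin-with-tone) pairs."""
    index = {}
    for char, pinyin in pairs:
        index.setdefault(char, []).append(pinyin)
    return index


def lookup_pinyin(index, char):
    """Return all known readings of `char` (empty list if it is not in the table)."""
    return index.get(char, [])


# Hand-picked sample entries (an assumption of this sketch, not the full table)
SAMPLE_PAIRS = [
    ("左", "zuo3"),
    ("坐", "zuo4"),
    ("昨", "zuo2"),
    ("作", "zuo1"),
    ("作", "zuo4"),
    ("尊", "zun1"),
]
PINYIN_INDEX = build_pinyin_index(SAMPLE_PAIRS)
readings = lookup_pinyin(PINYIN_INDEX, "作")
```

Dictionary lookups in Python are average-case constant time, which matches the stated goal; a real table would be loaded once at start-up from a file.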
An example of the table follows. Each entry gives a character (shown by its English gloss where one is available) followed by its pinyin and tone number:
ah a1; a1; breathe out a1; salt down a1; a1; a1; a2; breathe out a2; Sha a2; a3; breathe out a3; a4; breathe out a4; ah a5; a5; breathe out a5; sad ai1; suffer ai1; angstrom ai1; sound of sighing ai1; honor zun1; abide by zun1; cup zun1; trout zun1; save zun3; make zuo1; suck zuo1; make zuo2; yesterday zuo2; chisel zuo2; Zuo zuo2; left zuo3; help zuo3; crowd zuo3; make zuo4; is zuo4; sit zuo4; seat zuo4; dig zuo4; toothed oak zuo4; ashamed zuo4; sacrificial meat zuo4; steps on the eastern side of the hall zuo4; azole zuo4; blessing zuo4; jealous woman zuo4.
Next, the rhyme is obtained. Consulting the rhyme table shows that there are 35 finals (the vowel parts of pinyin syllables). Take the Chinese character meaning "change" as an example: a three-letter final contains shorter compound and single finals, for example ian contains both the final i and the final an. Therefore, when obtaining the final, three-letter finals are checked first, then compound finals, and finally single finals. In the implementation, all finals are placed in an array sorted by length from longest to shortest, and the strings are then compared in turn.
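The longest-first comparison can be sketched as follows. The list of finals here is an illustrative subset written with "v" for "ü" (an assumption of this sketch, not the rhyme table the patent consults), and the function name `extract_final` is hypothetical. A trailing tone digit is stripped before matching.

```python
# Illustrative list of pinyin finals ("v" stands for "ü"); not the patent's table.
FINALS = [
    "a", "o", "e", "i", "u", "v",
    "ai", "ei", "ui", "ao", "ou", "iu", "ie", "ve", "er",
    "an", "en", "in", "un", "vn", "ia", "ua", "uo",
    "ang", "eng", "ing", "ong", "iao", "ian", "uai", "uan",
    "iang", "uang", "iong",
]

# Sort once, longest first, so "ian" is tried before "an" and "i".
FINALS_BY_LENGTH = sorted(FINALS, key=len, reverse=True)


def extract_final(pinyin):
    """Return the longest final that the syllable ends with, or None."""
    syllable = pinyin.rstrip("012345")  # drop a trailing tone number such as "4"
    for final in FINALS_BY_LENGTH:
        if syllable.endswith(final):
            return final
    return None


rhyme = extract_final("bian4")
```

Checking longer finals first is what prevents "bian" from being reduced to "an"; a purely suffix-based match like this ignores spelling conventions such as "ju" standing for "jü", which a fuller implementation would need to handle.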
The scene of the image description is then obtained: the text of the image description is matched against the keywords associated with each scene to determine the corresponding scene. At present, scenes are divided mainly into four classes: landscape, person, food, and selfie. Some of the corresponding keywords are as follows.
Examples are as follows. When the scene is landscape, there can be many descriptive keywords, such as "sunlight", "sea", "rain", "flower" and "grass". When the scene is person, there can be many descriptive keywords, such as "boy" and "girl". When the scene is food, there can be many descriptive keywords, such as "cuisine". When the scene is selfie, there can be many descriptive keywords, such as "photo" and "avatar".
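Keyword-based scene matching of this kind can be sketched as a simple hit count per scene. The keyword lists below reuse the English glosses from the examples above; the function name `classify_scene`, the substring test, and the tie-breaking rule (first scene with the most hits wins) are assumptions of this sketch.

```python
# Scene keywords taken from the examples in the text (English glosses).
SCENE_KEYWORDS = {
    "landscape": ["sunlight", "sea", "rain", "flower", "grass"],
    "person": ["boy", "girl"],
    "food": ["cuisine"],
    "selfie": ["photo", "avatar"],
}


def classify_scene(description):
    """Return the scene whose keywords occur most often in the description.

    Returns None when no keyword matches. Plain substring matching is crude
    for English (for example "rain" also matches "train"); it is used here
    only to keep the sketch short.
    """
    text = description.lower()
    best_scene, best_hits = None, 0
    for scene, keywords in SCENE_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits > best_hits:
            best_scene, best_hits = scene, hits
    return best_scene


scene = classify_scene("Sunlight over the sea")
```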
Next, the supplementary lyrics are obtained according to the scene and the rhyme. First, corresponding hip-hop lines are written in advance for each combination of scene and rhyme, with more lines written for high-frequency rhymes so that there are more to choose from. Then, according to the rhyme and the scene, one matching line is selected at random.
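Selecting a supplementary line then reduces to a table lookup followed by a random choice. In this sketch the template table is keyed by (scene, final); the placeholder strings, the function name `pick_supplement` and the `rng` parameter are hypothetical, and real entries would be the pre-written hip-hop lines.

```python
import random

# Placeholder template table keyed by (scene, final); entries are stand-ins.
TEMPLATES = {
    ("landscape", "a"): ["<landscape line ending in -a, variant 1>",
                         "<landscape line ending in -a, variant 2>"],
    ("person", "ia"): ["<person line ending in -ia>"],
}


def pick_supplement(scene, final, templates=TEMPLATES, rng=random):
    """Pick one pre-written line for this scene and rhyme, or None if there is none.

    `rng` may be the `random` module or a seeded `random.Random` instance,
    which makes the choice reproducible in tests.
    """
    candidates = templates.get((scene, final), [])
    if not candidates:
        return None
    return rng.choice(candidates)


line = pick_supplement("landscape", "a", rng=random.Random(0))
```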
Examples are given below. Each entry gives the final, the scene, and a pre-written supplementary line (shown in English translation, so the rhyme itself is not visible):
a, landscape: much the same "if", with much the same care
a, person: much the same big, with much the same admiration
a, food: much the same fried, with much the same hot
ia, cuisine: much the same shrimp, with much the same fright
ia, person: much the same harmony, with much the same "the two of us"
ia, food: much the same home, with much the same sunset clouds
ua, landscape: much the same wild flowers, with much the same picture-like beauty
ua, person: much the same "if", with much the same Eight Diagrams
ce, general: much the same loss, with much the same urging on
che, landscape: much the same river, with much the same limpid clarity
ge, cuisine: much the same gluttony, with much the same squab
re, landscape: much the same sun exposure, with much the same sweltering heat
te, person: much the same vindication, with much the same unease
ye, general: much the same late night, with much the same choking sobs
ze, general: much the same life, with much the same choice
he, landscape: much the same river, with much the same estrangement
ke, person: much the same visitor, with much the same harshness
ke, food: much the same drinking, with much the same thirst
The rhymed lyrics finally generated may be as follows:
A crowd walks along a bustling street [image description]
Much the same busyness, with much the same forgetting [supplementary lyrics]
High-rise buildings in the city [image description]
Much the same scenery, with much the same care [supplementary lyrics]
A photo of food at a dinner with friends [image description]
Much the same face, with much the same missing [supplementary lyrics]
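The alternation of image description lines and supplementary lines shown above can be sketched as a small assembly step. The function name `build_rhymed_lyrics`, the tuple layout and the `pick_line` callback are hypothetical; the callback stands for whatever component selects a pre-written line for a given scene and final.

```python
def build_rhymed_lyrics(descriptions, pick_line):
    """Interleave each image description with a matching supplementary line.

    `descriptions` is a list of (text, scene, final) tuples, one per image.
    `pick_line(scene, final)` returns a supplementary line or None; when it
    returns None the description is kept without a second line.
    """
    lyrics = []
    for text, scene, final in descriptions:
        lyrics.append(text)
        supplement = pick_line(scene, final)
        if supplement is not None:
            lyrics.append(supplement)
    return lyrics


# Example with a stub selector (scene and final labels are made up)
demo = build_rhymed_lyrics(
    [("High-rise buildings in the city", "landscape", "a")],
    lambda scene, final: "<%s line ending in -%s>" % (scene, final),
)
```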
Finally, text-to-voice conversion is described. As shown in Fig. 4-b, text analysis is performed on the description to provide information for subsequent feature extraction; it mainly covers pronunciation generation, prosody prediction, part-of-speech prediction, and the like. After the text analysis result is obtained, linguistic features are extracted from it and converted into the input vector of the neural network. A duration model is used to perform phoneme-level duration prediction, which yields a better rhythm. Because hip-hop differs from ordinary speech in that it follows a rhythm, an adaptive adjustment is applied to the duration prediction result. Adaptive duration adjustment means that the neural network adjusts durations automatically so that each word lands on the beat while its original pronunciation remains unchanged. Here, the hip-hop lyric input refers to the description. Acoustic feature prediction includes prosody prediction and part-of-speech prediction. The hip-hop rhythm in the hip-hop rhythm input is obtained by neural network prediction. The background music may be background music with a relatively fast tempo. The hip-hop lyrics are obtained by performing scene recognition on the images to get descriptions and then applying rhyme design to those descriptions.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
To better implement the above solutions of the embodiments of the present invention, related apparatus for implementing the above solutions is also provided below.
Referring to Fig. 5-a, a music generation apparatus 500 provided in an embodiment of the present invention may include: a scene recognition module 501, a rhyme matching module 502, a voice generation module 503, and a music generation module 504, wherein
the scene recognition module 501 is configured to perform scene recognition on each of multiple images input in a terminal, and to generate descriptive text matching the scene corresponding to each of the multiple images;
the rhyme matching module 502 is configured to perform keyword-based rhyme matching on the descriptive text matched to the scene corresponding to each image, and to generate the rhymed lyrics corresponding to the multiple images;
the voice generation module 503 is configured to convert the rhymed lyrics corresponding to the multiple images into voice;
the music generation module 504 is configured to synthesize the voice with preset background music to generate image music.
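The division into four modules can be pictured as a chain of four stages. The sketch below is only a structural illustration under stated assumptions: the class name `MusicGenerator`, the field names and the type signatures are hypothetical, and each stage is passed in as a plain callable rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MusicGenerator:
    """Chains four stages in the order of modules 501 to 504 (stages are placeholders)."""
    recognize_scenes: Callable[[List[bytes]], List[str]]   # images -> descriptive texts
    match_rhymes: Callable[[List[str]], List[str]]         # descriptive texts -> rhymed lyrics
    synthesize_voice: Callable[[List[str]], List[float]]   # rhymed lyrics -> voice samples
    mix_music: Callable[[List[float]], List[float]]        # voice samples -> image music

    def generate(self, images: List[bytes]) -> List[float]:
        texts = self.recognize_scenes(images)
        lyrics = self.match_rhymes(texts)
        voice = self.synthesize_voice(lyrics)
        return self.mix_music(voice)


# Example wiring with stub stages
generator = MusicGenerator(
    recognize_scenes=lambda images: ["d%d" % i for i in range(len(images))],
    match_rhymes=lambda texts: [t + "!" for t in texts],
    synthesize_voice=lambda lyrics: [float(len(line)) for line in lyrics],
    mix_music=lambda voice: [s * 0.5 for s in voice],
)
result = generator.generate([b"a", b"b"])
```

Passing the stages in as callables keeps each module replaceable, which mirrors how the sub-modules described next refine modules 501 to 503 without changing the overall flow.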
In some embodiments of the present invention, referring to Fig. 5-b, the scene recognition module 501 includes:
a scene determining module 5011, configured to perform scene recognition on the multiple images according to a deep learning neural network model to obtain recognized image features, and to determine the scenes corresponding to the multiple images according to the image features;
an image description module 5012, configured to perform image description generation according to the recognized image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to the scene corresponding to each of the multiple images.
In some embodiments of the present invention, referring to Fig. 5-c, the rhyme matching module 502 includes:
a rhyme obtaining module 5021, configured to obtain, from the descriptive text matched to the scene corresponding to each image, the Chinese pinyin and the rhyme corresponding to the last word in the descriptive text;
a lyrics generation module 5022, configured to generate the rhymed lyrics corresponding to the multiple images according to the Chinese pinyin and the rhyme corresponding to the last word in the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
In some embodiments of the present invention, referring to Fig. 5-d, the lyrics generation module 5022 includes:
a final listing module 50221, configured to list all pinyin finals from the Chinese pinyin corresponding to the last word in the descriptive text;
a pattern determining module 50222, configured to determine the distribution pattern of the finals according to all the listed finals;
a rhyme determining module 50223, configured to determine, from the finals that satisfy the distribution pattern, the rhyme corresponding to the last word in the descriptive text;
a lyrics obtaining module 50224, configured to obtain the rhymed lyrics corresponding to the multiple images from pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene, wherein lyric text corresponding to multiple scenes and multiple rhymes is configured in the lyrics templates in advance.
In some embodiments of the present invention, referring to Fig. 5-e, the lyrics obtaining module 50224 includes:
a description lyrics generation module 502241, configured to generate image description lyrics according to the descriptive text matched to the scene corresponding to each image;
a supplementary lyrics generation module 502242, configured to obtain supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene;
a lyrics synthesis module 502243, configured to combine the image description lyrics with the supplementary lyrics to obtain the rhymed lyrics.
In some embodiments of the present invention, the multiple images are captured by the terminal after it enters a photographing mode; or,
the multiple images are obtained from a photo album of the terminal.
In some embodiments of the present invention, referring to Fig. 5-f, the voice generation module 503 includes:
a text analysis module 5031, configured to perform text analysis on the rhymed lyrics corresponding to the multiple images to obtain a text analysis result;
a linguistic feature extraction module 5032, configured to extract linguistic features from the text analysis result;
a prosodic feature and part-of-speech feature obtaining module 5033, configured to perform phoneme-level duration prediction and adaptive duration adjustment according to the linguistic features, to obtain prosodic features and part-of-speech features that match the rhymed lyrics;
a pronunciation generation module 5034, configured to perform pronunciation generation with a neural network model, based on the linguistic features and on the prosodic features and part-of-speech features that match the rhymed lyrics, to obtain the voice.
As can be seen from the foregoing description of the embodiments, scene recognition is first performed on each of multiple images input in a terminal, and descriptive text matching the scene corresponding to each image is generated. Keyword-based rhyme matching is then performed on the descriptive text matched to the scene of each image, generating the rhymed lyrics corresponding to the multiple images. Next, the rhymed lyrics corresponding to the multiple images are converted to voice, and finally the voice and preset background music are synthesized together to generate image music. In the embodiments of the present invention, image music can be generated as long as the terminal provides multiple images: scene recognition is performed on the multiple images, descriptive text suited to each scene is matched automatically, and rhyme design is applied to that descriptive text, so that the generated rhymed lyrics fit the character of music. The rhymed lyrics are then converted to voice and synthesized with the background music to form a piece of image music. Because the rhymed lyrics in the image music are generated from the images input at the terminal, the output image music is closely tied to the image content provided by the user, and music that matches the descriptive text of the scenes can be generated automatically from the input images.
To better implement the above solutions of the embodiments of the present invention, related apparatus for implementing the above solutions is also provided below.
Referring to Fig. 6-a, a rhymed lyrics generation apparatus 600 provided in an embodiment of the present invention may include: a scene recognition module 601, a rhyme obtaining module 602, and a lyrics generation module 603, wherein
the scene recognition module 601 is configured to perform scene recognition on each of multiple images input in a terminal, and to generate descriptive text matching the scene corresponding to each of the multiple images;
the rhyme obtaining module 602 is configured to obtain, from the descriptive text matched to the scene corresponding to each image, the Chinese pinyin and the rhyme corresponding to the last word in the descriptive text;
the lyrics generation module 603 is configured to generate the rhymed lyrics corresponding to the multiple images according to the Chinese pinyin and the rhyme corresponding to the last word in the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
In some embodiments of the present invention, referring to Fig. 6-b, the scene recognition module 601 includes:
a scene determining module 6011, configured to perform scene recognition on the multiple images according to a deep learning neural network model to obtain recognized image features, and to determine the scenes corresponding to the multiple images according to the image features;
an image description module 6012, configured to perform image description generation according to the recognized image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to the scene corresponding to each of the multiple images.
In some embodiments of the present invention, referring to Fig. 6-c, the lyrics generation module 603 includes:
a final listing module 6031, configured to list all pinyin finals from the Chinese pinyin corresponding to the last word in the descriptive text;
a pattern determining module 6032, configured to determine the distribution pattern of the finals according to all the listed finals;
a rhyme determining module 6033, configured to determine, from the finals that satisfy the distribution pattern, the rhyme corresponding to the last word in the descriptive text;
a lyrics obtaining module 6034, configured to obtain the rhymed lyrics corresponding to the multiple images from pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene, wherein lyric text corresponding to multiple scenes and multiple rhymes is configured in the lyrics templates in advance.
In some embodiments of the present invention, referring to Fig. 6-d, the lyrics obtaining module 6034 includes:
a description lyrics generation module 60341, configured to generate image description lyrics according to the descriptive text matched to the scene corresponding to each image;
a supplementary lyrics generation module 60342, configured to obtain supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene;
a lyrics synthesis module 60343, configured to combine the image description lyrics with the supplementary lyrics to obtain the rhymed lyrics.
In some embodiments of the present invention, the multiple images are captured by the terminal after it enters a photographing mode; or,
the multiple images are obtained from a photo album of the terminal.
In some embodiments of the present invention, the supplementary lyrics generation module 60342 is specifically configured to determine, according to the image description lyrics, a rhyme that satisfies double rhyme, and to obtain the supplementary lyrics from the pre-generated lyrics templates according to the scenes corresponding to the multiple images and the double-rhyme-satisfying rhyme corresponding to each scene.
Scene recognition is first performed on each of multiple images input in a terminal, and descriptive text matching the scene corresponding to each image is generated. The Chinese pinyin and the rhyme corresponding to the last word in the descriptive text are then obtained from the descriptive text matched to the scene of each image. Finally, the rhymed lyrics corresponding to the multiple images can be generated according to the Chinese pinyin and the rhyme corresponding to the last word in the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image. In the embodiments of the present invention, image music can be generated as long as the terminal provides multiple images: scene recognition is performed on the multiple images, descriptive text suited to each scene is matched automatically, and rhyme design is applied to that descriptive text, so that the generated rhymed lyrics fit the character of music. Because the rhymed lyrics are generated from the images input at the terminal, the output image music is closely tied to the image content provided by the user, and the rhymed lyrics can therefore be generated automatically from the input images.
An embodiment of the present invention further provides a terminal, as shown in Fig. 7. For ease of description, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, and the like. The following takes a mobile phone as an example of the terminal:
Fig. 7 is a block diagram of part of the structure of a mobile phone related to the terminal provided in an embodiment of the present invention. Referring to Fig. 7, the mobile phone includes components such as a radio frequency (Radio Frequency, RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (wireless fidelity, WiFi) module 1070, a processor 1080, and a power supply 1090. Those skilled in the art will understand that the mobile phone structure shown in Fig. 7 does not constitute a limitation on the mobile phone, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
The components of the mobile phone are described in detail below with reference to Fig. 7:
The RF circuit 1010 may be used to receive and send signals during information transmission and reception or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 1080 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 1010 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 1020 may be used to store software programs and modules. The processor 1080 runs the software programs and modules stored in the memory 1020 to execute the various functional applications and data processing of the mobile phone. The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data or a phone book), and the like. In addition, the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 1030 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also called a touch screen, collects touch operations by the user on or near it (for example, operations performed on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 1080; it can also receive and execute commands sent by the processor 1080. The touch panel 1031 can be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1031, the input unit 1030 may include other input devices 1032. Specifically, the other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as a volume control key or a switch key), a trackball, a mouse, and a joystick.
The display unit 1040 can be used to display information entered by the user, information provided to the user, and the various menus of the mobile phone. The display unit 1040 may include a display panel 1041, which can optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, it passes the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides the corresponding visual output on the display panel 1041 according to that type. Although in Fig. 7 the touch panel 1031 and the display panel 1041 are shown as two separate components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 can be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 1050, such as an optical sensor, a motion sensor, or another sensor. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the mobile phone may also be equipped with, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
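As a rough illustration of how a static accelerometer reading can drive landscape/portrait switching, the sketch below classifies the posture of the phone from the gravity vector. The axis convention (x along the short edge of the screen, y along the long edge, z out of the screen), the function name `screen_orientation`, and the 25-degree threshold for treating the phone as lying flat are assumptions made for this example; none of them comes from the embodiment.

```python
import math


def screen_orientation(ax, ay, az, flat_threshold_deg=25.0):
    """Classify posture from one static accelerometer reading (m/s^2).

    Assumed axes: x = short edge, y = long edge, z = out of the screen.
    Returns "portrait", "landscape", or "flat".
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0:
        raise ValueError("reading has zero magnitude; no gravity direction")

    # Angle between the gravity vector and the screen plane, in degrees.
    # min() guards against rounding pushing the ratio slightly above 1.
    tilt = math.degrees(math.asin(min(1.0, abs(az) / g)))
    if tilt > 90.0 - flat_threshold_deg:
        # Gravity is almost perpendicular to the screen: lying on a table.
        return "flat"

    # Otherwise the axis carrying most of gravity tells which edge is down.
    return "landscape" if abs(ax) > abs(ay) else "portrait"


if __name__ == "__main__":
    print(screen_orientation(0.0, 9.81, 0.0))   # phone held upright
    print(screen_orientation(9.81, 0.0, 0.0))   # phone turned on its side
```

A real handset would typically smooth several readings and add hysteresis before rotating the screen, so that small hand movements do not flip the orientation back and forth.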
The audio circuit 1060, the loudspeaker 1061, and the microphone 1062 can provide an audio interface between the user and the mobile phone. The audio circuit 1060 can convert received audio data into an electrical signal and transmit it to the loudspeaker 1061, which converts it into a sound signal for output. Conversely, the microphone 1062 converts a collected sound signal into an electrical signal, which the audio circuit 1060 receives and converts into audio data. After the audio data is output to the processor 1080 and processed, it is sent through the RF circuit 1010 to, for example, another mobile phone, or it is output to the memory 1020 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; the module provides the user with wireless broadband Internet access. Although Fig. 7 shows the WiFi module 1070, it is understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.
The processor 1080 is the control center of the mobile phone. It connects the various parts of the whole mobile phone through various interfaces and lines, and it performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby monitoring the mobile phone as a whole. Optionally, the processor 1080 may include one or more processing units. Preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 1080.
The mobile phone further includes a power supply 1090 (such as a battery) that powers all the components. Preferably, the power supply is logically connected to the processor 1080 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like, which are not described here.
In the embodiments of the present invention, the processor 1080 included in the terminal also controls execution of the method flow performed by the terminal as described above.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can vary widely, such as analog circuits, digital circuits, or dedicated circuits. For the present invention, however, a software implementation is the better embodiment in most cases. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
In conclusion the above embodiments are merely illustrative of the technical solutions of the present invention, rather than its limitations;Although reference
Invention is explained in detail for above-described embodiment, those skilled in the art should understand that: it still can be right
Technical solution documented by the various embodiments described above is modified or equivalent replacement of some of the technical features;And this
It modifies or replaces, the spirit and model of technical solution of various embodiments of the present invention that it does not separate the essence of the corresponding technical solution
It encloses.
Claims (13)
1. A method for generating rhymed lyrics, characterized in that the method comprises:
performing scene recognition separately on multiple images input in a terminal, and generating descriptive text matched to the scene corresponding to each of the multiple images;
obtaining, from the descriptive text matched to the scene corresponding to each image, the Chinese phonetic alphabet (pinyin) and the rhyme corresponding to the last word in the descriptive text;
generating rhymed lyrics corresponding to the multiple images according to the Chinese phonetic alphabet and the rhyme corresponding to the last word in the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
2. The method according to claim 1, characterized in that performing scene recognition separately on the multiple images input in the terminal and generating the descriptive text matched to the scene corresponding to each of the multiple images comprises:
performing scene recognition on the multiple images according to a deep learning neural network model to obtain identified image features, and determining the scenes corresponding to the multiple images according to the image features;
performing image description generation according to the identified image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to each of the multiple images.
3. The method according to claim 1, characterized in that generating the rhymed lyrics corresponding to the multiple images according to the Chinese phonetic alphabet and the rhyme corresponding to the last word in the descriptive text comprises:
listing all the finals (simple or compound vowels) from the Chinese phonetic alphabet corresponding to the last word in the descriptive text;
determining a final distribution rule according to all the listed finals;
determining the rhyme corresponding to the last word in the descriptive text from the finals that satisfy the final distribution rule;
obtaining the rhymed lyrics corresponding to the multiple images from a pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene, wherein lyric text corresponding to multiple scenes and multiple rhymes is preset in the lyrics template.
4. The method according to claim 3, characterized in that obtaining the rhymed lyrics corresponding to the multiple images from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene comprises:
generating image description lyrics according to the descriptive text matched to the scene corresponding to each image;
obtaining supplementary lyrics from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene;
synthesizing the image description lyrics with the supplementary lyrics to obtain the rhymed lyrics.
5. The method according to claim 4, characterized in that obtaining the supplementary lyrics from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene comprises:
determining a rhyme that satisfies double rhyming according to the image description lyrics;
obtaining the supplementary lyrics from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the double-rhyming rhyme corresponding to each scene.
6. The method according to claim 1, characterized in that the multiple images are captured after the terminal enters a photographing mode; or,
the multiple images are obtained from a photo album of the terminal.
7. An apparatus for generating rhymed lyrics, characterized in that the apparatus comprises:
a scene recognition module, configured to perform scene recognition separately on multiple images input in a terminal and to generate descriptive text matched to the scene corresponding to each of the multiple images;
a rhyme obtaining module, configured to obtain, from the descriptive text matched to the scene corresponding to each image, the Chinese phonetic alphabet and the rhyme corresponding to the last word in the descriptive text;
a lyrics generation module, configured to generate rhymed lyrics corresponding to the multiple images according to the Chinese phonetic alphabet and the rhyme corresponding to the last word in the descriptive text, wherein the rhymed lyrics corresponding to each image have the same rhyme as the last word of the descriptive text matched to the scene corresponding to that image.
8. The apparatus according to claim 7, characterized in that the scene recognition module comprises:
a scene determining module, configured to perform scene recognition on the multiple images according to a deep learning neural network model to obtain identified image features, and to determine the scenes corresponding to the multiple images according to the image features;
an image description module, configured to perform image description generation according to the identified image features and the scenes corresponding to the multiple images, to obtain the descriptive text matched to the scene corresponding to each of the multiple images.
9. The apparatus according to claim 7, characterized in that the lyrics generation module comprises:
a final listing module, configured to list all the finals (simple or compound vowels) from the Chinese phonetic alphabet corresponding to the last word in the descriptive text;
a rule determining module, configured to determine a final distribution rule according to all the listed finals;
a rhyme determining module, configured to determine the rhyme corresponding to the last word in the descriptive text from the finals that satisfy the final distribution rule;
a lyrics obtaining module, configured to obtain the rhymed lyrics corresponding to the multiple images from a pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene, wherein lyric text corresponding to multiple scenes and multiple rhymes is preset in the lyrics template.
10. The apparatus according to claim 9, characterized in that the lyrics obtaining module comprises:
a description lyrics generation module, configured to generate image description lyrics according to the descriptive text matched to the scene corresponding to each image;
a supplementary lyrics generation module, configured to obtain supplementary lyrics from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the rhyme corresponding to each scene;
a lyrics synthesis module, configured to synthesize the image description lyrics with the supplementary lyrics to obtain the rhymed lyrics.
11. The apparatus according to claim 10, characterized in that the supplementary lyrics generation module is specifically configured to determine a rhyme that satisfies double rhyming according to the image description lyrics, and to obtain the supplementary lyrics from the pre-generated lyrics template according to the scenes corresponding to the multiple images and the double-rhyming rhyme corresponding to each scene.
12. The apparatus according to claim 7, characterized in that the multiple images are captured after the terminal enters a photographing mode; or,
the multiple images are obtained from a photo album of the terminal.
13. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 6.
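The rhyme-matching steps of claims 1, 3, and 4 can be illustrated with a minimal sketch. Scene recognition and image description generation are outside its scope, so the scene labels and descriptive texts are simply passed in. Everything concrete here is an assumption made for illustration: the toy pinyin lexicon `TOY_PINYIN`, the scene labels, the placeholder lines in `LYRIC_TEMPLATE`, the function names, the simplification that the rhyme equals the pinyin final, and the treatment of y and w as initials. The final distribution is shown as a plain frequency count, which is only one possible reading of the "final distribution rule".

```python
from collections import Counter

# Hypothetical toy lexicon: character -> toneless pinyin. A real system
# would need a full pronunciation dictionary.
TOY_PINYIN = {"海": "hai", "来": "lai", "山": "shan", "蓝": "lan"}

# Pinyin initials; two-letter initials come first so "sh" wins over "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# Hypothetical lyrics template: (scene, rhyme) -> preset lyric lines.
LYRIC_TEMPLATE = {
    ("beach", "ai"): ["浪花向我涌来"],
    ("mountain", "an"): ["白云绕着青山"],
}


def last_word_rhyme(descriptive_text):
    """Return (pinyin, final) for the last character of the text."""
    syllable = TOY_PINYIN[descriptive_text.strip()[-1]]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return syllable, syllable[len(initial):]
    return syllable, syllable  # syllable without an initial, e.g. "an"


def final_distribution(descriptive_texts):
    """Count how often each final ends a descriptive text."""
    return Counter(last_word_rhyme(text)[1] for text in descriptive_texts)


def rhymed_lyrics(scenes, descriptive_texts):
    """Follow each description line with template lines in the same rhyme."""
    lines = []
    for scene, text in zip(scenes, descriptive_texts):
        _, rhyme = last_word_rhyme(text)
        lines.append(text)  # the image description lyric
        # Supplementary lyrics preset for this scene and rhyme, if any.
        lines.extend(LYRIC_TEMPLATE.get((scene, rhyme), []))
    return lines


if __name__ == "__main__":
    for line in rhymed_lyrics(["beach", "mountain"],
                              ["我看见大海", "远处有高山"]):
        print(line)
```

Keying the template on the pair (scene, rhyme) keeps the lookup a single dictionary access per image. Double rhyming, as in claim 5, would extend the key to the finals of the last two characters.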
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710939775.9A CN110019919B (en) | 2017-09-30 | 2017-09-30 | Method and device for generating rhyme-rhyme lyrics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110019919A true CN110019919A (en) | 2019-07-16 |
CN110019919B CN110019919B (en) | 2022-07-26 |
Family
ID=67186509
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710939775.9A Active CN110019919B (en) | 2017-09-30 | 2017-09-30 | Method and device for generating rhyme-rhyme lyrics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110019919B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004102302A (en) * | 1996-05-29 | 2004-04-02 | Yamaha Corp | Apparatus and method for assisting lyric writing and storage medium |
CN101089793A (en) * | 2006-06-15 | 2007-12-19 | 李树喜 | Poems rhyme base vowel indexing input method |
CN102385596A (en) * | 2010-09-03 | 2012-03-21 | 腾讯科技(深圳)有限公司 | Verse searching method and device |
JP2014170146A (en) * | 2013-03-05 | 2014-09-18 | Univ Of Tokyo | Method and device for automatically composing chorus from japanese lyrics |
CN105955938A (en) * | 2016-04-25 | 2016-09-21 | 广州酷狗计算机科技有限公司 | Method and device for editing lyrics |
CN106547789A (en) * | 2015-09-22 | 2017-03-29 | 阿里巴巴集团控股有限公司 | A kind of lyrics generation method and device |
CN107122492A (en) * | 2017-05-19 | 2017-09-01 | 北京金山安全软件有限公司 | Lyric generation method and device based on picture content |
Non-Patent Citations (1)
Title |
---|
GAVIN PAUL: "Lyric App Nerds:Meet "Giorgio Cam",Turning Pictures into Song", 《HTTP://WWW.SONGLYRICS.COM/NEWS/LYRIC-APP-NERDS-MEET-GIORGIO-CAM-TURNING-PICTURES-INTO-SONG/》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177317A (en) * | 2019-12-20 | 2020-05-19 | 吕梁学院 | Literature theory rapid retrieval query system and method |
CN111241829A (en) * | 2020-01-14 | 2020-06-05 | 成都嗨翻屋科技有限公司 | Intelligent lyric modification method based on neural network and auxiliary system |
CN111241829B (en) * | 2020-01-14 | 2023-05-05 | 成都潜在人工智能科技有限公司 | Intelligent lyric modification method and auxiliary system based on neural network |
CN112035699A (en) * | 2020-08-27 | 2020-12-04 | 北京字节跳动网络技术有限公司 | Music synthesis method, device, equipment and computer readable medium |
CN116011431A (en) * | 2023-03-22 | 2023-04-25 | 暗链科技(深圳)有限公司 | Method for generating mnemonic words and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110019919B (en) | 2022-07-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||