CN108492817A - Song data processing method and performance interactive system based on a virtual idol - Google Patents
- Publication number
- CN108492817A (application number CN201810142242.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- song
- music
- lyrics
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L13/033 — Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335 — Pitch control
- G10L15/26 — Speech to text systems
- G10L21/013 — Adapting to target pitch
- G10H2210/031 — Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/101 — Music composition or musical creation; tools or processes therefor
- G10H2210/145 — Composing rules, e.g. harmonic or musical rules, for use in automatic composition; rule generation algorithms therefor
- G10H2250/295 — Noise generation, its use, control or rejection for music processing
- G10H2250/315 — Sound category-dependent sound synthesis processes [Gensound] for musical use; sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455 — Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Auxiliary Devices For Music (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a song data processing method based on a virtual idol. The method comprises the following steps: obtaining multi-modal data, extracting the sung-song audio from the multi-modal data, converting the sung-song audio into a song file through a lyrics/melody separation operation, and generating score information and lyrics information corresponding to the song file; editing the score information and the lyrics information according to a music processing model to generate score creation information and lyrics creation information; and, based on the voice line (vocal timbre) of the virtual idol, performing speech synthesis on the score creation information and the lyrics creation information to generate and output a target song file. Under the control of a mobile device, the application can perform the newly created song on an imaging device, thereby assisting the interacting user to compose and sing on their own and improving the user's creative experience.
Description
Technical field
The present invention relates to the field of intelligent robotics, and more particularly to a song data processing method and a performance interactive system based on a virtual idol.
Background technology
With the continuous development of artificial intelligence, research on robots is no longer confined to the industrial field; the objects of application are gradually expanding into fields such as entertainment, medical treatment, health care, home and services.
In the entertainment field, current applications of virtual robots are mostly limited to song requesting and playback of existing songs, and cannot generate a corresponding song from a score and lyrics. A new robot interaction capability is therefore proposed, which assists the user in creating a target song and improves the user experience.
Invention content
The first technical problem to be solved by the present invention is the need to provide a song data processing method based on a virtual idol. The method comprises the following steps: a score extraction step, in which multi-modal data are obtained, the sung-song audio is extracted from the multi-modal data and converted into a song file through a lyrics/melody separation operation, and score information and lyrics information corresponding to the song file are generated; a lyrics-and-melody editing step, in which the score information and the lyrics information are edited according to a music processing model to generate score creation information and lyrics creation information; and a speech synthesis step, in which, based on the voice line of the virtual idol, speech synthesis is performed on the score creation information and the lyrics creation information to generate and output a target song file.
In one embodiment, the lyrics-and-melody editing step further comprises: judging the current song style based on the score information; retrieving the music processing model matched to that style; and editing the score information and the lyrics information according to the style-matched music processing model to generate the score creation information and the lyrics creation information.
In one embodiment, the lyrics creation information comprises lyric fragment data, each lyric fragment being configured with a lyric fragment code, a fragment start/stop time span, a final/initial (pinyin) mark, a tone code, a phrase-break mark and the final/initial content; the score creation information comprises score fragment data, each score fragment being configured with a score fragment code, a fragment start/stop time span, pitch fragment data and a bar mark.
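For illustration only, the fragment fields listed above can be pictured as the following minimal Python sketch; the class and field names are assumptions chosen for readability, not the claimed encoding.

```python
from dataclasses import dataclass

@dataclass
class LyricFragment:
    code: int           # sequential fragment code within the song
    start: str          # fragment start time, e.g. "00:01:01"
    end: str            # fragment end time, e.g. "00:01:56"
    is_final: bool      # True = pinyin final (yunmu), False = initial (shengmu)
    tone: int           # tone code: 0 = neutral, 1-4 = the four tones
    phrase_break: bool  # phrase-break (punctuation) mark
    content: str        # the final/initial itself, e.g. "ong"

@dataclass
class ScoreFragment:
    code: int           # sequential fragment code within the song
    start: str          # fragment start time
    end: str            # fragment end time
    pitch: str          # pitch fragment data, e.g. "C4"
    bar_end: bool       # bar (measure) mark: True if last note of the bar
```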
In one embodiment, the score extraction step further comprises: performing vocal/accompaniment separation on the obtained sung-song audio, removing the background sound data and retaining the clean (a cappella) vocal data; and further decomposing the clean vocal data, then editing and arranging them to generate the score information and the lyrics information.
In one embodiment, the speech synthesis step further comprises: determining, based on the current song style, the voice line matched to that style; and substituting the lyrics creation information, the score creation information, the style-matched voice line and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is trained on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases.
According to another aspect of the embodiments of the present invention, a song data processing system based on a virtual idol is also provided. The system comprises the following modules: a score extraction module, which obtains multi-modal data, extracts the sung-song audio from the multi-modal data, converts the sung-song audio into a song file through a lyrics/melody separation operation, and generates score information and lyrics information corresponding to the song file; a lyrics-and-melody editing module, which edits the score information and the lyrics information according to a music processing model to generate score creation information and lyrics creation information; and a speech synthesis module, which, based on the voice line of the virtual idol, performs speech synthesis on the score creation information and the lyrics creation information to generate and output a target song file.
In one embodiment, the lyrics-and-melody editing module further comprises: a song style recognition unit, which judges the current song style based on the score information and generates the corresponding information; a creation model selection unit, which retrieves the music processing model matched to that style; and an editing unit, which edits the score information and the lyrics information according to the style-matched music processing model to generate the score creation information and the lyrics creation information.
In one embodiment, the score extraction module further comprises: a vocal separation unit, which performs vocal/accompaniment separation on the obtained sung-song audio, removes the background sound data and retains the clean vocal data; and a lyrics-and-melody decomposition unit, which further decomposes the clean vocal data, then edits and arranges them to generate the score information and the lyrics information.
In one embodiment, the speech synthesis module further comprises: a voice line selection unit, which determines, based on the current song style, the voice line matched to that style; and a song synthesis unit, which substitutes the lyrics creation information, the score creation information, the style-matched voice line and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is trained on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the song data processing method described above.
According to another aspect of the embodiments of the present invention, a song processing and performance interactive system based on a virtual idol is also provided. The interactive system comprises: a cloud server having the computer-readable storage medium described above; a mobile device, which receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls the synchronized output of the target song file and the imaging control information; and an imaging device, which receives the imaging control information sent by the output device and displays the virtual idol based on that information, the displayed virtual idol having the specific image features matched to the imaging control information.
Compared with the prior art, one or more of the above embodiments can have the following advantages or beneficial effects:
After obtaining the sung-song audio from the multi-modal data, the embodiments of the present invention can identify the song style through the mobile device; edit the score information and lyrics information in the sung-song audio according to that style information combined with a music processing model customized by machine learning; further apply a voice line suited to the style; and, under the control of the mobile device, perform the newly created song on an imaging device. The virtual robot of the application can thus assist the interacting user in songwriting and improve the user's creative experience.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the technical solutions of the present invention. The objects and other advantages of the present invention can be realized and obtained through the structures and/or flows particularly pointed out in the description, the claims and the accompanying drawings.
Description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the description; together with the embodiments they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a schematic diagram of an imaging scene application of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 2 is a structural schematic diagram of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 3 is a module block diagram of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 4 is an execution flow chart of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 5 is a structural schematic diagram of the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 6 is an execution flow chart of the score extraction module 231 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 7 is an execution flow chart of the lyrics-and-melody editing module 232 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 8 is an execution flow chart of the speech synthesis module 233 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 9 is a flow chart of the steps of the song data processing method based on a virtual idol according to an embodiment of the present application.
Detailed description of the embodiments
Hereinafter, the embodiments of the present invention are described in detail with reference to the accompanying drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. Provided they do not conflict, the features of the embodiments of the present application can be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
In addition, the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in an order different from the one given here.
The embodiments of the present application are implemented by a virtual idol that completes multi-modal data interaction with the interacting user in online and/or offline entertainment scenes. The virtual idol has specific image features, runs on a mobile device, is controlled by the mobile device and is displayed through an imaging device. The mobile device can configure social attributes, personality attributes, character skills and the like for the virtual idol. Specifically, the mobile device can connect to a cloud server so that the virtual idol possesses multi-modal human-computer interaction and artificial intelligence (AI) capabilities such as natural language understanding, visual perception, touch perception, spoken language output and emotional facial expression and action output. It can also control the display functions of the imaging device, including control over the display of scene accessories (for example flowers, plants and trees in the scene) and over the display of light, special effects, particles and rays, all of which can be shown through the imaging device. The social attributes may include attributes such as appearance, name, dress, decoration, gender, birthplace, age, family relationship, occupation, position, religious belief, emotional state and educational background; the personality attributes may include attributes such as character and temperament; the character skills may include professional skills such as singing, dancing, storytelling and training, and the display of character skills is not limited to skills of the limbs, expressions, the head and/or the mouth.
It should be noted that the social attributes, personality attributes and character skills of the virtual idol can guide the parsing of, and decision-making on, the multi-modal interaction data, so that the decided multi-modal output is more inclined toward, or better suited to, this virtual idol's image. The virtual idol can also be projected onto the imaging device in cooperation with the mobile device and perform according to the scene displayed by the imaging device, for example by singing and dancing.
In this application, the virtual idol has song editing and processing capability: it can obtain song audio information from the obtained multi-modal data, create a song from that information by imitating the lyric-writing and composing habits of a singer or composer, synthesize a song that matches the robot's voice line, play the finished song with the specific image of the virtual idol as carrier, and at the same time control the imaging device to complete the display of that specific image.
Embodiments of the present invention are described below.
Fig. 1 is a schematic diagram of an imaging scene application of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. As shown in Fig. 1, the virtual idol runs on the mobile device 101 and is presented by projection through the imaging device 102. The cloud server 103 has a computer-readable storage medium and is interconnected with the mobile device 101 via the Internet, providing data analysis, processing and storage support for the data received by the mobile device 101. The physical positions of the mobile device 101 and the imaging device 102 are aligned with each other so that the signals of the two devices can interconnect. The mobile device 101 receives and plays the target song file output by the cloud server 103, generates imaging control information based on the target song file, and controls the synchronized output of the target song file and the imaging control information, thereby projecting the virtual idol running on itself onto the imaging device 102. The imaging device 102 receives the imaging control information sent by the mobile device 101 and displays the virtual idol based on it; the displayed virtual idol has the specific image features matched to the imaging control information. The imaging device 102 can be a holographic projection device, which provides the carrier support for basic projection imaging, can display content such as the pictures or text shown on the mobile device screen, and can also collect signals such as vision, infrared and/or Bluetooth to interact with and assist the mobile device. It should be noted that the application places no particular limit on the device forms of the mobile device 101 and the imaging device 102; the mobile device 101 can be a smartphone, an iPad, a tablet computer or the like, and those skilled in the art can choose according to the actual situation.
Fig. 2 is a structural schematic diagram of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. As shown in Fig. 2, the interactive system has: an input device 21, an output device 22, a song data processing system 23 and the imaging device 102. The input device 21 and the output device 22 are built into the mobile device 101. It should be noted that the song data processing system 23 is built into the cloud server 103 and, by means of the powerful storage and data processing abilities of the cloud brain, completes the function of autonomously composing and singing from the audio information output by the interacting object. The composition and function of each part of the interactive system are described in detail below.
First, the input device 21 can obtain and forward the multi-modal data output by the interacting object. Specifically, the input device 21 can either be a physical hardware device installed in the mobile device 101, such as a microphone or a front or rear camera, or be a network channel or a local channel; the application places no particular limit on this. When the input device 21 is a physical hardware device, it converts the multi-modal data output by the interacting object into a format that the song data processing system 23 can read and then sends the data to the song data processing system 23; in this case the interacting object can be a video or audio of the user singing, and the application places no particular limit on this. For example, the user records a performance through the front camera, and the camera driver software in the input device 21 converts the information of the performance into multi-modal data in video format and outputs it. When the input device 21 is a network channel or a local channel, it can send the obtained multi-modal data directly to the song data processing system 23; in this case the interacting object can be a network platform or the like, and the application places no particular limit on this. For example, the input device 21 can directly obtain performance data played by a network platform, or the user can load multi-modal data directly into the input device 21 through a local channel.
Next, the output device 22 is described in detail. Fig. 3 is a module block diagram of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. The output device 22 can receive and play the finished target song file output by the song data processing system 23, can generate imaging control information based on the target song file, and completes the synchronized output of the target song file and the imaging control information. As shown in Fig. 3, the device 22 comprises: a lyric text decomposition module 221, a mouth animation storage module 222, a specific image storage module 223, a performance image generation module 224 and a synchronous output module 225.
Fig. 4 is an execution flow chart of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. The functions of the modules in the output device 22 are described in detail below with reference to Fig. 3 and Fig. 4.
The lyric text decomposition module 221 receives the lyrics creation information sent by the song data processing system 23 and parses it into lyric fragment data in units of pinyin syllables, which are further decomposed into the lyric fragment code of the current lyric fragment, the fragment start/stop time span, the initial/final mark, the tone code, the phrase-break mark, the final/initial content and so on. A code corresponding to the obtained final/initial content, or the content itself, is then sent to the mouth animation storage module 222.
Several points of the above processing in the lyric text decomposition module 221 need explanation. The lyrics creation information comprises lyric fragment data, which are numbered sequentially in natural order according to the position of each lyric fragment's pinyin within the song. Since the lyric fragment data are edited in units of pinyin syllables, one complete character may contain several lyric fragments. For the tone code, the neutral tone and the first, second, third and fourth tones are each encoded accordingly. The application places no particular limit on the encoding rules of the lyrics creation information; those skilled in the art can adjust and define them according to the actual situation.
(Example) If the first line of the lyrics is "雪绒花" ("Edelweiss"), the character "绒" (rong) comprises two lyric fragments; the second of these fragments contains the following information: the lyric fragment code is "5", the start/stop time span is "00:01:01–00:01:56", the fragment carries a final mark, the tone is the second-tone code "2", and the content information is "ong".
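As a concrete toy instance of the example above; the dictionary keys are illustrative, mirroring the assumed field names in the earlier sketch.

```python
# Second lyric fragment of the character "绒" (rong) in "雪绒花" (Edelweiss),
# as described in the example above; field names are illustrative only.
rong_second_fragment = {
    "code": 5,
    "start": "00:01:01",
    "end": "00:01:56",
    "is_final": True,   # final (yunmu) mark
    "tone": 2,          # second tone
    "content": "ong",
}
```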
The mouth animation storage module 222 stores mouth-shape (lip-sync) animation data for each initial and final; the mouth-shape animation data consist of the three-dimensional position information of each pixel. When the module receives, from the lyric text decomposition module 221, the fragment content or the code representing that fragment information, it looks up the corresponding mouth-shape animation data in its database and sends them to the performance image generation module 224. It should be noted that the application describes the mouth-shape animation data using three-dimensional position information only as an example; the application places no particular limit on the form of the mouth-shape animation data or of the animation-type data described below, such as the specific image data and the performance image data.
The specific image storage module 223 stores preset specific images of the virtual idol, can randomly select any one of them and sends its information to the performance image generation module 224. In addition, the module 223 can also receive an image instruction from the user and send the specific image information corresponding to that instruction to the performance image generation module 224. It should be noted that the application places no particular limit on the form of the instruction for retrieving a specific image from this module: the module may be told to always output the same specific image, it may select one at random, it may let the user make a personalized selection, or it may switch from the first two modes to the personalized selection mode.
The performance image generation module 224 is described next. The module 224 receives, in real time, the mouth-shape animation data sent by the mouth animation storage module 222 and the specific image information from the specific image storage module 223, replaces the information indicating the mouth action in the specific image information with the mouth-shape animation data, and generates in real time the performance image corresponding to the current lyric fragment content, i.e. the current imaging control information.
After finishing receiving the target song file sent by the song data processing system 23, the synchronous output module 225 obtains in real time the imaging control information sent by the performance image generation module 224 together with the lyric fragment code and fragment start/stop time span of the current fragment parsed by the lyric text decomposition module 221, and integrates this information so that the imaging control information is output at the same time as the corresponding fragment in the target song file. It should be noted that the target song file is played through the speech output devices configured on the mobile device 101, such as a loudspeaker or sound box; the imaging control information is displayed on the screen of the mobile device 101, or, once the mobile device 101 is connected to the imaging device 102, is sent directly to the imaging device 102, so that the imaging device 102 realizes an auxiliary interaction function for the mobile device 101.
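A minimal sketch, under stated assumptions, of the kind of alignment the synchronous output module 225 performs: fragment start times are assumed to be available in seconds, the target song file is assumed to be already playing, and show_frame is a hypothetical callback that pushes one frame to the imaging device.

```python
import time

def emit_imaging_frames(fragments, imaging_frames, show_frame):
    """Emit each imaging-control frame (the mouth-shape performance image for a
    lyric fragment) at that fragment's start time, so picture and sound stay in step.

    fragments: list of (fragment_code, start_sec)
    imaging_frames: dict mapping fragment_code -> imaging control information
    show_frame: hypothetical callback to the imaging device / screen
    """
    t0 = time.monotonic()
    for code, start_sec in sorted(fragments, key=lambda f: f[1]):
        # Wait until the fragment's start time relative to playback start.
        delay = start_sec - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)
        show_frame(imaging_frames[code])
```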
Referring again to Fig. 2, the imaging device 102 is described next. The imaging device 102 receives the imaging control information sent by the output device and performs the display based on it. Specifically, if the imaging device 102 is a holographic projection device, it displays the specific image information presented on the screen of the mobile device 101 according to the imaging control information. If it is connected to the mobile device 101 by wireless, Bluetooth, infrared or similar means, it directly receives, in real time, the imaging control information generated by the mobile device 101, converts this information with a preset three-dimensional imaging model into the three-dimensional position information of the specific image and presents it, thereby realizing an interaction capability that assists the mobile device 101.
Finally, the song data processing system 23 based on a virtual idol is described in detail.
Fig. 5 is a structural schematic diagram of the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 5, the system has a score extraction module 231, a lyrics-and-melody editing module 232 and a speech synthesis module 233. The function and workflow of each module in the song data processing system 23 are described in detail below.
The score extraction module 231 obtains multi-modal data, extracts the sung-song audio from the multi-modal data, converts the sung-song audio into a song file through the lyrics/melody separation operation, and generates the score information and lyrics information corresponding to the song file. Functionally, the module further comprises an audio extraction unit 2311, a vocal separation unit 2312 and a lyrics-and-melody decomposition unit 2313.
Specifically, Fig. 6 is an execution flow chart of the score extraction module 231 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 6, and referring again to Fig. 5: first, the audio extraction unit 2311 receives the multi-modal data obtained from the input device 21 and identifies them. If the multi-modal data contain several modalities such as audio, video or text, the corresponding audio information is extracted from them and sent to the vocal separation unit 2312; if the obtained multi-modal data are audio only, they are forwarded directly to the vocal separation unit 2312 as the sung-song audio.
Next, the vocal separation unit 2312 can use a common vocal/accompaniment separation method to separate the obtained sung-song audio: the clean (a cappella) vocal data are retained by means of voice recognition technology, while existing voice elimination technology is used to separate out the background sound data, so that background sound and voice are distinguished; the clean vocal data are finally sent to the lyrics-and-melody decomposition unit 2313. In this example, in order to improve the accuracy of pitch-synchronous extraction, an open-loop/closed-loop approach is first used to determine the approximate range of the fundamental pitch, which is then adjusted and extracted accurately; energy comparison is then used to distinguish the sung voice from the background sound, and the fundamental pitch extracted in periods where only background sound exists is removed, thereby completing the separation of the clean vocal data. Depending on the content of the sung-song audio, the background sound data may include information such as accompaniment and harmony. It should be noted that the application places no particular limit on the specific implementation of extracting the clean vocal data: a vocal singing recognition model built on the voice features used in speech recognition technology may be used, and speech enhancement technology, high-pitch recognition technology and the like may also be used.
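The text above leaves the separation method open; the following numpy sketch illustrates only the energy-comparison idea it mentions (keep frames that show a strong fundamental and whose energy clearly exceeds the accompaniment floor). All thresholds are invented placeholders, not values from the patent.

```python
import numpy as np

def rough_vocal_mask(signal, sr, frame=1024, hop=512,
                     f0_min=80.0, f0_max=1000.0, energy_ratio=2.0):
    """Return a boolean mask per frame: True where a voiced vocal is likely present.

    Toy illustration of the fundamental-pitch check plus energy comparison
    described above; a real system would refine the pitch estimate and smooth
    the mask before separating clean vocal data from background sound.
    """
    n_frames = max(0, 1 + (len(signal) - frame) // hop)
    if n_frames == 0:
        return np.zeros(0, dtype=bool)
    energies = np.zeros(n_frames)
    voiced = np.zeros(n_frames, dtype=bool)
    lag_min, lag_max = max(1, int(sr / f0_max)), int(sr / f0_min)
    for i in range(n_frames):
        x = signal[i * hop:i * hop + frame].astype(float)
        x = x - x.mean()
        energies[i] = float(np.dot(x, x))
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # autocorrelation, lags 0..frame-1
        if ac[0] > 0 and lag_min < lag_max < frame:
            peak = ac[lag_min:lag_max].max() / ac[0]
            voiced[i] = peak > 0.3        # strong fundamental -> probably a sung voice
    # Energy floor estimated from frames without a clear fundamental (background only).
    floor = np.median(energies[~voiced]) if (~voiced).any() else energies.min()
    return voiced & (energies > energy_ratio * max(floor, 1e-12))
```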
Then, after obtaining the clean vocal data, the lyrics-and-melody decomposition unit 2313 further decomposes them, then edits and arranges them to generate the score information and the lyrics information. Specifically, the lyrics information is generated by extracting the lyric portion of the clean vocal data with a preliminary lyrics recognition model built from speech recognition technology combined with a semantic understanding database. In the training of the preliminary lyrics recognition model, a large amount of historical clean-vocal data is used: after filtering and noise reduction, the mapping between the acoustic parameters of the voice and the recognized text information is extracted to obtain the training model of the preliminary lyrics recognition model; the clean vocal data obtained by the unit 2313 are then used as the input to be recognized, feature matching is carried out with the above training model, and the corresponding lyrics information is finally resolved. The recognized text information is obtained as follows: after preliminary lyrics information is produced by speech recognition technology, natural language processing technology is applied, according to the semantic understanding database, to correct the preliminarily generated lyrics so that they conform to the logic of the language, or to logically correct the words that were not clearly recognized; the result is finally used as the data basis for training the preliminary lyrics recognition model. Likewise, the score information is generated by extracting the score portion of the clean vocal data with a preliminary score recognition model built from speech recognition technology combined with a song-note library. In the training of the preliminary score recognition model, historical clean-vocal data are also used: after filtering and noise reduction, the mapping between the acoustic frequency parameters of the voice and the recognized note information is extracted to obtain the training model of the preliminary score recognition model; the clean vocal data obtained by the unit 2313 are then used as the input to be recognized, feature matching is carried out according to this training model, and the corresponding score information is obtained. The recognized note information is obtained as follows: after preliminary score information is produced by speech recognition technology, the notes that were not clearly recognized in it are corrected according to the song-note library; the corrected result is finally used as the data basis for training the preliminary score recognition model.
Referring again to Fig. 5, the lyrics-and-melody editing module 232 in the song data processing system 23 is described next. The lyrics-and-melody editing module 232 is built into the cloud server 103 and can edit the score information and lyrics information obtained from the score extraction module 231 according to the music processing model, generating the score creation information and the lyrics creation information. The module further comprises: a song style recognition unit 2321, a creation model selection unit 2322 and an editing unit 2323.
Specifically, Fig. 7 is an execution flow chart of the lyrics-and-melody editing module 232 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 7, the function and workflow of each unit of the lyrics-and-melody editing module 232 (with reference to Fig. 5) are described below.
The song style recognition unit 2321 can judge the current song style based on the obtained score information, using a melody style identification model that has already been built, generate the corresponding song style information, and then send the generated style information to the creation model selection unit 2322. The song style identification model comprises a set of note-information features for the various music styles; when score information is input, the same feature takes different characteristic values for different music styles, from which the style of the melody is judged, and the code of the corresponding song style is finally output. In this example, the style category is confirmed with a binary-tree construction method. First, the style categories are classified into pop, folk (national), classical, jazz, ballad, rock, R&B, punk and so on; then, characteristics such as rhythm, note-type distribution and tone (pitch) distribution are used as sub-features, and the specific sub-feature information of each style is summarized for each category; next, a weight expression is established for each feature node in the binary tree; finally, the weight analysis results of all nodes are combined to build the final song style identification types. It should be noted that the application places no particular limit on the method used to recognize the song style; those skilled in the art can choose according to the actual situation, on the principle of completely implementing the function of the unit 2321.
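A toy sketch of the weighted binary-tree decision described above; the sub-features, weights, thresholds and style labels are placeholders invented for illustration, not the patent's classifier.

```python
def classify_style(features, threshold=0.5):
    """Walk a tiny two-level 'binary tree' of weighted sub-feature checks and
    return a style label.

    features: dict with keys 'rhythm_density', 'note_variety', 'pitch_spread',
              each normalised to [0, 1]; all weights below are placeholders.
    """
    weights = {"rhythm_density": 0.5, "note_variety": 0.3, "pitch_spread": 0.2}
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)

    if score > threshold:                       # energetic branch of the tree
        return "rock" if features.get("rhythm_density", 0.0) > 0.7 else "pop"
    else:                                       # calmer branch of the tree
        return "folk" if features.get("pitch_spread", 0.0) < 0.3 else "classical"

# Example: a dense, driving score is labelled rock.
print(classify_style({"rhythm_density": 0.9, "note_variety": 0.6, "pitch_spread": 0.5}))
```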
The creation model selection unit 2322 stores the music processing models built for each song style; according to the song style obtained above, the unit 2322 retrieves the music processing model matched to the current style. The music processing model is trained for lyric writing and composing on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases. It should be noted that the music processing model takes as its training data basis the lyric data and score data in a large amount of historical music of different song styles; the lyrics creation information and score creation information generated from the historical data and matched to the styles specified in this application serve as the training target data. According to the existing character-pinyin library, song-note library and lyric-writing/composing database stored on the cloud server 103 (which record the rule features of music creation theory) and the lyric-writing/composing habit feature databases established for the various song styles, a machine learning method is used to train the mapping from the basic data to the target data, thereby constructing a lyrics-and-melody creation training model that conforms to creation habits, i.e. the music processing model. It should be noted that the application places no particular limit on the final form of the music processing model: besides the trained mapping model used in this application, it can also be a markup-file template obtained through the above training process, or the trained mapping relations themselves. The character-pinyin library contains the initials and finals of all pinyin syllables, each pinyin fragment being encoded; the song-note library contains all note types and their duration information, rest types and their duration information and so on, each note fragment being encoded.
The lyric-writing/composing habit feature database contains a large number of lyric-writing habit features, composing habit features and matched lyrics-melody habit features for each song style (pictured in the sketch following this paragraph). (First example) If the current song style is judged to be rock, then, since this style has fewer melismatic turns, the duration of each initial/final in a lyric fragment is short and the duration curve of the whole line of lyrics is relatively flat; also, since this style draws out the final syllable at the end of each line, a lyric fragment with a larger duration is taken as a phrase-break mark, and such a lyric fragment corresponds to at most two score fragments. (Second example) If the current song style is judged to be R&B, then, since this style has more melismatic turns, the duration of each note in a score fragment is shorter, the whole piece contains more score fragments, and one lyric fragment often corresponds to several score fragments.
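One way to picture the habit feature database is as a plain per-style lookup table consulted by the editing unit; the keys and values below only echo the two examples above and are otherwise invented placeholders.

```python
# Per-style lyric-writing/composing habit features as a plain lookup table.
# All values are illustrative placeholders, not data from the patent.
STYLE_HABITS = {
    "rock": {
        "melisma_turns": "few",
        "syllable_duration": "short",          # each initial/final lasts a short time
        "line_duration_curve": "flat",         # whole-line duration curve stays steady
        "elongate_line_final": True,           # last syllable of each line is drawn out
        "long_fragment_is_phrase_break": True,
        "max_score_fragments_per_lyric": 2,
    },
    "rnb": {
        "melisma_turns": "many",
        "note_duration": "short",
        "score_fragments_per_piece": "many",
        "max_score_fragments_per_lyric": None,  # one lyric fragment may map to several
    },
}
```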
Referring again to Fig. 5, the editing unit 2323 can edit the score information and the lyrics information according to the music processing model matched to the currently generated song style, thereby generating the score creation information and the lyrics creation information. The lyrics creation information comprises several lyric fragments, each configured with a lyric fragment code, a fragment start/stop time span, a final/initial mark, a tone code, a phrase-break mark and the final/initial content. Since the lyric fragment data have already been explained for the lyric text decomposition module 221 of the song processing and performance interactive system above, they are not repeated here.
Specifically, when the music processing model processes the input lyrics information, the lyrics information is first converted to pinyin. The pinyin-converted lyrics information is then fragmented according to the initials and finals: for example, the pinyin of the character "中" is "zhong", which after fragmentation yields the two lyric fragments "zh" and "ong"; each fragment is further given a final/initial mark and is encoded. Next, the fragmented lyrics information is theorized according to the rules of lyric-writing and composing theory: the start and end time of each lyric fragment is calibrated and the lyric phrase-break marks are annotated (for example, to keep the lyric line "the spring breeze blows all the flowers fragrant" continuous within the melody). Finally, the lyrics information is stylized: according to the song style information obtained by the unit 2323, the duration data in the lyric fragments are given a final adjustment according to the lyric-writing habits, composing habits and matched lyrics-melody habits common to that song style, so that the finally generated lyrics creation data conform to the lyric-writing and composing habits of the corresponding song style.
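A minimal sketch of the fragmentation step just described, assuming the lyrics have already been romanised to pinyin syllables (the conversion from Chinese text is assumed to happen earlier); the initials list is the standard Mandarin one and the fragment dictionary keys are illustrative.

```python
# Standard Mandarin initials, longest first so "zh"/"ch"/"sh" win over "z"/"c"/"s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def fragment_syllable(pinyin):
    """Split one pinyin syllable into (initial, final); e.g. 'zhong' -> ('zh', 'ong').
    Syllables with no initial (e.g. 'ai') yield ('', 'ai')."""
    for ini in INITIALS:
        if pinyin.startswith(ini) and len(pinyin) > len(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin

def fragment_line(pinyin_syllables):
    """Turn a line of pinyin syllables into coded lyric fragments, one fragment per
    non-empty initial or final, numbered in natural order as described above."""
    fragments, code = [], 1
    for syl in pinyin_syllables:
        for part, is_final in zip(fragment_syllable(syl), (False, True)):
            if part:
                fragments.append({"code": code, "content": part, "is_final": is_final})
                code += 1
    return fragments

print(fragment_line(["zhong"]))   # two fragments: 'zh' (initial) and 'ong' (final)
```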
In addition, the above score creation information comprises several score fragments, each configured with a score fragment code, a fragment start/stop time span, pitch fragment data and a bar mark. Score fragment data take a single note as the smallest unit, one fragment per note; the fragments are coded sequentially in natural order according to the position of the note in the whole song. The start/stop time span of each note is edited according to the duration corresponding to its note type, such as whole note, half note, quarter note, eighth note or sixteenth note, and if the current note fragment is the last note of its bar, a valid bar mark must be annotated.
(Example) In the song "Little Star" ("Twinkle, Twinkle, Little Star"), the score corresponding to the first line of the lyrics ("twinkle, twinkle, little star") is "Do Do So So La La So". This score is divided into seven score fragments. The fourth note fragment, "So", contains the following information: the score fragment code is "4", the fragment start/stop time span is the duration of a quarter note, the pitch fragment data are the audio data of "So" in the key of C, and the bar mark is valid. The fifth note fragment, "La", contains the following information: the score fragment code is "5", the fragment start/stop time span is the duration of a quarter note, the pitch fragment data are the audio data of "La" in the key of C, and the bar mark is invalid.
Specifically, when the music processing model processes the input score information, the score information is first fragmented into notes with the smallest note as the unit, each score fragment is encoded, and the pitch fragment data of each note are loaded. Next, the fragmented score information is theorized according to the rules of lyric-writing and composing theory: the start/stop time span of each score fragment is calibrated and the bar marks are annotated. Finally, the score information is stylized: according to the song style obtained from the song style recognition unit 2321, the lyric-writing and composing habit features for that style are retrieved and the duration data in the fragments are given a final adjustment, so that the finally generated score creation information carries the lyric-writing habits, composing habits and matched lyrics-melody habits of the current song style.
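A sketch of the note-level fragmentation just described, assuming a tempo in beats per minute and 4/4 time; the field names, the tempo value and the use of seconds are illustrative assumptions.

```python
NOTE_BEATS = {"whole": 4.0, "half": 2.0, "quarter": 1.0, "eighth": 0.5, "sixteenth": 0.25}

def fragment_score(notes, bpm=100, beats_per_bar=4):
    """notes: list of (pitch, note_type), e.g. [("C4", "quarter"), ("G4", "quarter")].
    Returns score fragments with sequential codes, start/stop times in seconds,
    pitch fragment data and a bar mark on the last note of each bar."""
    sec_per_beat = 60.0 / bpm
    fragments, t, beats_in_bar = [], 0.0, 0.0
    for code, (pitch, note_type) in enumerate(notes, start=1):
        dur = NOTE_BEATS[note_type] * sec_per_beat
        beats_in_bar += NOTE_BEATS[note_type]
        bar_end = beats_in_bar >= beats_per_bar   # valid bar mark on the bar's last note
        if bar_end:
            beats_in_bar = 0.0
        fragments.append({"code": code, "start": round(t, 3), "end": round(t + dur, 3),
                          "pitch": pitch, "bar_end": bar_end})
        t += dur
    return fragments

# First bar of "Little Star": Do Do So So in C major, all quarter notes.
print(fragment_score([("C4", "quarter"), ("C4", "quarter"), ("G4", "quarter"), ("G4", "quarter")]))
```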
Referring again to Fig. 5, the speech synthesis module 233 in the song data processing system 23 is described next. Based on the voice line of the virtual idol, the speech synthesis module 233 performs speech synthesis on the score creation information and the lyrics creation information obtained from the lyrics-and-melody editing module 232, and generates and outputs the target song file. The module further comprises a voice line selection unit 2331 and a song synthesis unit 2332. It should be noted that in this example the speech synthesis module 233 is built into the mobile device 101, so that offline speech synthesis can be realized; however, the application places no particular limit on the position of the module 233: it can also be configured in the cloud server 103 to realize online real-time speech synthesis, or the voice line selection unit 2331 can be configured in the cloud server 103 while the song synthesis unit 2332 is built into the mobile device 101.
Fig. 8 is an execution flow chart of the speech synthesis module 233 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 8, the function and workflow of each unit of the speech synthesis module 233 (with reference to Fig. 5) are described below.
First, the voice line selection unit 2331 stores virtual idol voice lines for different song styles; based on the current song style obtained from the lyrics-and-melody editing module 232, it determines the voice line matched to that style, thereby obtaining the synthesis effect data for that voice line.
The song synthesis unit 2332 substitutes the lyrics authoring information, the score authoring information, the voice matching the song style, the background sound data, and so on into a preset speech synthesis system to generate the target song file. Specifically, the unit 2332 first parses and pre-processes the received information, extracting the effective information needed by each stage of the speech synthesis system. The pre-processing includes at least the following: the score authoring information is parsed to obtain, for each score segment, the effective bar marker, the note pitch data and the corresponding duration data, which are loaded into the speech synthesis system in the order of the segment codes; the lyrics authoring information is parsed to obtain, for each lyrics segment, the final/initial identifiers, the punctuation identifiers, and the final/initial content with its corresponding durations, further distinguishing initial durations from final durations, which are likewise loaded into the speech synthesis system in the order of the segment codes. The speech synthesis process then completes the following operations. The lyrics information obtained from the transcription module 231 serves as the raw-text input of the speech synthesis system; text front-end processing generates prosodic information for the original lyrics information. The final/initial content and the corresponding final/initial identifiers, loaded in the order of the lyrics segment codes, are substituted together with the prosodic information into the duration model; after the initial durations and final durations have been loaded, the prosody-marked initial durations and final durations are obtained. The final/initial content and identifiers, together with the prosody-marked initial and final durations, are processed by a preset acoustic model, which outputs the SP information and AP information for the current song. These are combined with the bar markers, note pitch data and corresponding durations of each segment, loaded in the order of the score segment codes, and with the virtual idol voice matching the current song style obtained from the voice selection unit 2331, and final effect synthesis is carried out to output the corresponding created song vocal. Finally, the completed song vocal is merged with the background sound data obtained from the transcription module 231 to generate the final target song file.
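The flow just described can be summarized, purely as a sketch, in the pipeline below. The speech synthesis system itself is left open by the application, so every component here is passed in as a callable; none of the names or signatures come from the disclosure.

```python
from typing import Any, Callable, Sequence, Tuple

def synthesize_song(
    lyrics_text: str,
    lyric_segments: Sequence[dict],
    score_segments: Sequence[dict],
    voice: dict,
    background_audio: Any,
    *,
    text_frontend: Callable[[str], Any],
    duration_model: Callable[[Sequence[dict], Any], Any],
    acoustic_model: Callable[[Sequence[dict], Any], Tuple[Any, Any]],
    effect_synthesis: Callable[[Any, Any, Sequence[dict], dict], Any],
    mix: Callable[[Any, Any], Any],
) -> Any:
    """Hypothetical end-to-end synthesis flow mirroring the steps described above."""
    # 1. Text front-end processing of the original lyrics text yields prosodic information.
    prosody = text_frontend(lyrics_text)
    # 2. Duration model: initials/finals loaded per lyrics-segment code give prosody-marked durations.
    durations = duration_model(lyric_segments, prosody)
    # 3. Acoustic model: SP/AP information for the current song.
    sp, ap = acoustic_model(lyric_segments, durations)
    # 4. Final effect synthesis: combine SP/AP with bar markers, pitches and durations per
    #    score-segment code, rendered with the selected virtual-idol voice.
    vocal = effect_synthesis(sp, ap, score_segments, voice)
    # 5. Merge the rendered vocal with the background sound data into the target song file.
    return mix(vocal, background_audio)
```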
It should be noted that the present application does not specifically limit how the preset speech synthesis system is obtained; machine learning methods may be used to generate the corresponding duration model, acoustic model and effect synthesis model, so as to construct the speech synthesis system and carry out the speech synthesis operations.
In addition, the present application also proposes a song data processing method based on a virtual idol. FIG. 9 is a flowchart of the steps of the song data processing method based on a virtual idol according to an embodiment of the present application. As shown in FIG. 9, the processing method is explained below.
In step S910 (the transcription step), the transcription module 231 obtains multi-modal data, extracts the sung audio from the multi-modal data, converts the sung audio into a song file through a lyric-melody separation operation, and generates score information and lyrics information corresponding to the song file. Specifically, the audio extraction unit 2311 obtains the multi-modal data and either extracts the sung audio from it and sends it to the song separation unit 2312, or forwards the obtained audio information directly into the song separation unit 2312 as the sung audio. The song separation unit 2312 performs vocal separation on the received sung audio, using voice recognition technology to retain the a cappella vocal data while separating out (removing) the background sound data, and finally sends the a cappella vocal data to the lyric-and-melody parsing unit 2313. After receiving the a cappella vocal data, the lyric-and-melody parsing unit 2313 further decomposes it, and edits and organizes it into score information and lyrics information.
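A deliberately crude stand-in for the vocal/background split performed by the song separation unit 2312 is sketched below using a mid/side decomposition of a stereo file; a real system would use a trained source-separation or voice recognition model. The function name is invented, and the soundfile library is assumed to be available for audio I/O.

```python
import soundfile as sf  # assumed available for audio I/O

def crude_vocal_split(stereo_path: str):
    """Very rough approximation of vocal vs. background separation via a mid/side split."""
    audio, sample_rate = sf.read(stereo_path)      # stereo file -> array of shape (samples, 2)
    mid = audio.mean(axis=1)                       # centre content, usually dominated by the lead vocal
    side = (audio[:, 0] - audio[:, 1]) / 2.0       # side content, a rough proxy for the backing track
    return mid, side, sample_rate
```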
After the lyric-and-melody editing module 232 receives the score information and lyrics information sent by the transcription module 231, step S920 (the lyric-and-melody editing step) is entered. Specifically, the song style recognition unit 2321 first judges the current song style from the received score information by means of a previously built music style identification model, and then sends the song style information to the authoring model selection unit 2322. The authoring model selection unit 2322 stores a music processing model built for each song style, and retrieves the music processing model matching the identified song style. Finally, the editing unit 2323 edits the score information and lyrics information obtained from the lyric-and-melody parsing unit 2313 according to the music processing model matching the currently identified song style, so as to generate the corresponding score authoring information and lyrics authoring information, and the flow proceeds to step S930 (the speech synthesis step).
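Step S920 can be reduced to the following sketch, in which the style classifier and the per-style music processing models are injected as ordinary Python objects; the edit_score/edit_lyrics interface is an assumption for illustration only.

```python
from typing import Any, Callable, Dict, Tuple

def lyric_melody_edit(
    score_info: Any,
    lyrics_info: Any,
    classify_style: Callable[[Any], str],   # stands in for the song style recognition unit 2321
    models_by_style: Dict[str, Any],        # stands in for the authoring model selection unit 2322
) -> Tuple[Any, Any, str]:
    """Hypothetical flow of step S920: identify the style, pick its model, then edit."""
    style = classify_style(score_info)
    model = models_by_style[style]                  # retrieve the music processing model for this style
    score_authoring = model.edit_score(score_info)  # editing unit 2323 (assumed interface)
    lyrics_authoring = model.edit_lyrics(lyrics_info)
    return score_authoring, lyrics_authoring, style
```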
In step S930 (the speech synthesis step), the voice selection unit 2331 in the speech synthesis module 233 determines, based on the identified song style, the virtual idol voice matching the current song style and obtains the corresponding synthesis effect data for that voice. The song synthesis unit 2332 then substitutes the lyrics authoring information, the score authoring information, the voice matching the song style information, the synthesis effect data, the background sound data, and so on into the preset speech synthesis system to generate the target song file.
It should be noted that the song data processing method based on a virtual idol described above can be stored, as a computer program module, on the computer-readable medium in the cloud server 103; when the module is executed by a processor, it realizes the function of autonomously composing and singing a song from the audio information output by the interaction object.
It should be understood that the disclosed embodiments of the present invention are not limited to the specific structures and processing steps disclosed herein, but extend to equivalents of these features as would be understood by those of ordinary skill in the relevant art. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrase "one embodiment" or "an embodiment" in various places throughout the specification do not necessarily all refer to the same embodiment.
Although embodiments have been disclosed above, the content described is merely provided to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention pertains may make modifications and variations in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.
Claims (13)
1. A song data processing method based on a virtual idol, characterized in that the method comprises the following steps:
a transcription step of obtaining multi-modal data, extracting sung audio from the multi-modal data, converting the sung audio into a song file through a lyric-melody separation operation, and generating score information and lyrics information corresponding to the song file;
a lyric-and-melody editing step of performing editing processing on the score information and the lyrics information according to a music processing model, to generate score authoring information and lyrics authoring information;
a speech synthesis step of performing speech synthesis on the score authoring information and the lyrics authoring information based on the voice of the virtual idol, generating a target song file and outputting it.
2. The method according to claim 1, characterized in that the lyric-and-melody editing step further comprises:
judging the current song style based on the score information;
retrieving a music processing model matching that style;
editing the score information and the lyrics information according to the matching music processing model, to generate the score authoring information and the lyrics authoring information.
3. The method according to claim 1 or 2, characterized in that:
the lyrics authoring information includes lyrics segment data, wherein the lyrics segment data is configured with a lyrics segment code, a lyrics segment start/stop duration, final/initial identifiers, a tone code, punctuation identifiers, and final/initial data;
the score authoring information includes score segment data, wherein the score segment data is configured with a score segment code, a score segment start/stop duration, pitch segment data, and a bar marker.
4. The method according to any one of claims 1 to 3, characterized in that the transcription step further comprises:
performing vocal separation on the obtained sung audio, removing the background sound data and retaining the a cappella vocal data;
further decomposing the a cappella vocal data, and editing and organizing it to generate the score information and the lyrics information.
5. The method according to claim 3, characterized in that the speech synthesis step further comprises:
determining, based on the current song style, a voice matching the song style;
substituting the lyrics authoring information, the score authoring information, the voice matching the song style and the background sound data into a preset speech synthesis system, to generate the target song file.
6. The method according to claim 1, characterized in that the music processing model is built by training on a word-pinyin library, song notes and a lyric-writing and composing database, in combination with multiple classes of databases of lyric-writing and composing habit characteristics.
7. A song data processing system based on a virtual idol, characterized in that the system comprises the following modules:
a transcription module, which obtains multi-modal data, extracts sung audio from the multi-modal data, converts the sung audio into a song file through a lyric-melody separation operation, and generates score information and lyrics information corresponding to the song file;
a lyric-and-melody editing module, which performs editing processing on the score information and the lyrics information according to a music processing model, to generate score authoring information and lyrics authoring information;
a speech synthesis module, which performs speech synthesis on the score authoring information and the lyrics authoring information based on the voice of the virtual idol, generates a target song file and outputs it.
8. The system according to claim 7, characterized in that the lyric-and-melody editing module further comprises:
a song style recognition unit, which judges the current song style based on the score information and generates corresponding information;
an authoring model selection unit, which retrieves a music processing model matching that style;
an editing unit, which edits the score information and the lyrics information according to the matching music processing model, to generate the score authoring information and the lyrics authoring information.
9. The system according to claim 7 or 8, characterized in that the transcription module further comprises:
a song separation unit, which performs vocal separation on the obtained sung audio, removes the background sound data and retains the a cappella vocal data;
a lyric-and-melody parsing unit, which further decomposes the a cappella vocal data, and edits and organizes it to generate the score information and the lyrics information.
10. The system according to any one of claims 7 to 9, characterized in that the speech synthesis module further comprises:
a voice selection unit, which determines, based on the current song style, a voice matching the song style;
a song synthesis unit, which substitutes the lyrics authoring information, the score authoring information, the voice matching the song style information and the background sound data into a preset speech synthesis system, to generate the target song file.
11. The system according to claim 7 or 8, characterized in that the music processing model is built by training on a word-pinyin library, song notes and a lyric-writing and composing database, in combination with multiple classes of databases of lyric-writing and composing habit characteristics.
12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method steps according to any one of claims 1 to 6 are implemented.
13. A song processing and performance interactive system based on a virtual idol, characterized in that the interactive system comprises:
a cloud server, which has the computer-readable storage medium according to claim 12;
a mobile device, which receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls synchronized output of the target song file and the imaging control information;
an imaging device, which receives the imaging control information sent by the mobile device and presents the virtual idol based on the imaging control information, the presented virtual idol matching the specific image characteristics of the imaging control information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142242.2A CN108492817B (en) | 2018-02-11 | 2018-02-11 | Song data processing method based on virtual idol and singing interaction system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108492817A true CN108492817A (en) | 2018-09-04 |
CN108492817B CN108492817B (en) | 2020-11-10 |
Family
ID=63340216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810142242.2A Active CN108492817B (en) | 2018-02-11 | 2018-02-11 | Song data processing method based on virtual idol and singing interaction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108492817B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101313477A (en) * | 2005-12-21 | 2008-11-26 | Lg电子株式会社 | Music generating device and operating method thereof |
US8687005B2 (en) * | 2007-06-26 | 2014-04-01 | Samsung Electronics Co., Ltd. | Apparatus and method for synchronizing and sharing virtual character |
CN101414322A (en) * | 2007-10-16 | 2009-04-22 | 盛趣信息技术(上海)有限公司 | Exhibition method and system for virtual role |
US20090314155A1 (en) * | 2008-06-20 | 2009-12-24 | Microsoft Corporation | Synthesized singing voice waveform generator |
US20140046667A1 (en) * | 2011-04-28 | 2014-02-13 | Tgens Co., Ltd | System for creating musical content using a client terminal |
CN103839559A (en) * | 2012-11-20 | 2014-06-04 | 华为技术有限公司 | Audio file manufacturing method and terminal equipment |
CN105740394A (en) * | 2016-01-27 | 2016-07-06 | 广州酷狗计算机科技有限公司 | Music generation method, terminal, and server |
CN106448630A (en) * | 2016-09-09 | 2017-02-22 | 腾讯科技(深圳)有限公司 | Method and device for generating digital music file of song |
CN106652984A (en) * | 2016-10-11 | 2017-05-10 | 张文铂 | Automatic song creation method via computer |
CN106898341A (en) * | 2017-01-04 | 2017-06-27 | 清华大学 | A kind of individualized music generation method and device based on common semantic space |
Non-Patent Citations (3)
Title |
---|
ELENA SAMOYLOVA: "Virtual World of Computer Games: Reality or Illusion?", 《PROCEDIA - SOCIAL AND BEHAVIORAL SCIENCES》 * |
LI Jia et al.: "Research on the Online Interaction of Internet Virtual Idols and Their Fan Communities: A Case Study of the Virtual Singer 'Luo Tianyi'", 《China Youth Study》 *
WEI Dan: "A Comparison of the Music Culture of 'Virtual Singers' in China and Abroad", 《Music Communication》 *
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471951A (en) * | 2018-09-19 | 2019-03-15 | 平安科技(深圳)有限公司 | Lyrics generation method, device, equipment and storage medium neural network based |
CN109471951B (en) * | 2018-09-19 | 2023-06-02 | 平安科技(深圳)有限公司 | Lyric generating method, device, equipment and storage medium based on neural network |
CN112955948A (en) * | 2018-09-25 | 2021-06-11 | 宅斯楚蒙特公司 | Musical instrument and method for real-time music generation |
CN109215626A (en) * | 2018-10-26 | 2019-01-15 | 广东电网有限责任公司 | Method for automatically composing words and music |
CN109817191A (en) * | 2019-01-04 | 2019-05-28 | 平安科技(深圳)有限公司 | Trill modeling method, device, computer equipment and storage medium |
CN109829482A (en) * | 2019-01-04 | 2019-05-31 | 平安科技(深圳)有限公司 | Song training data processing method, device and computer readable storage medium |
CN109817191B (en) * | 2019-01-04 | 2023-06-06 | 平安科技(深圳)有限公司 | Tremolo modeling method, device, computer equipment and storage medium |
CN109829482B (en) * | 2019-01-04 | 2023-10-27 | 平安科技(深圳)有限公司 | Song training data processing method and device and computer readable storage medium |
CN110136678A (en) * | 2019-04-26 | 2019-08-16 | 北京奇艺世纪科技有限公司 | A kind of music method, apparatus and electronic equipment |
CN110136678B (en) * | 2019-04-26 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Music editing method and device and electronic equipment |
CN110570876B (en) * | 2019-07-30 | 2024-03-15 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
CN110570876A (en) * | 2019-07-30 | 2019-12-13 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN112417201A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Audio information pushing method and system, electronic equipment and computer readable medium |
CN112420008A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Method and device for recording songs, electronic equipment and storage medium |
CN111326131A (en) * | 2020-03-03 | 2020-06-23 | 北京香侬慧语科技有限责任公司 | Song conversion method, device, equipment and medium |
CN111326131B (en) * | 2020-03-03 | 2023-06-02 | 北京香侬慧语科技有限责任公司 | Song conversion method, device, equipment and medium |
CN111445897A (en) * | 2020-03-23 | 2020-07-24 | 北京字节跳动网络技术有限公司 | Song generation method and device, readable medium and electronic equipment |
CN111798821B (en) * | 2020-06-29 | 2022-06-14 | 北京字节跳动网络技术有限公司 | Sound conversion method, device, readable storage medium and electronic equipment |
CN111798821A (en) * | 2020-06-29 | 2020-10-20 | 北京字节跳动网络技术有限公司 | Sound conversion method, device, readable storage medium and electronic equipment |
CN113409747B (en) * | 2021-05-28 | 2023-08-29 | 北京达佳互联信息技术有限公司 | Song generation method and device, electronic equipment and storage medium |
CN113409747A (en) * | 2021-05-28 | 2021-09-17 | 北京达佳互联信息技术有限公司 | Song generation method and device, electronic equipment and storage medium |
CN113539217A (en) * | 2021-06-29 | 2021-10-22 | 广州酷狗计算机科技有限公司 | Lyric creation navigation method and device, equipment, medium and product thereof |
CN113539217B (en) * | 2021-06-29 | 2024-05-31 | 广州酷狗计算机科技有限公司 | Lyric creation navigation method and device, equipment, medium and product thereof |
CN113611267A (en) * | 2021-08-17 | 2021-11-05 | 网易(杭州)网络有限公司 | Word and song processing method and device, computer readable storage medium and computer equipment |
CN113808555A (en) * | 2021-09-17 | 2021-12-17 | 广州酷狗计算机科技有限公司 | Song synthesis method and device, equipment, medium and product thereof |
CN113851146A (en) * | 2021-09-26 | 2021-12-28 | 平安科技(深圳)有限公司 | Performance evaluation method and device based on feature decomposition |
CN113836344A (en) * | 2021-09-30 | 2021-12-24 | 广州艾美网络科技有限公司 | Personalized song file generation method and device and music singing equipment |
CN114972592A (en) * | 2022-06-22 | 2022-08-30 | 成都潜在人工智能科技有限公司 | Singing mouth shape and facial animation generation method and device and electronic equipment |
CN117765903A (en) * | 2023-05-31 | 2024-03-26 | 深圳火山视觉技术发展有限公司 | Intelligent music creation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108492817B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108492817A (en) | A kind of song data processing method and performance interactive system based on virtual idol | |
CN104391980B (en) | The method and apparatus for generating song | |
CN106653052A (en) | Virtual human face animation generation method and device | |
Ofli et al. | Learn2dance: Learning statistical music-to-dance mappings for choreography synthesis | |
CN103218842B (en) | A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation | |
CN108962217A (en) | Phoneme synthesizing method and relevant device | |
CN108763190A (en) | Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing | |
JP2001209820A (en) | Emotion expressing device and mechanically readable recording medium with recorded program | |
CN102568023A (en) | Real-time animation for an expressive avatar | |
CN109120992A (en) | Video generation method and device, electronic equipment and storage medium | |
CN116863038A (en) | Method for generating digital human voice and facial animation by text | |
JP2003530654A (en) | Animating characters | |
JP2022518721A (en) | Real-time generation of utterance animation | |
CN109326280B (en) | Singing synthesis method and device and electronic equipment | |
CN111145777A (en) | Virtual image display method and device, electronic equipment and storage medium | |
Lim et al. | Towards expressive musical robots: a cross-modal framework for emotional gesture, voice and music | |
CN112184859B (en) | End-to-end virtual object animation generation method and device, storage medium and terminal | |
CN108052250A (en) | Virtual idol deductive data processing method and system based on multi-modal interaction | |
US20150187112A1 (en) | System and Method for Automatic Generation of Animation | |
CN114219880A (en) | Method and device for generating expression animation | |
KR20110081364A (en) | Method and system for providing a speech and expression of emotion in 3d charactor | |
CN106292424A (en) | Music data processing method and device for anthropomorphic robot | |
CN117523088A (en) | Personalized three-dimensional digital human holographic interaction forming system and method | |
Hill et al. | Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool1 | |
CN108922505B (en) | Information processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230919; Address after: 100000 6198, Floor 6, Building 4, Yard 49, Badachu Road, Shijingshan District, Beijing; Patentee after: Beijing Virtual Dynamic Technology Co.,Ltd.; Address before: 100000 Fourth Floor Ivy League Youth Venture Studio No. 193, Yuquan Building, No. 3 Shijingshan Road, Shijingshan District, Beijing; Patentee before: Beijing Guangnian Infinite Technology Co.,Ltd. |