CN108492817A - Song data processing method and performance interactive system based on a virtual idol - Google Patents
- Publication number
- CN108492817A (application number CN201810142242.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- song
- music
- lyrics
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L13/033 — Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335 — Pitch control
- G10L15/26 — Speech to text systems
- G10L21/013 — Adapting to target pitch
- G10H2210/031 — Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/101 — Music composition or musical creation; tools or processes therefor
- G10H2210/145 — Composing rules, e.g. harmonic or musical rules, for use in automatic composition; rule generation algorithms therefor
- G10H2250/295 — Noise generation, its use, control or rejection for music processing
- G10H2250/315 — Sound category-dependent sound synthesis processes [Gensound] for musical use; sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455 — Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Auxiliary Devices For Music (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a song data processing method based on a virtual idol. The method comprises the following steps: obtaining multi-modal data, extracting the sung-song audio from the multi-modal data, converting the sung-song audio into a song file through a lyrics/melody separation operation, and generating score information and lyrics information corresponding to the song file; editing the score information and the lyrics information according to a music processing model to generate score creation information and lyrics creation information; and, based on the voice line (vocal timbre) of the virtual idol, performing speech synthesis on the score creation information and the lyrics creation information to generate and output a target song file. Under the control of a mobile device, the application can perform the newly created song on an imaging device, thereby assisting the interacting user to compose and sing on their own and improving the user's creative experience.
Description
Technical field
The present invention relates to the field of intelligent robotics, and more particularly to a song data processing method and a performance interactive system based on a virtual idol.
Background technology
With the continuous development of artificial intelligence, research on robots is no longer confined to the industrial field; the objects of application are gradually expanding into fields such as entertainment, medical treatment, health care, home and services.
In the entertainment field, current applications of virtual robots are mostly limited to song requesting and playback of existing songs, and cannot generate a corresponding song from a score and lyrics. A new robot interaction capability is therefore proposed, which assists the user in creating a target song and improves the user experience.
Invention content
The first technical problem to be solved by the present invention is the need to provide a song data processing method based on a virtual idol. The method comprises the following steps: a score extraction step, in which multi-modal data are obtained, the sung-song audio is extracted from the multi-modal data and converted into a song file through a lyrics/melody separation operation, and score information and lyrics information corresponding to the song file are generated; a lyrics-and-melody editing step, in which the score information and the lyrics information are edited according to a music processing model to generate score creation information and lyrics creation information; and a speech synthesis step, in which, based on the voice line of the virtual idol, speech synthesis is performed on the score creation information and the lyrics creation information to generate and output a target song file.
In one embodiment, the lyrics-and-melody editing step further comprises: judging the current song style based on the score information; retrieving the music processing model matched to that style; and editing the score information and the lyrics information according to the style-matched music processing model to generate the score creation information and the lyrics creation information.
In one embodiment, the lyrics creation information comprises lyric fragment data, each lyric fragment being configured with a lyric fragment code, a fragment start/stop time span, a final/initial (pinyin) mark, a tone code, a phrase-break mark and the final/initial content; the score creation information comprises score fragment data, each score fragment being configured with a score fragment code, a fragment start/stop time span, pitch fragment data and a bar mark.
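For illustration only, the fragment fields listed above can be pictured as the following minimal Python sketch; the class and field names are assumptions chosen for readability, not the claimed encoding.

```python
from dataclasses import dataclass

@dataclass
class LyricFragment:
    code: int           # sequential fragment code within the song
    start: str          # fragment start time, e.g. "00:01:01"
    end: str            # fragment end time, e.g. "00:01:56"
    is_final: bool      # True = pinyin final (yunmu), False = initial (shengmu)
    tone: int           # tone code: 0 = neutral, 1-4 = the four tones
    phrase_break: bool  # phrase-break (punctuation) mark
    content: str        # the final/initial itself, e.g. "ong"

@dataclass
class ScoreFragment:
    code: int           # sequential fragment code within the song
    start: str          # fragment start time
    end: str            # fragment end time
    pitch: str          # pitch fragment data, e.g. "C4"
    bar_end: bool       # bar (measure) mark: True if last note of the bar
```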
In one embodiment, the score extraction step further comprises: performing vocal/accompaniment separation on the obtained sung-song audio, removing the background sound data and retaining the clean (a cappella) vocal data; and further decomposing the clean vocal data, then editing and arranging them to generate the score information and the lyrics information.
In one embodiment, the speech synthesis step further comprises: determining, based on the current song style, the voice line matched to that style; and substituting the lyrics creation information, the score creation information, the style-matched voice line and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is trained on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases.
According to another aspect of the embodiments of the present invention, a song data processing system based on a virtual idol is also provided. The system comprises the following modules: a score extraction module, which obtains multi-modal data, extracts the sung-song audio from the multi-modal data, converts the sung-song audio into a song file through a lyrics/melody separation operation, and generates score information and lyrics information corresponding to the song file; a lyrics-and-melody editing module, which edits the score information and the lyrics information according to a music processing model to generate score creation information and lyrics creation information; and a speech synthesis module, which, based on the voice line of the virtual idol, performs speech synthesis on the score creation information and the lyrics creation information to generate and output a target song file.
In one embodiment, the lyrics-and-melody editing module further comprises: a song style recognition unit, which judges the current song style based on the score information and generates the corresponding information; a creation model selection unit, which retrieves the music processing model matched to that style; and an editing unit, which edits the score information and the lyrics information according to the style-matched music processing model to generate the score creation information and the lyrics creation information.
In one embodiment, the score extraction module further comprises: a vocal separation unit, which performs vocal/accompaniment separation on the obtained sung-song audio, removes the background sound data and retains the clean vocal data; and a lyrics-and-melody decomposition unit, which further decomposes the clean vocal data, then edits and arranges them to generate the score information and the lyrics information.
In one embodiment, the speech synthesis module further comprises: a voice line selection unit, which determines, based on the current song style, the voice line matched to that style; and a song synthesis unit, which substitutes the lyrics creation information, the score creation information, the style-matched voice line and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is trained on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the song data processing method described above.
According to another aspect of the embodiments of the present invention, a song processing and performance interactive system based on a virtual idol is also provided. The interactive system comprises: a cloud server having the computer-readable storage medium described above; a mobile device, which receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls the synchronized output of the target song file and the imaging control information; and an imaging device, which receives the imaging control information sent by the output device and displays the virtual idol based on that information, the displayed virtual idol having the specific image features matched to the imaging control information.
Compared with the prior art, one or more of the above embodiments can have the following advantages or beneficial effects:
After obtaining the sung-song audio from the multi-modal data, the embodiments of the present invention can identify the song style through the mobile device; edit the score information and lyrics information in the sung-song audio according to that style information combined with a music processing model customized by machine learning; further apply a voice line suited to the style; and, under the control of the mobile device, perform the newly created song on an imaging device. The virtual robot of the application can thus assist the interacting user in songwriting and improve the user's creative experience.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the technical solutions of the present invention. The objects and other advantages of the present invention can be realized and obtained through the structures and/or flows particularly pointed out in the description, the claims and the accompanying drawings.
Description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the description; together with the embodiments they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a schematic diagram of an imaging scene application of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 2 is a structural schematic diagram of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 3 is a module block diagram of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 4 is an execution flow chart of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 5 is a structural schematic diagram of the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 6 is an execution flow chart of the score extraction module 231 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 7 is an execution flow chart of the lyrics-and-melody editing module 232 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 8 is an execution flow chart of the speech synthesis module 233 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application.
Fig. 9 is a flow chart of the steps of the song data processing method based on a virtual idol according to an embodiment of the present application.
Detailed description of the embodiments
Hereinafter, the embodiments of the present invention are described in detail with reference to the accompanying drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. Provided they do not conflict, the features of the embodiments of the present application can be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
In addition, the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in an order different from the one given here.
The embodiments of the present application are implemented by a virtual idol that completes multi-modal data interaction with the interacting user in online and/or offline entertainment scenes. The virtual idol has specific image features, runs on a mobile device, is controlled by the mobile device and is displayed through an imaging device. The mobile device can configure social attributes, personality attributes, character skills and the like for the virtual idol. Specifically, the mobile device can connect to a cloud server so that the virtual idol possesses multi-modal human-computer interaction and artificial intelligence (AI) capabilities such as natural language understanding, visual perception, touch perception, spoken language output and emotional facial expression and action output. It can also control the display functions of the imaging device, including control over the display of scene accessories (for example flowers, plants and trees in the scene) and over the display of light, special effects, particles and rays, all of which can be shown through the imaging device. The social attributes may include attributes such as appearance, name, dress, decoration, gender, birthplace, age, family relationship, occupation, position, religious belief, emotional state and educational background; the personality attributes may include attributes such as character and temperament; the character skills may include professional skills such as singing, dancing, storytelling and training, and the display of character skills is not limited to skills of the limbs, expressions, the head and/or the mouth.
It should be noted that the social attributes, personality attributes and character skills of the virtual idol can guide the parsing of, and decision-making on, the multi-modal interaction data, so that the decided multi-modal output is more inclined toward, or better suited to, this virtual idol's image. The virtual idol can also be projected onto the imaging device in cooperation with the mobile device and perform according to the scene displayed by the imaging device, for example by singing and dancing.
In this application, the virtual idol has song editing and processing capability: it can obtain song audio information from the obtained multi-modal data, create a song from that information by imitating the lyric-writing and composing habits of a singer or composer, synthesize a song that matches the robot's voice line, play the finished song with the specific image of the virtual idol as carrier, and at the same time control the imaging device to complete the display of that specific image.
Embodiments of the present invention are described below.
Fig. 1 is a schematic diagram of an imaging scene application of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. As shown in Fig. 1, the virtual idol runs on the mobile device 101 and is presented by projection through the imaging device 102. The cloud server 103 has a computer-readable storage medium and is interconnected with the mobile device 101 via the Internet, providing data analysis, processing and storage support for the data received by the mobile device 101. The physical positions of the mobile device 101 and the imaging device 102 are aligned with each other so that the signals of the two devices can interconnect. The mobile device 101 receives and plays the target song file output by the cloud server 103, generates imaging control information based on the target song file, and controls the synchronized output of the target song file and the imaging control information, thereby projecting the virtual idol running on itself onto the imaging device 102. The imaging device 102 receives the imaging control information sent by the mobile device 101 and displays the virtual idol based on it; the displayed virtual idol has the specific image features matched to the imaging control information. The imaging device 102 can be a holographic projection device, which provides the carrier support for basic projection imaging, can display content such as the pictures or text shown on the mobile device screen, and can also collect signals such as vision, infrared and/or Bluetooth to interact with and assist the mobile device. It should be noted that the application places no particular limit on the device forms of the mobile device 101 and the imaging device 102; the mobile device 101 can be a smartphone, an iPad, a tablet computer or the like, and those skilled in the art can choose according to the actual situation.
Fig. 2 is a structural schematic diagram of the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. As shown in Fig. 2, the interactive system has: an input device 21, an output device 22, a song data processing system 23 and the imaging device 102. The input device 21 and the output device 22 are built into the mobile device 101. It should be noted that the song data processing system 23 is built into the cloud server 103 and, by means of the powerful storage and data processing abilities of the cloud brain, completes the function of autonomously composing and singing from the audio information output by the interacting object. The composition and function of each part of the interactive system are described in detail below.
First, the input device 21 can obtain and forward the multi-modal data output by the interacting object. Specifically, the input device 21 can either be a physical hardware device installed in the mobile device 101, such as a microphone or a front or rear camera, or be a network channel or a local channel; the application places no particular limit on this. When the input device 21 is a physical hardware device, it converts the multi-modal data output by the interacting object into a format that the song data processing system 23 can read and then sends the data to the song data processing system 23; in this case the interacting object can be a video or audio of the user singing, and the application places no particular limit on this. For example, the user records a performance through the front camera, and the camera driver software in the input device 21 converts the information of the performance into multi-modal data in video format and outputs it. When the input device 21 is a network channel or a local channel, it can send the obtained multi-modal data directly to the song data processing system 23; in this case the interacting object can be a network platform or the like, and the application places no particular limit on this. For example, the input device 21 can directly obtain performance data played by a network platform, or the user can load multi-modal data directly into the input device 21 through a local channel.
Next, the output device 22 is described in detail. Fig. 3 is a module block diagram of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. The output device 22 can receive and play the finished target song file output by the song data processing system 23, can generate imaging control information based on the target song file, and completes the synchronized output of the target song file and the imaging control information. As shown in Fig. 3, the device 22 comprises: a lyric text decomposition module 221, a mouth animation storage module 222, a specific image storage module 223, a performance image generation module 224 and a synchronous output module 225.
Fig. 4 is an execution flow chart of the output device 22 in the song processing and performance interactive system based on a virtual idol according to an embodiment of the present application. The functions of the modules in the output device 22 are described in detail below with reference to Fig. 3 and Fig. 4.
The lyric text decomposition module 221 receives the lyrics creation information sent by the song data processing system 23 and parses it into lyric fragment data in units of pinyin syllables, which are further decomposed into the lyric fragment code of the current lyric fragment, the fragment start/stop time span, the initial/final mark, the tone code, the phrase-break mark, the final/initial content and so on. A code corresponding to the obtained final/initial content, or the content itself, is then sent to the mouth animation storage module 222.
Several points of the above processing in the lyric text decomposition module 221 need explanation. The lyrics creation information comprises lyric fragment data, which are numbered sequentially in natural order according to the position of each lyric fragment's pinyin within the song. Since the lyric fragment data are edited in units of pinyin syllables, one complete character may contain several lyric fragments. For the tone code, the neutral tone and the first, second, third and fourth tones are each encoded accordingly. The application places no particular limit on the encoding rules of the lyrics creation information; those skilled in the art can adjust and define them according to the actual situation.
(Example) If the first line of the lyrics is "雪绒花" ("Edelweiss"), the character "绒" (rong) comprises two lyric fragments; the second of these fragments contains the following information: the lyric fragment code is "5", the start/stop time span is "00:01:01–00:01:56", the fragment carries a final mark, the tone is the second-tone code "2", and the content information is "ong".
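As a concrete toy instance of the example above; the dictionary keys are illustrative, mirroring the assumed field names in the earlier sketch.

```python
# Second lyric fragment of the character "绒" (rong) in "雪绒花" (Edelweiss),
# as described in the example above; field names are illustrative only.
rong_second_fragment = {
    "code": 5,
    "start": "00:01:01",
    "end": "00:01:56",
    "is_final": True,   # final (yunmu) mark
    "tone": 2,          # second tone
    "content": "ong",
}
```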
The mouth animation storage module 222 stores mouth-shape (lip-sync) animation data for each initial and final; the mouth-shape animation data consist of the three-dimensional position information of each pixel. When the module receives, from the lyric text decomposition module 221, the fragment content or the code representing that fragment information, it looks up the corresponding mouth-shape animation data in its database and sends them to the performance image generation module 224. It should be noted that the application describes the mouth-shape animation data using three-dimensional position information only as an example; the application places no particular limit on the form of the mouth-shape animation data or of the animation-type data described below, such as the specific image data and the performance image data.
The specific image storage module 223 stores preset specific images of the virtual idol, can randomly select any one of them and sends its information to the performance image generation module 224. In addition, the module 223 can also receive an image instruction from the user and send the specific image information corresponding to that instruction to the performance image generation module 224. It should be noted that the application places no particular limit on the form of the instruction for retrieving a specific image from this module: the module may be told to always output the same specific image, it may select one at random, it may let the user make a personalized selection, or it may switch from the first two modes to the personalized selection mode.
The performance image generation module 224 is described next. The module 224 receives, in real time, the mouth-shape animation data sent by the mouth animation storage module 222 and the specific image information from the specific image storage module 223, replaces the information indicating the mouth action in the specific image information with the mouth-shape animation data, and generates in real time the performance image corresponding to the current lyric fragment content, i.e. the current imaging control information.
After finishing receiving the target song file sent by the song data processing system 23, the synchronous output module 225 obtains in real time the imaging control information sent by the performance image generation module 224 together with the lyric fragment code and fragment start/stop time span of the current fragment parsed by the lyric text decomposition module 221, and integrates this information so that the imaging control information is output at the same time as the corresponding fragment in the target song file. It should be noted that the target song file is played through the speech output devices configured on the mobile device 101, such as a loudspeaker or sound box; the imaging control information is displayed on the screen of the mobile device 101, or, once the mobile device 101 is connected to the imaging device 102, is sent directly to the imaging device 102, so that the imaging device 102 realizes an auxiliary interaction function for the mobile device 101.
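A minimal sketch, under stated assumptions, of the kind of alignment the synchronous output module 225 performs: fragment start times are assumed to be available in seconds, the target song file is assumed to be already playing, and show_frame is a hypothetical callback that pushes one frame to the imaging device.

```python
import time

def emit_imaging_frames(fragments, imaging_frames, show_frame):
    """Emit each imaging-control frame (the mouth-shape performance image for a
    lyric fragment) at that fragment's start time, so picture and sound stay in step.

    fragments: list of (fragment_code, start_sec)
    imaging_frames: dict mapping fragment_code -> imaging control information
    show_frame: hypothetical callback to the imaging device / screen
    """
    t0 = time.monotonic()
    for code, start_sec in sorted(fragments, key=lambda f: f[1]):
        # Wait until the fragment's start time relative to playback start.
        delay = start_sec - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)
        show_frame(imaging_frames[code])
```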
Referring again to Fig. 2, the imaging device 102 is described next. The imaging device 102 receives the imaging control information sent by the output device and performs the display based on it. Specifically, if the imaging device 102 is a holographic projection device, it displays the specific image information presented on the screen of the mobile device 101 according to the imaging control information. If it is connected to the mobile device 101 by wireless, Bluetooth, infrared or similar means, it directly receives, in real time, the imaging control information generated by the mobile device 101, converts this information with a preset three-dimensional imaging model into the three-dimensional position information of the specific image and presents it, thereby realizing an interaction capability that assists the mobile device 101.
Finally, the song data processing system 23 based on a virtual idol is described in detail.
Fig. 5 is a structural schematic diagram of the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 5, the system has a score extraction module 231, a lyrics-and-melody editing module 232 and a speech synthesis module 233. The function and workflow of each module in the song data processing system 23 are described in detail below.
The score extraction module 231 obtains multi-modal data, extracts the sung-song audio from the multi-modal data, converts the sung-song audio into a song file through the lyrics/melody separation operation, and generates the score information and lyrics information corresponding to the song file. Functionally, the module further comprises an audio extraction unit 2311, a vocal separation unit 2312 and a lyrics-and-melody decomposition unit 2313.
Specifically, Fig. 6 is an execution flow chart of the score extraction module 231 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 6, and referring again to Fig. 5: first, the audio extraction unit 2311 receives the multi-modal data obtained from the input device 21 and identifies them. If the multi-modal data contain several modalities such as audio, video or text, the corresponding audio information is extracted from them and sent to the vocal separation unit 2312; if the obtained multi-modal data are audio only, they are forwarded directly to the vocal separation unit 2312 as the sung-song audio.
Next, the vocal separation unit 2312 can use a common vocal/accompaniment separation method to separate the obtained sung-song audio: the clean (a cappella) vocal data are retained by means of voice recognition technology, while existing voice elimination technology is used to separate out the background sound data, so that background sound and voice are distinguished; the clean vocal data are finally sent to the lyrics-and-melody decomposition unit 2313. In this example, in order to improve the accuracy of pitch-synchronous extraction, an open-loop/closed-loop approach is first used to determine the approximate range of the fundamental pitch, which is then adjusted and extracted accurately; energy comparison is then used to distinguish the sung voice from the background sound, and the fundamental pitch extracted in periods where only background sound exists is removed, thereby completing the separation of the clean vocal data. Depending on the content of the sung-song audio, the background sound data may include information such as accompaniment and harmony. It should be noted that the application places no particular limit on the specific implementation of extracting the clean vocal data: a vocal singing recognition model built on the voice features used in speech recognition technology may be used, and speech enhancement technology, high-pitch recognition technology and the like may also be used.
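The text above leaves the separation method open; the following numpy sketch illustrates only the energy-comparison idea it mentions (keep frames that show a strong fundamental and whose energy clearly exceeds the accompaniment floor). All thresholds are invented placeholders, not values from the patent.

```python
import numpy as np

def rough_vocal_mask(signal, sr, frame=1024, hop=512,
                     f0_min=80.0, f0_max=1000.0, energy_ratio=2.0):
    """Return a boolean mask per frame: True where a voiced vocal is likely present.

    Toy illustration of the fundamental-pitch check plus energy comparison
    described above; a real system would refine the pitch estimate and smooth
    the mask before separating clean vocal data from background sound.
    """
    n_frames = max(0, 1 + (len(signal) - frame) // hop)
    if n_frames == 0:
        return np.zeros(0, dtype=bool)
    energies = np.zeros(n_frames)
    voiced = np.zeros(n_frames, dtype=bool)
    lag_min, lag_max = max(1, int(sr / f0_max)), int(sr / f0_min)
    for i in range(n_frames):
        x = signal[i * hop:i * hop + frame].astype(float)
        x = x - x.mean()
        energies[i] = float(np.dot(x, x))
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # autocorrelation, lags 0..frame-1
        if ac[0] > 0 and lag_min < lag_max < frame:
            peak = ac[lag_min:lag_max].max() / ac[0]
            voiced[i] = peak > 0.3        # strong fundamental -> probably a sung voice
    # Energy floor estimated from frames without a clear fundamental (background only).
    floor = np.median(energies[~voiced]) if (~voiced).any() else energies.min()
    return voiced & (energies > energy_ratio * max(floor, 1e-12))
```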
Then, after obtaining the clean vocal data, the lyrics-and-melody decomposition unit 2313 further decomposes them, then edits and arranges them to generate the score information and the lyrics information. Specifically, the lyrics information is generated by extracting the lyric portion of the clean vocal data with a preliminary lyrics recognition model built from speech recognition technology combined with a semantic understanding database. In the training of the preliminary lyrics recognition model, a large amount of historical clean-vocal data is used: after filtering and noise reduction, the mapping between the acoustic parameters of the voice and the recognized text information is extracted to obtain the training model of the preliminary lyrics recognition model; the clean vocal data obtained by the unit 2313 are then used as the input to be recognized, feature matching is carried out with the above training model, and the corresponding lyrics information is finally resolved. The recognized text information is obtained as follows: after preliminary lyrics information is produced by speech recognition technology, natural language processing technology is applied, according to the semantic understanding database, to correct the preliminarily generated lyrics so that they conform to the logic of the language, or to logically correct the words that were not clearly recognized; the result is finally used as the data basis for training the preliminary lyrics recognition model. Likewise, the score information is generated by extracting the score portion of the clean vocal data with a preliminary score recognition model built from speech recognition technology combined with a song-note library. In the training of the preliminary score recognition model, historical clean-vocal data are also used: after filtering and noise reduction, the mapping between the acoustic frequency parameters of the voice and the recognized note information is extracted to obtain the training model of the preliminary score recognition model; the clean vocal data obtained by the unit 2313 are then used as the input to be recognized, feature matching is carried out according to this training model, and the corresponding score information is obtained. The recognized note information is obtained as follows: after preliminary score information is produced by speech recognition technology, the notes that were not clearly recognized in it are corrected according to the song-note library; the corrected result is finally used as the data basis for training the preliminary score recognition model.
Referring again to Fig. 5, the lyrics-and-melody editing module 232 in the song data processing system 23 is described next. The lyrics-and-melody editing module 232 is built into the cloud server 103 and can edit the score information and lyrics information obtained from the score extraction module 231 according to the music processing model, generating the score creation information and the lyrics creation information. The module further comprises: a song style recognition unit 2321, a creation model selection unit 2322 and an editing unit 2323.
Specifically, Fig. 7 is an execution flow chart of the lyrics-and-melody editing module 232 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 7, the function and workflow of each unit of the lyrics-and-melody editing module 232 (with reference to Fig. 5) are described below.
The song style recognition unit 2321 can judge the current song style based on the obtained score information, using a melody style identification model that has already been built, generate the corresponding song style information, and then send the generated style information to the creation model selection unit 2322. The song style identification model comprises a set of note-information features for the various music styles; when score information is input, the same feature takes different characteristic values for different music styles, from which the style of the melody is judged, and the code of the corresponding song style is finally output. In this example, the style category is confirmed with a binary-tree construction method. First, the style categories are classified into pop, folk (national), classical, jazz, ballad, rock, R&B, punk and so on; then, characteristics such as rhythm, note-type distribution and tone (pitch) distribution are used as sub-features, and the specific sub-feature information of each style is summarized for each category; next, a weight expression is established for each feature node in the binary tree; finally, the weight analysis results of all nodes are combined to build the final song style identification types. It should be noted that the application places no particular limit on the method used to recognize the song style; those skilled in the art can choose according to the actual situation, on the principle of completely implementing the function of the unit 2321.
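A toy sketch of the weighted binary-tree decision described above; the sub-features, weights, thresholds and style labels are placeholders invented for illustration, not the patent's classifier.

```python
def classify_style(features, threshold=0.5):
    """Walk a tiny two-level 'binary tree' of weighted sub-feature checks and
    return a style label.

    features: dict with keys 'rhythm_density', 'note_variety', 'pitch_spread',
              each normalised to [0, 1]; all weights below are placeholders.
    """
    weights = {"rhythm_density": 0.5, "note_variety": 0.3, "pitch_spread": 0.2}
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)

    if score > threshold:                       # energetic branch of the tree
        return "rock" if features.get("rhythm_density", 0.0) > 0.7 else "pop"
    else:                                       # calmer branch of the tree
        return "folk" if features.get("pitch_spread", 0.0) < 0.3 else "classical"

# Example: a dense, driving score is labelled rock.
print(classify_style({"rhythm_density": 0.9, "note_variety": 0.6, "pitch_spread": 0.5}))
```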
The creation model selection unit 2322 stores the music processing models built for each song style; according to the song style obtained above, the unit 2322 retrieves the music processing model matched to the current style. The music processing model is trained for lyric writing and composing on a character-pinyin library, a song-note library and a lyric-writing/composing database, and is built in combination with multiple classes of lyric-writing and composing habit feature databases. It should be noted that the music processing model takes as its training data basis the lyric data and score data in a large amount of historical music of different song styles; the lyrics creation information and score creation information generated from the historical data and matched to the styles specified in this application serve as the training target data. According to the existing character-pinyin library, song-note library and lyric-writing/composing database stored on the cloud server 103 (which record the rule features of music creation theory) and the lyric-writing/composing habit feature databases established for the various song styles, a machine learning method is used to train the mapping from the basic data to the target data, thereby constructing a lyrics-and-melody creation training model that conforms to creation habits, i.e. the music processing model. It should be noted that the application places no particular limit on the final form of the music processing model: besides the trained mapping model used in this application, it can also be a markup-file template obtained through the above training process, or the trained mapping relations themselves. The character-pinyin library contains the initials and finals of all pinyin syllables, each pinyin fragment being encoded; the song-note library contains all note types and their duration information, rest types and their duration information and so on, each note fragment being encoded.
The lyric-writing/composing habit feature database contains a large number of lyric-writing habit features, composing habit features and matched lyrics-melody habit features for each song style (pictured in the sketch following this paragraph). (First example) If the current song style is judged to be rock, then, since this style has fewer melismatic turns, the duration of each initial/final in a lyric fragment is short and the duration curve of the whole line of lyrics is relatively flat; also, since this style draws out the final syllable at the end of each line, a lyric fragment with a larger duration is taken as a phrase-break mark, and such a lyric fragment corresponds to at most two score fragments. (Second example) If the current song style is judged to be R&B, then, since this style has more melismatic turns, the duration of each note in a score fragment is shorter, the whole piece contains more score fragments, and one lyric fragment often corresponds to several score fragments.
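One way to picture the habit feature database is as a plain per-style lookup table consulted by the editing unit; the keys and values below only echo the two examples above and are otherwise invented placeholders.

```python
# Per-style lyric-writing/composing habit features as a plain lookup table.
# All values are illustrative placeholders, not data from the patent.
STYLE_HABITS = {
    "rock": {
        "melisma_turns": "few",
        "syllable_duration": "short",          # each initial/final lasts a short time
        "line_duration_curve": "flat",         # whole-line duration curve stays steady
        "elongate_line_final": True,           # last syllable of each line is drawn out
        "long_fragment_is_phrase_break": True,
        "max_score_fragments_per_lyric": 2,
    },
    "rnb": {
        "melisma_turns": "many",
        "note_duration": "short",
        "score_fragments_per_piece": "many",
        "max_score_fragments_per_lyric": None,  # one lyric fragment may map to several
    },
}
```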
Referring again to Fig. 5, the editing unit 2323 can edit the score information and the lyrics information according to the music processing model matched to the currently generated song style, thereby generating the score creation information and the lyrics creation information. The lyrics creation information comprises several lyric fragments, each configured with a lyric fragment code, a fragment start/stop time span, a final/initial mark, a tone code, a phrase-break mark and the final/initial content. Since the lyric fragment data have already been explained for the lyric text decomposition module 221 of the song processing and performance interactive system above, they are not repeated here.
Specifically, when the music processing model processes the input lyrics information, the lyrics information is first converted to pinyin. The pinyin-converted lyrics information is then fragmented according to the initials and finals: for example, the pinyin of the character "中" is "zhong", which after fragmentation yields the two lyric fragments "zh" and "ong"; each fragment is further given a final/initial mark and is encoded. Next, the fragmented lyrics information is theorized according to the rules of lyric-writing and composing theory: the start and end time of each lyric fragment is calibrated and the lyric phrase-break marks are annotated (for example, to keep the lyric line "the spring breeze blows all the flowers fragrant" continuous within the melody). Finally, the lyrics information is stylized: according to the song style information obtained by the unit 2323, the duration data in the lyric fragments are given a final adjustment according to the lyric-writing habits, composing habits and matched lyrics-melody habits common to that song style, so that the finally generated lyrics creation data conform to the lyric-writing and composing habits of the corresponding song style.
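A minimal sketch of the fragmentation step just described, assuming the lyrics have already been romanised to pinyin syllables (the conversion from Chinese text is assumed to happen earlier); the initials list is the standard Mandarin one and the fragment dictionary keys are illustrative.

```python
# Standard Mandarin initials, longest first so "zh"/"ch"/"sh" win over "z"/"c"/"s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def fragment_syllable(pinyin):
    """Split one pinyin syllable into (initial, final); e.g. 'zhong' -> ('zh', 'ong').
    Syllables with no initial (e.g. 'ai') yield ('', 'ai')."""
    for ini in INITIALS:
        if pinyin.startswith(ini) and len(pinyin) > len(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin

def fragment_line(pinyin_syllables):
    """Turn a line of pinyin syllables into coded lyric fragments, one fragment per
    non-empty initial or final, numbered in natural order as described above."""
    fragments, code = [], 1
    for syl in pinyin_syllables:
        for part, is_final in zip(fragment_syllable(syl), (False, True)):
            if part:
                fragments.append({"code": code, "content": part, "is_final": is_final})
                code += 1
    return fragments

print(fragment_line(["zhong"]))   # two fragments: 'zh' (initial) and 'ong' (final)
```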
In addition, the above score creation information comprises several score fragments, each configured with a score fragment code, a fragment start/stop time span, pitch fragment data and a bar mark. Score fragment data take a single note as the smallest unit, one fragment per note; the fragments are coded sequentially in natural order according to the position of the note in the whole song. The start/stop time span of each note is edited according to the duration corresponding to its note type, such as whole note, half note, quarter note, eighth note or sixteenth note, and if the current note fragment is the last note of its bar, a valid bar mark must be annotated.
(Example) In the song "Little Star" ("Twinkle, Twinkle, Little Star"), the score corresponding to the first line of the lyrics ("twinkle, twinkle, little star") is "Do Do So So La La So". This score is divided into seven score fragments. The fourth note fragment, "So", contains the following information: the score fragment code is "4", the fragment start/stop time span is the duration of a quarter note, the pitch fragment data are the audio data of "So" in the key of C, and the bar mark is valid. The fifth note fragment, "La", contains the following information: the score fragment code is "5", the fragment start/stop time span is the duration of a quarter note, the pitch fragment data are the audio data of "La" in the key of C, and the bar mark is invalid.
Specifically, when the music processing model processes the input score information, the score information is first fragmented into notes with the smallest note as the unit, each score fragment is encoded, and the pitch fragment data of each note are loaded. Next, the fragmented score information is theorized according to the rules of lyric-writing and composing theory: the start/stop time span of each score fragment is calibrated and the bar marks are annotated. Finally, the score information is stylized: according to the song style obtained from the song style recognition unit 2321, the lyric-writing and composing habit features for that style are retrieved and the duration data in the fragments are given a final adjustment, so that the finally generated score creation information carries the lyric-writing habits, composing habits and matched lyrics-melody habits of the current song style.
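A sketch of the note-level fragmentation just described, assuming a tempo in beats per minute and 4/4 time; the field names, the tempo value and the use of seconds are illustrative assumptions.

```python
NOTE_BEATS = {"whole": 4.0, "half": 2.0, "quarter": 1.0, "eighth": 0.5, "sixteenth": 0.25}

def fragment_score(notes, bpm=100, beats_per_bar=4):
    """notes: list of (pitch, note_type), e.g. [("C4", "quarter"), ("G4", "quarter")].
    Returns score fragments with sequential codes, start/stop times in seconds,
    pitch fragment data and a bar mark on the last note of each bar."""
    sec_per_beat = 60.0 / bpm
    fragments, t, beats_in_bar = [], 0.0, 0.0
    for code, (pitch, note_type) in enumerate(notes, start=1):
        dur = NOTE_BEATS[note_type] * sec_per_beat
        beats_in_bar += NOTE_BEATS[note_type]
        bar_end = beats_in_bar >= beats_per_bar   # valid bar mark on the bar's last note
        if bar_end:
            beats_in_bar = 0.0
        fragments.append({"code": code, "start": round(t, 3), "end": round(t + dur, 3),
                          "pitch": pitch, "bar_end": bar_end})
        t += dur
    return fragments

# First bar of "Little Star": Do Do So So in C major, all quarter notes.
print(fragment_score([("C4", "quarter"), ("C4", "quarter"), ("G4", "quarter"), ("G4", "quarter")]))
```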
Referring again to Fig. 5, the speech synthesis module 233 in the song data processing system 23 is described next. Based on the voice line of the virtual idol, the speech synthesis module 233 performs speech synthesis on the score creation information and the lyrics creation information obtained from the lyrics-and-melody editing module 232, and generates and outputs the target song file. The module further comprises a voice line selection unit 2331 and a song synthesis unit 2332. It should be noted that in this example the speech synthesis module 233 is built into the mobile device 101, so that offline speech synthesis can be realized; however, the application places no particular limit on the position of the module 233: it can also be configured in the cloud server 103 to realize online real-time speech synthesis, or the voice line selection unit 2331 can be configured in the cloud server 103 while the song synthesis unit 2332 is built into the mobile device 101.
Fig. 8 is an execution flow chart of the speech synthesis module 233 in the song data processing system 23 based on a virtual idol according to an embodiment of the present application. As shown in Fig. 8, the function and workflow of each unit of the speech synthesis module 233 (with reference to Fig. 5) are described below.
First, the voice line selection unit 2331 stores virtual idol voice lines for different song styles; based on the current song style obtained from the lyrics-and-melody editing module 232, it determines the voice line matched to that style, thereby obtaining the synthesis effect data for that voice line.
The song synthesis unit 2332 substitutes the lyrics authoring information, the score authoring information, the voice matching the song style, the background sound data, and so on into a preset speech synthesis system to generate the target song file. Specifically, the unit 2332 first parses and pre-processes the received information, extracting the effective information needed by each stage of the speech synthesis system. The pre-processing includes at least the following: the score authoring information is parsed to obtain, for each score segment, the effective bar marker, the note pitch data and the corresponding duration data, which are loaded into the speech synthesis system in the order of the segment codes; the lyrics authoring information is parsed to obtain, for each lyrics segment, the final/initial identifiers, the punctuation identifiers, and the final/initial content with its corresponding durations, further distinguishing initial durations from final durations, which are likewise loaded into the speech synthesis system in the order of the segment codes. The speech synthesis process then completes the following operations. The lyrics information obtained from the transcription module 231 serves as the raw-text input of the speech synthesis system; text front-end processing generates prosodic information for the original lyrics information. The final/initial content and the corresponding final/initial identifiers, loaded in the order of the lyrics segment codes, are substituted together with the prosodic information into the duration model; after the initial durations and final durations have been loaded, the prosody-marked initial durations and final durations are obtained. The final/initial content and identifiers, together with the prosody-marked initial and final durations, are processed by a preset acoustic model, which outputs the SP information and AP information for the current song. These are combined with the bar markers, note pitch data and corresponding durations of each segment, loaded in the order of the score segment codes, and with the virtual idol voice matching the current song style obtained from the voice selection unit 2331, and final effect synthesis is carried out to output the corresponding created song vocal. Finally, the completed song vocal is merged with the background sound data obtained from the transcription module 231 to generate the final target song file.
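The flow just described can be summarized, purely as a sketch, in the pipeline below. The speech synthesis system itself is left open by the application, so every component here is passed in as a callable; none of the names or signatures come from the disclosure.

```python
from typing import Any, Callable, Sequence, Tuple

def synthesize_song(
    lyrics_text: str,
    lyric_segments: Sequence[dict],
    score_segments: Sequence[dict],
    voice: dict,
    background_audio: Any,
    *,
    text_frontend: Callable[[str], Any],
    duration_model: Callable[[Sequence[dict], Any], Any],
    acoustic_model: Callable[[Sequence[dict], Any], Tuple[Any, Any]],
    effect_synthesis: Callable[[Any, Any, Sequence[dict], dict], Any],
    mix: Callable[[Any, Any], Any],
) -> Any:
    """Hypothetical end-to-end synthesis flow mirroring the steps described above."""
    # 1. Text front-end processing of the original lyrics text yields prosodic information.
    prosody = text_frontend(lyrics_text)
    # 2. Duration model: initials/finals loaded per lyrics-segment code give prosody-marked durations.
    durations = duration_model(lyric_segments, prosody)
    # 3. Acoustic model: SP/AP information for the current song.
    sp, ap = acoustic_model(lyric_segments, durations)
    # 4. Final effect synthesis: combine SP/AP with bar markers, pitches and durations per
    #    score-segment code, rendered with the selected virtual-idol voice.
    vocal = effect_synthesis(sp, ap, score_segments, voice)
    # 5. Merge the rendered vocal with the background sound data into the target song file.
    return mix(vocal, background_audio)
```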
It should be noted that the present application does not specifically limit how the preset speech synthesis system is obtained; machine learning methods may be used to generate the corresponding duration model, acoustic model and effect synthesis model, so as to construct the speech synthesis system and carry out the speech synthesis operations.
In addition, the present application also proposes a song data processing method based on a virtual idol. FIG. 9 is a flowchart of the steps of the song data processing method based on a virtual idol according to an embodiment of the present application. As shown in FIG. 9, the processing method is explained below.
In step S910 (the transcription step), the transcription module 231 obtains multi-modal data, extracts the sung audio from the multi-modal data, converts the sung audio into a song file through a lyric-melody separation operation, and generates score information and lyrics information corresponding to the song file. Specifically, the audio extraction unit 2311 obtains the multi-modal data and either extracts the sung audio from it and sends it to the song separation unit 2312, or forwards the obtained audio information directly into the song separation unit 2312 as the sung audio. The song separation unit 2312 performs vocal separation on the received sung audio, using voice recognition technology to retain the a cappella vocal data while separating out (removing) the background sound data, and finally sends the a cappella vocal data to the lyric-and-melody parsing unit 2313. After receiving the a cappella vocal data, the lyric-and-melody parsing unit 2313 further decomposes it, and edits and organizes it into score information and lyrics information.
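A deliberately crude stand-in for the vocal/background split performed by the song separation unit 2312 is sketched below using a mid/side decomposition of a stereo file; a real system would use a trained source-separation or voice recognition model. The function name is invented, and the soundfile library is assumed to be available for audio I/O.

```python
import soundfile as sf  # assumed available for audio I/O

def crude_vocal_split(stereo_path: str):
    """Very rough approximation of vocal vs. background separation via a mid/side split."""
    audio, sample_rate = sf.read(stereo_path)      # stereo file -> array of shape (samples, 2)
    mid = audio.mean(axis=1)                       # centre content, usually dominated by the lead vocal
    side = (audio[:, 0] - audio[:, 1]) / 2.0       # side content, a rough proxy for the backing track
    return mid, side, sample_rate
```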
After the lyric-and-melody editing module 232 receives the score information and lyrics information sent by the transcription module 231, step S920 (the lyric-and-melody editing step) is entered. Specifically, the song style recognition unit 2321 first judges the current song style from the received score information by means of a previously built music style identification model, and then sends the song style information to the authoring model selection unit 2322. The authoring model selection unit 2322 stores a music processing model built for each song style, and retrieves the music processing model matching the identified song style. Finally, the editing unit 2323 edits the score information and lyrics information obtained from the lyric-and-melody parsing unit 2313 according to the music processing model matching the currently identified song style, so as to generate the corresponding score authoring information and lyrics authoring information, and the flow proceeds to step S930 (the speech synthesis step).
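Step S920 can be reduced to the following sketch, in which the style classifier and the per-style music processing models are injected as ordinary Python objects; the edit_score/edit_lyrics interface is an assumption for illustration only.

```python
from typing import Any, Callable, Dict, Tuple

def lyric_melody_edit(
    score_info: Any,
    lyrics_info: Any,
    classify_style: Callable[[Any], str],   # stands in for the song style recognition unit 2321
    models_by_style: Dict[str, Any],        # stands in for the authoring model selection unit 2322
) -> Tuple[Any, Any, str]:
    """Hypothetical flow of step S920: identify the style, pick its model, then edit."""
    style = classify_style(score_info)
    model = models_by_style[style]                  # retrieve the music processing model for this style
    score_authoring = model.edit_score(score_info)  # editing unit 2323 (assumed interface)
    lyrics_authoring = model.edit_lyrics(lyrics_info)
    return score_authoring, lyrics_authoring, style
```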
In step S930 (the speech synthesis step), the voice selection unit 2331 in the speech synthesis module 233 determines, based on the identified song style, the virtual idol voice matching the current song style and obtains the corresponding synthesis effect data for that voice. The song synthesis unit 2332 then substitutes the lyrics authoring information, the score authoring information, the voice matching the song style information, the synthesis effect data, the background sound data, and so on into the preset speech synthesis system to generate the target song file.
It should be noted that the song data processing method based on a virtual idol described above can be stored, as a computer program module, on the computer-readable medium in the cloud server 103; when the module is executed by a processor, it realizes the function of autonomously composing and singing a song from the audio information output by the interaction object.
It should be understood that the disclosed embodiments of the present invention are not limited to the specific structures and processing steps disclosed herein, but extend to equivalents of these features as would be understood by those of ordinary skill in the relevant art. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrase "one embodiment" or "an embodiment" in various places throughout the specification do not necessarily all refer to the same embodiment.
Although embodiments have been disclosed above, the content described is merely provided to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention pertains may make modifications and variations in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.
Claims (13)
1. A song data processing method based on a virtual idol, characterized in that the method comprises the following steps:
a transcription step of obtaining multi-modal data, extracting sung audio from the multi-modal data, converting the sung audio into a song file through a lyric-melody separation operation, and generating score information and lyrics information corresponding to the song file;
a lyric-and-melody editing step of performing editing processing on the score information and the lyrics information according to a music processing model, to generate score authoring information and lyrics authoring information;
a speech synthesis step of performing speech synthesis on the score authoring information and the lyrics authoring information based on the voice of the virtual idol, generating a target song file and outputting it.
2. The method according to claim 1, characterized in that the lyric-and-melody editing step further comprises:
judging the current song style based on the score information;
retrieving a music processing model matching that style;
editing the score information and the lyrics information according to the matching music processing model, to generate the score authoring information and the lyrics authoring information.
3. The method according to claim 1 or 2, characterized in that:
the lyrics authoring information includes lyrics segment data, wherein the lyrics segment data is configured with a lyrics segment code, a lyrics segment start/stop duration, final/initial identifiers, a tone code, punctuation identifiers, and final/initial data;
the score authoring information includes score segment data, wherein the score segment data is configured with a score segment code, a score segment start/stop duration, pitch segment data, and a bar marker.
4. The method according to any one of claims 1 to 3, characterized in that the transcription step further comprises:
performing vocal separation on the obtained sung audio, removing the background sound data and retaining the a cappella vocal data;
further decomposing the a cappella vocal data, and editing and organizing it to generate the score information and the lyrics information.
5. The method according to claim 3, characterized in that the speech synthesis step further comprises:
determining, based on the current song style, a voice matching the song style;
substituting the lyrics authoring information, the score authoring information, the voice matching the song style and the background sound data into a preset speech synthesis system, to generate the target song file.
6. The method according to claim 1, characterized in that the music processing model is built by training on a word-pinyin library, song notes and a lyric-writing and composing database, in combination with multiple classes of databases of lyric-writing and composing habit characteristics.
7. A song data processing system based on a virtual idol, characterized in that the system comprises the following modules:
a transcription module, which obtains multi-modal data, extracts sung audio from the multi-modal data, converts the sung audio into a song file through a lyric-melody separation operation, and generates score information and lyrics information corresponding to the song file;
a lyric-and-melody editing module, which performs editing processing on the score information and the lyrics information according to a music processing model, to generate score authoring information and lyrics authoring information;
a speech synthesis module, which performs speech synthesis on the score authoring information and the lyrics authoring information based on the voice of the virtual idol, generates a target song file and outputs it.
8. The system according to claim 7, characterized in that the lyric-and-melody editing module further comprises:
a song style recognition unit, which judges the current song style based on the score information and generates corresponding information;
an authoring model selection unit, which retrieves a music processing model matching that style;
an editing unit, which edits the score information and the lyrics information according to the matching music processing model, to generate the score authoring information and the lyrics authoring information.
9. The system according to claim 7 or 8, characterized in that the transcription module further comprises:
a song separation unit, which performs vocal separation on the obtained sung audio, removes the background sound data and retains the a cappella vocal data;
a lyric-and-melody parsing unit, which further decomposes the a cappella vocal data, and edits and organizes it to generate the score information and the lyrics information.
10. The system according to any one of claims 7 to 9, characterized in that the speech synthesis module further comprises:
a voice selection unit, which determines, based on the current song style, a voice matching the song style;
a song synthesis unit, which substitutes the lyrics authoring information, the score authoring information, the voice matching the song style information and the background sound data into a preset speech synthesis system, to generate the target song file.
11. The system according to claim 7 or 8, characterized in that the music processing model is built by training on a word-pinyin library, song notes and a lyric-writing and composing database, in combination with multiple classes of databases of lyric-writing and composing habit characteristics.
12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method steps according to any one of claims 1 to 6 are implemented.
13. A song processing and performance interactive system based on a virtual idol, characterized in that the interactive system comprises:
a cloud server, which has the computer-readable storage medium according to claim 12;
a mobile device, which receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls synchronized output of the target song file and the imaging control information;
an imaging device, which receives the imaging control information sent by the mobile device and presents the virtual idol based on the imaging control information, the presented virtual idol matching the specific image characteristics of the imaging control information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142242.2A CN108492817B (en) | 2018-02-11 | 2018-02-11 | Song data processing method based on virtual idol and singing interaction system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108492817A true CN108492817A (en) | 2018-09-04 |
CN108492817B CN108492817B (en) | 2020-11-10 |
Family
ID=63340216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810142242.2A Active CN108492817B (en) | 2018-02-11 | 2018-02-11 | Song data processing method based on virtual idol and singing interaction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108492817B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101313477A (en) * | 2005-12-21 | 2008-11-26 | Lg电子株式会社 | Music generating device and operating method thereof |
US8687005B2 (en) * | 2007-06-26 | 2014-04-01 | Samsung Electronics Co., Ltd. | Apparatus and method for synchronizing and sharing virtual character |
CN101414322A (en) * | 2007-10-16 | 2009-04-22 | 盛趣信息技术(上海)有限公司 | Exhibition method and system for virtual role |
US20090314155A1 (en) * | 2008-06-20 | 2009-12-24 | Microsoft Corporation | Synthesized singing voice waveform generator |
US20140046667A1 (en) * | 2011-04-28 | 2014-02-13 | Tgens Co., Ltd | System for creating musical content using a client terminal |
CN103839559A (en) * | 2012-11-20 | 2014-06-04 | 华为技术有限公司 | Audio file manufacturing method and terminal equipment |
CN105740394A (en) * | 2016-01-27 | 2016-07-06 | 广州酷狗计算机科技有限公司 | Music generation method, terminal, and server |
CN106448630A (en) * | 2016-09-09 | 2017-02-22 | 腾讯科技(深圳)有限公司 | Method and device for generating digital music file of song |
CN106652984A (en) * | 2016-10-11 | 2017-05-10 | 张文铂 | Automatic song creation method via computer |
CN106898341A (en) * | 2017-01-04 | 2017-06-27 | 清华大学 | A kind of individualized music generation method and device based on common semantic space |
Non-Patent Citations (3)
Title |
---|
ELENA SAMOYLOVA: "Virtual World of Computer Games: Reality or Illusion?", 《PROCEDIA - SOCIAL AND BEHAVIORAL SCIENCES》 * |
LI Jia et al.: "Research on the Online Interaction of Internet Virtual Idols and Their Fan Communities: A Case Study of the Virtual Singer 'Luo Tianyi'", 《China Youth Study》 *
WEI Dan: "A Comparison of the Music Culture of 'Virtual Singers' in China and Abroad", 《Music Communication》 *
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471951A (en) * | 2018-09-19 | 2019-03-15 | 平安科技(深圳)有限公司 | Lyrics generation method, device, equipment and storage medium neural network based |
CN109471951B (en) * | 2018-09-19 | 2023-06-02 | 平安科技(深圳)有限公司 | Lyric generating method, device, equipment and storage medium based on neural network |
CN112955948A (en) * | 2018-09-25 | 2021-06-11 | 宅斯楚蒙特公司 | Musical instrument and method for real-time music generation |
CN109215626A (en) * | 2018-10-26 | 2019-01-15 | 广东电网有限责任公司 | Method for automatically composing words and music |
CN109817191A (en) * | 2019-01-04 | 2019-05-28 | 平安科技(深圳)有限公司 | Trill modeling method, device, computer equipment and storage medium |
CN109829482A (en) * | 2019-01-04 | 2019-05-31 | 平安科技(深圳)有限公司 | Song training data processing method, device and computer readable storage medium |
CN109817191B (en) * | 2019-01-04 | 2023-06-06 | 平安科技(深圳)有限公司 | Tremolo modeling method, device, computer equipment and storage medium |
CN109829482B (en) * | 2019-01-04 | 2023-10-27 | 平安科技(深圳)有限公司 | Song training data processing method and device and computer readable storage medium |
CN110136678A (en) * | 2019-04-26 | 2019-08-16 | 北京奇艺世纪科技有限公司 | A kind of music method, apparatus and electronic equipment |
CN110136678B (en) * | 2019-04-26 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Music editing method and device and electronic equipment |
CN110570876B (en) * | 2019-07-30 | 2024-03-15 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
CN110570876A (en) * | 2019-07-30 | 2019-12-13 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN112417201A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Audio information pushing method and system, electronic equipment and computer readable medium |
CN112420008A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Method and device for recording songs, electronic equipment and storage medium |
CN111326131A (en) * | 2020-03-03 | 2020-06-23 | 北京香侬慧语科技有限责任公司 | Song conversion method, device, equipment and medium |
CN111326131B (en) * | 2020-03-03 | 2023-06-02 | 北京香侬慧语科技有限责任公司 | Song conversion method, device, equipment and medium |
CN111445897A (en) * | 2020-03-23 | 2020-07-24 | 北京字节跳动网络技术有限公司 | Song generation method and device, readable medium and electronic equipment |
CN111798821B (en) * | 2020-06-29 | 2022-06-14 | 北京字节跳动网络技术有限公司 | Sound conversion method, device, readable storage medium and electronic equipment |
CN111798821A (en) * | 2020-06-29 | 2020-10-20 | 北京字节跳动网络技术有限公司 | Sound conversion method, device, readable storage medium and electronic equipment |
CN113409747B (en) * | 2021-05-28 | 2023-08-29 | 北京达佳互联信息技术有限公司 | Song generation method and device, electronic equipment and storage medium |
CN113409747A (en) * | 2021-05-28 | 2021-09-17 | 北京达佳互联信息技术有限公司 | Song generation method and device, electronic equipment and storage medium |
CN113539217A (en) * | 2021-06-29 | 2021-10-22 | 广州酷狗计算机科技有限公司 | Lyric creation navigation method and device, equipment, medium and product thereof |
CN113539217B (en) * | 2021-06-29 | 2024-05-31 | 广州酷狗计算机科技有限公司 | Lyric creation navigation method and device, equipment, medium and product thereof |
CN113611267A (en) * | 2021-08-17 | 2021-11-05 | 网易(杭州)网络有限公司 | Word and song processing method and device, computer readable storage medium and computer equipment |
CN113808555A (en) * | 2021-09-17 | 2021-12-17 | 广州酷狗计算机科技有限公司 | Song synthesis method and device, equipment, medium and product thereof |
CN113851146A (en) * | 2021-09-26 | 2021-12-28 | 平安科技(深圳)有限公司 | Performance evaluation method and device based on feature decomposition |
CN113836344A (en) * | 2021-09-30 | 2021-12-24 | 广州艾美网络科技有限公司 | Personalized song file generation method and device and music singing equipment |
CN114972592A (en) * | 2022-06-22 | 2022-08-30 | 成都潜在人工智能科技有限公司 | Singing mouth shape and facial animation generation method and device and electronic equipment |
CN117765903A (en) * | 2023-05-31 | 2024-03-26 | 深圳火山视觉技术发展有限公司 | Intelligent music creation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108492817B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108492817A (en) | A kind of song data processing method and performance interactive system based on virtual idol | |
CN104391980B (en) | The method and apparatus for generating song | |
CN106653052A (en) | Virtual human face animation generation method and device | |
Ofli et al. | Learn2dance: Learning statistical music-to-dance mappings for choreography synthesis | |
CN103218842B (en) | A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation | |
CN108962217A (en) | Phoneme synthesizing method and relevant device | |
CN108763190A (en) | Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing | |
JP2001209820A (en) | Emotion expressing device and mechanically readable recording medium with recorded program | |
CN102568023A (en) | Real-time animation for an expressive avatar | |
CN109120992A (en) | Video generation method and device, electronic equipment and storage medium | |
CN116863038A (en) | Method for generating digital human voice and facial animation by text | |
JP2003530654A (en) | Animating characters | |
JP2022518721A (en) | Real-time generation of utterance animation | |
CN109326280B (en) | Singing synthesis method and device and electronic equipment | |
CN111145777A (en) | Virtual image display method and device, electronic equipment and storage medium | |
Lim et al. | Towards expressive musical robots: a cross-modal framework for emotional gesture, voice and music | |
CN112184859B (en) | End-to-end virtual object animation generation method and device, storage medium and terminal | |
CN108052250A (en) | Virtual idol deductive data processing method and system based on multi-modal interaction | |
US20150187112A1 (en) | System and Method for Automatic Generation of Animation | |
CN114219880A (en) | Method and device for generating expression animation | |
KR20110081364A (en) | Method and system for providing a speech and expression of emotion in 3d charactor | |
CN106292424A (en) | Music data processing method and device for anthropomorphic robot | |
CN117523088A (en) | Personalized three-dimensional digital human holographic interaction forming system and method | |
Hill et al. | Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool1 | |
CN108922505B (en) | Information processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230919; Address after: 100000 6198, Floor 6, Building 4, Yard 49, Badachu Road, Shijingshan District, Beijing; Patentee after: Beijing Virtual Dynamic Technology Co.,Ltd.; Address before: 100000 Fourth Floor Ivy League Youth Venture Studio No. 193, Yuquan Building, No. 3 Shijingshan Road, Shijingshan District, Beijing; Patentee before: Beijing Guangnian Infinite Technology Co.,Ltd. |