CN108052250A - Virtual idol deductive data processing method and system based on multi-modal interaction - Google Patents
- Publication number: CN108052250A
- Application number: CN201711320367.1A
- Authority
- CN
- China
- Prior art keywords
- virtual idol
- data
- modal
- idol
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06T13/205—3D [Three Dimensional] animation driven by audio data
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06T19/00—Manipulating 3D models or images for computer graphics
Abstract
The application provides a virtual idol deductive data processing method and system based on multi-modal interaction. The method includes: obtaining multi-modal input data; matching the multi-modal input data in a pre-established deep learning model to obtain multi-modal output data; and outputting the multi-modal output data to be performed by the virtual idol. When the current virtual idol's performance skill is enabled, a cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through an imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Description
Technical field
This application relates to the field of artificial intelligence, and more particularly to a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium.
Background technology
With the continuous development of science and technology, robot development has gradually extended from industrial applications to fields such as medical care, health care, home life, entertainment, and services. People's requirements for robot intelligence keep rising, so that robots can serve humans better.
Robots include physical robots that possess a body and virtual robots that run on hardware devices. Virtual robots in the prior art can only complete a limited set of preset actions through programming, and their degree of intelligence is low.
At present, the multi-modal interaction and skill output of a virtual robot lack real-time performance, the deductive data does not correspond to the skill content, and the virtual robot cannot achieve a lifelike, smooth, anthropomorphic effect; the human-computer interaction effect is therefore poor.
Summary of the invention
In view of this, the application provides a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium, to solve the technical deficiencies in the prior art.
An embodiment of the application discloses a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes:
judging whether the current virtual idol is in a skill output state;
if so, deciding multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
Optionally, displaying the deductive data in the multi-modal output data by the virtual idol includes:
based on the multi-modal output data, the virtual idol outputs limb actions together with a mouth shape and/or facial expression matched to the emotion information.
Optionally, the method further includes:
the mobile device controls the imaging device, according to the current multi-modal output data, to output the virtual idol's performance and an open signal for the auxiliary functions that accompany the performance.
Optionally, the method further includes:
obtaining the affect data of the current virtual idol, and, when the virtual idol is in the skill output state, outputting multi-modal output data that matches the affect data.
Optionally, when the deductive data is dance data, the step of deciding the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill includes:
obtaining the dance backing music in real time;
extracting the acoustic features of the dance backing music;
inputting the acoustic features into a pre-established deep learning model, and outputting dance moves matched to the acoustic features.
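The feature-extraction step above can be sketched in a few lines. A real system would use a signal-processing library; here the "acoustic features" are reduced to per-frame energy and a crude high-energy frame count over a raw sample list, and every name (`extract_acoustic_features`, `frame_size`) is an illustrative assumption, not the claimed method:

```python
# Toy sketch of extracting acoustic features from dance backing music.
def extract_acoustic_features(samples, frame_size=4, threshold=0.5):
    """Return per-frame energies and the number of high-energy frames."""
    energies = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        # Mean squared amplitude of the frame, a stand-in for real features.
        energies.append(sum(s * s for s in frame) / frame_size)
    beats = sum(1 for e in energies if e > threshold)
    return {"energies": energies, "beats": beats}

features = extract_acoustic_features([0.1, 0.9, 1.0, 0.8, 0.0, 0.1, 0.0, 0.2])
# features["beats"] is 1: only the first frame exceeds the threshold.
```

The feature dictionary would then be fed to the deep learning model described below.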
Optionally, the deep learning model is built as follows:
collecting dances with vocal-music features and their backing music;
matching the moves of the dances with the acoustic features of the backing music to generate training data samples;
training the deep learning model on the training data samples to obtain the final deep learning model.
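The training-pair construction listed above might look like the following sketch, where each dance move is time-aligned with the acoustic feature of its backing music at the same time step; the pairing scheme and all names are assumptions rather than the disclosed procedure:

```python
# Sketch: build (acoustic feature, dance move) supervised training pairs.
def build_training_samples(acoustic_features, dance_moves):
    """Zip time-aligned features and moves into supervised pairs."""
    if len(acoustic_features) != len(dance_moves):
        raise ValueError("features and moves must be time-aligned")
    return [{"x": f, "y": m} for f, m in zip(acoustic_features, dance_moves)]

samples = build_training_samples([0.61, 0.01], ["spin", "pose"])
```

The resulting samples would then be used to train the model (e.g. a recurrent network, as the description later suggests).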
In another aspect, the application further provides a virtual idol deductive data processing system based on multi-modal interaction, including a mobile device, an imaging device, and a cloud server. The virtual idol runs on the mobile device and is presented by projection through the imaging device, and the virtual idol possesses preset image features and preset attributes, wherein:
the cloud server judges whether the current virtual idol is in a skill output state;
if so, the cloud server decides the multi-modal output data according to the current skill data and the corresponding content parameters obtained by the mobile device, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
In another aspect, the application further provides a virtual idol. The virtual idol runs on a mobile device and performs the steps of the virtual idol deductive data processing method based on multi-modal interaction described above.
In another aspect, the application further provides an imaging device. The virtual idol runs on a mobile device and is presented by projection through the imaging device.
In another aspect, the application further provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, the steps of the virtual idol deductive data processing method based on multi-modal interaction described above are realized.
In the virtual idol deductive data processing method and system based on multi-modal interaction, the virtual idol, the imaging device, and the computer-readable storage medium provided by the application, the method includes obtaining multi-modal input data, matching the multi-modal input data in a pre-established deep learning model to obtain multi-modal output data, and outputting the multi-modal output data to be performed by the virtual idol. When the current virtual idol's performance skill is enabled, the cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through the imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Description of the drawings
Fig. 1 is a structure diagram of a computing device provided by an embodiment of the application;
Fig. 2 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 3 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 4 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 5 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 6 is a flow chart of building the deep learning model provided by an embodiment of the application;
Fig. 7 is a structure schematic diagram of the imaging device provided by an embodiment of the application.
Detailed description of the embodiments
Many details are set forth in the following description for a full understanding of the application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
This application provides a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to Fig. 1, an embodiment of the application provides a structure diagram of the virtual idol deductive data processing system based on multi-modal interaction. When the current virtual idol's performance skill is enabled, the multi-modal output data is decided and displayed by the virtual idol through the imaging device, and the cloud server parses the skill data and decides the multi-modal output data, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Illustratively, the virtual idol deductive data processing system based on multi-modal interaction includes a mobile device 101, an imaging device 102, and a cloud server 106. The mobile device 101 is aligned with the physical position of the imaging device 102, and signal interconnection between the mobile device 101 and the imaging device 102 is realized.
The mobile device 101 can project the virtual idol running on it onto the imaging device 102 for display. The imaging device 102 can be a holographic projection device, and the mobile device 101 can connect to the cloud server 106 so that the virtual idol running on the mobile device 101 presents a multi-modal human-computer interaction effect on the imaging device 102.
The mobile device 101 can include a communication module 103, a central processing unit 104, and a human-computer interaction input/output module 105;
wherein the human-computer interaction input/output module 105 is used to obtain multi-modal data and output the execution parameters of the virtual idol; the multi-modal data includes data from the surrounding environment and multi-modal input data from interaction with the user;
the communication module 103 is used to call the capability interfaces of the cloud server 106 and to receive the multi-modal output data decided after the capability interfaces of the cloud server 106 parse the multi-modal input data;
the central processing unit 104 is used to calculate, using the multi-modal output data, the reply data corresponding to the multi-modal output data.
The cloud server 106 possesses a multi-modal data parsing module for parsing the multi-modal data sent by the mobile device 101 and deciding the multi-modal output data.
The imaging device 102 is used to display the virtual idol with its specific image in a preset display area.
As shown in Fig. 1, each capability interface calls its corresponding logical processing during multi-modal data parsing. Each interface is explained below:
the semantic understanding interface 107 receives the specific voice instruction forwarded by the communication module 103, performs voice recognition on it, and performs natural language processing based on a large corpus.
The visual recognition interface 108 can perform video content detection, recognition, and tracking for human bodies, faces, scenes, and so on according to computer vision algorithms and deep learning algorithms. The image is recognized according to a predetermined algorithm, and a quantitative detection result is given. It possesses an image pre-processing function, a feature extraction function, a decision function, and application functions;
wherein the image pre-processing function can perform basic processing on the collected visual data, including color space conversion, edge extraction, image transformation, and image thresholding; the feature extraction function can extract characteristic information of the target in the image, such as skin color, color, texture, motion, and coordinates; the application functions realize face detection, human limb recognition, motion detection, and so on.
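As a rough illustration of two of the pre-processing functions named above (color-space conversion and thresholding), a toy grayscale pipeline could look like this; a real module would use OpenCV or a similar library, and every name here is hypothetical:

```python
# Sketch of basic image pre-processing: color conversion and thresholding.
def rgb_to_gray(pixel):
    """Luma approximation for one (r, g, b) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(image, level):
    """Binarize a 2-D grayscale image (list of rows) at the given level."""
    return [[1 if v >= level else 0 for v in row] for row in image]

gray = rgb_to_gray((255, 0, 0))                  # pure red -> ~76.2
binary = threshold([[10, 200], [90, 30]], 100)   # only 200 passes
```

The binarized output would feed the feature-extraction and decision functions described above.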
The affective computing interface 110 receives the multi-modal data forwarded by the communication module 103 and calculates the user's current affective state using affective computing logic (which can be an emotion recognition technology). Emotion recognition is an important part of affective computing; its research covers facial expression, voice, behavior, text, and physiological signal recognition, through which the user's affective state can be judged. The emotion recognition technology may monitor the user's affective state through visual emotion recognition alone, or through a combination of visual emotion recognition and acoustic emotion recognition, and is not limited thereto. In the present embodiment, the combination of the two is preferably used to monitor emotion.
When performing visual emotion recognition, the affective computing interface 110 collects images of human facial expressions using an image capture device, converts them into analyzable data, and then performs expression sentiment analysis using image processing and related technologies. Understanding a facial expression usually requires detecting its subtle changes, such as cheek muscle and mouth changes and eyebrow raising.
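The preferred "combined" monitoring mode could be sketched as a weighted fusion of per-modality emotion scores; the weights, the score format, and the function name are assumptions, not the disclosed logic:

```python
# Sketch: fuse visual and acoustic emotion scores into one affective state.
def fuse_emotions(visual_scores, audio_scores, w_visual=0.6, w_audio=0.4):
    """Fuse two per-emotion score dicts and return the top emotion."""
    emotions = set(visual_scores) | set(audio_scores)
    fused = {e: w_visual * visual_scores.get(e, 0.0)
                + w_audio * audio_scores.get(e, 0.0) for e in emotions}
    return max(fused, key=fused.get)

state = fuse_emotions({"happy": 0.8, "sad": 0.2},
                      {"happy": 0.2, "sad": 0.8})
# Visual evidence dominates under these weights, so the state is "happy".
```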
The cognitive computing interface 109 receives the multi-modal data forwarded by the communication module 103 and performs data acquisition, recognition, and learning on the multi-modal data to obtain the user portrait, the knowledge graph, and so on, so as to make rational decisions on the multi-modal output data.
The above is an illustrative technical solution of the virtual idol deductive data processing system based on multi-modal interaction of the embodiment of the application. To help those skilled in the art understand the technical solution of the application, the virtual idol deductive data processing method and system based on multi-modal interaction, the virtual idol, the imaging device, and the computer-readable storage medium provided by the application are further described in detail through several embodiments below.
In the application, the mobile device is aligned with the physical position of the imaging device, so as to realize signal interconnection between the mobile device and the imaging device.
The mobile device can project the virtual idol running on it onto the imaging device for display. The imaging device can be a holographic projection device, and the mobile device can connect with the cloud server so that the virtual idol possesses multi-modal human-computer interaction and Artificial Intelligence (AI) capabilities such as natural language understanding, visual perception, touch perception, voice output, and emotional expression and action output.
The virtual idol can be displayed by the imaging device as a 3D virtual image possessing specific image features, and the virtual idol can be configured with social attributes, personality attributes, character skills, and so on.
Specifically, the social attributes can include attributes such as appearance, name, dress, decoration, gender, birthplace, age, family relations, occupation, position, religious belief, emotional state, and educational background; the personality attributes can include attributes such as character and temperament; the character skills can include professional skills such as singing, dancing, storytelling, and training, and the display of a character skill is not limited to the skill display of limbs, expression, head, and/or mouth.
In this application, the social attributes, personality attributes, character skills, and so on of the virtual idol make the parsing of skill data in multi-modal interaction and the decision of the multi-modal output data more inclined toward, or more suitable for, this virtual idol.
The virtual idol can also cooperate with the mobile device to be projected on the imaging device, and perform according to the scene displayed by the imaging device, such as singing and dancing.
Referring to Fig. 2, an embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 201 to 202.
Step 201: judge whether the current virtual idol is in a skill output state.
In the embodiment of the application, the skills can include singing, dancing, and so on, and the skill output state can include singing or dancing in progress. Since the virtual idol runs on the mobile device and is presented by projection through the imaging device, the skill output state of the current virtual idol can be judged from the mobile device or the imaging device; for example, it can be judged that the current virtual idol is in a skill output state such as singing or dancing.
In the embodiment of the application, the mobile device can be a computing device such as a smart phone, a laptop, a tablet computer, a palmtop computer, or another mobile terminal; the computing device can also be a portable or desktop server. The mobile device is the main medium through which the virtual idol interacts with the user and the environment.
The imaging device can be a holographic projection device. The holographic projection device can provide basic projection imaging carrier support and can display content such as the pictures or text shown on the mobile device screen; the imaging device can also collect signals such as vision, infrared, and/or Bluetooth to assist the interaction on the mobile device.
The mobile device controls the display function of the imaging device, including control of the display of scene accessories, such as the flowers, plants, and trees in the scene, and the display of lights, special effects, particles, or rays, where the lights, special effects, particles, and rays can be displayed by the imaging device.
In the embodiment of the application, when the relative position of the mobile device or the imaging device changes, the mobile device can adjust the state of the virtual idol running on it. The state includes, but is not limited to, a resting state, an active state, and a listening state.
The resting state means the virtual idol is at rest or in a no-interaction state; the active state means the virtual idol is in a multi-modal interaction mode and can perform works and output skills; the listening state means the voice input interface of the virtual idol is open and can receive voice signals input by the user and the environment.
Step 202: if so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
In the embodiment of the application, if the current virtual idol is in a skill output state, the cloud server can obtain the skill data of the current virtual idol and the content parameters corresponding to the skill, and decide the multi-modal output data.
For example, the current virtual idol is in a state where skill output has been turned on. The cloud server obtains the skill of the current virtual idol; for instance, the skill of the virtual idol is singing accompanied by dancing, and the corresponding content parameter is the content of the song being sung. According to the information obtained above, the multi-modal output data can be determined as: the musical rhythm of the song, the Text To Speech (TTS) of the song, the cadence of the song, and the dance matched to the song.
The multi-modal output data can include deductive data, voice data, and so on; this application mainly takes the deductive data as an example. When the multi-modal output data is singing while performing the dance matched to the song, the deductive data is the part of the multi-modal output data that can be displayed on the imaging device, such as the dance moves, the limb changes following the song's cadence, and the emotional expression on the virtual idol's face corresponding to the musical rhythm and the song TTS.
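The decision step in this example could be sketched as a mapping from the skill and its content parameters to the multi-modal output data; the field names and the mapping below are illustrative only, not the claimed decision logic:

```python
# Sketch: assemble multi-modal output data for a "song and dance" skill.
def decide_output(skill, content_params):
    """Build the multi-modal output data from skill and content parameters."""
    output = {"tts": content_params.get("lyrics", ""),
              "rhythm": content_params.get("tempo", "medium")}
    if skill == "song_and_dance":
        # The dance track is matched to the song named in the parameters.
        output["dance"] = "matched_to:" + content_params.get("song", "unknown")
    return output

out = decide_output("song_and_dance", {"song": "demo", "lyrics": "la la"})
```

The deductive part of `out` (here the `dance` field) would then be displayed through the imaging device.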
In the embodiment of the application, a deep learning model can also be pre-established on the cloud server. The deep learning model can be a Recurrent Neural Network (RNN). The obtained current skill data and the content parameters corresponding to the skill are input into the deep learning model, and the multi-modal output data is obtained immediately. For example, the obtained current skill data is dancing and the content parameter corresponding to the skill is the dance moves of the peacock dance. Inputting the skill data and the content parameters into the deep learning model immediately yields the next dance move A of the peacock dance; then, according to the currently obtained "next dance move A of the peacock dance", the corresponding "next dance move B of the peacock dance" is accurately derived, and so on in cycles, generating all the dance moves of the peacock dance, which are then performed by the virtual idol and displayed through the imaging device.
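The cyclic generation described above can be sketched as an autoregressive loop in which each predicted move is fed back as input for the next prediction. A lookup table stands in for the recurrent network here, and all names and transitions are assumptions:

```python
# Assumed move transitions, standing in for a trained recurrent model.
NEXT_MOVE = {"start": "A", "A": "B", "B": "C", "C": None}

def generate_dance(model, first="start", max_steps=10):
    """Roll the model forward, feeding each output back as the next input."""
    moves, current = [], first
    for _ in range(max_steps):
        nxt = model.get(current)
        if nxt is None:        # no further move predicted: dance is complete
            break
        moves.append(nxt)
        current = nxt          # autoregressive feedback
    return moves

sequence = generate_dance(NEXT_MOVE)  # ['A', 'B', 'C']
```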
Optionally, the affect data of the current virtual idol is obtained (the sources of the affect data include user emotion input and the affect data corresponding to the skill content), and when the virtual idol is in the skill output state, the multi-modal output data matching the affect data is output.
For example, if the affect data of the current virtual idol is happiness, when the virtual idol is in the skill output state of singing, the multi-modal output data matching the affect data can include singing while performing the dance matched to the song; if the affect data of the current virtual idol is sadness, when the virtual idol is in the skill output state of singing, the multi-modal output data matching the affect data may be singing only.
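The affect-matching rule in this example can be sketched as follows; the rule table is one reading of the example above, not the claimed decision logic, and all names are hypothetical:

```python
# Sketch: pick multi-modal output for a skill state given the affect data.
def match_output(emotion, skill_state):
    """Happy idols sing and dance; sad ones only sing (per the example)."""
    if skill_state != "singing":
        return []
    return ["sing", "dance"] if emotion == "happy" else ["sing"]

happy_out = match_output("happy", "singing")  # ['sing', 'dance']
sad_out = match_output("sad", "singing")      # ['sing']
```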
With the virtual idol deductive data processing method based on multi-modal interaction provided by the embodiment of the application, when the current virtual idol's performance skill is enabled, the cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through the imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Referring to Fig. 3, an embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 301 to 303.
Step 301: judge whether the current virtual idol is in a skill output state.
In the embodiment of the application, the skill can be singing, and the skill output state can be singing in progress.
Step 302: if so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
In the embodiment of the application, the skill output state can be analyzed by the cloud server, and the current skill data and the content parameters corresponding to the skill are then obtained according to the analysis result. For example, after the cloud server analyzes the skill output state of the current virtual idol, it determines that the skill output state of the current virtual idol is singing; then, according to the singing skill output state, the current skill data of the virtual idol is obtained as singing, and the content parameter corresponding to the skill is the lyric of Butterfly Lovers (Liang Shanbo and Zhu Yingtai): "the butterflies of different colors hover long in pairs". The skill data and the content parameters corresponding to the skill are input into the pre-established deep learning model to obtain the multi-modal output data: singing the Butterfly Lovers song while performing the dance matched to the song. The deductive data in the multi-modal output data is: performing the dance matched to the Butterfly Lovers song, which is then performed by the virtual idol and displayed through the imaging device.
Step 303: based on the multi-modal output data, the virtual idol outputs limb actions together with a mouth shape and/or facial expression matched to the emotion information.
In the embodiment of the application, when the virtual idol displays the deductive data in the multi-modal output data, it can also, based on the multi-modal output data, cooperatively output limb actions and a mouth shape and/or facial expression matched to the emotion information.
When deductive data in the multi-modal output data is song accompaning with dance, limbs (such as both arms, hand of the virtual idol
Finger, both legs etc.) rhythm of music in song, the content of the lyrics, the development fluctuations of lyrics plot can be followed, it makes corresponding
Action, the shape of the mouth as one speaks of the virtual idol also can make corresponding variation according to song tune, lyrics emotion corresponding with the song,
And the face (such as eyes, eyebrow etc.) and facial expression (such as a crease in the skin etc.) of the virtual idol also can be according to parsings
Affection data in song and dancing is changed.
For example, when the deductive data in the multi-modal output data is the dance matched to the song "Liang Shanbo and Zhu Yingtai", and the virtual idol dances that matched dance, the emotion and state contained in the song can be judged by an algorithm. Through this recognition, it can be determined that the sentiment of the song is negative, that its emotional category is sadness, and that the action type to be displayed is "reluctant to part". After the cloud server obtains this information, it sends it to the mobile device, and the virtual idol running on the mobile device then knows that, while dancing the dance matched to the song, it needs to make a "reluctant to part" action, with the mouth half open and a sad look in the eyes.
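The sentiment judgment above can be sketched with a toy polarity classifier and a lookup from sentiment to action type, mouth shape, and gaze. The keyword lexicon and the mappings are illustrative stand-ins, not the patent's actual algorithm.

```python
# Illustrative sketch: judge a song's sentiment and map it to the
# action/mouth/gaze states described above. Lexicon is an assumption.
NEGATIVE_WORDS = {"parting", "sorrow", "farewell", "tears"}

def classify_sentiment(lyrics: str) -> str:
    words = set(lyrics.lower().split())
    return "negative" if words & NEGATIVE_WORDS else "positive"

def action_for_sentiment(sentiment: str) -> dict:
    if sentiment == "negative":
        # e.g. the "reluctant to part" action with a half-open mouth
        return {"action": "reluctant_parting", "mouth": "half_open", "gaze": "sad"}
    return {"action": "joyful", "mouth": "smiling", "gaze": "bright"}

plan = action_for_sentiment(classify_sentiment("sorrow fills the parting hour, tears fall"))
```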
With the virtual idol deductive data processing method based on multi-modal interaction provided by this embodiment of the application, the virtual idol adds limb actions together with mouth shapes and/or facial expressions matched to the emotion information during its performance, so that the presentation of the virtual idol is more lifelike, and this multi-modal interaction also makes the experience more engaging.
Referring to Fig. 4, one embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is projected and presented by an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 401 to 403.
Step 401: Judge whether the current virtual idol is in a technical ability output state.

The technical ability output state can be opened through the user's voice input, visual gesture input, touch-sensing input, physical button input, or similar opening methods, so that the virtual idol opens technical abilities such as singing, dancing, song-and-dance performance, or poem recitation.
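The judgment of step 401 can be sketched as checking incoming trigger events from the listed input channels against the known technical abilities. The event format is an illustrative assumption.

```python
# Illustrative sketch of step 401: detect whether any trigger input
# (voice, gesture, touch, button) has opened a technical ability.
SKILL_TRIGGERS = {"voice", "gesture", "touch", "button"}
SKILLS = {"singing", "dancing", "song_and_dance", "poem_recitation"}

def in_skill_output_state(events):
    """Return the opened skill if a trigger event requested one, else None."""
    for ev in events:
        if ev.get("channel") in SKILL_TRIGGERS and ev.get("skill") in SKILLS:
            return ev["skill"]
    return None

state = in_skill_output_state([{"channel": "voice", "skill": "dancing"}])
```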
Step 402: If so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability, and display the deductive data in the multi-modal output data through the virtual idol.
In this embodiment of the application, the technical ability output state can be analyzed by the cloud server, which then obtains the current skill data and the content parameters corresponding to the technical ability according to the analysis result. For example, the cloud server analyzes the current virtual idol's technical ability output state and determines that it is listening to music. According to this state, it obtains the current skill data of the virtual idol, namely listening to music, and the content parameter corresponding to the technical ability, namely that the music being listened to is Latin dance accompaniment. The skill data and the content parameter corresponding to the technical ability are input into the pre-established deep learning model, which outputs the multi-modal output data: listen to the Latin dance accompaniment and dance a Latin dance. The deductive data in the multi-modal output data is the Latin dance, which the virtual idol then deduces and displays through the imaging device.
Step 403: According to the current multi-modal output data, the mobile device controls the imaging device to output the virtual idol's performance and an assembly-function open signal coordinating with the virtual idol's performance.
In this embodiment of the application, when a technical ability of the virtual idol is opened, the mobile device can, according to the current scene, select a wireless network connection to the imaging device and control the imaging device to turn on lights, change light colors, turn on dot-matrix lights, and so on.
For example, when the current virtual idol is dancing a Latin dance, the mobile device can, according to the current scene, select a wireless network connection to the imaging device and control the imaging device to turn on gold and red lights, change the light color to gold and red, and turn on the dot-matrix lights, coordinating with the virtual idol so that the presented effect is more realistic.
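The lighting coordination above can be sketched as a lookup from the current performance to a light-control signal sent to the imaging device. The color choices and signal format are assumptions; the patent does not define the wire protocol.

```python
# Illustrative sketch of the assembly-function open signal: pick light
# settings for the imaging device based on the current dance.
def light_signal(dance_type: str) -> dict:
    if dance_type == "latin":
        # gold and red lights plus dot-matrix lights, as in the example
        return {"power": "on", "colors": ["gold", "red"], "dot_matrix": True}
    return {"power": "on", "colors": ["white"], "dot_matrix": False}

sig = light_signal("latin")
```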
With the virtual idol deductive data processing method based on multi-modal interaction provided by this embodiment of the application, the virtual idol can, while performing a dance, automatically generate matched stage lighting and stage effects coordinated with the dance being performed, presenting a better visual experience to the user.
Referring to Fig. 5, when the deductive data is dance data, the step of deciding the deductive data in the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability includes:
Step 501: Obtain the dance accompaniment in real time.

In this embodiment of the application, it is first determined that the current virtual idol is in a technical ability output state. According to the dance accompaniment, the skill data of the virtual idol can be determined as dance accompaniment output, and the content parameters corresponding to the technical ability are each lyric or each melody section of the dance accompaniment.
Step 502: Extract the acoustic features of the dance accompaniment.

In this embodiment of the application, the acoustic features of each lyric or each melody section of the dance accompaniment are extracted. The acoustic features can be the tune, the meaning of the melody, the beat, and so on.
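Step 502 can be sketched with simple frame-level features computed from the accompaniment waveform. A production system would use richer descriptors (tempo, melody, timbre); the energy and zero-crossing-rate proxies below are a minimal numpy-only stand-in.

```python
# Illustrative sketch of step 502: per-frame acoustic features
# (energy as a loudness proxy, zero-crossing rate as a pitch/beat proxy).
import numpy as np

def acoustic_features(wave: np.ndarray, frame: int = 4) -> list:
    feats = []
    for i in range(0, len(wave) - frame + 1, frame):
        w = wave[i:i + frame]
        energy = float(np.mean(w ** 2))                        # loudness proxy
        zcr = float(np.mean(np.abs(np.diff(np.sign(w))) > 0))  # sign changes
        feats.append((energy, zcr))
    return feats

feats = acoustic_features(np.array([0.0, 1.0, -1.0, 1.0, 0.5, 0.5, 0.5, 0.5]))
```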
Step 503: Input the acoustic features into the pre-established deep learning model, and output dance actions matched to the acoustic features.

In this embodiment of the application, the output dance actions matched to the acoustic features are the deductive data in the multi-modal output data.
In this embodiment of the application, the dance action of the virtual idol at the previous moment (such as a warm-up action) is obtained first, then the current dance accompaniment of the virtual idol is obtained, and the acoustic features of the dance accompaniment are extracted. A matched dance action for the current moment is generated according to the acoustic features and the previous moment's dance action of the virtual idol. This cycle repeats to generate a series of dance actions, which the virtual idol then deduces through the holographic projection equipment.
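The generation cycle described above can be sketched as a loop in which the previous action and the current acoustic feature jointly determine the next action. The rule-based "model" below is a stand-in for the unspecified network.

```python
# Illustrative sketch of the generation cycle: previous action + current
# acoustic feature -> next action, repeated to build a sequence.
def next_action(prev_action: str, feature: float) -> str:
    # assumed rule: high-energy accompaniment -> large movement
    move = "large_step" if feature > 0.5 else "small_step"
    return f"{move}_after_{prev_action}"

def generate_sequence(start: str, features) -> list:
    actions, prev = [], start
    for f in features:
        prev = next_action(prev, f)   # the output becomes the next input
        actions.append(prev)
    return actions

seq = generate_sequence("warmup", [0.9, 0.2])
```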
In this embodiment of the application, the dance action at the current moment can also be generated from the dance action of the virtual idol at the previous moment alone, and this cycle likewise generates a series of dance actions. If the previous moment's dance action of the virtual idol and the current dance accompaniment of the virtual idol are obtained at the same time, the virtual idol can automatically select the most suitable current dance action, or combine the current dance action generated from the previous moment's dance action with the current dance action generated from the dance accompaniment to produce a new current dance action. In this way the virtual idol can automatically learn dance actions and has a good capacity for original creation.
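Combining the two candidate actions described above could be as simple as a weighted blend, here over assumed joint-angle vectors. The representation and weighting are illustrative assumptions, not the patent's combination rule.

```python
# Illustrative sketch: blend the action continued from the previous
# moment with the action derived from the accompaniment.
def blend_actions(from_prev, from_music, w: float = 0.5):
    """Weighted combination of two pose vectors (lists of joint angles)."""
    return [w * a + (1 - w) * b for a, b in zip(from_prev, from_music)]

pose = blend_actions([10.0, 20.0], [30.0, 40.0], w=0.5)
```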
In this embodiment of the application, the deep learning model is built through steps 601 to 603.

Step 601: Collect dances with vocal music features and dance accompaniments.

In this embodiment of the application, dances with vocal music features and dance accompaniments are collected by the mobile device.

Step 602: Match the actions of the dances with vocal music features against the acoustic features of the dance accompaniments to generate training data samples.

In this embodiment of the application, each action of a dance with vocal music features is matched with each acoustic feature of the dance accompaniment to generate training samples.

Step 603: Train the deep learning model with the training data samples to obtain the final deep learning model.
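Steps 601 to 603 can be sketched as pairing aligned feature and action sequences into training samples and fitting a model to them. The nearest-neighbour predictor below stands in for the unspecified deep learning model.

```python
# Illustrative sketch of steps 601-603: build (feature, action) training
# samples and fit a trivial nearest-neighbour stand-in model.
def make_samples(features, actions):
    """Zip aligned feature/action lists into (feature, action) pairs."""
    assert len(features) == len(actions), "each action needs a matched feature"
    return list(zip(features, actions))

def fit_nearest(samples):
    def predict(feature):
        # return the action whose training feature is closest
        return min(samples, key=lambda s: abs(s[0] - feature))[1]
    return predict

model = fit_nearest(make_samples([0.1, 0.9], ["sway", "spin"]))
```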
With the virtual image presentation technique provided by this embodiment of the application, the multi-modal output data can be obtained directly from the multi-modal input data through the deep learning model trained in advance, with the cloud server parsing the skill data and deciding the multi-modal output data. As a result, the performance of the virtual image is real-time, the deductive data corresponds to the technical ability content, the user can enjoy a personalized and fluent experience, and the human-computer interaction effect is good.
One embodiment of the application also provides a virtual idol deductive data processing system based on multi-modal interaction, including a mobile device, an imaging device, and a cloud server. The virtual idol runs on the mobile device and is projected and presented by the imaging device, and the virtual idol possesses preset image features and preset attributes, wherein:

the cloud server judges whether the current virtual idol is in a technical ability output state;

if so, the multi-modal output data is decided by the cloud server according to the current skill data obtained by the mobile device and the content parameters corresponding to the technical ability, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
The above is an exemplary scheme of the virtual idol deductive data processing system based on multi-modal interaction of this embodiment. It should be noted that the technical solution of this system belongs to the same concept as the technical solution of the virtual idol deductive data processing method based on multi-modal interaction described above; for details not described in the system's technical solution, refer to the description of the method's technical solution.
One embodiment of the application additionally provides a virtual idol. The virtual idol runs on a mobile device and performs the steps of the virtual idol deductive data processing method based on multi-modal interaction described above.
Referring to Fig. 7, one embodiment of the application additionally provides an imaging device. The virtual idol runs on the mobile device 701 and is projected and presented by the imaging device 702.
One embodiment of the application also provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, the steps of the virtual idol deductive data processing method based on multi-modal interaction described above are realized.

The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the virtual idol deductive data processing method based on multi-modal interaction described above; for details not described in the storage medium's technical solution, refer to the description of the method's technical solution.
The computer instructions include computer program code, which can be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium can include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should know that the application is not limited by the described sequence of actions, because according to the application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the application.

In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe all details exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made according to the content of this specification. These embodiments were chosen and specifically described in order to better explain the principles and practical applications of the application, so that those skilled in the art can well understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
Claims (10)
1. A virtual idol deductive data processing method based on multi-modal interaction, characterized in that the virtual idol runs on a mobile device and is projected and presented by an imaging device, and the virtual idol possesses preset image features and preset attributes, the method comprising:

judging whether the current virtual idol is in a technical ability output state;

if so, deciding multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability, and displaying the deductive data in the multi-modal output data through the virtual idol.
2. The method according to claim 1, characterized in that displaying the deductive data in the multi-modal output data through the virtual idol comprises:

based on the multi-modal output data, the virtual idol outputting limb actions and mouth shapes and/or facial expressions matched to the emotion information.
3. The method according to claim 1, characterized by further comprising:

the mobile device controlling, according to the current multi-modal output data, the imaging device to output the virtual idol's performance and an assembly-function open signal coordinating with the virtual idol's performance.
4. The method according to claim 1, characterized by further comprising:

obtaining the emotion data of the current virtual idol, and, when the virtual idol is in a technical ability output state, matching the emotion data to output the multi-modal output data.
5. The method according to claim 1, characterized in that, when the deductive data is dance data, the step of deciding the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability comprises:

obtaining the dance accompaniment in real time;

extracting the acoustic features of the dance accompaniment;

inputting the acoustic features into the pre-established deep learning model for matching, and outputting dance actions matched to the acoustic features.
6. The method according to claim 5, characterized in that the deep learning model is built as follows:

collecting dances with vocal music features and dance accompaniments;

matching the actions of the dances with vocal music features against the acoustic features of the dance accompaniments to generate training data samples;

training the deep learning model with the training data samples to obtain the final deep learning model.
7. A virtual idol deductive data processing system based on multi-modal interaction, characterized by comprising a mobile device, an imaging device, and a cloud server, the virtual idol running on the mobile device and being projected and presented by the imaging device, and the virtual idol possessing preset image features and preset attributes, wherein:

the cloud server judges whether the current virtual idol is in a technical ability output state;

if so, the multi-modal output data is decided by the cloud server according to the current skill data obtained by the mobile device and the content parameters corresponding to the technical ability, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
8. A virtual idol, characterized in that the virtual idol runs on a mobile device and performs the steps of the method according to any one of claims 1-6.
9. An imaging device, characterized in that the virtual idol according to claim 8 runs on a mobile device and is projected and presented by the imaging device.
10. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-6 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320367.1A CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320367.1A CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108052250A true CN108052250A (en) | 2018-05-18 |
Family
ID=62124409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711320367.1A Pending CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052250A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866101A (en) * | 2015-05-27 | 2015-08-26 | 世优(北京)科技有限公司 | Real-time interactive control method and real-time interactive control device of virtual object |
CN104883557A (en) * | 2015-05-27 | 2015-09-02 | 世优(北京)科技有限公司 | Real time holographic projection method, device and system |
CN106096720A (en) * | 2016-06-12 | 2016-11-09 | 杭州如雷科技有限公司 | A kind of method that dance movement is automatically synthesized |
CN107423809A (en) * | 2017-07-07 | 2017-12-01 | 北京光年无限科技有限公司 | The multi-modal exchange method of virtual robot and system applied to net cast platform |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109298779A (en) * | 2018-08-10 | 2019-02-01 | 济南奥维信息科技有限公司济宁分公司 | Virtual training system and method based on virtual agent interaction |
CN109298779B (en) * | 2018-08-10 | 2021-10-12 | 济南奥维信息科技有限公司济宁分公司 | Virtual training system and method based on virtual agent interaction |
CN110148406A (en) * | 2019-04-12 | 2019-08-20 | 北京搜狗科技发展有限公司 | A kind of data processing method and device, a kind of device for data processing |
CN110147454A (en) * | 2019-04-30 | 2019-08-20 | 东华大学 | A kind of emotion communication matching system based on virtual robot |
CN110850970A (en) * | 2019-10-25 | 2020-02-28 | 智亮君 | Handshake interaction method and system based on hand model, hand model and storage medium |
CN110850971A (en) * | 2019-10-25 | 2020-02-28 | 智亮君 | Handshake interaction method and system between hand model and intelligent mirror and storage medium |
CN111179694A (en) * | 2019-12-02 | 2020-05-19 | 广东小天才科技有限公司 | Dance teaching interaction method, intelligent sound box and storage medium |
CN115426553A (en) * | 2021-05-12 | 2022-12-02 | 海信集团控股股份有限公司 | Intelligent sound box and display method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052250A (en) | Virtual idol deductive data processing method and system based on multi-modal interaction | |
WO2022048403A1 (en) | Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal | |
CN107944542A (en) | A kind of multi-modal interactive output method and system based on visual human | |
CN108665492B (en) | Dance teaching data processing method and system based on virtual human | |
CN111833418B (en) | Animation interaction method, device, equipment and storage medium | |
CN107797663A (en) | Multi-modal interaction processing method and system based on visual human | |
CN108942919B (en) | Interaction method and system based on virtual human | |
CN107831905A (en) | A kind of virtual image exchange method and system based on line holographic projections equipment | |
CN106997243B (en) | Speech scene monitoring method and device based on intelligent robot | |
CN107340859A (en) | The multi-modal exchange method and system of multi-modal virtual robot | |
CN106710590A (en) | Voice interaction system with emotional function based on virtual reality environment and method | |
CN109271018A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109086860B (en) | Interaction method and system based on virtual human | |
CN108492817A (en) | A kind of song data processing method and performance interactive system based on virtual idol | |
CN107679519A (en) | A kind of multi-modal interaction processing method and system based on visual human | |
CN109324688A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109343695A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109278051A (en) | Exchange method and system based on intelligent robot | |
CN106570473A (en) | Deaf-mute sign language identification interaction system based on robot | |
WO2023284435A1 (en) | Method and apparatus for generating animation | |
CN113760101B (en) | Virtual character control method and device, computer equipment and storage medium | |
CN109032328A (en) | A kind of exchange method and system based on visual human | |
CN107003825A (en) | System and method with dynamic character are instructed by natural language output control film | |
CN108595012A (en) | Visual interactive method and system based on visual human | |
CN108416420A (en) | Limbs exchange method based on visual human and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180518 |