CN108052250A - Virtual idol deductive data processing method and system based on multi-modal interaction - Google Patents
- Publication number: CN108052250A
- Application number: CN201711320367.1A
- Authority
- CN
- China
- Prior art keywords
- virtual idol
- data
- modal
- idol
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06T13/205—3D [Three Dimensional] animation driven by audio data
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06T19/00—Manipulating 3D models or images for computer graphics
Abstract
The application provides a virtual idol deductive data processing method and system based on multi-modal interaction. The method includes: obtaining multi-modal input data; matching the multi-modal input data in a pre-established deep learning model to obtain multi-modal output data; and outputting the multi-modal output data to be performed by the virtual idol. When the current virtual idol's performance skill is enabled, a cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through an imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Description
Technical field
This application relates to the field of artificial intelligence, and more particularly to a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium.
Background technology
With the continuous development of science and technology, robot development has gradually extended from industrial applications to fields such as medical care, health care, home life, entertainment, and services. People's requirements for robot intelligence keep rising, so that robots can serve humans better.
Robots include physical robots that possess a body and virtual robots that run on hardware devices. Virtual robots in the prior art can only complete a limited set of preset actions through programming, and their degree of intelligence is low.
At present, the multi-modal interaction and skill output of a virtual robot lack real-time performance, the deductive data does not correspond to the skill content, and the virtual robot cannot achieve a lifelike, smooth, anthropomorphic effect; the human-computer interaction effect is therefore poor.
Summary of the invention
In view of this, the application provides a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium, to solve the technical deficiencies in the prior art.
An embodiment of the application discloses a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes:
judging whether the current virtual idol is in a skill output state;
if so, deciding multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
Optionally, displaying the deductive data in the multi-modal output data by the virtual idol includes:
based on the multi-modal output data, the virtual idol outputs limb actions together with a mouth shape and/or facial expression matched to the emotion information.
Optionally, the method further includes:
the mobile device controls the imaging device, according to the current multi-modal output data, to output the virtual idol's performance and an open signal for the auxiliary functions that accompany the performance.
Optionally, the method further includes:
obtaining the affect data of the current virtual idol, and, when the virtual idol is in the skill output state, outputting multi-modal output data that matches the affect data.
Optionally, when the deductive data is dance data, the step of deciding the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill includes:
obtaining the dance backing music in real time;
extracting the acoustic features of the dance backing music;
inputting the acoustic features into a pre-established deep learning model, and outputting dance moves matched to the acoustic features.
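The feature-extraction step above can be sketched in a few lines. A real system would use a signal-processing library; here the "acoustic features" are reduced to per-frame energy and a crude high-energy frame count over a raw sample list, and every name (`extract_acoustic_features`, `frame_size`) is an illustrative assumption, not the claimed method:

```python
# Toy sketch of extracting acoustic features from dance backing music.
def extract_acoustic_features(samples, frame_size=4, threshold=0.5):
    """Return per-frame energies and the number of high-energy frames."""
    energies = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        # Mean squared amplitude of the frame, a stand-in for real features.
        energies.append(sum(s * s for s in frame) / frame_size)
    beats = sum(1 for e in energies if e > threshold)
    return {"energies": energies, "beats": beats}

features = extract_acoustic_features([0.1, 0.9, 1.0, 0.8, 0.0, 0.1, 0.0, 0.2])
# features["beats"] is 1: only the first frame exceeds the threshold.
```

The feature dictionary would then be fed to the deep learning model described below.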
Optionally, the deep learning model is built as follows:
collecting dances with vocal-music features and their backing music;
matching the moves of the dances with the acoustic features of the backing music to generate training data samples;
training the deep learning model on the training data samples to obtain the final deep learning model.
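The training-pair construction listed above might look like the following sketch, where each dance move is time-aligned with the acoustic feature of its backing music at the same time step; the pairing scheme and all names are assumptions rather than the disclosed procedure:

```python
# Sketch: build (acoustic feature, dance move) supervised training pairs.
def build_training_samples(acoustic_features, dance_moves):
    """Zip time-aligned features and moves into supervised pairs."""
    if len(acoustic_features) != len(dance_moves):
        raise ValueError("features and moves must be time-aligned")
    return [{"x": f, "y": m} for f, m in zip(acoustic_features, dance_moves)]

samples = build_training_samples([0.61, 0.01], ["spin", "pose"])
```

The resulting samples would then be used to train the model (e.g. a recurrent network, as the description later suggests).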
In another aspect, the application further provides a virtual idol deductive data processing system based on multi-modal interaction, including a mobile device, an imaging device, and a cloud server. The virtual idol runs on the mobile device and is presented by projection through the imaging device, and the virtual idol possesses preset image features and preset attributes, wherein:
the cloud server judges whether the current virtual idol is in a skill output state;
if so, the cloud server decides the multi-modal output data according to the current skill data and the corresponding content parameters obtained by the mobile device, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
In another aspect, the application further provides a virtual idol. The virtual idol runs on a mobile device and performs the steps of the virtual idol deductive data processing method based on multi-modal interaction described above.
In another aspect, the application further provides an imaging device. The virtual idol runs on a mobile device and is presented by projection through the imaging device.
In another aspect, the application further provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, the steps of the virtual idol deductive data processing method based on multi-modal interaction described above are realized.
In the virtual idol deductive data processing method and system based on multi-modal interaction, the virtual idol, the imaging device, and the computer-readable storage medium provided by the application, the method includes obtaining multi-modal input data, matching the multi-modal input data in a pre-established deep learning model to obtain multi-modal output data, and outputting the multi-modal output data to be performed by the virtual idol. When the current virtual idol's performance skill is enabled, the cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through the imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Description of the drawings
Fig. 1 is a structure diagram of a computing device provided by an embodiment of the application;
Fig. 2 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 3 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 4 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 5 is a flow chart of the virtual idol deductive data processing method based on multi-modal interaction provided by an embodiment of the application;
Fig. 6 is a flow chart of building the deep learning model provided by an embodiment of the application;
Fig. 7 is a structure schematic diagram of the imaging device provided by an embodiment of the application.
Detailed description of the embodiments
Many details are set forth in the following description for a full understanding of the application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
This application provides a virtual idol deductive data processing method and system based on multi-modal interaction, a virtual idol, an imaging device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to Fig. 1, an embodiment of the application provides a structure diagram of the virtual idol deductive data processing system based on multi-modal interaction. When the current virtual idol's performance skill is enabled, the multi-modal output data is decided and displayed by the virtual idol through the imaging device, and the cloud server parses the skill data and decides the multi-modal output data, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Illustratively, the virtual idol deductive data processing system based on multi-modal interaction includes a mobile device 101, an imaging device 102, and a cloud server 106. The mobile device 101 is aligned with the physical position of the imaging device 102, and signal interconnection between the mobile device 101 and the imaging device 102 is realized.
The mobile device 101 can project the virtual idol running on it onto the imaging device 102 for display. The imaging device 102 can be a holographic projection device, and the mobile device 101 can connect to the cloud server 106 so that the virtual idol running on the mobile device 101 presents a multi-modal human-computer interaction effect on the imaging device 102.
The mobile device 101 can include a communication module 103, a central processing unit 104, and a human-computer interaction input/output module 105;
wherein the human-computer interaction input/output module 105 is used to obtain multi-modal data and output the execution parameters of the virtual idol; the multi-modal data includes data from the surrounding environment and multi-modal input data from interaction with the user;
the communication module 103 is used to call the capability interfaces of the cloud server 106 and to receive the multi-modal output data decided after the capability interfaces of the cloud server 106 parse the multi-modal input data;
the central processing unit 104 is used to calculate, using the multi-modal output data, the reply data corresponding to the multi-modal output data.
The cloud server 106 possesses a multi-modal data parsing module for parsing the multi-modal data sent by the mobile device 101 and deciding the multi-modal output data.
The imaging device 102 is used to display the virtual idol with its specific image in a preset display area.
As shown in Fig. 1, each capability interface calls its corresponding logical processing during multi-modal data parsing. Each interface is explained below:
the semantic understanding interface 107 receives the specific voice instruction forwarded by the communication module 103, performs voice recognition on it, and performs natural language processing based on a large corpus.
The visual recognition interface 108 can perform video content detection, recognition, and tracking for human bodies, faces, scenes, and so on according to computer vision algorithms and deep learning algorithms. The image is recognized according to a predetermined algorithm, and a quantitative detection result is given. It possesses an image pre-processing function, a feature extraction function, a decision function, and application functions;
wherein the image pre-processing function can perform basic processing on the collected visual data, including color space conversion, edge extraction, image transformation, and image thresholding; the feature extraction function can extract characteristic information of the target in the image, such as skin color, color, texture, motion, and coordinates; the application functions realize face detection, human limb recognition, motion detection, and so on.
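As a rough illustration of two of the pre-processing functions named above (color-space conversion and thresholding), a toy grayscale pipeline could look like this; a real module would use OpenCV or a similar library, and every name here is hypothetical:

```python
# Sketch of basic image pre-processing: color conversion and thresholding.
def rgb_to_gray(pixel):
    """Luma approximation for one (r, g, b) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(image, level):
    """Binarize a 2-D grayscale image (list of rows) at the given level."""
    return [[1 if v >= level else 0 for v in row] for row in image]

gray = rgb_to_gray((255, 0, 0))                  # pure red -> ~76.2
binary = threshold([[10, 200], [90, 30]], 100)   # only 200 passes
```

The binarized output would feed the feature-extraction and decision functions described above.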
The affective computing interface 110 receives the multi-modal data forwarded by the communication module 103 and calculates the user's current affective state using affective computing logic (which can be an emotion recognition technology). Emotion recognition is an important part of affective computing; its research covers facial expression, voice, behavior, text, and physiological signal recognition, through which the user's affective state can be judged. The emotion recognition technology may monitor the user's affective state through visual emotion recognition alone, or through a combination of visual emotion recognition and acoustic emotion recognition, and is not limited thereto. In the present embodiment, the combination of the two is preferably used to monitor emotion.
When performing visual emotion recognition, the affective computing interface 110 collects images of human facial expressions using an image capture device, converts them into analyzable data, and then performs expression sentiment analysis using image processing and related technologies. Understanding a facial expression usually requires detecting its subtle changes, such as cheek muscle and mouth changes and eyebrow raising.
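The preferred "combined" monitoring mode could be sketched as a weighted fusion of per-modality emotion scores; the weights, the score format, and the function name are assumptions, not the disclosed logic:

```python
# Sketch: fuse visual and acoustic emotion scores into one affective state.
def fuse_emotions(visual_scores, audio_scores, w_visual=0.6, w_audio=0.4):
    """Fuse two per-emotion score dicts and return the top emotion."""
    emotions = set(visual_scores) | set(audio_scores)
    fused = {e: w_visual * visual_scores.get(e, 0.0)
                + w_audio * audio_scores.get(e, 0.0) for e in emotions}
    return max(fused, key=fused.get)

state = fuse_emotions({"happy": 0.8, "sad": 0.2},
                      {"happy": 0.2, "sad": 0.8})
# Visual evidence dominates under these weights, so the state is "happy".
```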
The cognitive computing interface 109 receives the multi-modal data forwarded by the communication module 103 and performs data acquisition, recognition, and learning on the multi-modal data to obtain the user portrait, the knowledge graph, and so on, so as to make rational decisions on the multi-modal output data.
The above is an illustrative technical solution of the virtual idol deductive data processing system based on multi-modal interaction of the embodiment of the application. To help those skilled in the art understand the technical solution of the application, the virtual idol deductive data processing method and system based on multi-modal interaction, the virtual idol, the imaging device, and the computer-readable storage medium provided by the application are further described in detail through several embodiments below.
In the application, the mobile device is aligned with the physical position of the imaging device, so as to realize signal interconnection between the mobile device and the imaging device.
The mobile device can project the virtual idol running on it onto the imaging device for display. The imaging device can be a holographic projection device, and the mobile device can connect with the cloud server so that the virtual idol possesses multi-modal human-computer interaction and Artificial Intelligence (AI) capabilities such as natural language understanding, visual perception, touch perception, voice output, and emotional expression and action output.
The virtual idol can be displayed by the imaging device as a 3D virtual image possessing specific image features, and the virtual idol can be configured with social attributes, personality attributes, character skills, and so on.
Specifically, the social attributes can include attributes such as appearance, name, dress, decoration, gender, birthplace, age, family relations, occupation, position, religious belief, emotional state, and educational background; the personality attributes can include attributes such as character and temperament; the character skills can include professional skills such as singing, dancing, storytelling, and training, and the display of a character skill is not limited to the skill display of limbs, expression, head, and/or mouth.
In this application, the social attributes, personality attributes, character skills, and so on of the virtual idol make the parsing of skill data in multi-modal interaction and the decision of the multi-modal output data more inclined toward, or more suitable for, this virtual idol.
The virtual idol can also cooperate with the mobile device to be projected on the imaging device, and perform according to the scene displayed by the imaging device, such as singing and dancing.
Referring to Fig. 2, an embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 201 to 202.
Step 201: judge whether the current virtual idol is in a skill output state.
In the embodiment of the application, the skills can include singing, dancing, and so on, and the skill output state can include singing or dancing in progress. Since the virtual idol runs on the mobile device and is presented by projection through the imaging device, the skill output state of the current virtual idol can be judged from the mobile device or the imaging device; for example, it can be judged that the current virtual idol is in a skill output state such as singing or dancing.
In the embodiment of the application, the mobile device can be a computing device such as a smart phone, a laptop, a tablet computer, a palmtop computer, or another mobile terminal; the computing device can also be a portable or desktop server. The mobile device is the main medium through which the virtual idol interacts with the user and the environment.
The imaging device can be a holographic projection device. The holographic projection device can provide basic projection imaging carrier support and can display content such as the pictures or text shown on the mobile device screen; the imaging device can also collect signals such as vision, infrared, and/or Bluetooth to assist the interaction on the mobile device.
The mobile device controls the display function of the imaging device, including control of the display of scene accessories, such as the flowers, plants, and trees in the scene, and the display of lights, special effects, particles, or rays, where the lights, special effects, particles, and rays can be displayed by the imaging device.
In the embodiment of the application, when the relative position of the mobile device or the imaging device changes, the mobile device can adjust the state of the virtual idol running on it. The state includes, but is not limited to, a resting state, an active state, and a listening state.
The resting state means the virtual idol is at rest or in a no-interaction state; the active state means the virtual idol is in a multi-modal interaction mode and can perform works and output skills; the listening state means the voice input interface of the virtual idol is open and can receive voice signals input by the user and the environment.
Step 202: if so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
In the embodiment of the application, if the current virtual idol is in a skill output state, the cloud server can obtain the skill data of the current virtual idol and the content parameters corresponding to the skill, and decide the multi-modal output data.
For example, the current virtual idol is in a state where skill output has been turned on. The cloud server obtains the skill of the current virtual idol; for instance, the skill of the virtual idol is singing accompanied by dancing, and the corresponding content parameter is the content of the song being sung. According to the information obtained above, the multi-modal output data can be determined as: the musical rhythm of the song, the Text To Speech (TTS) of the song, the cadence of the song, and the dance matched to the song.
The multi-modal output data can include deductive data, voice data, and so on; this application mainly takes the deductive data as an example. When the multi-modal output data is singing while performing the dance matched to the song, the deductive data is the part of the multi-modal output data that can be displayed on the imaging device, such as the dance moves, the limb changes following the song's cadence, and the emotional expression on the virtual idol's face corresponding to the musical rhythm and the song TTS.
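The decision step in this example could be sketched as a mapping from the skill and its content parameters to the multi-modal output data; the field names and the mapping below are illustrative only, not the claimed decision logic:

```python
# Sketch: assemble multi-modal output data for a "song and dance" skill.
def decide_output(skill, content_params):
    """Build the multi-modal output data from skill and content parameters."""
    output = {"tts": content_params.get("lyrics", ""),
              "rhythm": content_params.get("tempo", "medium")}
    if skill == "song_and_dance":
        # The dance track is matched to the song named in the parameters.
        output["dance"] = "matched_to:" + content_params.get("song", "unknown")
    return output

out = decide_output("song_and_dance", {"song": "demo", "lyrics": "la la"})
```

The deductive part of `out` (here the `dance` field) would then be displayed through the imaging device.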
In the embodiment of the application, a deep learning model can also be pre-established on the cloud server. The deep learning model can be a Recurrent Neural Network (RNN). The obtained current skill data and the content parameters corresponding to the skill are input into the deep learning model, and the multi-modal output data is obtained immediately. For example, the obtained current skill data is dancing and the content parameter corresponding to the skill is the dance moves of the peacock dance. Inputting the skill data and the content parameters into the deep learning model immediately yields the next dance move A of the peacock dance; then, according to the currently obtained "next dance move A of the peacock dance", the corresponding "next dance move B of the peacock dance" is accurately derived, and so on in cycles, generating all the dance moves of the peacock dance, which are then performed by the virtual idol and displayed through the imaging device.
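The cyclic generation described above can be sketched as an autoregressive loop in which each predicted move is fed back as input for the next prediction. A lookup table stands in for the recurrent network here, and all names and transitions are assumptions:

```python
# Assumed move transitions, standing in for a trained recurrent model.
NEXT_MOVE = {"start": "A", "A": "B", "B": "C", "C": None}

def generate_dance(model, first="start", max_steps=10):
    """Roll the model forward, feeding each output back as the next input."""
    moves, current = [], first
    for _ in range(max_steps):
        nxt = model.get(current)
        if nxt is None:        # no further move predicted: dance is complete
            break
        moves.append(nxt)
        current = nxt          # autoregressive feedback
    return moves

sequence = generate_dance(NEXT_MOVE)  # ['A', 'B', 'C']
```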
Optionally, the affect data of the current virtual idol is obtained (the sources of the affect data include user emotion input and the affect data corresponding to the skill content), and when the virtual idol is in the skill output state, the multi-modal output data matching the affect data is output.
For example, if the affect data of the current virtual idol is happiness, when the virtual idol is in the skill output state of singing, the multi-modal output data matching the affect data can include singing while performing the dance matched to the song; if the affect data of the current virtual idol is sadness, when the virtual idol is in the skill output state of singing, the multi-modal output data matching the affect data may be singing only.
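The affect-matching rule in this example can be sketched as follows; the rule table is one reading of the example above, not the claimed decision logic, and all names are hypothetical:

```python
# Sketch: pick multi-modal output for a skill state given the affect data.
def match_output(emotion, skill_state):
    """Happy idols sing and dance; sad ones only sing (per the example)."""
    if skill_state != "singing":
        return []
    return ["sing", "dance"] if emotion == "happy" else ["sing"]

happy_out = match_output("happy", "singing")  # ['sing', 'dance']
sad_out = match_output("sad", "singing")      # ['sing']
```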
With the virtual idol deductive data processing method based on multi-modal interaction provided by the embodiment of the application, when the current virtual idol's performance skill is enabled, the cloud server parses the skill data and decides the multi-modal output data. The multi-modal output data is displayed by the virtual idol through the imaging device, so that the idol's performance is real-time and the deductive data corresponds to the skill content; the user can also enjoy a personalized, fluent experience, and the human-computer interaction effect is good.
Referring to Fig. 3, an embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is presented by projection through an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 301 to 303.
Step 301: judge whether the current virtual idol is in a skill output state.
In the embodiment of the application, the skill can be singing, and the skill output state can be singing in progress.
Step 302: if so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the skill, where the deductive data in the multi-modal output data is displayed by the virtual idol.
In the embodiment of the application, the skill output state can be analyzed by the cloud server, and the current skill data and the content parameters corresponding to the skill are then obtained according to the analysis result. For example, after the cloud server analyzes the skill output state of the current virtual idol, it determines that the skill output state of the current virtual idol is singing; then, according to the singing skill output state, the current skill data of the virtual idol is obtained as singing, and the content parameter corresponding to the skill is the lyric of Butterfly Lovers (Liang Shanbo and Zhu Yingtai): "the butterflies of different colors hover long in pairs". The skill data and the content parameters corresponding to the skill are input into the pre-established deep learning model to obtain the multi-modal output data: singing the Butterfly Lovers song while performing the dance matched to the song. The deductive data in the multi-modal output data is: performing the dance matched to the Butterfly Lovers song, which is then performed by the virtual idol and displayed through the imaging device.
Step 303: based on the multi-modal output data, the virtual idol outputs limb actions together with a mouth shape and/or facial expression matched to the emotion information.
In the embodiment of the application, when the virtual idol displays the deductive data in the multi-modal output data, it can also, based on the multi-modal output data, cooperatively output limb actions and a mouth shape and/or facial expression matched to the emotion information.
When deductive data in the multi-modal output data is song accompaning with dance, limbs (such as both arms, hand of the virtual idol
Finger, both legs etc.) rhythm of music in song, the content of the lyrics, the development fluctuations of lyrics plot can be followed, it makes corresponding
Action, the shape of the mouth as one speaks of the virtual idol also can make corresponding variation according to song tune, lyrics emotion corresponding with the song,
And the face (such as eyes, eyebrow etc.) and facial expression (such as a crease in the skin etc.) of the virtual idol also can be according to parsings
Affection data in song and dancing is changed.
For example, when the deductive data in the multi-modal output data is the dance matched to the song "Liang Shanbo and Zhu Yingtai", and the virtual idol dances that matched dance, the emotion and state contained in the song can be judged by an algorithm. Through this recognition, it can be determined that the sentiment of the song is negative, that its emotional category is sadness, and that the action type to be displayed is "reluctant to part". After the cloud server obtains this information, it sends it to the mobile device, and the virtual idol running on the mobile device then knows that, while dancing the dance matched to the song, it needs to make a "reluctant to part" action, with the mouth half open and a sad look in the eyes.
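The sentiment judgment above can be sketched with a toy polarity classifier and a lookup from sentiment to action type, mouth shape, and gaze. The keyword lexicon and the mappings are illustrative stand-ins, not the patent's actual algorithm.

```python
# Illustrative sketch: judge a song's sentiment and map it to the
# action/mouth/gaze states described above. Lexicon is an assumption.
NEGATIVE_WORDS = {"parting", "sorrow", "farewell", "tears"}

def classify_sentiment(lyrics: str) -> str:
    words = set(lyrics.lower().split())
    return "negative" if words & NEGATIVE_WORDS else "positive"

def action_for_sentiment(sentiment: str) -> dict:
    if sentiment == "negative":
        # e.g. the "reluctant to part" action with a half-open mouth
        return {"action": "reluctant_parting", "mouth": "half_open", "gaze": "sad"}
    return {"action": "joyful", "mouth": "smiling", "gaze": "bright"}

plan = action_for_sentiment(classify_sentiment("sorrow fills the parting hour, tears fall"))
```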
With the virtual idol deductive data processing method based on multi-modal interaction provided by this embodiment of the application, the virtual idol adds limb actions together with mouth shapes and/or facial expressions matched to the emotion information during its performance, so that the presentation of the virtual idol is more lifelike, and this multi-modal interaction also makes the experience more engaging.
Referring to Fig. 4, one embodiment of the application provides a virtual idol deductive data processing method based on multi-modal interaction. The virtual idol runs on a mobile device and is projected and presented by an imaging device, and the virtual idol possesses preset image features and preset attributes. The method includes steps 401 to 403.
Step 401: Judge whether the current virtual idol is in a technical ability output state.

The technical ability output state can be opened through the user's voice input, visual gesture input, touch-sensing input, physical button input, or similar opening methods, so that the virtual idol opens technical abilities such as singing, dancing, song-and-dance performance, or poem recitation.
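The judgment of step 401 can be sketched as checking incoming trigger events from the listed input channels against the known technical abilities. The event format is an illustrative assumption.

```python
# Illustrative sketch of step 401: detect whether any trigger input
# (voice, gesture, touch, button) has opened a technical ability.
SKILL_TRIGGERS = {"voice", "gesture", "touch", "button"}
SKILLS = {"singing", "dancing", "song_and_dance", "poem_recitation"}

def in_skill_output_state(events):
    """Return the opened skill if a trigger event requested one, else None."""
    for ev in events:
        if ev.get("channel") in SKILL_TRIGGERS and ev.get("skill") in SKILLS:
            return ev["skill"]
    return None

state = in_skill_output_state([{"channel": "voice", "skill": "dancing"}])
```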
Step 402: If so, decide the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability, and display the deductive data in the multi-modal output data through the virtual idol.
In this embodiment of the application, the technical ability output state can be analyzed by the cloud server, which then obtains the current skill data and the content parameters corresponding to the technical ability according to the analysis result. For example, the cloud server analyzes the current virtual idol's technical ability output state and determines that it is listening to music. According to this state, it obtains the current skill data of the virtual idol, namely listening to music, and the content parameter corresponding to the technical ability, namely that the music being listened to is Latin dance accompaniment. The skill data and the content parameter corresponding to the technical ability are input into the pre-established deep learning model, which outputs the multi-modal output data: listen to the Latin dance accompaniment and dance a Latin dance. The deductive data in the multi-modal output data is the Latin dance, which the virtual idol then deduces and displays through the imaging device.
Step 403: According to the current multi-modal output data, the mobile device controls the imaging device to output the virtual idol's performance and an assembly-function open signal coordinating with the virtual idol's performance.
In this embodiment of the application, when a technical ability of the virtual idol is opened, the mobile device can, according to the current scene, select a wireless network connection to the imaging device and control the imaging device to turn on lights, change light colors, turn on dot-matrix lights, and so on.
For example, when the current virtual idol is dancing a Latin dance, the mobile device can, according to the current scene, select a wireless network connection to the imaging device and control the imaging device to turn on gold and red lights, change the light color to gold and red, and turn on the dot-matrix lights, coordinating with the virtual idol so that the presented effect is more realistic.
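The lighting coordination above can be sketched as a lookup from the current performance to a light-control signal sent to the imaging device. The color choices and signal format are assumptions; the patent does not define the wire protocol.

```python
# Illustrative sketch of the assembly-function open signal: pick light
# settings for the imaging device based on the current dance.
def light_signal(dance_type: str) -> dict:
    if dance_type == "latin":
        # gold and red lights plus dot-matrix lights, as in the example
        return {"power": "on", "colors": ["gold", "red"], "dot_matrix": True}
    return {"power": "on", "colors": ["white"], "dot_matrix": False}

sig = light_signal("latin")
```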
With the virtual idol deductive data processing method based on multi-modal interaction provided by this embodiment of the application, the virtual idol can, while performing a dance, automatically generate matched stage lighting and stage effects coordinated with the dance being performed, presenting a better visual experience to the user.
Referring to Fig. 5, when the deductive data is dance data, the step of deciding the deductive data in the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability includes:
Step 501: Obtain the dance accompaniment in real time.

In this embodiment of the application, it is first determined that the current virtual idol is in a technical ability output state. According to the dance accompaniment, the skill data of the virtual idol can be determined as dance accompaniment output, and the content parameters corresponding to the technical ability are each lyric or each melody section of the dance accompaniment.
Step 502: Extract the acoustic features of the dance accompaniment.

In this embodiment of the application, the acoustic features of each lyric or each melody section of the dance accompaniment are extracted. The acoustic features can be the tune, the meaning of the melody, the beat, and so on.
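Step 502 can be sketched with simple frame-level features computed from the accompaniment waveform. A production system would use richer descriptors (tempo, melody, timbre); the energy and zero-crossing-rate proxies below are a minimal numpy-only stand-in.

```python
# Illustrative sketch of step 502: per-frame acoustic features
# (energy as a loudness proxy, zero-crossing rate as a pitch/beat proxy).
import numpy as np

def acoustic_features(wave: np.ndarray, frame: int = 4) -> list:
    feats = []
    for i in range(0, len(wave) - frame + 1, frame):
        w = wave[i:i + frame]
        energy = float(np.mean(w ** 2))                        # loudness proxy
        zcr = float(np.mean(np.abs(np.diff(np.sign(w))) > 0))  # sign changes
        feats.append((energy, zcr))
    return feats

feats = acoustic_features(np.array([0.0, 1.0, -1.0, 1.0, 0.5, 0.5, 0.5, 0.5]))
```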
Step 503: Input the acoustic features into the pre-established deep learning model, and output dance actions matched to the acoustic features.

In this embodiment of the application, the output dance actions matched to the acoustic features are the deductive data in the multi-modal output data.
In this embodiment of the application, the dance action of the virtual idol at the previous moment (such as a warm-up action) is obtained first, then the current dance accompaniment of the virtual idol is obtained, and the acoustic features of the dance accompaniment are extracted. A matched dance action for the current moment is generated according to the acoustic features and the previous moment's dance action of the virtual idol. This cycle repeats to generate a series of dance actions, which the virtual idol then deduces through the holographic projection equipment.
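The generation cycle described above can be sketched as a loop in which the previous action and the current acoustic feature jointly determine the next action. The rule-based "model" below is a stand-in for the unspecified network.

```python
# Illustrative sketch of the generation cycle: previous action + current
# acoustic feature -> next action, repeated to build a sequence.
def next_action(prev_action: str, feature: float) -> str:
    # assumed rule: high-energy accompaniment -> large movement
    move = "large_step" if feature > 0.5 else "small_step"
    return f"{move}_after_{prev_action}"

def generate_sequence(start: str, features) -> list:
    actions, prev = [], start
    for f in features:
        prev = next_action(prev, f)   # the output becomes the next input
        actions.append(prev)
    return actions

seq = generate_sequence("warmup", [0.9, 0.2])
```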
In this embodiment of the application, the dance action at the current moment can also be generated from the dance action of the virtual idol at the previous moment alone, and this cycle likewise generates a series of dance actions. If the previous moment's dance action of the virtual idol and the current dance accompaniment of the virtual idol are obtained at the same time, the virtual idol can automatically select the most suitable current dance action, or combine the current dance action generated from the previous moment's dance action with the current dance action generated from the dance accompaniment to produce a new current dance action. In this way the virtual idol can automatically learn dance actions and has a good capacity for original creation.
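Combining the two candidate actions described above could be as simple as a weighted blend, here over assumed joint-angle vectors. The representation and weighting are illustrative assumptions, not the patent's combination rule.

```python
# Illustrative sketch: blend the action continued from the previous
# moment with the action derived from the accompaniment.
def blend_actions(from_prev, from_music, w: float = 0.5):
    """Weighted combination of two pose vectors (lists of joint angles)."""
    return [w * a + (1 - w) * b for a, b in zip(from_prev, from_music)]

pose = blend_actions([10.0, 20.0], [30.0, 40.0], w=0.5)
```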
In this embodiment of the application, the deep learning model is built through steps 601 to 603.

Step 601: Collect dances with vocal music features and dance accompaniments.

In this embodiment of the application, dances with vocal music features and dance accompaniments are collected by the mobile device.

Step 602: Match the actions of the dances with vocal music features against the acoustic features of the dance accompaniments to generate training data samples.

In this embodiment of the application, each action of a dance with vocal music features is matched with each acoustic feature of the dance accompaniment to generate training samples.

Step 603: Train the deep learning model with the training data samples to obtain the final deep learning model.
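Steps 601 to 603 can be sketched as pairing aligned feature and action sequences into training samples and fitting a model to them. The nearest-neighbour predictor below stands in for the unspecified deep learning model.

```python
# Illustrative sketch of steps 601-603: build (feature, action) training
# samples and fit a trivial nearest-neighbour stand-in model.
def make_samples(features, actions):
    """Zip aligned feature/action lists into (feature, action) pairs."""
    assert len(features) == len(actions), "each action needs a matched feature"
    return list(zip(features, actions))

def fit_nearest(samples):
    def predict(feature):
        # return the action whose training feature is closest
        return min(samples, key=lambda s: abs(s[0] - feature))[1]
    return predict

model = fit_nearest(make_samples([0.1, 0.9], ["sway", "spin"]))
```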
With the virtual image presentation technique provided by this embodiment of the application, the multi-modal output data can be obtained directly from the multi-modal input data through the deep learning model trained in advance, with the cloud server parsing the skill data and deciding the multi-modal output data. As a result, the performance of the virtual image is real-time, the deductive data corresponds to the technical ability content, the user can enjoy a personalized and fluent experience, and the human-computer interaction effect is good.
One embodiment of the application also provides a virtual idol deductive data processing system based on multi-modal interaction, including a mobile device, an imaging device, and a cloud server. The virtual idol runs on the mobile device and is projected and presented by the imaging device, and the virtual idol possesses preset image features and preset attributes, wherein:

the cloud server judges whether the current virtual idol is in a technical ability output state;

if so, the multi-modal output data is decided by the cloud server according to the current skill data obtained by the mobile device and the content parameters corresponding to the technical ability, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
The above is an exemplary scheme of the virtual idol deductive data processing system based on multi-modal interaction of this embodiment. It should be noted that the technical solution of this system belongs to the same concept as the technical solution of the virtual idol deductive data processing method based on multi-modal interaction described above; for details not described in the system's technical solution, refer to the description of the method's technical solution.
One embodiment of the application additionally provides a virtual idol. The virtual idol runs on a mobile device and performs the steps of the virtual idol deductive data processing method based on multi-modal interaction described above.
Referring to Fig. 7, one embodiment of the application additionally provides an imaging device. The virtual idol runs on the mobile device 701 and is projected and presented by the imaging device 702.
One embodiment of the application also provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, the steps of the virtual idol deductive data processing method based on multi-modal interaction described above are realized.

The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the virtual idol deductive data processing method based on multi-modal interaction described above; for details not described in the storage medium's technical solution, refer to the description of the method's technical solution.
The computer instructions include computer program code, which can be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium can include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should know that the application is not limited by the described sequence of actions, because according to the application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the application.

In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe all details exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made according to the content of this specification. These embodiments were chosen and specifically described in order to better explain the principles and practical applications of the application, so that those skilled in the art can well understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
Claims (10)
1. A virtual idol deductive data processing method based on multi-modal interaction, characterized in that the virtual idol runs on a mobile device and is projected and presented by an imaging device, and the virtual idol possesses preset image features and preset attributes, the method comprising:

judging whether the current virtual idol is in a technical ability output state;

if so, deciding multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability, and displaying the deductive data in the multi-modal output data through the virtual idol.
2. The method according to claim 1, characterized in that displaying the deductive data in the multi-modal output data through the virtual idol comprises:

based on the multi-modal output data, the virtual idol outputting limb actions and mouth shapes and/or facial expressions matched to the emotion information.
3. The method according to claim 1, characterized by further comprising:

the mobile device controlling, according to the current multi-modal output data, the imaging device to output the virtual idol's performance and an assembly-function open signal coordinating with the virtual idol's performance.
4. The method according to claim 1, characterized by further comprising:

obtaining the emotion data of the current virtual idol, and, when the virtual idol is in a technical ability output state, matching the emotion data to output the multi-modal output data.
5. The method according to claim 1, characterized in that, when the deductive data is dance data, the step of deciding the multi-modal output data according to the obtained current skill data and the content parameters corresponding to the technical ability comprises:

obtaining the dance accompaniment in real time;

extracting the acoustic features of the dance accompaniment;

inputting the acoustic features into the pre-established deep learning model for matching, and outputting dance actions matched to the acoustic features.
6. The method according to claim 5, characterized in that the deep learning model is built as follows:

collecting dances with vocal music features and dance accompaniments;

matching the actions of the dances with vocal music features against the acoustic features of the dance accompaniments to generate training data samples;

training the deep learning model with the training data samples to obtain the final deep learning model.
7. A virtual idol deductive data processing system based on multi-modal interaction, characterized by comprising a mobile device, an imaging device, and a cloud server, the virtual idol running on the mobile device and being projected and presented by the imaging device, and the virtual idol possessing preset image features and preset attributes, wherein:

the cloud server judges whether the current virtual idol is in a technical ability output state;

if so, the multi-modal output data is decided by the cloud server according to the current skill data obtained by the mobile device and the content parameters corresponding to the technical ability, and the deductive data in the multi-modal output data is displayed by the virtual idol through the imaging device.
8. A virtual idol, characterized in that the virtual idol runs on a mobile device and performs the steps of the method according to any one of claims 1-6.
9. An imaging device, characterized in that the virtual idol according to claim 8 runs on a mobile device and is projected and presented by the imaging device.
10. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-6 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320367.1A CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320367.1A CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108052250A true CN108052250A (en) | 2018-05-18 |
Family
ID=62124409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711320367.1A Pending CN108052250A (en) | 2017-12-12 | 2017-12-12 | Virtual idol deductive data processing method and system based on multi-modal interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052250A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866101A (en) * | 2015-05-27 | 2015-08-26 | 世优(北京)科技有限公司 | Real-time interactive control method and real-time interactive control device of virtual object |
CN104883557A (en) * | 2015-05-27 | 2015-09-02 | 世优(北京)科技有限公司 | Real time holographic projection method, device and system |
CN106096720A (en) * | 2016-06-12 | 2016-11-09 | 杭州如雷科技有限公司 | A kind of method that dance movement is automatically synthesized |
CN107423809A (en) * | 2017-07-07 | 2017-12-01 | 北京光年无限科技有限公司 | The multi-modal exchange method of virtual robot and system applied to net cast platform |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109298779A (en) * | 2018-08-10 | 2019-02-01 | 济南奥维信息科技有限公司济宁分公司 | Virtual training system and method based on virtual agent interaction |
CN109298779B (en) * | 2018-08-10 | 2021-10-12 | 济南奥维信息科技有限公司济宁分公司 | Virtual training system and method based on virtual agent interaction |
CN110148406A (en) * | 2019-04-12 | 2019-08-20 | 北京搜狗科技发展有限公司 | A kind of data processing method and device, a kind of device for data processing |
CN110147454A (en) * | 2019-04-30 | 2019-08-20 | 东华大学 | A kind of emotion communication matching system based on virtual robot |
CN110850970A (en) * | 2019-10-25 | 2020-02-28 | 智亮君 | Handshake interaction method and system based on hand model, hand model and storage medium |
CN110850971A (en) * | 2019-10-25 | 2020-02-28 | 智亮君 | Handshake interaction method and system between hand model and intelligent mirror and storage medium |
CN111179694A (en) * | 2019-12-02 | 2020-05-19 | 广东小天才科技有限公司 | Dance teaching interaction method, intelligent sound box and storage medium |
CN115426553A (en) * | 2021-05-12 | 2022-12-02 | 海信集团控股股份有限公司 | Intelligent sound box and display method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052250A (en) | Virtual idol deductive data processing method and system based on multi-modal interaction | |
WO2022048403A1 (en) | Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal | |
CN107944542A (en) | A kind of multi-modal interactive output method and system based on visual human | |
CN108665492B (en) | Dance teaching data processing method and system based on virtual human | |
CN111833418B (en) | Animation interaction method, device, equipment and storage medium | |
CN107797663A (en) | Multi-modal interaction processing method and system based on visual human | |
CN108942919B (en) | Interaction method and system based on virtual human | |
CN107831905A (en) | A kind of virtual image exchange method and system based on line holographic projections equipment | |
CN106997243B (en) | Speech scene monitoring method and device based on intelligent robot | |
CN107340859A (en) | The multi-modal exchange method and system of multi-modal virtual robot | |
CN106710590A (en) | Voice interaction system with emotional function based on virtual reality environment and method | |
CN109271018A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109086860B (en) | Interaction method and system based on virtual human | |
CN108492817A (en) | A kind of song data processing method and performance interactive system based on virtual idol | |
CN107679519A (en) | A kind of multi-modal interaction processing method and system based on visual human | |
CN109324688A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109343695A (en) | Exchange method and system based on visual human's behavioral standard | |
CN109278051A (en) | Exchange method and system based on intelligent robot | |
CN106570473A (en) | Deaf-mute sign language identification interaction system based on robot | |
WO2023284435A1 (en) | Method and apparatus for generating animation | |
CN113760101B (en) | Virtual character control method and device, computer equipment and storage medium | |
CN109032328A (en) | A kind of exchange method and system based on visual human | |
CN107003825A (en) | System and method with dynamic character are instructed by natural language output control film | |
CN108595012A (en) | Visual interactive method and system based on visual human | |
CN108416420A (en) | Limbs exchange method based on visual human and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180518 |