CN107944542A - Virtual-human-based multi-modal interaction output method and system - Google Patents
Virtual-human-based multi-modal interaction output method and system
- Publication number
- CN107944542A (application CN201711162023.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- virtual human
- facial
- modal
- smart device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Robotics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Child & Adolescent Psychology (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Signal Processing (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The application provides a virtual-human-based multi-modal interaction output method and system. The method includes: the virtual human runs on a smart device; multi-modal data is acquired, the multi-modal data including at least voice data; the multi-modal data is parsed to obtain semantic data and emotion data in the voice data; and the semantic data and the emotion data are matched with facial parameters of the virtual human to generate and output facial bionic data. By parsing the acquired multi-modal data into semantic data and emotion data, the face of the virtual human imitates facial actions and facial emotions according to the parsing result, which strengthens the user's visual engagement, presents a lifelike and smooth simulated interaction effect, and improves the interaction experience.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a virtual-human-based multi-modal interaction output method and system, a virtual human, a smart device, and a computer-readable storage medium.
Background art
With the continuous development of science and technology and the introduction of information technology, computer technology and artificial intelligence technology, robotics research has gradually moved out of the industrial field and extended into areas such as medical care, health care, the home, entertainment and the service industry. People's requirements for robots have likewise been raised from simple, repetitive mechanical actions to intelligent robots capable of anthropomorphic question answering, autonomy and interaction with other robots, and human-computer interaction has become a decisive factor in the development of intelligent robots.
Robots currently include physical robots that possess an entity and virtual robots installed on hardware devices. Virtual robots in the prior art cannot conduct multi-modal interaction; they always present a fixed, unchanging state and cannot truly and smoothly imitate the mood and emotion of the person being imitated through the face, so a lifelike, smooth, anthropomorphic interaction effect cannot be achieved.
Therefore, improving the interaction capability and presentation capability of virtual robots is an important problem that urgently needs to be solved.
Summary of the invention
In view of this, the application provides a virtual-human-based multi-modal interaction output method and system, a virtual human, a smart device and a computer-readable storage medium, so as to solve the technical deficiencies in the prior art.
In one aspect, the application provides a virtual-human-based multi-modal interaction output method, the virtual human running on a smart device, the method including:
acquiring multi-modal data, the multi-modal data including at least voice data;
parsing the multi-modal data to obtain semantic data and emotion data in the voice data; and
matching the semantic data and the emotion data with facial parameters of the virtual human to generate and output facial bionic data.
Optionally, before acquiring the multi-modal data, the method further includes:
waking up the virtual human, and displaying the virtual human in a preset display area.
Optionally, matching the semantic data and the emotion data with the facial parameters of the virtual human, and generating and outputting facial bionic data, includes:
performing word segmentation according to the semantic data, and matching the segmentation result with a mouth-shape model of the virtual human to generate and output mouth bionic data;
setting an emotion label for the emotion data; and
selecting a corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
Optionally, the facial parameters of the virtual human include facial bones, skin folds, facial muscle groups and/or facial skin color.
Optionally, the facial parameter set includes, but is not limited to:
bionic synergy data of facial bone and facial muscle group movement;
bionic synergy data of facial bone and skin fold movement;
bionic synergy data of skin fold and facial muscle group movement; or
bionic synergy data of the facial bones, the skin folds, the facial muscle groups and/or the facial skin color.
Optionally, the virtual human is generated from a high-precision 3D model and possesses a preset appearance and skills; the virtual human includes an application program or executable file running on the smart device, or a hologram projected by the smart device.
Optionally, the operating system used by the smart device includes, but is not limited to, a WINDOWS system, a MAC OS system or a built-in system of a holographic device.
Optionally, the preset display area includes a display interface of the smart device or a projection area of the smart device.
In another aspect, the application provides a virtual-human-based multi-modal interaction output system, including a smart device and a server, the virtual human running on the smart device, wherein:
the smart device acquires multi-modal data, the multi-modal data including at least voice data;
the server parses the multi-modal data to obtain semantic data and emotion data in the voice data;
the server matches the semantic data and the emotion data with facial parameters of the virtual human to generate facial bionic data; and
the smart device receives the facial bionic data and outputs it.
Optionally, the server parses the multi-modal data specifically by:
performing word segmentation according to the semantic data, and matching the segmentation result with the mouth-shape model of the virtual human to generate and output mouth bionic data;
setting an emotion label for the emotion data; and
selecting the corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
In another aspect, the application provides a virtual human which runs on a smart device and executes the above virtual-human-based multi-modal interaction output method.
In another aspect, the application provides a smart device on which the above virtual human runs.
In another aspect, the application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above virtual-human-based multi-modal interaction output method.
The virtual-human-based multi-modal interaction output method and system, virtual human, smart device and computer-readable storage medium provided by the application acquire multi-modal data including at least voice data; then parse the multi-modal data to obtain the semantic data and emotion data in the voice data; and finally match the semantic data and the emotion data with the facial parameters of the virtual human to generate and output facial bionic data. By parsing the acquired multi-modal data into semantic data and emotion data, the face of the virtual human can imitate facial actions and facial emotions according to the parsing result, strengthening the user's visual engagement, presenting a lifelike, smooth simulated interaction effect and improving the interaction experience.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of a virtual-human-based multi-modal interaction output system provided by an embodiment of the application;
Fig. 2 is a flowchart of a virtual-human-based multi-modal interaction output method provided by an embodiment of the application;
Fig. 3 is a flowchart of a virtual-human-based multi-modal interaction output method provided by an embodiment of the application;
Fig. 4 is a flowchart of a virtual-human-based multi-modal interaction output method provided by an embodiment of the application;
Fig. 5 is a flowchart of a virtual-human-based multi-modal interaction output method provided by an embodiment of the application;
Fig. 6 is a structural schematic diagram of a virtual-human-based multi-modal interaction output system provided by an embodiment of the application.
Detailed description of the embodiments
Many details are set forth in the following description in order to facilitate a full understanding of the application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the spirit of the application; the application is therefore not limited by the specific implementations disclosed below.
In this application, a virtual-human-based multi-modal interaction output method and system, a virtual human, a smart device and a computer-readable storage medium are provided, and each is described in detail in the following embodiments.
In this application, the virtual human is carried on a smart device equipped with input/output modules supporting perception, control and the like;
it uses a highly realistic 3D virtual character image as the main user interface and possesses a distinctive character appearance;
it supports multi-modal human-computer interaction and possesses AI capabilities such as natural language understanding, visual perception, touch perception, speech output, and emotional facial-expression and action output;
its social attributes, personality attributes, character skills and the like are configurable, so that users enjoy a smart, personalized flow experience with the virtual character.
The virtual human runs on a smart device. The smart device may be an intelligent computing device such as a desktop computer, a notebook, a palmtop computer or a mobile terminal, and, more importantly, may also be an intelligent holographic projection device; the mobile terminal may include a smart phone, a tablet, an intelligent robot and the like.
The attributes possessed by the virtual human may include: a virtual-human identifier, social attributes, personality attributes, character skills and the like. Specifically, the social attributes may include attribute fields such as appearance, name, gender, birthplace, age, family relationship, occupation, position, religious belief, relationship status and educational background; the personality attributes may include attribute fields such as character and temperament; and the character skills may include professional skills such as singing, dancing, storytelling and training.
In this application, the attributes of the virtual human make the parsing of the multi-modal interaction and the decision result better suited to that virtual human; the system can call this attribute information to control states of the virtual human such as wake-up, activity, de-waking and logout. These attributes are additional attribute information that distinguishes the virtual human from a real person.
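To make the attribute organization concrete, the following is a minimal Python sketch of how such a profile could be stored; the class name, field names and values are illustrative assumptions, not structures defined by this application.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualHumanProfile:
    """Hypothetical container for the attributes a virtual human may possess."""
    vh_id: str                                       # virtual-human identifier
    social: dict = field(default_factory=dict)       # e.g. appearance, name, gender, age, occupation
    personality: dict = field(default_factory=dict)  # e.g. character, temperament
    skills: list = field(default_factory=list)       # e.g. singing, dancing, storytelling
    state: str = "asleep"                            # asleep / awake / active / logged_out

profile = VirtualHumanProfile(
    vh_id="vh-001",
    social={"name": "Little C", "gender": "female", "occupation": "assistant"},
    personality={"character": "cheerful"},
    skills=["storytelling"],
)
profile.state = "awake"   # the system updates the state attribute when waking the virtual human
```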
In this application, the intelligent holographic projection device may use the built-in system of the holographic device, or other devices and platforms may be used, and these other devices and platforms may be configured with a WINDOWS or MAC OS system. The virtual human may therefore be a hologram produced by intelligent holographic projection, or an application program or executable file running on the smart device.
Referring to Fig. 1, which is a structural schematic diagram of the virtual-human-based multi-modal output system of the embodiment of the application.
The virtual-human-based multi-modal output system includes a smart device 120 and a server, and the server may be a cloud brain 110.
The smart device 120 may include: a user interface 121, a communication module 122, a central processing unit 123 and a human-computer interaction input/output module 124. The user interface 121 displays the awakened virtual human in a preset display area. The human-computer interaction input/output module 124 acquires multi-modal data and outputs the execution parameters of the virtual human; the multi-modal data includes data from the surrounding environment and multi-modal input data from the interaction with the user (including at least voice data). The communication module 122 calls the virtual-human capability interfaces and receives from them the multi-modal output data decided upon after parsing the multi-modal input data. The central processing unit 123 uses the real voice data in the multi-modal output data, together with the virtual human's semantic understanding and emotional understanding of that voice data, to calculate the execution parameters of the virtual human's mouth movement and facial expression when imitating the speech.
The cloud brain 110 possesses a multi-modal data parsing module (also called the "virtual-human capability interfaces"), which parses the multi-modal data sent by the smart device 120 and decides on the multi-modal output data; the multi-modal output data includes the real voice data and the virtual human's semantic understanding and emotional understanding of that voice data.
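As a purely illustrative sketch of the exchange between the smart device 120 and the cloud brain 110, the following Python stub shows one possible request/response shape; the field names and the parse_multimodal stub are assumptions for illustration rather than an interface disclosed by the application.

```python
import json

def build_request(audio_bytes: bytes, environment: dict) -> str:
    """Smart-device side: package the multi-modal input data for the cloud brain."""
    return json.dumps({
        "audio": audio_bytes.hex(),   # at least voice data must be present
        "environment": environment,   # optional data from the surroundings
    })

def parse_multimodal(request_json: str) -> dict:
    """Cloud-brain side (stub): parse the multi-modal data and decide the output data."""
    _ = json.loads(request_json)
    return {
        "speech": "I am fine",                           # real voice data, here as text
        "semantic_data": ["I", "am", "fine"],            # semantic understanding
        "emotion_data": {"label": "joy", "score": 0.8},  # emotional understanding
    }

response = parse_multimodal(build_request(b"\x00\x01", {"noise_level": "low"}))
print(response["emotion_data"]["label"])
```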
As shown in Fig. 1, each capability interface calls its corresponding logical processing during multi-modal data parsing. The interfaces are explained below.
Semantic understanding interface 111: it receives the voice information forwarded by the communication module 122 and performs speech recognition on it, together with natural language processing based on a large corpus.
Visual recognition interface 112: according to computer vision algorithms and deep learning algorithms, it can perform video content detection, recognition, tracking and the like for human bodies, faces, scenes and so on. Images are recognized according to predetermined algorithms and quantitative detection results are produced. It possesses image preprocessing, feature extraction, decision-making and concrete application functions. Image preprocessing may be basic processing of the acquired visual data, including color space conversion, edge extraction, image transformation and image thresholding; feature extraction may extract feature information such as skin color, color, texture, motion and coordinates of the target in the image; decision-making may distribute the feature information, according to a certain decision strategy, to the concrete applications that need it; and the concrete application functions implement functions such as face detection, human limb recognition and motion detection.
Emotion computing interface 114: it receives the multi-modal data forwarded by the communication module 122 and uses emotion computing logic (which may be emotion recognition technology) to calculate the user's current emotional state. Emotion recognition technology is an important component of emotion computing; emotion recognition research covers facial expressions, voice, behavior, text and physiological-signal recognition, from which the user's emotional state can be judged. Emotion recognition technology may monitor the user's emotional state through visual emotion recognition technology alone or through acoustic emotion recognition technology, and is not limited thereto. In this embodiment, mood is monitored by means of acoustic emotion recognition technology.
When performing voice emotion recognition, the emotion computing interface 114 collects speech using a voice capture device, converts it into analyzable text, and then uses acoustic emotion recognition technology to analyze the expressed mood. Understanding facial expressions usually requires detecting subtle changes in expression, such as changes of the cheek muscles, mouth and eyes, and movements of the eyebrows.
Cognitive computing interface 113: it receives the multi-modal data forwarded by the communication module 122 and performs data acquisition, recognition and learning on the multi-modal data to obtain a user portrait, a knowledge graph and the like, so as to make rational decisions on the multi-modal output data.
The above is a schematic technical solution of the virtual-human-based multi-modal output system of the embodiment of the application. To help those skilled in the art understand the technical solution of the application, the virtual-human-based multi-modal interaction output method and system, virtual human, smart device and computer-readable storage medium of the application are further described in detail below through several embodiments.
Referring to Fig. 2, an embodiment of the application provides a virtual-human-based multi-modal interaction output method; the virtual human runs on a smart device, and the method includes steps 201 to 203.
Step 201: acquire multi-modal data, the multi-modal data including at least voice data.
In this embodiment, the multi-modal data may be the collected natural language, visual perception, touch perception, speech, emotional facial expressions, actions and other multi-modal data of a real human, a humanoid or a primate, and may also include data from the surrounding environment.
The voice data includes semantic data, emotion data, pitch data, loudness data, duration data, timbre data and the like.
Step 202: parse the multi-modal data to obtain the semantic data and emotion data in the voice data.
In this embodiment, the obtained semantic data and emotion data need to be determined according to the pitch data, loudness data, duration data, timbre data and so on in the voice data.
For example, if the acquired voice data is "I am fine" but the pitch data, loudness data, duration data and timbre data are relatively low overall, then the utterance cannot be understood only from its literal meaning; the speaker's situation, the scene, his or her occupation and the like must also be taken into account when performing semantic understanding and emotional expression, and the speaker's true intended meaning, "I am not fine", should be analyzed.
The emotion data may include multiple emotion labels, and the emotion labels can be divided into positive emotion labels and negative emotion labels. The positive emotion labels may include a joy label, a trust label, a gratitude label, a celebration label and the like; the negative emotion labels may include a pain label, a disdain label, a hatred label, an envy label and the like. These emotion labels can also be graded; for example, the joy label can be further divided into a happiness label, a pleasure label, a comfort label and the like.
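The following minimal sketch illustrates the kind of rule described in the example above, where a positive literal meaning delivered with overall low acoustic features is reinterpreted; the normalized feature values and the threshold are hypothetical.

```python
def interpret_utterance(text: str, features: dict) -> str:
    """Illustrative rule: a positive literal meaning delivered with uniformly low
    pitch/loudness/duration/timbre features is reinterpreted as its opposite."""
    overall = sum(features.values()) / len(features)   # features assumed normalized to [0, 1]
    if text == "I am fine" and overall < 0.3:
        return "I am not fine"   # the speaker's true intention, per the example above
    return text

print(interpret_utterance(
    "I am fine",
    {"pitch": 0.2, "loudness": 0.25, "duration": 0.3, "timbre": 0.2},
))
```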
Step 203: match the semantic data and the emotion data with the facial parameters of the virtual human, and generate and output facial bionic data.
In this embodiment, the facial parameters of the virtual human include facial bones, skin folds and facial muscle groups. The facial bones may be the bones of the facial features, the skin folds may be the lines produced by the facial skin during movement, and the facial muscle groups are the muscle distribution of the face; meanwhile, the facial muscle groups can drive changes in the skin folds and the facial skin color according to the emotion label.
Referring to Fig. 3, in this embodiment, matching the semantic data and the emotion data with the facial parameters of the virtual human, and generating and outputting facial bionic data, specifically includes steps 301 to 303.
Step 301: perform word segmentation according to the semantic data, and match the segmentation result with the mouth-shape model of the virtual human, to generate and output mouth bionic data.
In this embodiment, the voice data is converted into corresponding text data, the text data is segmented into words according to the semantic data, and the segmentation result is then matched with the mouth-shape model of the virtual human, so as to coordinate the mouth shape of the virtual human.
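A minimal sketch of this matching step, assuming the mouth-shape model can be represented as a lookup from segmented words to mouth keyframes; the table contents and the function name are illustrative only.

```python
# Hypothetical mouth-shape model: each segmented word maps to a sequence of mouth keyframes.
MOUTH_MODEL = {
    "glad":  ["open_wide", "spread"],
    "to":    ["rounded"],
    "see":   ["spread"],
    "you":   ["rounded"],
    "again": ["open_mid", "closed"],
}

def generate_mouth_bionic_data(segmented_words: list) -> list:
    """Concatenate the mouth keyframes for each segmented word into one output sequence."""
    keyframes = []
    for word in segmented_words:
        keyframes.extend(MOUTH_MODEL.get(word, ["neutral"]))  # fall back to a neutral shape
    return keyframes

print(generate_mouth_bionic_data(["glad", "to", "see", "you", "again"]))
```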
Step 302: set an emotion label for the emotion data.
In this embodiment, the emotion data is determined from data such as the pitch data, loudness data, duration data and timbre data in the voice data, and the corresponding emotion label is set according to the emotion data. For example, if the acquired speech is "I have had enough", it can be determined from its pitch, loudness, duration and timbre that the sentence was uttered in a pleased emotional state, and a joy emotion label is therefore set for it.
In addition, the emotion data may also be obtained by using emotion computing logic (which may be emotion recognition technology) to calculate the user's current emotional state. Emotion recognition technology is an important component of emotion computing; emotion recognition research covers facial expressions, voice, behavior, text and physiological-signal recognition, from which the user's emotional state can be judged. Emotion recognition technology may monitor the user's emotional state through visual emotion recognition technology or acoustic emotion recognition technology alone, or through a combination of both; the acoustic emotion recognition technology may be realized as text emotion recognition on text converted from the sound, and is not limited thereto. In this embodiment, acoustic emotion recognition technology is preferably used to monitor the mood and calculate the emotion data.
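The mapping from acoustic features to an emotion label is left open above; one deliberately simple, rule-based illustration (with hypothetical thresholds and normalized feature values) could look like this.

```python
def set_emotion_label(pitch: float, loudness: float, duration: float, timbre: float) -> str:
    """Toy acoustic emotion classifier: feature values are assumed normalized to [0, 1]."""
    energy = (pitch + loudness) / 2
    if energy > 0.6:
        return "joy"       # bright, energetic delivery -> positive label
    if energy < 0.3 and duration > 0.6:
        return "pain"      # flat, drawn-out delivery -> negative label
    return "neutral"

print(set_emotion_label(pitch=0.8, loudness=0.7, duration=0.4, timbre=0.5))  # -> joy
```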
Step 303: select the corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
In this embodiment, the facial parameter set includes, but is not limited to: bionic synergy data of facial bone and facial muscle group movement; bionic synergy data of facial bone and skin fold movement; bionic synergy data of skin fold and facial muscle group movement; or bionic synergy data of the facial bones, the skin folds, the facial muscle groups and/or the facial skin color.
Each emotion label corresponds to a suitable facial parameter set. For example, when the emotion label is a joy label, the corresponding facial parameter set may be a group of bionic synergy data of facial bone and facial muscle group movement, or a group of bionic synergy data of the facial bones, the skin folds, the facial muscle groups and/or the facial skin color. It should be noted that the facial muscle groups can drive changes in the skin folds and the facial skin color according to the emotion label.
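A sketch of the selection step, assuming the facial parameter sets are stored in a lookup keyed by emotion label; the concrete parameter values are invented for illustration and are not taken from the application.

```python
# Hypothetical facial parameter sets: each emotion label selects one group of bionic
# synergy data for the bones, skin folds, muscle groups and/or skin color.
FACIAL_PARAMETER_SETS = {
    "joy":  {"mouth_corners": "up",   "eyes": "crescent", "skin_folds": "many", "skin_color": "warm"},
    "pain": {"mouth_corners": "down", "eyes": "narrow",   "skin_folds": "brow", "skin_color": "pale"},
}

def select_facial_parameters(emotion_label: str, mouth_bionic_data: list) -> dict:
    """Pick the parameter set for the label and attach the mouth bionic data it must coordinate with."""
    params = dict(FACIAL_PARAMETER_SETS.get(emotion_label, {}))
    params["mouth_keyframes"] = mouth_bionic_data
    return params

print(select_facial_parameters("joy", ["open_wide", "spread"]))
```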
In this embodiment, emotion computing is performed on the voice data in the acquired multi-modal data to obtain the user's emotion data; the emotion label to be output is decided according to the current state and the semantic context; word segmentation is performed on the voice data according to its corresponding text; the segmentation result is matched with the mouth-shape model of the virtual human; and the emotion data is then matched into, and fused with, the speech output and the facial movement, presenting a coordinated virtual human face.
Referring to Fig. 4, an embodiment of the application provides a virtual-human-based multi-modal interaction output method; the virtual human runs on a smart device, and the method includes steps 401 to 406.
Step 401: wake up the virtual human, and display the virtual human in a preset display area.
In this embodiment, the virtual human is generated from a high-precision 3D model and possesses a preset appearance and skills; for example, the virtual human may have the appearance of a real human, a primate or a cartoon character, and possesses the ability to imitate mouth shapes and express facial emotions according to the received speech.
The preset display area may include the display interface of the smart device or the projection area of an intelligent holographic projection device.
In this embodiment, the virtual human may be in a standby, sleep or similar mode; when imitation is needed, the virtual human is woken up automatically or manually so that it displays its image on the smart device. For example, the virtual human may be an application program running on a smart phone; after the application is opened, the image of the virtual human is the displayed facial image of a real human. A section of real human speech can be acquired and converted into text, and the text is segmented into words; the virtual human then imitates the mouth shapes according to the segmentation result, and the emotion data corresponding to the voice data can be calculated and fused into the mouth imitation to present a more vivid imitation. When the application is not in use it enters a temporary resting state in the background; when it is needed again, switching it manually from the background wakes up the virtual human running in the application.
In addition, the virtual human may also be the hologram projected by an intelligent holographic projection device, and the projection area of the hologram is then the display area of the virtual human.
Step 402: acquire multi-modal data, the multi-modal data including at least voice data.
In this embodiment, the multi-modal data is acquired by hardware, and the hardware may be a microphone, camera, touch screen or the like, built into or connected externally to the smart device.
Step 403: parse the multi-modal data to obtain the semantic data and emotion data in the voice data.
In this embodiment, the multi-modal data is parsed in the cloud brain, i.e. the server, and the emotion data and semantic data obtained from parsing are then matched with the facial parameters of the virtual human.
Step 404: perform word segmentation according to the semantic data, and match the segmentation result with the mouth-shape model of the virtual human, to generate and output mouth bionic data.
In this embodiment, the voice data is converted into text, and the text is segmented into words based on its context, according to the semantic data obtained from the voice data.
Step 405: set an emotion label for the emotion data.
Step 406: select the corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
In this embodiment, when the virtual human imitates a section of speech, the segmentation result is determined from the text corresponding to the voice data, and the segmentation result is matched with the mouth-shape model of the virtual human, so that the virtual human imitates the mouth movements of the speech. While the virtual human performs the mouth imitation, its face also changes according to the calculated emotion data; by combining the mouth movement and the facial action, the virtual human imitates the speech of a real human or primate, rendering the mouth-shape imitation and the facial emotion changes vividly and faithfully.
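Putting the segmentation, mouth-shape matching and emotion-driven facial parameters together, the overall imitation flow described above could be sketched as follows; every helper here is a trivial stand-in rather than the application's implementation.

```python
def segment(text: str) -> list:
    return text.split()                      # stand-in for real word segmentation

def mouth_keyframes(words: list) -> list:
    return [f"shape({w})" for w in words]    # stand-in for the mouth-shape model

def facial_params(label: str) -> dict:
    return {"joy": {"mouth_corners": "up"}}.get(label, {})  # stand-in for the parameter sets

def imitate(speech_text: str, emotion_label: str) -> dict:
    """Combine the mouth bionic data and the emotion-driven facial parameter set
    into one set of execution parameters output synchronously by the face."""
    return {
        "mouth": mouth_keyframes(segment(speech_text)),
        "face": facial_params(emotion_label),
    }

print(imitate("glad to see you again", "joy"))
```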
Referring to Fig. 5, taking a virtual human that runs on a smart phone and imitates the speech of a real human as an example, the application provides a virtual-human-based multi-modal interaction output method including steps 501 to 507.
Step 501: wake up the virtual human, and display the virtual human in the preset display area on the smart phone.
The virtual human is configured with a virtual-human wake-up module; when the module judges that the preset condition for waking the virtual human is met, it changes the state attribute of the virtual human to the awake state. The wake-up condition may, for example, be the user uttering voice information for waking a certain virtual human, the user performing a wake-up action, or the user directly inputting a biometric instruction. When the virtual-human wake-up module judges that the preset wake-up condition is met, it performs the wake-up operation according to the wake-up instruction. If the wake-up instruction issued by the user does not refer to a specific virtual human, the system defaults to the virtual human that was most recently awakened.
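A minimal sketch of the wake-up dispatch described above; the condition kinds and the default-to-last-awakened behavior follow this paragraph, while the function itself and its data shapes are assumptions.

```python
last_awakened = "vh-001"   # the system remembers the most recently awakened virtual human

def wake_up(instruction: dict) -> str:
    """Return the id of the virtual human to wake.

    The instruction may be a voice message, an action message, or a biometric
    instruction; if it names no specific virtual human, default to the last one awakened.
    """
    if instruction.get("kind") not in {"voice", "action", "biometric"}:
        raise ValueError("not a wake-up instruction")
    return instruction.get("target") or last_awakened

print(wake_up({"kind": "voice", "target": None}))     # -> vh-001 (default)
print(wake_up({"kind": "voice", "target": "vh-007"}))
```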
In this embodiment, the virtual human may be generated from a high-precision 3D model and possess a preset appearance and skills, for example the facial image of a real human face, Little C, which can imitate mouth shapes and facial emotions. The application program (APP) installed on the smart phone is opened, the virtual human Little C runs in the APP, and the virtual human Little C is woken up; the virtual human Little C may be displayed in the preset display area of the smart phone application (APP), and in this embodiment the preset display area may be the central position of the smart phone's display screen.
Step 502: acquire multi-modal data, the multi-modal data including at least voice data.
In this embodiment, the multi-modal data may be the collected natural language, visual perception, touch perception, speech, emotional facial expressions, actions and other data generated by a real human, a humanoid or a primate; the multi-modal data may also include data from the surrounding environment.
In this embodiment, the acquired multi-modal data is, for example, the multi-modal data of a real human, Little D, and it contains a section of Little D's real speech.
The following description takes as an example the virtual human Little C having the facial image of a real human face, with the acquired voice data being a section of the real speech of the real human Little D.
Step 503: parse the multi-modal data to obtain the semantic data and emotion data in the voice data.
In this embodiment, the acquired multi-modal data of Little D is parsed to obtain the semantic data and emotion data in the voice data, for example the semantic data and emotion data in the section of Little D's real speech, and the acquired real speech of Little D is converted into the text "glad to see you again, let's meet again next time we are free".
Step 504: perform word segmentation according to the semantic data, and match the segmentation result with the mouth-shape model of the virtual human, to generate and output mouth bionic data.
In this embodiment, word segmentation is performed according to the semantic data. For example, the text converted from Little D's real speech, "glad to see you again, let's meet again next time we are free", is segmented according to the semantics of its context, and the segmentation result is "again / see you / very glad / next time / free / meet again". The segmentation result is then matched with the mouth-shape model of the virtual human, that is, the true mouth movements needed when pronouncing the segmented words are matched with the mouth model of the virtual human Little C, so that when imitating the speech the virtual human Little C not only produces sound but its face also moves in accordance with the different words.
Step 505: set an emotion label for the emotion data.
In this embodiment, the corresponding emotion data is calculated from the voice data to obtain the mood corresponding to the voice data; for example, the emotion data is calculated from Little D's real speech, and the emotion label set for the emotion data is a joy label.
Step 506: select the corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
In this embodiment, the corresponding facial parameter set is selected according to the emotion label. For example, the facial parameter set corresponding to the joy label is: the corners of the mouth curve upwards, the eyes become crescent-shaped, and the facial skin shows more folds. When the virtual human Little C imitates Little D's real speech, it not only produces sound and moves its face according to the different words; its facial expression is also driven by the facial parameter set corresponding to the emotion data, blending changes of the facial features, skin folds and facial muscle groups of the virtual human Little C, for example into a smiling expression in which the corners of the mouth curve upwards, the eyes are crescent-shaped and the facial skin shows more folds. Meanwhile, driven by the facial muscle groups and by specific emotions such as shyness or anger, Little C's facial skin color also changes in hue or brightness. The effect of the smiling expression can also differ according to the current state, the scene or the preset occupation of the virtual human Little C; for example, the smiling facial expression of a service worker should be professional, whereas the smiling facial expression of an ordinary person should be more casual.
Step 507: the face of the virtual human imitates the mouth model according to the received voice data and outputs the facial expression synchronously.
In this embodiment, the virtual human Little C moves its mouth according to the received real speech of Little D, while its facial expression, driven by the facial parameter set corresponding to the emotion label, changes at the same time, realizing a lifelike, smooth and anthropomorphic interaction effect.
In this embodiment, by parsing the acquired multi-modal data into semantic data and emotion data, the face of the virtual human can imitate facial actions and facial emotions according to the parsing result, strengthening the user's visual engagement, presenting a lifelike, smooth simulated interaction effect and improving the interaction experience.
Referring to Fig. 6, the application provides a virtual-human-based multi-modal interaction output system, including a smart device 601 and a server 602, the virtual human 603 running on the smart device, wherein:
the smart device 601 acquires multi-modal data, the multi-modal data including at least voice data;
the server 602 parses the multi-modal data to obtain the semantic data and emotion data in the voice data;
the server 602 matches the semantic data and the emotion data with the facial parameters of the virtual human to generate facial bionic data; and
the smart device 601 receives the facial bionic data and outputs it.
Optionally, the server 602 parses the multi-modal data specifically by:
performing word segmentation according to the semantic data, and matching the segmentation result with the mouth-shape model of the virtual human 603, to generate and output mouth bionic data;
setting an emotion label for the emotion data; and
selecting the corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
In the virtual-human-based multi-modal interaction output system of this embodiment, by parsing the acquired multi-modal data into semantic data and emotion data, the face of the virtual human imitates facial actions and facial emotions according to the parsing result, which strengthens the user's visual engagement, presents a lifelike, smooth simulated interaction effect and improves the interaction experience.
The above is an exemplary solution of the virtual-human-based multi-modal interaction output system of this embodiment. It should be noted that the technical solution of this system belongs to the same concept as the technical solution of the above virtual-human-based multi-modal interaction output method; for details not described in the technical solution of the system, reference may be made to the description of the technical solution of the above method.
An embodiment of the application also provides a virtual human, and the virtual human executes the above method.
The above is an exemplary solution of the virtual human of this embodiment. It should be noted that the technical solution of the virtual human belongs to the same concept as the technical solution of the above virtual-human-based multi-modal interaction output method; for details not described in the technical solution of the virtual human, reference may be made to the description of the technical solution of the above method.
The application provides a smart device, and the above virtual human runs on the smart device.
The above is an exemplary solution of the smart device of this embodiment. It should be noted that the technical solution of the smart device belongs to the same concept as the technical solution of the above virtual-human-based multi-modal interaction output method; for details not described in the technical solution of the smart device, reference may be made to the description of the technical solution of the above method.
The smart device of the application may include a processor and a memory, the memory storing computer instructions, and the processor calling the computer instructions to perform the foregoing virtual-human-based multi-modal interaction output method.
It should be noted that the smart device may be a computing device such as a desktop computer, a notebook, a palmtop computer or a mobile terminal.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor; the processor is the control center of the terminal and connects the various parts of the whole terminal through various interfaces.
The memory mainly includes a program storage area and a data storage area, wherein the program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage component.
The application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above virtual-human-based multi-modal interaction output method.
The above is an exemplary solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the computer-readable storage medium belongs to the same concept as the technical solution of the above virtual-human-based multi-modal interaction output method; for details not described in the technical solution of the computer-readable storage medium, reference may be made to the description of the technical solution of the above method.
The computer instructions include computer program code, and the computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations, but those skilled in the art should understand that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the application.
In the above embodiments, each embodiment has its own emphasis in description; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
The preferred embodiments of the application disclosed above are only intended to help illustrate the application. The optional embodiments do not describe all the details exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made according to the content of this specification. This specification selects and specifically describes these embodiments in order to better explain the principles and practical applications of the application, so that those skilled in the art can well understand and use the application. The application is limited only by the claims and their full scope and equivalents.
Claims (13)
- 1. A virtual-human-based multi-modal interaction output method, wherein the virtual human runs on a smart device, the method comprising: acquiring multi-modal data, the multi-modal data including at least voice data; parsing the multi-modal data to obtain semantic data and emotion data in the voice data; and matching the semantic data and the emotion data with facial parameters of the virtual human to generate and output facial bionic data.
- 2. The method according to claim 1, wherein before acquiring the multi-modal data the method further comprises: waking up the virtual human, and displaying the virtual human in a preset display area.
- 3. The method according to claim 1, wherein matching the semantic data and the emotion data with the facial parameters of the virtual human, and generating and outputting facial bionic data, comprises: performing word segmentation according to the semantic data, and matching the segmentation result with a mouth-shape model of the virtual human to generate and output mouth bionic data; setting an emotion label for the emotion data; and selecting a corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
- 4. The method according to any one of claims 1 to 3, wherein the facial parameters of the virtual human include facial bones, skin folds, facial muscle groups and/or facial skin color.
- 5. The method according to claim 4, wherein the facial parameter set includes, but is not limited to: bionic synergy data of facial bone and facial muscle group movement; bionic synergy data of facial bone and skin fold movement; bionic synergy data of skin fold and facial muscle group movement; or bionic synergy data of the facial bones, the skin folds, the facial muscle groups and/or the facial skin color.
- 6. The method according to claim 1, wherein the virtual human is generated from a high-precision 3D model and possesses a preset appearance and skills; and the virtual human includes an application program or executable file running on the smart device, or a hologram projected by the smart device.
- 7. The method according to claim 6, wherein the operating system used by the smart device includes, but is not limited to, a WINDOWS system, a MAC OS system or a built-in system of a holographic device.
- 8. The method according to claim 2, wherein the preset display area includes a display interface of the smart device or a projection area of the smart device.
- 9. A virtual-human-based multi-modal interaction output system, comprising a smart device and a server, the virtual human running on the smart device, wherein: the smart device acquires multi-modal data, the multi-modal data including at least voice data; the server parses the multi-modal data to obtain semantic data and emotion data in the voice data; the server matches the semantic data and the emotion data with facial parameters of the virtual human to generate facial bionic data; and the smart device receives the facial bionic data and outputs it.
- 10. The system according to claim 9, wherein the server parses the multi-modal data specifically by: performing word segmentation according to the semantic data, and matching the segmentation result with a mouth-shape model of the virtual human to generate and output mouth bionic data; setting an emotion label for the emotion data; and selecting a corresponding facial parameter set according to the emotion label, to coordinate with the mouth bionic data of the mouth-shape model.
- 11. A virtual human, wherein the virtual human runs on a smart device and executes the method according to any one of claims 1 to 8.
- 12. A smart device, characterized in that the virtual human according to claim 11 runs on the smart device.
- 13. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
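For readability, the matching-and-generation flow recited in claims 3 and 10 (word segmentation, mouth-shape matching, emotion tagging, facial-parameter-set selection) can be pictured with the minimal Python sketch below. It is an illustration only, not the patented implementation: the lookup tables, the names (MOUTH_SHAPES, FACE_PARAM_SETS, segment_words, generate_facial_bionic_data) and all sample values are hypothetical.

```python
from typing import Dict, List

# Hypothetical mouth-shape (viseme) lookup keyed by segmented word unit.
MOUTH_SHAPES: Dict[str, str] = {"ni": "viseme_i", "hao": "viseme_a"}

# Hypothetical facial parameter sets keyed by emotion tag; each set bundles
# coordinated ("synergetic") skeleton, skin-crease, muscle-group and
# complexion values, in the spirit of claims 4 and 5.
FACE_PARAM_SETS: Dict[str, Dict[str, float]] = {
    "happy":   {"skeleton": 0.2, "crease": 0.4, "muscle": 0.8, "complexion": 0.6},
    "neutral": {"skeleton": 0.0, "crease": 0.0, "muscle": 0.1, "complexion": 0.5},
}

def segment_words(semantic_text: str) -> List[str]:
    """Stand-in word segmentation; a real system would use a proper segmenter."""
    return semantic_text.split()

def generate_facial_bionic_data(semantic_text: str, emotion_tag: str) -> dict:
    """Match segmented words against the mouth-shape model, then select the
    facial parameter set for the emotion tag so it coordinates with the mouth data."""
    mouth_data = [MOUTH_SHAPES.get(word, "viseme_neutral")
                  for word in segment_words(semantic_text)]
    face_params = FACE_PARAM_SETS.get(emotion_tag, FACE_PARAM_SETS["neutral"])
    return {"mouth": mouth_data, "face": face_params}

print(generate_facial_bionic_data("ni hao", "happy"))
```

In a real system the two tables would be replaced by the virtual human's actual mouth-shape model and its emotion-indexed facial parameter sets, but the control flow stays the same two-step match.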
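Claim 9 splits the work between a smart device (capture and output) and a server (parsing and matching). A minimal sketch of that split, assuming a toy in-process server with speech recognition, emotion analysis, networking and rendering all stubbed out, might look as follows; none of the class or method names are taken from the patent.

```python
class Server:
    def parse(self, multimodal_data: dict) -> tuple:
        """Parse the multi-modal data to obtain semantic and emotion data
        from the voice channel (stand-ins for ASR and emotion analysis)."""
        voice = multimodal_data["voice"]
        return voice.get("transcript", ""), voice.get("emotion", "neutral")

    def match(self, semantic_text: str, emotion_tag: str) -> dict:
        """Match the parsed data against the virtual human's facial parameters
        to produce facial bionic data (see the previous sketch for one way
        to build the mouth/face mapping)."""
        return {"mouth": semantic_text.split(), "face": {"emotion": emotion_tag}}

class SmartDevice:
    def __init__(self, server: Server) -> None:
        self.server = server

    def interact(self, multimodal_data: dict) -> None:
        """Hand the captured multi-modal data to the server, then receive
        and output the resulting facial bionic data."""
        semantic_text, emotion_tag = self.server.parse(multimodal_data)
        self.render(self.server.match(semantic_text, emotion_tag))

    def render(self, facial_bionic_data: dict) -> None:
        """Stand-in for driving the on-screen or holographic virtual human."""
        print("rendering:", facial_bionic_data)

SmartDevice(Server()).interact({"voice": {"transcript": "ni hao", "emotion": "happy"}})
```

The point of the split is that the device only captures multi-modal input and renders the returned facial bionic data, while all parsing and matching stays on the server.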
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711162023.2A CN107944542A (en) | 2017-11-21 | 2017-11-21 | A kind of multi-modal interactive output method and system based on visual human |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107944542A true CN107944542A (en) | 2018-04-20 |
Family
ID=61929421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711162023.2A Pending CN107944542A (en) | 2017-11-21 | 2017-11-21 | A kind of multi-modal interactive output method and system based on visual human |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107944542A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101474481A (en) * | 2009-01-12 | 2009-07-08 | 北京科技大学 | Emotional robot system |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | 河海大学常州校区 | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN104123938A (en) * | 2013-04-29 | 2014-10-29 | 富泰华工业(深圳)有限公司 | Voice control system, electronic device and voice control method |
US20170352351A1 (en) * | 2014-10-29 | 2017-12-07 | Kyocera Corporation | Communication robot |
CN106024014A (en) * | 2016-05-24 | 2016-10-12 | 努比亚技术有限公司 | Voice conversion method and device and mobile terminal |
CN106531162A (en) * | 2016-10-28 | 2017-03-22 | 北京光年无限科技有限公司 | Man-machine interaction method and device used for intelligent robot |
CN106485774A (en) * | 2016-12-30 | 2017-03-08 | 当家移动绿色互联网技术集团有限公司 | Expression based on voice Real Time Drive person model and the method for attitude |
CN106985137A (en) * | 2017-03-09 | 2017-07-28 | 北京光年无限科技有限公司 | Multi-modal exchange method and system for intelligent robot |
CN107340859A (en) * | 2017-06-14 | 2017-11-10 | 北京光年无限科技有限公司 | The multi-modal exchange method and system of multi-modal virtual robot |
Non-Patent Citations (1)
Title |
---|
《运动解剖学、运动医学大辞典》编辑委员会 (Editorial Committee of the "Dictionary of Sports Anatomy and Sports Medicine"): "《运动解剖学、运动医学大辞典》" ("Dictionary of Sports Anatomy and Sports Medicine"), 31 December 1999 * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109032328A (en) * | 2018-05-28 | 2018-12-18 | 北京光年无限科技有限公司 | A kind of exchange method and system based on visual human |
CN109086860A (en) * | 2018-05-28 | 2018-12-25 | 北京光年无限科技有限公司 | A kind of exchange method and system based on visual human |
CN108942919A (en) * | 2018-05-28 | 2018-12-07 | 北京光年无限科技有限公司 | A kind of exchange method and system based on visual human |
CN109086860B (en) * | 2018-05-28 | 2022-03-15 | 北京光年无限科技有限公司 | Interaction method and system based on virtual human |
CN109448737A (en) * | 2018-08-30 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | Creation method, device, electronic equipment and the storage medium of virtual image |
CN109448737B (en) * | 2018-08-30 | 2020-09-01 | 百度在线网络技术(北京)有限公司 | Method and device for creating virtual image, electronic equipment and storage medium |
CN110874557B (en) * | 2018-09-03 | 2023-06-16 | 阿里巴巴集团控股有限公司 | Voice-driven virtual face video generation method and device |
CN110874557A (en) * | 2018-09-03 | 2020-03-10 | 阿里巴巴集团控股有限公司 | Video generation method and device for voice-driven virtual human face |
CN109410297A (en) * | 2018-09-14 | 2019-03-01 | 重庆爱奇艺智能科技有限公司 | It is a kind of for generating the method and apparatus of avatar image |
CN109087644A (en) * | 2018-10-22 | 2018-12-25 | 奇酷互联网络科技(深圳)有限公司 | Electronic equipment and its exchange method of voice assistant, the device with store function |
CN109473122A (en) * | 2018-11-12 | 2019-03-15 | 平安科技(深圳)有限公司 | Mood analysis method, device and terminal device based on detection model |
CN111290682A (en) * | 2018-12-06 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Interaction method and device and computer equipment |
CN109961152B (en) * | 2019-03-14 | 2021-03-02 | 广州多益网络股份有限公司 | Personalized interaction method and system of virtual idol, terminal equipment and storage medium |
CN109961152A (en) * | 2019-03-14 | 2019-07-02 | 广州多益网络股份有限公司 | Personalized interactive method, system, terminal device and the storage medium of virtual idol |
CN110070879A (en) * | 2019-05-13 | 2019-07-30 | 吴小军 | A method of intelligent expression and phonoreception game are made based on change of voice technology |
CN110070944B (en) * | 2019-05-17 | 2023-12-08 | 段新 | Social function assessment training system based on virtual environment and virtual roles |
CN110070944A (en) * | 2019-05-17 | 2019-07-30 | 段新 | Training system is assessed based on virtual environment and the social function of virtual role |
CN112529992A (en) * | 2019-08-30 | 2021-03-19 | 阿里巴巴集团控股有限公司 | Dialogue processing method, device, equipment and storage medium of virtual image |
CN111240482A (en) * | 2020-01-10 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Special effect display method and device |
CN111354370A (en) * | 2020-02-13 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | Lip shape feature prediction method and device and electronic equipment |
CN111930907A (en) * | 2020-08-06 | 2020-11-13 | 北京艾阿智能科技有限公司 | Intelligent interactive dialogue engine simulating human communication through simulation |
CN112331209A (en) * | 2020-11-03 | 2021-02-05 | 建信金融科技有限责任公司 | Method and device for converting voice into text, electronic equipment and readable storage medium |
CN112331209B (en) * | 2020-11-03 | 2023-08-08 | 建信金融科技有限责任公司 | Method and device for converting voice into text, electronic equipment and readable storage medium |
US11842457B2 (en) | 2021-03-24 | 2023-12-12 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for processing slider for virtual character, electronic device, and storage medium |
CN113050794A (en) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Slider processing method and device for virtual image |
EP3989179A3 (en) * | 2021-03-24 | 2022-08-17 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing slider for virtual character |
US20220157036A1 (en) * | 2021-03-24 | 2022-05-19 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for generating virtual character, electronic device, and storage medium |
US20220122337A1 (en) * | 2021-03-24 | 2022-04-21 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for processing slider for virtual character, electronic device, and storage medium |
CN114245155A (en) * | 2021-11-30 | 2022-03-25 | 北京百度网讯科技有限公司 | Live broadcast method and device and electronic equipment |
CN114489326A (en) * | 2021-12-30 | 2022-05-13 | 南京七奇智能科技有限公司 | Crowd-oriented gesture control device and method driven by virtual human interaction attention |
CN114489326B (en) * | 2021-12-30 | 2023-12-15 | 南京七奇智能科技有限公司 | Crowd-oriented virtual human interaction attention driven gesture control device and method |
CN114519895A (en) * | 2022-02-21 | 2022-05-20 | 上海元梦智能科技有限公司 | Virtual human action configuration method and device |
WO2024054713A1 (en) * | 2022-09-07 | 2024-03-14 | Qualcomm Incorporated | Avatar facial expressions based on semantical context |
CN115390678A (en) * | 2022-10-27 | 2022-11-25 | 科大讯飞股份有限公司 | Virtual human interaction method and device, electronic equipment and storage medium |
CN116778041B (en) * | 2023-08-22 | 2023-12-12 | 北京百度网讯科技有限公司 | Multi-mode-based face image generation method, model training method and equipment |
CN116778041A (en) * | 2023-08-22 | 2023-09-19 | 北京百度网讯科技有限公司 | Multi-mode-based face image generation method, model training method and equipment |
CN117152308A (en) * | 2023-09-05 | 2023-12-01 | 南京八点八数字科技有限公司 | Virtual person action expression optimization method and system |
CN117152308B (en) * | 2023-09-05 | 2024-03-22 | 江苏八点八智能科技有限公司 | Virtual person action expression optimization method and system |
CN117590944A (en) * | 2023-11-28 | 2024-02-23 | 上海源庐加佳信息科技有限公司 | Binding system for physical person object and digital virtual person object |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107944542A (en) | A kind of multi-modal interactive output method and system based on visual human | |
CN107894833B (en) | Multi-modal interaction processing method and system based on virtual human | |
US11890748B2 (en) | Socially assistive robot | |
CN108665492B (en) | Dance teaching data processing method and system based on virtual human | |
CN107679519A (en) | A kind of multi-modal interaction processing method and system based on visual human | |
CN107797663A (en) | Multi-modal interaction processing method and system based on visual human | |
CN113760101B (en) | Virtual character control method and device, computer equipment and storage medium | |
CN107765852A (en) | Multi-modal interaction processing method and system based on visual human | |
CN108942919B (en) | Interaction method and system based on virtual human | |
CN107765856A (en) | Visual human's visual processing method and system based on multi-modal interaction | |
CN108052250A (en) | Virtual idol deductive data processing method and system based on multi-modal interaction | |
CN109086860B (en) | Interaction method and system based on virtual human | |
CN110598576A (en) | Sign language interaction method and device and computer medium | |
CN107831905A (en) | A kind of virtual image exchange method and system based on line holographic projections equipment | |
CN108037825A (en) | The method and system that a kind of virtual idol technical ability is opened and deduced | |
CN109278051A (en) | Exchange method and system based on intelligent robot | |
KR102222911B1 (en) | System for Providing User-Robot Interaction and Computer Program Therefore | |
CN109032328A (en) | A kind of exchange method and system based on visual human | |
CN110837294A (en) | Facial expression control method and system based on eyeball tracking | |
Ochs et al. | 18 facial expressions of emotions for virtual characters | |
CN109343695A (en) | Exchange method and system based on visual human's behavioral standard | |
CN108416420A (en) | Limbs exchange method based on visual human and system | |
CN109542389A (en) | Sound effect control method and system for the output of multi-modal story content | |
CN108595012A (en) | Visual interactive method and system based on visual human | |
CN107817799B (en) | Method and system for intelligent interaction by combining virtual maze |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180420 |