CN109324688A - Interaction method and system based on virtual human behavior standards - Google Patents

Interaction method and system based on virtual human behavior standards

Info

Publication number
CN109324688A
CN109324688A (application CN201810953818.3A)
Authority
CN
China
Prior art keywords
data
virtual human
emotion
human
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810953818.3A
Other languages
Chinese (zh)
Inventor
尚小维
李晓丹
俞志晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd filed Critical Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201810953818.3A priority Critical patent/CN109324688A/en
Publication of CN109324688A publication Critical patent/CN109324688A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Abstract

The present invention provides an interaction method based on virtual human behavior standards. The virtual human is displayed on a smart device and, when in an interactive state, activates speech, emotion, vision, and perception capabilities. The method comprises: acquiring multi-modal interaction data and parsing it to obtain the user's interaction intent; generating, according to the interaction intent, virtual human speech response data together with corresponding virtual human behavior expression data, where the behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data; and outputting the virtual human speech response data in coordination with the virtual human behavior expression data. When outputting multi-modal response data, the present invention can output virtual human behavior expression data in coordination so as to express the virtual human's emotions, enabling smooth communication between the user and the virtual human and giving the user an anthropomorphic interactive experience.

Description

Interaction method and system based on virtual human behavior standards
Technical field
The present invention relates to the field of artificial intelligence and, in particular, to an interaction method and system based on virtual human behavior standards.
Background art
The development of multi-modal interactive robot systems aims to imitate human conversation and, in context, to emulate interaction between humans. At present, however, the development of multi-modal interactive systems for virtual humans remains immature: no virtual human capable of multi-modal interaction has yet appeared and, more importantly, there is no interactive product that conducts interaction based on the virtual human's own behavior standards.
Therefore, the present invention provides an interaction method and system based on virtual human behavior standards.
Summary of the invention
To solve the above problems, the present invention provides an interaction method based on virtual human behavior standards. The virtual human is displayed on a smart device and activates speech, emotion, vision, and perception capabilities when in an interactive state. The method comprises the following steps:
acquiring multi-modal interaction data and parsing the multi-modal interaction data to obtain the user's interaction intent;
generating, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data, where the virtual human behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data;
outputting the virtual human speech response data in coordination with the virtual human behavior expression data.
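By way of illustration only, the three steps above can be sketched as a minimal pipeline. All names below (classes, functions, fields) are hypothetical and not prescribed by the patent text:

```python
from dataclasses import dataclass

@dataclass
class BehaviorExpression:
    # The five behavior channels named by the method.
    head: str = "neutral"
    gaze: str = "at_user"
    face: str = "neutral"
    body: str = "idle"
    limb: str = "none"

def parse_intent(raw_inputs: dict) -> str:
    # Step 1: parse multi-modal data (text/audio/vision/perception) into an intent.
    return "greet" if "hello" in raw_inputs.get("text", "").lower() else "chat"

def generate_response(intent: str) -> tuple[str, BehaviorExpression]:
    # Step 2: produce speech response data plus matching behavior expression data.
    if intent == "greet":
        return "Hello!", BehaviorExpression(face="smile", body="wave")
    return "I see.", BehaviorExpression(face="smile")

def output_coordinated(speech: str, behavior: BehaviorExpression) -> None:
    # Step 3: play the speech while the behavior channels are rendered together.
    print(f"say={speech!r} behavior={behavior}")

output_coordinated(*generate_response(parse_intent({"text": "Hello there"})))
```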
According to one embodiment of the present invention, the step of generating virtual human speech response data and corresponding virtual human behavior expression data according to the interaction intent further comprises the following steps:
parsing the virtual human speech response data and extracting the virtual human emotion information contained in the virtual human speech response data;
obtaining virtual human head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data matching the virtual human emotion information;
generating the virtual human behavior expression data according to the matching result.
According to one embodiment of the present invention, the virtual human emotion information includes positive emotions, negative emotions, attitude emotions, and communication emotions, where the positive emotions include happiness, confidence, expectation, and surprise; the negative emotions include anger, fear, sadness, and disgust; the attitude emotions include approval and disapproval; and the communication emotions include greeting and farewell.
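This four-way taxonomy could be encoded, purely for illustration, as the following hypothetical enumeration; the category and member names simply mirror the list above:

```python
from enum import Enum

class EmotionCategory(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    ATTITUDE = "attitude"
    COMMUNICATION = "communication"

# Mapping each category to its member emotions, as enumerated in this embodiment.
EMOTIONS = {
    EmotionCategory.POSITIVE: {"happiness", "confidence", "expectation", "surprise"},
    EmotionCategory.NEGATIVE: {"anger", "fear", "sadness", "disgust"},
    EmotionCategory.ATTITUDE: {"approval", "disapproval"},
    EmotionCategory.COMMUNICATION: {"greeting", "farewell"},
}

def category_of(emotion: str) -> EmotionCategory:
    # Reverse lookup: find the category an emotion label belongs to.
    for cat, members in EMOTIONS.items():
        if emotion in members:
            return cat
    raise ValueError(f"unknown emotion: {emotion}")
```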
According to one embodiment of the present invention, the step of generating the virtual human behavior expression data according to the matching result further comprises the following step:
classifying according to the virtual human emotion information, checking whether the virtual human emotions respectively represented by the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data conflict, and replacing conflicting data.
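A minimal sketch of such a consistency check follows, assuming each channel is tagged with the emotion category it conveys; the conflict rule (positive versus negative) and all names are illustrative assumptions rather than the patent's algorithm:

```python
CONFLICTING = {("positive", "negative"), ("negative", "positive")}

def reconcile(channels: dict) -> dict:
    """channels maps channel name -> (action, emotion_category)."""
    # Use the facial expression channel as the reference emotion.
    _, ref = channels["face"]
    for name, (action, cat) in channels.items():
        if (ref, cat) in CONFLICTING:
            # Replace a conflicting channel with a neutral action in the reference category.
            channels[name] = ("neutral", ref)
    return channels

# A wailing face will not be combined with a laughing body action:
print(reconcile({"face": ("wail", "negative"), "body": ("rock_with_laughter", "positive")}))
```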
According to one embodiment of the present invention, the virtual human has a specific virtual image and preset attributes. When the virtual human behavior expression data are generated, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data of the virtual human that are inconsistent with the preset attributes do not participate in decision-making and output.
According to one embodiment of the present invention, the step of outputting the virtual human speech response data in coordination with the virtual human behavior expression data comprises the following step:
determining the output moment, degree of expression, and duration of the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data.
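For instance, each scheduled behavior could carry fields like the following; this is a hypothetical data layout, not one prescribed by the patent:

```python
from dataclasses import dataclass

@dataclass
class ScheduledBehavior:
    channel: str       # "head", "gaze", "face", "body", or "limb"
    action: str        # e.g. "nod", "smile", "wave"
    onset_s: float     # output moment, relative to the start of the utterance
    intensity: float   # degree of expression, e.g. 0.0 (subtle) to 1.0 (full)
    duration_s: float  # how long the behavior is held

# A smile held for a whole two-second sentence, plus a brief wave near its start.
plan = [
    ScheduledBehavior("face", "smile", onset_s=0.0, intensity=0.8, duration_s=2.0),
    ScheduledBehavior("body", "wave", onset_s=0.3, intensity=1.0, duration_s=0.9),
]
```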
According to one embodiment of the present invention, if the current scene is a scene without voice output, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data of the virtual human matching the virtual human's current state are output.
According to another aspect of the present invention, an interaction apparatus based on virtual human behavior standards is also provided. The apparatus comprises:
an interaction intent acquisition module, configured to acquire multi-modal interaction data and parse the multi-modal interaction data to obtain the user's interaction intent;
a virtual human behavior expression data generation module, configured to generate, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data, where the virtual human behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data;
an output module, configured to output the virtual human speech response data in coordination with the virtual human behavior expression data.
According to another aspect of the present invention, a program product is also provided for running the virtual human's program, containing a series of instructions for executing the method steps described in any of the above.
According to another aspect of the present invention, an interaction system based on virtual human behavior standards is also provided. The system comprises:
a smart device, on which the virtual human is loaded, configured to acquire multi-modal interaction data and having the capability to output speech, emotion, expression, and movement, the smart device including a holographic device;
a cloud brain, configured to perform semantic understanding, visual recognition, cognitive computation, and affective computation on the multi-modal interaction data, so as to decide that the virtual human outputs the virtual human behavior expression data and the virtual human speech response data.
The interaction method and system based on virtual human behavior standards provided by the present invention provide a virtual human that has a preset image and preset attributes and can interact with the user multi-modally. Moreover, when outputting multi-modal response data, the method and system can also output virtual human behavior expression data in coordination to express the virtual human's emotions, so that smooth communication is possible between the user and the virtual human and the user enjoys an anthropomorphic interactive experience.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings are provided for further understanding of the present invention and constitute part of the specification. They serve, together with the embodiments of the invention, to explain the invention and do not limit the invention. In the drawings:
Fig. 1 shows an interaction diagram of an interactive system based on virtual human behavior standards according to one embodiment of the present invention;
Fig. 2 shows a structural block diagram of an interactive system based on virtual human behavior standards according to one embodiment of the present invention;
Fig. 3 shows a module block diagram of an interactive system based on virtual human behavior standards according to one embodiment of the present invention;
Fig. 4 shows a structural block diagram of an interactive system based on virtual human behavior standards according to another embodiment of the present invention;
Fig. 5 shows a flowchart of an interaction method based on virtual human behavior standards according to one embodiment of the present invention;
Fig. 6 shows a flowchart of generating virtual human behavior expression data in an interaction method based on virtual human behavior standards according to one embodiment of the present invention;
Fig. 7 shows a schematic diagram of emotion parameter classification according to one embodiment of the present invention;
Fig. 8 shows another flowchart of an interaction method based on virtual human behavior standards according to one embodiment of the present invention; and
Fig. 9 shows a flowchart of the communication among the user, the smart device, and the cloud brain according to one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
For clarity, the following explanations are needed before the embodiments are described:
The virtual human mentioned in the present invention is mounted on a smart device that supports input/output modules for perception, control, and the like; it uses a highly realistic 3D virtual character image as its main user interface and has a distinctive character appearance; it supports multi-modal human-computer interaction and possesses AI capabilities such as natural language understanding, visual perception, touch perception, speech output, and emotional facial-expression and movement output; and it has configurable social attributes, personality attributes, character skills, and the like, giving the user an intelligent, personalized, and smooth experience with a virtual character.
The smart device on which the virtual human is mounted has a screen with non-touch, non-mouse-and-keyboard input (a holographic screen, TV screen, multimedia display, LED screen, etc.) and is equipped with a camera; it may be a holographic device, a VR device, or a PC. Other smart devices, such as handheld tablets, naked-eye 3D devices, and even smartphones, are not excluded.
The virtual human interacts with the user at the system level; an operating system runs on the system hardware, such as a holographic device's built-in system, or Windows or Mac OS in the case of a PC.
The virtual human is a system-level application or executable file.
The virtual robot acquires the user's multi-modal interaction data based on the hardware of the smart device and, supported by the capabilities of the cloud brain, performs semantic understanding, visual recognition, cognitive computation, and affective computation on the multi-modal interaction data to complete the decision-output process.
The cloud brain mentioned is a terminal providing the processing capability for semantic understanding of the user's interaction demands (language semantic understanding, action semantic understanding, visual recognition, affective computation, cognitive computation); it realizes interaction with the user and decides the multi-modal response data output by the virtual human.
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows an interaction diagram of an interactive system based on virtual human behavior standards according to one embodiment of the present invention. As shown in Fig. 1, multi-modal interaction involves a user 101, a smart device 102, a virtual human 103, and a cloud brain 104. The user 101 interacting with the virtual human may be a real person, another virtual human, or a physical entity; the interaction processes of another virtual human or a physical entity with the virtual human are similar to the interaction process of a single person with the virtual human. Therefore, Fig. 1 only shows the multi-modal interaction process between a user (a person) and the virtual human.
In addition, the smart device 102 includes a display area 1021 and hardware support equipment 1022 (essentially a core processor). The display area 1021 is used to display the image of the virtual human 103, and the hardware support equipment 1022 works with the cloud brain 104 for data processing during interaction. The virtual human 103 requires a screen carrier for display. The display area 1021 therefore includes holographic screens, TV screens, multimedia displays, LED screens, and the like.
The process of interaction between the virtual human and the user 101 in Fig. 1 is as follows:
The preparations or conditions required for interaction include: the virtual human is mounted and runs on the smart device 102, and the virtual human has specific image characteristics. The virtual human possesses AI capabilities such as natural language understanding, visual perception, touch perception, speech output, and emotional facial-expression and movement output. To support the virtual human's touch perception function, the smart device also needs to be equipped with a component having touch perception. According to one embodiment of the present invention, to improve the interactive experience, the virtual human is displayed in a preset area once activated.
It should be noted that the image and dress of the virtual human 103 are not limited to one mode. The virtual human 103 can have different images and dress. The image of the virtual human 103 is generally a high-polygon 3D animated image. The virtual human 103 can have different appearances and decorations. Each image of the virtual human 103 can also correspond to multiple different outfits, which may be classified by season or by occasion. These images and outfits may reside in the cloud brain 104 or in the smart device 102 and can be called up whenever needed.
The social attributes, personality attributes, and character skills of the virtual human 103 are likewise not limited to one kind. The virtual human 103 can have multiple social attributes, multiple personality attributes, and multiple character skills. These social attributes, personality attributes, and character skills can be combined independently rather than being fixed to one combination; the user can select and combine them as needed.
Specifically, the social attributes may include attributes such as appearance, name, dress, decoration, gender, birthplace, age, family relations, occupation, position, religious belief, relationship status, and educational background; the personality attributes may include attributes such as character and temperament; the character skills may include professional skills such as singing, dancing, storytelling, and training, and the display of character skills is not limited to skills shown with the limbs, expressions, head, and/or mouth.
In this application, the social attributes, personality attributes, and character skills of the virtual human can make the results of multi-modal parsing and decision-making more inclined toward, or better suited to, that virtual human.
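These configurable attributes could be represented, purely for illustration, as a small profile structure; all field values below are made-up examples, not defaults prescribed by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualHumanProfile:
    # Social attributes (a subset of those listed above).
    name: str = "demo"
    gender: str = "female"
    age: int = 20
    occupation: str = "host"
    # Personality attributes.
    personality: str = "introverted"
    temperament: str = "calm"
    # Character skills, freely combinable.
    skills: list = field(default_factory=lambda: ["singing", "storytelling"])
```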
The following is the multi-modal interaction process. First, multi-modal interaction data are acquired and parsed to obtain the user's interaction intent. The receiving devices for acquiring multi-modal interaction data are mounted or configured on the smart device 102; they include text receivers for receiving text, voice receivers for receiving speech, cameras for receiving vision, and infrared devices for receiving perception information, among others.
Then, virtual human speech response data and corresponding virtual human behavior expression data are generated according to the interaction intent, where the behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data.
Finally, the virtual human speech response data are output in coordination with the virtual human behavior expression data.
Fig. 2 shows a structural block diagram of an interactive system based on virtual human behavior standards according to one embodiment of the present invention. As shown in Fig. 2, completing multi-modal interaction with the system requires a user 101, a smart device 102, and a cloud brain 104. The smart device 102 includes a receiving apparatus 102A, a processing apparatus 102B, an output apparatus 102C, and a connecting apparatus 102D. The cloud brain 104 includes a communication apparatus 104A.
The interactive system based on virtual human behavior standards provided by the present invention requires an unobstructed communication channel to be established among the user 101, the smart device 102, and the cloud brain 104, so that the interaction between the user 101 and the virtual human can be completed. To complete the interaction task, the smart device 102 and the cloud brain 104 are provided with apparatuses and components that support completing the interaction. The party interacting with the virtual human can be a single party or multiple parties.
The smart device 102 includes a receiving apparatus 102A, a processing apparatus 102B, an output apparatus 102C, and a connecting apparatus 102D. The receiving apparatus 102A is used to receive multi-modal interaction data. Examples of the receiving apparatus 102A include microphones for voice operation, scanners, and cameras (detecting movements not involving touch, using visible or invisible wavelengths). The smart device 102 can acquire multi-modal interaction data through the above input devices. The output apparatus 102C is used to output the multi-modal response data with which the virtual human interacts with the user 101; its configuration largely corresponds to that of the receiving apparatus 102A and is not detailed here.
The processing apparatus 102B is used to process the interaction data transmitted by the cloud brain 104 during interaction. The connecting apparatus 102D is used for contact with the cloud brain 104; the processing apparatus 102B processes the multi-modal interaction data pre-processed by the receiving apparatus 102A or the data transmitted by the cloud brain 104. The connecting apparatus 102D sends call instructions to invoke the robot capabilities on the cloud brain 104.
The communication apparatus 104A included in the cloud brain 104 is used to complete communication with the smart device 102. It stays in communication with the connecting apparatus 102D on the smart device 102, receives the requests sent by the smart device 102, and sends the processing results issued by the cloud brain 104; it is the medium of communication between the smart device 102 and the cloud brain 104.
Fig. 3 shows a module block diagram of an interactive system based on virtual human behavior standards according to another embodiment of the present invention. As shown in Fig. 3, the system includes an interaction intent acquisition module 301, a generation module 302, and an output module 303. The interaction intent acquisition module 301 includes a text acquisition unit 3011, an audio acquisition unit 3012, a vision acquisition unit 3013, a perception acquisition unit 3014, and a parsing unit 3015. The generation module 302 includes a speech response data generation unit 3021 and a behavior expression data generation unit 3022. The output module 303 includes a coordinated output unit 3031.
The interaction intent acquisition module 301 is used to acquire multi-modal interaction data and parse it to obtain the user's interaction intent. The virtual human 103 is displayed on the smart device 102 and activates speech, emotion, vision, and perception capabilities when in an interactive state. The text acquisition unit 3011 acquires text information. The audio acquisition unit 3012 acquires audio information. The vision acquisition unit 3013 acquires visual information. The perception acquisition unit 3014 acquires perception information. Examples of the above acquisition units include microphones for voice operation, scanners, cameras, and sensing devices, e.g., those using visible or invisible wavelengths, signals, and environmental data. Multi-modal interaction data can be acquired through the above input devices. The multi-modal interaction may include one or more of text, audio, vision, and perception data; the present invention is not restricted in this regard.
The generation module 302 is used to generate, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data, where the behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data.
The speech response data generation unit 3021 generates virtual human speech response data according to the interaction intent. The behavior expression data generation unit 3022 generates virtual human behavior expression data corresponding to the virtual human speech response data.
The output module 303 is used to output the virtual human speech response data in coordination with the virtual human behavior expression data. The coordinated output unit 3031 outputs the virtual human behavior expression data at the appropriate moment and position while the multi-modal speech response data are being output.
Fig. 4 shows a structural block diagram of an interactive system based on virtual human behavior standards according to another embodiment of the present invention. As shown in Fig. 4, completing the interaction requires a user 101, a smart device 102, and a cloud brain 104. The smart device 102 includes a human-machine interface 401, a data processing unit 402, an input/output apparatus 403, and an interface unit 404. The cloud brain 104 includes a semantic understanding interface 1041, a visual recognition interface 1042, a cognitive computation interface 1043, and an affective computation interface 1044.
The interactive system based on virtual human behavior standards provided by the present invention includes the smart device 102 and the cloud brain 104. The virtual human 103 runs on the smart device 102; it has a preset image and preset attributes and can activate speech, emotion, vision, and perception capabilities when in an interactive state.
In one embodiment, the smart device 102 may include a human-machine interface 401, a data processing unit 402, an input/output apparatus 403, and an interface unit 404. The human-machine interface 401 displays the running virtual human 103 in a preset area of the smart device 102.
The data processing unit 402 is used to process the data generated in the multi-modal interaction between the user 101 and the virtual human 103. The processor used may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor; the processor is the control center of the terminal, connecting the various parts of the entire terminal via various interfaces.
The smart device 102 includes memory, which mainly comprises a program storage area and a data storage area. The program storage area can store an operating system and the applications required by at least one function (for example a sound playback function or an image playback function); the data storage area can store data created according to the use of the smart device 102 (such as audio data and browsing history). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other solid-state storage devices.
The input/output apparatus 403 is used to acquire multi-modal interaction data and output the output data during interaction. The interface unit 404 is used to communicate with the cloud brain 104 and invokes the virtual human capabilities in the cloud brain 104 by connecting to the interfaces in the cloud brain 104.
The cloud brain 104 includes a semantic understanding interface 1041, a visual recognition interface 1042, a cognitive computation interface 1043, and an affective computation interface 1044. These interfaces communicate with the interface unit 404 in the smart device 102. The cloud brain 104 also contains semantic understanding logic corresponding to the semantic understanding interface 1041, visual recognition logic corresponding to the visual recognition interface 1042, cognitive computation logic corresponding to the cognitive computation interface 1043, and affective computation logic corresponding to the affective computation interface 1044.
As shown in Fig. 4, each capability interface calls its corresponding logic processing during multi-modal data parsing. The interfaces are explained below:
The semantic understanding interface 1041 receives specific voice instructions forwarded from the interface unit 404 and performs speech recognition on them, as well as natural language processing based on a large corpus.
The visual recognition interface 1042 can perform video content detection, recognition, and tracking for human bodies, faces, and scenes according to computer vision algorithms, deep learning algorithms, etc. That is, images are recognized according to predetermined algorithms, producing quantitative detection results. It has an image preprocessing function, a feature extraction function, a decision-making function, and concrete application functions;
the image preprocessing function may perform basic processing on the acquired visual data, including color space conversion, edge extraction, image transformation, and image thresholding;
the feature extraction function can extract feature information such as skin color, color, texture, movement, and coordinates of targets in the image;
the decision-making function can distribute feature information, according to a certain decision strategy, to the specific multi-modal output devices or multi-modal output applications that need it, for example realizing face detection, human limb recognition, and motion detection.
The cognitive computation interface 1043 receives the multi-modal data forwarded from the interface unit 404 and performs data acquisition, recognition, and learning to process the multi-modal data, so as to obtain user profiles, knowledge graphs, etc., and to make rational decisions on the multi-modal output data.
The affective computation interface 1044 receives the multi-modal data forwarded from the interface unit 404 and uses affective computation logic (which may be emotion recognition technology) to compute the user's current emotional state. Emotion recognition technology is an important part of affective computation; the content of emotion recognition research includes recognition of facial expressions, speech, behavior, text, physiological signals, and the like, through which the user's emotional state can be judged. Emotion recognition technology may monitor the user's emotional state through visual emotion recognition alone, or by combining visual emotion recognition with audio emotion recognition, and is not limited thereto. In this embodiment, the combined mode is preferably used to monitor emotion.
For visual emotion recognition, the affective computation interface 1044 collects images of human facial expressions with an image acquisition device, converts them into analyzable data, and then applies image processing and similar techniques to analyze the expressed emotion. Understanding facial expressions usually requires detecting subtle changes in expression, such as changes in the cheek muscles and mouth, or raised eyebrows.
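The four capability interfaces could be abstracted roughly as below. The interface names follow Fig. 4, while the method signatures and return payloads are assumptions for illustration only:

```python
from abc import ABC, abstractmethod

class CloudBrainInterface(ABC):
    @abstractmethod
    def process(self, data: dict) -> dict: ...

class SemanticUnderstanding(CloudBrainInterface):   # interface 1041
    def process(self, data: dict) -> dict:
        # Speech recognition plus corpus-based natural language processing.
        return {"intent": "chat", "text": data.get("audio_transcript", "")}

class VisualRecognition(CloudBrainInterface):       # interface 1042
    def process(self, data: dict) -> dict:
        # Detection, recognition, and tracking of bodies, faces, and scenes.
        return {"face_detected": bool(data.get("frame"))}

class CognitiveComputation(CloudBrainInterface):    # interface 1043
    def process(self, data: dict) -> dict:
        # User profiling and knowledge-graph lookups for rational decisions.
        return {"user_profile": {}}

class AffectiveComputation(CloudBrainInterface):    # interface 1044
    def process(self, data: dict) -> dict:
        # Combined visual + audio emotion recognition, as preferred above.
        return {"user_emotion": "happy"}
```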
Fig. 5 shows a flowchart of an interaction method based on virtual human behavior standards according to one embodiment of the present invention. As shown in Fig. 5, first, in step S501, multi-modal interaction data are acquired and parsed to obtain the user's interaction intent. In the multi-modal interaction process, the virtual robot acquires multi-modal interaction data through the receiving devices on the smart device. The multi-modal interaction data may include text data, voice data, perception data, action data, etc.
Then, in step S502, virtual human speech response data and corresponding virtual human behavior expression data are generated according to the interaction intent, where the behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data.
According to one embodiment of the present invention, classification is performed according to the virtual human emotion information, checking whether the virtual human emotions respectively represented by the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data conflict, and replacing conflicting data.
According to one embodiment of the present invention, the virtual human has a specific virtual image and preset attributes. When the virtual human behavior expression data are generated, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data inconsistent with the preset attributes do not participate in decision-making and output.
Finally, in step S503, the virtual human speech response data are output in coordination with the virtual human behavior expression data. For the virtual human to achieve a more anthropomorphic effect, behavior expression data need to be output while interacting with the user. Output in coordination with the speech response data, the behavior expression data give the user a more realistic, anthropomorphic interactive experience.
According to one embodiment of the present invention, in step S503, the output moment, degree of expression, and duration of the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data are determined.
According to one embodiment of the present invention, if the current scene is a scene without voice output, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data matching the virtual human's current state are output. For example, when the virtual human is listening to the user speak, it outputs no language data; its behavior expression is leaning forward, fixing both eyes on the user, following the user with its gaze, and inclining an ear to listen attentively to the user's speech.
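A sketch of this no-voice branch, under the assumption that the virtual human's current state is a simple string label, might be:

```python
def silent_scene_behavior(state: str) -> dict:
    # No speech is produced; behavior alone matches the virtual human's state.
    if state == "listening":
        return {
            "body": "lean_forward",
            "gaze": "follow_user",
            "head": "incline_ear",
            "face": "attentive",
        }
    return {"body": "idle", "gaze": "at_user", "head": "neutral", "face": "neutral"}
```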
In addition, the visual interactive system based on a virtual human provided by the present invention can also work with a program product containing a series of instructions for executing the steps of the interaction method based on virtual human behavior standards. The program product can run computer instructions, which include computer program code; the computer program code may be in source code form, object code form, an executable file, some intermediate form, etc.
The program product may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
It should be noted that the content included in the program product may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the program product does not include electrical carrier signals and telecommunication signals.
Fig. 6 shows a flowchart of generating virtual human behavior expression data in an interaction method based on virtual human behavior standards according to one embodiment of the present invention.
As shown in Fig. 6, in step S601, the virtual human speech response data are parsed, and the virtual human emotion information contained in them is extracted.
According to one embodiment of the present invention, the virtual human emotion information includes positive emotions, negative emotions, attitude emotions, and communication emotions, where the positive emotions include happiness, confidence, expectation, and surprise; the negative emotions include anger, fear, sadness, and disgust; the attitude emotions include approval and disapproval; and the communication emotions include greeting and farewell.
In step S602, virtual human head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data matching the virtual human emotion information are obtained.
According to one embodiment of the present invention, classification is performed according to the virtual human emotion information, checking whether the virtual human emotions respectively represented by the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data conflict, and replacing conflicting data.
In step S603, the virtual human behavior expression data are generated according to the matching result.
According to one embodiment of the present invention, the virtual human has a specific virtual image and preset attributes. When the virtual human behavior expression data are generated, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data inconsistent with the preset attributes do not participate in decision-making and output.
Fig. 7 shows a schematic diagram of emotion parameter classification according to one embodiment of the present invention. As shown in Fig. 7, the virtual human emotion information includes positive emotions, negative emotions, attitude emotions, and communication emotions, where the positive emotions include happiness, confidence, expectation, and surprise; the negative emotions include anger, fear, sadness, and disgust; the attitude emotions include approval and disapproval; and the communication emotions include greeting and farewell.
In one example, the virtual human is configured as a Han Chinese from Beijing, China, with a name meaning "clever child". When the virtual human needs to express the emotion of anger, its facial expression can be a glare, accompanied by a withdrawn body posture and small limb movements. When the virtual human needs to express emotions such as fear, sadness, and disgust, its mood is negative, and the behavior expression is similar to that of anger.
When the virtual human needs to express the emotion of happiness, its facial expression can open up into a smile or laughter, and its body language can be expansive movements, for example opening the arms in an embracing gesture.
When the virtual human needs to express emotions such as confidence, expectation, and surprise, its mood is positive, and the behavior expression is similar to that of happiness.
When the virtual human needs to express the emotion of approval, its head movement can be a nodding gesture and its facial expression a smile. When the virtual human needs to express the emotion of disapproval, its head movement can be a head-shaking gesture and its facial expression serious.
When the virtual human needs to express the greeting emotion, its behavior can combine the body movements of the happy emotion, and its arm movement can be a wave. When the virtual human needs to express the farewell emotion, its hand movement can be a goodbye gesture.
According to one embodiment of the present invention, the virtual human's emotions need to be reasonable. It is necessary to classify according to the virtual human emotion information, check whether the virtual human emotions respectively represented by the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data conflict, and replace conflicting data. For example, the virtual human will not simultaneously display the moods of wailing and laughing.
In addition, the virtual human has a specific virtual image and preset attributes. When the virtual human behavior expression data are generated, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data inconsistent with the preset attributes do not participate in decision-making and output. For example, if the virtual human's personality attribute is introverted, then as a rule the virtual human will not display the behavior of rocking with laughter.
In addition, the attribute characteristics of the virtual human can be group-based and heritable; the behavior expression data of virtual humans within the same group or line of inheritance are similar.
When the virtual human behavior expression data are output, the output moment, degree of expression, and duration of the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data are determined.
In general, the virtual human behavior expression data need to be output in coordination with the speech response data; therefore, the timing and degree of the behavior expression output need to be determined at output time. For example, an expression is generally a longer-duration action, so an expression often persists throughout the utterance of a sentence. Body movements are highly correlated with the corresponding speech data and often fade in and out near the corresponding position in the speech.
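Assuming per-word timestamps are available from the speech synthesizer, this alignment could be sketched as follows; the helper and its timing constants are illustrative assumptions, not the patent's algorithm:

```python
def align_behaviors(words: list, expression: str, gesture: str, gesture_word: int) -> list:
    """words: list of (word, start_s, end_s) triples from the synthesizer."""
    sentence_start, sentence_end = words[0][1], words[-1][2]
    return [
        # The expression persists for the whole sentence.
        {"channel": "face", "action": expression,
         "onset": sentence_start, "duration": sentence_end - sentence_start},
        # The body action fades in and out around its corresponding word.
        {"channel": "body", "action": gesture,
         "onset": max(sentence_start, words[gesture_word][1] - 0.2),
         "duration": (words[gesture_word][2] - words[gesture_word][1]) + 0.4},
    ]

print(align_behaviors([("hello", 0.0, 0.4), ("there", 0.5, 0.9)], "smile", "wave", 0))
```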
In fact, the virtual human's emotion parameters are not limited to the four enumerated above and may include richer emotion parameters; nor are the virtual human behaviors corresponding to the emotion parameters limited to those shown in Fig. 7. Under a given emotion parameter, the virtual human can have many more finely subdivided head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data. All forms of expression that can display the virtual human's emotions can be applied in the embodiments of the present invention; the present invention places no restriction on this.
Fig. 8 shows another flowchart of an interaction method based on virtual human behavior standards according to one embodiment of the present invention.
As shown in Fig. 8, in step S801, the smart device 102 sends a request to the cloud brain 104. Afterwards, in step S802, the smart device 102 remains in a state of waiting for the cloud brain 104 to reply. While waiting, the smart device 102 can time how long the returned data take.
In step S803, if no reply data are returned for a long time, for example beyond a predetermined time span of 5 s, the smart device 102 can choose to reply locally and generate local common reply data. Then, in step S804, an animation coordinated with the local common reply is output, and the voice playback equipment is called to play the speech.
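The timeout-and-fallback logic of steps S801-S804 could be sketched like this; the 5-second threshold comes from the text above, while the function names and the fallback payload are illustrative assumptions:

```python
import concurrent.futures

TIMEOUT_S = 5.0  # predetermined time span from step S803

def request_cloud_brain(data: dict) -> dict:
    raise NotImplementedError  # stands in for the real network call

def respond(data: dict) -> dict:
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(request_cloud_brain, data)
        try:
            # Steps S801/S802: send the request and wait, timing the reply.
            return future.result(timeout=TIMEOUT_S)
        except (concurrent.futures.TimeoutError, NotImplementedError):
            # Steps S803/S804: fall back to a local common reply with animation.
            return {"speech": "Sorry, could you say that again?",
                    "animation": "apologetic_nod"}

print(respond({"text": "hello"}))
```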
Fig. 9 shows a flowchart of the communication among the user, the smart device, and the cloud brain according to one embodiment of the present invention.
To realize multi-modal interaction between the smart device 102 and the user 101, communication connections need to be established among the user 101, the smart device 102, and the cloud brain 104. These connections should be real-time and unobstructed, so as to guarantee that the interaction is unaffected.
Completing the interaction requires certain conditions or premises, including that the virtual human is loaded and runs in the smart device 102 and that the smart device 102 has hardware facilities with perception and control functions. The virtual human activates speech, emotion, vision, and perception capabilities when in an interactive state.
After the preparations are complete, the smart device 102 begins interacting with the user 101. First, the smart device 102 acquires multi-modal interaction data, which may include data in diverse forms, e.g., text data, voice data, perception data, and action data. The smart device 102 is configured with the relevant equipment for receiving the multi-modal interaction data sent by the user 101. At this point, the two parties in the data transfer are the user 101 and the smart device 102, and the data are transferred from the user 101 to the smart device 102.
Then, the smart device 102 sends a request to the cloud brain 104, asking the cloud brain 104 to perform semantic understanding, visual recognition, cognitive computation, and affective computation on the multi-modal interaction data to assist decision-making. At this point, the multi-modal interaction data are parsed to obtain the user's interaction intent, and virtual human speech response data and corresponding virtual human behavior expression data are generated according to the interaction intent, where the behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data. Then the cloud brain 104 transmits the reply data to the smart device 102. At this point, the two communicating parties are the smart device 102 and the cloud brain 104.
Finally, after the smart device 102 receives the data transmitted by the cloud brain 104, it can output the virtual human speech response data in coordination with the virtual human behavior expression data. At this point, the two communicating parties are the smart device 102 and the user 101.
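The three-leg message flow of Fig. 9 can be summarized in a short driver loop; all function and class names here are placeholders, and the cloud brain is stubbed out:

```python
def collect(user_input: dict) -> dict:
    # Leg 1: user -> smart device; gather text/voice/perception/action data.
    return {"text": user_input.get("text", "")}

def render(speech: str, behavior: str) -> None:
    # Leg 3: smart device -> user; speech coordinated with behavior expression.
    print(f"virtual human says {speech!r} while performing {behavior!r}")

class FakeCloudBrain:
    # Stand-in for leg 2: smart device <-> cloud brain (parsing and decision).
    def process(self, multimodal: dict) -> dict:
        return {"speech": f"You said: {multimodal['text']}", "behavior": "nod"}

def interaction_round(user_input: dict, cloud) -> None:
    multimodal = collect(user_input)             # leg 1
    reply = cloud.process(multimodal)            # leg 2
    render(reply["speech"], reply["behavior"])   # leg 3

interaction_round({"text": "hello"}, FakeCloudBrain())
```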
The interaction method and system based on virtual human behavior standards provided by the present invention provide a virtual human that has a preset image and preset attributes and can interact with the user multi-modally. Moreover, when outputting multi-modal response data, the method and system can also output virtual human behavior expression data in coordination to express the virtual human's emotions, so that smooth communication is possible between the user and the virtual human and the user enjoys an anthropomorphic interactive experience.
It should be understood that the disclosed embodiments of the present invention are not limited to the specific structures, process steps, or materials disclosed herein, but extend to their equivalents as understood by those of ordinary skill in the relevant arts. It should also be understood that the terminology used herein is only for describing specific embodiments and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Therefore, the phrase "one embodiment" or "an embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment.
Although the embodiments of the present invention are disclosed as above, the described content is only an embodiment adopted to facilitate understanding of the present invention and is not intended to limit the present invention. Any person skilled in the art to which the present invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention; however, the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.

Claims (10)

1. An interaction method based on virtual human behavior standards, characterized in that the virtual human is displayed on a smart device and activates speech, emotion, vision, and perception capabilities when in an interactive state, the method comprising the following steps:
acquiring multi-modal interaction data and parsing the multi-modal interaction data to obtain the user's interaction intent;
generating, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data, wherein the virtual human behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data;
outputting the virtual human speech response data in coordination with the virtual human behavior expression data.
2. The method of claim 1, characterized in that the step of generating, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data further comprises the following steps:
parsing the virtual human speech response data and extracting the virtual human emotion information contained in the virtual human speech response data;
obtaining virtual human head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data matching the virtual human emotion information;
generating the virtual human behavior expression data according to the matching result.
3. The method of claim 2, characterized in that the virtual human emotion information includes positive emotions, negative emotions, attitude emotions, and communication emotions, wherein the positive emotions include happiness, confidence, expectation, and surprise; the negative emotions include anger, fear, sadness, and disgust; the attitude emotions include approval and disapproval; and the communication emotions include greeting and farewell.
4. The method of claim 3, characterized in that the step of generating the virtual human behavior expression data according to the matching result further comprises the following step:
classifying according to the virtual human emotion information, checking whether the virtual human emotions respectively represented by the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data conflict, and replacing conflicting data.
5. The method of claim 1, characterized in that the virtual human has a specific virtual image and preset attributes, and when the virtual human behavior expression data are generated, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data of the virtual human that are inconsistent with the preset attributes do not participate in decision-making and output.
6. The method of claim 1, characterized in that the step of outputting the virtual human speech response data in coordination with the virtual human behavior expression data comprises the following step:
determining the output moment, degree of expression, and duration of the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data.
7. The method of claim 1, characterized in that if the current scene is a scene without voice output, head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data matching the virtual human's current state are output.
8. An interaction apparatus based on virtual human behavior standards, characterized in that the apparatus comprises:
an interaction intent acquisition module, configured to acquire multi-modal interaction data and parse the multi-modal interaction data to obtain the user's interaction intent;
a virtual human behavior expression data generation module, configured to generate, according to the interaction intent, virtual human speech response data and corresponding virtual human behavior expression data, wherein the virtual human behavior expression data include the virtual human's head movement data, eye gaze control data, facial expression data, body movement data, and limb movement data;
an output module, configured to output the virtual human speech response data in coordination with the virtual human behavior expression data.
9. A program product for running the virtual human's program, containing a series of instructions for executing the method steps of any one of claims 1-7.
10. An interaction system based on virtual human behavior standards, characterized in that the system comprises:
a smart device, on which the virtual human is loaded, configured to acquire multi-modal interaction data and having the capability to output speech, emotion, expression, and movement, the smart device including a holographic device;
a cloud brain, configured to perform semantic understanding, visual recognition, cognitive computation, and affective computation on the multi-modal interaction data, so as to decide that the virtual human outputs the virtual human behavior expression data and the virtual human speech response data.
CN201810953818.3A 2018-08-21 2018-08-21 Interaction method and system based on virtual human behavior standards Pending CN109324688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810953818.3A CN109324688A (en) 2018-08-21 2018-08-21 Interaction method and system based on virtual human behavior standards

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810953818.3A CN109324688A (en) 2018-08-21 2018-08-21 Interaction method and system based on virtual human behavior standards

Publications (1)

Publication Number Publication Date
CN109324688A true CN109324688A (en) 2019-02-12

Family

ID=65264278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810953818.3A Pending CN109324688A (en) Interaction method and system based on virtual human behavior standards

Country Status (1)

Country Link
CN (1) CN109324688A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105427856A (en) * 2016-01-12 2016-03-23 北京光年无限科技有限公司 Invitation data processing method and system for intelligent robot
CN105843381A (en) * 2016-03-18 2016-08-10 北京光年无限科技有限公司 Data processing method for realizing multi-modal interaction and multi-modal interaction system
CN107009362A (en) * 2017-05-26 2017-08-04 深圳市阿西莫夫科技有限公司 Robot control method and device
CN107301168A (en) * 2017-06-01 2017-10-27 深圳市朗空亿科科技有限公司 Intelligent robot and its mood exchange method, system
CN107765852A (en) * 2017-10-11 2018-03-06 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN107894831A (en) * 2017-10-17 2018-04-10 北京光年无限科技有限公司 A kind of interaction output intent and system for intelligent robot
CN108416420A (en) * 2018-02-11 2018-08-17 北京光年无限科技有限公司 Limbs exchange method based on visual human and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070944A (en) * 2019-05-17 2019-07-30 段新 Training system is assessed based on virtual environment and the social function of virtual role
CN110070944B (en) * 2019-05-17 2023-12-08 段新 Social function assessment training system based on virtual environment and virtual roles
CN112667068A (en) * 2019-09-30 2021-04-16 北京百度网讯科技有限公司 Virtual character driving method, device, equipment and storage medium
CN110956142A (en) * 2019-12-03 2020-04-03 中国太平洋保险(集团)股份有限公司 Intelligent interactive training system
CN111515970A (en) * 2020-04-27 2020-08-11 腾讯科技(深圳)有限公司 Interaction method, mimicry robot and related device
WO2023216765A1 (en) * 2022-05-09 2023-11-16 阿里巴巴(中国)有限公司 Multi-modal interaction method and apparatus
CN115016648A (en) * 2022-07-15 2022-09-06 大爱全息(北京)科技有限公司 Holographic interaction device and processing method thereof
CN115016648B (en) * 2022-07-15 2022-12-20 大爱全息(北京)科技有限公司 Holographic interaction device and processing method thereof
CN115129163A (en) * 2022-08-30 2022-09-30 环球数科集团有限公司 Virtual human behavior interaction system
CN115390678A (en) * 2022-10-27 2022-11-25 科大讯飞股份有限公司 Virtual human interaction method and device, electronic equipment and storage medium
CN115390678B (en) * 2022-10-27 2023-03-31 科大讯飞股份有限公司 Virtual human interaction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109324688A (en) Interaction method and system based on virtual human behavior standards
CN109271018A (en) Exchange method and system based on visual human's behavioral standard
WO2021043053A1 (en) Animation image driving method based on artificial intelligence, and related device
CN110286756A (en) Method for processing video frequency, device, system, terminal device and storage medium
CN111833418B (en) Animation interaction method, device, equipment and storage medium
CN107340859A (en) The multi-modal exchange method and system of multi-modal virtual robot
CN110400251A (en) Method for processing video frequency, device, terminal device and storage medium
CN112162628A (en) Multi-mode interaction method, device and system based on virtual role, storage medium and terminal
CN107294837A (en) Engaged in the dialogue interactive method and system using virtual robot
CN108942919B (en) Interaction method and system based on virtual human
CN110413841A (en) Polymorphic exchange method, device, system, electronic equipment and storage medium
CN107340865A (en) Multi-modal virtual robot exchange method and system
CN109343695A (en) Exchange method and system based on visual human's behavioral standard
CN106710590A (en) Voice interaction system with emotional function based on virtual reality environment and method
CN108416420A (en) Limbs exchange method based on visual human and system
CN107632706A (en) The application data processing method and system of multi-modal visual human
CN109871450A (en) Based on the multi-modal exchange method and system for drawing this reading
CN107679519A (en) A kind of multi-modal interaction processing method and system based on visual human
CN109086860A (en) A kind of exchange method and system based on visual human
CN107784355A (en) The multi-modal interaction data processing method of visual human and system
CN109032328A (en) A kind of exchange method and system based on visual human
CN108595012A (en) Visual interactive method and system based on visual human
WO2022252866A1 (en) Interaction processing method and apparatus, terminal and medium
CN108052250A (en) Virtual idol deductive data processing method and system based on multi-modal interaction
CN107808191A (en) The output intent and system of the multi-modal interaction of visual human

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190212

RJ01 Rejection of invention patent application after publication