CN108363706A - Method and apparatus for human-computer dialogue interaction, and device for human-computer dialogue interaction - Google Patents
- Publication number: CN108363706A
- Application number: CN201710056801.3A
- Authority
- CN
- China
- Prior art keywords
- data
- interaction
- characteristic
- voice
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335—Pitch control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
An embodiment of the present invention provides a method and apparatus for human-computer dialogue interaction. The method includes: obtaining voice data, image data, and scene data of an interacting party; obtaining a scene feature model corresponding to the scene data; inputting the voice data and image data into the scene feature model to obtain a target character feature attribute; determining a target dialogue strategy using the target character feature attribute and the scene data; and controlling the robot's expression, voice, and/or action output based on the target dialogue strategy. With the embodiment of the present invention, during human-computer interaction the machine can adapt, according to the target dialogue strategy, to the characteristics of the interacting party's current conversation and hold a personified dialogue with the interacting party, thereby improving the interacting party's interactive experience.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method of human-computer dialogue interaction, an apparatus of human-computer dialogue interaction, and a device for human-computer dialogue interaction.
Background art
Human-computer interaction is the process by which a person exchanges information with a machine. The rate of information flow is the most important index for measuring human-computer dialogue interaction. Human-computer interaction is expected to follow the evolutionary path of person-to-person interaction; since spoken dialogue is the most efficient mode of communication between people, dialogue will likewise become the most efficient mode of human-computer interaction.

Existing interactive systems can only convert the interacting party's voice information into text; they cannot recognize any further information, so when replying to the interacting party the machine can only generate a reply from a single parameter or model. In addition, existing interactive systems generally reply with speech synthesized from the voice information alone, so the dialogue form is monotonous and the interacting party's experience is poor.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed in order to provide a method of human-computer dialogue interaction, and a corresponding apparatus of human-computer dialogue interaction, that overcome the above problems or at least partly solve them.

To solve the above problems, an embodiment of the invention discloses a method of human-computer dialogue interaction, including:
obtaining voice data, image data, and scene data of an interacting party;
obtaining a scene feature model corresponding to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the robot's expression, voice, and/or action output based on the target dialogue strategy.
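As a rough illustration of how the five steps could be wired together (all function names, scene labels, attribute values, and data shapes below are assumptions for the sketch, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class DialogueAction:
    text: str
    expression: str
    voice_style: str
    gesture: str

# Hypothetical scene-model registry: scene label -> classifier mapping
# (voice, image) to a character feature attribute dictionary.
SCENE_MODELS = {
    "indoor": lambda voice, image: {"age": "child", "mood": "excited"},
    "driving": lambda voice, image: {"age": "adult", "mood": "calm"},
}

def select_scene_model(scene_data):
    # Step 2: pick the model that matches the detected scene.
    label = "driving" if scene_data.get("speed_kmh", 0) > 20 else "indoor"
    return label, SCENE_MODELS[label]

def decide_strategy(attrs, scene_label):
    # Step 4: map character feature attributes + scene to a dialogue strategy.
    if attrs["mood"] == "excited":
        return DialogueAction("Let's play!", "smile", "lively", "wave")
    return DialogueAction("How can I help?", "neutral", "soft", "none")

def interact(voice, image, scene_data):
    scene_label, model = select_scene_model(scene_data)   # steps 1-2
    attrs = model(voice, image)                           # step 3
    return decide_strategy(attrs, scene_label)            # steps 4-5

action = interact(b"...", b"...", {"speed_kmh": 0})
print(action.expression)  # -> smile
```

The sketch replaces the trained scene feature models with hard-coded lambdas purely to show the control flow from scene selection through attribute inference to output strategy.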
Optionally, the step of obtaining voice data, image data, and scene data of the interacting party includes:
collecting the voice data of the interacting party with a microphone;
collecting the image data of the interacting party with a camera;
and collecting the scene data with sensors.
Optionally, the step of obtaining voice data, image data, and scene data of the interacting party includes:
displaying an interactive interface;
prompting the interacting party, through the interactive interface, to input the voice data, image data, and scene data.
Optionally, the step of obtaining the scene feature model corresponding to the scene data includes:
extracting a scene feature attribute from the scene data;
obtaining the scene feature model corresponding to the scene feature attribute.
Optionally, the scene feature model is trained in the following way:
obtaining the training samples under each scene feature model and the character feature attribute corresponding to each training sample, where a training sample includes training voice data and training image data;
extracting training intonation feature data and training line feature data from the training voice data;
extracting training expression feature data and training action feature data from the training image data;
training each scene feature model using the intonation feature data, line feature data, expression feature data, and/or action feature data of the training samples under each scene feature, together with the corresponding character feature attributes.
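A minimal sketch of that per-scene training loop. The fixed 4-element feature vector (intonation, line, expression, action) and the nearest-centroid learner are placeholders standing in for whatever feature encoding and model the implementation actually uses:

```python
from collections import defaultdict

def train_scene_models(samples):
    """samples: list of (scene, feature_vector, character_attribute)."""
    # Group feature vectors by (scene, attribute) and average them into centroids.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for scene, feats, attr in samples:
        key = (scene, attr)
        for i, f in enumerate(feats):
            sums[key][i] += f
        counts[key] += 1
    models = defaultdict(dict)
    for (scene, attr), total in sums.items():
        models[scene][attr] = [t / counts[(scene, attr)] for t in total]
    return models

def classify(models, scene, feats):
    # Predict the attribute whose centroid is nearest within the scene's model.
    centroids = models[scene]
    return min(centroids,
               key=lambda a: sum((f - c) ** 2 for f, c in zip(feats, centroids[a])))

samples = [
    ("indoor", [0.9, 0.8, 0.9, 0.7], "child-excited"),
    ("indoor", [0.2, 0.3, 0.2, 0.1], "adult-calm"),
]
models = train_scene_models(samples)
print(classify(models, "indoor", [0.85, 0.75, 0.8, 0.6]))  # -> child-excited
```

Keeping a separate model per scene, as the patent describes, is what lets the same feature vector map to different attributes indoors versus while driving.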
Optionally, inputting the voice data and image data into the scene feature model to obtain the target character feature attribute includes:
extracting intonation feature data and line feature data from the voice data;
extracting expression feature data and action feature data from the image data;
obtaining the target character feature attribute from the intonation feature data, line feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
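For illustration only, an intonation feature could be as crude as the variance of frame energies across the waveform (a real system would use pitch tracking, MFCCs, and so on; nothing here is from the patent):

```python
def frame_energy(samples, frame_size=4):
    # Split a mono waveform into frames and compute each frame's mean squared amplitude.
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(s * s for s in f) / len(f) for f in frames if f]

def intonation_features(samples):
    energies = frame_energy(samples)
    mean = sum(energies) / len(energies)
    # Energy variance is a crude proxy for how animated the speech is.
    var = sum((e - mean) ** 2 for e in energies) / len(energies)
    return {"mean_energy": mean, "energy_variance": var}

flat = [0.1] * 16                # monotone speech
lively = [0.1] * 8 + [0.9] * 8   # strongly modulated speech
print(intonation_features(lively)["energy_variance"]
      > intonation_features(flat)["energy_variance"])  # -> True
```

The resulting feature dictionary would be one of the inputs fed, together with the line, expression, and action features, into the scene feature model.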
Optionally, controlling the robot's expression, voice, and/or action output based on the target dialogue strategy includes:
obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
controlling the robot to output the text information according to the expression instruction, voice instruction, and/or action instruction.
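A hedged sketch of this final control step: the strategy yields a text reply plus expression/voice/action instructions, which a driver layer dispatches to the robot's subsystems. All strategy names, table contents, and driver interfaces below are invented:

```python
def build_commands(strategy):
    # Look up the reply text and the accompanying non-verbal instructions
    # for a given dialogue strategy (table contents are illustrative).
    table = {
        "comfort": {"text": "There, there.", "expression": "soft_smile",
                    "voice": {"pitch": "low", "rate": "slow"}, "action": "pat"},
        "play": {"text": "Catch me!", "expression": "grin",
                 "voice": {"pitch": "high", "rate": "fast"}, "action": "spin"},
    }
    return table[strategy]

def execute(commands, drivers):
    # Fan the instructions out to the robot's subsystems, then speak the text.
    log = []
    log.append(drivers["face"](commands["expression"]))
    log.append(drivers["motors"](commands["action"]))
    log.append(drivers["tts"](commands["text"], **commands["voice"]))
    return log

drivers = {
    "face": lambda e: f"face:{e}",
    "motors": lambda a: f"move:{a}",
    "tts": lambda t, pitch, rate: f"say:{t} ({pitch}/{rate})",
}
print(execute(build_commands("comfort"), drivers))
# -> ['face:soft_smile', 'move:pat', 'say:There, there. (low/slow)']
```

Separating the strategy table from the driver layer mirrors the patent's split between determining the target dialogue strategy and controlling the robot's output.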
An embodiment of the invention also discloses an apparatus of human-computer dialogue interaction, including:
an interacting-party data obtaining module, configured to obtain voice data, image data, and scene data of an interacting party;
a scene feature model obtaining module, configured to obtain a scene feature model corresponding to the scene data;
a character feature attribute obtaining module, configured to input the voice data and image data into the scene feature model to obtain a target character feature attribute;
a target dialogue strategy determining module, configured to determine a target dialogue strategy using the target character feature attribute and the scene data;
a human-computer dialogue module, configured to control the robot's expression, voice, and/or action output based on the target dialogue strategy.
Optionally, the interacting-party data obtaining module includes:
a first interacting-party data collecting submodule, configured to collect the voice data of the interacting party with a microphone, collect the image data of the interacting party with a camera, and collect the scene data with sensors.
Optionally, the interacting-party data obtaining module includes:
an interactive interface displaying submodule, configured to display an interactive interface;
a second interacting-party data collecting submodule, configured to prompt the interacting party, through the interactive interface, to input the voice data, image data, and scene data.
Optionally, the scene feature model obtaining module includes:
a scene feature attribute extracting submodule, configured to extract a scene feature attribute from the scene data;
a scene feature model determining submodule, configured to obtain the scene feature model corresponding to the scene feature attribute.
Optionally, the apparatus further includes:
a training sample obtaining module, configured to obtain the training samples under each scene feature model and the character feature attribute corresponding to each training sample, where a training sample includes training voice data and training image data;
a first training feature data extracting module, configured to extract training intonation feature data and training line feature data from the training voice data;
a second training feature data extracting module, configured to extract training expression feature data and training action feature data from the training image data;
a scene feature model training module, configured to train each scene feature model using the intonation feature data, line feature data, expression feature data, and/or action feature data of the training samples under each scene feature, together with the corresponding character feature attributes.
Optionally, the character feature attribute obtaining module includes:
a first feature data extracting submodule, configured to extract intonation feature data and line feature data from the voice data;
a second feature data extracting submodule, configured to extract expression feature data and action feature data from the image data;
a target character feature attribute obtaining submodule, configured to obtain the target character feature attribute from the intonation feature data, line feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
Optionally, the target dialogue strategy is provided with corresponding words, expressions, and actions, and the human-computer dialogue module includes:
an instruction obtaining submodule, configured to obtain text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
an instruction executing submodule, configured to control the robot to output the text information according to the expression instruction, voice instruction, and/or action instruction.
An embodiment of the invention also provides a device for human-computer dialogue interaction, which includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and include instructions for:
obtaining voice data, image data, and scene data of an interacting party;
obtaining a scene feature model corresponding to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the robot's expression, voice, and/or action output based on the target dialogue strategy.
Embodiments of the present invention have the following advantages.

During human-computer interaction, an embodiment of the invention obtains the voice data, image data, and scene data of the interacting party, obtains from the scene data a scene feature model matching the current dialogue scene, inputs the voice data and/or image data of the interacting party into the scene feature model to obtain a character feature attribute, and formulates a corresponding target dialogue strategy from that character feature attribute. As a result, during human-computer interaction the machine can adapt, according to the target dialogue strategy, to the characteristics of the interacting party's current conversation and hold a personified dialogue with the interacting party.

An embodiment of the invention can select, according to the collected scene data, the scene feature model that fits the current scene, and use that model to determine the character feature attribute that matches the current interacting party's voice features and/or image features. The character feature attribute can reflect the speaker's intention and mood; under different scenes, the same character feature attribute may correspond to slightly different voice and image features of the speaker. Selecting the scene feature model corresponding to the current scene data to determine the character feature attribute therefore makes the expression of the speaker's intention and emotion more accurate. An embodiment of the invention can further formulate, from the character feature attribute, a target dialogue strategy for interacting with the interacting party, which makes the machine's interaction with the interacting party more personalized and better able to serve the interacting party.
Description of the drawings
Fig. 1 is a flow chart of the steps of Embodiment 1 of a method of human-computer dialogue interaction of the present invention;
Fig. 2 is a flow chart of the steps of Embodiment 2 of a method of human-computer dialogue interaction of the present invention;
Fig. 3 is a structural block diagram of an embodiment of an apparatus of human-computer dialogue interaction of the present invention;
Fig. 4 is a block diagram of a device for human-computer interaction according to an exemplary embodiment;
Fig. 5 is a block diagram of a device for human-computer dialogue interaction acting as a server, according to an exemplary embodiment.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of Embodiment 1 of a method of human-computer dialogue interaction of the present invention is shown. The method may specifically include the following steps.
Step 101: obtain the voice data, image data, and scene data of the interacting party.

While the interacting party is in dialogue with the robot, the robot obtains the interacting party's associated data in real time. The associated data may include the interacting party's voice data and image data, as well as the scene data of the scene where the dialogue takes place. The associated data may be obtained by data collection devices, or it may be input manually.

In an embodiment of the invention, the associated data may be collected actively: once the robot is determined to be in a human-computer interaction state, the data collection devices built into or connected to the robot automatically collect voice data and/or image data of the current interaction object (the interacting party), together with the scene data.
Specifically, step 101 may include the following sub-step.

Sub-step S11: collect the voice data of the interacting party with a microphone; collect the image data of the interacting party with a camera; and collect the scene data of the current dialogue with sensors.

A variety of data collection devices, both built-in and externally connected, are installed on the robot that interacts with the interacting party. Different devices collect different associated data of the interacting party, such as the voice data when the interacting party speaks, the interacting party's facial expression data and gesture/action data, and the scene (environment) data of where the interacting party currently is.
In a preferred embodiment of the invention, step 101 may include the following sub-steps.

Sub-step S12: display an interactive interface.

Sub-step S13: prompt the interacting party, through the interactive interface, to input the voice data, image data, and scene data.

Optionally, an embodiment of the invention may also obtain the associated data by querying the interacting party or guiding the interacting party through the interactive interface: the interface is displayed to the interacting party and prompts, item by item, for the corresponding data, for example prompting the interacting party to speak into the microphone for voice data and to face the camera for image data, while the scene data is determined from the scene the interacting party is currently in. Further, the interacting party may also be prompted to input data such as age and gender.
In practice, of course, manual input by the interacting party is less timely than the robot's own collection, yields less data, and cannot keep up with changes in the interacting party. During interaction, the automatically collected data should therefore be primary and the data input by the interacting party supplementary.
It should be noted that an embodiment of the invention may also collect the interacting party's physiological data, such as heartbeat, breathing, and body temperature, through a wearable device. Based on these data, the robot can perform analysis and recognition processing to judge the emotional characteristics of the interacting party.
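As a toy illustration of that idea (the vital-sign fields, thresholds, and two-level arousal scale are all invented), elevated heart and breathing rates might be read as high emotional arousal:

```python
def arousal_from_vitals(heart_rate_bpm, breaths_per_min, body_temp_c):
    # Score each vital sign against a rough resting baseline; thresholds invented.
    score = 0
    score += heart_rate_bpm > 100
    score += breaths_per_min > 20
    score += body_temp_c > 37.5
    # Two or more elevated readings -> treat the party as highly aroused.
    return "high" if score >= 2 else "low"

print(arousal_from_vitals(110, 24, 36.8))  # -> high
print(arousal_from_vitals(65, 14, 36.6))   # -> low
```

A real system would combine such a signal with the voice and image features rather than use it alone.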
Step 102: obtain the scene feature model corresponding to the scene data.

In an embodiment of the invention, it is considered that the character feature attributes of the interacting party differ under different scenes, so that the collected associated data of the interacting party also differs slightly. For example, a person may behave more cautiously indoors, more freely outdoors, and more flatly while driving. An embodiment of the invention therefore sets up a corresponding scene feature model for the character feature attributes of each scene. For example, an indoor feature model may be set up for the character feature attributes that match an indoor environment, an outdoor feature model for the character feature attributes that match an outdoor environment, and a driving feature model for the character feature attributes that match a driving environment.
In a preferred embodiment of the invention, step 102 may include the following sub-steps.

Sub-step S21: extract a scene feature attribute from the scene data.

Sub-step S22: obtain the scene feature model corresponding to the scene feature attribute.
An embodiment of the invention can collect the specific environment information of the scene where the human-computer dialogue occurs, i.e. the scene data, through sensors. Specifically, temperature and humidity can be identified by a temperature-humidity sensor; whether the party is moving or static by a velocity sensor; whether it is daytime or night by a light sensor; and whether the party is indoors or outdoors by an environmental sensor. From the data recognized by the sensors, the corresponding scene feature attributes are extracted, and from these attributes simple analysis can determine whether the party is indoors, whether the party is driving, and so on. For example, from the speed data identified by the velocity sensor, the interacting party's current speed can be extracted; if the speed reaches a preset vehicle travelling speed, the interacting party can be considered to be in a driving scene.

Of course, when the embodiments of the invention are put into practice, other sensors or other means may also be used to identify the scene the interacting party is currently in; the embodiment of the invention does not limit this.
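A minimal sketch of that rule-based scene detection (the 20 km/h threshold, the sensor field names, and the scene labels are assumptions for the sketch):

```python
def detect_scene(sensors):
    # sensors: dict of raw readings from the robot's environmental sensors.
    if sensors.get("speed_kmh", 0) >= 20:
        return "driving"   # fast sustained movement -> in a vehicle
    if sensors.get("lux", 1000) < 10:
        return "night"     # very low light -> night-time scene
    # With no GPS fix, assume the party is indoors; otherwise outdoors.
    return "indoor" if sensors.get("gps_fix", False) is False else "outdoor"

print(detect_scene({"speed_kmh": 60}))              # -> driving
print(detect_scene({"lux": 500, "gps_fix": True}))  # -> outdoor
print(detect_scene({"lux": 500}))                   # -> indoor
```

The returned scene label would then select the matching scene feature model, as in step 102.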
In a preferred embodiment of the invention, the scene feature model may be trained in the following way:

obtain the training samples under each scene feature and the character feature attribute corresponding to each training sample, where a training sample includes training voice data and training image data;

extract training intonation feature data and training line feature data from the training voice data;

extract training expression feature data and training action feature data from the training image data;

train each scene feature model using the intonation feature data, line feature data, expression feature data, and/or action feature data of the training samples under each scene feature, together with the corresponding character feature attributes.
The embodiments of the present invention may use, as training data, a massive set of training samples collected in advance under each scene feature, together with the character feature attribute corresponding to each training sample. A training sample may include training voice data and training image data. For example, voice collection and image collection are performed on different persons under some scene, and a correspondence is established between the collected voice data and image data of a given person and that person's character feature attributes, constituting one piece of training data. For instance, under some scene, the voice data and image data of a child are acquired; the child's intonation feature data and voiceprint feature data are extracted from the voice data, and the child's facial expression feature data and gesture feature data are extracted from the image data. These data, taken as one training sample, are associated with the child's character feature attributes and the current scene feature, and stored in a training sample database. In this way, the training sample database holds massive training samples for different scenes.
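The per-scene training sample database described above might be organized as in the following sketch; all field names are illustrative, not taken from the patent.

```python
from collections import defaultdict

def make_training_sample(scene, intonation, voiceprint, expression, action, attributes):
    """Bundle one person's extracted feature data with their labelled attributes."""
    return {
        "scene": scene,               # current scene feature, e.g. "home"
        "intonation": intonation,     # extracted from the training voice data
        "voiceprint": voiceprint,     # extracted from the training voice data
        "expression": expression,     # facial expression feature data
        "action": action,             # e.g. gesture feature data
        "attributes": attributes,     # e.g. {"age": "child", "mood": "excited"}
    }

def store_sample(db, sample):
    """The database keeps one list of samples per scene feature."""
    db[sample["scene"]].append(sample)
```

Training then iterates over `db[scene]` for each scene feature to fit that scene's model.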
Through deep learning, the scene feature model under each scene can be trained from the massive training samples under that scene feature. The character feature attributes corresponding to the training samples may include an age attribute (child/young/elderly), a personality attribute (optimistic/reserved/shy), a mood attribute (sad/calm/excited), and so on. After training is completed, voice data and image data can be classified accordingly.
The scene feature model can determine the corresponding character feature attribute based on the input voice data and image data. In a concrete application of the present invention, a scene feature model for the interacting party's mood can be trained from the training intonation feature data; further, a scene feature model for age can be trained from other training feature data, and additional data can be added to the training so that the model becomes more accurate.
Step 103: inputting the voice data and the image data into the scene feature model to obtain a target character feature attribute.
Specifically, step 103 of an embodiment of the present invention may include:
extracting intonation feature data and voiceprint feature data from the voice data;
extracting expression feature data and action feature data from the image data;
and obtaining the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data and/or action feature data, in combination with the scene feature model.
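As a hedged sketch of step 103, the inference might look like the following, where the scene model is assumed to map a concatenated feature vector to a labelled attribute via nearest-reference matching; this is only a stand-in for whatever trained model an embodiment would actually use.

```python
def get_target_attribute(scene_model, intonation, voiceprint, expression, action):
    """Concatenate the four feature-data vectors and query the scene feature model.

    scene_model: dict mapping a character feature attribute to a reference
    vector (e.g. a centroid learned offline); the nearest reference wins.
    """
    vec = intonation + voiceprint + expression + action
    best_attr, best_dist = None, float("inf")
    for attr, ref in scene_model.items():
        dist = sum((a - b) ** 2 for a, b in zip(ref, vec))
        if dist < best_dist:
            best_attr, best_dist = attr, dist
    return best_attr
```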
The collected data associated with the interacting party, for example the voice data and/or image data, is input into the corresponding scene feature model, and a character feature attribute that better matches the current scene and the interacting party's features is obtained based on that model. Specifically, character feature attributes may include basic features, mood features, personality features, and so on, where the mood features may include excited, calm, and sad; the basic features may include elderly person, child, male, female, etc.; and the personality features may include optimistic, lively, reserved, shy, etc.
It should be noted that, for the division and determination of character feature attributes, one or more of the above may be configured according to actual demand, or other character feature attributes may be added; the embodiments of the present invention impose no limitation on this.
Step 104: determining a target dialogue strategy using the target character feature attribute and the scene data.
The embodiments of the present invention configure different dialogue strategies based on different character feature attributes and scene features. For example, if the current scene is identified as having a rather gloomy atmosphere and the interacting party's mood is sad, a consoling dialogue strategy may be used in the dialogue process.
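Step 104 can be read as a lookup keyed by scene feature and character feature attribute. A minimal sketch follows; the table entries are invented for illustration only.

```python
# Hypothetical strategy table: (scene feature, mood attribute) -> dialogue strategy.
STRATEGY_TABLE = {
    ("gloomy_room", "sad"): "console",
    ("playground", "excited"): "playful",
}

def determine_strategy(scene, attributes, default="neutral"):
    """Pick the target dialogue strategy from the scene data and attributes."""
    return STRATEGY_TABLE.get((scene, attributes.get("mood")), default)
```

A richer embodiment would key on more attributes (age, personality) or learn the mapping, but the contract — attribute plus scene in, strategy out — stays the same.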
Step 105: controlling the expression, voice and/or action output of the robot based on the target dialogue strategy.
During the human-machine dialogue, the robot's expression, voice and/or action output can be controlled based on the target dialogue strategy, so as to converse with the interacting party. The expression may be a facial expression, such as expressive features of the eyes or of the face; the voice may be the intonation and pitch of the robot's speech output; and the action may be a gesture, a head movement, or a movement of other limbs of the robot.
In the embodiments of the present invention, the voiceprint feature data in the voice data can be acquired through voiceprint recognition. Voiceprint recognition, also called speaker recognition, comes in two classes: speaker identification and speaker verification. The former determines which of several persons uttered a given segment of speech, a "choose one of many" problem, while the latter confirms whether a given segment of speech was uttered by a specified person.
The embodiments of the present invention can use the voiceprint feature data as the identity identifier of an interacting party. When it is recognized that two or more interacting parties are engaged in the human-machine dialogue, it can be determined, based on the voiceprint feature data, which interacting party produced a given piece of voice data, and the target dialogue strategy for that interacting party can be determined based on the voice data and/or image data that party produced, so as to converse with that party rather than apply an identical dialogue strategy to all of the interacting parties. This satisfies the individual demands of each interacting party and makes the dialogue more interesting.
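The two classes of voiceprint recognition mentioned here can be sketched as matching against enrolled voiceprints; the vectors and threshold below are illustrative values, not taken from the patent.

```python
def identify_speaker(enrolled, voiceprint):
    """Speaker identification ("one of N"): return the closest enrolled identity."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(enrolled[name], voiceprint))
    return min(enrolled, key=dist)

def verify_speaker(enrolled, name, voiceprint, threshold=0.1):
    """Speaker verification (yes/no): is this voiceprint close enough to `name`?"""
    d = sum((a - b) ** 2 for a, b in zip(enrolled[name], voiceprint))
    return d <= threshold
```

In a multi-party dialogue, `identify_speaker` would route each utterance to that party's own target dialogue strategy.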
In the embodiments of the present invention, during human-machine interaction, the voice data, image data, and scene data of the interacting party are acquired; a scene feature model matching the current scene is obtained based on the scene data; the interacting party's voice data and image data are then input into the scene feature model to obtain a character feature attribute that matches the current scene and the interacting party's features; and a corresponding target dialogue strategy is formulated based on that character feature attribute to control the expression and action output of the robot, so that during human-machine interaction the robot can converse with the interacting party in coordination with the target dialogue strategy.
The embodiments of the present invention can select the scene feature model matching the respective scene in order to determine the character feature attribute the robot needs to exhibit under the current scene. Character feature attributes can reflect human personality, intention, mood, and so on, and the voice data and image data produced by humans with different character feature attributes under different scenes may differ slightly. Therefore, determining the character feature attribute the robot needs to exhibit under the current scene through the scene feature model of the corresponding scene makes the robot's anthropomorphization more accurate. The embodiments of the present invention further control the robot's expression and action output according to the obtained target character feature attribute, which can make the interaction between the machine and humans more personalized and serve users better.
Referring to Fig. 2, a step flow chart of method embodiment two of the human-machine dialogue interaction of the present invention is shown, which may specifically include the following steps:
Step 201: obtaining the voice data, image data, and scene data of an interacting party;
Step 202: obtaining a corresponding scene feature model according to the scene data;
Step 203: inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
Step 204: determining a target dialogue strategy using the target character feature attribute and the scene data;
Step 205: obtaining text information, an expression instruction, a voice instruction and/or an action instruction corresponding to the target dialogue strategy.
Since the specific implementations of steps 201-205 in method embodiment two essentially correspond to those of method embodiment one above, for anything not described in detail for steps 201-205 in this embodiment, reference may be made to the related description in the previous embodiment, which is not repeated here.
Step 206: controlling the robot to output the text information based on the expression instruction, voice instruction and/or action instruction.
In the embodiments of the present invention, the target dialogue strategy is determined based on the character feature attribute, and the target dialogue strategy is provided with corresponding text information, an expression instruction, a voice instruction, and an action instruction, so as to guide how the robot exhibits the character feature attribute under the current scene through certain dialogue, expressions, and actions.
For example, if the interacting party's mood is identified as sad, the target dialogue strategy corresponding to the character feature attribute obtained from the model may be of a consoling type, and the corresponding text information, expression instruction, voice instruction and/or action instruction are obtained for that strategy: for instance, the text information is consoling language; the expression instruction is a sad, softened face; the voice instruction is a lower volume and gentler intonation; and the action instruction is stroking the interacting party's head. The robot is then controlled, based on the expression instruction, voice instruction and/or action instruction, to output the text information.
This brings the human-machine interaction closer to the interacting party's current actual situation, thereby improving the user experience.
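The consoling example above amounts to expanding a strategy into a coordinated instruction bundle and driving the robot's outputs together. A hypothetical sketch, where all instruction names and the robot stub are invented for illustration:

```python
# Hypothetical mapping from a target dialogue strategy to the multimodal
# instruction bundle (text, expression, voice, action).
INSTRUCTION_BUNDLES = {
    "console": {
        "text": "There, there. It will be all right.",
        "expression": "soft_sad_face",
        "voice": {"volume": "low", "intonation": "gentle"},
        "action": "stroke_head",
    },
}

class RobotStub:
    """Stand-in for the robot's actuators; it just records what it was told."""
    def __init__(self):
        self.log = []
    def set_expression(self, e):
        self.log.append(("expression", e))
    def set_voice(self, **kw):
        self.log.append(("voice", kw))
    def perform(self, a):
        self.log.append(("action", a))
    def say(self, t):
        self.log.append(("say", t))

def execute_strategy(strategy, robot):
    """Coordinate expression, voice, and action while the robot speaks the text."""
    bundle = INSTRUCTION_BUNDLES.get(strategy)
    if bundle is None:
        return None
    robot.set_expression(bundle["expression"])
    robot.set_voice(**bundle["voice"])
    robot.perform(bundle["action"])
    robot.say(bundle["text"])
    return bundle
```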
Existing technology can only convert the interacting party's voice into text and cannot recognize further information, so in a dialogue the robot can only generate replies according to a single parameter or model. In addition, existing robot dialogue is generally based on speech synthesis alone and lacks intonation changes for different inputs; the embodiments of the present invention combine voice, intonation, expression, and action to improve the interaction experience. In other words, the embodiments of the present invention realize a human-machine interaction method with multi-modal input and multi-modal output, so that the machine's dialogue is no longer monotonous.
In summary, the embodiments of the present invention take the current scene and the state of the person being faced (voice data, image data, and scene data) as references, obtain the expression, action, and other multi-modal output the robot needs to exhibit under the current scene, and realize multi-modal human-machine interaction. That is, the embodiments of the present invention propose multi-modal input that comprehensively utilizes technologies such as speech recognition, emotion identification, face recognition, and scene recognition, so that the robot's voice, coordinated with expression and action, constitutes multi-modal output, thereby improving the dialogue system experience. The realization process of the embodiments of the present invention can be divided into two parts:
1. Offline process
The offline process is the aforementioned process of collecting data and training. The embodiments of the present invention statistically analyze, according to information such as the character feature attributes and dialogue content in each scene, the expressions, actions, etc. generated by different types of persons under different scenes, and establish the scene feature models. In addition, the dialogue strategy corresponding to each scene and character feature attribute is configured: for example, under a certain scene, the actions, expressions, intonation, etc. corresponding to the current chat content; or, for an elderly person or a child under a certain scene, the actions, expressions, intonation, etc. corresponding to the current chat content.
2. Online process
According to the current scene data, together with the interacting party's chat content, image data, and so on, the present situation is judged and the scene feature model is determined; the interacting party's feature data under that scene is determined based on the scene feature model and the target dialogue strategy is determined; then, based on the target dialogue strategy, the expressions, actions, etc. matching the current scene and chat content are obtained, realizing the robot's multi-modal output.
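The online process described above chains the earlier steps together. A minimal sketch with pluggable hooks, where each callable stands in for one of the patent's modules and is purely illustrative:

```python
def online_interaction(scene_data, voice_data, image_data, *,
                       select_model, extract_features, infer_attribute,
                       determine_strategy, render_output):
    """Online pipeline: scene data -> scene feature model -> target character
    feature attribute -> target dialogue strategy -> multimodal output.
    """
    model = select_model(scene_data)                    # offline-trained model
    features = extract_features(voice_data, image_data) # intonation/voiceprint/expression/action
    attribute = infer_attribute(model, features)        # target character feature attribute
    strategy = determine_strategy(attribute, scene_data)
    return render_output(strategy)                      # text + expression + voice + action
```

Keeping each stage a separate hook mirrors the module split of the device embodiment (Fig. 3): each module can be swapped without touching the pipeline.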
It should be noted that the method embodiments are, for simplicity of description, expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described action sequence, because according to the embodiments of the present invention certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 3, a structural block diagram of a device embodiment of the human-machine dialogue interaction of the present invention is shown, which may specifically include the following modules:
an interacting-party data acquisition module 301, for obtaining the voice data, image data, and scene data of an interacting party;
a scene feature model acquisition module 302, for obtaining a corresponding scene feature model according to the scene data;
a character feature attribute acquisition module 303, for inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
a target dialogue strategy determination module 304, for determining a target dialogue strategy using the target character feature attribute and the scene data;
a human-machine interaction dialogue module 305, for controlling the expression, voice and/or action output of the robot based on the target dialogue strategy.
In a preferred embodiment of the present invention, the interacting-party data acquisition module 301 may include:
a first interacting-party data acquisition submodule, for collecting the interacting party's voice data via a microphone, collecting the interacting party's image data via a camera, and collecting the scene data via sensors.
In a preferred embodiment of the present invention, the interacting-party data acquisition module 301 may include:
an interactive interface display submodule, for displaying an interactive interface;
a second interacting-party data acquisition submodule, for prompting the interacting party, based on the interactive interface, to input voice data, image data, and scene data.
In a preferred embodiment of the present invention, the scene feature model acquisition module 302 may include:
a scene feature attribute extraction submodule, for extracting a scene feature attribute from the scene data;
a scene feature model determination submodule, for obtaining the scene feature model corresponding to the scene feature attribute.
In a preferred embodiment of the present invention, the character feature attribute acquisition module 303 may include:
a first feature data extraction submodule, for extracting intonation feature data and voiceprint feature data from the voice data;
a second feature data extraction submodule, for extracting expression feature data and action feature data from the image data;
a target character feature attribute acquisition submodule, for obtaining the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data and/or action feature data, in combination with the scene feature model.
In a preferred embodiment of the present invention, the target dialogue strategy is provided with corresponding text, expressions, and actions, and the human-machine interaction dialogue module 305 may include:
an instruction acquisition submodule, for obtaining text information, an expression instruction, a voice instruction and/or an action instruction corresponding to the target dialogue strategy;
an instruction execution submodule, for controlling the robot to output the text information based on the expression instruction, voice instruction and/or action instruction.
As for the device embodiments, since they are basically similar to the method embodiments, their description is relatively simple; for relevant parts, refer to the corresponding description of the method embodiments.
Fig. 4 is a block diagram of a device 500 for human-machine interaction according to an exemplary embodiment. For example, the device 500 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, etc.
Referring to Fig. 4, the device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 typically controls the overall operation of the device 500, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 502 may include one or more processors 520 to execute instructions, so as to perform all or part of the steps of the method described above. In addition, the processing component 502 may include one or more modules to facilitate interaction between the processing component 502 and other components; for example, the processing component 502 may include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation of the device 500. Examples of such data include instructions for any application or method operated on the device 500, contact data, phonebook data, messages, pictures, videos, etc. The memory 504 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 506 provides power to the various components of the device 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.
The multimedia component 508 includes a screen providing an output interface between the device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. When the device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC) configured to receive external audio signals when the device 500 is in an operation mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 514 includes one or more sensors for providing status assessments of various aspects of the device 500. For example, the sensor component 514 can detect the open/closed state of the device 500 and the relative positioning of components (for example, the display and keypad of the device 500); the sensor component 514 can also detect a change in position of the device 500 or of a component of the device 500, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in temperature of the device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an accelerometer, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the device 500 and other devices. The device 500 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 504 including instructions, which can be executed by the processor 520 of the device 500 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Fig. 5 is a block diagram of a device for human-machine dialogue interaction acting as a server, according to an exemplary embodiment. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (such as one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient storage or persistent storage. The programs stored on the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a method of human-machine dialogue interaction, the method including:
obtaining the voice data, image data, and scene data of an interacting party;
obtaining a corresponding scene feature model according to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the expression, voice and/or action output of a robot based on the target dialogue strategy.
Optionally, the step of obtaining the voice data, image data, and scene data of the interacting party includes:
collecting the voice data of the interacting party via a microphone;
collecting the image data of the interacting party via a camera;
and collecting the scene data via sensors.
Optionally, the step of obtaining the voice data, image data, and scene data of the interacting party includes:
displaying an interactive interface;
prompting the interacting party, based on the interactive interface, to input voice data, image data, and scene data.
Optionally, the step of obtaining a corresponding scene feature model according to the scene data includes:
extracting a scene feature attribute from the scene data;
obtaining the scene feature model corresponding to the scene feature attribute.
Optionally, the scene feature model is trained in the following way:
obtaining training samples under each scene feature and the character feature attribute corresponding to each training sample, where a training sample includes training voice data and training image data;
extracting training intonation feature data and training voiceprint feature data from the training voice data;
extracting training expression feature data and training action feature data from the training image data;
and training each scene feature model using the intonation feature data, voiceprint feature data, expression feature data and/or action feature data contained in the training samples under each scene feature, together with the corresponding character feature attributes.
Optionally, inputting the voice data and image data into the scene feature model to obtain the target character feature attribute includes:
extracting intonation feature data and voiceprint feature data from the voice data;
extracting expression feature data and action feature data from the image data;
and obtaining the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data and/or action feature data, in combination with the scene feature model.
Optionally, controlling the expression, voice and/or action output of the robot based on the target dialogue strategy includes:
obtaining text information, an expression instruction, a voice instruction and/or an action instruction corresponding to the target dialogue strategy;
and controlling the robot to output the text information based on the expression instruction, voice instruction and/or action instruction.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the present invention. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be considered exemplary only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A method of human-machine dialogue interaction, characterized by comprising:
obtaining the voice data, image data, and scene data of an interacting party;
obtaining a corresponding scene feature model according to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the expression, voice and/or action output of a robot based on the target dialogue strategy.
2. The method according to claim 1, characterized in that the step of obtaining the voice data, image data, and scene data of the interacting party comprises:
collecting the voice data of the interacting party via a microphone;
collecting the image data of the interacting party via a camera;
and collecting the scene data via sensors.
3. The method according to claim 2, characterized in that the step of obtaining the voice data, image data, and scene data of the interacting party comprises:
displaying an interactive interface;
prompting the interacting party, based on the interactive interface, to input voice data, image data, and scene data.
4. The method according to claim 1, characterized in that the step of obtaining a corresponding scene feature model according to the scene data comprises:
extracting a scene feature attribute from the scene data;
obtaining the scene feature model corresponding to the scene feature attribute.
5. The method according to claim 1, wherein the scene characteristic model is trained as follows:
obtaining, for each scene characteristic model, training samples and the person characteristic attribute corresponding to each training sample, each training sample comprising training voice data and training image data;
extracting training intonation characteristic data and training wording characteristic data from the training voice data;
extracting training expression characteristic data and training action characteristic data from the training image data; and
training each scene characteristic model using the training intonation characteristic data, training wording characteristic data, training expression characteristic data, and/or training action characteristic data of the training samples under each scene characteristic, together with the corresponding person characteristic attributes.
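The training procedure of claim 5 can be sketched in code. The sketch below is an illustrative assumption, not the patent's disclosed implementation: it trains one model per scene by averaging the concatenated intonation, wording, expression, and action feature vectors for each person characteristic attribute (a simple nearest-centroid learner), and all function and variable names are hypothetical.

```python
# Illustrative sketch of claim 5's training step (hypothetical names):
# one model per scene, mapping multimodal feature vectors to a person
# characteristic attribute via per-attribute centroids.
from collections import defaultdict

def train_scene_models(samples):
    """samples: iterable of (scene, feature_vector, person_attribute).
    feature_vector concatenates intonation, wording, expression, and
    action characteristic data. Returns {scene: {attribute: centroid}}."""
    sums = defaultdict(dict)
    counts = defaultdict(lambda: defaultdict(int))
    for scene, vec, attr in samples:
        if attr not in sums[scene]:
            sums[scene][attr] = list(vec)
        else:
            sums[scene][attr] = [a + b for a, b in zip(sums[scene][attr], vec)]
        counts[scene][attr] += 1
    # Average the accumulated vectors to get one centroid per attribute.
    return {scene: {attr: [x / counts[scene][attr] for x in total]
                    for attr, total in attrs.items()}
            for scene, attrs in sums.items()}
```

For example, two "cheerful" training vectors [1.0, 0.0] and [3.0, 2.0] under a "home" scene average to the centroid [2.0, 1.0] stored in that scene's model.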
6. The method according to claim 5, wherein inputting the voice data and image data into the scene characteristic model to obtain the target person characteristic attribute comprises:
extracting intonation characteristic data and wording characteristic data from the voice data;
extracting expression characteristic data and action characteristic data from the image data; and
obtaining the target person characteristic attribute based on the intonation characteristic data, wording characteristic data, expression characteristic data, and/or action characteristic data, in combination with the scene characteristic model.
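The inference step of claim 6 can be sketched as follows, assuming the scene characteristic model stores one representative vector (centroid) per person characteristic attribute; the nearest-centroid lookup and all names are illustrative assumptions, not the patent's actual method.

```python
# Illustrative sketch of claim 6's inference step (hypothetical names):
# concatenate the four kinds of characteristic data and return the
# attribute whose stored centroid is nearest.
import math

def predict_person_attribute(scene_model, intonation, wording, expression, action):
    """scene_model: {attribute: centroid}; the four arguments are the
    characteristic-data vectors extracted from the voice and image data."""
    vec = intonation + wording + expression + action
    best_attr, best_dist = None, math.inf
    for attr, centroid in scene_model.items():
        d = math.dist(vec, centroid)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_attr, best_dist = attr, d
    return best_attr
```

A dominant intonation feature close to the "cheerful" centroid would thus yield "cheerful" as the target person characteristic attribute.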
7. The method according to any one of claims 1-6, wherein controlling the expression, voice, and/or action output of the robot based on the target dialogue strategy comprises:
obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy; and
controlling the robot to output the text information based on the expression instruction, voice instruction, and/or action instruction.
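Claim 7's output control can be sketched as a lookup from the dialogue strategy to coordinated expression, voice, and action instructions. The strategy table, instruction names, and `RobotStub` below are hypothetical placeholders for a real text-to-speech and motor-control backend, not the patent's implementation.

```python
# Illustrative sketch of claim 7's output step (hypothetical names):
# look up the instructions tied to a dialogue strategy and drive the
# robot's expression, voice, and action channels before speaking.
STRATEGY_TABLE = {
    "comfort": {"text": "Take it easy, I'm here.",
                "expression": "smile", "voice": "soft", "action": "nod"},
}

class RobotStub:
    """Records calls in place of real expression/TTS/motor controllers."""
    def __init__(self):
        self.log = []

    def set_expression(self, expression):
        self.log.append(("expression", expression))

    def set_voice_style(self, style):
        self.log.append(("voice", style))

    def perform_action(self, action):
        self.log.append(("action", action))

    def say(self, text):
        self.log.append(("say", text))

def execute_strategy(robot, strategy):
    inst = STRATEGY_TABLE[strategy]
    robot.set_expression(inst["expression"])
    robot.set_voice_style(inst["voice"])
    robot.perform_action(inst["action"])
    robot.say(inst["text"])  # output the text information last
    return inst
```

Swapping `RobotStub` for real output channels keeps the claim's structure: one strategy drives text, expression, voice, and action together.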
8. A human-computer dialogue interaction device, comprising:
an interacting-party data acquisition module, configured to obtain voice data, image data, and contextual data of an interacting party;
a scene characteristic model acquisition module, configured to obtain a corresponding scene characteristic model according to the contextual data;
a person characteristic attribute acquisition module, configured to input the voice data and image data into the scene characteristic model to obtain a target person characteristic attribute;
a target dialogue strategy determination module, configured to determine a target dialogue strategy using the target person characteristic attribute and the contextual data; and
a human-computer dialogue module, configured to control the expression, voice, and/or action output of a robot based on the target dialogue strategy.
9. The device according to claim 8, wherein the interacting-party data acquisition module comprises:
a first interacting-party data acquisition submodule, configured to collect the voice data of the interacting party via a microphone, collect the image data of the interacting party via a camera, and collect the contextual data via a sensor.
10. A device for human-computer dialogue interaction, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
obtaining voice data, image data, and contextual data of an interacting party;
obtaining a corresponding scene characteristic model according to the contextual data;
inputting the voice data and image data into the scene characteristic model to obtain a target person characteristic attribute;
determining a target dialogue strategy using the target person characteristic attribute and the contextual data; and
controlling the expression, voice, and/or action output of a robot based on the target dialogue strategy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710056801.3A CN108363706B (en) | 2017-01-25 | 2017-01-25 | Method and device for man-machine dialogue interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363706A true CN108363706A (en) | 2018-08-03 |
CN108363706B CN108363706B (en) | 2023-07-18 |
Family
ID=63011370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710056801.3A Active CN108363706B (en) | 2017-01-25 | 2017-01-25 | Method and device for man-machine dialogue interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363706B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140359536A1 (en) * | 2013-06-03 | 2014-12-04 | Amchael Visual Technology Corporation | Three-dimensional (3d) human-computer interaction system using computer mouse as a 3d pointing device and an operation method thereof |
CN105280183A (en) * | 2015-09-10 | 2016-01-27 | 百度在线网络技术(北京)有限公司 | Voice interaction method and system |
CN105913039A (en) * | 2016-04-26 | 2016-08-31 | 北京光年无限科技有限公司 | Visual-and-vocal sense based dialogue data interactive processing method and apparatus |
CN106228983A (en) * | 2016-08-23 | 2016-12-14 | 北京谛听机器人科技有限公司 | Scene process method and system during a kind of man-machine natural language is mutual |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101663A (en) * | 2018-09-18 | 2018-12-28 | 宁波众鑫网络科技股份有限公司 | A kind of robot conversational system Internet-based |
CN108942949A (en) * | 2018-09-26 | 2018-12-07 | 北京子歌人工智能科技有限公司 | A kind of robot control method based on artificial intelligence, system and intelligent robot |
JP2020064616A (en) * | 2018-10-18 | 2020-04-23 | 深▲せん▼前海達闥云端智能科技有限公司Cloudminds (Shenzhen) Robotics Systems Co.,Ltd. | Virtual robot interaction method, device, storage medium, and electronic device |
CN109451188A (en) * | 2018-11-29 | 2019-03-08 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium of the self-service response of otherness |
CN109784157A (en) * | 2018-12-11 | 2019-05-21 | 口碑(上海)信息技术有限公司 | A kind of image processing method, apparatus and system |
CN111435268A (en) * | 2019-01-11 | 2020-07-21 | 合肥虹慧达科技有限公司 | Human-computer interaction method based on image recognition and reconstruction and system and device using same |
CN110008321A (en) * | 2019-03-07 | 2019-07-12 | 腾讯科技(深圳)有限公司 | Information interacting method and device, storage medium and electronic device |
CN110008321B (en) * | 2019-03-07 | 2021-06-25 | 腾讯科技(深圳)有限公司 | Information interaction method and device, storage medium and electronic device |
CN109801632A (en) * | 2019-03-08 | 2019-05-24 | 北京马尔马拉科技有限公司 | A kind of artificial intelligent voice robot system and method based on big data |
CN110085225B (en) * | 2019-04-24 | 2024-01-02 | 北京百度网讯科技有限公司 | Voice interaction method and device, intelligent robot and computer readable storage medium |
CN110085225A (en) * | 2019-04-24 | 2019-08-02 | 北京百度网讯科技有限公司 | Voice interactive method, device, intelligent robot and computer readable storage medium |
CN110125932B (en) * | 2019-05-06 | 2024-03-19 | 达闼科技(北京)有限公司 | Dialogue interaction method for robot, robot and readable storage medium |
CN110125932A (en) * | 2019-05-06 | 2019-08-16 | 达闼科技(北京)有限公司 | A kind of dialogue exchange method, robot and the readable storage medium storing program for executing of robot |
CN110188220A (en) * | 2019-05-17 | 2019-08-30 | 北京小米移动软件有限公司 | Image presentation method, device and smart machine |
CN110209792A (en) * | 2019-06-13 | 2019-09-06 | 苏州思必驰信息科技有限公司 | Talk with painted eggshell generation method and system |
CN110209792B (en) * | 2019-06-13 | 2021-07-06 | 思必驰科技股份有限公司 | Method and system for generating dialogue color eggs |
CN110347247A (en) * | 2019-06-19 | 2019-10-18 | 深圳前海达闼云端智能科技有限公司 | Man-machine interaction method, device, storage medium and electronic equipment |
CN112232101A (en) * | 2019-07-15 | 2021-01-15 | 北京正和思齐数据科技有限公司 | User communication state evaluation method, device and system |
CN110689078A (en) * | 2019-09-29 | 2020-01-14 | 浙江连信科技有限公司 | Man-machine interaction method and device based on personality classification model and computer equipment |
CN112918381B (en) * | 2019-12-06 | 2023-10-27 | 广州汽车集团股份有限公司 | Vehicle-mounted robot welcome method, device and system |
CN112918381A (en) * | 2019-12-06 | 2021-06-08 | 广州汽车集团股份有限公司 | Method, device and system for welcoming and delivering guests by vehicle-mounted robot |
WO2021174757A1 (en) * | 2020-03-03 | 2021-09-10 | 深圳壹账通智能科技有限公司 | Method and apparatus for recognizing emotion in voice, electronic device and computer-readable storage medium |
CN111540358A (en) * | 2020-04-26 | 2020-08-14 | 云知声智能科技股份有限公司 | Man-machine interaction method, device, equipment and storage medium |
CN111951787A (en) * | 2020-07-31 | 2020-11-17 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
CN114356068A (en) * | 2020-09-28 | 2022-04-15 | 北京搜狗智能科技有限公司 | Data processing method and device and electronic equipment |
CN114356068B (en) * | 2020-09-28 | 2023-08-25 | 北京搜狗智能科技有限公司 | Data processing method and device and electronic equipment |
CN112240458A (en) * | 2020-10-14 | 2021-01-19 | 上海宝钿科技产业发展有限公司 | Quality control method for multi-modal scene specific target recognition model |
CN114566145A (en) * | 2022-03-04 | 2022-05-31 | 河南云迹智能技术有限公司 | Data interaction method, system and medium |
WO2023246163A1 (en) * | 2022-06-22 | 2023-12-28 | 海信视像科技股份有限公司 | Virtual digital human driving method, apparatus, device, and medium |
CN116880697A (en) * | 2023-07-31 | 2023-10-13 | 深圳市麦驰安防技术有限公司 | Man-machine interaction method and system based on scene object |
CN116880697B (en) * | 2023-07-31 | 2024-04-05 | 深圳市麦驰安防技术有限公司 | Man-machine interaction method and system based on scene object |
Also Published As
Publication number | Publication date |
---|---|
CN108363706B (en) | 2023-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363706A (en) | Method and apparatus for human-computer dialogue interaction, and device for human-computer dialogue interaction | |
US11241789B2 (en) | Data processing method for care-giving robot and apparatus | |
US11017779B2 (en) | System and method for speech understanding via integrated audio and visual based speech recognition | |
US20190371318A1 (en) | System and method for adaptive detection of spoken language via multiple speech models | |
JP2019164345A (en) | System for processing sound data, user terminal and method for controlling the system | |
US11017551B2 (en) | System and method for identifying a point of interest based on intersecting visual trajectories | |
US11200902B2 (en) | System and method for disambiguating a source of sound based on detected lip movement | |
CN106502382B (en) | Active interaction method and system for intelligent robot | |
US10785489B2 (en) | System and method for visual rendering based on sparse samples with predicted motion | |
CN107221330A (en) | Punctuation adding method and device, and device for adding punctuation | |
US11308312B2 (en) | System and method for reconstructing unoccupied 3D space | |
US20190251350A1 (en) | System and method for inferring scenes based on visual context-free grammar model | |
CN110598576A (en) | Sign language interaction method and device and computer medium | |
CN107274903A (en) | Text processing method and device, and device for text processing | |
CN108073572A (en) | Information processing method and device, and simultaneous interpretation system | |
CN108628819A (en) | Processing method and apparatus, and device for processing | |
CN111149172B (en) | Emotion management method, device and computer-readable storage medium | |
CN108648754A (en) | Sound control method and device | |
WO2023231211A1 (en) | Voice recognition method and apparatus, electronic device, storage medium, and product | |
CN109102812B (en) | Voiceprint recognition method and system and electronic equipment | |
CN113270087A (en) | Processing method, mobile terminal and storage medium | |
EP3288035B1 (en) | Personal audio analytics and behavior modification feedback | |
WO2020087534A1 (en) | Generating response in conversation | |
EP4350690A1 (en) | Artificial intelligence device and operating method thereof | |
CN109102810A (en) | Voiceprint recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||