CN108363706A - Method and apparatus for human-computer dialogue interaction, and device for human-computer dialogue interaction - Google Patents

Method and apparatus for human-computer dialogue interaction, and device for human-computer dialogue interaction

Info

Publication number
CN108363706A
CN108363706A (application CN201710056801.3A; granted as CN108363706B)
Authority
CN
China
Prior art keywords
data
interaction
characteristic
voice
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710056801.3A
Other languages
Chinese (zh)
Other versions
CN108363706B (en)
Inventor
赵海舟
许静芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710056801.3A
Publication of CN108363706A
Application granted
Publication of CN108363706B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 Pitch control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Manipulator (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present invention provides a method and apparatus for human-computer dialogue interaction, wherein the method includes: obtaining voice data, image data, and scene data of an interacting party; obtaining a corresponding scene feature model according to the scene data; inputting the voice data and image data into the scene feature model to obtain a target character feature attribute; determining a target dialogue strategy using the target character feature attribute and the scene data; and controlling the expression, voice, and/or action output of a robot based on the target dialogue strategy. With the embodiment of the present invention, during human-computer interaction the machine can, according to the target dialogue strategy, match the characteristics of the interacting party's current conversation and hold a humanized dialogue with the interacting party, thereby improving the interacting party's interactive experience.

Description

Method and apparatus for human-computer dialogue interaction, and device for human-computer dialogue interaction
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method of human-computer dialogue interaction, an apparatus of human-computer dialogue interaction, and a device for human-computer dialogue interaction.
Background art
Human-computer interaction is the process by which a person exchanges information with a machine. Information throughput is the most important index for measuring human-computer dialogue interaction. Human-computer interaction will follow the evolution path of person-to-person interaction: dialogue is the most efficient mode of interaction between people, and it will likewise become the most efficient mode of human-computer dialogue interaction.
Existing interactive systems can only convert the voice information of the interacting party into text information and cannot identify any further information, so that when replying to the interacting party the machine can only generate the reply from a single parameter or model. In addition, existing interactive systems generally reply with synthesized speech based on the voice information alone; the dialogue form is monotonous and the interacting party's experience is poor.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed in order to provide a method of human-computer dialogue interaction, and a corresponding apparatus of human-computer dialogue interaction, that overcome the above problems or at least partially solve them.
To solve the above problems, an embodiment of the invention discloses a method of human-computer dialogue interaction, including:
obtaining voice data, image data, and scene data of an interacting party;
obtaining a corresponding scene feature model according to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the expression, voice, and/or action output of a robot based on the target dialogue strategy.
Optionally, the step of obtaining the voice data, image data, and scene data of the interacting party includes:
collecting the voice data of the interacting party based on a microphone;
collecting the image data of the interacting party based on a camera;
and collecting the scene data based on a sensor.
Optionally, the step of obtaining the voice data, image data, and scene data of the interacting party includes:
displaying an interactive interface;
prompting, based on the interactive interface, the interacting party to input the voice data, image data, and scene data.
Optionally, the step of obtaining a corresponding scene feature model according to the scene data includes:
extracting a scene feature attribute from the scene data;
obtaining the scene feature model corresponding to the scene feature attribute.
Optionally, the scene feature model is trained in the following way:
obtaining training samples under each scene feature model and the character feature attribute corresponding to each training sample, the training samples including training voice data and training image data;
extracting training intonation feature data and training voiceprint feature data from the training voice data;
extracting training expression feature data and training action feature data from the training image data;
training each scene feature model using the training intonation feature data, training voiceprint feature data, training expression feature data, and/or training action feature data included in the training samples under each scene feature, together with the corresponding character feature attributes.
Optionally, inputting the voice data and image data into the scene feature model to obtain the target character feature attribute includes:
extracting intonation feature data and voiceprint feature data from the voice data;
extracting expression feature data and action feature data from the image data;
obtaining the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
Optionally, controlling the expression, voice, and/or action output of the robot based on the target dialogue strategy includes:
obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
controlling the robot to output the text information based on the expression instruction, voice instruction, and/or action instruction.
An embodiment of the invention also discloses an apparatus of human-computer dialogue interaction, including:
an interacting-party data obtaining module, configured to obtain voice data, image data, and scene data of an interacting party;
a scene feature model obtaining module, configured to obtain a corresponding scene feature model according to the scene data;
a character feature attribute obtaining module, configured to input the voice data and image data into the scene feature model to obtain a target character feature attribute;
a target dialogue strategy determining module, configured to determine a target dialogue strategy using the target character feature attribute and the scene data;
a human-computer dialogue module, configured to control the expression, voice, and/or action output of a robot based on the target dialogue strategy.
Optionally, the interacting-party data obtaining module includes:
a first interacting-party data collecting submodule, configured to collect the voice data of the interacting party based on a microphone, collect the image data of the interacting party based on a camera, and collect the scene data based on a sensor.
Optionally, the interacting-party data obtaining module includes:
an interactive interface displaying submodule, configured to display an interactive interface;
a second interacting-party data collecting submodule, configured to prompt, based on the interactive interface, the interacting party to input the voice data, image data, and scene data.
Optionally, the scene feature model obtaining module includes:
a scene feature attribute extracting submodule, configured to extract a scene feature attribute from the scene data;
a scene feature model determining submodule, configured to obtain the scene feature model corresponding to the scene feature attribute.
Optionally, the apparatus further includes:
a training sample obtaining module, configured to obtain training samples under each scene feature model and the character feature attribute corresponding to each training sample, the training samples including training voice data and training image data;
a first training feature data extracting module, configured to extract training intonation feature data and training voiceprint feature data from the training voice data;
a second training feature data extracting module, configured to extract training expression feature data and training action feature data from the training image data;
a scene feature model training module, configured to train each scene feature model using the training intonation feature data, training voiceprint feature data, training expression feature data, and/or training action feature data included in the training samples under each scene feature, together with the corresponding character feature attributes.
Optionally, the character feature attribute obtaining module includes:
a first feature data extracting submodule, configured to extract intonation feature data and voiceprint feature data from the voice data;
a second feature data extracting submodule, configured to extract expression feature data and action feature data from the image data;
a target character feature attribute obtaining submodule, configured to obtain the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
Optionally, the target dialogue strategy is provided with corresponding text, expressions, and actions, and the human-computer dialogue module includes:
an instruction obtaining submodule, configured to obtain text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
an instruction executing submodule, configured to control the robot to output the text information based on the expression instruction, voice instruction, and/or action instruction.
An embodiment of the present invention also provides a device for human-computer dialogue interaction, including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for the following operations:
obtaining voice data, image data, and scene data of an interacting party;
obtaining a corresponding scene feature model according to the scene data;
inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
determining a target dialogue strategy using the target character feature attribute and the scene data;
controlling the expression, voice, and/or action output of a robot based on the target dialogue strategy.
Embodiments of the present invention include the following advantages:
During human-computer interaction, the embodiment of the present invention obtains the voice data, image data, and scene data of the interacting party, obtains, based on the scene data, the scene feature model matching the current conversation scene, then inputs the voice data and/or image data of the interacting party into the scene feature model to obtain a character feature attribute, and formulates a corresponding target dialogue strategy based on the character feature attribute, so that during human-computer interaction the machine can, according to the target dialogue strategy, match the characteristics of the interacting party's current conversation and hold a humanized dialogue with the interacting party.
The embodiment of the present invention can select the scene feature model for the corresponding scene according to the collected scene data, and determine, according to that scene feature model, the character feature attribute that matches the voice features and/or image features of the current interacting party. The character feature attribute can reflect the speaker's intention and mood, and for the same character feature attribute the speaker's voice features and image features can differ slightly between scenes. Therefore, determining the character feature attribute with the scene feature model corresponding to the current scene data makes the expression of the speaker's intention and emotion more accurate. The embodiment of the present invention can further formulate, according to the character feature attribute, the target dialogue strategy for interacting with the interacting party, which makes the interaction between the machine and the interacting party more personalized and better serves the interacting party.
Description of the drawings
Fig. 1 is a flowchart of the steps of Embodiment 1 of a method of human-computer dialogue interaction of the present invention;
Fig. 2 is a flowchart of the steps of Embodiment 2 of a method of human-computer dialogue interaction of the present invention;
Fig. 3 is a structural block diagram of an embodiment of an apparatus of human-computer dialogue interaction of the present invention;
Fig. 4 is a block diagram of a device for human-computer interaction according to an exemplary embodiment;
Fig. 5 is a block diagram of a device for human-computer dialogue interaction acting as a server, according to an exemplary embodiment.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of Embodiment 1 of a method of human-computer dialogue interaction of the present invention is shown, which may specifically include the following steps:
Step 101, obtaining voice data, image data, and scene data of an interacting party;
While the interacting party converses with the robot, associated data of the interacting party is obtained in real time. The associated data may include the voice data and image data of the interacting party, as well as the scene data of the scene where the dialogue takes place, and so on. The associated data may be obtained by data collection devices or entered manually.
In an embodiment of the present invention, the above associated data may be collected actively. For example, once it is determined that the robot is currently in a human-computer interaction state, the data collection devices built into or externally connected to the robot automatically collect the voice data and/or image data of the current interaction object (i.e., the interacting party) as well as the scene data.
Specifically, step 101 may include the following sub-steps:
Sub-step S11, collecting the voice data of the interacting party based on a microphone; collecting the image data of the interacting party based on a camera; and collecting the scene data of the current dialogue based on a sensor.
A variety of data collection devices, both built in and externally connected, are installed on the robot that interacts with the interacting party. Different associated data of the interacting party can be collected by different data collection devices, such as the voice data when the interacting party speaks, the facial expression data of the interacting party, the gesture and action data of the interacting party, and the scene (environment) data of the scene the interacting party is currently in.
In a preferred embodiment of the invention, step 101 may include the following sub-steps:
Sub-step S12, displaying an interactive interface;
Sub-step S13, prompting, based on the interactive interface, the interacting party to input the voice data, image data, and scene data. Optionally, the embodiment of the present invention may also guide the interacting party to input the associated data by asking questions or through the interactive interface. Specifically, an interactive interface may be displayed to the interacting party, on which the interacting party is prompted item by item to input the corresponding data, for example, to speak into the microphone to input voice data and to face the camera to input image data, while the scene data is determined from the scene the interacting party is currently in. Further, the interacting party may also be prompted to input data such as age and gender information.
Of course, in practice, having the interacting party input data is less timely than the robot collecting it by itself, yields less data, and cannot keep up with changes in the interacting party; therefore, during the interaction, the data automatically collected by the machine should be primary and the data input by the interacting party supplementary.
It should be noted that the embodiment of the present invention may also collect human physiological data of the interacting party, such as heartbeat, breathing, digestion, and body temperature, through a wearable device; based on these data, the robot can perform analysis and recognition processing to judge the emotional characteristics of the interacting party, as in the sketch below.
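By way of a non-limiting illustration only (the embodiment does not prescribe a concrete algorithm for this analysis), such a judgment of emotional characteristics from wearable data could be sketched as a simple threshold rule; the field names and thresholds below are assumptions introduced purely for illustration:

def estimate_emotion(heart_rate_bpm: float, breath_rate_bpm: float) -> str:
    # Illustrative thresholds only; a real system would calibrate per user.
    if heart_rate_bpm > 100 and breath_rate_bpm > 20:
        return "excited"
    if heart_rate_bpm < 70 and breath_rate_bpm < 14:
        return "calm"
    return "neutral"

print(estimate_emotion(heart_rate_bpm=110, breath_rate_bpm=24))  # excited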
Step 102, obtaining a corresponding scene feature model according to the scene data;
In the embodiment of the present invention, it is considered that the character feature attributes of the interacting party differ between scenes, so the associated data collected from the interacting party also differs slightly. For example, a person may behave more cautiously indoors, more freely outdoors, and more flatly while driving. Therefore, the embodiment of the present invention sets a corresponding scene feature model according to the character feature attributes corresponding to different scenes. For example, an indoor feature model corresponding to the character feature attributes that match an indoor environment may be set for indoors, an outdoor feature model corresponding to the character feature attributes that match the outdoors may be set for outdoors, and a driving feature model corresponding to the character feature attributes that match driving may be set for driving.
In a preferred embodiment of the invention, step 102 may include the following sub-steps:
Sub-step S21, extracting a scene feature attribute from the scene data;
Sub-step S22, obtaining the scene feature model corresponding to the scene feature attribute.
The embodiment of the present invention can collect specific environment information of the scene where the human-computer dialogue occurs, i.e., the scene data, through sensors. Specifically, temperature and humidity can be identified by a temperature-humidity sensor, moving or stationary by a velocity sensor, daytime or night by a light sensor, indoors or outdoors by an environment sensor, and so on. Corresponding scene feature attributes are extracted from the data recognized by the sensors, and based on these scene feature attributes it can be simply analyzed whether the interacting party is indoors, driving, and so on. For example, the current speed of the interacting party can be extracted from the speed data identified by the velocity sensor; if the speed reaches a preset vehicle travelling speed, the interacting party can be considered to be in a driving scene.
Of course, in specific implementations of the embodiments of the present invention, other sensors or other means may also be used to identify the scene the interacting party is currently in, which is not limited by the embodiment of the present invention.
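A minimal sketch of the rule-based scene judgment described above, assuming illustrative sensor field names and thresholds (including the preset vehicle travelling speed rule); it is a sketch under those assumptions, not a prescribed implementation:

DRIVING_SPEED_KMH = 20.0  # assumed preset vehicle travelling speed

def extract_scene_attributes(sensors: dict) -> dict:
    # Map raw sensor readings to scene feature attributes.
    return {
        "moving": sensors["speed_kmh"] > 0.5,
        "driving": sensors["speed_kmh"] >= DRIVING_SPEED_KMH,
        "daytime": sensors["light_lux"] > 200.0,
        "indoor": not sensors["gps_fix"],  # crude indoor proxy, an assumption
    }

def select_scene(attributes: dict) -> str:
    # Pick which scene feature model to use for the current dialogue.
    if attributes["driving"]:
        return "driving"
    return "indoor" if attributes["indoor"] else "outdoor"

attrs = extract_scene_attributes({"speed_kmh": 45.0, "light_lux": 800.0, "gps_fix": True})
print(select_scene(attrs))  # driving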
In a preferred embodiment of the invention, the scene feature model can be trained in the following way:
obtaining training samples under each scene feature and the character feature attribute corresponding to each training sample, the training samples including training voice data and training image data;
extracting training intonation feature data and training voiceprint feature data from the training voice data;
extracting training expression feature data and training action feature data from the training image data;
training each scene feature model using the training intonation feature data, training voiceprint feature data, training expression feature data, and/or training action feature data included in the training samples under each scene feature, together with the corresponding character feature attributes.
The embodiment of the present invention can use, as training data, a mass of training samples collected in advance under each scene feature, together with the character feature attribute corresponding to each training sample. A training sample may include training voice data and training image data. For example, voice and images are collected from different persons under some scene, and a correspondence is established between the voice data and image data collected from a certain person and that person's character feature attributes, forming one item of training data. For example, under some scene, the voice data and image data of a child are obtained; the child's intonation feature data and voiceprint feature data are extracted from the voice data, and the child's facial expression feature data and gesture feature data are extracted from the image data; these data are taken as a training sample, associated with the child's character feature attributes and the current scene feature, and stored in a training sample database. On this basis, the training sample database holds a mass of training samples for different scenes.
Through deep learning, the scene feature model under each scene can be trained from the mass of training samples under each scene feature. The character feature attributes corresponding to the training samples may include an age attribute (child/youth/elder), a character attribute (optimistic/reserved/shy), a mood attribute (sad/calm/excited), and so on. After training is completed, voice data and image data can be classified accordingly.
A scene feature model can determine the corresponding character feature attribute based on the input voice data and image data.
In a concrete application of the present invention, a scene feature model for the interacting party's mood can be trained from the training intonation feature data; further, a scene feature model for age can be trained from the training feature data, and other data can be added to the training so that the models become more accurate.
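The description fixes no learning algorithm beyond "deep learning". As a hedged sketch under that reading, one classifier per scene could be trained on concatenated feature vectors roughly as follows; scikit-learn, the feature dimension, and the attribute labels are all assumptions for illustration:

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_scene_models(samples_by_scene):
    # samples_by_scene: {scene: (feature matrix [n, d], attribute labels [n])},
    # where each row concatenates intonation, voiceprint, expression and
    # action feature data extracted from one training sample.
    models = {}
    for scene, (X, y) in samples_by_scene.items():
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000)
        model.fit(X, y)
        models[scene] = model
    return models

rng = np.random.default_rng(0)  # synthetic stand-in for a training sample database
fake_db = {"indoor": (rng.normal(size=(60, 16)),
                      rng.choice(["child", "youth", "elder"], size=60))}
scene_models = train_scene_models(fake_db)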
Step 103, inputting the voice data and image data into the scene feature model to obtain a target character feature attribute.
Specifically, step 103 of the embodiment of the present invention may include:
extracting intonation feature data and voiceprint feature data from the voice data;
extracting expression feature data and action feature data from the image data;
obtaining the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
The collected associated data of the interacting party, e.g., the voice data and/or image data, is input into the corresponding scene feature model, and a character feature attribute that better matches the current scene and the corresponding features of the interacting party is obtained based on the scene feature model. Specifically, the character feature attribute may include basic features, mood features, character traits, and so on; the mood features may include excited, calm, and sad; the basic features may include elder, child, male, and female; and the character traits may include optimistic, active, reserved, shy, and so on.
It should be noted that, for the division and judgment of character feature attributes, one or more of the above may be set according to actual demand, or other character feature attributes may be added, which is not limited by the embodiment of the present invention.
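A hedged sketch of step 103, assuming a trained per-scene classifier of the kind sketched above; the feature extractors are placeholders standing in for real prosody, voiceprint, face, and gesture analysis:

import numpy as np

def extract_voice_features(voice_data) -> np.ndarray:
    # Placeholder: would compute intonation and voiceprint feature data.
    return np.asarray(voice_data, dtype=float)

def extract_image_features(image_data) -> np.ndarray:
    # Placeholder: would compute expression and action feature data.
    return np.asarray(image_data, dtype=float)

def infer_character_attribute(scene_model, voice_data, image_data) -> str:
    features = np.concatenate([extract_voice_features(voice_data),
                               extract_image_features(image_data)])
    return scene_model.predict(features.reshape(1, -1))[0]

class _StubModel:  # stands in for a trained scene feature model
    def predict(self, X):
        return ["sad"]

print(infer_character_attribute(_StubModel(), [0.2] * 8, [0.1] * 8))  # sad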
Step 104, determining a target dialogue strategy using the target character feature attribute and the scene data;
The embodiment of the present invention sets different dialogue strategies based on different character feature attributes and scene features. For example, if the current scene is identified as having a rather gloomy atmosphere and the mood of the interacting party is sad, a comforting dialogue strategy can be used in the dialogue.
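As an illustrative sketch (the mapping itself is configuration, not prescribed by the embodiment), the strategy determination of step 104 can be read as a lookup keyed on the scene and the character feature attribute; all table entries are assumptions:

STRATEGY_TABLE = {
    ("indoor", "sad"):     "comfort",
    ("indoor", "excited"): "share_joy",
    ("driving", "calm"):   "concise_assist",
}

def determine_target_strategy(scene: str, attribute: str) -> str:
    # Fall back to a neutral strategy for unlisted combinations.
    return STRATEGY_TABLE.get((scene, attribute), "neutral_chat")

print(determine_target_strategy("indoor", "sad"))  # comfort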
Step 105, controlling the expression, voice, and/or action output of the robot based on the target dialogue strategy.
During the human-computer dialogue, the expression, voice, and/or action output of the robot can be controlled based on the target dialogue strategy so as to converse with the interacting party. The expression may be a facial expression, such as the performance features of the eyes and the face; the voice may be the intonation and the pitch or volume of the robot's voice output; and the action may be gestures, head movements, and movements of the robot's other limbs.
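A minimal sketch of how a target dialogue strategy could be expanded into coordinated expression, voice, and action output; the command vocabulary below is assumed for illustration, since any real robot exposes its own control API:

from dataclasses import dataclass

@dataclass
class OutputCommand:
    text: str          # text information to speak
    expression: str    # facial expression preset
    voice: dict        # synthesis settings such as pitch/volume/rate
    action: str        # gesture or limb movement

def render_strategy(strategy: str) -> OutputCommand:
    if strategy == "comfort":
        return OutputCommand(
            text="It will be all right; I am here with you.",
            expression="soft_gaze",
            voice={"pitch": "low", "volume": 0.4, "rate": "slow"},
            action="stroke_head",
        )
    return OutputCommand("Okay.", "neutral",
                         {"pitch": "mid", "volume": 0.6, "rate": "normal"}, "none")

print(render_strategy("comfort"))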
In the embodiment of the present invention, the voiceprint feature data in the voice data can be acquired through voiceprint recognition. Voiceprint recognition, also known as speaker recognition, comes in two kinds: speaker identification and speaker verification. The former judges which of several persons uttered a certain segment of speech and is a "choose one among many" problem, while the latter confirms whether a certain segment of speech was uttered by a specified person.
The embodiment of the present invention can use the voiceprint feature data as the identity of an interacting party. When it is recognized that two or more interacting parties are engaged in the human-computer dialogue, which interacting party produced given voice data can be judged based on the voiceprint feature data, and the target dialogue strategy for that interacting party is determined based on the voice data and/or image data produced by that party, so that the dialogue is held with that specific interacting party, instead of using the same dialogue strategy for all of the interacting parties. In this way, the individual demands of each interacting party can be met and the dialogue becomes more interesting.
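A hedged sketch of that per-speaker routing: each utterance is attributed to the nearest enrolled voiceprint by cosine similarity, and that speaker's own strategy is then used. The embedding vectors and the similarity threshold are assumptions; real voiceprint systems use dedicated speaker-embedding models:

import numpy as np

def identify_speaker(embedding: np.ndarray, enrolled: dict, threshold: float = 0.7) -> str:
    # enrolled: {speaker_id: voiceprint vector}; returns "unknown" below threshold.
    best_id, best_sim = "unknown", threshold
    for speaker_id, voiceprint in enrolled.items():
        sim = float(np.dot(embedding, voiceprint) /
                    (np.linalg.norm(embedding) * np.linalg.norm(voiceprint)))
        if sim >= best_sim:
            best_id, best_sim = speaker_id, sim
    return best_id

enrolled = {"party_a": np.array([1.0, 0.0]), "party_b": np.array([0.0, 1.0])}
print(identify_speaker(np.array([0.9, 0.1]), enrolled))  # party_a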
During human-computer interaction, the embodiment of the present invention collects the voice data, image data, and scene data of the interacting party, obtains, based on the scene data, the scene feature model matching the current scene, then inputs the voice data and image data of the interacting party into the scene feature model to obtain the character feature attribute that matches the current scene and the features of the interacting party, formulates a corresponding target dialogue strategy based on the character feature attribute, and controls the expression and action output of the robot, so that during human-computer interaction the robot can converse with the interacting party in a coordinated manner according to the target dialogue strategy.
The embodiment of the present invention can select the scene feature model for the corresponding scene to determine the character feature attribute the robot needs to exhibit under the current scene. The character feature attribute can reflect human character, intention, mood, and so on, and the voice data and image data produced by humans with different character feature attributes under different scenes can differ slightly. Therefore, determining, through the scene feature model of the corresponding scene, the character feature attribute the robot needs to show under the current scene makes the robot's humanized behavior more accurate. The embodiment of the present invention further controls the expression and action output of the robot according to the obtained target character feature attribute, which can make the interaction between machine and human more personalized and better serve the user.
Referring to Fig. 2, a flowchart of the steps of Embodiment 2 of a method of human-computer dialogue interaction of the present invention is shown, which may specifically include the following steps:
Step 201, obtaining voice data, image data, and scene data of an interacting party;
Step 202, obtaining a corresponding scene feature model according to the scene data;
Step 203, inputting the voice data and image data into the scene feature model to obtain a target character feature attribute;
Step 204, determining a target dialogue strategy using the target character feature attribute and the scene data;
Step 205, obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
Since the specific implementations of steps 201-205 in method Embodiment 2 essentially correspond to those of the foregoing method Embodiment 1, for anything not described in detail for steps 201-205 in this embodiment, reference may be made to the related description in the previous embodiment, which is not repeated here.
Step 206, controlling the robot to output the text information based on the expression instruction, voice instruction, and/or action instruction.
In the embodiment of the present invention, the target dialogue strategy is determined based on the character feature attribute, and corresponding text information, expression instructions, voice instructions, and action instructions are provided in the target dialogue strategy, so as to guide how the robot exhibits the character feature attribute under the current scene through certain dialogue, expressions, and actions.
For example, if the mood of the interacting party is identified as sad, the target dialogue strategy corresponding to the character feature attribute obtained from the model may be of a comforting type, and the corresponding text information, expression instruction, voice instruction, and/or action instruction are obtained for that strategy: for example, the text information is comforting language, the expression instruction is sad with a softened face, the voice instruction is low volume with gentle intonation, and the action instruction is stroking the interacting party's head; the robot is then controlled to output the text information based on the expression instruction, voice instruction, and/or action instruction, as in the sketch below.
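A minimal sketch of step 206 under these assumptions, dispatching each instruction to its own subsystem; the subsystem functions are illustrative stubs, not a real robot API:

def execute_instructions(command: dict) -> None:
    set_expression(command["expression"])   # expression instruction
    configure_tts(**command["voice"])       # voice instruction
    perform_action(command["action"])       # action instruction
    speak(command["text"])                  # output the text information

def set_expression(name): print(f"[face] {name}")
def configure_tts(pitch, volume, rate): print(f"[tts] pitch={pitch} volume={volume} rate={rate}")
def perform_action(name): print(f"[motion] {name}")
def speak(text): print(f"[say] {text}")

execute_instructions({
    "text": "There, there. I am with you.",
    "expression": "soft",
    "voice": {"pitch": "low", "volume": 0.4, "rate": "slow"},
    "action": "stroke_head",
})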
This can bring the human-computer interaction closer to the interacting party's current actual situation and improve the user experience.
Existing technology can only convert the interacting party's voice into text and cannot identify further information, so in dialogue the robot can only generate a reply from a single parameter or model.
In addition, existing robot dialogue is generally based on speech synthesis alone and lacks intonation changes for different inputs; the embodiment of the present invention combines voice, intonation, expression, and action to improve the interactive experience. In other words, the embodiment of the present invention realizes a human-computer interaction method of multimodal input and multimodal output, so that the machine's dialogue is no longer monotonous.
In summary, the embodiment of the present invention takes the current scene and the state of the person being faced (voice data, image data, and scene data) as references, obtains the expressions, actions, and other modalities the robot needs to exhibit under the current scene, and outputs them multimodally, realizing multi-modal human-computer interaction by the machine. That is, the embodiment of the present invention proposes multimodal input composed of technologies such as speech recognition, emotion recognition, face recognition, and scene recognition, and has the robot's voice cooperate with expressions and actions to compose multimodal output, thereby improving the dialogue system experience. The realization process of the embodiment of the present invention can be divided into two parts:
1. Offline process
The offline process is the data collection and training process mentioned above. According to information such as the character feature attributes and conversation content in each scene, the embodiment of the present invention statistically analyzes the expressions, actions, and so on generated by different types of persons under different scenes, and establishes the scene feature models. Moreover, the dialogue strategies corresponding to the character feature attributes are set in each scene: for example, under a certain scene, the actions, expressions, intonation, and so on corresponding to the current chat content; or, for an elder or a child under a certain scene, the actions, expressions, intonation, and so on corresponding to the current chat content.
2. Online process
According to the current scene data, together with the chat content and image data of the interacting party, the present situation is judged and the scene feature model is determined; the feature data of the interacting party under that scene is determined based on the scene feature model and the target dialogue strategy is determined; then, based on the target dialogue strategy, the expressions, actions, and so on that match the current scene and chat content are obtained, realizing the robot's multi-modal output.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described action sequence, because according to the embodiments of the present invention certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 3, a structural block diagram of an embodiment of an apparatus of human-computer dialogue interaction of the present invention is shown, which may specifically include the following modules:
an interacting-party data obtaining module 301, configured to obtain voice data, image data, and scene data of an interacting party;
a scene feature model obtaining module 302, configured to obtain a corresponding scene feature model according to the scene data;
a character feature attribute obtaining module 303, configured to input the voice data and image data into the scene feature model to obtain a target character feature attribute;
a target dialogue strategy determining module 304, configured to determine a target dialogue strategy using the target character feature attribute and the scene data;
a human-computer dialogue module 305, configured to control the expression, voice, and/or action output of a robot based on the target dialogue strategy.
In a preferred embodiment of the invention, the interacting-party data obtaining module 301 may include:
a first interacting-party data collecting submodule, configured to collect the voice data of the interacting party based on a microphone, collect the image data of the interacting party based on a camera, and collect the scene data based on a sensor.
In a preferred embodiment of the invention, the interacting-party data obtaining module 301 may include:
an interactive interface displaying submodule, configured to display an interactive interface;
a second interacting-party data collecting submodule, configured to prompt, based on the interactive interface, the interacting party to input the voice data, image data, and scene data.
In a preferred embodiment of the invention, the scene feature model obtaining module 302 may include:
a scene feature attribute extracting submodule, configured to extract a scene feature attribute from the scene data;
a scene feature model determining submodule, configured to obtain the scene feature model corresponding to the scene feature attribute.
In a preferred embodiment of the invention, the apparatus may also include:
a training sample obtaining module, configured to obtain training samples under each scene feature model and the character feature attribute corresponding to each training sample, the training samples including training voice data and training image data;
a first training feature data extracting module, configured to extract training intonation feature data and training voiceprint feature data from the training voice data;
a second training feature data extracting module, configured to extract training expression feature data and training action feature data from the training image data;
a scene feature model training module, configured to train each scene feature model using the training feature data and the corresponding character feature attributes.
In a preferred embodiment of the invention, the character feature attribute obtaining module 303 may include:
a first feature data extracting submodule, configured to extract intonation feature data and voiceprint feature data from the voice data;
a second feature data extracting submodule, configured to extract expression feature data and action feature data from the image data;
a target character feature attribute obtaining submodule, configured to obtain the target character feature attribute based on the intonation feature data, voiceprint feature data, expression feature data, and/or action feature data, in combination with the scene feature model.
In a preferred embodiment of the invention, the target dialogue strategy is provided with corresponding text, expressions, and actions, and the human-computer dialogue module 305 may include:
an instruction obtaining submodule, configured to obtain text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
an instruction executing submodule, configured to control the robot to output the text information based on the expression instruction, voice instruction, and/or action instruction.
As for the apparatus embodiment, since it is basically similar to the method embodiment, the description is relatively simple; for relevant points, refer to the corresponding parts of the description of the method embodiment.
Fig. 4 is a block diagram of a device 500 for human-computer interaction according to an exemplary embodiment. For example, the device 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 4, the device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls the overall operations of the device 500, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions so as to perform all or part of the steps of the above method. In addition, the processing component 502 may include one or more modules to facilitate interaction between the processing component 502 and other components. For example, the processing component 502 may include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support the operations of the device 500. Examples of such data include instructions for any application program or method operating on the device 500, contact data, phonebook data, messages, pictures, videos, and so on. The memory 504 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
The power component 506 provides power for the various components of the device 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.
The multimedia component 508 includes a screen providing an output interface between the device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. When the device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC); when the device 500 is in an operation mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 504 or sent via the communication component 516. In some embodiments, the audio component 510 also includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 514 includes one or more sensors for providing state assessments of various aspects of the device 500. For example, the sensor component 514 can detect the on/off state of the device 500 and the relative positioning of components (for example, the display and the keypad of the device 500), and can also detect a change in position of the device 500 or of a component of the device 500, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in temperature of the device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the device 500 and other devices. The device 500 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 504 including instructions, and the above instructions can be executed by the processor 520 of the device 500 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 5 is a block diagram of a device for human-computer dialogue interaction acting as a server, according to an exemplary embodiment. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient storage or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and the like.
A kind of non-transitorycomputer readable storage medium, when the instruction in the storage medium is by the processing of mobile terminal When device executes so that mobile terminal is able to carry out a kind of method of human-computer dialogue interaction, the method includes:
Obtain voice data, image data and the contextual data of interaction side;
Corresponding scene characteristic model is obtained according to the contextual data;
The voice data and image data are input to the scene characteristic model and obtain target person characteristic attribute;
Target dialogue strategy is determined using the target person characteristic attribute and contextual data;
Expression, voice based on target dialogue policy control robot and/or action output.
Optionally, the step of voice data for obtaining interaction side, image data and contextual data includes:
The voice data of the interaction side is acquired based on microphone;
The image data of the interaction side is acquired based on camera;
And the contextual data is acquired based on sensor.
Optionally, the step of voice data for obtaining interaction side, image data and contextual data includes:
Show interactive interface;
Based on interactive interface prompt interaction side's input voice data, image data and contextual data.
Optionally, described the step of obtaining corresponding scene characteristic model according to the contextual data, includes:
Scene characteristic attribute is extracted from the contextual data;
Obtain the corresponding scene characteristic model of the scene characteristic attribute.
Optionally, the scene characteristic model is trained in the following way (see the training sketch after this list):
obtaining training samples under each scene characteristic model and the character characteristic attribute corresponding to each training sample, where the training samples include training voice data and training image data;
extracting training intonation characteristic data and training wording characteristic data from the training voice data;
extracting training expression characteristic data and training action characteristic data from the training image data;
and training each scene characteristic model using the intonation characteristic data, wording characteristic data, expression characteristic data, and/or action characteristic data included in the training samples under each scene characteristic, together with the corresponding character characteristic attributes.
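A hedged training sketch using scikit-learn, assuming a toy feature extractor in place of the real intonation/wording/expression/action feature extraction; one classifier is fitted per scene characteristic attribute.

```python
# Per-scene training sketch; `toy_features` stands in for the real
# multi-modal feature extraction described above.
import numpy as np
from sklearn.linear_model import LogisticRegression

def toy_features(voice, image):
    # Hash the raw voice/image bytes into a fixed-length vector; a real
    # system would extract intonation and wording features from the
    # voice data and expression and action features from the image data.
    rng = np.random.default_rng(abs(hash((voice, image))) % (2 ** 32))
    return rng.random(8)

def train_scene_models(samples_by_scene):
    # samples_by_scene: {scene_attr: [(voice, image, character_attr), ...]}
    models = {}
    for scene_attr, samples in samples_by_scene.items():
        X = np.array([toy_features(v, i) for v, i, _ in samples])
        y = [attr for _, _, attr in samples]  # character characteristic labels
        # Fitting requires at least two distinct labels per scene.
        models[scene_attr] = LogisticRegression(max_iter=1000).fit(X, y)
    return models
```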
Optionally, the step of inputting the voice data and the image data into the scene characteristic model to obtain the target character characteristic attribute includes the following (see the inference sketch after this list):
extracting intonation characteristic data and wording characteristic data from the voice data;
extracting expression characteristic data and action characteristic data from the image data;
and obtaining the target character characteristic attribute based on the intonation characteristic data, wording characteristic data, expression characteristic data, and/or action characteristic data, in combination with the scene characteristic model.
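Continuing the training sketch above (and reusing its hypothetical `toy_features` and the models returned by `train_scene_models`), inference might look as follows.

```python
# Inference sketch: extract the same features and query the model that
# corresponds to the current scene characteristic attribute.
def predict_character_attribute(voice, image, scene_attr, models):
    features = toy_features(voice, image).reshape(1, -1)
    return models[scene_attr].predict(features)[0]
```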
Optionally, the step of controlling the expression, voice, and/or action output of the robot based on the target dialogue strategy includes the following (see the output-mapping sketch after this list):
obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
and controlling the robot to output the text information based on the expression instruction, the voice instruction, and/or the action instruction.
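A sketch of how a dialogue strategy might be mapped to concrete output instructions; the strategy names and the command table are hypothetical.

```python
# Strategy -> output-instruction mapping sketch (hypothetical table).
from dataclasses import dataclass

@dataclass
class OutputCommand:
    text: str        # text information to be output
    expression: str  # expression instruction, e.g. "smile"
    voice: str       # voice instruction, e.g. an intonation preset
    action: str      # action instruction, e.g. "wave"

STRATEGY_TABLE = {
    "greet":   OutputCommand("Hello!", "smile", "lively", "wave"),
    "comfort": OutputCommand("There, there.", "concerned", "soft", "nod"),
}

def render_strategy(strategy_name):
    # Fall back to a neutral command for unknown strategies.
    return STRATEGY_TABLE.get(
        strategy_name, OutputCommand("OK.", "neutral", "calm", "none"))
```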
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the invention is not limited to the exact constructions described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method of human-computer dialogue interaction, characterized by comprising:
obtaining voice data, image data, and contextual data of an interacting party;
obtaining a corresponding scene characteristic model according to the contextual data;
inputting the voice data and the image data into the scene characteristic model to obtain a target character characteristic attribute;
determining a target dialogue strategy using the target character characteristic attribute and the contextual data;
and controlling an expression, voice, and/or action output of a robot based on the target dialogue strategy.
2. The method according to claim 1, characterized in that the step of obtaining the voice data, image data, and contextual data of the interacting party comprises:
collecting the voice data of the interacting party based on a microphone;
collecting the image data of the interacting party based on a camera;
and collecting the contextual data based on a sensor.
3. The method according to claim 2, characterized in that the step of obtaining the voice data, image data, and contextual data of the interacting party comprises:
displaying an interactive interface;
and prompting, based on the interactive interface, the interacting party to input the voice data, image data, and contextual data.
4. The method according to claim 1, characterized in that the step of obtaining the corresponding scene characteristic model according to the contextual data comprises:
extracting a scene characteristic attribute from the contextual data;
and obtaining the scene characteristic model corresponding to the scene characteristic attribute.
5. The method according to claim 1, characterized in that the scene characteristic model is trained in the following way:
obtaining training samples under each scene characteristic model and the character characteristic attribute corresponding to each training sample, where the training samples include training voice data and training image data;
extracting training intonation characteristic data and training wording characteristic data from the training voice data;
extracting training expression characteristic data and training action characteristic data from the training image data;
and training each scene characteristic model using the intonation characteristic data, wording characteristic data, expression characteristic data, and/or action characteristic data included in the training samples under each scene characteristic, together with the corresponding character characteristic attributes.
6. The method according to claim 5, characterized in that the step of inputting the voice data and the image data into the scene characteristic model to obtain the target character characteristic attribute comprises:
extracting intonation characteristic data and wording characteristic data from the voice data;
extracting expression characteristic data and action characteristic data from the image data;
and obtaining the target character characteristic attribute based on the intonation characteristic data, wording characteristic data, expression characteristic data, and/or action characteristic data, in combination with the scene characteristic model.
7. The method according to any one of claims 1-6, characterized in that the step of controlling the expression, voice, and/or action output of the robot based on the target dialogue strategy comprises:
obtaining text information, an expression instruction, a voice instruction, and/or an action instruction corresponding to the target dialogue strategy;
and controlling the robot to output the text information based on the expression instruction, the voice instruction, and/or the action instruction.
8. A device for human-computer dialogue interaction, characterized by comprising:
an interacting-party data obtaining module, configured to obtain voice data, image data, and contextual data of an interacting party;
a scene characteristic model obtaining module, configured to obtain a corresponding scene characteristic model according to the contextual data;
a character characteristic attribute obtaining module, configured to input the voice data and the image data into the scene characteristic model to obtain a target character characteristic attribute;
a target dialogue strategy determining module, configured to determine a target dialogue strategy using the target character characteristic attribute and the contextual data;
and a human-computer dialogue interaction module, configured to control an expression, voice, and/or action output of a robot based on the target dialogue strategy.
9. The device according to claim 8, characterized in that the interacting-party data obtaining module comprises:
a first interacting-party data obtaining submodule, configured to collect the voice data of the interacting party based on a microphone, collect the image data of the interacting party based on a camera, and collect the contextual data based on a sensor.
10. A device for human-computer dialogue interaction, characterized by comprising a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
obtaining voice data, image data, and contextual data of an interacting party;
obtaining a corresponding scene characteristic model according to the contextual data;
inputting the voice data and the image data into the scene characteristic model to obtain a target character characteristic attribute;
determining a target dialogue strategy using the target character characteristic attribute and the contextual data;
and controlling an expression, voice, and/or action output of a robot based on the target dialogue strategy.
CN201710056801.3A 2017-01-25 2017-01-25 Method and device for man-machine dialogue interaction Active CN108363706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710056801.3A CN108363706B (en) 2017-01-25 2017-01-25 Method and device for man-machine dialogue interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710056801.3A CN108363706B (en) 2017-01-25 2017-01-25 Method and device for man-machine dialogue interaction

Publications (2)

Publication Number Publication Date
CN108363706A true CN108363706A (en) 2018-08-03
CN108363706B CN108363706B (en) 2023-07-18

Family

ID=63011370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710056801.3A Active CN108363706B (en) 2017-01-25 2017-01-25 Method and device for man-machine dialogue interaction

Country Status (1)

Country Link
CN (1) CN108363706B (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108942949A (en) * 2018-09-26 2018-12-07 北京子歌人工智能科技有限公司 A kind of robot control method based on artificial intelligence, system and intelligent robot
CN109101663A (en) * 2018-09-18 2018-12-28 宁波众鑫网络科技股份有限公司 A kind of robot conversational system Internet-based
CN109451188A (en) * 2018-11-29 2019-03-08 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of the self-service response of otherness
CN109784157A (en) * 2018-12-11 2019-05-21 口碑(上海)信息技术有限公司 A kind of image processing method, apparatus and system
CN109801632A (en) * 2019-03-08 2019-05-24 北京马尔马拉科技有限公司 A kind of artificial intelligent voice robot system and method based on big data
CN110008321A (en) * 2019-03-07 2019-07-12 腾讯科技(深圳)有限公司 Information interacting method and device, storage medium and electronic device
CN110085225A (en) * 2019-04-24 2019-08-02 北京百度网讯科技有限公司 Voice interactive method, device, intelligent robot and computer readable storage medium
CN110125932A (en) * 2019-05-06 2019-08-16 达闼科技(北京)有限公司 A kind of dialogue exchange method, robot and the readable storage medium storing program for executing of robot
CN110188220A (en) * 2019-05-17 2019-08-30 北京小米移动软件有限公司 Image presentation method, device and smart machine
CN110209792A (en) * 2019-06-13 2019-09-06 苏州思必驰信息科技有限公司 Talk with painted eggshell generation method and system
CN110347247A (en) * 2019-06-19 2019-10-18 深圳前海达闼云端智能科技有限公司 Man-machine interaction method, device, storage medium and electronic equipment
CN110689078A (en) * 2019-09-29 2020-01-14 浙江连信科技有限公司 Man-machine interaction method and device based on personality classification model and computer equipment
JP2020064616A (en) * 2018-10-18 2020-04-23 Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Virtual robot interaction method, device, storage medium, and electronic device
CN111435268A (en) * 2019-01-11 2020-07-21 合肥虹慧达科技有限公司 Human-computer interaction method based on image recognition and reconstruction and system and device using same
CN111540358A (en) * 2020-04-26 2020-08-14 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium
CN111951787A (en) * 2020-07-31 2020-11-17 北京小米松果电子有限公司 Voice output method, device, storage medium and electronic equipment
CN112232101A (en) * 2019-07-15 2021-01-15 北京正和思齐数据科技有限公司 User communication state evaluation method, device and system
CN112240458A (en) * 2020-10-14 2021-01-19 上海宝钿科技产业发展有限公司 Quality control method for multi-modal scene specific target recognition model
CN112918381A (en) * 2019-12-06 2021-06-08 广州汽车集团股份有限公司 Method, device and system for welcoming and delivering guests by vehicle-mounted robot
WO2021174757A1 (en) * 2020-03-03 2021-09-10 深圳壹账通智能科技有限公司 Method and apparatus for recognizing emotion in voice, electronic device and computer-readable storage medium
CN114356068A (en) * 2020-09-28 2022-04-15 北京搜狗智能科技有限公司 Data processing method and device and electronic equipment
CN114566145A (en) * 2022-03-04 2022-05-31 河南云迹智能技术有限公司 Data interaction method, system and medium
CN116880697A (en) * 2023-07-31 2023-10-13 深圳市麦驰安防技术有限公司 Man-machine interaction method and system based on scene object
WO2023246163A1 (en) * 2022-06-22 2023-12-28 海信视像科技股份有限公司 Virtual digital human driving method, apparatus, device, and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140359536A1 (en) * 2013-06-03 2014-12-04 Amchael Visual Technology Corporation Three-dimensional (3d) human-computer interaction system using computer mouse as a 3d pointing device and an operation method thereof
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN105913039A (en) * 2016-04-26 2016-08-31 北京光年无限科技有限公司 Visual-and-vocal sense based dialogue data interactive processing method and apparatus
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140359536A1 (en) * 2013-06-03 2014-12-04 Amchael Visual Technology Corporation Three-dimensional (3d) human-computer interaction system using computer mouse as a 3d pointing device and an operation method thereof
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN105913039A (en) * 2016-04-26 2016-08-31 北京光年无限科技有限公司 Visual-and-vocal sense based dialogue data interactive processing method and apparatus
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101663A (en) * 2018-09-18 2018-12-28 宁波众鑫网络科技股份有限公司 A kind of robot conversational system Internet-based
CN108942949A (en) * 2018-09-26 2018-12-07 北京子歌人工智能科技有限公司 A kind of robot control method based on artificial intelligence, system and intelligent robot
JP2020064616A (en) * 2018-10-18 2020-04-23 Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Virtual robot interaction method, device, storage medium, and electronic device
CN109451188A (en) * 2018-11-29 2019-03-08 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of the self-service response of otherness
CN109784157A (en) * 2018-12-11 2019-05-21 口碑(上海)信息技术有限公司 A kind of image processing method, apparatus and system
CN111435268A (en) * 2019-01-11 2020-07-21 合肥虹慧达科技有限公司 Human-computer interaction method based on image recognition and reconstruction and system and device using same
CN110008321A (en) * 2019-03-07 2019-07-12 腾讯科技(深圳)有限公司 Information interacting method and device, storage medium and electronic device
CN110008321B (en) * 2019-03-07 2021-06-25 腾讯科技(深圳)有限公司 Information interaction method and device, storage medium and electronic device
CN109801632A (en) * 2019-03-08 2019-05-24 北京马尔马拉科技有限公司 A kind of artificial intelligent voice robot system and method based on big data
CN110085225B (en) * 2019-04-24 2024-01-02 北京百度网讯科技有限公司 Voice interaction method and device, intelligent robot and computer readable storage medium
CN110085225A (en) * 2019-04-24 2019-08-02 北京百度网讯科技有限公司 Voice interactive method, device, intelligent robot and computer readable storage medium
CN110125932B (en) * 2019-05-06 2024-03-19 达闼科技(北京)有限公司 Dialogue interaction method for robot, robot and readable storage medium
CN110125932A (en) * 2019-05-06 2019-08-16 达闼科技(北京)有限公司 A kind of dialogue exchange method, robot and the readable storage medium storing program for executing of robot
CN110188220A (en) * 2019-05-17 2019-08-30 北京小米移动软件有限公司 Image presentation method, device and smart machine
CN110209792A (en) * 2019-06-13 2019-09-06 苏州思必驰信息科技有限公司 Talk with painted eggshell generation method and system
CN110209792B (en) * 2019-06-13 2021-07-06 思必驰科技股份有限公司 Method and system for generating dialogue color eggs
CN110347247A (en) * 2019-06-19 2019-10-18 深圳前海达闼云端智能科技有限公司 Man-machine interaction method, device, storage medium and electronic equipment
CN112232101A (en) * 2019-07-15 2021-01-15 北京正和思齐数据科技有限公司 User communication state evaluation method, device and system
CN110689078A (en) * 2019-09-29 2020-01-14 浙江连信科技有限公司 Man-machine interaction method and device based on personality classification model and computer equipment
CN112918381B (en) * 2019-12-06 2023-10-27 广州汽车集团股份有限公司 Vehicle-mounted robot welcome method, device and system
CN112918381A (en) * 2019-12-06 2021-06-08 广州汽车集团股份有限公司 Method, device and system for welcoming and delivering guests by vehicle-mounted robot
WO2021174757A1 (en) * 2020-03-03 2021-09-10 深圳壹账通智能科技有限公司 Method and apparatus for recognizing emotion in voice, electronic device and computer-readable storage medium
CN111540358A (en) * 2020-04-26 2020-08-14 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium
CN111951787A (en) * 2020-07-31 2020-11-17 北京小米松果电子有限公司 Voice output method, device, storage medium and electronic equipment
CN114356068A (en) * 2020-09-28 2022-04-15 北京搜狗智能科技有限公司 Data processing method and device and electronic equipment
CN114356068B (en) * 2020-09-28 2023-08-25 北京搜狗智能科技有限公司 Data processing method and device and electronic equipment
CN112240458A (en) * 2020-10-14 2021-01-19 上海宝钿科技产业发展有限公司 Quality control method for multi-modal scene specific target recognition model
CN114566145A (en) * 2022-03-04 2022-05-31 河南云迹智能技术有限公司 Data interaction method, system and medium
WO2023246163A1 (en) * 2022-06-22 2023-12-28 海信视像科技股份有限公司 Virtual digital human driving method, apparatus, device, and medium
CN116880697A (en) * 2023-07-31 2023-10-13 深圳市麦驰安防技术有限公司 Man-machine interaction method and system based on scene object
CN116880697B (en) * 2023-07-31 2024-04-05 深圳市麦驰安防技术有限公司 Man-machine interaction method and system based on scene object

Also Published As

Publication number Publication date
CN108363706B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN108363706A (en) The method and apparatus of human-computer dialogue interaction, the device interacted for human-computer dialogue
US11241789B2 (en) Data processing method for care-giving robot and apparatus
US11017779B2 (en) System and method for speech understanding via integrated audio and visual based speech recognition
US20190371318A1 (en) System and method for adaptive detection of spoken language via multiple speech models
JP2019164345A (en) System for processing sound data, user terminal and method for controlling the system
US11017551B2 (en) System and method for identifying a point of interest based on intersecting visual trajectories
US11200902B2 (en) System and method for disambiguating a source of sound based on detected lip movement
CN106502382B (en) Active interaction method and system for intelligent robot
US10785489B2 (en) System and method for visual rendering based on sparse samples with predicted motion
CN107221330A (en) Punctuate adding method and device, the device added for punctuate
US11308312B2 (en) System and method for reconstructing unoccupied 3D space
US20190251350A1 (en) System and method for inferring scenes based on visual context-free grammar model
CN110598576A (en) Sign language interaction method and device and computer medium
CN107274903A (en) Text handling method and device, the device for text-processing
CN108073572A (en) Information processing method and its device, simultaneous interpretation system
CN108628819A (en) Treating method and apparatus, the device for processing
CN111149172B (en) Emotion management method, device and computer-readable storage medium
CN108648754A (en) Sound control method and device
WO2023231211A1 (en) Voice recognition method and apparatus, electronic device, storage medium, and product
CN109102812B (en) Voiceprint recognition method and system and electronic equipment
CN113270087A (en) Processing method, mobile terminal and storage medium
EP3288035B1 (en) Personal audio analytics and behavior modification feedback
WO2020087534A1 (en) Generating response in conversation
EP4350690A1 (en) Artificial intelligence device and operating method thereof
CN109102810A (en) Method for recognizing sound-groove and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant