CN110364164B - Dialogue control device, dialogue system, dialogue control method, and storage medium - Google Patents

Dialogue control device, dialogue system, dialogue control method, and storage medium

Info

Publication number
CN110364164B
Authority
CN
China
Prior art keywords
speech
reaction
robot
utterance
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910207297.1A
Other languages
Chinese (zh)
Other versions
CN110364164A (en)
Inventor
市川英里奈
富田高弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Casio Computer Co Ltd
Original Assignee
Casio Computer Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Casio Computer Co Ltd
Publication of CN110364164A
Application granted granted Critical
Publication of CN110364164B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Manipulator (AREA)
  • User Interface Of Digital Computer (AREA)
  • Toys (AREA)

Abstract

The application relates to a dialogue control device, a dialogue system, a dialogue control method, and a storage medium. The present application provides a dialogue control device and a dialogue control method capable of grasping a user's preferences with high accuracy and carrying out a dialogue that matches those preferences. The robot (100A) comprises: a reaction acquisition unit (110A) that acquires a plurality of reaction determination results, including a result of determining the reaction of a predetermined object to an utterance of the robot (100A) and a result of determining the reaction of the predetermined object to an utterance of a robot (100B) provided independently of the robot (100A); and a speech control unit (115A) that controls the utterance of at least one of the plurality of speech devices, including the robot (100A) and the robot (100B), based on the plurality of reaction determination results acquired by the reaction acquisition unit (110A).

Description

Dialogue control device, dialogue system, dialogue control method, and storage medium
The present application is based on Japanese Patent Application No. 2018-058200 filed on March 26, 2018 and Japanese Patent Application No. 2018-247382 filed on December 28, 2018, claims the priority of these applications, and incorporates the contents of these basic applications by reference in their entirety.
Technical Field
The invention relates to a dialogue control device, a dialogue system, a dialogue control method, and a storage medium.
Background
Ease of approachability is an important factor in the widespread adoption of devices that converse with users. For example, Japanese Patent Application Laid-Open No. 2006-071936 discloses the following technique: the user's preferences are learned through dialogue with the user, and a dialogue that matches those preferences is carried out.
However, with the technique disclosed in Japanese Patent Application Laid-Open No. 2006-071936, it is difficult to grasp the user's preferences with high accuracy.
Disclosure of Invention
The present invention has been made in view of the above circumstances, and an object thereof is to provide a dialogue control device, a dialogue system, a dialogue control method, and a storage medium capable of grasping a user's preferences with high accuracy and carrying out a dialogue that matches those preferences.
In order to achieve the above object, one aspect of the present invention provides a dialogue control device comprising: a reaction acquisition unit that acquires a plurality of reaction determination results, including a result of determining the reaction of a predetermined object to an utterance of a 1st speech device and a result of determining the reaction of the predetermined object to an utterance of a 2nd speech device provided independently of the 1st speech device; and a speech control unit that controls the utterance of at least one of a plurality of speech devices, including the 1st and 2nd speech devices, based on the plurality of reaction determination results acquired by the reaction acquisition unit.
In order to achieve the above object, one aspect of the present invention provides a dialogue system including a 1st speech device and a 2nd speech device configured to be able to speak, and a dialogue control device, the dialogue system including: a reaction acquisition unit that acquires a plurality of reaction determination results, including a result of determining the reaction of a predetermined object to an utterance of the 1st speech device and a result of determining the reaction of the predetermined object to an utterance of the 2nd speech device provided independently of the 1st speech device; and a speech control unit that controls the utterance of at least one of a plurality of speech devices, including the 1st and 2nd speech devices, based on the plurality of reaction determination results acquired by the reaction acquisition unit.
In order to achieve the above object, one aspect of the present invention provides a dialogue control method comprising: a process of acquiring a plurality of reaction determination results, including a result of determining the reaction of a predetermined object to an utterance of a 1st speech device and a result of determining the reaction of the predetermined object to an utterance of a 2nd speech device provided independently of the 1st speech device; and a process of controlling the utterance of at least one of a plurality of speech devices, including the 1st and 2nd speech devices, based on the acquired plurality of reaction determination results.
In order to achieve the above object, one aspect of the storage medium according to the present invention stores a program for causing a computer to function as: a reaction acquisition unit that acquires a plurality of reaction determination results, including a result of determining the reaction of a predetermined object to an utterance of a 1st speech device and a result of determining the reaction of the predetermined object to an utterance of a 2nd speech device provided independently of the 1st speech device; and a speech control unit that controls the utterance of at least one of a plurality of speech devices, including the 1st and 2nd speech devices, based on the plurality of reaction determination results acquired by the reaction acquisition unit.
ADVANTAGEOUS EFFECTS OF INVENTION
According to the present invention, it is possible to provide a dialogue control device, a dialogue system, a dialogue control method, and a storage medium capable of grasping a user's preferences with high accuracy and carrying out a dialogue that matches those preferences.
Drawings
Fig. 1 is a diagram showing a configuration of a dialogue system according to embodiment 1 of the present invention.
Fig. 2 is a front view of the robot according to embodiment 1.
Fig. 3 is a block diagram showing a configuration of the robot according to embodiment 1.
Fig. 4 is a diagram showing an example of the voice response polarity determination table according to embodiment 1.
Fig. 5 is a flowchart showing a flow of the session control process according to embodiment 1.
Fig. 6 is a flowchart showing a flow of user identification processing according to embodiment 1.
Fig. 7 is a flowchart showing a flow of sound determination processing according to embodiment 1.
Fig. 8 is a flowchart showing a flow of expression determination processing according to embodiment 1.
Fig. 9 is a flowchart showing a flow of the action determination processing according to embodiment 1.
Fig. 10 is a flowchart showing a flow of preference determination processing according to embodiment 1.
Fig. 11 is a block diagram showing the configuration of a dialogue system according to embodiment 2.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
(embodiment 1)
The dialogue system 1 according to embodiment 1 of the present invention includes a plurality of robots 100. The plurality of robots 100 are disposed in the living space of a predetermined object, such as an office or a house, and converse with the predetermined object. In the following description, an example in which two robots 100 converse with the predetermined object is described, but the dialogue system 1 may be configured to include three or more robots 100.
Here, the predetermined object refers to a user who uses the dialogue system 1, typically the owner of the dialogue system 1, or a family member or friend of the owner. Besides humans, the predetermined object also includes, for example, an animal kept as a pet and another robot different from the robot 100.
As shown in fig. 1, the dialogue system 1 includes two robots 100 capable of communicating with each other, and carries out a conversation with a user USR. Here, for convenience of explanation, the robot 100 on the left side of the page in fig. 1 is referred to as robot 100A, and the robot 100 on the right side is referred to as robot 100B. When no distinction is made between the robot 100A and the robot 100B, either one may be referred to collectively as "robot 100". The robot 100A and the robot 100B are disposed at different locations, placed such that the same predetermined object cannot perceive the utterances of both the robot 100A and the robot 100B. For example, the robot 100A is disposed in the office of the predetermined object, and the robot 100B is disposed in the house of the predetermined object away from the office. Alternatively, the robot 100A is disposed at a facility that the predetermined object regularly frequents, and the robot 100B is disposed at another such facility away from the first.
As shown in fig. 2, the robot 100 has a three-dimensional shape that mimics a human in appearance. The exterior of the robot 100 is made mainly of synthetic resin. The robot 100 includes a trunk 101, a head 102 connected to the upper portion of the trunk 101, hands 103 connected to the left and right sides of the trunk 101, and two legs 104 connected to the lower portion of the trunk 101. The head 102 has a pair of left and right eyes 105, a mouth 106, and a pair of left and right ears 107. The upper, lower, left, and right sides of fig. 2 correspond to the upper, lower, right, and left sides of the robot 100, respectively.
Next, the configuration of the robot 100 will be described with reference to fig. 3. Fig. 3 shows a block diagram illustrating respective configurations of the robot 100A and the robot 100B, and the robot 100A has the same configuration as the robot 100B. First, the configuration of the robot 100A will be described as an example.
As shown in fig. 3, the robot 100A includes a control unit 110A, a storage unit 120A, an imaging unit 130A, a sound input unit 140A, a sound output unit 150A, a moving unit 160A, and a communication unit 170A. These parts are electrically connected to each other via a bus BL.
The control unit 110A is configured by a computer having a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), and controls the operation of the entire robot 100A. The CPU of the control unit 110A reads a control program stored in the ROM and executes it on the RAM, thereby controlling the operations of the respective units of the robot 100A.
The control unit 110A functions as a user detection unit 111A, a user specification unit 112A, a user information acquisition unit 113A, a voice recognition unit 114A, a speech control unit 115A, a voice synthesis unit 116A, a reaction determination unit 117A, and a preference determination unit 118A by executing the control program.
The user detection unit 111A detects a user USR existing around the robot 100A (for example, within a radius of 2m from the robot 100A). The user detection unit 111A detects the presence of the user USR around the robot 100A by controlling the imaging unit 130A described later to capture an image around the robot 100A and detecting the movement of an object, the head, the face, and the like, for example.
The user specification unit 112A specifies the user USR detected by the user detection unit 111A. For example, the user specification unit 112A extracts a face image corresponding to the face of the user USR from the image captured by the imaging unit 130A. The user specification unit 112A then detects a feature amount from the face image, compares it with the face information indicating facial feature amounts registered in the user information database of the storage unit 120A described later, calculates a similarity based on the comparison result, and specifies the user USR according to whether the calculated similarity satisfies a predetermined criterion. The user information database of the storage unit 120A stores face information indicating the facial feature amounts of a predetermined plurality of users USR. The user specification unit 112A thus specifies which of these users USR the user detected by the user detection unit 111A is. The feature amount may be any information capable of identifying the user USR, and is, for example, information expressing numerically the appearance features, such as shape, size, and arrangement, of the parts of the face such as the eyes, nose, and mouth. In the following description, the user USR detected by the user detection unit 111A and specified by the user specification unit 112A is referred to as the target user. In this way, the user specification unit 112A functions as the object specification means of the present invention.
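As a minimal illustrative sketch (not taken from the embodiment itself), the following Python fragment shows one way the similarity comparison described above could be carried out; the cosine similarity measure, the 0.8 threshold, and all identifiers are assumptions for illustration only.

    import numpy as np

    def identify_user(detected_features, registered_face_info, threshold=0.8):
        # registered_face_info: {user_id: feature vector} drawn from the user information DB
        # (the concrete feature representation and the threshold are assumptions)
        best_id, best_sim = None, 0.0
        for user_id, registered in registered_face_info.items():
            a = np.asarray(detected_features, dtype=float)
            b = np.asarray(registered, dtype=float)
            sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))  # cosine similarity
            if sim > best_sim:
                best_id, best_sim = user_id, sim
        # the user is specified only when the highest similarity satisfies the criterion
        return best_id if best_sim >= threshold else None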
The user information acquisition unit 113A acquires user information indicating the speech, appearance, action, and the like of the target user. In the present embodiment, the user information acquisition unit 113A controls the image pickup unit 130A and the audio input unit 140A, for example, to acquire at least one of image information including image data of an image picked up by the target user and audio information including audio data of an audio uttered by the target user as user information. In this way, the user information acquisition unit 113A functions as an acquisition means of the present invention in cooperation with the imaging unit 130A and the audio input unit 140A.
The voice recognition unit 114A performs voice recognition processing on the voice data included in the voice information acquired by the user information acquisition unit 113A, and converts the voice data into text data representing the speech content of the target user. The voice recognition processing uses, for example, an acoustic model, a language model, and a word dictionary stored in the voice information DB (database) 122A of the storage unit 120A. The voice recognition unit 114A, for example, eliminates background noise from the acquired voice data, recognizes phonemes included in the voice data from which the background noise has been eliminated, and generates a plurality of conversion candidates for converting the recognized phoneme sequence into words with reference to the word dictionary. Then, the voice recognition unit 114A refers to the language model, and selects a candidate having the highest suitability from the generated plurality of conversion candidates, and outputs the candidate as text data corresponding to the voice data.
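The candidate selection step described above can be pictured with the following minimal Python sketch; the scoring callable stands in for the language model, and every name here is an assumption rather than part of the embodiment.

    def select_recognition_result(conversion_candidates, language_model_score):
        # conversion_candidates: word sequences generated from the recognized phoneme sequence
        # language_model_score: callable giving a suitability score for a word sequence (assumed interface)
        best = max(conversion_candidates, key=language_model_score)
        return " ".join(best)  # text data output as the recognition result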
The speech control unit 115A controls the speech of the robot 100A. The speech control unit 115A refers to the speech information stored in the speech information DB123A of the storage unit 120A, for example, and extracts a plurality of speech candidates according to the situation from the speech information stored in the speech information DB 123A. Then, the speech control unit 115A refers to preference information included in the user information stored in the user information DB121A, selects a speech candidate suitable for the preference of the target user from among the extracted plurality of speech candidates, and determines the speech candidate as the speech content of the robot 100A. In this way, the speech control unit 115A functions as a speech control means of the present invention.
The speech control unit 115A communicates with the robot 100B via the communication unit 170A and, in cooperation with the speech control unit 115B of the robot 100B, adjusts and determines the utterance content of the robot 100A as follows.
That is, the speech control unit 115A, in cooperation with the speech control unit 115B of the robot 100B, acquires, for example, the elapsed time since the robot 100B last spoke, and when that elapsed time is within a predetermined elapsed time (for example, 72 hours), adjusts and determines the utterance of the robot 100A so that its topic differs from the topic of the utterance made by the robot 100B within the predetermined elapsed time before the robot 100A starts speaking. Topics are determined in the same manner in the speech control unit 115B of the robot 100B. In this way, the topics on which the robot 100A and the robot 100B speak are determined to be different from each other, and the utterances of the two robots 100A and 100B are controlled according to the determined topics.
As will be described later, the robot 100A and the robot 100B each determine the target user's reaction to their own utterances and collect (store) preference information of the target user based on the determination results. In this case, if the topic of the utterance of the robot 100A and the topic of the utterance of the robot 100B are repeated or always related, new preference information of the target user and preference information over a wider range cannot be collected. In addition, the target user may become bored hearing utterances on repeated topics. By determining the topics of the utterances of the robot 100A and the robot 100B to be different from each other, a wider variety of preference information can be collected.
In contrast, when the predetermined elapsed time or more has passed since the robot 100B spoke, the speech control unit 115A determines the utterance content on its own, without being constrained by the utterance content of the robot 100B. That is, the topics (utterances) of the robots 100A and 100B are determined independently of each other, without cooperation.
The speech control unit 115A generates and outputs text data indicating the utterance content determined as described above.
The voice synthesis unit 116A generates voice data corresponding to text data representing the speech content of the robot 100A, which is input from the speech control unit 115A. The voice synthesizing unit 116A generates voice data for reading out a character string shown in the text data, for example, using an acoustic model or the like stored in the voice information DB122A of the storage unit 120A. The audio synthesizer 116A controls the audio output unit 150A to output the generated audio data.
The reaction determination unit 117A determines the reaction of the target user to the utterance of the robot 100A. The reaction to the utterance of the robot 100A is thus determined for each target user specified by the user specification unit 112A among the plurality of users USR described above. The reaction determination unit 117A includes a sound determination unit 117AA, an expression determination unit 117BA, and an action determination unit 117CA. The sound determination unit 117AA, the expression determination unit 117BA, and the action determination unit 117CA classify the reaction to the utterance of the robot 100A into three polarities based on the sound, expression, and action of the target user, respectively. The three polarities are a positive reaction ("Positive"), a negative reaction ("Negative"), and a neutral reaction ("Neutral") that is neither positive nor negative.
The sound determination unit 117AA determines the reaction of the target user to the utterance of the robot 100A based on the sound made by the target user after the utterance of the robot 100A. The sound determination unit 117AA classifies the utterance content of the target user into the three sound reaction polarities, i.e., positive, negative, and neutral, based on the text data generated by the voice recognition unit 114A performing voice recognition processing on the sound acquired by the user information acquisition unit 113A after the utterance of the robot 100A, thereby determining the reaction of the target user to the utterance of the robot 100A. In this way, the sound determination unit 117AA functions as the sound determination means of the present invention.
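The sound reaction polarity determination can be sketched as follows in Python; the keyword entries mimic the kind of table shown in fig. 4, but the concrete entries and function names are illustrative assumptions.

    # illustrative keyword-to-polarity entries (the real table is stored in the reaction
    # determination information DB124A; these values are assumptions)
    SOUND_POLARITY_TABLE = {
        "like": "Positive", "happy": "Positive",
        "dislike": "Negative", "boring": "Negative",
    }

    def determine_sound_polarity(utterance_text):
        if not utterance_text:             # no utterance by the target user -> reaction unclear
            return "Neutral"
        for keyword, polarity in SOUND_POLARITY_TABLE.items():
            if keyword in utterance_text:  # feature keyword found in the utterance content
                return polarity
        return "Neutral"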
The expression determination unit 117BA determines the reaction of the target user to the utterance of the robot 100A based on the expression of the target user after the utterance of the robot 100A. The expression determination unit 117BA calculates a smile degree indicating the degree of smiling as an index for evaluating the expression of the target user. The expression determination unit 117BA extracts a face image of the target user from the image captured and acquired by the user information acquisition unit 113A after the utterance of the robot 100A, and detects the feature amount of the target user's face. The expression determination unit 117BA refers to the smile degree information stored in the reaction determination information DB124A of the storage unit 120A and calculates the smile degree of the target user based on the detected feature amount. The expression determination unit 117BA classifies the expression of the target user into the three expression reaction polarities, i.e., positive, negative, and neutral, based on the calculated smile degree, thereby determining the reaction of the target user to the utterance of the robot 100A. In this way, the expression determination unit 117BA functions as the expression determination means of the present invention.
The action determination unit 117CA determines the reaction of the target user to the utterance of the robot 100A based on the action of the target user after the utterance of the robot 100A. The action determination unit 117CA detects the action of the target user from the image captured and acquired by the user information acquisition unit 113A after the utterance of the robot 100A, and classifies the action of the target user into the three action reaction polarities, i.e., positive, negative, and neutral, thereby determining the reaction of the target user to the utterance of the robot 100A. In this way, the action determination unit 117CA functions as the action determination means of the present invention.
The preference determination unit 118A identifies the topic of the conversation between the target user and the robot 100A, and determines a preference degree indicating how much the target user likes the identified topic based on the determination results of the reaction determination unit 117A. The preference degree is thus determined for each target user specified by the user specification unit 112A among the predetermined plurality of users USR. Here, preferences are interests and likings regarding all manner of things, tangible or intangible, and include, for example, interests and likings regarding food, sports, weather, and the like, as well as likings regarding the responses (utterance content) of the robot 100. The preference determination unit 118A classifies the preference degree into four categories, "preference degree A", "preference degree B", "preference degree C", and "preference degree D", in descending order of the target user's liking for the topic. In this way, the preference determination unit 118A functions as the determination means and the preference determination means of the present invention.
The user detection unit 111A, the user specification unit 112A, the user information acquisition unit 113A, the voice recognition unit 114A, the speech control unit 115A, the voice synthesis unit 116A, the reaction determination unit 117A, and the preference determination unit 118A may each be implemented by a single computer, or may each be implemented by a separate computer.
The storage unit 120A includes a nonvolatile semiconductor memory, a hard disk drive, and the like, which can rewrite the stored contents, and stores various data necessary for the control unit 110A to control each unit of the robot 100A.
The storage unit 120A has a plurality of databases each storing various data. The storage unit 120A includes, for example, a user information DB121A, a sound information DB122A, a speech information DB123A, and a reaction determination information DB124A. Further, the storage unit 120A stores, for each user USR, utterance history information including the date and time of the utterance of the robot 100A, the topic of the utterance, and the like.
The user information DB121A accumulates and stores various information about each of the registered plurality of user USRs as user information. The user information includes, for example, user identification information (for example, ID of the user USR) assigned in advance to identify each of the plurality of user USRs, face information indicating a feature amount of the face of the user USR, and preference information indicating preference of the user USR for each topic. In this way, the preference information of each of the plurality of user USRs is stored as preference information that can identify which user USR is using the user identification information.
The sound information DB122A stores, as data for the sound recognition processing or the sound synthesis processing, for example, an acoustic model showing each feature (frequency characteristic) of a phoneme which is the minimum unit of sound having a meaning different from other words, a word dictionary associating the feature of the phoneme with a word, and a language model showing the arrangement of the words and the connection probability thereof.
The utterance information DB123A stores utterance information indicating an utterance candidate of the robot 100A. The utterance information includes, for example, an utterance candidate in the case of starting a conversation with the target user, an utterance candidate in the case of responding to an utterance of the target user, an utterance candidate in the case of a conversation with the robot 100B, and the like, and various utterance candidates corresponding to a conversation situation with the target user.
The reaction determination information DB124A stores reaction determination information used when the reaction determination unit 117A determines the reaction of the target user to the utterance of the robot 100A. The reaction determination information DB124A stores, for example, as reaction determination information, sound determination information used by the sound determination unit 117AA of the reaction determination unit 117A to determine the reaction of the target user to the utterance of the robot 100A. The sound determination information is stored, for example, in the form of the sound reaction polarity determination table shown in fig. 4. In the sound reaction polarity determination table, the sound reaction polarities described later are associated with feature keywords. The reaction determination information DB124A also stores, for example, as reaction determination information, smile degree information used by the expression determination unit 117BA of the reaction determination unit 117A to calculate the smile degree of the target user. The smile degree information is, for example, information obtained by quantifying the smile degree in a range of 0 to 100% according to the degree of change in the positions of the corners of the eyes and mouth, the sizes of the eyes and mouth, and the like.
The imaging unit 130A is configured by a camera that captures images of the surroundings of the robot 100A and includes a lens and an imaging element such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The imaging unit 130A is provided, for example, at the upper front part of the head 102, captures images in front of the head 102, and generates and outputs digital image data. The camera is mounted on a motor-driven mount (gimbal or the like) that can change the direction in which the lens points, and is configured to be able to track the face or the like of the user USR.
The audio input unit 140A is configured by a microphone, an A/D (Analog to Digital) converter, and the like; it amplifies, for example, the sound picked up by the microphone provided at the ear 107, and outputs digital audio data (audio information) subjected to signal processing such as A/D conversion and encoding to the control unit 110A.
The audio output unit 150A is configured by a speaker, a D/A (Digital to Analog) converter, and the like; it performs signal processing such as decoding, D/A conversion, and amplification on the audio data supplied from the voice synthesis unit 116A of the control unit 110A, and outputs an analog audio signal from the speaker provided in the mouth 106, for example.
The robot 100A can pick up the voice of the target user through the microphone of the audio input unit 140A and, under the control of the control unit 110A, output a voice corresponding to the utterance content of the target user from the speaker of the audio output unit 150A, thereby conversing with the target user and establishing communication. As described above, the robot 100A functions as the 1st speech device of the present invention.
The moving unit 160A is a part for moving the robot 100A. The moving unit 160A includes wheels provided at the bottoms of the left and right legs 104 of the robot 100A, motors that rotationally drive the left and right wheels, and a drive circuit that drives and controls the motors. The drive circuit supplies drive pulse signals to the motors according to control signals received from the control unit 110A. The motors rotationally drive the left and right wheels according to the drive pulse signals and move the robot 100A. In this way, the moving unit 160A functions as the moving means of the present invention. The left and right wheels are driven to rotate independently, but the number of motors is arbitrary as long as the robot 100A can move forward, move backward, turn, and accelerate or decelerate. For example, a coupling mechanism, a steering mechanism, or the like may be provided so that the left and right wheels are driven by one motor. The number of drive circuits can also be changed appropriately according to the number of motors.
The communication unit 170A is configured by a wireless communication module and an antenna for communication using a wireless communication system, and performs wireless data communication with the robot 100B. As the wireless communication system, for example, a short-range wireless communication system such as Bluetooth (registered trademark), BLE (Bluetooth Low Energy), ZigBee (registered trademark), or infrared communication, or a wireless LAN communication system such as WiFi (Wireless Fidelity) can be suitably used. In the present embodiment, the robot 100A performs wireless data communication with the robot 100B via the communication unit 170A, whereby the robot 100A and the robot 100B carry out a conversation with the target user.
The robot 100B is the same as the robot 100A, and therefore, the configuration thereof will be briefly described. Like the robot 100A, the robot 100B includes a control unit 110B, a storage unit 120B, an imaging unit 130B, a sound input unit 140B, a sound output unit 150B, a moving unit 160B, and a communication unit 170B. The control unit 110B controls the operation of the entire robot 100B, and executes a control program to function as a user detection unit 111B, a user specification unit 112B, a user information acquisition unit 113B, a speech recognition unit 114B, a speech control unit 115B, a speech synthesis unit 116B, a response determination unit 117B, and a preference determination unit 118B. The speech control unit 115B refers to preference information included in the user information stored in the user information DB121B, selects a speech candidate suitable for the preference of the target user from among the extracted plurality of speech candidates, and determines the speech candidate as the speech content of the robot 100B. The communication unit 170B communicates with the robot 100A, and cooperates with the speech control unit 115A of the robot 100A to obtain, for example, an elapsed time after the robot 100A has uttered. When the acquired elapsed time is within the predetermined elapsed time, the speech control unit 115B adjusts and determines the content of the speech of the robot 100B so that the topic of the speech of the robot 100B is different from the topic of the speech of the robot 100A within the predetermined elapsed time before the start of the speech of the robot 100B. The reaction determination unit 117B determines a reaction of the target user to the utterance of the robot 100B. The response determination unit 117B includes a sound determination unit 117AB, an expression determination unit 117BB, and an action determination unit 117CB. The sound determination unit 117AB determines by classifying the reaction to the utterance of the target robot 100B into three polarities of "positive", "negative" and "neutral" based on the sound of the target user. The expression determination unit 117BB determines by classifying the reactions to the utterance of the target robot 100B into three polarities of "positive", "negative" and "neutral" based on the expression of the target user. The action determination unit 117CB determines by classifying the reaction to the utterance of the target robot 100B into three polarities of "positive", "negative" and "neutral" based on the action of the target user. The storage unit 120B has a plurality of databases each storing various data. The storage unit 120B includes, for example, a user information DB121B, a sound information DB122B, a speech information DB123B, and a response determination information DB124B. Further, the storage unit 120B stores, for each user USR, utterance history information including the date and time of the utterance of the robot 100B, the topic of the utterance, and the like. The robot 100B can pick up the voice of the target user by the microphone of the voice input unit 140B, and under the control of the control unit 110B, output the voice corresponding to the speech content of the target user from the speaker of the voice output unit 150B, thereby making it possible to perform a conversation with the target user and generate communication. As described above, the robot 100B functions as the 2 nd speech device of the present invention.
Next, the dialogue control process performed by the robot 100 will be described with reference to the flowchart shown in fig. 5. The dialogue control process is a process of controlling the dialogue according to the preferences of the target user. Here, the dialogue control process will be described taking as an example the case where the control unit 110A of the robot 100A executes it. The control unit 110A starts the dialogue control process when the user detection unit 111A detects a user USR around the robot 100A.
When the dialogue control process is started, the control unit 110A first executes the user specification process (step S101). Here, the user specification process will be described with reference to the flowchart shown in fig. 6. The user specification process is a process of specifying the user detected by the user detection unit 111A as being around the robot 100A.
When the user specification process is started, the control unit 110A first extracts a face image of the target user from the captured image acquired by the imaging unit 130A (step S201). The control unit 110A (user specification unit 112A) detects a skin color region in the captured image, for example, and determines whether or not a portion corresponding to a facial portion such as an eye, nose, or mouth is present in the skin color region, and when it is determined that a portion corresponding to a facial portion is present, extracts the skin color region as a facial image.
Next, the control unit 110A searches for a registered user corresponding to the extracted face image (step S202). The control unit 110A (user specification unit 112A) detects a feature amount from the extracted face image, for example, and searches for a registered user having a similarity equal to or greater than a predetermined reference by comparing the feature amount with the face information stored in the user information DB121A of the storage unit 120A.
The control unit 110A determines the user USR existing around the robot 100 based on the search result in step S202 (step S203). The control unit 110A (user specifying unit 112A) specifies, as the target user existing around the robot 100A, a user USR corresponding to, for example, the feature quantity having the highest similarity with the feature quantity detected from the face image among the feature quantities of the faces of the plurality of users USR stored in the user information DB 121A.
After executing the process of step S203, the control unit 110A ends the user specification process and returns to the dialogue control process.
Returning to fig. 5, after the user specification process is executed (step S101), the control unit 110A establishes a communication connection with the robot 100B (the other robot) (step S102). Here, establishing a communication connection means performing a predetermined procedure with a designated communication partner so that data can be transmitted and received between them. The control unit 110A controls the communication unit 170A to perform a predetermined process based on the communication scheme, thereby establishing a communication connection with the robot 100B. When the robot 100A and the robot 100B perform data communication using an infrared communication method, it is not necessary to establish a communication connection in advance.
Next, the control unit 110A determines whether the target user specified in step S101 has spoken within a predetermined time (for example, within 20 seconds) that is shorter than the predetermined elapsed time (step S103). The control unit 110A measures the elapsed time from the start of this process using, for example, current time information measured by an RTC (Real Time Clock) attached to the CPU, and determines whether the target user has spoken within the predetermined time based on the sound information acquired by the user information acquisition unit 113A.
When it is determined that the target user has spoken within the predetermined time (step S103: Yes), the control unit 110A (speech control unit 115A) determines that a conversation with the target user is in progress and, in cooperation with the robot 100B, determines the utterance content as a response to the utterance of the target user (step S104). The control unit 110A (speech control unit 115A) refers to the speech information DB123A and the user information DB121A of the storage unit 120A, and determines candidate topics that correspond to the utterance content of the target user and suit the preferences of the target user stored in the user information DB121A. In this case, topics corresponding to preference degrees A and B described below are determined as candidate topics suited to the preferences of the target user.
In step S104, when only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined and utterance history information is stored in the storage unit 120B of the robot 100B, the control unit 110A (speech control unit 115A) reads the utterance history information stored in the storage unit 120B via the communication unit 170A and determines whether the read utterance history information contains a topic (hereinafter referred to as the "1st comparison topic") that matches or is associated with any of the plurality of topic candidates and for which the elapsed time from the date and time of the utterance to the present (that is, to when the robot 100A starts its utterance) is within the predetermined elapsed time.
When determining that the 1st comparison topic is present in the utterance history information, the control unit 110A (speech control unit 115A) removes the topics that match or are associated with the 1st comparison topic from the topic candidates and then determines the final topic. If a plurality of topic candidates remain after the removal, one topic selected at random from the remaining candidates is determined as the final topic.
On the other hand, when a plurality of topic candidates are determined but no utterance history information is stored in the storage unit 120B of the robot 100B, or when it is determined that the 1st comparison topic is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic. The speech control unit 115A outputs text data indicating the utterance content according to the topic determined as described above.
On the other hand, when it is determined that the target user has not spoken within the predetermined time (step S103: No), the control unit 110A (speech control unit 115A) determines a topic for an utterance that talks to the target user (step S105). At this time, the control unit 110A (speech control unit 115A) refers to the speech information DB123A and the user information DB121A of the storage unit 120A, and determines a plurality of candidate topics suited to the preferences of the target user stored in the user information DB121A. In this case, topics corresponding to preference degrees A and B described below are determined as candidate topics suited to the preferences of the target user.
In step S105, if only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined, the final topic is selected from among them in the same manner as in step S104. Specifically, when a plurality of topic candidates are determined and utterance history information is stored in the storage unit 120B of the robot 100B, the control unit 110A (speech control unit 115A) reads the utterance history information stored in the storage unit 120B via the communication unit 170A and determines whether the 1st comparison topic is present in the read utterance history information.
When determining that the 1st comparison topic is present in the utterance history information, the control unit 110A (speech control unit 115A) removes the topics that match or are associated with the 1st comparison topic from the topic candidates and then determines the final topic. If a plurality of topic candidates remain after the removal, one topic selected at random from the remaining candidates is determined as the final topic.
On the other hand, when a plurality of topic candidates are determined but no utterance history information is stored in the storage unit 120B of the robot 100B, or when it is determined that the 1st comparison topic is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic.
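The topic selection of steps S104 and S105 can be summarized with the following Python sketch; the 72-hour window comes from the description above, while the data shapes, the equality-based notion of "associated", and the fallback when every candidate is removed are assumptions for illustration.

    import random
    from datetime import timedelta

    PREDETERMINED_ELAPSED = timedelta(hours=72)  # example value given in the description

    def decide_final_topic(candidates, other_robot_history, now, related=lambda a, b: a == b):
        # other_robot_history: list of (topic, utterance_datetime) pairs read from the other
        # robot's utterance history information; "related" approximates topic association (assumption)
        recent = {t for t, dt in other_robot_history if now - dt <= PREDETERMINED_ELAPSED}
        remaining = [c for c in candidates if not any(related(c, t) for t in recent)]
        pool = remaining if remaining else candidates  # fallback if all candidates are removed (assumption)
        return random.choice(pool)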
When the target user has not spoken within the predetermined time, the operation of talking to the target user is performed in order to trigger a conversation between the target user and the robots 100A and 100B and to prompt the target user to use the dialogue system 1.
After step S104 or step S105 is executed, the control unit 110A speaks based on the speech content according to the determined topic (step S106). The control unit 110A (voice synthesis unit 116A) generates voice data corresponding to text data representing the speech content of the robot 100A, which is input from the speech control unit 115A, and controls the voice output unit 150A to output a voice based on the voice data.
Steps S107 to S109 are processes for determining the reaction of the target user to the utterance of the robot 100A in step S106.
The control unit 110A (the sound determination unit 117AA of the reaction determination unit 117A) first executes the sound determination process (step S107). Here, the sound determination process will be described with reference to the flowchart shown in fig. 7. The sound determination process is a process of determining the reaction of the target user to the utterance of the robot 100A based on the sound made by the target user after the utterance of the robot 100A.
When the sound determination process is started, the sound determination unit 117AA first determines whether the target user spoke after the utterance of the robot 100A in step S106 (step S301). The control unit 110A determines whether the target user has spoken in response to the utterance of the robot 100A based on the sound information acquired by the user information acquisition unit 113A after the utterance of the robot 100A.
When it is determined that the target user spoke after the utterance of the robot 100A (step S301: Yes), the sound determination unit 117AA extracts a feature keyword from the target user's utterance in response to the utterance of the robot 100A (step S302). The sound determination unit 117AA extracts, for example, keywords related to emotion as feature keywords characterizing the utterance content of the target user, based on the text data representing the utterance content of the target user generated by the voice recognition unit 114A.
Next, the sound determination unit 117AA determines the sound reaction polarity based on the feature keyword (step S303). The sound determination unit 117AA refers, for example, to the sound reaction polarity determination table shown in fig. 4 stored as reaction determination information in the reaction determination information DB124A of the storage unit 120A, and makes the determination based on the sound reaction polarity associated with the extracted feature keyword. The sound determination unit 117AA determines that the sound reaction polarity is "positive" when the feature keyword is, for example, "like", "happy", or the like.
On the other hand, when it is determined that there is no utterance of the target user after the utterance of the robot 100A (step S301: no), the voice determination unit 117AA determines that the voice response polarity is "neutral" because the response of the target user to the utterance of the robot 100A is not clear (step S304).
After step S303 or S304 is executed, the control unit 110A ends the sound determination process and returns to the dialogue control process.
Returning to fig. 5, after the sound determination process is executed (step S107), the control unit 110A (the expression determination unit 117BA of the reaction determination unit 117A) executes the expression determination process (step S108). Here, the expression determination process will be described with reference to the flowchart shown in fig. 8. The expression determination process is a process of determining the reaction of the target user to the utterance of the robot 100A based on the expression of the target user.
When the expression determination processing is started, the control unit 110A (the expression determination unit 117BA of the reaction determination unit 117A) first extracts a face image of the target user from the captured image acquired by the user information acquisition unit 113A after the utterance of the robot 100A in step S106 (step S401).
Next, the expression determination unit 117BA calculates the smile degree of the target user based on the face image extracted in step S401 (step S402). The control unit 110A refers, for example, to the smile degree information stored in the reaction determination information DB124A and calculates the smile degree of the target user in the range of 0 to 100% based on changes in the positions of the corners of the eyes, changes in the size of the mouth, and the like in the face image.
Next, the expression determination unit 117BA determines whether the smile degree of the target user calculated in step S402 is 70% or more (step S403). When the smile degree of the target user is 70% or more (step S403: Yes), the control unit 110A determines that the expression reaction polarity is "positive" (step S405).
When the smile degree of the target user is not 70% or more (step S403: No), the control unit 110A determines whether the smile degree of the target user is 40% or more and less than 70% (step S404). When the smile degree of the target user is 40% or more and less than 70% (step S404: Yes), the control unit 110A determines that the expression reaction polarity is "neutral" (step S406).
If the smile degree of the target user is not 40% or more and less than 70% (step S404: No), that is, if the smile degree of the target user is less than 40%, the control unit 110A determines that the expression reaction polarity is "negative" (step S407).
After determining the expression reaction polarity of the target user in any of steps S405 to S407, the control unit 110A ends the expression determination process and returns to the dialogue control process.
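Steps S403 to S407 amount to a simple thresholding of the smile degree, as in the Python sketch below (the smile-degree calculation itself is omitted; the function name is an assumption).

    def determine_expression_polarity(smile_degree_percent):
        # smile degree of the target user in the range 0-100%
        if smile_degree_percent >= 70:
            return "Positive"   # step S405
        if 40 <= smile_degree_percent < 70:
            return "Neutral"    # step S406
        return "Negative"       # step S407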
Returning to fig. 5, after the expression determination process is performed (step S108), the control section 110A performs an action determination process (step S109). Here, the action determination process will be described with reference to a flowchart shown in fig. 9. The action determination process is a process of determining the reaction of the target user to the utterance of the robot 100A based on the action of the target user.
When the action determination processing is started, the control unit 110A (action determination unit 117CA of the reaction determination unit 117A) first determines whether or not the target user is actively moving (step S501). The action determination unit 117CA determines based on the movement of the target user in the captured image acquired by the user information acquisition unit 113A after the utterance of the robot 100A in step S106. When it is determined that the target user is actively moving (yes in step S501), the action determining unit 117CA determines whether or not the line of sight of the target user is directed to the robot 100A (step S502). The action determination unit 117CA determines the direction of the line of sight of the target user based on, for example, the pupil position and the direction of the face in the eye region in the captured image acquired by the user information acquisition unit 113A.
When it is determined that the line of sight of the target user is directed to the robot 100A (yes in step S502), the action determining unit 117CA determines that the action reaction polarity is "positive" (step S508). On the other hand, when it is determined that the line of sight of the target user is not directed to the robot 100A (step S502: no), the action determining unit 117CA determines that the action reaction polarity is "negative" (step S509).
In step S501, when it is determined that the target user is not actively moving (NO in step S501), the action determination unit 117CA determines whether or not the target user is approaching the robot 100A (step S503). The action determination unit 117CA performs determination based on, for example, a change in the size of the face image in the captured image acquired by the user information acquisition unit 113A.
When it is determined that the target user is approaching the robot 100A (yes in step S503), the action determination unit 117CA determines whether or not the line of sight of the target user is directed to the robot 100A (step S504). When it is determined that the line of sight of the target user is directed to the robot 100A (yes in step S504), the action determining unit 117CA determines that the action reaction polarity is "positive" (step S508). On the other hand, when it is determined that the line of sight of the target user is not directed to the robot 100A (step S504: no), the action determining unit 117CA determines that the action reaction polarity is "negative" (step S509).
In step S503, when it is determined that the target user is not approaching the robot 100A (step S503: no), the action determination unit 117CA determines whether or not the target user is moving away from the robot 100A (step S505). When it is determined that the target user is moving away from the robot 100A (yes in step S505), the action determining unit 117CA determines that the action reaction polarity is "negative" (step S509).
On the other hand, when it is determined that the target user is not moving away from the robot 100A (step S505: no), the action determination unit 117CA determines whether or not the face of the target user is not observed (step S506). When the target user's face is turned away or the like and the face image of the target user cannot be extracted from the captured image, the action determination unit 117CA determines that the face of the target user is not observed. When it is determined that the face of the target user is not observed (yes in step S506), the action determining unit 117CA determines that the action reaction polarity is "neutral" (step S510).
When it is determined that the face of the target user is observed (step S506: no), the action determination unit 117CA determines whether or not the line of sight of the target user is directed to the robot 100A (step S507). When it is determined that the line of sight of the target user is directed to the robot 100A (yes in step S507), the action determining unit 117CA determines that the action reaction polarity is "positive" (step S508). On the other hand, when it is determined that the line of sight of the target user is not directed to the robot 100A (step S507: no), the action determining unit 117CA determines that the action reaction polarity is "negative" (step S509).
The control unit 110A ends the action determination process after determining the action reaction polarity of the target user in any one of steps S508 to S510, and returns the process to the dialogue control process.
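The decision tree of steps S501 to S510 can likewise be sketched as follows (Python); the boolean observations are assumed to have already been extracted from the captured image, and all names are illustrative rather than the patent's identifiers:

    def action_reaction_polarity(actively_moving: bool,
                                 approaching: bool,
                                 moving_away: bool,
                                 face_observed: bool,
                                 gaze_on_robot: bool) -> str:
        """Decision tree of the action determination process (steps S501-S510)."""
        if actively_moving:                      # step S501: yes
            return "positive" if gaze_on_robot else "negative"   # S502 -> S508/S509
        if approaching:                          # step S503: yes
            return "positive" if gaze_on_robot else "negative"   # S504 -> S508/S509
        if moving_away:                          # step S505: yes
            return "negative"                    # S509
        if not face_observed:                    # step S506: yes (face not observed)
            return "neutral"                     # S510
        return "positive" if gaze_on_robot else "negative"       # S507 -> S508/S509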
Returning to fig. 5, after the action determination process (step S109) is performed, the control unit 110A (preference determination unit 118A) performs the preference determination process (step S110). Here, the preference determination process will be described with reference to the flowchart shown in fig. 10. The preference determination process is a process of comprehensively determining, using the determination results of the sound determination process, the expression determination process, and the action determination process, the degree of preference of the target user for the topic of the conversation between the target user and the robot 100A.
When the preference determination process is started, the preference determination unit 118A first identifies the topic of the conversation between the target user and the robot 100A (step S601). When the robot spoke to the target user who had not uttered within the predetermined time in step S105 of the dialogue control process, the preference determination unit 118A refers to the topic keyword stored in the RAM or the like and specifies it as the topic of the conversation between the target user and the robot 100A. On the other hand, if no topic has been set in advance, the topic of the conversation between the target user and the robot 100A is specified by extracting a topic keyword from the utterance of the target user based on the text data representing the utterance content of the target user generated by the voice recognition unit 114A. For example, a topic such as "baseball" is specified based on an utterance of the target user such as "I like baseball".
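A rough sketch of step S601, assuming a simple keyword table and substring matching; both the table contents and the matching rule are assumptions, since the patent only states that a topic keyword is extracted from the recognized text:

    from typing import Dict, List, Optional

    # Hypothetical keyword table; the patent does not specify its contents.
    TOPIC_KEYWORDS: Dict[str, List[str]] = {
        "baseball": ["baseball", "pitcher", "home run"],
        "weather": ["weather", "rain", "sunny"],
    }

    def identify_topic(preset_topic: Optional[str], utterance_text: str) -> Optional[str]:
        """Step S601: use the topic keyword kept in RAM when the robot started the
        topic itself (step S105); otherwise extract a topic keyword from the text
        of the target user's utterance."""
        if preset_topic is not None:
            return preset_topic
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(k in utterance_text for k in keywords):
                return topic
        return None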
Next, the preference determination unit 118A determines whether or not the sound reaction polarity determined by the sound determination process of fig. 7 is "positive" (step S602), and if the sound reaction polarity is "positive" (step S602: yes), determines the preference degree as "preference degree A" (step S609).
If the sound reaction polarity is not "positive" (no in step S602), the preference determination unit 118A determines whether the sound reaction polarity is "negative" (step S603). When the sound reaction polarity is "negative" (yes in step S603), the preference determination unit 118A determines whether or not the expression reaction polarity determined by the expression determination process in fig. 8 is "positive" (step S604). When the expression reaction polarity is "positive" (yes in step S604), the preference determination unit 118A determines the preference degree as "preference degree B" (step S610). On the other hand, when the expression reaction polarity is not "positive" (no in step S604), the preference determination unit 118A determines the preference degree as "preference degree D" (step S612).
In step S603, when the sound reaction polarity is not "negative" (step S603: no), the preference determination unit 118A determines whether the action reaction polarity determined by the action determination process of fig. 9 is "positive" (step S605). When the action reaction polarity is "positive" (yes in step S605), the preference determination unit 118A determines whether or not the expression reaction polarity is "positive" or "neutral" (step S606). When the expression reaction polarity is either "positive" or "neutral" (yes in step S606), the preference determination unit 118A determines the preference degree as "preference degree A" (step S609). On the other hand, when the expression reaction polarity is neither "positive" nor "neutral" (no in step S606), that is, when the expression reaction polarity is "negative", the preference determination unit 118A determines the preference degree as "preference degree C" (step S611).
In step S605, if the action reaction polarity is not "positive" (step S605: no), the preference determination unit 118A determines whether the action reaction polarity is "neutral" (step S607), and if the action reaction polarity is not "neutral" (step S607: no), the preference determination unit 118A determines the preference degree as "preference degree C" (step S611).
On the other hand, when the action reaction polarity is "neutral" (yes in step S607), the preference determination unit 118A determines whether or not the expression reaction polarity is "positive" (step S608). The preference determination unit 118A determines the preference degree as "preference degree B" when the expression reaction polarity is "positive" (yes in step S608), and determines the preference degree as "preference degree D" when the expression reaction polarity is not "positive" (no in step S608) (step S612).
After determining the preference degree of the target user in any one of steps S609 to S612, the preference determination unit 118A ends the preference determination process and returns the process to the dialogue control process.
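The branching of steps S602 to S612 amounts to the following combination rule; this is an illustrative Python sketch that treats the polarity strings as plain values:

    def preference_degree(sound: str, expression: str, action: str) -> str:
        """Combine the three reaction polarities into a preference degree
        (steps S602-S612). Each argument is 'positive', 'neutral' or 'negative'."""
        if sound == "positive":                               # S602: yes
            return "A"                                        # S609
        if sound == "negative":                               # S603: yes
            return "B" if expression == "positive" else "D"   # S604 -> S610/S612
        # sound reaction polarity is neutral from here on
        if action == "positive":                              # S605: yes
            return "A" if expression in ("positive", "neutral") else "C"  # S606 -> S609/S611
        if action != "neutral":                               # S607: no (action is negative)
            return "C"                                        # S611
        return "B" if expression == "positive" else "D"       # S608 -> S610/S612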
Returning to fig. 5, after the preference determination process is executed (step S110), the control unit 110A reflects the preference determination result in the preference degree information (step S111). The control unit 110A updates the preference degree information by adding information that associates the topic of the conversation between the target user and the robot 100A with the preference degree obtained as the result of the preference determination process to the preference degree information of the user information stored in the user information DB121A. Thus, the preference degree information is updated for each user USR. The topic of the conversation between the target user and the robot 100A is the topic represented by the topic keyword stored in the RAM or the like. The control unit 110A controls the communication unit 170A to transmit the information that associates the topic of the conversation between the target user and the robot 100A with the preference degree to the robot 100B. The robot 100B that has received this information similarly adds it to the preference degree information of the user information stored in the user information DB121B and updates the preference degree information. Thus, the robot 100A and the robot 100B can share the preference determination results. The initial value of the preference degree stored in the preference degree information in association with each of the plurality of topics is set to preference degree A. In this way, the control unit 110A (110B), including the reaction determination unit 117A (117B), the preference determination unit 118A (118B), and the communication unit 170A (170B), functions as the reaction acquisition means of the present invention.
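A minimal sketch of step S111, assuming an in-memory dictionary in place of the user information DB and a generic transmission callback in place of the communication unit 170A; both stand-ins are assumptions for illustration:

    from typing import Callable, Dict

    # Hypothetical stand-in for the preference degree information held in the
    # user information DB121A (DB121B); keyed by user, then by topic.
    preference_db: Dict[str, Dict[str, str]] = {}

    def reflect_preference(user_id: str, topic: str, degree: str,
                           send_to_peer: Callable[[dict], None]) -> None:
        """Step S111: record the topic/preference-degree pair for the user and
        transmit the same pair to the other robot so that both robots share the
        preference determination result."""
        preference_db.setdefault(user_id, {})[topic] = degree
        send_to_peer({"user": user_id, "topic": topic, "degree": degree})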
After the process of step S111 is performed, the control unit 110A determines whether or not the target user exists around the robot 100A (step S112). When it is determined that the target user exists around the robot 100A (yes in step S112), the control unit 110A determines that the dialogue with the target user can be continued, and returns the process to step S103. In step S103 following a yes determination in step S112, it is determined whether or not the elapsed time from the completion of the utterance in step S106 is within the predetermined time.
On the other hand, when it is determined that there is no target user around the robot 100A (no in step S112), the control unit 110A determines that the dialogue with the target user cannot be continued, and releases the communication connection with the robot 100B (the other robot) (step S113). The control unit 110A releases the communication connection with the robot 100B by controlling the communication unit 170A to perform a predetermined process based on the communication scheme. After that, the control unit 110A ends the dialogue control process.
The above is the dialogue control process executed by the control unit 110A of the robot 100A; the dialogue control process executed by the control unit 110B of the robot 100B is the same. The control unit 110B starts the dialogue control process as shown in fig. 5 and performs the user determination process as shown in fig. 6.
In step S103 of fig. 5, when it is determined that the target user has uttered within the predetermined time (step S103: yes), the control unit 110B (the utterance control unit 115B) determines that a conversation with the target user is being performed, and determines the utterance content as a response to the utterance of the target user (step S104). The control unit 110B (speech control unit 115B) refers to the speech information DB123B and the user information DB121B of the storage unit 120B, and determines candidates of topics corresponding to the speech content of the target user and matching the preference of the target user.
In step S104, if only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined, if utterance history information is stored in the storage unit 120A of the robot 100A, the control unit 110B (the speech control unit 115B) reads out the utterance history information stored in the storage unit 120A via the communication unit 170B. Then, the control unit 110B (the speech control unit 115B) determines whether or not the read utterance history information contains a topic that is identical to or associated with any one of the plurality of topic candidates and whose elapsed time from the utterance date and time to the present (that is, to when the robot 100B starts speaking) is within the predetermined elapsed time (hereinafter referred to as the "2nd topic to be compared").
When it is determined that the 2nd topic to be compared is present, the control unit 110B (speech control unit 115B) removes topics matching or associated with the 2nd topic to be compared from among the topic candidates, and determines the final topic.
On the other hand, when a plurality of topic candidates are determined, if no utterance history information is stored in the storage unit 120A of the robot 100A or if it is determined that the 2nd topic to be compared is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic. The speech control unit 115B outputs text data indicating the content of the utterance according to the topic determined as described above.
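The candidate filtering described above can be sketched as follows; the representation of the utterance history and the topic-association test are assumptions, since the patent leaves them abstract:

    import random
    import time
    from typing import Callable, Iterable, List, Tuple

    def choose_topic(candidates: List[str],
                     peer_history: Iterable[Tuple[str, float]],
                     related: Callable[[str, str], bool],
                     max_elapsed_sec: float) -> str:
        """Drop candidates that match, or are associated with, a topic the other
        robot uttered within the predetermined elapsed time (the '2nd topic to be
        compared'), then pick one remaining candidate at random.  `peer_history`
        holds (topic, utterance time in epoch seconds); `related` decides topic
        association."""
        now = time.time()
        recent = {t for t, ts in peer_history if now - ts <= max_elapsed_sec}
        filtered = [c for c in candidates
                    if not any(c == r or related(c, r) for r in recent)]
        # Fallback when every candidate is removed; the patent does not specify this case.
        pool = filtered if filtered else candidates
        return random.choice(pool)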
On the other hand, when it is determined that the target user has not uttered within the predetermined time (step S103: no), the control unit 110B (the speech control unit 115B) determines the content of the utterance to be made to the target user (step S105). At this time, the control unit 110B (speech control unit 115B) refers to the speech information DB123B and the user information DB121B of the storage unit 120B, and determines candidates of a plurality of topics that suit the preference of the target user stored in the user information DB121B. In this case, topics corresponding to preference degrees A and B are determined as topic candidates, as topics that suit the preference of the target user.
In step S105, if only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined, the final topic is selected from the plurality of topic candidates, as in the case of step S104. Specifically, when the control unit 110B (the speech control unit 115B) determines a plurality of topic candidates and utterance history information is stored in the storage unit 120A of the robot 100A, the control unit 110B (the speech control unit 115B) reads out the utterance history information stored in the storage unit 120A via the communication unit 170B. Then, the control unit 110B (speech control unit 115B) determines whether or not the 2nd topic to be compared exists in the read utterance history information.
When it is determined that the 2nd topic to be compared is present, the control unit 110B (speech control unit 115B) removes topics matching or associated with the 2nd topic to be compared from among the topic candidates, and determines the final topic.
On the other hand, when a plurality of topic candidates are determined, if no utterance history information is stored in the storage unit 120A of the robot 100A or if it is determined that the 2nd topic to be compared is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic.
The control unit 110B speaks based on the content of the utterance on the determined topic (step S106) and outputs the sound, and then performs the sound determination process shown in fig. 7, the expression determination process shown in fig. 8, and the action determination process shown in fig. 9, which determine the reaction of the target user. When the action determination process ends, the preference determination process shown in fig. 10 is executed. The control unit 110B adds the preference determination result of the preference determination process to the preference degree information of the user information stored in the user information DB121B, and updates the preference degree information. The control unit 110B controls the communication unit 170B to transmit to the robot 100A the information that associates the topic of the conversation between the target user and the robot 100B with the preference degree. The robot 100A that has received this information similarly adds it to the preference degree information of the user information stored in the user information DB121A and updates the preference degree information. Thus, the robot 100A and the robot 100B share the preference determination results.
In embodiment 1, when one of the robots 100A and 100B has spoken and the other robot then speaks within the predetermined elapsed time, the topic on which the other robot speaks is determined to be different from the topic on which the one robot spoke within the predetermined elapsed time before the other robot's utterance. In other cases, the topics on which the robots 100A and 100B speak are determined independently of each other, without cooperation. Instead of this determination method, if the number of pieces of preference degree information of the target user stored in the user information DB121A (DB121B) is smaller than a predetermined threshold, the topics on which the robots 100A and 100B speak may be determined as topics different from each other, and if the number of pieces of preference degree information is equal to or larger than the predetermined threshold, the topics on which the robots 100A and 100B speak may be determined independently of each other. That is, when a predetermined condition is satisfied, the topics on which the robots 100A and 100B speak may be determined as topics different from each other, and when the predetermined condition is not satisfied, the topics may be determined independently of each other. Alternatively, regardless of the predetermined condition, the topics (utterance contents) on which the robots 100A and 100B speak may always be determined independently of each other, without cooperation.
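The alternative condition based on the amount of stored preference degree information can be expressed as a simple predicate; the concrete threshold value is an assumption, as the text only calls it "predetermined":

    def should_differentiate_topics(num_preference_entries: int, threshold: int) -> bool:
        """Make the two robots speak on different topics only while little
        preference degree information has been gathered for the target user;
        once `threshold` entries exist, let each robot choose its topic
        independently."""
        return num_preference_entries < threshold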
(embodiment 2)
In the above embodiment, the robot 100A and the robot 100B each have the functions of reaction determination and speech control, but these functions may be provided independently of the robot 100A and the robot 100B. In the present embodiment, an external server capable of communicating with the robot 100A and the robot 100B is provided, and the server performs the reaction determination and speech control processing for the robot 100A and the robot 100B.
As shown in fig. 11, the dialogue system 1 in the present embodiment includes a robot 100A, a robot 100B, and a server 200.
As in embodiment 1, the robot 100A includes a control unit 110A, a storage unit 120A, an imaging unit 130A, a sound input unit 140A, a sound output unit 150A, a moving unit 160A, and a communication unit 170A. However, unlike embodiment 1, the control unit 110A does not include the speech control unit 115A, the reaction determination unit 117A, or the preference determination unit 118A. Also unlike embodiment 1, the storage unit 120A does not include the user information DB121A, the sound information DB122A, the speech information DB123A, or the reaction determination information DB124A. The robot 100B likewise includes a control unit 110B, a storage unit 120B, an imaging unit 130B, a sound input unit 140B, a sound output unit 150B, a moving unit 160B, and a communication unit 170B, similar to the robot 100A. The control unit 110B does not include the speech control unit 115B, the reaction determination unit 117B, or the preference determination unit 118B. The storage unit 120B does not include the user information DB121B, the sound information DB122B, the speech information DB123B, or the reaction determination information DB124B.
The server 200 includes a control unit 210, a storage unit 220, and a communication unit 270. The control unit 210 includes a speech control unit 215, a reaction determination unit 217, and a preference determination unit 218. That is, instead of the robots 100A and 100B, the server 200 performs the various processes of controlling the speech of each of the robots 100A and 100B, determining the reaction of the user, determining the preference of the user, and so on. The storage unit 220 includes a user information DB221, a sound information DB222, a speech information DB223, and a reaction determination information DB224. That is, the databases provided in the robots 100A and 100B in embodiment 1 are collected in the server 200. The storage unit 220 stores, for each user USR, utterance history information including the date and time when the robot 100A or the robot 100B made an utterance, the topic of the utterance, and the like. The server 200 performs wireless data communication with the robot 100A and the robot 100B via the communication unit 270 and the communication units 170A and 170B of the robots. Thereby, the server 200 controls the robot 100A and the robot 100B to carry out a dialogue with the target user. As described above, the communication units 170A and 170B function as the 1 st communication unit of the present invention, and the communication unit 270 functions as the 2 nd communication unit of the present invention.
Next, a dialogue control process in the present embodiment will be described. Here, a dialogue control process of the robot 100A will be described as an example. The control unit 110A of the robot 100A starts the session control process when the user detection unit 111A detects the user USR around the robot 100A.
When the session control process (see fig. 5) is started, the control unit 110A first executes the user specification process. The control unit 110A searches for a registered user corresponding to a face image extracted from the captured image obtained from the imaging unit 130A. The control unit 110A (user specification unit 112A) accesses the user information DB221 of the storage unit 220 of the server 200, compares the face image extracted from the captured image with the face images of the plurality of users stored in the user information DB221, and specifies the user USR as the target user. Here, the control unit 110A functions as the object specifying means of the present invention.
When the control unit 210 of the server 200 that has received the information of the user USR determines that the target user has uttered within a predetermined time, the control unit 210 (the utterance control unit 215) determines that a conversation with the target user is being performed, and determines the content of the utterance as a response to the utterance of the target user. The control unit 210 (speech control unit 215) refers to the speech information DB223 and the user information DB221 of the storage unit 220, and determines candidates of topics corresponding to the speech content of the target user and matching the preference of the target user.
If only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined, if utterance history information of the robot 100B is stored in the storage unit 220, the control unit 210 (the speech control unit 215) reads out the utterance history information stored in the storage unit 220 and determines whether or not the 1st topic to be compared is present in the read utterance history information.
When it is determined that the 1st topic to be compared is present, the control unit 210 (speech control unit 215) removes topics matching or associated with the 1st topic to be compared from among the topic candidates, and determines the final topic.
On the other hand, when a plurality of topic candidates are determined, if no utterance history information of the robot 100B is stored or if it is determined that the 1st topic to be compared is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic. The speech control unit 215 outputs text data indicating the content of the utterance according to the topic determined as described above.
On the other hand, when it is determined that the target user has not uttered within the predetermined time, the control unit 210 (the speech control unit 215) determines the content of the utterance to be made to the target user. At this time, the speech control unit 215 refers to the speech information DB223 and the user information DB221 of the storage unit 220, and determines candidates of a plurality of topics that suit the preference of the target user stored in the user information DB221.
If only one topic candidate is determined, it is determined as the final topic. On the other hand, when a plurality of topic candidates are determined, the final topic is selected from among them. Specifically, when a plurality of topic candidates are determined and utterance history information of the robot 100B is stored, the control unit 210 (the speech control unit 215) reads out the utterance history information and determines whether or not the 1st topic to be compared exists.
When it is determined that the 1st topic to be compared is present, the control unit 210 (speech control unit 215) removes topics matching or associated with the 1st topic to be compared from among the topic candidates, and determines the final topic.
On the other hand, when a plurality of topic candidates are determined, if no utterance history information of the robot 100B is stored or if it is determined that the 1st topic to be compared is not present in the utterance history information, one topic selected at random from the determined topic candidates is determined as the final topic.
The robot 100A receives the text data via the communication unit 170A and sends it to the voice synthesis unit 116A. The voice synthesizing unit 116A accesses the sound information DB222 of the storage unit 220 of the server 200, and generates voice data from the received text data using the acoustic model or the like stored in the sound information DB222. The voice synthesis unit 116A controls the voice output unit 150A to output the generated voice data.
Next, a reaction determination process (see fig. 7 to 9) is performed, which determines a reaction of the target user to the utterance of the robot 100A.
The control unit 210 (the sound determination unit 217A of the reaction determination unit 217) executes the sound determination process (see fig. 7). The sound determination unit 217A determines the reaction of the target user to the utterance of the robot 100A based on the sound made by the target user after the utterance of the robot 100A. When the target user speaks, the voice recognition unit 114A of the robot 100A accesses the sound information DB222 of the storage unit 220 of the server 200, generates text data from the voice data using the acoustic model or the like stored in the sound information DB222, and transmits the text data to the server 200. The sound determination unit 217A determines the reaction of the target user to the utterances of the robot 100A and the robot 100B based on the text data received through the communication unit 270.
After the sound determination process is performed, the control unit 210 (the expression determination unit 217B of the reaction determination unit 217) performs the expression determination process (see fig. 8). The expression determination unit 217B determines the reaction of the target user to the utterance of the robot 100A based on the expression of the target user after the utterance of the robot 100A. When the user information acquisition unit 113A of the robot 100A acquires a captured image of the user, it transmits the captured image to the server 200 via the communication unit 170A. The expression determination unit 217B detects the feature amount of the face of the target user from the captured image acquired via the communication unit 270, refers to the smile degree information stored in the reaction determination information DB224 of the storage unit 220, and calculates the smile degree of the target user based on the detected feature amount. The expression determination unit 217B then determines the reaction of the target user to the utterance of the robot 100A based on the calculated smile degree.
After the expression determination process is performed, the control unit 210 (the action determination unit 217C of the reaction determination unit 217) performs the action determination process (see fig. 9). The action determination unit 217C determines the reaction of the target user to the utterance of the robot 100A based on the action of the target user after the utterance of the robot 100A, which it detects from the captured image acquired via the communication unit 270.
After the action determination process is performed, the control unit 210 (the preference determination unit 218) performs the preference determination process (see fig. 10). The preference determination unit 218 specifies the topic of the conversation between the target user and the robot 100A, and determines a preference degree indicating the degree of preference of the target user for the topic based on the determination results of the reaction determination unit 217.
After the preference determination process is executed, the control unit 210 reflects the preference determination result in the preference degree information. The control unit 210 updates the preference degree information by adding information that associates the topic of the conversation between the target user and the robot 100A with the preference degree obtained as the result of the preference determination process to the preference degree information of the user information stored in the user information DB221. Thus, the preference degree information is updated for each user USR.
The same control process is also performed for the robot 100B. In embodiment 1, the robot 100A updates the preference degree information for the conversation between the target user and the robot 100A and transmits the update to the robot 100B, and the robot 100B that has received it similarly updates the preference degree information stored in the user information DB121B. Thus, the robot 100A and the robot 100B can share the preference determination results. In contrast, in the present embodiment, the preference degree information for the robot 100A and the robot 100B is stored for each user USR in the user information DB221 of the server 200, so the preference degree information does not need to be exchanged between the robots.
In the above embodiment, the server 200 performs various processes such as control of the speech of each of the robot 100A and the robot 100B, determination of the reaction of the user, and determination of the preference of the user. However, the present invention is not limited to this, and the server 200 may selectively execute only some of the processes of the robot 100A and the robot 100B. For example, the control unit 210 of the server 200 may include only the speech control unit 215 and execute only the speech control processing for the robot 100A and the robot 100B, with the other processing executed by the robot 100A and the robot 100B. Alternatively, all of the processes of the robot 100A and the robot 100B, such as user detection, user specification, user information acquisition, voice recognition, voice synthesis, speech control, reaction determination, and preference determination, may be executed by the server. In the present embodiment, the storage unit 220 of the server 200 includes the user information DB221, the sound information DB222, the speech information DB223, and the reaction determination information DB224. However, the present invention is not limited thereto, and the server 200 may include any of the databases. For example, instead of the server 200 having the sound information DB222, the robot 100A and the robot 100B may each have the sound information DB222. The face information used for user specification in the user information DB221 may also be provided not only in the server 200 but also in the robot 100A and the robot 100B. In this way, the robot 100A and the robot 100B do not need to access the server 200 at the time of voice recognition, voice synthesis, and user specification.
As described above, according to embodiment 1, the dialogue system 1 includes the robot 100A and the robot 100B, and controls the utterances of each of the robot 100A and the robot 100B based on the result of determining the reaction of the target user to the utterance of the robot 100A (i.e., the preference information of the target user) and the result of determining the reaction of the target user to the utterance of the robot 100B (i.e., the preference information of the target user).
According to embodiment 2, the dialogue system 1 includes the robot 100A, the robot 100B, and the server 200, and the server 200 controls the utterances of each of the robot 100A and the robot 100B based on the result of determining the reaction of the target user to the utterance of the robot 100A (i.e., the preference information of the target user) and the result of determining the reaction of the target user to the utterance of the robot 100B (i.e., the preference information of the target user). As described above, according to embodiments 1 and 2, it is possible to grasp the preference of the target user with high accuracy and efficiency, and to carry out a dialogue in accordance with the preference of the target user.
The present invention is not limited to the above-described embodiments, and various modifications and applications can be made. The above embodiment may be modified as follows.
In the above embodiment, the robot 100A and the robot 100B are installed at places where the target user does not recognize each other's utterances. In contrast, a modification will be described in which the robot 100A and the robot 100B are installed at a place where the target user recognizes the utterances of both. In this case, the robot 100A and the robot 100B can simultaneously carry out a dialogue with the target user. However, if the utterance timings of the robot 100A and the robot 100B overlap or immediately follow one another, it may be impossible to appropriately determine which utterance the target user has reacted to. As a result, the preference information of the target user cannot be acquired appropriately, and an appropriate response cannot be made. Therefore, the speech control unit 115A (115B) determines the speech start timing of the robot 100A (100B) in cooperation with the speech control unit 115B of the robot 100B (the speech control unit 115A of the robot 100A) so that the utterance timings of the robot 100A and the robot 100B neither overlap nor immediately follow each other. The speech control unit 115A (115B) determines the speech start timing of the robot 100A (100B) so that the interval between the utterances of the robot 100A and the robot 100B is equal to or longer than a predetermined time, for example, a time sufficient to determine the reaction of the target user. The speech control unit 115B of the robot 100B (the speech control unit 115A of the robot 100A) determines the speech start timing of the robot 100B (100A) so that the robot 100B (100A) does not speak during the utterance of the robot 100A (100B) or immediately after that utterance ends. The speech start timings of the robot 100A and the robot 100B may also be determined by only one of the speech control units 115A and 115B. When the server 200 controls the utterances of the robot 100A and the robot 100B, the speech control unit 215 determines the speech start timings of both. In this way, the robot 100A and the robot 100B do not speak in succession, but speak at timings that differ from each other by a predetermined time or more. This makes it possible to grasp the preference of the target user with high accuracy and to carry out a dialogue in accordance with the preference of the target user.
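A minimal sketch of such timing coordination, assuming the robots exchange the end time of the most recent utterance; the scheduling rule is illustrative only and not the patent's exact procedure:

    def next_speech_start(peer_speech_end: float,
                          min_gap_sec: float,
                          earliest_ready: float) -> float:
        """Choose a speech start time that leaves at least `min_gap_sec` (a time
        long enough to judge the target user's reaction) after the other robot's
        utterance ends, so the two robots neither overlap nor speak back-to-back.
        All times are in epoch seconds."""
        return max(earliest_ready, peer_speech_end + min_gap_sec)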
In the modification described above, the speech control unit 115A may cooperate with the speech control unit 115B of the robot 100B to determine the topics on which the robot 100A and the robot 100B speak as topics different from each other. In this case, as in embodiment 1, when one of the robots 100A and 100B has spoken and the other robot then speaks within the predetermined elapsed time, the topic on which the other robot speaks may be determined to be different from the topic on which the one robot spoke within the predetermined elapsed time before the other robot's utterance, and otherwise the topics on which the robots 100A and 100B speak may be determined independently of each other, without cooperation. Alternatively, in this case, if the number of pieces of preference degree information of the target user stored in the user information DB121A (DB121B) is smaller than a predetermined threshold, the topics on which the robots 100A and 100B speak may be determined as topics different from each other, and if the number of pieces of preference degree information is equal to or larger than the predetermined threshold, the topics on which the robots 100A and 100B speak may be determined independently of each other. Alternatively, regardless of the above-described predetermined conditions, the topics (utterance contents) on which the robots 100A and 100B speak may always be determined independently of each other, without cooperation.
For example, a movement control unit that controls the movement unit 160A according to the control of the speech by the speech control unit 115A may be provided. The movement control means may control the movement unit 160A to bring the robot 100A close to the target user in accordance with the start of the utterance of the robot 100A, for example.
For example, the plurality of robots 100 constituting the dialogue system 1 may adopt a master/slave system; for example, the robot 100 functioning as the master may determine its own utterance content together with the utterance content of the robot 100 functioning as the slave, and instruct the robot 100 functioning as the slave to speak based on the determined utterance content. In this case, the method of determining which robot 100 functions as the master and which as the slave is arbitrary; for example, the robot that first detects and specifies the surrounding user USR may be made to function as the master and the other robots 100 may be made to function as slaves. Alternatively, the robot 100 that is first powered on by the user USR may be made to function as the master and the robot 100 that is powered on later may be made to function as the slave, or the user USR may be able to set, using a physical switch or the like, which robot 100 functions as the master and which as the slave.
Further, the robot 100 functioning as the master and the robot 100 functioning as the slave may be determined in advance. In this case, some of the functions of the robot 100 functioning as the slave may be omitted. For example, since the robot 100 functioning as the slave is instructed to speak by the robot 100 functioning as the master, it need not have a function equivalent to the speech control unit 115A or the like.
In the above embodiment, an example in which the robot 100A and the robot 100B carry out a dialogue with the target user has been described, but the dialogue with the target user may also be carried out by a single robot 100. In this case, for example, the single robot 100 may determine its own utterance content and the utterance content of the other robots together, as in the case of functioning as the master, and sequentially output the determined utterance contents while changing the voice quality, so that the single robot 100 also speaks the utterances of the other robots.
In the above embodiment, the case where the dialogue system 1 is a robot system including a plurality of robots 100 has been described as an example, but the dialogue system 1 may also be configured by a plurality of dialogue devices including all or part of the configuration of the robots 100.
In the above embodiment, the control programs executed by the CPUs of the control units 110A and 110B are stored in advance in the ROM or the like. However, the present invention is not limited to this; a control program for executing the above-described various processes may be installed in an existing general-purpose computer, architecture, workstation, or other electronic device, thereby causing it to function as a device corresponding to the robots 100A and 100B according to the above embodiment. For example, a portable terminal having a voice assist function, a digital signage, and the like are included as speaking devices corresponding to the robots 100A and 100B. The digital signage is a system that displays images and information on an electronic display device such as a display. Speaking is not limited to outputting sound through a speaker; it also includes displaying text on a display device. Therefore, a portable terminal, a digital signage, and the like that speak by displaying text are also included as speaking devices corresponding to the robots 100A and 100B.
Such a program may be provided by being distributed on a computer-readable recording medium (a floppy disk, a CD-ROM (Compact Disc Read-Only Memory), a DVD-ROM (Digital Versatile Disc Read-Only Memory), or the like), or may be stored in advance in storage on a network such as the Internet and provided by downloading.
In the case where the above-described processing is executed by an OS (Operating System) and an application program sharing the work, or by cooperation between the OS and the application program, only the application program may be stored in the recording medium or the storage. Further, the program may be superimposed on a carrier wave and distributed via a network. For example, the above-described program may be posted on a bulletin board system (BBS: electronic bulletin board system) on a network and distributed via the network. The distributed program may then be started and executed under the control of the OS in the same manner as other application programs, thereby enabling execution of the processing described above.
The present invention can be embodied in various forms and modifications without departing from the broad spirit and scope of the present invention. The above embodiments are for the purpose of illustrating the present invention, and do not limit the scope of the present invention. That is, the scope of the present invention is not represented by the embodiments but by the scope of patent claims. Further, various modifications that are implemented within the scope of the patent claims and within the meaning of the invention equivalent to the scope of the patent claims are considered to be within the scope of the invention.

Claims (19)

1. A conversation control apparatus is characterized by comprising:
A reaction acquisition unit that acquires a plurality of reaction determination results including a result of determining a reaction of a predetermined object to an utterance of a 1 st utterance device and a result of determining a reaction of the predetermined object to an utterance of a 2 nd utterance device provided independently of the 1 st utterance device; and
and a speech control unit that cooperates with a plurality of speech devices including the 1 st and 2 nd speech devices to acquire an elapsed time after one of the 1 st and 2 nd speech devices has uttered, and controls at least one of the plurality of speech devices so that the topic of the other of the 1 st and 2 nd speech devices is determined to be a topic different from the topic of the one of the 1 st and 2 nd speech devices based on a plurality of reaction determination results acquired by the reaction acquisition unit when the elapsed time is within a predetermined elapsed time.
2. The dialog control device of claim 1, wherein,
the reaction obtaining means obtains the plurality of reaction determination results including a result of determining a reaction of the predetermined object to each of the 1 st and 2 nd speaking devices, in the case of: the location where the predetermined object is uttered by the 1 st speaker and the location where the predetermined object is uttered by the 2 nd speaker are locations where the predetermined object cannot recognize both of the 1 st and 2 nd speakers.
3. The dialog control device of claim 1, wherein,
the speech control means controls the speech of the 1 st and 2 nd speech devices not to be continuously performed but to be performed at a timing different from each other by a predetermined time or more.
4. The dialog control device of claim 1, wherein,
the speech control means determines the contents of the speech of the 1 st and 2 nd speech devices independently of each other.
5. The dialog control device of claim 1, wherein,
the plurality of reaction determination results are results of determining reactions of the predetermined object to the utterances of the 1 st and 2 nd utterances based on at least one of the sound emitted by the predetermined object and the captured image of the predetermined object.
6. The dialog control device of claim 1, wherein,
the device further comprises an acquisition unit for acquiring at least one of the sound emitted from the predetermined object and the captured image of the predetermined object,
the reaction acquiring means is configured to acquire the plurality of reaction determination results by determining the reaction of the predetermined object to the speech of each of the 1 st and 2 nd speech devices based on at least one of the sound and the captured image acquired by the acquiring means.
7. The dialog control device of claim 6, wherein,
the reaction acquisition means includes at least one of sound determination means, expression determination means, and action determination means,
the sound determination means determines the content of the sound of the utterance of the predetermined object for each of the 1 st and 2 nd speaking devices based on the sound acquired by the acquisition means,
the expression determination means determines the expression of the speech of the predetermined subject for each of the 1 st and 2 nd speech devices based on the captured image acquired by the acquisition means,
the action determination means determines the action of the predetermined object on the speech of each of the 1 st and 2 nd speech devices based on the captured image acquired by the acquisition means,
the reaction acquiring means acquires the plurality of reaction determination results by determining the reaction of the predetermined object to the speech of each of the 1 st and 2 nd speech devices based on the determination result of at least one of the sound determining means, the expression determining means, and the action determining means.
8. The dialog control device of claim 7, wherein,
The reaction acquisition means classifies the reaction of the predetermined object into a positive reaction, a negative reaction, and a neutral reaction that is neither positive nor negative based on at least one of the sound, the expression, and the action of the predetermined object, thereby determining the reaction of the predetermined object.
9. The session control device according to claim 6, further comprising:
a specification unit configured to specify a topic of a conversation with the predetermined object based on at least one of the sound emitted by the predetermined object, the utterance of the 1 st utterance device, and the utterance of the 2 nd utterance device; and
preference determination means for determining a preference degree indicating a degree of preference of the predetermined object for the topic determined by the determination means based on the plurality of obtained reaction determination results,
the speech control means controls the speech of at least any one of the plurality of speech devices based on the preference degree determined by the preference determination means.
10. The dialog control device of claim 9, wherein,
The preference is interest and preference related to things regardless of the shape and intangibility, and includes preference for speaking contents of at least one of the 1 st speaking device and the 2 nd speaking device in addition to interest and preference related to food, sports, and weather.
11. The dialog control device of claim 9, wherein,
the preference determination means determines the preference degree to a plurality of levels in order of the preference of the predetermined subject to the topic from high to low,
the speech control means controls the speech of at least one of the plurality of speech devices based on the information indicating the plurality of levels of preference determined by the preference determination means.
12. The dialog control device of claim 1, wherein,
the predetermined object includes a person, an animal, or a robot.
13. The dialog control device according to any of claims 1 to 12, characterized in that,
further comprising an object specifying means for specifying the predetermined object from a plurality of objects different from each other,
the reaction obtaining means obtains a plurality of reaction determination results including a result of determining a reaction of the specified object to the utterance of the 1 st utterance device and a result of determining a reaction of the specified object to the utterance of the 2 nd utterance device provided independently of the 1 st utterance device.
14. The dialog control device of claim 1, wherein,
the conversation control apparatus is provided in at least one of the 1 st and 2 nd speaking apparatuses.
15. The dialog control device of claim 1, wherein,
the conversation control apparatus is provided independently of the 1 st and 2 nd speaking apparatuses.
16. A dialogue system comprising a 1 st speech device and a 2 nd speech device configured to be able to speak, and a dialogue control device, characterized in that,
the session control device includes:
a reaction acquisition unit that acquires a plurality of reaction determination results including a result of determining a reaction of a predetermined object to an utterance of a 1 st utterance device and a result of determining a reaction of the predetermined object to an utterance of a 2 nd utterance device provided independently of the 1 st utterance device; and
and a speech control unit that cooperates with a plurality of speech devices including the 1 st and 2 nd speech devices to acquire an elapsed time after one of the 1 st and 2 nd speech devices has uttered, and controls the speech of at least one of the plurality of speech devices so that the topic of the other of the 1 st and 2 nd speech devices is determined to be a topic different from the topic of the one of the 1 st and 2 nd speech devices when the elapsed time is within a predetermined elapsed time based on a plurality of reaction determination results acquired by the reaction acquisition unit.
17. The dialog system of claim 16, wherein,
the 1 st and 2 nd speaking devices each include:
an acquisition unit that acquires at least one of a sound emitted from the predetermined object and an image captured by the predetermined object; and
a 1 st communication unit is provided which is configured to communicate with the first communication unit,
the conversation control apparatus further includes a 2 nd communication unit for communicating with the 1 st and 2 nd speaking devices via the 1 st communication unit,
the reaction acquisition means of the session control device is,
acquiring 1 st data, which is at least one of the sound and the captured image acquired by the acquisition means of the 1 st speech device, via the 1 st communication means and the 2 nd communication means, and determining a reaction of the predetermined object to the speech of the 1 st speech device based on the 1 st data acquired, thereby acquiring a 1 st reaction determination result, which is a determination result of the reaction of the predetermined object to the speech of the 1 st speech device,
acquiring 2 nd data, which is at least one of the sound and the captured image acquired by the acquisition means of the 2 nd speaker device, via the 1 st communication means and the 2 nd communication means, and determining a reaction of the predetermined object to the speaker of the 2 nd speaker device based on the acquired 2 nd data, thereby acquiring a 2 nd reaction determination result, which is a determination result of the reaction of the predetermined object to the speaker of the 2 nd speaker device,
The speech control means of the conversation control apparatus controls the speech of the 1 st and 2 nd speech apparatuses via the 2 nd communication means and the 1 st communication means based on the plurality of reaction determination results including the 1 st and 2 nd reaction determination results acquired by the reaction acquisition means.
18. A conversation control method, characterized by comprising:
a process of acquiring a plurality of reaction determination results including a result of determining a reaction of a predetermined object to a speech of a 1 st speech device and a result of determining a reaction of the predetermined object to a speech of a 2 nd speech device provided independently of the 1 st speech device; and
and processing for controlling at least one of the plurality of speaking devices by causing a plurality of speaking devices including the 1 st and 2 nd speaking devices to cooperate with each other, acquiring an elapsed time after one of the 1 st and 2 nd speaking devices speaks, and determining a topic of the other of the 1 st and 2 nd speaking devices as a topic different from the topic of the one of the 1 st and 2 nd speaking devices based on the acquired plurality of reaction determination results so that the elapsed time is within a predetermined elapsed time.
19. A storage medium, characterized in that,
a program for causing a computer to function as:
a reaction acquisition unit that acquires a plurality of reaction determination results including a result of determining a reaction of a predetermined object to an utterance of a 1 st utterance device and a result of determining a reaction of the predetermined object to an utterance of a 2 nd utterance device provided independently of the 1 st utterance device; and
and a speech control unit that cooperates with a plurality of speech devices including the 1 st and 2 nd speech devices to acquire an elapsed time after one of the 1 st and 2 nd speech devices has uttered, and controls the speech of at least one of the plurality of speech devices so that the topic of the other of the 1 st and 2 nd speech devices is determined to be a topic different from the topic of the one of the 1 st and 2 nd speech devices when the elapsed time is within a predetermined elapsed time based on a plurality of reaction determination results acquired by the reaction acquisition unit.
CN201910207297.1A 2018-03-26 2019-03-19 Dialogue control device, dialogue system, dialogue control method, and storage medium Active CN110364164B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2018058200 2018-03-26
JP2018-058200 2018-03-26
JP2018-247382 2018-12-28
JP2018247382A JP2019175432A (en) 2018-03-26 2018-12-28 Dialogue control device, dialogue system, dialogue control method, and program

Publications (2)

Publication Number Publication Date
CN110364164A CN110364164A (en) 2019-10-22
CN110364164B true CN110364164B (en) 2023-12-05

Family

ID=68167044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207297.1A Active CN110364164B (en) 2018-03-26 2019-03-19 Dialogue control device, dialogue system, dialogue control method, and storage medium

Country Status (2)

Country Link
JP (3) JP2019175432A (en)
CN (1) CN110364164B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7407560B2 (en) 2019-10-30 2024-01-04 日本放送協会 Keyword evaluation device, keyword evaluation method, and keyword evaluation program
WO2021131737A1 (en) * 2019-12-27 2021-07-01 ソニーグループ株式会社 Information processing device, information processing method, and information processing program
JP2021144086A (en) * 2020-03-10 2021-09-24 株式会社東海理化電機製作所 Agent system and computer program

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1561514A (en) * 2001-09-27 2005-01-05 松下电器产业株式会社 Dialogue apparatus, dialogue parent apparatus, dialogue child apparatus, dialogue control method, and dialogue control program
JP2006178063A (en) * 2004-12-21 2006-07-06 Toyota Central Res & Dev Lab Inc Interactive processing device
JP2008158697A (en) * 2006-12-21 2008-07-10 Nec Corp Robot control device
JP2016020963A (en) * 2014-07-14 2016-02-04 シャープ株式会社 Interaction evaluation device, interaction evaluation system, interaction evaluation method, and interaction evaluation program
JP2016109897A (en) * 2014-12-08 2016-06-20 シャープ株式会社 Electronic equipment, speech production control method and program
CN106233378A (en) * 2014-05-13 2016-12-14 夏普株式会社 Control device and message output control system
CN106484093A (en) * 2015-09-01 2017-03-08 卡西欧计算机株式会社 Session control, dialog control method
CN106503030A (en) * 2015-09-03 2017-03-15 卡西欧计算机株式会社 Session control, dialog control method
CN106663219A (en) * 2014-04-17 2017-05-10 软银机器人欧洲公司 Methods and systems of handling a dialog with a robot
WO2017094212A1 (en) * 2015-11-30 2017-06-08 ソニー株式会社 Information processing device, information processing method, and program
CN107053186A (en) * 2015-12-14 2017-08-18 卡西欧计算机株式会社 Interface, robot, dialogue method and storage medium
JP2017194910A (en) * 2016-04-22 2017-10-26 Cocoro Sb株式会社 Response data collection system, customer response system, and program
WO2017200072A1 (en) * 2016-05-20 2017-11-23 日本電信電話株式会社 Dialog method, dialog system, dialog device, and program
WO2017200077A1 (en) * 2016-05-20 2017-11-23 日本電信電話株式会社 Dialog method, dialog system, dialog device, and program

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004021509A (en) 2002-06-14 2004-01-22 Mitsubishi Heavy Ind Ltd Information sharing robot
JP2004062063A (en) * 2002-07-31 2004-02-26 Matsushita Electric Ind Co Ltd Interactive apparatus
JP2005099934A (en) * 2003-09-22 2005-04-14 Konica Minolta Photo Imaging Inc Robot service system
JP2007011674A (en) 2005-06-30 2007-01-18 National Institute Of Information & Communication Technology Method for executing service for explaining reason by using interactive robot, device and program thereof
JP2015184563A (en) * 2014-03-25 2015-10-22 シャープ株式会社 Interactive household electrical system, server device, interactive household electrical appliance, method for household electrical system to interact, and program for realizing the same by computer
JP2015219583A (en) 2014-05-14 2015-12-07 日本電信電話株式会社 Topic determination device, utterance device, method, and program
JP6555113B2 (en) 2015-12-14 2019-08-07 株式会社デンソー Dialogue device
JP2017151517A (en) * 2016-02-22 2017-08-31 富士ゼロックス株式会社 Robot control system
JP6380469B2 (en) * 2016-06-23 2018-08-29 カシオ計算機株式会社 Robot, robot control method and program
JP6767206B2 (en) * 2016-08-30 2020-10-14 シャープ株式会社 Response system
JP6731598B2 (en) * 2016-08-31 2020-07-29 Jsw株式会社 Communication device for ornamental fish, communication system for ornamental fish, and communication method with ornamental fish

Also Published As

Publication number Publication date
JP2023133410A (en) 2023-09-22
JP2019175432A (en) 2019-10-10
JP2023055910A (en) 2023-04-18
CN110364164A (en) 2019-10-22
JP7416295B2 (en) 2024-01-17

Similar Documents

Publication Publication Date Title
CN110313152B (en) User registration for an intelligent assistant computer
EP3623118B1 (en) Emotion recognizer, robot including the same, and server including the same
JP6505748B2 (en) Method for performing multi-mode conversation between humanoid robot and user, computer program implementing said method and humanoid robot
JP7416295B2 (en) Robots, dialogue systems, information processing methods and programs
EP1494210B1 (en) Speech communication system and method, and robot apparatus
US20220032482A1 (en) Information processing device and storage medium
KR20010062767A (en) Information processing device, information processing method and storage medium
US20210232807A1 (en) Information processing system, storage medium, and information processing method
JP7476941B2 (en) ROBOT, ROBOT CONTROL METHOD AND PROGRAM
US20200110968A1 (en) Identification device, robot, identification method, and storage medium
US20190295526A1 (en) Dialogue control device, dialogue system, dialogue control method, and recording medium
CN108665907B (en) Voice recognition device, voice recognition method, recording medium, and robot
JP2006243555A (en) Response determination system, robot, event output server, and response determining method
JPWO2019181144A1 (en) Information processing equipment and information processing methods, and robot equipment
JP6798258B2 (en) Generation program, generation device, control program, control method, robot device and call system
JP6887035B1 (en) Control systems, control devices, control methods and computer programs
US20220288791A1 (en) Information processing device, information processing method, and program
EP3502940A1 (en) Information processing device, robot, information processing method, and program
US20210166685A1 (en) Speech processing apparatus and speech processing method
JP2017191531A (en) Communication system, server, and communication method
US20220157305A1 (en) Information processing apparatus, information processing method, and program
JP2002307349A (en) Robot device, information learning method, and program and recording medium
JP7425681B2 (en) Social ability generation device, social ability generation method, and communication robot
WO2020004213A1 (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant