WO2021131737A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2021131737A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
information processing
utterance
thing
Application number
PCT/JP2020/045993
Other languages
French (fr)
Japanese (ja)
Inventor
英樹 野間
直矢 村松
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2021131737A1


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 40/00 Handling natural language data
                    • G06F 40/20 Natural language analysis
                        • G06F 40/279 Recognition of textual entities
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 13/00 Speech synthesis; Text to speech systems
                    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
                • G10L 15/00 Speech recognition
                    • G10L 15/08 Speech classification or search
                        • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
                    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to an information processing device, an information processing method, and an information processing program.
  • Patent Document 1 discloses a technique of calculating an expected value of the user's attention to output information and controlling information output based on that expected value.
  • An information processing apparatus according to one form of the present disclosure includes: an acquisition unit that acquires utterance information indicating an utterance by a user; and a providing unit that provides expression information to be expressed by a robot device, based on the utterance information acquired by the acquisition unit and on user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward the specific thing.
  • 1. Embodiment
  •  1-1. Outline of information processing according to the embodiment
  •  1-2. Configuration of the information processing device according to the embodiment
  •  1-3. Information processing procedure according to the embodiment
  • 2. Effects of this disclosure
  • 3. Hardware configuration
  • FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure.
  • The information processing shown in FIG. 1 is performed by the robot device 10 and the information processing device 100.
  • The robot device 10 can be any of various devices that perform autonomous operation based on environment recognition.
  • For example, the robot device 10 according to the present embodiment is an ellipsoidal agent-type robot device that travels autonomously on wheels.
  • The robot device 10 realizes various kinds of communication, including information presentation, by performing autonomous operations according to, for example, the user, the surroundings, and its own situation.
  • The robot device 10 may be a small robot having a size and weight that allow a user to lift it easily with one hand.
  • The information processing device 100 acquires utterance information indicating an utterance by the user. Based on the acquired utterance information and on user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward that thing, the information processing device 100 provides expression information to be expressed by the robot device 10.
  • In the example shown in FIG. 1, the information processing device 100 acquires voice data of the user's utterance from the robot device 10. Subsequently, the information processing device 100 acquires the voice recognition result of the acquired voice data (for example, "I ate curry yesterday"). It then decomposes the voice recognition result into morphemes by morphological analysis, obtaining, for example, "yesterday: noun, tense", "curry: noun, food", "ate: verb", and "yo: particle".
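To make this step concrete, here is a minimal sketch of the morpheme decomposition, assuming a toy whitespace tokenizer and a hand-written lexicon; a real system would use an actual morphological analyzer (for Japanese, e.g. MeCab), and the output format shown is an illustration, not the patent's.

```python
from typing import List, Tuple

# Toy lexicon mirroring the example morphemes in the text.
LEXICON = {
    "yesterday": ("noun", "tense"),
    "curry": ("noun", "food"),
    "ate": ("verb", ""),
    "yo": ("particle", ""),
}

def analyze(utterance: str) -> List[Tuple[str, str, str]]:
    """Decompose a recognized utterance into (surface, part of speech, category)."""
    morphemes = []
    for surface in utterance.lower().rstrip(".!?").split():
        pos, category = LEXICON.get(surface, ("unknown", ""))
        morphemes.append((surface, pos, category))
    return morphemes

print(analyze("I ate curry yesterday"))
# [('i', 'unknown', ''), ('ate', 'verb', ''), ('curry', 'noun', 'food'),
#  ('yesterday', 'noun', 'tense')]
```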
  • Further, the information processing device 100 acquires the voice direction (for example, "45 degrees to the left"), the user's face identification result (for example, face ID "U1"), and image data from the robot device 10. It then matches the voice direction (45 degrees to the left) against the position of the user in the image data (the user with face ID "U1" is 45 degrees to the left) to identify the user who is the speaker of the utterance.
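The matching of sound-source direction against detected face positions can be pictured as below; the angle convention, tolerance value, and data shapes are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DetectedFace:
    face_id: str       # e.g. "U1", from the robot's face identification
    angle_deg: float   # direction of the face as seen from the robot

def identify_speaker(voice_angle_deg: float,
                     faces: List[DetectedFace],
                     tolerance_deg: float = 10.0) -> Optional[str]:
    """Match the sound-source direction against face positions in the image."""
    if not faces:
        return None
    best = min(faces, key=lambda f: abs(f.angle_deg - voice_angle_deg))
    if abs(best.angle_deg - voice_angle_deg) <= tolerance_deg:
        return best.face_id
    return None

# Voice arrived from 45 degrees to the left (negative = left, an assumption);
# face "U1" was detected in roughly the same direction.
print(identify_speaker(-45.0, [DetectedFace("U1", -44.0)]))  # -> U1
```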
  • When the information processing device 100 has identified the speaker, it acquires that user's preference information. For example, as the preference information of user U1, it acquires the information that the food user U1 likes is "curry". The information processing device 100 then inputs the character string "curry", the preference target of user U1, into the sympathy model and acquires the character string "delicious" as the output data of the sympathy model.
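The patent describes the sympathy model only as a learned mapping from a preference string to an empathic word, so a lookup table stands in for the trained model in this sketch; the entries mirror the examples given in the text.

```python
# Stand-in for the trained sympathy model: preference target in, empathic word out.
EMPATHY_MODEL = {
    "curry": "delicious",
    "futsal": "fun",
    "coriander": "sorry",
}

def empathic_word(preference_target: str) -> str:
    # Fallback for unseen inputs is an assumption, not from the patent.
    return EMPATHY_MODEL.get(preference_target, "interesting")

print(empathic_word("curry"))  # -> delicious
```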
  • Subsequently, the information processing device 100 acquires an utterance sentence template.
  • For example, the information processing device 100 acquires the utterance sentence template "X is Y, isn't it?" (X is a character string indicating the user's preference target; Y is a character string output from the sympathy model).
  • Having acquired the utterance sentence template "X is Y, isn't it?", the information processing device 100 applies the character string "curry" (the preference target of user U1) to X and the character string "delicious" (the output of the sympathy model) to Y, generating the response sentence "Curry is delicious, isn't it?".
  • Further, since the utterance by user U1, "I ate curry yesterday", contains "curry", a food that user U1 likes, the information processing device 100 estimates that the emotion of user U1, who made the utterance, is a positive emotion.
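A hedged sketch of the two operations just described, template filling and the polarity rule (positive if the utterance mentions a liked thing, negative if a disliked one); the tokenization is a toy stand-in.

```python
from typing import Set

TEMPLATE = "{x} is {y}, isn't it?"

def fill_template(x: str, y: str) -> str:
    return TEMPLATE.format(x=x.capitalize(), y=y)

def estimate_polarity(utterance: str, liked: Set[str], disliked: Set[str]) -> str:
    words = set(utterance.lower().rstrip(".!?").split())
    if words & {w.lower() for w in liked}:
        return "positive"
    if words & {w.lower() for w in disliked}:
        return "negative"
    return "neutral"

print(fill_template("curry", "delicious"))
# -> Curry is delicious, isn't it?
print(estimate_polarity("I ate curry yesterday", {"curry"}, {"coriander"}))
# -> positive
```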
  • When the information processing device 100 has generated the response sentence, it provides the robot device 10 with the response sentence "Curry is delicious, isn't it?". When it has estimated the emotion of user U1, it also provides the robot device 10 with information on behavior based on the estimated emotion. For example, if the information processing device 100 estimates that the emotion of user U1 is positive, it provides the robot device 10 with information indicating a bright tone as the tone of the voice with which the robot device 10 outputs the response sentence, information indicating a smile as the facial expression of the robot device 10, and, if the robot device 10 moves while outputting the response sentence, information indicating a quick speed as its operating speed.
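The mapping from estimated emotion to robot behavior might look like the following; the parameter names and the neutral fallback are illustrative assumptions.

```python
from typing import Dict

BEHAVIOR: Dict[str, Dict[str, str]] = {
    "positive": {"voice_tone": "bright", "expression": "smile", "motion_speed": "quick"},
    "negative": {"voice_tone": "dark", "expression": "sad face", "motion_speed": "slow"},
}

def behavior_for(polarity: str) -> Dict[str, str]:
    return BEHAVIOR.get(polarity, {"voice_tone": "neutral",
                                   "expression": "neutral",
                                   "motion_speed": "normal"})

print(behavior_for("positive"))
# {'voice_tone': 'bright', 'expression': 'smile', 'motion_speed': 'quick'}
```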
  • FIG. 2 is a block diagram showing an example of a schematic configuration of the information processing apparatus according to the embodiment of the present disclosure.
  • In the example of FIG. 2, the information processing system 1 has a robot device 10 and an information processing device 100.
  • The information processing device 100 includes an acquisition unit 131 and a generation unit 132.
  • The acquisition unit 131 includes a voice recognizer, a speaker identifier, a morphological analyzer, and a preprocessing unit.
  • The voice recognizer acquires voice data from the robot device 10, recognizes the speech in the voice data, and outputs the voice recognition result to the preprocessing unit.
  • The speaker identifier acquires the voice direction, the face identification result, and image data from the robot device 10. With these, it identifies the speaker and outputs the speaker identification result to the preprocessing unit.
  • The preprocessing unit acquires the voice recognition result from the voice recognizer and outputs it to the morphological analyzer.
  • The morphological analyzer acquires the voice recognition result from the preprocessing unit, performs morphological analysis on it to decompose it into morphemes, and outputs the morphemes back to the preprocessing unit.
  • The preprocessing unit also acquires the speaker identification result from the speaker identifier. It then refers to the preference information database 121 and acquires the identified speaker's preference information from that database.
  • Having acquired the morphemes and the speaker's preference information, the preprocessing unit outputs them to the generation unit 132.
  • The generation unit 132 acquires the morphemes and the speaker's preference information from the preprocessing unit. Based on the acquired morphemes and preference information, it generates an utterance sentence and estimates the speaker's emotion.
  • FIG. 3 is a diagram showing a configuration example of the information processing device according to the embodiment of the present disclosure.
  • As shown in FIG. 3, the information processing device 100 according to the embodiment of the present disclosure includes a communication unit 110, a storage unit 120, and a control unit 130.
  • The communication unit 110 is realized by, for example, a NIC. It is connected to the network N by wire or wirelessly and transmits and receives information to and from, for example, the robot device 10.
  • The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk.
  • For example, the storage unit 120 stores the information processing program according to the embodiment.
  • The storage unit 120 has a preference information database 121, a model information database 122, and a template database 123.
  • The preference information database 121 stores various information related to the user's preferences.
  • An example of the preference information database according to the embodiment will be described with reference to FIG. 4.
  • FIG. 4 is a diagram showing an example of a preference information database according to the embodiment of the present disclosure.
  • In the example shown in FIG. 4, the preference information database 121 has items such as "user ID", "name", "favorite food", "disliked food", "recent dissatisfaction", "things that make the user sad", "hobby", and "hometown".
  • "User ID" indicates identification information that identifies the user.
  • "Name" indicates the name of the user.
  • "Favorite food" indicates food that the user likes.
  • "Disliked food" indicates food that the user dislikes.
  • "Recent dissatisfaction" indicates something the user has recently been dissatisfied about.
  • "Things that make the user sad" indicates things that make the user sad.
  • "Hobby" indicates the user's hobby.
  • "Hometown" indicates the user's hometown.
  • In the first record of FIG. 4, the name of the user identified by the user ID "U1" (user U1) is "A".
  • The food that user U1 likes is "curry".
  • The food that user U1 dislikes is "coriander".
  • What user U1 has recently been dissatisfied about is the "tax increase".
  • What makes user U1 sad is a "birthday".
  • The hobby of user U1 is "futsal".
  • The hometown of user U1 is "Tokyo".
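One plausible representation of the FIG. 4 record for user U1 is sketched below; the field names are translations of the database items, not a schema published in the patent.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    user_id: str
    name: str
    favorite_food: str
    disliked_food: str
    recent_dissatisfaction: str
    saddening_thing: str
    hobby: str
    hometown: str

u1 = PreferenceRecord(
    user_id="U1", name="A",
    favorite_food="curry", disliked_food="coriander",
    recent_dissatisfaction="tax increase", saddening_thing="birthday",
    hobby="futsal", hometown="Tokyo",
)
print(u1.favorite_food)  # -> curry
```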
  • The model information database 122 stores various information related to the sympathy model. Specifically, it stores information about a learning model trained so that, when a character string indicating the user's preference information is input, it outputs a character string corresponding to that preference information. For example, the model information database 122 stores identification information that identifies the sympathy model in association with the model data of the sympathy model.
  • The template database 123 stores various information related to templates. Specifically, the template database 123 stores the utterance sentence template "X is Y, isn't it?" (X is a character string indicating the user's preference target; Y is a character string output from the sympathy model).
  • The control unit 130 is realized by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 100, with a RAM as a work area. The control unit 130 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • The control unit 130 has an acquisition unit 131, a generation unit 132, and a providing unit 133, and realizes or executes the information processing functions and operations described below.
  • The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3 and may be any other configuration that performs the information processing described later.
  • The acquisition unit 131 acquires utterance information indicating an utterance by the user. Specifically, the acquisition unit 131 acquires voice data of the user's utterance from the robot device 10 and then acquires the voice recognition result of the acquired voice data (for example, "I ate curry yesterday"). It then decomposes the voice recognition result into morphemes by morphological analysis, obtaining, for example, "yesterday: noun, tense", "curry: noun, food", "ate: verb", and "yo: particle".
  • Further, the acquisition unit 131 acquires the voice direction (for example, "45 degrees to the left"), the user's face identification result (for example, face ID "U1"), and image data from the robot device 10. It then collates the voice direction (45 degrees to the left) with the position of the user in the image data (the user with face ID "U1" is 45 degrees to the left) to identify the user who is the speaker of the utterance.
  • The acquisition unit 131 also acquires the user's preference information. Specifically, when the acquisition unit 131 has identified a user, it refers to the preference information database 121 and acquires the preference information of the identified user. For example, as the preference information of user U1, the acquisition unit 131 acquires the information that the food user U1 likes is "curry". In this way, the acquisition unit 131 acquires user information about the user in which thing information indicating a specific thing (for example, "curry") is associated with emotion information indicating the user's emotion toward that thing (for example, "a food that the user likes").
  • The generation unit 132 generates a character string to be applied to the utterance sentence template. Specifically, when the user's preference information is acquired by the acquisition unit 131, the generation unit 132 inputs the acquired preference information into the sympathy model and generates, as the output data of the sympathy model, a character string to be applied to the "Y" part of the utterance sentence template "X is Y, isn't it?". For example, the generation unit 132 inputs the character string "curry", the preference target of user U1, into the sympathy model and generates the character string "delicious" as the output data of the sympathy model.
  • The generation unit 132 also acquires the utterance sentence template.
  • For example, the generation unit 132 acquires the utterance sentence template "X is Y, isn't it?" (X is a character string indicating the user's preference target; Y is a character string output from the sympathy model).
  • When the speaker's preference information is acquired by the acquisition unit 131, the generation unit 132 generates response information indicating a response to the utterance based on the speaker's preference information. For example, the generation unit 132 generates an utterance sentence to be output by the robot device based on the speaker's preference information.
  • Since the utterance by user U1, "I ate curry yesterday", contains "curry", a food that user U1 likes, the generation unit 132 estimates that the emotion of user U1, who made the utterance, is a positive emotion. In this way, the generation unit 132 estimates the speaker's emotion based on the utterance information acquired by the acquisition unit 131 and the speaker's preference information. For example, the generation unit 132 estimates that the speaker's emotion is positive when the utterance information acquired by the acquisition unit 131 includes the speaker's preference target.
  • Conversely, when the utterance by user U1, "Coriander was served at lunch", includes "coriander", a food that user U1 dislikes, the generation unit 132 estimates that the emotion of user U1, who made the utterance, is a negative emotion. The generation unit 132 then inputs the character string "coriander", an object that user U1 dislikes, into the sympathy model and generates the character string "sorry" as the output data of the sympathy model.
  • Similarly, when the utterance by user U1, "I'm going to see a futsal game next time", includes "futsal", the hobby of user U1, the generation unit 132 estimates that the emotion of user U1 is a positive emotion. The generation unit 132 inputs the character string "futsal", the hobby of user U1, into the sympathy model and generates the character string "fun" as the output data of the sympathy model.
  • The providing unit 133 provides the expression information to be expressed by the robot device based on the utterance information acquired by the acquisition unit 131 and on the user information about the user in which thing information indicating a specific thing is linked with emotion information indicating the user's emotion toward that thing. Specifically, the providing unit 133 provides expression information that is response information indicating a response to the utterance; more specifically, expression information that is the sentence of the utterance to be output by the robot device. For example, the providing unit 133 provides expression information showing sympathy based on the emotion information associated with the thing information included in the utterance information. The providing unit 133 provides the expression information generated by the generation unit 132. In the example shown in FIG. 1, the providing unit 133 provides the robot device 10 with the response sentence "Curry is delicious, isn't it?" generated by the generation unit 132.
  • When the utterance information includes a plurality of pieces of thing information, the providing unit 133 provides expression information showing sympathy based on the emotion information associated with at least one of them.
  • For example, suppose the acquisition unit 131 acquires the utterance "I ate curry while watching the rabbit at the zoo".
  • In that case, the providing unit 133 provides expression information associated with at least one of "curry" and "rabbit", such as "Curry, I like it!" or "Rabbits, I dislike them a little".
  • Alternatively, the providing unit 133 provides expression information showing sympathy based on the emotion information associated with the thing information included in the bunsetsu (phrase unit) that does not modify any other bunsetsu among the bunsetsu included in the utterance information, as sketched below.
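A hedged sketch of this clause-based rule: among several recognized things, prefer the one in the bunsetsu that modifies no other bunsetsu, i.e. the head of the dependency structure. The dependency annotations below are invented for illustration.

```python
from typing import List, Optional, Tuple

# Each bunsetsu: (text, index of the bunsetsu it modifies, None for the head).
Bunsetsu = Tuple[str, Optional[int]]

def head_thing(bunsetsu_list: List[Bunsetsu], things: List[str]) -> Optional[str]:
    for text, modifies in bunsetsu_list:
        if modifies is None:  # the head bunsetsu modifies nothing else
            for thing in things:
                if thing in text:
                    return thing
    return None

# "I ate curry while watching the rabbit at the zoo": the "watching the
# rabbit" phrase modifies "ate curry", so "ate curry" is the head.
clauses = [("at the zoo", 2), ("watching the rabbit", 2), ("ate curry", None)]
print(head_thing(clauses, ["rabbit", "curry"]))  # -> curry
```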
  • Further, the providing unit 133 provides the robot device 10 with information on behavior based on the estimated emotion. Specifically, the providing unit 133 provides expression information indicating the tone of the voice output by the robot device. For example, if the emotion of user U1 is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a bright tone as the tone of the voice with which the robot device 10 outputs the response sentence. Conversely, if the emotion of user U1 is estimated to be negative, the providing unit 133 provides information indicating a dark tone.
  • The providing unit 133 also provides expression information indicating the facial expression of the robot device. For example, if the emotion of user U1 is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a smile; if negative, information indicating a sad face.
  • The providing unit 133 also provides expression information indicating the operating speed of the robot device. For example, if the robot device 10 moves while outputting the response sentence and the emotion of user U1 is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a quick speed as its operating speed; if negative, information indicating a slow speed.
  • FIG. 5 is a flowchart showing an information processing procedure according to the embodiment of the present disclosure.
  • As shown in FIG. 5, the information processing device 100 acquires the morphemes of the voice recognition result (step S101).
  • The information processing device 100 acquires the speaker's preference information based on the speaker identification result (step S102).
  • The information processing device 100 estimates the speaker's emotion based on the acquired morphemes and the speaker's preference information, and generates the utterance sentence of the robot device 10 (step S103).
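A self-contained miniature of steps S101 to S103, under the same illustrative assumptions as the sketches above (toy tokenizer, fixed empathy outputs); speaker identification (S102) is reduced to a preference lookup keyed by the identified speaker.

```python
from typing import Dict, Set, Tuple

EMPATHY = {"curry": "delicious", "coriander": "sorry", "futsal": "fun"}

def handle_utterance(text: str, prefs: Dict[str, Set[str]]) -> Tuple[str, str]:
    words = set(text.lower().rstrip(".!?").split())        # S101 stand-in
    liked, disliked = prefs.get("liked", set()), prefs.get("disliked", set())
    hit_liked, hit_disliked = words & liked, words & disliked
    if hit_liked:                                          # S103: estimate + generate
        thing, polarity = hit_liked.pop(), "positive"
    elif hit_disliked:
        thing, polarity = hit_disliked.pop(), "negative"
    else:
        return "I see.", "neutral"
    return f"{thing.capitalize()} is {EMPATHY.get(thing, 'something')}, isn't it?", polarity

# Preferences that S102 would look up for the identified speaker U1.
print(handle_utterance("I ate curry yesterday",
                       {"liked": {"curry"}, "disliked": {"coriander"}}))
# -> ('Curry is delicious, isn't it?', 'positive')
```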
  • As described above, the information processing device 100 includes an acquisition unit 131 and a providing unit 133.
  • The acquisition unit 131 acquires utterance information indicating an utterance by the user.
  • The providing unit 133 provides the expression information to be expressed by the robot device based on the utterance information acquired by the acquisition unit 131 and on the user information about the user in which thing information indicating a specific thing is linked with emotion information indicating the user's emotion toward that thing.
  • This enables the robot device to express itself in a way that matches the emotion of the other party, so the information processing device 100 can expand the range of communication with the user.
  • The providing unit 133 provides the expression information as response information indicating a response to the utterance.
  • This enables the robot device to respond according to the emotion of the other party, so the information processing device 100 can expand the range of communication with the user.
  • The providing unit 133 provides expression information showing sympathy based on the emotion information associated with the thing information included in the utterance information.
  • This enables the robot device to estimate the user's emotion based on, for example, the user's preference information included in the utterance and to express sympathy for the estimated emotion, so the range of communication with the user can be expanded.
  • When the utterance information includes a plurality of pieces of thing information, the providing unit 133 provides expression information showing sympathy based on the emotion information associated with at least one of them.
  • This enables the robot device to make an appropriate response according to the emotion of the other party, so the range of communication with the user can be expanded.
  • Alternatively, the providing unit 133 provides expression information showing sympathy based on the emotion information associated with the thing information included in the bunsetsu that does not modify any other bunsetsu among the bunsetsu included in the utterance information.
  • This likewise enables the robot device to make an appropriate response according to the emotion of the other party, expanding the range of communication with the user.
  • The providing unit 133 provides the expression information that is the text of the utterance output by the robot device.
  • This enables the robot device to make an appropriate utterance according to the emotion of the other party.
  • The providing unit 133 provides the expression information indicating the tone of the voice output by the robot device.
  • The providing unit 133 provides the expression information indicating the facial expression of the robot device.
  • The providing unit 133 provides the expression information indicating the operating speed of the robot device.
  • FIG. 6 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of an information processing device such as the information processing device 100.
  • The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The parts of the computer 1000 are connected by a bus 1050.
  • The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each part. For example, the CPU 1100 loads a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to the various programs.
  • The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts, programs that depend on the hardware of the computer 1000, and the like.
  • The HDD 1400 is a computer-readable recording medium that non-transiently records programs executed by the CPU 1100 and data used by those programs.
  • Specifically, the HDD 1400 is a recording medium that records the information processing program according to the present disclosure, which is an example of the program data 1450.
  • The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
  • The CPU 1100 receives data from other devices and transmits data generated by the CPU 1100 to other devices via the communication interface 1500.
  • The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000.
  • The CPU 1100 receives data from an input device such as a keyboard or mouse via the input/output interface 1600 and transmits data to an output device such as a display, a speaker, or a printer via the same interface. The input/output interface 1600 may also function as a media interface for reading a program or the like recorded on a predetermined recording medium.
  • The media are, for example, optical recording media such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, or semiconductor memories.
  • For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 and the like by executing the information processing program loaded into the RAM 1200.
  • The HDD 1400 stores the information processing program according to the present disclosure and the data in the storage unit 120.
  • The CPU 1100 reads the program data 1450 from the HDD 1400 and executes it, but as another example, these programs may be acquired from another device via the external network 1550.
  • The present technology can also have the following configurations.
  • (1) An information processing device comprising: an acquisition unit that acquires utterance information indicating an utterance by a user; and a providing unit that provides expression information to be expressed by a robot device based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward the specific thing.
  • (2) The information processing device according to (1), wherein the providing unit provides the expression information that is response information indicating a response to the utterance.
  • (3) The information processing device according to (1) or (2), wherein the providing unit provides the expression information showing sympathy based on the emotion information associated with the thing information included in the utterance information.
  • (4) The information processing device according to any one of (1) to (3), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing sympathy based on the emotion information associated with at least one of the plurality of pieces of the thing information.
  • (5) The information processing device according to any one of (1) to (4), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing sympathy based on the emotion information associated with the thing information included in the bunsetsu that does not modify any other bunsetsu among the bunsetsu included in the utterance information.
  • (6) The information processing device according to any one of (1) to (5), wherein the providing unit provides the expression information that is the text of the utterance output by the robot device.
  • (7) The information processing device according to any one of (1) to (6), wherein the providing unit provides the expression information indicating the tone of the voice output by the robot device.
  • (8) The information processing device according to any one of (1) to (7), wherein the providing unit provides the expression information indicating the facial expression of the robot device.
  • (9) The information processing device according to any one of (1) to (8), wherein the providing unit provides the expression information indicating the operating speed of the robot device.
  • (10) An information processing method in which a computer executes processing of: acquiring utterance information indicating an utterance by a user; and providing expression information to be expressed by a robot device based on the acquired utterance information and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward the specific thing.
  • (11) An information processing program that causes a computer to execute: an acquisition procedure of acquiring utterance information indicating an utterance by a user; and a providing procedure of providing expression information to be expressed by a robot device based on the utterance information acquired by the acquisition procedure and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward the specific thing.
  • 1 Information processing system; 10 Robot device; 100 Information processing device; 110 Communication unit; 120 Storage unit; 121 Preference information database; 122 Model information database; 123 Template database; 130 Control unit; 131 Acquisition unit; 132 Generation unit; 133 Providing unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Manipulator (AREA)

Abstract

An information processing device (100) according to the present application comprises an acquisition unit (131) and a provision unit (133). The acquisition unit (131) acquires speech information indicating speech uttered by a user. The provision unit (133) provides expression information to be expressed by a robot device on the basis of the speech information acquired by the acquisition unit (131) and user information pertaining to the user in which thing information indicating a specific thing and emotion information indicating the user's emotion toward the specific thing are associated with each other. The provision unit (133) provides expression information that is response information indicating a response to the speech.

Description

Information processing device, information processing method, and information processing program
 The present invention relates to an information processing device, an information processing method, and an information processing program.
 In recent years, various devices that respond to user actions have become widespread. Such devices include agents that present answers to inquiries from users. For example, Patent Document 1 discloses a technique of calculating an expected value of the user's attention to output information and controlling information output based on that expected value.
Patent Document 1: Japanese Unexamined Patent Publication No. 2015-132878
 Incidentally, recent agents tend to place more importance on communication with users, beyond simple information presentation. However, it is hard to say that sufficient communication occurs with a device that merely responds to a user's action, as described in Patent Document 1.
 Therefore, the present disclosure proposes an information processing device, an information processing method, and an information processing program that can expand the range of communication with the user.
 To solve the above problems, an information processing device according to one form of the present disclosure includes: an acquisition unit that acquires utterance information indicating an utterance by a user; and a providing unit that provides expression information to be expressed by a robot device based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward the specific thing.
 FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure. FIG. 2 is a block diagram showing an example of the schematic configuration of the information processing device according to the embodiment. FIG. 3 is a diagram showing a configuration example of the information processing device according to the embodiment. FIG. 4 is a diagram showing an example of the preference information database according to the embodiment. FIG. 5 is a flowchart showing the information processing procedure according to the embodiment. FIG. 6 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing device.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate description is omitted.
 The present disclosure will be described in the following order.
 1. Embodiment
  1-1. Outline of information processing according to the embodiment
  1-2. Configuration of the information processing device according to the embodiment
  1-3. Information processing procedure according to the embodiment
 2. Effects of this disclosure
 3. Hardware configuration
[1. Embodiment]
[1-1. Outline of information processing according to the embodiment]
 First, the outline of information processing according to the embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure. The information processing shown in FIG. 1 is performed by the robot device 10 and the information processing device 100.
 The robot device 10 can be any of various devices that perform autonomous operation based on environment recognition. For example, the robot device 10 according to the present embodiment is an ellipsoidal agent-type robot device that travels autonomously on wheels. The robot device 10 realizes various kinds of communication, including information presentation, by performing autonomous operations according to, for example, the user, the surroundings, and its own situation. The robot device 10 may be a small robot having a size and weight that allow a user to lift it easily with one hand.
 The information processing device 100 acquires utterance information indicating an utterance by the user. Based on the acquired utterance information and on user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's emotion toward that thing, the information processing device 100 provides expression information to be expressed by the robot device 10.
 In the example shown in FIG. 1, the information processing device 100 acquires voice data of the user's utterance from the robot device 10. Subsequently, it acquires the voice recognition result of the acquired voice data (for example, "I ate curry yesterday") and decomposes the result into morphemes by morphological analysis, obtaining, for example, "yesterday: noun, tense", "curry: noun, food", "ate: verb", and "yo: particle".
 Further, the information processing device 100 acquires the voice direction (for example, "45 degrees to the left"), the user's face identification result (for example, face ID "U1"), and image data from the robot device 10. It then matches the voice direction (45 degrees to the left) against the position of the user in the image data (the user with face ID "U1" is 45 degrees to the left) to identify the user who is the speaker of the utterance.
 When the information processing device 100 has identified the speaker, it acquires that user's preference information. For example, as the preference information of user U1, it acquires the information that the food user U1 likes is "curry". It then inputs the character string "curry", the preference target of user U1, into the sympathy model and acquires the character string "delicious" as the model's output.
 Subsequently, the information processing device 100 acquires an utterance sentence template, for example "X is Y, isn't it?" (X is a character string indicating the user's preference target; Y is a character string output from the sympathy model). Having acquired the template, the information processing device 100 applies "curry" to X and "delicious" to Y, generating the response sentence "Curry is delicious, isn't it?". Further, since the utterance "I ate curry yesterday" contains "curry", a food that user U1 likes, the information processing device 100 estimates that the emotion of user U1 is a positive emotion.
 When the information processing device 100 has generated the response sentence, it provides the robot device 10 with the response sentence "Curry is delicious, isn't it?". When it has estimated the emotion of user U1, it also provides the robot device 10 with information on behavior based on the estimated emotion: for example, information indicating a bright tone for the voice of the response sentence, information indicating a smile as the facial expression of the robot device 10, and, if the robot device 10 moves while outputting the response sentence, information indicating a quick speed as its operating speed.
[1-2. Configuration of the information processing device according to the embodiment]
 Next, an example of the schematic configuration of the information processing device according to the embodiment of the present disclosure will be described with reference to FIG. 2. FIG. 2 is a block diagram showing an example of the schematic configuration of the information processing device according to the embodiment of the present disclosure. In the example of FIG. 2, the information processing system 1 has a robot device 10 and an information processing device 100.
 In the example shown in FIG. 2, the information processing device 100 includes an acquisition unit 131 and a generation unit 132. The acquisition unit 131 includes a voice recognizer, a speaker identifier, a morphological analyzer, and a preprocessing unit.
 The voice recognizer acquires voice data from the robot device 10, recognizes the speech in the voice data, and outputs the voice recognition result to the preprocessing unit.
 The speaker identifier acquires the voice direction, the face identification result, and image data from the robot device 10, identifies the speaker from them, and outputs the speaker identification result to the preprocessing unit.
 The preprocessing unit acquires the voice recognition result from the voice recognizer and outputs it to the morphological analyzer.
 The morphological analyzer acquires the voice recognition result from the preprocessing unit, performs morphological analysis to decompose it into morphemes, and outputs the morphemes back to the preprocessing unit.
 The preprocessing unit also acquires the speaker identification result from the speaker identifier, refers to the preference information database 121, and acquires the identified speaker's preference information from that database.
 Having acquired the morphemes and the speaker's preference information, the preprocessing unit outputs them to the generation unit 132.
 The generation unit 132 acquires the morphemes and the speaker's preference information from the preprocessing unit and, based on them, generates an utterance sentence and estimates the speaker's emotion.
 Next, the configuration of the information processing device according to the embodiment of the present disclosure will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the information processing device according to the embodiment of the present disclosure. As shown in FIG. 3, the information processing device 100 according to the embodiment of the present disclosure includes a communication unit 110, a storage unit 120, and a control unit 130.
(Communication unit 110)
 The communication unit 110 is realized by, for example, a NIC. It is connected to the network N by wire or wirelessly and transmits and receives information to and from, for example, the robot device 10.
(Storage unit 120)
 The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. For example, the storage unit 120 stores the information processing program according to the embodiment. As shown in FIG. 3, the storage unit 120 has a preference information database 121, a model information database 122, and a template database 123.
(嗜好情報データベース121)
 嗜好情報データベース121は、利用者の嗜好に関する各種情報を記憶する。図4を用いて、実施形態に係る嗜好情報データベースの一例について説明する。図4は、本開示の実施形態に係る嗜好情報データベースの一例を示す図である。図4に示す例では、嗜好情報データベース121は、「ユーザID」、「名前」、「好きな食べ物」、「嫌いな食べ物」、「最近不満に思ったこと」、「悲しくなるもの」、「趣味」、「出身地」といった項目を有する。
(Preference information database 121)
The preference information database 121 stores various information related to the user's preference. An example of the preference information database according to the embodiment will be described with reference to FIG. FIG. 4 is a diagram showing an example of a preference information database according to the embodiment of the present disclosure. In the example shown in FIG. 4, the preference information database 121 has a "user ID", a "name", a "favorite food", a "disliked food", a "recently dissatisfied", a "sad thing", and a "sad thing". It has items such as "hobbies" and "hometown".
 「ユーザID」は、ユーザを識別する識別情報を示す。「名前」は、ユーザの名前を示す。「好きな食べ物」は、ユーザが好きな食べ物を示す。「嫌いな食べ物」は、ユーザが嫌いな食べ物を示す。「最近不満に思ったこと」は、ユーザが最近不満に思ったことを示す。「悲しくなるもの」は、ユーザが悲しくなるものを示す。「趣味」は、ユーザの趣味を示す。「出身地」は、ユーザの出身地を示す。 "User ID" indicates identification information that identifies the user. "Name" indicates the name of the user. "Favorite food" indicates the food that the user likes. "Disliked food" indicates food that the user dislikes. "Recently dissatisfied" indicates that the user has recently been dissatisfied. "What makes you sad" indicates what makes the user sad. "Hobby" indicates a user's hobby. "Hometown" indicates the hometown of the user.
In the example shown in the first record of FIG. 4, the name of the user identified by the user ID "U1" (user U1) is "A". User U1's favorite food is "curry", and the food user U1 dislikes is "coriander". What user U1 has recently been dissatisfied with is the "tax increase", and what makes user U1 sad is a "birthday". User U1's hobby is "futsal", and user U1's hometown is "Tokyo".
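The patent does not specify a concrete schema for database 121, so the following is only a minimal sketch of how the record above could be modeled; the class name, field names, and the variable u1 are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One row of the preference information database 121 (illustrative schema)."""
    user_id: str
    name: str
    favorite_food: str
    disliked_food: str
    recent_dissatisfaction: str
    sad_thing: str
    hobby: str
    hometown: str

# The first record of FIG. 4, expressed in this hypothetical schema.
u1 = PreferenceRecord(
    user_id="U1", name="A",
    favorite_food="curry", disliked_food="coriander",
    recent_dissatisfaction="tax increase", sad_thing="birthday",
    hobby="futsal", hometown="Tokyo",
)
```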
(Model information database 122)
The model information database 122 stores various kinds of information about the empathy model. Specifically, the model information database 122 stores information about a learning model trained so that, when a character string indicating the user's preference information is input, it outputs a character string corresponding to that preference information. For example, the model information database 122 stores identification information that identifies an empathy model in association with the model data of that empathy model.
(Template database 123)
The template database 123 stores various kinds of information about templates. Specifically, the template database 123 stores the utterance sentence template "X is Y, isn't it?", where X is a character string indicating the target of the user's preference and Y is a character string output by the empathy model.
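As an illustration, filling this template can be sketched as follows; the constant TEMPLATE and the function name fill_template are assumptions, not names taken from the patent:

```python
# A minimal sketch of the single utterance template held in database 123.
TEMPLATE = "{X} is {Y}, isn't it?"

def fill_template(x: str, y: str) -> str:
    """Substitute the preference target X and the empathy-model output Y."""
    return TEMPLATE.format(X=x, Y=y)

print(fill_template("curry", "delicious"))  # -> curry is delicious, isn't it?
```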
(Control unit 130)
The control unit 130 is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of the information processing program) stored in a storage device inside the information processing device 100, using a RAM as a work area. The control unit 130 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
As shown in FIG. 3, the control unit 130 has an acquisition unit 131, a generation unit 132, and a providing unit 133, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3 and may be any other configuration that performs the information processing described later.
(Acquisition unit 131)
The acquisition unit 131 acquires utterance information indicating an utterance by a user. Specifically, the acquisition unit 131 acquires voice data of the user's utterance from the robot device 10 and then acquires the speech recognition result of that voice data (for example, "I ate curry yesterday"). The acquisition unit 131 then decomposes the speech recognition result into morphemes by morphological analysis. For example, the acquisition unit 131 obtains morphemes such as "yesterday: noun, tense", "curry: noun, food", "ate: verb", and "yo: particle".
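As a rough illustration of this step (a real system would use a full morphological analyzer such as MeCab for Japanese; the lookup table and function below are purely hypothetical stand-ins):

```python
# Illustrative stand-in for a morphological analyzer: a tiny tag dictionary.
MORPHEME_TAGS = {
    "yesterday": ("noun", "tense"),
    "curry": ("noun", "food"),
    "ate": ("verb",),
    "yo": ("particle",),
}

def analyze(recognized_text: str) -> list[tuple[str, tuple[str, ...]]]:
    """Decompose a speech recognition result into (surface form, tags) pairs."""
    return [(w, MORPHEME_TAGS.get(w, ("unknown",)))
            for w in recognized_text.lower().split()]

print(analyze("I ate curry yesterday"))
```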
The acquisition unit 131 also acquires the voice direction (for example, "45 degrees to the left"), the user's face identification result (for example, the face ID "U1"), and image data from the robot device 10. The acquisition unit 131 then matches the voice direction acquired from the robot device 10 (45 degrees to the left) against the user's position in the image data (the user with the face ID "U1" is in the direction 45 degrees to the left) to identify the user who is the speaker of the utterance.
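A minimal sketch of this matching step follows; representing directions as signed degrees and the 10-degree tolerance are both assumptions:

```python
# Match the voice direction against the directions of detected faces.
def identify_speaker(voice_direction_deg: float,
                     face_directions_deg: dict[str, float],
                     tolerance_deg: float = 10.0) -> str | None:
    """Return the face ID closest to the voice direction, within a tolerance."""
    best_id, best_diff = None, tolerance_deg
    for face_id, face_deg in face_directions_deg.items():
        diff = abs(voice_direction_deg - face_deg)
        if diff <= best_diff:
            best_id, best_diff = face_id, diff
    return best_id

# The user with face ID "U1" sits 45 degrees to the left (negative angles = left).
print(identify_speaker(-45.0, {"U1": -45.0, "U2": 30.0}))  # -> U1
```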
Having identified the user who is the speaker, the acquisition unit 131 acquires that user's preference information. Specifically, the acquisition unit 131 refers to the preference information database 121 and acquires the preference information of the identified user. For example, the acquisition unit 131 acquires, as user U1's preference information, the information that user U1's favorite food is "curry". In this way, the acquisition unit 131 acquires user information about the user in which thing information indicating a specific thing (for example, "curry") is associated with emotion information indicating the user's feelings toward that specific thing (for example, "food the user likes").
(Generation unit 132)
The generation unit 132 generates the character string to be inserted into the utterance sentence template. Specifically, when the acquisition unit 131 has acquired the user's preference information, the generation unit 132 inputs the acquired preference information into the empathy model and generates, as the output data of the empathy model, the character string to be inserted into the "Y" part of the "X is Y, isn't it?" template. For example, the generation unit 132 inputs the character string "curry", the target of user U1's preference, into the empathy model and generates the character string "delicious" as the output data of the empathy model.
The generation unit 132 then acquires the utterance sentence template. For example, the generation unit 132 acquires the template "X is Y, isn't it?" (where X is a character string indicating the user's preference target and Y is a character string output by the empathy model). Having acquired the template, the generation unit 132 inserts the character string "curry", user U1's preference target, into X and the character string "delicious", the output data of the empathy model, into Y to generate the response sentence "Curry is delicious, isn't it?". In this way, when the speaker's preference information has been acquired by the acquisition unit 131, the generation unit 132 generates, based on that preference information, response information indicating a response to the utterance. For example, the generation unit 132 generates the utterance sentence to be output by the robot device based on the speaker's preference information.
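The following sketch combines these two steps, reusing fill_template from the earlier snippet; the dictionary EMPATHY_LOOKUP is only a stand-in for the trained empathy model held in database 122, whose actual outputs the patent does not enumerate:

```python
# Stand-in for the empathy model: preference target -> empathetic word.
EMPATHY_LOOKUP = {
    "curry": "delicious",
    "coriander": "disappointing",
    "futsal": "fun",
}

def generate_response(preference_target: str) -> str:
    """Run the (stubbed) empathy model on X and fill the template with X and Y."""
    y = EMPATHY_LOOKUP.get(preference_target, "something")
    return fill_template(preference_target, y)

print(generate_response("curry"))  # -> curry is delicious, isn't it?
```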
Furthermore, because user U1's utterance "I ate curry yesterday" contains "curry", a food user U1 likes, the generation unit 132 estimates that the emotion of user U1, who made the utterance, is positive. In this way, the generation unit 132 estimates the speaker's emotion based on the utterance information acquired by the acquisition unit 131 and the speaker's preference information. For example, when the utterance information acquired by the acquisition unit 131 contains the speaker's preference target, the generation unit 132 estimates that the speaker's emotion is positive.
Conversely, when user U1's utterance "Coriander came with lunch today" contains "coriander", a food user U1 dislikes, the generation unit 132 estimates that the emotion of user U1 is negative. The generation unit 132 also inputs the character string "coriander", a thing user U1 dislikes, into the empathy model and generates the character string "disappointing" as its output. On acquiring the template "X is Y, isn't it?", the generation unit 132 inserts "coriander" into X and "disappointing" into Y to generate the response sentence "Coriander is disappointing, isn't it?".
Similarly, when user U1's utterance "I'm going to watch a futsal match soon" contains "futsal", user U1's hobby, the generation unit 132 estimates that the emotion of user U1 is positive. The generation unit 132 inputs the character string "futsal", the target of user U1's hobby, into the empathy model and generates the character string "fun" as its output. On acquiring the template "X is Y, isn't it?", the generation unit 132 inserts "futsal" into X and "fun" into Y to generate the response sentence "Futsal is fun, isn't it?".
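A sketch of this estimation rule, assuming the PreferenceRecord from the earlier snippet; treating the remaining fields as negative targets and falling back to "neutral" are assumptions the patent does not state:

```python
def estimate_emotion(utterance: str, prefs: "PreferenceRecord") -> str:
    """Estimate the speaker's emotion from which preference targets the utterance contains."""
    positive_targets = {prefs.favorite_food, prefs.hobby}
    negative_targets = {prefs.disliked_food, prefs.recent_dissatisfaction, prefs.sad_thing}
    if any(t in utterance for t in positive_targets):
        return "positive"
    if any(t in utterance for t in negative_targets):
        return "negative"
    return "neutral"

print(estimate_emotion("I ate curry yesterday", u1))  # -> positive
```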
(Providing unit 133)
The providing unit 133 provides the expression information to be expressed by the robot device, based on the utterance information acquired by the acquisition unit 131 and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing. Specifically, the providing unit 133 provides expression information that is response information indicating a response to the utterance. More specifically, the providing unit 133 provides expression information that is the utterance sentence to be output by the robot device. For example, the providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information included in the utterance information. The providing unit 133 provides the expression information generated by the generation unit 132. In the example shown in FIG. 1, the providing unit 133 provides the robot device 10 with the response sentence "Curry is delicious, isn't it?" generated by the generation unit 132.
When the utterance information contains a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with at least one of them. For example, suppose the speaker "likes" "curry" and "dislikes" "rabbits", and the acquisition unit 131 acquires the utterance "I ate curry while watching the rabbits at the zoo". The providing unit 133 then provides a response sentence showing empathy based on the "like" or "dislike" emotion associated with at least one of "curry" and "rabbit", such as "Curry, that's nice!" or "A rabbit? I don't really like those".
Alternatively, when the utterance information contains a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information contained in the clause that, among the clauses of the utterance, does not modify any other clause. In the example above, the clause "while watching the rabbits" modifies the clause "I ate curry", whereas "I ate curry" does not modify any other clause. The providing unit 133 therefore provides "Curry, that's nice!", a response sentence showing empathy based on the "like" emotion associated with "curry" in the clause "I ate curry".
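A sketch of selecting the thing in the head clause follows; representing each clause as (text, index of the clause it modifies, or None) is an assumption about the output of a dependency parser, which the patent does not specify:

```python
def pick_head_thing(clauses: list[tuple[str, int | None]],
                    known_things: set[str]) -> str | None:
    """Return a known thing found in the clause that modifies no other clause."""
    for text, modifies in clauses:
        if modifies is None:  # the head clause
            for thing in known_things:
                if thing in text:
                    return thing
    return None

# "while watching the rabbits" modifies clause 1; "I ate curry" is the head.
clauses = [("while watching the rabbits", 1), ("I ate curry", None)]
print(pick_head_thing(clauses, {"curry", "rabbit"}))  # -> curry
```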
Having estimated user U1's emotion, the providing unit 133 also provides the robot device 10 with information about behavior based on the estimated emotion. Specifically, the providing unit 133 provides expression information that is information indicating the tone of the voice output by the robot device. For example, when user U1's emotion is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a bright tone as the tone of the voice in which the response sentence is output. Conversely, when user U1's emotion is estimated to be negative, the providing unit 133 provides the robot device 10 with information indicating a subdued tone.
The providing unit 133 also provides expression information that is information indicating the facial expression of the robot device. For example, when user U1's emotion is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a smile as the facial expression of the robot device 10. Conversely, when user U1's emotion is estimated to be negative, the providing unit 133 provides the robot device 10 with information indicating a sad face.
The providing unit 133 also provides expression information that is information indicating the operating speed of the robot device. For example, when the robot device 10 moves while outputting the response sentence and user U1's emotion is estimated to be positive, the providing unit 133 provides the robot device 10 with information indicating a quick motion as the operating speed. Conversely, when user U1's emotion is estimated to be negative, the providing unit 133 provides the robot device 10 with information indicating a slow motion.
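The three kinds of behavior information can be pictured as one payload per estimated emotion; the field names and concrete values below are assumptions for illustration only:

```python
def behavior_for(emotion: str) -> dict[str, str]:
    """Map an estimated emotion to voice tone, facial expression, and motion speed."""
    if emotion == "positive":
        return {"voice_tone": "bright", "expression": "smile", "motion_speed": "quick"}
    if emotion == "negative":
        return {"voice_tone": "subdued", "expression": "sad", "motion_speed": "slow"}
    return {"voice_tone": "neutral", "expression": "neutral", "motion_speed": "normal"}

print(behavior_for("positive"))
```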
[1-3. Information processing procedure according to the embodiment]
Next, the information processing procedure according to the embodiment of the present disclosure will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the information processing procedure according to the embodiment of the present disclosure. In the example shown in FIG. 5, the information processing device 100 acquires the morphemes of the speech recognition result (step S101). Next, the information processing device 100 acquires the speaker's preference information based on the speaker identification result (step S102). The information processing device 100 then estimates the speaker's emotion based on the acquired morphemes and the speaker's preference information, and generates an utterance sentence for the robot device 10 (step S103).
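Putting the steps of FIG. 5 together, the whole flow could be sketched as follows, composing the helpers from the earlier snippets (all of which are hypothetical); PREFS stands in for the preference information database 121:

```python
PREFS = {"U1": u1}  # stand-in for the preference information database 121

def handle_utterance(recognized_text: str, speaker_id: str):
    morphemes = analyze(recognized_text)                 # S101: morphemes of the recognition result
    prefs = PREFS[speaker_id]                            # S102: speaker's preference information
    emotion = estimate_emotion(recognized_text, prefs)   # S103: estimate the speaker's emotion...
    mentioned = [t for t in (prefs.favorite_food, prefs.disliked_food, prefs.hobby)
                 if t in recognized_text]                # ...and pick the mentioned target
    response = generate_response(mentioned[0]) if mentioned else ""
    return morphemes, response, behavior_for(emotion)

print(handle_utterance("I ate curry yesterday", "U1"))
```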
[2. Effects of the present disclosure]
As described above, the information processing device 100 according to the present disclosure includes the acquisition unit 131 and the providing unit 133. The acquisition unit 131 acquires utterance information indicating an utterance by a user. The providing unit 133 provides the expression information to be expressed by the robot device, based on the utterance information acquired by the acquisition unit 131 and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
This enables the robot device to express itself in a way that matches the other party's emotions, so the information processing device 100 can broaden the range of communication with the user.
The providing unit 133 provides expression information that is response information indicating a response to the utterance.
This enables the robot device to respond in a way that matches the other party's emotions, so the information processing device 100 can broaden the range of communication with the user.
The providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information included in the utterance information.
This enables the robot device to estimate the user's emotion based on, for example, the user's preference information contained in the utterance and to express empathy with the estimated emotion, so the information processing device 100 can broaden the range of communication with the user.
When the utterance information contains a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with at least one of them.
This enables the robot device to make an appropriate response that matches the other party's emotions, so the information processing device 100 can broaden the range of communication with the user.
When the utterance information contains a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information contained in the clause that does not modify any other clause of the utterance.
This likewise enables the robot device to make an appropriate response that matches the other party's emotions, broadening the range of communication with the user.
The providing unit 133 provides expression information that is the utterance sentence output by the robot device.
This enables the robot device to make an appropriate utterance that matches the other party's emotions.
The providing unit 133 provides expression information that is information indicating the tone of the voice output by the robot device.
This enables the robot device to speak in a voice tone appropriate to the other party's emotions.
The providing unit 133 provides expression information that is information indicating the facial expression of the robot device.
This enables the robot device to show a facial expression appropriate to the other party's emotions.
The providing unit 133 provides expression information that is information indicating the operating speed of the robot device.
This enables the robot device to move in a manner appropriate to the other party's emotions.
[3. Hardware configuration]
Information devices such as the information processing device 100 according to the embodiment and modifications described above are realized by, for example, a computer 1000 configured as shown in FIG. 6. FIG. 6 is a hardware configuration diagram showing an example of the computer 1000 that realizes the functions of an information processing device such as the information processing device 100. The following takes the information processing device 100 according to the embodiment as an example. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The units of the computer 1000 are connected by a bus 1050.
The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. For example, the CPU 1100 loads a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to the various programs.
The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts, programs that depend on the hardware of the computer 1000, and the like.
The HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and the data used by those programs. Specifically, the HDD 1400 is a recording medium that records the information processing program according to the present disclosure, which is an example of the program data 1450.
The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from other devices and transmits data generated by the CPU 1100 to other devices via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600, and transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. The input/output interface 1600 may also function as a media interface that reads programs and the like recorded on a predetermined recording medium. Such media are, for example, optical recording media such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories.
For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200. The HDD 1400 also stores the information processing program according to the present disclosure and the data in the storage unit 120. The CPU 1100 reads the program data 1450 from the HDD 1400 and executes it, but as another example these programs may be acquired from another device via the external network 1550.
The present technology can also take the following configurations.
(1)
An information processing device comprising:
an acquisition unit that acquires utterance information indicating an utterance by a user; and
a providing unit that provides expression information to be expressed by a robot device, based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
(2)
The information processing device according to (1), wherein the providing unit provides the expression information that is response information indicating a response to the utterance.
(3)
The information processing device according to (1) or (2), wherein the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the utterance information.
(4)
The information processing device according to any one of (1) to (3), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with at least one of the plurality of pieces of the thing information.
(5)
The information processing device according to any one of (1) to (4), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the clause that, among the clauses included in the utterance information, does not modify any other clause.
(6)
The information processing device according to any one of (1) to (5), wherein the providing unit provides the expression information that is a sentence of an utterance output by the robot device.
(7)
The information processing device according to any one of (1) to (6), wherein the providing unit provides the expression information that is information indicating the tone of the voice output by the robot device.
(8)
The information processing device according to any one of (1) to (7), wherein the providing unit provides the expression information that is information indicating the facial expression of the robot device.
(9)
The information processing device according to any one of (1) to (8), wherein the providing unit provides the expression information that is information indicating the operating speed of the robot device.
(10)
An information processing method executing processing of:
acquiring utterance information indicating an utterance by a user; and
providing expression information to be expressed by a robot device, based on the acquired utterance information and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
(11)
An information processing program for causing a computer to execute:
an acquisition procedure of acquiring utterance information indicating an utterance by a user; and
a providing procedure of providing expression information to be expressed by a robot device, based on the utterance information acquired by the acquisition procedure and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
1 Information processing system
10 Robot device
100 Information processing device
110 Communication unit
120 Storage unit
121 Preference information database
122 Model information database
123 Template database
130 Control unit
131 Acquisition unit
132 Generation unit
133 Providing unit

Claims (11)

1. An information processing device comprising:
an acquisition unit that acquires utterance information indicating an utterance by a user; and
a providing unit that provides expression information to be expressed by a robot device, based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
2. The information processing device according to claim 1, wherein the providing unit provides the expression information that is response information indicating a response to the utterance.
3. The information processing device according to claim 1, wherein the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the utterance information.
4. The information processing device according to claim 1, wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with at least one of the plurality of pieces of the thing information.
5. The information processing device according to claim 1, wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the clause that, among the clauses included in the utterance information, does not modify any other clause.
6. The information processing device according to claim 1, wherein the providing unit provides the expression information that is a sentence of an utterance output by the robot device.
7. The information processing device according to claim 1, wherein the providing unit provides the expression information that is information indicating the tone of the voice output by the robot device.
8. The information processing device according to claim 1, wherein the providing unit provides the expression information that is information indicating the facial expression of the robot device.
9. The information processing device according to claim 1, wherein the providing unit provides the expression information that is information indicating the operating speed of the robot device.
10. An information processing method executing processing of:
acquiring utterance information indicating an utterance by a user; and
providing expression information to be expressed by a robot device, based on the acquired utterance information and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
11. An information processing program for causing a computer to execute:
an acquisition procedure of acquiring utterance information indicating an utterance by a user; and
a providing procedure of providing expression information to be expressed by a robot device, based on the utterance information acquired by the acquisition procedure and user information about the user in which thing information indicating a specific thing is associated with emotion information indicating the user's feelings toward the specific thing.
PCT/JP2020/045993 2019-12-27 2020-12-10 Information processing device, information processing method, and information processing program WO2021131737A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019239051 2019-12-27
JP2019-239051 2019-12-27

Publications (1)

Publication Number Publication Date
WO2021131737A1 (en)

Family

ID=76575476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045993 WO2021131737A1 (en) 2019-12-27 2020-12-10 Information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2021131737A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004086001A (en) * 2002-08-28 2004-03-18 Sony Corp Conversation processing system, conversation processing method, and computer program
JP2006178063A (en) * 2004-12-21 2006-07-06 Toyota Central Res & Dev Lab Inc Interactive processing device
JP2015148701A (en) * 2014-02-06 2015-08-20 日本電信電話株式会社 Robot control device, robot control method and robot control program
JP2016536630A * 2013-10-01 2016-11-24 Softbank Robotics Europe Method for dialogue between machine such as humanoid robot and human speaker, computer program product, and humanoid robot for executing the method
JP2018054866A (en) * 2016-09-29 2018-04-05 トヨタ自動車株式会社 Voice interactive apparatus and voice interactive method
JP2019175432A (en) * 2018-03-26 2019-10-10 カシオ計算機株式会社 Dialogue control device, dialogue system, dialogue control method, and program

Similar Documents

Publication Publication Date Title
JP6719739B2 (en) Dialogue method, dialogue system, dialogue device, and program
US9355092B2 (en) Human-like response emulator
KR20170027705A (en) Methods and systems of handling a dialog with a robot
JP2016536630A (en) Method for dialogue between machine such as humanoid robot and human speaker, computer program product, and humanoid robot for executing the method
CN108470188B (en) Interaction method based on image analysis and electronic equipment
JP7371135B2 (en) Speaker recognition using speaker specific speech models
KR102418558B1 (en) English speaking teaching method using interactive artificial intelligence avatar, device and system therefor
Katayama et al. Situation-aware emotion regulation of conversational agents with kinetic earables
CA2835368A1 (en) System and method for providing a dialog with a user
Catania et al. CORK: A COnversational agent framewoRK exploiting both rational and emotional intelligence
Ritschel et al. Multimodal joke generation and paralinguistic personalization for a socially-aware robot
US20220253609A1 (en) Social Agent Personalized and Driven by User Intent
US11682318B2 (en) Methods and systems for assisting pronunciation correction
WO2021131737A1 (en) Information processing device, information processing method, and information processing program
JP7029351B2 (en) How to generate OOS text and the device that does it
Boonstra Introduction to conversational AI
Planet et al. Children’s emotion recognition from spontaneous speech using a reduced set of acoustic and linguistic features
US20220180762A1 (en) Computer assisted linguistic training including machine learning
WO2021186525A1 (en) Utterance generation device, utterance generation method, and program
JP2021149664A (en) Output apparatus, output method, and output program
JP6176137B2 (en) Spoken dialogue apparatus, spoken dialogue system, and program
DeMara et al. Towards interactive training with an avatar-based human-computer interface
WO2021064947A1 (en) Interaction method, interaction system, interaction device, and program
Feng et al. A platform for building mobile virtual humans
López et al. Lifeline dialogues with roberta

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20907462

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20907462

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP