WO2021131737A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2021131737A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
information processing
utterance
thing
Prior art date
Application number
PCT/JP2020/045993
Other languages
English (en)
Japanese (ja)
Inventor
Hideki Noma
Naoya Muramatsu
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021131737A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/10 - Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to an information processing device, an information processing method, and an information processing program.
  • Patent Document 1 discloses a technique of calculating an expected value of the user's attention to output information and controlling information output based on the expected value.
  • An information processing apparatus according to one embodiment of the present disclosure includes an acquisition unit that acquires utterance information indicating an utterance by a user, and a providing unit that provides expression information to be expressed by a robot device based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's feelings toward the specific thing are associated with each other.
  • 1. Embodiment
  • 1-1. Outline of information processing according to the embodiment
  • 1-2. Configuration of the information processing device according to the embodiment
  • 1-3. Information processing procedure according to the embodiment
  • 2. Effects of the present disclosure
  • 3. Hardware configuration
  • FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure.
  • The information processing shown in FIG. 1 is performed by the robot device 10 and the information processing device 100.
  • The robot device 10 can be various devices that perform autonomous operations based on environmental recognition.
  • The robot device 10 according to the present embodiment is an oblong agent-type robot device that autonomously travels on wheels.
  • The robot device 10 realizes various communications, including information presentation, by performing autonomous operations according to, for example, the user, the surroundings, and its own situation.
  • The robot device 10 may be a small robot having a size and weight that allow a user to easily lift it with one hand.
  • The information processing device 100 acquires utterance information indicating an utterance by the user. The information processing device 100 then provides expression information to be expressed by the robot device 10 based on the acquired utterance information and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's feelings toward the specific thing are associated with each other.
  • First, the information processing device 100 acquires voice data of the user's utterance from the robot device 10. Subsequently, the information processing device 100 acquires the voice recognition result of the acquired voice data (for example, "I ate curry yesterday"). Subsequently, the information processing device 100 decomposes the voice recognition result into morphemes by morphological analysis. For example, the information processing apparatus 100 acquires morphemes such as "yesterday: noun, tense", "curry: noun, food", "eat: verb", and "yo: particle" by morphological analysis.
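  • The decomposition into morphemes can be pictured with a short sketch. The snippet below is a minimal illustration, not the patent's implementation: the analyze helper and its dictionary are hypothetical stand-ins for a real morphological analyzer, which the patent does not name.

```python
# Minimal sketch of the morphological-analysis step. The dictionary lookup is
# a hypothetical stand-in for a real morphological analyzer (not named in the
# patent); it reproduces the "surface: part of speech, category" tags above.
from dataclasses import dataclass

@dataclass
class Morpheme:
    surface: str        # the word as it appeared in the utterance
    pos: str            # part of speech, e.g. "noun" or "verb"
    category: str = ""  # optional semantic tag such as "food" or "tense"

def analyze(recognition_result: str) -> list:
    """Decompose a voice recognition result into morphemes (stubbed)."""
    table = {
        "yesterday": Morpheme("yesterday", "noun", "tense"),
        "curry": Morpheme("curry", "noun", "food"),
        "ate": Morpheme("ate", "verb"),
    }
    return [table[w] for w in recognition_result.lower().split() if w in table]

morphemes = analyze("I ate curry yesterday")
# -> morphemes for "ate" (verb), "curry" (noun, food), "yesterday" (noun, tense)
```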
  • Further, the information processing device 100 acquires the voice direction (for example, "a direction of 45 degrees to the left") from the robot device 10. The information processing device 100 also acquires the user's face identification result (for example, face ID "U1") and image data from the robot device 10. Subsequently, the information processing device 100 matches the voice direction acquired from the robot device 10 (the direction of 45 degrees to the left) against the position of the user in the image data (the user with face ID "U1" in the direction of 45 degrees to the left) to identify the user who is the speaker of the utterance.
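  • The collation of the voice direction with the face positions can be sketched as follows. The angle-based matching and the 10-degree tolerance are assumptions for illustration; the patent only states that the two are matched.

```python
# Sketch of speaker identification: match the estimated voice direction
# against the directions of faces detected in the image data. The tolerance
# value and the angle representation are assumptions, not from the patent.
from typing import Optional

def identify_speaker(voice_direction_deg: float,
                     detected_faces: dict,
                     tolerance_deg: float = 10.0) -> Optional[str]:
    """detected_faces maps face ID -> direction of that face in degrees."""
    best_id, best_diff = None, tolerance_deg
    for face_id, face_direction in detected_faces.items():
        diff = abs(voice_direction_deg - face_direction)
        if diff <= best_diff:
            best_id, best_diff = face_id, diff
    return best_id

# The voice arrives from 45 degrees to the left; face "U1" was detected there.
speaker = identify_speaker(-45.0, {"U1": -45.0, "U2": 30.0})  # -> "U1"
```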
  • Next, the information processing device 100 acquires the user's preference information. For example, the information processing apparatus 100 acquires, as the preference information of user U1, the information that the food user U1 likes is "curry". Subsequently, the information processing apparatus 100 inputs the character string "curry", which is the preference target of user U1, into the empathy model, and acquires the character string "delicious" as the output data of the empathy model.
  • Next, the information processing apparatus 100 acquires an utterance sentence template. For example, the information processing device 100 acquires the utterance sentence template "X is Y, isn't it?" (where X is a character string indicating the user's preference target and Y is a character string output from the empathy model). The information processing device 100 then applies the character strings to the acquired template "X is Y, isn't it?" to generate the response sentence "Curry is delicious, isn't it?".
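  • The flow from preference target to response sentence can be summarized in a short sketch. The dictionary standing in for the empathy model is a hypothetical stub; the template string is the one described above.

```python
# Sketch of response generation: an empathy "model" (here a stub lookup) maps
# a preference target X to an empathetic word Y, which is then applied to the
# utterance sentence template "X is Y, isn't it?".
def empathy_model(preference_target: str) -> str:
    # Hypothetical stand-in for the learned model in the model information database.
    stub_outputs = {"curry": "delicious", "futsal": "fun"}
    return stub_outputs.get(preference_target, "interesting")

TEMPLATE = "{x} is {y}, isn't it?"

def generate_response(preference_target: str) -> str:
    y = empathy_model(preference_target)
    return TEMPLATE.format(x=preference_target.capitalize(), y=y)

print(generate_response("curry"))  # -> Curry is delicious, isn't it?
```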
  • Further, since the utterance "I ate curry yesterday" by user U1 includes "curry", a food that user U1 likes, the information processing device 100 estimates that the emotion of user U1 who made the utterance is a positive emotion.
  • When the information processing device 100 generates the response sentence, it provides the robot device 10 with the response sentence "Curry is delicious, isn't it?". When the information processing device 100 estimates the emotion of user U1, it also provides the robot device 10 with information on behavior based on the estimated emotion. For example, when the information processing device 100 estimates that the emotion of user U1 is a positive emotion, it provides the robot device 10 with information specifying a bright tone as the tone of voice in which the robot device 10 outputs the response sentence. The information processing device 100 also provides the robot device 10 with information indicating a smile as information on the facial expression of the robot device 10. Further, when the robot device 10 moves while outputting the response sentence, the information processing device 100 provides the robot device 10 with information indicating a quick speed as information on the operating speed.
  • FIG. 2 is a block diagram showing an example of a schematic configuration of the information processing apparatus according to the embodiment of the present disclosure.
  • The information processing system 1 has a robot device 10 and an information processing device 100.
  • The information processing device 100 includes an acquisition unit 131 and a generation unit 132.
  • The acquisition unit 131 includes a voice recognizer, a speaker identifier, a morphological analyzer, and a preprocessing unit.
  • The voice recognizer acquires voice data from the robot device 10. When the voice recognizer acquires the voice data, it recognizes the voice in the voice data and outputs the voice recognition result to the preprocessing unit.
  • The speaker identifier acquires the voice direction, the face identification result, and the image data from the robot device 10.
  • When the speaker identifier acquires the voice direction, the face identification result, and the image data, it identifies the speaker and outputs the speaker identification result to the preprocessing unit.
  • The preprocessing unit acquires the voice recognition result from the voice recognizer.
  • When the preprocessing unit acquires the voice recognition result, it outputs the acquired voice recognition result to the morphological analyzer.
  • The morphological analyzer acquires the voice recognition result from the preprocessing unit.
  • When the morphological analyzer acquires the voice recognition result, it performs morphological analysis on the voice recognition result and decomposes it into morphemes.
  • When the morphological analyzer completes the morphological analysis, it outputs the morphemes to the preprocessing unit.
  • The preprocessing unit acquires the speaker identification result from the speaker identifier.
  • When the preprocessing unit acquires the speaker identification result, it refers to the preference information database 121 and acquires the preference information of the identified speaker from the preference information database 121.
  • When the preprocessing unit acquires the morphemes and the speaker's preference information, it outputs them to the generation unit 132.
  • The generation unit 132 acquires the morphemes and the speaker's preference information from the preprocessing unit. When the morphemes and the speaker's preference information are acquired, the generation unit 132 generates an utterance sentence and estimates the speaker's emotion based on the acquired morphemes and the speaker's preference information.
  • FIG. 3 is a diagram showing a configuration example of the information processing device according to the embodiment of the present disclosure.
  • The information processing device 100 according to the embodiment of the present disclosure includes a communication unit 110, a storage unit 120, and a control unit 130.
  • The communication unit 110 is realized by, for example, a NIC or the like. The communication unit 110 is connected to the network N by wire or wirelessly, and transmits and receives information to and from, for example, the robot device 10.
  • The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disk.
  • The storage unit 120 stores the information processing program according to the embodiment.
  • The storage unit 120 has a preference information database 121, a model information database 122, and a template database 123.
  • The preference information database 121 stores various information related to the user's preferences.
  • An example of the preference information database according to the embodiment will be described with reference to FIG. 4.
  • FIG. 4 is a diagram showing an example of a preference information database according to the embodiment of the present disclosure.
  • The preference information database 121 has items such as "user ID", "name", "favorite food", "disliked food", "recently dissatisfied", "what makes you sad", "hobby", and "hometown".
  • “User ID” indicates identification information that identifies the user.
  • “Name” indicates the name of the user.
  • “Favorite food” indicates the food that the user likes.
  • “Disliked food” indicates food that the user dislikes.
  • "Recently dissatisfied” indicates that the user has recently been dissatisfied.
  • "What makes you sad” indicates what makes the user sad.
  • “Hobby” indicates a user's hobby.
  • “Hometown” indicates the hometown of the user.
  • The name of the user (user U1) identified by the user ID "U1" is "A".
  • The food that user U1 likes is "curry".
  • The food that user U1 dislikes is "coriander".
  • What user U1 has recently been dissatisfied with is the "tax increase".
  • What makes user U1 sad is "birthday".
  • The hobby of user U1 is "futsal".
  • The hometown of user U1 is "Tokyo".
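  • The record for user U1 described above could be held in a structure like the following; the field names mirror the items of FIG. 4, while the in-memory dict representation is an assumption for illustration.

```python
# Sketch of the preference information database entry for user U1. The schema
# mirrors the items described above; storing it as a dict is an assumption.
preference_db = {
    "U1": {
        "name": "A",
        "favorite_food": "curry",
        "disliked_food": "coriander",
        "recent_dissatisfaction": "tax increase",
        "sad_thing": "birthday",
        "hobby": "futsal",
        "hometown": "Tokyo",
    }
}

def get_preferences(user_id: str) -> dict:
    """Look up a user's preference information by user ID (face ID)."""
    return preference_db.get(user_id, {})
```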
  • The model information database 122 stores various information related to the empathy model. Specifically, the model information database 122 stores various information related to a learning model trained to output a character string corresponding to the user's preference information when a character string indicating the user's preference information is input. For example, the model information database 122 stores identification information that identifies the empathy model in association with the model data of the empathy model.
  • The template database 123 stores various information related to templates. Specifically, the template database 123 stores the utterance sentence template "X is Y, isn't it?" (where X is a character string indicating the user's preference target and Y is a character string output from the empathy model).
  • The control unit 130 is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 100, using a RAM as a work area. The control unit 130 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • The control unit 130 has an acquisition unit 131, a generation unit 132, and a providing unit 133, and realizes or executes the information processing functions and operations described below.
  • The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3, and may be another configuration as long as it performs the information processing described later.
  • The acquisition unit 131 acquires utterance information indicating an utterance by the user. Specifically, the acquisition unit 131 acquires voice data of the user's utterance from the robot device 10. Subsequently, the acquisition unit 131 acquires the voice recognition result of the acquired voice data (for example, "I ate curry yesterday"). Subsequently, the acquisition unit 131 decomposes the voice recognition result into morphemes by morphological analysis. For example, the acquisition unit 131 acquires morphemes such as "yesterday: noun, tense", "curry: noun, food", "eat: verb", and "yo: particle" by morphological analysis.
  • Further, the acquisition unit 131 acquires the voice direction (for example, "a direction of 45 degrees to the left"), the user's face identification result (for example, face ID "U1"), and image data from the robot device 10. Subsequently, the acquisition unit 131 collates the voice direction acquired from the robot device 10 (the direction of 45 degrees to the left) with the position of the user in the image data (the user with face ID "U1" in the direction of 45 degrees to the left) to identify the user who is the speaker of the utterance.
  • Further, the acquisition unit 131 acquires the user's preference information. Specifically, when the acquisition unit 131 identifies a user, it refers to the preference information database 121 and acquires the preference information of the identified user. For example, the acquisition unit 131 acquires, as the preference information of user U1, the information that the food user U1 likes is "curry". In this way, the acquisition unit 131 acquires user information about the user in which thing information indicating a specific thing (for example, "curry") is associated with emotion information indicating the user's feelings toward the specific thing (for example, that it is a food the user likes).
  • The generation unit 132 generates a character string to be applied to the utterance sentence template. Specifically, when the user's preference information is acquired by the acquisition unit 131, the generation unit 132 inputs the acquired preference information into the empathy model and generates, as the output data of the empathy model, a character string to be applied to the "Y" part of the utterance sentence template "X is Y, isn't it?". For example, the generation unit 132 inputs the character string "curry", which is the preference target of user U1, into the empathy model, and generates the character string "delicious" as the output data of the empathy model.
  • Further, the generation unit 132 acquires the utterance sentence template. For example, the generation unit 132 acquires the utterance sentence template "X is Y, isn't it?" (where X is a character string indicating the user's preference target and Y is a character string output from the empathy model).
  • When the speaker's preference information is acquired by the acquisition unit 131, the generation unit 132 generates response information indicating a response to the utterance based on the speaker's preference information. For example, when the speaker's preference information is acquired by the acquisition unit 131, the generation unit 132 generates the utterance sentence to be output by the robot device based on the speaker's preference information.
  • Since "I ate curry yesterday", the utterance by user U1, includes "curry", a food that user U1 likes, the generation unit 132 estimates that the emotion of user U1 who made the utterance is a positive emotion. In this way, the generation unit 132 estimates the speaker's emotion based on the utterance information acquired by the acquisition unit 131 and the speaker's preference information. For example, the generation unit 132 estimates that the speaker's emotion is positive when the utterance information acquired by the acquisition unit 131 includes the speaker's preference target.
  • For example, when the utterance "Coriander was served at lunch" by user U1 includes "coriander", a food that user U1 dislikes, the generation unit 132 estimates that the emotion of user U1 who made the utterance is a negative emotion. Further, the generation unit 132 inputs the character string "coriander", which is an object that user U1 dislikes, into the empathy model, and generates the character string "Sorry" as the output data of the empathy model.
  • Further, when the utterance "I'm going to see a futsal game next time" by user U1 includes "futsal", which is user U1's hobby, the generation unit 132 estimates that the emotion of user U1 who made the utterance is a positive emotion. Further, the generation unit 132 inputs the character string "futsal", which is the hobby of user U1, into the empathy model, and generates the character string "fun" as the output data of the empathy model.
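  • The three examples above suggest a simple polarity rule, sketched below. Which preference items count as positive or negative is inferred from the examples, not stated as a list in the patent.

```python
# Sketch of speaker-emotion estimation: positive when the utterance mentions
# something the speaker likes (favorite food, hobby), negative when it
# mentions something the speaker dislikes. The field-to-polarity mapping is
# inferred from the curry / coriander / futsal examples above.
POSITIVE_FIELDS = ("favorite_food", "hobby")
NEGATIVE_FIELDS = ("disliked_food", "recent_dissatisfaction", "sad_thing")

def estimate_emotion(utterance: str, preferences: dict) -> str:
    text = utterance.lower()
    for field in POSITIVE_FIELDS:
        value = preferences.get(field, "")
        if value and value.lower() in text:
            return "positive"
    for field in NEGATIVE_FIELDS:
        value = preferences.get(field, "")
        if value and value.lower() in text:
            return "negative"
    return "neutral"

prefs = {"favorite_food": "curry", "disliked_food": "coriander", "hobby": "futsal"}
estimate_emotion("I ate curry yesterday", prefs)          # -> "positive"
estimate_emotion("Coriander was served at lunch", prefs)  # -> "negative"
```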
  • The providing unit 133 provides expression information to be expressed by the robot device based on the utterance information acquired by the acquisition unit 131 and the user information about the user in which thing information indicating a specific thing and emotion information indicating the user's feelings toward the specific thing are linked. Specifically, the providing unit 133 provides expression information which is response information indicating a response to the utterance. More specifically, the providing unit 133 provides expression information which is the sentence of the utterance output by the robot device. For example, the providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information included in the utterance information. The providing unit 133 provides the expression information generated by the generation unit 132. In the example shown in FIG. 1, the providing unit 133 provides the robot device 10 with the response sentence "Curry is delicious, isn't it?" generated by the generation unit 132.
  • When the utterance information includes a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with at least one of the plurality of pieces of thing information.
  • For example, the acquisition unit 131 acquires the utterance "I ate curry while watching the rabbit at the zoo".
  • In this case, the providing unit 133 provides expression information based on the emotion information associated with at least one of "curry" and "rabbit", such as "Curry, I like it!" or "Rabbits, I dislike them a little".
  • Alternatively, the providing unit 133 may provide expression information showing empathy based on the emotion information associated with the thing information included in the bunsetsu (phrase unit) that does not modify any other bunsetsu among the bunsetsu included in the utterance information.
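  • Selecting the bunsetsu that modifies no other bunsetsu can be sketched with a dependency-style structure. The (text, modifies) representation below is an assumption; the patent does not name a particular dependency parser.

```python
# Sketch of choosing the thing to empathize with when an utterance contains
# several: keep only thing information found in the bunsetsu (phrase unit)
# that does not modify any other bunsetsu, i.e. the head of the dependency
# structure. The parse representation is an assumption for illustration.
from typing import Optional

def pick_head_thing(bunsetsu: list, things: set) -> Optional[str]:
    """bunsetsu is a list of (text, index of the modified bunsetsu or None)."""
    for text, modifies in bunsetsu:
        if modifies is None:  # this bunsetsu modifies no other bunsetsu
            for thing in things:
                if thing in text:
                    return thing
    return None

# "I ate curry while watching the rabbit at the zoo": the "watching the
# rabbit" clause modifies the main clause "I ate curry", so "curry" wins.
parsed = [("while watching the rabbit at the zoo", 1), ("I ate curry", None)]
pick_head_thing(parsed, {"curry", "rabbit"})  # -> "curry"
```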
  • Further, the providing unit 133 provides the robot device 10 with information on behavior based on the estimated emotion. Specifically, the providing unit 133 provides expression information which is information indicating the tone of the voice output by the robot device. For example, when it is estimated that the emotion of user U1 is a positive emotion, the providing unit 133 provides the robot device 10 with information specifying a bright tone as the tone of voice in which the robot device 10 outputs the response sentence. Alternatively, when it is estimated that the emotion of user U1 is a negative emotion, the providing unit 133 provides the robot device 10 with information specifying a dark tone.
  • Further, the providing unit 133 provides expression information which is information indicating the facial expression of the robot device. For example, when it is estimated that the emotion of user U1 is a positive emotion, the providing unit 133 provides the robot device 10 with information indicating a smile as information on the facial expression of the robot device 10. Alternatively, when it is estimated that the emotion of user U1 is a negative emotion, the providing unit 133 provides the robot device 10 with information indicating a sad face.
  • Further, the providing unit 133 provides expression information which is information indicating the operating speed of the robot device. For example, when the robot device 10 moves while outputting the response sentence and it is estimated that the emotion of user U1 is a positive emotion, the providing unit 133 provides the robot device 10 with information indicating a quick speed as information on the operating speed. Alternatively, when it is estimated that the emotion of user U1 is a negative emotion, the providing unit 133 provides the robot device 10 with information indicating a slow speed.
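  • The three kinds of behavior information (tone, facial expression, operating speed) can be gathered into one lookup, as in the sketch below; the concrete parameter names and values are assumptions.

```python
# Sketch of mapping the estimated emotion to the behavior information that
# the providing unit sends to the robot device. The parameter names are
# assumptions; the bright/dark, smile/sad face, quick/slow pairs follow the
# description above.
BEHAVIOR_BY_EMOTION = {
    "positive": {"tone": "bright", "facial_expression": "smile", "speed": "quick"},
    "negative": {"tone": "dark", "facial_expression": "sad face", "speed": "slow"},
}

def behavior_info(emotion: str) -> dict:
    # Fall back to neutral defaults when no emotion could be estimated.
    return BEHAVIOR_BY_EMOTION.get(
        emotion,
        {"tone": "neutral", "facial_expression": "neutral", "speed": "normal"},
    )

behavior_info("positive")
# -> {'tone': 'bright', 'facial_expression': 'smile', 'speed': 'quick'}
```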
  • FIG. 5 is a flowchart showing an information processing procedure according to the embodiment of the present disclosure.
  • First, the information processing apparatus 100 acquires the morphemes of the voice recognition result (step S101).
  • Next, the information processing device 100 acquires the preference information of the speaker based on the speaker identification result (step S102).
  • Then, the information processing device 100 estimates the emotion of the speaker based on the acquired morphemes and the speaker's preference information, and generates the utterance sentence of the robot device 10 (step S103).
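  • Steps S101 to S103 can be tied together in a short end-to-end sketch. It reuses the hypothetical helpers from the earlier sketches (analyze, get_preferences, estimate_emotion, generate_response, behavior_info) and assumes they are defined in the same module.

```python
# End-to-end sketch of the procedure in FIG. 5, built on the hypothetical
# helpers sketched earlier in this description.
def process_utterance(voice_text: str, speaker_id: str) -> dict:
    morphemes = analyze(voice_text)                # step S101: morphemes of the recognition result
    prefs = get_preferences(speaker_id)            # step S102: speaker's preference information
    emotion = estimate_emotion(voice_text, prefs)  # step S103: estimate the speaker's emotion
    target = prefs.get("favorite_food", "")        # simplistic choice of the preference target
    return {
        "morphemes": morphemes,
        "response": generate_response(target),     # step S103: utterance sentence for the robot
        "behavior": behavior_info(emotion),
    }

process_utterance("I ate curry yesterday", "U1")
# -> response "Curry is delicious, isn't it?" with bright tone, smile, quick motion
```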
  • As described above, the information processing apparatus 100 includes an acquisition unit 131 and a providing unit 133.
  • The acquisition unit 131 acquires utterance information indicating an utterance by the user.
  • The providing unit 133 provides expression information to be expressed by the robot device based on the utterance information acquired by the acquisition unit 131 and the user information about the user in which thing information indicating a specific thing and emotion information indicating the user's feelings toward the specific thing are linked.
  • The information processing device 100 enables the robot device to express an expression that matches the emotions of the other party, so that the range of communication with the user can be expanded.
  • The providing unit 133 provides the expression information which is the response information indicating the response to the utterance.
  • The information processing device 100 enables the robot device to respond according to the emotions of the other party, so that the range of communication with the user can be expanded.
  • The providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information included in the utterance information.
  • The information processing device 100 thereby enables the robot device to estimate the user's emotion based on, for example, the user's preference information included in the utterance, and to express an expression showing empathy for the estimated emotion, so that the range of communication with the user can be expanded.
  • When the utterance information includes a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with at least one of the plurality of pieces of thing information.
  • The information processing device 100 thereby enables the robot device to make an appropriate response according to the emotion of the other party, so that the range of communication with the user can be expanded.
  • When the utterance information includes a plurality of pieces of thing information, the providing unit 133 provides expression information showing empathy based on the emotion information associated with the thing information included in the bunsetsu that does not modify any other bunsetsu among the bunsetsu included in the utterance information.
  • The information processing device 100 thereby enables the robot device to make an appropriate response according to the emotion of the other party, so that the range of communication with the user can be expanded.
  • The providing unit 133 provides the expression information which is the sentence of the utterance output by the robot device.
  • The information processing device 100 thereby enables the robot device to make an appropriate utterance according to the emotion of the other party.
  • The providing unit 133 provides the expression information which is information indicating the tone of the voice output by the robot device.
  • The providing unit 133 provides the expression information which is information indicating the facial expression of the robot device.
  • The providing unit 133 provides the expression information which is information indicating the operating speed of the robot device.
  • FIG. 6 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of an information processing device such as the information processing device 100.
  • The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The parts of the computer 1000 are connected by a bus 1050.
  • The CPU 1100 operates based on the programs stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 loads a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to the various programs.
  • The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts, programs that depend on the hardware of the computer 1000, and the like.
  • The HDD 1400 is a computer-readable recording medium that non-transitorily records programs executed by the CPU 1100 and the data used by such programs.
  • Specifically, the HDD 1400 is a recording medium that records the information processing program according to the present disclosure, which is an example of the program data 1450.
  • The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
  • For example, the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
  • The input/output interface 1600 is an interface for connecting an input/output device 1650 and the computer 1000.
  • For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU 1100 also transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. Further, the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media).
  • The media is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.
  • The CPU 1100 of the computer 1000 realizes the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200.
  • The information processing program according to the present disclosure and the data in the storage unit 120 are stored in the HDD 1400.
  • The CPU 1100 reads the program data 1450 from the HDD 1400 and executes it, but as another example, these programs may be acquired from another device via the external network 1550.
  • The present technology can also have the following configurations.
  • (1) An information processing device comprising: an acquisition unit that acquires utterance information indicating an utterance by a user; and a providing unit that provides expression information to be expressed by a robot device based on the utterance information acquired by the acquisition unit and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's emotion toward the specific thing are associated with each other.
  • (2) The information processing device according to (1), wherein the providing unit provides the expression information which is response information indicating a response to the utterance.
  • (3) The information processing device according to (1) or (2), wherein the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the utterance information.
  • (4) The information processing device according to any one of (1) to (3), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with at least one of the plurality of pieces of the thing information.
  • (5) The information processing device according to any one of (1) to (4), wherein, when the utterance information includes a plurality of pieces of the thing information, the providing unit provides the expression information showing empathy based on the emotion information associated with the thing information included in the bunsetsu that does not modify the other bunsetsu among the bunsetsu included in the utterance information.
  • (6) The information processing device according to any one of (1) to (5), wherein the providing unit provides the expression information which is the sentence of the utterance output by the robot device.
  • (7) The information processing device according to any one of (1) to (6), wherein the providing unit provides the expression information which is information indicating the tone of the voice output by the robot device.
  • (8) The information processing device according to any one of (1) to (7), wherein the providing unit provides the expression information which is information indicating the facial expression of the robot device.
  • (9) The information processing device according to any one of (1) to (8), wherein the providing unit provides the expression information which is information indicating the operating speed of the robot device.
  • (10) An information processing method in which a computer executes processing of: acquiring utterance information indicating an utterance by a user; and providing expression information to be expressed by a robot device based on the acquired utterance information and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's emotion toward the specific thing are associated with each other.
  • (11) An information processing program for causing a computer to execute: an acquisition procedure of acquiring utterance information indicating an utterance by a user; and a providing procedure of providing expression information to be expressed by a robot device based on the utterance information acquired by the acquisition procedure and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's emotion toward the specific thing are associated with each other.
  • 1 Information processing system; 10 Robot device; 100 Information processing device; 110 Communication unit; 120 Storage unit; 121 Preference information database; 122 Model information database; 123 Template database; 130 Control unit; 131 Acquisition unit; 132 Generation unit; 133 Providing unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Manipulator (AREA)

Abstract

An information processing device (100) according to the present invention comprises an acquisition unit (131) and a providing unit (133). The acquisition unit (131) acquires utterance information indicating an utterance made by a user. The providing unit (133) provides expression information to be expressed by a robot device, the expression information being provided on the basis of the utterance information acquired by the acquisition unit (131) and user information about the user in which thing information indicating a specific thing and emotion information indicating the user's emotion toward the specific thing are associated with each other. The providing unit (133) provides expression information which is response information indicating a response to the utterance.
PCT/JP2020/045993 2019-12-27 2020-12-10 Information processing device, information processing method, and information processing program WO2021131737A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019239051 2019-12-27
JP2019-239051 2019-12-27

Publications (1)

Publication Number Publication Date
WO2021131737A1 2021-07-01

Family

ID=76575476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045993 2019-12-27 2020-12-10 Information processing device, information processing method, and information processing program WO2021131737A1

Country Status (1)

Country Link
WO (1) WO2021131737A1 (fr)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004086001A * 2002-08-28 2004-03-18 Sony Corp Conversation processing device, conversation processing method, and computer program
JP2006178063A * 2004-12-21 2006-07-06 Toyota Central Res & Dev Lab Inc Dialogue processing device
JP2016536630A * 2013-10-01 2016-11-24 Softbank Robotics Europe Method of dialogue between a machine, such as a humanoid robot, and a human interlocutor; computer program product; and humanoid robot for implementing such a method
JP2015148701A * 2014-02-06 2015-08-20 Nippon Telegraph And Telephone Corp Robot control device, robot control method, and robot control program
JP2018054866A * 2016-09-29 2018-04-05 Toyota Motor Corp Voice dialogue device and voice dialogue method
JP2019175432A * 2018-03-26 2019-10-10 Casio Computer Co Ltd Dialogue control device, dialogue system, dialogue control method, and program

Similar Documents

Publication Publication Date Title
CN106663219B Method and system for handling a dialogue with a robot
JP6719739B2 Dialogue method, dialogue system, dialogue device, and program
US9355092B2 (en) Human-like response emulator
KR102418558B1 English speaking education method, apparatus, and system using an interactive artificial intelligence avatar
JP2016536630A Method of dialogue between a machine, such as a humanoid robot, and a human interlocutor; computer program product; and humanoid robot for implementing such a method
CN108470188B Interaction method based on image analysis and electronic device
JP7371135B2 Speaker recognition using speaker-specific speech models
CA2835368A1 System and method for providing a dialogue with a user
US11682318B2 (en) Methods and systems for assisting pronunciation correction
Catania et al. CORK: A COnversational agent framewoRK exploiting both rational and emotional intelligence
Ritschel et al. Multimodal joke generation and paralinguistic personalization for a socially-aware robot
JP7029351B2 Method for generating OOS sentences and apparatus for performing the same
US12008919B2 (en) Computer assisted linguistic training including machine learning
US20220253609A1 (en) Social Agent Personalized and Driven by User Intent
WO2021131737A1 Information processing device, information processing method, and information processing program
WO2024069978A1 Generation device, learning device, generation method, training method, and program
Boonstra Introduction to conversational AI
Planet et al. Children’s emotion recognition from spontaneous speech using a reduced set of acoustic and linguistic features
WO2021186525A1 Utterance generation device, utterance generation method, and program
DeMara et al. Towards interactive training with an avatar-based human-computer interface
JP2021149664A Output device, output method, and output program
JP6176137B2 Voice dialogue device, voice dialogue system, and program
Feng et al. A platform for building mobile virtual humans
López et al. Lifeline dialogues with roberta
WO2021064947A1 Interaction method, interaction system, interaction device, and program

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20907462

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20907462

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP