WO2023228433A1 - Line-of-sight control device and method, non-temporary storage medium, and computer program - Google Patents

Line-of-sight control device and method, non-temporary storage medium, and computer program Download PDF

Info

Publication number
WO2023228433A1
WO2023228433A1 PCT/JP2022/038670 JP2022038670W WO2023228433A1 WO 2023228433 A1 WO2023228433 A1 WO 2023228433A1 JP 2022038670 W JP2022038670 W JP 2022038670W WO 2023228433 A1 WO2023228433 A1 WO 2023228433A1
Authority
WO
WIPO (PCT)
Prior art keywords
line
sight
robot
gaze
dialogue
Prior art date
Application number
PCT/JP2022/038670
Other languages
French (fr)
Japanese (ja)
Inventor
カルロス トシノリ イシイ
太健 新谷
Original Assignee
国立研究開発法人理化学研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国立研究開発法人理化学研究所 filed Critical 国立研究開発法人理化学研究所
Publication of WO2023228433A1 publication Critical patent/WO2023228433A1/en

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/08Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • This invention relates to a technology for controlling the line of sight of an agent such as a robot when interacting with a person.
  • This application claims priority based on Japanese Application No. 2022-086674 filed on May 27, 2022, and incorporates all the contents described in the said Japanese application.
  • agents such as robots interact with people, not only verbal information but also non-verbal information has great significance.
  • controlling the agent's line of sight is important not only for facilitating the interaction but also for allowing the agent to express its individuality.
  • Non-Patent Document 1 states that by observing the behavior of two people when they are doing a certain task, extroverts and introverts are able to determine the amount of time they spend looking at the other person and the amount of time they spend on the task. It has been reported that there is a difference in the amount of time the children spend their time focusing on the target. Non-Patent Document 1 reports the results of an experiment regarding the impression that a person who interacts with a robot receives by controlling the robot's movements based on the results.
  • Non-Patent Document 1 is based on human behavior when performing a predetermined task in a specific environment. In general dialogue, guidelines regarding how to control the robot's line of sight cannot be obtained from the disclosure of Non-Patent Document 1.
  • Patent Document 1 listed below discloses a technology related to controlling the line of sight of a robot when interacting with one or more people.
  • the robot's line of sight is directed toward the person who is speaking.
  • the robot's gaze generally does not turn in the direction of the new speaker.
  • the robot's degree of interest in the utterance of the first speaker is calculated, and if that value is lower than a threshold, the robot directs its gaze in the direction of the new utterance.
  • controlling the robot's line of sight in this manner has the effect of preventing the robot from unnaturally moving its line of sight here and there in a short period of time.
  • a line-of-sight control device is a line-of-sight control device for controlling the line of sight of a robot in a multi-person interaction, and in response to a timing for determining a line-of-sight direction,
  • a line-of-sight direction setting means for determining the line-of-sight direction of the robot based on a combination of the role of the robot in a multi-person dialogue and the state of the dialogue flow; and control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs.
  • the line-of-sight direction setting means calculates the probability that the plurality of participants will each face in a plurality of predetermined directions according to a combination of each role of the plurality of participants in the multiperson dialogue and the state of the dialogue flow.
  • a direction determination model storage means for storing a direction determination model determined for each role, and a direction determination model storage means for storing a direction determination model determined for each role, and a direction determination model that stores the robot's role and the state of the dialogue flow from the direction determination model in response to the timing for determining the line of sight direction. It includes a probability distribution extracting means for extracting a probability distribution according to the combination, and a first sampling means for sampling the line of sight direction of the robot from the probability distribution extracted by the probability distribution extracting means.
  • the plurality of directions of the direction determination model include the directions of the plurality of participants and a gaze aversion direction that is different from any of the directions of the plurality of participants.
  • the line-of-sight direction setting means further includes a line-of-sight direction model for storing a line-of-sight direction model that is a probabilistic model for probabilistically determining the line-of-sight direction according to a combination of the role of the robot and the state of the dialogue flow. a deflection direction model storage means; and a second sampling means for sampling the direction in which the robot's gaze is diverted from the gaze aversion direction model in response to the fact that the gaze direction sampled by the first sampling device is the gaze aversion direction. including.
  • the line-of-sight control device further includes a continuation for calculating the duration of the robot's line-of-sight according to a combination of the role of the robot, the state of the dialogue flow, and the line-of-sight direction determined by the line-of-sight direction setting means. Contains a time calculation section.
  • the timing for determining the line of sight direction is different between when the dialogue flow is in a turn change state and when it is not.
  • the timing for determining the line of sight direction is a predetermined timing during a turn change state when the state of the dialogue flow is a turn change state, and a predetermined timing immediately before the turn change state when the state of the dialogue flow is not a turn change state. This is the timing when the duration calculated by the duration calculation unit expires.
  • the line-of-sight direction setting means allows the plurality of participants to be predetermined according to the combination of the roles of the plurality of participants in the multiperson dialogue, the state of the dialogue flow, and the expected personality of the robot.
  • a direction determination model storage means for storing a direction determination model that determines the probability of facing each of a plurality of directions for each role;
  • Probability distribution extraction means for extracting a probability distribution according to the combination of role, dialogue flow state, and personality, and first sampling for sampling the robot's gaze direction from the probability distribution extracted by the probability distribution extraction means. means.
  • the plurality of directions of the direction determination model include the directions of the plurality of participants and a gaze aversion direction that is different from any of the directions of the plurality of participants.
  • the line-of-sight direction setting means further stores a line-of-sight direction model that is a probabilistic model for probabilistically determining the line-of-sight direction according to a combination of the robot's role, dialogue flow state, and personality. and a second line of sight for sampling the direction in which the robot looks away from the line-of-sight direction model in response to the line-of-sight direction sampled by the first sampling means being the line-of-sight direction. sampling means.
  • the line of sight control device further calculates the duration of the robot's line of sight according to a combination of the robot's role, the state of the dialogue flow, its personality, and the line of sight direction determined by the line of sight direction setting means. Contains a duration calculation section for.
  • the timing for determining the line of sight direction is different between when the dialogue flow is in a turn change state and when it is not.
  • the timing for determining the line of sight direction is a predetermined timing during a turn change state when the state of the dialogue flow is a turn change state, and a predetermined timing immediately before the turn change state when the state of the dialogue flow is not a turn change state. This is the timing when the duration calculated by the duration calculation unit expires.
  • a line-of-sight control method is a line-of-sight control method implemented by a computer for controlling the line-of-sight of a robot in a multi-person interaction, wherein the computer determines the timing for determining the line-of-sight direction.
  • the computer determines the robot's line of sight direction based on the combination of the robot's role in the multi-person dialogue and the state of the dialogue flow, and the computer determines the robot's line of sight direction in the step of determining the line of sight direction. and generating control parameters for controlling the direction of the robot's face and the direction of the eyes in response to the determination of the robot.
  • a computer program is a computer program for controlling the line of sight of a robot in a multi-person interaction, and the computer program controls the computer to control the line of sight of a robot in response to timing for determining the line of sight direction.
  • a line-of-sight direction setting means for determining the line-of-sight direction of the robot based on a combination of the role of the robot in a multi-person dialogue and the state of the dialogue flow, and a response to the line-of-sight direction of the robot being determined by the line-of-sight direction setting means.
  • the controller functions as a control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs.
  • FIG. 1 is a diagram schematically showing the settings of a preliminary experiment.
  • FIG. 2 is a schematic diagram for explaining a method for tagging utterances of each participant in a preliminary experiment.
  • FIG. 3 is a schematic diagram for explaining the timing of turn changes in utterances.
  • FIG. 4 is a graph showing the frequency of each speaker's gaze direction during turn changes in a three-way dialogue in a preliminary experiment.
  • FIG. 5 is a graph showing the temporal ratio of each speaker's line of sight direction in a three-way dialogue in a preliminary experiment, except at the time of turn change.
  • FIG. 1 is a diagram schematically showing the settings of a preliminary experiment.
  • FIG. 2 is a schematic diagram for explaining a method for tagging utterances of each participant in a preliminary experiment.
  • FIG. 3 is a schematic diagram for explaining the timing of turn changes in utterances.
  • FIG. 4 is a graph showing the frequency of each speaker's gaze direction during turn changes in a three-
  • FIG. 6 is a diagram showing, in a table format, the proportion of each participant in a dialogue who directs his/her gaze toward each other participant while speaking and the time during which the conversation averts his/her gaze.
  • FIG. 7 is a graph showing the temporal ratio of the line of sight direction of a particular extroverted participant in each of the participating roles in the dialogue in the three-way dialogue of the preliminary experiment.
  • FIG. 8 is a graph showing the temporal ratio of the direction of gaze of a specific introverted participant in each of the participating roles in the dialogue in the three-way dialogue of the preliminary experiment.
  • FIG. 9 is a graph showing the distribution of the times when an extroverted dialogue participant turned his/her gaze toward each participant when the speaker was the speaker.
  • FIG. 7 is a graph showing the temporal ratio of the line of sight direction of a particular extroverted participant in each of the participating roles in the dialogue in the three-way dialogue of the preliminary experiment.
  • FIG. 8 is a graph showing the temporal ratio of the
  • FIG. 10 is a graph showing the distribution of times when an introverted dialogue participant turns his/her gaze toward each participant when the speaker is the speaker.
  • FIG. 11 is a graph showing the frequency of the number of times participants in each role averted their gaze during speech.
  • FIG. 12 is a graph showing a histogram showing the distribution of times when dialogue participants avert their gaze and an approximate curve thereof.
  • FIG. 13 is a diagram showing the ratio of pupil positions when dialogue participants avert their line of sight.
  • FIG. 14 is a diagram showing the ratio of pupil positions when an extroverted conversation participant averts his/her line of sight.
  • FIG. 15 is a diagram showing the ratio of pupil positions when an introverted conversation participant averts his/her line of sight.
  • FIG. 16 is a block diagram showing the hardware configuration of a conversational robot system 150 according to an embodiment of the present invention.
  • FIG. 17 is a diagram showing the outer shape of the robot shown in FIG. 16.
  • FIG. 18 is a block diagram showing the hardware configuration of the computer shown in FIG. 16.
  • FIG. 19 is a block diagram showing the functional configuration of a line-of-sight control device realized by the robot control device shown in FIG. 16.
  • FIG. 20 is a schematic diagram showing the configuration of the line-of-sight direction model shown in FIG. 19.
  • FIG. 21 is a diagram illustrating an example of the gaze direction model for each individual and role during utterance shown in FIG. 20.
  • FIG. 22 is a schematic diagram showing an example of the configuration of the line-of-sight aversion model shown in FIG. 19.
  • FIG. 19 is a block diagram showing the hardware configuration of a conversational robot system 150 according to an embodiment of the present invention.
  • FIG. 17 is a diagram showing the outer shape of the robot shown in FIG. 16.
  • FIG. 23 is a schematic diagram showing an example of the individual-by-individual, turn-changing, and gaze-averting gaze direction models shown in FIG. 22.
  • FIG. 24 is a flowchart showing a control structure of a computer program that causes a computer to function as a line-of-sight control device in an embodiment of the present invention.
  • FIG. 25 is a flowchart showing a control structure of a computer program that causes a computer to execute the state sensing step shown in FIG. 24.
  • FIG. 26 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the line-of-sight direction and duration shown in FIG. 24.
  • FIG. 24 is a flowchart showing a control structure of a computer program that causes a computer to function as a line-of-sight control device in an embodiment of the present invention.
  • FIG. 25 is a flowchart showing a control structure of a computer program that causes a computer to execute the state sensing step
  • FIG. 27 is a flowchart showing a control structure of a computer program that causes a computer to execute the line-of-sight direction determining step shown in FIG. 24.
  • FIG. 28 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the gaze duration time shown in FIG. 26.
  • FIG. 29 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the direction of the line of sight when the line of sight is averted, shown in FIG.
  • FIG. 30 is a diagram showing the arrangement of video screens for evaluation experiments.
  • FIG. 31 is a graph showing the results of the evaluation experiment.
  • FIG. 32 is a graph showing the results of the evaluation experiment.
  • the same parts are given the same reference numerals. Therefore, detailed description thereof will not be repeated.
  • the robot does not necessarily have to be human-shaped; it can be of any shape as long as it has eyeballs and can talk.
  • This also applies to virtual agents that do not have a three-dimensional entity such as robots, but are expressed as two-dimensional images, or virtual agents that are expressed as three-dimensional images in virtual space.
  • the invention can be applied.
  • a preliminary experiment was conducted as a three-way dialogue 50.
  • the participants in the three-way dialogue 50 are participants 60, 62, and 64.
  • participants 60, 62, and 64 are all standing, but in a preliminary experiment, the direction of their line of sight is important, and it is necessary to fix each other's positions.
  • Participants in each group spoke while sitting on three chairs placed at the three vertices of a triangle.
  • a camera was installed in front of each participant, and each participant's face and body movements were photographed from directly in front.
  • Each participant wore a headset microphone, and audio data of each participant and video of the entire discussion were collected. After the dialogue was completed, a transcript of the dialogue was created from these recordings and audio data.
  • FIG. 2 shows an example of labeled dialogue data obtained in this way. The following analysis was performed on the labeled dialogue data thus obtained.
  • FIG. 2 shows examples of gaze label sequences 80, 82, and 84 for three participants A, B, and C.
  • the label "B" in the gaze label column 80 of participant A indicates that participant A was looking at participant B during this time period.
  • the label "C” in the line of sight label string 80 indicates that participant A was looking at participant C.
  • “Averted gaze” in the line of sight label column 80 indicates that participant A was not looking at either participant B or participant C. In other words, this label indicates that Participant A was looking away from other participants during this period.
  • a participant not looking at another participant in this manner is referred to as a "look away”
  • its duration is referred to as a "look away duration time”.
  • the line-of-sight label strings 82 and 84 were also created in the same way as the line-of-sight label string 80.
  • a label indicating which direction participant A's gaze was facing at that time is written as additional information.
  • a period marked “above” indicates that Participant A's line of sight was directed upward
  • a period marked “Top right” indicates that Participant A's line of sight was directed upward to the right.
  • the line of sight includes not only the angle of the face but also the direction of the eyeballs, and the direction comprehensively determined from these two elements is the direction of the line of sight. Similar labeling was performed for Participant B and Participant C.
  • Presence of speaking right and turn change Figure 3 shows the actual turn change.
  • participant A initially has the right to speak and is making the utterance 100. This is indicated by turn alternation label 102.
  • speaker C takes the right to speak and makes an utterance 104.
  • turn alternation label 106 The annotator determines the timing of the turn change. For example, the turn change label 106 clearly indicates that the right to speak has been transferred from participant A to participant C.
  • participant C is the speaker.
  • the main listener at the time of a turn change is a person who previously had the right to speak and who has given up the right to speak to the speaker due to the turn change.
  • participant A is the main listener.
  • a person who is not involved in this exchange of speaking rights is a sublistener.
  • participant B is the sublistener.
  • FIG. 4 shows the changes in the rate of eye movement for each role during turn change.
  • the horizontal axis is time (unit: seconds).
  • the 2-second interval at the time of turn change is set from -1.0 seconds to -0.3 seconds, and from -0.3 seconds to 0 seconds, with the timing of turn change being the origin (0 seconds). It is divided into three sections: .3 seconds, and from 0.3 seconds to 1 second, and the line-of-sight direction is determined at the beginning of each section. Therefore, in a preliminary experiment, statistics are taken separately for each of these sections and a model is created.
  • FIG. 4(A) shows the change in the percentage of the speaker's (SP) gaze during turn changes.
  • SP the speaker's
  • ML main listener
  • the rate at which he looks away from the main listener increases.
  • the rate at which the speaker looks away reaches a peak around 0.1 seconds after the speaker starts speaking.
  • the rate at which the speaker's line of sight turns toward the main listener and the rate at which the speaker's line of sight averts become almost equally high, completing the line-of-sight transition during the turn change.
  • FIG. 4(B) shows the transition of the gaze ratio of the main listener who had the right to speak until just before the turn change.
  • various situations can be considered, such as the main listener passing the right to speak to the speaker, the speaker taking the right to speak, or the speaker taking the right to speak naturally.
  • the main listener knows or has decided who the next speaker will be before the next speaker starts speaking, and the main listener knows or has decided who the next speaker will be before the next speaker starts speaking, and the main listener has the ability to listen to the next speaker from 1 second before the turn takes place to 1 second after giving up the right to speak.
  • the main listener tends to keep looking at the speaker.
  • Fig. 3(C) shows the transition of the gaze ratio of the sublisteners who did not participate in the turn change.
  • the proportion of sub-listeners looking at the main listener and the proportion looking at the speaker are equally high.
  • the rate at which the sublistener looks at the speaker increases. Therefore, the sublistener guesses the next speaker from 1 second before the turn change until the turn change, and until then, the sublistener looks at the main listener or the next speaker, and then looks at the next speaker. It is thought that there is a tendency to move.
  • FIG. 5 shows the ratio of each participant's gaze direction during the speaking period.
  • FIG. 6 shows specific numbers of the ratio of line of sight in the utterance section shown in FIG. 5 in a table format.
  • the horizontal axis represents the direction of the line of sight.
  • the vertical axis indicates the proportion of time spent facing each direction relative to the total time.
  • the rate at which speakers look at the main listener while speaking is slightly high. However, the speaker distributes his/her gaze in a well-balanced manner as a whole.
  • the main listener faces the speaker for nearly 70% of the speaking period, and the proportion of looking at the sublistener is quite low.
  • the rate at which main listeners look away is significantly higher than the rate at which sublisteners look, but it is significantly lower than the rate at which they look at the speaker.
  • the tendency for sublisteners is almost the same as for the main listener.
  • the sublistener also looks in the direction of the speaker for nearly 70% of the utterance period.
  • the rate of looking away is higher than the rate of looking at the main listener, but lower than the rate of looking away by the main listener.
  • Gaze movements based on individuality Based on the awareness that gaze movements may differ depending on individuality, in addition to the above preliminary experiment (first preliminary experiment), we conducted a separate experiment to analyze gaze movements related to individuality.
  • a preliminary experiment (second preliminary experiment) was conducted.
  • a three-way dialogue was conducted with 17 participants, both male and female.
  • a total of 14 sessions were conducted, and each session included 10 to 20 minutes of free conversation.
  • the three speakers sat on three chairs, each placed at the three vertices of an equilateral triangle with sides of about 2 meters, and were equipped with headset microphones and acceleration sensors. Separately, each participant's movements were recorded using an image depth camera.
  • FIG. 8(A) similarly shows the results obtained for speaker B.
  • FIG. 8(A) it can be seen that when speaker B is the speaker, he looks away more often than speaker A, and tends not to look at the sublistener much. In particular, when compared with FIG. 5A, it can be seen that the proportion of time in which speaker B looks away is quite high.
  • FIG. 7(B) shows the ratio of the time when speaker A looked at the speaker, the time when he looked at the sub-listener, and the time when he looked away when he was the main listener.
  • Figure (B) also shows the results obtained for speaker B.
  • FIG. 7(B) it can be seen that when speaker A is the main listener, the listener spends a very high percentage of the time looking at the speaker, and tends to look away from time to time.
  • FIG. 8(B) in the case of speaker B, the proportion of time spent looking at the speaker is high, similar to speaker A, but the proportion of time spent looking away from the speaker is also considerably higher than speaker A. I understand. It can also be seen that the proportion of time spent looking at the sublistener is low for both speakers A and B.
  • FIG. 7(B) and FIG. 8(B) Comparing FIG. 7(B) and FIG. 8(B) with FIG. 5(B), it can be seen that they show the same tendency. However, as shown in FIG. 8(B), it is noteworthy that in the case of speaker B, the proportion of time during which he looks away is high.
  • FIG. 7(C) shows the ratio of the time when speaker A looked at the speaker, the time when he looked at the main listener, and the time when he looked away when he was a sublistener.
  • FIG. 8(C) similarly shows the results obtained for speaker B.
  • speaker A spends a higher proportion of the time looking at the speaker and a lower proportion of the time he looks away.
  • speaker B spends a lower percentage of time looking at the speaker and a higher percentage of time looking away from the speaker, compared to the case where individuality is not considered.
  • the proportion of time spent looking at the main listener is also quite low.
  • Gaze duration time based on individuality
  • Speaker A and Speaker B the distribution of gaze duration times during the speaking period (period other than the turn change period) when playing the roles of speaker, main listener, and sublistener.
  • FIGS. 9(A) to (C) and FIGS. 10(A) to (C) show a histogram of the duration of the line of sight under each condition and a distribution curve approximated by the ⁇ 2 distribution expressed by the following equation.
  • n is the degree of freedom of the ⁇ 2 distribution
  • scale is a parameter that determines the magnitude of the amplitude in the vertical axis direction of the distribution curve
  • loc is a parameter that determines the bias in the horizontal axis direction of the distribution curve.
  • FIG. 9 shows the results for speaker A
  • FIG. 10 shows the results for speaker B.
  • Gaze aversion/interval of gaze aversion Figure 11 shows a scatter plot of the number of times participants averted their gaze during speech in the first preliminary experiment.
  • the horizontal axis indicates the time during speech.
  • the vertical axis indicates the number of times the person looked away from the person.
  • the time on the horizontal axis is the time excluding the beginning and end of the utterance.
  • FIG. 11 also shows an approximate straight line for averting the line of sight.
  • the regression coefficient r of the approximate straight line is 0.83, 0.82, and 0.67 for FIGS. 11(A), (B), and (C), respectively. From FIGS. 11 and 6, it can be seen that the interval at which the user looks away does not change regardless of the role.
  • FIG. 14 shows the results for speaker A
  • FIG. 15 shows the results for speaker B.
  • both speaker A and speaker B looked downward rather than upward when averting their gaze. It can be seen that the frequency of deviation is high. However, speaker B tends to look away downward more frequently. Both speakers have a relatively high frequency of having their eyes facing forward (i.e., their pupils are located in the center). In the three-person dialogue data obtained in the second preliminary experiment, the dialogue participants are located diagonally to the left and right. Therefore, when the speaker's eyes are facing forward, it means that the speaker is looking away from either of the conversation partners.
  • Second embodiment 1 System Configuration
  • a conversational robot system 150 that employs the line-of-sight control system according to the embodiment of the present invention for controlling the robot 160 will be described below.
  • an agent such as a robot to interact with people
  • existing functions are used.
  • techniques for recognizing turn change timing such as those described in the following references, can be used.
  • the recognition of the roles of participants including the robot and the recognition of whether the robot should perform an action to acquire the right to speak are given as given from the outside. shall be.
  • the identity of the speaker can be determined by determining the location of the speaker and localizing the sound source as described later.
  • a method such as selecting a participant who faces the robot for a longer period of time based on facial images of other participants as the main listener may be considered.
  • FIG. 16 shows the hardware configuration of the conversational robot system 150 according to this embodiment.
  • the conversational robot system 150 includes a probability model storage device 172 that stores a probability model for controlling the gaze of the robot 160, and a probability model stored in the probability model storage device 172.
  • the computer 160 includes an operation control PC (Personal Computer) 162 for controlling line-of-sight-related operations, and a network 176 to which the operation control PC 162 is connected.
  • the robot 160 according to this embodiment is for carrying out a three-way dialogue with two other dialogue participants.
  • PC Personal Computer
  • the conversational robot system 150 further includes a human position sensor 178 for detecting the position of a conversation partner, and a human position recognition PC 180 connected to the human position sensor 178 and the network 176.
  • the human position sensor 178 is for detecting the position of the conversation partner and providing a detection signal to the human position recognition PC 180.
  • the human position recognition PC 180 is for calculating the position of the dialogue partner as seen from the robot 160 based on the signal from the human position sensor 178 and transmitting the position to the operation control PC 162 via the network 176.
  • the conversation robot system 150 further uses a microphone 164 to recognize the utterances of the conversation participants, the turn changes, and the roles of each participant based on the audio signals received from the microphone 164, and sends the results via the network 176. and an audio processing PC 166 for transmitting data to the operation control PC 162.
  • the microphone 164 is a microphone array, and the audio processing PC 166 can specify the position of the sound source based on the output of the microphone 164.
  • the audio processing PC 166 has a function of transmitting to the operation control PC 162 information indicating the position of the person who made the utterance, as well as the utterance text, turn change detection information, and the role of each participant.
  • the operation control PC 162 has a function of determining whether or not it should acquire the right to speak based on the content of the uttered text, the detection result of the turn change timing, and the recognition result of the role by the voice processing PC 166.
  • the conversation robot system 150 further includes a speech synthesis PC 170 for receiving spoken text and generating an audio signal corresponding to the text, and a speaker 168 for receiving an audio signal from the speech synthesis PC 170 and converting it into speech.
  • the quality of the voice synthesized by the voice synthesis PC 170 is selected to be suitable for the appearance of the robot 160.
  • the conversation robot system 150 further includes a conversation PC 174 connected to the network 176 and configured to generate and output a response to the utterance in response to receiving utterance text from another PC via the network 176.
  • the operation control PC 162 has a function of sending the text of the speaker's utterance to the dialogue PC 174 when the voice processing PC 166 determines that the robot 160 should acquire the right to speak.
  • the dialogue PC 174 generates an uttered text in response to this input and sends it to the operation control PC 162 and the probability model storage device 172, so that the speech synthesis PC 170 and the speaker 168 generate a voice corresponding to the uttered text, and the operation control
  • the PC 162 can control the movement of the audio processing PC 166 in response to speech.
  • the audio processing PC 166 performs speech recognition on the audio signal output from the microphone 164, further performs sound source localization based on the output of the microphone 164, and indicates the text obtained as a result of the speech recognition and the position of the speaker. and a voice recognition unit 190 having a function of transmitting information to the operation control PC 162.
  • the voice processing PC 166 further uses the text output from the voice recognition unit 190 and the prosody information including the voice power and fundamental frequency F0 obtained from the voice signal of the microphone 164 to perform a turn according to the method disclosed in the above-mentioned reference document. It includes a turn recognition unit 192 that executes a process of detecting a turn change and a process of recognizing the role of each participant in the turn change, and transmits the results to the operation control PC 162.
  • FIG. 17 shows the appearance of the robot 160.
  • the robot 160 is a rather small robot, and can rotate its upper body and head left and right. Furthermore, a large pupil is drawn on the eyeball of the robot 160, and by controlling the rotation angle of the eyeball, the line of sight direction of the robot 160 can be moved vertically and horizontally.
  • FIG. 18 is a hardware block diagram of a computer system that operates as the operation control PC 162 shown in FIG. 16, for example.
  • the voice processing PC 166, voice synthesis PC 170, and human position recognition PC 180 shown in FIG. 16 can also be realized by a computer system having almost the same configuration as the operation control PC 162.
  • the configuration of the computer system 250 as a representative of these PCs 162 will be described, and details of the configuration of each individual PC will not be described.
  • this computer system 250 includes a computer 270 having a DVD (Digital Versatile Disc) drive 302, a keyboard 274, a mouse 276, and a monitor 272, all connected to the computer 270, for interacting with the user. including.
  • DVD Digital Versatile Disc
  • keyboard 274 a mouse 276, and a monitor 272
  • monitor 272 all connected to the computer 270, for interacting with the user.
  • these are examples of configurations for when user interaction is required, and any general hardware and software that can be used for user interaction (e.g. touch panel, voice input, pointing device in general) can be used. Available.
  • a computer 270 is connected to a CPU (Central Processing Unit) 290, a GPU (Graphics Processing Unit) 292, and a DVD drive 302. bus 310 and , a ROM (Read-Only Memory) 296 connected to the bus 310 and storing boot-up programs for the computer 270, and a RAM connected to the bus 310 and storing instructions constituting the program, system programs, work data, etc. (Random Access Memory) 298 and an SSD (Solid State Drive) 300 that is a nonvolatile memory connected to a bus 310.
  • the SSD 300 is for storing programs executed by the CPU 290 and GPU 292, data used by the programs executed by the CPU 290 and GPU 292, and the like.
  • the computer 270 further includes a network I/F (Interface) 308 that provides a connection to the network 176 that enables communication with other terminals, and a USB (Universal Serial Bus) memory 284 that is removably connected to the computer. 270.
  • network I/F Interface
  • USB Universal Serial Bus
  • the computer 270 is further connected to the microphone 164, the speaker 168, and the bus 310, reads audio signals, video signals, and text data generated by the CPU 290 and stored in the RAM 298 or the SSD 300 according to instructions from the CPU 290, and performs analog conversion and amplification processing. It includes an audio I/F 304 having a function of driving a speaker 168, digitizing an analog audio signal from a microphone 164, and storing it in an arbitrary address specified by the CPU 290 in the RAM 298 or SSD 300.
  • programs for realizing the functions of the operation control PC 162, the voice processing PC 166, the voice synthesis PC 170, and the human position recognition PC 180 are all stored in the SSD 300, RAM 298, DVD 278, or USB memory 284 shown in FIG. 18, for example.
  • the information may be stored in a storage medium of an external device (not shown) connected via the network I/F 308 and the network 176.
  • these data and parameters are written into the SSD 300 from the outside, for example, and loaded into the RAM 298 when the computer 270 is executed.
  • a computer program for operating this computer system so as to realize the functions of the operation control PC 162, voice processing PC 166, voice synthesis PC 170, and human position recognition PC 180 shown in FIG. 7 and their respective components is installed in the DVD drive 302.
  • the data is stored on the DVD 278 and transferred from the DVD drive 302 to the SSD 300.
  • these programs are stored in the USB memory 284, the USB memory 284 is attached to the USB port 306, and the programs are transferred to the SSD 300.
  • this program may be transmitted to computer 270 via network 176 and stored on SSD 300.
  • the program is loaded into RAM 298 during execution.
  • a source program may be input using the keyboard 274, monitor 272, and mouse 276, and the compiled object program may be stored in the SSD 300.
  • a script language as in the above embodiment, a script input using the keyboard 274 or the like may be stored in the SSD 300.
  • Neural networks are used for speech recognition, speech synthesis, etc. A trained neural network may be used, or the training may be performed on the talking robot system 150.
  • the CPU 290 reads the program from the RAM 298 according to the address indicated by an internal register called a program counter (not shown), interprets the instruction, and stores the data necessary for executing the instruction in the RAM 298 and the SSD 300 according to the address specified by the instruction. Or read it from other devices and execute the process specified by the command.
  • the CPU 290 stores the data of the execution result at an address specified by the program, such as the RAM 298, the SSD 300, or a register within the CPU 290.
  • the computer outputs a command to the robot's actuator, a voice signal, etc.
  • the value of the program counter is also updated by the program.
  • Computer programs may be loaded directly into RAM 298 from DVD 278 , from USB memory 284 , or via network 176 . Note that in the program executed by the CPU 290, some tasks (mainly numerical calculations) are dispatched to the GPU 292 according to instructions included in the program or according to an analysis result when the CPU 290 executes the instructions.
  • a program for realizing the functions of each part according to the above-described embodiment by the computer 270 includes a plurality of instructions written and arranged to cause the computer 270 to operate to realize those functions. Some of the basic functions required to execute this instruction are provided by the operating system (OS) running on the computer 270, third party programs, modules of various toolkits installed on the computer 270, or the program execution environment. In some cases, it may be provided. Therefore, this program does not necessarily include all the functions necessary to implement the system and method of this embodiment.
  • This program includes each of the above-mentioned devices and modules by statically linking or dynamically calling appropriate functions or modules in a controlled manner to obtain the desired results. It is sufficient to include only instructions for executing operations as its constituent elements. The manner in which computer 270 operates for this purpose is well known and will not be repeated here.
  • the GPU 292 is capable of parallel processing, and can execute a large amount of calculations associated with machine learning simultaneously in parallel or in a pipeline manner. For example, parallel computing elements found in a program when the program is compiled or parallel computing elements discovered when the program is executed are dispatched from the CPU 290 to the GPU 292 and executed, and the results are sent directly or to the RAM 298. is returned to the CPU 290 via a predetermined address, and is substituted into a predetermined variable in the program.
  • FIG. 19 shows the functional configuration of a line-of-sight control system 350, which is a part related to line-of-sight control in the conversation robot system 150 according to this embodiment.
  • the line of sight control system 350 includes a role determination unit 360 for determining and outputting the current roles of the robot 160 and other dialogue participants, and a turn change detection unit 362 for detecting a turn change. including.
  • turn changes and the roles of dialogue participants are detected each time.
  • the role determining section 360 and the turn replacement detecting section 362. In the evaluation experiment described below, line of sight control was performed under such conditions.
  • the line-of-sight control system 350 further includes a line-of-sight direction model 364 for stochastically determining the line-of-sight direction of the robot 160 during a turn change or while speaking, and a line-of-sight model 364 for determining the line-of-sight direction of the robot 160 in an overefficient manner when the robot 160 averts its line of sight. , and a personality information storage unit 368 for storing information regarding the personality (extrovert, introvert, neutral) of the robot 160. Details of the viewing direction model 364 will be described later with reference to FIGS. 20 and 21. Details of the model 366 will be described later with reference to FIGS. 22 and 23.
  • both the line-of-sight direction model 364 and the line-of-sight aversion model 366 are stored in the probability model storage device 172 shown in FIG.
  • Personality information storage section 368 is also actually realized by probability model storage device 172.
  • the probabilistic model storage device 172 is realized by the SSD 300 in FIG.
  • the line of sight control system 350 further receives information regarding the roles of the robot 160 and other dialogue participants from the role determination unit 360 in response to receiving a signal from the turn change detection unit 362 indicating that a turn change has been detected.
  • the gaze direction model 364 and the gaze motion generation section 370 create parameters for controlling the gaze motion of the robot 160 using the personality information storage sections 368 and 368.
  • a line-of-sight movement control unit 372 for controlling various actuators (not shown) for controlling the line-of-sight of the robot 160 according to the parameters set.
  • Gaze direction model 364 Referring to FIG. 20, the gaze direction model 364 includes a speech gaze direction model 400 that is prepared in advance according to the combination of personality and role, and a gaze duration model that is prepared in advance according to the combination of personality and role. 402, turn change gaze direction models 404, 406, and 406 for determining gaze directions in the first, second, and third sections of turn change, respectively, which are prepared in advance according to the combination of personalities and roles. 408.
  • the configuration of the speech gaze direction model 400 is substantially the same as that shown in FIG. 6.
  • the speech gaze direction model 400 includes, in addition to the model shown in FIG. 6, a model for introverted participants and a model for neutral participants.
  • FIG. 21 shows an example of the configuration of the gaze duration model 402.
  • the gaze duration model 402 is based on the combination of the role of the robot 160 (speaker, main listener, and sublistener) and the personality assumed for the robot 160 (extrovert, introvert, neutral).
  • the degree of freedom (n) of the chi - square distribution curve, the bias in the horizontal axis direction (loc), and the amplitude in the vertical axis direction (scale) are calculated. It stores the value.
  • the degree of freedom (n) of the chi - square distribution curve the bias in the horizontal axis direction (loc), and the amplitude in the vertical axis direction (scale) are calculated. It stores the value.
  • the distribution is specified, and by randomly sampling values from the distribution, the duration can also be determined when determining the line-of-sight direction.
  • the turn change gaze direction models 404, 406, and 408 have the same configuration as the speech gaze direction model 400. The difference is that the stored value is limited to each section at the time of turn change.
  • FIG. 22 shows the configuration of the line-of-sight aversion model 366 shown in FIG. 19.
  • the gaze aversion model 366 includes a gaze aversion direction model 450 during turn change and a gaze aversion direction model 452 during speech. All of these models are provided individually.
  • FIG. 23 shows an example of the configuration of the direction model 450 for looking away when changing turns.
  • a turn change gaze aversion direction model 450 models the gaze aversion direction of the extroverted participant shown in FIG. 14, for example.
  • the turn change time gaze aversion direction model 450 is realized by an array.
  • the elements are arranged in the order in which the center column and right column in FIG. 14 are moved below the leftmost column.
  • Each element indicates a gaze averting direction (nine directions from the upper left to the lower right) and a rate (probability) of averting the gaze in that direction.
  • "L” indicates the left
  • C indicates the center
  • "R” indicates the right
  • "U” indicates the top
  • “D” indicates the bottom.
  • "LD" represents the lower left
  • "CD” represents the lower center.
  • the program can stochastically determine the direction in which the robot 160 averts its gaze during turn change, depending on the role of the robot 160.
  • FIG. 24 shows, in a flowchart format, the overall configuration of a program executed by the line of sight control system 350 to control the robot 160 according to this embodiment.
  • this program is repeatedly activated at regular time intervals, and at the time of activation, the program senses the state of the dialogue (whether or not it is time to change turns, what the role of each dialogue participant is, etc.) at step 480. and step 482 of branching the flow of control depending on whether or not the current time is the timing to determine the viewing direction. For example, in the case of a turn change, the timing for determining the line of sight direction is -1 second, -0.3 second, and 0.3 second, assuming that the timing at which the turn change occurs is 0 seconds.
  • the process of re-determining the line-of-sight direction is further performed every 0.7 seconds.
  • the time can be set in a timer, and detection can be made according to whether the timer has expired or not.
  • This program is further executed when the determination in step 482 is affirmative, and generates a line-of-sight direction model 360 according to the combination of the role currently assigned to the robot 160, its personality, and information on whether it is speaking or changing turns.
  • steps for voice recognition, voice synthesis, posture control, etc. for the robot are also executed in other steps not shown. In step 486, control based on such processing is also performed.
  • step 486 in accordance with the parameters determined in step 484, the direction of the head of the robot 160, the position of the eyeballs, etc. are controlled in order to control the line of sight of the robot 160 in accordance with the progress of the timer. Since time progresses each time the program is executed, the position of the head and the position of the eyes of the robot 160 change each time the program is executed.
  • FIG. 25 shows the configuration of a program that implements step 480 in FIG. 24 in a flowchart format.
  • step 480 includes a step 510 of reading out information indicating the role of the robot 160 and each participant from the role determination section 360 shown in FIG. and step 514 of reading information indicating the individuality given to the robot 160 from the individuality information storage unit 368 shown in FIG. 19 and ending the execution of this program.
  • the personality given to the robot 160 can be considered to be fixed. Therefore, information regarding individuality does not necessarily need to be read every time the program is executed. However, if the robot 160 is to be given not only individuality but also states that are expected to change over time, such as emotions, the information is read out each time the program is executed, as shown in FIG. This is desirable.
  • Step 484 (determining gaze direction and duration) Referring to FIG. 26, step 484 includes step 550 in which a gaze direction model and a gaze duration model corresponding to the role, turn state, and personality assigned to robot 160 are selected and read from gaze direction model 364; A step 552 of selecting and reading out the gaze aversion direction model 450 during turn change or the gaze aversion direction model 452 during utterance from the gaze aversion models 366 according to the role, turn state, and personality assigned to the turn change mode 160 .
  • step 550 if the turn state is the speaking time, the speaking gaze direction model 400 shown in FIG. 20 is selected. If the turn state is a turn change, one of the turn change line-of-sight direction models 404, 406, and 408 is selected according to the timing.
  • step 552 The program further continues step 552 with step 554 of sampling a value p in the range [0,1] from a uniform distribution, and using the value p sampled in step 554 to determine the viewing direction selected in step 550.
  • determining 556 a viewing direction from the model.
  • determining the line-of-sight direction using randomly sampled values of the line-of-sight direction model in the range [0, 1] is referred to as "sampling of the line-of-sight direction.” Details of step 556 will be described later with reference to FIG. 27.
  • This program further includes a step 558 of sampling a value p in the range [0, 1] from a uniform distribution, similar to step 554, and a process shown in FIG. 25 from the gaze duration model 402 read out in step 550.
  • the parameters n, loc, and scale of the chi - square distribution according to the set role of the robot 160 and the personality given to the robot 160 are read out, and the duration of the line of sight is calculated using these values and the value p sampled in step 558. and step 560 of determining. Details of step 560 will be described later with reference to FIG.
  • This program further includes a step 562 in which the control flow is branched depending on whether the line-of-sight direction determined in step 556 is an averted line-of-sight direction. When the determination at step 562 is negative, execution of this program ends.
  • the program further includes a step 564 of sampling a value p in the range [0, 1] from a uniform distribution in response to a positive determination in step 562, and a step 564 of sampling a value p in the range [0, 1] from a uniform distribution. and step 566 of determining the viewing direction using the viewing direction model selected in step 556 and terminating the execution of this program. Details of step 566 will be described later with reference to FIG.
  • FIG. 27 shows details of step 556 in FIG. 26 in flowchart form.
  • a variable S M is used as a variable representing the probability of directing the line of sight to the main listener
  • S S is the probability of directing the line of sight to the speaker
  • S B is used as the probability of directing the line of sight to the sublistener.
  • step 556 assigns the sum of the value of the variable SS and the value of the variable SM to the variable SM , and the value of the variable SM whose value has been calculated in this way.
  • the process includes step 600 of assigning the sum of the values of the variable SB and the value of the variable SB to the variable SB . By performing this calculation, the probability of facing toward the speaker is stored in the variable SS .
  • the variable SM stores the sum of the probability of facing the speaker and the probability of facing the main listener.
  • the variable SB stores the sum of the probabilities of facing the speaker, the main listener, and the sublistener.
  • This program further includes a step 602 in which, following step 600, the flow of control is branched depending on whether the value p is smaller than the value of the variable S. and step 604 in which the direction is determined and the execution of this program is terminated.
  • This program further includes a step 606 in which, in response to the determination in step 602 being negative, the flow of control is branched depending on whether the value p is smaller than the value of the variable S M ; In response to the affirmative, determining 608 the viewing direction to be the direction of the main listener and terminating execution of the program.
  • This program further includes a step 610 in which, in response to the negative determination in step 606, the flow of control is branched depending on whether the value p is smaller than the value of the variable SB ; At step 612, the line of sight direction is determined to be the direction of the sublistener and the execution of this program is ended. and step 614 of terminating execution of the lever program.
  • step 600 By performing the processing in step 600, such an algorithm can be used to determine the viewing direction according to the value p and the probability represented by the model.
  • FIG. 28 shows the control structure of the program executed in step 560 of FIG. 26 in a flowchart format.
  • the program includes a step 650 of reading the values n, loc, and scale from the gaze duration model 402 selected in step 556 of FIG. Step 652 of sampling p and substituting this value into the variable x, and substituting the values of n, loc, scale, and x into the equation of the chi - square distribution for calculating the gaze duration, which has already been described, calculates the gaze duration. and step 654 of calculating the value and terminating the execution of this program.
  • FIG. 29 shows the control structure of the program that implements step 566 in FIG. 26 in a flowchart format.
  • this program sets 0 to the repetition variable i for loop processing in the program, and 0 to the variable S for accumulating each probability shown in FIG. 23 in order from array number 0. , respectively, and following step 680, a step 682 of branching the flow of control depending on whether the value of variable i is less than 8.
  • "8" is the maximum value of the subscript in the array shown in FIG.
  • This program further includes a step 684 of adding the probability P(i) of the array shown in FIG. 23 to the variable S when the determination in step 682 is affirmative; Step 686 branches the flow of control according to whether or not and returning control to step 682. If the determination in step 686 is affirmative, step 690 is included in which the line-of-sight averting direction is determined to be the line-of-sight averting direction D(i) in the array of FIG. 23 and the execution of the program is ended.
  • This program further includes a step 692 in which, in response to the negative determination in step 682, the direction of averting the line of sight is determined to be the direction of averting the line of sight D(8) and the execution of the program ends.
  • the conversational robot system 150 operates as follows. Note that the following explanation will mainly be made with reference to FIGS. 24 to 29 only regarding the line of sight control system 350.
  • step 480 in FIG. 24 is executed.
  • the personality information storage unit 368 shown in FIG. 19 stores information indicating the personality to be given to the robot 160.
  • the robot 160 is given the personality of "extrovert.”
  • step 480 such information is loaded into RAM 298 of FIG.
  • step 482 it is determined whether it is time to determine the line of sight direction. Assuming that the utterance period is set as the initial value of the turn state and the gaze duration timer is cleared, the determination in step 482 of FIG. 24 is affirmative, and the control proceeds to step 484.
  • the role of the robot 160 is the speaker, one of main listener, sublistener, and averting the line of sight is selected as the line of sight direction.
  • the gaze duration time is determined as follows.
  • step 650 from the gaze duration model 402 read out in step 552 of FIG . , loc and scale are read.
  • the value p sampled in step 558 is assigned to variable x.
  • step 654 the duration of the line of sight is determined by substituting the parameters n, loc, and scale and the value of the variable x into a formula for duration calculation to calculate the value.
  • step 560 it is determined whether the line of sight direction determined in step 556 is an averted line of sight. Here, it is assumed that the direction of the line of sight is not averted. Then, the process shown in FIG. 26 (step 484 in FIG. 24) ends immediately.
  • step 486 control of the robot 160 according to the line-of-sight direction and line-of-sight duration determined in step 484 is started. Note that in step 486, control of the robot 160 is also executed in addition to the line-of-sight control. If the line-of-sight direction determined in step 556 is a line-of-sight direction, steps 564 and 566 in FIG. 26 are executed to determine the line-of-sight direction of the robot 160. In step 486, the line of sight of the robot 160 is controlled so as to be in a direction that is neither the main listener nor the sublistener.
  • Step 480 is the same as the first process.
  • step 482 since the duration time of the line of sight direction has not yet expired, the process in step 484 is not executed, and only the process in step 486 is executed. Therefore, if the change in the line-of-sight direction of the robot 160 has not yet been completed, the change in the line-of-sight direction is continued; if it has been completed, the change in the line-of-sight direction is maintained.
  • a new gaze direction and a new gaze duration are determined.
  • control of the line of sight of the robot 160 is started according to the new parameters.
  • the line of sight control system 350 repeats the above-described process.
  • the robot 160's line of sight changes when the previously determined line of sight duration time expires, or when the turn change timing is -1 second, -0.3 seconds, and 0.3 seconds. A direction determination is made.
  • the viewing direction was basically changed according to the distributions shown in FIGS. 13 to 15.
  • the distributions shown in FIGS. 13 to 15 are applied on the premise that the line-of-sight direction is set at the center and the line-of-sight direction is changed to a position other than the center.
  • the line of sight direction is changed more than once while looking away, actual observation results show that there is a tendency for the line of sight to return to the immediately previous direction, so in implementation, the line of sight is returned to the immediately previous direction.
  • Evaluation experiment 1 Regarding the robot 160 according to the embodiment described above, an evaluation experiment was conducted to examine how a change in the line of sight direction is perceived by an observer when no personality is specified.
  • a. Baseline (same proportion - head model) As a baseline, we used a model in which the robot 160 turns its gaze at the same rate in the direction of two interlocutors when its role is as a speaker. Regarding the ratio of looking at the two interlocutors and the others, the ratio of looking at the first interlocutor, the second interlocutor, and the others was set to be approximately 3:3:4. At this time, the robot looks at the person's face or at a location obtained by applying a two-dimensional Gaussian distribution with an angle of 4 degrees distributed between the two interlocutors. . When acting as a listener, I made the robot face the direction of the speaker so as not to make the listener feel uncomfortable.
  • Evaluation model 1 head-eyeball model according to embodiment
  • This is a model that implements line-of-sight control and line-of-sight aversion based on the above embodiment.
  • Evaluation model 2 head model according to embodiment
  • this model eliminates eyeball movements and performs only head movements.
  • FIG. 30 shows an example of a video image 720 used in this experiment.
  • the experiment is strictly related to changes in the line of sight direction of the robot 160. Therefore, in this evaluation experiment, a three-party dialogue between actual humans was used for the utterances, and the robot 160 made the utterances of one of them. At that time, the line of sight direction of the robot 160 was controlled to move in the same manner as in the above embodiment according to the role, personality, and turn state of the robot 160. Since the actual three-way dialogue is labeled, the role determination unit 360 and turn change detection unit 362 shown in FIG. I decided to give it to
  • a moving image of the robot 160 is placed in the center, and simple character images 730 and 732 showing the other two speakers are placed on the left and right sides.
  • the character images 730 and 732 are used to indicate a three-way dialogue, but for example, when a speaker on the right side of the robot 160 (left side in FIG. 30) is speaking, the characters around the character image 730 are A sign will now be displayed to indicate that it is present. The same applies to the speaker on the left side of the robot 160 (on the right side when facing the robot 160).
  • FIG. 30 shows a situation where the speaker on the right side is speaking.
  • the experimental procedure is as follows. First, the presentation order of the four videos was randomized to reduce order effects. Next, the evaluators were asked to rate their impressions of the videos of the four methods individually on a 7-point scale (1: very unnatural, 4: neither, 7: very natural). The above was considered as one set, and impressions were evaluated for a total of 12 videos for three dialogue sections (each dialogue section was about 1 minute in length), and the results were tabulated.
  • FIG. 31 shows the evaluation results for the question "Overall, did the robot's behavior feel natural?"
  • the results of multiple comparisons were calculated based on the mean value, standard error, and Ryan method of the results.
  • the p value was 0.020 ( ⁇ .05) between the baseline model and evaluation model 1, the p value was 0.001 ( ⁇ .05) between the comparison model and evaluation model 1, and the p value was 0.001 ( ⁇ .05) between the comparison model and evaluation model 1. There was a significant difference between evaluation model 1 and p value 0.008 ( ⁇ .05).
  • Evaluation experiment 2 In addition to the settings of Evaluation Experiment 1 above, we also took into account individuality in the robot's gaze control, and conducted an experiment to evaluate the impression of individuality (extroversion) based on gaze movements.
  • the experiment participants first watched an audio-only video (SM), and then watched three videos that included the robot's gaze movements.
  • the presentation order of the three videos was randomized to reduce order effects.
  • participants were asked to rate their impressions on a 7-point scale (1: very introverted, 4: neutral, 7: very extroverted). Furthermore, the participants were not informed that the experiment involved controlling gaze movements.
  • SM audio-only video
  • the model stores the probabilities of directing the line of sight in each direction, and when actually determining the line of sight direction, these probabilities are added together and then the line of sight direction is sampled using random numbers.
  • the model may store in advance a value obtained by adding up each probability (cumulative probability).
  • the invention is not limited to such embodiments. Directions may be classified more finely.
  • a line-of-sight direction model and a line-of-sight continuation model are prepared and used based on a combination of the interaction role, turn state, and personality of an agent such as a robot.
  • a model may be prepared without any of these.
  • criteria different from these may be used for selecting the line-of-sight direction model. For example, age, social positions such as teacher and student, parent and child, etc. may be used as criteria.
  • the above embodiment relates to three-party dialogue.
  • the present invention can be applied to conversations involving four or more people as long as each speaker can be distinguished and their roles can be classified.

Abstract

A line-of-sight control device, which provides a device for controlling line of sight such that the line of sight of an agent when in dialogue with a person can be more naturally realized, includes: a line-of-sight movement generation unit 370 for determining a line-of-sight direction of a robot, on the basis of a combination of the role of the robot in a multi-person dialogue and a state of dialogue flow, in response to reaching a timing for determining the line-of-sight direction; and a line-of-sight movement control unit 372 for generating control parameter for controlling the orientation of the face of the robot and the direction of the eyeballs of the robot, in response to the line-of-sight direction of the robot being set by a line-of-sight direction setting means.

Description

視線制御装置及び方法、非一時的記憶媒体、並びにコンピュータプログラムGaze control device and method, non-transitory storage medium, and computer program
 この発明は、ロボットなどのエージェントの、人との対話時における視線を制御する技術に関する。この出願は2022年5月27日出願の日本出願第2022-086674号に基づく優先権を主張し、前記日本出願に記載された全ての記載内容を援用するものである。 This invention relates to a technology for controlling the line of sight of an agent such as a robot when interacting with a person. This application claims priority based on Japanese Application No. 2022-086674 filed on May 27, 2022, and incorporates all the contents described in the said Japanese application.
 ロボット及びバーチャルエージェントを含む様々な対話エージェントが社会進出している。人々が対話エージェントに触れる機会は増え始めている。小売店、ホテルのロビー及び駅など、人々が集まる場においてロボットなどのエージェントを見かけることも多くなっている。 Various conversational agents, including robots and virtual agents, are entering society. Opportunities for people to interact with conversational agents are beginning to increase. Agents such as robots are increasingly being seen in places where people gather, such as retail stores, hotel lobbies, and train stations.
 ロボットなどのエージェントが人と対話する際、発話が含む言語情報のみならずジェスチャ、表情、韻律、視線などのような非言語情報も対話を円滑に進めるために重要である。また言語情報及び非言語情報は、人々が個性を意識的に又は無意識的に表出する媒体となっているとも考えられる。 When an agent such as a robot interacts with a person, not only the linguistic information contained in utterances, but also non-verbal information such as gestures, facial expressions, prosody, and gaze are important for the conversation to proceed smoothly. It is also believed that verbal and non-verbal information serve as a medium through which people consciously or unconsciously express their individuality.
 したがって、ロボットなどのエージェントが人と対話するときにも、言語情報のみならず非言語情報が大きな意味を持つ。特に、エージェントが人と対話するときに、その視線の制御は、対話を円滑にするためだけではなく、エージェントに個性を表出させる上でも重要である。 Therefore, when agents such as robots interact with people, not only verbal information but also non-verbal information has great significance. In particular, when an agent interacts with a person, controlling the agent's line of sight is important not only for facilitating the interaction but also for allowing the agent to express its individuality.
 非特許文献1には、2人の人が何らかの作業をしているときの両者の振る舞いを観察することにより、外向的な人と内向的な人とでは、視線を相手に向ける時間と、作業の対象に向ける時間とが相違することが報告されている。非特許文献1は、この結果に基づいてロボットの動作を制御することにより、ロボットと対話する人がどのような印象を受けるかに関する実験を行った結果を報告している。 Non-Patent Document 1 states that by observing the behavior of two people when they are doing a certain task, extroverts and introverts are able to determine the amount of time they spend looking at the other person and the amount of time they spend on the task. It has been reported that there is a difference in the amount of time the children spend their time focusing on the target. Non-Patent Document 1 reports the results of an experiment regarding the impression that a person who interacts with a robot receives by controlling the robot's movements based on the results.
 しかし、非特許文献1の報告は、特定の環境において所定の作業をするときの人の振る舞いに基づくものである。一般的な対話において、ロボットの視線をどのように制御するかに関する指針は非特許文献1の開示からは得られない。 However, the report in Non-Patent Document 1 is based on human behavior when performing a predetermined task in a specific environment. In general dialogue, guidelines regarding how to control the robot's line of sight cannot be obtained from the disclosure of Non-Patent Document 1.
 これに対し、後掲の特許文献1には、1人又は複数の人と対話するときのロボットの視線の制御に関する技術が開示されている。特許文献1に開示された技術においては、発話している人の方向にロボットの視線を向ける。しかし、ある人が発話中に他の人が発話しても、基本的にはロボットの視線をその新たな発話者の方向には向けない。最初の発話者の発話に対するロボットの関心度を算出し、その値がしきい値より低ければ新たな発話の方向にロボットの視線を向ける。 On the other hand, Patent Document 1 listed below discloses a technology related to controlling the line of sight of a robot when interacting with one or more people. In the technique disclosed in Patent Document 1, the robot's line of sight is directed toward the person who is speaking. However, even if one person speaks while another person speaks, the robot's gaze generally does not turn in the direction of the new speaker. The robot's degree of interest in the utterance of the first speaker is calculated, and if that value is lower than a threshold, the robot directs its gaze in the direction of the new utterance.
 非特許文献1によれば、このようにしてロボットの視線を制御することにより、ロボットが短期間に視線の方向をあちこちに向ける不自然な動きを防止できるという効果があるとされている。 According to Non-Patent Document 1, controlling the robot's line of sight in this manner has the effect of preventing the robot from unnaturally moving its line of sight here and there in a short period of time.
特開2022-057507号公報Japanese Patent Application Publication No. 2022-057507
 人の対話時には、発話権(ターン)がある人から別の人に移動するとき(ターン交替時)付近において特徴的な視線方向の動きが見られる。それだけでなく、発話者が話をしているときにも、他の発話者が必ず発話者の方向を向いているわけではない。人は、発話者ではなく別の方向に視線を向けたり、参加者が3人以上いるときには、話をしていない別の参加者の方向に視線を向けたりすることもある。発話者の視線についても同様である。さらにその視線方向の動きも、人の性格(個性)によって違いが見られる。特許文献1に開示の技術は、こうした情報に基づいてロボットの視線を制御しているわけではない。そのため、得られるロボットの視線の動きは必ずしも自然なものとはならない。 During human dialogue, a characteristic movement of the gaze direction can be seen around the time when the right to speak (turn) is transferred from one person to another (when the turn is changed). Not only that, even when a speaker is speaking, other speakers are not necessarily facing the speaker's direction. People may look away from the speaker, or when there are three or more participants, look towards another participant who is not speaking. The same applies to the speaker's line of sight. Furthermore, the movement of the gaze direction also differs depending on a person's personality (individuality). The technology disclosed in Patent Document 1 does not control the robot's line of sight based on such information. Therefore, the resulting movement of the robot's line of sight is not necessarily natural.
 したがって、ロボットなどのエージェントの人との対話時の視線を、より自然に実現できるような視線の制御装置が望まれている。 Therefore, there is a need for a line-of-sight control device that allows agents such as robots to more naturally control their line-of-sight when interacting with people.
 この発明の第1の局面に係る視線制御装置は、複数人対話におけるロボットの視線を制御するための視線制御装置であって、視線方向の決定のためのタイミングとなったことに応答して、複数人対話におけるロボットの役割と対話フローの状態との組み合わせに基づいて、ロボットの視線方向を定めるための視線方向設定手段と、視線方向設定手段によりロボットの視線方向が定められたことに応答して、ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するための制御パラメータ生成手段とを含む。 A line-of-sight control device according to a first aspect of the present invention is a line-of-sight control device for controlling the line of sight of a robot in a multi-person interaction, and in response to a timing for determining a line-of-sight direction, A line-of-sight direction setting means for determining the line-of-sight direction of the robot based on a combination of the role of the robot in a multi-person dialogue and the state of the dialogue flow; and control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs.
 好ましくは、視線方向設定手段は、複数人対話における複数の参加者の各役割と、対話フローの状態との組み合わせに応じて、複数の参加者があらかじめ定められた複数の方向をそれぞれ向く確率を役割ごとに定める方向決定モデルを記憶するための方向決定モデル記憶手段と、視線方向の決定のためのタイミングとなったことに応答して、方向決定モデルからロボットの役割と対話フローの状態との組み合わせに応じた確率分布を抽出するための確率分布抽出手段と、確率分布抽出手段により抽出された確率分布からロボットの視線方向をサンプリングするための第1サンプリング手段とを含む。 Preferably, the line-of-sight direction setting means calculates the probability that the plurality of participants will each face in a plurality of predetermined directions according to a combination of each role of the plurality of participants in the multiperson dialogue and the state of the dialogue flow. A direction determination model storage means for storing a direction determination model determined for each role, and a direction determination model storage means for storing a direction determination model determined for each role, and a direction determination model that stores the robot's role and the state of the dialogue flow from the direction determination model in response to the timing for determining the line of sight direction. It includes a probability distribution extracting means for extracting a probability distribution according to the combination, and a first sampling means for sampling the line of sight direction of the robot from the probability distribution extracted by the probability distribution extracting means.
 より好ましくは、方向決定モデルの複数の方向は、複数の参加者の方向と、複数の参加者の方向のいずれとも異なる視線逸らし方向とを含む。 More preferably, the plurality of directions of the direction determination model include the directions of the plurality of participants and a gaze aversion direction that is different from any of the directions of the plurality of participants.
 さらに好ましくは、視線方向設定手段はさらに、ロボットの役割と対話フローの状態との組み合わせに応じて視線逸らし方向を確率的に決定するための確率モデルからなる視線逸らし方向モデルを記憶するための視線逸らし方向モデル記憶手段と、第1サンプリング手段によりサンプリングされた視線方向が、視線逸らし方向であることに応答して、視線逸らし方向モデルからロボットの視線を逸らす方向をサンプリングするための第2サンプリング手段とを含む。 More preferably, the line-of-sight direction setting means further includes a line-of-sight direction model for storing a line-of-sight direction model that is a probabilistic model for probabilistically determining the line-of-sight direction according to a combination of the role of the robot and the state of the dialogue flow. a deflection direction model storage means; and a second sampling means for sampling the direction in which the robot's gaze is diverted from the gaze aversion direction model in response to the fact that the gaze direction sampled by the first sampling device is the gaze aversion direction. including.
 好ましくは、視線制御装置は、さらに、ロボットの役割と、対話フローの状態と、視線方向設定手段により定められた視線方向との組み合わせに応じて、ロボットの視線の継続時間を算出するための継続時間算出部を含む。 Preferably, the line-of-sight control device further includes a continuation for calculating the duration of the robot's line-of-sight according to a combination of the role of the robot, the state of the dialogue flow, and the line-of-sight direction determined by the line-of-sight direction setting means. Contains a time calculation section.
 より好ましくは、視線方向の決定のためのタイミングは、対話フローの状態がターン交替状態のときと、それ以外のときとで異なる。 More preferably, the timing for determining the line of sight direction is different between when the dialogue flow is in a turn change state and when it is not.
 さらに好ましくは、視線方向の決定のためのタイミングは、対話フローの状態がターン交替状態のときにはターン交替状態中におけるあらかじめ定められたタイミングであり、対話フローの状態がターン交替状態でないときには、直前に継続時間算出部により算出された継続時間が満了したタイミングである。 More preferably, the timing for determining the line of sight direction is a predetermined timing during a turn change state when the state of the dialogue flow is a turn change state, and a predetermined timing immediately before the turn change state when the state of the dialogue flow is not a turn change state. This is the timing when the duration calculated by the duration calculation unit expires.
 好ましくは、視線方向設定手段は、複数人対話における複数の参加者の各役割と、対話フローの状態と、ロボットに想定される個性との組み合わせに応じて、複数の参加者があらかじめ定められた複数の方向をそれぞれ向く確率を役割ごとに定める方向決定モデルを記憶するための方向決定モデル記憶手段と、視線方向の決定のためのタイミングとなったことに応答して、方向決定モデルからロボットの役割と対話フローの状態と個性との組み合わせに応じた確率分布を抽出するための確率分布抽出手段と、確率分布抽出手段により抽出された確率分布からロボットの視線方向をサンプリングするための第1サンプリング手段とを含む。 Preferably, the line-of-sight direction setting means allows the plurality of participants to be predetermined according to the combination of the roles of the plurality of participants in the multiperson dialogue, the state of the dialogue flow, and the expected personality of the robot. A direction determination model storage means for storing a direction determination model that determines the probability of facing each of a plurality of directions for each role; Probability distribution extraction means for extracting a probability distribution according to the combination of role, dialogue flow state, and personality, and first sampling for sampling the robot's gaze direction from the probability distribution extracted by the probability distribution extraction means. means.
 より好ましくは、方向決定モデルの複数の方向は、複数の参加者の方向と、複数の参加者の方向のいずれとも異なる視線逸らし方向とを含む。 More preferably, the plurality of directions of the direction determination model include the directions of the plurality of participants and a gaze aversion direction that is different from any of the directions of the plurality of participants.
 さらに好ましくは、視線方向設定手段はさらに、ロボットの役割と対話フローの状態と個性との組み合わせに応じて視線逸らし方向を確率的に決定するための確率モデルからなる視線逸らし方向モデルを記憶するための視線逸らし方向モデル記憶手段と、第1サンプリング手段によりサンプリングされた視線方向が、視線逸らし方向であることに応答して、視線逸らし方向モデルからロボットの視線を逸らす方向をサンプリングするための第2サンプリング手段とを含む。 More preferably, the line-of-sight direction setting means further stores a line-of-sight direction model that is a probabilistic model for probabilistically determining the line-of-sight direction according to a combination of the robot's role, dialogue flow state, and personality. and a second line of sight for sampling the direction in which the robot looks away from the line-of-sight direction model in response to the line-of-sight direction sampled by the first sampling means being the line-of-sight direction. sampling means.
 好ましくは、視線制御装置は、さらに、ロボットの役割と、対話フローの状態と、個性と、視線方向設定手段により定められた視線方向との組み合わせに応じて、ロボットの視線の継続時間を算出するための継続時間算出部を含む。 Preferably, the line of sight control device further calculates the duration of the robot's line of sight according to a combination of the robot's role, the state of the dialogue flow, its personality, and the line of sight direction determined by the line of sight direction setting means. Contains a duration calculation section for.
 より好ましくは、視線方向の決定のためのタイミングは、対話フローの状態がターン交替状態のときと、それ以外のときとで異なる。 More preferably, the timing for determining the line of sight direction is different between when the dialogue flow is in a turn change state and when it is not.
 さらに好ましくは、視線方向の決定のためのタイミングは、対話フローの状態がターン交替状態のときにはターン交替状態中におけるあらかじめ定められたタイミングであり、対話フローの状態がターン交替状態でないときには、直前に継続時間算出部により算出された継続時間が満了したタイミングである。 More preferably, the timing for determining the line of sight direction is a predetermined timing during a turn change state when the state of the dialogue flow is a turn change state, and a predetermined timing immediately before the turn change state when the state of the dialogue flow is not a turn change state. This is the timing when the duration calculated by the duration calculation unit expires.
 この発明の第2の局面に係る視線制御方法は、複数人対話におけるロボットの視線を制御するための、コンピュータにより実現される視線制御方法であって、コンピュータが、視線方向の決定のためのタイミングとなったことに応答して、複数人対話におけるロボットの役割と対話フローの状態との組み合わせに基づいて、ロボットの視線方向を定めるステップと、コンピュータが、視線方向を定めるステップにおいてロボットの視線方向が定められたことに応答して、ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するステップとを含む。 A line-of-sight control method according to a second aspect of the invention is a line-of-sight control method implemented by a computer for controlling the line-of-sight of a robot in a multi-person interaction, wherein the computer determines the timing for determining the line-of-sight direction. In response to this, the computer determines the robot's line of sight direction based on the combination of the robot's role in the multi-person dialogue and the state of the dialogue flow, and the computer determines the robot's line of sight direction in the step of determining the line of sight direction. and generating control parameters for controlling the direction of the robot's face and the direction of the eyes in response to the determination of the robot.
 この発明の第3の局面に係るコンピュータプログラムは、複数人対話におけるロボットの視線を制御するためのコンピュータプログラムであって、コンピュータを、視線方向の決定のためのタイミングとなったことに応答して、複数人対話におけるロボットの役割と対話フローの状態との組み合わせに基づいて、ロボットの視線方向を定めるための視線方向設定手段と、視線方向設定手段によりロボットの視線方向が定められたことに応答して、ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するための制御パラメータ生成手段として機能させる。 A computer program according to a third aspect of the present invention is a computer program for controlling the line of sight of a robot in a multi-person interaction, and the computer program controls the computer to control the line of sight of a robot in response to timing for determining the line of sight direction. , a line-of-sight direction setting means for determining the line-of-sight direction of the robot based on a combination of the role of the robot in a multi-person dialogue and the state of the dialogue flow, and a response to the line-of-sight direction of the robot being determined by the line-of-sight direction setting means. The controller functions as a control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs.
 以上のようにこの発明によれば、ロボットなどのエージェントの人との対話時の視線を、より自然に実現できるような視線の制御装置及び方法、並びにコンピュータプログラムを提供できる。 As described above, according to the present invention, it is possible to provide a line-of-sight control device and method, as well as a computer program, which can more naturally realize the line-of-sight of an agent such as a robot when interacting with a person.
図1は、予備実験の設定を模式的に示す図である。FIG. 1 is a diagram schematically showing the settings of a preliminary experiment. 図2は、予備実験における各参加者の発話に対するタグ付け方法を説明するための模式図である。FIG. 2 is a schematic diagram for explaining a method for tagging utterances of each participant in a preliminary experiment. 図3は、発話のターン交替のタイミングを説明するための模式図である。FIG. 3 is a schematic diagram for explaining the timing of turn changes in utterances. 図4は、予備実験の3者対話のターン交替時における、各発話者の視線方向の頻度を示すグラフである。FIG. 4 is a graph showing the frequency of each speaker's gaze direction during turn changes in a three-way dialogue in a preliminary experiment. 図5は、予備実験の3者対話における、ターン交替時以外の各発話者の視線方向の時間的割合を示すグラフである。FIG. 5 is a graph showing the temporal ratio of each speaker's line of sight direction in a three-way dialogue in a preliminary experiment, except at the time of turn change. 図6は、対話の各参加者について、発話中の視線を他の各参加者に向ける割合と視線を逸らす時間とを表形式で示す図である。FIG. 6 is a diagram showing, in a table format, the proportion of each participant in a dialogue who directs his/her gaze toward each other participant while speaking and the time during which the conversation averts his/her gaze. 図7は、予備実験の3者対話における、外向的な特定参加者の、対話への参与役割の各々における視線方向の時間的割合を示すグラフである。FIG. 7 is a graph showing the temporal ratio of the line of sight direction of a particular extroverted participant in each of the participating roles in the dialogue in the three-way dialogue of the preliminary experiment. 図8は、予備実験の3者対話における、内向的な特定参加者の、対話への参与役割の各々における視線方向の時間的割合を示すグラフである。FIG. 8 is a graph showing the temporal ratio of the direction of gaze of a specific introverted participant in each of the participating roles in the dialogue in the three-way dialogue of the preliminary experiment. 図9は、外向的な対話参加者が発話者のときに、各参加者に視線を向けた時間の分布を示すグラフである。FIG. 9 is a graph showing the distribution of the times when an extroverted dialogue participant turned his/her gaze toward each participant when the speaker was the speaker. 図10は、内向的な対話参加者が発話者のときに、各参加者に視線を向けた時間の分布を示すグラフである。FIG. 10 is a graph showing the distribution of times when an introverted dialogue participant turns his/her gaze toward each participant when the speaker is the speaker. 図11は、発話中における各役割の参加者の視線を逸らした回数の頻度を示すグラフである。FIG. 11 is a graph showing the frequency of the number of times participants in each role averted their gaze during speech. 図12は、対話参加者が視線を逸らすときの時間の分布を示すヒストグラムとその近似曲線を示すグラフである。FIG. 12 is a graph showing a histogram showing the distribution of times when dialogue participants avert their gaze and an approximate curve thereof. 図13は、対話参加者が視線を逸らすときの瞳の位置の割合を示す図である。FIG. 13 is a diagram showing the ratio of pupil positions when dialogue participants avert their line of sight. 図14は、外向的な対話参加者が視線を逸らすときの瞳の位置の割合を示す図である。FIG. 14 is a diagram showing the ratio of pupil positions when an extroverted conversation participant averts his/her line of sight. 図15は、内向的な対話参加者が視線を逸らすときの瞳の位置の割合を示す図である。FIG. 15 is a diagram showing the ratio of pupil positions when an introverted conversation participant averts his/her line of sight. 図16は、この発明の実施形態に係る会話ロボットシステム150のハードウェア構成を示すブロック図である。FIG. 16 is a block diagram showing the hardware configuration of a conversational robot system 150 according to an embodiment of the present invention. 図17は、図16に示すロボットの外形を示す図である。FIG. 17 is a diagram showing the outer shape of the robot shown in FIG. 16. 図18は、図16に示すコンピュータのハードウェア構成を示すブロック図である。FIG. 18 is a block diagram showing the hardware configuration of the computer shown in FIG. 16. 図19は、図16に示すロボットの制御装置が実現する視線制御装置の機能的構成を示すブロック図である。FIG. 19 is a block diagram showing the functional configuration of a line-of-sight control device realized by the robot control device shown in FIG. 16. 図20は、図19に示す視線方向モデルの構成を示す模式図である。FIG. 20 is a schematic diagram showing the configuration of the line-of-sight direction model shown in FIG. 19. 図21は、図20に示す個性別・役割別発話時視線方向モデルの1例を示す図である。FIG. 21 is a diagram illustrating an example of the gaze direction model for each individual and role during utterance shown in FIG. 20. 図22は、図19に示す視線逸らしモデルの構成の1例を示す模式図である。FIG. 22 is a schematic diagram showing an example of the configuration of the line-of-sight aversion model shown in FIG. 19. 図23は、図22に示す個性別・ターン交替時・視線逸らし時視線方向モデルの1例を示す模式図である。FIG. 23 is a schematic diagram showing an example of the individual-by-individual, turn-changing, and gaze-averting gaze direction models shown in FIG. 22. 図24は、この発明の実施形態において、コンピュータを視線制御装置として機能させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 24 is a flowchart showing a control structure of a computer program that causes a computer to function as a line-of-sight control device in an embodiment of the present invention. 図25は、図24に示す状態センシングステップをコンピュータに実行させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 25 is a flowchart showing a control structure of a computer program that causes a computer to execute the state sensing step shown in FIG. 24. 図26は、図24に示す視線方向と継続時間を決定するステップをコンピュータに実行させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 26 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the line-of-sight direction and duration shown in FIG. 24. 図27は、図24に示す視線方向決定ステップをコンピュータに実行させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 27 is a flowchart showing a control structure of a computer program that causes a computer to execute the line-of-sight direction determining step shown in FIG. 24. 図28は、図26に示す視線継続時間の決定ステップをコンピュータに実行させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 28 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the gaze duration time shown in FIG. 26. 図29は、図26の示す視線逸らし時の視線方向を決定するステップをコンピュータに実行させるコンピュータプログラムの制御構造を示すフローチャートである。FIG. 29 is a flowchart showing a control structure of a computer program that causes a computer to execute the step of determining the direction of the line of sight when the line of sight is averted, shown in FIG. 図30は、評価実験のためのビデオ画面の配置を示す図である。FIG. 30 is a diagram showing the arrangement of video screens for evaluation experiments. 図31は、評価実験の結果を示すグラフである。FIG. 31 is a graph showing the results of the evaluation experiment. 図32は、評価実験の結果を示すグラフである。FIG. 32 is a graph showing the results of the evaluation experiment.
 以下の説明及び図面においては、同一の部品には同一の参照番号を付してある。したがって、それらについての詳細な説明は繰返さない。なお以下の実施形態は、エージェントとして人の形をしたロボットを用いている。しかしこの発明はそのような実施形態には限定されない。ロボットとして必ずしも人の形をしたものではなく、眼球があって話をすることが前提とされているものならばどのような形でもよい。また、ロボットのように3次元的な実体を持たずとも、例えば2次元的な画像として表現される仮想エージェント、又は仮想空間上に3次元的な画像として表現される仮想エージェントに対してもこの発明を適用できる。 In the following description and drawings, the same parts are given the same reference numerals. Therefore, detailed description thereof will not be repeated. Note that the following embodiment uses a human-shaped robot as an agent. However, the invention is not limited to such embodiments. The robot does not necessarily have to be human-shaped; it can be of any shape as long as it has eyeballs and can talk. This also applies to virtual agents that do not have a three-dimensional entity such as robots, but are expressed as two-dimensional images, or virtual agents that are expressed as three-dimensional images in virtual space. The invention can be applied.
 第1 予備実験
 1.データ収集の目的
 複数対話時のロボットの視線の動きを自然なものにするためには、実際の人間の視線を調べ、必要な情報を収集する必要がある。そこために、我々は以下のような予備実験を行った。図1に、予備実験の設定を示す。
1st preliminary experiment 1. Purpose of data collection In order to make the robot's line of sight movements natural during multiple interactions, it is necessary to investigate the line of sight of actual humans and collect the necessary information. To this end, we conducted the following preliminary experiments. Figure 1 shows the setup of the preliminary experiment.
 図1を参照して、予備実験は3者対話50として行った。3者対話50の参加者は参加者60、62及び64である。図1では参加者60、62及び64はいずれも立っているが、予備実験においては視線の方向が重要であり、互いの位置を固定する必要がある。そのために、予備実験では後述するように椅子を準備し、参加者にはそれらの椅子に座って対話をしてもらった。予備実験においては、これら参加者に、井戸端会議のように、特に目的なく明確に議論の進め方が存在しないような形により自由に対話してもらい、そのときの各参加者の視線の方向に関する情報を収集した。このような対話においては、明確な議論の進め方のルールが存在しないにもかかわらず、会話が弾むことも多い。エージェントが社会進出するためには、会議のような目的のある対話から何気ない対話まで広い範囲のインタラクションができるようにすることが望ましい。 Referring to FIG. 1, a preliminary experiment was conducted as a three-way dialogue 50. The participants in the three-way dialogue 50 are participants 60, 62, and 64. In FIG. 1, participants 60, 62, and 64 are all standing, but in a preliminary experiment, the direction of their line of sight is important, and it is necessary to fix each other's positions. To this end, in a preliminary experiment, we prepared chairs as described below and asked participants to sit on these chairs and have a conversation. In a preliminary experiment, we asked these participants to freely interact in a manner similar to a waterfront meeting, where there is no specific purpose and no clear way to proceed with the discussion, and we collected information about the direction of each participant's line of sight at that time. collected. In such dialogues, there are often no clear rules for how to proceed with the discussion, but the conversation is often lively. In order for agents to advance into society, it is desirable to be able to have a wide range of interactions, from purposeful conversations such as meetings to casual conversations.
 ところで、このように3者対話をしているときには、誰が発話をするか、すなわち誰が発話する権利を持つかが大きな意味を持つ。ここではこのような権利を発話権と呼び、発話権を保持している参加者を発話者と呼ぶものとする。他の参加者は聞く人の役割を持つことになる。これをここではリスナと呼ぶ。発話者は適宜交替していく。このように発話権が交替することをここではターン交替と呼ぶ。ターン交替があるたびに、対話における各参加者は発話者になったり、リスナになったりする。このように対話に参与する各参加者の立場をここでは参与役割、又は単に役割と呼ぶ。 By the way, when a three-way dialogue is held like this, who speaks, that is, who has the right to speak, has great significance. Here, such a right will be referred to as a right to speak, and a participant who holds the right to speak will be referred to as a speaker. Other participants will have the role of listeners. This is called a listener here. Speakers will be alternated as appropriate. This exchange of speaking rights is called turn exchange here. Each time there is a turn, each participant in the dialogue becomes a speaker or a listener. The position of each participant who participates in the dialogue is referred to here as a participating role or simply a role.
 過去の研究から、対話の参加者は、対話の状況に応じて自己の役割を意識的に、又は無意識的に認識していることが知られている。すなわち、各参加者は、自分自身が発話者なのか、発話者から主に話しかけられているリスナなのか、会話に少ししか関与していないリスナなのかを常に認識している。ここでは、発話者から主に話しかけられているリスナをメインリスナ(ML)と呼び、会話に少ししか関与していないリスナをサブリスナ(SL)と呼ぶ。 It is known from past research that participants in dialogue consciously or unconsciously recognize their own roles depending on the situation of the dialogue. That is, each participant always recognizes whether he or she is the speaker, a listener who is primarily being addressed by the speaker, or a listener who is only slightly involved in the conversation. Here, a listener who is mainly addressed by a speaker is called a main listener (ML), and a listener who is only slightly involved in the conversation is called a sublistener (SL).
 これも従来の研究から、会話における各参加者の役割と、その視線の動きに相関があることが知られている。予備実験においては、この考え方に従い、3者対話50における参加者が、各役割に応じてどのように視線を動かしているかに関する情報を収集した。 It is also known from previous research that there is a correlation between the role of each participant in a conversation and their gaze movements. In a preliminary experiment, in accordance with this idea, information was collected regarding how the participants in the three-way dialogue 50 moved their line of sight according to their respective roles.
 2.データ解析(ラベル付け)
 A.データセット
 予備実験には男女あわせて9人が参加した。これら参加者により、各グループが3人を含む6グループを形成した。各グループ内において、参加者3名が自由に対話を行った。対話の内容はフリートークである。対話の継続時間は20分から30分の範囲だった。
2. Data analysis (labeling)
A. Dataset Nine people, both male and female, participated in the preliminary experiment. These participants formed 6 groups with 3 people in each group. Within each group, three participants engaged in free dialogue. The content of the dialogue is free talk. The duration of the conversations ranged from 20 to 30 minutes.
 各グループの参加者は、三角形の3頂点にそれぞれ配置された3個の椅子にすわって話を行った。各参加者の前にカメラが設置され、各参加者の真正面から顔及び体の動きが撮影された。各参加者はヘッドセットマイクを装着しており、各参加者の音声データ及び議論全体の動画を収集した。対話終了後、これら録画及び音声データから対話の書き起こしを作成した。 Participants in each group spoke while sitting on three chairs placed at the three vertices of a triangle. A camera was installed in front of each participant, and each participant's face and body movements were photographed from directly in front. Each participant wore a headset microphone, and audio data of each participant and video of the entire discussion were collected. After the dialogue was completed, a transcript of the dialogue was created from these recordings and audio data.
 次に、アノテータが、視線方向、眼球の方向、発話権の有無、ターン交替、各参加者の対話における役割に注目して書き起こしデータに対するラベル付を行った。図2にそうして得られたラベル付対話データの例を示す。こうして得られたラベル付対話データに対して以下に説明する解析が行われた。 Next, the annotator labeled the transcribed data, paying attention to the direction of the line of sight, the direction of the eyeballs, the presence or absence of speaking rights, turn changes, and each participant's role in the dialogue. FIG. 2 shows an example of labeled dialogue data obtained in this way. The following analysis was performed on the labeled dialogue data thus obtained.
 B.データの分析
 図2には3人の参加者A、B及びCの視線ラベル列80、82及び84の例を示す。図2を参照して、例えば参加者Aの視線ラベル列80におけるラベル「B」は、この時間帯に参加者Aが参加者Bを見ていたことを示す。同様に視線ラベル列80におけるラベル「C」は参加者Aが参加者Cを見ていたことを示す。視線ラベル列80における「視線逸らし」は、参加者Aが参加者B及び参加者Cのいずれも見ていなかったことを示す。すなわちこのラベルは、この期間に参加者Aが他の参加者から目を逸らしていたことを示す。このように参加者が他の参加者を見ていないことをこの明細書においては「視線逸らし」と呼び、その継続時間を「視線逸らし継続時間」と呼ぶ。
B. Data Analysis FIG. 2 shows examples of gaze label sequences 80, 82, and 84 for three participants A, B, and C. Referring to FIG. 2, for example, the label "B" in the gaze label column 80 of participant A indicates that participant A was looking at participant B during this time period. Similarly, the label "C" in the line of sight label string 80 indicates that participant A was looking at participant C. “Averted gaze” in the line of sight label column 80 indicates that participant A was not looking at either participant B or participant C. In other words, this label indicates that Participant A was looking away from other participants during this period. In this specification, a participant not looking at another participant in this manner is referred to as a "look away", and its duration is referred to as a "look away duration time".
 視線ラベル列82及び84についても視線ラベル列80と同様に作成した。 The line-of-sight label strings 82 and 84 were also created in the same way as the line-of-sight label string 80.
 C.眼球の方向
 人が人を見る動作において、ラベルと顔及び瞳の位置との間にはあまり大きなズレは生じない。一方、視線を逸らすという動きにおいては、視線、特に瞳の位置は非常に重要である。しかし、視線の方向を定める精度に限界があることに鑑み、この予備実験においては瞳の位置に対応する眼球の方向に関しては、中央及び中央の斜めを含む上下左右からなる9方向によりラベル付を行った。
C. Direction of Eyeballs When a person looks at a person, there is not a large discrepancy between the label and the position of the face and eyes. On the other hand, when looking away, the line of sight, especially the position of the eyes, is very important. However, considering that there is a limit to the accuracy of determining the direction of the line of sight, in this preliminary experiment, the direction of the eyeball corresponding to the position of the pupil was labeled in nine directions consisting of the top, bottom, left, and right, including the center and diagonal of the center. went.
 図2の視線ラベル列80の「視線逸らし」と記載された部分の下には、そのときに参加者Aの視線がどの方向を向いていたかを示すラベルが付加情報として記載されている。 Below the portion of the line of sight label row 80 in FIG. 2 labeled "averted gaze", a label indicating which direction participant A's gaze was facing at that time is written as additional information.
 例えば「上」と記載された期間には参加者Aの視線が上を向いていたことを示し「右上」と記載された期間には、参加者Aの視線が右上を向いていたことを示す。このとき、顔の角度だけではなく眼球の方向も含めて視線とし、この2つの要素から総合的に判断された方向を視線の方向としている。参加者B及び参加者Cについても同様のラベル付を行った。 For example, a period marked "above" indicates that Participant A's line of sight was directed upward, and a period marked "Top right" indicates that Participant A's line of sight was directed upward to the right. . At this time, the line of sight includes not only the angle of the face but also the direction of the eyeballs, and the direction comprehensively determined from these two elements is the direction of the line of sight. Similar labeling was performed for Participant B and Participant C.
 D.発話権の有無及びターン交替
 図3に、実際のターン交替の様子を示す。この例においては、最初に参加者Aが発話権を持ち、発話100を行っている。これをターン交替ラベル102により示す。この次に発話者Cが発話権を取り、発話104を行っている。これをターン交替ラベル106により示す。ターン交替のタイミングはアノテータが判断する。例えばターン交替ラベル106により、発話権が参加者Aから参加者Cに移動したことが明確に分かる。
D. Presence of speaking right and turn change Figure 3 shows the actual turn change. In this example, participant A initially has the right to speak and is making the utterance 100. This is indicated by turn alternation label 102. Next, speaker C takes the right to speak and makes an utterance 104. This is indicated by turn alternation label 106. The annotator determines the timing of the turn change. For example, the turn change label 106 clearly indicates that the right to speak has been transferred from participant A to participant C.
 この実験においては、両者の発話権が交替するタイミングの前後1秒間の合計2秒間をターン交替108とし、特にこの間の各参加者の視線の方向の動きを視線ラベルから解析した。 In this experiment, a total of 2 seconds, 1 second before and after the timing when the right to speak was exchanged, was defined as the turn change 108, and in particular, the movement in the direction of each participant's line of sight during this period was analyzed from the line of sight label.
 E.役割
 3者対話における視線解析においては、対話における各参加者の役割が重要である。この予備実験においては、上記したように発話者、メインリスナ及びサブリスナを定義し、これらの各々についてターン交替時及びそれ以外に分けて視線解析を行った。なお、ターン交替時には発話権が移動する。ターン交替時の各参加者の役割を明確にするため、ここでは、ターン交替時における発話者とは、ターン交替により発話権をとった人のことをいう。図3の場合には参加者Cが発話者である。ターン交替時におけるメインリスナとは、一つ前に発話権を持っていて、ターン交替により発話者に発話権を譲った人のことをいう。図3の場合には参加者Aがメインリスナである。この発話権の交替に関与しなかった人がサブリスナである。図3の場合、参加者Bがサブリスナである。
E. Roles In line-of-sight analysis in three-party dialogue, the role of each participant in the dialogue is important. In this preliminary experiment, we defined the speaker, the main listener, and the sublistener as described above, and analyzed the line of sight for each of them separately at the time of turn change and at other times. Note that the right to speak changes when the turn changes. In order to clarify the role of each participant at the time of turn change, here, the speaker at the time of turn change refers to the person who has taken the right to speak due to turn change. In the case of FIG. 3, participant C is the speaker. The main listener at the time of a turn change is a person who previously had the right to speak and who has given up the right to speak to the speaker due to the turn change. In the case of FIG. 3, participant A is the main listener. A person who is not involved in this exchange of speaking rights is a sublistener. In the case of FIG. 3, participant B is the sublistener.
 F.解析結果
 a.ターン交替時
 ターン交替時の役割ごとの視線の動きの割合の推移について図4に示す。図4において、横軸は時間(単位は秒)である。時間t=0.0の時に発話者がターンを取り、話し始める。縦軸は時間tにおいて発話者がどこを見ているのかの統計に基づき、その割合(0.0-1.0)を表している。例えば図3(A)の横軸t=0.0秒において縦軸の値を見るとML、SL、視線逸らしはそれぞれ0.37、0.19及び0.44である。これは話し始めたタイミングでの発話者の視線の先の総計の中で、0.37がメインリスナを、0.19がサブリスナを、それぞれ見ており、残る0.44は視線を逸らしていたことを表す。視線方向は0.1秒ごとに算出されており、図3は、発話者が発話権を取り話し始めたタイミングの前後1秒ずつ、合計2秒の区間における発話者の視線のやり場の割合が、時間経過とともに遷移する様子を表している。
F. Analysis results a. During turn change Figure 4 shows the changes in the rate of eye movement for each role during turn change. In FIG. 4, the horizontal axis is time (unit: seconds). At time t=0.0, the speaker takes a turn and starts speaking. The vertical axis represents the ratio (0.0-1.0) based on statistics of where the speaker is looking at time t. For example, looking at the values on the vertical axis when the horizontal axis t=0.0 seconds in FIG. 3(A), ML, SL, and gaze aversion are 0.37, 0.19, and 0.44, respectively. This means that out of the total number of the speakers' line of sight when they started speaking, 0.37 were looking at the main listener, 0.19 were looking at the sublistener, and the remaining 0.44 were looking away. represents something. The line of sight direction is calculated every 0.1 seconds, and Figure 3 shows the proportion of the speaker's line of sight in a total of 2 seconds, 1 second before and after the moment when the speaker takes the right to speak and starts speaking. , which shows the transition over time.
 なお、後述する実施形態においては、ターン交替時の2秒の区間を、ターン交替のタイミングを原点(0秒)として、-1.0秒から-0.3秒、-0.3秒から0.3秒、及び0.3秒から1秒の3つの区間に分割し、各区間の先頭において視線方向の決定を行う。そのため、予備実験においては、これら各区間についてそれぞれ別々に統計をとりモデルを作成する。 In addition, in the embodiment described later, the 2-second interval at the time of turn change is set from -1.0 seconds to -0.3 seconds, and from -0.3 seconds to 0 seconds, with the timing of turn change being the origin (0 seconds). It is divided into three sections: .3 seconds, and from 0.3 seconds to 1 second, and the line-of-sight direction is determined at the beginning of each section. Therefore, in a preliminary experiment, statistics are taken separately for each of these sections and a model is created.
 ・発話者(SP)
 図4(A)にターン交替時における発話者(SP)の視線の割合の推移を示す。発話者はターンを取る1秒前はメインリスナ(ML)、つまり、前の発話者を見ている割合が高い。しかし、発話者が発話権を取り、発話を開始する時点(t=0.0秒)に向かうにつれて、メインリスナから視線を外す割合が高くなる。そして、発話者が話し始めて0.1秒経過したあたりにおいて発話者が視線を逸らす割合はピークを迎える。その後、発話者の視線がメインリスナを向く割合と視線を逸らす割合がほぼ同等に高くなり、ターン交替における視線の遷移を終える。この結果より、人はターン交替時の最初に前の発話者を見て、話を始める際に視線を逸らし、以降は視線を逸らしたまま、あるいは前の発話者であるメインリスナを見る傾向にある。
・Speaker (SP)
FIG. 4(A) shows the change in the percentage of the speaker's (SP) gaze during turn changes. One second before taking a turn, the speaker has a high percentage of looking at the main listener (ML), that is, the previous speaker. However, as the speaker takes the right to speak and approaches the point in time (t=0.0 seconds) when he starts speaking, the rate at which he looks away from the main listener increases. The rate at which the speaker looks away reaches a peak around 0.1 seconds after the speaker starts speaking. After that, the rate at which the speaker's line of sight turns toward the main listener and the rate at which the speaker's line of sight averts become almost equally high, completing the line-of-sight transition during the turn change. These results show that people tend to look at the previous speaker at the beginning of a turn change, look away when they start speaking, and then either continue to look away or look at the previous speaker, the main listener. be.
 ・メインリスナ(ML)
図4(B)に、ターン交替の直前まで発話権を持っていたメインリスナの視線の割合の遷移を示す。ターン交替時には、メインリスナが発話権を発話者に渡した、発話者に取られた、又は自然に発話者が交替したなど、状況は様々に考えられる。しかし、全体として、次の発話者が話し始める前から、メインリスナには次の発話者がわかっているか又は決めており、ターン交替の1秒前から発話権を譲ってから1秒後までの発話交替の区間において、メインリスナは発話者の方を見続ける傾向にある。
・Main listener (ML)
FIG. 4(B) shows the transition of the gaze ratio of the main listener who had the right to speak until just before the turn change. At the time of turn change, various situations can be considered, such as the main listener passing the right to speak to the speaker, the speaker taking the right to speak, or the speaker taking the right to speak naturally. However, overall, the main listener knows or has decided who the next speaker will be before the next speaker starts speaking, and the main listener knows or has decided who the next speaker will be before the next speaker starts speaking, and the main listener has the ability to listen to the next speaker from 1 second before the turn takes place to 1 second after giving up the right to speak. During a speech change, the main listener tends to keep looking at the speaker.
 ・サブリスナ(SL)
 最後に、ターン交替に関与しなかったサブリスナの視線の割合の遷移を図3(C)に示す。図4(C)において、横軸t=-1.0秒の時、すなわち、発話者が発言する1秒前までは、サブリスナはメインリスナを見る割合及び発話者を見る割合が同等に高い。しかし、そこからt=1.0秒、つまり発話者が発話を開始し1秒経過した時点に向かうにつれてサブリスナの発話者を見る割合が高くなる。したがって、サブリスナはターン交替の1秒前からターン交替までに次発話者を推測し、それまではメインリスナの方又は次の発話者の方に視線を向け、それからは次の発話者へと視線を動かす傾向にあると考えられる。
・Sublistener (SL)
Finally, Fig. 3(C) shows the transition of the gaze ratio of the sublisteners who did not participate in the turn change. In FIG. 4C, when the horizontal axis t=-1.0 seconds, that is, until 1 second before the speaker speaks, the proportion of sub-listeners looking at the main listener and the proportion looking at the speaker are equally high. However, as the sublistener approaches t=1.0 seconds, that is, 1 second has elapsed since the speaker started speaking, the rate at which the sublistener looks at the speaker increases. Therefore, the sublistener guesses the next speaker from 1 second before the turn change until the turn change, and until then, the sublistener looks at the main listener or the next speaker, and then looks at the next speaker. It is thought that there is a tendency to move.
 b.ターン交替時以外(発話期間)
 ターン交替時以外を、ここでは発話期間と呼ぶ。発話期間の先頭及び末尾はターン交替の影響を受けていると考えられる。したがって、以下の発話期間における視線解析においては、発話期間の先頭及び末尾の2秒ずつについては解析の対象から除いた。図5に、発話期間における各参加者の視線の方向の割合を示した。図6には、図5に示した発話区間における視線の割合の具体的な数字を表形式により示す。
b. Other than when changing turns (speech period)
The period other than the time of turn change is referred to as the utterance period here. It is thought that the beginning and end of the utterance period are affected by turn alternation. Therefore, in the following line-of-sight analysis during the speech period, the two seconds at the beginning and end of the speech period were excluded from the analysis. Figure 5 shows the ratio of each participant's gaze direction during the speaking period. FIG. 6 shows specific numbers of the ratio of line of sight in the utterance section shown in FIG. 5 in a table format.
 図5において、横軸は視線の方向を表す。縦軸は各方向を向いていた時間の全体に対する割合を示す。 In FIG. 5, the horizontal axis represents the direction of the line of sight. The vertical axis indicates the proportion of time spent facing each direction relative to the total time.
 ・発話者
 図5(A)を参照して、発話者については、発話中にメインリスナを見る割合が少し高い。しかし、発話者は全体としてバランスよく視線を配分している。
-Speaker Referring to FIG. 5(A), the rate at which speakers look at the main listener while speaking is slightly high. However, the speaker distributes his/her gaze in a well-balanced manner as a whole.
 ・メインリスナ
 図5(B)を参照して、メインリスナの場合、発話期間の7割近くは発話者の方向をむいており、サブリスナを見る割合はかなり低い。メインリスナが目を逸らす割合はサブリスナを見る割合よりかなり高いが発話者を見る割合よりもかなり低い。
- Main listener Referring to FIG. 5(B), the main listener faces the speaker for nearly 70% of the speaking period, and the proportion of looking at the sublistener is quite low. The rate at which main listeners look away is significantly higher than the rate at which sublisteners look, but it is significantly lower than the rate at which they look at the speaker.
 ・サブリスナ
 図5(C)を参照して、サブリスナの場合も傾向はメインリスナとほぼ同様である。すなわちサブリスナも、発話期間の7割近くは発話者の方向を見ている。目を逸らす割合はメインリスナを見る割合より高いが、メインリスナが目を逸らす割合よりは低い。
- Sublistener Referring to FIG. 5(C), the tendency for sublisteners is almost the same as for the main listener. In other words, the sublistener also looks in the direction of the speaker for nearly 70% of the utterance period. The rate of looking away is higher than the rate of looking at the main listener, but lower than the rate of looking away by the main listener.
 c.個性による視線の動き
 視線の動きは個性によっても異なるのではないかという問題意識のもと、上記予備実験(第1の予備実験)とは別に、個性に関する視線の動きを解析するために、別の予備実験(第2の予備実験)を行った。第2の予備実験においては、男女あわせて17名の参加者による3者対話を行った。全部で14セッションを行い、各セッションにおいては10分から20分程度の自由会話を行った。3名の話者は一辺2メートル程度の正三角形の3頂点にそれぞれ配置された3個の椅子に座りヘッドセットマイクと加速度センサを装着した。これらとは別に画像深度カメラを用いて各参加者の動きを記録した。
c. Gaze movements based on individuality Based on the awareness that gaze movements may differ depending on individuality, in addition to the above preliminary experiment (first preliminary experiment), we conducted a separate experiment to analyze gaze movements related to individuality. A preliminary experiment (second preliminary experiment) was conducted. In the second preliminary experiment, a three-way dialogue was conducted with 17 participants, both male and female. A total of 14 sessions were conducted, and each session included 10 to 20 minutes of free conversation. The three speakers sat on three chairs, each placed at the three vertices of an equilateral triangle with sides of about 2 meters, and were equipped with headset microphones and acceleration sensors. Separately, each participant's movements were recorded using an image depth camera.
 個性を表現する指標としてBIG5と呼ばれる指標が存在する。BIG5は人間のパーソナリティ特性を5つの次元により説明する。その一つに「外向性」がある。この予備実験において、我々は「外向性」に着目し、外向性により参加者の視線の動きがどのように異なるかを調べることを目的とした。 There is an index called BIG5 that expresses individuality. BIG5 describes human personality characteristics using five dimensions. One of them is "extroversion." In this preliminary experiment, we focused on "extraversion" and aimed to examine how participants' gaze movements differed depending on their extroversion.
 この実験においては、17名の参加者の中で、特に外向性の印象が異なる2名の参加者を選定し、この2名の参加者から得られたデータにからそれぞれの視線の動きを解析した。以下の説明においてはこの2名の参加者をそれぞれ話者A及び話者Bとする。話者Aは外向寄りという印象を与え、話者Bは内向寄りという印象を与えた。この2名の視線に関する解析結果を使用して異なるモデルを構築することにより、異なった個性をロボットなどのエージェントにより表出できる可能性がある。 In this experiment, we selected two participants with particularly different impressions of extroversion from among the 17 participants, and analyzed their gaze movements based on the data obtained from these two participants. did. In the following description, these two participants will be referred to as speaker A and speaker B, respectively. Speaker A gave the impression of being an extrovert, and speaker B gave the impression of being an introvert. By constructing different models using the analysis results regarding the line of sight of these two people, it is possible that different personalities can be expressed by agents such as robots.
 ・発話者として
 図7(A)に、話者Aが発話者(SP)のときに、メインリスナ(ML)を見た時間、サブリスナ(SL)を見た時間、及び視線を逸らした時間(GA)の割合を示す。図8(A)には話者Bについて得られた結果を同様に示す。
・As a speaker Figure 7 (A) shows the time when speaker A looked at the main listener (ML), the time when he looked at the sublistener (SL), and the time when he looked away (when he was the speaker (SP)). GA). FIG. 8(A) similarly shows the results obtained for speaker B.
 図7(A)から、話者Aが発話者のときにはメインリスナだけではなく、サブリスナを見る時間の割合も比較的高いことが分かる。すなわち、メインリスナ、サブリスナを見る時間の割合は互いにほぼ等しく、さらに視線を逸らす時間の割合もほぼ同じことが分かる。なお、個性による区別をしていない先の予備実験の結果である図5(A)と比較すると、話者Aがメインリスナを見る確率は相対的に低くなっている。 From FIG. 7(A), it can be seen that when speaker A is the speaker, the proportion of time spent looking not only at the main listener but also at the sublistener is relatively high. That is, it can be seen that the proportion of time spent looking at the main listener and the sublistener is almost equal to each other, and the proportion of time spent looking away from the main listener is also almost the same. Note that the probability that speaker A looks at the main listener is relatively low compared to FIG. 5A, which is the result of the previous preliminary experiment in which no distinction was made based on personality.
 一方、図8(A)から、話者Bが発話者のときには、話者Aと比較して視線を逸らす時間帯が多く、かつサブリスナをあまり見ない傾向にあることが分かる。特に図5(A)と比較すると、話者Bの場合には視線を逸らす時間の割合がかなり高いことも分かる。 On the other hand, from FIG. 8(A), it can be seen that when speaker B is the speaker, he looks away more often than speaker A, and tends not to look at the sublistener much. In particular, when compared with FIG. 5A, it can be seen that the proportion of time in which speaker B looks away is quite high.
 ・メインリスナとして
 図7(B)に、話者Aがメインリスナのときに、発話者を見た時間、サブリスナを見た時間、及び目を逸らした時間の割合を示す。図(B)には、話者Bについて得られた結果を同様に示す。
- As the main listener FIG. 7(B) shows the ratio of the time when speaker A looked at the speaker, the time when he looked at the sub-listener, and the time when he looked away when he was the main listener. Figure (B) also shows the results obtained for speaker B.
 図7(B)を参照して、話者Aがメインリスナのときには、発話者を見る時間の割合が非常に高く、時々、視線を逸らす傾向にあることが分かる。一方、図8(B)を参照して、話者Bの場合は、話者Aと同様に発話者を見る時間の割合が高いものの、視線を逸らす時間の割合も話者Aよりかなり高いことが分かる。話者A及び話者Bのいずれの場合もサブリスナを見る時間の割合は低いことも分かる。 Referring to FIG. 7(B), it can be seen that when speaker A is the main listener, the listener spends a very high percentage of the time looking at the speaker, and tends to look away from time to time. On the other hand, referring to FIG. 8(B), in the case of speaker B, the proportion of time spent looking at the speaker is high, similar to speaker A, but the proportion of time spent looking away from the speaker is also considerably higher than speaker A. I understand. It can also be seen that the proportion of time spent looking at the sublistener is low for both speakers A and B.
 図7(B)及び図8(B)を図5(B)と比較すると、これらは同じ傾向を示すことが分かる。ただし、図8(B)に示すように話者Bの場合には視線を逸らす時間の割合が高いことが注目される。 Comparing FIG. 7(B) and FIG. 8(B) with FIG. 5(B), it can be seen that they show the same tendency. However, as shown in FIG. 8(B), it is noteworthy that in the case of speaker B, the proportion of time during which he looks away is high.
 ・サブリスナとして
 図7(C)に話者Aがサブリスナのときに発話者を見た時間、メインリスナを見た時間、及び目を逸らした時間の割合を示す。図8(C)に、話者Bについて得られた結果を同様に示す。
- As a sublistener FIG. 7(C) shows the ratio of the time when speaker A looked at the speaker, the time when he looked at the main listener, and the time when he looked away when he was a sublistener. FIG. 8(C) similarly shows the results obtained for speaker B.
 図7(C)より、話者Aがサブリスナとなったときには、メインリスナの場合と同様、発話者を見る時間の割合が非常に高く、メインリスナを見たり、視線を逸らしたりする時間の割合はかなり低い。一方、図8(C)より、話者Bがサブリスナの場合には、話者を見る時間の割合は高いものの、話者Aと比較するとその割合は低い。その代わりに話者Bの場合には、視線を逸らす時間の割合が高く、発話者を見る時間の割合と等しい。 From Figure 7(C), when speaker A becomes a sub-listener, the proportion of time spent looking at the speaker is very high, as in the case of the main listener, and the proportion of time spent looking at the main listener or looking away. is quite low. On the other hand, as shown in FIG. 8C, when speaker B is a sublistener, the proportion of time spent looking at the speaker is high, but compared to speaker A, the proportion is low. On the other hand, in the case of speaker B, the proportion of time spent looking away is high and is equal to the proportion of time spent looking at the speaker.
 これらを図5(C)と比較すると、話者Aと話者Bの傾向の違いがより明らかとなる。すなわち、話者Aは個性を考慮しない場合と比較して発話者を見る時間の割合が高く、視線を逸らす時間の割合は低い。逆に話者Bは個性を考慮しない場合と比較して、発話者を見る時間の割合が低く、視線を逸らす時間の割合が高い。なお話者Bの場合には、メインリスナを見る時間の割合もかなり低くなっている。 Comparing these with FIG. 5(C), the difference in tendency between speaker A and speaker B becomes clearer. That is, compared to the case where individuality is not taken into account, speaker A spends a higher proportion of the time looking at the speaker and a lower proportion of the time he looks away. Conversely, speaker B spends a lower percentage of time looking at the speaker and a higher percentage of time looking away from the speaker, compared to the case where individuality is not considered. In the case of speaker B, the proportion of time spent looking at the main listener is also quite low.
 d.個性による視線の継続時間
 話者A及び話者Bの各々について、発話期間中(ターン交替期間以外の期間)の、発話者、メインリスナ、及びサブリスナとしての役割時における視線の継続時間の分布をそれぞれ解析した。その結果を図9(A)から(C)及び図10(A)から(C)にそれぞれ示す。これら図には、各条件において視線の継続時間のヒストグラムと、以下の式により表されるχ分布によって近似される分布曲線とを示す。
d. Gaze duration time based on individuality For each of Speaker A and Speaker B, the distribution of gaze duration times during the speaking period (period other than the turn change period) when playing the roles of speaker, main listener, and sublistener. Each was analyzed. The results are shown in FIGS. 9(A) to (C) and FIGS. 10(A) to (C), respectively. These figures show a histogram of the duration of the line of sight under each condition and a distribution curve approximated by the χ 2 distribution expressed by the following equation.
Figure JPOXMLDOC01-appb-M000001
 上式においてnはχ分布の自由度であり、scaleは分布曲線の縦軸方向の振幅の大きさを、locは分布曲線の横軸方向のバイアスを、それぞれ決めるパラメータである。
Figure JPOXMLDOC01-appb-M000001
In the above equation, n is the degree of freedom of the χ 2 distribution, scale is a parameter that determines the magnitude of the amplitude in the vertical axis direction of the distribution curve, and loc is a parameter that determines the bias in the horizontal axis direction of the distribution curve.
 話者A及び話者Bのそれぞれについて、発話者である時にメインリスナを見る継続時間、サブリスナを見る継続時間、及び視線を逸らす継続時間の3種類の継続時間の分布を算出した。図9は話者Aの結果を、図10は話者Bの結果を、それぞれ示す。 For each of Speaker A and Speaker B, we calculated the distribution of three types of duration: the duration of looking at the main listener, the duration of looking at the sub-listener, and the duration of looking away when the speaker is the speaker. FIG. 9 shows the results for speaker A, and FIG. 10 shows the results for speaker B.
 図9(A)を参照して、話者Aにおいては、メインリスナを見る時間の分布の近似曲線はn=3.4,loc=0.15,scale=0.2により与えられる。図9(B)に示されるサブリスナを見る継続時間の分布の近似曲線はn=4.34,loc=0.0,scale=0.18により与えられる。したがって、話者Aが発話者のときには、話者Bが発話者のときより、対話相手(メインリスナとサブリスナ)を見る継続時間は長めになる傾向にある。一方、図9(C)に記される、話者Aが視線を逸らす継続時間の分布の近似曲線はn=4.67,loc=0.0,scale=0.12である。この結果から、話者Aの場合には視線を逸らす継続時間は短くなる傾向にあることが分かる。 Referring to FIG. 9(A), for speaker A, the approximate curve of the distribution of the time spent looking at the main listener is given by n=3.4, loc=0.15, and scale=0.2. The approximate curve of the distribution of the sublistener viewing duration shown in FIG. 9B is given by n=4.34, loc=0.0, and scale=0.18. Therefore, when speaker A is the speaker, the duration of looking at the conversation partner (main listener and sublistener) tends to be longer than when speaker B is the speaker. On the other hand, the approximate curve of the distribution of the duration during which speaker A looks away, shown in FIG. 9C, is n=4.67, loc=0.0, and scale=0.12. From this result, it can be seen that in the case of speaker A, the duration of looking away tends to be shorter.
 以上により、話者Aが発話者となったとき、対話相手を見る継続時間は長めであり、時々視線を逸らす動作は短めに行う傾向にあることが分かる。 From the above, it can be seen that when Speaker A becomes the speaker, the duration of looking at the conversation partner tends to be long, and the occasional movement of looking away tends to be short.
 これに対し話者Bの場合、図10(A)を参照して、メインリスナを見る時間の分布の近似曲線はn=7.9,loc=0.0,scale=0.1により与えられる。また図10(B)に示されるサブリスナを見る時の継続時間の分布の近似曲線はn=4.7,loc=0.12,scale=0.12により与えられる。この結果より話者Bの場合には、メインリスナを見る継続時間及びサブリスナを見る継続時間の双方とも短めになることが分かる。さらに、図10(C)に示す、視線を逸らす継続時間の分布の近似曲線はn=3.28、loc=0.1、scale=0.2)により与えられる。この結果より、話者Bの場合には、視線を逸らす継続時間が長めになる傾向にあることが分かる。 On the other hand, for speaker B, referring to FIG. 10(A), the approximate curve of the distribution of the time spent looking at the main listener is given by n=7.9, loc=0.0, scale=0.1. . Further, the approximate curve of the distribution of duration when viewing the sublistener shown in FIG. 10(B) is given by n=4.7, loc=0.12, and scale=0.12. From this result, it can be seen that in the case of speaker B, both the duration of looking at the main listener and the duration of looking at the sub-listener are short. Furthermore, the approximate curve of the distribution of the duration of looking away shown in FIG. 10(C) is given by n=3.28, loc=0.1, scale=0.2). From this result, it can be seen that in the case of speaker B, the duration of looking away tends to be longer.
 以上により、話者Bが発話者の場合には視線逸らしの継続時間が長くなり、時々メインリスナ及びサブリスナの方を短時間だけ見る傾向にあることが分かる。 From the above, it can be seen that when speaker B is the speaker, the duration of gaze aversion is longer, and there is a tendency for the speaker to sometimes look at the main listener and sub-listener for a short period of time.
 なお、話者A及び話者Bの各々がメインリスナ及びサブリスナのときの視線方向の継続時間分布についての詳細はここでは示されない。しかし、これら話者が発話者のときと類似した継続時間の分布の傾向が見られた。 Note that details regarding the duration distribution of line-of-sight directions when speaker A and speaker B are the main listener and sublistener, respectively, are not shown here. However, a trend in duration distribution similar to that observed when these speakers were the speakers was observed.
 e.視線逸らし
 ・視線逸らしの間隔
 図11に、第1予備実験において参加者が発話中に視線を逸らした回数の散布図を示す。横軸は発話中の時間を示す。縦軸は人から視線を逸らした回数を示す。なお横軸の時間は前述したとおり、発話の冒頭部分と末端部分とを除いた時間である。図11には、視線逸らしの近似直線も表している。近似直線の回帰係数rは図11(A)、(B)及び(C)に関しそれぞれ0.83,0.82,0.67である。図11及び図6より、役割と関係なく視線を逸らす間隔が変わらないことが分かる。
e. Gaze aversion/interval of gaze aversion Figure 11 shows a scatter plot of the number of times participants averted their gaze during speech in the first preliminary experiment. The horizontal axis indicates the time during speech. The vertical axis indicates the number of times the person looked away from the person. As mentioned above, the time on the horizontal axis is the time excluding the beginning and end of the utterance. FIG. 11 also shows an approximate straight line for averting the line of sight. The regression coefficient r of the approximate straight line is 0.83, 0.82, and 0.67 for FIGS. 11(A), (B), and (C), respectively. From FIGS. 11 and 6, it can be seen that the interval at which the user looks away does not change regardless of the role.
 ・眼球の向き
 人が他の人を見るときの視線の動きは非常に単純である。したがってロボットが人を見る場合には、違和感のないようにロボットの顔を相手に向ければよい。しかしロボットが対話相手などから視線を逸らすときは単純な動きただけでは不十分なことが分かった。実際に収録された人による3者対話のデータを見ると、人の場合には眼球だけ移動させて視線を逸らす動きが見られた。このため、ロボットが視線を逸らすときに必要な人の眼球の動きの解析として、視線を逸らすときに人の眼球がどこを向いているのかに特に着目した。また、人が視線を逸らすときに、どの程度の時間、視線を逸らしているのかについて、その時間の分布について解析を行った。
- Direction of the eyeballs The movement of the gaze when a person looks at another person is very simple. Therefore, when a robot looks at a person, it is only necessary to turn the robot's face toward the other person so that the robot does not feel uncomfortable. However, it turns out that simple movement is not enough when the robot averts its gaze from the person it is interacting with. Looking at data from actual three-way conversations between people, it was found that people only moved their eyeballs and looked away. For this reason, in order to analyze the movement of the human eyeballs, which is necessary when the robot averts its line of sight, we specifically focused on where the human eyeballs are pointing when the robot averts its line of sight. We also analyzed the distribution of how long people look away when they look away.
 ・視線逸らしの継続時間
 図12に、人が視線を逸らす動きをした時の継続時間のヒストグラムを示す。この分布は次の指数分布により近似される。
- Duration of gaze aversion Figure 12 shows a histogram of the duration when a person moves to avert their gaze. This distribution is approximated by the following exponential distribution.
Figure JPOXMLDOC01-appb-M000002
 ただしμ=0.2、λ=0.63である。なお、0.2秒未満のサンプルは実際のエージェントへの実装を考えたとき、非現実的だと考え、0.2秒以上のサンプルに限って分析した。解析対象となったサンプルの継続時間の平均値は0.83秒、中央値は0.55秒だった。
Figure JPOXMLDOC01-appb-M000002
However, μ=0.2 and λ=0.63. Note that we considered samples shorter than 0.2 seconds to be unrealistic when considering implementation in an actual agent, so we analyzed only samples longer than 0.2 seconds. The average duration of the samples analyzed was 0.83 seconds, and the median was 0.55 seconds.
 ・視線逸らし時の瞳の配分
 人は、対話相手から視線を逸らすとき、顔だけでなく、瞳(眼球)を主に動かして視線を逸らす。そのように視線を逸らすとき、人は眼球を何回動かすのか、どのようなパターンに従って視線を逸らすのかについても解析した。
・Pupil distribution when looking away When people look away from the other person in a conversation, they mainly move their pupils (eyeballs) in addition to their faces. We also analyzed how many times people move their eyeballs when they look away in this way, and what patterns they follow when looking away.
 第1の予備実験の結果から、各話者において、視線を逸らす際の眼球の向きを{前方、上、下、左、右、左上、右上、左下、右下}の9方向に分類して割合を求めた。その結果を図13に示す。図13を参照して、人が視線を逸らすときには、中央及び中央下側という2箇所に瞳を逸らす傾向が強いことが分かる。また、他の7箇所については、多少の偏りはあるものの、人が目を逸らす際にはほぼ同等の割合により瞳を分散させることが分かる。 From the results of the first preliminary experiment, we classified the direction of each speaker's eyeballs when looking away into nine directions: {forward, up, down, left, right, upper left, upper right, lower left, lower right}. The percentage was calculated. The results are shown in FIG. Referring to FIG. 13, it can be seen that when people avert their gaze, they tend to avert their eyes to two places: the center and the lower center. Furthermore, regarding the other seven locations, it can be seen that, although there is some bias, when a person looks away, the pupils are dispersed at approximately the same rate.
 ・個性による瞳の配分
 第2予備実験の結果を第1予備実験と同様にして視線逸らし時の瞳の配分を解析した。図14に話者Aの結果を、図15に話者Bの結果を、それぞれ示す。
- Pupil distribution according to personality The results of the second preliminary experiment were analyzed in the same way as the first preliminary experiment to analyze the distribution of pupils when looking away. FIG. 14 shows the results for speaker A, and FIG. 15 shows the results for speaker B.
 図14及び図15を参照して、第2予備実験においても、第1予備実験における結果と同様、話者Aも話者Bも、視線を逸らす際に、上方向よりも、下方向に目を逸らす頻度が高いことが分かる。ただし、話者Bの方が下方向に目を逸らす頻度が高い傾向にある。両話者とも眼球が前方を向いている(すなわち瞳が中央に位置する)頻度が比較的高い。第2予備実験において得られた3者対話データにおいては、対話者は斜め左右に位置している。そのため、話者の眼球が前方を向いているということは、話者が対話相手のいずれからも視線を逸らしていることを意味している。 Referring to FIGS. 14 and 15, in the second preliminary experiment, as well as the results in the first preliminary experiment, both speaker A and speaker B looked downward rather than upward when averting their gaze. It can be seen that the frequency of deviation is high. However, speaker B tends to look away downward more frequently. Both speakers have a relatively high frequency of having their eyes facing forward (i.e., their pupils are located in the center). In the three-person dialogue data obtained in the second preliminary experiment, the dialogue participants are located diagonally to the left and right. Therefore, when the speaker's eyes are facing forward, it means that the speaker is looking away from either of the conversation partners.
 なお、図14及び図15においては右下に瞳を逸らす割合が、図13と比較して高く、中央及び中央下部のいずれでもない他の6箇所と比較しても高い。この偏りの原因は不明である。しかし、我々は、図13の結果に鑑み、後述の実施形態においては図14及び図15の場合も、視線逸らし時には、中央及び中央下部以外の7箇所には、ほぼ均等に瞳が向くような実装をした。 Note that in FIGS. 14 and 15, the proportion of pupils diverted to the lower right is higher than in FIG. 13, and also higher than in the other six locations that are neither in the center nor in the lower center. The cause of this bias is unknown. However, in view of the results shown in FIG. 13, in the embodiments described later, even in the cases of FIG. 14 and FIG. I implemented it.
 ・視線逸らしの回数
 第1実験においても第2実験においても、対話参加者が視線を逸らすとき、一点を長時間見ているのではなく、眼球を複数回動かす傾向が見られた。視線逸らしの時間が長くなるほど、眼球を動かす回数も多くなる傾向が見られた。視線を逸らした回数ごとの継続時間の分布を解析した結果に基づき、後掲の実施形態においては、ロボットが視線を逸らす回数は、視線逸らしの継続時間が0.7秒以下のときに1回、1.4秒以下のときに2回、2.1秒以下のときに3回、というように、0.7秒ごとに1回だけ、眼球を動かす回数を増やすこととした。
・Number of gaze aversions In both the first and second experiments, when conversation participants averted their gaze, they tended to move their eyeballs multiple times rather than looking at one point for a long time. There was a tendency that the longer the gaze averted, the more the number of eye movements. Based on the results of analyzing the distribution of the duration for each number of times the robot looks away, in the embodiment described later, the number of times the robot looks away is once when the duration of the look away is 0.7 seconds or less. , 2 times when the time is 1.4 seconds or less, 3 times when the time is 2.1 seconds or less, and so on, and the number of eyeball movements is increased by one time every 0.7 seconds.
 第2 実施形態
 1.システム構成
 図16を参照して、以下にロボット160を制御するための、この発明の実施形態に係る視線制御システムを採用した会話ロボットシステム150の構成を説明する。ロボットなどのエージェントが人と対話するためには、音声認識機能、音声合成機能、及びこれらを使用した対話機能を実装する必要がある。この実施形態においては、これら機能については既存のものを用いることを前提とする。また3者対話を行うためには、ターン交替及び各対話参加者の役割を動的に認識する必要がある。ターン交替を認識するのは難しいが、例えば以下の参考文献に記載されたターン交替のタイミングを認識する技術を利用できる。
Second embodiment 1. System Configuration Referring to FIG. 16, the configuration of a conversational robot system 150 that employs the line-of-sight control system according to the embodiment of the present invention for controlling the robot 160 will be described below. In order for an agent such as a robot to interact with people, it is necessary to implement a speech recognition function, a speech synthesis function, and a dialogue function using these functions. In this embodiment, it is assumed that existing functions are used. Furthermore, in order to carry out a three-way dialogue, it is necessary to dynamically recognize turn changes and the roles of each dialogue participant. Although it is difficult to recognize turn changes, techniques for recognizing turn change timing, such as those described in the following references, can be used.
 Chaoran Liu et al, “Turn-taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents”, INTERSPEECH 2017, [Online], August 20-24, 2017, Stockholm, Sweden, [令和4年5月3日検索], インターネット <URL:https://isca-speech.org/archive_v0/Interspeech_2017/pdfs/0965.PDF>
 上記参考文献においては、ターン交替のタイミングを検出するモデルを使用している。それに対し本実施形態においてはターン交替のタイミングの1秒前を検出する必要がある。しかし、これについても、検出すべきタイミングをターン交替の1秒前に設定した上で、上記参考文献に記載の手法を用いてモデルの訓練を行うことによりターン交替のタイミングの1秒前を検出できる。
Chaoran Liu et al, “Turn-taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents”, INTERSPEECH 2017, [Online], August 20-24, 2017, Stockholm, Sweden, [Retrieved May 3, 2020 ], Internet <URL: https://isca-speech.org/archive_v0/Interspeech_2017/pdfs/0965.PDF>
The above reference uses a model that detects turn change timing. In contrast, in this embodiment, it is necessary to detect one second before the turn change timing. However, in this case as well, by setting the timing to be detected as 1 second before the turn change and then training the model using the method described in the above reference, it is possible to detect 1 second before the turn change timing. can.
 また、以下の実施形態においては、ロボットを含めた参加者の役割の認識と、ロボットが発話権を獲得する動作をすべきか否かについての認識とは、所与のものとして外部から与えられるものとする。しかし、例えば発話者が誰かについては、発話者の位置を把握した上で、後述するように音源を定位することにより特定できる。またメインリスナが誰かについては対話参加者の顔画像に基づき、発話者が各参加者の方を向いている時間の割合などに基づいて推定できる。ロボットがメインリスナならば、ターン交替が検出されたときに発話を開始するようにすればよい。ロボットが発話者のときには、他の参加者の顔画像から、より長い時間だけロボットの方向を向いている参加者をメインリスナにする、などの手法も考えられる。 In addition, in the following embodiments, the recognition of the roles of participants including the robot and the recognition of whether the robot should perform an action to acquire the right to speak are given as given from the outside. shall be. However, for example, the identity of the speaker can be determined by determining the location of the speaker and localizing the sound source as described later. Furthermore, it is possible to estimate who the main listener is based on the facial images of the conversation participants and the proportion of time that the speaker faces each participant. If the robot is the main listener, it may start speaking when a turn change is detected. When a robot is the speaker, a method such as selecting a participant who faces the robot for a longer period of time based on facial images of other participants as the main listener may be considered.
 図16に、この実施形態に係る会話ロボットシステム150のハードウェア構成を示す。図16を参照して、会話ロボットシステム150は、ロボット160の視線制御を行うための確率モデルを記憶した確率モデル記憶装置172と、確率モデル記憶装置172に記憶された確率モデルを使用してロボット160の視線に関する動作を制御するための動作制御PC(Personal Computer)162と、動作制御PC162が接続されるネットワーク176とを含む。この実施形態に係るロボット160は、他の2人の対話参加者とともに3者対話を行うためのものである。 FIG. 16 shows the hardware configuration of the conversational robot system 150 according to this embodiment. Referring to FIG. 16, the conversational robot system 150 includes a probability model storage device 172 that stores a probability model for controlling the gaze of the robot 160, and a probability model stored in the probability model storage device 172. The computer 160 includes an operation control PC (Personal Computer) 162 for controlling line-of-sight-related operations, and a network 176 to which the operation control PC 162 is connected. The robot 160 according to this embodiment is for carrying out a three-way dialogue with two other dialogue participants.
 会話ロボットシステム150はさらに、対話相手の位置を検出するための人位置センサ178、及び人位置センサ178とネットワーク176とに接続された人位置認識PC180を含む。人位置センサ178は対話相手の位置を検出して検出信号を人位置認識PC180に与えるためのものである。人位置認識PC180は、人位置センサ178からの信号に基づいて、ロボット160から見た対話相手の位置を算出し、ネットワーク176を介して動作制御PC162に送信するためのものである。 The conversational robot system 150 further includes a human position sensor 178 for detecting the position of a conversation partner, and a human position recognition PC 180 connected to the human position sensor 178 and the network 176. The human position sensor 178 is for detecting the position of the conversation partner and providing a detection signal to the human position recognition PC 180. The human position recognition PC 180 is for calculating the position of the dialogue partner as seen from the robot 160 based on the signal from the human position sensor 178 and transmitting the position to the operation control PC 162 via the network 176.
 会話ロボットシステム150はさらに、マイクロフォン164と、マイクロフォン164から受ける音声信号に基づいて対話参加者の発話内容の認識とターン交替及び各参加者の役割とを認識し、その結果をネットワーク176を介して動作制御PC162に送信するための音声処理PC166とを含む。マイクロフォン164はマイクロフォンアレイであり、音声処理PC166はマイクロフォン164の出力に基づいて音源の位置が特定できる。音声処理PC166は、発話テキスト、ターン交替の検出情報、及び各参加者の役割とともに、発話を行った人物の位置を示す情報も動作制御PC162に送信する機能を持つ。動作制御PC162は、発話テキストの内容と音声処理PC166によるターン交替のタイミングの検出結果及び役割の認識結果とに基づいて、自己が発話権を取得すべきか否かを判定する機能を持つ。 The conversation robot system 150 further uses a microphone 164 to recognize the utterances of the conversation participants, the turn changes, and the roles of each participant based on the audio signals received from the microphone 164, and sends the results via the network 176. and an audio processing PC 166 for transmitting data to the operation control PC 162. The microphone 164 is a microphone array, and the audio processing PC 166 can specify the position of the sound source based on the output of the microphone 164. The audio processing PC 166 has a function of transmitting to the operation control PC 162 information indicating the position of the person who made the utterance, as well as the utterance text, turn change detection information, and the role of each participant. The operation control PC 162 has a function of determining whether or not it should acquire the right to speak based on the content of the uttered text, the detection result of the turn change timing, and the recognition result of the role by the voice processing PC 166.
 会話ロボットシステム150はさらに、発話テキストを受けてそのテキストに対応する音声信号を生成するための音声合成PC170と、音声合成PC170から音声信号を受けて音声に変換するためのスピーカ168とを含む。音声合成PC170による音声合成の声質は、ロボット160の外観にふさわしいものに選ばれる。 The conversation robot system 150 further includes a speech synthesis PC 170 for receiving spoken text and generating an audio signal corresponding to the text, and a speaker 168 for receiving an audio signal from the speech synthesis PC 170 and converting it into speech. The quality of the voice synthesized by the voice synthesis PC 170 is selected to be suitable for the appearance of the robot 160.
 会話ロボットシステム150はさらに、ネットワーク176に接続され、ネットワーク176を介して他のPCから発話テキストを受けたことに応答して、その発話に対する応答を生成し出力するための対話用PC174を含む。動作制御PC162は、音声処理PC166によりロボット160が発話権を取得すべきと判定されたときには、発話者の発話のテキストを対話用PC174に送る機能を持つ。対話用PC174がこの入力に対する応答の発話テキストを生成して動作制御PC162及び確率モデル記憶装置172に送信することにより、その発話テキストに対応する音声が音声合成PC170及びスピーカ168により生成され、動作制御PC162は発話に対応して音声処理PC166の動きを制御できる。 The conversation robot system 150 further includes a conversation PC 174 connected to the network 176 and configured to generate and output a response to the utterance in response to receiving utterance text from another PC via the network 176. The operation control PC 162 has a function of sending the text of the speaker's utterance to the dialogue PC 174 when the voice processing PC 166 determines that the robot 160 should acquire the right to speak. The dialogue PC 174 generates an uttered text in response to this input and sends it to the operation control PC 162 and the probability model storage device 172, so that the speech synthesis PC 170 and the speaker 168 generate a voice corresponding to the uttered text, and the operation control The PC 162 can control the movement of the audio processing PC 166 in response to speech.
 音声処理PC166は、マイクロフォン164の出力する音声信号に対して音声認識を行い、さらにマイクロフォン164の出力に基づいて音源定位を行って、音声認識の結果として得られるテキストと、発話者の位置を示す情報とを動作制御PC162に送信する機能を持つ音声認識部190とを含む。音声処理PC166はさらに、音声認識部190の出力するテキストと、マイクロフォン164の音声信号から得られる音声パワー及び基本周波数F0を含む韻律情報とを用いて、上記した参考文献により開示された方法に従ってターン交替を検出する処理と、ターン交替に伴う各参加者の役割を認識する処理とを実行し、結果を動作制御PC162に送信するためのターン認識部192を含む。 The audio processing PC 166 performs speech recognition on the audio signal output from the microphone 164, further performs sound source localization based on the output of the microphone 164, and indicates the text obtained as a result of the speech recognition and the position of the speaker. and a voice recognition unit 190 having a function of transmitting information to the operation control PC 162. The voice processing PC 166 further uses the text output from the voice recognition unit 190 and the prosody information including the voice power and fundamental frequency F0 obtained from the voice signal of the microphone 164 to perform a turn according to the method disclosed in the above-mentioned reference document. It includes a turn recognition unit 192 that executes a process of detecting a turn change and a process of recognizing the role of each participant in the turn change, and transmits the results to the operation control PC 162.
 図17にロボット160の外観を示す。図17に示すようにロボット160はやや小ぶりなロボットであり、上半身及び頭部を左右に回転させることができる。さらにロボット160の眼球には大きな瞳が描かれており、眼球の回転角度を制御することによりロボット160の視線方向を上下左右に移動させることができる。 FIG. 17 shows the appearance of the robot 160. As shown in FIG. 17, the robot 160 is a rather small robot, and can rotate its upper body and head left and right. Furthermore, a large pupil is drawn on the eyeball of the robot 160, and by controlling the rotation angle of the eyeball, the line of sight direction of the robot 160 can be moved vertically and horizontally.
 2.ハードウェア構成
 図18は、例えば図16に示す動作制御PC162として動作するコンピュータシステムのハードウェアブロック図である。図16に示す音声処理PC166、音声合成PC170、及び人位置認識PC180についても動作制御PC162とほぼ同様の構成のコンピュータシステムにより実現できる。ここではこれらPC162を代表するものとしてコンピュータシステム250の構成についてのみ述べることとし、個々のPCの構成の詳細については述べない。
2. Hardware Configuration FIG. 18 is a hardware block diagram of a computer system that operates as the operation control PC 162 shown in FIG. 16, for example. The voice processing PC 166, voice synthesis PC 170, and human position recognition PC 180 shown in FIG. 16 can also be realized by a computer system having almost the same configuration as the operation control PC 162. Here, only the configuration of the computer system 250 as a representative of these PCs 162 will be described, and details of the configuration of each individual PC will not be described.
 図18を参照して、このコンピュータシステム250はDVD(Digital Versatile Disc)ドライブ302を有するコンピュータ270と、いずれもコンピュータ270に接続された、ユーザと対話するためのキーボード274、マウス276、及びモニタ272とを含む。もちろんこれらはユーザ対話が必要となったときのための構成の一例であって、ユーザ対話に利用できる一般のハードウェア及びソフトウェア(例えばタッチパネル、音声入力、ポインティングデバイス一般)ならばどのようなものも利用できる。 Referring to FIG. 18, this computer system 250 includes a computer 270 having a DVD (Digital Versatile Disc) drive 302, a keyboard 274, a mouse 276, and a monitor 272, all connected to the computer 270, for interacting with the user. including. Of course, these are examples of configurations for when user interaction is required, and any general hardware and software that can be used for user interaction (e.g. touch panel, voice input, pointing device in general) can be used. Available.
 図18を参照して、コンピュータ270は、DVDドライブ302に加えて、CPU(Central Processing Unit)290と、GPU(Graphics Processing Unit)292と、CPU290、GPU292、DVDドライブ302に接続されたバス310と、バス310に接続され、コンピュータ270のブートアッププログラムなどを記憶するROM(Read-Only Memory)296と、バス310に接続され、プログラムを構成する命令、システムプログラム、及び作業データなどを記憶するRAM(Random Access Memory)298と、バス310に接続された不揮発性メモリであるSSD(Solid State Drive)300とを含む。SSD300は、CPU290及びGPU292が実行するプログラム、並びにCPU290及びGPU292が実行するプログラムが使用するデータなどを記憶するためのものである。コンピュータ270はさらに、他端末との通信を可能とするネットワーク176への接続を提供するネットワークI/F(Interface)308と、USB(Universal Serial Bus)メモリ284が着脱可能で、USBメモリ284とコンピュータ270内の各部との通信を提供するUSBポート306とを含む。 Referring to FIG. 18, in addition to a DVD drive 302, a computer 270 is connected to a CPU (Central Processing Unit) 290, a GPU (Graphics Processing Unit) 292, and a DVD drive 302. bus 310 and , a ROM (Read-Only Memory) 296 connected to the bus 310 and storing boot-up programs for the computer 270, and a RAM connected to the bus 310 and storing instructions constituting the program, system programs, work data, etc. (Random Access Memory) 298 and an SSD (Solid State Drive) 300 that is a nonvolatile memory connected to a bus 310. The SSD 300 is for storing programs executed by the CPU 290 and GPU 292, data used by the programs executed by the CPU 290 and GPU 292, and the like. The computer 270 further includes a network I/F (Interface) 308 that provides a connection to the network 176 that enables communication with other terminals, and a USB (Universal Serial Bus) memory 284 that is removably connected to the computer. 270.
 コンピュータ270はさらに、マイクロフォン164及びスピーカ168とバス310とに接続され、CPU290により生成されRAM298又はSSD300に保存された音声信号、映像信号及びテキストデータをCPU290の指示に従って読み出し、アナログ変換及び増幅処理をしてスピーカ168を駆動したり、マイクロフォン164からのアナログの音声信号をデジタル化し、RAM298又はSSD300の、CPU290により指定される任意のアドレスに保存したりする機能を持つ音声I/F304を含む。 The computer 270 is further connected to the microphone 164, the speaker 168, and the bus 310, reads audio signals, video signals, and text data generated by the CPU 290 and stored in the RAM 298 or the SSD 300 according to instructions from the CPU 290, and performs analog conversion and amplification processing. It includes an audio I/F 304 having a function of driving a speaker 168, digitizing an analog audio signal from a microphone 164, and storing it in an arbitrary address specified by the CPU 290 in the RAM 298 or SSD 300.
 上記実施形態においては、動作制御PC162、音声処理PC166、音声合成PC170、及び人位置認識PC180などの機能を実現するプログラムなどは、いずれも例えば図18に示すSSD300、RAM298、DVD278又はUSBメモリ284、若しくはネットワークI/F308及びネットワーク176を介して接続された図示しない外部装置の記憶媒体などに格納される。典型的には、これらのデータ及びパラメータなどは、例えば外部からSSD300に書込まれコンピュータ270の実行時にはRAM298にロードされる。 In the above embodiment, programs for realizing the functions of the operation control PC 162, the voice processing PC 166, the voice synthesis PC 170, and the human position recognition PC 180 are all stored in the SSD 300, RAM 298, DVD 278, or USB memory 284 shown in FIG. 18, for example. Alternatively, the information may be stored in a storage medium of an external device (not shown) connected via the network I/F 308 and the network 176. Typically, these data and parameters are written into the SSD 300 from the outside, for example, and loaded into the RAM 298 when the computer 270 is executed.
 このコンピュータシステムを、図7に示す動作制御PC162、音声処理PC166、音声合成PC170、及び人位置認識PC180並びにその各構成要素の機能を実現するよう動作させるためのコンピュータプログラムは、DVDドライブ302に装着されるDVD278に記憶され、DVDドライブ302からSSD300に転送される。又は、これらのプログラムはUSBメモリ284に記憶され、USBメモリ284はUSBポート306に装着され、プログラムがSSD300に転送される。又は、このプログラムはネットワーク176を通じてコンピュータ270に送信されSSD300に記憶されてもよい。 A computer program for operating this computer system so as to realize the functions of the operation control PC 162, voice processing PC 166, voice synthesis PC 170, and human position recognition PC 180 shown in FIG. 7 and their respective components is installed in the DVD drive 302. The data is stored on the DVD 278 and transferred from the DVD drive 302 to the SSD 300. Alternatively, these programs are stored in the USB memory 284, the USB memory 284 is attached to the USB port 306, and the programs are transferred to the SSD 300. Alternatively, this program may be transmitted to computer 270 via network 176 and stored on SSD 300.
 プログラムは実行のときにRAM298にロードされる。もちろん、キーボード274、モニタ272及びマウス276を用いてソースプログラムを入力し、コンパイルした後のオブジェクトプログラムをSSD300に格納してもよい。上記実施形態のようにスクリプト言語の場合には、キーボード274などを用いて入力したスクリプトをSSD300に格納してもよい。仮想マシン上において動作するプログラムの場合には、仮想マシンとして機能するプログラムを予めコンピュータ270にインストールしておく必要がある。音声認識及び音声合成などにはニューラルネットワークが使用される。訓練済のニューラルネットワークを使用してもよいし、会話ロボットシステム150において訓練を行ってもよい。 The program is loaded into RAM 298 during execution. Of course, a source program may be input using the keyboard 274, monitor 272, and mouse 276, and the compiled object program may be stored in the SSD 300. In the case of a script language as in the above embodiment, a script input using the keyboard 274 or the like may be stored in the SSD 300. In the case of a program that operates on a virtual machine, it is necessary to install the program that functions as a virtual machine on the computer 270 in advance. Neural networks are used for speech recognition, speech synthesis, etc. A trained neural network may be used, or the training may be performed on the talking robot system 150.
 CPU290は、その内部のプログラムカウンタと呼ばれるレジスタ(図示せず)により示されるアドレスに従ってRAM298からプログラムを読み出して命令を解釈し、命令の実行に必要なデータを命令により指定されるアドレスに従ってRAM298、SSD300又はそれ以外の機器から読み出して命令により指定される処理を実行する。CPU290は、実行結果のデータを、RAM298、SSD300、CPU290内のレジスタなど、プログラムにより指定されるアドレスに格納する。アドレスによってはロボットのアクチュエータへの指令、音声信号などとしてコンピュータから出力される。このとき、プログラムカウンタの値もプログラムによって更新される。コンピュータプログラムは、DVD278から、USBメモリ284から、又はネットワーク176を介して、RAM298に直接にロードしてもよい。なお、CPU290が実行するプログラムの中で、一部のタスク(主として数値計算)については、プログラムに含まれる命令により、又はCPU290による命令実行時の解析結果に従って、GPU292にディスパッチされる。 The CPU 290 reads the program from the RAM 298 according to the address indicated by an internal register called a program counter (not shown), interprets the instruction, and stores the data necessary for executing the instruction in the RAM 298 and the SSD 300 according to the address specified by the instruction. Or read it from other devices and execute the process specified by the command. The CPU 290 stores the data of the execution result at an address specified by the program, such as the RAM 298, the SSD 300, or a register within the CPU 290. Depending on the address, the computer outputs a command to the robot's actuator, a voice signal, etc. At this time, the value of the program counter is also updated by the program. Computer programs may be loaded directly into RAM 298 from DVD 278 , from USB memory 284 , or via network 176 . Note that in the program executed by the CPU 290, some tasks (mainly numerical calculations) are dispatched to the GPU 292 according to instructions included in the program or according to an analysis result when the CPU 290 executes the instructions.
 コンピュータ270により上記した実施形態に係る各部の機能を実現するプログラムは、それら機能を実現するようコンピュータ270を動作させるように記述され配列された複数の命令を含む。この命令を実行するのに必要な基本的機能のいくつかはコンピュータ270上において動作するオペレーティングシステム(OS)若しくはサードパーティのプログラム、コンピュータ270にインストールされる各種ツールキットのモジュール又はプログラムの実行環境により提供される場合もある。したがって、このプログラムはこの実施形態のシステム及び方法を実現するのに必要な機能全てを必ずしも含まなくてよい。このプログラムは、命令の中で、所望の結果が得られるように制御されたやり方によって適切な機能又はモジュールなどを静的にリンクすることにより、又は動的に呼出すことにより、上記した各装置及びその構成要素としての動作を実行する命令のみを含んでいればよい。そのためのコンピュータ270の動作方法は周知なので、ここでは繰り返さない。 A program for realizing the functions of each part according to the above-described embodiment by the computer 270 includes a plurality of instructions written and arranged to cause the computer 270 to operate to realize those functions. Some of the basic functions required to execute this instruction are provided by the operating system (OS) running on the computer 270, third party programs, modules of various toolkits installed on the computer 270, or the program execution environment. In some cases, it may be provided. Therefore, this program does not necessarily include all the functions necessary to implement the system and method of this embodiment. This program includes each of the above-mentioned devices and modules by statically linking or dynamically calling appropriate functions or modules in a controlled manner to obtain the desired results. It is sufficient to include only instructions for executing operations as its constituent elements. The manner in which computer 270 operates for this purpose is well known and will not be repeated here.
 なお、GPU292は並列処理を行うことが可能であり、機械学習に伴う多量の計算を同時並列的又はパイプライン的に実行できる。例えばプログラムのコンパイル時にプログラム中に発見された並列的計算要素、又はプログラムの実行時に発見された並列的計算要素は、随時、CPU290からGPU292にディスパッチされ、実行され、その結果が直接に、又はRAM298の所定アドレスを介してCPU290に返され、プログラム中の所定の変数に代入される。 Note that the GPU 292 is capable of parallel processing, and can execute a large amount of calculations associated with machine learning simultaneously in parallel or in a pipeline manner. For example, parallel computing elements found in a program when the program is compiled or parallel computing elements discovered when the program is executed are dispatched from the CPU 290 to the GPU 292 and executed, and the results are sent directly or to the RAM 298. is returned to the CPU 290 via a predetermined address, and is substituted into a predetermined variable in the program.
 3.機能的構成
 図19に、この実施形態に係る会話ロボットシステム150の中の視線制御に関連する部分である視線制御システム350の機能的構成を示す。図19を参照して、視線制御システム350は、ロボット160及び他の対話参加者の現在の役割を判定し出力するための役割判定部360と、ターン交替を検出するためのターン交替検出部362とを含む。この実施形態においてはターン交替の検出及び対話参加者の役割についてはその都度検出している。しかし、例えばあらかじめ準備したシナリオにそってロボットを含む3者対話を行う場合には、各参加者の役割及びターン交替はシナリオによりほぼ特定される。そのため、役割判定部360及びターン交替検出部362を設ける必要はない。後述する評価実験はそうした条件で視線制御を行っている。
3. Functional Configuration FIG. 19 shows the functional configuration of a line-of-sight control system 350, which is a part related to line-of-sight control in the conversation robot system 150 according to this embodiment. Referring to FIG. 19, the line of sight control system 350 includes a role determination unit 360 for determining and outputting the current roles of the robot 160 and other dialogue participants, and a turn change detection unit 362 for detecting a turn change. including. In this embodiment, turn changes and the roles of dialogue participants are detected each time. However, for example, when a three-party dialogue including a robot is performed according to a scenario prepared in advance, the roles and turn turns of each participant are almost specified by the scenario. Therefore, it is not necessary to provide the role determining section 360 and the turn replacement detecting section 362. In the evaluation experiment described below, line of sight control was performed under such conditions.
 視線制御システム350はさらに、ターン交替時及び発話中のロボット160の視線方向を確率的に決定するための視線方向モデル364と、ロボット160の視線逸らし時の視線方向を過効率的に決定するための視線逸らしモデル366と、ロボット160の個性(外向的、内向的、中立)に関する情報を記憶するための個性情報記憶部368とを含む。視線方向モデル364の詳細については図20及び図21を参照して後述する。モデル366の詳細については図22及び図23を参照して後述する。なお、視線方向モデル364及び視線逸らしモデル366はいずれも図16に示す確率モデル記憶装置172に記憶される。個性情報記憶部368も実際には確率モデル記憶装置172により実現される。確率モデル記憶装置172は図18のSSD300により実現される。 The line-of-sight control system 350 further includes a line-of-sight direction model 364 for stochastically determining the line-of-sight direction of the robot 160 during a turn change or while speaking, and a line-of-sight model 364 for determining the line-of-sight direction of the robot 160 in an overefficient manner when the robot 160 averts its line of sight. , and a personality information storage unit 368 for storing information regarding the personality (extrovert, introvert, neutral) of the robot 160. Details of the viewing direction model 364 will be described later with reference to FIGS. 20 and 21. Details of the model 366 will be described later with reference to FIGS. 22 and 23. Note that both the line-of-sight direction model 364 and the line-of-sight aversion model 366 are stored in the probability model storage device 172 shown in FIG. Personality information storage section 368 is also actually realized by probability model storage device 172. The probabilistic model storage device 172 is realized by the SSD 300 in FIG.
 視線制御システム350はさらに、ターン交替検出部362からターン交替が検出されたことを示す信号を受けたことに応答して、役割判定部360からロボット160及び他の対話参加者の役割に関する情報を受けて、視線方向モデル364、個性情報記憶部368及び368を使用してロボット160の視線動作を制御するためのパラメータを作成するための視線動作生成部370と、視線動作生成部370により作成されたパラメータに従って、ロボット160の視線を制御するための各種のアクチュエータ(図示せず)を制御するための視線動作制御部372とを含む。 The line of sight control system 350 further receives information regarding the roles of the robot 160 and other dialogue participants from the role determination unit 360 in response to receiving a signal from the turn change detection unit 362 indicating that a turn change has been detected. In response, the gaze direction model 364 and the gaze motion generation section 370 create parameters for controlling the gaze motion of the robot 160 using the personality information storage sections 368 and 368. and a line-of-sight movement control unit 372 for controlling various actuators (not shown) for controlling the line-of-sight of the robot 160 according to the parameters set.
 4.モデル構成
 A.視線方向モデル364
 図20を参照して、視線方向モデル364は、個性及び役割の組み合わせに応じてあらかじめ準備される発話時視線方向モデル400と、個性別及び役割の組み合わせに応じてあらかじめ準備される視線継続時間モデル402と、個性及び役割の組み合わせに応じてあらかじめ準備される、それぞれターン交替時の1回目、2回目、及び3回目の区間における視線方向の決定のためのターン交替時視線方向モデル404、406及び408とを含む。
4. Model configuration A. Gaze direction model 364
Referring to FIG. 20, the gaze direction model 364 includes a speech gaze direction model 400 that is prepared in advance according to the combination of personality and role, and a gaze duration model that is prepared in advance according to the combination of personality and role. 402, turn change gaze direction models 404, 406, and 406 for determining gaze directions in the first, second, and third sections of turn change, respectively, which are prepared in advance according to the combination of personalities and roles. 408.
 発話時視線方向モデル400の構成は図6に示したものと実質的に同じである。図6に示したものは例えば外向的な参加者のモデルである。したがって発話時視線方向モデル400は、図6に示したものに加えて、内向的な参加者のためのモデル、及び中立的な参加者のためのモデルを含む。 The configuration of the speech gaze direction model 400 is substantially the same as that shown in FIG. 6. For example, what is shown in FIG. 6 is a model of an extroverted participant. Therefore, the speech gaze direction model 400 includes, in addition to the model shown in FIG. 6, a model for introverted participants and a model for neutral participants.
 図21に、視線継続時間モデル402の構成の1例を示す。図21を参照して、視線継続時間モデル402は、ロボット160の役割(発話者、メインリスナ及びサブリスナ)、及びロボット160に想定される個性(外向的、内向的、中立)の組み合わせに応じ、各視線先(自分の役割を除く他の参加者の役割)の各々について、χ乗分布曲線の自由度(n)、横軸方向のバイアス(loc)及び縦軸方向の振幅(scale)の値を記憶したものである。これらの値をχ乗分布の式に代入することで分布を特定し、その分布からランダムに値をサンプリングすることにより、視線方向の決定時にその継続時間も決定できる。 FIG. 21 shows an example of the configuration of the gaze duration model 402. Referring to FIG. 21, the gaze duration model 402 is based on the combination of the role of the robot 160 (speaker, main listener, and sublistener) and the personality assumed for the robot 160 (extrovert, introvert, neutral). For each gaze destination (roles of other participants excluding one's own role), the degree of freedom (n) of the chi - square distribution curve, the bias in the horizontal axis direction (loc), and the amplitude in the vertical axis direction (scale) are calculated. It stores the value. By substituting these values into the equation of the chi - square distribution, the distribution is specified, and by randomly sampling values from the distribution, the duration can also be determined when determining the line-of-sight direction.
 ターン交替時視線方向モデル404、406及び408は、発話時視線方向モデル400と同じ構成を持つ。違いは、その格納した値が、ターン交替時の各区間に限定されているという点である。 The turn change gaze direction models 404, 406, and 408 have the same configuration as the speech gaze direction model 400. The difference is that the stored value is limited to each section at the time of turn change.
 B.視線逸らしモデル366
 図22に、図19に示す視線逸らしモデル366の構成を示す。図22を参照して、視線逸らしモデル366は、ターン交替時視線逸らし方向モデル450及び発話時視線逸らし方向モデル452を含む。これらモデルはいずれも、個性別に設けられる。
B. Averted gaze model 366
FIG. 22 shows the configuration of the line-of-sight aversion model 366 shown in FIG. 19. Referring to FIG. 22, the gaze aversion model 366 includes a gaze aversion direction model 450 during turn change and a gaze aversion direction model 452 during speech. All of these models are provided individually.
 図23にターン交替時視線逸らし方向モデル450の構成の一例を示す。図23を参照して、ターン交替時視線逸らし方向モデル450は、例えば図14に示す外向的な参加者の視線逸らし方向をモデル化したものである。この実施形態においては、ターン交替時視線逸らし方向モデル450は配列により実現されている。各要素は、図14の中央の列及び右の列を最も左側の列の下に移動させたときの順番に配列されている。各要素は、視線逸らし方向(左上から右下までの9方向)と、その方向に視線を逸らす割合(確率)とを示す。図14において「L」は左を、「C」は中央を、「R」は右を、「U」は上を、「D」は下を、それぞれ示す。例えば「LD」は左下を表し、「CD」は中央下を表す。 FIG. 23 shows an example of the configuration of the direction model 450 for looking away when changing turns. Referring to FIG. 23, a turn change gaze aversion direction model 450 models the gaze aversion direction of the extroverted participant shown in FIG. 14, for example. In this embodiment, the turn change time gaze aversion direction model 450 is realized by an array. The elements are arranged in the order in which the center column and right column in FIG. 14 are moved below the leftmost column. Each element indicates a gaze averting direction (nine directions from the upper left to the lower right) and a rate (probability) of averting the gaze in that direction. In FIG. 14, "L" indicates the left, "C" indicates the center, "R" indicates the right, "U" indicates the top, and "D" indicates the bottom. For example, "LD" represents the lower left, and "CD" represents the lower center.
 後述するように、プログラムがターン交替時視線逸らし方向モデル450を参照することにより、ロボット160の役割に応じ、ターン交替時にロボット160が視線逸らしを行うときの方向を確率的に定めることができる。 As will be described later, by referring to the turn change time gaze aversion direction model 450, the program can stochastically determine the direction in which the robot 160 averts its gaze during turn change, depending on the role of the robot 160.
 5.プログラム構成
 A.全体構成
 図24に、この実施形態に係るロボット160を制御するために視線制御システム350が実行するプログラムの全体構成をフローチャート形式により示す。図24を参照して、このプログラムは、一定の時間間隔をもって繰り返し起動され、起動時に対話の状態(ターン交替時か否か、各対話参加者の役割は何か、など)をセンシングするステップ480と、現時点が視線方向を決定するタイミングか否かに従って制御の流れを分岐させるステップ482とを含む。視線方向を決定するタイミングとは、例えばターン交替時であれば、ターン交替が発生するタイミングを0秒として-1秒、-0.3秒、及び0.3秒となったときである。ターン交替時以外、すなわち発話時には、その前の視線方向決定の際に決定された視線の継続時間が満了した時点である。その方向が視線逸らし方向であり、その継続時間が0.7秒以上となったときには、さらに0.7秒ごとに視線方向を決定し直す処理を行う。これらのタイミングについては、各イベントが発生したときにその時間をタイマに設定し、タイマが満了したか否かに従って検出できる。
5. Program structure A. Overall Configuration FIG. 24 shows, in a flowchart format, the overall configuration of a program executed by the line of sight control system 350 to control the robot 160 according to this embodiment. Referring to FIG. 24, this program is repeatedly activated at regular time intervals, and at the time of activation, the program senses the state of the dialogue (whether or not it is time to change turns, what the role of each dialogue participant is, etc.) at step 480. and step 482 of branching the flow of control depending on whether or not the current time is the timing to determine the viewing direction. For example, in the case of a turn change, the timing for determining the line of sight direction is -1 second, -0.3 second, and 0.3 second, assuming that the timing at which the turn change occurs is 0 seconds. At times other than when changing turns, that is, when speaking, this is the point in time when the duration of the line of sight determined in the previous line of sight direction determination has expired. If the direction is a line-of-sight direction and the duration thereof is 0.7 seconds or more, the process of re-determining the line-of-sight direction is further performed every 0.7 seconds. Regarding these timings, when each event occurs, the time can be set in a timer, and detection can be made according to whether the timer has expired or not.
 このプログラムはさらに、ステップ482の判定が肯定のときに実行され、現在のロボット160に割当てられた役割と、その個性と、発話時かターン交替時かという情報との組み合わせに従って、視線方向モデル364から選択したモデル、又はさらに視線逸らしモデル366から選択したモデルに従って視線方向とその継続時間とを決定するステップ484と、ステップ484の後、及びステップ482における判定が否定的である場合には直接に実行され、ロボット160の各部をステップ484で設定されたパラメータに従って制御してプログラムの実行を終了するステップ486とを含む。なお、このプログラムでは、図示していない他のステップにおいてロボットのための音声認識、音声合成、及び姿勢の制御などのための処理も実行される。ステップ486ではそうした処理に基づく制御も行われる。 This program is further executed when the determination in step 482 is affirmative, and generates a line-of-sight direction model 360 according to the combination of the role currently assigned to the robot 160, its personality, and information on whether it is speaking or changing turns. a step 484 of determining the direction of gaze and its duration according to a model selected from , or further from a model selected from gaze aversion models 366; after step 484, and directly if the determination in step 482 is negative; and step 486, in which each part of the robot 160 is controlled according to the parameters set in step 484, and the execution of the program is ended. Note that in this program, processes for voice recognition, voice synthesis, posture control, etc. for the robot are also executed in other steps not shown. In step 486, control based on such processing is also performed.
 ステップ486では、ステップ484において決定されたパラメータに従って、タイマの進行にあわせてロボット160の視線を制御するために、ロボット160の頭部の向き、及び眼球の位置などが制御される。プログラムが実行されるたびに、時間が進行するため、プログラムが実行されるたびにロボット160の頭部の位置及び眼球の位置が変化していく。 In step 486, in accordance with the parameters determined in step 484, the direction of the head of the robot 160, the position of the eyeballs, etc. are controlled in order to control the line of sight of the robot 160 in accordance with the progress of the timer. Since time progresses each time the program is executed, the position of the head and the position of the eyes of the robot 160 change each time the program is executed.
 B.ステップ480(状態センシング)
 図25に、図24のステップ480を実現するプログラムの構成をフローチャート形式により示す。図25を参照して、ステップ480は、図19に示す役割判定部360からロボット160及び各参加者の役割を示す情報を読み出すステップ510と、図19に示すターン交替検出部362からターン状態を示す情報を読み出すステップ512と、図19に示す個性情報記憶部368からロボット160に付与された個性を示す情報とを読み出してこのプログラムの実行を終了するステップ514とを含む。
B. Step 480 (condition sensing)
FIG. 25 shows the configuration of a program that implements step 480 in FIG. 24 in a flowchart format. Referring to FIG. 25, step 480 includes a step 510 of reading out information indicating the role of the robot 160 and each participant from the role determination section 360 shown in FIG. and step 514 of reading information indicating the individuality given to the robot 160 from the individuality information storage unit 368 shown in FIG. 19 and ending the execution of this program.
 ロボット160に付与された個性は固定されていると考えてよい。したがって個性に関する情報は必ずしもプログラムの実行の繰り返しごとに読み出す必要はない。しかし個性だけではなく、感情などのように時間により変化することが想定される状態まで含めてロボット160に付与する場合には、図25に示すようにプログラムの実行の繰り返しごとにその情報を読み出すことが望ましい。 The personality given to the robot 160 can be considered to be fixed. Therefore, information regarding individuality does not necessarily need to be read every time the program is executed. However, if the robot 160 is to be given not only individuality but also states that are expected to change over time, such as emotions, the information is read out each time the program is executed, as shown in FIG. This is desirable.
 C.ステップ484(視線方向と継続時間の決定)
 図26を参照して、ステップ484は、ロボット160に割当てられた役割、ターン状態、及び個性に応じた視線方向モデル及び視線継続時間モデルを視線方向モデル364から選択し読み出すステップ550と、同じくロボット160に割当てられた役割、ターン状態、及び個性に応じたターン交替時視線逸らし方向モデル450又は発話時視線逸らし方向モデル452を視線逸らしモデル366から選択し読み出すステップ552とを含む。ステップ550では、ターン状態が発話時であれば図20に示す発話時視線方向モデル400が選択される。ターン状態がターン交替時であれば、タイミングに従いターン交替時視線方向モデル404及び406及び408のいずれか1つが選択される。
C. Step 484 (determining gaze direction and duration)
Referring to FIG. 26, step 484 includes step 550 in which a gaze direction model and a gaze duration model corresponding to the role, turn state, and personality assigned to robot 160 are selected and read from gaze direction model 364; A step 552 of selecting and reading out the gaze aversion direction model 450 during turn change or the gaze aversion direction model 452 during utterance from the gaze aversion models 366 according to the role, turn state, and personality assigned to the turn change mode 160 . In step 550, if the turn state is the speaking time, the speaking gaze direction model 400 shown in FIG. 20 is selected. If the turn state is a turn change, one of the turn change line-of- sight direction models 404, 406, and 408 is selected according to the timing.
 このプログラムはさらに、ステップ552に続き、一様分布から[0,1]の範囲の値pをサンプリングするステップ554と、ステップ554においてサンプリングされた値pを用い、ステップ550において選択された視線方向モデルから視線方向を決定するステップ556とを含む。この明細書では、このようにして視線方向モデルを[0,1]の範囲でランダムにサンプリングされた値を用いて視線方向を決定することを「視線方向のサンプリング」と呼ぶ。ステップ556の詳細については図27を参照して後述する。 The program further continues step 552 with step 554 of sampling a value p in the range [0,1] from a uniform distribution, and using the value p sampled in step 554 to determine the viewing direction selected in step 550. determining 556 a viewing direction from the model. In this specification, determining the line-of-sight direction using randomly sampled values of the line-of-sight direction model in the range [0, 1] is referred to as "sampling of the line-of-sight direction." Details of step 556 will be described later with reference to FIG. 27.
 このプログラムはさらに、ステップ554と同様、一様分布から[0,1]の範囲の値pをサンプリングするステップ558と、ステップ550で読み出された視線継続時間モデル402から、図25示す処理により設定されたロボット160の役割及びロボット160に付与された個性に従ったχ乗分布のパラメータn、loc及びscaleを読み出しこれら値とステップ558においてサンプリングされた値pとを用いて視線の継続時間を決定するステップ560とを含む。ステップ560の詳細については図28を参照して後述する。 This program further includes a step 558 of sampling a value p in the range [0, 1] from a uniform distribution, similar to step 554, and a process shown in FIG. 25 from the gaze duration model 402 read out in step 550. The parameters n, loc, and scale of the chi - square distribution according to the set role of the robot 160 and the personality given to the robot 160 are read out, and the duration of the line of sight is calculated using these values and the value p sampled in step 558. and step 560 of determining. Details of step 560 will be described later with reference to FIG.
 このプログラムはさらに、ステップ556で決定された視線方向が視線逸らしか否かに従って制御の流れを分岐させるステップ562を含む。ステップ562の判定が否定のときにはこのプログラムの実行は終了する。 This program further includes a step 562 in which the control flow is branched depending on whether the line-of-sight direction determined in step 556 is an averted line-of-sight direction. When the determination at step 562 is negative, execution of this program ends.
 このプログラムはさらに、ステップ562の判定が肯定的であることに応答して、一様分布から[0,1]の範囲の値pをサンプリングするステップ564と、ステップ564でサンプリングされた値pを用い、ステップ556で選択された視線方向モデルを用いて視線方向を決定してこのプログラムの実行を終了するステップ566とを含む。ステップ566の詳細については図29を参照して後述する。 The program further includes a step 564 of sampling a value p in the range [0, 1] from a uniform distribution in response to a positive determination in step 562, and a step 564 of sampling a value p in the range [0, 1] from a uniform distribution. and step 566 of determining the viewing direction using the viewing direction model selected in step 556 and terminating the execution of this program. Details of step 566 will be described later with reference to FIG.
 図27に、図26のステップ556の詳細をフローチャート形式で示す。なおこの図では、視線をメインリスナに向ける確率を表す変数として変数S、視線を発話者に向ける確率としてS、及び視線をサブリスナに向ける確率としてSを、それぞれ用いる。図27を参照して、ステップ556は、変数Sの値と変数Sの値とを合計した値を変数Sに代入し、このように値が算出された変数Sとの値と変数Sの値とを合計した値を変数Sに代入するステップ600を含む。この計算を行うことにより、変数Sには発話者の方向を向く確率が格納される。変数Sには発話者の方向を向く確率とメインリスナの方向を向く確率とが合算された値が格納される。変数Sには発話者、メインリスナ及びサブリスナの方向を向く確率が合算された値が格納される。 FIG. 27 shows details of step 556 in FIG. 26 in flowchart form. In this figure, a variable S M is used as a variable representing the probability of directing the line of sight to the main listener, S S is the probability of directing the line of sight to the speaker, and S B is used as the probability of directing the line of sight to the sublistener. Referring to FIG. 27, step 556 assigns the sum of the value of the variable SS and the value of the variable SM to the variable SM , and the value of the variable SM whose value has been calculated in this way. The process includes step 600 of assigning the sum of the values of the variable SB and the value of the variable SB to the variable SB . By performing this calculation, the probability of facing toward the speaker is stored in the variable SS . The variable SM stores the sum of the probability of facing the speaker and the probability of facing the main listener. The variable SB stores the sum of the probabilities of facing the speaker, the main listener, and the sublistener.
 このプログラムはさらに、ステップ600に続き値pが変数Sの値より小さいか否かに従い制御の流れを分岐させるステップ602と、ステップ602における判定が肯定的であるときに、視線方向を発話者の方向であると決定しこのプログラムの実行を終了するステップ604とを含む。 This program further includes a step 602 in which, following step 600, the flow of control is branched depending on whether the value p is smaller than the value of the variable S. and step 604 in which the direction is determined and the execution of this program is terminated.
 このプログラムはさらに、ステップ602における判定が否定的であることに応答して、値pが変数Sの値より小さいか否かに従って制御の流れを分岐させるステップ606と、ステップ606における判定結果が肯定的であることに応答して、視線方向をメインリスナの方向であると決定してこのプログラムの実行を終了するステップ608とを含む。 This program further includes a step 606 in which, in response to the determination in step 602 being negative, the flow of control is branched depending on whether the value p is smaller than the value of the variable S M ; In response to the affirmative, determining 608 the viewing direction to be the direction of the main listener and terminating execution of the program.
 このプログラムはさらに、ステップ606における判定が否定的であることに応答して、値pが変数Sの値より小さいか否かに従って制御の流れを分岐させるステップ610と、ステップ610における判定が肯定的であるときに、視線方向をサブリスナの方向に決定してこのプログラムの実行を終了するステップ612と、ステップ610における判定が否定的であるときに、視線方向を視線逸らし方向であると決定してこのプログラムの実行を終了するステップ614とを含む。 This program further includes a step 610 in which, in response to the negative determination in step 606, the flow of control is branched depending on whether the value p is smaller than the value of the variable SB ; At step 612, the line of sight direction is determined to be the direction of the sublistener and the execution of this program is ended. and step 614 of terminating execution of the lever program.
 ステップ600における処理を行うことにより、こうしたアルゴリズムを用いて、値pとモデルの表す確率とに従い視線方向を決定できる。 By performing the processing in step 600, such an algorithm can be used to determine the viewing direction according to the value p and the probability represented by the model.
 図28に、図26のステップ560で実行されるプログラムの制御構造をフローチャート形式で示す。図28を参照して、このプログラムは、図26のステップ556において選択された視線継続時間モデル402から値n、loc、及びscaleを読み出すステップ650と、[0,1]の一様分布から値pをサンプリングしこの値を変数xに代入するステップ652と、既に述べた視線継続時間を算出するχ乗分布の式にn、loc、scale及びxの値を代入することにより視線の継続時間を算出してこのプログラムの実行を終了するステップ654とを含む。 FIG. 28 shows the control structure of the program executed in step 560 of FIG. 26 in a flowchart format. Referring to FIG. 28, the program includes a step 650 of reading the values n, loc, and scale from the gaze duration model 402 selected in step 556 of FIG. Step 652 of sampling p and substituting this value into the variable x, and substituting the values of n, loc, scale, and x into the equation of the chi - square distribution for calculating the gaze duration, which has already been described, calculates the gaze duration. and step 654 of calculating the value and terminating the execution of this program.
 図29に図26のステップ566を実現するプログラムの制御構造をフローチャート形式で示す。図29を参照して、このプログラムは、プログラム中のループ処理のための繰り返し変数iに0を、図23に示す各確率を配列番号0から順番に累積していくための変数Sに0を、それぞれ代入するステップ680と、ステップ680に続き、変数iの値が8より小さいか否かに従って制御の流れを分岐させるステップ682とを含む。ここで「8」とは、図23に示す配列における添字の最大値である。 FIG. 29 shows the control structure of the program that implements step 566 in FIG. 26 in a flowchart format. Referring to FIG. 29, this program sets 0 to the repetition variable i for loop processing in the program, and 0 to the variable S for accumulating each probability shown in FIG. 23 in order from array number 0. , respectively, and following step 680, a step 682 of branching the flow of control depending on whether the value of variable i is less than 8. Here, "8" is the maximum value of the subscript in the array shown in FIG.
 このプログラムはさらに、ステップ682の判定が肯定的であるときに、変数Sに、図23に示す配列の確率P(i)を加算するステップ684と、変数pの値が変数Sの値より小さいか否かに従って制御の流れを分岐させるステップ686と、ステップ686の判定が否定的であるとき、すなわち変数pの値が変数Sの値以上であるときに、変数iの値に1を加算して制御をステップ682に戻すステップ688とを含む。ステップ686における判定が肯定的である場合には、視線逸らし方向を図23の配列における視線逸らし方向D(i)に決定しプログラムの実行を終了するステップ690を含む。 This program further includes a step 684 of adding the probability P(i) of the array shown in FIG. 23 to the variable S when the determination in step 682 is affirmative; Step 686 branches the flow of control according to whether or not and returning control to step 682. If the determination in step 686 is affirmative, step 690 is included in which the line-of-sight averting direction is determined to be the line-of-sight averting direction D(i) in the array of FIG. 23 and the execution of the program is ended.
 このプログラムはさらに、ステップ682における判定が否定的であることに応答して、視線逸らし方向を視線逸らし方向D(8)に決定しプログラムの実行を終了するステップ692を含む。 This program further includes a step 692 in which, in response to the negative determination in step 682, the direction of averting the line of sight is determined to be the direction of averting the line of sight D(8) and the execution of the program ends.
 このプログラムを実行することにより、例えば図14に示す確率の分布に従って視線逸らし方向を決定できる。 By executing this program, it is possible to determine the direction of gaze aversion according to the probability distribution shown in FIG. 14, for example.
 6.動作
 この実施形態に係る会話ロボットシステム150は以下のように動作する。なお以下の説明は、視線制御システム350に関するもののみについて、主として図24から図29を参照して行う。
6. Operation The conversational robot system 150 according to this embodiment operates as follows. Note that the following explanation will mainly be made with reference to FIGS. 24 to 29 only regarding the line of sight control system 350.
 視線制御システム350が起動されると、図24のステップ480が実行される。この例では、ロボット160を含む対話参加者の各役割、ターン状態としてはあらかじめ定める初期値として役割=発話者、ターン状態として発話期間(ターン交替期間以外)が設定されているものとする。また図19に示す個性情報記憶部368にはロボット160に付与する個性を示す情報が格納されているものとする。ここでは、ロボット160には「外向的」という個性が付与されているものとする。ステップ480ではこうした情報が図18のRAM298に読み込まれる。 When the line of sight control system 350 is activated, step 480 in FIG. 24 is executed. In this example, it is assumed that each role and turn state of the dialogue participants including the robot 160 is set as a predetermined initial value as role=speaker, and the turn state is set as a speaking period (other than the turn change period). Further, it is assumed that the personality information storage unit 368 shown in FIG. 19 stores information indicating the personality to be given to the robot 160. Here, it is assumed that the robot 160 is given the personality of "extrovert." In step 480, such information is loaded into RAM 298 of FIG.
 続いてステップ482において、視線方向決定のタイミングか否かが決定される。ターン状態の初期値として発話期間が設定されており、かつ視線継続時間のタイマがクリアされているものとすると、図24のステップ482における判定は肯定となり、制御はステップ484に進む。 Subsequently, in step 482, it is determined whether it is time to determine the line of sight direction. Assuming that the utterance period is set as the initial value of the turn state and the gaze duration timer is cleared, the determination in step 482 of FIG. 24 is affirmative, and the control proceeds to step 484.
 図26を参照して、ステップ550では、役割=発話者、ターン状態=発話期間、個性=外向的に対応する発話時視線方向モデル400が読み出され、ステップ552では役割=発話者、個性=外向的に対応する視線継続時間モデル402がそれぞれ選択される。さらにステップ554において値pがサンプリングされ、ステップ556が実行される。ステップ556においては、図27に示す処理が実行され、視線方向が決定される。ここではロボット160の役割は発話者なので、視線方向としてはメインリスナ、サブリスナ及び視線逸らしのいずれかが選択される。 Referring to FIG. 26, in step 550, the speaking gaze direction model 400 corresponding to role=speaker, turn state=speech period, personality=extraversion is read out, and in step 552, role=speaker, personality= Each extrovertedly corresponding gaze duration model 402 is selected. Furthermore, the value p is sampled in step 554 and step 556 is executed. In step 556, the process shown in FIG. 27 is executed to determine the viewing direction. Here, since the role of the robot 160 is the speaker, one of main listener, sublistener, and averting the line of sight is selected as the line of sight direction.
 図26に戻り、ステップ558において値pが再びサンプリングされる。続くステップ560においては以下のように視線継続時間が決定される。 Returning to FIG. 26, the value p is sampled again in step 558. In the following step 560, the gaze duration time is determined as follows.
 図28を参照して、ステップ650において、図26のステップ552において読み出された視線継続時間モデル402から、ロボット160の役割及びロボット160に付与された個性に従ったχ乗分布のパラメータn、loc及びscaleが読み出される。ステップ652において、ステップ558においてサンプリングされた値pが変数xに代入される。続くステップ654において、パラメータn、loc及びscaleと変数xの値とを継続時間計算のための式に代入して値を計算することにより視線の継続時間が決定される。 Referring to FIG. 28, in step 650, from the gaze duration model 402 read out in step 552 of FIG . , loc and scale are read. In step 652, the value p sampled in step 558 is assigned to variable x. In the following step 654, the duration of the line of sight is determined by substituting the parameters n, loc, and scale and the value of the variable x into a formula for duration calculation to calculate the value.
 再び図26に戻り、続くステップ560においては、ステップ556で決定された視線方向が視線逸らしか否かが判定される。ここでは視線方向が視線逸らしではないとする。すると、図26に示す処理(図24のステップ484)は直ちに終了する。図24に戻り、ステップ486においては、ステップ484で決定された視線方向及び視線継続時間に従ったロボット160の制御が開始される。なおステップ486においては、これら視線制御以外についてもロボット160の制御が実行される。仮にステップ556で決定された視線方向が視線逸らしであれば、図26のステップ564及び566が実行されてロボット160の視線逸らしの方向が決定される。ステップ486においては、ロボット160の視線はメインリスナでもサブリスナでもない方向となるように制御される。 Returning to FIG. 26 again, in the following step 560, it is determined whether the line of sight direction determined in step 556 is an averted line of sight. Here, it is assumed that the direction of the line of sight is not averted. Then, the process shown in FIG. 26 (step 484 in FIG. 24) ends immediately. Returning to FIG. 24, in step 486, control of the robot 160 according to the line-of-sight direction and line-of-sight duration determined in step 484 is started. Note that in step 486, control of the robot 160 is also executed in addition to the line-of-sight control. If the line-of-sight direction determined in step 556 is a line-of-sight direction, steps 564 and 566 in FIG. 26 are executed to determine the line-of-sight direction of the robot 160. In step 486, the line of sight of the robot 160 is controlled so as to be in a direction that is neither the main listener nor the sublistener.
 次に図24に示す処理が起動されたものとする。ステップ480は1回目の処理と同様である。ステップ482においては、まだ視線方向の継続時間が満了していないのでステップ484の処理は実行されず、ステップ486の処理だけが実行される。したがって、ロボット160の視線方向の変化がまだ完了していなければ視線方向の変化が継続して実行され、完了していればその視線方向が維持される。 Next, it is assumed that the process shown in FIG. 24 is started. Step 480 is the same as the first process. In step 482, since the duration time of the line of sight direction has not yet expired, the process in step 484 is not executed, and only the process in step 486 is executed. Therefore, if the change in the line-of-sight direction of the robot 160 has not yet been completed, the change in the line-of-sight direction is continued; if it has been completed, the change in the line-of-sight direction is maintained.
 こうして図24の処理が複数回実行された後、ターン交替が検出されたものとする。ここでは実際のターン交替のタイミングの1秒前が検出され図24のステップ480においてロボット160に通知される。ターン交替のタイミングの1秒前は視線方向決定のタイミングである。したがって上記したステップ484の処理が再度実行されて、ロボット160の新しい役割と、ターン状態=ターン交替期間と、ロボット160の個性とに応じて選択された視線方向モデルと視線継続モデルとを用いて新たな視線方向と新たな視線継続時間が決定される。その後、図24のステップ486によりロボット160の視線の制御が新たなパラメータに従って開始される。 It is assumed that a turn change is detected after the process in FIG. 24 has been executed multiple times in this way. Here, one second before the actual turn change timing is detected and notified to the robot 160 in step 480 of FIG. 24. One second before the turn change timing is the timing for determining the line of sight direction. Therefore, the process of step 484 described above is executed again, using the gaze direction model and gaze continuation model selected according to the new role of the robot 160, the turn state = turn change period, and the personality of the robot 160. A new gaze direction and a new gaze duration are determined. Thereafter, in step 486 of FIG. 24, control of the line of sight of the robot 160 is started according to the new parameters.
 以下、視線制御システム350は上記した処理を繰り返す。なお、既に述べたとおり、直前に決定した視線継続時間が満了した時、又はターン交替のタイミングを0秒として-1秒、-0.3秒、及び0.3秒のときにロボット160の視線方向の決定が行われる。 Hereafter, the line of sight control system 350 repeats the above-described process. As already mentioned, the robot 160's line of sight changes when the previously determined line of sight duration time expires, or when the turn change timing is -1 second, -0.3 seconds, and 0.3 seconds. A direction determination is made.
 なお、実際の3者対話では、視線逸らしのときには、一点を長時間見ているのではなく、途中で視線方向が変化することが観測された。この変化の回数は、視線逸らしの時間が長いほど多くなる傾向が見られた。そこで、視線を逸らした回数ごとの継続時間の分布の解析結果により、継続時間が0.7秒以下で1回、1.4秒以下で2回、2.1秒以下で3回、というように、0.7秒ごとに1回、視線方向を変化させる回数を増やすこととした。 Furthermore, in actual three-way dialogue, when looking away, it was observed that the direction of the line of sight changed midway through, rather than looking at one point for a long time. There was a tendency for the number of these changes to increase as the time spent averting gaze increased. Therefore, based on the analysis results of the distribution of duration for each number of times the line of sight was averted, we found that 1 time when the duration was 0.7 seconds or less, 2 times when the duration was 1.4 seconds or less, and 3 times when the duration was 2.1 seconds or less. In addition, we decided to increase the number of times the gaze direction was changed by once every 0.7 seconds.
 この場合にも、基本的には図13から図15に示す分布に従って視線方向を変化させた。ただし視線逸らしの間にさらに視線方向を変化させる場合、それまでの視線方向を中央とし、中央以外に視線方向を変化させることを前提に図13から図15に示す分布を適用した。また視線逸らしの間に2回以上視線方向を変化させる場合には、実際の観察結果として、視線を直前の方向に戻す傾向が見られたため、実装としても直前の視線方向に戻すようにした。 In this case as well, the viewing direction was basically changed according to the distributions shown in FIGS. 13 to 15. However, when the line-of-sight direction is further changed while the line-of-sight is averted, the distributions shown in FIGS. 13 to 15 are applied on the premise that the line-of-sight direction is set at the center and the line-of-sight direction is changed to a position other than the center. Furthermore, when the line of sight direction is changed more than once while looking away, actual observation results show that there is a tendency for the line of sight to return to the immediately previous direction, so in implementation, the line of sight is returned to the immediately previous direction.
 第3 評価実験
 1.評価実験1
 上記実施形態に係るロボット160について、個性を指定しない場合の視線方向の変化が観察者にどのように感じられるかについての評価実験を行った。
Third evaluation experiment 1. Evaluation experiment 1
Regarding the robot 160 according to the embodiment described above, an evaluation experiment was conducted to examine how a change in the line of sight direction is perceived by an observer when no personality is specified.
 A.モデル
 この実施形態では、役割ごとの視線パターン及び視線を逸らすときの瞳の動きが対話における重要な要素である。そのため、評価実験では以下の4つの条件によりロボット160の動きに対する評価を比較することにした。
A. Model In this embodiment, role-specific gaze patterns and pupil movements when looking away are important elements in the interaction. Therefore, in the evaluation experiment, it was decided to compare the evaluation of the movement of the robot 160 under the following four conditions.
 a.ベースライン(同割合-頭部モデル)
 ベースラインとして、ロボット160の役割が発話者の時、2人の対話者の方向に同じ割合でロボットが視線を向けるモデルを使用した。2人の対話者とそれ以外のところを見る割合であるが、第1の対話者、第2の対話者、及びそれ以外を見る割合をおおよそ3:3:4になるように設定した。この際、ロボットが視線を向ける場所については、人の顔又は2人の対話者の間に角度4度を分散とした2次元ガウス分布を適用して得られた場所をロボットが見るようにした。聞き手の時は違和感を所持させないように、ロボットが話者の方向を向くようにした。
a. Baseline (same proportion - head model)
As a baseline, we used a model in which the robot 160 turns its gaze at the same rate in the direction of two interlocutors when its role is as a speaker. Regarding the ratio of looking at the two interlocutors and the others, the ratio of looking at the first interlocutor, the second interlocutor, and the others was set to be approximately 3:3:4. At this time, the robot looks at the person's face or at a location obtained by applying a two-dimensional Gaussian distribution with an angle of 4 degrees distributed between the two interlocutors. . When acting as a listener, I made the robot face the direction of the speaker so as not to make the listener feel uncomfortable.
 b.比較モデル(同割合-頭部-眼球モデル)
 ロボットの視線制御では、眼球の動きが大切である。そのため、前述の同割合モデルにおいて、動かす部位を頭だけでなく眼球も追加し共に動かすモデルとした。
b. Comparison model (same ratio - head - eyeball model)
Eye movements are important for robot gaze control. Therefore, in the same-proportion model described above, we added not only the head but also the eyeballs and moved them together.
 c.評価モデル1(実施形態に係る頭部-眼球モデル)
 上記実施形態に基づいた視線制御及び視線逸らしの実装を行ったモデルである。
c. Evaluation model 1 (head-eyeball model according to embodiment)
This is a model that implements line-of-sight control and line-of-sight aversion based on the above embodiment.
 d.評価モデル2(実施形態に係る頭部モデル)
 前述の評価モデルの中で、眼球の動作をなくし、頭部動作のみを行うモデルである。
d. Evaluation model 2 (head model according to embodiment)
Among the evaluation models described above, this model eliminates eyeball movements and performs only head movements.
 B.印象評価
 この実施形態において最も期待したい効果はロボットの振る舞いが自然らしいかどうかである。そのため、評価者に対する質問として「全体的にロボットの振る舞いは自然に感じましたか」という質問項目を設けた。
B. Impression Evaluation The most desired effect in this embodiment is whether the robot's behavior seems natural. Therefore, we created a question for the evaluators: ``Overall, did the robot's behavior feel natural?''
 C.実験の設定
 この評価実験においては、実際に収録した3人により行われた対話の中の1人の会話音声をロボットに搭載したビデオを上記したモデルごとに作成した。その結果、4本のビデオが得られた。それらのビデオを評価者に視聴してもらい、その時の視線の動き方について評価してもらった。実際には図30により示すような動画を各評価者に見てもらった。
C. Experimental Settings In this evaluation experiment, a video was created for each of the above-mentioned models in which the robot was loaded with the audio of one of the conversations that were actually recorded between the three people. As a result, four videos were obtained. Evaluators were asked to watch these videos and evaluate the way their eyes moved during the video. Actually, each evaluator was asked to watch a video as shown in FIG. 30.
 男女30人(平均年齢32.1歳、分散10.3)に対して4つの動画を視聴してもらい、それぞれに対して印象評価をしてもらった。図30にこの実験において使用したビデオ画像720の1例を示す。実験はあくまでロボット160の視線方向の変化に関するものである。したがってこの評価実験では、発話は実際の人間による3者対話を使用し、その中の一人の発話をロボット160が行うようにした。その際には、ロボット160の視線方向をロボット160の役割、個性、ターン状態に従って、上記実施形態と同様に動くように制御した。実際の3者対話についてはラベル付がされているため、図16に示す役割判定部360及びターン交替検出部362などについては不要で適切なタイミングでそうしたラベルに応じた情報を視線動作生成部370に与えるようにした。 Thirty men and women (average age 32.1 years, variance 10.3) were asked to watch four videos and rate their impressions of each. FIG. 30 shows an example of a video image 720 used in this experiment. The experiment is strictly related to changes in the line of sight direction of the robot 160. Therefore, in this evaluation experiment, a three-party dialogue between actual humans was used for the utterances, and the robot 160 made the utterances of one of them. At that time, the line of sight direction of the robot 160 was controlled to move in the same manner as in the above embodiment according to the role, personality, and turn state of the robot 160. Since the actual three-way dialogue is labeled, the role determination unit 360 and turn change detection unit 362 shown in FIG. I decided to give it to
 ビデオ画像720は、図30に示すように中央にロボット160の動画を配置し、左右に他の2人の話者を示す簡単なキャラクタ画像730及び732を配置した。キャラクタ画像730及び732は3者対話であることを示すためのものだが、例えばロボット160の右側(図30の左側)に存在する話者が発話しているときにはキャラクタ画像730の周りに発話していることを示すサインを表示するようにした。ロボット160の左側(向かって右側)の話者の場合も同様である。図30は向かって右側の話者が発話している状況を示す。 In the video image 720, as shown in FIG. 30, a moving image of the robot 160 is placed in the center, and simple character images 730 and 732 showing the other two speakers are placed on the left and right sides. The character images 730 and 732 are used to indicate a three-way dialogue, but for example, when a speaker on the right side of the robot 160 (left side in FIG. 30) is speaking, the characters around the character image 730 are A sign will now be displayed to indicate that it is present. The same applies to the speaker on the left side of the robot 160 (on the right side when facing the robot 160). FIG. 30 shows a situation where the speaker on the right side is speaking.
 評価実験では、このような4本の動画を男女30人に視聴してもらい、印象評価を収集した。なお、この動画の視聴時には、向かって右側の話者の発話は評価者の右耳に、左側の話者の発話は評価者の左耳に、ロボットの発話は評価者の両耳に、それぞれ聞こえるようにした。こうした形で、前述した4つのモデルについてそれぞれ個別に印象評価をしてもらった。 In an evaluation experiment, 30 men and women watched these four videos and collected impression ratings. When viewing this video, the utterances of the speaker on the right will be heard in the evaluator's right ear, the utterances of the speaker on the left will be heard in the evaluator's left ear, and the utterances of the robot will be heard in the evaluator's both ears. I made it audible. In this way, we asked participants to evaluate their impressions of each of the four models mentioned above.
 実験手順は次のとおりである。まず、順序効果を減らすために4つの動画の提示順序をランダム化した。次に、4つの手法の動画について個別に7点スケール(1:とても不自然、4:どちらともいえない、7:とても自然)で評価者に印象評価してもらった。以上を1セットとし、このセットを3つの対話区間(各対話区間の長さは1分程度)に対し合計12個の動画について印象評価し、結果を集計した。 The experimental procedure is as follows. First, the presentation order of the four videos was randomized to reduce order effects. Next, the evaluators were asked to rate their impressions of the videos of the four methods individually on a 7-point scale (1: very unnatural, 4: neither, 7: very natural). The above was considered as one set, and impressions were evaluated for a total of 12 videos for three dialogue sections (each dialogue section was about 1 minute in length), and the results were tabulated.
 D.評価結果
上記3つの対話区間の中で、2つでは条件間に有意差が認められなかった。残りの1対話区間では図31に示すように条件間で有意差が見られた。図31に、「全体的にロボットの振る舞いは自然に感じましたか」という質問に対する評価結果を示す。各モデルについて、結果の平均値、標準誤差、ライアン法に基づいて多重比較の結果をそれぞれ算出した。
D. Evaluation Results Among the three dialogue sections mentioned above, no significant differences were observed between the conditions in two. In the remaining one dialogue section, a significant difference was observed between the conditions as shown in FIG. 31. FIG. 31 shows the evaluation results for the question "Overall, did the robot's behavior feel natural?" For each model, the results of multiple comparisons were calculated based on the mean value, standard error, and Ryan method of the results.
 ライアン法の結果、ベースラインモデルと評価モデル1の間にp値0.020(≦.05)、比較モデルと評価モデル1の間にp値0.001(≦.05)、評価モデル2と評価モデル1との間にp値0.008(≦.05)と、それぞれ有意差が見られた。 As a result of the Ryan method, the p value was 0.020 (≦.05) between the baseline model and evaluation model 1, the p value was 0.001 (≦.05) between the comparison model and evaluation model 1, and the p value was 0.001 (≦.05) between the comparison model and evaluation model 1. There was a significant difference between evaluation model 1 and p value 0.008 (≦.05).
 以上の結果から、上記4つのモデルの中では、実施形態にもとづいた評価モデル3によるロボットの振る舞いが最も自然であることが示された。 From the above results, it was shown that among the above four models, the robot behavior according to evaluation model 3 based on the embodiment is the most natural.
 2.評価実験2
 上記評価実験1の設定に加え、ロボットの視線制御に個性まで考慮し、視線動作による個性(外向性)の印象を評価するための実験を行った。
2. Evaluation experiment 2
In addition to the settings of Evaluation Experiment 1 above, we also took into account individuality in the robot's gaze control, and conducted an experiment to evaluate the impression of individuality (extroversion) based on gaze movements.
 A.実験手法
 視線動作を評価するための実験を行うため、まず3者対話データから話者A又は話者Bが参加している対話区間を抽出した。さらに、各話者の会話音声を用いて、ロボットの視線動作を生成し、動画を作成した。実験では複数の条件において生成した視線動作を評価してもらった。視線動作による個性表出の印象の違いを調べるため、話者Aのデータから作成したモデル(PA-M)と、話者Bのデータから作成したモデル(PB-M)を使ってロボットの視線動作を生成した。比較対象として、話者の視線動作を再現したもの(PA-R又はPB-R)と、動作なしで音声のみを呈示したもの(PA-S又はPB-S)も準備した。
A. Experimental Method In order to conduct an experiment to evaluate eye gaze movements, we first extracted dialogue sections in which speaker A or speaker B participated from the three-party dialogue data. Furthermore, the robot's gaze movements were generated using the conversational voices of each speaker, and a video was created. In the experiment, participants were asked to evaluate the gaze movements generated under multiple conditions. In order to investigate the difference in the impression of personality expression due to gaze movements, we used a model created from speaker A's data (PA-M) and a model created from speaker B's data (PB-M) to determine the robot's gaze. generated the action. For comparison purposes, we also prepared one that reproduced the speaker's gaze movements (PA-R or PB-R) and one that presented only voice without movements (PA-S or PB-S).
 話者Aと話者Bに関して、それぞれ40秒前後の会話区間を抽出し、上述の4つ条件で動画を収録し、被験者実験に用いた。動画の内容は図30に示したものと同様である。評価者への発話の聞こえ方も評価実験1と同様である。では左右(斜め前)に対話者がいることを明確にするため、図7(b)に示すように、画面の両側に対話者を示すキャラクタ画像を描き、声が発すると放射線が描かれるようにした。対話者の音声も左右の耳にそれぞれ聞こえるようにし、話者A又は話者Bの音声は両耳に聞こえるようにした。音声のみの動画では、対象となる話者もロボットの代わりに動作なしのキャラクタ画像を描いた。 Regarding speaker A and speaker B, conversation sections of around 40 seconds each were extracted, videos were recorded under the above four conditions, and used in the subject experiment. The content of the video is similar to that shown in FIG. 30. The way the utterances were heard by the evaluator was also the same as in Evaluation Experiment 1. In order to make it clear that there are interlocutors on the left and right (diagonally in front), we draw character images representing the interlocutors on both sides of the screen, as shown in Figure 7(b), so that when a voice is uttered, a radiation line is drawn. I made it. The voices of the interlocutors were also audible to the left and right ears, respectively, and the voices of speaker A or B were audible to both ears. In the audio-only video, the target speaker also drew a character image without movement instead of a robot.
 実験参加者は、各対話区間において、音声のみの動画(S-M)をまず視聴し、その後、ロボットの視線動作が含まれる3つの動画を視聴した。順序効果を減らすために、3つの動画の提示順序をランダム化した。各動画を視聴した直後に7点スケール(1:非常に内向的、4:どちらともいえない、7:非常に外向的)で印象評価をしてもらった。なお、実験参加者には視線動作を制御した実験であることは伝えなかった。 In each interaction section, the experiment participants first watched an audio-only video (SM), and then watched three videos that included the robot's gaze movements. The presentation order of the three videos was randomized to reduce order effects. Immediately after watching each video, participants were asked to rate their impressions on a 7-point scale (1: very introverted, 4: neutral, 7: very extroverted). Furthermore, the participants were not informed that the experiment involved controlling gaze movements.
 B. 実験結果
 男女41人(平均37.5歳、標準偏差14.1歳)が実験に参加した。図32に印象評価の結果を示す。
B. Experimental Results 41 men and women (mean age 37.5 years, standard deviation 14.1 years) participated in the experiment. Figure 32 shows the results of impression evaluation.
 a.話者Aの声
 図32(A)より、まず音声のみを聞いた場合(PA-S)の外向性の印象は、外向寄り(4以上)であることが分かる。ロボットを通して視線動作を制御した場合、話者Aのデータから構築した視線動作生成モデル(PA-M)は、話者Aの視線動作を再現した場合(PA-R)と同等の外向性の印象を受け、内向寄りである話者Bのデータから構築したモデル(PB-M)では外向性の印象が有意差をもって下がることが示されている。
a. Voice of Speaker A From FIG. 32(A), it can be seen that the impression of extroversion when listening only to the voice (PA-S) is toward extroversion (4 or more). When the gaze behavior is controlled through the robot, the gaze behavior generation model (PA-M) constructed from speaker A's data produces the same impression of extroversion as when the gaze behavior of speaker A is reproduced (PA-R). Based on this, a model (PB-M) constructed from the data of speaker B, who leans towards introversion, shows that the impression of extroversion decreases by a significant difference.
 b.話者Bの声
 図32(B)より、音声のみを聞いた場合(PB-S)は、内向寄り(4以下)であることが分かる。ロボットを通して視線動作を制御した場合、話者Bのデータから構築した視線動作生成モデル(PB-M)は、話者Bの視線動作を再現した場合(PB-R)と同等の外向性の印象で、外向寄りである話者Aのデータから構築したモデル(PA-M)では外向性の印象が有意的に上がることが示されている。
b. Voice of Speaker B From FIG. 32(B), it can be seen that when only the voice is heard (PB-S), the person is more introverted (4 or less). When the gaze behavior is controlled through the robot, the gaze behavior generation model (PB-M) constructed from speaker B's data produces the same impression of extroversion as when the gaze behavior of speaker B is reproduced (PB-R). The model (PA-M) constructed from the data of speaker A, who is more extroverted, has been shown to significantly increase the impression of extroversion.
 C. 考察
 以上の結果より、異なる個性(外向性)を持つ話者の視線生成モデルを使い分けることにより、同じ音声でも表出される個性の印象を変えることができることが示された。この結果より、モデルのパラメータを内挿又は外挿することにより、外向性の印象を連続的に制御できることも示唆された。
C. Discussion The above results show that by using different gaze generation models for speakers with different personalities (extraversion), it is possible to change the impression of personality expressed even in the same voice. These results also suggest that impressions of extraversion can be continuously controlled by interpolating or extrapolating model parameters.
 第4 変形例
 上記実施形態では、モデルには各方向に視線を向ける確率が格納され、実際に視線方向を決定するときにはそれら確率を加算してから乱数により視線方向をサンプリングしている。しかしこの発明はそのような実施形態には限定されない。例えばモデルには、あらかじめ各確率を加算した形の値(累積確率)を格納するようにしておいてもよい。上記実施形態では、視線逸らし時の方向は9方向としている。しかしこの発明はそのような実施形態には限定されない。より細かく方向を分類してもよい。
Fourth Modified Example In the above embodiment, the model stores the probabilities of directing the line of sight in each direction, and when actually determining the line of sight direction, these probabilities are added together and then the line of sight direction is sampled using random numbers. However, the invention is not limited to such embodiments. For example, the model may store in advance a value obtained by adding up each probability (cumulative probability). In the above embodiment, there are nine directions when looking away. However, the invention is not limited to such embodiments. Directions may be classified more finely.
 さらに上記実施形態では、ロボットなどのエージェントの対話上の役割と、ターン状態と、個性との組み合わせにより視線方向モデル及び視線継続モデルを準備し利用している。しかしこの発明はそのような実施形態には限定されない。これらの中のいずれかを抜いた形でモデルを準備してもよい。またこれらとは異なる基準を視線方向モデル選択のために用いてもよい。例えば年齢、教師と学生、親と子などの社会的立場を基準としてもよい。 Further, in the above embodiment, a line-of-sight direction model and a line-of-sight continuation model are prepared and used based on a combination of the interaction role, turn state, and personality of an agent such as a robot. However, the invention is not limited to such embodiments. A model may be prepared without any of these. Further, criteria different from these may be used for selecting the line-of-sight direction model. For example, age, social positions such as teacher and student, parent and child, etc. may be used as criteria.
 上記実施形態は3者対話に関するものである。しかし、上記実施形態の説明からも明らかなとおり、各発話者が区別され、その役割が分類できるような状況であれば4人以上の対話でも本発明を適用できる。 The above embodiment relates to three-party dialogue. However, as is clear from the description of the above embodiments, the present invention can be applied to conversations involving four or more people as long as each speaker can be distinguished and their roles can be classified.
 今回開示された実施の形態は単に例示であって、本発明が上記した実施の形態のみに制限されるわけではない。本発明の範囲は、発明の詳細な説明の記載を参酌した上で、請求の範囲の各請求項によって示され、そこに記載された文言と均等の意味及び範囲内での全ての変更を含む。 The embodiments disclosed herein are merely examples, and the present invention is not limited to the above-described embodiments. The scope of the present invention is indicated by each claim, with reference to the detailed description of the invention, and includes all changes within the scope and meanings equivalent to the words stated therein. .
50 3者対話
60、62、64 参加者
80、82、84 視線ラベル列
100、104 発話
102、106 ターン交替ラベル
108 ターン交替
150 会話ロボットシステム
160 ロボット
162 動作制御PC
164 マイクロフォン
166 音声処理PC
168 スピーカ
170 音声合成PC
172 確率モデル記憶装置
174 対話用PC
176 ネットワーク
178 人位置センサ
180 人位置認識PC
190 音声認識部
192 ターン認識部
250 コンピュータシステム
270 コンピュータ
272 モニタ
274 キーボード
276 マウス
278 DVD
284 USBメモリ
290 CPU
292 GPU
296 ROM
298 RAM
300 SSD
302 DVDドライブ
304 音声I/F
306 USBポート
308 ネットワークI/F
310 バス
350 視線制御システム
360 役割判定部
362 ターン交替検出部
364 視線方向モデル
366 モデル
368 個性情報記憶部
370 視線動作生成部
372 視線動作制御部
400 発話時視線方向モデル
402 視線継続時間モデル
404、406、408 ターン交替時視線方向モデル
450、452 方向モデル
 
50 Three- way dialogue 60, 62, 64 Participants 80, 82, 84 Gaze label string 100, 104 Utterances 102, 106 Turn change label 108 Turn change 150 Conversation robot system 160 Robot 162 Operation control PC
164 Microphone 166 Audio processing PC
168 Speaker 170 Voice synthesis PC
172 Probabilistic model storage device 174 PC for interaction
176 Network 178 Person position sensor 180 Person position recognition PC
190 Voice recognition unit 192 Turn recognition unit 250 Computer system 270 Computer 272 Monitor 274 Keyboard 276 Mouse 278 DVD
284 USB memory 290 CPU
292 GPUs
296 ROM
298 RAM
300 SSD
302 DVD drive 304 Audio I/F
306 USB port 308 Network I/F
310 Bus 350 Gaze control system 360 Role determination section 362 Turn change detection section 364 Gaze direction model 366 Model 368 Personality information storage section 370 Gaze motion generation section 372 Gaze motion control section 400 Gaze direction model during speech 402 Gaze duration model 404, 406 , 408 View direction model during turn change 450, 452 Direction model

Claims (17)

  1.  複数人対話におけるロボットの視線を制御するための視線制御装置であって、
     視線方向の決定のためのタイミングとなったことに応答して、前記複数人対話における前記ロボットの役割と対話フローの状態との組み合わせに基づいて、前記ロボットの視線方向を定めるための視線方向設定手段と、
     前記視線方向設定手段により前記ロボットの視線方向が定められたことに応答して、前記ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するための制御パラメータ生成手段とを含む、視線制御装置。
    A line-of-sight control device for controlling the line-of-sight of a robot in multi-person dialogue,
    Gaze direction setting for determining the gaze direction of the robot based on a combination of the role of the robot in the multi-person dialogue and the state of the dialogue flow in response to the timing for determining the gaze direction. means and
    control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs in response to the gaze direction of the robot being determined by the gaze direction setting means; , line of sight control device.
  2.  前記視線方向設定手段は、
     前記複数人対話における複数の参加者の各役割と、前記対話フローの状態との組み合わせに応じて、前記複数の参加者があらかじめ定められた複数の方向をそれぞれ向く確率を役割ごとに定める方向決定モデルを記憶するための方向決定モデル記憶手段と、
     視線方向の決定のための前記タイミングとなったことに応答して、前記方向決定モデルから前記ロボットの役割と前記対話フローの状態との組み合わせに応じた確率分布を抽出するための確率分布抽出手段と、
     前記確率分布抽出手段により抽出された確率分布から前記ロボットの視線方向をサンプリングするための第1サンプリング手段とを含む、請求項1に記載の視線制御装置。
    The line-of-sight direction setting means
    Direction determination that determines, for each role, the probability that the plurality of participants will face in a plurality of predetermined directions, according to a combination of each role of the plurality of participants in the multi-person dialogue and the state of the dialogue flow. a direction determining model storage means for storing the model;
    Probability distribution extraction means for extracting a probability distribution according to the combination of the role of the robot and the state of the dialogue flow from the direction determination model in response to the timing for determining the line of sight direction. and,
    The line-of-sight control device according to claim 1, further comprising: first sampling means for sampling the line-of-sight direction of the robot from the probability distribution extracted by the probability distribution extraction means.
  3.  前記方向決定モデルの前記複数の方向は、前記複数の参加者の方向と、前記複数の参加者の方向のいずれとも異なる視線逸らし方向とを含む、請求項2に記載の視線制御装置。 The line-of-sight control device according to claim 2, wherein the plurality of directions of the direction determination model include directions of the plurality of participants and a line-of-sight aversion direction that is different from any of the directions of the plurality of participants.
  4.  前記視線方向設定手段はさらに、
     前記ロボットの役割と前記対話フローの状態との組み合わせに応じて前記視線逸らし方向を確率的に決定するための確率モデルからなる視線逸らし方向モデルを記憶するための視線逸らし方向モデル記憶手段と、
     前記第1サンプリング手段によりサンプリングされた視線方向が、前記視線逸らし方向であることに応答して、前記視線逸らし方向モデルから前記ロボットの視線を逸らす方向をサンプリングするための第2サンプリング手段とを含む、請求項3に記載の視線制御装置。
    The viewing direction setting means further includes:
    a gaze aversion direction model storage means for storing a gaze aversion direction model comprising a probabilistic model for probabilistically determining the gaze aversion direction according to a combination of the role of the robot and the state of the dialogue flow;
    and second sampling means for sampling a direction in which the robot's line of sight is averted from the line of sight direction model in response to the line of sight direction sampled by the first sampling unit being the line of sight averting direction. The line of sight control device according to claim 3.
  5.  さらに、前記ロボットの役割と、前記対話フローの状態と、前記視線方向設定手段により定められた視線方向との組み合わせに応じて、前記ロボットの視線の継続時間を算出するための継続時間算出部を含む、請求項1から請求項4のいずれか1項に記載の視線制御装置。 Furthermore, a duration calculation unit for calculating the duration of the line of sight of the robot according to a combination of the role of the robot, the state of the dialogue flow, and the line of sight direction determined by the line of sight direction setting means. The line of sight control device according to any one of claims 1 to 4, comprising:
  6.  前記視線方向の決定のための前記タイミングは、前記対話フローの状態がターン交替状態のときと、それ以外のときとで異なる、請求項5に記載の視線制御装置。 The line-of-sight control device according to claim 5, wherein the timing for determining the line-of-sight direction is different when the dialog flow state is a turn change state and at other times.
  7.  前記視線方向の決定のための前記タイミングは、前記対話フローの状態がターン交替状態の
    ときにはターン交替状態におけるあらかじめ定められたタイミングであり、前記対話フローの状態がターン交替状態でないときには、直前に前記継続時間算出部により算出された継続時間が満了したタイミングである、請求項6に記載の視線制御装置。
    The timing for determining the line of sight direction is a predetermined timing in a turn change state when the state of the dialogue flow is a turn change state; The line of sight control device according to claim 6, wherein the timing is when the duration calculated by the duration calculation unit expires.
  8.  前記視線方向設定手段は、前記複数人対話における複数の参加者の各役割と、前記対話フローの状態と、前記ロボットに想定される個性との組み合わせに応じて、前記複数の参加者があらかじめ定められた複数の方向をそれぞれ向く確率を役割ごとに定める方向決定モデルを記憶するための方向決定モデル記憶手段と、
     視線方向の決定のための前記タイミングとなったことに応答して、前記方向決定モデルから前記ロボットの役割と前記対話フローの状態と前記個性との組み合わせに応じた確率分布を抽出するための確率分布抽出手段と、
     前記確率分布抽出手段により抽出された確率分布から前記ロボットの視線方向をサンプリングするための第1サンプリング手段とを含む、請求項1に記載の視線制御装置。
    The line-of-sight direction setting means is configured to be set in advance by the plurality of participants in accordance with a combination of the roles of the plurality of participants in the multi-person dialogue, the state of the dialogue flow, and the expected personality of the robot. direction determination model storage means for storing a direction determination model that determines the probability of facing each of the plurality of directions for each role;
    A probability for extracting a probability distribution according to a combination of the role of the robot, the state of the dialogue flow, and the personality from the direction determination model in response to the timing for determining the line of sight direction. a distribution extraction means;
    The line-of-sight control device according to claim 1, further comprising: first sampling means for sampling the line-of-sight direction of the robot from the probability distribution extracted by the probability distribution extraction means.
  9.  前記方向決定モデルの前記複数の方向は、前記複数の参加者の方向と、前記複数の参加者の方向のいずれとも異なる視線逸らし方向とを含む、請求項8に記載の視線制御装置。 The line-of-sight control device according to claim 8, wherein the plurality of directions of the direction determination model include directions of the plurality of participants and a line-of-sight aversion direction that is different from any of the directions of the plurality of participants.
  10.  前記視線方向設定手段はさらに、
     前記ロボットの役割と前記対話フローの状態と前記個性との組み合わせに応じて前記視線逸らし方向を確率的に決定するための確率モデルからなる視線逸らし方向モデルを記憶するための視線逸らし方向モデル記憶手段と、
     前記第1サンプリング手段によりサンプリングされた視線方向が、前記視線逸らし方向であることに応答して、前記視線逸らし方向モデルから前記ロボットの視線を逸らす方向をサンプリングするための第2サンプリング手段とを含む、請求項9に記載の視線制御装置。
    The viewing direction setting means further includes:
    Gaze aversion direction model storage means for storing a gaze aversion direction model consisting of a probabilistic model for probabilistically determining the gaze aversion direction according to a combination of the role of the robot, the state of the dialogue flow, and the individuality. and,
    and second sampling means for sampling a direction in which the robot's line of sight is averted from the line of sight direction model in response to the line of sight direction sampled by the first sampling unit being the line of sight averting direction. The line of sight control device according to claim 9.
  11.  さらに、前記ロボットの役割と、前記対話フローの状態と、前記個性と、前記視線方向設定手段により定められた視線方向との組み合わせに応じて、前記ロボットの視線の継続時間を算出するための継続時間算出部を含む、請求項8から請求項10のいずれか1項に記載の視線制御装置。 Further, a continuation for calculating the duration of the line of sight of the robot according to a combination of the role of the robot, the state of the dialogue flow, the personality, and the line of sight direction determined by the line of sight direction setting means. The line of sight control device according to any one of claims 8 to 10, comprising a time calculation section.
  12.  前記視線方向の決定のための前記タイミングは、前記対話フローの状態がターン交替状態のときと、それ以外のときとで異なる、請求項11に記載の視線制御装置。 The line-of-sight control device according to claim 11, wherein the timing for determining the line-of-sight direction differs between when the dialog flow state is a turn change state and at other times.
  13.  前記視線方向の決定のための前記タイミングは、前記対話フローの状態がターン交替状態のときには当該ターン交替状態におけるあらかじめ定められたタイミングであり、前記対話フローの状態がターン交替状態でないときには、直前に前記継続時間算出部により算出された継続時間が満了したタイミングである、請求項11に記載の視線制御装置。 The timing for determining the line of sight direction is a predetermined timing in the turn change state when the state of the dialogue flow is a turn change state, and a predetermined timing in the turn change state when the state of the dialogue flow is not a turn change state. The line of sight control device according to claim 11, wherein the timing is when the duration calculated by the duration calculation unit expires.
  14.  複数人対話におけるロボットの視線を制御するための、コンピュータにより実現される視線制御方法であって、
     コンピュータが、視線方向の決定のためのタイミングとなったことに応答して、前記複数人対話における前記ロボットの役割と対話フローの状態との組み合わせに基づいて、前記ロボットの視線方向を定めるステップと、
     コンピュータが、前記視線方向を定めるステップにおいて前記ロボットの視線方向が定められたことに応答して、前記ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するステップとを含む、視線制御方法。
    A computer-implemented gaze control method for controlling the gaze of a robot in multi-person dialogue, the method comprising:
    a step in which the computer determines the line of sight direction of the robot based on a combination of the role of the robot in the multi-person interaction and the state of the dialog flow in response to the timing for determining the line of sight direction; ,
    the computer generating control parameters for controlling the direction of the face and the direction of the eyes of the robot in response to the direction of the robot's line of sight being determined in the step of determining the direction of the line of sight; Gaze control method.
  15.  複数人対話におけるロボットの視線を制御するためのコンピュータプログラムであって、コンピュータを、
     視線方向の決定のためのタイミングとなったことに応答して、前記複数人対話における前記ロボットの役割と対話フローの状態との組み合わせに基づいて、前記ロボットの視線方向を定めるための視線方向設定手段と、
     前記視線方向設定手段により前記ロボットの視線方向が定められたことに応答して、前記ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するための制御パラメータ生成手段として機能させる、コンピュータプログラム。
    A computer program for controlling the line of sight of a robot in multi-person dialogue, the computer program
    Gaze direction setting for determining the gaze direction of the robot based on a combination of the role of the robot in the multi-person dialogue and the state of the dialogue flow in response to the timing for determining the gaze direction. means and
    Functions as a control parameter generation means for generating control parameters for controlling the direction of the robot's face and the direction of the eyeballs in response to the gaze direction of the robot being determined by the gaze direction setting means. , computer program.
  16.  複数人対話におけるロボットの視線を制御するための、プロセッサを持つコンピュータにより実現される視線制御装置であって、
     前記プロセッサは、
     前記ロボットの視線方向の決定のためのタイミングとなったことに応答して、前記複数人対話における前記ロボットの役割と対話フローの状態との組み合わせに基づいて、前記ロボットの視線方向を定め、
     前記ロボットの前記視線方向が定められたことに応答して、前記ロボットの顔の向き及び眼球の方向を制御するための制御パラメータを生成するように構成されている、視線制御装置。
    A line-of-sight control device realized by a computer with a processor for controlling the line-of-sight of a robot in multi-person dialogue,
    The processor includes:
    In response to the timing for determining the line of sight direction of the robot, determining the line of sight direction of the robot based on a combination of the role of the robot in the multi-person dialogue and the state of the dialogue flow;
    A line-of-sight control device configured to generate control parameters for controlling a face direction and an eyeball direction of the robot in response to determining the line-of-sight direction of the robot.
  17.  請求項16に記載の視線制御装置として機能するようコンピュータを動作させるコンピュータプログラムを記憶した、非一時的な、コンピュータ読み出し可能な非一時的記憶媒体。 A non-transitory, computer-readable storage medium storing a computer program for operating a computer to function as the line-of-sight control device according to claim 16.
PCT/JP2022/038670 2022-05-27 2022-10-18 Line-of-sight control device and method, non-temporary storage medium, and computer program WO2023228433A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022086674 2022-05-27
JP2022-086674 2022-05-27

Publications (1)

Publication Number Publication Date
WO2023228433A1 true WO2023228433A1 (en) 2023-11-30

Family

ID=88918869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/038670 WO2023228433A1 (en) 2022-05-27 2022-10-18 Line-of-sight control device and method, non-temporary storage medium, and computer program

Country Status (1)

Country Link
WO (1) WO2023228433A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222437A (en) * 2001-01-26 2002-08-09 Nippon Telegr & Teleph Corp <Ntt> Device and method for personification interface, personification interface program, and recording medium with recorded personification interface program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222437A (en) * 2001-01-26 2002-08-09 Nippon Telegr & Teleph Corp <Ntt> Device and method for personification interface, personification interface program, and recording medium with recorded personification interface program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FUKAYAMA ATSUSHI, OHNO TAKEHIKO, MUKAWA NAOKI, SAWAKI MINAKO, HAGITA NORIHIRO: "Gaze Control Method for Impression Management of Interface Agents", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 43, no. 12, 15 December 2002 (2002-12-15), pages 3596 - 3606, XP093112127 *
SHINTANI, ISHI CARLOS T, ISHIGURO HIROSHI: "Analysis of Role-Based Gaze Behaviors and their Implementation in a Robot in Multiparty Conversation", THE 57TH JSAI SIG ON AI CHALLENGE, JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 21 November 2020 (2020-11-21), pages 106 - 114, XP093112124 *

Similar Documents

Publication Publication Date Title
US11151997B2 (en) Dialog system, dialog method, dialog apparatus and program
Colburn et al. The role of eye gaze in avatar mediated conversational interfaces
WO2017200074A1 (en) Dialog method, dialog system, dialog device, and program
WO2017200080A1 (en) Intercommunication method, intercommunication device, and program
JP5294315B2 (en) Dialogue activation robot
Johansson et al. Opportunities and obligations to take turns in collaborative multi-party human-robot interaction
Papaioannou et al. Hybrid chat and task dialogue for more engaging hri using reinforcement learning
Clavel et al. Fostering user engagement in face-to-face human-agent interactions: a survey
Bilac et al. Gaze and filled pause detection for smooth human-robot conversations
Ehret et al. Who's next? Integrating Non-Verbal Turn-Taking Cues for Embodied Conversational Agents
JP2010110864A (en) Robot system and method and program for activating communication
WO2021210332A1 (en) Information processing device, information processing system, information processing method, and program
JP7124715B2 (en) Information processing device, information processing method, and program
JP2004234631A (en) System for managing interaction between user and interactive embodied agent, and method for managing interaction of interactive embodied agent with user
El Haddad et al. Laughter and smile processing for human-computer interactions
WO2023228433A1 (en) Line-of-sight control device and method, non-temporary storage medium, and computer program
Bohus et al. Computational models for multiparty turn taking
JP2003108502A (en) Physical media communication system
Nixon et al. An eye gaze model for controlling the display of social status in believable virtual humans
Huang et al. Can a Virtual Listener Replace a Human Listener in Active Listening Conversation?
WO2017200077A1 (en) Dialog method, dialog system, dialog device, and program
JP7286303B2 (en) Conference support system and conference robot
Carvalho et al. Investigating and Comparing the Perceptions of Voice Interaction in Digital Games: Opportunities for Health and Wellness Applications
Babu et al. Marve: a prototype virtual human interface framework for studying human-virtual human interaction
Niculescu et al. Socializing with Olivia, the youngest robot receptionist outside the lab

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943832

Country of ref document: EP

Kind code of ref document: A1