WO2024111121A1

WO2024111121A1 - Interview support device, interview support method, and program

Info

Publication number: WO2024111121A1
Application number: PCT/JP2022/043607
Authority: WO
Inventors: 香央里藤村; 大河佐野; 妙佐藤; 康雄石榑; 麻美宮島
Original assignee: 日本電信電話株式会社
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2024-05-30

Abstract

An interview support device according to an embodiment of the present disclosure comprises: a determination unit configured to determine whether or not an emotional change has occurred in a first subject during an interview using at least one of the voice of the first subject and a first voice recognition result that represents the result of voice recognition of the voice of the first subject; a conversation extraction unit configured to, if it is determined that an emotional change has occurred in the first subject, extract a conversation comprising a plurality of utterances including an utterance that caused the emotional change, from the first voice recognition result and a second voice recognition result that represents the result of voice recognition of the voice of a first interviewer who is interviewing the first subject; a case creation unit configured to create case information in which the characteristics of the first subject are associated with the conversation, and store the created case information in a storage unit; and a similar case presentation unit configured to present, as similar case information, case information that is included in the case information stored in the storage unit, and that includes characteristics similar to the characteristics of a second subject to be interviewed, to a second interviewer who will interview the second subject.

Description

Interview support device, interview support method, and program

This disclosure relates to an interview support device, an interview support method, and a program.

When an interviewer interviews an interviewee (the person being interviewed; hereafter also referred to as the "subject"), the interviewer picks up on the subject's emotional changes from facial expressions, movements, gaze, responses, etc., and from these emotional changes he or she is able to grasp the subject's interest or concern in a particular topic or matter. For example, in an interview regarding health guidance, the interviewer will grasp from emotional changes picked up from changes in the subject's tone of voice, volume of speech, etc., whether a rapport has been established with the subject, whether the subject has become interested in health, whether they feel motivated to improve their lifestyle, whether they are sufficiently motivated to take action, etc.

In addition, one known study to clarify the motivational factors that motivate people who receive specific health guidance to engage in health promotion behavior is to conduct semi-structured interviews with people who receive specific health guidance and extract motivational factors from the content of the interviews (Non-Patent Document 1).

However, in various consultations (including interviews, conversations, etc.) including health guidance, the know-how of conversations that induce emotional changes in the subject is accumulated by each individual interviewer and is not shared. Furthermore, conversations that induce emotional changes in the subject may differ depending on the characteristics of the subject (for example, the subject's personality, lifestyle, level of health awareness, etc.).

The present disclosure has been made in consideration of the above points, and provides technology that can present an interviewee with a conversation that induces an emotional change in the subject according to the subject's characteristics.

The interview support device according to one aspect of the present disclosure includes a determination unit configured to determine whether or not an emotional change has occurred in the first subject using at least one of the voice of the first subject during an interview and a first voice recognition result representing the result of voice recognition of the voice of the first subject; a conversation extraction unit configured to extract a conversation consisting of a plurality of utterances including an utterance in which the emotional change has occurred from the first voice recognition result and a second voice recognition result representing the result of voice recognition of the voice of the first interviewer who is interviewing the first subject when it is determined that an emotional change has occurred in the first subject; a case creation unit configured to create case information that associates the characteristics of the first subject with the conversation and store the case information in a storage unit; and a similar case presentation unit configured to present case information stored in the storage unit that includes characteristics similar to the characteristics of the second subject to be interviewed as similar case information to the second interviewer who is interviewing the second subject.

Technology is provided that can present an interviewee with a conversation that induces emotional changes in the subject based on the subject's characteristics.

1 is a diagram illustrating an example of a hardware configuration of an interview support device according to an embodiment of the present invention. 1 is a diagram illustrating an example of a functional configuration of an interview support device according to an embodiment of the present invention. FIG. 11 is a diagram illustrating an example of case information. 10 is a flowchart illustrating an example of a case creation process according to the present embodiment. 10 is a flowchart illustrating an example of a similar case presentation process according to the present embodiment. FIG. 13 is a diagram showing an example of a presentation result of subject characteristics included in similar cases. FIG. 13 is a diagram showing an example of a presentation result of conversations included in similar cases.

An embodiment of the present invention will be described below. In the following embodiment, an interview support device 10 will be described that can present a conversation that will cause an emotional change in the subject of the interview according to the characteristics of the subject of the interview, for various interviews (including interviews, conversations, etc.) including health guidance. The interview support device 10 of this embodiment allows the interviewer to know the conversation that will cause an emotional change in the subject according to the characteristics of the subject, making it possible to effectively conduct an interview with the subject. For example, in an interview regarding health guidance, the interviewer can refer to the conversation presented by the interview support device 10 and effectively have a conversation that encourages the subject to change their behavior, such as improving their lifestyle habits.

In addition, emotional changes may also be called "mental changes," and a conversation that causes an emotional change in a subject refers to a conversation that touches or penetrates the subject's heart.

In the following, an interview regarding health guidance is assumed as the interview. However, this is only one example, and interviews to which the interview support device 10 according to this embodiment can be applied are not limited to interviews regarding health guidance. In addition to interviews regarding health guidance, the device can be applied to various interviews (including interviews) such as interviews regarding career guidance at schools, interviews regarding learning guidance at cram schools, personnel interviews and business interviews at companies, and employment interviews. More generally, the device can be applied to cases in which a certain person (interviewer) has some kind of conversation with one or more other people (subjects). The interview (including interviews, conversations, etc.) may be in the form of an online interview (including web interviews, etc.) or a face-to-face interview.

Here, the interview support device 10 according to this embodiment executes two processes, a "case creation process" and a "similar case presentation process." The "case creation process" is a process for creating case information that associates the characteristics of the subject with a conversation that caused an emotional change in the subject, using the results of speech recognition of the voices of the interviewee and the subject, the results of a preliminary interview, etc. On the other hand, the "similar case presentation process" is a process for acquiring case information that includes characteristics similar to the characteristics of the subject from the case information created in the case creation process as similar case information, and presenting the conversation included in this similar case information to the interviewee. The case creation process is executed before the similar case presentation process, but after a certain amount of case information has been created, for example, the case creation process may be executed in the background of the similar case presentation process, or the case creation process may be executed periodically or non-periodically.

In the following, the interviewer and the subject in the case creation process are also referred to as the "first interviewer" and the "first subject", respectively, and the interviewer and the subject in the similar case presentation process are also referred to as the "second interviewer" and the "second subject". The characteristics of the first subject are also referred to as the "first subject characteristics", and the characteristics of the second subject are also referred to as the "second subject characteristics". Here, the subject's characteristics are the nature that represents the characteristics of the subject. For example, the characteristics of a subject in an interview regarding health guidance include the test results in a health check as the subject's physical characteristics, personality tendencies and health consciousness as psychological characteristics, and occupation, family structure, and the subject's lifestyle as social characteristics. However, these are all examples, and the subject's characteristics are not limited to these, and various characteristics can be used as the subject's characteristics depending on the type of interview.

<Example of Hardware Configuration of Interview Support Device 10>
An example of the hardware configuration of an interview support device 10 according to this embodiment is shown in Fig. 1. As shown in Fig. 1, the interview support device 10 according to this embodiment is realized with the hardware configuration of a general computer, and has, for example, an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a RAM (Random Access Memory) 105, a ROM (Read Only Memory) 106, an auxiliary storage device 107, and a processor 108. Each of these pieces of hardware are connected to each other so as to be able to communicate with each other via a bus 109.

The input device 101 is, for example, a keyboard, a mouse, a touch panel, a physical button, etc. The display device 102 is, for example, a display, a display panel, etc. Note that the interview support device 10 does not have to have at least one of the input device 101 and the display device 102, for example.

The external I/F 103 is an interface with external devices such as a recording medium 103a. Examples of recording media 103a include a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.

The communication I/F 104 is an interface for connecting the interview support device 10 to a communication network. The RAM 105 is a volatile semiconductor memory (storage device) that temporarily stores programs and data. The ROM 106 is a non-volatile semiconductor memory (storage device) that can store programs and data even when the power is turned off. The auxiliary storage device 107 is a non-volatile storage device such as a HDD (Hard Disk Drive), SSD (Solid State Drive), flash memory, etc. The processor 108 is, for example, various types of arithmetic devices such as a CPU (Central Processing Unit).

Note that the hardware configuration shown in FIG. 1 is an example, and the hardware configuration of the interview support device 10 is not limited to this. For example, the interview support device 10 may have multiple auxiliary storage devices 107 or multiple processors 108, may not have some of the hardware shown in the figure, or may have various hardware other than the hardware shown in the figure (e.g., a microphone, a speaker, a camera, etc.).

<Example of functional configuration of interview support device 10>
An example of the functional configuration of the interview support device 10 according to this embodiment is shown in Fig. 2. As shown in Fig. 2, the interview support device 10 according to this embodiment has a case creation processing unit 210 and a similar case presentation processing unit 220. Each of these units is realized, for example, by processing in which one or more programs installed in the interview support device 10 are executed by the processor 108 or the like. The interview support device 10 according to this embodiment also has a conversation DB 230 and a case information DB 240. Each of these DBs (databases) is realized, for example, by the auxiliary storage device 107 or the like. However, either or both of the conversation DB 230 and the case information DB 240 may also be realized by a storage device provided in a database server or the like connected to the interview support device 10 via a communications network.

In the example shown in FIG. 2, the case creation processing unit 210, the similar case presentation processing unit 220, the conversation DB 230, and the case information DB 240 are included in the interview support device 10 realized by a single computer, but the case creation processing unit 210, the similar case presentation processing unit 220, the conversation DB 230, and the case information DB 240 may be distributed among multiple computers. In this case, the system realized by these multiple computers may be called an "interview support system" or the like.

The case creation processing unit 210 executes the case creation process. Here, the case creation processing unit 210 includes a first interviewee voice recording unit 211, a first interviewee voice recognition unit 212, a first subject voice recording unit 213, a first subject voice recognition unit 214, an emotion change analysis unit 215, a conversation extraction unit 216, a first subject characteristic acquisition unit 217, and a case information creation unit 218.

The first interviewee voice recording unit 211 records the voice of the first interviewee input to the microphone for the first interviewee to create voice data (hereinafter also referred to as first interviewee voice data). The microphone for the first interviewee may be directly connected to or built into the interview support device 10, or may be connected to the interview support device 10 via a communication network, or in the case of an online interview, may be directly connected to or built into a terminal (PC (personal computer), smartphone, tablet terminal) used by the first interviewee.

The first interviewer voice recognition unit 212 performs voice recognition on the voice represented by the first interviewer voice data, and creates text data with time information (hereinafter also referred to as first interviewer text data) in which text representing the content of the voice is associated with the time of speech. The first interviewer voice recognition unit 212 also stores this first interviewer text data in the conversation DB 230. The first interviewer voice recognition unit 212 may create the first interviewer text data using existing voice recognition technology.

The first subject voice recording unit 213 records the voice of the first subject input to the microphone for the first subject to create voice data (hereinafter also referred to as first subject voice data). The microphone for the first subject may be directly connected to or built into the interview support device 10, or may be connected to the interview support device 10 via a communication network, or in the case of an online interview, may be directly connected to or built into a terminal (PC (personal computer), smartphone, tablet terminal) used by the first subject.

The first subject voice recognition unit 214 performs voice recognition on the voice represented by the first subject voice data, and creates text data with time information (hereinafter also referred to as first subject text data) in which text representing the spoken content of the voice is associated with the time of utterance. The first subject voice recognition unit 214 also stores this first subject text data in the conversation DB 230. The first subject voice recognition unit 214 may create the first subject text data using existing voice recognition technology.

In the case of a face-to-face interview, the microphone for the first interviewer and the microphone for the first subject may be common. In this case, the first interviewer voice recording unit 211 and the first subject voice recording unit 213, and the first interviewer voice recognition unit 212 and the first subject voice recognition unit 214 may each be common, and the voice recognition technology used may be one that is capable of voice recognition for each speaker.

The emotion change analysis unit 215 performs emotion analysis or emotion recognition (hereinafter collectively referred to as "emotion analysis") using the first subject's voice data and the first subject's text data to determine whether or not there has been an emotional change in the first subject, and if an emotional change has occurred, identifies the time of the change.

When it is determined that an emotional change has occurred in the first subject, the conversation extraction unit 216 uses the first interviewee text data and the first subject text data stored in the conversation DB 230 to extract a conversation including the utterance that caused the emotional change and the utterances before and after it.

The first subject characteristic acquisition unit 217 acquires the first subject characteristic. For example, in an interview regarding health guidance, the first subject characteristic acquisition unit 217 may acquire the first subject characteristic from data showing the results of a health check or data from a medical questionnaire. Also, for example, the first subject characteristic acquisition unit 217 may acquire the first subject characteristic from at least one of the first interviewee text data and the first subject text data.

The case information creation unit 218 creates case information that associates a subject ID indicating identification information for identifying the first subject, the first subject characteristics acquired by the first subject characteristics acquisition unit 217, the time at which an emotional change occurred in the first subject, and the conversation extracted by the conversation extraction unit 216. The case information creation unit 218 also stores the case information in the case information DB 240.

The similar case presentation processing unit 220 executes the similar case presentation process. Here, the similar case presentation processing unit 220 includes a second subject characteristic acquisition unit 221, a similar case acquisition unit 222, and a similar case presentation unit 223.

The second subject characteristic acquisition unit 221 acquires the second subject characteristic. The second subject characteristic acquisition unit 221 may acquire the second subject characteristic in the same manner as the first subject characteristic acquisition unit 217. That is, for example, in an interview regarding health guidance, the second subject characteristic acquisition unit 221 may acquire the second subject characteristic from data representing the results of a health check or data from a medical questionnaire. In addition, when there is a result of speech recognition of the voice of the second interviewee and the voice of the second subject (i.e., second interviewee text data and second subject text data), the second subject characteristic acquisition unit 221 may acquire the second subject characteristic from at least one of the second interviewee text data and the second subject text data.

The similar case acquisition unit 222 acquires case information including a first subject characteristic that is similar to a second subject characteristic from the case information DB 240 as similar case information.

The similar case presentation unit 223 presents the first subject characteristics and conversation contained in the similar case information to the second interviewer. As a result, the conversation that caused an emotional change in a subject with characteristics similar to the second subject with whom the second interviewer is currently interviewing is presented to the second interviewer. This allows the second interviewer to refer to this conversation and effectively conduct the interview with the second subject.

The conversation DB 230 stores the first interviewee text data and the first target person text data.

The case information DB240 stores case information. An example of case information will be described later.

Note that the functional configuration of the interview support device 10 shown in FIG. 2 is an example and is not limited to this. For example, there may be a "second interviewee voice recording unit" that records the voice of the second interviewee, a "second interviewee voice recognition unit" that performs voice recognition on the voice of the second interviewee to create second interviewee text data, a "second subject voice recording unit" that records the voice of the second subject, and a "second subject voice recognition unit" that performs voice recognition on the voice of the second subject to create second subject text data.

<Case Information>
An example of case information for a health guidance interview is shown in Fig. 3. As shown in Fig. 3, the case information includes a "subject ID" indicating identification information for identifying a first subject, a "first subject characteristic" indicating the characteristic of the first subject, an "emotion change time" indicating the time when an emotion change occurred, and a "conversation" indicating the conversation extracted by the conversation extraction unit 216.

For example, the case information shown in FIG. 3 includes a subject ID "B1". The case information shown in FIG. 3 also includes items representing characteristics classified as "physical", "psychological", "social", "lifestyle", etc. as the first subject characteristic, and the physical aspect includes items such as "Test value: BMI", "Test value: high blood pressure", and "Test value: hyperglycemia". Similarly, the psychological aspect includes items such as "personality tendency: scrupulous", "personality tendency: lazy", "personality tendency: logical", "health consciousness: high", and "health consciousness: low", the social aspect includes items such as "family composition: spouse", "family composition: children", and the lifestyle habit includes items such as "exercise/golf", "exercise/running", "drinking", "smoking", and "sleep". In the example shown in FIG. 3, the items representing each of these characteristics are expressed as two values (for example, "1" if a certain condition corresponding to the item is met, and "0" if it is not met). For example, "Test value: BMI" is expressed as "1" if the BMI value is outside a predetermined standard range, and as "0" otherwise. Similarly, for example, "personality tendency: meticulous" is represented as "1" if the personality is meticulous, and as "0" if not. Similarly, for example, "drinking" is represented as "1" if the person has a drinking habit, and as "0" if not. The same applies to items that represent other characteristics.

Also, for example, the case information shown in FIG. 3 includes the emotion change time "2022/9/20 11:40:22". Furthermore, the case information shown in FIG. 3 includes a conversation including an utterance at the time when the emotion change occurred, and two utterances by the first interviewer and the first subject before and after that time, as a conversation including utterances before and after the emotion change occurred. For example, in the example shown in FIG. 3, an emotion change occurred with the utterance of the first subject (the utterance underlined in FIG. 3), "I see. What happens if your blood vessels are damaged?", and the case information includes two utterances by the first interviewer before and after the utterance at which the emotion change occurred, and two utterances by the second subject before and after the utterance at which the emotion change occurred.

<Case information creation process>
The case creation process according to this embodiment will be described below with reference to Fig. 4. Note that the following steps S101 to S109 are repeatedly executed every time an interview is conducted between the first interviewer and the first target person, for example.

The first interviewee voice recording unit 211 records the voice of the first interviewee to create first interviewee voice data (step S101).

The first interviewee voice recognition unit 212 performs voice recognition on the voice represented by the first interviewee voice data created in step S101 above, and creates first interviewee text data (step S102). The first interviewee text data is stored in the conversation DB 230.

The first subject voice recording unit 213 records the voice of the first subject and creates first subject voice data (step S103).

The first subject voice recognition unit 214 performs voice recognition on the voice represented by the first subject voice data created in step S103 above, and creates first subject text data (step S104). The first subject text data is stored in the conversation DB 230.

Note that the above steps S101 to S102 and steps S103 to S104 are repeatedly executed, for example, every unit time (for example, a predetermined time span of several tens of seconds to about one minute), and the next step S105 and subsequent steps are executed after the end of the interview. That is, the first interviewee text data and the first target person text data are stored in the conversation DB 230 for each unit time. However, this does not necessarily have to be every unit time, and for example, the voice from the start to the end of the interview may be recorded in steps S101 and S103, and the voice may be subjected to voice recognition in steps S102 and S104 to create the first interviewee text data and the first target person text data. In addition, the next step S105 does not necessarily have to be executed after the end of the interview, and may be executed during the interview, for example, after a certain amount of the first interviewee text data and the first target person text data have been stored in the conversation DB 230.

Then, the emotion change analysis unit 215 performs emotion analysis using the first subject's voice data and the first subject's text data to determine whether or not there has been an emotion change in the first subject, and if an emotion change has occurred, to identify the time of the emotion change (step S105). Here, several examples of emotion analysis by the emotion change analysis unit 215 are described below.

・Sentiment analysis example 1
In this emotion analysis example, the emotion change analysis unit 215 analyzes emotion changes from the tone of the first subject's voice, the number of backchannels, and the number of utterances. This is because it is considered that the higher the tone of the voice, the more backchannels, and the more utterances, the more the first subject is interested in the approach from the first interviewer (e.g., health guidance, etc.), and an emotion change is occurring.

Specifically, the emotion change analysis unit 215 analyzes emotion changes for each unit time by following steps 11 to 18 below. For simplicity's sake, the unit time is represented as t, and an explanation will be given below of the case where emotion changes are analyzed in a certain unit time t.

Step 11: The emotion change analysis unit 215 uses the first subject's voice data for unit time t to obtain a fundamental frequency _xt that indicates the pitch of the voice for unit time t.

Step 12: The emotion change analysis unit 215 obtains the number of backchannels _yt in the unit time t by using the first subject text data in the unit time t and a dictionary in which words used for backchannels are registered. Examples of words used for backchannels include "Yes,""Yeah,""Isee,""That'sright,""Uh-huh," and "That's right."

Step 13: The emotion change analysis unit 215 obtains the number of utterances _zt in unit time t by using at least one of the first subject's voice data in unit time t and the first subject's text data in unit time t. Note that an utterance is a unit of speech that has syntactic, discursive, and interactive coherence (Reference 1).

Step 14: The emotion change analysis unit 215 determines whether or not at least one of _{xt >} _thx , yt > _thy , and _zt > _thz is satisfied, where _thx is the threshold for the fundamental frequency, _thy is the threshold for the number of backchannels, _and _thz is the threshold for the number of utterances per unit time. Note that the values of these thresholds _thx , _thy , and _thz are set in advance.

Step 15: If it is determined in step 14 above that none of _xt > _thx , _yt > _thy , and _zt > _thz is satisfied, the emotion change analysis unit 215 determines that no emotion change has occurred in the first subject in unit time t. On the other hand, if it is determined in step 14 above that at least one of x> _thx , y> _thy , and z> _thz is satisfied, the emotion change analysis unit 215 calculates the index value S _t as follows.

S _t = (a x _t + b x y _t + c x z _t ) / (a + b + c)
Here, a is a weight for the fundamental frequency, b is a weight for the number of backchannels, and c is a weight for the number of utterances. The values of a, b, and c are set in advance. The index value S _t indicates the degree to which the emotion of the first subject has changed in unit time t, and the higher the value, the greater the change in the emotion of the first subject in unit time t. This index value S _t may be called, for example, "degree of sensitivity."

Step 16: The emotion change analysis unit 215 determines whether S _t > th _S is satisfied, with th _S being the threshold for the index value. The value of the threshold th _S is set in advance. There are various methods for determining the value of the threshold th _S. For example, interviews regarding health guidance are conducted with multiple subjects in the same environment, and when the interviewer judges that an emotional change has occurred in the subject (for example, the subject has become more health conscious, has become interested in health behavior, has felt the need for behavioral change, etc.), the voice pitch (fundamental frequency), the number of backchannels, and the number of utterances per unit time are measured, and the value of the threshold th _S is determined using these as teachers.

Step 17: If it is determined in step 16 above that S _t > th _S is not satisfied, the emotion change analysis unit 215 determines that no emotion change has occurred in the first subject in the unit time t. On the other hand, if it is determined in step 16 above that S _t > th _S is satisfied, the emotion change analysis unit 215 determines that an emotion change has occurred in the first subject in the unit time t.

Step 18: If it is determined in step 17 above that an emotional change has occurred in the first subject, the emotional change analysis unit 215 identifies the time when the emotional change occurred in the first subject (hereinafter also referred to as the emotional change time). For example, the emotional change analysis unit 215 may identify the start time, end time, or intermediate time of the unit time t as the emotional change time. Alternatively, for example, if the unit time t is a relatively long time span (for example, about one minute or more), the emotional change analysis unit 215 may find the amount of change in the pitch of the first subject's voice (fundamental frequency) or the amount of change in the number of utterances within that unit time t, and identify the time when the amount of change in the pitch of the voice becomes equal to or exceeds a predetermined threshold or the time when the amount of change in the number of utterances becomes equal to or exceeds a predetermined threshold as the emotional change time.

・Sentiment analysis example 2
When an emotion change occurs, for example, the first subject's arm movement, upper body movement, head movement, and head up and down movement (i.e., nodding, etc.) may increase. For this reason, when calculating the index value S _t in step 15 of the emotion analysis example 1 above, the amount of movement u _t of the first subject in unit time t may be taken into consideration. That is, in step 15 of the emotion analysis example 1 above, the emotion change analysis unit 215 may calculate the index value S _t as follows.

S _t = (a x _t + b x y _t + c x z _t + d x u _t ) / (a + b + c + d)
Here, d is a weight for the amount of movement of the first subject, and its value is set in advance.

The amount of movement u _t can be calculated from image data of the first subject when the image data is available. In addition, the amount of movement u t can be calculated from the sensor value when an acceleration sensor or motion sensor is attached to the head or arm of the first subject and the sensor value is available.

・Sentiment analysis example 3
In the above emotion analysis example 1 or emotion analysis example 2, the amount of speech of the first subject (Reference 2) may be used instead of or in addition to the pitch of the voice, the number of backchannels, and the number of utterances. The amount of speech refers to the speech time, speech frequency, and speech length. Note that the speech time indicates the amount of speech generated relative to the data length (i.e., unit time) as a percentage. The speech frequency refers to the number of utterances per unit time. The speech length refers to the average time from the start to the end of one utterance.

・Sentiment analysis example 4
The presence or absence of a change in emotion and the time of the change may be determined by using an existing emotion analysis technique. Examples of the existing emotion analysis technique include those described in References 3 to 8. References 3 and 8 are techniques for analyzing emotion from audio, References 4 and 6 are techniques for analyzing emotion from video, Reference 7 is a technique for analyzing emotion from text, and Reference 5 is a technique for analyzing emotion from text, audio, and video.

If it is not determined in step S105 that the first subject has experienced an emotional change in all unit times t (NO in step S106), the case creation processing unit 210 ends the case creation process. On the other hand, if it is determined in step S105 that the first subject has experienced an emotional change in a certain unit time t (YES in step S106), the conversation extraction unit 216 extracts a conversation including the utterance in which the emotional change occurred and the utterances before and after the utterance from the conversation DB 230 (step S107). That is, the conversation extraction unit 216 extracts, for example, the utterance of the first subject at the utterance time closest to the time of the emotional change of the first subject, N utterances of the first subject before and after the utterance time, and the utterance of the first interviewer before and after the utterance time from the conversation DB 230. As a result, a conversation consisting of 2N+1 utterances of the first subject and 2N utterances of the first interviewer is extracted. Note that N is a predetermined integer of 1 or more, and can be set to, for example, N=2, N=3, etc., but is not limited to this. In addition, the conversation extraction unit 216 may extract from the conversation DB 230 a conversation that includes, for example, an utterance by the first subject (or the first interviewee) at the utterance time closest to the emotion change time of the first subject, and an utterance uttered within a predetermined time span before and after the utterance time.

When extracting the conversation in step S107 above, in addition to or instead of the utterance of the first subject whose time of speech is closest to the time of the emotion change of the first subject, the utterance of the first interviewee whose time of speech is closest to the time of the emotion change of the first subject may be extracted.

The first subject characteristic acquisition unit 217 acquires the first subject characteristic (step S108).

The case information creation unit 218 creates case information that associates the subject ID of the first subject, the first subject characteristic acquired in step S108 above, the emotion change time identified in step S105 above, and the conversation extracted in step S107 above, and stores the case information in the case information DB 240 (step S109). This results in obtaining case information that includes the first subject characteristic and the conversation that causes an emotional change in the first subject having that characteristic.

<Similar Case Presentation Processing>
The similar case presentation process according to this embodiment will be described with reference to Fig. 5. Note that the following steps S201 to S203 are repeatedly executed every time an interview is conducted between the second interviewee and the second target person, for example.

The second subject characteristic acquisition unit 221 acquires the second subject characteristic (step S201).

The similar case acquisition unit 222 acquires case information including a first subject characteristic similar to the second subject characteristic acquired in step S201 above from the case information DB 240 as similar case information (step S202). Specifically, the similar case acquisition unit 222 acquires similar case information by steps 21 to 23 below.

Step 21: The similar case acquisition unit 222 obtains the similarity between the second subject characteristic obtained in step S201 and the first subject characteristic included in each case information stored in the case information DB 240. If the subject characteristic includes n items representing the characteristics of the subject, the first subject characteristic and the second subject characteristic are expressed as an n-dimensional vector. Hereinafter, the first subject characteristic included in the m-th case information stored in the case information DB 240 is defined as V ^(m) =( _v1 ^(m) ..., _vn ^(m) ), and the second subject characteristic is defined as W=( _w1 ..., _wn ). At this time, the similar case acquisition unit 222 obtains the similarity Sim(V(m), W) using the first subject characteristic V ⁽ ^m) and the second subject characteristic W for each m=1,..., M (where M is the number of case information stored in the case information DB 240). As the similarity Sim(.,.), for example, cosine similarity or the like may be used, but this is not limited to this, and any similarity that can measure the similarity between vectors can be used.

Step 22: The similar case acquiring unit 222 obtains m′ that is the maximum value of Sim(V ^(m) , W) obtained in the above step 21.

Step 23: The similar case acquisition unit 222 acquires the m'th case information obtained in step 22 above from the case information DB 240. This allows the case information containing the first subject characteristic that is most similar to the second subject characteristic to be obtained as similar case information.

In step S202 above, the case information containing the first subject characteristic that is most similar to the second subject characteristic is defined as the similar case information, but this is not limited to this. For example, the top M' (M' is an integer equal to or greater than 2) pieces of case information in order of similarity to the second subject characteristic may be obtained as the similar case information.

The similar case presentation unit 223 presents the first subject characteristic and conversation contained in the similar case information acquired in step S202 above to the second interviewer (step S203). When presenting the first subject characteristic and conversation contained in the similar case information to the second interviewer, for example, the first subject characteristic and conversation may be displayed on the display of a terminal used by the second interviewer. In addition, for example, when the second interviewer is using the interview support device 10, the first subject characteristic and conversation may be displayed on the display device 102 of the interview support device 10.

Here, an example of the first subject characteristic presented to the second interviewer in step S203 above is shown in FIG. 6. In the first subject characteristic 2100 shown in FIG. 6, items with the same values as the items included in the first subject characteristic are highlighted in a manner different from the other items. That is, in the example shown in FIG. 6, "Laboratory value: high blood pressure", "Laboratory value: high blood sugar", "Personality tendency: logical", "Drinking", and "Smoking" are highlighted. This allows the second interviewer to easily know the items that are the same characteristics as the second subject he or she is interviewing.

FIG. 7 shows an example of the conversation presented to the second interviewer in step S203 above. In conversation 2200 shown in FIG. 7, utterances 2201-2205 of the first subject and utterances 2211-2214 of the first interviewer are displayed, and utterance 2201 of the first subject, whose utterance time is closest to the emotion change time, is highlighted in a manner different from the other utterances. This allows the second interviewer to know the utterance when the emotion change occurred (utterance 2201) and the utterances before and after it (utterances 2202-2205 and utterances 2211-2214), and he or she can use this as a reference to effectively conduct the interview.

Note that in step S203 above, both the first subject characteristics and the conversation contained in the similar case information are presented to the second interviewee, but this is not limited thereto, and for example, only the conversation contained in the similar case information may be presented to the second interviewee.

<Summary>
As described above, the interview support device 10 according to the present embodiment stores case information in association with a conversation that occurs when an emotional change (mental change) occurs in a subject during an interview with an interviewer, and the characteristics of the subject. Then, when another interview is conducted, the interview support device 10 according to the present embodiment extracts case information that includes characteristics similar to the characteristics of the subject of the interview from past case information according to the characteristics of the subject of the interview, and presents the case information to the interviewer of the interview. In this way, the interview support device 10 according to the present embodiment makes it possible for a plurality of interviewers to share a conversation that causes a mental change in a subject having at least similar characteristics. Therefore, each interviewer can effectively and efficiently have a conversation that encourages a behavioral change in the subject by referring to the information presented by the interview support device 10.

For example, in an interview regarding health guidance, as the level of interest in health and enthusiasm for the interview vary from person to person, the interviewer must proceed with the conversation while adjusting the dialogue process according to the personality traits and reactions of the subject picked up during the conversation, and lead the subject to an action plan such as improving lifestyle habits while drawing out the subject's motivation (Reference 9). For this reason, by using the interview support device 10 of this embodiment, a conversation that can draw out the subject's motivation becomes clear, and as a result, the interviewer can provide the subject with effective and efficient health guidance.

The present invention is not limited to the specifically disclosed embodiments above, and various modifications, changes, and combinations with known technologies are possible without departing from the scope of the claims.

[References]
Reference 1: Corpus of Japanese Daily Conversation | Multifaceted research on spoken language based on a large-scale corpus of daily conversation, Internet <URL: https://www2.ninjal.ac.jp/conversation/cejc-monitor/transcript.html>
Reference 2: Hirai, Yuki, and Inoue, Tomoo: State estimation in pair programming learning - Differences in conversation between success and failure in resolving stumbling blocks, Transactions of the Information Processing Society of Japan, Vol. 53, No. 1, pp. 72-80 (2012).
Reference 3: Emotion recognition technology, Internet <URL: https://www.docomo.ne.jp/corporate/technology/rd/tech/term/21/index.html>
Reference 4: Com Analyzer, Internet <URL: https://www.nttdata.com/jp/ja/news/release/2019/052700/>
Reference 5: AI suite, Internet <URL: https://cloud.watch.impress.co.jp/docs/news/1364523.html>
Reference 6: Heart Sensor for Communication, Internet <URL: https://service.cac.co.jp/hctech/ks4c>
Reference 7: Tone Analyzer, Internet <URL: https://cloud.ibm.com/docs/tone-analyzer/getting-started.html>
Reference 8: Web Empath API, Internet <URL: https://www.apibank.jp/ApiBank/api/detail?api_no=555&api_type=I>
Reference 9: Tae Sato, Kaori Fujimura, Reiko Ariga, Yasuo Ishigure, Asami Miyajima, Tamae Ogata, Akina Mine, Emiko Kikuchi, Yasushi Nishizaki, "An Attempt at Analyzing Dialogue Processes in Health Guidance for the Prevention of Lifestyle-Related Diseases," IEICE Technical Report, vol. 122, no. 166, HCS2022-41, pp. 27-32, August 2022.

10 Interview support device 101 Input device 102 Display device 103 External I/F
103a Recording medium 104 Communication I/F
105 RAM
106 ROM
107 Auxiliary storage device 108 Processor 109 Bus 210 Case creation processing unit 211 First interviewee voice recording unit 212 First interviewee voice recognition unit 213 First subject voice recording unit 214 First subject voice recognition unit 215 Emotion change analysis unit 216 Conversation extraction unit 217 First subject characteristic acquisition unit 218 Case information creation unit 220 Similar case presentation processing unit 221 Second subject characteristic acquisition unit 222 Similar case acquisition unit 223 Similar case presentation unit 230 Conversation DB
240 Case Information DB

Claims

a determination unit configured to determine whether or not an emotional change has occurred in the first subject, using at least one of a voice of the first subject during an interview and a first speech recognition result representing a result of speech recognition of the voice of the first subject;
a conversation extraction unit configured to extract, when it is determined that an emotional change has occurred in the first subject, a conversation consisting of a plurality of utterances including an utterance in which the emotional change has occurred from the first speech recognition result and a second speech recognition result representing a result of speech recognition of a first interviewer who is interviewing the first subject;
a case creation unit configured to create case information in which characteristics of the first subject are associated with the conversation, and store the case information in a storage unit;
a similar case presentation unit configured to present case information including characteristics similar to those of the second subject to be interviewed, among the case information stored in the storage unit, to a second interviewer who interviews the second subject as similar case information;
An interview support device having the above structure.
The similar case presentation unit,
The interview support device according to claim 1, which is configured to highlight and present to the second interviewer the utterance in which the emotional change occurred among the multiple utterances that constitute the conversation included in the similar case information.
The similar case presentation unit,
The interview support device according to claim 2 , configured to highlight and present to the second interviewee those characteristics contained in the similar case information that are the same as those of the second subject.
The determination unit is
Calculating a pitch of the voice of the first target person, a number of backchannels, and a number of utterances of the first target person per unit time using the voice of the first target person and the first speech recognition result;
Calculating an index value representing a degree of change in the emotion of the first subject using the pitch of the voice, the number of backchannels, and the number of utterances;
4. The interview support device according to claim 1, further comprising: a step of determining whether or not the emotion change has occurred based on the index value.
The determination unit is
Further calculating a movement amount of the head or arm of the first subject using video data of the first subject or a sensor value of a sensor attached to the first subject;
5. The interview support device according to claim 4, further configured to calculate the index value by using the amount of movement.
The interview support device according to claim 1, wherein the characteristics include the subject's physical characteristics, psychological characteristics, social characteristics, and lifestyle characteristics.
a determination step of determining whether or not an emotional change has occurred in the first subject by using at least one of a voice of the first subject during an interview and a first speech recognition result representing a result of speech recognition of the voice of the first subject;
a conversation extraction step of extracting a conversation consisting of a plurality of utterances including an utterance in which an emotional change has occurred from the first speech recognition result and a second speech recognition result representing a result of speech recognition of a first interviewer who is interviewing the first subject, when it is determined that an emotional change has occurred in the first subject;
a case creation step of creating case information in which characteristics of the first subject are associated with the conversation and storing the case information in a storage unit;
a similar case presentation step of presenting case information including characteristics similar to those of a second subject to be interviewed, among the case information stored in the storage unit, to a second interviewer who interviews the second subject as similar case information;
The interview support method is carried out by a computer.
a determination step of determining whether or not an emotional change has occurred in the first subject by using at least one of a voice of the first subject during an interview and a first speech recognition result representing a result of speech recognition of the voice of the first subject;
a conversation extraction step of extracting a conversation consisting of a plurality of utterances including an utterance in which an emotional change has occurred from the first speech recognition result and a second speech recognition result representing a result of speech recognition of a first interviewer who is interviewing the first subject, when it is determined that an emotional change has occurred in the first subject;
a case creation step of creating case information in which characteristics of the first subject are associated with the conversation and storing the case information in a storage unit;
a similar case presentation step of presenting case information including characteristics similar to those of a second subject to be interviewed, among the case information stored in the storage unit, to a second interviewer who interviews the second subject as similar case information;
A program that causes a computer to execute the following.