WO2022102432A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number
WO2022102432A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
speaker
dialogue
scoring
information processing
Prior art date
Application number
PCT/JP2021/039945
Other languages
French (fr)
Japanese (ja)
Inventor
侑理 網本
裕美 倉沢
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2022102432A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking

Definitions

  • The present technology relates to an information processing device and an information processing method, and in particular to an information processing device and an information processing method capable of appropriately supporting a person being scored when scoring interpersonal communication that requires dialogue skills.
  • Patent Document 1 discloses a simulation system that simulates psychological changes of a model patient in a medical interview and changes the model patient's answers according to the question content and the interview procedure.
  • The present technology was made in view of such a situation, and makes it possible to give appropriate feedback to the person being scored so as to support the improvement of interpersonal communication skills that require dialogue.
  • The information processing device of one aspect of the present technology includes a processing unit that, based on reference information serving as a standard for dialogue scoring, scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
  • In the information processing method of one aspect of the present technology, an information processing apparatus scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
  • In one aspect of the present technology, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space is scored, and scoring information regarding the scoring of the dialogue is presented to the first speaker in real time.
  • The information processing device of one aspect of the present technology may be an independent device or an internal block constituting a single device.
  • FIG. 1 shows a configuration example of an embodiment of an information processing system to which the present technology is applied.
  • The information processing system 1 is configured by connecting an information processing device 10 serving as a telepresence device and an information processing device 20 to each other via a network 50.
  • The information processing device 10 and the information processing device 20 are installed in different spaces, such as different buildings or different rooms. That is, the user in the vicinity of the information processing device 10 (the first speaker) and the user in the vicinity of the information processing device 20 (the second speaker) become speakers who have a dialogue with each other from mutually remote locations. The first speaker is the person being scored, whose interpersonal communication skills are graded, and the second speaker is the first speaker's dialogue partner.
  • The information processing device 10 and the information processing device 20 are each equipped with a large display (for example, of a size capable of displaying a speaker's whole body), a camera that photographs the surroundings, a microphone that collects the speakers' utterances, environmental sounds, and other surrounding sound, and a loudspeaker that outputs sound.
  • The information processing device 10 displays video corresponding to the captured images taken by the information processing device 20, together with information superimposed on that video, and outputs the sound collected by the information processing device 20. Similarly, the information processing device 20 displays video corresponding to the captured images taken by the information processing device 10 and outputs the sound collected by the information processing device 10.
  • In this way, the first speaker and the second speaker, although in different spaces, can have a dialogue through the displays.
  • The network 50 includes communication networks such as the Internet, intranets, and mobile phone networks, and enables interconnection between devices using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
  • FIG. 2 shows a configuration example of the information processing apparatus 10 of FIG. 1.
  • The information processing device 10 is an electronic device, such as a display device, that can be connected to the network 50 such as the Internet, and is configured as a telepresence device.
  • In the information processing device 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, and a RAM (Random Access Memory) 103 are connected to one another via a bus 104.
  • The CPU 101 controls the operation of each unit and performs various kinds of processing by executing programs recorded in the ROM 102 or the storage unit 108. Various data are stored in the RAM 103 as appropriate.
  • The input/output I/F 105 is also connected to the bus 104. An input unit 106, an output unit 107, a storage unit 108, and a communication unit 109 are connected to the input/output I/F 105.
  • The input unit 106 supplies various kinds of input data to each unit, including the CPU 101. For example, the input unit 106 includes an operation unit 111, a camera 112, and a microphone 113.
  • The operation unit 111 is operated by the user and outputs operation data corresponding to the operation. The operation unit 111 is composed of physical buttons, a touch panel, and the like.
  • The camera 112 photoelectrically converts light incident from the subject, performs signal processing on the resulting electrical signal, and thereby generates and outputs captured image data. The camera 112 includes an image sensor, a signal processing unit, and the like.
  • The microphone 113 receives sound as vibrations of the air and outputs sound data as an electrical signal.
  • The output unit 107 outputs various kinds of information under the control of the CPU 101. For example, the output unit 107 includes a display 121 and a speaker 122.
  • The display 121 displays video and the like corresponding to the captured image data under the control of the CPU 101. The display 121 is composed of a panel unit such as a liquid crystal panel or an OLED (Organic Light Emitting Diode) panel, a signal processing unit, and the like.
  • The speaker 122 outputs sound corresponding to the sound data under the control of the CPU 101.
  • The storage unit 108 records various data and programs under the control of the CPU 101. The CPU 101 reads various data from the storage unit 108, processes them, and executes programs.
  • The storage unit 108 is configured as an auxiliary storage device. It may be internal storage such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), or external storage such as a memory card.
  • The communication unit 109 communicates with other devices via the network 50 under the control of the CPU 101. The communication unit 109 is configured as a communication module supporting cellular communication (for example, LTE-Advanced or 5G (5th Generation)), wireless communication such as wireless LAN (Local Area Network), or wired communication.
  • The configuration of the information processing device 10 described above is an example; for instance, a short-range wireless communication circuit that performs wireless communication conforming to a short-range wireless communication standard such as Bluetooth (registered trademark) or NFC (Near Field Communication), a power supply circuit, and the like may also be provided.
  • The display 121 may also be a projector, in which case video corresponding to the captured image data can be projected and displayed on an arbitrary screen surface.
  • The configuration of the information processing apparatus 20 is the same as that of the information processing apparatus 10 shown in FIG. 2, so its description is omitted.
  • FIG. 3 shows an example of a functional configuration of the information processing apparatus 10 of FIG. 2.
  • The information processing apparatus 10 includes a voice input unit 151, a voice recognition unit 152, a sentence division unit 153, an utterance content analysis unit 154, a point-addition target language information DB 155, a time acquisition unit 156, an image input unit 157, an image recognition unit 158, an image analysis unit 159, a point-addition target image information DB 160, a point-addition target integration unit 161, a question information DB 162, an intermediate information display unit 163, an intermediate result notification unit 164, a scoring result generation unit 165, and a scoring result display unit 166.
  • The analysis processing unit 191 is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the image analysis unit 159, and the point-addition target image information DB 160.
  • The scoring processing unit 192 is composed of the point-addition target integration unit 161, the question information DB 162, and the scoring result generation unit 165.
  • The voice recognition unit 152, the image recognition unit 158, the analysis processing unit 191, and the scoring processing unit 192 are realized by the CPU 101 of FIG. 2 executing programs. The point-addition target language information DB 155, the point-addition target image information DB 160, and the question information DB 162 are recorded in the storage unit 108 of FIG. 2.
  • The voice input unit 151 corresponds to the microphone 113 of FIG. 2, and the image input unit 157 corresponds to the camera 112 of FIG. 2. The intermediate information display unit 163 and the scoring result display unit 166 correspond to the display 121 of FIG. 2, and the intermediate result notification unit 164 corresponds to the display 121 or the speaker 122 of FIG. 2.
  • The voice input unit 151 inputs the voice data of a speaker's utterance to the voice recognition unit 152.
  • The voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151. In this voice recognition processing, the voice data of the speaker's utterance is converted into text data using a statistical method or the like, and the voice recognition result is supplied to the sentence division unit 153 and the time acquisition unit 156.
  • The sentence division unit 153 performs sentence division processing using the voice recognition result from the voice recognition unit 152. In this sentence division processing, the text corresponding to the speaker's utterance is divided into predetermined processing units, and the sentence division result is supplied to the utterance content analysis unit 154.
  • The utterance content analysis unit 154 performs utterance content analysis processing using the sentence division result from the sentence division unit 153 and the point-addition target language information stored in the point-addition target language information DB 155.
  • The point-addition target language information is information for extracting (identifying) language (wording) that is a target of point addition when scoring interpersonal communication skills. In the utterance content analysis processing, text containing point-addition target language is identified from the divided text using similarity between texts, and the analysis result is supplied to the point-addition target integration unit 161.
  • The time acquisition unit 156 acquires the time corresponding to the voice recognition result from the voice recognition unit 152 and supplies the time information to the image analysis unit 159 and the point-addition target integration unit 161.
  • The image input unit 157 inputs captured image data including the speaker to the image recognition unit 158.
  • The image recognition unit 158 performs image recognition processing using the captured image data from the image input unit 157. In this image recognition processing, the speaker (face, body parts, and so on) is recognized as an object using a pattern recognition technique or the like, and the image recognition result is supplied to the image analysis unit 159.
  • The image analysis unit 159 performs image analysis processing using the image recognition result from the image recognition unit 158 and the point-addition target image information stored in the point-addition target image information DB 160.
  • The point-addition target image information is information for extracting (identifying) images that are targets of point addition when scoring interpersonal communication skills. In the image analysis processing, point-addition target images are identified from the recognized image of the speaker or the like, and the analysis result is supplied to the point-addition target integration unit 161. Further, in the image analysis processing, the time information from the time acquisition unit 156 is used to associate the image analysis result with the utterance content analysis result.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the image analysis result from the image analysis unit 159, and the time information from the time acquisition unit 156.
  • The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the image analysis result associated with the time information.
  • The question information is information defining how points should be awarded for each point-addition target. In this integration processing, scoring information regarding the scoring of the dialogue is obtained by integrating the point-addition target language indicated by the utterance content analysis result and the point-addition target image indicated by the image analysis result, and adding points accordingly. This scoring information includes the scoring result of the dialogue or intermediate information presented during the dialogue.
  • While the dialogue is in progress, the point-addition target integration unit 161 supplies intermediate information, such as whether or not the speaker has performed an action that earns points, to the intermediate information display unit 163. The intermediate information display unit 163 displays the intermediate information from the point-addition target integration unit 161 in real time.
  • The point-addition target integration unit 161 also supplies, as intermediate information, interim results such as the scoring result of the dialogue from its start up to the current point in time to the intermediate result notification unit 164. The intermediate result notification unit 164 announces intermediate information such as these interim results from the point-addition target integration unit 161 in real time.
  • The point-addition target integration unit 161 supplies the scoring result of the dialogue to the scoring result generation unit 165.
  • The scoring result generation unit 165 performs scoring result generation processing using the scoring result of the dialogue from the point-addition target integration unit 161. In this scoring result generation processing, the final scoring result (score) for the entire dialogue is generated by performing predetermined processing such as weighting important items among the point-addition targets, and the scoring result is supplied to the scoring result display unit 166. The scoring result display unit 166 displays the scoring result from the scoring result generation unit 165.
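As a rough illustration of the dataflow just described, the following Python sketch wires minimal stand-ins for the analysis, integration, and display stages together; all class and function names are hypothetical and chosen here for illustration, not taken from the publication.

```python
# Minimal sketch of the dataflow of FIG. 3, assuming hypothetical names;
# each class stands in for one of the units described above.
from dataclasses import dataclass, field

@dataclass
class AnalysisResult:
    text: str           # one divided text unit (sentence division output)
    matched_item: str   # scoring item found by utterance content analysis
    timestamp: float    # time information from the time acquisition unit

@dataclass
class ScoringState:
    scores: dict = field(default_factory=dict)  # running score per item

    def add(self, item: str, points: float) -> None:
        self.scores[item] = self.scores.get(item, 0.0) + points

def integrate(result: AnalysisResult, state: ScoringState) -> None:
    """Point-addition target integration: fold one analysis result into
    the running scoring state (the source of intermediate information)."""
    if result.matched_item:
        state.add(result.matched_item, 1.0)

def display_intermediate(state: ScoringState) -> None:
    """Intermediate information display: render the current scores."""
    for item, score in state.scores.items():
        print(f"[intermediate] {item}: {score:+.1f}")

state = ScoringState()
integrate(AnalysisResult("hello", "greeting", 0.0), state)
display_intermediate(state)  # -> [intermediate] greeting: +1.0
```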
  • In step S11, the voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151, and converts an utterance of the speaker being scored into text Ui (0 ≤ i < N). A time stamp is added to this text.
  • In step S12, the sentence division unit 153 divides the text Ui from the voice recognition unit 152 into divided texts u1, u2, ..., un.
  • In step S13, the analysis processing unit 191 and the scoring processing unit 192 perform analysis scoring processing in which each divided text uj (0 ≤ j < n) is analyzed and each scoring item is calculated for it. The details of this analysis scoring processing will be described later with reference to the flowchart of FIG. 5. When the processing of step S13 is completed, the processing proceeds to step S14.
  • In step S14, whether or not points have been added is determined based on the result of the analysis scoring processing.
  • When it is determined that points have been added, the processing proceeds to step S15, and the processes of steps S15 and S16 are executed, so that intermediate information corresponding to the result of the analysis scoring processing is presented.
  • That is, the intermediate information display unit 163 displays intermediate information such as the content answered correctly and the interim result of adding the score to the relevant scoring item. For example, if the speaker being scored says "hello" and points are added to the scoring item "greeting", a list with that scoring item checked, a graph with the score added, and the like are displayed. In addition, the intermediate result notification unit 164 may announce intermediate information such as the interim result by sound or the like.
  • When it is determined that no points have been added, the process of step S17 is executed, so that intermediate information corresponding to the result of the analysis scoring processing is presented. That is, the intermediate information display unit 163 displays intermediate information such as score sheet information. For example, if the speaker being scored does not make an utterance that is a point-addition target, the scoring items in the list remain unchecked and no score is added to the graph or the like.
  • In step S18, when it is determined that i < N, that is, when the next utterance in the dialogue exists after all the divided texts uj in the text Ui have been processed, the processing from step S11 onward is repeated with the text Ui+1 converted from the next utterance as the processing target. In this way, the display of intermediate information reflecting whether or not points have been added in the dialogue is updated in real time.
  • On the other hand, when it is determined in step S18 that no next utterance exists, the scoring result display unit 166 displays the scoring result including the final score corresponding to the series of utterances in the dialogue. This scoring result is the final one, and corresponds to the interim result at the end of the dialogue among the interim results continuously updated during the dialogue.
  • In this way, scoring information is presented both during the dialogue and at its end: during the dialogue, interim results are presented, including per-utterance scores such as whether the utterances of the speaker being scored contained the necessary information, while at the end of the dialogue, scoring results covering the dialogue as a whole are presented, such as the composition of the dialogue and an overall evaluation.
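Read as pseudocode, steps S11 to S18 amount to a per-utterance loop. The sketch below shows one plausible shape of that loop; `recognize`, `split_sentences`, `analyze_and_score`, and the display callbacks are hypothetical stand-ins for the units described above.

```python
def run_scoring_session(utterance_stream, recognize, split_sentences,
                        analyze_and_score, show_intermediate, show_final):
    """Hypothetical driver for steps S11-S18: process each utterance in
    turn, update the intermediate display, then show the final result."""
    totals = {}                                  # running scores per item
    for audio in utterance_stream:               # loop while i < N (S18)
        text = recognize(audio)                  # S11: speech -> text Ui
        for segment in split_sentences(text):    # S12: u1, u2, ..., un
            added = analyze_and_score(segment, totals)   # S13
            show_intermediate(totals, added)     # S14-S17: real-time update
    show_final(totals)                           # no next utterance: result
    return totals
```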
  • In step S31, the utterance content analysis unit 154 analyzes the utterance content of the divided text uj using the point-addition target language information stored in the point-addition target language information DB 155.
  • In this utterance content analysis, the similarity between the divided text uj and the scoring item example sentences is analyzed (S32), the composition in the dialogue is analyzed (S33), the speech attitude classification is analyzed (S34), and the speech attitude classification within the dialogue composition is analyzed (S35), among others.
  • The point-addition target language information includes information such as scoring item example sentences, composition in dialogue, and speech attitude classification as information for extracting point-addition target language, and the divided text is analyzed using this information. The speech attitude classification is a classification of how the speaker speaks to the other party.
  • The evaluator can register, for each scoring item, the example sentences that are point-addition targets and have the scoring performed against them.
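One simple way to realize the similarity analysis of step S32 is to compare each divided text unit against the registered example sentences. The sketch below uses the ratio from Python's standard difflib as a stand-in for whatever similarity measure is actually used; the example sentences and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Illustrative point-addition target language information (cf. FIG. 8):
# example sentences registered by the evaluator for each scoring item.
SCORING_ITEM_EXAMPLES = {
    "greeting": ["hello", "what happened today", "it's been a long time"],
    "self-introduction": ["I am the pharmacist in charge of today"],
}

def match_scoring_item(divided_text: str, threshold: float = 0.6):
    """Return the scoring item whose example sentence best matches one
    divided text unit, or None if nothing is similar enough."""
    best_item, best_score = None, threshold
    for item, examples in SCORING_ITEM_EXAMPLES.items():
        for example in examples:
            score = SequenceMatcher(None, divided_text.lower(),
                                    example.lower()).ratio()
            if score >= best_score:
                best_item, best_score = item, score
    return best_item

print(match_scoring_item("hello"))        # -> greeting
print(match_scoring_item("the weather"))  # -> None (no point added)
```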
  • In step S36, the image input unit 157 acquires the captured image at the corresponding time. In step S37, the image recognition unit 158 performs image recognition on the acquired captured image.
  • In step S38, the image analysis unit 159 analyzes the actions included in the image recognition result using the point-addition target image information stored in the point-addition target image information DB 160.
  • In this image analysis, the speaker's facial expression is analyzed (S39), the speaker's movement is analyzed (S40), the speaker's line of sight is analyzed (S41), and the presentation is analyzed (S42), among others.
  • The point-addition target image information includes information on facial expressions, movements, line of sight, presentations, and so on as information for extracting point-addition target images from the captured image, and the image recognition result is analyzed using this information. For example, an organ model or various materials presented by a doctor can serve as a presentation.
  • In step S43, the point-addition target integration unit 161 determines the point-addition conditions. In this determination, the point-addition targets and scores extracted by the utterance content analysis and the image analysis are decided, and a score integrating the two analysis results is output. For example, if the person being scored gives the greeting "hello", a score is added for that point-addition target, and if the greeting is delivered with a smile, a further score is given. Likewise, a further score may be given when the person being scored shows material while explaining to the dialogue partner.
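A minimal sketch of the point-addition condition of step S43, assuming the simple additive scheme the text describes only by example (a matched greeting earns a point, a smile at the same moment earns a bonus, and showing material earns another); the exact point values are assumptions.

```python
from typing import Optional

def integrated_points(language_match: Optional[str],
                      facial_expression: Optional[str],
                      showing_material: bool) -> float:
    """Combine the utterance content analysis and the image analysis
    into one integrated score (point values are illustrative)."""
    points = 0.0
    if language_match:                    # e.g. "greeting" matched "hello"
        points += 1.0
        if facial_expression == "smile":  # greeting delivered with a smile
            points += 0.5
    if showing_material:                  # material shown while explaining
        points += 0.5
    return points

print(integrated_points("greeting", "smile", False))  # -> 1.5
```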
  • As described above, the analysis processing unit 191 performs analysis processing using the recognition results of the voice data and the captured image data, and the scoring processing unit 192 performs scoring processing using the analysis results, so that scoring information is presented in real time. For example, while the dialogue is in progress, intermediate information such as interim results is presented as the scoring information, and after the dialogue has ended, the final scoring result is presented as the scoring information.
  • The reference information serving as the standard for dialogue scoring, such as the point-addition target language information, the point-addition target image information, and the question information stored in the databases, is set in advance with contents suited to the usage scene. It is therefore possible to evaluate interpersonal communication in any desired usage scene where dialogue skills are required.
  • The reference information is not limited to information regarding point addition; it may be any information that serves as a standard for dialogue scoring, such as information regarding point deduction.
  • In the following, a case where the usage scene is a medical interview will be described.
  • FIG. 6 shows an example of using the information processing system 1 to which the present technology is applied for a medical interview.
  • The information processing device 10 is installed in the space SP1, and the information processing device 20 is installed in the space SP2. Between the two devices, data such as video corresponding to the captured images taken by each camera and sound collected by each microphone are transmitted and received continuously in real time while the connection between them is established.
  • The speaker UA uses the information processing device 10 and the speaker UB uses the information processing device 20, so that the speaker UA and the speaker UB, who are in mutually remote locations, are displayed to each other. The speaker UA is the person being scored, whose interpersonal communication skills are graded, and the speaker UB is the speaker UA's dialogue partner. For example, if the speaker UA is a pharmacist, the speaker UB is a patient.
  • On the display of the information processing device 10, scoring information is displayed together with the image of the speaker UB. The scoring information regarding the scoring of the dialogue between the speaker UA and the speaker UB is displayed in real time using graphs, tables, flowcharts, and the like. In this way, the speaker UA, playing the role of a pharmacist, can have a dialogue with the speaker UB through the display while checking the scoring information.
  • The speaker UA is photographed by the camera 112 provided in the upper part of the information processing apparatus 10, the voice of the speaker UA is collected by the microphone 113 provided in the lower part, and the voice of the speaker UB is output by the speakers 122-1 and 122-2 provided on the left and right.
  • Meanwhile, the image of the speaker UA is displayed on the display of the information processing apparatus 20 installed in the space SP2, and the voice of the speaker UA is output from its speaker. In this way, the speaker UB can interact with the speaker UA through the display.
  • FIG. 8 shows an example of the point-addition target language information stored in the point-addition target language information DB 155.
  • In the point-addition target language information, item requirements and scoring item example sentences are set in advance for each scoring item for use in a medical interview.
  • The scoring item "greeting" requires an expression similar to a standard or professional greeting as an expression to start the conversation. Scoring item example sentences for "greeting" include "hello", "what happened today", and "it's been a long time".
  • The scoring item "self-introduction" requires an expression introducing one's own name and position. Scoring item example sentences for "self-introduction" include "I am the pharmacist in charge of today", "I am in charge of XX today", and "I am XX".
  • The scoring item "confirmation of name" requires an expression asking for the other party's name. Scoring item example sentences for "confirmation of name" include "may I ask your name?", "what is your name?", and "could you confirm your name?".
  • The scoring item "reason for visit" requires an expression asking the reason for the other party's visit. Scoring item example sentences for "reason for visit" include "do you have a newly prescribed medicine?", "what are your requirements today?", and "is it the same drug as last time?".
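The table of FIG. 8 maps naturally onto a small record structure. The sketch below shows one way the point-addition target language information might be stored, reusing the items quoted above; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ScoringItem:
    name: str             # scoring item
    requirement: str      # item requirement
    examples: list        # scoring item example sentences

POINT_ADDITION_TARGET_LANGUAGE_DB = [
    ScoringItem("greeting",
                "an expression, like a standard or professional greeting, that starts the conversation",
                ["hello", "what happened today", "it's been a long time"]),
    ScoringItem("self-introduction",
                "an expression introducing one's own name and position",
                ["I am the pharmacist in charge of today"]),
    ScoringItem("confirmation of name",
                "an expression asking for the other party's name",
                ["may I ask your name?", "what is your name?"]),
    ScoringItem("reason for visit",
                "an expression asking the reason for the other party's visit",
                ["what are your requirements today?", "is it the same drug as last time?"]),
]
```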
  • FIG. 9 shows the contents and assumed usage scenes for each evaluation axis, such as transmission items and intelligibility. In FIG. 9, usage scenes such as medical interviews, call centers, sales, and retail are illustrated as assumed usage scenes.
  • The evaluation axis "transmission items" represents an evaluation of how thoroughly the items to be communicated were covered; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "intelligibility" represents an evaluation of whether technical terms were avoided and difficult words were paraphrased; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "empathy" represents an evaluation of whether empathy was expressed toward the other party's complaints; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "composition ability" represents an evaluation of whether the dialogue was progressed with a good structure; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "proposal ability" represents an evaluation of whether proposals could be made in a manner suited to the context; its assumed usage scenes are sales, retail, and call centers.
  • The evaluation axis "expansiveness" represents an evaluation of whether topics were sufficiently expanded; its assumed usage scenes are sales and retail.
  • The evaluation axis "disclosure" represents an evaluation of whether merits and demerits were conveyed accurately; its assumed usage scenes are medical interviews, sales, and retail.
  • FIG. 10 shows an example of displaying evaluation axis scoring information in a medical interview.
  • The evaluation axis scoring information is represented by a bar graph in which the evaluation value (score) for each of the seven evaluation axes extends in the horizontal direction.
  • In this example, the speaker UA has high evaluations for accuracy and transmission items in the dialogue with the speaker UB, but low evaluations for empathy and the like.
  • FIG. 11 shows an example of displaying dialogue scene transition information in a medical interview.
  • The dialogue scene transition information is represented by the transition of dialogue scenes and a progress bar, for a type of dialogue that has a time limit, here the medical interview.
  • Dialogue scenes in the medical interview include the introduction (Intro), the interview (History Taking), the explanation (Explanation), and the closing (Closing). In this example, 10 minutes is set as the time limit for the medical interview.
  • A of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar 6 minutes and 21 seconds after the start of the medical interview. It shows that, at that point, the dialogue between the speaker UA and the speaker UB has completed the introduction and the interview, and the explanation is underway.
  • B of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar 10 minutes after the start of the medical interview. It shows that the dialogue between the speaker UA and the speaker UB proceeded through the introduction, interview, explanation, interview, explanation, and closing, all completed within 10 minutes.
  • C of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar more than 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speaker UA and the speaker UB, the explanation was prolonged after the introduction and the interview were completed, and continued even past the 10-minute time limit.
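The progress display of FIG. 11 amounts to elapsed time mapped onto an ordered list of scenes against the 10-minute limit. A minimal sketch, with the rendering details as assumptions:

```python
SCENES = ["Intro", "History Taking", "Explanation", "Closing"]
TIME_LIMIT_S = 10 * 60  # 10-minute limit set for the medical interview

def progress_line(elapsed_s: int, current_scene: str, width: int = 20) -> str:
    """Render one dialogue-scene transition line (cf. FIG. 11):
    the current scene plus a progress bar, flagging any overrun."""
    filled = min(width, elapsed_s * width // TIME_LIMIT_S)
    bar = "#" * filled + "-" * (width - filled)
    overrun = " OVER TIME" if elapsed_s > TIME_LIMIT_S else ""
    return f"{current_scene:>14} [{bar}] {elapsed_s // 60}:{elapsed_s % 60:02d}{overrun}"

print(progress_line(381, "Explanation"))  # 6 min 21 s in (A of FIG. 11)
print(progress_line(660, "Explanation"))  # past the limit (C of FIG. 11)
```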
  • FIG. 12 shows an example of displaying dialogue communication matter information in a medical interview.
  • The dialogue communication matter information is represented by a checklist showing the progress of the dialogue and the completeness of the communication matters in the medical interview.
  • Dialogue scenes in the medical interview include the introduction (Intro), the interview (History Taking), the explanation (Explanation), and the closing (Closing), and communication matters are set for each dialogue scene.
  • The communication matters of the introduction include a greeting, a self-introduction, confirmation of the patient's name, confirmation of the reason for the visit, and the like. The communication matters of the interview include confirmation of the chief complaint, the site, the symptoms, the period, and the like.
  • The communication matters of the explanation include the method of taking the drug, the period of taking the drug, side effects, precautions for swallowing, and the like. The communication matters of the closing include a greeting, gratitude, inviting questions, the next appointment, and the like.
  • B of FIG. 12 shows the degree of achievement when the dialogue scenes proceed in order through the introduction, interview, explanation, and closing: in the dialogue between the speaker UA and the speaker UB, a greeting, a self-introduction, and confirmation of the reason for the visit are given in the introduction, the chief complaint, site, symptoms, and period are confirmed in the interview, the medication method and side effects are explained in the explanation, and the next appointment is made in the closing.
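The checklist of FIG. 12 pairs each dialogue scene with its communication matters and marks them off as they are detected. A minimal sketch of that bookkeeping, with the dictionary layout as an assumption:

```python
# Communication matters per dialogue scene, as listed above.
COMMUNICATION_MATTERS = {
    "Intro": ["greeting", "self-introduction", "name confirmation",
              "reason for visit"],
    "History Taking": ["chief complaint", "site", "symptoms", "period"],
    "Explanation": ["medication method", "medication period",
                    "side effects", "swallowing precautions"],
    "Closing": ["greeting", "gratitude", "questions", "next appointment"],
}

# One entry per (scene, matter), checked off when the matter is detected.
checklist = {(scene, matter): False
             for scene, matters in COMMUNICATION_MATTERS.items()
             for matter in matters}

def check_off(scene: str, matter: str) -> None:
    """Enter a check mark for a conveyed communication matter (FIG. 12)."""
    if (scene, matter) in checklist:
        checklist[(scene, matter)] = True

check_off("Intro", "greeting")
print(f"achievement: {sum(checklist.values())}/{len(checklist)} matters")
```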
  • FIG. 13 shows an example of real-time display of scoring information in a medical interview.
  • On the display 121, the video 201 including the speaker UB is displayed, and scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue communication matter information 204 is superimposed on it. In this way, the speaker UA (pharmacist role), who is being scored, can check the scoring information while interacting with the speaker UB (patient role) through the display 121.
  • In the evaluation axis scoring information 202, the evaluation values (scores) for each of the seven evaluation axes, including accuracy and the like, are represented by a bar graph.
  • The dialogue scene transition information 203 represents the progress from the start up to the explanation (Explanation), the dialogue scene at the point 6 minutes and 21 seconds after the start. The overall flow of the dialogue scenes and the current progress are represented by the medical interview flow 205.
  • The dialogue communication matter information 204 represents the predetermined communication matters for each dialogue scene, namely the introduction (Intro), the interview (History Taking), and the explanation (Explanation); when a matter has actually been conveyed by the speaker UA, a check mark is entered.
  • "greeting”, “self-introduction”, and "confirmation of reason for visit” are checked in the communication items of the introduction department.
  • intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue transmission matter information 204 is displayed in real time as scoring information on the display 121, so that the scoring target person
  • the speaker UA can have a dialogue with the speaker UB through the display 121 while checking the scoring information.
  • the speaker UA can, for example, change the dialogue strategy or paraphrase the utterance content as a dialogue that reflects the content of the confirmed scoring information.
  • the speaker UA can modify the dialogue policy without being noticed by the speaker UB while having a dialogue. You can improve your dialogue skills.
  • the speaker UA in the space SP1 in which the information processing device 10 is installed is used as a pharmacist
  • the speaker UB in the space SP2 in which the information processing device 20 is installed is used as a patient.
  • the speaker UA and the speaker UB can practice dialogue.
  • the information processing device 10 as the telepresence device and the information processing device 20 are connected to each other via the network 50, so that the speaker UA and the speaker UA can be seen through a display having a size capable of displaying the whole body of the speaker. Since the speaker UB can have a dialogue, it is possible to practice a realistic dialogue in a form closer to reality.
  • FIG. 14 shows an example of an overlapping dialogue in a medical interview.
  • In this example, while the speaker UA was still speaking, the speaker UB started the utterance "er, to buy medicine for allergies", so that part of the speaker UB's utterance overlaps with the speaker UA's utterance. Similarly, the speaker UB started uttering "ah, maybe it was different", so the "ah" part of the speaker UB's utterance overlaps with the speaker UA's utterance.
  • In this case, the text of the dialogue analyzed by the utterance content analysis unit 154 is as shown in FIG. 15. That is, the utterance content analysis unit 154 analyzes the dialogue with "what happened today" as the utterance of the speaker UA and "er, to buy medicine for allergies" as the utterance of the speaker UB, and then with "it's an allergic drug" as the utterance of the speaker UA and "ah, maybe it was different" as the utterance of the speaker UB.
  • Since each speaker's utterances are collected by a different microphone and input on different channels, even in a dialogue with overlaps, the utterances of each speaker can be easily extracted and converted into text.
  • The dialogue between the speaker UA and the speaker UB then continues, and the text of the dialogue is analyzed by the utterance content analysis unit 154 as shown in FIG. 15.
  • In this way, the utterance text input on a separate channel for each speaker is analyzed in predetermined processing units, such as single sentences, so that the overlapping utterance sections that frequently occur in dialogue have little influence on the voice recognition results and the utterance content analysis results. This makes it possible to extract the words and actions of the speaker UA, the person being scored, more accurately and to perform more accurate scoring.
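Because each speaker's microphone feeds its own channel, overlapping speech never has to be separated after the fact. The sketch below illustrates this per-channel transcription and timestamp-ordered merge; `transcribe` is a hypothetical recognizer stand-in.

```python
def transcribe_dialogue(channels: dict, transcribe) -> list:
    """Transcribe each speaker's channel independently, then merge by
    start time. Overlapping utterances stay cleanly attributed because
    they were captured on different channels to begin with."""
    merged = []
    for speaker, segments in channels.items():
        for start_time, segment in segments:
            merged.append((start_time, speaker, transcribe(segment)))
    return sorted(merged)  # chronological order across both speakers

# Hypothetical usage: the overlap of FIG. 14, with text standing in for audio.
channels = {
    "UA": [(0.0, "what happened today")],
    "UB": [(1.2, "er, to buy medicine for allergies")],  # overlaps UA's tail
}
for t, who, text in transcribe_dialogue(channels, lambda seg: seg):
    print(f"{t:4.1f}s {who}: {text}")
```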
  • As described above, based on the reference information stored in the databases as the standard for dialogue scoring, the dialogue between the speaker UA (the speaker acting as a pharmacist) and the speaker UB (the speaker acting as a patient) is scored, and scoring information regarding the scoring of the dialogue (interim results, scoring results, and the like) is presented in real time to the speaker UA, the person being scored.
  • The scoring information presented in real time is presented in a natural manner, so that it can be checked by the speaker UA, the person being scored, without being noticeable to the speaker UB, the dialogue partner.
  • Since reference information such as point-addition target language information and point-addition target image information suited to usage scenes such as medical interviews is set in a database in advance and used at scoring time, scoring can be performed in a way that absorbs fluctuations in evaluation.
  • Furthermore, since remote communication technology using the information processing device 10 and the information processing device 20 configured as telepresence devices enables the speaker UA and the speaker UB in different spaces to have a dialogue, it is possible to give the speaker UA, the person being scored, an experience as if the speaker UB, the dialogue partner, were actually present.
  • FIG. 16 shows another configuration example of the information processing apparatus 10 of FIG. 1.
  • In the information processing apparatus 10A, parts corresponding to those of the information processing apparatus 10 in FIG. 2 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 16 is provided with an input unit 106A instead of the input unit 106. The input unit 106A includes an operation unit 111, a camera 112, a microphone 113, and a sensor 114.
  • The sensor 114 senses spatial information, temporal information, and the like, and outputs the sensing information obtained as a result. The sensor 114 includes various sensors such as a distance measuring sensor and an image sensor. The camera 112 may be included in the sensor 114 as an image sensor.
  • The information processing apparatus 20 can also be provided with a sensor 114 in the same manner as the information processing apparatus 10A shown in FIG. 16.
  • FIG. 17 shows an example of a functional configuration of the information processing apparatus 10A of FIG. 16.
  • In the information processing apparatus 10A, parts corresponding to those of the information processing apparatus 10 in FIG. 3 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 17 is provided with a sensing information input unit 171, a sensing information recognition unit 172, a sensing information analysis unit 173, and a point-addition target sensing information DB 174 in place of the image input unit 157, the image recognition unit 158, the image analysis unit 159, and the point-addition target image information DB 160.
  • The analysis processing unit 191A is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, and the point-addition target sensing information DB 174.
  • The sensing information input unit 171 inputs sensing information (sensing information obtained on the speaker UA side) to the sensing information recognition unit 172. This sensing information may include, for example, distance information, image information, and biological information such as the heart rate and brain waves of the speaker UA.
  • The sensing information recognition unit 172 performs sensing information recognition processing using the sensing information from the sensing information input unit 171. In this sensing information recognition processing, the sensing information to be processed is recognized, and the sensing information recognition result is supplied to the sensing information analysis unit 173.
  • The sensing information analysis unit 173 performs sensing information analysis processing using the sensing information recognition result from the sensing information recognition unit 172 and the point-addition target sensing information stored in the point-addition target sensing information DB 174.
  • The point-addition target sensing information is information for extracting (identifying) sensing information that is a target of point addition when scoring interpersonal communication skills. In the sensing information analysis processing, point-addition target sensing information is identified from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161. Further, the time information from the time acquisition unit 156 is used to associate the sensing information analysis result with the utterance content analysis result.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis result from the sensing information analysis unit 173, and the time information from the time acquisition unit 156. The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the sensing information analysis result associated with the time information.
  • FIG. 17 shows a configuration for the case where the sensing information obtained on the speaker UA side in the space SP1 is used, but sensing information obtained on the speaker UB side in the space SP2 may also be used.
  • FIG. 18 shows another example of the functional configuration of the information processing apparatus 10A of FIG. 16.
  • In the information processing apparatus 10A of FIG. 18, parts corresponding to those of the information processing apparatus 10A in FIG. 17 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 18 is newly provided with a sensing information input unit 181, a sensing information recognition unit 182, a sensing information analysis unit 183, and a point-addition target sensing information DB 184. The analysis processing unit 191B is configured by adding the sensing information analysis unit 183 and the point-addition target sensing information DB 184 to the configuration of the analysis processing unit 191A.
  • The sensing information input unit 181 inputs the sensing information obtained on the speaker UB side to the sensing information recognition unit 182. This sensing information may include, for example, distance information, image information, and biological information such as the heart rate and brain waves of the speaker UB.
  • The sensing information recognition unit 182 performs sensing information recognition processing using the sensing information from the sensing information input unit 181. In this sensing information recognition processing, the sensing information to be processed is recognized, and the sensing information recognition result is supplied to the sensing information analysis unit 183.
  • The sensing information analysis unit 183 performs sensing information analysis processing using the sensing information recognition result from the sensing information recognition unit 182 and the point-addition target sensing information stored in the point-addition target sensing information DB 184.
  • The point-addition target sensing information is information for extracting (identifying) sensing information that is a target of point addition when scoring interpersonal communication skills. In the sensing information analysis processing, point-addition target sensing information is identified from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis results from the sensing information analysis unit 173 and the sensing information analysis unit 183, and the time information from the time acquisition unit 156. The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the sensing information analysis results associated with the time information.
  • The sensing information processed by the information processing device 10A may also be acquired from an electronic device possessed by the speaker UA or the speaker UB, such as a smartphone, a wearable terminal, or a measuring instrument.
  • Examples of the sensor 114 include various sensors such as an acceleration sensor that measures acceleration along the three XYZ axes, a gyro sensor that measures angular velocity about the three XYZ axes, a distance measuring sensor that measures distance, a biological sensor that measures biological information, a proximity sensor that detects nearby objects, and a magnetic sensor that measures the magnitude and direction of a magnetic field.
  • FIG. 19 shows an example of displaying the dialogue partner evaluation information in the medical interview.
  • The dialogue partner evaluation information represents an evaluation calculated from the sensing information of the dialogue partner in the medical interview.
  • In this example, the degree of understanding, empathy, degree of interest, likability, and reliability are used as evaluation axes regarding the speaker UA, based on the sensing information obtained on the speaker UB side. For example, when the sensor on the speaker UB side detects that the speaker UB's pupils dilated at the moment the speaker UA spoke, a likability score can be added; in this way, evaluation information calculated from the dialogue partner's biometric information and reactions can be presented.
  • The dialogue partner evaluation information is represented by a bar graph in which the evaluation value (score) for each of the five evaluation axes extends in the horizontal direction. Each bar starts from 0 and extends to the right or to the left: extension to the right indicates an increasingly positive evaluation (high evaluation), while extension to the left indicates an increasingly negative evaluation (low evaluation).
  • A of FIG. 19 represents the evaluation in the first half of the dialogue, B of FIG. 19 in the middle stage, and C of FIG. 19 in the latter half; for example, the dialogue is assumed to proceed as follows.
  • In the evaluation of A of FIG. 19, the speaker UB gives the speaker UA slightly positive evaluations, but the values themselves are not large.
  • In the evaluation of B of FIG. 19, all the evaluations except the degree of understanding are low, based on information such as the speaker UB having a dark facial expression.
  • In the evaluation of C of FIG. 19, the degree of understanding, empathy, degree of interest, likability, and reliability are all evaluated highly, based on information such as the speaker UB nodding and having a bright facial expression.
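A minimal sketch of how sensed reactions on the partner's side might be folded into the signed evaluation values behind the bar graph of FIG. 19. The pupil-dilation rule is the example given above; the other reaction rules and all weights are illustrative assumptions.

```python
# Signed values per evaluation axis: positive bars extend right (high
# evaluation), negative bars extend left (low evaluation), as in FIG. 19.
partner_eval = {"understanding": 0.0, "empathy": 0.0, "interest": 0.0,
                "likability": 0.0, "reliability": 0.0}

# Assumed mapping from sensed reactions to axis adjustments.
REACTION_RULES = {
    "pupil_dilation": [("likability", +0.5)],          # rule from the text
    "nod": [("understanding", +0.5), ("empathy", +0.3)],
    "dark_expression": [("empathy", -0.5), ("interest", -0.3)],
}

def apply_reaction(reaction: str) -> None:
    """Fold one sensed reaction of the dialogue partner into the axes."""
    for axis, delta in REACTION_RULES.get(reaction, []):
        partner_eval[axis] += delta

apply_reaction("pupil_dilation")   # detected while the scored speaker talks
print(partner_eval["likability"])  # -> 0.5
```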
  • FIG. 20 shows an example of real-time display of scoring information in a medical interview.
  • As in FIG. 13, the speaker UA can check the scoring information while interacting with the speaker UB through the display 121, but here the scoring information further includes the dialogue partner evaluation information 206.
  • In the dialogue partner evaluation information 206, the evaluation values (scores) calculated from the sensing information on the speaker UB side are represented by a bar graph. In this example, the speaker UB gives the speaker UA high evaluations for the degree of understanding, empathy, likability, and reliability, but a low evaluation for the degree of interest.
  • The dialogue scene transition information 203 indicates that the dialogue scene has completed the introduction (Intro) and the interview (History Taking) and has progressed to the explanation (Explanation).
  • In FIG. 20, the measured values (face, head, body) obtained from the sensing information are also shown; the state of the speaker UB can be inferred from these measured values.
  • Intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206 is displayed in real time as scoring information, so that the speaker UA, the person being scored, can have a dialogue with the speaker UB through the display 121 while checking the scoring information and considering how the speaker UB is evaluating the dialogue.
  • The evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206 are examples of intermediate information; it suffices for the intermediate information to include at least one of them.
  • FIGS. 21 and 22 show an example of the transition of input information and scoring information: they show display examples 21 seconds and 6 minutes 21 seconds after the start of the medical interview, respectively. The input information and the scoring information are superimposed on the video including the speaker UB.
  • In FIG. 21, the display 121 of the information processing apparatus 10A shows scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206.
  • The dialogue scene transition information 203 shows the introduction (Intro), the dialogue scene at the point 21 seconds after the start, together with a progress bar showing its progress. Further, the input information 211 shows that, in the introduction scene, the speaker UA made the utterance "nice to meet you" while bowing with a smile.
  • The dialogue communication matter information 204 shows a checklist with a check mark on "greeting" among the communication matters of the introduction (Intro).
  • The dialogue partner evaluation information 206 represents evaluation values calculated from the sensing information on the speaker UB side; the bar graph shows that the speaker UB, having been greeted by the speaker UA, has increased empathy, likability, and reliability toward the speaker UA.
  • In the evaluation axis scoring information 202, the evaluation values for each of the seven evaluation axes are represented by a bar graph. Corresponding to the conveyed greeting, the evaluation value (bar graph value) of the transmission items axis in the evaluation axis scoring information 202 is increased (+1).
  • Further, since the dialogue partner evaluation information 206 shows that the speaker UB feels empathy, likability, and reliability toward the speaker UA, the evaluation value (bar graph value) of the conformity axis in the evaluation axis scoring information 202 is accordingly increased (+0.5).
  • In the figure, the factor behind each increase in evaluation value is represented by an arrow, but this arrow is not actually displayed.
  • the dialogue scene transition information 203 shows an explanation (Explanation) which is a dialogue scene at the time when 6 minutes 21 has elapsed from the start, and a progress bar showing the progress thereof. ..
  • the speaker UA made an utterance saying "Please drink 2 tablets with water or lukewarm water after each meal.” It is shown in the balloon that he was doing the gesture of.
  • Dialogue communication item information 204 shows a checklist in which the medication method is checked among the communication items of the explanation (Explanation).
  • the dialogue partner evaluation information 206 as an evaluation value calculated from the sensing information on the speaker UB side, an evaluation value in which the degree of understanding and reliability by the speaker UB who received the explanation of the medication method is highly evaluated is a bar graph. It is represented.
• Accordingly, in the evaluation axis scoring information 202, the evaluation value of the "transmission item" axis increases (+1), and the evaluation value of the "accuracy" axis also increases (+1).
• The evaluation axis scoring information 202 is intermediate information (an interim result) presented during the dialogue between the speaker UA and the speaker UB, but it is also presented as the final scoring result after the dialogue ends; it can be said to be linked with other intermediate information such as the dialogue transmission item information 204 and the dialogue partner evaluation information 206.
• While the speaker UA and the speaker UB converse through the display, linguistic analysis, facial expressions, body movements, how the other party feels, and so on are scored together, and the screen is updated in real time according to the input information and the scoring information. As a result, the speaker UA can check the scoring result without the speaker UB noticing during the dialogue, and can reflect what was confirmed in the subsequent dialogue.
• Next, a specific example will be described in which, while the speaker UA acting as a pharmacist and the speaker UB acting as a patient converse in a medical interview, the speaker UA checks the scoring information displayed in real time on the display and decides the subsequent dialogue policy.
• FIG. 23 shows an example of feedback when the speaker UA notices an omission of a transmission item in a medical interview and changes the dialogue strategy.
• In the figure, the passage of time is represented by an arrow pointing from top to bottom. Each screen on the right side of the arrow is a screen displayed on the display 121 of the information processing apparatus 10A on the speaker UA side (scoring information superimposed on the video of the speaker UB), and the bell icon represents a sound output from the speaker 122. The arrows and screens have the same meanings in FIGS. 25 and 26, which will be described later.
• First, the speaker UB says, "Please give me medicine for hypertension. Here is my prescription."
• In response, the speaker UA says, "Since this is the first time you are taking this medicine, may I take about 5 minutes to explain it?"
• The display of the scoring information, including the checklist and the bar graph, is then updated.
• Next, the speaker UB says, "No, I've heard enough from the doctor, so I don't need an explanation."
• Checking the checklist on the screen, the speaker UA realizes that he failed to introduce himself, and checking the bar graph, he sees that reliability has not increased; he therefore changes the dialogue strategy and introduces himself together with the pharmacist's role.
• The speaker UA then says, "I should have introduced myself earlier; I am ⁇, a pharmacist at the XX pharmacy. Why don't we check the medicines together?"
• The display of the scoring information, including the checklist and the bar graph, is updated again, and scoring information indicating that the self-introduction has been made is displayed.
• At this time, the information processing apparatus 10A can output a predetermined sound from the speaker 122 to notify the speaker UA that points were added because the speaker noticed the omitted transmission item and changed the dialogue strategy. The notification of the added points is not limited to sound; it may be given, for example, by displaying information indicating the added points on the screen or by a tactile presentation using vibration. However, to convey that the change in dialogue strategy led to the added points, the feedback is given in a manner that differs in some way from the normal point-addition notification.
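The differentiated notification described above might look like the following minimal sketch. The channel helpers, sound file names, and the strategy-change flag are illustrative assumptions; the passage only requires that strategy-change additions be distinguishable from normal ones.

```python
def play_sound(name: str) -> None:
    print(f"[speaker 122] playing {name}")   # stand-in for actual audio output

def show_on_screen(text: str) -> None:
    print(f"[display 121] {text}")           # stand-in for the screen display

def notify_points_added(points: float, strategy_change: bool) -> None:
    if strategy_change:
        # Points earned by noticing an omission and changing strategy:
        # a distinct sound plus on-screen text, per the passage above.
        play_sound("chime_recovery.wav")
        show_on_screen(f"+{points} (recovered omitted item)")
    else:
        # Normal point addition: the usual short tone only.
        play_sound("chime_normal.wav")

notify_points_added(1.0, strategy_change=True)
```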
  • FIG. 24 is a flowchart illustrating the flow of feedback processing.
• When an utterance of the speaker UA is accepted (S107), that is, an utterance related to recovery, it is determined whether or not the speaker has recovered the omission by that utterance (S108).
• If it is determined that recovery has not been performed (No in S108), the process returns to step S105 and the subsequent processing is repeated. That is, while the situation remains recoverable, the system keeps accepting the speaker UA's utterances and checking whether recovery has occurred; if recovery becomes impossible, information indicating the missed goal is displayed on the screen.
• If it is determined that recovery has been performed (Yes in S108), the added points are notified by sound (S109), and information indicating the added points is displayed on the screen (S110).
• The notification of the added points may be given by at least one of sound output and screen display.
• After step S104, S106, or S110, the process returns to step S101, and the above processing is repeated.
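The recovery branch of this feedback processing can be sketched as follows. The step numbers in the comments follow the flowchart references above, but since FIG. 24 itself is not reproduced here, the exact conditions and the helper callbacks are assumptions made for illustration.

```python
def feedback_loop(get_utterance, recovered, still_recoverable,
                  notify_sound, show_points, show_goal):
    while True:                          # corresponds to returning to S101
        utterance = get_utterance()      # S107: accept speaker UA's utterance
        if utterance is None:
            break                        # dialogue ended
        if recovered(utterance):         # S108: did the utterance recover it?
            notify_sound()               # S109: notify added points by sound
            show_points()                # S110: show added points on screen
        elif not still_recoverable():
            show_goal()                  # recovery no longer possible

# A toy run with stubbed callbacks:
utterances = iter(["um...", "I'm a pharmacist at the XX pharmacy.", None])
feedback_loop(
    get_utterance=lambda: next(utterances),
    recovered=lambda u: "pharmacist" in u,
    still_recoverable=lambda: True,
    notify_sound=lambda: print("[speaker 122] point-addition sound"),
    show_points=lambda: print("[display 121] +1 (self-introduction)"),
    show_goal=lambda: print("[display 121] missed item shown"),
)
```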
• FIG. 25 shows an example of feedback when the dialogue strategy is not changed despite an omitted transmission item in a medical interview.
• First, a dialogue between the speaker UA and the speaker UB proceeds as at times t11 to t14 in FIG. 23; the linguistic analysis of the utterances and the evaluation of the dialogue partner are scored together, and a screen including a checklist, a bar graph, and the like is displayed.
• The speaker UA checks the checklist on the screen and recognizes that he failed to introduce himself; however, considering the time limit of the medical interview (for example, 10 minutes) and the short remaining time, he judges it better to proceed with the interview as it is. The speaker UA therefore says, "I want to make sure the medicine is correct, so may I ask about your symptoms?"
• The display of the scoring information, including the checklist and the bar graph, is then updated. That is, the dialogue scenes shown are the introduction part (Intro) and the interview (History Taking); among the transmission items of the introduction part, "greeting" and "confirmation of the reason for visiting the hospital" are checked, and among the transmission items of the interview, "confirmation of chief complaint" is checked, while the bar graph shows no particular change in score. In other words, as this screen shows, since the dialogue scene has already transitioned from the introduction part to the interview, no points are added even if a transmission item of the introduction part is spoken at this point.
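The scene-gated point addition implied by FIG. 25 can be sketched briefly. The scene and item names below are taken from the example; the data layout and the gating function are assumptions for illustration only.

```python
ITEMS_PER_SCENE = {
    "Intro": {"greeting", "self-introduction", "confirm reason for visit"},
    "History Taking": {"confirm chief complaint", "ask symptoms"},
}

def try_add_point(current_scene: str, detected_item: str, checklist: dict) -> bool:
    """Check the item off and add a point only if it belongs to the current scene."""
    if detected_item in ITEMS_PER_SCENE.get(current_scene, set()):
        checklist[detected_item] = True
        return True     # point added
    return False        # e.g. an Intro item spoken during the interview: no point

checklist = {}
print(try_add_point("History Taking", "self-introduction", checklist))       # False
print(try_add_point("History Taking", "confirm chief complaint", checklist)) # True
```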
• FIG. 26 shows an example of feedback when the comprehension level of the speaker UB acting as a patient changes due to a paraphrase by the speaker UA acting as a pharmacist in a medical interview.
• First, the speaker UA asks, "When does the symptom remit?"
• At this point, scoring information is superimposed on the video of the speaker UB and displayed on the display 121: a checklist with "greeting" and "confirmation of the reason for visiting the hospital" checked, and a bar graph showing high evaluations for interest, likability, reliability, and so on.
• The speaker UB then utters, "Is it lunch?"
• Checking the bar graph and the measured values on the screen, the speaker UA concludes from the sudden drop in comprehension and the speaker UB's facial expression that the word "remit" may not have been understood, and judges it better to paraphrase it into another word.
• The speaker UA then asks, "Well then, can you tell me when the symptoms are alleviated?"
• Scoring the linguistic analysis of the utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the display 121: a bar graph in which the evaluation of the speaker UB's comprehension has risen (for example, from 10 to 80), the degree of similarity between the paraphrased wordings (for example, a similarity of 0.8 between the wording including "remission" and the wording including "alleviation"), and a measured value indicating that the speaker UB no longer shows a particularly troubled facial expression.
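The wording-similarity figure mentioned above (for example, 0.8) could come from any text-similarity measure. The following is a minimal sketch using a character-bigram Jaccard measure, which is purely an illustrative assumption; the patent does not specify the similarity function.

```python
def bigrams(text: str) -> set:
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of character bigrams, in [0, 1].
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 0.0

original = "When does the symptom remit?"
paraphrase = "Can you tell me when the symptoms are alleviated?"
print(round(similarity(original, paraphrase), 2))  # some score in [0, 1]
```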
• At this time, by outputting a predetermined sound from the speaker 122, the information processing apparatus 10A can notify the speaker UA that points were added because the paraphrase by the speaker UA acting as a pharmacist changed the degree of understanding of the speaker UB acting as a patient. After that, at time t34, the speaker UB understands the intention of the speaker UA's question and says, "Oh, I see, it calms down when I warm my stomach."
• Applying the third example shown in FIG. 26 to the feedback processing of FIG. 24, the flow is as follows. In the third example, the speaker UA acting as a pharmacist makes an utterance (one including "remission"), but after the utterance, the understanding of the speaker UB acting as a patient drops significantly and the non-verbal information takes a negative value, so it is determined that there is a problem in the dialogue (Yes in S103).
  • FIG. 27 shows an example of displaying a real-time score when used for apparel customer service.
• In this example, the speaker UA is an apparel salesperson, and the speaker UB is a customer.
• The video 221 including the speaker UB is displayed on the display 121 of the information processing device 10A on the speaker UA side, and scoring information including the evaluation axis scoring information 222, the dialogue scene transition information 223, the dialogue transmission item information 224, and the dialogue partner evaluation information 226 is superimposed on the video 221.
• In the evaluation axis scoring information 222, the evaluation value (score) for each of the evaluation axes of listening ability, accuracy, disclosure, diffusivity, and proposal ability is represented by a bar graph.
• The dialogue scene transition information 223 shows the progress from the start up to the product proposal (Recommendation), which is the dialogue scene 6 minutes and 21 seconds after the start: the small talk (Small talk) and the needs exploration (Needs exploration) have been completed, and the product proposal (Recommendation) is in progress. The overall flow of these dialogue scenes and the current progress are represented by the customer service flow 225.
• In the dialogue transmission item information 224, predetermined transmission items are listed for each dialogue scene such as small talk, needs exploration, and product proposal (Recommendation), and a check mark is added to each item actually conveyed by the speaker UA. Here, "call", "seasonal topics", and "introduction of new items" are checked among the transmission items, as are "item introduction" and "reference to trends".
• In the dialogue partner evaluation information 226, the evaluation values (scores) calculated from the sensing information on the speaker UB side are represented by a bar graph, which shows that the speaker UB has relatively high understanding, empathy, likability, and reliability toward the speaker UA, but a low degree of interest.
• When the information processing devices 10 and 10A are used for apparel customer service, they have the configuration shown in FIG. 3, FIG. 17, or FIG. 18, as in the medical interview case, but the information stored in each database must be changed to information for apparel customer service. That is, information for apparel customer service, rather than for medical interviews, is registered in the point-adding target language information DB 155, the point-adding target image information DB 160, the question information DB 162, and the point-adding target sensing information DBs 174 and 184.
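Swapping the reference data per use case, as described above, might be organized as follows. The file names and loader below are illustrative assumptions; only the idea that the same pipeline runs with use-case-specific database contents comes from the text.

```python
USE_CASE_DATA = {
    "medical_interview": {
        "point_language_db": "medical/point_language.json",
        "point_image_db": "medical/point_image.json",
        "question_db": "medical/questions.json",
        "point_sensing_db": "medical/point_sensing.json",
    },
    "apparel_customer_service": {
        "point_language_db": "apparel/point_language.json",
        "point_image_db": "apparel/point_image.json",
        "question_db": "apparel/questions.json",
        "point_sensing_db": "apparel/point_sensing.json",
    },
}

def load_reference_info(use_case: str) -> dict:
    """Return the reference-database paths for the selected use case."""
    return USE_CASE_DATA[use_case]

print(load_reference_info("apparel_customer_service")["question_db"])
```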
  • FIG. 28 shows another configuration example of an embodiment of an information processing system to which the present technology is applied.
• The information processing system 1A is configured such that the information processing device 10, the information processing device 20, and the server 30 are connected to one another via the network 50. In the information processing system 1A, the analysis processing unit 191 and the scoring processing unit 192 are provided in the server 30, while the voice input unit 151, the voice recognition unit 152, the image input unit 157, the image recognition unit 158, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the information processing apparatus 10.
• The information processing apparatus 10 transmits data including the voice recognition and image recognition results to the server 30 via the network 50. The server 30 performs the analysis processing and the scoring processing using the transmitted data, and returns data including the processing results to the information processing apparatus 10 via the network 50. The information processing apparatus 10 then displays information or outputs sound based on the data received from the server 30.
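The client/server split of FIG. 28 can be sketched compactly: the device sends recognition results, the server analyzes and scores, and the device presents the returned scoring information. The JSON message format and the stubbed scoring rule below are assumptions made for illustration.

```python
import json

def server_handle(request_json: str) -> str:
    """Server 30 side: analyze and score the recognition results (stubbed)."""
    req = json.loads(request_json)
    delta = 1.0 if "nice to meet you" in req["utterance_text"].lower() else 0.0
    return json.dumps({"axis": "transmission_item", "delta": delta})

def device_cycle(utterance_text: str) -> None:
    """Information processing apparatus 10 side: send results, show response."""
    request = json.dumps({"utterance_text": utterance_text, "image_tags": []})
    response = json.loads(server_handle(request))   # stands in for the network 50
    print(f"[display 121] {response['axis']} +{response['delta']}")

device_cycle("Nice to meet you")
```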
• Alternatively, the information processing device 10 may be composed of a processing device such as a home server and an input/output device such as a display device. In this case, the processing device and the input/output device are provided in the same space (the same room, the same building, and so on). That is, among the components shown in FIG. 3, the analysis processing unit 191, the scoring processing unit 192, the voice recognition unit 152, and the image recognition unit 158 are provided in the processing device, while the voice input unit 151, the image input unit 157, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the input/output device.
• Although the case where the information processing device 10 is configured as a telepresence device such as a display device has been described, the information processing device 10 may instead be an electronic device such as a PC (Personal Computer). In that case, the speaker UA and the speaker UB at remote locations can converse through the display by using an application such as a video call application.
• The server 30 can similarly take over some of the functions of the information processing device 10A.
• As described above, the information processing device 10 or 10A scores, based on the reference information stored in the databases as the standard for dialogue scoring, the dialogue between the speaker UA in the space SP1 (for example, a speaker acting as a pharmacist or an apparel salesperson) and the speaker UB in the space SP2 (for example, a speaker acting as a patient or a customer), and presents scoring information regarding the scoring of the dialogue to the speaker UA in real time.
• This allows the speaker UA to check the scoring information in real time while conversing with the speaker UB. Therefore, it is possible to give appropriate feedback, through scoring and the like, to the speaker UA as the scoring target person, and thereby support the improvement of interpersonal communication skills such as those needed in medical interviews and apparel customer service.
• In addition, since the speaker UA and the speaker UB in different spaces can converse by using the information processing devices 10 and 20 configured as telepresence devices, there are no location restrictions and participation in the dialogue is easier, which makes practicing and training the dialogue easy. Further, since the information processing apparatus 10 performs the scoring using the reference information stored in the databases as the standard for dialogue scoring, fluctuations in evaluation between human graders do not occur. As a result, this technique makes it possible to easily realize a simulation in a situation closer to the real one than is currently possible.
• The program executed by the information processing devices 10 and 10A can be recorded on and provided via a removable recording medium such as a package medium, including a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
• The processes performed by the information processing devices 10 and 10A (CPU 101) according to the program do not necessarily have to be performed chronologically in the order described in the above flowcharts. That is, they include processes executed in parallel or individually (for example, parallel processing or object-based processing).
• The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. Each step in the above flowcharts may be executed by one device or shared among a plurality of devices; likewise, when one step includes a plurality of processes, those processes can be executed by one device or shared among a plurality of devices. Further, the program may be transferred to and executed by a remote computer.
• In this specification, a system means a set of a plurality of components (devices, modules (parts), and the like), regardless of whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
• The embodiments of the present technology are not limited to those described above, and various changes can be made without departing from the gist of the present technology. Further, the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
• (1) An information processing device including a processing unit that scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
• (2) The information processing device, wherein the scoring information includes a scoring result of the dialogue or intermediate information during the dialogue.
• (3) The information processing device, wherein the processing unit presents the scoring information including the intermediate information at that point in time during a dialogue between the first speaker and the second speaker.
• (4) The information processing apparatus, wherein the intermediate information includes at least one of evaluation axis scoring information indicating an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information indicating the transition of dialogue scenes, dialogue transmission item information indicating the achievement level of transmission items for each dialogue scene, and dialogue partner evaluation information representing the evaluation of the first speaker by the second speaker.
• (5) The information processing apparatus according to any one of (2) to (4) above, wherein the processing unit presents the scoring information according to the reflection result.
• (6) The information processing device, wherein the processing unit notifies the first speaker of the reflection result by a method different from the real-time presentation of the scoring information.
• (7) The information processing device, wherein the processing unit presents the scoring information including the scoring result of the entire dialogue after the dialogue between the first speaker and the second speaker is completed.
• (8) The information processing device according to any one of (1) to (7) above, wherein the processing unit scores the dialogue based on the utterance content of the first speaker.
• (9) The information processing apparatus, wherein the processing unit scores the dialogue by analyzing the utterance of the first speaker with respect to at least one of the similarity with preset scoring item example sentences, the composition within the dialogue, the speech attitude classification, and the speech attitude classification within the dialogue composition.
• (10) The information processing apparatus according to (8) or (9) above, wherein the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
• (11) The information processing apparatus, wherein the sensing information is information obtained by various sensors and corresponds to the timing of an utterance by the first speaker.
• (12) The information processing apparatus according to (11) above, wherein the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing, in the captured image, at least one of preset facial expressions of the speaker, movements of the speaker, the line of sight of the speaker, and presented objects.
• (13) The information processing apparatus according to any one of (10) to (12) above, wherein the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
• (14) The information processing apparatus according to any one of (1) to (13) above, wherein the first speaker is a scoring target person, the second speaker is the dialogue partner of the scoring target person, and the processing unit displays video including the second speaker on a display, and displays information corresponding to the scoring information on the display or outputs sound corresponding to the scoring information from a speaker.
• (15) The information processing apparatus, wherein a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and an image captured by the camera installed in one of the first space and the second space is displayed in real time on the display installed in the other space.
• (16) The information processing apparatus according to (15) above, wherein the first camera and the first display installed in the first space are integrally configured, and the apparatus is interconnected via a network with another information processing apparatus in which the second camera and the second display installed in the second space are integrally configured.
• (17) The information processing apparatus, wherein the processing unit scores the dialogue based on sensing information obtained from a first sensor and from a second sensor included in the other information processing apparatus.
• (18) An information processing method in which an information processing device scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
• 1, 1A information processing system, 10, 10A information processing device, 20 information processing device, 30 server, 50 network, 101 CPU, 102 ROM, 103 RAM, 106, 106A input unit, 107 output unit, 108 storage unit, 109 communication unit, 111 operation unit, 112 camera, 113 microphone, 114 sensor, 121 display, 122 speaker, 151 voice input unit, 152 voice recognition unit, 153 sentence division unit, 154 utterance content analysis unit, 155 point-adding target language information DB, 156 time acquisition unit, 157 image input unit, 158 image recognition unit, 159 image analysis unit, 160 point-adding target image information DB, 161 point-adding target integration unit, 162 question information DB, 163 intermediate information display unit, 164 intermediate result notification unit, 165 scoring result generation unit, 166 scoring result display unit, 171 sensing information input unit, 172 sensing information recognition unit, 173 sensing information analysis unit, 174 point-adding target sensing information DB, 181 sensing information input unit, 182 sensing information recognition unit, 183 sensing information analysis unit, 184 point-adding target sensing information DB

Abstract

The present technology pertains to an information processing device and an information processing method which make it possible to suitably support a person to be graded in the grading of interpersonal communication in which dialogue skills are required. Provided is the information processing device comprising a processing unit which grades, on the basis of standard information which serves as a standard for grading a dialogue, the dialogue between a first speaker in a first space and a second speaker in a second space that is different from the first space, and presents, for the first speaker, grading information about the grading of the dialogue in real time. The present technology can be applied to, for example, a dialogue grading device which grades a dialogue.

Description

Information processing device and information processing method
The present technology relates to an information processing device and an information processing method, and particularly to an information processing device and an information processing method capable of appropriately supporting a scoring target person when scoring interpersonal communication that requires dialogue skills.
Professionals in the medical field take tests related to interpersonal communication. Patent Document 1 discloses a simulation system that simulates psychological changes of a model patient in a medical interview and changes the model patient's answers according to the question content and the interview procedure.
In addition, there are preferable expression methods and ways of speaking depending on the type of job, such as a sales position or a call center operator.
International Publication No. 2007/026715
As a means of improving interpersonal communication skills, it is desirable to give the scoring target person appropriate feedback, through scoring and the like, during a dialogue, thereby supporting the improvement of dialogue communication skills.
The present technology has been made in view of such a situation, and makes it possible to give appropriate feedback to the scoring target person in order to improve interpersonal communication skills, thereby supporting the improvement of dialogue communication skills.
An information processing device according to one aspect of the present technology includes a processing unit that scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
An information processing method according to one aspect of the present technology is a method in which an information processing device scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
In the information processing device and the information processing method according to one aspect of the present technology, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space is scored based on reference information serving as a standard for dialogue scoring, and scoring information regarding the scoring of the dialogue is presented to the first speaker in real time.
The information processing device according to one aspect of the present technology may be an independent device or an internal block constituting a single device.
FIG. 1 is a diagram showing a configuration example of an embodiment of an information processing system to which the present technology is applied.
FIG. 2 is a diagram showing a first example of the configuration of the information processing apparatus of FIG. 1.
FIG. 3 is a diagram showing a first example of the functional configuration of the information processing apparatus of FIG. 1.
FIG. 4 is a flowchart explaining the flow of dialogue correspondence processing.
FIG. 5 is a flowchart explaining the details of analysis scoring processing.
FIG. 6 is a diagram showing a configuration example in which an information processing system to which the present technology is applied is used for medical interviews.
FIG. 7 is a diagram showing a configuration example of the information processing apparatus when used for medical interviews.
FIG. 8 is a diagram showing an example of point-adding target language information in a medical interview.
FIG. 9 is a diagram showing examples of assumed usage scenes for each evaluation axis.
FIG. 10 is a diagram showing a display example of evaluation axis scoring information in a medical interview.
FIG. 11 is a diagram showing a display example of dialogue scene transition information in a medical interview.
FIG. 12 is a diagram showing a display example of dialogue transmission item information in a medical interview.
FIG. 13 is a diagram showing a real-time display example of scoring information in a medical interview.
FIG. 14 is a diagram showing an example of a dialogue with overlap in a medical interview.
FIG. 15 is a diagram showing an example of dialogue text at the time of analysis.
FIG. 16 is a diagram showing a second example of the configuration of the information processing apparatus of FIG. 1.
FIG. 17 is a diagram showing a first example of the functional configuration of the information processing apparatus of FIG. 16.
FIG. 18 is a diagram showing a second example of the functional configuration of the information processing apparatus of FIG. 16.
FIG. 19 is a diagram showing a display example of dialogue partner evaluation information in a medical interview.
FIG. 20 is a diagram showing a real-time display example of scoring information in a medical interview.
FIG. 21 is a diagram showing a first example of the display of input information and scoring information.
FIG. 22 is a diagram showing a second example of the display of input information and scoring information.
FIG. 23 is a diagram showing an example of feedback when a speaker acting as a pharmacist in a medical interview notices an omitted transmission item and changes the dialogue strategy.
FIG. 24 is a flowchart explaining the flow of feedback processing.
FIG. 25 is a diagram showing an example of feedback when the dialogue strategy was not changed in response to an omitted transmission item in a medical interview.
FIG. 26 is a diagram showing an example of feedback when the understanding of a speaker acting as a patient changes due to a paraphrase by a speaker acting as a pharmacist in a medical interview.
FIG. 27 is a diagram showing a real-time display example of scoring information in apparel customer service.
FIG. 28 is a diagram showing another configuration example of an embodiment of an information processing system to which the present technology is applied.
<1. First Embodiment>
(System configuration)
FIG. 1 shows a configuration example of an embodiment of an information processing system to which the present technology is applied.
In FIG. 1, the information processing system 1 is configured by connecting an information processing device 10 serving as a telepresence device and an information processing device 20 to each other via a network 50.
The information processing device 10 and the information processing device 20 are installed in different spaces, such as different buildings or different rooms. That is, the user in the vicinity of the information processing device 10 (the first speaker) and the user in the vicinity of the information processing device 20 (the second speaker) are speakers who converse with each other from mutually distant locations, such as remote sites. The first speaker is the scoring target person whose interpersonal communication skills are scored, and the second speaker is the dialogue partner of the scoring target person.
The information processing device 10 and the information processing device 20 are each provided with a large display (for example, of a size capable of displaying the speaker's whole body), a camera that captures the surroundings, a microphone that collects surrounding sounds such as the speakers' utterances and environmental sounds, and a speaker that outputs sound.
The information processing device 10 displays video corresponding to the captured images taken by the information processing device 20, together with information superimposed on that video, and outputs the sound collected by the information processing device 20. Conversely, the information processing device 20 displays video corresponding to the captured images taken by the information processing device 10 and outputs the sound collected by the information processing device 10. As a result, the first speaker and the second speaker in different spaces can converse through the display.
The network 50 includes a communication network such as the Internet, an intranet, or a mobile phone network, and enables interconnection between devices using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
(Device configuration)
FIG. 2 shows a configuration example of the information processing apparatus 10 of FIG. 1.
The information processing device 10 is an electronic device, such as a display device, that can be connected to a network 50 such as the Internet, and is configured as a telepresence device.
In the information processing apparatus 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, and a RAM (Random Access Memory) 103 are connected to one another by a bus 104. The CPU 101 controls the operation of each unit and performs various processes by executing programs recorded in the ROM 102 or the storage unit 108. Various data are stored in the RAM 103 as appropriate.
An input/output I/F 105 is also connected to the bus 104. An input unit 106, an output unit 107, a storage unit 108, and a communication unit 109 are connected to the input/output I/F 105.
The input unit 106 supplies various input data to each unit including the CPU 101. The input unit 106 includes an operation unit 111, a camera 112, and a microphone 113.
The operation unit 111 is operated by the user and outputs operation data corresponding to the operation; it is composed of physical buttons, a touch panel, and the like. The camera 112 generates and outputs captured image data by photoelectrically converting the light incident from the subject and performing signal processing on the resulting electric signal; it includes an image sensor, a signal processing unit, and the like. The microphone 113 receives sound as air vibrations and outputs sound data as an electric signal.
The output unit 107 outputs various kinds of information under the control of the CPU 101. The output unit 107 includes a display 121 and a speaker 122.
The display 121 displays video and the like corresponding to captured image data under the control of the CPU 101; it is composed of a panel unit, such as a liquid crystal panel or an OLED (Organic Light Emitting Diode) panel, a signal processing unit, and the like. The speaker 122 outputs sound corresponding to sound data under the control of the CPU 101.
The storage unit 108 records various data and programs under the control of the CPU 101; the CPU 101 reads various data from the storage unit 108 for processing and executes programs. The storage unit 108 is configured as an auxiliary storage device, and may be internal storage such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), or external storage such as a memory card.
The communication unit 109 communicates with other devices via the network 50 under the control of the CPU 101. The communication unit 109 is configured as a communication module supporting cellular communication (for example, LTE-Advanced or 5G (5th Generation)), wireless communication such as wireless LAN (Local Area Network), or wired communication.
The configuration of the information processing device 10 described above is an example; for instance, a short-range wireless communication circuit that performs wireless communication according to a short-range wireless communication standard such as Bluetooth (registered trademark) or NFC (Near Field Communication), a power supply circuit, and the like may also be provided. In the information processing apparatus 10, headphones or the like connected to an output terminal may be used instead of the speaker 122. Further, the display 121 may be a projector, with which video corresponding to the captured image data can be projected and displayed on an arbitrary screen.
Note that, in the information processing system 1 of FIG. 1, the configuration of the information processing apparatus 20 is the same as that of the information processing apparatus 10 shown in FIG. 2, and therefore its description is omitted.
(Functional configuration)
FIG. 3 shows an example of the functional configuration of the information processing apparatus 10 of FIG. 1.
In FIG. 3, the information processing apparatus 10 includes a voice input unit 151, a voice recognition unit 152, a sentence division unit 153, an utterance content analysis unit 154, a point-adding target language information DB 155, a time acquisition unit 156, an image input unit 157, an image recognition unit 158, an image analysis unit 159, a point-adding target image information DB 160, a point-adding target integration unit 161, a question information DB 162, an intermediate information display unit 163, an intermediate result notification unit 164, a scoring result generation unit 165, and a scoring result display unit 166.
The analysis processing unit 191 is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-adding target language information DB 155, the time acquisition unit 156, the image analysis unit 159, and the point-adding target image information DB 160. The scoring processing unit 192 is composed of the point-adding target integration unit 161, the question information DB 162, and the scoring result generation unit 165.
For example, the voice recognition unit 152, the image recognition unit 158, the analysis processing unit 191, and the scoring processing unit 192 are realized by the CPU 101 of FIG. 2 executing a program, while the point-adding target language information DB 155, the point-adding target image information DB 160, and the question information DB 162 are recorded in the storage unit 108 of FIG. 2.
The voice input unit 151 corresponds to the microphone 113 of FIG. 2, and the image input unit 157 corresponds to the camera 112 of FIG. 2. The intermediate information display unit 163 and the scoring result display unit 166 correspond to the display 121 of FIG. 2, and the intermediate result notification unit 164 corresponds to the display 121 or the speaker 122 of FIG. 2.
The voice input unit 151 inputs voice data of a speaker's utterance to the voice recognition unit 152. The voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151. In this voice recognition processing, the voice data of the speaker's utterance is converted into text data using a statistical method or the like, and the voice recognition result is supplied to the sentence division unit 153 and the time acquisition unit 156.
The sentence division unit 153 performs sentence division processing using the voice recognition result from the voice recognition unit 152. In this sentence division processing, the text corresponding to the speaker's utterance is divided into predetermined processing units, and the sentence division result is supplied to the utterance content analysis unit 154.
The utterance content analysis unit 154 performs utterance content analysis processing using the sentence division result from the sentence division unit 153 and the point-adding target language information stored in the point-adding target language information DB 155. The point-adding target language information is information for extracting (identifying) the language (wording) that earns points when scoring interpersonal communication skills. In the utterance content analysis processing, text-to-text similarity is used to find, among the divided texts, texts that include point-adding target language, and the analysis result is supplied to the point-adding target integration unit 161.
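A minimal sketch of this matching step follows. The token-overlap similarity, the threshold of 0.5, and the example-sentence entries are illustrative assumptions; the text only specifies that text-to-text similarity against registered example sentences is used.

```python
def token_similarity(a: str, b: str) -> float:
    # Jaccard overlap of lowercase word tokens, in [0, 1].
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

POINT_TARGET_EXAMPLES = {            # stands in for the language information DB 155
    "greeting": ["nice to meet you", "hello, how are you"],
}

def analyze(divided_texts: list) -> list:
    """Return (divided text, scoring item) pairs that exceed the threshold."""
    hits = []
    for text in divided_texts:
        for item, examples in POINT_TARGET_EXAMPLES.items():
            if max(token_similarity(text, ex) for ex in examples) >= 0.5:
                hits.append((text, item))
    return hits

print(analyze(["nice to meet you", "here is the prescription"]))
```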
The time acquisition unit 156 acquires the time corresponding to the voice recognition result from the voice recognition unit 152, and supplies the time information to the image analysis unit 159 and the point-adding target integration unit 161.
The image input unit 157 inputs captured image data including the speaker to the image recognition unit 158. The image recognition unit 158 performs image recognition processing using the captured image data from the image input unit 157. In this image recognition processing, the speaker (face, body parts, and so on) is recognized as an object using a pattern recognition technique or the like, and the image recognition result is supplied to the image analysis unit 159.
The image analysis unit 159 performs image analysis processing using the image recognition result from the image recognition unit 158 and the point-adding target image information stored in the point-adding target image information DB 160. The point-adding target image information is information for extracting (identifying) images that earn points when scoring interpersonal communication skills. In the image analysis processing, point-adding target images are identified from the recognized images of the speaker and the like, and the analysis result is supplied to the point-adding target integration unit 161. The time information from the time acquisition unit 156 is also used in the image analysis processing to associate the image analysis result with the utterance content analysis result.
The point-adding target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the image analysis result from the image analysis unit 159, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-adding target integration unit 161 performs integration processing that integrates the utterance content analysis result and the image analysis result linked by the time information.
The question information is information on how points should be added to each point-adding target. In the integration processing, scoring is performed by integrating the point-adding target language indicated by the utterance content analysis result and the point-adding target image indicated by the image analysis result and adding points, thereby obtaining scoring information regarding the scoring of the dialogue. This scoring information includes the scoring result of the dialogue or intermediate information presented during the dialogue.
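The timestamp-linked integration just described might look like the following sketch. The 2-second matching window and the event layout are assumptions made for illustration; the text only states that the two analysis results are linked by time information and combined.

```python
def integrate(utterance_hits, image_hits, window_sec=2.0):
    """Pair each (time, item) utterance hit with image hits near in time."""
    events = []
    for t_u, item in utterance_hits:
        nearby = [tag for t_i, tag in image_hits if abs(t_i - t_u) <= window_sec]
        events.append({"time": t_u, "item": item, "image_tags": nearby})
    return events

utterance_hits = [(3.2, "greeting")]
image_hits = [(3.5, "smile"), (4.1, "bow"), (30.0, "gesture")]
print(integrate(utterance_hits, image_hits))
# [{'time': 3.2, 'item': 'greeting', 'image_tags': ['smile', 'bow']}]
```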
During a dialogue, the point-adding target integration unit 161 supplies intermediate information, such as whether the speaker performed a point-earning act, to the intermediate information display unit 163, which displays it in real time. Also during the dialogue, the point-adding target integration unit 161 supplies intermediate results, such as the scoring result from the start of the dialogue up to that point, to the intermediate result notification unit 164 as intermediate information, and the intermediate result notification unit 164 notifies this intermediate information in real time.
When the dialogue ends, the point-adding target integration unit 161 supplies the scoring result of the dialogue to the scoring result generation unit 165. The scoring result generation unit 165 performs scoring result generation processing using that scoring result. In the scoring result generation processing, predetermined processing such as weighting important items among the point-adding targets is performed to generate the final scoring result (score) for the entire dialogue, which is supplied to the scoring result display unit 166. The scoring result display unit 166 displays the scoring result from the scoring result generation unit 165.
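As a rough sketch of this final step, the weighting of important items could be a weighted sum over earned items. The weight values below are assumptions; the text only states that important items are weighted.

```python
# Hypothetical per-item weights; the "medication method" item is treated as
# more important for illustration only.
WEIGHTS = {"greeting": 1.0, "self-introduction": 1.0, "medication method": 2.0}

def final_score(earned_items: list) -> float:
    """Weighted sum over the items earned during the dialogue."""
    return sum(WEIGHTS.get(item, 1.0) for item in earned_items)

print(final_score(["greeting", "medication method"]))  # 3.0
```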
(Process flow)
Next, the flow of the dialogue correspondence processing executed by the information processing apparatus 10 will be described with reference to the flowchart of FIG. 4.
In step S11, the voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151 and converts the utterance of the speaker who is the scoring target person into a text Ui (0 < i ≤ N). A time stamp is attached to this text.
In step S12, the sentence division unit 153 divides the text Ui from the voice recognition unit 152 into divided texts u1, u2, ..., un.
In step S13, the analysis processing unit 191 and the scoring processing unit 192 perform analysis scoring processing by analyzing each divided text uj (0 < j ≤ n) and calculating each scoring item for it. The details of this analysis scoring processing will be described later with reference to the flowchart of FIG. 5.
When the processing of step S13 ends, the processing proceeds to step S14. In step S14, it is determined, based on the result of the analysis scoring processing, whether points were added to the score.
If it is determined in step S14 that points were added, the processing proceeds to step S15, and the processing of steps S15 and S16 presents intermediate information corresponding to the result of the analysis scoring processing. That is, in steps S15 and S16, the intermediate information display unit 163 displays the content that was answered correctly and intermediate information such as an intermediate result in which the score has been added to the corresponding scoring item. For example, when the speaker who is the scoring target person says "hello" and points are added to the "greeting" scoring item, a list with that scoring item checked, a graph with the added score, and the like are displayed. Alternatively, the intermediate result notification unit 164 may notify the intermediate information, such as the intermediate result, by sound or the like.
If it is determined in step S14 that no points were added, the processing proceeds to step S17, where intermediate information corresponding to the result of the analysis scoring processing is presented. That is, in step S17, the intermediate information display unit 163 displays intermediate information such as score sheet information. For example, when the speaker who is the scoring target person has not made a point-earning utterance, the scoring items in the list are not checked and no score is added to the graph or the like.
When the processing of step S16 or S17 ends, the processing proceeds to step S18. In step S18, if it is determined that i < N, that is, if a next utterance in the dialogue exists after all divided texts uj in the text Ui being processed have been handled, the processing from step S11 onward is repeated for the text Ui+1 obtained from the next utterance. In this way, while the scoring target person and the dialogue partner converse, the display of the intermediate information is updated in real time according to the determination of whether points were added.
On the other hand, if it is determined in step S18 that i = N, that is, that there is no next utterance and the dialogue has ended, the processing proceeds to step S19. In step S19, the scoring result display unit 166 displays the scoring result including the final score for the series of utterances in the dialogue. This final scoring result corresponds to the intermediate result at the end of the dialogue among the intermediate results continuously updated during the dialogue.
 In this way, scoring information is presented both during the dialogue and at its end. During the dialogue, intermediate results are presented that include per-utterance scores, such as whether an utterance by the speaker being scored contained the required information; at the end of the dialogue, a scoring result is presented that includes scores for the dialogue as a whole, such as its structure and overall evaluation.
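 As a rough illustration of the loop described in steps S11 through S19, the following Python sketch shows how per-utterance analysis could drive real-time intermediate updates and a final result. All names and the simplified splitter and scorer are hypothetical stand-ins for the units described above, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    points_added: bool
    item: str = ""
    score: int = 0

def split_into_sentences(text: str) -> list[str]:
    # Simplified splitter standing in for the sentence division unit 153 (step S12).
    return [s for s in text.replace("?", "?|").replace(".", ".|").split("|") if s.strip()]

def analyze_and_score(sentence: str) -> StepResult:
    # Stand-in for the analysis scoring process (step S13): award a point
    # when the sentence resembles a registered greeting example.
    if "hello" in sentence.lower():
        return StepResult(points_added=True, item="greeting", score=1)
    return StepResult(points_added=False)

def run_dialogue_session(utterance_texts: list[str]) -> int:
    total = 0
    for text_u in utterance_texts:                     # S11: next utterance U_i
        for sentence in split_into_sentences(text_u):  # S12: divided texts u_j
            result = analyze_and_score(sentence)       # S13
            if result.points_added:                    # S14 -> S15, S16
                total += result.score
                print(f"[intermediate] +{result.score} for '{result.item}' (total {total})")
            else:                                      # S14 -> S17
                print(f"[intermediate] no points (total {total})")
    print(f"[final] score: {total}")                   # S18/S19: after the last utterance
    return total

run_dialogue_session(["Hello. I am the pharmacist in charge today."])
```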
 The flow of the dialogue handling processing has been described above. Next, the details of the analysis scoring process in step S13 of FIG. 4 will be described with reference to the flowchart of FIG. 5.
 In step S31, the utterance content analysis unit 154 analyzes the utterance content of the divided text u_j using the point-addition target language information stored in the point-addition target language information DB 155. This analysis of the utterance content includes analysis of the similarity between the divided text u_j and the scoring item example sentences (S32), analysis of the structure within the dialogue (S33), analysis of the speech attitude classification (S34), and analysis of the speech attitude classification within the dialogue structure (S35).
 For example, the point-addition target language information includes, as information for extracting point-eligible language from the divided texts, information on scoring item example sentences, structure within the dialogue, speech attitude classification, and the like. By processing the divided text u_j with this information, the similarity to the scoring item example sentences, the structure within the dialogue, the speech attitude classification, the speech attitude classification within the dialogue structure, and so on are analyzed. The speech attitude classification categorizes the mental attitude with which the speaker addresses the other party. Here, because inter-text similarity is used for scoring, an evaluator can register the point-eligible example sentences for each scoring item and have scoring performed against them.
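 The description above does not specify a particular similarity measure, so the following is only a minimal sketch of scoring by inter-text similarity; difflib's ratio from the Python standard library, the example sentences, and the threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical registered example sentences per scoring item.
SCORING_ITEMS = {
    "greeting": ["Hello", "What brings you here today?", "It's been a while"],
    "name confirmation": ["May I ask your name?", "What is your name?"],
}

def best_matching_item(divided_text: str, threshold: float = 0.7):
    """Return (item, similarity) for the best-matching example sentence,
    or None if no example reaches the threshold."""
    best = None
    for item, examples in SCORING_ITEMS.items():
        for example in examples:
            sim = SequenceMatcher(None, divided_text.lower(), example.lower()).ratio()
            if best is None or sim > best[1]:
                best = (item, sim)
    return best if best and best[1] >= threshold else None

print(best_matching_item("Hello!"))  # likely matches the "greeting" item
```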
 In step S36, the image input unit 157 acquires the captured image at the corresponding time. In step S37, the image recognition unit 158 performs image recognition on the acquired captured image.
 In step S38, the image analysis unit 159 analyzes the actions included in the image recognition result using the point-addition target image information stored in the point-addition target image information DB 160. This image analysis includes analysis of the speaker's facial expression (S39), analysis of the speaker's movements (S40), analysis of the speaker's line of sight (S41), and analysis of presented objects (S42).
 For example, the point-addition target image information includes, as information for extracting point-eligible images from the captured images, information on facial expressions, movements, line of sight, presented objects, and the like. By analyzing the image recognition result with this information, facial expressions, movements, line of sight, presented objects, and so on are analyzed. For example, when the usage scene is a medical interview, an organ model or various materials presented by a doctor can be presented objects.
 When the analysis of the utterance content and the analysis of the image are complete, the process proceeds to step S43. In step S43, the point-addition target integration unit 161 determines the point-addition conditions. In this determination, the point-addition targets and scores extracted by the utterance content analysis and the image analysis are judged. For example, if the person being scored gives the greeting "Hello", a score is added for that point-addition target, and if the greeting is delivered with a smile, a further score is awarded, so that a score integrating the analysis results of the utterance content and the image is output. A further score may also be awarded when the person being scored shows materials to the dialogue partner while giving an explanation.
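 A hypothetical sketch of the point-addition condition in step S43 follows: a base score from the utterance analysis, plus bonuses when the image analysis finds a smile or a presented object at the same moment. The record fields and bonus values are illustrative assumptions, not the patent's actual rules.

```python
def integrate_scores(utterance_result: dict, image_result: dict) -> int:
    score = 0
    if utterance_result.get("item") == "greeting":
        score += 1                                 # greeting detected in speech
        if image_result.get("expression") == "smile":
            score += 1                             # delivered with a smile
    if utterance_result.get("item") == "explanation" and image_result.get("showing_material"):
        score += 1                                 # materials shown while explaining
    return score

print(integrate_scores({"item": "greeting"}, {"expression": "smile"}))  # -> 2
```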
 As described above, in the information processing apparatus 10, the analysis processing unit 191 performs analysis using the recognition results of the audio data and the captured image data, and the scoring processing unit 192 performs scoring using those analysis results, so that scoring information is presented in real time. For example, while the dialogue is in progress, intermediate information such as intermediate results is presented as scoring information; after the dialogue has ended, the final scoring result is presented as scoring information.
 Assumed usage scenes for the evaluation of interpersonal communication (skill scoring) by the information processing system 1 include medical interviews, call centers, sales, and retail. In the information processing apparatus 10, the reference information serving as the standard for dialogue scoring, such as the point-addition target language information, the point-addition target image information, and the question information stored in the databases, is set in advance to match the usage scene, so that interpersonal communication can be evaluated in any desired usage scene requiring dialogue skills. The reference information is not limited to information on adding points to the score; it may be any information serving as a standard for dialogue scoring, such as information on deducting points. Below, the case where the usage scene is a medical interview is described.
(Example of a medical interview)

 FIG. 6 shows an example in which the information processing system 1 to which the present technology is applied is used for a medical interview.
 In FIG. 6, the information processing apparatus 10 is installed in space SP1, and the information processing apparatus 20 is installed in space SP2. Between the information processing apparatus 10 and the information processing apparatus 20, data such as video corresponding to the images captured by each camera and sound collected by each microphone is exchanged continuously and in real time, for example while the connection between the two apparatuses is established.
 In space SP1, the speaker UA uses the information processing apparatus 10, while in space SP2, the speaker UB uses the information processing apparatus 20, allowing the speakers UA and UB, who are in remote locations, to converse through their displays. The speaker UA is the person being scored on interpersonal communication skills. The speaker UB is the speaker UA's dialogue partner. For example, if the speaker UA is a pharmacist, the speaker UB is a patient.
 On the information processing apparatus 10 installed in space SP1, scoring information is displayed together with the video of the speaker UB. Specifically, as shown in FIG. 7, the display 121 of the information processing apparatus 10 shows, in real time, scoring information on the dialogue between the speakers UA and UB in the form of graphs, tables, flowcharts, and the like, and the speaker UA, playing the role of a pharmacist, can converse with the speaker UB through the display while checking the scoring information. The speaker UA is photographed by the camera 112 provided at the top of the information processing apparatus 10, the speaker UA's voice is collected by the microphone 113 provided at the bottom, and the speaker UB's voice is output from the speakers 122-1 and 122-2 provided on the left and right.
 Meanwhile, as shown in FIG. 6, the display of the information processing apparatus 20 installed in space SP2 shows the video of the speaker UA, and its speaker outputs the speaker UA's voice, so that the speaker UB, playing the role of a patient, can converse with the speaker UA through the display.
(Examples of scoring items)

 FIG. 8 shows an example of the point-addition target language information stored in the point-addition target language information DB 155. In the point-addition target language information, item requirements and scoring item example sentences are set in advance for each scoring item used in a medical interview.
 "挨拶"である採点項目は、会話を開始する表現として定型的な挨拶や専門的な挨拶に類似する表現があるかが要件とされる。"挨拶"の採点項目例文としては、「こんにちは」、「本日はどうされましたか」、「お久しぶりですね」などがある。"自己紹介"である採点項目は、自身の名前や役職を紹介する表現があるかが要件とされる。"自己紹介"の採点項目例文としては、「わたしは本日の担当薬剤師の○○です」、「本日担当させていただきます××です」、「わたしは△△です」などがある。 The scoring item that is "greeting" is required to have an expression similar to a standard greeting or a professional greeting as an expression to start a conversation. Examples of scoring items for "greetings" include "hello", "what happened today", and "it's been a long time". The scoring item that is "self-introduction" is required to have an expression that introduces one's name and position. Examples of scoring items for "self-introduction" include "I am the pharmacist in charge of today", "I am in charge of XX today", and "I am △△".
 "名前の確認"である採点項目は、相手の名前を尋ねる表現があるかが要件とされる。"名前の確認"の採点項目例文としては、「お名前をお伺いしてもよろしいでしょうか」、「あなたのお名前は何ですか」、「お名前頂戴してもよろしいですか」などがある。"来院理由"である採点項目は、相手の来院の理由を尋ねる表現があるかが要件とされる。"来院理由"の採点項目例文としては、「新しい処方薬が出ていますね」、「今日はどういったご用件ですか」、「前回と同様のお薬ですか」などがある。 The scoring item that is "confirmation of name" is required to have an expression asking for the name of the other party. Examples of scoring item examples for "confirm name" include "Are you sure you want to ask your name?", "What is your name?", "Are you sure you want your name?" be. The scoring item, which is the "reason for visiting the hospital," is required to have an expression asking the reason for the other party's visit. Examples of scoring items for "reasons for visit" include "new prescription drugs are available", "what are your requirements today", and "is the same drug as last time?".
(Examples of evaluation axes)

 FIG. 9 shows, for each evaluation axis such as communicated items and clarity, its content and its assumed usage scenes. In this example, in addition to medical interviews, call centers, sales, and retail are given as assumed usage scenes.
 "伝達項目"である評価軸は、伝達すべき事項はどれくらい網羅できたかの評価を表し、想定利用シーンとして、医療面談やコールセンタがある。"分かりやすさ"である評価軸は、専門用語を避け、難しい語は言い換えを行ったかの評価を表し、医療面談やコールセンタが想定利用シーンとされる。"共感力"である評価軸は、愁訴に対して共感的な表現を行ったかの評価を表し、医療面談やコールセンタが想定利用シーンとされる。 The evaluation axis, which is a "communication item", represents an evaluation of how well the items to be communicated were covered, and there are medical interviews and call centers as assumed usage scenes. The evaluation axis, which is "easy to understand," avoids technical terms and expresses the evaluation of whether difficult words have been paraphrased, and medical interviews and call centers are assumed usage scenes. The evaluation axis, which is "empathy," represents the evaluation of whether or not the complaint was expressed in a sympathetic manner, and medical interviews and call centers are assumed usage scenes.
 "構成力"である評価軸は、よく構成された対話進行であったかの評価を表し、想定利用シーンとして、医療面談やコールセンタがある。"提案力"である評価軸は、文脈に沿った形で提案することができたかの評価を表し、営業や販売、コールセンタが想定利用シーンとされる。"拡散性"である評価軸は、話題を十分に広げることができたかの評価を表し、営業や販売が想定利用シーンとされる。 The evaluation axis, which is "composition ability", represents the evaluation of whether the dialogue progressed well, and there are medical interviews and call centers as assumed usage scenes. The evaluation axis, which is "proposal power," represents the evaluation of whether or not a proposal could be made in a contextual manner, and sales, sales, and call centers are assumed usage scenes. The evaluation axis, which is "diffusive", represents the evaluation of whether the topic has been sufficiently expanded, and sales and sales are assumed usage scenes.
 "開示性"である評価軸は、メリットやデメリットを正確に伝えたかの評価を表し、想定利用シーンとして、医療面談や営業、販売がある。"同調性"である評価軸は、相手の話すペースに合わせようとしていたかの評価を表し、医療面談やコールセンタが想定利用シーンとなる。"正確性"である評価軸は、伝えた内容は全て正しい情報であったかの評価を表し、医療面談や営業、コールセンタ、販売が想定利用シーンとなる。"傾聴力"である評価軸は、適切なタイミングで相手に話を促しているかの評価を表し、営業や販売が想定利用シーンとされる。 The evaluation axis, which is "disclosure", represents the evaluation of whether the merits and demerits are accurately conveyed, and the assumed usage scenes include medical interviews, sales, and sales. The evaluation axis, which is "synchronization", represents the evaluation of whether or not the other party was trying to match the speaking pace, and the medical interview or call center is the assumed usage scene. The evaluation axis, which is "accuracy", represents the evaluation of whether all the transmitted contents were correct information, and medical interviews, sales, call centers, and sales are assumed usage scenes. The evaluation axis, which is "listening ability", represents the evaluation of whether or not the other party is being encouraged to talk at an appropriate timing, and sales and sales are assumed usage scenes.
(Display example of evaluation axis scoring information)

 FIG. 10 shows a display example of evaluation axis scoring information in a medical interview.
 As shown in FIG. 9, when the system is used for a medical interview, seven evaluation axes are employed: communicated items, clarity, empathy, structuring ability, disclosure, attunement, and accuracy. In FIG. 10, the evaluation axis scoring information is represented, for each of the seven axes, by a bar graph whose evaluation value (score) extends horizontally. In the example of FIG. 10, the speaker UA has earned high marks for accuracy, communicated items, and the like in the dialogue with the speaker UB, but low marks for empathy and the like.
(Display example of dialogue scene transition information)

 FIG. 11 shows a display example of dialogue scene transition information in a medical interview.
 In FIG. 11, the dialogue scene transition information is represented by the transitions between dialogue scenes and a progress bar for a type of dialogue that has a time limit in a medical interview. Dialogue scenes in a medical interview include the introduction (Intro), history taking (History Taking), explanation (Explanation), and closing (Closing). In this example, the time limit for the medical interview is set to 10 minutes.
 FIG. 11A shows a display example of the dialogue scene transitions and progress bar 6 minutes and 21 seconds after the start of the medical interview. It shows that, at that point, the dialogue between the speakers UA and UB has finished the introduction and the history taking, and the explanation is in progress.
 FIG. 11B shows a display example of the dialogue scene transitions and progress bar 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speakers UA and UB, the introduction, history taking, explanation, further history taking, further explanation, and closing were all completed within 10 minutes.
 FIG. 11C shows a display example of the dialogue scene transitions and progress bar more than 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speakers UA and UB, the explanation dragged on after the introduction and history taking ended, and is still continuing even though the 10-minute time limit has been exceeded.
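 The scene-transition display of FIG. 11 could be backed by logic along the lines of the small sketch below, which records scene changes with timestamps and flags when the time limit is exceeded. The API shape is an assumption, not the patent's implementation.

```python
class SceneTracker:
    def __init__(self, limit_seconds: int = 600):    # 10-minute limit, as in FIG. 11
        self.limit = limit_seconds
        self.transitions: list[tuple[float, str]] = []

    def enter_scene(self, elapsed: float, scene: str) -> None:
        self.transitions.append((elapsed, scene))

    def status(self, elapsed: float) -> str:
        current = self.transitions[-1][1] if self.transitions else "(none)"
        overtime = " OVER TIME LIMIT" if elapsed > self.limit else ""
        return f"{elapsed:.0f}s: in '{current}'{overtime}"

tracker = SceneTracker()
tracker.enter_scene(0, "Intro")
tracker.enter_scene(95, "History Taking")
tracker.enter_scene(250, "Explanation")
print(tracker.status(381))   # 6m21s: still in Explanation, as in FIG. 11A
print(tracker.status(640))   # past the 10-minute limit, as in FIG. 11C
```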
(Display example of dialogue communication item information)

 FIG. 12 shows a display example of dialogue communication item information in a medical interview.
 In FIG. 12, the dialogue communication item information is represented by a checklist of the dialogue's progress and the completeness of the communicated items in the medical interview. The dialogue scenes in a medical interview include the introduction (Intro), history taking (History Taking), explanation (Explanation), and closing (Closing), and communication items are set for each dialogue scene.
 The communication items for the introduction include a greeting, a self-introduction, confirming the name, and confirming the reason for the visit. The communication items for the history taking include confirming the chief complaint, the affected area, the symptoms, and the duration. The communication items for the explanation include the dosage instructions, the medication period, side effects, and drug interaction precautions. The communication items for the closing include a farewell greeting, thanks, resolving remaining questions, and the next appointment.
 In FIG. 12A, the introduction, history taking, and explanation scenes proceed in order, and among the communication items, each preceded by a checkbox, those bearing a check mark are the items actually communicated. That is, in the dialogue with the speaker UB, the speaker UA gave a greeting, introduced himself or herself, and confirmed the reason for the visit in the introduction, confirmed the chief complaint in the history taking, and explained the dosage instructions in the explanation; the degree of completion of these items is displayed.
 In FIG. 12B, the introduction, history taking, explanation, and closing scenes proceed in order, and the speaker UA, in the dialogue with the speaker UB, gave a greeting, introduced himself or herself, and confirmed the reason for the visit in the introduction, confirmed the chief complaint, affected area, symptoms, and duration in the history taking, explained the dosage instructions and side effects in the explanation, and made the next appointment in the closing; the degree of completion of these items is displayed.
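 A minimal sketch of the per-scene checklist of FIG. 12 follows; the scene and item names track the figure, while the data structure and function names are assumptions for illustration.

```python
CHECKLIST = {
    "Intro": ["greeting", "self-introduction", "name confirmation", "reason for visit"],
    "History Taking": ["chief complaint", "affected area", "symptoms", "duration"],
    "Explanation": ["dosage instructions", "medication period", "side effects",
                    "interaction precautions"],
    "Closing": ["farewell", "thanks", "resolve questions", "next appointment"],
}

checked: set[tuple[str, str]] = set()

def mark_communicated(scene: str, item: str) -> None:
    # Record an item as actually communicated by the person being scored.
    if item in CHECKLIST.get(scene, []):
        checked.add((scene, item))

def render(scene: str) -> None:
    for item in CHECKLIST[scene]:
        box = "x" if (scene, item) in checked else " "
        print(f"[{box}] {item}")

mark_communicated("Intro", "greeting")
render("Intro")
```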
(Display example of scoring information)

 FIG. 13 shows an example of the real-time display of scoring information in a medical interview.
 In FIG. 13, on the information processing apparatus 10 installed in space SP1, video 201 including the speaker UB is displayed on the display 121. Scoring information including evaluation axis scoring information 202, dialogue scene transition information 203, and dialogue communication item information 204 is superimposed on the video 201. The speaker UA (in the pharmacist role), who is the person being scored, can check the scoring information while conversing with the speaker UB (in the patient role) through the display 121.
 In the evaluation axis scoring information 202, the evaluation values (scores) for each of the seven evaluation axes, including accuracy, are represented by bar graphs. The dialogue scene transition information 203 shows the progress up to the explanation (Explanation), the dialogue scene in progress 6 minutes and 21 seconds after the start. That is, among the dialogue scenes, the introduction (Intro) and history taking (History Taking) have finished, and the dialogue has advanced to the explanation (Explanation). The overall flow of these dialogue scenes and the current progress are represented by the medical interview flow 205.
 The dialogue communication item information 204 shows the predetermined communication items for each of the introduction (Intro), history taking (History Taking), and explanation (Explanation) scenes, and a check mark is entered when an item has actually been communicated by the speaker UA. In this example, among the communication items of the introduction, "greeting", "self-introduction", and "confirmation of the reason for the visit" are checked. Among the communication items of the history taking, "confirmation of the chief complaint" is checked. Among the communication items of the explanation, "dosage instructions" is checked.
 In this way, on the information processing apparatus 10, intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue communication item information 204 is displayed on the display 121 in real time as scoring information, so the speaker UA, who is the person being scored, can converse with the speaker UB through the display 121 while checking the scoring information.
 This allows the speaker UA to reflect the content of the checked scoring information in the dialogue, for example by changing the dialogue strategy or rephrasing an utterance. In other words, since the scoring information is displayed only on the display of the speaker UA, the person being scored, the speaker UA can revise the dialogue approach and improve his or her dialogue skills mid-conversation without the speaker UB noticing.
 In addition, with the speaker UA in space SP1, where the information processing apparatus 10 is installed, playing the pharmacist and the speaker UB in space SP2, where the information processing apparatus 20 is installed, playing the patient, dialogue practice can be conducted between speakers in physically separate locations. Furthermore, because the information processing apparatuses 10 and 20, acting as telepresence devices, are interconnected via the network 50, the speakers UA and UB can converse through displays large enough to show a speaker's whole body, enabling realistic dialogue practice that comes closer to an in-person conversation.
(Example of a dialogue with overlap)

 In a dialogue between the speakers UA and UB, it is expected that one speaker's utterance will at times overlap the other's. FIG. 14 shows an example of an overlapping dialogue in a medical interview.
 In FIG. 14, when the dialogue between the speakers UA and UB is laid out with time running from left to right, the portions marked with black stars represent the overlaps.
 That is, while the speaker UA was uttering "What brings you here today?", the speaker UB began the utterance "Um, I'm here to buy allergy medicine", so the "Um" portion of the speaker UB's utterance overlaps the speaker UA's utterance. Likewise, while the speaker UA was uttering "Allergy medicine, I see", the speaker UB began the utterance "Oh, that might not be right", so the "Oh" portion of the speaker UB's utterance overlaps the speaker UA's utterance.
 Here, in the information processing apparatus 10, the dialogue text analyzed by the utterance content analysis unit 154 is as shown in FIG. 15. That is, the utterance content analysis unit 154 analyzes the dialogue with "What brings you here today?" as the speaker UA's utterance and "Um, I'm here to buy allergy medicine" as the speaker UB's utterance. Likewise, it analyzes the dialogue with "Allergy medicine, I see" as the speaker UA's utterance and "Oh, that might not be right" as the speaker UB's utterance.
 Since the speaker UA in space SP1, where the information processing apparatus 10 is installed, and the speaker UB in space SP2, where the information processing apparatus 20 is installed, are in different locations, their utterances are collected by different microphones and input on different channels. Therefore, even in an overlapping dialogue, each speaker's utterances can easily be extracted and transcribed.
 Thereafter, the dialogue between the speakers UA and UB continues, and as shown in FIG. 15, its text is analyzed by the utterance content analysis unit 154.
 For example, when the speaker UA's utterance "First, let me confirm your name. Are you ○○△△, the person named on the prescription?" is answered by the speaker UB's utterance "No, I am ○○△△'s mother", the text of the speaker UA's utterance is analyzed in two parts: "First, let me confirm your name." and "Are you ○○△△, the person named on the prescription?".
 Similarly, when the speaker UA's utterance "So you are ○○△△'s mother. Is your child currently taking any medication?" is answered by the speaker UB's utterance "She has a food allergy, so she takes Intal before meals", the analysis proceeds as follows. The text of the speaker UA's utterance is analyzed in two parts: "So you are ○○△△'s mother." and "Is your child currently taking any medication?". The text of the speaker UB's utterance is analyzed in two parts: "She has a food allergy" and "she takes Intal before meals".
 In this way, in the information processing apparatus 10, the utterance text input on a separate channel for each speaker is analyzed in predetermined processing units such as single sentences, which suppresses the effect that the utterance overlaps frequently occurring in dialogue would otherwise have on the speech recognition results and the utterance content analysis results. This makes it possible to extract the words and actions of the speaker UA, the person being scored, more accurately and to score more accurately.
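 The following sketch illustrates why per-speaker channels make overlap harmless: each channel is transcribed and sentence-split independently, and the resulting sentence units are then merged in time order for analysis. The data shapes and timestamps are illustrative assumptions.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split a transcript into sentence-sized processing units.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

# (start_time_sec, transcribed text) per channel, even if the audio overlapped.
channel_ua = [(0.0, "First, let me confirm your name. Are you the person on the prescription?")]
channel_ub = [(3.2, "No, I am her mother.")]

def merge_channels(ua, ub):
    units = [(t, "UA", s) for t, text in ua for s in split_sentences(text)]
    units += [(t, "UB", s) for t, text in ub for s in split_sentences(text)]
    return sorted(units)   # time-ordered sentence units for the analysis stage

for t, spk, sentence in merge_channels(channel_ua, channel_ub):
    print(f"{t:5.1f}s {spk}: {sentence}")
```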
 As described above, in the information processing apparatus 10, the dialogue between the speaker UA (in the pharmacist role) and the speaker UB (in the patient role) is scored based on the reference information stored in the databases as the standard for dialogue scoring, and scoring information on the dialogue (intermediate results, scoring results, and the like) is presented in real time to the speaker UA, the person being scored. This makes it possible to support the person being scored with appropriate feedback, through scoring and the like, in improving interpersonal communication skills in settings such as medical interviews. The scoring information presented in real time is presented in a natural form that the speaker UA, the person being scored, can check but the speaker UB, the dialogue partner, cannot see.
 Also, in the information processing apparatus 10, reference information such as the point-addition target language information and the point-addition target image information matching the usage scene, such as an interview dialogue, is set in the databases in advance and used when scoring the dialogue, so that scoring can absorb the evaluation variability between human raters. Furthermore, since the remote communication technology of the information processing apparatuses 10 and 20, configured as telepresence devices, allows the speakers UA and UB in different spaces to converse, the speaker UA, the person being scored, can experience the dialogue as if the speaker UB, the dialogue partner, were present in the same room.
<2. Second Embodiment>
(Device configuration)

 FIG. 16 shows another configuration example of the information processing apparatus 10 of FIG. 1.
 In FIG. 16, parts of the information processing apparatus 10A corresponding to those of the information processing apparatus 10 of FIG. 2 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10 of FIG. 2, the information processing apparatus 10A of FIG. 16 is provided with an input unit 106A in place of the input unit 106.
 The input unit 106A has an operation unit 111, a camera 112, a microphone 113, and a sensor 114. The sensor 114 senses spatial information, temporal information, and the like, and outputs the sensing information obtained as a result. The sensor 114 includes various sensors such as a distance measuring sensor and an image sensor. The camera 112 can be included in the sensor 114 as an image sensor.
 In the information processing system 1 of FIG. 1, the information processing apparatus 20 can likewise be configured with a sensor 114, in the same manner as the information processing apparatus 10A shown in FIG. 16.
(Functional configuration)

 FIG. 17 shows an example of the functional configuration of the information processing apparatus 10A of FIG. 16.
 In FIG. 17, parts of the information processing apparatus 10A corresponding to those of the information processing apparatus 10 of FIG. 3 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10 of FIG. 3, the information processing apparatus 10A of FIG. 17 is provided with a sensing information input unit 171, a sensing information recognition unit 172, a sensing information analysis unit 173, and a point-addition target sensing information DB 174 in place of the image input unit 157, the image recognition unit 158, the image analysis unit 159, and the point-addition target image information DB 160.
 In FIG. 17, the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, and the point-addition target sensing information DB 174 constitute an analysis processing unit 191A.
 The sensing information input unit 171 inputs sensing information (the sensing information obtained on the speaker UA's side) to the sensing information recognition unit 172. The sensing information recognition unit 172 performs sensing information recognition processing using the sensing information from the sensing information input unit 171. In this recognition processing, the sensing information to be processed is recognized, and the recognition result is supplied to the sensing information analysis unit 173. This sensing information may include, for example, distance information and image information as well as biological information such as the speaker UA's heart rate and brain waves.
 The sensing information analysis unit 173 performs sensing information analysis processing using the recognition result from the sensing information recognition unit 172 and the point-addition target sensing information stored in the point-addition target sensing information DB 174. The point-addition target sensing information is information for extracting (identifying) the sensing information eligible for points when scoring interpersonal communication skills. In the sensing information analysis processing, the point-eligible sensing information is extracted from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161. This analysis also uses the time information from the time acquisition unit 156 to link the sensing information analysis result to the utterance content analysis result.
 The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis result from the sensing information analysis unit 173, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-addition target integration unit 161 performs integration processing that merges the utterance content analysis result and the sensing information analysis result linked by the time information.
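 A rough sketch of the time-based linking performed before integration follows: each sensing analysis result is attached to the utterance whose time span contains it. The record shapes and timestamps are assumptions used only to illustrate the idea.

```python
utterance_results = [
    {"start": 0.0, "end": 2.5, "item": "greeting"},
    {"start": 3.0, "end": 8.0, "item": "dosage instructions"},
]
sensing_results = [
    {"time": 1.2, "observation": "smile"},
    {"time": 5.4, "observation": "showing material"},
]

def link_by_time(utterances, sensings):
    # Attach each sensing observation to the utterance whose span contains it.
    for u in utterances:
        u["sensing"] = [s for s in sensings if u["start"] <= s["time"] <= u["end"]]
    return utterances

for u in link_by_time(utterance_results, sensing_results):
    print(u["item"], "->", [s["observation"] for s in u["sensing"]])
```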
 FIG. 17 shows the configuration when the sensing information obtained on the side of the speaker UA in space SP1 is used, but the sensing information obtained on the side of the speaker UB in space SP2 may be used instead. FIG. 18 shows another example of the functional configuration of the information processing apparatus 10A of FIG. 16.
 In FIG. 18, parts of the information processing apparatus 10A corresponding to those of FIG. 17 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10A of FIG. 17, the information processing apparatus 10A of FIG. 18 is additionally provided with a sensing information input unit 181, a sensing information recognition unit 182, a sensing information analysis unit 183, and a point-addition target sensing information DB 184.
 In FIG. 18, the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, the point-addition target sensing information DB 174, the sensing information analysis unit 183, and the point-addition target sensing information DB 184 constitute an analysis processing unit 191B.
 The sensing information input unit 181 inputs the sensing information obtained on the speaker UB's side to the sensing information recognition unit 182. The sensing information recognition unit 182 performs sensing information recognition processing using the sensing information from the sensing information input unit 181. In this recognition processing, the sensing information to be processed is recognized, and the recognition result is supplied to the sensing information analysis unit 183. This sensing information may include, for example, distance information and image information as well as biological information such as the speaker UB's heart rate and brain waves.
 The sensing information analysis unit 183 performs sensing information analysis processing using the recognition result from the sensing information recognition unit 182 and the point-addition target sensing information stored in the point-addition target sensing information DB 184. The point-addition target sensing information is information for extracting (identifying) the sensing information eligible for points when scoring interpersonal communication skills. In the sensing information analysis processing, the point-eligible sensing information is extracted from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161.
 The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis results from the sensing information analysis units 173 and 183, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-addition target integration unit 161 performs integration processing that merges the utterance content analysis result and the sensing information analysis results linked by the time information.
 The sensing information processed by the information processing apparatus 10A may also be acquired from an electronic device carried by the speaker UA or UB, such as a smartphone, a wearable terminal, or a measuring instrument. Such an electronic device can have various sensors: an acceleration sensor that measures acceleration along the three XYZ axes, a gyro sensor that measures angular velocity about the three XYZ axes, a distance measuring sensor that measures distance, a biological sensor that measures information such as heart rate, body temperature, and posture, a proximity sensor that detects nearby objects, and a magnetic sensor that measures the magnitude and direction of a magnetic field.
(Display example of dialogue partner evaluation information)

 FIG. 19 shows a display example of dialogue partner evaluation information in a medical interview.
 In FIG. 19, the dialogue partner evaluation information represents an evaluation calculated from the sensing information on the dialogue partner's side in the medical interview. In the example of FIG. 19, when the speaker UA converses with the speaker UB during the medical interview, comprehension, empathy, interest, favorability, and trust are derived as evaluation axes for the speaker UA based on the sensing information obtained on the speaker UB's side. For example, if the sensor on the speaker UB's side detects that the speaker UB's pupils dilated at the moment the speaker UA spoke, the favorability score is increased; in this way, evaluation information calculated from the dialogue partner's biological information, responses, and the like can be presented.
 In FIG. 19, the dialogue partner evaluation information is represented, for each of the five evaluation axes, by a bar graph whose evaluation value (score) extends horizontally. Each bar extends to the right or left from an origin of 0: larger positive values to the right represent a positive (high) evaluation, while values extending to the left represent a negative (low) evaluation.
 Here, suppose the evaluations shown in FIGS. 19A, 19B, and 19C are displayed in that chronological order, that is, FIG. 19A represents the evaluation in the first part of the dialogue, FIG. 19B the middle, and FIG. 19C the latter part. A dialogue such as the following can then be assumed.
 In the first part of the dialogue, as represented by the evaluation in FIG. 19A, the speaker UB rates the speaker UA somewhat positively, but the values themselves are not large. In the middle of the dialogue, as represented by the evaluation in FIG. 19B, all evaluations except comprehension have become low, based on information such as the speaker UB wearing a gloomy expression. Later, in the latter part of the dialogue, as represented by the evaluation in FIG. 19C, all of comprehension, empathy, interest, favorability, and trust have become high, based on information such as the speaker UB nodding or wearing a bright expression.
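 The following is an illustrative sketch of how partner-side sensing events might move the five evaluation axes of FIG. 19. The event-to-axis mapping and the weights are assumptions for illustration only, not values specified by the present technology.

```python
PARTNER_AXES = {"comprehension": 0.0, "empathy": 0.0, "interest": 0.0,
                "favorability": 0.0, "trust": 0.0}

# Hypothetical mapping from detected partner-side events to axis deltas.
EVENT_EFFECTS = {
    "pupil_dilation":    {"favorability": +0.5},
    "nod":               {"comprehension": +0.5, "trust": +0.3},
    "bright_expression": {"empathy": +0.4, "favorability": +0.4},
    "gloomy_expression": {"empathy": -0.4, "interest": -0.3},
}

def apply_event(axes: dict, event: str) -> dict:
    for axis, delta in EVENT_EFFECTS.get(event, {}).items():
        axes[axis] += delta          # negative totals render as leftward bars
    return axes

for e in ["pupil_dilation", "gloomy_expression", "nod", "bright_expression"]:
    apply_event(PARTNER_AXES, e)
print(PARTNER_AXES)
```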
(Display example of scoring information)

 FIG. 20 shows an example of the real-time display of scoring information in a medical interview.
 In FIG. 20, as in FIG. 13, the speaker UA can check the scoring information while conversing with the speaker UB through the display 121, but the scoring information additionally includes dialogue partner evaluation information 206.
 In the dialogue partner evaluation information 206, the evaluation values (scores) calculated from the sensing information on the speaker UB's side are represented by bar graphs. In this example, the speaker UB's comprehension, empathy, favorability, and trust toward the speaker UA are rated high, but interest is rated low.
 The dialogue scene transition information 203 shows that, among the dialogue scenes, the introduction (Intro) and history taking (History Taking) have finished and the dialogue has advanced to the explanation (Explanation); in correspondence with the progress bar, measurement values obtained from the sensing information (facial, head, body) are also shown. For example, the speaker UB's state can be inferred from these measurement values for the face, head, and body.
 In this way, on the information processing apparatus 10A, intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206 is displayed in real time as scoring information, so the speaker UA, the person being scored, can converse with the speaker UB through the display 121 while checking the scoring information and taking into account how the speaker UB is evaluating the conversation. The evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206 are examples of intermediate information; it suffices for at least one of them to be included in the intermediate information.
(Example of screen display transitions)

 FIGS. 21 and 22 show an example of transitions of the input information and the scoring information, displayed 21 seconds and 6 minutes 21 seconds after the start of the medical interview, respectively. The input information and the scoring information are superimposed on the video including the speaker UB.
 In FIGS. 21 and 22, the display 121 of the information processing apparatus 10A shows scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206.
 In FIG. 21, the dialogue scene transition information 203 shows the introduction (Intro), the dialogue scene 21 seconds after the start, and a progress bar representing its progress. The input information 211 shows, in a speech balloon, that in the introduction scene the speaker UA uttered "Nice to meet you" and that his or her expression at the time was a smile accompanied by a bow.
 The dialogue communication item information 204 shows a checklist in which, among the communication items of the introduction (Intro), "greeting" is checked. The dialogue partner evaluation information 206 shows the evaluation values calculated from the sensing information on the speaker UB's side. In this example, the bar graphs show that the speaker UB, having been greeted by the speaker UA, has increased empathy, favorability, and trust toward the speaker UA.
 In the evaluation axis scoring information 202, the evaluation values for each of the seven evaluation axes are represented by bar graphs. In this example, as shown in the dialogue communication item information 204, the speaker UA has completed the greeting, a communication item of the introduction (Intro), so the evaluation value (bar graph value) of the communicated items axis in the evaluation axis scoring information 202 has increased accordingly (+1). Also, as shown in the dialogue partner evaluation information 206, the speaker UB feels empathy, favorability, and trust toward the speaker UA, so the evaluation value of the attunement axis in the evaluation axis scoring information 202 has increased accordingly (+0.5). In FIG. 21, the factors behind the increases in the evaluation values are indicated by arrows for ease of explanation, but in practice these arrows are not displayed.
 Afterwards (6 minutes later), in FIG. 22, the dialogue scene transition information 203 shows the explanation (Explanation), the dialogue scene 6 minutes and 21 seconds after the start, and a progress bar representing its progress. The input information 211 shows, in a speech balloon, that in the explanation scene the speaker UA uttered "Please take two tablets with water or lukewarm water after each meal." and that his or her expression at the time was a smile accompanied by a hand gesture.
 The dialogue communication item information 204 shows a checklist in which, among the communication items of the explanation (Explanation), the dosage instructions are checked. The dialogue partner evaluation information 206 shows, as evaluation values calculated from the sensing information on the speaker UB's side, bar graphs in which the speaker UB's comprehension, trust, and the like are rated high after receiving the explanation of the dosage instructions.
Also, as shown in the dialogue transmission item information 204, the speaker UA has already explained the medication method, one of the transmission items of the explanation (Explanation), so the evaluation value of the transmission-item axis in the evaluation axis scoring information 202 increases accordingly (+1). Likewise, as shown in the dialogue partner evaluation information 206, the speaker UB shows understanding of and trust in the speaker UA, so the evaluation value of the accuracy axis in the evaluation axis scoring information 202 increases accordingly (+1). In FIG. 22, the causes of these increases are again indicated by arrows, but in practice the arrows are not displayed.
The evaluation axis scoring information 202 is intermediate information (an interim result) presented during the dialogue between the speaker UA and the speaker UB, but it is also the information presented as the final scoring result after the dialogue ends; it can thus be said to be linked to intermediate information such as the dialogue transmission item information 204 and the dialogue partner evaluation information 206.
In this way, when the speaker UA and the speaker UB converse through their displays, the information processing device 10A scores the dialogue by combining language analysis with facial expressions, body movements, how the other party feels, and so on, and updates the screen in real time according to the input information and the scoring information. As a result, the speaker UA can check the scoring results during the dialogue without the speaker UB noticing, and can reflect what was confirmed in the rest of the dialogue.
Next, with reference to FIGS. 23 to 26, specific examples will be described in which, during a medical interview between the speaker UA acting as a pharmacist and the speaker UB acting as a patient, the speaker UA checks the scoring information displayed on the display in real time and decides on the subsequent dialogue policy.
(First example)
FIG. 23 shows an example of feedback in a medical interview when the speaker UA notices an omission among the transmission items and changes the dialogue strategy. In FIG. 23, the passage of time is represented by the arrow running from the top to the bottom of the figure. The screens to the right of the arrow are the screens displayed on the display 121 of the information processing device 10A on the speaker UA side (scoring information superimposed on the video of the speaker UB). The picture of a bell in the figure represents a sound output from the speaker 122. The meanings of the arrow and the screens are the same in FIGS. 25 and 26, described later.
In FIG. 23, at time t11, immediately after the start of the medical interview, the speaker UA utters, "Hello, what brings you in today?" By scoring the language analysis of this utterance together with facial expressions, body movements, how the other party feels, and so on, the following scoring information is superimposed on the video of the speaker UB on the display 121: a checklist in which "greeting" and "confirmation of the reason for the visit" are checked among the transmission items of the introduction (Intro), and a bar graph showing that the speaker UB feels empathy, interest, and trust.
At time t12, the speaker UB utters, "Please give me my blood-pressure medicine. Here is the prescription." In response, at time t13, the speaker UA utters, "This is your first time taking this medicine, isn't it? May I take about five minutes to explain it?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated.
At time t14, the speaker UB utters, "No, the doctor explained it thoroughly, so I'm fine without an explanation." At this point, the speaker UA checks the on-screen checklist and realizes that the self-introduction was missed; since the bar graph also shows that trust has not risen, the speaker UA changes the dialogue strategy, deciding to self-introduce along with the role of the pharmacist.
At time t15, the speaker UA utters, "I should have said so earlier: I am △△, a pharmacist at the ○○ pharmacy. Would you like to check the drug interactions together with me?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated: among the transmission items of the introduction (Intro), "self-introduction" is now checked in addition to "greeting" and "confirmation of the reason for the visit", and the bar graph shows that the trust of the speaker UB has increased.
At this time, by outputting a predetermined sound from the speaker 122, the information processing device 10A can notify the speaker UA that points were added because the speaker UA noticed the omitted transmission item and changed the dialogue strategy. This notification of added points is not limited to sound; for example, information indicating the added points may be displayed on the screen, or a tactile presentation may be made by vibration. However, to signal that it was the change in dialogue strategy that led to the added points, the feedback is given in a manner different from an ordinary added-point notification, so that it differs in some way from the ordinary notification.
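One way to realize this differentiated notification is sketched below (hypothetical Python; the channel objects and the distinct "recovery" cue are assumptions, since the text only requires that this feedback differ from an ordinary added-point notification):

```python
# Sketch of added-point notification with a distinct cue for points
# earned by a strategy change (recovery). Channel objects are assumed
# to expose play/show_badge/pulse; any of them may be absent (None).

def notify_points(added: float, recovered: bool,
                  sound=None, screen=None, haptics=None) -> None:
    if recovered:
        # Recovery bonus: use a different sound, badge, and vibration
        # pattern so it is distinguishable from the ordinary cue.
        if sound:
            sound.play("recovery_chime.wav")
        if screen:
            screen.show_badge(f"+{added} (recovered)")
        if haptics:
            haptics.pulse(pattern="double")
    else:
        if sound:
            sound.play("point_chime.wav")
        if screen:
            screen.show_badge(f"+{added}")
```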
FIG. 24 is a flowchart illustrating the flow of the feedback processing.
When an utterance of the speaker UA is accepted (S101), it is determined whether the utterance contains an end sign (S102). If it is determined that an end sign is contained (Yes in S102), the processing ends. If it is determined that no end sign is contained (No in S102), it is determined whether there is a problem in the dialogue (S103).
If it is determined that there is no problem in the dialogue (No in S103), information indicating added points is displayed on the screen (S104). If it is determined that there is a problem in the dialogue (Yes in S103), it is determined whether the dialogue is at a recoverable position (S105). If it is determined that the position is not recoverable (No in S105), information indicating lost points is displayed on the screen (S106).
If it is determined that the position is recoverable (Yes in S105), an utterance of the speaker UA is accepted (S107). When the utterance of the speaker UA (an utterance related to recovery) is accepted, it is determined whether recovery has been achieved by that utterance (S108).
If it is determined that recovery has not been achieved (No in S108), the processing returns to step S105 and the subsequent steps are repeated. That is, as long as the position is recoverable, an utterance of the speaker UA is accepted again and it is determined whether recovery has been achieved; if recovery is no longer possible, information indicating lost points is displayed on the screen.
If it is determined that recovery has been achieved (Yes in S108), the added points are announced by sound (S109) and information indicating the added points is displayed on the screen (S110). The notification of added points need only be performed by at least one of sound output and screen display.
When step S104, S106, or S110 ends, the processing returns to step S101 and the above processing is repeated.
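Read as pseudocode, the flowchart maps onto a loop like the following sketch (hypothetical Python; the predicate functions stand in for the judgments S102, S103, S105, and S108, whose concrete criteria the text describes only by example):

```python
# Sketch of the FIG. 24 feedback loop (steps S101 to S110).
# The predicates are placeholders for the judgments described in the
# text; their real criteria would come from the analysis and scoring
# units and the registered databases.

def feedback_loop(accept_utterance, has_end_sign, has_problem,
                  is_recoverable, did_recover, show, play_sound):
    while True:
        utt = accept_utterance()                  # S101
        if has_end_sign(utt):                     # S102
            return                                # end of processing
        if not has_problem(utt):                  # S103
            show("added points")                  # S104
            continue                              # back to S101
        while True:
            if not is_recoverable():              # S105
                show("lost points")               # S106
                break                             # back to S101
            recovery_utt = accept_utterance()     # S107
            if did_recover(recovery_utt):         # S108
                play_sound("added points")        # S109
                show("added points")              # S110
                break                             # back to S101
```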
Applying the first example shown in FIG. 23 to the feedback processing shown in FIG. 24 gives the following. In the first example, the speaker UA acting as a pharmacist is speaking, but "self-introduction" and "confirmation of the name" are missing among the transmission items of the introduction (Intro), so it is determined that there is a problem in the dialogue (Yes in S103); since the dialogue scene has not yet advanced to the explanation (Explanation), the position is determined to be recoverable (Yes in S105). Then, because the speaker UA, having checked the screen, changed the dialogue strategy and made an utterance concerning self-introduction, it is determined that recovery has been achieved (Yes in S108), and the added points are announced by sound output or screen display (S109, S110).
(Second example)
FIG. 25 shows an example of feedback when there is no change of dialogue strategy in response to an omitted transmission item in a medical interview.
In FIG. 25, from time t21 to time t24, a dialogue between the speaker UA and the speaker UB proceeds as at time t11 to time t14 in FIG. 23, and scoring the language analysis of the utterances together with the evaluation of the dialogue partner causes a screen including the checklist, bar graph, and so on to be displayed.
At time t25, the speaker UA checks the on-screen checklist and recognizes that the self-introduction was missed, but judging that little time remains given the time limit of the medical interview (for example, 10 minutes), decides it is better to proceed directly to the history taking. The speaker UA therefore utters, "I would like to make sure there is no mistake with the medicine, so may I ask about your symptoms?"
By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated. That is, the dialogue scenes now include the introduction (Intro) and the history taking (History Taking); among the transmission items of the introduction, "greeting" and "confirmation of the reason for the visit" are checked, and among the transmission items of the history taking, "confirmation of the chief complaint" is checked, while the bar graph shows no particular change in score. In other words, as shown on this screen, the dialogue scene has transitioned from the introduction to the history taking, so at this point uttering a transmission item of the introduction no longer earns points.
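A minimal sketch of this scene gating (hypothetical Python; the scene and item names follow the example above) might look like:

```python
# Sketch of scene gating: a transmission item earns points only while
# its dialogue scene is the current one. Names follow the example above.

SCENE_ORDER = ["Intro", "History Taking", "Explanation"]
ITEM_SCENE = {"greeting": "Intro",
              "self-introduction": "Intro",
              "chief complaint": "History Taking"}

def points_for(item: str, current_scene: str) -> int:
    """Return 1 only if the item belongs to the current scene."""
    item_scene = ITEM_SCENE[item]
    if SCENE_ORDER.index(item_scene) < SCENE_ORDER.index(current_scene):
        return 0  # the item's scene has already passed: no points
    return 1 if item_scene == current_scene else 0

print(points_for("self-introduction", "History Taking"))  # 0
print(points_for("chief complaint", "History Taking"))    # 1
```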
Applying the second example shown in FIG. 25 to the feedback processing shown in FIG. 24 gives the following. In the second example, the speaker UA acting as a pharmacist is speaking, but "self-introduction" and "confirmation of the name" are missing among the transmission items of the introduction (Intro), so it is determined that there is a problem in the dialogue (Yes in S103). However, since the dialogue scene has advanced to the history taking (History Taking), the position is determined not to be recoverable (No in S105), and information indicating lost points is displayed on the screen (S106).
(Third example)
FIG. 26 shows an example of feedback when the understanding of the speaker UB acting as a patient changes because the speaker UA acting as a pharmacist rephrases an utterance in a medical interview.
In FIG. 26, at time t31, a predetermined time after the start of the medical interview, the speaker UA utters, "Under what circumstances do your symptoms go into remission?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the video of the speaker UB on the display 121: a checklist in which "greeting" and "confirmation of the reason for the visit" are checked among the transmission items of the introduction (Intro), and a bar graph in which interest, likability, trust, and so on are rated highly.
With this update of the scoring information, the evaluation of understanding drops sharply compared with the scoring information displayed before the update. The screen also shows measured values (facial, head, body) indicating that the speaker UB is tilting their head with a puzzled expression.
In response, at time t32, the speaker UB utters, "Around lunchtime, you mean?" At this point, the speaker UA checks the bar graph and the measured values on the screen and, given the sudden drop in understanding and the facial expression, judges that the word "remission" may not be getting through and that it would be better to rephrase it in other words.
At time t33, the speaker UA utters, "I see. Then could you tell me when your symptoms ease?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the display 121.
That is, scoring information including a bar graph in which the evaluation of the understanding of the speaker UB has risen (from 10 to 80) is displayed. At this time, the similarity (for example, a similarity of 0.8) between the rephrased passages (for example, the passage containing "remission" and the passage containing "ease") can be calculated and factored into the evaluation. The scoring information also shows measured values indicating that the speaker UB no longer has a particularly puzzled expression.
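As a rough illustration, a surface-level similarity between the original and rephrased passages can be computed as in the sketch below (hypothetical Python using difflib from the standard library; the patent does not specify how the 0.8 similarity is obtained, and an embedding-based measure could equally be substituted):

```python
# Sketch of scoring a rephrasing: compute a similarity between the
# original passage and its paraphrase and weight the added points by it.
# SequenceMatcher gives only a surface (character-level) similarity;
# the 0.8 figure in the text is illustrative, not a specification.

from difflib import SequenceMatcher

def paraphrase_bonus(original: str, paraphrase: str,
                     base_points: float = 1.0) -> float:
    similarity = SequenceMatcher(None, original, paraphrase).ratio()
    return base_points * similarity

bonus = paraphrase_bonus(
    "Under what circumstances do your symptoms go into remission?",
    "Could you tell me when your symptoms ease?")
print(round(bonus, 2))
```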
Also, at this time, by outputting a predetermined sound from the speaker 122, the information processing device 10A can notify the speaker UA that points were added because the understanding of the speaker UB acting as a patient changed as a result of the rephrasing by the speaker UA acting as a pharmacist. After that, at time t34, the speaker UB understands the intention of the speaker UA's question and utters, "Ah, I see. They settle down when I warm my stomach."
Applying the third example shown in FIG. 26 to the feedback processing shown in FIG. 24 gives the following. In the third example, the speaker UA acting as a pharmacist makes an utterance (one containing "remission"), but after that utterance the understanding of the speaker UB acting as a patient drops markedly and the non-verbal information shows negative values, so it is determined that there is a problem in the dialogue (Yes in S103).
Also, since the dialogue scene is the introduction (Intro) and there has been only one speaker change, the position is determined to be recoverable (Yes in S105). After the speaker UA repeats an utterance with the same content (an utterance rephrasing "remission"), an improvement is seen in the understanding of the speaker UB, so it is determined that recovery has been achieved (Yes in S108), and the added points are announced by sound output or screen display (S109, S110).
As for professionals in the medical field, doctors, nurses, pharmacists, and others take a test on interpersonal communication as part of the Objective Structured Clinical Examination (OSCE); the evaluation of interpersonal communication by the information processing system 1 can be used, for example, in practicing for such a test.
<3. Modification example>
(Example of apparel customer service)
In the above description, use in medical interviews was described as the assumed usage scene for evaluating interpersonal communication with the information processing system 1, but the system may also be used to evaluate interpersonal communication in call centers, sales, retail, and so on. The following describes a case in which the information processing system 1 to which the present technology is applied is used to evaluate interpersonal communication in apparel sales.
FIG. 27 shows a display example of real-time scores when the system is used for apparel customer service. In this case, the speaker UA is an apparel salesperson and the speaker UB is a customer.
In FIG. 27, a video 221 including the speaker UB is displayed on the display 121 of the information processing device 10A on the speaker UA side. Superimposed on the video 221 is scoring information including evaluation axis scoring information 222, dialogue scene transition information 223, dialogue transmission item information 224, and dialogue partner evaluation information 226.
In the evaluation axis scoring information 222, the evaluation value (score) for each of the evaluation axes of listening ability, accuracy, disclosure, diffusivity, and proposal ability is represented by a bar graph. The dialogue scene transition information 223 shows the progress up to the product proposal (Recommendation), which is the dialogue scene 6 minutes 21 seconds after the start. That is, as dialogue scenes, the small talk (Small talk) and the needs exploration (Needs exploration) have finished and the dialogue has progressed to the product proposal (Recommendation). The overall flow of these dialogue scenes and the current progress are represented by the customer-service flow 225.
In the dialogue transmission item information 224, predetermined transmission items are shown for each of the dialogue scenes of small talk (Small talk), needs exploration (Needs exploration), and product proposal (Recommendation), and a check mark is entered when the speaker UA has actually conveyed the item. In this example, among the transmission items of the small talk, "approach", "seasonal topics", and "introduction of new items" are checked. Among the transmission items of the needs exploration, "item color", "item shape", "item material", and "wearing scene" are checked. Among the transmission items of the product proposal, "item introduction" and "reference to trends" are checked.
In the dialogue partner evaluation information 226, the evaluation value (score) calculated from the sensing information on the speaker UB side is represented by a bar graph. In this example, the speaker UB shows relatively high understanding, empathy, likability, and trust toward the speaker UA, but low interest.
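The per-scene checklists above suggest a simple data structure, sketched here (hypothetical Python; the scene and item names are taken from the FIG. 27 example, and "fitting suggestion" is an assumed extra item for illustration):

```python
# Sketch of per-scene transmission-item checklists for apparel customer
# service, with a check-off helper. Names follow the FIG. 27 example.

checklists = {
    "Small talk": {"approach": True, "seasonal topics": True,
                   "introduction of new items": True},
    "Needs exploration": {"item color": True, "item shape": True,
                          "item material": True, "wearing scene": True},
    "Recommendation": {"item introduction": True,
                       "reference to trends": True,
                       "fitting suggestion": False},  # assumed extra item
}

def check_off(scene: str, item: str) -> None:
    """Mark a transmission item as conveyed by the salesperson."""
    if scene in checklists and item in checklists[scene]:
        checklists[scene][item] = True

def scene_progress(scene: str) -> float:
    """Fraction of a scene's transmission items conveyed so far."""
    items = checklists[scene]
    return sum(items.values()) / len(items)

print(scene_progress("Recommendation"))          # about 0.67
check_off("Recommendation", "fitting suggestion")
print(scene_progress("Recommendation"))          # 1.0
```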
When used for apparel customer service, the information processing devices 10 and 10A have the configuration shown in FIG. 3, FIG. 17, or FIG. 18, as when used for medical interviews, but the information stored in each database needs to be changed to information for apparel customer service. That is, information for apparel customer service, rather than information for medical interviews, should be registered in the point-addition target language information DB 155, the point-addition target image information DB 160, the question information DB 162, and the point-addition target sensing information DBs 174 and 184.
(Other configurations of the system)
In the above description, the information processing devices 10 and 10A in the information processing system 1 were described as having the configuration shown in FIG. 3, FIG. 17, or FIG. 18; however, some functions of that configuration may instead be provided by the server 30 connected to the network 50.
FIG. 28 shows another configuration example of an embodiment of an information processing system to which the present technology is applied.
In FIG. 28, the information processing system 1A is configured by connecting the information processing device 10, the information processing device 20, and the server 30 to one another via the network 50.
For example, among the components shown in FIG. 3, the analysis processing unit 191 and the scoring processing unit 192 are provided in the server 30, while the voice input unit 151, the voice recognition unit 152, the image input unit 157, the image recognition unit 158, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the information processing device 10.
In this case, the information processing device 10 transmits data including the results of voice recognition and image recognition to the server 30 via the network 50. The server 30 performs the analysis processing and the scoring processing using the data transmitted from the information processing device 10, and transmits data including the processing results to the information processing device 10 via the network 50. Based on the data transmitted from the server 30, the information processing device 10 displays information or outputs sound.
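The division of labor described here amounts to a simple request/response exchange, sketched below (hypothetical Python; the JSON payload shapes and the stand-in scoring rule are assumptions, as the patent does not define a wire format):

```python
# Sketch of the device/server split in FIG. 28: the device sends
# recognition results, the server returns scoring results.

import json

def build_request(speech_text: str, image_labels: list) -> str:
    """Device side: package voice/image recognition results for server 30."""
    return json.dumps({"speech": speech_text, "image": image_labels})

def handle_request(payload: str) -> str:
    """Server side: run analysis and scoring, return scoring information."""
    data = json.loads(payload)
    # Stand-in for the analysis unit 191 and scoring unit 192.
    score = 1.0 if "nice to meet you" in data["speech"].lower() else 0.0
    return json.dumps({"axis": "transmission", "delta": score})

response = handle_request(build_request("Nice to meet you", ["smile", "bow"]))
print(response)  # the device renders this on the display or as sound
```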
Other configurations may also be used; for example, the voice recognition unit 152 and the image recognition unit 158 may be provided on the server 30 side rather than in the information processing device 10. That is, the present technology can adopt a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.
The information processing device 10 may also be composed of a processing device such as a home server and an input/output device such as a display device. In this case, the processing device and the input/output device are provided in the same space (the same room, the same building, and so on). That is, among the components shown in FIG. 3, the analysis processing unit 191, the scoring processing unit 192, the voice recognition unit 152, and the image recognition unit 158 are provided in the processing device, while the voice input unit 151, the image input unit 157, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the input/output device.
In the above description, the information processing device 10 was shown configured as a telepresence device such as a display device, but the information processing device 10 may be an electronic device such as a PC (Personal Computer). For example, if the information processing devices 10 and 20 are PCs, a speaker UA and a speaker UB in remote locations can converse through their displays by using an application such as a video call application.
Although FIG. 28 was described with respect to the information processing device 10, some functions of the information processing device 10A can likewise be processed by the server 30.
As described above, in the present technology, the information processing device 10 or 10A scores a dialogue between the speaker UA in the space SP1 (for example, a speaker acting as a pharmacist or an apparel salesperson) and the speaker UB in the space SP2 (for example, a speaker acting as a patient or a customer), based on reference information stored in a database as the standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the speaker UA in real time. This allows the speaker UA to check the scoring information in real time while conversing with the speaker UB. Therefore, to improve skills in interpersonal communication such as medical interviews and apparel customer service, appropriate feedback can be given through scoring and the like to support the speaker UA, the person being scored.
At present, when interpersonal communication requiring dialogue skill is needed, there is a demand to evaluate that skill and use the evaluation for improvement, but it has been difficult to evaluate the skill objectively and give feedback. By contrast, the present technology can present intermediate information such as interim results in real time to the speaker UA, the person being scored, so the skill of the person being scored can be evaluated objectively, fed back, and used to improve that skill.
Also, to improve interpersonal communication skills, it is ideal to run simulations under conditions as close as possible to the real thing, but at present such simulations are costly to conduct, owing to the need to train dialogue practice partners, evaluation fluctuations among graders, geographical constraints, and so on. By contrast, with the present technology, a speaker UA and a speaker UB in different spaces can converse using the information processing devices 10 and 20 configured as telepresence devices; there are no geographical constraints and participation in the dialogue is easier, so practicing dialogue and training practice partners both become easy. Moreover, since the information processing device 10 scores against the reference information stored in a database as the standard for dialogue scoring, phenomena such as evaluation fluctuations among graders do not occur. As a result, by using the present technology, a simulation under conditions closer to the real thing can be realized more easily than at present.
Furthermore, from the viewpoint of evaluating interpersonal communication, text-based counseling evaluation and the like have been attempted, but no system exists that, like the present technology, analyzes and scores the utterances of multiple speakers in real time.
The program executed by the information processing devices 10 and 10A (the CPU 101) can be provided recorded on a removable recording medium such as packaged media. Removable recording media include magnetic disks, optical discs, magneto-optical disks, and semiconductor memories. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
In this specification, the processing performed by the information processing devices 10 and 10A (the CPU 101) according to the program does not necessarily have to be performed chronologically in the order described in the above flowchart. That is, the processing performed by the information processing devices 10 and 10A (the CPU 101) according to the program also includes processing executed in parallel or individually (for example, parallel processing or processing by objects).
The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. For example, each step of the above flowchart can be executed by one device or shared among a plurality of devices. Furthermore, when one step includes a plurality of processes, those processes can be executed by one device or shared among a plurality of devices. The program may also be transferred to and executed on a remote computer.
In this specification, a system means a set of a plurality of components (devices, modules (parts), and so on), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
The embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology. The effects described in this specification are merely examples and are not limiting; there may be other effects.
The present technology can also take the following configurations.
(1)
An information processing device including a processing unit that scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
(2)
The information processing device according to (1), in which the scoring information includes a scoring result of the dialogue or intermediate information from partway through the dialogue.
(3)
The information processing device according to (2), in which the processing unit presents, during the dialogue between the first speaker and the second speaker, the scoring information including the intermediate information at that point in time.
(4)
The information processing device according to (3), in which the intermediate information includes at least one of evaluation axis scoring information representing an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information representing transitions between dialogue scenes, dialogue transmission item information representing the degree of achievement of transmission items for each dialogue scene, and dialogue partner evaluation information representing an evaluation of the first speaker by the second speaker.
(5)
The information processing device according to any one of (2) to (4), in which, when the first speaker, having checked the scoring information presented in real time, conducts subsequent dialogue reflecting what was checked, the processing unit presents the scoring information according to the result of that reflection.
(6)
The information processing device according to (5), in which the processing unit notifies the first speaker of the result of the reflection by a method different from the real-time presentation of the scoring information.
(7)
The information processing device according to any one of (2) to (6), in which the processing unit presents the scoring information including a scoring result for the entire dialogue after the dialogue between the first speaker and the second speaker ends.
(8)
The information processing device according to any one of (1) to (7), in which the processing unit scores the dialogue based on the utterance content of the first speaker.
(9)
The information processing device according to (8), in which the processing unit scores the dialogue by analyzing the utterances of the first speaker with respect to at least one of similarity to preset scoring item example sentences, structure within the dialogue, utterance attitude classification, and utterance attitude classification within the dialogue structure.
(10)
The information processing device according to (8) or (9), in which the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
(11)
The information processing device according to (10), in which the sensing information is information obtained by various sensors and corresponds to the timing of utterances by the first speaker.
(12)
The information processing device according to (11), in which the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing the captured image with respect to at least one of preset speaker facial expressions, speaker movements, speaker gaze, and presented objects.
(13)
The information processing device according to any one of (10) to (12), in which the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
(14)
The information processing device according to any one of (1) to (13), in which the first speaker is the person being scored, the second speaker is the dialogue partner of the person being scored, and the processing unit displays a video including the second speaker on a display and displays information according to the scoring information on the display or outputs a sound according to the scoring information from a speaker.
(15)
The information processing device according to (14), in which a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and between the first space and the second space, an image captured by the camera installed in one space is displayed in real time by the display installed in the other space.
(16)
The information processing device according to (15), configured integrally with the first camera and the first display installed in the first space, and interconnected via a network with another information processing device configured integrally with the second camera and the second display installed in the second space.
(17)
The information processing device according to (16), further including a first sensor, in which the processing unit scores the dialogue based on sensing information obtained from the first sensor and a second sensor included in the other information processing device.
(18)
An information processing method in which an information processing device scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
1, 1A information processing system, 10, 10A information processing device, 20 information processing device, 30 server, 50 network, 101 CPU, 102 ROM, 103 RAM, 106, 106A input unit, 107 output unit, 108 storage unit, 109 communication unit, 111 operation unit, 112 camera, 113 microphone, 114 sensor, 121 display, 122 speaker, 151 voice input unit, 152 voice recognition unit, 153 sentence division unit, 154 utterance content analysis unit, 155 point-addition target language information DB, 156 time acquisition unit, 157 image input unit, 158 image recognition unit, 159 image analysis unit, 160 point-addition target image information DB, 161 point-addition target integration unit, 162 question information DB, 163 intermediate information display unit, 164 intermediate result notification unit, 165 scoring result generation unit, 166 scoring result display unit, 171 sensing information input unit, 172 sensing information recognition unit, 173 sensing information analysis unit, 174 point-addition target sensing information DB, 181 sensing information input unit, 182 sensing information recognition unit, 183 sensing information analysis unit, 184 point-addition target sensing information DB, 191, 191A, 191B analysis processing unit, 192 scoring processing unit

Claims (18)

1. An information processing device comprising a processing unit that scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
2. The information processing device according to claim 1, wherein the scoring information includes a scoring result of the dialogue or intermediate information from partway through the dialogue.
3. The information processing device according to claim 2, wherein the processing unit presents, during the dialogue between the first speaker and the second speaker, the scoring information including the intermediate information at that point in time.
4. The information processing device according to claim 3, wherein the intermediate information includes at least one of evaluation axis scoring information representing an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information representing transitions between dialogue scenes, dialogue transmission item information representing the degree of achievement of transmission items for each dialogue scene, and dialogue partner evaluation information representing an evaluation of the first speaker by the second speaker.
5. The information processing device according to claim 2, wherein, when the first speaker, having checked the scoring information presented in real time, conducts subsequent dialogue reflecting what was checked, the processing unit presents the scoring information according to the result of that reflection.
6. The information processing device according to claim 5, wherein the processing unit notifies the first speaker of the result of the reflection by a method different from the real-time presentation of the scoring information.
7. The information processing device according to claim 2, wherein the processing unit presents the scoring information including a scoring result for the entire dialogue after the dialogue between the first speaker and the second speaker ends.
8. The information processing device according to claim 1, wherein the processing unit scores the dialogue based on the utterance content of the first speaker.
9. The information processing device according to claim 8, wherein the processing unit scores the dialogue by analyzing the utterances of the first speaker with respect to at least one of similarity to preset scoring item example sentences, structure within the dialogue, utterance attitude classification, and utterance attitude classification within the dialogue structure.
10. The information processing device according to claim 8, wherein the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
11. The information processing device according to claim 10, wherein the sensing information is information obtained by various sensors and corresponds to the timing of utterances by the first speaker.
12. The information processing device according to claim 11, wherein the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing the captured image with respect to at least one of preset speaker facial expressions, speaker movements, speaker gaze, and presented objects.
13. The information processing device according to claim 10, wherein the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
14. The information processing device according to claim 1, wherein the first speaker is the person being scored, the second speaker is the dialogue partner of the person being scored, and the processing unit displays a video including the second speaker on a display and displays information according to the scoring information on the display or outputs a sound according to the scoring information from a speaker.
15. The information processing device according to claim 14, wherein a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and between the first space and the second space, an image captured by the camera installed in one space is displayed in real time by the display installed in the other space.
16. The information processing device according to claim 15, configured integrally with the first camera and the first display installed in the first space, and interconnected via a network with another information processing device configured integrally with the second camera and the second display installed in the second space.
17. The information processing device according to claim 16, further comprising a first sensor, wherein the processing unit scores the dialogue based on sensing information obtained from the first sensor and a second sensor included in the other information processing device.
18. An information processing method in which an information processing device scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
PCT/JP2021/039945 2020-11-13 2021-10-29 Information processing device and information processing method WO2022102432A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020189071 2020-11-13
JP2020-189071 2020-11-13

Publications (1)

Publication Number Publication Date
WO2022102432A1 true WO2022102432A1 (en) 2022-05-19

Family

ID=81602233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/039945 WO2022102432A1 (en) 2020-11-13 2021-10-29 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2022102432A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007520790A (en) * 2003-11-25 2007-07-26 ハイパークオリティー,インク. Customer / Agent Dialogue Audio / Video Service Quality Analyzer
JP2010517098A (en) * 2007-01-30 2010-05-20 ブレイクスルー パフォーマンス テック エルエルシー System and method for computerized interactive technology training
JP2018124604A (en) * 2017-01-30 2018-08-09 グローリー株式会社 Customer service support system, customer service support device and customer service support method


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21891679; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21891679; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)