WO2022102432A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number
WO2022102432A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
speaker
dialogue
scoring
information processing
Prior art date
Application number
PCT/JP2021/039945
Other languages
French (fr)
Japanese (ja)
Inventor
侑理 網本
裕美 倉沢
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2022102432A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking

Definitions

  • The present technology relates to an information processing device and an information processing method, and in particular to an information processing device and an information processing method capable of appropriately supporting a person being scored when scoring interpersonal communication that requires dialogue skills.
  • Patent Document 1 discloses a simulation system that simulates psychological changes of a model patient in a medical interview and changes the model patient's answers according to the question content and the interview procedure.
  • The present technology was made in view of such a situation, and makes it possible to give appropriate feedback to the person being scored so as to support the improvement of interpersonal communication skills that require dialogue.
  • The information processing device of one aspect of the present technology includes a processing unit that, based on reference information serving as a standard for dialogue scoring, scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
  • In the information processing method of one aspect of the present technology, an information processing apparatus scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
  • In one aspect of the present technology, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space is scored, and scoring information regarding the scoring of the dialogue is presented to the first speaker in real time.
  • The information processing device of one aspect of the present technology may be an independent device or an internal block constituting a single device.
  • FIG. 1 shows a configuration example of an embodiment of an information processing system to which the present technology is applied.
  • The information processing system 1 is configured by connecting an information processing device 10 serving as a telepresence device and an information processing device 20 to each other via a network 50.
  • The information processing device 10 and the information processing device 20 are installed in different spaces, such as different buildings or different rooms. That is, the user in the vicinity of the information processing device 10 (the first speaker) and the user in the vicinity of the information processing device 20 (the second speaker) become speakers who have a dialogue with each other from mutually remote locations. The first speaker is the person being scored, whose interpersonal communication skills are graded, and the second speaker is the first speaker's dialogue partner.
  • The information processing device 10 and the information processing device 20 are each equipped with a large display (for example, of a size capable of displaying a speaker's whole body), a camera that photographs the surroundings, a microphone that collects the speakers' utterances, environmental sounds, and other surrounding sound, and a loudspeaker that outputs sound.
  • The information processing device 10 displays video corresponding to the captured images taken by the information processing device 20, together with information superimposed on that video, and outputs the sound collected by the information processing device 20. Similarly, the information processing device 20 displays video corresponding to the captured images taken by the information processing device 10 and outputs the sound collected by the information processing device 10.
  • In this way, the first speaker and the second speaker, although in different spaces, can have a dialogue through the displays.
  • The network 50 includes communication networks such as the Internet, intranets, and mobile phone networks, and enables interconnection between devices using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
  • FIG. 2 shows a configuration example of the information processing apparatus 10 of FIG. 1.
  • The information processing device 10 is an electronic device, such as a display device, that can be connected to the network 50 such as the Internet, and is configured as a telepresence device.
  • In the information processing device 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, and a RAM (Random Access Memory) 103 are connected to one another via a bus 104.
  • The CPU 101 controls the operation of each unit and performs various kinds of processing by executing programs recorded in the ROM 102 or the storage unit 108. Various data are stored in the RAM 103 as appropriate.
  • The input/output I/F 105 is also connected to the bus 104. An input unit 106, an output unit 107, a storage unit 108, and a communication unit 109 are connected to the input/output I/F 105.
  • The input unit 106 supplies various kinds of input data to each unit, including the CPU 101. For example, the input unit 106 includes an operation unit 111, a camera 112, and a microphone 113.
  • The operation unit 111 is operated by the user and outputs operation data corresponding to the operation. The operation unit 111 is composed of physical buttons, a touch panel, and the like.
  • The camera 112 photoelectrically converts light incident from the subject, performs signal processing on the resulting electrical signal, and thereby generates and outputs captured image data. The camera 112 includes an image sensor, a signal processing unit, and the like.
  • The microphone 113 receives sound as vibrations of the air and outputs sound data as an electrical signal.
  • The output unit 107 outputs various kinds of information under the control of the CPU 101. For example, the output unit 107 includes a display 121 and a speaker 122.
  • The display 121 displays video and the like corresponding to the captured image data under the control of the CPU 101. The display 121 is composed of a panel unit such as a liquid crystal panel or an OLED (Organic Light Emitting Diode) panel, a signal processing unit, and the like.
  • The speaker 122 outputs sound corresponding to the sound data under the control of the CPU 101.
  • The storage unit 108 records various data and programs under the control of the CPU 101. The CPU 101 reads various data from the storage unit 108, processes them, and executes programs.
  • The storage unit 108 is configured as an auxiliary storage device. It may be internal storage such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), or external storage such as a memory card.
  • The communication unit 109 communicates with other devices via the network 50 under the control of the CPU 101. The communication unit 109 is configured as a communication module supporting cellular communication (for example, LTE-Advanced or 5G (5th Generation)), wireless communication such as wireless LAN (Local Area Network), or wired communication.
  • The configuration of the information processing device 10 described above is an example; for instance, a short-range wireless communication circuit that performs wireless communication conforming to a short-range wireless communication standard such as Bluetooth (registered trademark) or NFC (Near Field Communication), a power supply circuit, and the like may also be provided.
  • The display 121 may also be a projector, in which case video corresponding to the captured image data can be projected and displayed on an arbitrary screen surface.
  • The configuration of the information processing apparatus 20 is the same as that of the information processing apparatus 10 shown in FIG. 2, so its description is omitted.
  • FIG. 3 shows an example of a functional configuration of the information processing apparatus 10 of FIG. 2.
  • The information processing apparatus 10 includes a voice input unit 151, a voice recognition unit 152, a sentence division unit 153, an utterance content analysis unit 154, a point-addition target language information DB 155, a time acquisition unit 156, an image input unit 157, an image recognition unit 158, an image analysis unit 159, a point-addition target image information DB 160, a point-addition target integration unit 161, a question information DB 162, an intermediate information display unit 163, an intermediate result notification unit 164, a scoring result generation unit 165, and a scoring result display unit 166.
  • The analysis processing unit 191 is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the image analysis unit 159, and the point-addition target image information DB 160.
  • The scoring processing unit 192 is composed of the point-addition target integration unit 161, the question information DB 162, and the scoring result generation unit 165.
  • The voice recognition unit 152, the image recognition unit 158, the analysis processing unit 191, and the scoring processing unit 192 are realized by the CPU 101 of FIG. 2 executing programs. The point-addition target language information DB 155, the point-addition target image information DB 160, and the question information DB 162 are recorded in the storage unit 108 of FIG. 2.
  • The voice input unit 151 corresponds to the microphone 113 of FIG. 2, and the image input unit 157 corresponds to the camera 112 of FIG. 2. The intermediate information display unit 163 and the scoring result display unit 166 correspond to the display 121 of FIG. 2, and the intermediate result notification unit 164 corresponds to the display 121 or the speaker 122 of FIG. 2.
  • The voice input unit 151 inputs the voice data of a speaker's utterance to the voice recognition unit 152.
  • The voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151. In this voice recognition processing, the voice data of the speaker's utterance is converted into text data using a statistical method or the like, and the voice recognition result is supplied to the sentence division unit 153 and the time acquisition unit 156.
  • The sentence division unit 153 performs sentence division processing using the voice recognition result from the voice recognition unit 152. In this sentence division processing, the text corresponding to the speaker's utterance is divided into predetermined processing units, and the sentence division result is supplied to the utterance content analysis unit 154.
  • The utterance content analysis unit 154 performs utterance content analysis processing using the sentence division result from the sentence division unit 153 and the point-addition target language information stored in the point-addition target language information DB 155.
  • The point-addition target language information is information for extracting (identifying) language (wording) that is a target of point addition when scoring interpersonal communication skills. In the utterance content analysis processing, text containing point-addition target language is identified from the divided text using similarity between texts, and the analysis result is supplied to the point-addition target integration unit 161.
  • The time acquisition unit 156 acquires the time corresponding to the voice recognition result from the voice recognition unit 152 and supplies the time information to the image analysis unit 159 and the point-addition target integration unit 161.
  • The image input unit 157 inputs captured image data including the speaker to the image recognition unit 158.
  • The image recognition unit 158 performs image recognition processing using the captured image data from the image input unit 157. In this image recognition processing, the speaker (face, body parts, and so on) is recognized as an object using a pattern recognition technique or the like, and the image recognition result is supplied to the image analysis unit 159.
  • The image analysis unit 159 performs image analysis processing using the image recognition result from the image recognition unit 158 and the point-addition target image information stored in the point-addition target image information DB 160.
  • The point-addition target image information is information for extracting (identifying) images that are targets of point addition when scoring interpersonal communication skills. In the image analysis processing, point-addition target images are identified from the recognized image of the speaker or the like, and the analysis result is supplied to the point-addition target integration unit 161. Further, in the image analysis processing, the time information from the time acquisition unit 156 is used to associate the image analysis result with the utterance content analysis result.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the image analysis result from the image analysis unit 159, and the time information from the time acquisition unit 156.
  • The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the image analysis result associated with the time information.
  • The question information is information defining how points should be awarded for each point-addition target. In this integration processing, scoring information regarding the scoring of the dialogue is obtained by integrating the point-addition target language indicated by the utterance content analysis result and the point-addition target image indicated by the image analysis result, and adding points accordingly. This scoring information includes the scoring result of the dialogue or intermediate information presented during the dialogue.
  • While the dialogue is in progress, the point-addition target integration unit 161 supplies intermediate information, such as whether or not the speaker has performed an action that earns points, to the intermediate information display unit 163. The intermediate information display unit 163 displays the intermediate information from the point-addition target integration unit 161 in real time.
  • The point-addition target integration unit 161 also supplies, as intermediate information, interim results such as the scoring result of the dialogue from its start up to the current point in time to the intermediate result notification unit 164. The intermediate result notification unit 164 announces intermediate information such as these interim results from the point-addition target integration unit 161 in real time.
  • The point-addition target integration unit 161 supplies the scoring result of the dialogue to the scoring result generation unit 165.
  • The scoring result generation unit 165 performs scoring result generation processing using the scoring result of the dialogue from the point-addition target integration unit 161. In this scoring result generation processing, the final scoring result (score) for the entire dialogue is generated by performing predetermined processing such as weighting important items among the point-addition targets, and the scoring result is supplied to the scoring result display unit 166. The scoring result display unit 166 displays the scoring result from the scoring result generation unit 165.
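As a rough illustration of the dataflow just described, the following Python sketch wires minimal stand-ins for the analysis, integration, and display stages together; all class and function names are hypothetical and chosen here for illustration, not taken from the publication.

```python
# Minimal sketch of the dataflow of FIG. 3, assuming hypothetical names;
# each class stands in for one of the units described above.
from dataclasses import dataclass, field

@dataclass
class AnalysisResult:
    text: str           # one divided text unit (sentence division output)
    matched_item: str   # scoring item found by utterance content analysis
    timestamp: float    # time information from the time acquisition unit

@dataclass
class ScoringState:
    scores: dict = field(default_factory=dict)  # running score per item

    def add(self, item: str, points: float) -> None:
        self.scores[item] = self.scores.get(item, 0.0) + points

def integrate(result: AnalysisResult, state: ScoringState) -> None:
    """Point-addition target integration: fold one analysis result into
    the running scoring state (the source of intermediate information)."""
    if result.matched_item:
        state.add(result.matched_item, 1.0)

def display_intermediate(state: ScoringState) -> None:
    """Intermediate information display: render the current scores."""
    for item, score in state.scores.items():
        print(f"[intermediate] {item}: {score:+.1f}")

state = ScoringState()
integrate(AnalysisResult("hello", "greeting", 0.0), state)
display_intermediate(state)  # -> [intermediate] greeting: +1.0
```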
  • In step S11, the voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151, and converts an utterance of the speaker being scored into text Ui (0 ≤ i < N). A time stamp is added to this text.
  • In step S12, the sentence division unit 153 divides the text Ui from the voice recognition unit 152 into divided texts u1, u2, ..., un.
  • In step S13, the analysis processing unit 191 and the scoring processing unit 192 perform analysis scoring processing in which each divided text uj (0 ≤ j < n) is analyzed and each scoring item is calculated for it. The details of this analysis scoring processing will be described later with reference to the flowchart of FIG. 5. When the processing of step S13 is completed, the processing proceeds to step S14.
  • In step S14, whether or not points have been added is determined based on the result of the analysis scoring processing.
  • When it is determined that points have been added, the processing proceeds to step S15, and the processes of steps S15 and S16 are executed, so that intermediate information corresponding to the result of the analysis scoring processing is presented.
  • That is, the intermediate information display unit 163 displays intermediate information such as the content answered correctly and the interim result of adding the score to the relevant scoring item. For example, if the speaker being scored says "hello" and points are added to the scoring item "greeting", a list with that scoring item checked, a graph with the score added, and the like are displayed. In addition, the intermediate result notification unit 164 may announce intermediate information such as the interim result by sound or the like.
  • When it is determined that no points have been added, the process of step S17 is executed, so that intermediate information corresponding to the result of the analysis scoring processing is presented. That is, the intermediate information display unit 163 displays intermediate information such as score sheet information. For example, if the speaker being scored does not make an utterance that is a point-addition target, the scoring items in the list remain unchecked and no score is added to the graph or the like.
  • In step S18, when it is determined that i < N, that is, when the next utterance in the dialogue exists after all the divided texts uj in the text Ui have been processed, the processing from step S11 onward is repeated with the text Ui+1 converted from the next utterance as the processing target. In this way, the display of intermediate information reflecting whether or not points have been added in the dialogue is updated in real time.
  • On the other hand, when it is determined in step S18 that no next utterance exists, the scoring result display unit 166 displays the scoring result including the final score corresponding to the series of utterances in the dialogue. This scoring result is the final one, and corresponds to the interim result at the end of the dialogue among the interim results continuously updated during the dialogue.
  • In this way, scoring information is presented both during the dialogue and at its end: during the dialogue, interim results are presented, including per-utterance scores such as whether the utterances of the speaker being scored contained the necessary information, while at the end of the dialogue, scoring results covering the dialogue as a whole are presented, such as the composition of the dialogue and an overall evaluation.
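Read as pseudocode, steps S11 to S18 amount to a per-utterance loop. The sketch below shows one plausible shape of that loop; `recognize`, `split_sentences`, `analyze_and_score`, and the display callbacks are hypothetical stand-ins for the units described above.

```python
def run_scoring_session(utterance_stream, recognize, split_sentences,
                        analyze_and_score, show_intermediate, show_final):
    """Hypothetical driver for steps S11-S18: process each utterance in
    turn, update the intermediate display, then show the final result."""
    totals = {}                                  # running scores per item
    for audio in utterance_stream:               # loop while i < N (S18)
        text = recognize(audio)                  # S11: speech -> text Ui
        for segment in split_sentences(text):    # S12: u1, u2, ..., un
            added = analyze_and_score(segment, totals)   # S13
            show_intermediate(totals, added)     # S14-S17: real-time update
    show_final(totals)                           # no next utterance: result
    return totals
```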
  • In step S31, the utterance content analysis unit 154 analyzes the utterance content of the divided text uj using the point-addition target language information stored in the point-addition target language information DB 155.
  • In this utterance content analysis, the similarity between the divided text uj and the scoring item example sentences is analyzed (S32), the composition in the dialogue is analyzed (S33), the speech attitude classification is analyzed (S34), and the speech attitude classification within the dialogue composition is analyzed (S35), among others.
  • The point-addition target language information includes information such as scoring item example sentences, composition in dialogue, and speech attitude classification as information for extracting point-addition target language, and the divided text is analyzed using this information. The speech attitude classification is a classification of how the speaker speaks to the other party.
  • The evaluator can register, for each scoring item, the example sentences that are point-addition targets and have the scoring performed against them.
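One simple way to realize the similarity analysis of step S32 is to compare each divided text unit against the registered example sentences. The sketch below uses the ratio from Python's standard difflib as a stand-in for whatever similarity measure is actually used; the example sentences and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Illustrative point-addition target language information (cf. FIG. 8):
# example sentences registered by the evaluator for each scoring item.
SCORING_ITEM_EXAMPLES = {
    "greeting": ["hello", "what happened today", "it's been a long time"],
    "self-introduction": ["I am the pharmacist in charge of today"],
}

def match_scoring_item(divided_text: str, threshold: float = 0.6):
    """Return the scoring item whose example sentence best matches one
    divided text unit, or None if nothing is similar enough."""
    best_item, best_score = None, threshold
    for item, examples in SCORING_ITEM_EXAMPLES.items():
        for example in examples:
            score = SequenceMatcher(None, divided_text.lower(),
                                    example.lower()).ratio()
            if score >= best_score:
                best_item, best_score = item, score
    return best_item

print(match_scoring_item("hello"))        # -> greeting
print(match_scoring_item("the weather"))  # -> None (no point added)
```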
  • In step S36, the image input unit 157 acquires the captured image at the corresponding time. In step S37, the image recognition unit 158 performs image recognition on the acquired captured image.
  • In step S38, the image analysis unit 159 analyzes the actions included in the image recognition result using the point-addition target image information stored in the point-addition target image information DB 160.
  • In this image analysis, the speaker's facial expression is analyzed (S39), the speaker's movement is analyzed (S40), the speaker's line of sight is analyzed (S41), and the presentation is analyzed (S42), among others.
  • The point-addition target image information includes information on facial expressions, movements, line of sight, presentations, and so on as information for extracting point-addition target images from the captured image, and the image recognition result is analyzed using this information. For example, an organ model or various materials presented by a doctor can serve as a presentation.
  • In step S43, the point-addition target integration unit 161 determines the point-addition conditions. In this determination, the point-addition targets and scores extracted by the utterance content analysis and the image analysis are decided, and a score integrating the two analysis results is output. For example, if the person being scored gives the greeting "hello", a score is added for that point-addition target, and if the greeting is delivered with a smile, a further score is given. Likewise, a further score may be given when the person being scored shows material while explaining to the dialogue partner.
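A minimal sketch of the point-addition condition of step S43, assuming the simple additive scheme the text describes only by example (a matched greeting earns a point, a smile at the same moment earns a bonus, and showing material earns another); the exact point values are assumptions.

```python
from typing import Optional

def integrated_points(language_match: Optional[str],
                      facial_expression: Optional[str],
                      showing_material: bool) -> float:
    """Combine the utterance content analysis and the image analysis
    into one integrated score (point values are illustrative)."""
    points = 0.0
    if language_match:                    # e.g. "greeting" matched "hello"
        points += 1.0
        if facial_expression == "smile":  # greeting delivered with a smile
            points += 0.5
    if showing_material:                  # material shown while explaining
        points += 0.5
    return points

print(integrated_points("greeting", "smile", False))  # -> 1.5
```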
  • As described above, the analysis processing unit 191 performs analysis processing using the recognition results of the voice data and the captured image data, and the scoring processing unit 192 performs scoring processing using the analysis results, so that scoring information is presented in real time. For example, while the dialogue is in progress, intermediate information such as interim results is presented as the scoring information, and after the dialogue has ended, the final scoring result is presented as the scoring information.
  • The reference information serving as the standard for dialogue scoring, such as the point-addition target language information, the point-addition target image information, and the question information stored in the databases, is set in advance with contents suited to the usage scene. It is therefore possible to evaluate interpersonal communication in any desired usage scene where dialogue skills are required.
  • The reference information is not limited to information regarding point addition; it may be any information that serves as a standard for dialogue scoring, such as information regarding point deduction.
  • In the following, a case where the usage scene is a medical interview will be described.
  • FIG. 6 shows an example of using the information processing system 1 to which the present technology is applied for a medical interview.
  • The information processing device 10 is installed in the space SP1, and the information processing device 20 is installed in the space SP2. Between the two devices, data such as video corresponding to the captured images taken by each camera and sound collected by each microphone are transmitted and received continuously in real time while the connection between them is established.
  • The speaker UA uses the information processing device 10 and the speaker UB uses the information processing device 20, so that the speaker UA and the speaker UB, who are in mutually remote locations, are displayed to each other. The speaker UA is the person being scored, whose interpersonal communication skills are graded, and the speaker UB is the speaker UA's dialogue partner. For example, if the speaker UA is a pharmacist, the speaker UB is a patient.
  • On the display of the information processing device 10, scoring information is displayed together with the image of the speaker UB. The scoring information regarding the scoring of the dialogue between the speaker UA and the speaker UB is displayed in real time using graphs, tables, flowcharts, and the like. In this way, the speaker UA, playing the role of a pharmacist, can have a dialogue with the speaker UB through the display while checking the scoring information.
  • The speaker UA is photographed by the camera 112 provided in the upper part of the information processing apparatus 10, the voice of the speaker UA is collected by the microphone 113 provided in the lower part, and the voice of the speaker UB is output by the speakers 122-1 and 122-2 provided on the left and right.
  • Meanwhile, the image of the speaker UA is displayed on the display of the information processing apparatus 20 installed in the space SP2, and the voice of the speaker UA is output from its speaker. In this way, the speaker UB can interact with the speaker UA through the display.
  • FIG. 8 shows an example of the point-addition target language information stored in the point-addition target language information DB 155.
  • In the point-addition target language information, item requirements and scoring item example sentences are set in advance for each scoring item for use in a medical interview.
  • The scoring item "greeting" requires an expression similar to a standard or professional greeting as an expression to start the conversation. Scoring item example sentences for "greeting" include "hello", "what happened today", and "it's been a long time".
  • The scoring item "self-introduction" requires an expression introducing one's own name and position. Scoring item example sentences for "self-introduction" include "I am the pharmacist in charge of today", "I am in charge of XX today", and "I am XX".
  • The scoring item "confirmation of name" requires an expression asking for the other party's name. Scoring item example sentences for "confirmation of name" include "may I ask your name?", "what is your name?", and "could you confirm your name?".
  • The scoring item "reason for visit" requires an expression asking the reason for the other party's visit. Scoring item example sentences for "reason for visit" include "do you have a newly prescribed medicine?", "what are your requirements today?", and "is it the same drug as last time?".
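The table of FIG. 8 maps naturally onto a small record structure. The sketch below shows one way the point-addition target language information might be stored, reusing the items quoted above; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ScoringItem:
    name: str             # scoring item
    requirement: str      # item requirement
    examples: list        # scoring item example sentences

POINT_ADDITION_TARGET_LANGUAGE_DB = [
    ScoringItem("greeting",
                "an expression, like a standard or professional greeting, that starts the conversation",
                ["hello", "what happened today", "it's been a long time"]),
    ScoringItem("self-introduction",
                "an expression introducing one's own name and position",
                ["I am the pharmacist in charge of today"]),
    ScoringItem("confirmation of name",
                "an expression asking for the other party's name",
                ["may I ask your name?", "what is your name?"]),
    ScoringItem("reason for visit",
                "an expression asking the reason for the other party's visit",
                ["what are your requirements today?", "is it the same drug as last time?"]),
]
```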
  • FIG. 9 shows the contents and assumed usage scenes for each evaluation axis, such as transmission items and intelligibility. In FIG. 9, usage scenes such as medical interviews, call centers, sales, and retail are illustrated as assumed usage scenes.
  • The evaluation axis "transmission items" represents an evaluation of how thoroughly the items to be communicated were covered; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "intelligibility" represents an evaluation of whether technical terms were avoided and difficult words were paraphrased; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "empathy" represents an evaluation of whether empathy was expressed toward the other party's complaints; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "composition ability" represents an evaluation of whether the dialogue was progressed with a good structure; its assumed usage scenes are medical interviews and call centers.
  • The evaluation axis "proposal ability" represents an evaluation of whether proposals could be made in a manner suited to the context; its assumed usage scenes are sales, retail, and call centers.
  • The evaluation axis "expansiveness" represents an evaluation of whether topics were sufficiently expanded; its assumed usage scenes are sales and retail.
  • The evaluation axis "disclosure" represents an evaluation of whether merits and demerits were conveyed accurately; its assumed usage scenes are medical interviews, sales, and retail.
  • FIG. 10 shows an example of displaying evaluation axis scoring information in a medical interview.
  • The evaluation axis scoring information is represented by a bar graph in which the evaluation value (score) for each of the seven evaluation axes extends in the horizontal direction.
  • In this example, the speaker UA has high evaluations for accuracy and transmission items in the dialogue with the speaker UB, but low evaluations for empathy and the like.
  • FIG. 11 shows an example of displaying dialogue scene transition information in a medical interview.
  • The dialogue scene transition information is represented by the transition of dialogue scenes and a progress bar, for a type of dialogue that has a time limit, here the medical interview.
  • Dialogue scenes in the medical interview include the introduction (Intro), the interview (History Taking), the explanation (Explanation), and the closing (Closing). In this example, 10 minutes is set as the time limit for the medical interview.
  • A of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar 6 minutes and 21 seconds after the start of the medical interview. It shows that, at that point, the dialogue between the speaker UA and the speaker UB has completed the introduction and the interview, and the explanation is underway.
  • B of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar 10 minutes after the start of the medical interview. It shows that the dialogue between the speaker UA and the speaker UB proceeded through the introduction, interview, explanation, interview, explanation, and closing, all completed within 10 minutes.
  • C of FIG. 11 shows the transition of dialogue scenes and a display example of the progress bar more than 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speaker UA and the speaker UB, the explanation was prolonged after the introduction and the interview were completed, and continued even past the 10-minute time limit.
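The progress display of FIG. 11 amounts to elapsed time mapped onto an ordered list of scenes against the 10-minute limit. A minimal sketch, with the rendering details as assumptions:

```python
SCENES = ["Intro", "History Taking", "Explanation", "Closing"]
TIME_LIMIT_S = 10 * 60  # 10-minute limit set for the medical interview

def progress_line(elapsed_s: int, current_scene: str, width: int = 20) -> str:
    """Render one dialogue-scene transition line (cf. FIG. 11):
    the current scene plus a progress bar, flagging any overrun."""
    filled = min(width, elapsed_s * width // TIME_LIMIT_S)
    bar = "#" * filled + "-" * (width - filled)
    overrun = " OVER TIME" if elapsed_s > TIME_LIMIT_S else ""
    return f"{current_scene:>14} [{bar}] {elapsed_s // 60}:{elapsed_s % 60:02d}{overrun}"

print(progress_line(381, "Explanation"))  # 6 min 21 s in (A of FIG. 11)
print(progress_line(660, "Explanation"))  # past the limit (C of FIG. 11)
```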
  • FIG. 12 shows an example of displaying dialogue communication matter information in a medical interview.
  • The dialogue communication matter information is represented by a checklist showing the progress of the dialogue and the completeness of the communication matters in the medical interview.
  • Dialogue scenes in the medical interview include the introduction (Intro), the interview (History Taking), the explanation (Explanation), and the closing (Closing), and communication matters are set for each dialogue scene.
  • The communication matters of the introduction include a greeting, a self-introduction, confirmation of the patient's name, confirmation of the reason for the visit, and the like. The communication matters of the interview include confirmation of the chief complaint, the site, the symptoms, the period, and the like.
  • The communication matters of the explanation include the method of taking the drug, the period of taking the drug, side effects, precautions for swallowing, and the like. The communication matters of the closing include a greeting, gratitude, inviting questions, the next appointment, and the like.
  • B of FIG. 12 shows the degree of achievement when the dialogue scenes proceed in order through the introduction, interview, explanation, and closing: in the dialogue between the speaker UA and the speaker UB, a greeting, a self-introduction, and confirmation of the reason for the visit are given in the introduction, the chief complaint, site, symptoms, and period are confirmed in the interview, the medication method and side effects are explained in the explanation, and the next appointment is made in the closing.
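The checklist of FIG. 12 pairs each dialogue scene with its communication matters and marks them off as they are detected. A minimal sketch of that bookkeeping, with the dictionary layout as an assumption:

```python
# Communication matters per dialogue scene, as listed above.
COMMUNICATION_MATTERS = {
    "Intro": ["greeting", "self-introduction", "name confirmation",
              "reason for visit"],
    "History Taking": ["chief complaint", "site", "symptoms", "period"],
    "Explanation": ["medication method", "medication period",
                    "side effects", "swallowing precautions"],
    "Closing": ["greeting", "gratitude", "questions", "next appointment"],
}

# One entry per (scene, matter), checked off when the matter is detected.
checklist = {(scene, matter): False
             for scene, matters in COMMUNICATION_MATTERS.items()
             for matter in matters}

def check_off(scene: str, matter: str) -> None:
    """Enter a check mark for a conveyed communication matter (FIG. 12)."""
    if (scene, matter) in checklist:
        checklist[(scene, matter)] = True

check_off("Intro", "greeting")
print(f"achievement: {sum(checklist.values())}/{len(checklist)} matters")
```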
  • FIG. 13 shows an example of real-time display of scoring information in a medical interview.
  • On the display 121, the video 201 including the speaker UB is displayed, and scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue communication matter information 204 is superimposed on it. In this way, the speaker UA (pharmacist role), who is being scored, can check the scoring information while interacting with the speaker UB (patient role) through the display 121.
  • In the evaluation axis scoring information 202, the evaluation values (scores) for each of the seven evaluation axes, including accuracy and the like, are represented by a bar graph.
  • The dialogue scene transition information 203 represents the progress from the start up to the explanation (Explanation), the dialogue scene at the point 6 minutes and 21 seconds after the start. The overall flow of the dialogue scenes and the current progress are represented by the medical interview flow 205.
  • The dialogue communication matter information 204 represents the predetermined communication matters for each dialogue scene, namely the introduction (Intro), the interview (History Taking), and the explanation (Explanation); when a matter has actually been conveyed by the speaker UA, a check mark is entered.
  • "greeting”, “self-introduction”, and "confirmation of reason for visit” are checked in the communication items of the introduction department.
  • intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue transmission matter information 204 is displayed in real time as scoring information on the display 121, so that the scoring target person
  • the speaker UA can have a dialogue with the speaker UB through the display 121 while checking the scoring information.
  • the speaker UA can, for example, change the dialogue strategy or paraphrase the utterance content as a dialogue that reflects the content of the confirmed scoring information.
  • the speaker UA can modify the dialogue policy without being noticed by the speaker UB while having a dialogue. You can improve your dialogue skills.
  • the speaker UA in the space SP1 in which the information processing device 10 is installed is used as a pharmacist
  • the speaker UB in the space SP2 in which the information processing device 20 is installed is used as a patient.
  • the speaker UA and the speaker UB can practice dialogue.
  • the information processing device 10 as the telepresence device and the information processing device 20 are connected to each other via the network 50, so that the speaker UA and the speaker UA can be seen through a display having a size capable of displaying the whole body of the speaker. Since the speaker UB can have a dialogue, it is possible to practice a realistic dialogue in a form closer to reality.
  • FIG. 14 shows an example of an overlapping dialogue in a medical interview.
  • In this example, while the speaker UA was still speaking, the speaker UB started the utterance "er, to buy medicine for allergies", so that part of the speaker UB's utterance overlaps with the speaker UA's utterance. Similarly, the speaker UB started uttering "ah, maybe it was different", so the "ah" part of the speaker UB's utterance overlaps with the speaker UA's utterance.
  • In this case, the text of the dialogue analyzed by the utterance content analysis unit 154 is as shown in FIG. 15. That is, the utterance content analysis unit 154 analyzes the dialogue with "what happened today" as the utterance of the speaker UA and "er, to buy medicine for allergies" as the utterance of the speaker UB, and then with "it's an allergic drug" as the utterance of the speaker UA and "ah, maybe it was different" as the utterance of the speaker UB.
  • Since each speaker's utterances are collected by a different microphone and input on different channels, even in a dialogue with overlaps, the utterances of each speaker can be easily extracted and converted into text.
  • The dialogue between the speaker UA and the speaker UB then continues, and the text of the dialogue is analyzed by the utterance content analysis unit 154 as shown in FIG. 15.
  • In this way, the utterance text input on a separate channel for each speaker is analyzed in predetermined processing units, such as single sentences, so that the overlapping utterance sections that frequently occur in dialogue have little influence on the voice recognition results and the utterance content analysis results. This makes it possible to extract the words and actions of the speaker UA, the person being scored, more accurately and to perform more accurate scoring.
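Because each speaker's microphone feeds its own channel, overlapping speech never has to be separated after the fact. The sketch below illustrates this per-channel transcription and timestamp-ordered merge; `transcribe` is a hypothetical recognizer stand-in.

```python
def transcribe_dialogue(channels: dict, transcribe) -> list:
    """Transcribe each speaker's channel independently, then merge by
    start time. Overlapping utterances stay cleanly attributed because
    they were captured on different channels to begin with."""
    merged = []
    for speaker, segments in channels.items():
        for start_time, segment in segments:
            merged.append((start_time, speaker, transcribe(segment)))
    return sorted(merged)  # chronological order across both speakers

# Hypothetical usage: the overlap of FIG. 14, with text standing in for audio.
channels = {
    "UA": [(0.0, "what happened today")],
    "UB": [(1.2, "er, to buy medicine for allergies")],  # overlaps UA's tail
}
for t, who, text in transcribe_dialogue(channels, lambda seg: seg):
    print(f"{t:4.1f}s {who}: {text}")
```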
  • As described above, based on the reference information stored in the databases as the standard for dialogue scoring, the dialogue between the speaker UA (the speaker acting as a pharmacist) and the speaker UB (the speaker acting as a patient) is scored, and scoring information regarding the scoring of the dialogue (interim results, scoring results, and the like) is presented in real time to the speaker UA, the person being scored.
  • The scoring information presented in real time is presented in a natural manner, so that it can be checked by the speaker UA, the person being scored, without being noticeable to the speaker UB, the dialogue partner.
  • Since reference information such as point-addition target language information and point-addition target image information suited to usage scenes such as medical interviews is set in a database in advance and used at scoring time, scoring can be performed in a way that absorbs fluctuations in evaluation.
  • Furthermore, since remote communication technology using the information processing device 10 and the information processing device 20 configured as telepresence devices enables the speaker UA and the speaker UB in different spaces to have a dialogue, it is possible to give the speaker UA, the person being scored, an experience as if the speaker UB, the dialogue partner, were actually present.
  • FIG. 16 shows another configuration example of the information processing apparatus 10 of FIG. 1.
  • In the information processing apparatus 10A, parts corresponding to those of the information processing apparatus 10 in FIG. 2 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 16 is provided with an input unit 106A instead of the input unit 106. The input unit 106A includes an operation unit 111, a camera 112, a microphone 113, and a sensor 114.
  • The sensor 114 senses spatial information, temporal information, and the like, and outputs the sensing information obtained as a result. The sensor 114 includes various sensors such as a distance measuring sensor and an image sensor. The camera 112 may be included in the sensor 114 as an image sensor.
  • The information processing apparatus 20 can also be provided with a sensor 114 in the same manner as the information processing apparatus 10A shown in FIG. 16.
  • FIG. 17 shows an example of a functional configuration of the information processing apparatus 10A of FIG. 16.
  • In the information processing apparatus 10A, parts corresponding to those of the information processing apparatus 10 in FIG. 3 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 17 is provided with a sensing information input unit 171, a sensing information recognition unit 172, a sensing information analysis unit 173, and a point-addition target sensing information DB 174 in place of the image input unit 157, the image recognition unit 158, the image analysis unit 159, and the point-addition target image information DB 160.
  • The analysis processing unit 191A is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, and the point-addition target sensing information DB 174.
  • The sensing information input unit 171 inputs sensing information (sensing information obtained on the speaker UA side) to the sensing information recognition unit 172. This sensing information may include, for example, distance information, image information, and biological information such as the heart rate and brain waves of the speaker UA.
  • The sensing information recognition unit 172 performs sensing information recognition processing using the sensing information from the sensing information input unit 171. In this sensing information recognition processing, the sensing information to be processed is recognized, and the sensing information recognition result is supplied to the sensing information analysis unit 173.
  • The sensing information analysis unit 173 performs sensing information analysis processing using the sensing information recognition result from the sensing information recognition unit 172 and the point-addition target sensing information stored in the point-addition target sensing information DB 174.
  • The point-addition target sensing information is information for extracting (identifying) sensing information that is a target of point addition when scoring interpersonal communication skills. In the sensing information analysis processing, point-addition target sensing information is identified from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161. Further, the time information from the time acquisition unit 156 is used to associate the sensing information analysis result with the utterance content analysis result.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis result from the sensing information analysis unit 173, and the time information from the time acquisition unit 156. The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the sensing information analysis result associated with the time information.
  • FIG. 17 shows a configuration for the case where the sensing information obtained on the speaker UA side in the space SP1 is used, but sensing information obtained on the speaker UB side in the space SP2 may also be used.
  • FIG. 18 shows another example of the functional configuration of the information processing apparatus 10A of FIG. 16.
  • In the information processing apparatus 10A of FIG. 18, parts corresponding to those of the information processing apparatus 10A in FIG. 17 are given the same reference numerals, and their description is omitted.
  • The information processing device 10A of FIG. 18 is newly provided with a sensing information input unit 181, a sensing information recognition unit 182, a sensing information analysis unit 183, and a point-addition target sensing information DB 184. The analysis processing unit 191B is configured by adding the sensing information analysis unit 183 and the point-addition target sensing information DB 184 to the configuration of the analysis processing unit 191A.
  • The sensing information input unit 181 inputs the sensing information obtained on the speaker UB side to the sensing information recognition unit 182. This sensing information may include, for example, distance information, image information, and biological information such as the heart rate and brain waves of the speaker UB.
  • The sensing information recognition unit 182 performs sensing information recognition processing using the sensing information from the sensing information input unit 181. In this sensing information recognition processing, the sensing information to be processed is recognized, and the sensing information recognition result is supplied to the sensing information analysis unit 183.
  • The sensing information analysis unit 183 performs sensing information analysis processing using the sensing information recognition result from the sensing information recognition unit 182 and the point-addition target sensing information stored in the point-addition target sensing information DB 184.
  • The point-addition target sensing information is information for extracting (identifying) sensing information that is a target of point addition when scoring interpersonal communication skills. In the sensing information analysis processing, point-addition target sensing information is identified from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161.
  • The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis results from the sensing information analysis unit 173 and the sensing information analysis unit 183, and the time information from the time acquisition unit 156. The point-addition target integration unit 161 uses the question information stored in the question information DB 162 to perform integration processing that integrates the utterance content analysis result and the sensing information analysis results associated with the time information.
  • The sensing information processed by the information processing device 10A may also be acquired from an electronic device possessed by the speaker UA or the speaker UB, such as a smartphone, a wearable terminal, or a measuring instrument.
  • Examples of the sensor 114 include various sensors such as an acceleration sensor that measures acceleration along the three XYZ axes, a gyro sensor that measures angular velocity about the three XYZ axes, a distance measuring sensor that measures distance, a biological sensor that measures biological information, a proximity sensor that detects nearby objects, and a magnetic sensor that measures the magnitude and direction of a magnetic field.
  • FIG. 19 shows an example of displaying the dialogue partner evaluation information in the medical interview.
  • The dialogue partner evaluation information represents an evaluation calculated from the sensing information of the dialogue partner in the medical interview.
  • In this example, the degree of understanding, empathy, degree of interest, likability, and reliability are used as evaluation axes regarding the speaker UA, based on the sensing information obtained on the speaker UB side. For example, when the sensor on the speaker UB side detects that the speaker UB's pupils dilated at the moment the speaker UA spoke, a likability score can be added; in this way, evaluation information calculated from the dialogue partner's biometric information and reactions can be presented.
  • The dialogue partner evaluation information is represented by a bar graph in which the evaluation value (score) for each of the five evaluation axes extends in the horizontal direction. Each bar starts from 0 and extends to the right or to the left: extension to the right indicates an increasingly positive evaluation (high evaluation), while extension to the left indicates an increasingly negative evaluation (low evaluation).
  • A of FIG. 19 represents the evaluation in the first half of the dialogue, B of FIG. 19 in the middle stage, and C of FIG. 19 in the latter half; for example, the dialogue is assumed to proceed as follows.
  • In the evaluation of A of FIG. 19, the speaker UB gives the speaker UA slightly positive evaluations, but the values themselves are not large.
  • In the evaluation of B of FIG. 19, all the evaluations except the degree of understanding are low, based on information such as the speaker UB having a dark facial expression.
  • In the evaluation of C of FIG. 19, the degree of understanding, empathy, degree of interest, likability, and reliability are all evaluated highly, based on information such as the speaker UB nodding and having a bright facial expression.
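A minimal sketch of how sensed reactions on the partner's side might be folded into the signed evaluation values behind the bar graph of FIG. 19. The pupil-dilation rule is the example given above; the other reaction rules and all weights are illustrative assumptions.

```python
# Signed values per evaluation axis: positive bars extend right (high
# evaluation), negative bars extend left (low evaluation), as in FIG. 19.
partner_eval = {"understanding": 0.0, "empathy": 0.0, "interest": 0.0,
                "likability": 0.0, "reliability": 0.0}

# Assumed mapping from sensed reactions to axis adjustments.
REACTION_RULES = {
    "pupil_dilation": [("likability", +0.5)],          # rule from the text
    "nod": [("understanding", +0.5), ("empathy", +0.3)],
    "dark_expression": [("empathy", -0.5), ("interest", -0.3)],
}

def apply_reaction(reaction: str) -> None:
    """Fold one sensed reaction of the dialogue partner into the axes."""
    for axis, delta in REACTION_RULES.get(reaction, []):
        partner_eval[axis] += delta

apply_reaction("pupil_dilation")   # detected while the scored speaker talks
print(partner_eval["likability"])  # -> 0.5
```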
  • FIG. 20 shows an example of real-time display of scoring information in a medical interview.
  • As in FIG. 13, the speaker UA can check the scoring information while interacting with the speaker UB through the display 121, but here the scoring information further includes the dialogue partner evaluation information 206.
  • In the dialogue partner evaluation information 206, the evaluation values (scores) calculated from the sensing information on the speaker UB side are represented by a bar graph. In this example, the speaker UB gives the speaker UA high evaluations for the degree of understanding, empathy, likability, and reliability, but a low evaluation for the degree of interest.
  • The dialogue scene transition information 203 indicates that the dialogue scene has completed the introduction (Intro) and the interview (History Taking) and has progressed to the explanation (Explanation).
  • In FIG. 20, the measured values (face, head, body) obtained from the sensing information are also shown; the state of the speaker UB can be inferred from these measured values.
  • Intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206 is displayed in real time as scoring information, so that the speaker UA, the person being scored, can have a dialogue with the speaker UB through the display 121 while checking the scoring information and considering how the speaker UB is evaluating the dialogue.
  • The evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206 are examples of intermediate information; it suffices for the intermediate information to include at least one of them.
  • FIGS. 21 and 22 show an example of the transition of input information and scoring information: they show display examples 21 seconds and 6 minutes 21 seconds after the start of the medical interview, respectively. The input information and the scoring information are superimposed on the video including the speaker UB.
  • In FIG. 21, the display 121 of the information processing apparatus 10A shows scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication matter information 204, and the dialogue partner evaluation information 206.
  • The dialogue scene transition information 203 shows the introduction (Intro), the dialogue scene at the point 21 seconds after the start, together with a progress bar showing its progress. Further, the input information 211 shows that, in the introduction scene, the speaker UA made the utterance "nice to meet you" while bowing with a smile.
  • The dialogue communication matter information 204 shows a checklist with a check mark on "greeting" among the communication matters of the introduction (Intro).
  • The dialogue partner evaluation information 206 represents evaluation values calculated from the sensing information on the speaker UB side; the bar graph shows that the speaker UB, having been greeted by the speaker UA, has increased empathy, likability, and reliability toward the speaker UA.
  • In the evaluation axis scoring information 202, the evaluation values for each of the seven evaluation axes are represented by a bar graph. Corresponding to the conveyed greeting, the evaluation value (bar graph value) of the transmission items axis in the evaluation axis scoring information 202 is increased (+1).
  • Further, since the dialogue partner evaluation information 206 shows that the speaker UB feels empathy, likability, and reliability toward the speaker UA, the evaluation value (bar graph value) of the conformity axis in the evaluation axis scoring information 202 is accordingly increased (+0.5).
  • In the figure, the factor behind each increase in evaluation value is represented by an arrow, but this arrow is not actually displayed.
  • the dialogue scene transition information 203 shows an explanation (Explanation) which is a dialogue scene at the time when 6 minutes 21 has elapsed from the start, and a progress bar showing the progress thereof. ..
  • the speaker UA made an utterance saying "Please drink 2 tablets with water or lukewarm water after each meal.” It is shown in the balloon that he was doing the gesture of.
  • Dialogue communication item information 204 shows a checklist in which the medication method is checked among the communication items of the explanation (Explanation).
  • the dialogue partner evaluation information 206 as an evaluation value calculated from the sensing information on the speaker UB side, an evaluation value in which the degree of understanding and reliability by the speaker UB who received the explanation of the medication method is highly evaluated is a bar graph. It is represented.
• Accordingly, in the evaluation axis scoring information 202, the evaluation value of the "transmission item" axis increases (+1), and the evaluation value of the "accuracy" axis also increases (+1).
• The evaluation axis scoring information 202 is intermediate information (an interim result) presented during the dialogue between the speaker UA and the speaker UB, but it is also presented as the final scoring result after the dialogue ends; it can be said to be linked with other intermediate information such as the dialogue transmission item information 204 and the dialogue partner evaluation information 206.
• While the speaker UA and the speaker UB converse through the display, linguistic analysis, facial expressions, body movements, how the other party feels, and so on are scored together, and the screen is updated in real time according to the input information and the scoring information. As a result, the speaker UA can check the scoring result without the speaker UB noticing during the dialogue, and can reflect what was confirmed in the subsequent dialogue.
• Next, a specific example will be described in which, while the speaker UA acting as a pharmacist and the speaker UB acting as a patient converse in a medical interview, the speaker UA checks the scoring information displayed in real time on the display and decides the subsequent dialogue policy.
• FIG. 23 shows an example of feedback when the speaker UA notices an omission of a transmission item in a medical interview and changes the dialogue strategy.
• In the figure, the passage of time is represented by an arrow pointing from top to bottom. Each screen on the right side of the arrow is a screen displayed on the display 121 of the information processing apparatus 10A on the speaker UA side (scoring information superimposed on the video of the speaker UB), and the bell icon represents a sound output from the speaker 122. The arrows and screens have the same meanings in FIGS. 25 and 26, which will be described later.
• First, the speaker UB says, "Please give me medicine for hypertension. Here is my prescription."
• In response, the speaker UA says, "Since this is the first time you are taking this medicine, may I take about 5 minutes to explain it?"
• The display of the scoring information, including the checklist and the bar graph, is then updated.
• Next, the speaker UB says, "No, I've heard enough from the doctor, so I don't need an explanation."
• Checking the checklist on the screen, the speaker UA realizes that he failed to introduce himself, and checking the bar graph, he sees that reliability has not increased; he therefore changes the dialogue strategy and introduces himself together with the pharmacist's role.
• The speaker UA then says, "I should have introduced myself earlier; I am ⁇, a pharmacist at the XX pharmacy. Why don't we check the medicines together?"
• The display of the scoring information, including the checklist and the bar graph, is updated again, and scoring information indicating that the self-introduction has been made is displayed.
• At this time, the information processing apparatus 10A can output a predetermined sound from the speaker 122 to notify the speaker UA that points were added because the speaker noticed the omitted transmission item and changed the dialogue strategy. The notification of the added points is not limited to sound; it may be given, for example, by displaying information indicating the added points on the screen or by a tactile presentation using vibration. However, to convey that the change in dialogue strategy led to the added points, the feedback is given in a manner that differs in some way from the normal point-addition notification.
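The differentiated notification described above might look like the following minimal sketch. The channel helpers, sound file names, and the strategy-change flag are illustrative assumptions; the passage only requires that strategy-change additions be distinguishable from normal ones.

```python
def play_sound(name: str) -> None:
    print(f"[speaker 122] playing {name}")   # stand-in for actual audio output

def show_on_screen(text: str) -> None:
    print(f"[display 121] {text}")           # stand-in for the screen display

def notify_points_added(points: float, strategy_change: bool) -> None:
    if strategy_change:
        # Points earned by noticing an omission and changing strategy:
        # a distinct sound plus on-screen text, per the passage above.
        play_sound("chime_recovery.wav")
        show_on_screen(f"+{points} (recovered omitted item)")
    else:
        # Normal point addition: the usual short tone only.
        play_sound("chime_normal.wav")

notify_points_added(1.0, strategy_change=True)
```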
  • FIG. 24 is a flowchart illustrating the flow of feedback processing.
• When an utterance of the speaker UA is accepted (S107), that is, an utterance related to recovery, it is determined whether or not the speaker has recovered the omission by that utterance (S108).
• If it is determined that recovery has not been performed (No in S108), the process returns to step S105 and the subsequent processing is repeated. That is, while the situation remains recoverable, the system keeps accepting the speaker UA's utterances and checking whether recovery has occurred; if recovery becomes impossible, information indicating the missed goal is displayed on the screen.
• If it is determined that recovery has been performed (Yes in S108), the added points are notified by sound (S109), and information indicating the added points is displayed on the screen (S110).
• The notification of the added points may be given by at least one of sound output and screen display.
• After step S104, S106, or S110, the process returns to step S101, and the above processing is repeated.
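The recovery branch of this feedback processing can be sketched as follows. The step numbers in the comments follow the flowchart references above, but since FIG. 24 itself is not reproduced here, the exact conditions and the helper callbacks are assumptions made for illustration.

```python
def feedback_loop(get_utterance, recovered, still_recoverable,
                  notify_sound, show_points, show_goal):
    while True:                          # corresponds to returning to S101
        utterance = get_utterance()      # S107: accept speaker UA's utterance
        if utterance is None:
            break                        # dialogue ended
        if recovered(utterance):         # S108: did the utterance recover it?
            notify_sound()               # S109: notify added points by sound
            show_points()                # S110: show added points on screen
        elif not still_recoverable():
            show_goal()                  # recovery no longer possible

# A toy run with stubbed callbacks:
utterances = iter(["um...", "I'm a pharmacist at the XX pharmacy.", None])
feedback_loop(
    get_utterance=lambda: next(utterances),
    recovered=lambda u: "pharmacist" in u,
    still_recoverable=lambda: True,
    notify_sound=lambda: print("[speaker 122] point-addition sound"),
    show_points=lambda: print("[display 121] +1 (self-introduction)"),
    show_goal=lambda: print("[display 121] missed item shown"),
)
```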
• FIG. 25 shows an example of feedback when the dialogue strategy is not changed despite an omitted transmission item in a medical interview.
• First, a dialogue between the speaker UA and the speaker UB proceeds as at times t11 to t14 in FIG. 23; the linguistic analysis of the utterances and the evaluation of the dialogue partner are scored together, and a screen including a checklist, a bar graph, and the like is displayed.
• The speaker UA checks the checklist on the screen and recognizes that he failed to introduce himself; however, considering the time limit of the medical interview (for example, 10 minutes) and the short remaining time, he judges it better to proceed with the interview as it is. The speaker UA therefore says, "I want to make sure the medicine is correct, so may I ask about your symptoms?"
• The display of the scoring information, including the checklist and the bar graph, is then updated. That is, the dialogue scenes shown are the introduction part (Intro) and the interview (History Taking); among the transmission items of the introduction part, "greeting" and "confirmation of the reason for visiting the hospital" are checked, and among the transmission items of the interview, "confirmation of chief complaint" is checked, while the bar graph shows no particular change in score. In other words, as this screen shows, since the dialogue scene has already transitioned from the introduction part to the interview, no points are added even if a transmission item of the introduction part is spoken at this point.
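The scene-gated point addition implied by FIG. 25 can be sketched briefly. The scene and item names below are taken from the example; the data layout and the gating function are assumptions for illustration only.

```python
ITEMS_PER_SCENE = {
    "Intro": {"greeting", "self-introduction", "confirm reason for visit"},
    "History Taking": {"confirm chief complaint", "ask symptoms"},
}

def try_add_point(current_scene: str, detected_item: str, checklist: dict) -> bool:
    """Check the item off and add a point only if it belongs to the current scene."""
    if detected_item in ITEMS_PER_SCENE.get(current_scene, set()):
        checklist[detected_item] = True
        return True     # point added
    return False        # e.g. an Intro item spoken during the interview: no point

checklist = {}
print(try_add_point("History Taking", "self-introduction", checklist))       # False
print(try_add_point("History Taking", "confirm chief complaint", checklist)) # True
```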
• FIG. 26 shows an example of feedback when the comprehension level of the speaker UB acting as a patient changes due to a paraphrase by the speaker UA acting as a pharmacist in a medical interview.
• First, the speaker UA asks, "When does the symptom remit?"
• At this point, scoring information is superimposed on the video of the speaker UB and displayed on the display 121: a checklist with "greeting" and "confirmation of the reason for visiting the hospital" checked, and a bar graph showing high evaluations for interest, likability, reliability, and so on.
• The speaker UB then utters, "Is it lunch?"
• Checking the bar graph and the measured values on the screen, the speaker UA concludes from the sudden drop in comprehension and the speaker UB's facial expression that the word "remit" may not have been understood, and judges it better to paraphrase it into another word.
• The speaker UA then asks, "Well then, can you tell me when the symptoms are alleviated?"
• Scoring the linguistic analysis of the utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the display 121: a bar graph in which the evaluation of the speaker UB's comprehension has risen (for example, from 10 to 80), the degree of similarity between the paraphrased wordings (for example, a similarity of 0.8 between the wording including "remission" and the wording including "alleviation"), and a measured value indicating that the speaker UB no longer shows a particularly troubled facial expression.
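The wording-similarity figure mentioned above (for example, 0.8) could come from any text-similarity measure. The following is a minimal sketch using a character-bigram Jaccard measure, which is purely an illustrative assumption; the patent does not specify the similarity function.

```python
def bigrams(text: str) -> set:
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of character bigrams, in [0, 1].
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 0.0

original = "When does the symptom remit?"
paraphrase = "Can you tell me when the symptoms are alleviated?"
print(round(similarity(original, paraphrase), 2))  # some score in [0, 1]
```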
• At this time, by outputting a predetermined sound from the speaker 122, the information processing apparatus 10A can notify the speaker UA that points were added because the paraphrase by the speaker UA acting as a pharmacist changed the degree of understanding of the speaker UB acting as a patient. After that, at time t34, the speaker UB understands the intention of the speaker UA's question and says, "Oh, I see, it calms down when I warm my stomach."
• Applying the third example shown in FIG. 26 to the feedback processing of FIG. 24, the flow is as follows. In the third example, the speaker UA acting as a pharmacist makes an utterance (one including "remission"), but after the utterance, the understanding of the speaker UB acting as a patient drops significantly and the non-verbal information takes a negative value, so it is determined that there is a problem in the dialogue (Yes in S103).
  • FIG. 27 shows an example of displaying a real-time score when used for apparel customer service.
• In this example, the speaker UA is an apparel salesperson, and the speaker UB is a customer.
• The video 221 including the speaker UB is displayed on the display 121 of the information processing device 10A on the speaker UA side, and scoring information including the evaluation axis scoring information 222, the dialogue scene transition information 223, the dialogue transmission item information 224, and the dialogue partner evaluation information 226 is superimposed on the video 221.
• In the evaluation axis scoring information 222, the evaluation value (score) for each of the evaluation axes of listening ability, accuracy, disclosure, diffusivity, and proposal ability is represented by a bar graph.
• The dialogue scene transition information 223 shows the progress from the start up to the product proposal (Recommendation), which is the dialogue scene 6 minutes and 21 seconds after the start: the small talk (Small talk) and the needs exploration (Needs exploration) have been completed, and the product proposal (Recommendation) is in progress. The overall flow of these dialogue scenes and the current progress are represented by the customer service flow 225.
• In the dialogue transmission item information 224, predetermined transmission items are listed for each dialogue scene such as small talk, needs exploration, and product proposal (Recommendation), and a check mark is added to each item actually conveyed by the speaker UA. Here, "call", "seasonal topics", and "introduction of new items" are checked among the transmission items, as are "item introduction" and "reference to trends".
• In the dialogue partner evaluation information 226, the evaluation values (scores) calculated from the sensing information on the speaker UB side are represented by a bar graph, which shows that the speaker UB has relatively high understanding, empathy, likability, and reliability toward the speaker UA, but a low degree of interest.
• When the information processing devices 10 and 10A are used for apparel customer service, they have the configuration shown in FIG. 3, FIG. 17, or FIG. 18, as in the medical interview case, but the information stored in each database must be changed to information for apparel customer service. That is, information for apparel customer service, rather than for medical interviews, is registered in the point-adding target language information DB 155, the point-adding target image information DB 160, the question information DB 162, and the point-adding target sensing information DBs 174 and 184.
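Swapping the reference data per use case, as described above, might be organized as follows. The file names and loader below are illustrative assumptions; only the idea that the same pipeline runs with use-case-specific database contents comes from the text.

```python
USE_CASE_DATA = {
    "medical_interview": {
        "point_language_db": "medical/point_language.json",
        "point_image_db": "medical/point_image.json",
        "question_db": "medical/questions.json",
        "point_sensing_db": "medical/point_sensing.json",
    },
    "apparel_customer_service": {
        "point_language_db": "apparel/point_language.json",
        "point_image_db": "apparel/point_image.json",
        "question_db": "apparel/questions.json",
        "point_sensing_db": "apparel/point_sensing.json",
    },
}

def load_reference_info(use_case: str) -> dict:
    """Return the reference-database paths for the selected use case."""
    return USE_CASE_DATA[use_case]

print(load_reference_info("apparel_customer_service")["question_db"])
```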
  • FIG. 28 shows another configuration example of an embodiment of an information processing system to which the present technology is applied.
• The information processing system 1A is configured such that the information processing device 10, the information processing device 20, and the server 30 are connected to one another via the network 50. In the information processing system 1A, the analysis processing unit 191 and the scoring processing unit 192 are provided in the server 30, while the voice input unit 151, the voice recognition unit 152, the image input unit 157, the image recognition unit 158, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the information processing apparatus 10.
• The information processing apparatus 10 transmits data including the voice recognition and image recognition results to the server 30 via the network 50. The server 30 performs the analysis processing and the scoring processing using the transmitted data, and returns data including the processing results to the information processing apparatus 10 via the network 50. The information processing apparatus 10 then displays information or outputs sound based on the data received from the server 30.
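The client/server split of FIG. 28 can be sketched compactly: the device sends recognition results, the server analyzes and scores, and the device presents the returned scoring information. The JSON message format and the stubbed scoring rule below are assumptions made for illustration.

```python
import json

def server_handle(request_json: str) -> str:
    """Server 30 side: analyze and score the recognition results (stubbed)."""
    req = json.loads(request_json)
    delta = 1.0 if "nice to meet you" in req["utterance_text"].lower() else 0.0
    return json.dumps({"axis": "transmission_item", "delta": delta})

def device_cycle(utterance_text: str) -> None:
    """Information processing apparatus 10 side: send results, show response."""
    request = json.dumps({"utterance_text": utterance_text, "image_tags": []})
    response = json.loads(server_handle(request))   # stands in for the network 50
    print(f"[display 121] {response['axis']} +{response['delta']}")

device_cycle("Nice to meet you")
```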
• Alternatively, the information processing device 10 may be composed of a processing device such as a home server and an input/output device such as a display device. In this case, the processing device and the input/output device are provided in the same space (the same room, the same building, and so on). That is, among the components shown in FIG. 3, the analysis processing unit 191, the scoring processing unit 192, the voice recognition unit 152, and the image recognition unit 158 are provided in the processing device, while the voice input unit 151, the image input unit 157, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the input/output device.
• Although the case where the information processing device 10 is configured as a telepresence device such as a display device has been described, the information processing device 10 may instead be an electronic device such as a PC (Personal Computer). In that case, the speaker UA and the speaker UB at remote locations can converse through the display by using an application such as a video call application.
• The server 30 can similarly take over some of the functions of the information processing device 10A.
• As described above, the information processing device 10 or 10A scores, based on the reference information stored in the databases as the standard for dialogue scoring, the dialogue between the speaker UA in the space SP1 (for example, a speaker acting as a pharmacist or an apparel salesperson) and the speaker UB in the space SP2 (for example, a speaker acting as a patient or a customer), and presents scoring information regarding the scoring of the dialogue to the speaker UA in real time.
• This allows the speaker UA to check the scoring information in real time while conversing with the speaker UB. Therefore, it is possible to give appropriate feedback, through scoring and the like, to the speaker UA as the scoring target person, and thereby support the improvement of interpersonal communication skills such as those needed in medical interviews and apparel customer service.
• In addition, since the speaker UA and the speaker UB in different spaces can converse by using the information processing devices 10 and 20 configured as telepresence devices, there are no location restrictions and participation in the dialogue is easier, which makes practicing and training the dialogue easy. Further, since the information processing apparatus 10 performs the scoring using the reference information stored in the databases as the standard for dialogue scoring, fluctuations in evaluation between human graders do not occur. As a result, this technique makes it possible to easily realize a simulation in a situation closer to the real one than is currently possible.
• The program executed by the information processing devices 10 and 10A can be recorded on and provided via a removable recording medium such as a package medium, including a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
• The processes performed by the information processing devices 10 and 10A (CPU 101) according to the program do not necessarily have to be performed chronologically in the order described in the above flowcharts. That is, they include processes executed in parallel or individually (for example, parallel processing or object-based processing).
• The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. Each step in the above flowcharts may be executed by one device or shared among a plurality of devices; likewise, when one step includes a plurality of processes, those processes can be executed by one device or shared among a plurality of devices. Further, the program may be transferred to and executed by a remote computer.
• In this specification, a system means a set of a plurality of components (devices, modules (parts), and the like), regardless of whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
• The embodiments of the present technology are not limited to those described above, and various changes can be made without departing from the gist of the present technology. Further, the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
• (1) An information processing device including a processing unit that scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
• (2) The information processing device, wherein the scoring information includes a scoring result of the dialogue or intermediate information during the dialogue.
• (3) The information processing device, wherein the processing unit presents the scoring information including the intermediate information at that point in time during a dialogue between the first speaker and the second speaker.
• (4) The information processing apparatus, wherein the intermediate information includes at least one of evaluation axis scoring information indicating an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information indicating the transition of dialogue scenes, dialogue transmission item information indicating the achievement level of transmission items for each dialogue scene, and dialogue partner evaluation information representing the evaluation of the first speaker by the second speaker.
• (5) The information processing apparatus according to any one of (2) to (4) above, wherein the processing unit presents the scoring information according to the reflection result.
• (6) The information processing device, wherein the processing unit notifies the first speaker of the reflection result by a method different from the real-time presentation of the scoring information.
• (7) The information processing device, wherein the processing unit presents the scoring information including the scoring result of the entire dialogue after the dialogue between the first speaker and the second speaker is completed.
• (8) The information processing device according to any one of (1) to (7) above, wherein the processing unit scores the dialogue based on the utterance content of the first speaker.
• (9) The information processing apparatus, wherein the processing unit scores the dialogue by analyzing the utterance of the first speaker with respect to at least one of the similarity with preset scoring item example sentences, the composition within the dialogue, the speech attitude classification, and the speech attitude classification within the dialogue composition.
• (10) The information processing apparatus according to (8) or (9) above, wherein the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
• (11) The information processing apparatus, wherein the sensing information is information obtained by various sensors and corresponds to the timing of an utterance by the first speaker.
• (12) The information processing apparatus according to (11) above, wherein the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing, in the captured image, at least one of preset facial expressions of the speaker, movements of the speaker, the line of sight of the speaker, and presented objects.
• (13) The information processing apparatus according to any one of (10) to (12) above, wherein the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
• (14) The information processing apparatus according to any one of (1) to (13) above, wherein the first speaker is a scoring target person, the second speaker is the dialogue partner of the scoring target person, and the processing unit displays video including the second speaker on a display, and displays information corresponding to the scoring information on the display or outputs sound corresponding to the scoring information from a speaker.
• (15) The information processing apparatus, wherein a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and an image captured by the camera installed in one of the first space and the second space is displayed in real time on the display installed in the other space.
• (16) The information processing apparatus according to (15) above, wherein the first camera and the first display installed in the first space are integrally configured, and the apparatus is interconnected via a network with another information processing apparatus in which the second camera and the second display installed in the second space are integrally configured.
• (17) The information processing apparatus, wherein the processing unit scores the dialogue based on sensing information obtained from a first sensor and from a second sensor included in the other information processing apparatus.
• (18) An information processing method in which an information processing device scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
• 1, 1A information processing system, 10, 10A information processing device, 20 information processing device, 30 server, 50 network, 101 CPU, 102 ROM, 103 RAM, 106, 106A input unit, 107 output unit, 108 storage unit, 109 communication unit, 111 operation unit, 112 camera, 113 microphone, 114 sensor, 121 display, 122 speaker, 151 voice input unit, 152 voice recognition unit, 153 sentence division unit, 154 utterance content analysis unit, 155 point-adding target language information DB, 156 time acquisition unit, 157 image input unit, 158 image recognition unit, 159 image analysis unit, 160 point-adding target image information DB, 161 point-adding target integration unit, 162 question information DB, 163 intermediate information display unit, 164 intermediate result notification unit, 165 scoring result generation unit, 166 scoring result display unit, 171 sensing information input unit, 172 sensing information recognition unit, 173 sensing information analysis unit, 174 point-adding target sensing information DB, 181 sensing information input unit, 182 sensing information recognition unit, 183 sensing information analysis unit, 184 point-adding target sensing information DB

Abstract

The present technology pertains to an information processing device and an information processing method which make it possible to suitably support a person to be graded in the grading of interpersonal communication in which dialogue skills are required. Provided is the information processing device comprising a processing unit which grades, on the basis of standard information which serves as a standard for grading a dialogue, the dialogue between a first speaker in a first space and a second speaker in a second space that is different from the first space, and presents, for the first speaker, grading information about the grading of the dialogue in real time. The present technology can be applied to, for example, a dialogue grading device which grades a dialogue.

Description

Information processing device and information processing method
The present technology relates to an information processing device and an information processing method, and particularly to an information processing device and an information processing method capable of appropriately supporting a scoring target person when scoring interpersonal communication that requires dialogue skills.
Professionals in the medical field take tests related to interpersonal communication. Patent Document 1 discloses a simulation system that simulates psychological changes of a model patient in a medical interview and changes the model patient's answers according to the question content and the interview procedure.
In addition, there are preferable expression methods and ways of speaking depending on the type of job, such as a sales position or a call center operator.
International Publication No. 2007/026715
As a means of improving interpersonal communication skills, it is desirable to give the scoring target person appropriate feedback, through scoring and the like, during a dialogue, thereby supporting the improvement of dialogue communication skills.
The present technology has been made in view of such a situation, and makes it possible to give appropriate feedback to the scoring target person in order to improve interpersonal communication skills, thereby supporting the improvement of dialogue communication skills.
An information processing device according to one aspect of the present technology includes a processing unit that scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
An information processing method according to one aspect of the present technology is a method in which an information processing device scores, based on reference information serving as a standard for dialogue scoring, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
In the information processing device and the information processing method according to one aspect of the present technology, a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space is scored based on reference information serving as a standard for dialogue scoring, and scoring information regarding the scoring of the dialogue is presented to the first speaker in real time.
The information processing device according to one aspect of the present technology may be an independent device or an internal block constituting a single device.
FIG. 1 is a diagram showing a configuration example of an embodiment of an information processing system to which the present technology is applied.
FIG. 2 is a diagram showing a first example of the configuration of the information processing apparatus of FIG. 1.
FIG. 3 is a diagram showing a first example of the functional configuration of the information processing apparatus of FIG. 1.
FIG. 4 is a flowchart explaining the flow of dialogue correspondence processing.
FIG. 5 is a flowchart explaining the details of analysis scoring processing.
FIG. 6 is a diagram showing a configuration example in which an information processing system to which the present technology is applied is used for medical interviews.
FIG. 7 is a diagram showing a configuration example of the information processing apparatus when used for medical interviews.
FIG. 8 is a diagram showing an example of point-adding target language information in a medical interview.
FIG. 9 is a diagram showing examples of assumed usage scenes for each evaluation axis.
FIG. 10 is a diagram showing a display example of evaluation axis scoring information in a medical interview.
FIG. 11 is a diagram showing a display example of dialogue scene transition information in a medical interview.
FIG. 12 is a diagram showing a display example of dialogue transmission item information in a medical interview.
FIG. 13 is a diagram showing a real-time display example of scoring information in a medical interview.
FIG. 14 is a diagram showing an example of a dialogue with overlap in a medical interview.
FIG. 15 is a diagram showing an example of dialogue text at the time of analysis.
FIG. 16 is a diagram showing a second example of the configuration of the information processing apparatus of FIG. 1.
FIG. 17 is a diagram showing a first example of the functional configuration of the information processing apparatus of FIG. 16.
FIG. 18 is a diagram showing a second example of the functional configuration of the information processing apparatus of FIG. 16.
FIG. 19 is a diagram showing a display example of dialogue partner evaluation information in a medical interview.
FIG. 20 is a diagram showing a real-time display example of scoring information in a medical interview.
FIG. 21 is a diagram showing a first example of the display of input information and scoring information.
FIG. 22 is a diagram showing a second example of the display of input information and scoring information.
FIG. 23 is a diagram showing an example of feedback when a speaker acting as a pharmacist in a medical interview notices an omitted transmission item and changes the dialogue strategy.
FIG. 24 is a flowchart explaining the flow of feedback processing.
FIG. 25 is a diagram showing an example of feedback when the dialogue strategy was not changed in response to an omitted transmission item in a medical interview.
FIG. 26 is a diagram showing an example of feedback when the understanding of a speaker acting as a patient changes due to a paraphrase by a speaker acting as a pharmacist in a medical interview.
FIG. 27 is a diagram showing a real-time display example of scoring information in apparel customer service.
FIG. 28 is a diagram showing another configuration example of an embodiment of an information processing system to which the present technology is applied.
<1. First Embodiment>
(System configuration)
FIG. 1 shows a configuration example of an embodiment of an information processing system to which the present technology is applied.
In FIG. 1, the information processing system 1 is configured by connecting an information processing device 10 serving as a telepresence device and an information processing device 20 to each other via a network 50.
The information processing device 10 and the information processing device 20 are installed in different spaces, such as different buildings or different rooms. That is, the user in the vicinity of the information processing device 10 (the first speaker) and the user in the vicinity of the information processing device 20 (the second speaker) are speakers who converse with each other from mutually distant locations, such as remote sites. The first speaker is the scoring target person whose interpersonal communication skills are scored, and the second speaker is the dialogue partner of the scoring target person.
The information processing device 10 and the information processing device 20 are each provided with a large display (for example, of a size capable of displaying the speaker's whole body), a camera that captures the surroundings, a microphone that collects surrounding sounds such as the speakers' utterances and environmental sounds, and a speaker that outputs sound.
The information processing device 10 displays video corresponding to the captured images taken by the information processing device 20, together with information superimposed on that video, and outputs the sound collected by the information processing device 20. Conversely, the information processing device 20 displays video corresponding to the captured images taken by the information processing device 10 and outputs the sound collected by the information processing device 10. As a result, the first speaker and the second speaker in different spaces can converse through the display.
The network 50 includes a communication network such as the Internet, an intranet, or a mobile phone network, and enables interconnection between devices using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
(Device configuration)
FIG. 2 shows a configuration example of the information processing apparatus 10 of FIG. 1.
The information processing device 10 is an electronic device, such as a display device, that can be connected to a network 50 such as the Internet, and is configured as a telepresence device.
In the information processing apparatus 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, and a RAM (Random Access Memory) 103 are connected to one another by a bus 104. The CPU 101 controls the operation of each unit and performs various processes by executing programs recorded in the ROM 102 or the storage unit 108. Various data are stored in the RAM 103 as appropriate.
An input/output I/F 105 is also connected to the bus 104. An input unit 106, an output unit 107, a storage unit 108, and a communication unit 109 are connected to the input/output I/F 105.
The input unit 106 supplies various input data to each unit including the CPU 101. The input unit 106 includes an operation unit 111, a camera 112, and a microphone 113.
The operation unit 111 is operated by the user and outputs operation data corresponding to the operation; it is composed of physical buttons, a touch panel, and the like. The camera 112 generates and outputs captured image data by photoelectrically converting the light incident from the subject and performing signal processing on the resulting electric signal; it includes an image sensor, a signal processing unit, and the like. The microphone 113 receives sound as air vibrations and outputs sound data as an electric signal.
The output unit 107 outputs various kinds of information under the control of the CPU 101. The output unit 107 includes a display 121 and a speaker 122.
The display 121 displays video and the like corresponding to captured image data under the control of the CPU 101; it is composed of a panel unit, such as a liquid crystal panel or an OLED (Organic Light Emitting Diode) panel, a signal processing unit, and the like. The speaker 122 outputs sound corresponding to sound data under the control of the CPU 101.
The storage unit 108 records various data and programs under the control of the CPU 101; the CPU 101 reads various data from the storage unit 108 for processing and executes programs. The storage unit 108 is configured as an auxiliary storage device, and may be internal storage such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), or external storage such as a memory card.
The communication unit 109 communicates with other devices via the network 50 under the control of the CPU 101. The communication unit 109 is configured as a communication module supporting cellular communication (for example, LTE-Advanced or 5G (5th Generation)), wireless communication such as wireless LAN (Local Area Network), or wired communication.
The configuration of the information processing device 10 described above is an example; for instance, a short-range wireless communication circuit that performs wireless communication according to a short-range wireless communication standard such as Bluetooth (registered trademark) or NFC (Near Field Communication), a power supply circuit, and the like may also be provided. In the information processing apparatus 10, headphones or the like connected to an output terminal may be used instead of the speaker 122. Further, the display 121 may be a projector, with which video corresponding to the captured image data can be projected and displayed on an arbitrary screen.
Note that, in the information processing system 1 of FIG. 1, the configuration of the information processing apparatus 20 is the same as that of the information processing apparatus 10 shown in FIG. 2, and therefore its description is omitted.
(Functional configuration)
FIG. 3 shows an example of the functional configuration of the information processing apparatus 10 of FIG. 1.
In FIG. 3, the information processing apparatus 10 includes a voice input unit 151, a voice recognition unit 152, a sentence division unit 153, an utterance content analysis unit 154, a point-adding target language information DB 155, a time acquisition unit 156, an image input unit 157, an image recognition unit 158, an image analysis unit 159, a point-adding target image information DB 160, a point-adding target integration unit 161, a question information DB 162, an intermediate information display unit 163, an intermediate result notification unit 164, a scoring result generation unit 165, and a scoring result display unit 166.
The analysis processing unit 191 is composed of the sentence division unit 153, the utterance content analysis unit 154, the point-adding target language information DB 155, the time acquisition unit 156, the image analysis unit 159, and the point-adding target image information DB 160. The scoring processing unit 192 is composed of the point-adding target integration unit 161, the question information DB 162, and the scoring result generation unit 165.
For example, the voice recognition unit 152, the image recognition unit 158, the analysis processing unit 191, and the scoring processing unit 192 are realized by the CPU 101 of FIG. 2 executing a program, while the point-adding target language information DB 155, the point-adding target image information DB 160, and the question information DB 162 are recorded in the storage unit 108 of FIG. 2.
The voice input unit 151 corresponds to the microphone 113 of FIG. 2, and the image input unit 157 corresponds to the camera 112 of FIG. 2. The intermediate information display unit 163 and the scoring result display unit 166 correspond to the display 121 of FIG. 2, and the intermediate result notification unit 164 corresponds to the display 121 or the speaker 122 of FIG. 2.
The voice input unit 151 inputs voice data of a speaker's utterance to the voice recognition unit 152. The voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151. In this voice recognition processing, the voice data of the speaker's utterance is converted into text data using a statistical method or the like, and the voice recognition result is supplied to the sentence division unit 153 and the time acquisition unit 156.
The sentence division unit 153 performs sentence division processing using the voice recognition result from the voice recognition unit 152. In this sentence division processing, the text corresponding to the speaker's utterance is divided into predetermined processing units, and the sentence division result is supplied to the utterance content analysis unit 154.
The utterance content analysis unit 154 performs utterance content analysis processing using the sentence division result from the sentence division unit 153 and the point-adding target language information stored in the point-adding target language information DB 155. The point-adding target language information is information for extracting (identifying) the language (wording) that earns points when scoring interpersonal communication skills. In the utterance content analysis processing, text-to-text similarity is used to find, among the divided texts, texts that include point-adding target language, and the analysis result is supplied to the point-adding target integration unit 161.
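A minimal sketch of this matching step follows. The token-overlap similarity, the threshold of 0.5, and the example-sentence entries are illustrative assumptions; the text only specifies that text-to-text similarity against registered example sentences is used.

```python
def token_similarity(a: str, b: str) -> float:
    # Jaccard overlap of lowercase word tokens, in [0, 1].
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

POINT_TARGET_EXAMPLES = {            # stands in for the language information DB 155
    "greeting": ["nice to meet you", "hello, how are you"],
}

def analyze(divided_texts: list) -> list:
    """Return (divided text, scoring item) pairs that exceed the threshold."""
    hits = []
    for text in divided_texts:
        for item, examples in POINT_TARGET_EXAMPLES.items():
            if max(token_similarity(text, ex) for ex in examples) >= 0.5:
                hits.append((text, item))
    return hits

print(analyze(["nice to meet you", "here is the prescription"]))
```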
The time acquisition unit 156 acquires the time corresponding to the voice recognition result from the voice recognition unit 152, and supplies the time information to the image analysis unit 159 and the point-adding target integration unit 161.
The image input unit 157 inputs captured image data including the speaker to the image recognition unit 158. The image recognition unit 158 performs image recognition processing using the captured image data from the image input unit 157. In this image recognition processing, the speaker (face, body parts, and so on) is recognized as an object using a pattern recognition technique or the like, and the image recognition result is supplied to the image analysis unit 159.
The image analysis unit 159 performs image analysis processing using the image recognition result from the image recognition unit 158 and the point-adding target image information stored in the point-adding target image information DB 160. The point-adding target image information is information for extracting (identifying) images that earn points when scoring interpersonal communication skills. In the image analysis processing, point-adding target images are identified from the recognized images of the speaker and the like, and the analysis result is supplied to the point-adding target integration unit 161. The time information from the time acquisition unit 156 is also used in the image analysis processing to associate the image analysis result with the utterance content analysis result.
The point-adding target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the image analysis result from the image analysis unit 159, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-adding target integration unit 161 performs integration processing that integrates the utterance content analysis result and the image analysis result linked by the time information.
The question information is information on how points should be added to each point-adding target. In the integration processing, scoring is performed by integrating the point-adding target language indicated by the utterance content analysis result and the point-adding target image indicated by the image analysis result and adding points, thereby obtaining scoring information regarding the scoring of the dialogue. This scoring information includes the scoring result of the dialogue or intermediate information presented during the dialogue.
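The timestamp-linked integration just described might look like the following sketch. The 2-second matching window and the event layout are assumptions made for illustration; the text only states that the two analysis results are linked by time information and combined.

```python
def integrate(utterance_hits, image_hits, window_sec=2.0):
    """Pair each (time, item) utterance hit with image hits near in time."""
    events = []
    for t_u, item in utterance_hits:
        nearby = [tag for t_i, tag in image_hits if abs(t_i - t_u) <= window_sec]
        events.append({"time": t_u, "item": item, "image_tags": nearby})
    return events

utterance_hits = [(3.2, "greeting")]
image_hits = [(3.5, "smile"), (4.1, "bow"), (30.0, "gesture")]
print(integrate(utterance_hits, image_hits))
# [{'time': 3.2, 'item': 'greeting', 'image_tags': ['smile', 'bow']}]
```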
During a dialogue, the point-adding target integration unit 161 supplies intermediate information, such as whether the speaker performed a point-earning act, to the intermediate information display unit 163, which displays it in real time. Also during the dialogue, the point-adding target integration unit 161 supplies intermediate results, such as the scoring result from the start of the dialogue up to that point, to the intermediate result notification unit 164 as intermediate information, and the intermediate result notification unit 164 notifies this intermediate information in real time.
When the dialogue ends, the point-adding target integration unit 161 supplies the scoring result of the dialogue to the scoring result generation unit 165. The scoring result generation unit 165 performs scoring result generation processing using that scoring result. In the scoring result generation processing, predetermined processing such as weighting important items among the point-adding targets is performed to generate the final scoring result (score) for the entire dialogue, which is supplied to the scoring result display unit 166. The scoring result display unit 166 displays the scoring result from the scoring result generation unit 165.
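As a rough sketch of this final step, the weighting of important items could be a weighted sum over earned items. The weight values below are assumptions; the text only states that important items are weighted.

```python
# Hypothetical per-item weights; the "medication method" item is treated as
# more important for illustration only.
WEIGHTS = {"greeting": 1.0, "self-introduction": 1.0, "medication method": 2.0}

def final_score(earned_items: list) -> float:
    """Weighted sum over the items earned during the dialogue."""
    return sum(WEIGHTS.get(item, 1.0) for item in earned_items)

print(final_score(["greeting", "medication method"]))  # 3.0
```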
(Process flow)
Next, the flow of the dialogue correspondence processing executed by the information processing apparatus 10 will be described with reference to the flowchart of FIG. 4.
In step S11, the voice recognition unit 152 performs voice recognition processing using the voice data from the voice input unit 151 and converts the utterance of the speaker who is the scoring target person into a text Ui (0 < i ≤ N). A time stamp is attached to this text.
In step S12, the sentence division unit 153 divides the text Ui from the voice recognition unit 152 into divided texts u1, u2, ..., un.
In step S13, the analysis processing unit 191 and the scoring processing unit 192 perform analysis scoring processing by analyzing each divided text uj (0 < j ≤ n) and calculating each scoring item for it. The details of this analysis scoring processing will be described later with reference to the flowchart of FIG. 5.
When the processing of step S13 ends, the processing proceeds to step S14. In step S14, it is determined, based on the result of the analysis scoring processing, whether points were added to the score.
If it is determined in step S14 that points were added, the processing proceeds to step S15, and the processing of steps S15 and S16 presents intermediate information corresponding to the result of the analysis scoring processing. That is, in steps S15 and S16, the intermediate information display unit 163 displays the content that was answered correctly and intermediate information such as an intermediate result in which the score has been added to the corresponding scoring item. For example, when the speaker who is the scoring target person says "hello" and points are added to the "greeting" scoring item, a list with that scoring item checked, a graph with the added score, and the like are displayed. Alternatively, the intermediate result notification unit 164 may notify the intermediate information, such as the intermediate result, by sound or the like.
If it is determined in step S14 that no points were added, the processing proceeds to step S17, where intermediate information corresponding to the result of the analysis scoring processing is presented. That is, in step S17, the intermediate information display unit 163 displays intermediate information such as score sheet information. For example, when the speaker who is the scoring target person has not made a point-earning utterance, the scoring items in the list are not checked and no score is added to the graph or the like.
When the processing of step S16 or S17 ends, the processing proceeds to step S18. In step S18, if it is determined that i < N, that is, if a next utterance in the dialogue exists after all divided texts uj in the text Ui being processed have been handled, the processing from step S11 onward is repeated for the text Ui+1 obtained from the next utterance. In this way, while the scoring target person and the dialogue partner converse, the display of the intermediate information is updated in real time according to the determination of whether points were added.
On the other hand, if it is determined in step S18 that i = N, that is, that there is no next utterance and the dialogue has ended, the processing proceeds to step S19. In step S19, the scoring result display unit 166 displays the scoring result including the final score for the series of utterances in the dialogue. This final scoring result corresponds to the intermediate result at the end of the dialogue among the intermediate results continuously updated during the dialogue.
 In this way, scoring information is presented both during the dialogue and at its end. During the dialogue, intermediate results are presented that include per-utterance scores, such as whether an utterance by the speaker being scored contained the required information; at the end of the dialogue, a scoring result is presented that includes scores for the dialogue as a whole, such as its structure and overall evaluation.
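 As a rough illustration of the loop described in steps S11 through S19, the following Python sketch shows how per-utterance analysis could drive real-time intermediate updates and a final result. All names and the simplified splitter and scorer are hypothetical stand-ins for the units described above, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    points_added: bool
    item: str = ""
    score: int = 0

def split_into_sentences(text: str) -> list[str]:
    # Simplified splitter standing in for the sentence division unit 153 (step S12).
    return [s for s in text.replace("?", "?|").replace(".", ".|").split("|") if s.strip()]

def analyze_and_score(sentence: str) -> StepResult:
    # Stand-in for the analysis scoring process (step S13): award a point
    # when the sentence resembles a registered greeting example.
    if "hello" in sentence.lower():
        return StepResult(points_added=True, item="greeting", score=1)
    return StepResult(points_added=False)

def run_dialogue_session(utterance_texts: list[str]) -> int:
    total = 0
    for text_u in utterance_texts:                     # S11: next utterance U_i
        for sentence in split_into_sentences(text_u):  # S12: divided texts u_j
            result = analyze_and_score(sentence)       # S13
            if result.points_added:                    # S14 -> S15, S16
                total += result.score
                print(f"[intermediate] +{result.score} for '{result.item}' (total {total})")
            else:                                      # S14 -> S17
                print(f"[intermediate] no points (total {total})")
    print(f"[final] score: {total}")                   # S18/S19: after the last utterance
    return total

run_dialogue_session(["Hello. I am the pharmacist in charge today."])
```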
 The flow of the dialogue handling processing has been described above. Next, the details of the analysis scoring process in step S13 of FIG. 4 will be described with reference to the flowchart of FIG. 5.
 In step S31, the utterance content analysis unit 154 analyzes the utterance content of the divided text u_j using the point-addition target language information stored in the point-addition target language information DB 155. This analysis of the utterance content includes analysis of the similarity between the divided text u_j and the scoring item example sentences (S32), analysis of the structure within the dialogue (S33), analysis of the speech attitude classification (S34), and analysis of the speech attitude classification within the dialogue structure (S35).
 For example, the point-addition target language information includes, as information for extracting point-eligible language from the divided texts, information on scoring item example sentences, structure within the dialogue, speech attitude classification, and the like. By processing the divided text u_j with this information, the similarity to the scoring item example sentences, the structure within the dialogue, the speech attitude classification, the speech attitude classification within the dialogue structure, and so on are analyzed. The speech attitude classification categorizes the mental attitude with which the speaker addresses the other party. Here, because inter-text similarity is used for scoring, an evaluator can register the point-eligible example sentences for each scoring item and have scoring performed against them.
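 The description above does not specify a particular similarity measure, so the following is only a minimal sketch of scoring by inter-text similarity; difflib's ratio from the Python standard library, the example sentences, and the threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical registered example sentences per scoring item.
SCORING_ITEMS = {
    "greeting": ["Hello", "What brings you here today?", "It's been a while"],
    "name confirmation": ["May I ask your name?", "What is your name?"],
}

def best_matching_item(divided_text: str, threshold: float = 0.7):
    """Return (item, similarity) for the best-matching example sentence,
    or None if no example reaches the threshold."""
    best = None
    for item, examples in SCORING_ITEMS.items():
        for example in examples:
            sim = SequenceMatcher(None, divided_text.lower(), example.lower()).ratio()
            if best is None or sim > best[1]:
                best = (item, sim)
    return best if best and best[1] >= threshold else None

print(best_matching_item("Hello!"))  # likely matches the "greeting" item
```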
 In step S36, the image input unit 157 acquires the captured image at the corresponding time. In step S37, the image recognition unit 158 performs image recognition on the acquired captured image.
 In step S38, the image analysis unit 159 analyzes the actions included in the image recognition result using the point-addition target image information stored in the point-addition target image information DB 160. This image analysis includes analysis of the speaker's facial expression (S39), analysis of the speaker's movements (S40), analysis of the speaker's line of sight (S41), and analysis of presented objects (S42).
 For example, the point-addition target image information includes, as information for extracting point-eligible images from the captured images, information on facial expressions, movements, line of sight, presented objects, and the like. By analyzing the image recognition result with this information, facial expressions, movements, line of sight, presented objects, and so on are analyzed. For example, when the usage scene is a medical interview, an organ model or various materials presented by a doctor can be presented objects.
 When the analysis of the utterance content and the analysis of the image are complete, the process proceeds to step S43. In step S43, the point-addition target integration unit 161 determines the point-addition conditions. In this determination, the point-addition targets and scores extracted by the utterance content analysis and the image analysis are judged. For example, if the person being scored gives the greeting "Hello", a score is added for that point-addition target, and if the greeting is delivered with a smile, a further score is awarded, so that a score integrating the analysis results of the utterance content and the image is output. A further score may also be awarded when the person being scored shows materials to the dialogue partner while giving an explanation.
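 A hypothetical sketch of the point-addition condition in step S43 follows: a base score from the utterance analysis, plus bonuses when the image analysis finds a smile or a presented object at the same moment. The record fields and bonus values are illustrative assumptions, not the patent's actual rules.

```python
def integrate_scores(utterance_result: dict, image_result: dict) -> int:
    score = 0
    if utterance_result.get("item") == "greeting":
        score += 1                                 # greeting detected in speech
        if image_result.get("expression") == "smile":
            score += 1                             # delivered with a smile
    if utterance_result.get("item") == "explanation" and image_result.get("showing_material"):
        score += 1                                 # materials shown while explaining
    return score

print(integrate_scores({"item": "greeting"}, {"expression": "smile"}))  # -> 2
```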
 As described above, in the information processing apparatus 10, the analysis processing unit 191 performs analysis using the recognition results of the audio data and the captured image data, and the scoring processing unit 192 performs scoring using those analysis results, so that scoring information is presented in real time. For example, while the dialogue is in progress, intermediate information such as intermediate results is presented as scoring information; after the dialogue has ended, the final scoring result is presented as scoring information.
 Assumed usage scenes for the evaluation of interpersonal communication (skill scoring) by the information processing system 1 include medical interviews, call centers, sales, and retail. In the information processing apparatus 10, the reference information serving as the standard for dialogue scoring, such as the point-addition target language information, the point-addition target image information, and the question information stored in the databases, is set in advance to match the usage scene, so that interpersonal communication can be evaluated in any desired usage scene requiring dialogue skills. The reference information is not limited to information on adding points to the score; it may be any information serving as a standard for dialogue scoring, such as information on deducting points. Below, the case where the usage scene is a medical interview is described.
(Example of a medical interview)

 FIG. 6 shows an example in which the information processing system 1 to which the present technology is applied is used for a medical interview.
 In FIG. 6, the information processing apparatus 10 is installed in space SP1, and the information processing apparatus 20 is installed in space SP2. Between the information processing apparatus 10 and the information processing apparatus 20, data such as video corresponding to the images captured by each camera and sound collected by each microphone is exchanged continuously and in real time, for example while the connection between the two apparatuses is established.
 In space SP1, the speaker UA uses the information processing apparatus 10, while in space SP2, the speaker UB uses the information processing apparatus 20, allowing the speakers UA and UB, who are in remote locations, to converse through their displays. The speaker UA is the person being scored on interpersonal communication skills. The speaker UB is the speaker UA's dialogue partner. For example, if the speaker UA is a pharmacist, the speaker UB is a patient.
 On the information processing apparatus 10 installed in space SP1, scoring information is displayed together with the video of the speaker UB. Specifically, as shown in FIG. 7, the display 121 of the information processing apparatus 10 shows, in real time, scoring information on the dialogue between the speakers UA and UB in the form of graphs, tables, flowcharts, and the like, and the speaker UA, playing the role of a pharmacist, can converse with the speaker UB through the display while checking the scoring information. The speaker UA is photographed by the camera 112 provided at the top of the information processing apparatus 10, the speaker UA's voice is collected by the microphone 113 provided at the bottom, and the speaker UB's voice is output from the speakers 122-1 and 122-2 provided on the left and right.
 Meanwhile, as shown in FIG. 6, the display of the information processing apparatus 20 installed in space SP2 shows the video of the speaker UA, and its speaker outputs the speaker UA's voice, so that the speaker UB, playing the role of a patient, can converse with the speaker UA through the display.
(Examples of scoring items)

 FIG. 8 shows an example of the point-addition target language information stored in the point-addition target language information DB 155. In the point-addition target language information, item requirements and scoring item example sentences are set in advance for each scoring item used in a medical interview.
 "挨拶"である採点項目は、会話を開始する表現として定型的な挨拶や専門的な挨拶に類似する表現があるかが要件とされる。"挨拶"の採点項目例文としては、「こんにちは」、「本日はどうされましたか」、「お久しぶりですね」などがある。"自己紹介"である採点項目は、自身の名前や役職を紹介する表現があるかが要件とされる。"自己紹介"の採点項目例文としては、「わたしは本日の担当薬剤師の○○です」、「本日担当させていただきます××です」、「わたしは△△です」などがある。 The scoring item that is "greeting" is required to have an expression similar to a standard greeting or a professional greeting as an expression to start a conversation. Examples of scoring items for "greetings" include "hello", "what happened today", and "it's been a long time". The scoring item that is "self-introduction" is required to have an expression that introduces one's name and position. Examples of scoring items for "self-introduction" include "I am the pharmacist in charge of today", "I am in charge of XX today", and "I am △△".
 "名前の確認"である採点項目は、相手の名前を尋ねる表現があるかが要件とされる。"名前の確認"の採点項目例文としては、「お名前をお伺いしてもよろしいでしょうか」、「あなたのお名前は何ですか」、「お名前頂戴してもよろしいですか」などがある。"来院理由"である採点項目は、相手の来院の理由を尋ねる表現があるかが要件とされる。"来院理由"の採点項目例文としては、「新しい処方薬が出ていますね」、「今日はどういったご用件ですか」、「前回と同様のお薬ですか」などがある。 The scoring item that is "confirmation of name" is required to have an expression asking for the name of the other party. Examples of scoring item examples for "confirm name" include "Are you sure you want to ask your name?", "What is your name?", "Are you sure you want your name?" be. The scoring item, which is the "reason for visiting the hospital," is required to have an expression asking the reason for the other party's visit. Examples of scoring items for "reasons for visit" include "new prescription drugs are available", "what are your requirements today", and "is the same drug as last time?".
(Examples of evaluation axes)

 FIG. 9 shows, for each evaluation axis such as communicated items and clarity, its content and its assumed usage scenes. In this example, in addition to medical interviews, call centers, sales, and retail are given as assumed usage scenes.
 "伝達項目"である評価軸は、伝達すべき事項はどれくらい網羅できたかの評価を表し、想定利用シーンとして、医療面談やコールセンタがある。"分かりやすさ"である評価軸は、専門用語を避け、難しい語は言い換えを行ったかの評価を表し、医療面談やコールセンタが想定利用シーンとされる。"共感力"である評価軸は、愁訴に対して共感的な表現を行ったかの評価を表し、医療面談やコールセンタが想定利用シーンとされる。 The evaluation axis, which is a "communication item", represents an evaluation of how well the items to be communicated were covered, and there are medical interviews and call centers as assumed usage scenes. The evaluation axis, which is "easy to understand," avoids technical terms and expresses the evaluation of whether difficult words have been paraphrased, and medical interviews and call centers are assumed usage scenes. The evaluation axis, which is "empathy," represents the evaluation of whether or not the complaint was expressed in a sympathetic manner, and medical interviews and call centers are assumed usage scenes.
 "構成力"である評価軸は、よく構成された対話進行であったかの評価を表し、想定利用シーンとして、医療面談やコールセンタがある。"提案力"である評価軸は、文脈に沿った形で提案することができたかの評価を表し、営業や販売、コールセンタが想定利用シーンとされる。"拡散性"である評価軸は、話題を十分に広げることができたかの評価を表し、営業や販売が想定利用シーンとされる。 The evaluation axis, which is "composition ability", represents the evaluation of whether the dialogue progressed well, and there are medical interviews and call centers as assumed usage scenes. The evaluation axis, which is "proposal power," represents the evaluation of whether or not a proposal could be made in a contextual manner, and sales, sales, and call centers are assumed usage scenes. The evaluation axis, which is "diffusive", represents the evaluation of whether the topic has been sufficiently expanded, and sales and sales are assumed usage scenes.
 "開示性"である評価軸は、メリットやデメリットを正確に伝えたかの評価を表し、想定利用シーンとして、医療面談や営業、販売がある。"同調性"である評価軸は、相手の話すペースに合わせようとしていたかの評価を表し、医療面談やコールセンタが想定利用シーンとなる。"正確性"である評価軸は、伝えた内容は全て正しい情報であったかの評価を表し、医療面談や営業、コールセンタ、販売が想定利用シーンとなる。"傾聴力"である評価軸は、適切なタイミングで相手に話を促しているかの評価を表し、営業や販売が想定利用シーンとされる。 The evaluation axis, which is "disclosure", represents the evaluation of whether the merits and demerits are accurately conveyed, and the assumed usage scenes include medical interviews, sales, and sales. The evaluation axis, which is "synchronization", represents the evaluation of whether or not the other party was trying to match the speaking pace, and the medical interview or call center is the assumed usage scene. The evaluation axis, which is "accuracy", represents the evaluation of whether all the transmitted contents were correct information, and medical interviews, sales, call centers, and sales are assumed usage scenes. The evaluation axis, which is "listening ability", represents the evaluation of whether or not the other party is being encouraged to talk at an appropriate timing, and sales and sales are assumed usage scenes.
(Display example of evaluation axis scoring information)

 FIG. 10 shows a display example of evaluation axis scoring information in a medical interview.
 As shown in FIG. 9, when the system is used for a medical interview, seven evaluation axes are employed: communicated items, clarity, empathy, structuring ability, disclosure, attunement, and accuracy. In FIG. 10, the evaluation axis scoring information is represented, for each of the seven axes, by a bar graph whose evaluation value (score) extends horizontally. In the example of FIG. 10, the speaker UA has earned high marks for accuracy, communicated items, and the like in the dialogue with the speaker UB, but low marks for empathy and the like.
(Display example of dialogue scene transition information)

 FIG. 11 shows a display example of dialogue scene transition information in a medical interview.
 In FIG. 11, the dialogue scene transition information is represented by the transitions between dialogue scenes and a progress bar for a type of dialogue that has a time limit in a medical interview. Dialogue scenes in a medical interview include the introduction (Intro), history taking (History Taking), explanation (Explanation), and closing (Closing). In this example, the time limit for the medical interview is set to 10 minutes.
 FIG. 11A shows a display example of the dialogue scene transitions and progress bar 6 minutes and 21 seconds after the start of the medical interview. It shows that, at that point, the dialogue between the speakers UA and UB has finished the introduction and the history taking, and the explanation is in progress.
 FIG. 11B shows a display example of the dialogue scene transitions and progress bar 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speakers UA and UB, the introduction, history taking, explanation, further history taking, further explanation, and closing were all completed within 10 minutes.
 FIG. 11C shows a display example of the dialogue scene transitions and progress bar more than 10 minutes after the start of the medical interview. It shows that, in the dialogue between the speakers UA and UB, the explanation dragged on after the introduction and history taking ended, and is still continuing even though the 10-minute time limit has been exceeded.
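 The scene-transition display of FIG. 11 could be backed by logic along the lines of the small sketch below, which records scene changes with timestamps and flags when the time limit is exceeded. The API shape is an assumption, not the patent's implementation.

```python
class SceneTracker:
    def __init__(self, limit_seconds: int = 600):    # 10-minute limit, as in FIG. 11
        self.limit = limit_seconds
        self.transitions: list[tuple[float, str]] = []

    def enter_scene(self, elapsed: float, scene: str) -> None:
        self.transitions.append((elapsed, scene))

    def status(self, elapsed: float) -> str:
        current = self.transitions[-1][1] if self.transitions else "(none)"
        overtime = " OVER TIME LIMIT" if elapsed > self.limit else ""
        return f"{elapsed:.0f}s: in '{current}'{overtime}"

tracker = SceneTracker()
tracker.enter_scene(0, "Intro")
tracker.enter_scene(95, "History Taking")
tracker.enter_scene(250, "Explanation")
print(tracker.status(381))   # 6m21s: still in Explanation, as in FIG. 11A
print(tracker.status(640))   # past the 10-minute limit, as in FIG. 11C
```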
(Display example of dialogue communication item information)

 FIG. 12 shows a display example of dialogue communication item information in a medical interview.
 In FIG. 12, the dialogue communication item information is represented by a checklist of the dialogue's progress and the completeness of the communicated items in the medical interview. The dialogue scenes in a medical interview include the introduction (Intro), history taking (History Taking), explanation (Explanation), and closing (Closing), and communication items are set for each dialogue scene.
 The communication items for the introduction include a greeting, a self-introduction, confirming the name, and confirming the reason for the visit. The communication items for the history taking include confirming the chief complaint, the affected area, the symptoms, and the duration. The communication items for the explanation include the dosage instructions, the medication period, side effects, and drug interaction precautions. The communication items for the closing include a farewell greeting, thanks, resolving remaining questions, and the next appointment.
 In FIG. 12A, the introduction, history taking, and explanation scenes proceed in order, and among the communication items, each preceded by a checkbox, those bearing a check mark are the items actually communicated. That is, in the dialogue with the speaker UB, the speaker UA gave a greeting, introduced himself or herself, and confirmed the reason for the visit in the introduction, confirmed the chief complaint in the history taking, and explained the dosage instructions in the explanation; the degree of completion of these items is displayed.
 In FIG. 12B, the introduction, history taking, explanation, and closing scenes proceed in order, and the speaker UA, in the dialogue with the speaker UB, gave a greeting, introduced himself or herself, and confirmed the reason for the visit in the introduction, confirmed the chief complaint, affected area, symptoms, and duration in the history taking, explained the dosage instructions and side effects in the explanation, and made the next appointment in the closing; the degree of completion of these items is displayed.
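 A minimal sketch of the per-scene checklist of FIG. 12 follows; the scene and item names track the figure, while the data structure and function names are assumptions for illustration.

```python
CHECKLIST = {
    "Intro": ["greeting", "self-introduction", "name confirmation", "reason for visit"],
    "History Taking": ["chief complaint", "affected area", "symptoms", "duration"],
    "Explanation": ["dosage instructions", "medication period", "side effects",
                    "interaction precautions"],
    "Closing": ["farewell", "thanks", "resolve questions", "next appointment"],
}

checked: set[tuple[str, str]] = set()

def mark_communicated(scene: str, item: str) -> None:
    # Record an item as actually communicated by the person being scored.
    if item in CHECKLIST.get(scene, []):
        checked.add((scene, item))

def render(scene: str) -> None:
    for item in CHECKLIST[scene]:
        box = "x" if (scene, item) in checked else " "
        print(f"[{box}] {item}")

mark_communicated("Intro", "greeting")
render("Intro")
```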
(Display example of scoring information)

 FIG. 13 shows an example of the real-time display of scoring information in a medical interview.
 In FIG. 13, on the information processing apparatus 10 installed in space SP1, video 201 including the speaker UB is displayed on the display 121. Scoring information including evaluation axis scoring information 202, dialogue scene transition information 203, and dialogue communication item information 204 is superimposed on the video 201. The speaker UA (in the pharmacist role), who is the person being scored, can check the scoring information while conversing with the speaker UB (in the patient role) through the display 121.
 In the evaluation axis scoring information 202, the evaluation values (scores) for each of the seven evaluation axes, including accuracy, are represented by bar graphs. The dialogue scene transition information 203 shows the progress up to the explanation (Explanation), the dialogue scene in progress 6 minutes and 21 seconds after the start. That is, among the dialogue scenes, the introduction (Intro) and history taking (History Taking) have finished, and the dialogue has advanced to the explanation (Explanation). The overall flow of these dialogue scenes and the current progress are represented by the medical interview flow 205.
 The dialogue communication item information 204 shows the predetermined communication items for each of the introduction (Intro), history taking (History Taking), and explanation (Explanation) scenes, and a check mark is entered when an item has actually been communicated by the speaker UA. In this example, among the communication items of the introduction, "greeting", "self-introduction", and "confirmation of the reason for the visit" are checked. Among the communication items of the history taking, "confirmation of the chief complaint" is checked. Among the communication items of the explanation, "dosage instructions" is checked.
 In this way, on the information processing apparatus 10, intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, and the dialogue communication item information 204 is displayed on the display 121 in real time as scoring information, so the speaker UA, who is the person being scored, can converse with the speaker UB through the display 121 while checking the scoring information.
 This allows the speaker UA to reflect the content of the checked scoring information in the dialogue, for example by changing the dialogue strategy or rephrasing an utterance. In other words, since the scoring information is displayed only on the display of the speaker UA, the person being scored, the speaker UA can revise the dialogue approach and improve his or her dialogue skills mid-conversation without the speaker UB noticing.
 In addition, with the speaker UA in space SP1, where the information processing apparatus 10 is installed, playing the pharmacist and the speaker UB in space SP2, where the information processing apparatus 20 is installed, playing the patient, dialogue practice can be conducted between speakers in physically separate locations. Furthermore, because the information processing apparatuses 10 and 20, acting as telepresence devices, are interconnected via the network 50, the speakers UA and UB can converse through displays large enough to show a speaker's whole body, enabling realistic dialogue practice that comes closer to an in-person conversation.
(Example of a dialogue with overlap)

 In a dialogue between the speakers UA and UB, it is expected that one speaker's utterance will at times overlap the other's. FIG. 14 shows an example of an overlapping dialogue in a medical interview.
 In FIG. 14, when the dialogue between the speakers UA and UB is laid out with time running from left to right, the portions marked with black stars represent the overlaps.
 That is, while the speaker UA was uttering "What brings you here today?", the speaker UB began the utterance "Um, I'm here to buy allergy medicine", so the "Um" portion of the speaker UB's utterance overlaps the speaker UA's utterance. Likewise, while the speaker UA was uttering "Allergy medicine, I see", the speaker UB began the utterance "Oh, that might not be right", so the "Oh" portion of the speaker UB's utterance overlaps the speaker UA's utterance.
 Here, in the information processing apparatus 10, the dialogue text analyzed by the utterance content analysis unit 154 is as shown in FIG. 15. That is, the utterance content analysis unit 154 analyzes the dialogue with "What brings you here today?" as the speaker UA's utterance and "Um, I'm here to buy allergy medicine" as the speaker UB's utterance. Likewise, it analyzes the dialogue with "Allergy medicine, I see" as the speaker UA's utterance and "Oh, that might not be right" as the speaker UB's utterance.
 Since the speaker UA in space SP1, where the information processing apparatus 10 is installed, and the speaker UB in space SP2, where the information processing apparatus 20 is installed, are in different locations, their utterances are collected by different microphones and input on different channels. Therefore, even in an overlapping dialogue, each speaker's utterances can easily be extracted and transcribed.
 Thereafter, the dialogue between the speakers UA and UB continues, and as shown in FIG. 15, its text is analyzed by the utterance content analysis unit 154.
 For example, when the speaker UA's utterance "First, let me confirm your name. Are you ○○△△, the person named on the prescription?" is answered by the speaker UB's utterance "No, I am ○○△△'s mother", the text of the speaker UA's utterance is analyzed in two parts: "First, let me confirm your name." and "Are you ○○△△, the person named on the prescription?".
 Similarly, when the speaker UA's utterance "So you are ○○△△'s mother. Is your child currently taking any medication?" is answered by the speaker UB's utterance "She has a food allergy, so she takes Intal before meals", the analysis proceeds as follows. The text of the speaker UA's utterance is analyzed in two parts: "So you are ○○△△'s mother." and "Is your child currently taking any medication?". The text of the speaker UB's utterance is analyzed in two parts: "She has a food allergy" and "she takes Intal before meals".
 In this way, in the information processing apparatus 10, the utterance text input on a separate channel for each speaker is analyzed in predetermined processing units such as single sentences, which suppresses the effect that the utterance overlaps frequently occurring in dialogue would otherwise have on the speech recognition results and the utterance content analysis results. This makes it possible to extract the words and actions of the speaker UA, the person being scored, more accurately and to score more accurately.
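 The following sketch illustrates why per-speaker channels make overlap harmless: each channel is transcribed and sentence-split independently, and the resulting sentence units are then merged in time order for analysis. The data shapes and timestamps are illustrative assumptions.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split a transcript into sentence-sized processing units.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

# (start_time_sec, transcribed text) per channel, even if the audio overlapped.
channel_ua = [(0.0, "First, let me confirm your name. Are you the person on the prescription?")]
channel_ub = [(3.2, "No, I am her mother.")]

def merge_channels(ua, ub):
    units = [(t, "UA", s) for t, text in ua for s in split_sentences(text)]
    units += [(t, "UB", s) for t, text in ub for s in split_sentences(text)]
    return sorted(units)   # time-ordered sentence units for the analysis stage

for t, spk, sentence in merge_channels(channel_ua, channel_ub):
    print(f"{t:5.1f}s {spk}: {sentence}")
```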
 As described above, in the information processing apparatus 10, the dialogue between the speaker UA (in the pharmacist role) and the speaker UB (in the patient role) is scored based on the reference information stored in the databases as the standard for dialogue scoring, and scoring information on the dialogue (intermediate results, scoring results, and the like) is presented in real time to the speaker UA, the person being scored. This makes it possible to support the person being scored with appropriate feedback, through scoring and the like, in improving interpersonal communication skills in settings such as medical interviews. The scoring information presented in real time is presented in a natural form that the speaker UA, the person being scored, can check but the speaker UB, the dialogue partner, cannot see.
 Also, in the information processing apparatus 10, reference information such as the point-addition target language information and the point-addition target image information matching the usage scene, such as an interview dialogue, is set in the databases in advance and used when scoring the dialogue, so that scoring can absorb the evaluation variability between human raters. Furthermore, since the remote communication technology of the information processing apparatuses 10 and 20, configured as telepresence devices, allows the speakers UA and UB in different spaces to converse, the speaker UA, the person being scored, can experience the dialogue as if the speaker UB, the dialogue partner, were present in the same room.
<2. Second Embodiment>
(Device configuration)

 FIG. 16 shows another configuration example of the information processing apparatus 10 of FIG. 1.
 In FIG. 16, parts of the information processing apparatus 10A corresponding to those of the information processing apparatus 10 of FIG. 2 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10 of FIG. 2, the information processing apparatus 10A of FIG. 16 is provided with an input unit 106A in place of the input unit 106.
 The input unit 106A has an operation unit 111, a camera 112, a microphone 113, and a sensor 114. The sensor 114 senses spatial information, temporal information, and the like, and outputs the sensing information obtained as a result. The sensor 114 includes various sensors such as a distance measuring sensor and an image sensor. The camera 112 can be included in the sensor 114 as an image sensor.
 In the information processing system 1 of FIG. 1, the information processing apparatus 20 can likewise be configured with a sensor 114, in the same manner as the information processing apparatus 10A shown in FIG. 16.
(Functional configuration)

 FIG. 17 shows an example of the functional configuration of the information processing apparatus 10A of FIG. 16.
 In FIG. 17, parts of the information processing apparatus 10A corresponding to those of the information processing apparatus 10 of FIG. 3 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10 of FIG. 3, the information processing apparatus 10A of FIG. 17 is provided with a sensing information input unit 171, a sensing information recognition unit 172, a sensing information analysis unit 173, and a point-addition target sensing information DB 174 in place of the image input unit 157, the image recognition unit 158, the image analysis unit 159, and the point-addition target image information DB 160.
 In FIG. 17, the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, and the point-addition target sensing information DB 174 constitute an analysis processing unit 191A.
 The sensing information input unit 171 inputs sensing information (the sensing information obtained on the speaker UA's side) to the sensing information recognition unit 172. The sensing information recognition unit 172 performs sensing information recognition processing using the sensing information from the sensing information input unit 171. In this recognition processing, the sensing information to be processed is recognized, and the recognition result is supplied to the sensing information analysis unit 173. This sensing information may include, for example, distance information and image information as well as biological information such as the speaker UA's heart rate and brain waves.
 The sensing information analysis unit 173 performs sensing information analysis processing using the recognition result from the sensing information recognition unit 172 and the point-addition target sensing information stored in the point-addition target sensing information DB 174. The point-addition target sensing information is information for extracting (identifying) the sensing information eligible for points when scoring interpersonal communication skills. In the sensing information analysis processing, the point-eligible sensing information is extracted from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161. This analysis also uses the time information from the time acquisition unit 156 to link the sensing information analysis result to the utterance content analysis result.
 The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis result from the sensing information analysis unit 173, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-addition target integration unit 161 performs integration processing that merges the utterance content analysis result and the sensing information analysis result linked by the time information.
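 A rough sketch of the time-based linking performed before integration follows: each sensing analysis result is attached to the utterance whose time span contains it. The record shapes and timestamps are assumptions used only to illustrate the idea.

```python
utterance_results = [
    {"start": 0.0, "end": 2.5, "item": "greeting"},
    {"start": 3.0, "end": 8.0, "item": "dosage instructions"},
]
sensing_results = [
    {"time": 1.2, "observation": "smile"},
    {"time": 5.4, "observation": "showing material"},
]

def link_by_time(utterances, sensings):
    # Attach each sensing observation to the utterance whose span contains it.
    for u in utterances:
        u["sensing"] = [s for s in sensings if u["start"] <= s["time"] <= u["end"]]
    return utterances

for u in link_by_time(utterance_results, sensing_results):
    print(u["item"], "->", [s["observation"] for s in u["sensing"]])
```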
 FIG. 17 shows the configuration when the sensing information obtained on the side of the speaker UA in space SP1 is used, but the sensing information obtained on the side of the speaker UB in space SP2 may be used instead. FIG. 18 shows another example of the functional configuration of the information processing apparatus 10A of FIG. 16.
 In FIG. 18, parts of the information processing apparatus 10A corresponding to those of FIG. 17 are given the same reference numerals, and their description is omitted. Compared with the information processing apparatus 10A of FIG. 17, the information processing apparatus 10A of FIG. 18 is additionally provided with a sensing information input unit 181, a sensing information recognition unit 182, a sensing information analysis unit 183, and a point-addition target sensing information DB 184.
 In FIG. 18, the sentence division unit 153, the utterance content analysis unit 154, the point-addition target language information DB 155, the time acquisition unit 156, the sensing information analysis unit 173, the point-addition target sensing information DB 174, the sensing information analysis unit 183, and the point-addition target sensing information DB 184 constitute an analysis processing unit 191B.
 The sensing information input unit 181 inputs the sensing information obtained on the speaker UB's side to the sensing information recognition unit 182. The sensing information recognition unit 182 performs sensing information recognition processing using the sensing information from the sensing information input unit 181. In this recognition processing, the sensing information to be processed is recognized, and the recognition result is supplied to the sensing information analysis unit 183. This sensing information may include, for example, distance information and image information as well as biological information such as the speaker UB's heart rate and brain waves.
 The sensing information analysis unit 183 performs sensing information analysis processing using the recognition result from the sensing information recognition unit 182 and the point-addition target sensing information stored in the point-addition target sensing information DB 184. The point-addition target sensing information is information for extracting (identifying) the sensing information eligible for points when scoring interpersonal communication skills. In the sensing information analysis processing, the point-eligible sensing information is extracted from the recognized sensing information, and the analysis result is supplied to the point-addition target integration unit 161.
 The point-addition target integration unit 161 is supplied with the utterance content analysis result from the utterance content analysis unit 154, the sensing information analysis results from the sensing information analysis units 173 and 183, and the time information from the time acquisition unit 156. Using the question information stored in the question information DB 162, the point-addition target integration unit 161 performs integration processing that merges the utterance content analysis result and the sensing information analysis results linked by the time information.
 The sensing information processed by the information processing apparatus 10A may also be acquired from an electronic device carried by the speaker UA or UB, such as a smartphone, a wearable terminal, or a measuring instrument. Such an electronic device can have various sensors: an acceleration sensor that measures acceleration along the three XYZ axes, a gyro sensor that measures angular velocity about the three XYZ axes, a distance measuring sensor that measures distance, a biological sensor that measures information such as heart rate, body temperature, and posture, a proximity sensor that detects nearby objects, and a magnetic sensor that measures the magnitude and direction of a magnetic field.
(Display example of dialogue partner evaluation information)

 FIG. 19 shows a display example of dialogue partner evaluation information in a medical interview.
 In FIG. 19, the dialogue partner evaluation information represents an evaluation calculated from the sensing information on the dialogue partner's side in the medical interview. In the example of FIG. 19, when the speaker UA converses with the speaker UB during the medical interview, comprehension, empathy, interest, favorability, and trust are derived as evaluation axes for the speaker UA based on the sensing information obtained on the speaker UB's side. For example, if the sensor on the speaker UB's side detects that the speaker UB's pupils dilated at the moment the speaker UA spoke, the favorability score is increased; in this way, evaluation information calculated from the dialogue partner's biological information, responses, and the like can be presented.
 In FIG. 19, the dialogue partner evaluation information is represented, for each of the five evaluation axes, by a bar graph whose evaluation value (score) extends horizontally. Each bar extends to the right or left from an origin of 0: larger positive values to the right represent a positive (high) evaluation, while values extending to the left represent a negative (low) evaluation.
 Here, suppose the evaluations shown in FIGS. 19A, 19B, and 19C are displayed in that chronological order, that is, FIG. 19A represents the evaluation in the first part of the dialogue, FIG. 19B the middle, and FIG. 19C the latter part. A dialogue such as the following can then be assumed.
 In the first part of the dialogue, as represented by the evaluation in FIG. 19A, the speaker UB rates the speaker UA somewhat positively, but the values themselves are not large. In the middle of the dialogue, as represented by the evaluation in FIG. 19B, all evaluations except comprehension have become low, based on information such as the speaker UB wearing a gloomy expression. Later, in the latter part of the dialogue, as represented by the evaluation in FIG. 19C, all of comprehension, empathy, interest, favorability, and trust have become high, based on information such as the speaker UB nodding or wearing a bright expression.
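 The following is an illustrative sketch of how partner-side sensing events might move the five evaluation axes of FIG. 19. The event-to-axis mapping and the weights are assumptions for illustration only, not values specified by the present technology.

```python
PARTNER_AXES = {"comprehension": 0.0, "empathy": 0.0, "interest": 0.0,
                "favorability": 0.0, "trust": 0.0}

# Hypothetical mapping from detected partner-side events to axis deltas.
EVENT_EFFECTS = {
    "pupil_dilation":    {"favorability": +0.5},
    "nod":               {"comprehension": +0.5, "trust": +0.3},
    "bright_expression": {"empathy": +0.4, "favorability": +0.4},
    "gloomy_expression": {"empathy": -0.4, "interest": -0.3},
}

def apply_event(axes: dict, event: str) -> dict:
    for axis, delta in EVENT_EFFECTS.get(event, {}).items():
        axes[axis] += delta          # negative totals render as leftward bars
    return axes

for e in ["pupil_dilation", "gloomy_expression", "nod", "bright_expression"]:
    apply_event(PARTNER_AXES, e)
print(PARTNER_AXES)
```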
(Display example of scoring information)

 FIG. 20 shows an example of the real-time display of scoring information in a medical interview.
 In FIG. 20, as in FIG. 13, the speaker UA can check the scoring information while conversing with the speaker UB through the display 121, but the scoring information additionally includes dialogue partner evaluation information 206.
 In the dialogue partner evaluation information 206, the evaluation values (scores) calculated from the sensing information on the speaker UB's side are represented by bar graphs. In this example, the speaker UB's comprehension, empathy, favorability, and trust toward the speaker UA are rated high, but interest is rated low.
 The dialogue scene transition information 203 shows that, among the dialogue scenes, the introduction (Intro) and history taking (History Taking) have finished and the dialogue has advanced to the explanation (Explanation); in correspondence with the progress bar, measurement values obtained from the sensing information (facial, head, body) are also shown. For example, the speaker UB's state can be inferred from these measurement values for the face, head, and body.
 In this way, on the information processing apparatus 10A, intermediate information such as the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206 is displayed in real time as scoring information, so the speaker UA, the person being scored, can converse with the speaker UB through the display 121 while checking the scoring information and taking into account how the speaker UB is evaluating the conversation. The evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206 are examples of intermediate information; it suffices for at least one of them to be included in the intermediate information.
(Example of screen display transitions)

 FIGS. 21 and 22 show an example of transitions of the input information and the scoring information, displayed 21 seconds and 6 minutes 21 seconds after the start of the medical interview, respectively. The input information and the scoring information are superimposed on the video including the speaker UB.
 In FIGS. 21 and 22, the display 121 of the information processing apparatus 10A shows scoring information including the evaluation axis scoring information 202, the dialogue scene transition information 203, the dialogue communication item information 204, and the dialogue partner evaluation information 206.
 In FIG. 21, the dialogue scene transition information 203 shows the introduction (Intro), the dialogue scene 21 seconds after the start, and a progress bar representing its progress. The input information 211 shows, in a speech balloon, that in the introduction scene the speaker UA uttered "Nice to meet you" and that his or her expression at the time was a smile accompanied by a bow.
 The dialogue communication item information 204 shows a checklist in which, among the communication items of the introduction (Intro), "greeting" is checked. The dialogue partner evaluation information 206 shows the evaluation values calculated from the sensing information on the speaker UB's side. In this example, the bar graphs show that the speaker UB, having been greeted by the speaker UA, has increased empathy, favorability, and trust toward the speaker UA.
 In the evaluation axis scoring information 202, the evaluation values for each of the seven evaluation axes are represented by bar graphs. In this example, as shown in the dialogue communication item information 204, the speaker UA has completed the greeting, a communication item of the introduction (Intro), so the evaluation value (bar graph value) of the communicated items axis in the evaluation axis scoring information 202 has increased accordingly (+1). Also, as shown in the dialogue partner evaluation information 206, the speaker UB feels empathy, favorability, and trust toward the speaker UA, so the evaluation value of the attunement axis in the evaluation axis scoring information 202 has increased accordingly (+0.5). In FIG. 21, the factors behind the increases in the evaluation values are indicated by arrows for ease of explanation, but in practice these arrows are not displayed.
 Afterwards (6 minutes later), in FIG. 22, the dialogue scene transition information 203 shows the explanation (Explanation), the dialogue scene 6 minutes and 21 seconds after the start, and a progress bar representing its progress. The input information 211 shows, in a speech balloon, that in the explanation scene the speaker UA uttered "Please take two tablets with water or lukewarm water after each meal." and that his or her expression at the time was a smile accompanied by a hand gesture.
 The dialogue communication item information 204 shows a checklist in which, among the communication items of the explanation (Explanation), the dosage instructions are checked. The dialogue partner evaluation information 206 shows, as evaluation values calculated from the sensing information on the speaker UB's side, bar graphs in which the speaker UB's comprehension, trust, and the like are rated high after receiving the explanation of the dosage instructions.
Also, as shown in the dialogue transmission item information 204, the speaker UA has already explained the medication method, one of the transmission items of the explanation (Explanation), so the evaluation value of the transmission-item axis in the evaluation axis scoring information 202 increases accordingly (+1). Likewise, as shown in the dialogue partner evaluation information 206, the speaker UB shows understanding of and trust in the speaker UA, so the evaluation value of the accuracy axis in the evaluation axis scoring information 202 increases accordingly (+1). In FIG. 22, the causes of these increases are again indicated by arrows, but in practice the arrows are not displayed.
The evaluation axis scoring information 202 is intermediate information (an interim result) presented during the dialogue between the speaker UA and the speaker UB, but it is also the information presented as the final scoring result after the dialogue ends; it can thus be said to be linked to intermediate information such as the dialogue transmission item information 204 and the dialogue partner evaluation information 206.
In this way, when the speaker UA and the speaker UB converse through their displays, the information processing device 10A scores the dialogue by combining language analysis with facial expressions, body movements, how the other party feels, and so on, and updates the screen in real time according to the input information and the scoring information. As a result, the speaker UA can check the scoring results during the dialogue without the speaker UB noticing, and can reflect what was confirmed in the rest of the dialogue.
Next, with reference to FIGS. 23 to 26, specific examples will be described in which, during a medical interview between the speaker UA acting as a pharmacist and the speaker UB acting as a patient, the speaker UA checks the scoring information displayed on the display in real time and decides on the subsequent dialogue policy.
(First example)
FIG. 23 shows an example of feedback in a medical interview when the speaker UA notices an omission among the transmission items and changes the dialogue strategy. In FIG. 23, the passage of time is represented by the arrow running from the top to the bottom of the figure. The screens to the right of the arrow are the screens displayed on the display 121 of the information processing device 10A on the speaker UA side (scoring information superimposed on the video of the speaker UB). The picture of a bell in the figure represents a sound output from the speaker 122. The meanings of the arrow and the screens are the same in FIGS. 25 and 26, described later.
In FIG. 23, at time t11, immediately after the start of the medical interview, the speaker UA utters, "Hello, what brings you in today?" By scoring the language analysis of this utterance together with facial expressions, body movements, how the other party feels, and so on, the following scoring information is superimposed on the video of the speaker UB on the display 121: a checklist in which "greeting" and "confirmation of the reason for the visit" are checked among the transmission items of the introduction (Intro), and a bar graph showing that the speaker UB feels empathy, interest, and trust.
At time t12, the speaker UB utters, "Please give me my blood-pressure medicine. Here is the prescription." In response, at time t13, the speaker UA utters, "This is your first time taking this medicine, isn't it? May I take about five minutes to explain it?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated.
At time t14, the speaker UB utters, "No, the doctor explained it thoroughly, so I'm fine without an explanation." At this point, the speaker UA checks the on-screen checklist and realizes that the self-introduction was missed; since the bar graph also shows that trust has not risen, the speaker UA changes the dialogue strategy, deciding to self-introduce along with the role of the pharmacist.
At time t15, the speaker UA utters, "I should have said so earlier: I am △△, a pharmacist at the ○○ pharmacy. Would you like to check the drug interactions together with me?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated: among the transmission items of the introduction (Intro), "self-introduction" is now checked in addition to "greeting" and "confirmation of the reason for the visit", and the bar graph shows that the trust of the speaker UB has increased.
At this time, by outputting a predetermined sound from the speaker 122, the information processing device 10A can notify the speaker UA that points were added because the speaker UA noticed the omitted transmission item and changed the dialogue strategy. This notification of added points is not limited to sound; for example, information indicating the added points may be displayed on the screen, or a tactile presentation may be made by vibration. However, to signal that it was the change in dialogue strategy that led to the added points, the feedback is given in a manner different from an ordinary added-point notification, so that it differs in some way from the ordinary notification.
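One way to realize this differentiated notification is sketched below (hypothetical Python; the channel objects and the distinct "recovery" cue are assumptions, since the text only requires that this feedback differ from an ordinary added-point notification):

```python
# Sketch of added-point notification with a distinct cue for points
# earned by a strategy change (recovery). Channel objects are assumed
# to expose play/show_badge/pulse; any of them may be absent (None).

def notify_points(added: float, recovered: bool,
                  sound=None, screen=None, haptics=None) -> None:
    if recovered:
        # Recovery bonus: use a different sound, badge, and vibration
        # pattern so it is distinguishable from the ordinary cue.
        if sound:
            sound.play("recovery_chime.wav")
        if screen:
            screen.show_badge(f"+{added} (recovered)")
        if haptics:
            haptics.pulse(pattern="double")
    else:
        if sound:
            sound.play("point_chime.wav")
        if screen:
            screen.show_badge(f"+{added}")
```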
FIG. 24 is a flowchart illustrating the flow of the feedback processing.
When an utterance of the speaker UA is accepted (S101), it is determined whether the utterance contains an end sign (S102). If it is determined that an end sign is contained (Yes in S102), the processing ends. If it is determined that no end sign is contained (No in S102), it is determined whether there is a problem in the dialogue (S103).
If it is determined that there is no problem in the dialogue (No in S103), information indicating added points is displayed on the screen (S104). If it is determined that there is a problem in the dialogue (Yes in S103), it is determined whether the dialogue is at a recoverable position (S105). If it is determined that the position is not recoverable (No in S105), information indicating lost points is displayed on the screen (S106).
If it is determined that the position is recoverable (Yes in S105), an utterance of the speaker UA is accepted (S107). When the utterance of the speaker UA (an utterance related to recovery) is accepted, it is determined whether recovery has been achieved by that utterance (S108).
If it is determined that recovery has not been achieved (No in S108), the processing returns to step S105 and the subsequent steps are repeated. That is, as long as the position is recoverable, an utterance of the speaker UA is accepted again and it is determined whether recovery has been achieved; if recovery is no longer possible, information indicating lost points is displayed on the screen.
If it is determined that recovery has been achieved (Yes in S108), the added points are announced by sound (S109) and information indicating the added points is displayed on the screen (S110). The notification of added points need only be performed by at least one of sound output and screen display.
When step S104, S106, or S110 ends, the processing returns to step S101 and the above processing is repeated.
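Read as pseudocode, the flowchart maps onto a loop like the following sketch (hypothetical Python; the predicate functions stand in for the judgments S102, S103, S105, and S108, whose concrete criteria the text describes only by example):

```python
# Sketch of the FIG. 24 feedback loop (steps S101 to S110).
# The predicates are placeholders for the judgments described in the
# text; their real criteria would come from the analysis and scoring
# units and the registered databases.

def feedback_loop(accept_utterance, has_end_sign, has_problem,
                  is_recoverable, did_recover, show, play_sound):
    while True:
        utt = accept_utterance()                  # S101
        if has_end_sign(utt):                     # S102
            return                                # end of processing
        if not has_problem(utt):                  # S103
            show("added points")                  # S104
            continue                              # back to S101
        while True:
            if not is_recoverable():              # S105
                show("lost points")               # S106
                break                             # back to S101
            recovery_utt = accept_utterance()     # S107
            if did_recover(recovery_utt):         # S108
                play_sound("added points")        # S109
                show("added points")              # S110
                break                             # back to S101
```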
Applying the first example shown in FIG. 23 to the feedback processing shown in FIG. 24 gives the following. In the first example, the speaker UA acting as a pharmacist is speaking, but "self-introduction" and "confirmation of the name" are missing among the transmission items of the introduction (Intro), so it is determined that there is a problem in the dialogue (Yes in S103); since the dialogue scene has not yet advanced to the explanation (Explanation), the position is determined to be recoverable (Yes in S105). Then, because the speaker UA, having checked the screen, changed the dialogue strategy and made an utterance concerning self-introduction, it is determined that recovery has been achieved (Yes in S108), and the added points are announced by sound output or screen display (S109, S110).
(Second example)
FIG. 25 shows an example of feedback when there is no change of dialogue strategy in response to an omitted transmission item in a medical interview.
In FIG. 25, from time t21 to time t24, a dialogue between the speaker UA and the speaker UB proceeds as at time t11 to time t14 in FIG. 23, and scoring the language analysis of the utterances together with the evaluation of the dialogue partner causes a screen including the checklist, bar graph, and so on to be displayed.
At time t25, the speaker UA checks the on-screen checklist and recognizes that the self-introduction was missed, but judging that little time remains given the time limit of the medical interview (for example, 10 minutes), decides it is better to proceed directly to the history taking. The speaker UA therefore utters, "I would like to make sure there is no mistake with the medicine, so may I ask about your symptoms?"
By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the display of the scoring information, including the checklist and the bar graph, is updated. That is, the dialogue scenes now include the introduction (Intro) and the history taking (History Taking); among the transmission items of the introduction, "greeting" and "confirmation of the reason for the visit" are checked, and among the transmission items of the history taking, "confirmation of the chief complaint" is checked, while the bar graph shows no particular change in score. In other words, as shown on this screen, the dialogue scene has transitioned from the introduction to the history taking, so at this point uttering a transmission item of the introduction no longer earns points.
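A minimal sketch of this scene gating (hypothetical Python; the scene and item names follow the example above) might look like:

```python
# Sketch of scene gating: a transmission item earns points only while
# its dialogue scene is the current one. Names follow the example above.

SCENE_ORDER = ["Intro", "History Taking", "Explanation"]
ITEM_SCENE = {"greeting": "Intro",
              "self-introduction": "Intro",
              "chief complaint": "History Taking"}

def points_for(item: str, current_scene: str) -> int:
    """Return 1 only if the item belongs to the current scene."""
    item_scene = ITEM_SCENE[item]
    if SCENE_ORDER.index(item_scene) < SCENE_ORDER.index(current_scene):
        return 0  # the item's scene has already passed: no points
    return 1 if item_scene == current_scene else 0

print(points_for("self-introduction", "History Taking"))  # 0
print(points_for("chief complaint", "History Taking"))    # 1
```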
Applying the second example shown in FIG. 25 to the feedback processing shown in FIG. 24 gives the following. In the second example, the speaker UA acting as a pharmacist is speaking, but "self-introduction" and "confirmation of the name" are missing among the transmission items of the introduction (Intro), so it is determined that there is a problem in the dialogue (Yes in S103). However, since the dialogue scene has advanced to the history taking (History Taking), the position is determined not to be recoverable (No in S105), and information indicating lost points is displayed on the screen (S106).
(Third example)
FIG. 26 shows an example of feedback when the understanding of the speaker UB acting as a patient changes because the speaker UA acting as a pharmacist rephrases an utterance in a medical interview.
In FIG. 26, at time t31, a predetermined time after the start of the medical interview, the speaker UA utters, "Under what circumstances do your symptoms go into remission?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the video of the speaker UB on the display 121: a checklist in which "greeting" and "confirmation of the reason for the visit" are checked among the transmission items of the introduction (Intro), and a bar graph in which interest, likability, trust, and so on are rated highly.
With this update of the scoring information, the evaluation of understanding drops sharply compared with the scoring information displayed before the update. The screen also shows measured values (facial, head, body) indicating that the speaker UB is tilting their head with a puzzled expression.
In response, at time t32, the speaker UB utters, "Around lunchtime, you mean?" At this point, the speaker UA checks the bar graph and the measured values on the screen and, given the sudden drop in understanding and the facial expression, judges that the word "remission" may not be getting through and that it would be better to rephrase it in other words.
At time t33, the speaker UA utters, "I see. Then could you tell me when your symptoms ease?" By scoring the language analysis of this utterance together with the evaluation of the dialogue partner, the following scoring information is superimposed on the display 121.
That is, scoring information including a bar graph in which the evaluation of the understanding of the speaker UB has risen (from 10 to 80) is displayed. At this time, the similarity (for example, a similarity of 0.8) between the rephrased passages (for example, the passage containing "remission" and the passage containing "ease") can be calculated and factored into the evaluation. The scoring information also shows measured values indicating that the speaker UB no longer has a particularly puzzled expression.
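As a rough illustration, a surface-level similarity between the original and rephrased passages can be computed as in the sketch below (hypothetical Python using difflib from the standard library; the patent does not specify how the 0.8 similarity is obtained, and an embedding-based measure could equally be substituted):

```python
# Sketch of scoring a rephrasing: compute a similarity between the
# original passage and its paraphrase and weight the added points by it.
# SequenceMatcher gives only a surface (character-level) similarity;
# the 0.8 figure in the text is illustrative, not a specification.

from difflib import SequenceMatcher

def paraphrase_bonus(original: str, paraphrase: str,
                     base_points: float = 1.0) -> float:
    similarity = SequenceMatcher(None, original, paraphrase).ratio()
    return base_points * similarity

bonus = paraphrase_bonus(
    "Under what circumstances do your symptoms go into remission?",
    "Could you tell me when your symptoms ease?")
print(round(bonus, 2))
```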
Also, at this time, by outputting a predetermined sound from the speaker 122, the information processing device 10A can notify the speaker UA that points were added because the understanding of the speaker UB acting as a patient changed as a result of the rephrasing by the speaker UA acting as a pharmacist. After that, at time t34, the speaker UB understands the intention of the speaker UA's question and utters, "Ah, I see. They settle down when I warm my stomach."
Applying the third example shown in FIG. 26 to the feedback processing shown in FIG. 24 gives the following. In the third example, the speaker UA acting as a pharmacist makes an utterance (one containing "remission"), but after that utterance the understanding of the speaker UB acting as a patient drops markedly and the non-verbal information shows negative values, so it is determined that there is a problem in the dialogue (Yes in S103).
Also, since the dialogue scene is the introduction (Intro) and there has been only one speaker change, the position is determined to be recoverable (Yes in S105). After the speaker UA repeats an utterance with the same content (an utterance rephrasing "remission"), an improvement is seen in the understanding of the speaker UB, so it is determined that recovery has been achieved (Yes in S108), and the added points are announced by sound output or screen display (S109, S110).
As for professionals in the medical field, doctors, nurses, pharmacists, and others take a test on interpersonal communication as part of the Objective Structured Clinical Examination (OSCE); the evaluation of interpersonal communication by the information processing system 1 can be used, for example, in practicing for such a test.
<3. Modification example>
(Example of apparel customer service)
In the above description, use in medical interviews was described as the assumed usage scene for evaluating interpersonal communication with the information processing system 1, but the system may also be used to evaluate interpersonal communication in call centers, sales, retail, and so on. The following describes a case in which the information processing system 1 to which the present technology is applied is used to evaluate interpersonal communication in apparel sales.
FIG. 27 shows a display example of real-time scores when the system is used for apparel customer service. In this case, the speaker UA is an apparel salesperson and the speaker UB is a customer.
In FIG. 27, a video 221 including the speaker UB is displayed on the display 121 of the information processing device 10A on the speaker UA side. Superimposed on the video 221 is scoring information including evaluation axis scoring information 222, dialogue scene transition information 223, dialogue transmission item information 224, and dialogue partner evaluation information 226.
In the evaluation axis scoring information 222, the evaluation value (score) for each of the evaluation axes of listening ability, accuracy, disclosure, diffusivity, and proposal ability is represented by a bar graph. The dialogue scene transition information 223 shows the progress up to the product proposal (Recommendation), which is the dialogue scene 6 minutes 21 seconds after the start. That is, as dialogue scenes, the small talk (Small talk) and the needs exploration (Needs exploration) have finished and the dialogue has progressed to the product proposal (Recommendation). The overall flow of these dialogue scenes and the current progress are represented by the customer-service flow 225.
In the dialogue transmission item information 224, predetermined transmission items are shown for each of the dialogue scenes of small talk (Small talk), needs exploration (Needs exploration), and product proposal (Recommendation), and a check mark is entered when the speaker UA has actually conveyed the item. In this example, among the transmission items of the small talk, "approach", "seasonal topics", and "introduction of new items" are checked. Among the transmission items of the needs exploration, "item color", "item shape", "item material", and "wearing scene" are checked. Among the transmission items of the product proposal, "item introduction" and "reference to trends" are checked.
In the dialogue partner evaluation information 226, the evaluation value (score) calculated from the sensing information on the speaker UB side is represented by a bar graph. In this example, the speaker UB shows relatively high understanding, empathy, likability, and trust toward the speaker UA, but low interest.
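The per-scene checklists above suggest a simple data structure, sketched here (hypothetical Python; the scene and item names are taken from the FIG. 27 example, and "fitting suggestion" is an assumed extra item for illustration):

```python
# Sketch of per-scene transmission-item checklists for apparel customer
# service, with a check-off helper. Names follow the FIG. 27 example.

checklists = {
    "Small talk": {"approach": True, "seasonal topics": True,
                   "introduction of new items": True},
    "Needs exploration": {"item color": True, "item shape": True,
                          "item material": True, "wearing scene": True},
    "Recommendation": {"item introduction": True,
                       "reference to trends": True,
                       "fitting suggestion": False},  # assumed extra item
}

def check_off(scene: str, item: str) -> None:
    """Mark a transmission item as conveyed by the salesperson."""
    if scene in checklists and item in checklists[scene]:
        checklists[scene][item] = True

def scene_progress(scene: str) -> float:
    """Fraction of a scene's transmission items conveyed so far."""
    items = checklists[scene]
    return sum(items.values()) / len(items)

print(scene_progress("Recommendation"))          # about 0.67
check_off("Recommendation", "fitting suggestion")
print(scene_progress("Recommendation"))          # 1.0
```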
When used for apparel customer service, the information processing devices 10 and 10A have the configuration shown in FIG. 3, FIG. 17, or FIG. 18, as when used for medical interviews, but the information stored in each database needs to be changed to information for apparel customer service. That is, information for apparel customer service, rather than information for medical interviews, should be registered in the point-addition target language information DB 155, the point-addition target image information DB 160, the question information DB 162, and the point-addition target sensing information DBs 174 and 184.
(Other configurations of the system)
In the above description, the information processing devices 10 and 10A in the information processing system 1 were described as having the configuration shown in FIG. 3, FIG. 17, or FIG. 18; however, some functions of that configuration may instead be provided by the server 30 connected to the network 50.
FIG. 28 shows another configuration example of an embodiment of an information processing system to which the present technology is applied.
In FIG. 28, the information processing system 1A is configured by connecting the information processing device 10, the information processing device 20, and the server 30 to one another via the network 50.
For example, among the components shown in FIG. 3, the analysis processing unit 191 and the scoring processing unit 192 are provided in the server 30, while the voice input unit 151, the voice recognition unit 152, the image input unit 157, the image recognition unit 158, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the information processing device 10.
In this case, the information processing device 10 transmits data including the results of voice recognition and image recognition to the server 30 via the network 50. The server 30 performs the analysis processing and the scoring processing using the data transmitted from the information processing device 10, and transmits data including the processing results to the information processing device 10 via the network 50. Based on the data transmitted from the server 30, the information processing device 10 displays information or outputs sound.
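The division of labor described here amounts to a simple request/response exchange, sketched below (hypothetical Python; the JSON payload shapes and the stand-in scoring rule are assumptions, as the patent does not define a wire format):

```python
# Sketch of the device/server split in FIG. 28: the device sends
# recognition results, the server returns scoring results.

import json

def build_request(speech_text: str, image_labels: list) -> str:
    """Device side: package voice/image recognition results for server 30."""
    return json.dumps({"speech": speech_text, "image": image_labels})

def handle_request(payload: str) -> str:
    """Server side: run analysis and scoring, return scoring information."""
    data = json.loads(payload)
    # Stand-in for the analysis unit 191 and scoring unit 192.
    score = 1.0 if "nice to meet you" in data["speech"].lower() else 0.0
    return json.dumps({"axis": "transmission", "delta": score})

response = handle_request(build_request("Nice to meet you", ["smile", "bow"]))
print(response)  # the device renders this on the display or as sound
```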
Other configurations may also be used; for example, the voice recognition unit 152 and the image recognition unit 158 may be provided on the server 30 side rather than in the information processing device 10. That is, the present technology can adopt a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.
The information processing device 10 may also be composed of a processing device such as a home server and an input/output device such as a display device. In this case, the processing device and the input/output device are provided in the same space (the same room, the same building, and so on). That is, among the components shown in FIG. 3, the analysis processing unit 191, the scoring processing unit 192, the voice recognition unit 152, and the image recognition unit 158 are provided in the processing device, while the voice input unit 151, the image input unit 157, the intermediate information display unit 163, the intermediate result notification unit 164, and the scoring result display unit 166 are provided in the input/output device.
In the above description, the information processing device 10 was shown configured as a telepresence device such as a display device, but the information processing device 10 may be an electronic device such as a PC (Personal Computer). For example, if the information processing devices 10 and 20 are PCs, a speaker UA and a speaker UB in remote locations can converse through their displays by using an application such as a video call application.
Although FIG. 28 was described with respect to the information processing device 10, some functions of the information processing device 10A can likewise be processed by the server 30.
As described above, in the present technology, the information processing device 10 or 10A scores a dialogue between the speaker UA in the space SP1 (for example, a speaker acting as a pharmacist or an apparel salesperson) and the speaker UB in the space SP2 (for example, a speaker acting as a patient or a customer), based on reference information stored in a database as the standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the speaker UA in real time. This allows the speaker UA to check the scoring information in real time while conversing with the speaker UB. Therefore, to improve skills in interpersonal communication such as medical interviews and apparel customer service, appropriate feedback can be given through scoring and the like to support the speaker UA, the person being scored.
At present, when interpersonal communication requiring dialogue skill is needed, there is a demand to evaluate that skill and use the evaluation for improvement, but it has been difficult to evaluate the skill objectively and give feedback. By contrast, the present technology can present intermediate information such as interim results in real time to the speaker UA, the person being scored, so the skill of the person being scored can be evaluated objectively, fed back, and used to improve that skill.
Also, to improve interpersonal communication skills, it is ideal to run simulations under conditions as close as possible to the real thing, but at present such simulations are costly to conduct, owing to the need to train dialogue practice partners, evaluation fluctuations among graders, geographical constraints, and so on. By contrast, with the present technology, a speaker UA and a speaker UB in different spaces can converse using the information processing devices 10 and 20 configured as telepresence devices; there are no geographical constraints and participation in the dialogue is easier, so practicing dialogue and training practice partners both become easy. Moreover, since the information processing device 10 scores against the reference information stored in a database as the standard for dialogue scoring, phenomena such as evaluation fluctuations among graders do not occur. As a result, by using the present technology, a simulation under conditions closer to the real thing can be realized more easily than at present.
Furthermore, from the viewpoint of evaluating interpersonal communication, text-based counseling evaluation and the like have been attempted, but no system exists that, like the present technology, analyzes and scores the utterances of multiple speakers in real time.
The program executed by the information processing devices 10 and 10A (the CPU 101) can be provided recorded on a removable recording medium such as packaged media. Removable recording media include magnetic disks, optical discs, magneto-optical disks, and semiconductor memories. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
In this specification, the processing performed by the information processing devices 10 and 10A (the CPU 101) according to the program does not necessarily have to be performed chronologically in the order described in the above flowchart. That is, the processing performed by the information processing devices 10 and 10A (the CPU 101) according to the program also includes processing executed in parallel or individually (for example, parallel processing or processing by objects).
The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. For example, each step of the above flowchart can be executed by one device or shared among a plurality of devices. Furthermore, when one step includes a plurality of processes, those processes can be executed by one device or shared among a plurality of devices. The program may also be transferred to and executed on a remote computer.
In this specification, a system means a set of a plurality of components (devices, modules (parts), and so on), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
The embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology. The effects described in this specification are merely examples and are not limiting; there may be other effects.
The present technology can also take the following configurations.
(1)
An information processing device including a processing unit that scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
(2)
The information processing device according to (1), in which the scoring information includes a scoring result of the dialogue or intermediate information from partway through the dialogue.
(3)
The information processing device according to (2), in which the processing unit presents, during the dialogue between the first speaker and the second speaker, the scoring information including the intermediate information at that point in time.
(4)
The information processing device according to (3), in which the intermediate information includes at least one of evaluation axis scoring information representing an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information representing transitions between dialogue scenes, dialogue transmission item information representing the degree of achievement of transmission items for each dialogue scene, and dialogue partner evaluation information representing an evaluation of the first speaker by the second speaker.
(5)
The information processing device according to any one of (2) to (4), in which, when the first speaker, having checked the scoring information presented in real time, conducts subsequent dialogue reflecting what was checked, the processing unit presents the scoring information according to the result of that reflection.
(6)
The information processing device according to (5), in which the processing unit notifies the first speaker of the result of the reflection by a method different from the real-time presentation of the scoring information.
(7)
The information processing device according to any one of (2) to (6), in which the processing unit presents the scoring information including a scoring result for the entire dialogue after the dialogue between the first speaker and the second speaker ends.
(8)
The information processing device according to any one of (1) to (7), in which the processing unit scores the dialogue based on the utterance content of the first speaker.
(9)
The information processing device according to (8), in which the processing unit scores the dialogue by analyzing the utterances of the first speaker with respect to at least one of similarity to preset scoring item example sentences, structure within the dialogue, utterance attitude classification, and utterance attitude classification within the dialogue structure.
(10)
The information processing device according to (8) or (9), in which the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
(11)
The information processing device according to (10), in which the sensing information is information obtained by various sensors and corresponds to the timing of utterances by the first speaker.
(12)
The information processing device according to (11), in which the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing the captured image with respect to at least one of preset speaker facial expressions, speaker movements, speaker gaze, and presented objects.
(13)
The information processing device according to any one of (10) to (12), in which the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
(14)
The information processing device according to any one of (1) to (13), in which the first speaker is the person being scored, the second speaker is the dialogue partner of the person being scored, and the processing unit displays a video including the second speaker on a display and displays information according to the scoring information on the display or outputs a sound according to the scoring information from a speaker.
(15)
The information processing device according to (14), in which a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and between the first space and the second space, an image captured by the camera installed in one space is displayed in real time by the display installed in the other space.
(16)
The information processing device according to (15), configured integrally with the first camera and the first display installed in the first space, and interconnected via a network with another information processing device configured integrally with the second camera and the second display installed in the second space.
(17)
The information processing device according to (16), further including a first sensor, in which the processing unit scores the dialogue based on sensing information obtained from the first sensor and a second sensor included in the other information processing device.
(18)
An information processing method in which an information processing device scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
1, 1A information processing system, 10, 10A information processing device, 20 information processing device, 30 server, 50 network, 101 CPU, 102 ROM, 103 RAM, 106, 106A input unit, 107 output unit, 108 storage unit, 109 communication unit, 111 operation unit, 112 camera, 113 microphone, 114 sensor, 121 display, 122 speaker, 151 voice input unit, 152 voice recognition unit, 153 sentence division unit, 154 utterance content analysis unit, 155 point-addition target language information DB, 156 time acquisition unit, 157 image input unit, 158 image recognition unit, 159 image analysis unit, 160 point-addition target image information DB, 161 point-addition target integration unit, 162 question information DB, 163 intermediate information display unit, 164 intermediate result notification unit, 165 scoring result generation unit, 166 scoring result display unit, 171 sensing information input unit, 172 sensing information recognition unit, 173 sensing information analysis unit, 174 point-addition target sensing information DB, 181 sensing information input unit, 182 sensing information recognition unit, 183 sensing information analysis unit, 184 point-addition target sensing information DB, 191, 191A, 191B analysis processing unit, 192 scoring processing unit

Claims (18)

1. An information processing device comprising a processing unit that scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
2. The information processing device according to claim 1, wherein the scoring information includes a scoring result of the dialogue or intermediate information from partway through the dialogue.
3. The information processing device according to claim 2, wherein the processing unit presents, during the dialogue between the first speaker and the second speaker, the scoring information including the intermediate information at that point in time.
4. The information processing device according to claim 3, wherein the intermediate information includes at least one of evaluation axis scoring information representing an evaluation for each evaluation axis according to the usage scene, dialogue scene transition information representing transitions between dialogue scenes, dialogue transmission item information representing the degree of achievement of transmission items for each dialogue scene, and dialogue partner evaluation information representing an evaluation of the first speaker by the second speaker.
5. The information processing device according to claim 2, wherein, when the first speaker, having checked the scoring information presented in real time, conducts subsequent dialogue reflecting what was checked, the processing unit presents the scoring information according to the result of that reflection.
6. The information processing device according to claim 5, wherein the processing unit notifies the first speaker of the result of the reflection by a method different from the real-time presentation of the scoring information.
7. The information processing device according to claim 2, wherein the processing unit presents the scoring information including a scoring result for the entire dialogue after the dialogue between the first speaker and the second speaker ends.
8. The information processing device according to claim 1, wherein the processing unit scores the dialogue based on the utterance content of the first speaker.
9. The information processing device according to claim 8, wherein the processing unit scores the dialogue by analyzing the utterances of the first speaker with respect to at least one of similarity to preset scoring item example sentences, structure within the dialogue, utterance attitude classification, and utterance attitude classification within the dialogue structure.
10. The information processing device according to claim 8, wherein the processing unit scores the dialogue based on sensing information about at least one of the first speaker and the second speaker.
11. The information processing device according to claim 10, wherein the sensing information is information obtained by various sensors and corresponds to the timing of utterances by the first speaker.
12. The information processing device according to claim 11, wherein the sensing information includes a captured image captured by a camera, and the processing unit scores the dialogue by analyzing the captured image with respect to at least one of preset speaker facial expressions, speaker movements, speaker gaze, and presented objects.
13. The information processing device according to claim 10, wherein the processing unit presents the scoring information obtained by applying preset point-addition conditions to the analysis results of the utterance content and the sensing information.
14. The information processing device according to claim 1, wherein the first speaker is the person being scored, the second speaker is the dialogue partner of the person being scored, and the processing unit displays a video including the second speaker on a display and displays information according to the scoring information on the display or outputs a sound according to the scoring information from a speaker.
15. The information processing device according to claim 14, wherein a first camera and a first display are installed in the first space, a second camera and a second display are installed in the second space, and between the first space and the second space, an image captured by the camera installed in one space is displayed in real time by the display installed in the other space.
16. The information processing device according to claim 15, configured integrally with the first camera and the first display installed in the first space, and interconnected via a network with another information processing device configured integrally with the second camera and the second display installed in the second space.
17. The information processing device according to claim 16, further comprising a first sensor, wherein the processing unit scores the dialogue based on sensing information obtained from the first sensor and a second sensor included in the other information processing device.
18. An information processing method in which an information processing device scores a dialogue between a first speaker in a first space and a second speaker in a second space different from the first space, based on reference information serving as a standard for dialogue scoring, and presents scoring information regarding the scoring of the dialogue to the first speaker in real time.
PCT/JP2021/039945 2020-11-13 2021-10-29 Information processing device and information processing method WO2022102432A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020189071 2020-11-13
JP2020-189071 2020-11-13

Publications (1)

Publication Number Publication Date
WO2022102432A1 true WO2022102432A1 (en) 2022-05-19

Family

ID=81602233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/039945 WO2022102432A1 (en) 2020-11-13 2021-10-29 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2022102432A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007520790A (en) * 2003-11-25 2007-07-26 ハイパークオリティー,インク. Customer / Agent Dialogue Audio / Video Service Quality Analyzer
JP2010517098A (en) * 2007-01-30 2010-05-20 ブレイクスルー パフォーマンス テック エルエルシー System and method for computerized interactive technology training
JP2018124604A (en) * 2017-01-30 2018-08-09 グローリー株式会社 Customer service support system, customer service support device and customer service support method


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21891679; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21891679; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)