WO2024009623A1 - Evaluation system, evaluation device, and evaluation method - Google Patents

Evaluation system, evaluation device, and evaluation method

Info

Publication number
WO2024009623A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
satisfaction
feature amount
unit
satisfaction level
Prior art date
Application number
PCT/JP2023/018500
Other languages
French (fr)
Japanese (ja)
Inventor
孝治 堀内
武志 安慶
裕人 冨田
純子 上田
義照 田中
毅 吉原
康 岡田
Original Assignee
パナソニックIpマネジメント株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by パナソニックIpマネジメント株式会社
Publication of WO2024009623A1 publication Critical patent/WO2024009623A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • the present disclosure relates to an evaluation system, an evaluation device, and an evaluation method.
  • Patent Document 1 discloses a store management system that calculates the employee satisfaction level of a store clerk based on a conversation between the store clerk and a conversation partner.
  • the store management system stores calculation algorithms for calculating employee satisfaction for each type of person who can be a conversation partner.
  • the store management system acquires a conversation between a store employee and a conversation partner, recognizes the store employee's emotion based on the store employee's voice included in the conversation, and determines the type of conversation partner.
  • the store management system calculates the employee's satisfaction level based on the recognition result of the employee's emotion and a calculation algorithm corresponding to the determined type of conversation partner.
  • emotional information is calculated from voice data based on a conversation between people (for example, a store clerk and a customer), and the store clerk's satisfaction level (that is, employee satisfaction level) is calculated from the calculated emotional information.
  • evaluating a person's satisfaction level based only on voice data included in a conversation may not be accurate enough, and a more accurate satisfaction evaluation is required.
  • the present disclosure was devised in view of the conventional situation described above, and aims to perform highly accurate satisfaction evaluation using multiple pieces of information included in conversations between people.
  • the present disclosure provides an evaluation system comprising: an acquisition unit that acquires audio data related to a conversation between a first person and a second person; an imaging unit that images the first person and the second person; an extraction unit that extracts, based on the imaging data, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data; and a satisfaction calculation unit that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
  • the present disclosure also provides an evaluation device comprising: an extraction unit that acquires audio data related to a conversation between a first person and a second person and imaging data obtained by imaging the first person and the second person, and that extracts, based on the imaging data, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data; and a satisfaction calculation unit that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
  • the present disclosure further provides an evaluation method that acquires audio data related to a conversation between a first person and a second person, images the first person and the second person, extracts, based on the imaging data, a first feature amount related to the line of sight or face direction of each of the first person and the second person and a second feature amount of the audio data, and calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
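  • A minimal Python sketch of the evaluation method described above, stringing together the acquisition, extraction, and calculation steps; the function names and the stub extractors are assumptions made for illustration and are not part of the disclosure.

```python
# Illustrative sketch: acquire data, extract the first and second feature amounts,
# then calculate the first person's satisfaction level.
def evaluate_satisfaction(imaging_data, audio_data,
                          extract_gaze_features,   # -> first feature amount (line of sight / face direction)
                          extract_audio_features,  # -> second feature amount (speech features)
                          calculation_algorithm):  # -> satisfaction level
    first_feature = extract_gaze_features(imaging_data)
    second_feature = extract_audio_features(audio_data)
    return calculation_algorithm(first_feature, second_feature)

# Toy usage with stub extractors and a stub calculation algorithm.
print(evaluate_satisfaction(
    imaging_data=None, audio_data=None,
    extract_gaze_features=lambda img: {"gaze_time_sec": 25.0},
    extract_audio_features=lambda aud: {"speech_rate": 0.55},
    calculation_algorithm=lambda f1, f2: 3.0 + 0.5 * (f2["speech_rate"] >= 0.5)))
```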
  • FIG. 1 is a diagram showing an overview of this embodiment.
  • FIG. 2 is a diagram showing an example of feature amounts.
  • FIG. 3 is a block diagram showing an example of the internal configuration of a terminal device and a server according to Embodiment 1.
  • FIG. 4 is a diagram showing a method for calculating the satisfaction level with a logic-based algorithm.
  • FIG. 5 is a diagram showing an example of calculating the satisfaction level at predetermined time intervals.
  • FIG. 6 is a sequence diagram of satisfaction evaluation processing according to Embodiment 1.
  • FIG. 7 is a diagram showing an example of the internal configuration of a terminal device and a server according to Embodiment 2.
  • FIG. 8 is a sequence diagram of satisfaction evaluation processing according to Embodiment 2.
  • FIG. 9 is a diagram showing an example of the internal configuration of a terminal device according to Embodiment 3.
  • FIG. 10 is a flowchart showing the process of calculating the satisfaction level on a terminal device.
  • FIG. 11 is a sequence diagram showing the process in which the server calculates the satisfaction level from previously captured image data and audio data.
  • FIG. 12 is a diagram showing an example of a screen displayed on a terminal device.
  • FIG. 13 is a diagram showing an example of a screen on which a message is displayed depending on the satisfaction level result.
  • FIG. 1 is a diagram showing an overview of this embodiment.
  • FIG. 1 shows a case where a person A is having a conversation with a person B using a terminal device 1 that is connected via a network to the terminal device used by the person B.
  • Person A and Person B have an interpersonal relationship; for example, Person A is Person B's subordinate, and Person B is Person A's superior.
  • the relationship between Person A and Person B is not limited to that between a boss and a subordinate, but may be between employees and customers, between colleagues, between an interviewer and an interviewee, or in any other relationship (for example, between a teacher and a student).
  • person B, who is a boss, interviews person A, who is a subordinate, online.
  • person A may be read as a first person, and person B may be read as a second person.
  • the audio acquisition device 10 is, for example, a microphone, and picks up person A's utterance CO.
  • the audio acquisition device 10 may be installed in the terminal device 1 or may be an external device communicably connected to the terminal device 1.
  • the data collected by the audio acquisition device 10 will be referred to as audio data.
  • the imaging device 11 is, for example, a camera, and images the person A.
  • the imaging device 11 may be installed in the terminal device 1 or may be an external device communicably connected to the terminal device 1.
  • the data of person A captured by the imaging device 11 will be referred to as imaging data.
  • the terminal device 1 transmits the audio data acquired by the audio acquisition device 10 and the imaging data acquired by the imaging device 11 to a device that extracts feature amounts.
  • the device that extracts the feature amount is, for example, a server. Note that the terminal device 1 may extract the feature amount without transmitting the audio data and the imaging data to the server.
  • the feature amounts extracted from the imaging data and audio data are, for example, facial expressions, line of sight, speech, or actions. Note that the feature amounts extracted from the imaging data and audio data are not limited to these.
  • Information on facial expression or line of sight is extracted from image FR1 representing the face of person A in the captured image data.
  • Information regarding the behavior is extracted from the image FR2 representing the upper body of the person A in the image data.
  • Information related to the utterance is extracted from the audio data.
  • Information related to facial expressions, line of sight, speech, or actions is extracted by the terminal device 1 or the server 2 (see FIG. 7), and will be described in detail later.
  • the degree of satisfaction is calculated using the extracted feature data (hereinafter referred to as feature amount data) and an algorithm for estimating the degree of satisfaction (hereinafter referred to as satisfaction degree estimation algorithm).
  • Satisfaction is an index representing the degree of satisfaction of person A with the conversation with person B, which is estimated by a satisfaction estimation algorithm based on feature data.
  • Satisfaction estimation algorithms include algorithms based on predetermined logic (hereinafter referred to as logic-based algorithms) and algorithms based on machine learning (hereinafter referred to as machine learning-based algorithms).
  • a logic-based algorithm is an algorithm that defines a procedure for calculating satisfaction by repeatedly adding and subtracting points based on predetermined logic.
  • Machine learning-based algorithms are, for example, algorithms that use deep learning based on multilayer perceptrons, random forests, or convolutional neural networks as a configuration and directly output satisfaction levels from feature data.
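  • A minimal sketch of what a machine-learning-based algorithm of this kind could look like, assuming a small NumPy multilayer perceptron with untrained weights; the architecture, feature dimensionality, and the 0-to-5 output range are illustrative assumptions.

```python
import numpy as np

def mlp_satisfaction(feature_vec, w1, b1, w2, b2):
    """Toy multilayer perceptron mapping a feature-amount vector to a satisfaction score.
    In an actual system the weights would be learned from labelled conversation data."""
    hidden = np.tanh(feature_vec @ w1 + b1)     # hidden layer
    score = hidden @ w2 + b2                    # scalar output
    return float(np.clip(score, 0.0, 5.0))      # keep within an assumed 0-to-5 range

# Example with untrained (random) parameters and a 6-dimensional feature vector.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(6, 8)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0
print(mlp_satisfaction(rng.normal(size=6), w1, b1, w2, b2))
```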
  • by checking the satisfaction level calculated from the audio data and imaging data, person B can confirm whether person A is highly satisfied and can communicate with person A smoothly. Note that the conversation between person A and person B is not limited to an online interview or the like, and may be a face-to-face conversation.
  • FIG. 2 is a diagram showing an example of feature amounts.
  • facial expressions include a smiling face, a straight face, or a crying face.
  • Actions include nodding, standing still, or tilting one's head.
  • Embodiment 1: In the evaluation system 100 according to Embodiment 1, when users (for example, person A and person B) are having an online conversation, a terminal device extracts the users' feature amount data, the extracted feature amount data is sent to a server, and the satisfaction level is calculated by the server.
  • FIG. 3 is a block diagram showing an example of the internal configuration of the terminal device and the server according to the first embodiment.
  • the evaluation system 100A includes at least a terminal device 1A and a server 2A.
  • the number of terminal devices is not limited to one and may be two or more.
  • the terminal device 1A is an example of a terminal used by a user. Note that when individual evaluation systems, terminal devices, and servers are distinguished, a letter is appended after the reference number; when they are not distinguished, only the number is used in the description.
  • the terminal device 1A and the server 2A are communicably connected via the network NW.
  • the terminal device 1A and the server 2A may be communicably connected via a wired LAN (Local Area Network).
  • the terminal device 1A and the server 2A may perform wireless communication (for example, wireless LAN such as Wi-Fi (registered trademark)) without going through the network NW.
  • the terminal device 1A includes at least a communication I/F 13, a memory 14, an input device 15, a display device 16, an I/F 17, an audio acquisition device 10, an imaging device 11, and a processor 12.
  • the terminal device 1A is a PC (Personal Computer), a tablet, a mobile terminal, a housing including the audio acquisition device 10 and the imaging device 11, or the like.
  • the communication I/F 13 is a network interface circuit that performs wireless or wired communication with the network NW.
  • I/F represents an interface.
  • the terminal device 1A is communicably connected to the server 2A via the communication I/F 13 and the network NW.
  • the communication I/F 13 transmits the feature amount data extracted by the feature amount extraction unit 12A (see below) to the server 2A.
  • Communication methods used by the communication I/F 13 include, for example, WAN (Wide Area Network), LAN (Local Area Network), LTE (Long Term Evolution), mobile communication such as 5G, power line communication, short-range wireless communication (for example, Bluetooth (registered trademark) communication), communication for mobile phones, and the like.
  • the memory 14 includes, for example, a RAM (Random Access Memory) used as a work memory when each process of the processor 12 is executed, and a ROM (Read Only Memory) that stores programs and data defining the operations of the processor 12. Data or information generated or acquired by the processor 12 is temporarily stored in the RAM. A program that defines the operation of the processor 12 is written in the ROM.
  • the input device 15 receives input from a user (for example, person A or person B).
  • the input device 15 is, for example, a touch panel display or a keyboard.
  • the input device 15 accepts operations in response to instructions displayed on the display device 16.
  • the display device 16 displays a screen (see below) created by the drawing screen creation unit 24B of the server 2.
  • the display device 16 is, for example, a display or a notebook PC monitor.
  • the I/F 17 is a software interface.
  • the I/F 17 is communicably connected to the communication I/F 13, the memory 14, the input device 15, the display device 16, the audio acquisition device 10, the imaging device 11, and the processor 12, and exchanges data with each device.
  • the I/F 17 may be omitted from the terminal device 1A, and data may be exchanged between the devices of the terminal device 1A.
  • the audio acquisition device 10 picks up the utterances of a user (for example, person A or person B).
  • the audio acquisition device 10 is configured with a microphone device that can collect audio generated based on a user's utterance (that is, detect an audio signal).
  • the audio acquisition device 10 collects audio generated based on a user's utterance, converts it into an electrical signal as audio data, and outputs the electrical signal to the I/F 17.
  • the imaging device 11 is a camera that images a user (for example, person A or person B).
  • the imaging device 11 includes at least a lens (not shown) as an optical element and an image sensor (not shown).
  • the lens receives light reflected by the object from within the angle of view of the imaged area of the imaging device 11 and forms an optical image of the object on the light receiving surface (in other words, the imaging surface) of the image sensor.
  • the image sensor is, for example, a solid-state imaging device such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor).
  • the image sensor converts an optical image formed on an imaging surface through a lens into an electrical signal and sends it to the I/F 17 at predetermined time intervals (for example, 1/30 seconds).
  • the audio acquisition device 10 and the imaging device 11 may be external devices that are communicably connected to the terminal device 1A.
  • the processor 12 is a semiconductor chip on which at least one of electronic devices such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array) is mounted.
  • the processor 12 functions as a controller that controls the overall operation of the terminal device 1A, and performs control processing for supervising the operation of each part of the terminal device 1A, data input/output processing with the I/F 17, data arithmetic processing, and data storage processing.
  • the processor 12 realizes the function of the feature extraction unit 12A.
  • the processor 12 uses the RAM of the memory 14 during operation, and temporarily stores data generated or acquired by the processor 12 in the RAM of the memory 14.
  • the feature amount extraction unit 12A extracts feature amounts (see FIG. 2) based on the audio data acquired from the audio acquisition device 10 and the image data acquired from the imaging device 11.
  • the feature amount extraction unit 12A may extract each feature amount from the audio data and the imaging data using, for example, trained model data for AI (Artificial Intelligence) processing stored in the memory 14 (in other words, based on AI).
  • the feature amount extraction unit 12A detects the face part of the person A from the image data, and also detects the direction (in other words, the line of sight) of both eyes (that is, the left eye and the right eye) of the detected face part.
  • the feature extracting unit 12A detects the line of sight of the person A who is viewing the screen displayed on the display device 16 (for example, the captured video of the person B).
  • the line-of-sight detection method can be realized using publicly known techniques; for example, the line of sight may be detected based on the difference in the orientation of both eyes reflected in each of a plurality of captured images (frames), or other detection methods may be used.
  • the feature extracting unit 12A detects the facial part of the person A from the image data and also detects the direction of the face.
  • the direction of the face is the angle of the face with respect to a specific location on the display device 16 (for example, the center position of the panel of the display device 16).
  • the angle of the face is a vector representation composed of an azimuth angle and an elevation angle indicating the three-dimensional direction, as viewed from the specific location on the display device 16 (see above), in which the face of person A looking at that location exists. Note that the specific location is not limited to the center position of the panel.
  • the face direction detection method can be realized using known techniques.
  • the feature extracting unit 12A detects the face part of the person A from the image data and also detects the facial expression of the person A.
  • the facial expression detection method can be realized using known techniques.
  • the feature extraction unit 12A detects the motion of person A from the image data.
  • the motion detection method can be realized using known techniques.
  • the feature extraction unit 12A detects the speaking time of person A from the voice data.
  • the speaking time may be detected by, for example, integrating the durations of the portions of the voice data in which person A's voice signal is detected. Note that the speech time detection method may be implemented using other known techniques. Furthermore, the feature extraction unit 12A calculates the rates at which person A and person B are each speaking, based on the detected speaking times.
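  • A minimal sketch of the integration described above, assuming that voiced portions have already been detected as (start, end) segments in seconds; the segment detection itself is left to the known techniques mentioned.

```python
def speaking_time(voiced_segments):
    """Total speaking time: sum of the durations of (start_sec, end_sec) segments
    in which person A's voice signal was detected."""
    return sum(end - start for start, end in voiced_segments)

# Two voiced portions totalling roughly 1.0 s within a 2.5 s window.
print(speaking_time([(0.2, 0.8), (1.5, 1.9)]))  # ~1.0
```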
  • the feature extraction unit 12A detects the emotion of person A from the voice data.
  • the feature extraction unit 12A detects the emotion by detecting, for example, the intensity of the voice, the number of moras per unit time, the intensity of each word, the volume, or the spectrum of the voice, etc. from the voice data.
  • the emotion detection method is not limited to this, and may be realized by other known techniques.
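  • A minimal sketch, assuming the raw waveform is available as a NumPy array, of computing two of the cues mentioned above (volume as frame-wise RMS energy and a magnitude spectrum via the FFT); mapping such cues to an emotion label is left to the known techniques mentioned.

```python
import numpy as np

def frame_rms(waveform, frame_len=400):
    """Frame-wise RMS energy, a simple stand-in for voice intensity/volume."""
    n_frames = len(waveform) // frame_len
    frames = waveform[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def magnitude_spectrum(waveform):
    """Magnitude spectrum of the waveform via the FFT."""
    return np.abs(np.fft.rfft(waveform))

# One second of a synthetic 200 Hz tone at 16 kHz, standing in for conversation audio.
sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)
print(frame_rms(tone).mean())             # close to 1/sqrt(2) for a pure tone
print(magnitude_spectrum(tone).argmax())  # FFT bin near 200 Hz
```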
  • the server 2A includes a communication I/F 21, a memory 22, an input device 23, an I/F 26, and a processor 24.
  • the communication I/F 21 transmits and receives data to and from each of the one or more terminal devices 1 via the network NW.
  • the communication I/F 21 transmits data of a screen output from the I/F 26 to be displayed on the display device 16 to the terminal device 1A.
  • the memory 22 includes, for example, a RAM as a work memory used when the processor 24 executes each process, and a ROM that stores programs and data that define the operations of the processor 24. Data or information generated or acquired by the processor 24 is temporarily stored in the RAM. A program that defines the operation of the processor 24 is written in the ROM.
  • the memory 22 also stores a satisfaction level estimation algorithm.
  • the input device 23 receives input from a user (for example, an administrator of the evaluation system 100).
  • the input device 23 is, for example, a touch panel display or a keyboard.
  • the input device 23 accepts the setting of threshold values (see below) for the logic-based algorithm.
  • the I/F 26 is a software interface.
  • the I/F 26 is communicably connected to the communication I/F 21, the memory 22, the input device 23, and the processor 24, and exchanges data with each device. Note that the I/F 26 may be omitted from the server 2A, and data may be exchanged between the devices of the server 2A.
  • the processor 24 is a semiconductor chip on which at least one of electronic devices such as a CPU, a DSP, a GPU, and an FPGA is mounted.
  • the processor 24 functions as a controller that governs the overall operation of the server 2A, and performs control processing for supervising the operation of each part of the server 2A, data input/output processing with the I/F 26, data arithmetic processing, and data storage processing.
  • the processor 24 implements the functions of the satisfaction level estimation section 24A and the drawing screen creation section 24B.
  • the processor 24 uses the RAM of the memory 22 during operation, and temporarily stores data generated or obtained by the processor 24 in the RAM of the memory 22.
  • the satisfaction estimation unit 24A calculates the satisfaction of the person A using the feature amount data acquired from the terminal device 1A and the satisfaction estimation algorithm recorded in the memory 22.
  • the satisfaction level estimation unit 24A may calculate the satisfaction level using a logic-based algorithm, or may calculate the satisfaction level using a machine learning-based algorithm.
  • the satisfaction estimation unit 24A outputs information regarding the calculated satisfaction level to the drawing screen creation unit 24B.
  • the drawing screen creation unit 24B creates a screen to be displayed on the display device 16 of the terminal device 1A using the satisfaction level acquired from the satisfaction level estimation unit 24A.
  • the screen includes, for example, a captured video of person A, information regarding satisfaction, a button for controlling the start of satisfaction evaluation, and the like. Note that the items included on the screen are not limited to these.
  • Methods of displaying information regarding the satisfaction level include, for example, plotting satisfaction values calculated at predetermined time intervals numerically or on a graph each time, or displaying the satisfaction value during or at the end of a meeting.
  • the graph regarding the satisfaction value is a graph in which values are plotted, a bar graph, a meter, or the like.
  • the drawing screen creation unit 24B outputs the created screen to the I/F 26.
  • FIG. 4 is a diagram showing a method for calculating the satisfaction level of the logic-based algorithm.
  • satisfaction is calculated by adding and subtracting points according to predetermined rules (hereinafter referred to as a determination method).
  • the feature amounts used in the determination method are referred to as determination elements.
  • Points are added and subtracted at predetermined time intervals, throughout the conversation, from the start of the conversation to the current time, or over the last 30% of the conversation. Note that the time range over which points are added and subtracted is not limited to these and may be arbitrarily determined by the user.
  • the speech rate represents the percentage of time that a user (for example, person A or person B) speaks within a predetermined period of time. For example, if the user speaks for a total of 1.0 seconds out of 2.5 seconds, the speaking rate is 1.0/2.5, which is 0.4 (that is, 40%).
  • the speech rate can be calculated, for example, by extracting the user's speech time at specific time intervals and dividing the total of the extracted speech times by the extracted total time. Note that the method of calculating the speech rate is one example and is not limited to this. Calculation of the utterance rate may be performed by the feature amount extraction section 12A, or may be performed by the satisfaction level estimation section 24A based on the feature amount data acquired from the feature amount extraction section 12A.
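  • A minimal sketch of the speech-rate calculation above; the values reproduce the worked example of 1.0 seconds of speech within a 2.5-second window, and the function name is an assumption.

```python
def speech_rate(total_speaking_time, window_length):
    """Proportion of a time window in which the user was speaking."""
    return total_speaking_time / window_length

print(speech_rate(1.0, 2.5))  # -> 0.4, i.e. 40 %
```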
  • when the speech rate of person A is equal to or greater than the speech rate of person B, the satisfaction estimation algorithm adds 0.5 points to the satisfaction level. Note that the numerical values added and subtracted below are merely examples; they are not limited to 0.5 and may be any predetermined value. If the speech rate of person A is less than the speech rate of person B, the satisfaction estimation algorithm deducts 0.5 points from the satisfaction level.
  • points may be added or subtracted by taking into consideration not only the speech rate of person A relative to the speech rate of person B, but also whether the speech rate of person A is equal to or higher than a preset threshold. That is, when the speech rate of person A is equal to or greater than the speech rate of person B and the speech rate of person A is equal to or greater than the first threshold value, the satisfaction level estimation algorithm adds 0.5 points to the satisfaction level. If the speech rate of person A is less than the speech rate of person B and the speech rate of person A is less than a second threshold that is less than or equal to the first threshold, the satisfaction level estimation algorithm subtracts 0.5 points from the satisfaction level.
  • the first threshold is, for example, 50%
  • the second threshold is, for example, 40%. Note that the values of the first threshold value and the second threshold value are merely examples, and may be changed as appropriate by the user (for example, person B).
  • Note that when only one of these two conditions is satisfied (for example, when the speech rate of person A is equal to or greater than that of person B but less than the first threshold), points may be added, points may be subtracted, or no points may be added or subtracted. Further, when the speech rate of person A is equal to or greater than the first threshold, points may be added regardless of the speech rate of person B, and when the speech rate of person A is less than the second threshold, points may be deducted regardless of the speech rate of person B.
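  • A minimal sketch of the speech-rate point rule, assuming the example thresholds above (first threshold 50%, second threshold 40%) and a 0.5-point step; leaving the score unchanged in the remaining cases is only one of the options described, not the only one.

```python
def update_by_speech_rate(satisfaction, rate_a, rate_b,
                          first_threshold=0.5, second_threshold=0.4, step=0.5):
    """Add or deduct points based on person A's speech rate relative to person B's
    and the first/second thresholds; otherwise leave the satisfaction level unchanged."""
    if rate_a >= rate_b and rate_a >= first_threshold:
        satisfaction += step
    elif rate_a < rate_b and rate_a < second_threshold:
        satisfaction -= step
    return satisfaction

print(update_by_speech_rate(3.0, rate_a=0.55, rate_b=0.45))  # -> 3.5
```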
  • Emotion is an index calculated from conversation audio data.
  • a positive rate, a neutral rate, and a negative rate are calculated based on the emotion, facial expression, or action.
  • the satisfaction estimation algorithm uses the positive rate and negative rate to add and subtract points.
  • the positive rate indicates the rate at which the user's (for example, person A) emotion is determined to be positive within a predetermined period of time.
  • Examples of feature amounts that are determined to be positive include person A's voice becoming louder, the pitch of person A's voice becoming higher, person A nodding, or person A smiling. Note that the feature amounts that are determined to be positive are merely examples and are not limited to these.
  • the neutral rate indicates the rate at which the emotions of the user (for example, person A) are determined to be neutral within a predetermined period of time.
  • the neutral state is a state in which it is assumed that person A's emotions are neither positive nor negative.
  • a neutral state is a state in which person A is calm.
  • the feature amount that is determined to be neutral is, for example, that person A has a straight face or that person A is standing still. Note that the feature amounts that are determined to be neutral are merely examples, and are not limited to these.
  • the negative rate indicates the rate at which the emotions of the user (for example, person A) are determined to be negative within a predetermined period of time.
  • Feature amounts that are determined to be negative include, for example, person A having a crying face, person A's voice becoming quieter, the pitch of person A's voice becoming lower, or person A tilting his or her head. Note that the feature amounts that are determined to be negative are merely examples and are not limited to these.
  • For example, suppose the evaluation system 100 makes two positive determinations, two negative determinations, and one neutral determination within 2.5 seconds.
  • the positive rate is (1+1)/5, which is 0.4 (that is, 40%).
  • the negative rate is (1+1)/5, which is 0.4 (that is, 40%).
  • the neutral rate is 1/5, which is 0.2 (that is, 20%).
  • the calculation of the positive rate, neutral rate, and negative rate may be performed by the feature amount extraction unit 12A, or may be performed by the satisfaction level estimation unit 24A based on the feature amount data acquired from the feature amount extraction unit 12A.
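  • A minimal sketch that reproduces the worked example above (two positive, two negative, and one neutral determination within 2.5 seconds); counting discrete determinations this way is an assumption about how the rates are aggregated.

```python
from collections import Counter

def emotion_rates(determinations):
    """Ratio of positive / neutral / negative determinations within a time window."""
    counts = Counter(determinations)
    total = len(determinations)
    return {label: counts[label] / total
            for label in ("positive", "neutral", "negative")}

# Two positive, two negative, and one neutral determination within 2.5 seconds.
print(emotion_rates(["positive", "positive", "negative", "negative", "neutral"]))
# {'positive': 0.4, 'neutral': 0.2, 'negative': 0.4}
```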
  • if the positive rate of person A is equal to or greater than the threshold for adding points, the satisfaction estimation algorithm adds 0.5 points to the satisfaction level. If the negative rate of person A is equal to or greater than the threshold for deducting points, the satisfaction estimation algorithm deducts 0.5 points from the satisfaction level.
  • the threshold for adding points is 50%. In this case, if the positive rate is 50% or more, the satisfaction estimation algorithm adds 0.5 points to the satisfaction. Note that the threshold value for adding points is not limited to 50% and may be changed as appropriate by the user.
  • the threshold for deducting points is 50%. In this case, if the negative rate is 50% or more, the satisfaction estimation algorithm reduces the satisfaction level by 0.5 points. Note that the threshold value for deducting points is not limited to 50% and may be changed as appropriate by the user.
  • if the time during which person A looks in the direction of the display is equal to or greater than a third threshold, the satisfaction estimation algorithm adds 0.5 points to the satisfaction level. If the time during which person A looks in the direction of the display is less than a fourth threshold, which is equal to or less than the third threshold, 0.5 points are deducted from the satisfaction level.
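  • A minimal sketch of the two point rules above; the 50% adding/deducting thresholds follow the earlier example, while the third and fourth thresholds (20 seconds and 10 seconds of gaze time) are values assumed purely for illustration.

```python
def update_by_emotion(satisfaction, positive_rate, negative_rate,
                      add_threshold=0.5, deduct_threshold=0.5, step=0.5):
    """Add points when the positive rate reaches the adding threshold and deduct
    points when the negative rate reaches the deducting threshold."""
    if positive_rate >= add_threshold:
        satisfaction += step
    if negative_rate >= deduct_threshold:
        satisfaction -= step
    return satisfaction

def update_by_gaze(satisfaction, gaze_time_sec,
                   third_threshold=20.0, fourth_threshold=10.0, step=0.5):
    """Add points when person A looks toward the display for at least the third
    threshold, deduct points when the time falls below the fourth threshold."""
    if gaze_time_sec >= third_threshold:
        satisfaction += step
    elif gaze_time_sec < fourth_threshold:
        satisfaction -= step
    return satisfaction

print(update_by_gaze(update_by_emotion(3.0, positive_rate=0.6, negative_rate=0.2),
                     gaze_time_sec=25.0))  # -> 4.0
```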
  • FIG. 5 is a diagram illustrating an example of calculating satisfaction levels at predetermined time intervals.
  • the satisfaction value at the start of evaluation is set to 3, and the satisfaction estimation algorithm repeatedly adds and subtracts satisfaction points.
  • the satisfaction value at the start of the evaluation is not limited to 3 and may be any value.
  • the satisfaction level is assumed to take a value between 0 and 5 points. Note that the range of values that the satisfaction level can take is not limited to 0 to 5 points, but may be in other ranges, and the range does not need to be set.
  • the graphs for Case CA and Case CB are plots of satisfaction values calculated every 30 seconds.
  • the horizontal axis of the graphs for case CA and case CB represents elapsed time, and the vertical axis represents satisfaction level.
  • Case CA and case CB are, for example, cases in which the conversation ends in 5 minutes.
  • in case CA, the satisfaction level estimation algorithm repeatedly adds or subtracts points every 30 seconds, and when 5 minutes have passed, the satisfaction level is 5 points, indicating that the user (for example, person A) ended the conversation with a high level of satisfaction.
  • in case CB, the satisfaction level estimation algorithm repeatedly adds or subtracts points every 30 seconds, and when 5 minutes have passed, the satisfaction level is 0 points, indicating that the user (for example, person A) ended the conversation with a low level of satisfaction.
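  • A minimal sketch of the repeated calculation shown in FIG. 5: starting from 3 points, applying one add/subtract result every 30 seconds over a 5-minute conversation, and clamping to the 0-to-5 range; the per-interval deltas are made up for illustration.

```python
def run_evaluation(deltas, start=3.0, lower=0.0, upper=5.0):
    """Apply one add/subtract result per 30-second interval, clamping to the 0-5 range."""
    history, satisfaction = [start], start
    for delta in deltas:
        satisfaction = min(upper, max(lower, satisfaction + delta))
        history.append(satisfaction)
    return history

# Ten 30-second intervals = a 5-minute conversation; mostly positive updates end
# near 5 points, as in case CA.
print(run_evaluation([0.5, 0.5, 0.0, 0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
```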
  • FIG. 6 is a sequence diagram of satisfaction evaluation processing according to the first embodiment.
  • the satisfaction level is evaluated by two terminal devices (terminal device 1AA, terminal device 1AB) and server 2A.
  • the evaluation system 100A calculates the satisfaction level of person A from a conversation between person A and person B.
  • person A who is the person to be evaluated, uses terminal device 1AA, and person B uses terminal device 1AB.
  • the number of terminal devices is not limited to two, and may be one, or three or more.
  • the terminal device 1AA sets the values of each threshold value related to addition and deduction of satisfaction points by the satisfaction estimation algorithm (St100). Note that the setting of the threshold value in the terminal device 1AA may be omitted from the process related to FIG. 6.
  • the terminal device 1AA starts evaluating the satisfaction level of person A (St101).
  • the start of the satisfaction evaluation is executed, for example, by the user (for example, person B) pressing a button to start evaluation displayed on the display device 16.
  • the terminal device 1AA acquires image data and audio data of person A (St102).
  • the terminal device 1AA extracts feature amounts based on the imaging data and audio data acquired in the process of step St102 (St103).
  • the terminal device 1AB sets the values of each threshold regarding addition and deduction of satisfaction points by the satisfaction estimation algorithm (St104).
  • the threshold value may be set arbitrarily by the person B, or may be automatically set based on a set value stored in the memory 14 in advance. Further, the setting of the threshold value may be performed not in the terminal device 1AB but in the server 2A.
  • the terminal device 1AB starts evaluating the satisfaction level of person A (St105).
  • the terminal device 1AB acquires image data and audio data of person B (St106).
  • the terminal device 1AB extracts feature amounts based on the imaging data and audio data acquired in the process of step St106 (St107).
  • the terminal device 1AA transmits the threshold value set in the process of step St100 and the feature amount extracted in the process of step St103 to the server 2A.
  • the terminal device 1AB transmits the threshold setting value set in the process of step St104 and the feature amount extracted in the process of step St107 to the server 2A (St108).
  • the server 2A calculates the satisfaction level based on the threshold setting value, the feature amount, and the satisfaction level estimation algorithm obtained in the process of step St108 (St109).
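  • A minimal sketch of the exchange in steps St108 and St109, assuming the thresholds and feature amounts are serialized as JSON; the field names and the simplified server-side calculation are assumptions, not the actual protocol or algorithm.

```python
import json

# Example of what a terminal device might send after steps St103/St107:
# threshold settings plus the extracted feature amounts (field names are assumed).
payload = json.dumps({
    "person": "A",
    "thresholds": {"first": 0.5, "second": 0.4, "add": 0.5, "deduct": 0.5},
    "features": {"speech_rate": 0.55, "positive_rate": 0.6,
                 "negative_rate": 0.1, "gaze_time_sec": 25.0},
})

def handle_satisfaction_request(raw, satisfaction=3.0):
    """Simplified server-side handling corresponding to step St109."""
    data = json.loads(raw)
    thresholds, features = data["thresholds"], data["features"]
    if features["speech_rate"] >= thresholds["first"]:
        satisfaction += 0.5
    if features["positive_rate"] >= thresholds["add"]:
        satisfaction += 0.5
    return satisfaction

print(handle_satisfaction_request(payload))  # -> 4.0
```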
  • the terminal device 1AA requests the server 2A to transmit the satisfaction results (St110). Note that the process of step St110 may be omitted from the process related to FIG. 6.
  • the terminal device 1AB requests the server 2A to send the satisfaction results (St111).
  • the server 2A draws a screen related to the satisfaction level results.
  • the server 2A transmits a screen on which the satisfaction level results are drawn to the terminal device 1AB (St112).
  • the server 2A transmits a screen on which the satisfaction level results are drawn to the terminal device 1AA (St113).
  • the process of step St113 may be omitted from the process related to FIG. 6.
  • the terminal device 1AA displays the screen acquired in the process of step St113 on the display of the terminal device 1AA (St114).
  • the process of step St114 may be omitted from the process related to FIG. 6.
  • the terminal device 1AB displays the screen acquired in the process of step St112 on the display of the terminal device 1AB (St115).
  • the server 2A transmits a signal to end the evaluation to the terminal device 1AA and the terminal device 1AB (St116).
  • the terminal device 1AA ends the satisfaction evaluation based on the signal acquired in the process of step St116 (St117).
  • the terminal device 1AB ends the satisfaction evaluation based on the signal acquired in the process of step St116 (St118).
  • the terminal device 1AA transmits a request to send the final satisfaction result to the server 2A (St119).
  • the process of step St119 may be omitted from the process related to FIG. 6.
  • the terminal device 1AB transmits a request to transmit the final satisfaction result to the server 2A (St120).
  • the server 2A draws a screen related to the final satisfaction result based on the request obtained in the process of step St120.
  • the server 2A transmits a screen showing the final result of the satisfaction level to the terminal device 1AB (St121).
  • the server 2A draws a screen showing the final result of the satisfaction level.
  • the server 2A transmits a screen showing the final result of the satisfaction level to the terminal device 1AA (St122).
  • the process of step St122 may be omitted from the process of FIG. 6.
  • the terminal device 1AA displays the screen acquired in the process of step St122 on the display of the terminal device 1AA (St123).
  • the process of step St123 may be omitted from the process related to FIG. 6.
  • the terminal device 1AB displays the screen acquired in the process of step St121 on the display of the terminal device 1AB (St124).
  • Embodiment 2: In the evaluation system according to Embodiment 2, the server collectively performs the processing from extraction of the feature amounts to calculation of the satisfaction level, based on the imaging data and audio data acquired by the terminal devices.
  • the same reference numerals will be used for the same components as in Embodiment 1, and the description thereof will be omitted.
  • FIG. 7 is a diagram showing an example of the internal configuration of a terminal device and a server according to the second embodiment. Only the parts that are different from the hardware block diagram according to the first embodiment shown in FIG. 3 will be explained.
  • the feature extraction unit 12A is incorporated into the processor 24 of the server 2B. That is, the terminal device 1B includes a communication I/F 13, a memory 14, an input device 15, a display device 16, an I/F 17, an audio acquisition device 10, and an imaging device 11.
  • the server 2B includes a communication I/F 21, a memory 22, an input device 23, an I/F 26, and a processor 24.
  • the processor 24 realizes the functions of the feature amount extraction section 12A, the satisfaction estimation section 24A, and the drawing screen creation section 24B.
  • the feature amount extraction unit 12A extracts feature amounts based on the audio data and image data acquired from the terminal device 1B.
  • FIG. 8 is a sequence diagram of satisfaction evaluation processing according to the second embodiment. Processes similar to those in the sequence diagram of FIG. 6 of the first embodiment are given the same reference numerals, and only different processes will be described.
  • the terminal device 1BA transmits the threshold value set in the process of step St100 and the imaging data and audio data acquired in the process of step St102 to the server 2B (St200).
  • the terminal device 1BB transmits the threshold value set in the process of step St104 and the imaging data and audio data acquired in the process of step St106 to the server 2B (St200).
  • the server 2B extracts feature amounts based on the imaging data and audio data acquired in the process of step St200 (St201).
  • the server 2B calculates the degree of satisfaction based on the feature amount extracted in the process of step St201 (St202).
  • the following processing is the same as each processing related to the sequence diagram of FIG. 6, so the explanation will be omitted.
  • Embodiment 3: In the evaluation system according to Embodiment 3, the terminal device or the server calculates the satisfaction level based on imaging data and audio data previously acquired by the terminal device (that is, video and audio recorded in the past).
  • the same reference numerals will be used for the same components as in Embodiment 1, and the description thereof will be omitted.
  • FIG. 9 is a diagram showing an example of the internal configuration of a terminal device according to the third embodiment. Only the parts that are different from the hardware block diagram according to the first embodiment shown in FIG. 3 will be explained.
  • the terminal device 1C includes a communication I/F 13, a memory 14, an input device 15, a display device 16, an audio acquisition device 10, an imaging device 11, and a processor 12.
  • the audio acquisition device 10 and the imaging device 11 may be omitted.
  • the communication I/F 13 may transmit the screen drawn by the drawing screen creation unit 24B of the processor 12 to another terminal device or the like. Further, when the audio acquisition device 10 and the imaging device 11 are external devices, the communication I/F 13 acquires image data captured in the past and audio data captured in the past from the external devices.
  • the feature amount extraction unit 12A of the processor 12 extracts feature amounts based on image data captured in the past and audio data captured in the past.
  • the feature extraction unit 12A outputs the extracted feature data to the satisfaction estimation unit 24A.
  • the feature extraction unit 12A obtains one file that includes the imaging data and audio data of both person A and person B.
  • the feature extraction unit 12A separates the one file into four pieces of data, namely the imaging data and audio data of person A and the imaging data and audio data of person B, using known techniques such as image recognition or voice recognition.
  • Note that the feature extraction unit 12A may obtain two files: a file containing the imaging data and audio data of person A, and a file containing the imaging data and audio data of person B.
  • the feature extraction unit 12A separates each file into image data and audio data using a known technique.
  • the input device 15 may obtain an input from a user (for example, person A or person B) regarding whether each of the two files is associated with person A or person B.
  • Alternatively, the feature extraction unit 12A may obtain four files: a file of imaging data of person A, a file of audio data of person A, a file of imaging data of person B, and a file of audio data of person B.
  • the input device 15 may obtain input from the user (for example, person A or person B) regarding whether each of the four files is associated with person A or person B.
  • the hardware block diagram of the third embodiment is similar to FIG. 7 of the second embodiment.
  • the server 2B acquires audio data previously acquired by the audio acquisition device 10 of the terminal device 1B and imaging data previously acquired by the imaging device 11 of the terminal device 1B.
  • the feature amount extraction unit 12A of the server 2B extracts the feature amount based on the acquired audio data and image data, and the satisfaction level estimation unit 24A calculates the satisfaction level based on the extracted feature amount.
  • FIG. 10 is a flowchart illustrating the process of calculating the degree of satisfaction on the terminal device. Each process related to FIG. 10 is executed by the processor 12.
  • the processor 12 sets the values of each threshold regarding addition and deduction of satisfaction points by the satisfaction estimation algorithm (St300).
  • the processor 12 may set the threshold values by obtaining an input signal from the user (for example, person B) via the input device 15, or may set them automatically based on setting values stored in advance in the memory 14.
  • the processor 12 acquires previously captured image data and captured audio data stored in the memory 14 (St301). Note that the processor 12 is not limited to past data, and may acquire data currently being acquired by the audio acquisition device 10 and the imaging device 11 of the terminal device 1C.
  • the processor 12 extracts feature amounts from the imaging data and audio data acquired in the process of step St301 (St302).
  • the processor 12 calculates the satisfaction level of the user (for example, person A) based on the feature amount extracted in the process of step St302 (St303).
  • the processor 12 draws a screen showing the satisfaction level calculated in the process of step St303 (St304).
  • FIG. 11 is a sequence diagram showing a process in which the server calculates the satisfaction level from previously captured image data and audio data.
  • the terminal device 1B transmits to the server 2B threshold setting information regarding addition and deduction of satisfaction points based on the satisfaction estimation algorithm (St400).
  • the terminal device 1B transmits image data captured in the past and audio data captured in the past to the server 2B (St401).
  • the server 2B extracts feature amounts based on the imaging data and audio data acquired in the process of step St401 (St402).
  • the server 2B calculates the degree of satisfaction based on the feature amount acquired in the process of step St402 (St403).
  • the terminal device 1B requests the server 2B to send the final result of the satisfaction level (St404).
  • the server 2B draws a screen including the final satisfaction result based on the request received from the terminal device 1B in the process of step St404.
  • the server 2B transmits the drawn screen to the terminal device 1B (St405).
  • the terminal device 1B displays a screen including the final result of the satisfaction level obtained in the process of step St405 (St406).
  • FIG. 12 is a diagram showing an example of a screen displayed on a terminal device.
  • Screen MN1 is an example of a screen displayed on terminal device 1 at a certain moment during a meeting. For example, if Person A and Person B are having a meeting and Person A is the person to be evaluated, screen MN1 is the screen that Person B refers to. Screen MN1 includes display areas IT1, IT2 and buttons BT1, BT2, BT3, BT4, BT5, and BT6.
  • the display area IT2 is an area where the captured video of the person A is displayed in real time.
  • the drawing screen creation unit 24B displays the captured video of the person A acquired from the imaging device 11 in the display area IT2.
  • the display area IT1 is an area where the satisfaction results are displayed.
  • the display area IT1 displays a graph in which satisfaction values calculated at predetermined time intervals are plotted.
  • the drawing screen creation unit 24B may display the satisfaction level in the display area IT1 at the timing when the satisfaction level is acquired from the satisfaction level estimation unit 24A.
  • Note that the display area IT1 is not limited to graphs; it may display satisfaction values calculated from the start of the meeting up to the present as numbers, or may display satisfaction values calculated at predetermined time intervals numerically each time.
  • Further, the display area IT1 may display the current satisfaction level as text such as "high", "medium", or "low" based on the calculated satisfaction value, or may display an emoticon or pictogram corresponding to the satisfaction level.
  • the button BT1 is a button that turns on the display of the captured image of the user on the other party's terminal device 1.
  • the button BT2 is a button for turning off the display of the user's captured video on the other party's terminal device 1.
  • the button BT3 is a button that turns on the output of your own voice to the terminal device 1 of the other party.
  • the button BT4 is a button for turning off the output of your own voice to the terminal device 1 of the other party.
  • the button BT5 is a button for starting or ending satisfaction evaluation. Button BT5 may be omitted from screen MN1.
  • the button BT6 is a button for starting or ending a conference.
  • Screen MN2 is an example of a screen displayed on the terminal device 1 at a certain moment during the meeting.
  • Screen MN2 is a screen displayed on terminal device 1 when one minute has passed since screen MN1 was displayed on terminal device 1.
  • the display area IT3 is an area where the satisfaction results are displayed.
  • the display area IT3 displays a graph in which satisfaction values calculated at predetermined time intervals are plotted.
  • the display area IT3 displays a graph in which two satisfaction results are additionally plotted on the graph displayed in the display area IT1 as the conversation between person A and person B progresses for one minute. In this way, in the display area IT3, satisfaction results are additionally plotted in real time according to the elapsed time.
  • FIG. 13 is a diagram showing an example of a screen on which a message is displayed according to the satisfaction level result.
  • elements that overlap with those in FIG. 12 are given the same reference numerals to simplify or omit the description, and different contents will be described.
  • Screen MN3 is an example of a screen displayed on terminal device 1 at a certain moment during a meeting.
  • Screen MN3 is a screen that is displayed on terminal device 1 when one minute has elapsed since screen MN1 was displayed on terminal device 1.
  • the display area IT4 is an area where the satisfaction results are displayed.
  • the display area IT4 displays a graph in which satisfaction values calculated at predetermined time intervals are plotted.
  • the display area IT4 displays a graph in which two satisfaction results are additionally plotted on the graph displayed in the display area IT1 as the conversation between person A and person B progresses for one minute.
  • the message Mes is a message displayed according to the satisfaction level. For example, the message Mes is displayed according to person A's speaking rate.
  • when determining that the speech rate of person A is less than the speech rate of person B, the satisfaction estimation unit 24A outputs a signal indicating that the speech rate of person A is less than the speech rate of person B to the drawing screen creation unit 24B.
  • the satisfaction level estimation unit 24A when determining that the speech rate of person A is less than the second threshold, the satisfaction level estimation unit 24A outputs a signal indicating that the speech rate of person A is less than the second threshold to the drawing screen creation unit 24B.
  • Note that the determination as to whether the speech rate of person A is less than the speech rate of person B and the determination as to whether the speech rate of person A is less than the second threshold may be performed by the feature amount extraction unit 12A.
  • the drawing screen creation unit 24B creates a message for the person B to refrain from speaking based on the signal acquired from the satisfaction level estimation unit 24A, and causes the message to be displayed on the screen MN3.
  • the message to refrain from speaking is, for example, "Let's listen to what Person A has to say.” Note that the message to refrain from speaking is one example and is not limited to this.
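  • A minimal sketch of selecting this message, assuming the trigger is the speech-rate conditions described above; the function and its arguments are illustrative and the message text follows the example.

```python
def choose_message(rate_a, rate_b, second_threshold=0.4):
    """Return a prompt asking person B to refrain from speaking when person A's
    speech rate is below person B's or below the second threshold."""
    if rate_a < rate_b or rate_a < second_threshold:
        return "Let's listen to what Person A has to say."
    return None

print(choose_message(rate_a=0.3, rate_b=0.6))
```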
  • the terminal device 1 or the server 2 may calculate the satisfaction level of each of a plurality of people and calculate the average value of the satisfaction levels of all the people. In this way, the terminal device 1 or the server 2 may aggregate the satisfaction level and notify the user without identifying the individual.
  • the terminal device 1 may display something that attracts the viewer's attention, such as an avatar, on the screen that person B is viewing.
  • the avatar and the like may also be displayed on the screen of person A.
  • the evaluation system can improve the satisfaction level of person A by displaying an avatar on the screen to attract the attention of person A and person B to the screen.
  • the terminal device 1 may display a notification that the person A is currently thinking.
  • the terminal device 1 or the server 2 may calculate the degree of satisfaction without displaying the captured video of the person having the conversation on the display device 16 (that is, with the display of the captured video turned off).
  • the evaluation system according to this embodiment includes an acquisition unit (for example, the audio acquisition device 10) that acquires audio data related to a conversation between a first person and a second person, and an imaging unit (for example, the imaging device 11) that images the first person and the second person.
  • the evaluation system also includes an extraction unit (for example, the feature amount extraction unit 12A) that extracts, based on the imaging data of the imaging unit, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data.
  • the evaluation system further includes a satisfaction calculation unit (for example, the satisfaction level estimation unit 24A) that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
  • the evaluation system can calculate the satisfaction level based on two pieces of information: information related to the first person's line of sight or face direction, and information related to the first person's voice data. Thereby, the evaluation system can perform highly accurate satisfaction evaluation using a plurality of pieces of information included in conversations between people.
  • the satisfaction calculation unit of the evaluation system of this embodiment calculates the satisfaction at predetermined time intervals from the start to the end of the conversation. Thereby, the evaluation system can calculate the satisfaction level of the first person at each time from the start of the conversation until the end of the conversation, and can perform a flexible evaluation of the satisfaction level.
  • the extraction unit of the evaluation system of the present embodiment calculates, as second feature amounts, a first ratio indicating the proportion of the conversation in which the first person is speaking and a second ratio indicating the proportion in which the second person is speaking.
  • the calculation algorithm adds a predetermined value to the satisfaction value when the first ratio is greater than or equal to the second ratio, and subtracts a predetermined value from the satisfaction value when the first ratio is less than the second ratio.
  • the evaluation system can evaluate the degree of satisfaction according to the speech rate of the first person relative to the speech rate of the second person.
  • the calculation algorithm of the evaluation system of the present embodiment adds a predetermined value to the satisfaction value when the first ratio is equal to or higher than the second ratio and the first ratio is equal to or higher than the first threshold value.
  • the calculation algorithm subtracts a predetermined value from the satisfaction value when the first ratio is less than the second ratio and the first ratio is less than the second threshold, which is less than or equal to the first threshold.
  • the extraction unit of the evaluation system of the present embodiment detects the emotion of the first person from the voice data as the second feature amount, and calculates from the detected emotion a positive rate, which is the rate at which the first person is determined to feel positive, and a negative rate, which is the rate at which the first person is determined to feel negative.
  • the calculation algorithm adds a predetermined value to the satisfaction value when the positive rate is greater than or equal to a threshold for adding points, and subtracts a predetermined value from the satisfaction value when the negative rate is greater than or equal to a threshold for subtracting points.
  • the evaluation system can evaluate the degree of satisfaction based on the emotion detected from the voice data of the first person.
  • the evaluation system of this embodiment further includes a first display section (for example, display device 16) on which a second person is displayed when the first person has a conversation.
  • the extraction unit calculates a time period during which the first person looks at the first display unit as the first feature amount.
  • the calculation algorithm adds a predetermined value to the satisfaction value when the time is equal to or greater than a third threshold, and subtracts a predetermined value from the satisfaction value when the time is less than a fourth threshold that is equal to or less than the third threshold.
  • the evaluation system can evaluate the degree of satisfaction based on the time the first person looks at the first display section.
  • the calculation algorithm of the evaluation system of this embodiment calculates the degree of satisfaction based on machine learning.
  • the evaluation system can calculate the degree of satisfaction from the feature amount data using a calculation algorithm based on machine learning.
  • the evaluation system of the present embodiment further includes a second display unit (for example, the display device 16) that displays a screen that the second person refers to when having a conversation, and a screen creation unit (for example, the drawing screen creation unit 24B) that creates the screen.
  • the screen creation unit creates a screen including the satisfaction result calculated by the satisfaction calculation unit and causes the second display unit to display the screen. This allows the second person, who is the evaluator, to confirm the satisfaction level results of the first person. Thereby, the evaluation system can support the first person to have a conversation with a high level of satisfaction by notifying the second person of the result of the first person's satisfaction level.
  • the evaluation system further includes a second display unit that displays a screen that the second person refers to when having a conversation, and a screen creation unit that creates the screen.
  • the screen creation unit acquires, from the satisfaction level calculation unit, the satisfaction level calculated as the conversation between the first person and the second person progresses, and displays the satisfaction level on the second display unit.
  • This allows the second person to check the first person's current satisfaction level while conversing with the first person.
  • the evaluation system can support the first person to have a conversation so that the first person has a high degree of satisfaction.
  • the evaluation system further includes a second display unit that displays a screen that the second person refers to when having a conversation, and a screen creation unit that creates the screen.
  • the screen creation unit displays on the screen a message to the effect that the second person should refrain from speaking.
  • the evaluation system can display a message that helps increase the satisfaction level of the first person based on the speech rates of the first person and the second person.
  • the screen created by the screen creation unit of the evaluation system includes a display area in which the captured video of the first person is displayed, a display area in which the satisfaction result is displayed, a button for displaying the captured video of the second person on the screen referenced by the first person, a button for outputting the audio data of the second person from the terminal device used by the first person, and a button for controlling the start or end of the conference.
  • the evaluation system can display a screen including the satisfaction result to the second person.
  • the evaluation system can support the second person to have a smooth conversation with the first person by notifying the second person of the satisfaction level of the first person.
  • the extraction unit of the evaluation system extracts the first feature amount related to the line of sight or face direction of each of the first person and the second person and the second feature amount of the audio data, based on audio data acquired in advance by the acquisition unit and imaging data of the first person and the second person captured in advance by the imaging unit. Thereby, the evaluation system can extract the feature amounts from previously captured imaging data and previously recorded audio data and evaluate the satisfaction level of the first person.
  • the extraction unit of the evaluation system extracts a third feature amount related to the facial expressions of each of the first person and the second person based on the imaging data of the imaging unit.
  • the satisfaction level calculation unit calculates the satisfaction level of the first person based on the third feature amount and the calculation algorithm. Thereby, the evaluation system can evaluate the degree of satisfaction from the feature amount based on the facial expression of the first person.
  • the extraction unit of the evaluation system extracts a fourth feature amount related to each of the actions of the first person and the second person based on the imaging data of the imaging unit.
  • the satisfaction level calculation unit calculates the satisfaction level of the first person based on the fourth feature amount and the calculation algorithm. Thereby, the evaluation system can evaluate the degree of satisfaction from the feature amount based on the behavior of the first person.
  • the second feature amount used in the evaluation system according to the present embodiment is at least one of the voice intensity, the number of moras per unit time, the intensity of each word, the volume, or the voice spectrum. Thereby, the evaluation system can calculate the first person's emotion from the second feature amount.
  • the second person in this embodiment has an interpersonal relationship with the first person, and the interpersonal relationship includes at least one of the following: between a boss and a subordinate, between an employee and a customer, between colleagues, or between an interviewer and an interviewee.
  • the evaluation system can evaluate the satisfaction level of the first person in a situation where the second person has a conversation with the first person with whom he or she has an interpersonal relationship.
  • the evaluation system further includes a calculation algorithm storage unit (for example, the memory 14 or the memory 22) that stores the calculation algorithm.
  • the evaluation system can evaluate the satisfaction level of the first person based on the calculation algorithm stored in the calculation algorithm storage unit.
  • the technology of the present disclosure is useful as an evaluation system, an evaluation device, and an evaluation method that perform highly accurate satisfaction evaluation using multiple pieces of information included in conversations between people.

Abstract

This evaluation system comprises: an acquisition unit for acquiring speech data pertaining to a conversation between a first person and a second person; an imaging unit which images the first person and the second person; an extraction unit which extracts a first feature amount pertaining to the line of sight or the orientation of the face of each of the first person and the second person on the basis of the imaging data from the imaging unit, and a second feature amount of the speech data; and a satisfaction level calculation unit which calculates the satisfaction level of the first person on the basis of the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.

Description

Evaluation system, evaluation device, and evaluation method
The present disclosure relates to an evaluation system, an evaluation device, and an evaluation method.
Patent Document 1 discloses a store management system that calculates the employee satisfaction level of a store clerk based on a conversation between the store clerk and a conversation partner. The store management system stores a calculation algorithm for calculating employee satisfaction for each type of person who can be a conversation partner. The store management system acquires a conversation between the store clerk and the conversation partner, recognizes the store clerk's emotion based on the store clerk's voice included in the conversation, and determines the type of the conversation partner. The store management system then calculates the employee satisfaction level based on the recognition result of the store clerk's emotion and the calculation algorithm corresponding to the determined type of conversation partner.
Japanese Patent Application Publication No. 2011-237957
In recent years, there has been a demand to measure and calculate a person's satisfaction level from the viewpoint of objectively understanding the person's psychological state. For example, there has been a need to use a technique for calculating a person's satisfaction level in order to maintain or improve employee motivation in the workplace. Note that situations in which such a technique is required are not limited to the above-mentioned example; it may also be used, for example, to improve services based on customer satisfaction.
In Patent Document 1, emotion information is calculated from voice data based on a conversation between people (for example, a store clerk and a customer), and the store clerk's satisfaction level (that is, the employee satisfaction level) is calculated from the calculated emotion information. However, evaluating a person's satisfaction level based only on the voice data included in a conversation may not be sufficiently accurate, and a more accurate evaluation of satisfaction has been required.
The present disclosure has been devised in view of the conventional situation described above, and aims to perform a highly accurate evaluation of satisfaction using multiple pieces of information included in a conversation between people.
The present disclosure provides an evaluation system including: an acquisition unit that acquires audio data related to a conversation between a first person and a second person; an imaging unit that images the first person and the second person; an extraction unit that extracts, based on imaging data of the imaging unit, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data; and a satisfaction level calculation unit that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
The present disclosure also provides an evaluation device including: an extraction unit that acquires audio data related to a conversation between a first person and a second person and imaging data obtained by imaging the first person and the second person, and extracts, based on the imaging data, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data; and a satisfaction level calculation unit that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
The present disclosure also provides an evaluation method including: acquiring audio data related to a conversation between a first person and a second person; imaging the first person and the second person; extracting, based on the imaging data, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data; and calculating the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
Note that these comprehensive or specific aspects may be realized by a system, a device, a method, an integrated circuit, a computer program, or a recording medium, or by any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium.
According to the present disclosure, it is possible to perform a highly accurate evaluation of satisfaction using multiple pieces of information included in a conversation between people.
Diagram showing an overview of this embodiment
Diagram showing an example of feature amounts
Block diagram showing an example of the internal configuration of a terminal device and a server according to Embodiment 1
Diagram showing a method of calculating the satisfaction level with a logic-based algorithm
Diagram showing an example in which the satisfaction level is calculated at predetermined time intervals
Sequence diagram of the satisfaction evaluation process according to Embodiment 1
Diagram showing an example of the internal configuration of a terminal device and a server according to Embodiment 2
Sequence diagram of the satisfaction evaluation process according to Embodiment 2
Diagram showing an example of the internal configuration of a terminal device according to Embodiment 3
Flowchart showing the process of calculating the satisfaction level on a terminal device
Sequence diagram showing the process in which the server calculates the satisfaction level from previously captured image and audio data
Diagram showing an example of a screen displayed on a terminal device
Diagram showing an example of a screen on which a message is displayed according to the satisfaction result
Hereinafter, embodiments that specifically disclose an evaluation system, an evaluation device, and an evaluation method according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, more detailed description than necessary may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially the same configurations may be omitted. This is to avoid making the following description unnecessarily redundant and to facilitate understanding by those skilled in the art. The accompanying drawings and the following description are provided to enable those skilled in the art to fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.
<Summary of the invention>
First, an overview of this embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an overview of this embodiment.
FIG. 1 shows a case where person A is having a conversation with person B using a terminal device 1 connected via a network to the terminal device used by person B. Person A and person B have an interpersonal relationship; for example, person A is person B's subordinate and person B is person A's superior. Note that the relationship between person A and person B is not limited to that between a boss and a subordinate, and may be between an employee and a customer, between colleagues, between an interviewer and an interviewee, or another relationship (for example, between a teacher and a student). For example, a case is assumed in which person B, who is a boss, interviews person A, who is a subordinate, online. In the following description, person A may be read as the first person, and person B may be read as the second person.
The audio acquisition device 10 is, for example, a microphone, and picks up person A's utterance CO "○○××." The audio acquisition device 10 may be built into the terminal device 1 or may be an external device communicably connected to the terminal device 1. Hereinafter, the data picked up by the audio acquisition device 10 is referred to as audio data.
The imaging device 11 is, for example, a camera, and images person A. The imaging device 11 may be built into the terminal device 1 or may be an external device communicably connected to the terminal device 1. Hereinafter, the data of person A captured by the imaging device 11 is referred to as imaging data.
The terminal device 1 transmits the audio data acquired by the audio acquisition device 10 and the imaging data acquired by the imaging device 11 to a device that extracts feature amounts, for example, a server. Note that the terminal device 1 may extract the feature amounts itself without transmitting the audio data and the imaging data to the server.
The feature amounts extracted from the imaging data and the audio data are, for example, facial expression, line of sight, speech, or behavior. Note that the feature amounts extracted from the imaging data and the audio data are not limited to these. Information on facial expression or line of sight is extracted from the image FR1 representing the face of person A in the imaging data. Information on behavior is extracted from the image FR2 representing the upper body of person A in the imaging data. Information on speech is extracted from the audio data. The information related to facial expression, line of sight, speech, or behavior is extracted by the terminal device 1 or the server 2 (see FIG. 7), as will be described in detail later.
The satisfaction level is calculated using the extracted feature amount data (hereinafter referred to as feature amount data) and an algorithm for estimating the satisfaction level (hereinafter referred to as a satisfaction estimation algorithm). The satisfaction level is an index representing the degree of satisfaction of person A with the conversation with person B, estimated by the satisfaction estimation algorithm based on the feature amount data. Satisfaction estimation algorithms include algorithms based on predetermined logic (hereinafter referred to as logic-based algorithms) and algorithms based on machine learning (hereinafter referred to as machine-learning-based algorithms).
A logic-based algorithm is an algorithm that defines a procedure for calculating the satisfaction level by repeatedly adding and deducting points based on predetermined logic.
A machine-learning-based algorithm is, for example, an algorithm that uses deep learning based on a multilayer perceptron, a random forest, or a convolutional neural network and directly outputs the satisfaction level from the feature amount data.
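The disclosure does not fix a specific model or feature layout for the machine-learning-based algorithm. The following is a minimal sketch, assuming a scikit-learn random forest regressor and a hypothetical five-element feature vector, of how such an algorithm could output a satisfaction level directly from feature amount data.

```python
# Illustrative sketch only: the model choice, feature layout, and training data
# below are assumptions, not part of the disclosure.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature vector per evaluation window:
# [speech ratio of person A, speech ratio of person B, positive rate,
#  negative rate, fraction of time looking at the display]
X_train = np.array([
    [0.6, 0.4, 0.5, 0.1, 0.9],
    [0.3, 0.7, 0.1, 0.6, 0.4],
    [0.5, 0.5, 0.3, 0.2, 0.8],
])
y_train = np.array([4.5, 1.0, 3.5])  # satisfaction labels on a 0-5 scale

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

new_window = np.array([[0.55, 0.45, 0.4, 0.1, 0.85]])
print(model.predict(new_window))  # estimated satisfaction for the new window
```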
In this way, when it is difficult to read the other party's psychological state, such as in an online interview, person B can check the satisfaction level calculated from the audio data and the imaging data and thereby communicate smoothly so that person A is highly satisfied. Note that the conversation between person A and person B is not limited to an online interview or the like, and may be a face-to-face conversation.
Next, an example of the feature amounts will be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the feature amounts.
The feature amounts extracted from "facial expression" are, for example, a smiling face, a straight face, or a crying face.
The feature amounts extracted from "line of sight" are, for example, the angle of the line of sight or the angle of the face direction.
The feature amounts extracted from "speech" are, for example, the speech time or the emotion.
The feature amounts extracted from "movement" are, for example, nodding, remaining still, or tilting the head.
Note that the feature amounts extracted from the above-mentioned "facial expression", "line of sight", "speech", and "movement" are merely examples and are not limited to these.
<Embodiment 1>
In the evaluation system 100 according to Embodiment 1, when users (for example, person A and person B) are having an online conversation, the feature amount data of each user is extracted on the terminal device, the extracted feature amount data is transmitted to the server, and the satisfaction level is calculated on the server.
An example of the internal configuration of each of the terminal device and the server according to Embodiment 1 will be described with reference to FIG. 3. FIG. 3 is a block diagram showing an example of the internal configuration of each of the terminal device and the server according to Embodiment 1.
The evaluation system 100A includes at least a terminal device 1A and a server 2A. The number of terminal devices is not limited to one and may be two or more. The terminal device 1A is an example of a terminal used by a user. Note that when the evaluation system, the terminal devices, and the servers are distinguished from one another, a letter is appended after the reference numeral; when they are not distinguished, only the numeral is used.
The terminal device 1A and the server 2A are communicably connected via a network NW. Note that the terminal device 1A and the server 2A may be communicably connected via a wired LAN (Local Area Network). The terminal device 1A and the server 2A may also communicate wirelessly (for example, over a wireless LAN such as Wi-Fi (registered trademark)) without going through the network NW.
The terminal device 1A includes at least a communication I/F 13, a memory 14, an input device 15, a display device 16, an I/F 17, the audio acquisition device 10, the imaging device 11, and a processor 12. The terminal device 1A is, for example, a PC (Personal Computer), a tablet, a mobile terminal, or a housing equipped with the audio acquisition device 10 and the imaging device 11.
The communication I/F 13 is a network interface circuit that communicates wirelessly or by wire with the network NW. Here, I/F stands for interface. The terminal device 1A is communicably connected to the server 2A via the communication I/F 13 and the network NW. The communication I/F 13 transmits the feature amount data extracted by the feature amount extraction unit 12A (described later) to the server 2A. Communication methods used by the communication I/F 13 include, for example, WAN (Wide Area Network), LAN (Local Area Network), mobile communication such as LTE (Long Term Evolution) and 5G, power line communication, short-range wireless communication (for example, Bluetooth (registered trademark) communication), and communication for mobile phones.
The memory 14 includes, for example, a RAM (Random Access Memory) serving as a work memory used when each process of the processor 12 is executed, and a ROM (Read Only Memory) that stores programs and data defining the operation of the processor 12. Data or information generated or acquired by the processor 12 is temporarily stored in the RAM. A program that defines the operation of the processor 12 is written in the ROM.
The input device 15 receives input from a user (for example, person A or person B). The input device 15 is, for example, a touch panel display or a keyboard. The input device 15 receives operations in response to instructions displayed on the display device 16.
The display device 16 displays a screen (described later) created by the drawing screen creation unit 24B of the server 2. The display device 16 is, for example, a display or the monitor of a notebook PC.
The I/F 17 is a software interface. The I/F 17 is communicably connected to the communication I/F 13, the memory 14, the input device 15, the display device 16, the audio acquisition device 10, the imaging device 11, and the processor 12, and exchanges data with each device. Note that the I/F 17 may be omitted from the terminal device 1A, and the devices of the terminal device 1A may exchange data with one another directly.
The audio acquisition device 10 picks up the utterances of a user (for example, person A or person B). The audio acquisition device 10 is configured with a microphone device capable of picking up the sound generated by the user's utterances (that is, detecting an audio signal). The audio acquisition device 10 picks up the sound generated by the user's utterances, converts it into an electrical signal as audio data, and outputs it to the I/F 17.
The imaging device 11 is a camera that images a user (for example, person A or person B). The imaging device 11 includes at least a lens (not shown) as an optical element and an image sensor (not shown). The lens receives light reflected by an object within the angle of view of the area imaged by the imaging device 11 and forms an optical image of the object on the light receiving surface (in other words, the imaging surface) of the image sensor. The image sensor is a solid-state imaging element such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor). The image sensor converts the optical image formed on the imaging surface through the lens into an electrical signal and sends it to the I/F 17 at predetermined time intervals (for example, every 1/30 second).
Note that the audio acquisition device 10 and the imaging device 11 may be external devices communicably connected to the terminal device 1A.
The processor 12 is a semiconductor chip on which at least one of electronic devices such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array) is mounted. The processor 12 functions as a controller that governs the overall operation of the terminal device 1A, and performs control processing for supervising the operation of each unit of the terminal device 1A, data input/output processing with the I/F 17, data arithmetic processing, and data storage processing. The processor 12 realizes the function of the feature amount extraction unit 12A. The processor 12 uses the RAM of the memory 14 during operation and temporarily stores data generated or acquired by the processor 12 in the RAM of the memory 14.
The feature amount extraction unit 12A extracts the feature amounts (see FIG. 2) based on the audio data acquired from the audio acquisition device 10 and the imaging data acquired from the imaging device 11. The feature amount extraction unit 12A may extract each feature amount from the audio data and the imaging data using, for example, trained model data for AI (Artificial Intelligence) processing stored in the memory 14 (in other words, based on AI).
The feature amount extraction unit 12A detects the face of person A from the imaging data and also detects the direction of both eyes (that is, the left eye and the right eye) of the detected face (in other words, the line of sight). The feature amount extraction unit 12A detects the line of sight of person A, who is viewing the screen displayed on the display device 16 (for example, the captured video of person B). The line-of-sight detection method can be realized by a known technique; for example, the line of sight may be detected based on the difference in the direction of both eyes appearing in each of a plurality of captured images (frames), or another detection method may be used.
The feature amount extraction unit 12A detects the face of person A from the imaging data and also detects the direction of the face. The direction of the face is the angle of the face with respect to a specific location on the display device 16 (for example, the center position of the panel of the display device 16). In other words, the angle of the face is a vector representation composed of an azimuth angle and an elevation angle indicating the three-dimensional direction, as seen from that specific location on the display device 16, in which the face of person A looking at that location exists. Note that the specific location is not limited to the center position of the panel. The face direction detection method can be realized by a known technique.
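As a minimal sketch of the azimuth/elevation representation described above, the following computes the two angles from an assumed 3-D position of the face relative to the panel center; the coordinate convention, values, and function name are illustrative assumptions and not part of the disclosure.

```python
# Sketch only: coordinate convention (x right, y up, z toward the person) is assumed.
import numpy as np

def face_direction_angles(panel_center, face_position):
    """Return (azimuth, elevation) in degrees of the face as seen from the panel center."""
    v = np.asarray(face_position, dtype=float) - np.asarray(panel_center, dtype=float)
    azimuth = np.degrees(np.arctan2(v[0], v[2]))                     # left/right angle
    elevation = np.degrees(np.arctan2(v[1], np.hypot(v[0], v[2])))   # up/down angle
    return azimuth, elevation

# Example: face 0.1 m to the right, 0.05 m above, 0.6 m in front of the panel center
print(face_direction_angles((0.0, 0.0, 0.0), (0.1, 0.05, 0.6)))
```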
The feature amount extraction unit 12A detects the face of person A from the imaging data and also detects the facial expression of person A. The facial expression detection method can be realized by a known technique.
The feature amount extraction unit 12A detects the movement of person A from the imaging data. The movement detection method can be realized by a known technique.
The feature amount extraction unit 12A detects the speech time of person A from the audio data. The speech time may be detected, for example, by accumulating the time of the portions of the audio data in which person A's voice signal is detected. Note that the speech time detection method may be realized by another known technique. The feature amount extraction unit 12A also calculates the proportion of time during which each of person A and person B is speaking, based on the detected speech times.
The feature amount extraction unit 12A detects the emotion of person A from the audio data. The feature amount extraction unit 12A detects the emotion by detecting, for example, the voice intensity, the number of moras per unit time, the intensity of each word, the volume, or the voice spectrum from the audio data. Note that the emotion detection method is not limited to this and may be realized by another known technique.
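As a rough illustration, the following sketch computes two of the acoustic quantities listed above (volume and magnitude spectrum) over short windows with NumPy. Mora counting and the actual emotion decision would require dedicated speech models and are not shown; all names and the 0.5-second frame length are assumptions.

```python
# Sketch only: simple per-frame acoustic quantities, not an emotion detector.
import numpy as np

def frame_features(samples, sample_rate, frame_len=0.5):
    n = int(frame_len * sample_rate)
    feats = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = np.sqrt(np.mean(frame ** 2))      # volume (RMS intensity)
        spectrum = np.abs(np.fft.rfft(frame))   # magnitude spectrum of the frame
        feats.append((rms, spectrum))
    return feats

# Example with a synthetic 1-second, 16 kHz signal
sr = 16000
t = np.arange(sr) / sr
signal = 0.1 * np.sin(2 * np.pi * 220 * t)
print(len(frame_features(signal, sr)))  # number of 0.5 s frames analysed
```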
The server 2A includes a communication I/F 21, a memory 22, an input device 23, an I/F 26, and a processor 24.
The communication I/F 21 transmits and receives data to and from each of the one or more terminal devices 1 via the network NW. The communication I/F 21 transmits, to the terminal device 1A, the data of the screen output from the I/F 26 to be displayed on the display device 16.
The memory 22 includes, for example, a RAM serving as a work memory used when the processor 24 executes each process, and a ROM that stores programs and data defining the operation of the processor 24. Data or information generated or acquired by the processor 24 is temporarily stored in the RAM. A program that defines the operation of the processor 24 is written in the ROM. The memory 22 also stores the satisfaction estimation algorithm.
The input device 23 receives input from a user (for example, an administrator of the evaluation system 100). The input device 23 is, for example, a touch panel display or a keyboard. For example, the input device 23 receives settings of the thresholds (described later) of the logic-based algorithm.
The I/F 26 is a software interface. The I/F 26 is communicably connected to the communication I/F 21, the memory 22, the input device 23, and the processor 24, and exchanges data with each device. Note that the I/F 26 may be omitted from the server 2A, and the devices of the server 2A may exchange data with one another directly.
The processor 24 is a semiconductor chip on which at least one of electronic devices such as a CPU, a DSP, a GPU, and an FPGA is mounted. The processor 24 functions as a controller that governs the overall operation of the server 2A, and performs control processing for supervising the operation of each unit of the server 2A, data input/output processing with the I/F 26, data arithmetic processing, and data storage processing. The processor 24 realizes the functions of the satisfaction level estimation unit 24A and the drawing screen creation unit 24B. The processor 24 uses the RAM of the memory 22 during operation and temporarily stores data generated or acquired by the processor 24 in the RAM of the memory 22.
The satisfaction level estimation unit 24A calculates the satisfaction level of person A using the feature amount data acquired from the terminal device 1A and the satisfaction estimation algorithm recorded in the memory 22. The satisfaction level estimation unit 24A may calculate the satisfaction level using a logic-based algorithm or using a machine-learning-based algorithm. The satisfaction level estimation unit 24A outputs information on the calculated satisfaction level to the drawing screen creation unit 24B.
The drawing screen creation unit 24B creates a screen to be displayed on the display device 16 of the terminal device 1A using the satisfaction level acquired from the satisfaction level estimation unit 24A. The screen includes, for example, the captured video of person A, information on the satisfaction level, and a button for controlling the start of the satisfaction evaluation. Note that the items included on the screen are not limited to these. The information on the satisfaction level may be displayed, for example, by plotting the satisfaction value calculated at predetermined time intervals as a numerical value or on a graph each time, or by displaying the satisfaction value during or at the end of the meeting. The graph of the satisfaction value is, for example, a graph in which the values are plotted, a bar graph, or a meter. The drawing screen creation unit 24B outputs the created screen to the I/F 26.
Next, a method of calculating the satisfaction level with the logic-based algorithm will be described with reference to FIG. 4. FIG. 4 is a diagram showing a method of calculating the satisfaction level with the logic-based algorithm.
In the logic-based algorithm, the satisfaction level is calculated by adding and deducting points according to predetermined rules (hereinafter referred to as determination methods). The feature amounts used in a determination method are referred to as determination elements. Points are added and deducted over, for example, a predetermined time interval, the entire duration of the conversation, the period from the start of the conversation to the current time, or the last 30% of the conversation time. Note that the time range over which points are added and deducted is not limited to these and may be set arbitrarily by the user.
First, the determination method when the determination element is the "speech rate" will be described. The speech rate represents the proportion of time during which a user (for example, person A or person B) speaks within a predetermined period. For example, if the user speaks for a total of 1.0 second out of 2.5 seconds, the speech rate is 1.0/2.5, that is, 0.4 (40%). The speech rate can be calculated, for example, by extracting the user's speech times in a specific time interval and dividing the total of the extracted speech times by the length of that interval. Note that this method of calculating the speech rate is merely an example and is not limited to this. The speech rate may be calculated by the feature amount extraction unit 12A, or by the satisfaction level estimation unit 24A based on the feature amount data acquired from the feature amount extraction unit 12A.
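The speech-rate calculation described above can be expressed, for example, as follows (the function name is illustrative); the example reproduces the case of 1.0 s of speech within a 2.5 s window, giving 0.4.

```python
# Sketch of the speech-rate calculation; names are illustrative.
def speech_rate(speech_durations, window_seconds):
    """Fraction of the window during which the person was speaking."""
    return sum(speech_durations) / window_seconds

# Example from the text: 1.0 s of speech within a 2.5 s window -> 0.4 (40%)
print(speech_rate([0.4, 0.6], 2.5))  # 0.4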
When the speech rate of person A is equal to or higher than the speech rate of person B, the satisfaction estimation algorithm adds 0.5 to the satisfaction level. Note that the values added and deducted below are merely examples; they are not limited to 0.5 and may be any predetermined values. When the speech rate of person A is lower than the speech rate of person B, the satisfaction estimation algorithm deducts 0.5 from the satisfaction level.
As a determination method, points may also be added or deducted in consideration not only of person A's speech rate relative to person B's speech rate, but also of whether person A's speech rate is equal to or higher than a preset threshold. That is, when the speech rate of person A is equal to or higher than the speech rate of person B and the speech rate of person A is equal to or higher than a first threshold, the satisfaction estimation algorithm adds 0.5 to the satisfaction level. When the speech rate of person A is lower than the speech rate of person B and the speech rate of person A is lower than a second threshold that is equal to or less than the first threshold, the satisfaction estimation algorithm deducts 0.5 from the satisfaction level. The first threshold is, for example, 50%, and the second threshold is, for example, 40%. Note that the values of the first threshold and the second threshold are merely examples and may be changed as appropriate by the user (for example, person B).
When the speech rate of person A is equal to or higher than the speech rate of person B but lower than the first threshold, points may be deducted, or no points may be added or deducted.
When the speech rate of person A is lower than the speech rate of person B but equal to or higher than the second threshold, points may be added, or no points may be added or deducted.
When the speech rate of person A is equal to or higher than the first threshold, points may be added regardless of the speech rate of person B.
When the speech rate of person A is lower than the second threshold, points may be deducted regardless of the speech rate of person B.
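A minimal sketch of the speech-rate determination described above is shown below, assuming the example step of 0.5 and thresholds of 50% and 40%; for the borderline cases for which the text allows several options, this sketch simply leaves the score unchanged.

```python
# Sketch of one add/deduct step of the logic-based algorithm for the speech-rate element.
def update_by_speech_rate(score, rate_a, rate_b,
                          first_threshold=0.5, second_threshold=0.4, step=0.5):
    """Add when person A speaks at least as much as person B and above the first
    threshold; deduct when person A speaks less than person B and below the second
    threshold; otherwise leave the score unchanged (one of the options described)."""
    if rate_a >= rate_b and rate_a >= first_threshold:
        return score + step
    if rate_a < rate_b and rate_a < second_threshold:
        return score - step
    return score

print(update_by_speech_rate(3.0, 0.55, 0.45))  # 3.5
print(update_by_speech_rate(3.0, 0.30, 0.70))  # 2.5
```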
Next, the determination method when the determination element is "emotion, facial expression, or movement" will be described. The emotion is an index calculated from the audio data of the conversation. When the determination element is "emotion, facial expression, or movement", a positive rate, a neutral rate, and a negative rate (described later) are calculated based on the emotion, facial expression, or movement. The satisfaction estimation algorithm adds and deducts points using the positive rate and the negative rate.
The positive rate indicates the proportion of time within a predetermined period during which the emotion of the user (for example, person A) is determined to be positive. Feature amounts determined to be positive are, for example, person A's voice becoming louder, the pitch of person A's voice becoming higher, person A nodding, or person A smiling. Note that the feature amounts determined to be positive are merely examples and are not limited to these.
The neutral rate indicates the proportion of time within a predetermined period during which the emotion of the user (for example, person A) is determined to be neutral. A neutral state is a state in which person A's emotion is presumed to be neither positive nor negative, for example, a state in which person A appears calm. Feature amounts determined to be neutral are, for example, person A having a straight face or person A remaining still. Note that the feature amounts determined to be neutral are merely examples and are not limited to these.
The negative rate indicates the proportion of time within a predetermined period during which the emotion of the user (for example, person A) is determined to be negative. Feature amounts determined to be negative are, for example, person A having a crying face, person A's voice becoming quieter, the pitch of person A's voice becoming lower, or person A tilting his or her head. Note that the feature amounts determined to be negative are merely examples and are not limited to these.
As an example of calculating the positive rate, the neutral rate, and the negative rate, consider a case in which the emotion is determined once every 0.5 seconds over a period of 2.5 seconds. Suppose that within the 2.5 seconds the evaluation system 100 makes, for example, two positive determinations, two negative determinations, and one neutral determination. In this case, the positive rate is (1+1)/5, that is, 0.4 (40%); the negative rate is (1+1)/5, that is, 0.4 (40%); and the neutral rate is 1/5, that is, 0.2 (20%).
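The rate calculation in this example can be expressed, for instance, as follows (the labels and function name are illustrative); it reproduces the 40%/20%/40% result above from five judgments made every 0.5 seconds.

```python
# Sketch of the positive/neutral/negative rate calculation; labels are illustrative.
from collections import Counter

def emotion_rates(judgments):
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in ("positive", "neutral", "negative")}

judgments = ["positive", "negative", "positive", "negative", "neutral"]
print(emotion_rates(judgments))
# {'positive': 0.4, 'neutral': 0.2, 'negative': 0.4}
```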
Note that the positive rate, the neutral rate, and the negative rate may be calculated by the feature amount extraction unit 12A, or by the satisfaction level estimation unit 24A based on the feature amount data acquired from the feature amount extraction unit 12A.
When the positive rate of person A is equal to or higher than a threshold for adding points, the satisfaction estimation algorithm adds 0.5 to the satisfaction level. When the negative rate of person A is equal to or higher than a threshold for deducting points, the satisfaction estimation algorithm deducts 0.5 from the satisfaction level.
For example, the threshold for adding points is 50%. In this case, when the positive rate is 50% or more, the satisfaction estimation algorithm adds 0.5 to the satisfaction level. Note that the threshold for adding points is not limited to 50% and may be changed as appropriate by the user.
For example, the threshold for deducting points is 50%. In this case, when the negative rate is 50% or more, the satisfaction estimation algorithm deducts 0.5 from the satisfaction level. Note that the threshold for deducting points is not limited to 50% and may be changed as appropriate by the user.
Next, the determination method when the determination element is the "line of sight" will be described. When the determination element is the "line of sight", points are added or deducted based on the time during which the user (for example, person A) looks in the direction of the display (for example, the display device 16).
When the time during which person A looks in the direction of the display is equal to or longer than a third threshold, the satisfaction estimation algorithm adds 0.5 to the satisfaction level. When the time during which person A looks in the direction of the display is shorter than a fourth threshold that is equal to or less than the third threshold, 0.5 is deducted from the satisfaction level.
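A minimal sketch of the line-of-sight determination is shown below; the threshold values in the example call are hypothetical, since the disclosure does not give concrete values for the third and fourth thresholds.

```python
# Sketch of the gaze-time rule; thresholds in the example call are hypothetical.
def update_by_gaze_time(score, gaze_seconds, third_threshold, fourth_threshold, step=0.5):
    """Add when the person looked at the display long enough, deduct when clearly not."""
    if gaze_seconds >= third_threshold:
        return score + step
    if gaze_seconds < fourth_threshold:  # fourth_threshold <= third_threshold
        return score - step
    return score

print(update_by_gaze_time(3.0, gaze_seconds=25.0, third_threshold=20.0, fourth_threshold=10.0))  # 3.5
```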
 次に、図5を参照して、所定の時間間隔で満足度を算出した例を説明する。図5は、所定の時間間隔で満足度を算出した例を表す図である。 Next, an example in which satisfaction is calculated at predetermined time intervals will be described with reference to FIG. FIG. 5 is a diagram illustrating an example of calculating satisfaction levels at predetermined time intervals.
 ケースCAおよびケースCBで、評価開始時の満足度の値を3とし、満足度推定アルゴリズムによって満足度の加点および減点を繰り返す。なお、評価開始時の満足度の値は3に限られず任意の値でもよい。ケースCAおよびケースCBでは、満足度は0から5点の間の値を取るものとする。なお、満足度の取りうる値の範囲は0から5点に限られず他の範囲でもよいし、範囲は設定されなくてもよい。 In case CA and case CB, the satisfaction value at the start of evaluation is set to 3, and the satisfaction estimation algorithm repeatedly adds and subtracts satisfaction points. Note that the satisfaction value at the start of the evaluation is not limited to 3 and may be any value. In case CA and case CB, the satisfaction level is assumed to take a value between 0 and 5 points. Note that the range of values that the satisfaction level can take is not limited to 0 to 5 points, but may be in other ranges, and the range does not need to be set.
 ケースCAおよびケースCBのグラフは、30秒ごとに算出された満足度の値をプロットしたものである。ケースCAおよびケースCBのグラフの横軸は経過時間を表し、縦軸は満足度を表す。ケースCAおよびケースCBは、一例として、5分で会話が終了したケースとする。 The graphs for Case CA and Case CB are plots of satisfaction values calculated every 30 seconds. The horizontal axis of the graphs for case CA and case CB represents elapsed time, and the vertical axis represents satisfaction level. Case CA and case CB are, for example, cases in which the conversation ends in 5 minutes.
 ケースCAでは、満足度推定アルゴリズムによって30秒ごとに加点または減点が繰り返され、5分時間が経過した際に満足度が5点となっており、ユーザ(例えば、人物A)が高い満足度で会話を終了したことを表す。 In case CA, the satisfaction level estimation algorithm repeatedly adds or subtracts points every 30 seconds, and when 5 minutes have passed, the satisfaction level is 5 points, and the user (for example, person A) is highly satisfied. Indicates that the conversation has ended.
 ケースCBでは、満足度推定アルゴリズムによって30秒ごとに加点または減点が繰り返され、5分時間が経過した際に満足度が0点となっており、ユーザ(例えば、人物A)が低い満足度で会話を終了したことを表す。 In case CB, the satisfaction estimation algorithm repeatedly adds or subtracts points every 30 seconds; when 5 minutes have elapsed, the satisfaction level is 0 points, indicating that the user (for example, person A) ended the conversation with a low level of satisfaction.
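 A minimal sketch of the periodic scoring described for cases CA and CB is shown below (Python, for illustration). The starting value of 3, the 0 to 5 range, and the 30-second interval come from the description; the particular sequence of deltas in the example is invented.

    def run_evaluation(interval_deltas, initial_score=3.0, low=0.0, high=5.0):
        # Accumulates the satisfaction score over a conversation, clamping it to the allowed range.
        score = initial_score
        history = [score]
        for delta in interval_deltas:
            score = min(high, max(low, score + delta))
            history.append(score)
        return history

    # Ten 30-second intervals (a 5-minute conversation); mostly positive signals
    # drive the score toward 5, as in case CA.
    print(run_evaluation([0.5, 0.5, 0.5, 0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))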
 次に、図6を参照して、実施の形態1に係る満足度の評価の処理を説明する。図6は、実施の形態1に係る満足度の評価の処理のシーケンス図である。 Next, the satisfaction evaluation process according to the first embodiment will be described with reference to FIG. 6. FIG. 6 is a sequence diagram of satisfaction evaluation processing according to the first embodiment.
 2つの端末装置(端末装置1AA、端末装置1AB)とサーバ2Aとによって満足度の評価が行われる。例えば、評価システム100Aが、人物Aと人物Bとの会話から人物Aの満足度を算出するとする。被評価者である人物Aが端末装置1AAを使用し、人物Bが端末装置1ABを使用するとする。なお、端末装置の数は2つに限られず1つでもよいし2つ以上の複数でもよい。 The satisfaction level is evaluated by two terminal devices (terminal device 1AA, terminal device 1AB) and server 2A. For example, assume that the evaluation system 100A calculates the satisfaction level of person A from a conversation between person A and person B. Assume that person A, who is the person to be evaluated, uses terminal device 1AA, and person B uses terminal device 1AB. Note that the number of terminal devices is not limited to two, and may be one or two or more.
 端末装置1AAは、満足度推定アルゴリズムによる満足度の加点および減点に係る各閾値の値を設定する(St100)。なお、端末装置1AAでの閾値の設定は図6に係る処理から省略されてもよい。 The terminal device 1AA sets the values of each threshold value related to addition and deduction of satisfaction points by the satisfaction estimation algorithm (St100). Note that the setting of the threshold value in the terminal device 1AA may be omitted from the process related to FIG. 6.
 端末装置1AAは、人物Aの満足度の評価を開始する(St101)。満足度の評価の開始は、例えば、ユーザ(例えば人物B)により表示デバイス16に表示された評価を開始するボタンを押下する等によって実行される。 The terminal device 1AA starts evaluating the satisfaction level of person A (St101). The start of the satisfaction evaluation is executed, for example, by the user (for example, person B) pressing a button to start evaluation displayed on the display device 16.
 端末装置1AAは、人物Aの撮像データおよび音声データを取得する(St102)。 The terminal device 1AA acquires image data and audio data of person A (St102).
 端末装置1AAは、ステップSt102の処理で取得した撮像データおよび音声データに基づき特徴量を抽出する(St103)。 The terminal device 1AA extracts feature amounts based on the imaging data and audio data acquired in the process of step St102 (St103).
 端末装置1ABは、満足度推定アルゴリズムによる満足度の加点および減点に関する各閾値の値を設定する(St104)。なお、閾値の設定は、人物Bによって任意に設定されてもよいしメモリ14に予め保存されている設定値に基づき自動で行われてもよい。また、閾値の設定は、端末装置1ABではなくサーバ2Aで行われてもよい。 The terminal device 1AB sets the values of each threshold regarding addition and deduction of satisfaction points by the satisfaction estimation algorithm (St104). Note that the threshold value may be set arbitrarily by the person B, or may be automatically set based on a set value stored in the memory 14 in advance. Further, the setting of the threshold value may be performed not in the terminal device 1AB but in the server 2A.
 端末装置1ABは、人物Aの満足度の評価を開始する(St105)。 The terminal device 1AB starts evaluating the satisfaction level of person A (St105).
 端末装置1ABは、人物Bの撮像データおよび音声データを取得する(St106)。 The terminal device 1AB acquires image data and audio data of person B (St106).
 端末装置1ABは、ステップSt106の処理で取得した撮像データおよび音声データに基づき特徴量を抽出する(St107)。 The terminal device 1AB extracts feature amounts based on the imaging data and audio data acquired in the process of step St106 (St107).
 端末装置1AAは、ステップSt100の処理で設定した閾値の設定値と、ステップSt103の処理で抽出した特徴量と、をサーバ2Aに送信する。端末装置1ABは、ステップSt104の処理で設定した閾値の設定値と、ステップSt107の処理で抽出した特徴量と、をサーバ2Aに送信する(St108)。 The terminal device 1AA transmits the threshold value set in the process of step St100 and the feature amount extracted in the process of step St103 to the server 2A. The terminal device 1AB transmits the threshold setting value set in the process of step St104 and the feature amount extracted in the process of step St107 to the server 2A (St108).
 サーバ2Aは、ステップSt108の処理で取得した閾値の設定値と特徴量と満足度推定アルゴリズムとに基づき満足度を算出する(St109)。 The server 2A calculates the satisfaction level based on the threshold setting value, the feature amount, and the satisfaction level estimation algorithm obtained in the process of step St108 (St109).
 端末装置1AAは、サーバ2Aに満足度の結果の送信を要求する(St110)。なお、ステップSt110の処理は、図6に係る処理から省略されてもよい。 The terminal device 1AA requests the server 2A to transmit the satisfaction results (St110). Note that the process of step St110 may be omitted from the process related to FIG. 6.
 端末装置1ABは、サーバ2Aに満足度の結果の送信を要求する(St111)。 The terminal device 1AB requests the server 2A to send the satisfaction results (St111).
 サーバ2Aは、満足度の結果に係る画面を描画する。サーバ2Aは、満足度の結果を描画した画面を端末装置1ABに送信する(St112)。サーバ2Aは、満足度の結果を描画した画面を端末装置1AAに送信する(St113)。ステップSt113の処理は、図6に係る処理から省略されてもよい。 The server 2A draws a screen related to the satisfaction level results. The server 2A transmits a screen on which the satisfaction level results are drawn to the terminal device 1AB (St112). The server 2A transmits a screen on which the satisfaction level results are drawn to the terminal device 1AA (St113). The process of step St113 may be omitted from the process related to FIG. 6.
 端末装置1AAは、ステップSt113の処理で取得した画面を端末装置1AAのディスプレイに表示させる(St114)。ステップSt114の処理は、図6に係る処理から省略されてもよい。 The terminal device 1AA displays the screen acquired in the process of step St113 on the display of the terminal device 1AA (St114). The process of step St114 may be omitted from the process related to FIG. 6.
 端末装置1ABは、ステップSt112の処理で取得した画面を端末装置1ABのディスプレイに表示させる(St115)。 The terminal device 1AB displays the screen acquired in the process of step St112 on the display of the terminal device 1AB (St115).
 なお、ステップSt103およびステップSt107の特徴量を抽出する処理から、ステップSt114およびステップSt115の結果を描画する処理までの処理は、繰り返し実行されてもよい。 Note that the processes from the process of extracting feature amounts in steps St103 and St107 to the process of drawing the results of steps St114 and St115 may be repeatedly executed.
 サーバ2Aは、評価を終了させる旨の信号を端末装置1AAおよび端末装置1ABに送信する(St116)。 The server 2A transmits a signal to end the evaluation to the terminal device 1AA and the terminal device 1AB (St116).
 端末装置1AAは、ステップSt116の処理で取得した信号を基に満足度の評価を終了する(St117)。 The terminal device 1AA ends the satisfaction evaluation based on the signal acquired in the process of step St116 (St117).
 端末装置1ABは、ステップSt116の処理で取得した信号を基に満足度の評価を終了する(St118)。 The terminal device 1AB ends the satisfaction evaluation based on the signal acquired in the process of step St116 (St118).
 端末装置1AAは、満足度の最終結果の送信要求をサーバ2Aに送信する(St119)。ステップSt119の処理は、図6の処理から省略されてもよい。 The terminal device 1AA transmits a request to send the final satisfaction result to the server 2A (St119). The process of step St119 may be omitted from the process of FIG. 6.
 端末装置1ABは、満足度の最終結果の送信要求をサーバ2Aに送信する(St120)。 The terminal device 1AB transmits a request to transmit the final satisfaction result to the server 2A (St120).
 サーバ2Aは、ステップSt120の処理で取得した要求に基づき、満足度の最終結果に係る画面を描画する。サーバ2Aは、満足度の最終結果の画面を端末装置1ABに送信する(St121)。 The server 2A draws a screen related to the final satisfaction result based on the request obtained in the process of step St120. The server 2A transmits a screen showing the final result of the satisfaction level to the terminal device 1AB (St121).
 サーバ2Aは、ステップSt119の処理で取得した要求に基づき、満足度の最終結果を示す画面を描画する。サーバ2Aは、満足度の最終結果の画面を端末装置1AAに送信する(St122)。ステップSt122の処理は、図6の処理から省略されてもよい。 Based on the request obtained in the process of step St119, the server 2A draws a screen showing the final result of the satisfaction level. The server 2A transmits a screen showing the final result of the satisfaction level to the terminal device 1AA (St122). The process of step St122 may be omitted from the process of FIG. 6.
 端末装置1AAは、ステップSt122の処理で取得した画面を端末装置1AAのディスプレイに表示させる(St123)。ステップSt123の処理は、図6の処理から省略されてもよい。 The terminal device 1AA displays the screen acquired in the process of step St122 on the display of the terminal device 1AA (St123). The process of step St123 may be omitted from the process of FIG. 6.
 端末装置1ABは、ステップSt121の処理で取得した画面を端末装置1ABのディスプレイに表示させる(St124)。 The terminal device 1AB displays the screen acquired in the process of step St121 on the display of the terminal device 1AB (St124).
<実施の形態2> <Embodiment 2>
 実施の形態2に係る評価システムは、端末装置で取得した撮像データおよび音声データに基づき、サーバが特徴量の抽出から満足度の算出まで一括して実施する。以下、実施の形態1と同一の構成要素については同一の符号を用いることで、その説明を省略する。 In the evaluation system according to Embodiment 2, a server performs everything from extraction of feature amounts to calculation of satisfaction level all at once based on imaging data and audio data acquired by a terminal device. Hereinafter, the same reference numerals will be used for the same components as in Embodiment 1, and the description thereof will be omitted.
 図7を参照して、実施の形態2に係る端末装置とサーバとのそれぞれの内部構成例を説明する。図7は、実施の形態2に係る端末装置とサーバとのそれぞれの内部構成例を示した図である。図3の実施の形態1に係るハードウェアブロック図と異なる部分のみを説明する。 With reference to FIG. 7, an example of the internal configuration of the terminal device and the server according to the second embodiment will be described. FIG. 7 is a diagram showing an example of the internal configuration of a terminal device and a server according to the second embodiment. Only the parts that are different from the hardware block diagram according to the first embodiment shown in FIG. 3 will be explained.
 実施の形態2に係る評価システム100Bでは、特徴量抽出部12Aがサーバ2Bのプロセッサ24に組み込まれる。つまり、端末装置1Bは、通信I/F13、メモリ14、入力デバイス15、表示デバイス16、I/F17、音声取得デバイス10および撮像デバイス11を有する。サーバ2Bは、通信I/F21、メモリ22、入力デバイス23、I/F26およびプロセッサ24を有する。 In the evaluation system 100B according to the second embodiment, the feature extraction unit 12A is incorporated into the processor 24 of the server 2B. That is, the terminal device 1B includes a communication I/F 13, a memory 14, an input device 15, a display device 16, an I/F 17, an audio acquisition device 10, and an imaging device 11. The server 2B includes a communication I/F 21, a memory 22, an input device 23, an I/F 26, and a processor 24.
 プロセッサ24は、特徴量抽出部12A、満足度推定部24Aおよび描画画面作成部24Bの各部の機能を実現する。 The processor 24 realizes the functions of the feature amount extraction section 12A, the satisfaction estimation section 24A, and the drawing screen creation section 24B.
 特徴量抽出部12Aは、端末装置1Bから取得した音声データおよび撮像データを基に特徴量を抽出する。 The feature amount extraction unit 12A extracts feature amounts based on the audio data and image data acquired from the terminal device 1B.
 図8を参照して、実施の形態2に係る満足度の評価の処理を説明する。図8は、実施の形態2に係る満足度の評価の処理のシーケンス図である。実施の形態1の図6のシーケンス図と同様の処理は同一符号を付記し、異なる処理のみを説明する。 With reference to FIG. 8, satisfaction evaluation processing according to the second embodiment will be described. FIG. 8 is a sequence diagram of satisfaction evaluation processing according to the second embodiment. Processes similar to those in the sequence diagram of FIG. 6 of the first embodiment are given the same reference numerals, and only different processes will be described.
 端末装置1BAは、ステップSt100の処理で設定した閾値の値と、ステップSt102の処理で取得した撮像データおよび音声データとをサーバ2Bに送信する(St200)。 The terminal device 1BA transmits the threshold value set in the process of step St100 and the imaging data and audio data acquired in the process of step St102 to the server 2B (St200).
 端末装置1BBは、ステップSt104の処理で設定した閾値の値と、ステップSt106の処理で取得した撮像データおよび音声データとをサーバ2Bに送信する(St200)。 The terminal device 1BB transmits the threshold value set in the process of step St104 and the imaging data and audio data acquired in the process of step St106 to the server 2B (St200).
 サーバ2Bは、ステップSt200の処理で取得した撮像データおよび音声データを基に特徴量を抽出する(St201)。 The server 2B extracts feature amounts based on the imaging data and audio data acquired in the process of step St200 (St201).
 サーバ2Bは、ステップSt201の処理で抽出した特徴量を基に満足度を算出する(St202)。以下の処理は、図6のシーケンス図に係る各処理と同様であるため説明を省略する。 The server 2B calculates the degree of satisfaction based on the feature amount extracted in the process of step St201 (St202). The following processing is the same as each processing related to the sequence diagram of FIG. 6, so the explanation will be omitted.
<実施の形態3> <Embodiment 3>
 実施の形態3に係る評価システムは、過去に端末装置で取得した撮像データおよび音声データ(つまり、過去に録画または録音したデータ)に基づき、端末装置もしくはサーバで満足度の算出を実施する。以下、実施の形態1と同一の構成要素については同一の符号を用いることで、その説明を省略する。 The evaluation system according to Embodiment 3 calculates the satisfaction level on the terminal device or the server based on imaging data and audio data previously acquired by the terminal device (that is, previously recorded video or audio). Hereinafter, the same reference numerals will be used for the same components as in Embodiment 1, and the description thereof will be omitted.
 図9を参照して、実施の形態3に係る端末装置の内部構成例を説明する。図9は、実施の形態3に係る端末装置の内部構成例を示した図である。図3の実施の形態1に係るハードウェアブロック図と異なる部分のみを説明する。 An example of the internal configuration of the terminal device according to the third embodiment will be described with reference to FIG. FIG. 9 is a diagram showing an example of the internal configuration of a terminal device according to the third embodiment. Only the parts that are different from the hardware block diagram according to the first embodiment shown in FIG. 3 will be explained.
 端末装置1Cは、通信I/F13、メモリ14、入力デバイス15、表示デバイス16、音声取得デバイス10、撮像デバイス11およびプロセッサ12を有する。音声取得デバイス10および撮像デバイス11は省略されてもよい。 The terminal device 1C includes a communication I/F 13, a memory 14, an input device 15, a display device 16, an audio acquisition device 10, an imaging device 11, and a processor 12. The audio acquisition device 10 and the imaging device 11 may be omitted.
 通信I/F13は、プロセッサ12の描画画面作成部24Bによって描画された画面を他の端末装置等に送信してもよい。また、音声取得デバイス10および撮像デバイス11が外部装置である場合に、通信I/F13は外部装置から過去に撮像された撮像データおよび過去に収音された音声データを取得する。 The communication I/F 13 may transmit the screen drawn by the drawing screen creation unit 24B of the processor 12 to another terminal device or the like. Further, when the audio acquisition device 10 and the imaging device 11 are external devices, the communication I/F 13 acquires image data captured in the past and audio data captured in the past from the external devices.
 プロセッサ12の特徴量抽出部12Aは、過去に撮像された撮像データおよび過去に収音された音声データを基に特徴量を抽出する。特徴量抽出部12Aは、抽出した特徴量データを満足度推定部24Aに出力する。 The feature amount extraction unit 12A of the processor 12 extracts feature amounts based on image data captured in the past and audio data captured in the past. The feature extraction unit 12A outputs the extracted feature data to the satisfaction estimation unit 24A.
 特徴量抽出部12Aは、人物Aと人物Bとの両方の撮像データおよび音声データが含まれた1つのファイルを取得する。特徴量抽出部12Aは、1つのファイルから画像認識または音声認識等の公知の技術を用いて、人物Aの撮像データと音声データと、人物Bの撮像データと音声データとの4つのデータに分離する。 The feature extraction unit 12A obtains one file that includes the image data and audio data of both person A and person B. Using known techniques such as image recognition or voice recognition, the feature extraction unit 12A separates the file into four sets of data: the image data and audio data of person A, and the image data and audio data of person B.
 また、特徴量抽出部12Aは、人物Aの撮像データと音声データとが含まれたファイルと、人物Bの撮像データと音声データとが含まれたファイルとの2つのファイルを取得してもよい。特徴量抽出部12Aは、公知の技術を用いて各ファイルから撮像データと音声データとに分離する。なお、入力デバイス15は、ユーザ(例えば、人物Aまたは人物B)から、2つの各ファイルが人物Aと人物Bとのどちらに紐づくのかに係る入力を取得してもよい。 Alternatively, the feature extraction unit 12A may obtain two files: a file containing the image data and audio data of person A, and a file containing the image data and audio data of person B. The feature extraction unit 12A separates each file into image data and audio data using a known technique. Note that the input device 15 may obtain an input from the user (for example, person A or person B) indicating whether each of the two files is associated with person A or person B.
 また、特徴量抽出部12Aは、人物Aの撮像データのファイルと、人物Aの音声データのファイルと、人物Bの撮像データのファイルと、人物Bの音声データのファイルとの4つのファイルを取得してもよい。なお、入力デバイス15は、ユーザ(例えば、人物Aまたは人物B)から、4つの各ファイルが人物Aと人物Bとのどちらに紐づくのかに関する入力を取得してもよい。 The feature extraction unit 12A may also obtain four files: a file of the image data of person A, a file of the audio data of person A, a file of the image data of person B, and a file of the audio data of person B. Note that the input device 15 may obtain an input from the user (for example, person A or person B) indicating whether each of the four files is associated with person A or person B.
 また、過去に端末装置で取得した撮像データおよび音声データに基づきサーバで満足度を算出する場合、実施の形態3のハードウェアブロック図は、実施の形態2の図7と同様となる。端末装置1Bの音声取得デバイス10で過去に取得した音声データと、端末装置1Bの撮像デバイス11で過去に取得した撮像データとを、サーバ2Bは取得する。サーバ2Bの特徴量抽出部12Aは、取得した音声データと撮像データとに基づき特徴量を抽出し、満足度推定部24Aは、抽出された特徴量に基づき満足度を算出する。 Furthermore, when the satisfaction level is calculated by the server based on the imaging data and audio data acquired by the terminal device in the past, the hardware block diagram of the third embodiment is similar to FIG. 7 of the second embodiment. The server 2B acquires audio data previously acquired by the audio acquisition device 10 of the terminal device 1B and imaging data previously acquired by the imaging device 11 of the terminal device 1B. The feature amount extraction unit 12A of the server 2B extracts the feature amount based on the acquired audio data and image data, and the satisfaction level estimation unit 24A calculates the satisfaction level based on the extracted feature amount.
 次に、図10を参照して、端末装置で満足度を算出する処理を説明する。図10は、端末装置で満足度を算出する処理を表すフローチャートである。図10に係る各処理はプロセッサ12によって実行される。 Next, with reference to FIG. 10, the process of calculating the satisfaction level on the terminal device will be described. FIG. 10 is a flowchart illustrating the process of calculating the degree of satisfaction on the terminal device. Each process related to FIG. 10 is executed by the processor 12.
 プロセッサ12は、満足度推定アルゴリズムによる満足度の加点および減点に関する各閾値の値を設定する(St300)。プロセッサ12は、入力デバイス15から取得したユーザ(例えば、人物B)の入力信号を取得して閾値の設定をしてもよいし、メモリ14に予め保存された設定値を基に自動で設定してもよい。 The processor 12 sets the value of each threshold for adding and deducting satisfaction points in the satisfaction estimation algorithm (St300). The processor 12 may set the thresholds based on an input signal from the user (for example, person B) obtained via the input device 15, or may set them automatically based on setting values stored in advance in the memory 14.
 プロセッサ12は、メモリ14に保存されている過去に撮像した撮像データおよび収音した音声データを取得する(St301)。なお、プロセッサ12は、過去のデータに限られず、端末装置1Cの音声取得デバイス10と撮像デバイス11とで現在取得しているデータを取得してもよい。 The processor 12 acquires previously captured image data and captured audio data stored in the memory 14 (St301). Note that the processor 12 is not limited to past data, and may acquire data currently being acquired by the audio acquisition device 10 and the imaging device 11 of the terminal device 1C.
 プロセッサ12は、ステップSt301の処理で取得した撮像データと音声データとから特徴量を抽出する(St302)。 The processor 12 extracts feature amounts from the imaging data and audio data acquired in the process of step St301 (St302).
 プロセッサ12は、ステップSt302の処理で抽出した特徴量を基にユーザ(例えば、人物A)の満足度を算出する(St303)。 The processor 12 calculates the satisfaction level of the user (for example, person A) based on the feature amount extracted in the process of step St302 (St303).
 プロセッサ12は、ステップSt303の処理で算出した満足度の結果を示す画面を描画する(St304)。 The processor 12 draws a screen showing the satisfaction level calculated in the process of step St303 (St304).
 次に、図11を参照して過去に撮像および収音したデータからサーバで満足度を算出する処理を説明する。図11は、過去に撮像および収音したデータからサーバで満足度を算出する処理を示すシーケンス図である。 Next, with reference to FIG. 11, the process in which the server calculates the satisfaction level from previously captured video and audio data will be described. FIG. 11 is a sequence diagram showing the process in which the server calculates the satisfaction level from previously captured video and audio data.
 端末装置1Bは、満足度推定アルゴリズムによる満足度の加点および減点に関する閾値の設定情報をサーバ2Bに送信する(St400)。 The terminal device 1B transmits to the server 2B threshold setting information regarding addition and deduction of satisfaction points based on the satisfaction estimation algorithm (St400).
 端末装置1Bは、過去に撮像した撮像データおよび過去に収音した音声データをサーバ2Bに送信する(St401)。 The terminal device 1B transmits image data captured in the past and audio data captured in the past to the server 2B (St401).
 サーバ2Bは、ステップSt401の処理で取得した撮像データおよび音声データを基に特徴量を抽出する(St402)。 The server 2B extracts feature amounts based on the imaging data and audio data acquired in the process of step St401 (St402).
 サーバ2Bは、ステップSt402の処理で取得した特徴量を基に満足度を算出する(St403)。 The server 2B calculates the degree of satisfaction based on the feature amount acquired in the process of step St402 (St403).
 端末装置1Bは、満足度の最終結果の送信をサーバ2Bに要求する(St404)。 The terminal device 1B requests the server 2B to send the final result of the satisfaction level (St404).
 サーバ2Bは、ステップSt404の処理で端末装置1Bから受けた要求に基づき満足度の最終結果を含む画面を描画する。サーバ2Bは、描画した画面を端末装置1Bに送信する(St405)。 The server 2B draws a screen including the final satisfaction result based on the request received from the terminal device 1B in the process of step St404. The server 2B transmits the drawn screen to the terminal device 1B (St405).
 端末装置1Bは、ステップSt405の処理で取得した満足度の最終結果を含む画面を表示する(St406)。 The terminal device 1B displays a screen including the final result of the satisfaction level obtained in the process of step St405 (St406).
 次に、図12を参照して、端末装置に表示される画面の一例を説明する。図12は、端末装置に表示される画面の一例を示す図である。 Next, an example of a screen displayed on the terminal device will be described with reference to FIG. 12. FIG. 12 is a diagram showing an example of a screen displayed on a terminal device.
 画面MN1は、会議をしている間のある瞬間において端末装置1に表示される画面の一例である。例えば、人物Aと人物Bとが会議をしており、人物Aが被評価者である場合、画面MN1は人物Bが参照する画面である。画面MN1は、表示領域IT1,IT2およびボタンBT1,BT2,BT3,BT4,BT5,BT6を含む。 Screen MN1 is an example of a screen displayed on terminal device 1 at a certain moment during a meeting. For example, if Person A and Person B are having a meeting and Person A is the person to be evaluated, screen MN1 is the screen that Person B refers to. Screen MN1 includes display areas IT1, IT2 and buttons BT1, BT2, BT3, BT4, BT5, and BT6.
 表示領域IT2は、人物Aの撮像映像がリアルタイムで表示される領域である。描画画面作成部24Bは、撮像デバイス11から取得した人物Aの撮像映像を表示領域IT2に表示させる。 The display area IT2 is an area where the captured video of the person A is displayed in real time. The drawing screen creation unit 24B displays the captured video of the person A acquired from the imaging device 11 in the display area IT2.
 表示領域IT1は、満足度の結果が表示される領域である。表示領域IT1は、所定の時間間隔で算出された満足度の値をプロットしたグラフを表示する。描画画面作成部24Bは、満足度推定部24Aから満足度を取得したタイミングで、満足度を表示領域IT1に表示させてもよい。なお、表示領域IT1は、グラフに限られず、会議開始時から今までの間で算出された満足度の値を数字で表示してもよいし、所定の時間間隔で算出された満足度の値を数字で都度表示してもよい。また、表示領域IT1は、算出された満足度の値をもとに「高」、「中」、「低」のようなテキストで現在の満足度を表示してもよいし、満足度に応じた顔文字または絵文字等を表示してもよい。 The display area IT1 is an area where the satisfaction results are displayed. The display area IT1 displays a graph in which the satisfaction values calculated at predetermined time intervals are plotted. The drawing screen creation unit 24B may display the satisfaction level in the display area IT1 at the timing when the satisfaction level is acquired from the satisfaction estimation unit 24A. Note that the display area IT1 is not limited to a graph; it may display, as a number, the satisfaction value calculated from the start of the meeting up to the present, or it may display, as a number each time, the satisfaction value calculated at each predetermined time interval. The display area IT1 may also display the current satisfaction level as text such as "high", "medium", or "low" based on the calculated satisfaction value, or may display an emoticon or pictogram corresponding to the satisfaction level.
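 One way to derive the "high", "medium", or "low" text mentioned above is a simple mapping such as the following (Python, for illustration). Only the 0 to 5 scale and the label wording come from the description; the cut-off values are assumptions.

    def satisfaction_label(score):
        # Maps a 0-5 satisfaction score to the kind of text shown in display area IT1.
        if score >= 4.0:
            return "high"
        if score >= 2.0:
            return "medium"
        return "low"

    print(satisfaction_label(4.5))  # -> high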
 ボタンBT1は、相手の端末装置1へ自分の撮像映像の表示をオンにするボタンである。ボタンBT2は、相手の端末装置1へ自分の撮像映像の表示をオフにするボタンである。 The button BT1 is a button that turns on the display of the captured image of the user on the other party's terminal device 1. The button BT2 is a button for turning off the display of the user's captured video on the other party's terminal device 1.
 ボタンBT3は、相手の端末装置1へ自分の音声の出力をオンにするボタンである。ボタンBT4は、相手の端末装置1へ自分の音声の出力をオフにするボタンである。 The button BT3 is a button that turns on the output of your own voice to the terminal device 1 of the other party. The button BT4 is a button for turning off the output of your own voice to the terminal device 1 of the other party.
 ボタンBT5は、満足度の評価を開始または終了させるボタンである。ボタンBT5は、画面MN1から省略されてもよい。 The button BT5 is a button for starting or ending satisfaction evaluation. Button BT5 may be omitted from screen MN1.
 ボタンBT6は、会議を開始または終了させるボタンである。 The button BT6 is a button for starting or ending a conference.
 画面MN2は、会議をしている間のある瞬間において端末装置1に表示される画面の一例である。画面MN2は、画面MN1が端末装置1に表示されてから時間が1分経過した時に端末装置1に表示される画面である。 Screen MN2 is an example of a screen displayed on the terminal device 1 at a certain moment during the meeting. Screen MN2 is a screen displayed on terminal device 1 when one minute has passed since screen MN1 was displayed on terminal device 1.
 表示領域IT3は、満足度の結果が表示される領域である。表示領域IT3は、所定の時間間隔で算出された満足度の値をプロットしたグラフを表示する。表示領域IT3は、表示領域IT1に表示されたグラフに、人物Aと人物Bとの会話が1分さらに進行したことによってさらに2つの満足度の結果が追加でプロットされたグラフを表示する。このように、表示領域IT3は、経過した時間に応じてリアルタイムで満足度の結果が追加でプロットされていく。 The display area IT3 is an area where the satisfaction results are displayed. The display area IT3 displays a graph in which satisfaction values calculated at predetermined time intervals are plotted. The display area IT3 displays a graph in which two satisfaction results are additionally plotted on the graph displayed in the display area IT1 as the conversation between person A and person B progresses for one minute. In this way, in the display area IT3, satisfaction results are additionally plotted in real time according to the elapsed time.
 次に、図13を参照して、満足度の結果に応じてメッセージを表示された画面の一例を説明する。図13は、満足度の結果に応じてメッセージを表示された画面の一例を示す図である。図13の説明において、図12と重複する要素については同一の符号を付与して説明を簡略化あるいは省略し、異なる内容について説明する。 Next, with reference to FIG. 13, an example of a screen on which a message is displayed according to the satisfaction level result will be described. FIG. 13 is a diagram showing an example of a screen on which a message is displayed according to the satisfaction level result. In the description of FIG. 13, elements that overlap with those in FIG. 12 are given the same reference numerals to simplify or omit the description, and different contents will be described.
 画面MN3は、会議をしている間のある瞬間において端末装置1に表示される画面の一例である。画面MN3は、画面MN1が端末装置1に表示されてから時間が1分経過した時に端末装置1に表示される画面である。 Screen MN3 is an example of a screen displayed on terminal device 1 at a certain moment during a meeting. Screen MN3 is a screen that is displayed on terminal device 1 when one minute has elapsed since screen MN1 was displayed on terminal device 1.
 表示領域IT4は、満足度の結果が表示される領域である。表示領域IT4は、所定の時間間隔で算出された満足度の値をプロットしたグラフを表示する。表示領域IT4は、表示領域IT1に表示されたグラフに、人物Aと人物Bとの会話が1分さらに進行したことによってさらに2つの満足度の結果が追加でプロットされたグラフを表示する。 The display area IT4 is an area where the satisfaction results are displayed. The display area IT4 displays a graph in which satisfaction values calculated at predetermined time intervals are plotted. The display area IT4 displays a graph in which two satisfaction results are additionally plotted on the graph displayed in the display area IT1 as the conversation between person A and person B progresses for one minute.
 メッセージMesは、満足度の値に応じて表示されるメッセージである。例えば、メッセージMesは、人物Aの発話率に応じて表示される。満足度推定部24Aは、人物Aの発話率が人物Bの発話率未満であると判定した場合、人物Aの発話率が人物Bの発話率未満である旨の信号を描画画面作成部24Bに出力する。なお、満足度推定部24Aは、所定の時間継続して人物Aの発話率が人物Bの発話率未満であると判定した場合、人物Aの発話率が人物Bの発話率未満である旨の信号を描画画面作成部24Bに出力してもよい。また、満足度推定部24Aは、人物Aの発話率が第2閾値未満であると判定した場合、人物Aの発話率が第2閾値未満である旨の信号を描画画面作成部24Bに出力してもよい。なお、人物Aの発話率が人物Bの発話率未満であるか否かに係る判定および人物Aの発話率が第2閾値未満であるか否かに係る判定は、特徴量抽出部12Aが行ってもよい。描画画面作成部24Bは、満足度推定部24Aから取得した信号に基づき、人物Bに対し発話を控える旨のメッセージを作成し画面MN3に表示させる。発話を控える旨のメッセージとは、例えば「人物Aの話を聞きましょう」である。なお、発話を控える旨のメッセージは一例でありこれに限られない。 The message Mes is a message displayed according to the satisfaction level. For example, the message Mes is displayed according to person A's speech rate. When the satisfaction estimation unit 24A determines that the speech rate of person A is less than the speech rate of person B, it outputs a signal to that effect to the drawing screen creation unit 24B. The satisfaction estimation unit 24A may output this signal to the drawing screen creation unit 24B when it determines that the speech rate of person A has remained less than the speech rate of person B for a predetermined period of time. Further, when the satisfaction estimation unit 24A determines that the speech rate of person A is less than the second threshold, it may output a signal to that effect to the drawing screen creation unit 24B. Note that the determination as to whether the speech rate of person A is less than the speech rate of person B and the determination as to whether the speech rate of person A is less than the second threshold may be performed by the feature extraction unit 12A. Based on the signal acquired from the satisfaction estimation unit 24A, the drawing screen creation unit 24B creates a message asking person B to refrain from speaking and displays it on the screen MN3. The message to refrain from speaking is, for example, "Let's listen to what Person A has to say." Note that this message is merely an example and is not limited to this.
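 The message logic described above can be sketched as follows (Python, for illustration). The comparison of the two speech rates and the example message text come from the description; the default value used here for the second threshold is an assumption.

    def coaching_message(speech_rate_a, speech_rate_b, second_threshold=0.3):
        # Returns a message for person B when person A is not getting enough speaking time;
        # None means no message is shown on screen MN3.
        if speech_rate_a < speech_rate_b or speech_rate_a < second_threshold:
            return "Let's listen to what Person A has to say."
        return None

    print(coaching_message(0.2, 0.8))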
 なお、端末装置1もしくはサーバ2は、複数の人物のそれぞれの満足度を算出し全員の満足度の平均値を算出してもよい。このように、端末装置1もしくはサーバ2は、個人を特定しない形で満足度を集計しユーザに通知してもよい。 Note that the terminal device 1 or the server 2 may calculate the satisfaction level of each of a plurality of people and calculate the average value of the satisfaction levels of all the people. In this way, the terminal device 1 or the server 2 may aggregate the satisfaction level and notify the user without identifying the individual.
 端末装置1は、人物Aの満足度の値に応じて、人物Bの見ている画面にアバタ等の視聴者の注意を引き付けるもの表示してもよい。なお、アバタ等は人物Aの画面にも表示されてもよい。例えば、評価システムは、満足度が低い場合、アバタが画面に表示されることで、人物Aおよび人物Bの注意を画面にひきつけ人物Aの満足度を向上させることができる。 Depending on the satisfaction level of person A, the terminal device 1 may display something that attracts the viewer's attention, such as an avatar, on the screen that person B is viewing. Note that the avatar and the like may also be displayed on the screen of person A. For example, when the satisfaction level is low, the evaluation system can improve the satisfaction level of person A by displaying an avatar on the screen to attract the attention of person A and person B to the screen.
 端末装置1は、人物Aの動作に応じて、人物Aが今考え中であることを伝える通知を表示してもよい。 Depending on the person A's actions, the terminal device 1 may display a notification that the person A is currently thinking.
 端末装置1またはサーバ2は、会話をしている相手の撮像映像を表示デバイス16に表示させず(つまり、撮像映像の表示をオフにした状態で)に満足度を算出してもよい。 The terminal device 1 or the server 2 may calculate the degree of satisfaction without displaying the captured video of the person having the conversation on the display device 16 (that is, with the display of the captured video turned off).
 以上により、本実施の形態に係る評価システム(例えば、評価システム100、評価システム100A、評価システム100B)は、第1の人物と第2の人物との会話に係る音声データを取得する取得部(例えば、音声取得デバイス10)と、第1の人物と第2の人物とを撮像する撮像部(例えば、撮像デバイス11)とを備える。評価システムは、撮像部の撮像データに基づいて第1の人物と第2の人物とのそれぞれの視線もしくは顔の向きに係る第1の特徴量と、音声データの第2の特徴量と、を抽出する抽出部(例えば、特徴量抽出部12A)を備える。評価システムは、第1の特徴量と、第2の特徴量と、第1の人物の満足度を算出するための算出アルゴリズムと、に基づいて、第1の人物の満足度を算出する満足度算出部(例えば、満足度推定部24A)、を備える。 As described above, the evaluation system according to the present embodiment (for example, the evaluation system 100, 100A, or 100B) includes an acquisition unit (for example, the audio acquisition device 10) that acquires audio data of a conversation between a first person and a second person, and an imaging unit (for example, the imaging device 11) that images the first person and the second person. The evaluation system includes an extraction unit (for example, the feature amount extraction unit 12A) that extracts, based on the imaging data of the imaging unit, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data. The evaluation system further includes a satisfaction calculation unit (for example, the satisfaction estimation unit 24A) that calculates the satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person.
 これにより、評価システムは、第1の人物の視線もしくは顔の向きに係る情報と、第1の人物の音声データに係る情報と、の2つの情報に基づき満足度を算出することができる。これにより、評価システムは、人物間の会話に含まれる複数の情報を用いた高精度な満足度の評価を行うことができる。 Thereby, the evaluation system can calculate the satisfaction level based on two pieces of information: information related to the first person's line of sight or face direction, and information related to the first person's voice data. Thereby, the evaluation system can perform highly accurate satisfaction evaluation using a plurality of pieces of information included in conversations between people.
 また、本実施の形態の評価システムの満足度算出部は、会話の開始から終了までの間、所定の時間間隔で満足度を算出する。これにより、評価システムは、会話が開始してから会話が終了するまでの各時間で第1の人物の満足度を算出することができ柔軟な満足度の評価を行うことができる。 Furthermore, the satisfaction calculation unit of the evaluation system of this embodiment calculates the satisfaction at predetermined time intervals from the start to the end of the conversation. Thereby, the evaluation system can calculate the satisfaction level of the first person at each time from the start of the conversation until the end of the conversation, and can perform a flexible evaluation of the satisfaction level.
 また、本実施の形態の評価システムの抽出部は、第2の特徴量として、会話の中の前記第1の人物が発話している割合を示す第1の割合と第2の人物が発話している割合を示す第2の割合とを算出する。算出アルゴリズムは、第1の割合が第2の割合以上の場合、満足度の値に所定値を加点し、第1の割合が第2の割合未満の場合、満足度の値に所定値を減点する。これにより、評価システムは、第2の人物の発話率に対する第1の人物の発話率に応じて満足度を評価することができる。 Furthermore, the extraction unit of the evaluation system of the present embodiment calculates, as the second feature amount, a first ratio indicating the proportion of the conversation in which the first person is speaking and a second ratio indicating the proportion in which the second person is speaking. The calculation algorithm adds a predetermined value to the satisfaction value when the first ratio is greater than or equal to the second ratio, and subtracts the predetermined value from the satisfaction value when the first ratio is less than the second ratio. This allows the evaluation system to evaluate the satisfaction level according to the first person's speech rate relative to the second person's speech rate.
 また、本実施の形態の評価システムの算出アルゴリズムは、第1の割合が第2の割合以上かつ第1の割合が第1閾値以上の場合、満足度の値に所定値を加点する。算出アルゴリズムは、第1の割合が第2の割合未満かつ第1の割合が第1閾値以下の第2閾値未満の場合、満足度の値に所定値を減点する。これにより、評価システムは、第2の人物の発話率に対する第1の人物の発話率および閾値に対する第1の人物の発話率に応じて満足度を評価することができる。 Further, the calculation algorithm of the evaluation system of the present embodiment adds a predetermined value to the satisfaction value when the first ratio is equal to or higher than the second ratio and the first ratio is equal to or higher than the first threshold value. The calculation algorithm subtracts a predetermined value from the satisfaction value when the first ratio is less than the second ratio and the first ratio is less than the second threshold, which is less than or equal to the first threshold. Thereby, the evaluation system can evaluate the degree of satisfaction according to the speech rate of the first person relative to the speech rate of the second person and the speech rate of the first person relative to the threshold value.
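 A sketch of this speech-ratio rule is shown below (Python, for illustration). The comparison structure follows the two paragraphs above; the threshold values and the 0.5-point step are assumptions, since the disclosure leaves them configurable.

    def speech_ratio_delta(first_ratio, second_ratio,
                           first_threshold=0.5, second_threshold=0.3, step=0.5):
        # Add/subtract rule based on the speaking-time ratios and the first and second thresholds.
        if first_ratio >= second_ratio and first_ratio >= first_threshold:
            return step       # add points
        if first_ratio < second_ratio and first_ratio < second_threshold:
            return -step      # subtract points (the second threshold is at most the first threshold)
        return 0.0

    print(speech_ratio_delta(0.6, 0.4))   # -> 0.5
    print(speech_ratio_delta(0.2, 0.8))   # -> -0.5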
 また、本実施の形態の評価システムの抽出部は、第2の特徴量として、音声データから第1の人物の感情を検出し、感情から第1の人物がポジティブに感じた割合であるポジティブ率と、感情から第1の人物がネガティブに感じた割合であるネガティブ率と、を算出する。算出アルゴリズムは、ポジティブ率が加点に係る閾値以上の場合、満足度の値に所定値を加点し、ネガティブ率が減点に係る閾値以上の場合、満足度の値に所定値を減点する。これにより、評価システムは、第1の人物の音声データから検出した感情に基づき満足度の評価を行うことができる。 Furthermore, the extraction unit of the evaluation system of the present embodiment detects, as the second feature amount, the emotion of the first person from the audio data, and calculates from the emotion a positive rate, which is the proportion in which the first person felt positive, and a negative rate, which is the proportion in which the first person felt negative. The calculation algorithm adds a predetermined value to the satisfaction value when the positive rate is greater than or equal to a threshold for adding points, and subtracts the predetermined value from the satisfaction value when the negative rate is greater than or equal to a threshold for deducting points. This allows the evaluation system to evaluate the satisfaction level based on the emotion detected from the first person's audio data.
 また、本実施の形態の評価システムは、第1の人物が会話をする際に第2の人物が表示される第1表示部(例えば、表示デバイス16)、をさらに備える。抽出部は、第1の特徴量として、第1の人物が第1表示部を見ている時間を算出する。算出アルゴリズムは、時間が第3閾値以上の場合、満足度の値に所定値を加点し、時間が第3閾値以下の第4閾値未満の場合、満足度の値に所定値を減点する。これにより、評価システムは、第1の人物が第1表示部を見ている時間に基づき満足度の評価を行うことができる。 Furthermore, the evaluation system of this embodiment further includes a first display section (for example, display device 16) on which a second person is displayed when the first person has a conversation. The extraction unit calculates a time period during which the first person looks at the first display unit as the first feature amount. The calculation algorithm adds a predetermined value to the satisfaction value when the time is equal to or greater than a third threshold, and subtracts a predetermined value from the satisfaction value when the time is less than a fourth threshold that is equal to or less than the third threshold. Thereby, the evaluation system can evaluate the degree of satisfaction based on the time the first person looks at the first display section.
 また、本実施の形態の評価システムの算出アルゴリズムは、機械学習に基づき満足度を算出する。これにより、評価システムは、特徴量データから機械学習に基づいた算出アルゴリズムで満足度を算出することができる。 Furthermore, the calculation algorithm of the evaluation system of this embodiment calculates the degree of satisfaction based on machine learning. Thereby, the evaluation system can calculate the degree of satisfaction from the feature amount data using a calculation algorithm based on machine learning.
 また、本実施の形態の評価システムは、第2の人物が会話をする際に参照する画面を表示する第2表示部(例えば、表示デバイス16)と、画面を作成する画面作成部(例えば、描画画面作成部24B)と、をさらに備える。画面作成部は、満足度算出部によって算出された満足度の結果を含む画面を作成し第2表示部に表示させる。これにより、評価者である第2の人物は、第1の人物の満足度の結果を確認することができる。これにより、評価システムは、第2の人物に第1の人物の満足度の結果を通知することで、第1の人物が高い満足度で会話をすることを支援することができる。 Furthermore, the evaluation system of the present embodiment includes a second display section (e.g., display device 16) that displays a screen that the second person refers to when having a conversation, and a screen creation section (e.g., It further includes a drawing screen creation section 24B). The screen creation unit creates a screen including the satisfaction result calculated by the satisfaction calculation unit and causes the second display unit to display the screen. This allows the second person, who is the evaluator, to confirm the satisfaction level results of the first person. Thereby, the evaluation system can support the first person to have a conversation with a high level of satisfaction by notifying the second person of the result of the first person's satisfaction level.
 また、本実施の形態に係る評価システムは、第2の人物が会話をする際に参照する画面を表示する第2表示部と、画面を作成する画面作成部と、をさらに備える。画面作成部は、第1の人物と第2の人物との会話に応じて満足度算出部が算出する満足度を満足度算出部から取得している間満足度を第2表示部に表示させる。これにより、第2の人物は、第1の人物と会話している間に第1の人物の現在の満足度を確認することができる。評価システムは、第2の人物に第1の人物の現在の満足度を通知することで、第1の人物が高い満足度となるように会話することを支援することができる。 The evaluation system according to the present embodiment further includes a second display unit that displays a screen that the second person refers to when having a conversation, and a screen creation unit that creates the screen. The screen creation unit causes the second display unit to display the satisfaction level while acquiring, from the satisfaction calculation unit, the satisfaction level calculated by the satisfaction calculation unit in accordance with the conversation between the first person and the second person. This allows the second person to check the first person's current satisfaction level while conversing with the first person. By notifying the second person of the first person's current satisfaction level, the evaluation system can help the second person converse in a way that raises the first person's satisfaction level.
 また、本実施の形態に係る評価システムは、第2の人物が会話をする際に参照する画面を表示する第2表示部と、画面を作成する画面作成部と、をさらに備える。画面作成部は、第1の割合が第2の割合未満の場合、第2の人物に対し発話を控える旨のメッセージを画面に表示させる。これにより、評価システムは、第1の人物と第2の人物との発話率に基づき、第1の人物の満足度が高くなることを支援するメッセージを表示することができる。 The evaluation system according to the present embodiment further includes a second display unit that displays a screen that the second person refers to when having a conversation, and a screen creation unit that creates the screen. When the first ratio is less than the second ratio, the screen creation unit displays on the screen a message to the effect that the second person should refrain from speaking. Thereby, the evaluation system can display a message that helps increase the satisfaction level of the first person based on the speech rates of the first person and the second person.
 また、本実施の形態に係る評価システムの画面作成部が作成した画面は、第1の人物の撮像映像が映し出される表示領域と、満足度の結果が表示される表示領域と、第2の人物の撮像映像を第1の人物が参照する画面に表示させるボタンと、第2の人物の前記音声データを第1の人物が使用する端末装置から出力させるボタンと、会議の開始または終了を制御するボタンと、を含む。これにより、評価システムは、満足度の結果が含まれた画面を第2の人物に対し表示することができる。これにより、評価システムは、第2の人物に第1の人物の満足度を通知することで、第2の人物が第1の人物と円滑に会話することを支援することができる。 Furthermore, the screen created by the screen creation unit of the evaluation system according to the present embodiment includes a display area in which the captured video of the first person is shown, a display area in which the satisfaction results are displayed, a button for displaying the captured video of the second person on the screen referenced by the first person, a button for outputting the audio data of the second person from the terminal device used by the first person, and a button for controlling the start or end of the conference. This allows the evaluation system to display a screen including the satisfaction results to the second person. By notifying the second person of the first person's satisfaction level, the evaluation system can help the second person converse smoothly with the first person.
 また、本実施の形態に係る評価システムの抽出部は、取得部で予め取得された音声データと、撮像部で予め撮像された第1の人物と第2の人物との撮像データと、に基づいて、第1の人物と第2の人物とのそれぞれの視線もしくは顔の向きに係る第1の特徴量と、音声データの第2の特徴量と、を抽出する。これにより、評価システムは、過去に撮像した撮像データおよび収音した音声データから特徴量を抽出し、第1の人物の満足度を評価することができる。 Further, the extraction unit of the evaluation system according to the present embodiment is based on the audio data acquired in advance by the acquisition unit and the imaged data of the first person and the second person that were imaged in advance by the imaging unit. Then, a first feature amount related to the line of sight or face direction of each of the first person and the second person, and a second feature amount of the audio data are extracted. Thereby, the evaluation system can extract the feature amount from the image data captured in the past and the audio data captured, and evaluate the satisfaction level of the first person.
 また、本実施の形態に係る評価システムの抽出部は、撮像部の撮像データに基づいて第1の人物と第2の人物とのそれぞれの表情に係る第3の特徴量を抽出し、満足度算出部は、第3の特徴量と算出アルゴリズムとに基づいて第1の人物の満足度を算出する。これにより、評価システムは、第1の人物の表情に基づく特徴量から満足度を評価することができる。 Furthermore, the extraction unit of the evaluation system according to the present embodiment extracts a third feature amount related to the facial expressions of each of the first person and the second person based on the imaging data of the imaging unit, and the satisfaction calculation unit calculates the satisfaction level of the first person based on the third feature amount and the calculation algorithm. This allows the evaluation system to evaluate the satisfaction level from a feature amount based on the first person's facial expression.
 また、本実施の形態に係る評価システムの抽出部は、撮像部の撮像データに基づいて第1の人物と第2の人物とのそれぞれの行動に係る第4の特徴量を抽出する。満足度算出部は、第4の特徴量と算出アルゴリズムとに基づいて人物Aの満足度を算出する。これにより、評価システムは、第1の人物の行動に基づく特徴量から満足度を評価することができる。 Furthermore, the extraction unit of the evaluation system according to the present embodiment extracts a fourth feature amount related to each of the actions of the first person and the second person based on the imaging data of the imaging unit. The satisfaction level calculation unit calculates the satisfaction level of person A based on the fourth feature amount and the calculation algorithm. Thereby, the evaluation system can evaluate the degree of satisfaction from the feature amount based on the behavior of the first person.
 また、本実施の形態に係る評価システムで用いられる第2の特徴量は、音声の強度、単位時間あたりのモーラ数、単語別の強度、音量または音声のスペクトルのうち少なくとも1つである。これにより、評価システムは、第2の特徴量から第1の人物の感情を算出することができる。 Further, the second feature used in the evaluation system according to the present embodiment is at least one of the voice intensity, the number of moras per unit time, the intensity of each word, the volume, or the voice spectrum. Thereby, the evaluation system can calculate the first person's emotion from the second feature amount.
 また、本実施の形態の第2の人物は、第1の人物と対人関係にあり、対人関係には、上司と部下、従業員と顧客、同僚同士または面接官と面接を受ける人のうち少なくとも1つが含まれることを特徴とする。これにより、評価システムは、第2の人物が対人関係にある第1の人物と会話する状況において第1の人物の満足度を評価することができる。 Furthermore, the second person in the present embodiment has an interpersonal relationship with the first person, and the interpersonal relationship includes at least one of a boss and a subordinate, an employee and a customer, colleagues, or an interviewer and an interviewee. This allows the evaluation system to evaluate the satisfaction level of the first person in a situation where the second person converses with the first person with whom he or she has an interpersonal relationship.
 また、本実施の形態に係る評価システムは、算出アルゴリズムを記憶する算出アルゴリズム記憶部(例えば、メモリ14またはメモリ22)、をさらに備える。これにより、評価システムは、算出アルゴリズム記憶部に記憶された算出アルゴリズムに基づき、第1の人物の満足度を評価することができる。 Furthermore, the evaluation system according to the present embodiment further includes a calculation algorithm storage unit (for example, the memory 14 or the memory 22) that stores the calculation algorithm. Thereby, the evaluation system can evaluate the satisfaction level of the first person based on the calculation algorithm stored in the calculation algorithm storage unit.
 以上、添付図面を参照しながら実施の形態について説明したが、本開示はかかる例に限定されない。当業者であれば、特許請求の範囲に記載された範疇内において、各種の変更例、修正例、置換例、付加例、削除例、均等例に想到し得ることは明らかであり、それらについても本開示の技術的範囲に属すると了解される。また、発明の趣旨を逸脱しない範囲において、上述した実施の形態における各構成要素を任意に組み合わせてもよい。 Although the embodiments have been described above with reference to the accompanying drawings, the present disclosure is not limited to these examples. It is clear that those skilled in the art can conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and it is understood that these also fall within the technical scope of the present disclosure. Furthermore, the constituent elements of the embodiments described above may be combined as desired without departing from the spirit of the invention.
 なお、本出願は、2022年7月4日出願の日本特許出願(特願2022-107706)に基づくものであり、その内容は本出願の中に参照として援用される。 Note that this application is based on a Japanese patent application (Japanese Patent Application No. 2022-107706) filed on July 4, 2022, the contents of which are incorporated as a reference in this application.
 本開示の技術は、人物間の会話に含まれる複数の情報を用いた高精度な満足度の評価を行う評価システム、評価装置および評価方法として有用である。 The technology of the present disclosure is useful as an evaluation system, an evaluation device, and an evaluation method that perform highly accurate satisfaction evaluation using multiple pieces of information included in conversations between people.
 100,100A,100B 評価システム Evaluation system
 1,1A,1B,1C,1AA,1AB,1BA,1BB 端末装置 Terminal device
 2,2A,2B サーバ Server
 10 音声取得デバイス Audio acquisition device
 11 撮像デバイス Imaging device
 12,24 プロセッサ Processor
 12A 特徴量抽出部 Feature extraction unit
 13 通信I/F Communication I/F
 14,22 メモリ Memory
 15,23 入力デバイス Input device
 16 表示デバイス Display device
 17,26 I/F
 21 通信I/F Communication I/F
 24A 満足度推定部 Satisfaction estimation unit
 24B 描画画面作成部 Drawing screen creation unit
 NW ネットワーク Network
 A,B 人物 Person
 CO 発話 Speech
 FR1,FR2 画像 Image
 CA,CB ケース Case
 MN1,MN2,MN3 画面 Screen
 IT1,IT2,IT3,IT4 表示領域 Display area
 BT1,BT2,BT3,BT4,BT5,BT6 ボタン Button
 Mes メッセージ Message

Claims (19)

  1.  第1の人物と第2の人物との会話に係る音声データを取得する取得部と、
     前記第1の人物と前記第2の人物とを撮像する撮像部と、
     前記撮像部の撮像データに基づいて前記第1の人物と前記第2の人物とのそれぞれの視線もしくは顔の向きに係る第1の特徴量と、前記音声データの第2の特徴量と、を抽出する抽出部と、
     前記第1の特徴量と、前記第2の特徴量と、前記第1の人物の満足度を算出するための算出アルゴリズムと、に基づいて、前記第1の人物の満足度を算出する満足度算出部と、を備える、
     評価システム。
    an acquisition unit that acquires audio data of a conversation between a first person and a second person;
    an imaging unit that images the first person and the second person;
    an extraction unit that extracts, based on imaging data of the imaging unit, a first feature amount related to a line of sight or a face direction of each of the first person and the second person, and a second feature amount of the audio data; and
    a satisfaction calculation unit that calculates a satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person,
    An evaluation system.
  2.  前記満足度算出部は、前記会話の開始から終了までの間、所定の時間間隔で前記満足度を算出する、
     請求項1の評価システム。
    The satisfaction level calculation unit calculates the satisfaction level at predetermined time intervals from the start to the end of the conversation.
    The evaluation system according to claim 1.
  3.  前記抽出部は、前記第2の特徴量として、前記会話の中の前記第1の人物が発話している割合を示す第1の割合と前記第2の人物が発話している割合を示す第2の割合とを算出し、
     前記算出アルゴリズムは、
      前記第1の割合が前記第2の割合以上の場合、前記満足度の値に所定値を加点し、
      前記第1の割合が前記第2の割合未満の場合、前記満足度の値に前記所定値を減点する、
     請求項1に記載の評価システム。
    The extraction unit calculates, as the second feature amount, a first ratio indicating the proportion of the conversation in which the first person is speaking and a second ratio indicating the proportion in which the second person is speaking, and
    The calculation algorithm is
    If the first ratio is greater than or equal to the second ratio, a predetermined value is added to the satisfaction value;
    If the first ratio is less than the second ratio, subtracting the predetermined value from the satisfaction value;
    The evaluation system according to claim 1.
  4.  前記算出アルゴリズムは、
      前記第1の割合が前記第2の割合以上かつ前記第1の割合が第1閾値以上の場合、前記満足度の値に前記所定値を加点し、
      前記第1の割合が前記第2の割合未満かつ前記第1の割合が前記第1閾値以下の第2閾値未満の場合、前記満足度の値に前記所定値を減点する、
     請求項3に記載の評価システム。
    The calculation algorithm is
    If the first proportion is equal to or greater than the second proportion and the first proportion is equal to or greater than a first threshold, the predetermined value is added to the satisfaction value;
    If the first percentage is less than the second percentage and the first percentage is less than a second threshold that is less than or equal to the first threshold, subtracting the predetermined value from the satisfaction value;
    The evaluation system according to claim 3.
  5.  前記抽出部は、前記第2の特徴量として、前記音声データから前記第1の人物の感情を検出し、前記感情から前記第1の人物がポジティブに感じた割合であるポジティブ率と、前記感情から前記第1の人物がネガティブに感じた割合であるネガティブ率と、を算出し、
     前記算出アルゴリズムは、
      前記ポジティブ率が加点に係る閾値以上の場合、前記満足度の値に所定値を加点し、
      前記ネガティブ率が減点に係る閾値以上の場合、前記満足度の値に前記所定値を減点する、
     請求項1に記載の評価システム。
    The extraction unit detects, as the second feature amount, an emotion of the first person from the audio data, and calculates, from the emotion, a positive rate, which is a proportion in which the first person felt positive, and a negative rate, which is a proportion in which the first person felt negative, and
    The calculation algorithm is
    If the positive rate is equal to or higher than a threshold for adding points, add a predetermined value to the satisfaction value,
    If the negative rate is equal to or higher than a threshold for point deduction, subtracting the predetermined value from the satisfaction level;
    The evaluation system according to claim 1.
  6.  前記第1の人物が前記会話をする際に前記第2の人物が表示される第1表示部、をさらに備え、
     前記抽出部は、前記第1の特徴量として、前記第1の人物が前記第1表示部を見ている時間を算出し、
     前記算出アルゴリズムは、
      前記時間が第3閾値以上の場合、前記満足度の値に所定値を加点し、
      前記時間が前記第3閾値以下の第4閾値未満の場合、前記満足度の値に前記所定値を減点する、
     請求項1に記載の評価システム。
    further comprising a first display section on which the second person is displayed when the first person has the conversation,
    The extraction unit calculates, as the first feature amount, a time period during which the first person looks at the first display unit,
    The calculation algorithm is
    If the time is equal to or greater than a third threshold, a predetermined value is added to the satisfaction value;
    if the time is less than a fourth threshold that is equal to or less than the third threshold, subtracting the predetermined value from the satisfaction value;
    The evaluation system according to claim 1.
  7.  前記算出アルゴリズムは、機械学習に基づき満足度を算出する、
     請求項1に記載の評価システム。
    The calculation algorithm calculates the satisfaction level based on machine learning.
    The evaluation system according to claim 1.
  8.  前記第2の人物が前記会話をする際に参照する画面を表示する第2表示部と、
     前記画面を作成する画面作成部と、をさらに備え、
     前記画面作成部は、前記満足度算出部によって算出された前記満足度の結果を含む画面を作成し前記第2表示部に表示させる、
     請求項1に記載の評価システム。
    a second display unit that displays a screen that the second person refers to when having the conversation;
    further comprising a screen creation unit that creates the screen,
    The screen creation unit creates a screen including the satisfaction result calculated by the satisfaction calculation unit and displays it on the second display unit.
    The evaluation system according to claim 1.
  9.  前記第2の人物が前記会話をする際に参照する画面を表示する第2表示部と、
     前記画面を作成する画面作成部と、をさらに備え、
     前記画面作成部は、前記第1の人物と前記第2の人物との前記会話に応じて前記満足度算出部が算出する前記満足度を前記満足度算出部から取得している間前記満足度を前記第2表示部に表示させる、
     請求項2に記載の評価システム。
    a second display unit that displays a screen that the second person refers to when having the conversation;
    further comprising a screen creation unit that creates the screen,
    The screen creation unit causes the second display unit to display the satisfaction level while acquiring, from the satisfaction calculation unit, the satisfaction level calculated by the satisfaction calculation unit in accordance with the conversation between the first person and the second person,
    The evaluation system according to claim 2.
  10.  前記第2の人物が前記会話をする際に参照する画面を表示する第2表示部と、
     前記画面を作成する画面作成部と、をさらに備え、
     前記画面作成部は、前記第1の割合が前記第2の割合未満の場合、前記第2の人物に対し発話を控える旨のメッセージを前記画面に表示させる、
     請求項3に記載の評価システム。
    a second display unit that displays a screen that the second person refers to when having the conversation;
    further comprising a screen creation unit that creates the screen,
    When the first ratio is less than the second ratio, the screen creation unit displays a message on the screen to the effect that the second person should refrain from speaking.
    The evaluation system according to claim 3.
  11.  前記画面は、前記第1の人物の撮像映像が映し出される表示領域と、前記満足度の結果が表示される表示領域と、前記第2の人物の撮像映像を前記第1の人物が参照する画面に表示させるボタンと、前記第2の人物の前記音声データを前記第1の人物が使用する端末装置から出力させるボタンと、会議の開始または終了を制御するボタンと、を含む、
     請求項8に記載の評価システム。
    The screen includes a display area in which a captured video of the first person is displayed, a display area in which the satisfaction result is displayed, a button for displaying a captured video of the second person on a screen referenced by the first person, a button for outputting the audio data of the second person from a terminal device used by the first person, and a button for controlling the start or end of a conference,
    The evaluation system according to claim 8.
  12.  前記抽出部は、前記取得部で予め取得された前記音声データと、前記撮像部で予め撮像された前記第1の人物と前記第2の人物との撮像データと、に基づいて、前記第1の人物と前記第2の人物とのそれぞれの前記視線もしくは前記顔の向きに係る前記第1の特徴量と、前記音声データの前記第2の特徴量と、を抽出する、
     請求項1に記載の評価システム。
    The extraction unit extracts, based on the audio data acquired in advance by the acquisition unit and on imaging data of the first person and the second person captured in advance by the imaging unit, the first feature amount related to the line of sight or the face direction of each of the first person and the second person, and the second feature amount of the audio data,
    The evaluation system according to claim 1.
  13.  前記抽出部は、前記撮像部の前記撮像データに基づいて前記第1の人物と前記第2の人物とのそれぞれの表情に係る第3の特徴量を抽出し、
     前記満足度算出部は、前記第3の特徴量と前記算出アルゴリズムとに基づいて前記第1の人物の満足度を算出する、
     請求項1に記載の評価システム。
    The extraction unit extracts a third feature amount related to facial expressions of each of the first person and the second person based on the imaging data of the imaging unit,
    The satisfaction level calculation unit calculates the satisfaction level of the first person based on the third feature amount and the calculation algorithm.
    The evaluation system according to claim 1.
  14.  前記抽出部は、前記撮像部の前記撮像データに基づいて前記第1の人物と前記第2の人物とのそれぞれの行動に係る第4の特徴量を抽出し、
     前記満足度算出部は、前記第4の特徴量と前記算出アルゴリズムとに基づいて前記人物Aの満足度を算出する、
     請求項1に記載の評価システム。
    The extraction unit extracts a fourth feature amount relating to the behavior of each of the first person and the second person based on the imaging data of the imaging unit, and
    the satisfaction level calculation unit calculates the satisfaction level of the person A based on the fourth feature amount and the calculation algorithm,
    The evaluation system according to claim 1.
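Claim 14's fourth feature amount relates to the participants' behavior. As a hedged illustration only (the publication does not say which behaviors are used), the sketch below counts nods by looking for direction changes in a sequence of per-frame vertical head positions taken from the imaging data.

```python
from typing import List


def count_nods(head_y_positions: List[float], min_amplitude: float = 3.0) -> int:
    """Count downward-then-upward head movements larger than `min_amplitude` pixels.

    `head_y_positions` is a per-frame vertical head coordinate (image y grows
    downward). Nodding is only an example behavior; the publication does not
    specify which behaviors feed the fourth feature amount.
    """
    nods = 0
    for i in range(1, len(head_y_positions) - 1):
        prev_y, cur_y, next_y = head_y_positions[i - 1], head_y_positions[i], head_y_positions[i + 1]
        # A local maximum of y (lowest head position) bounded by sufficiently
        # higher positions on both sides is counted as one nod.
        if cur_y > prev_y and cur_y > next_y and \
           cur_y - prev_y >= min_amplitude and cur_y - next_y >= min_amplitude:
            nods += 1
    return nods


print(count_nods([100, 100, 108, 101, 100, 109, 100]))  # -> 2
```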
  15.  前記第2の特徴量は、音声の強度、単位時間あたりのモーラ数、単語別の強度、音量または音声のスペクトルのうち少なくとも1つである、
     請求項1に記載の評価システム。
    The second feature amount is at least one of the intensity of the voice, the number of moras per unit time, the intensity of each word, the volume, or the spectrum of the voice.
    The evaluation system according to claim 1.
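The voice features listed in claim 15 can all be computed with standard signal processing. The sketch below, which is not taken from the publication, derives intensity (RMS), volume on a dB scale, a magnitude spectrum, and a mora rate from a mono waveform plus a transcript whose mora count is assumed to be known; word-by-word intensity would follow the same pattern per word segment.

```python
import numpy as np


def audio_second_features(samples: np.ndarray, sample_rate: int, mora_count: int) -> dict:
    """Compute illustrative versions of the claim-15 voice features.

    `samples` is a mono waveform in the range [-1, 1]; `mora_count` is assumed
    to come from a transcript (mora counting itself is not shown here).
    """
    duration_s = len(samples) / sample_rate
    rms = float(np.sqrt(np.mean(np.square(samples))))          # intensity
    volume_db = float(20.0 * np.log10(rms + 1e-12))            # volume on a dB scale
    spectrum = np.abs(np.fft.rfft(samples))                    # magnitude spectrum
    peak_hz = float(np.fft.rfftfreq(len(samples), 1.0 / sample_rate)[int(np.argmax(spectrum))])
    moras_per_second = mora_count / duration_s if duration_s else 0.0
    return {
        "rms_intensity": rms,
        "volume_db": volume_db,
        "spectrum_peak_hz": peak_hz,
        "moras_per_second": moras_per_second,
    }


# One second of a 220 Hz tone, with an assumed 7-mora utterance.
sr = 16_000
t = np.arange(sr) / sr
print(audio_second_features(0.5 * np.sin(2 * np.pi * 220 * t), sr, mora_count=7))
```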
  16.  前記第2の人物は、前記第1の人物と対人関係にあり、
     前記対人関係には、上司と部下、従業員と顧客、同僚同士または面接官と面接を受ける人のうち少なくとも1つが含まれることを特徴とする、
     請求項1に記載の評価システム。
    The second person has an interpersonal relationship with the first person, and
    the interpersonal relationship includes at least one of the following: a relationship between a boss and a subordinate, between an employee and a customer, between colleagues, or between an interviewer and an interviewee,
    The evaluation system according to claim 1.
  17.  前記算出アルゴリズムを記憶する算出アルゴリズム記憶部、をさらに備える、
     請求項1に記載の評価システム。
    further comprising a calculation algorithm storage unit that stores the calculation algorithm;
    The evaluation system according to claim 1.
  18.  第1の人物と第2の人物との会話に係る音声データと、前記第1の人物と前記第2の人物とを撮像した撮像データを取得し、前記撮像データに基づいて前記第1の人物と前記第2の人物とのそれぞれの視線もしくは顔の向きに係る第1の特徴量と、前記音声データの第2の特徴量と、を抽出する抽出部と、
     前記第1の特徴量と、前記第2の特徴量と、前記第1の人物の満足度を算出するための算出アルゴリズムと、に基づいて、前記第1の人物の満足度を算出する満足度算出部と、を備える、
     評価装置。
    an extraction unit that acquires audio data relating to a conversation between a first person and a second person and imaging data obtained by imaging the first person and the second person, and that extracts, based on the imaging data, a first feature amount relating to the line of sight or the face direction of each of the first person and the second person, and a second feature amount of the audio data; and
    a satisfaction level calculation unit that calculates a satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person,
    An evaluation device comprising the above units.
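Claim 18, the device claim, pairs an extraction unit with a satisfaction level calculation unit. The minimal sketch below mirrors that two-unit structure; the feature-extraction bodies are stubs, and the weighted-sum "calculation algorithm" is an assumption, since the actual algorithm is not disclosed in this excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ExtractedFeatures:
    gaze_features: Dict[str, float]   # first feature amount (line of sight / face direction)
    voice_features: Dict[str, float]  # second feature amount (audio)


class ExtractionUnit:
    """Stands in for the extraction unit of claim 18 (feature bodies are stubs)."""

    def extract(self, audio_samples: Sequence[float], frames: List[object]) -> ExtractedFeatures:
        # Real gaze/face-direction and voice analysis would go here; placeholder
        # values keep the sketch runnable.
        gaze = {"mutual_gaze_ratio": 0.6}
        voice = {"rms_intensity": 0.3, "moras_per_second": 6.5}
        return ExtractedFeatures(gaze_features=gaze, voice_features=voice)


class SatisfactionCalculationUnit:
    """Applies an assumed weighted-sum calculation algorithm to the features."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = weights  # plays the role of the stored calculation algorithm

    def calculate(self, features: ExtractedFeatures) -> float:
        merged = {**features.gaze_features, **features.voice_features}
        return sum(self.weights.get(name, 0.0) * value for name, value in merged.items())


extraction_unit = ExtractionUnit()
calc_unit = SatisfactionCalculationUnit({"mutual_gaze_ratio": 50.0, "rms_intensity": 20.0, "moras_per_second": 2.0})
features = extraction_unit.extract(audio_samples=[], frames=[])
print(calc_unit.calculate(features))  # 50*0.6 + 20*0.3 + 2*6.5 = 49.0
```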
  19.  第1の人物と第2の人物との会話に係る音声データを取得し、
     前記第1の人物と前記第2の人物とを撮像し、
     撮像データに基づいて前記第1の人物と前記第2の人物とのそれぞれの視線もしくは顔の向きに係る第1の特徴量と、前記音声データの第2の特徴量と、を抽出し、
     前記第1の特徴量と、前記第2の特徴量と、前記第1の人物の満足度を算出するための算出アルゴリズムと、に基づいて、前記第1の人物の満足度を算出する、
     評価方法。
    acquiring audio data relating to a conversation between a first person and a second person;
    imaging the first person and the second person;
    extracting, based on the imaging data, a first feature amount relating to the line of sight or the face direction of each of the first person and the second person, and a second feature amount of the audio data; and
    calculating a satisfaction level of the first person based on the first feature amount, the second feature amount, and a calculation algorithm for calculating the satisfaction level of the first person,
    An evaluation method comprising the above steps.
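Claim 19 restates the same processing as an ordered method: acquire audio, image the participants, extract the feature amounts, then apply the calculation algorithm. The self-contained sketch below only shows the wiring of those four steps; every data source and stage passed in is a placeholder, not something specified by the publication.

```python
from typing import Callable, Dict, List, Sequence


def evaluate_first_person(acquire_audio: Callable[[], Sequence[float]],
                          capture_frames: Callable[[], List[object]],
                          extract: Callable[[Sequence[float], List[object]], Dict[str, float]],
                          calculate: Callable[[Dict[str, float]], float]) -> float:
    """Run the four steps of claim 19 in order (illustrative wiring only)."""
    audio_samples = acquire_audio()             # step 1: acquire audio of the conversation
    frames = capture_frames()                   # step 2: image the first and second person
    features = extract(audio_samples, frames)   # step 3: extract the feature amounts
    return calculate(features)                  # step 4: apply the calculation algorithm


# Placeholder stages standing in for a microphone, a camera, the extraction unit
# and the calculation algorithm.
score = evaluate_first_person(
    acquire_audio=lambda: [0.0] * 16_000,
    capture_frames=lambda: [],
    extract=lambda audio, frames: {"mutual_gaze_ratio": 0.6, "rms_intensity": 0.3},
    calculate=lambda f: 50.0 * f["mutual_gaze_ratio"] + 20.0 * f["rms_intensity"],
)
print(score)  # -> 36.0
```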
PCT/JP2023/018500 2022-07-04 2023-05-17 Evaluation system, evaluation device, and evaluation method WO2024009623A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022107706A JP2024006627A (en) 2022-07-04 2022-07-04 Evaluation system, evaluation device, and evaluation method
JP2022-107706 2022-07-04

Publications (1)

Publication Number Publication Date
WO2024009623A1 true WO2024009623A1 (en) 2024-01-11

Family

ID=89453003

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/018500 WO2024009623A1 (en) 2022-07-04 2023-05-17 Evaluation system, evaluation device, and evaluation method

Country Status (2)

Country Link
JP (1) JP2024006627A (en)
WO (1) WO2024009623A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011210133A (en) * 2010-03-30 2011-10-20 Seiko Epson Corp Satisfaction degree calculation method, satisfaction degree calculation device and program
JP2011237957A (en) * 2010-05-10 2011-11-24 Seiko Epson Corp Satisfaction calculation device, satisfaction calculation method and program
JP2018041120A (en) * 2016-09-05 2018-03-15 富士通株式会社 Business assessment method, business assessment device and business assessment program
JP2018124604A (en) * 2017-01-30 2018-08-09 グローリー株式会社 Customer service support system, customer service support device and customer service support method
JP2020113197A (en) * 2019-01-16 2020-07-27 オムロン株式会社 Information processing apparatus, information processing method, and information processing program
JP2020160425A (en) * 2019-09-24 2020-10-01 株式会社博報堂Dyホールディングス Evaluation system, evaluation method, and computer program
JP2021072497A (en) * 2019-10-29 2021-05-06 株式会社Zenkigen Analysis device and program
WO2022064621A1 (en) * 2020-09-24 2022-03-31 株式会社I’mbesideyou Video meeting evaluation system and video meeting evaluation server
JP2022075662A (en) * 2020-10-27 2022-05-18 株式会社I’mbesideyou Information extraction apparatus
WO2022137547A1 (en) * 2020-12-25 2022-06-30 株式会社日立製作所 Communication assistance system


Also Published As

Publication number Publication date
JP2024006627A (en) 2024-01-17

Similar Documents

Publication Publication Date Title
US9674485B1 (en) System and method for image processing
JP2016149063A (en) Emotion estimation system and emotion estimation method
WO2015110880A1 (en) A wearable device, system and method for name recollection
JP2019058625A (en) Emotion reading device and emotion analysis method
WO2019137147A1 (en) Method for identifying identity in video conference and related apparatus
US20200058302A1 (en) Lip-language identification method and apparatus, and augmented reality device and storage medium
JP2016103081A (en) Conversation analysis device, conversation analysis system, conversation analysis method and conversation analysis program
JP2019144917A (en) Stay situation display system and stay situation display method
CN110569726A (en) interaction method and system for service robot
JP7153888B2 (en) Two-way video communication system and its operator management method
WO2024009623A1 (en) Evaluation system, evaluation device, and evaluation method
US11100944B2 (en) Information processing apparatus, information processing method, and program
JP6598227B1 (en) Cat-type conversation robot
JP7206741B2 (en) HEALTH CONDITION DETERMINATION SYSTEM, HEALTH CONDITION DETERMINATION DEVICE, SERVER, HEALTH CONDITION DETERMINATION METHOD, AND PROGRAM
JP5847646B2 (en) Television control apparatus, television control method, and television control program
US11935140B2 (en) Initiating communication between first and second users
JP6711621B2 (en) Robot, robot control method, and robot program
JP6550951B2 (en) Terminal, video conference system, and program
US10440183B1 (en) Cognitive routing of calls based on derived employee activity
KR101878155B1 (en) Method for controlling of mobile terminal
JP7253371B2 (en) Extraction program, extraction method and extraction device
EP3956748A1 (en) Headset signals to determine emotional states
WO2024038699A1 (en) Expression processing device, expression processing method, and expression processing program
EP4242943A1 (en) Information processing system, information processing method, and carrier means
JP7468689B2 (en) Analytical device, analytical method, and analytical program

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23835160

Country of ref document: EP

Kind code of ref document: A1