WO2022168175A1 - Video session evaluation terminal, video session evaluation system, and video session evaluation program
- Publication number: WO2022168175A1
- Application number: PCT/JP2021/003792
- Authority: WIPO (PCT)
- Prior art keywords
- evaluation
- moving image
- video
- video session
- terminal
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Description
- The present disclosure relates to a video session evaluation terminal, a video session evaluation system, and a video session evaluation program.
- Conventionally, there is known a technique for analyzing the emotions that others feel in response to a speaker's remarks (see, for example, Patent Document 1). There is also known a technique for analyzing changes in a subject's facial expressions over a long period of time in chronological order and estimating the emotions held during that period (see, for example, Patent Document 2). Furthermore, there are known techniques for identifying the factors that most affect changes in emotion (see, for example, Patent Documents 3 to 5), and a technique for comparing a subject's usual facial expression with the current facial expression and issuing an alert when the expression is dark (see, for example, Patent Document 6).
- There is also known a technique for determining the degree of a subject's emotion by comparing the subject's normal (expressionless) face with the current facial expression (see, for example, Patent Documents 7 to 9). Furthermore, there is also known a technique for analyzing the mood of an organization and the atmosphere within a group as felt by an individual (see, for example, Patent Documents 10 and 11).
- The purpose of the present invention is to objectively evaluate exchanged communication so that communication can be conducted more efficiently in situations where online communication predominates.
- According to the present disclosure, there is provided a line-of-sight evaluation system comprising: a camera unit that acquires a moving image of a photographed target person; a line-of-sight acquisition unit that acquires the movement of the subject's line of sight based on the acquired moving image; a display unit that successively displays a plurality of images to the subject; a position acquisition unit that acquires the positional relationship between the camera unit and the display unit; and an output unit that outputs the line-of-sight movement in association with each of the plurality of displayed images.
- There is also provided a moving image analysis system comprising:
- acquisition means for acquiring at least a moving image;
- face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame;
- voice recognition means for recognizing at least the voice of the subject included in the moving image;
- evaluation means for calculating an evaluation value from a predetermined viewpoint based on both the recognized face image and the recognized voice;
- output means for outputting the evaluation value as change information along a time series; and
- identifying means for referring to change information relating to other moving images and identifying other moving images that contain the same pattern as a pattern extracted from the change information.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the recognized voice; storage means for storing the evaluation values as change information along a time series; detection means for detecting that only the evaluation value of one of the plurality of viewpoints has changed beyond a predetermined range; and peculiar frame acquisition means for acquiring a peculiar frame that includes the detected range.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; facial expression evaluation means for calculating facial expression evaluation values from a plurality of viewpoints based on the recognized face image; speech evaluation means for calculating speech evaluation values from a plurality of viewpoints based on the recognized speech; facial expression/speech correlation evaluation means for evaluating the correlation between the facial expression evaluation value and the speech evaluation value of at least one of the users; detection means for detecting, based on the correlation, that the facial expression evaluation value and the speech evaluation value have changed beyond a predetermined range; and peculiar frame acquisition means for acquiring a peculiar frame that includes the detected range.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; emotion evaluation means for analyzing the user's standard facial expression from the recognized face image and evaluating the degree of deviation from that standard expression; concentration evaluation means for evaluating at least the amount of the user's eye movement or face movement from the recognized face image; safety evaluation means for evaluating the user's feeling of anxiety from the recognized face image; and score generation means for generating a score based on two or more of the evaluations by the emotion evaluation means, the concentration evaluation means, and the safety evaluation means.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; target person specifying means for specifying, from each of the moving images relating to a plurality of video sessions, target person frames in which the target person is recognized; and digest video generation means for generating a digest video by connecting a plurality of the target person frames.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; evaluation means for evaluating at least the amount of the user's eye movement and face movement from the recognized face image; and score calculation means for calculating a score relating to the degree of concentration based on the evaluation.
- There is also provided a moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a user included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; and state evaluation means for evaluating the state of the user based on, for the same user, a first evaluation analysis value obtained by analyzing the evaluation values over a first period and a second evaluation analysis value obtained by analyzing the evaluation values over a second period longer than the first period.
- There is also provided a state evaluation system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; answer acquisition means for acquiring the user's answer information to question information created based on the plurality of viewpoints; and means for evaluating the state of the user by comparing the evaluation values with the answer information.
- There is also provided a video session evaluation system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a user included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the subject included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; annotation receiving means for receiving an annotation from the user for the evaluation value; and display means for simultaneously displaying the evaluation value and the received annotation.
- There is also provided a video session evaluation system comprising: moving image acquisition means for acquiring a moving image of a sales video session conducted between the terminal of a person in charge on the sales side and the terminal of a person in charge at the sales destination; contract information acquisition means for acquiring contract conclusion information for the sales video session; face recognition means for recognizing, for each predetermined frame, the face image of at least one of the person in charge on the sales side and the person in charge at the sales destination included in the moving image; voice recognition means for recognizing the voice of at least one of the person in charge on the sales side and the person in charge at the sales destination included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the recognized voice; model generation means for generating, using the evaluation values and the contract conclusion information as teacher data, a contract conclusion estimation model that estimates the contract conclusion rates of moving images of other sales video sessions as a plurality of ranks; and determination means for associating one of the plurality of ranks with a new sales video session using the model.
- There is also provided a video session evaluation system comprising: acquisition means for acquiring a moving image of a video session held between a lecturer terminal having a lecturer-side camera for capturing at least the face of a lecturer user and a student terminal communicably connected to the lecturer terminal via a network, the student terminal having a face camera for capturing at least the face of a student and a hand camera for capturing the student's hands; hand action recognition means for recognizing, for each predetermined frame, at least the action of the student's hands in the moving image acquired from the hand camera; and estimation means for estimating the student's degree of comprehension based on the recognized hand motion.
- According to the present disclosure, exchanged communication can be objectively evaluated, enabling more efficient communication in situations where online communication is the main form of activity.
- FIG. 1 is an example of a functional block diagram of an evaluation terminal according to an embodiment of the present invention.
- FIG. 2 is a diagram showing functional configuration example 1 of the evaluation terminal according to the embodiment of the present invention.
- FIG. 8 is a diagram showing functional configuration example 2 of the evaluation terminal according to the embodiment of the present invention.
- FIG. 10 is a diagram showing functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
- FIG. 7 is a screen display example according to functional configuration example 3 of FIG. 6.
- FIG. 7 is another screen display example according to functional configuration example 3 of FIG. 6.
- FIG. 12 is a diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
- FIG. 12 is a diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
- FIG. 2 shows a heat map of the system according to the first embodiment of the invention.
- It is a diagram showing an image of calibration of the system according to the first embodiment of the present invention.
- FIG. 3 is a comparison of graphs of the system according to the second embodiment of the invention.
- FIG. 10 is a comparative diagram of another graph of the system according to the second embodiment of the invention.
- FIG. 3 shows a graph of a system according to a third embodiment of the invention.
- FIG. 10 shows another graph of the system according to the third embodiment of the invention.
- FIG. 11 shows another graph of the system according to the fourth embodiment of the invention.
- FIG. 22 is a diagram showing an image of system evaluation according to the twelfth embodiment of the present invention.
- The contents of the embodiments of the present disclosure will be listed and described.
- The present disclosure has the following configurations.
- [Item 1] A line-of-sight evaluation system comprising: a camera unit that acquires a moving image of a photographed target person; a line-of-sight acquisition unit that acquires the movement of the subject's line of sight based on the acquired moving image; a display unit that successively displays a plurality of images to the subject; a position acquisition unit that acquires the positional relationship between the camera unit and the display unit; and an output unit that outputs the line-of-sight movement in association with each of the plurality of displayed images.
- [Item 2] The line-of-sight evaluation system according to item 1, wherein the output unit superimposes on the image a heat map, generated based on the line-of-sight movement, indicating fixation times, and outputs the result.
- [Item 3] The line-of-sight evaluation system according to item 1, wherein the output unit further outputs, in association with the image, the line-of-sight movement of another subject to whom the same image was displayed.
- [Item 4] The line-of-sight evaluation system according to item 3, further comprising a peculiarity determination unit that determines whether the line-of-sight movement associated with the subject is peculiar compared with the line-of-sight movement associated with the other subject.
- [Item 5] The line-of-sight evaluation system according to any one of items 1 to 4, wherein the output unit outputs, in association with each image, a leveled heat map obtained by leveling the line-of-sight movements of a plurality of subjects.
- [Item 6] A moving image analysis system comprising: acquisition means for acquiring at least a moving image; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the subject included in the moving image; evaluation means for calculating an evaluation value from a predetermined viewpoint based on both the recognized face image and the recognized voice; output means for outputting the evaluation value as change information along a time series; and identifying means for referring to change information relating to other moving images and identifying other moving images that contain the same pattern as a pattern extracted from the change information.
- [Item 7] The moving image analysis system according to item 6, wherein the output means outputs the evaluation values as chronological graph information, and the identifying means receives from the analysis user a selection operation on a portion of the graph information and identifies the corresponding frames of other moving images that contain the same graph pattern as the graph pattern of the selected portion.
- [Item 8] The moving image analysis system according to item 6 or item 7, wherein the identifying means identifies other moving images that contain, in the same time period, the same pattern as the pattern extracted from the change information.
- [Item 9] A moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the recognized voice; storage means for storing the evaluation values as change information along a time series; detection means for detecting that only the evaluation value of one of the plurality of viewpoints has changed beyond a predetermined range; and peculiar frame acquisition means for acquiring a peculiar frame that includes the detected range.
- [Item 10] The moving image analysis system according to item 9, wherein the plurality of viewpoints includes a first viewpoint and a second viewpoint associated with mutually contradictory attributes, and the detection means detects that the evaluation value of the first viewpoint and the evaluation value of the second viewpoint have deviated from each other beyond the predetermined range.
- A moving image analysis system wherein the detection means detects that the evaluation value of the one viewpoint changes beyond a predetermined range within a predetermined time period after a first time point and, immediately thereafter, returns to substantially the same value as the evaluation value at the first time point.
- [Item 15] A moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; facial expression evaluation means for calculating facial expression evaluation values from a plurality of viewpoints based on the recognized face image; speech evaluation means for calculating speech evaluation values from a plurality of viewpoints based on the recognized speech; facial expression/speech correlation evaluation means for evaluating the correlation between the facial expression evaluation value and the speech evaluation value of at least one of the users; detection means for detecting, based on the correlation, that the facial expression evaluation value and the speech evaluation value have changed beyond a predetermined range; and peculiar frame acquisition means for acquiring a peculiar frame that includes the detected range.
- [Item 16] The moving image analysis system according to item 15, further comprising attribute evaluation means for associating attributes with the facial expression evaluation value and the voice evaluation value, wherein the detection means detects that the attribute of the facial expression evaluation value and the attribute of the voice evaluation value are mutually exclusive.
- [Item 17] The moving image analysis system according to item 15 or item 16, further comprising digest video generation means for generating a digest video by linking the plurality of peculiar frames acquired from the moving image.
- The moving image analysis system according to any one of items 15 to 18, wherein the video session is capable of sharing screen information displayed on the screen of one user terminal, the system further comprising shared screen output means for outputting at least the screen information corresponding to the shared peculiar frame.
- A moving image analysis system comprising: acquisition means for acquiring moving images relating to a video session conducted between at least two user terminals; face recognition means for recognizing a user's face image included in the moving image for each predetermined frame; emotion evaluation means for analyzing the user's standard facial expression from the recognized face image and evaluating the degree of deviation from that standard expression; concentration evaluation means for evaluating at least the amount of the user's eye movement or face movement from the recognized face image; safety evaluation means for evaluating the user's feeling of anxiety from the recognized face image; and score generation means for generating a score based on two or more of the evaluations by the emotion evaluation means, the concentration evaluation means, and the safety evaluation means.
- A moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; target person specifying means for specifying, from each of the moving images relating to a plurality of video sessions, target person frames in which the target person is recognized; and digest video generation means for generating a digest video by connecting a plurality of the target person frames.
- A moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; evaluation means for evaluating at least the amount of the user's eye movement and face movement from the recognized face image; and score calculation means for calculating a score relating to the degree of concentration based on the evaluation.
- [Item 23] A moving image analysis system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a user included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; and state evaluation means for evaluating the state of the user based on, for the same user, a first evaluation analysis value obtained by analyzing the evaluation values over a first period and a second evaluation analysis value obtained by analyzing the evaluation values over a second period longer than the first period.
- The moving image analysis system according to item 23, further comprising: trend detection means for detecting a predetermined trend in the second evaluation analysis value; and correction means for correcting the first evaluation analysis value according to the detected trend.
- A state evaluation system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a target person included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the user included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; answer acquisition means for acquiring the user's answer information to question information created based on the plurality of viewpoints; and means for evaluating the state of the user by comparing the evaluation values with the answer information.
- [Item 26] A video session evaluation system comprising: acquisition means for acquiring a moving image of a video session conducted with another terminal; face recognition means for recognizing at least the face image of a user included in the moving image for each predetermined frame; voice recognition means for recognizing at least the voice of the subject included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice; annotation receiving means for receiving an annotation from the user for the evaluation value; and display means for simultaneously displaying the evaluation value and the received annotation.
- [Item 29] A video session evaluation system comprising: moving image acquisition means for acquiring a moving image of a sales video session conducted between the terminal of a person in charge on the sales side and the terminal of a person in charge at the sales destination; contract information acquisition means for acquiring contract conclusion information for the sales video session; face recognition means for recognizing, for each predetermined frame, the face image of at least one of the person in charge on the sales side and the person in charge at the sales destination included in the moving image; voice recognition means for recognizing the voice of at least one of the person in charge on the sales side and the person in charge at the sales destination included in the moving image; evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the recognized voice; model generation means for generating, using the evaluation values and the contract conclusion information as teacher data, a contract conclusion estimation model that estimates the contract conclusion rates of moving images of other sales video sessions as a plurality of ranks; and determination means for associating one of the plurality of ranks with a new sales video session using the model.
- [Item 31] A video session evaluation system comprising: acquisition means for acquiring a moving image of a video session held between a lecturer terminal having a lecturer-side camera for capturing at least the face of a lecturer user and a student terminal communicably connected to the lecturer terminal via a network, the student terminal having a face camera for capturing at least the face of a student and a hand camera for capturing the student's hands; hand action recognition means for recognizing, for each predetermined frame, at least the action of the student's hands in the moving image acquired from the hand camera; and estimation means for estimating the student's degree of comprehension based on the recognized hand motion.
- [Item 32] The video session evaluation system according to item 31, comprising: voice recognition means for recognizing at least the voice of the student included in the moving image; and evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face image and the voice, wherein the estimation means estimates the student's degree of understanding based on the hand motion and the evaluation values.
- [Item 33] The video session evaluation system according to item 31 or item 32, wherein the estimation means estimates the student's degree of comprehension according to the amount of the recognized hand movement.
- [Item 34] The video session evaluation system according to any one of items 31 to 33, further comprising alert means for issuing an alert when the student's face is not captured by the face camera and the student's hand movement is not recognized by the hand camera.
- In an environment where a video session (hereinafter referred to as an online session, including both one-way and two-way sessions) is held by a plurality of people, this system analyzes and evaluates the specific emotions of a person to be analyzed among the participants (feelings that arise in response to one's own or others' behavior, such as pleasure or displeasure and their degree) that differ from those of the others.
- Online sessions are, for example, online meetings, online classes, online chats, etc.
- Terminals installed at multiple locations are connected to a server via a communication network such as the Internet, and moving images are exchanged among the terminals through the server so that the participants can interact.
- Moving images also include images such as materials that are shared and viewed by a plurality of users. It is possible to switch between the face image and the document image on the screen of each terminal to display only one of them, or to divide the display area and display the face image and the document image at the same time. In addition, it is possible to display the image of one user out of a plurality of users on the full screen, or divide the images of some or all of the users into small screens and display them.
- The leader, moderator, or manager of an online session (hereinafter collectively referred to as the organizer) designates any user as a person to be analyzed. Hosts of online sessions are, for example, instructors of online classes, chairpersons and facilitators of online meetings, and coaches of sessions held for coaching purposes. An online session host is typically one of the users participating in the online session, but may be another person who does not participate in the online session. Note that all participants may be subject to analysis without designating a specific person to be analyzed.
- the video session evaluation system displays at least moving images obtained from a video session established between a plurality of terminals.
- the displayed moving image is acquired by the terminal, and at least a face image included in the moving image is identified for each predetermined frame unit. An evaluation value for the identified face image is then calculated. The evaluation value is shared as necessary.
- the acquired moving images are stored in the terminal, analyzed and evaluated on the terminal, and the results are provided to the user of the terminal. Therefore, for example, even a video session containing personal information or a video session containing confidential information can be analyzed and evaluated without providing the moving image itself to an external evaluation agency or the like.
- The video session evaluation system includes user terminals 10 and 20, each having at least an input unit such as a camera and a microphone, a display unit such as a display, and an output unit such as a speaker; a video session service terminal 30 for providing an interactive video session to the user terminals 10 and 20; and an evaluation terminal 40 for performing part of the evaluation of the video session.
- FIG. 2 is a diagram showing a hardware configuration example of a computer that implements each of the terminals 10 to 40 according to this embodiment.
- The computer includes at least a control unit 110, a memory 120, a storage 130, a communication unit 140, an input/output unit 150, and the like. These are electrically connected to each other through a bus 160.
- the control unit 110 is an arithmetic device that controls the overall operation of each terminal, controls transmission and reception of data between elements, executes applications, and performs information processing necessary for authentication processing.
- the control unit 110 is a processor such as a CPU, and executes each information processing by executing a program or the like stored in the storage 130 and developed in the memory 120 .
- the memory 120 includes a main memory made up of a volatile memory device such as a DRAM, and an auxiliary memory made up of a non-volatile memory device such as a flash memory or an HDD.
- the memory 120 is used as a work area or the like for the control unit 110, and stores the BIOS executed when each terminal is started, various setting information, and the like.
- the storage 130 stores various programs such as application programs.
- a database storing data used for each process may be constructed in the storage 130 .
- moving images in the online session are not recorded in the storage 130 of the video session service terminal 30, but are stored in the storage 130 of the user terminal 10.
- the evaluation terminal 40 stores an application and other programs necessary for evaluating the moving image acquired on the user terminal 10, and appropriately provides them so that the user terminal 10 can use them.
- The storage 130 managed by the evaluation terminal 40 may share, for example, only the results of the analysis and evaluation performed by the user terminal 10.
- the communication unit 140 connects the terminal to the network.
- The communication unit 140 supports, for example, wired LAN, wireless LAN, Wi-Fi (registered trademark), infrared communication, Bluetooth (registered trademark), and short-range or non-contact communication, and communicates with external devices either directly or via a network access point.
- the input/output unit 150 is, for example, information input devices such as a keyboard, mouse, and touch panel, and output devices such as a display.
- a bus 160 is commonly connected to each of the above elements and transmits, for example, address signals, data signals and various control signals.
- the evaluation terminal acquires a moving image from a video session service terminal, identifies at least a face image included in the moving image for each predetermined frame unit, and calculates an evaluation value for the face image.
- This service enables two-way image and voice communication between the user terminals 10 and 20. A moving image captured by the camera of the other user's terminal is displayed on the display of each user's terminal, and audio captured by the microphone of the other user's terminal can be output from the speaker.
- This service is configured so that both or either of the user terminals can record moving images and sounds (collectively referred to as "moving images, etc.") in the storage unit of at least one of the user terminals.
- the recorded moving image information Vs (hereinafter referred to as “recorded information”) is cached in the user terminal that started recording and is locally recorded only in one of the user terminals. If necessary, the user can view the recorded information by himself or share it with others within the scope of the use of this service.
- the user terminal 10 acquires the recorded information and performs analysis and evaluation as described later.
- the user terminal 10 evaluates the video acquired as described above by the following analysis.
- FIG. 4 is a block diagram showing a configuration example according to this embodiment.
- the video session evaluation system of this embodiment is realized as a functional configuration of the user terminal 10.
- the user terminal 10 has, as its functions, a moving image acquisition unit 11, a biological reaction analysis unit 12, a peculiar determination unit 13, a related event identification unit 14, a clustering unit 15, and an analysis result notification unit 16.
- Each of the functional blocks 11 to 16 can be implemented by any of hardware, a DSP (Digital Signal Processor), and software provided in the user terminal 10, for example.
- When implemented by software, each of the functional blocks 11 to 16 is actually realized by the operation of a computer's CPU, RAM, and ROM together with a program stored in a recording medium such as a RAM, ROM, hard disk, or semiconductor memory.
- the moving image acquisition unit 11 acquires from each terminal a moving image obtained by photographing a plurality of people (a plurality of users) with a camera provided in each terminal during an online session. It does not matter whether the moving image acquired from each terminal is set to be displayed on the screen of each terminal. That is, the moving image acquisition unit 11 acquires moving images from each terminal, including moving images being displayed and moving images not being displayed on each terminal.
- the biological reaction analysis unit 12 analyzes changes in the biological reaction of each of a plurality of people based on the moving images (whether or not they are being displayed on the screen) acquired by the moving image acquiring unit 11.
- the biological reaction analysis unit 12 separates the moving image acquired by the moving image acquisition unit 11 into a set of images (collection of frame images) and voice, and analyzes changes in the biological reaction from each.
- For example, the biological reaction analysis unit 12 analyzes the user's face image using the frame images separated from the moving image acquired by the moving image acquisition unit 11, thereby analyzing changes in biological reactions relating to at least one of facial expression, line of sight, pulse, and face movement. The biological reaction analysis unit 12 also analyzes the voice separated from the moving image acquired by the moving image acquisition unit 11, thereby analyzing changes in biological reactions relating to at least one of the user's utterance content and voice quality.
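- As a rough sketch of this separation step (the patent names no specific tools, so OpenCV and a locally installed ffmpeg binary are assumptions here):

```python
import subprocess

import cv2  # OpenCV, assumed available for frame extraction

def split_video(path: str, frame_step: int = 10):
    """Split a recorded session video into sampled frame images and an audio file."""
    frames = []
    cap = cv2.VideoCapture(path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frame_step == 0:  # keep every Nth frame ("predetermined frame" unit)
            frames.append(frame)
        index += 1
    cap.release()

    # Demux the audio track with ffmpeg (assumed installed), consistent with
    # performing all analysis locally on the user terminal.
    audio_path = path + ".wav"
    subprocess.run(["ffmpeg", "-y", "-i", path, "-vn", audio_path], check=True)
    return frames, audio_path
```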
- the biological reaction analysis unit 12 calculates a biological reaction index value reflecting the change in biological reaction by quantifying the change in biological reaction according to a predetermined standard.
- The analysis of changes in facial expression is performed, for example, as follows. For each frame image, a face region is identified, and the identified facial expression is classified into one of a plurality of types according to an image analysis model machine-learned in advance. Based on the classification results, the unit analyzes whether a positive or negative facial expression change has occurred between consecutive frame images and to what extent, and outputs a facial expression change index value corresponding to the analysis result.
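- A minimal sketch of such an index value, assuming a hypothetical classify_expression() function standing in for the pre-trained image analysis model (the patent specifies neither the model nor the expression classes):

```python
import numpy as np

CLASSES = ["happy", "neutral", "sad", "angry"]  # illustrative classes only
POSITIVE = {"happy"}
NEGATIVE = {"sad", "angry"}

def expression_change_index(face_crops, classify_expression):
    """Score positive/negative expression change across consecutive face crops."""
    valence = []
    for crop in face_crops:
        probs = classify_expression(crop)  # hypothetical: class name -> probability
        valence.append(sum(probs[c] for c in POSITIVE) - sum(probs[c] for c in NEGATIVE))
    if len(valence) < 2:
        return 0.0, 0.0
    deltas = np.diff(valence)  # frame-to-frame valence change
    # Mean signed change (direction) and largest single change (extent).
    return float(np.mean(deltas)), float(np.max(np.abs(deltas)))
```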
- the analysis of changes in line of sight is performed as follows. That is, for each frame image, the eye region is specified in the frame image, and the orientation of both eyes is analyzed to analyze where the user is looking. For example, it analyzes whether the user is looking at the face of the speaker being displayed, whether the user is looking at the shared material being displayed, or is looking outside the screen. Also, it may be analyzed whether the eye movement is large or small, or whether the movement is frequent or infrequent. A change in line of sight is also related to the user's degree of concentration.
- the biological reaction analysis unit 12 outputs a line-of-sight change index value according to the analysis result of the line-of-sight change.
- The analysis of pulse changes is performed, for example, as follows. For each frame image, the face region is identified. Then, using a trained image analysis model that extracts numerical face color information (the G channel of RGB), changes in the green color of the face surface are analyzed. Arranging the results along the time axis forms a waveform representing the change in color information, and the pulse is identified from this waveform. When a person is tense the pulse speeds up, and when the person is calm it slows down. The biological reaction analysis unit 12 outputs a pulse change index value according to the analysis result of the pulse change.
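- The G-channel waveform step can be sketched as a simplified remote-photoplethysmography estimate (real implementations add detrending, band-pass filtering, and motion compensation, none of which the patent details):

```python
import numpy as np

def estimate_pulse_bpm(face_rois, fps: float) -> float:
    """Estimate pulse in BPM from the green channel of per-frame face regions.

    face_rois: list of HxWx3 arrays in BGR channel order (as OpenCV returns).
    """
    # Mean green value per frame forms the color-change waveform.
    signal = np.array([roi[:, :, 1].mean() for roi in face_rois])
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.75) & (freqs <= 3.0)  # plausible heart rates: 45-180 BPM
    if not band.any():
        return 0.0  # clip too short to resolve the heart-rate band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return float(peak_freq * 60.0)
```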
- analysis of changes in facial movements is performed as follows. That is, for each frame image, the face area is specified in the frame image, and the direction of the face is analyzed to analyze where the user is looking. For example, it analyzes whether the user is looking at the face of the speaker being displayed, whether the user is looking at the shared material being displayed, or is looking outside the screen. Further, it may be analyzed whether the movement of the face is large or small, or whether the movement is frequent or infrequent. The movement of the face and the movement of the line of sight may be analyzed together. For example, it may be analyzed whether the face of the speaker being displayed is viewed straight, whether the face is viewed with upward or downward glances, or whether the face is viewed obliquely.
- the biological reaction analysis unit 12 outputs a face orientation change index value according to the analysis result of the face orientation change.
- The biological reaction analysis unit 12 converts the voice into a character string by performing known voice recognition processing on the voice for a specified time (for example, about 30 to 150 seconds), and removes words that are unnecessary for characterizing the conversation, such as particles and articles, by morphological analysis of the character string. It then vectorizes the remaining words, analyzes whether a positive or negative emotional change has occurred and to what extent, and outputs an utterance content index value according to the analysis result.
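- A sketch of the utterance content index under stated assumptions: transcribe(), tokenize(), and embed() are hypothetical stand-ins for the known speech recognition, morphological analysis, and word vectorization steps, and valence_axis is an assumed positive/negative direction in the embedding space:

```python
import numpy as np

STOP_POS = {"particle", "article"}  # parts of speech dropped per the description

def utterance_content_index(audio_segment, transcribe, tokenize, embed, valence_axis):
    """Score emotional change in roughly 30-150 s of speech from its transcript."""
    text = transcribe(audio_segment)
    # Morphological analysis yields (word, part_of_speech) pairs; keep content words.
    words = [w for w, pos in tokenize(text) if pos not in STOP_POS]
    if not words:
        return 0.0
    vectors = np.array([embed(w) for w in words])
    # Project word vectors onto the positive/negative valence direction.
    return float((vectors @ valence_axis).mean())
```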
- Voice quality analysis is performed, for example, as follows. The biological reaction analysis unit 12 identifies acoustic features of the voice by performing known voice analysis processing on the voice for a specified time (for example, about 30 to 150 seconds). Based on the acoustic features, it analyzes whether a positive or negative change in voice quality has occurred and to what extent, and outputs a voice quality change index value according to the analysis result.
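- A sketch of the acoustic feature step; the patent does not name concrete features, so short-time energy and zero-crossing rate stand in here for "acoustic features of the voice":

```python
import numpy as np

def voice_quality_features(samples: np.ndarray, rate: int) -> dict:
    """Extract simple acoustic features over a specified window (e.g. 30-150 s)."""
    frame_len = rate // 50  # 20 ms analysis frames
    n = len(samples) // frame_len
    if n == 0:
        return {"energy_mean": 0.0, "energy_var": 0.0, "zcr_mean": 0.0}
    frames = samples[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return {
        "energy_mean": float(energy.mean()),
        "energy_var": float(energy.var()),  # variability may signal agitation
        "zcr_mean": float(zcr.mean()),      # coarse proxy for voicing/brightness
    }
```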
- The biological reaction analysis unit 12 uses at least one of the facial expression change index value, line-of-sight change index value, pulse change index value, face direction change index value, utterance content index value, and voice quality change index value calculated as described above to calculate the biological reaction index value. For example, the biological reaction index value is calculated by weighting the facial expression change index value, line-of-sight change index value, pulse change index value, face direction change index value, utterance content index value, and voice quality change index value.
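- For illustration, the weighting could look like the following; the weight values are assumptions, since the patent leaves the weighting unspecified:

```python
# Illustrative weights only; the patent does not give concrete values.
WEIGHTS = {
    "expression": 0.25,
    "gaze": 0.15,
    "pulse": 0.20,
    "face_motion": 0.10,
    "utterance": 0.15,
    "voice_quality": 0.15,
}

def biological_reaction_index(index_values: dict) -> float:
    """Combine per-channel index values into a single weighted score."""
    return sum(WEIGHTS[k] * v for k, v in index_values.items() if k in WEIGHTS)
```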
- The peculiarity determination unit 13 determines whether the analyzed change in the biological reaction of the person to be analyzed is peculiar compared with the analyzed changes in the biological reactions of the others. In the present embodiment, the peculiarity determination unit 13 makes this determination based on the biological reaction index values calculated for each of the plurality of users by the biological reaction analysis unit 12.
- For example, the peculiarity determination unit 13 calculates the variance of the biological reaction index values calculated for each of the plurality of persons by the biological reaction analysis unit 12, and compares the biological reaction index value calculated for the person to be analyzed with this variance to determine whether the change in that person's biological reaction is peculiar compared with the others.
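- One plausible reading of this variance comparison, sketched below: the subject is flagged as peculiar when their index value lies far from the group distribution (the threshold k is an assumption, not from the patent):

```python
import numpy as np

def is_peculiar(subject_value: float, all_values, k: float = 2.0) -> bool:
    """Flag the subject when their index deviates strongly from the group."""
    values = np.asarray(all_values, dtype=float)
    std = values.std()
    if std == 0.0:
        return False  # no spread in the group, so nothing stands out
    return abs(subject_value - values.mean()) > k * std
```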
- the following three patterns are conceivable as cases where the changes in biological reactions analyzed for the subject of analysis are more specific than those of others.
- the first is a case where a relatively large change in biological reaction occurs in the subject of analysis, although no particularly large change in biological reaction has occurred in the other person.
- the second is a case where a particularly large change in biological reaction has not occurred in the subject of analysis, but a relatively large change in biological reaction has occurred in the other person.
- the third is a case where a relatively large change in biological reaction occurs in both the subject of analysis and the other person, but the content of the change differs between the subject of analysis and the other person.
- The related event identification unit 14 identifies an event occurring in relation to at least one of the person to be analyzed, the other persons, and the environment when the change in the biological reaction determined to be peculiar by the peculiarity determination unit 13 occurred.
- the related event identification unit 14 identifies from the moving image the speech and behavior of the person to be analyzed when a specific change in biological reaction occurs in the person to be analyzed.
- the related event identifying unit 14 identifies, from the moving image, the speech and behavior of the other person when a specific change in the biological reaction of the person to be analyzed occurs.
- the related event identification unit 14 identifies from the moving image the environment in which a specific change in the biological reaction of the person to be analyzed occurs.
- the environment is, for example, the shared material being displayed on the screen, the background image of the person to be analyzed, and the like.
- The clustering unit 15 analyzes the degree of correlation between the change in the biological reaction determined to be peculiar by the peculiarity determination unit 13 (for example, one or a combination of line of sight, pulse, face movement, utterance content, and voice quality) and the event identified by the related event identification unit 14 as occurring when that change arose, and, if the correlation is determined to be at a certain level or more, clusters the persons to be analyzed or the events based on the correlation analysis results.
- the clustering unit 15 clusters the person to be analyzed or the event into one of a plurality of pre-segmented categories according to the content of the event, the degree of negativity, the magnitude of the correlation, and the like.
- the clustering unit 15 clusters the person to be analyzed or the event into one of a plurality of pre-segmented classifications according to the content of the event, the degree of positivity, the degree of correlation, and the like.
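- The patent names no clustering algorithm, so as one possible sketch, k-means over per-subject feature vectors (for example reaction magnitude, event negativity or positivity, and correlation strength) could assign the pre-segmented classifications:

```python
import numpy as np
from sklearn.cluster import KMeans  # scikit-learn, assumed available

def cluster_subjects(feature_rows, n_clusters: int = 3):
    """Group analysis subjects (or events) by their reaction/event features."""
    features = np.asarray(feature_rows, dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(features)  # cluster id per subject/event
    return labels, model.cluster_centers_
```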
- The analysis result notification unit 16 notifies the designator of the person to be analyzed (the person to be analyzed themselves or the organizer of the online session) of at least one of: the change in the biological reaction determined to be peculiar by the peculiarity determination unit 13, the event identified by the related event identification unit 14, and the classification produced by the clustering unit 15.
- For example, when a peculiar change in biological reaction that differs from that of the others occurs in the person to be analyzed (any of the three patterns described above; the same applies hereinafter), the analysis result notification unit 16 notifies the person to be analyzed of his or her own speech and behavior at that time. This allows the person to be analyzed to understand that others felt differently from him or her when he or she performed a certain behavior. At this time, the peculiar change in biological reaction identified for the person to be analyzed may also be reported, and the change in the biological reaction of the other persons being compared may further be reported.
- For example, when the emotion that others felt in response to the words and deeds of the person to be analyzed, whether performed without being particularly conscious of any emotion or performed consciously with a certain emotion, differs from the emotion held by the person to be analyzed at that time, the person to be analyzed is notified of his or her own speech and behavior at that time.
- The analysis result notification unit 16 notifies the organizer of the online session of the event that occurred when a peculiar change in biological reaction, different from that of the others, occurred in the person to be analyzed, together with that change in biological reaction. This enables the organizer to know, as a phenomenon specific to the designated person, what kind of event influences what kind of emotional change, and to deal with the person appropriately according to what has been grasped.
- The analysis result notification unit 16 may also notify the organizer of the clustering result for the person to be analyzed. Depending on which classification the designated person has been clustered into, the organizer can grasp behavioral tendencies peculiar to that person and predict behaviors and states that may occur in the future, making it possible to take appropriate measures for the person to be analyzed.
- In this way, the change in biological reaction is quantified according to a predetermined standard to calculate the biological reaction index value, and whether the change in the biological reaction of the person to be analyzed is peculiar is determined based on the biological reaction index values calculated for each of the plurality of people.
- the biological reaction analysis unit 12 analyzes the movement of the line of sight for each of a plurality of people and generates a heat map indicating the direction of the line of sight.
- The peculiarity determination unit 13 compares the heat map generated for the person to be analyzed by the biological reaction analysis unit 12 with the heat maps generated for the others, thereby determining whether the change in the biological reaction analyzed for the person to be analyzed is peculiar compared with the changes analyzed for the others.
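- A sketch of one way this heat map comparison could work, under the assumptions that gaze points are normalized to [0, 1] screen coordinates and that an L1 distance to the group's mean map stands in for the unspecified comparison:

```python
import numpy as np

def gaze_heatmap(points, shape=(32, 32)) -> np.ndarray:
    """Accumulate normalized (x, y) gaze points into a gaze-direction heat map."""
    grid = np.zeros(shape)
    for x, y in points:
        gx = min(int(x * shape[1]), shape[1] - 1)
        gy = min(int(y * shape[0]), shape[0] - 1)
        grid[gy, gx] += 1.0
    total = grid.sum()
    return grid / total if total else grid

def heatmap_peculiarity(subject_map: np.ndarray, other_maps) -> float:
    """Distance between the subject's gaze distribution and the others' average."""
    mean_map = np.mean(np.stack(other_maps), axis=0)
    return float(np.abs(subject_map - mean_map).sum())
```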
- moving images of a video session are stored in the local storage of the user terminal 10, and the above analysis is performed on the user terminal 10.
- Depending on the machine specifications of the user terminal 10, it is possible to analyze the moving image information without providing it to the outside.
- the video session evaluation system of this embodiment may include a moving image acquisition unit 11, a biological reaction analysis unit 12, and a reaction information presentation unit 13a as functional configurations.
- the reaction information presentation unit 13a presents information indicating changes in biological reactions analyzed by the biological reaction analysis unit 12a, including participants not displayed on the screen.
- the reaction information presenting unit 13a presents information indicating changes in biological reactions to an online session leader, moderator, or administrator (hereinafter collectively referred to as the organizer).
- organizers of online sessions are, for example, instructors of online classes, chairpersons and facilitators of online meetings, coaches of sessions for coaching purposes, and the like.
- an online session organizer is typically one of the users participating in the online session, but may be another person who does not participate in the online session.
- the organizer of the online session can also grasp the state of the participants who are not displayed on the screen in an environment where the online session is held by multiple people.
- FIG. 6 is a block diagram showing a configuration example according to this embodiment. As shown in FIG. 6, in the video session evaluation system of the present embodiment, functions similar to those of the above-described first embodiment are given the same reference numerals, and explanations thereof may be omitted.
- the system includes a camera unit that acquires images of a video session, a microphone unit that acquires audio, an analysis unit that analyzes and evaluates the acquired moving images, an object generation unit that generates a display object (described below) based on the information obtained by evaluating the acquired moving images, and a display unit that displays both the moving image of the video session and the display object during execution of the video session.
- the analysis unit includes the moving image acquisition unit 11, the biological reaction analysis unit 12, the peculiar determination unit 13, the related event identification unit 14, the clustering unit 15, and the analysis result notification unit 16, as described above.
- the function of each element is as described above.
- the object generation unit superimposes on the moving image, and displays, an object 50 representing the recognized face part and information 100 indicating the content of the above-mentioned analysis/evaluation.
- when the faces of a plurality of persons appear in the moving image, the object 50 may identify and display all of those faces.
- the object 50 may be displayed, for example, even when the camera function of the video session is stopped at the other party's terminal (that is, stopped by software within the video session application rather than by physically covering the camera). If the other party's face is still recognized by the other party's camera, the object 50 or the information 100 may be displayed at the position where the other party's face is located. This makes it possible for both parties to confirm that the other party is in front of the terminal even when the camera function is turned off. In this case, for example, the video session application may hide the information obtained from the camera while displaying only the object 50 or the information 100 corresponding to the face recognized by the analysis unit. Alternatively, the video information acquired from the video session and the information recognized by the analysis unit may be separated into different display layers, and the layer containing the former information may be hidden.
- the objects 50 and 100 may be displayed in all areas or only in some areas. For example, as shown in FIG. 8, they may be displayed only on the moving image on the guest side.
- the device described in this specification may be realized as a single device, or may be realized by a plurality of devices (for example, cloud servers) or the like, all or part of which are connected via a network.
- the control unit 110 and the storage 130 of each terminal 10 may be realized by different servers connected to each other via a network.
- the system includes user terminals 10, 20, a video session service terminal 30 for providing an interactive video session to the user terminals 10, 20, and an evaluation terminal 40 for evaluating the video session.
- Variation combinations of the following configurations are conceivable.
- (1) Processing everything only on the user terminal: as shown in FIG. 9, by performing the processing of the analysis unit on the terminal conducting the video session (although a certain processing capacity is required), analysis/evaluation results can be obtained at the same time as the video session is being conducted (in real time).
- (2) as another variation, an analysis unit may be provided in an evaluation terminal connected via a network or the like.
- in that case, the moving image acquired by the user terminal is shared with the evaluation terminal at the same time as or after the video session; after it is analyzed and evaluated by the analysis unit in the evaluation terminal, the information of the object 50 and the information 100 (that is, information including at least the analysis data) is shared with the user terminal together with or separately from the moving image data, and displayed on the display unit.
- A first embodiment of the present invention will be described with reference to FIGS. 11 and 12.
- the system according to the present embodiment analyzes and evaluates which parts of the displayed material were gazed at and for how long, based on the information about which place on the screen the eyes of the person to be evaluated are gazing at and the information on the material displayed at that time.
- the system according to the present embodiment includes camera means for acquiring a moving image obtained by photographing the person to be evaluated, line-of-sight acquiring means for acquiring the eye movement of the subject based on the acquired moving image, and display means for sequentially displaying a plurality of images to the subject.
- this system has position acquisition means for acquiring the positional relationship between the camera means and the display means.
- This makes it possible to calibrate the subject's eye movement and gaze point.
- the state of the subject's eyes is first acquired by the camera unit of the display, and the movement of the eyes is then acquired while the subject looks at predetermined places on the screen (calibration points: the center, the four corners of the screen, and the like).
- to acquire these eye movements, it is possible, for example, to have the user intentionally look at the calibration points by playing an announcement on the screen.
- a conspicuous sign may be displayed only in the center in an eye-catching manner, and the eye movement at that moment may be estimated as a state of gazing at the center.
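- A simple realization of such calibration is sketched below in Python: raw gaze measurements taken while the subject looks at the known calibration points are fitted to screen coordinates with a least-squares affine map. The numeric values and the affine model are illustrative assumptions, not parameters taken from this disclosure.

```python
import numpy as np

# Raw gaze features (e.g. pupil-center offsets seen by the face camera) observed
# while the subject looked at the calibration points; the values are made up.
raw = np.array([[0.00, 0.00], [-0.31, 0.22], [0.29, 0.21], [-0.30, -0.24], [0.28, -0.23]])
# Screen coordinates of the calibration points: center plus the four corners (pixels).
screen = np.array([[960, 540], [0, 0], [1920, 0], [0, 1080], [1920, 1080]])

# Fit an affine map raw -> screen by least squares: screen ~ [raw, 1] @ A.
X = np.hstack([raw, np.ones((len(raw), 1))])
A, _, _, _ = np.linalg.lstsq(X, screen, rcond=None)

def gaze_to_screen(gaze_xy):
    """Convert one raw gaze measurement to an estimated on-screen gaze point."""
    return np.append(np.asarray(gaze_xy, dtype=float), 1.0) @ A

print(gaze_to_screen([0.0, 0.0]))  # roughly the screen center
```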
- the gaze points are associated with their gaze times on the (shared) material displayed on the screen and output like a heat map. As a result, it is possible to grasp which part of the material was gazed at and for how long, so the parts the subject is interested in can be understood.
- the system according to the present embodiment may generate a heat map for the same material, taking into consideration the movements of other target persons (other business sites, other students, etc.).
- the points of gaze of other subjects may also be displayed on the material.
- features peculiar to the subject may also be output, such as a part that the subject does not gaze at even though the other subjects are gazing at it, or a part that the subject gazes at even though the other subjects are not.
- each material may also be associated with, and output together with, a normalized heat map obtained by normalizing the subject's eye movements. For example, the necessity of a material can be grasped from the point of view of which materials were looked at closely; conversely, it can be seen that there is not much need for materials with short fixation times.
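- The heat map construction described above could look like the following minimal sketch; the grid resolution, the [0, 1] page coordinates, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaze_heatmap(gaze_points, dwell_times, grid=(27, 48)):
    """Accumulate per-fixation dwell times (seconds) into a coarse grid laid
    over the material; gaze_points are (x, y) in [0, 1] page coordinates."""
    heat = np.zeros(grid)
    for (x, y), t in zip(gaze_points, dwell_times):
        r = min(int(y * grid[0]), grid[0] - 1)
        c = min(int(x * grid[1]), grid[1] - 1)
        heat[r, c] += t
    return heat

def normalize(heat):
    """Scale total dwell time to 1 so heat maps of different subjects or
    materials can be compared directly."""
    total = heat.sum()
    return heat / total if total > 0 else heat

# Example: two fixations on the upper-left area, one on the lower-right.
h = gaze_heatmap([(0.1, 0.1), (0.12, 0.11), (0.9, 0.9)], [2.0, 1.5, 0.5])
print(normalize(h).max())
```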
- A second embodiment of the present invention will be described with reference to FIGS. 13 and 14.
- the system according to the present embodiment visualizes as a graph the evaluation values analyzed and evaluated from the facial expressions and voices described above, and extracts other subjects whose graphs contain the same pattern as the pattern read from the graph.
- the system according to the present embodiment recognizes the face image and voice of the subject included in the acquired moving image, calculates the evaluation values, and outputs them as chronological change information (for example, as the graph shown in FIG. 13).
- the value "safety" indicating a sense of security will be described as an example.
- the illustrated graph plots time on the horizontal axis and the evaluation value indicating the degree of security on the vertical axis. It can be seen that the graph (A) representing subject A shows a large drop in value from time t1 to t2. Such a graph appears, for example, when A's feelings of unease or fear are detected by judging facial expressions and voice in a complex manner.
- the graph (B) representing subject B also shows a large decrease in value between times t1 and t2. Such a graph appears, for example, when B's feelings of unease or fear are detected by judging facial expressions and voice in a complex manner.
- the system extracts information (B) that includes the same pattern by referring to changes from time t1 to t2 in (A).
- alternatively, a partial selection operation on the original graph information may be received from the analyst (for example, selecting time t1 to t2 in (A) of FIG. 17), and the corresponding frames of other moving images containing the same graph pattern as that of the selected portion may be specified.
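- One way to realize this pattern extraction is a sliding-window correlation search, sketched below; the correlation measure and the threshold are assumptions, since the disclosure does not fix a matching method.

```python
import numpy as np

def find_same_pattern(selected, other, threshold=0.9):
    """Slide the selected pattern (e.g. the 'safety' values from t1 to t2 in
    graph (A)) over another subject's series and return the start indices
    whose window correlates with it above the threshold."""
    selected = np.asarray(selected, dtype=float)
    other = np.asarray(other, dtype=float)
    w = len(selected)
    if np.std(selected) == 0:
        return []
    hits = []
    for start in range(len(other) - w + 1):
        window = other[start:start + w]
        if np.std(window) == 0:
            continue  # a flat window has no defined correlation
        if np.corrcoef(selected, window)[0, 1] >= threshold:
            hits.append(start)
    return hits
```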
- A third embodiment of the present invention will be described with reference to FIGS. 15 and 16.
- the system according to the present embodiment detects a momentary, significant change exceeding a predetermined threshold in the facial expression and voice evaluations described above.
- it is possible to analyze the subject's deep psychology by detecting a significant change in only one of the evaluations from a plurality of viewpoints. Efficient analysis can also be performed by extracting and evaluating only the moving images in which such a change has occurred.
- the system cuts out a portion L1 of a predetermined length before and after the change at t1 and a portion L2 of a predetermined length before and after the change at t2, and joins them to generate a digest movie. This makes it possible to extract a moving image that includes the moments when deep psychology appears.
- the system according to the present embodiment may also detect that two mutually different graphs change greatly at the same instant. For example, as shown in FIG. 16, in evaluation values associated with mutually contradictory characteristics such as happy and sad, happy momentarily decreases and sad momentarily increases at time t1. When one evaluation value momentarily decreases and the opposite evaluation value momentarily increases in this way, the emotion that has increased is often the true emotion.
- a digest video may be generated by concatenating the multiple frames (peculiar frames) acquired from within the video.
- the speech corresponding to the peculiar frames may be converted into text and output. If the screen information displayed on the screen of the user terminal can be shared, the screen information corresponding to the peculiar frames may be output when the momentary change described above occurs.
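- The detection and digest generation described in this embodiment admit a very small sketch, given below; the window lengths, the threshold semantics, and the function names are illustrative assumptions.

```python
import numpy as np

def peculiar_times(values, times, threshold):
    """Timestamps where the frame-to-frame change of an evaluation value
    momentarily exceeds the predetermined threshold."""
    deltas = np.abs(np.diff(values))
    return [times[i + 1] for i in np.flatnonzero(deltas > threshold)]

def digest_segments(change_times, before=5.0, after=5.0):
    """Cut out a portion of predetermined length around each change; the
    (start, end) segments are then concatenated into a digest movie."""
    return [(max(0.0, t - before), t + after) for t in change_times]

def opposite_spikes(happy, sad, threshold):
    """Moments where 'happy' momentarily drops while 'sad' momentarily rises;
    per the text above, the rising emotion is treated as the likely true one."""
    dh, ds = np.diff(happy), np.diff(sad)
    return (np.flatnonzero((dh < -threshold) & (ds > threshold)) + 1).tolist()
```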
- the system associates facial expression evaluation values and voice evaluation values with attributes in advance, and detects changes beyond a predetermined range based on the correlation between the attributes. For example, a positive label is associated with the words “thank you” and “well understood”, and a correlation with facial expression evaluation (happy, sad, safety) is defined in advance.
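- A minimal check of this attribute correlation might look as follows; the word list, the expected score range, and the threshold are assumptions introduced for illustration.

```python
# Words assumed to carry a positive label, expected to co-occur with a high
# 'happy' facial-expression evaluation (correlation defined in advance).
POSITIVE_WORDS = {"thank you", "well understood"}

def label_mismatch(spoken_text: str, happy_score: float, expected_min: float = 0.5) -> bool:
    """Flag a change beyond the predetermined range: a positively labeled
    utterance while the facial 'happy' evaluation stays unexpectedly low."""
    is_positive = any(w in spoken_text.lower() for w in POSITIVE_WORDS)
    return is_positive and happy_score < expected_min

print(label_mismatch("Thank you, well understood", 0.2))  # True: mismatch
```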
- the system shown in FIG. 18 can share screen information displayed on the screen.
- the content of the screen, the text information, and the graph information indicating the emotion may be associated with each other.
- evaluation is performed from three perspectives: emotion, which evaluates the degree of divergence of the user's expression from the user's standard expression in the recognized face image; concentration, which evaluates at least the amount of eye movement or facial movement of the user from the recognized face image; and safety, which evaluates feelings related to anxiety from the recognized face image.
- the evaluation may be performed using a learner that has learned each point of view, or may be evaluated by other methods.
- a score is generated based on two or more evaluations from each evaluated aspect.
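- A weighted combination is one straightforward way to generate such a score; the sketch below is illustrative only, and the weights and the [0, 100] scale are assumptions rather than values given in this disclosure.

```python
def overall_score(emotion: float, concentration: float, safety: float,
                  weights=(0.4, 0.3, 0.3)) -> float:
    """Combine two or more viewpoint evaluations (here all three, each assumed
    to lie in [0, 100]) into a single score by a weighted average."""
    return weights[0] * emotion + weights[1] * concentration + weights[2] * safety

print(overall_score(80.0, 60.0, 70.0))  # 71.0
```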
- A system according to a sixth embodiment of the present invention will now be described with reference to the drawings.
- the system according to the present embodiment identifies, for example, a video in which a specific target person is shown from a plurality of business videos, lecture videos, and the like. This makes it possible to focus on and evaluate a specific person from various online sessions.
- a digest video may be generated by cutting out only the parts in which the target person appears. For example, if, of the moving images of lectures 001 to 004 shown in the figure, the moving images in which the subject appears are lectures 001, 002, and 004, the system extracts the parts t1, t2, and t3 from the respective moving images. The extracted parts can be reproduced as a digest moving image.
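- Identifying the videos in which a specific person appears could be sketched as below using precomputed face embeddings (produced upstream by any face recognizer); the cosine-similarity criterion and the threshold are assumptions.

```python
import numpy as np

def videos_with_person(target_emb, video_embs, threshold=0.8):
    """Return the names of videos containing the target person, given one face
    embedding for the target and, per video, embeddings of the faces seen."""
    def cos(a, b):  # assumes non-zero embedding vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return [vid for vid, embs in video_embs.items()
            if any(cos(target_emb, e) >= threshold for e in embs)]

# Toy example with 3-dimensional "embeddings".
target = np.array([1.0, 0.0, 0.0])
videos = {"lecture001": [np.array([0.9, 0.1, 0.0])],
          "lecture003": [np.array([0.0, 1.0, 0.0])]}
print(videos_with_person(target, videos))  # ['lecture001']
```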
- A system according to a seventh embodiment of the present invention will now be described with reference to the drawings.
- the system according to this embodiment calculates a so-called concentration score (degree of concentration) for a subject participating in an online session. Online, especially in webinar format, the audience's cameras are often turned off. According to this system, it becomes possible in such cases to quantitatively determine how much each participant is concentrating on the lecture.
- the system recognizes the face image captured by the camera during the session (whether or not the subject shares the camera image with the other party) and evaluates the amount of eye movement and the amount of face movement of the subject, respectively.
- absolute values are evaluated as to how much the face has moved and how much the eyes have moved from the initial position.
- in one pattern, for example, the face is not moving but the eyes are moving in various directions, which suggests that the subject is reading the material.
- the degree of concentration is grasped through two such patterns: when neither the face nor the eyes are moving much, it is assumed that the subject is paying close attention to the speaker's face and listening attentively to the talk; when the face is still but the eyes are moving, it can be inferred that the subject is concentrating on reading the material.
- a score related to the degree of concentration may be calculated based on these degrees of movement (the Value axis in the graph shown).
- for example, the degree of concentration may be set to 100 when both the face and eye movements are 0, and to 0 when both are at their maximum values.
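- That mapping admits a direct formula, sketched below; the linear interpolation and equal weighting of face and eye movement are assumptions consistent with, but not mandated by, the description above.

```python
def concentration_score(face_movement, eye_movement, face_max, eye_max):
    """100 when both movement amounts are 0, 0 when both reach their maxima,
    linear in between; movements are clipped to their observed maxima."""
    face_ratio = min(face_movement / face_max, 1.0) if face_max > 0 else 0.0
    eye_ratio = min(eye_movement / eye_max, 1.0) if eye_max > 0 else 0.0
    return 100.0 * (1.0 - (face_ratio + eye_ratio) / 2.0)

print(concentration_score(0.0, 0.0, 10.0, 5.0))   # 100.0
print(concentration_score(10.0, 5.0, 10.0, 5.0))  # 0.0
```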
- A system according to an eighth embodiment of the present invention will now be described with reference to the drawings.
- the system according to this embodiment attempts to obtain a true evaluation, with seasonal and temporal factors corrected, by performing the evaluations of the above-described first to seventh embodiments over different spans.
- for the same user, the present system assesses the person's condition using a short-term evaluation value (analysis value) obtained by analyzing the evaluation values over a short period and a long-term evaluation value (analysis value) obtained by analyzing them over a long period. For example, a student's evaluation value over one year (long-term evaluation value) and monthly evaluation values (short-term evaluation values) may be analyzed.
- from the long-term evaluation value, the long-term characteristics of the subject can be analyzed.
- from the short-term evaluation value, the short-term characteristics of the subject can be analyzed, such as feeling depressed at the end of the month or having a bright expression on Fridays.
- the above-mentioned long term is, for example, a cycle of three months, six months, or one year
- the short term is, for example, a cycle of one day, one week, or one month, but is not limited thereto.
- the evaluation in this case may employ the average value or the median value of the evaluation values, or may calculate an appropriate value using various statistical techniques.
- the trend of the evaluation values described above is analyzed, and, for example, when it is found that the subject's mood tends to be depressed at the end of the month, the happiness score arising at the end of the month may be corrected by multiplying it by a predetermined coefficient. That is, when an evaluation deviating from the trend occurs, the true emotion can be analyzed by giving that evaluation a higher weight.
- suppose, for example, that the evaluation value at the end of February is P1 for a subject whose happiness trend is indicated by the solid line.
- since the happy score is supposed to be low at the end of the month, it can be seen that P1 deviates from the trend.
- the correction method may be to add or subtract the positive or negative deviation from the trend, or any other method.
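- One possible correction, consistent with the deviation-based description above, is sketched here; the weighting coefficient is an assumed parameter.

```python
def corrected_score(observed, trend_value, weight=1.5):
    """When an observed evaluation (e.g. P1 at the end of February) deviates
    from the long-term trend, amplify the deviation so the true emotion
    stands out; weight > 1 gives the off-trend evaluation a higher weight."""
    deviation = observed - trend_value
    return trend_value + weight * deviation

print(corrected_score(observed=70.0, trend_value=40.0))  # 85.0
```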
- the system according to the present embodiment acquires subjective responses (questionnaires, interviews, etc.) from subjects in advance regarding a certain theme, and compares them with the acquired evaluation values. As a result, even if a subjective answer such as "I am not dissatisfied with the lecture" is obtained in a questionnaire, if the evaluation of the degree of happiness is low when the actual facial expressions and voice are analyzed, it can be understood that the subject is holding back to some extent in the answer.
- the system according to this embodiment is particularly suitable in the field of employee awareness surveys from companies.
- for example, a questionnaire may be prepared and answered that includes questions about happiness (e.g., job satisfaction at the company, openness, etc.), questions about anxiety (e.g., troubles, fears, etc.), and questions about future safety (e.g., career path, job security, promotion, salary increase, etc.), and the answers may be compared with the evaluation values regarding happiness, anxiety, and safety obtained from the subject's facial expressions and voice.
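- Such a comparison can be sketched as a simple per-axis gap check; the 0-100 scale, the tolerance, and the dictionary layout are assumptions for illustration.

```python
def divergence_flags(survey, measured, tolerance=20.0):
    """Compare self-reported scores with the scores measured from facial
    expressions and voice on the same axes (happiness, anxiety, safety);
    axes whose gap exceeds the tolerance suggest the subject may be holding
    back in the questionnaire. Both dicts map axis name -> score in [0, 100]."""
    return {axis: survey[axis] - measured[axis]
            for axis in survey
            if axis in measured and abs(survey[axis] - measured[axis]) > tolerance}

print(divergence_flags({"happiness": 80.0}, {"happiness": 40.0}))  # {'happiness': 40.0}
```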
- A tenth embodiment of the present invention will be described with reference to the drawings.
- the system according to the present embodiment accepts, from the subject after the fact, labeling (annotation) of the situation at the time with respect to the evaluation values obtained from the subject's facial expressions and voice.
- as a result, the evaluation values can be evaluated subjectively after the fact, and the algorithm can be updated by feeding the evaluation results back.
- labeling may be performed for each time zone, and the evaluation values may be superimposed on the content of the accepted labels. Examples of labels include "the evaluation value is correct / not correct", "the situation at that time", and "values based on the subject's own standards".
- the system according to this embodiment relates to a sales video session between a terminal of the person in charge on the sales side and a terminal of the person in charge at the sales partner.
- conventionally, sales representatives made predictions about the closing rate based on their own sales interviews and the like, and on their experience.
- according to this system, by analyzing sales video sessions, the closing rate can be estimated by machine learning or statistical processing based on the data of past sales video sessions and their sales results.
- this system includes deal closing information acquisition means for acquiring the closing information of past sales video sessions; face recognition means for recognizing, for each predetermined frame, the face image of at least one of the person in charge on the sales side and the person in charge at the sales partner included in the moving image of a sales video session; voice recognition means for recognizing the voice of at least one of them included in the moving image; and evaluation means for calculating evaluation values from a plurality of viewpoints based on both the recognized face images and voices.
- the present system also includes model generation means for generating a contract conclusion estimation model that estimates the closing rate of the moving images of other sales video sessions, using the evaluation values and the closing information as training data, and uses the model to determine closing rates for new sales video sessions.
- the closing rate may be, for example, numerical values such as 50% and 70%, or ranks (zones) such as A, B, and C.
- the closing rate may also be calculated based on the degree of similarity with moving images of deals closed in the past. For example, if the similarity between the video of a new sales video session and the video of a session with the same business partner (or a similar client) that led to a contract in the past is 70%, the closing rate of the new deal may likewise be determined to be 70%.
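- The similarity-based variant can be sketched as a similarity-weighted average over past outcomes; the cosine similarity on evaluation-value feature vectors is an assumption, as the disclosure does not fix the similarity measure.

```python
import numpy as np

def closing_rate_by_similarity(new_features, closed_deals):
    """Estimate a new session's closing rate from its similarity to past
    sessions. closed_deals pairs each past session's feature vector (its
    evaluation values) with 1.0 if the deal closed, else 0.0; the estimate
    is a similarity-weighted average of those outcomes."""
    def cos(a, b):  # assumes non-zero feature vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([max(cos(new_features, f), 0.0) for f, _ in closed_deals])
    outcomes = np.array([o for _, o in closed_deals])
    return float((sims * outcomes).sum() / sims.sum()) if sims.sum() > 0 else 0.0

past = [(np.array([0.8, 0.2]), 1.0), (np.array([0.1, 0.9]), 0.0)]
print(closing_rate_by_similarity(np.array([0.7, 0.3]), past))
```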
- based on the determined closing rates, this system calculates expected sales forecast figures for a given period, such as the current month or the current quarter.
- A system according to a twelfth embodiment of the present invention will now be described with reference to the drawings.
- the system according to this embodiment is suitable mainly for online learning guidance.
- This system includes a lecturer terminal and student terminals that are communicably connected to each other via a network.
- the lecturer terminal has a lecturer-side camera for capturing at least the face of the lecturer user.
- the student terminal has a face camera for capturing at least the face of the student, and a hand camera for capturing the student's hands (the state of writing in a notebook or on a printout, or the state of the desk).
- this system includes hand movement recognition means for recognizing, for each predetermined frame, the movement of the student's hands in the moving image acquired from the hand camera, and estimation means for estimating the student's degree of comprehension based on the recognized hand movements.
- the estimation means estimates the student's degree of comprehension according to the amount of recognized hand movement. For example, it is possible to evaluate whether the amount the student writes matches the amount the instructor writes on the board (whether the student is taking proper notes), or, by analyzing the color of the pen being used, whether the student is devising ways of studying such as color-coding important points.
- an alert may be issued to the instructor terminal or the student terminal.
- the degree of understanding may also be estimated based on both the evaluation value derived from the student's facial expressions and voice, and the movement of the hands.
- for example, if the hand camera detects no hand movement while emotions of the student such as irritation or anxiety are detected, it can be assumed that the lecture is not being effective.
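- A rough sketch of such hand-movement recognition and comprehension estimation follows; the frame-difference measure, the thresholds, and the returned categories are assumptions made for illustration.

```python
import numpy as np

def hand_movement_amount(prev_frame, frame):
    """Per-frame hand-movement amount from the hand camera: mean absolute
    pixel difference over the (assumed pre-cropped) hand region."""
    return float(np.mean(np.abs(frame.astype(float) - prev_frame.astype(float))))

def comprehension_estimate(movement, board_writing_rate, negative_emotion):
    """Combine the recognized movement with context: little writing while the
    instructor writes on the board, together with detected irritation or
    anxiety, suggests the lecture is not landing."""
    if negative_emotion and movement < 0.1 * board_writing_rate:
        return "lecture may not be effective"
    if movement >= 0.5 * board_writing_rate:
        return "taking notes properly"
    return "attention may be drifting"

a = np.zeros((8, 8)); b = np.ones((8, 8))
print(hand_movement_amount(a, b))               # 1.0
print(comprehension_estimate(1.0, 1.5, False))  # taking notes properly
```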
- Supplementary note (hardware configuration): the sequence of operations performed by the apparatus described herein may be implemented using software, hardware, or a combination of software and hardware. It is possible to create a computer program for realizing each function of the information sharing support device 10 according to the present embodiment and to implement it in a PC or the like. It is also possible to provide a computer-readable recording medium storing such a computer program.
- the recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, a flash memory, or the like. Also, the above computer program may be distributed, for example, via a network without using a recording medium.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The problem addressed by the present invention is to evaluate a video session by evaluating video acquired during the video session. As a solution, the present disclosure provides a video session evaluation system comprising: acquisition means for acquiring at least one video; face recognition means for recognizing at least a face image of a subject included in the video for each prescribed frame; voice recognition means for recognizing at least the speech of the subject included in the video; evaluation means for calculating evaluation values for a prescribed aspect on the basis of both the recognized face images and the recognized speech; output means for providing the evaluation values as change information in a time series; and specifying means for referencing other change information associated with other videos and specifying other videos that include the same pattern as the pattern extracted from the change information.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/003792 WO2022168175A1 (fr) | 2021-02-02 | 2021-02-02 | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
JP2022518704A JPWO2022168175A1 (fr) | 2021-02-02 | 2021-02-02 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/003792 WO2022168175A1 (fr) | 2021-02-02 | 2021-02-02 | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022168175A1 (fr) | 2022-08-11 |
Family
ID=82741154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/003792 WO2022168175A1 (fr) | 2021-02-02 | 2021-02-02 | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2022168175A1 (fr) |
WO (1) | WO2022168175A1 (fr) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016206261A (ja) * | 2015-04-16 | 2016-12-08 | 本田技研工業株式会社 | 会話処理装置、および会話処理方法 |
JP2020155944A (ja) * | 2019-03-20 | 2020-09-24 | 株式会社リコー | 発話者検出システム、発話者検出方法及びプログラム |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022168175A1 (fr) | 2022-08-11 |
Similar Documents
Publication | Title |
---|---|
WO2022168180A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168185A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022180860A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022168175A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168176A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168178A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022168183A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168182A1 (fr) | Terminal d'évaluation, système d'évaluation et programme d'évaluation de session vidéo |
WO2022168177A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168174A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168179A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo, et programme d'évaluation de session vidéo |
WO2022168181A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022168184A1 (fr) | Terminal d'évaluation, système d'évaluation et programme d'évaluation de session vidéo |
JP7152825B1 (ja) | ビデオセッション評価端末、ビデオセッション評価システム及びビデオセッション評価プログラム |
WO2023032058A1 (fr) | Terminal d'évaluation de session vidéo, système d'évaluation de session vidéo et programme d'évaluation de session vidéo |
WO2022180852A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180858A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180855A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180854A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180857A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180859A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
WO2022180861A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
JP7138998B1 (ja) | ビデオセッション評価端末、ビデオセッション評価システム及びビデオセッション評価プログラム |
WO2022230155A1 (fr) | Système d'analyse vidéo |
WO2022180856A1 (fr) | Terminal, système et programme d'évaluation de session vidéo |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2022518704; Country of ref document: JP; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21924572; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21924572; Country of ref document: EP; Kind code of ref document: A1 |