WO2020236331A2 - Method for assessment of audience attention - Google Patents


Info

Publication number
WO2020236331A2
Authority
WO
WIPO (PCT)
Prior art keywords
attention
response
subjects
recited
recorded
Prior art date
Application number
PCT/US2020/027605
Other languages
English (en)
French (fr)
Other versions
WO2020236331A3 (en)
Inventor
Lucas C. Parra
Jens Madsen
Original Assignee
Research Foundation Of The City University Of New York
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Foundation Of The City University Of New York filed Critical Research Foundation Of The City University Of New York
Priority to EP20809217.1A (EP3952746A4)
Publication of WO2020236331A2
Publication of WO2020236331A3
Priority to US17/450,415 (US20220030080A1)


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/163 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state by tracking eye movement, gaze, or pupil change
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/0002 Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
    • A61B 5/0015 Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network characterised by features of the telemetry system
    • A61B 5/0022 Monitoring a patient using a global network, e.g. telephone networks, internet
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/02 Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
    • A61B 5/024 Detecting, measuring or recording pulse rate or heart rate
    • A61B 5/02438 Detecting, measuring or recording pulse rate or heart rate with portable devices, e.g. worn by the patient
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B 5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B 5/1126 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique
    • A61B 5/1128 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique using image analysis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/68 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
    • A61B 5/6801 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient specially adapted to be attached to or worn on the body surface
    • A61B 5/6802 Sensor mounted on worn items
    • A61B 5/681 Wristwatch-type devices
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235 Details of waveform analysis
    • A61B 5/7246 Details of waveform analysis using correlation, e.g. template matching or determination of similarity
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H 40/60 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H 40/63 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 2503/00 Evaluating a particular growth phase or type of persons or animals
    • A61B 2503/20 Workers
    • A61B 2503/24 Computer workstation operators
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/05 Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
    • A61B 5/053 Measuring electrical impedance or conductance of a portion of the body
    • A61B 5/0531 Measuring skin impedance
    • A61B 5/0533 Measuring galvanic skin response
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/168 Evaluating attention deficit, hyperactivity

Definitions

  • Audience attention is an important commodity given the abundance of electronic media today. Many producers of media (e.g. movies, advertisements, broadcast events, online concerts, online exercise classes, online learning) are financially motivated to monitor the attention of their audience. Unfortunately, no single solution has proven to be entirely satisfactory. Most prior approaches to monitoring attention have relied on comparing eye-gaze position with the item of interest. For example, is the eye gaze of a driver directed at the road? Is the eye gaze of a computer user directed at the computer monitor? Is the eye gaze directed at a specific advertisement on the screen? However, in many scenarios the user/viewer is looking directly at the screen, but their mind is still not attentively engaged with the content.
  • This disclosure provides a method to assess a human subject’s attention while experiencing dynamic media.
  • An attention-predictive response is recorded, e.g. the time course of gaze position, pupil size, or heart rate.
  • Similarity of the time course of this response to the time course of a predicted response provides a quantitative measure of the subject’s attention to the media.
  • a method to assess a human subject’s attention comprising steps of: playing a pre-recorded dynamic media to a plurality of human subjects; digitally recording at least one attention-predictive response of each human subject in the plurality of human subjects dynamically over time during the step of playing, thereby producing a digitally recorded attention-predictive response for each human subject; and quantifying, for each human subject in the plurality of human subjects, a similarity over time of the digitally recorded attention-predictive response to a corresponding anticipated response to the pre-recorded dynamic media.
  • a method to assess a human subject’s attention comprising steps of: digitally recording, dynamically in real-time, at least one attention-predictive response of each human subject in a plurality of human subjects while the human subjects are experiencing a real-time dynamic media that is being broadcast by a broadcaster, thereby producing a digitally recorded attention-predictive response for each human subject; and quantifying, for each human subject in the plurality of human subjects, a similarity over time of the digitally recorded attention-predictive response to a corresponding anticipated response to the dynamic media.
  • a method of adjusting a video game in response to player attention is provided.
  • the method comprising: generating a dynamic video display that is produced during play of a video game, wherein the dynamic video display has an anticipated response with regard to an attention-predictive response of a human subject; digitally recording, dynamically in real-time, at least one attention-predictive response of the human subject dynamically over time while the human subject is experiencing the dynamic video display, thereby producing a digitally recorded attention-predictive response; quantifying a similarity over time of the digitally recorded attention-predictive response to a corresponding anticipated response to the dynamic video display; and adjusting the video game in response to changes in the similarity over time.
  • FIG. 2 is a schematic depiction of an embodiment in which similarity of response is determined by first aggregating the time course of eye-gaze position across a test group.
  • the time course is the median eye-gaze position in the group at each instant in time. Similarity is then determined by correlating the individual user’s eye-gaze position with this anonymous aggregated group response (a sketch follows below).
  • if the reference response is predicted by some other means, e.g. a computational model, then the correlation is no longer an intersubject correlation, but a correlation between subjects and a computer-predicted response.
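For illustration, this aggregate-and-correlate scheme can be sketched in a few lines of Python. This is a minimal sketch, assuming the gaze traces are already aligned and uniformly sampled; the function and variable names are illustrative only:

```python
import numpy as np

def median_to_subject_correlation(responses):
    """responses: array of shape (n_subjects, n_samples), e.g. the vertical
    eye-gaze position over time for each member of a reference group."""
    # anonymous aggregate: the median response of the group at each instant
    median_response = np.median(responses, axis=0)
    # Pearson correlation of each subject's time course with the median
    msc = np.array([np.corrcoef(r, median_response)[0, 1] for r in responses])
    return msc, median_response
```

A new user's attention can then be scored by correlating their recorded response with the transmitted `median_response`, entirely locally.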
  • FIG. 3A is a graph showing that two subjects’ gaze position and pupil size follow each other during attentive viewing
  • FIG. 3B is a graph showing the same two subjects viewing the same segment of video while distracted by a counting task
  • FIG. 3C is a graph showing the intersubject correlation (ISC) of eye movement, measured as the mean of the ISC of vertical gaze position, horizontal gaze position and pupil size. Values for each subject are shown as dots for all videos in Experiment 1. Each dot is connected by a line between the two conditions, namely when subjects were either attending (A) or distracted (D) while watching the video.
  • FIG. 3D depicts the receiver operating characteristic curve for deciding whether a subject is attending or distracted based on their ISC.
  • FIG. 3E is a graph demonstrating that intentional learning yields a higher ISC. Each dot is the average ISC for each subject when they watched all instructional videos in the attend condition using either the intentional or incidental learning style.
  • FIG. 4A shows a graph illustrating eye movements of three representative subjects as they watch the “Why are Stars Star-Shaped?” video. Two high-performing subjects have similar eye movements and pupil size. A third, low-performing subject does not match their gaze position or pupil size.
  • FIG. 4B graphs the ISC of eye movement and performance on quiz taking (Score) for each of five videos in Experiment 1. Each dot is a subject. The high and low performing subjects (subjects 1-3) from FIG. 4A are highlighted for the Stars video. Dotted lines represent performance of subjects naive to the video.
  • FIG. 4C is similar to FIG. 4B but averaging over the five videos.
  • the data was collected in two different conditions: during intentional learning (Experiment 1), where subjects knew they would be quizzed on the material, and during incidental learning (Experiment 2), where subjects did not know that quizzes would follow the viewing.
  • FIG. 4D is a graph showing that videos in three different production styles (Experiment 3) show similar correlation values between test scores and ISC. Each point is a subject where values are averaged over two videos presented in each of the three styles.
  • FIG. 4E depicts a graph showing quiz score results for different question types. Each point is a subject, with test scores averaged over all questions about factual information (recognition) versus questions requiring comprehension. ISC values were averaged over all six videos in Experiment 3.
  • FIG. 5A shows graphs depicting gaze position for the ‘Immune’ video in the laboratory, classroom and at-home settings.
  • FIG. 5B is a graph depicting the deviation of gaze position when subjects looked at 4 “validation” dots presented in sequence on the corners of the screen, collected in the laboratory, classroom and at-home settings for the first video shown to subjects (see Methods); the data indicate a significant difference in means.
  • FIG. 5C shows graphs demonstrating that the weighted average of the vertical, horizontal and velocity eye-movement ISC (wISC) is predictive of performance in the classroom.
  • FIG. 5D shows graphs demonstrating that eye-movement wISC is predictive of performance in the at-home setting.
  • the present disclosure details how to assess audience attention unobtrusively and remotely for dynamic media such as video (including movies, online courses and video games) and audio (including music, podcasts and audio books). Attention may be measured on either a group or an individual level.
  • the disclosed method scales to large numbers of individuals and can be readily automated.
  • This disclosure shows the effectiveness of measuring attention from the predictability of behavioral or physiological responses such as eye-movements, pupil dilation and/or heart rate. Such signals can be readily collected remotely.
  • the disclosure further teaches how to adapt the dynamic media or the interaction to capture attention of an audience.
  • an audience is exposed to a dynamic media and audience response over time is digitally recorded (i.e. dynamically recorded).
  • the recorded response is compared to a predicted response over time in a group of viewers (FIG. 1).
  • Predictability is established based on reference responses which can be determined by aggregates of other individuals’ responses to the dynamic media, or by predicting the responses to the dynamic media using computational models of responses.
  • the level of attention (a continuous variable) can be measured as the similarity of the observed audience response over time to that of the predicted response.
  • similarity is measured as the temporal correlation of the time course of the observed audience response with the time course of the predicted reference response (FIG. 2).
  • Salience refers to the visual characteristics that make a point "salient" in an image, which, by definition, means that they attract attention.
  • Computer models for salience therefore aim to predict gaze position on an image or video.
  • These salience models may be used to generate a predicted response for the temporal trajectory of gaze position. This is particularly important when trying to determine attention in a video game. For a video game, one typically cannot record data from a reference group to establish what the predicted response should be, simply because most video games progress differently every time they are played.
  • a computer model of visual salience may be utilized to predict the viewer’s eye-gaze response based on the salience of the visual content of the video game. If a gamer is not following the visual content as expected, then this user is likely not properly attending to the game.
  • the predicted response is established by recording the responses over time in a reference group of subjects. If the response of a given test subject is similar to the response of all the members in the group, then this subject is considered attentive. In such an embodiment, there is no need to have a single predicted response. Instead, the responses of the test subjects can be dynamically collected and compared to all members of the reference group.
  • Responses can include a variety of behavioral or physiological responses that are predictable in attentive individuals. In this specification these responses are referred to as attention-predictive responses.
  • behavioral responses include head movements, hand movements, eye movement (e.g. the time course of gaze position, FIG. 3A and FIG. 3B for attentive and distracted subjects respectively), pupil size (FIG. 3A and FIG. 3B), eye movement velocity, facial expression and computer mouse movements (e.g. computer cursor movements).
  • physiological responses include pupil dilation, heart rate, breathing effort (e.g. thorax diameter, etc.), galvanic skin response, etc.
  • the physiological responses can be recorded using conventional smart devices such as smart watches, arm bands, etc.
  • At least one of the attention-predictive responses is captured unobtrusively and transmitted remotely over the internet, such as with a web camera, wrist band, smartwatch, earwear, smart glasses, motion sensor, or other unobtrusive method to remotely capture such responses.
  • the term “remote” refers to the subjects being physically distanced from one another such that the subjects cannot physically interact without the use of a device (e.g. internet, wifi, wireless, computer, etc.).
  • the response is recorded ahead of time in a reference audience and then aggregated across the reference group, e.g. as the median response.
  • This response is now anonymous as it is not associated with any specific individual and is transmitted remotely to the user to assess the individual level of attention while preserving privacy.
  • similarity of attention-predictive responses to a reference response is measured as the correlation in time of the time courses of the different responses (FIG. 1 and FIG. 2). If the reference response is the response of other subjects, this results in inter-subject correlation (ISC).
  • the reference response may be computer generated using a predictive model of the response, e.g. a model of visual salience.
  • the predicted response for a given dynamic media that is asynchronously broadcast is a property of the media itself.
  • Asynchronous means the material is experienced by subjects at a time other than the time when the media was recorded.
  • a prerecorded dynamic media may be experienced (e.g. viewed and/or listened to) by an audience of initial subjects which serves as an attentive reference group.
  • One or more attention-predictive responses are aggregated across this attentive reference group to serve as the attention-predictive response.
  • the responses are a function of time as the dynamic media is experienced. This aggregated data is then associated with the dynamic media itself.
  • the subject’s attention-predictive responses are quantitatively compared to this aggregate to determine how similar the subject’s response is to the aggregated attention-predictive response.
  • the degree of similarity to the anticipated response can be reported for each viewer in an audience, or for the entire audience.
  • a prerecorded dynamic media may be experienced (e.g. viewed and/or listened to) by an audience of initial subjects.
  • the attention-predictive responses of the subjects are classified as either (1) attentive responses or (2) inattentive responses. In one embodiment this is done by measuring the similarity of the response to that of a reference group using inter-subject correlation (FIG. 1 and 2).
  • Subjects in an attentive state show high ISC values, while subjects in an inattentive state show low ISC values (FIG. 3C). By thresholding these ISC values, subjects can be classified as attentive or distracted.
  • the performance of this approach is demonstrated with the receiver operating characteristic curve in FIG. 3D. This classification may be different at different points in time of the media. Attentive responses are generally correlated with one another.
  • inattentive responses are generally not correlated with one another.
  • the aggregated data for the attentive responses may be used as the target data that is associated with the dynamic media. More generally, however, the level of attention is on a continuum and is not a binary state.
  • ISC levels in theory fall on a continuum between 0 and 1, with zero indicating no correlation (no similarity) and 1 indicating perfect correlation. In practice perfect correlation on behavioral and physiological time courses is never achieved. Indeed, the level of correlation varies for different types of responses.
  • eye movement can achieve correlation values as high as 0.5, whereas the inter-subject correlation of heart rate often does not exceed 0.1. Therefore, there is no absolute threshold of similarity, and the measure of similarity should be evaluated specifically for each type of response and each type of media stimulus (an illustration follows below).
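As an illustration of this thresholding and its evaluation, the following Python sketch uses invented ISC values; the threshold of 0.15 and the numbers are hypothetical, since, as noted above, the threshold must be calibrated per response type:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# hypothetical per-subject ISC values in the two conditions
isc_attend = np.array([0.31, 0.28, 0.35, 0.22, 0.30])
isc_distract = np.array([0.08, 0.12, 0.05, 0.10, 0.09])

scores = np.concatenate([isc_attend, isc_distract])
labels = np.concatenate([np.ones_like(isc_attend), np.zeros_like(isc_distract)])

# area under the ROC curve: the probability that a randomly chosen
# attentive subject has a higher ISC than a randomly chosen distracted one
auc = roc_auc_score(labels, scores)

# classify with a response-type-specific threshold (hypothetical value)
attentive = scores > 0.15
```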
  • a test audience may view a prerecorded movie, short television show, advertisement or short video.
  • a test audience for such an application includes at least 20 attentive viewers. During the viewing, different audience members pay more or less attention to different portions of the media. Those audience members who are attentive to the video have attention-predictive responses that are correlated in time with one another (i.e. the time courses of the responses are similar). Those audience members who are inattentive to the video have attention-predictive responses that are uncorrelated (the time courses of the responses are dissimilar).
  • a television or movie producer can then determine how attentive the audience was at different portions of the media (FIG. 3C). This aids the producer in making production or editing decisions. For example, if the audience is not found to be attentive to significant portions of a movie, then the movie may be edited before release. Conversely, if the audience is found to be attentive to the movie, then the movie may be commercially distributed to a wider audience.
  • an online video advertisement may be sent to a subject’s browser or mobile device (e.g. smart phone or tablet).
  • the attention-predictive responses let the advertiser determine whether or not the subject is paying attention to the advertisement.
  • an online education platform may present an educational video to a remote student.
  • the platform may assess the student’s attention to the content of the video by determining the similarity of the student's attention-predictive responses to the reference responses. If the responses are dissimilar, then the student is not attentive, and the education platform may choose to interrupt or modify further presentation of the educational content.
  • Synchronous means the material is broadcast live with the subjects participating at time of broadcast.
  • attention-predictive responses can be aggregated across all live subjects (e.g. 20 or more subjects) to provide instantaneous feedback to the broadcaster indicating whether the audience, as a whole, is paying attention. This allows the broadcaster to pause when attention wanes, elaborate on concepts when subjects lose attention, or otherwise attempt to engage audience attention.
  • a digital signal may be generated to notify the broadcaster of this fact.
  • the digital signal may be sent to a computer program that is broadcasting the video (e.g. a computer program that is running a video game).
  • the digital signal may be a graphic or auditory alert that is perceptible by a human broadcaster. For example, when the ISC falls below the 70th percentile of typically observed ISC values, the human broadcaster may hear a tone or see a graphic indicator (e.g. a yellow light).
  • the similarity metric can be the correlation coefficient of the time course of the response with the time course of the predicted response. This similarity metric can be reported as a percentile. For example, a given audience member may have a similarity to the reference that is in the 90th percentile of similarity in the group. This means this subject is particularly attentive. Another subject may have a similarity metric that is in the 20th percentile for the group. This subject is particularly inattentive. When a reference group is used to determine the predicted response, then the predicted response can be obtained, for example, as the median response over time.
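A sketch of this percentile-style reporting, assuming the predicted response is the median over a reference group (the function and variable names are illustrative, not part of the disclosure):

```python
import numpy as np
from scipy.stats import percentileofscore

def attention_percentile(subject_response, group_responses):
    """Rank a subject's similarity-to-reference within the group.
    group_responses: (n_subjects, n_samples); subject_response: (n_samples,)."""
    reference = np.median(group_responses, axis=0)  # predicted response
    corr = lambda x: np.corrcoef(x, reference)[0, 1]
    group_similarity = np.array([corr(r) for r in group_responses])
    # e.g. 90th percentile = particularly attentive, 20th = inattentive
    return percentileofscore(group_similarity, corr(subject_response))
```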
  • the method produces an output (e.g. a table of numeric data, graphs, etc.) that summarizes the attention levels of the group or of individuals within the group as a function of time.
  • the output may depict a graph or score of each individual’s attention score (e.g. as a percentile) relative to the anticipated response.
  • This graph or score is dynamic in that the values vary over time as the dynamic media is played. Such information is useful to determine which times in the media captured, or failed to capture, the attention of at least some of the subjects.
  • a commercial provider of media research could use the disclosed method to measure audience attention in real time.
  • the service provider would enroll target audiences (not unlike traditional providers of such services as Nielsen Media Research).
  • the provider would receive behavioral or physiological response data in real time and, analyzing it for predictability, report an instantaneous measure of audience attention to its clients.
  • a synchronously broadcast exercise class may monitor attention-predictive responses such as body movements. If the attention-predictive responses of the audience are synchronized then the exercise class is going well. If the responses begin to become asynchronous then the instructor may need to take action to recapture the attention of the class.
  • online conferences can use a similar approach to increase attentiveness of the audience.
  • Another application of synchronous monitoring is to adaptively change the content.
  • the content is often generated programmatically.
  • the disclosed method can be used to adapt the content to capture maximum attention. For example, if eye movement cannot be predicted from the visual dynamics of the video game, the game program may choose to adapt parameters such as the speed or difficulty of the game (a sketch of such a control loop follows below).
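A minimal sketch of such an adaptive control loop; the `Game` class, its `speed` and `difficulty` attributes, and the thresholds are invented for illustration and are not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class Game:                 # hypothetical stand-in for a game engine interface
    speed: float = 1.0
    difficulty: int = 5

def adapt_game(game: Game, similarity: float,
               low: float = 0.1, high: float = 0.3) -> None:
    """Adjust game parameters based on the current gaze-to-salience similarity."""
    if similarity < low:
        # gaze is not following the predicted salience: ease off to re-engage
        game.speed *= 0.9
        game.difficulty = max(1, game.difficulty - 1)
    elif similarity > high:
        # attention is well captured: the game can ramp up
        game.speed *= 1.05
        game.difficulty += 1
```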
  • Yet another application is in online education. Lack of attention leads to a failure to learn the study material.
  • online education has only limited ways to determine whether students are paying attention. For example, one can determine whether students are clicking with a pointer on interactive user interfaces or are playing a video on the online platform, but there is no way to determine whether students are actively paying attention to the material.
  • Behavioral responses that are predictable, in the sense that they are similar to those of other subjects, are indicative of attention and can thus be used to adjust the study material. For example, during presentation of educational videos, eye movements can be readily measured remotely with web cameras.
  • if a given student moves their eyes similarly to an attentive group (recorded previously for an asynchronous broadcast or determined in real time for a synchronous broadcast), then the student is most likely paying attention to the video. If that is not the case, then the student is not paying attention, and one could interrupt the video playback to engage the student, for example in a question-and-answer dialog about the preceding material, similar to what a real teacher might do in a classroom.
  • Some videos featured a teacher writing on a board, while others used more modern storytelling with animations or the popular writing-hand style.
  • the attended viewing and distracted viewing are labeled A and D, respectively, in FIG. 3C.
  • This confirms the evident variability across videos and subjects.
  • the effect of attention is so strong that, despite the variability between subjects, one can still determine the attention condition near-perfectly from the ISC of individual subjects (FIG. 3D).
  • the gaze position data collected with the web camera is significantly noisier than using the professional eye tracker in the lab (FIG. 5A).
  • the accuracy of gaze position determination was computed when subjects were asked to look at a dot on the screen (FIG. 5B).
  • the five video stimuli used in Experiments 1, 2, 4 and 5 were selected from the ‘Kurzgesagt – In a Nutshell’ and ‘minutephysics’ YouTube channels. They cover topics relating to physics, biology, and computer science (Tables 1 and 2; Range: 2.4 - 6.5 minutes, Average: 4.1 ± 2.0 minutes).
  • Two of the videos (‘Immune’ and ‘Internet’) used purely animations, while ‘Boys’ used paper cutouts and handwriting. ‘Bulbs’ and ‘Stars’ showed a hand drawing illustrations aiding the narrative.
  • the six video stimuli used in Experiments 3-5 were selected from ‘Khan Academy’, ‘eHow’, ‘It’s ok to be smart’ and ‘SciShow’.
  • the videos cover topics related to biology, astronomy and physics (Tables 1 and 2; Duration: 4.2 - 6 minutes, Average: 5.15 minutes ± 57 seconds). They were specifically chosen to follow recommendations from a large-scale MOOC analysis.
  • the three styles chosen were based on popular styles from YouTube. ‘Mosquitoes’ and ‘Related’, produced in the ‘Presenter & Animation’ style, show a presenter talking as pictures and animations are shown. ‘Planets’ and ‘Enzymes’ were produced in the ‘Presenter & Glass Board’ style and show a presenter drawing illustrations and equations on a glass board facing the viewer. ‘Capacitors’ and ‘Work energy’ used the ‘Animation & Writing hand’ style.
  • In Experiment 1 (intentional learning), subjects watched a video and afterwards answered a short four-alternative forced-choice questionnaire. The subjects were aware that they would be tested on the material. The test covered factual information imparted during the video (11 - 12 recall questions). Examples of questions and answer options can be found in Table 1.
  • In Experiment 2 (incidental learning), subjects were not informed that quizzes would follow the viewing of the videos.
  • In Experiment 3, subjects were informed that questions regarding the material would be presented after each video, and the procedure of Experiment 1 was followed using a different set of stimuli. The order of video presentation, questions and answer options was randomized for all three experiments.
  • Gaze position was recorded with WEBGAZER(TM), which runs locally on the subject’s computer and uses their webcam to compute gaze position.
  • the script fits a wireframe to the subject’s face and captures images of their eyes to compute where on the screen they are looking. Only the gaze position and the coordinates of the eye images used for the gaze position computation were transmitted from the subject’s computer to a web server. In order for the model to compute where on the screen the participant is looking, a standard 9-point calibration scheme was used. Subjects had to achieve 70% accuracy to proceed in the experiment. User data was transferred to the server for analysis. However, in a fully local implementation of the approach, no user data would be transmitted. Instead, the median eye positions of a previously recorded group would be transmitted to the remote location, and the median-to-subject correlation could be computed entirely locally.
  • WEBGAZER(TM) estimates the point of gaze on the screen as well as the position and size of the eyes on the webcam image. Eye position and size allowed estimation of the movement of the subject in the horizontal and vertical directions. The point of gaze and the eye image position and size were upsampled to a uniform 1000 Hz from the variable sampling rate of each remote webcam (typically in the range of 15-100 Hz). An inclusion criterion for the study was that the received gaze position data be sampled at at least 15 Hz on average. Missing data were linearly interpolated and the gaze positions were denoised using 200 ms and 300 ms long median filters (a preprocessing sketch follows below).
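A sketch of the resampling and denoising step, assuming numpy/scipy; gap handling and timestamp bookkeeping are simplified relative to the description above:

```python
import numpy as np
from scipy.signal import medfilt

def preprocess_gaze(t, gaze, fs_out=1000, medfilt_ms=200):
    """Resample an irregularly sampled gaze trace to a uniform rate and denoise.
    t: sample times in seconds (increasing); gaze: positions at those times."""
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs_out)
    # linear interpolation doubles as upsampling and filling of missing data
    gaze_uniform = np.interp(t_uniform, t, gaze)
    # median filter; the kernel length in samples must be odd
    k = int(medfilt_ms * fs_out / 1000) | 1
    return t_uniform, medfilt(gaze_uniform, kernel_size=k)
```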
  • Movements of the participant were linearly regressed out of the gaze position data using the estimated position of the participant from the image patch coordinates. This was done because the estimated gaze position is sensitive to movements of the subject (and this regression was found to increase the overall ISC). Subjects with excessive movements were removed from the study (16 out of 1159 subjects; excessive movement is defined as 1000 times the standard deviation of the recorded image patch coordinates in the horizontal, vertical and depth directions). Blinks were detected as peaks in the vertical gaze position data. The onset and offset of each blink were identified as minimum points in the first-order temporal derivative of the gaze position. Blinks were filled using linear interpolation in both the horizontal and vertical directions. Subjects that had more than 20% of their data interpolated using this method were removed from the cohort (14 out of 1159 subjects). A sketch of these steps follows below.
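A simplified sketch of the movement regression and blink interpolation; the blink onset/offset indices are assumed to be already detected, and edge cases at the start and end of the recording are ignored:

```python
import numpy as np

def regress_out_movement(gaze, patch_coords):
    """Remove the linear contribution of subject movement from the gaze trace.
    gaze: (n_samples,); patch_coords: (n_samples, k) image-patch regressors."""
    X = np.column_stack([patch_coords, np.ones(len(gaze))])  # add intercept
    beta, *_ = np.linalg.lstsq(X, gaze, rcond=None)
    return gaze - X @ beta          # residual gaze, movement removed

def interpolate_blinks(gaze, onsets, offsets):
    """Fill each detected blink interval by linear interpolation."""
    gaze = gaze.copy()
    for a, b in zip(onsets, offsets):
        gaze[a:b] = np.linspace(gaze[a - 1], gaze[b], b - a)
    return gaze
```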
  • gaze position is measured in units of pixels, i.e. where on the screen the subject is looking. Because the resolutions of computer screens vary across subjects, the recorded gaze position data in pixels were normalized to the width and height of the window the video was played in (between 0 and 1, indicating the edges of the video player). Events indicating the end of the video stimuli ("stop event") were used to segment the gaze position data. The start time for each subject was estimated as the difference between the stop event and the actual duration of the video. This was done because the time to load the YouTube player was variable across user platforms.
  • Intersubject correlation of eye movements is calculated by (1) computing the correlation of each subject’s response time course with that of every other subject and averaging these correlations for each subject, then (2) averaging the resulting values across the vertical gaze position, horizontal gaze position and pupil size (a sketch follows below):
  • ISC = (ISC_vertical + ISC_horizontal + ISC_pupil) / 3
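A sketch of the pairwise computation for one response dimension, assuming aligned, uniformly sampled traces:

```python
import numpy as np

def intersubject_correlation(traces):
    """traces: (n_subjects, n_samples) for one dimension (e.g. vertical gaze).
    Returns one ISC value per subject: the mean correlation of that subject's
    time course with every other subject's."""
    C = np.corrcoef(traces)            # n_subjects x n_subjects matrix
    np.fill_diagonal(C, np.nan)        # exclude self-correlation
    return np.nanmean(C, axis=1)

# overall eye-movement ISC, as in the formula above:
# isc = (isc_vertical + isc_horizontal + isc_pupil) / 3
```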
  • the ISC values for the attend and distract conditions were computed on the data for the two conditions separately.
  • a three-way repeated measures ANOVA was used with fixed effect of video and attentional state (attend vs. distract) and random effect of subject.
  • the receiver operating characteristic (ROC) curve was used to quantify how well ISC discriminates the attend and distract conditions. Each point on the curve is a single subject.
  • the area under the ROC curve (AUC) is used as a summary measure of classification performance.
  • ISC was computed for each video in the attend condition and averaged across all videos.
  • First, gaze velocity was computed by taking the temporal derivative of the horizontal and vertical gaze positions using the Hilbert transform. Two-dimensional spatial vectors of these velocity estimates (combining the Hilbert transforms of the horizontal and vertical directions) were formed. These vectors were normalized to unit length. The median gaze velocity vector was obtained as the median of the two coordinates across all subjects. The median-to-subject correlation of velocity, MSC_velocity, was then computed as the cosine distance between the velocity vectors of each subject and the median velocity vector, averaged over time (a sketch follows below).
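A sketch of MSC_velocity; here `np.gradient` stands in for the Hilbert-transform differentiator named above, and the correlation is computed as the cosine similarity between unit-length velocity vectors, averaged over time:

```python
import numpy as np

def msc_velocity(gaze_h, gaze_v):
    """gaze_h, gaze_v: (n_subjects, n_samples) gaze positions.
    Returns one MSC_velocity value per subject."""
    v = np.stack([np.gradient(gaze_h, axis=1),
                  np.gradient(gaze_v, axis=1)], axis=-1)     # (subj, time, 2)
    v /= np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12   # unit length
    v_med = np.median(v, axis=0)                             # median vector
    v_med /= np.linalg.norm(v_med, axis=-1, keepdims=True) + 1e-12
    # cosine similarity between each subject and the median, averaged over time
    return np.mean(np.sum(v * v_med, axis=-1), axis=-1)
```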
  • wISC = w1·MSC_vertical + w2·MSC_horizontal + w3·MSC_velocity
  • the weights w1, w2, w3 are chosen to best predict quiz scores, with the constraints that they must sum to 1 and that they are all positive. This is done with conventional constrained optimization. The constraints ensure that the wISC values are bounded between -1 and 1. To avoid a biased estimate of predictability, these weights were optimized for each subject on the gaze/score data leaving out that subject from the optimization, i.e. leave-one-out cross-validation (a sketch follows below).
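A sketch of the constrained optimization with leave-one-out cross-validation, using scipy's SLSQP solver; maximizing the correlation between wISC and quiz scores is an assumption consistent with "best predict quiz scores":

```python
import numpy as np
from scipy.optimize import minimize

def fit_wisc_weights(msc, scores):
    """msc: (n_subjects, 3), columns MSC_vertical, MSC_horizontal, MSC_velocity;
    scores: (n_subjects,). Returns non-negative weights that sum to 1."""
    neg_corr = lambda w: -np.corrcoef(msc @ w, scores)[0, 1]
    res = minimize(neg_corr, np.ones(3) / 3, method="SLSQP",
                   bounds=[(0, 1)] * 3,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    return res.x

def loo_wisc(msc, scores):
    """Leave-one-out: weights for subject i are fit with subject i left out."""
    out = np.empty(len(scores))
    for i in range(len(scores)):
        mask = np.arange(len(scores)) != i
        out[i] = msc[i] @ fit_wisc_weights(msc[mask], scores[mask])
    return out
```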
  • Test performance was calculated as the percentage of correct responses each subject gave for each video. For questions that had multiple correct options, points were given for each correctly selected option and subtracted for each incorrectly selected option.
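A sketch of this scoring rule; the flooring at zero and the normalization by the number of correct options are assumptions, since the text specifies only the per-option credit and penalty:

```python
def quiz_score(selected, correct):
    """selected, correct: sets of option indices chosen by the subject and
    marked correct in the key. Returns a percentage score with partial credit."""
    points = len(selected & correct) - len(selected - correct)  # +1 / -1 rule
    return max(points, 0) / len(correct) * 100
```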
  • the questionnaires were designed in pilot experiments to yield an even distribution of answer options from subjects that had not seen the videos. All questions and answer options can be found here. To estimate the baseline difficulty of the questions, separate naive cohorts of subjects were given the same questions without seeing the videos.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biophysics (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Physiology (AREA)
  • Developmental Disabilities (AREA)
  • Hospice & Palliative Care (AREA)
  • Educational Technology (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychology (AREA)
  • Cardiology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Radiology & Medical Imaging (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Eye Examination Apparatus (AREA)
PCT/US2020/027605 2019-04-10 2020-04-10 Method for assessment of audience attention WO2020236331A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20809217.1A EP3952746A4 2019-04-10 2020-04-10 Method for assessment of audience attention
US17/450,415 US20220030080A1 (en) 2019-04-10 2021-10-08 Method for assessment of human attention

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962831902P 2019-04-10 2019-04-10
US62/831,902 2019-04-10
US201962879765P 2019-07-29 2019-07-29
US62/879,765 2019-07-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/450,415 Continuation-In-Part US20220030080A1 (en) 2019-04-10 2021-10-08 Method for assessment of human attention

Publications (2)

Publication Number Publication Date
WO2020236331A2 (en) 2020-11-26
WO2020236331A3 (en) 2021-02-11

Family

ID=73458885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/027605 WO2020236331A2 (en) 2019-04-10 2020-04-10 Method for assessment of audience attention

Country Status (2)

Country Link
EP (1) EP3952746A4
WO (1) WO2020236331A2


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010105034A2 (en) * 2009-03-11 2010-09-16 Corventis, Inc. Physiological monitoring for electronic gaming
WO2011031932A1 (en) * 2009-09-10 2011-03-17 Home Box Office, Inc. Media control and analysis based on audience actions and reactions
WO2011133548A2 (en) * 2010-04-19 2011-10-27 Innerscope Research, Inc. Short imagery task (sit) research method
WO2015034673A1 (en) * 2013-09-04 2015-03-12 Questionmark Computing Limited System and method for data anomaly detection process in assessments
US20170188930A1 (en) * 2014-09-10 2017-07-06 Oregon Health & Science University Animation-based autism spectrum disorder assessment
US9967618B2 (en) * 2015-06-12 2018-05-08 Verizon Patent And Licensing Inc. Capturing a user reaction to media content based on a trigger signal and using the user reaction to determine an interest level associated with a segment of the media content
WO2017152215A1 (en) * 2016-03-07 2017-09-14 Darling Matthew Ross A system for improving engagement

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223271A (zh) * 2021-04-29 2021-08-06 Readboy Education Technology Co., Ltd. Intelligent desk lamp learning-video control system and method
CN116719418A (zh) * 2023-08-09 2023-09-08 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Method and device for testing a gaze-point prediction model
CN116719418B (zh) * 2023-08-09 2023-10-27 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Method and device for testing a gaze-point prediction model

Also Published As

Publication number Publication date
EP3952746A2 (de) 2022-02-16
EP3952746A4 (de) 2022-06-08
WO2020236331A3 (en) 2021-02-11

Similar Documents

Publication Publication Date Title
US10839350B2 (en) Method and system for predicting audience viewing behavior
US11200964B2 (en) Short imagery task (SIT) research method
Ochoa et al. The RAP system: Automatic feedback of oral presentation skills using multimodal analysis and low-cost sensors
JP5194015B2 (ja) Method and system for determining audience response to a sensory stimulus
US20100004977A1 (en) Method and System For Measuring User Experience For Interactive Activities
Wegge Communication via videoconference: Emotional and cognitive consequences of affective personality dispositions, seeing one's own picture, and disturbing events
EP3952746A2 Method for assessment of audience attention
WO2023002496A1 (en) Smart e-learning system using adaptive video lecture delivery based on attentiveness of the viewer
US20220030080A1 (en) Method for assessment of human attention
Grewal Awareness of physical activity levels and sedentary behaviour: An assessment of awareness of physical activity levels and sedentary behaviour among parents and children
Sari et al. Can Short Video Ads Evoke Empathy?
Vincent et al. Teaching psychology in virtual reality 2: Wherever you learn, there you are.
Chen et al. Familiar video stories as a means for children with autism: An analytics approach
Postma The Influence of Social Cues on Attention in YouTube Videos “An Eye Tracking Study”
Korving Visibility of Lecturers in Weblectures
Komine Image evaluation using biological information
Richards When Eyes and Ears Compete: Eye Tracking How Television News Viewers Read and Recall Pull Quote Graphics
Madsen et al. Eye movements predict test scores in online video education
Banerjee The Effects of Interactive CD-ROMs on Attention
Beiter et al. OPTIMAL CLASSROOM VIEWS FOR DEAF STUDENTS
AU2013273825A1 (en) Method and System for Determining Audience Response to a Sensory Stimulus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20809217

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020809217

Country of ref document: EP

Effective date: 20211110