GB2611161A - Behavioural monitoring system - Google Patents

Behavioural monitoring system

Info

Publication number
GB2611161A
Authority
GB
United Kingdom
Prior art keywords
user
images
camera
environment
task
Prior art date
Legal status
Pending
Application number
GB2210869.0A
Other versions
GB202210869D0 (en)
Inventor
Javadi Amir-Homayoun
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of GB202210869D0
Priority to PCT/GB2022/052370 (published as WO2023041940A1)
Publication of GB2611161A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Alarm Systems (AREA)
  • Eye Examination Apparatus (AREA)
  • Studio Devices (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A system for monitoring human behaviour to complete a task using a computer over a predetermined time period comprises: a first camera 1708 to capture a first plurality of images of a user situated at a computer screen to monitor the user behaviour during the time period; a second camera 1710 to capture a second plurality of images of the working environment including a surface at which the user is working to complete the task using the computer and in which the user is positioned during the task; an environment processor 1718 to process the second plurality of captured images to determine objects within the working environment at the beginning of the time period and to process the second plurality of captured images to identify objects present within the working environment during the time period until the end of the time period; and an alert processor 1716 to compare the determined objects from the beginning of the task with objects within the working environment during the time period and to generate an alert if there has been any change in the number or type of objects positioned within the working environment during the time period.

Description

Behavioural Monitoring System
Field of the Invention
The present invention relates to a behavioural monitoring system and a method of monitoring user behaviour and more particularly, though not exclusively, the present invention relates to monitoring user behaviour for completing a task using a computer from a remote location. One non-limiting application is in the field of automated invigilation of online examinations for multiple different users.
Background
Remote behavioural monitoring is useful where one or more users are required to behave within a set of constraints in order to correctly carry out a task, for example to follow a set of rules in order to carry out the task correctly. Deviation of user behaviour from those set of constraints can lead to issues such as safety violations or dishonest interaction. One area of particular interest is invigilation.
Exam monitoring, otherwise referred to as proctoring or invigilation, has long been a fundamental part of creating exam conditions in which cheating is prevented. While in-person exams have had years to develop effective systems to prevent cheating, online exams are fundamentally harder to monitor. Because an invigilator cannot be physically present with examinees, there are many ways for examinees to cheat, ranging from (but not limited to) using learning materials, the internet and peer-to-peer communication, to having somebody else take the exam on their behalf. Reports on student perception suggest that it is generally accepted that cheating in online exams is easier than in in-person exams (King et al., 2009). Recent research by Chen et al. (2020) has shown that, if left unmonitored, a cheating effect can be seen in online exams. This effect was also seen to grow over time, becoming stronger or more prevalent, and was coupled with less time spent revising. As such, this raises questions over the credibility of unmonitored online exams and stresses the need for exam monitoring; as research suggests, exam monitoring can reduce cheating and dishonest behaviour (Dendir & Maxwell, 2020; Sutton, 2019).
Many companies have developed online exam monitoring services, a way of helping to prevent cheating that has been found not to negatively impact user performance (Lee, 2020). These services fall into two overarching approaches: the first locks and controls the user's computer and browser, while the second combines the first with the recording of video and audio throughout the examination, together with related information such as the number of monitors in use. However, these approaches are not perfect. The first approach is restricted in effectiveness by the advancement of other technologies, such as smartphones, which provide easy access to information. Consequently, many educational institutes utilise the second approach, despite the high expense. This approach is offered through one of two methods. The first method is live monitoring, which can be performed by a machine or manually by a human and consists of monitoring the video and audio of examinees live during the exam. The second method is reviewing recorded data; again, this can be done by machine or human. Regardless of the approach, both methods focus, somewhat primitively, on simple behaviours such as where the student is looking (gaze processing), processing of background sounds (audio processing), and background monitoring to ensure that nobody other than the examinee enters the room (within camera-shot). Both gaze and audio processing produce a binary output based on a simplistic view of cheating behaviour. In gaze processing, any time spent looking away from the computer screen is deemed potential cheating behaviour, while in audio processing the same applies whenever a human voice is heard. These approaches can be considered flawed for a number of reasons. Firstly, forcing the examinee to fixate on the screen for such long periods of time can be stress-inducing and detrimental to the examinee's performance, for the following reasons: It is very natural for a subject to look away from the screen when they are thinking, with directional changes in gaze and furrowed eyebrows being amongst the most frequently observed behaviours in online exam monitoring (Kolski & Weible, 2018).
Behaviours such as a subject wanting to rest their eyes or simply looking at their watch can feel 'risky' due to the surveillance and can thus induce stress.
Pinpointing such vague behaviour can also lead to a vastly increased amount of footage that subsequently needs reviewing.
Secondly, due to the concerns outlined above, some companies offer human monitoring in order to enable more flexible decision making. This results in:
The feeling of constantly being watched by a human, which can be even more stress- and anxiety-inducing (Woldeab & Brothen, 2019), contributing to online proctoring's association with more negative test-taker reactions (Karim et al., 2014), with students associating such systems with increased feelings of anxiety and invasions of privacy (Gudino Paredes et al., 2021).
The system cannot be scaled easily: it requires spreading the human operator's monitoring across many examinees or utilising many more operators. For in-person exams, it is suggested that there should be one invigilator per a maximum of 30 examinees (Joint Council for Qualifications (JCQ), 2020). However, this may be hard to do online, potentially increasing the number of required invigilators/operators per exam, with ProctorU suggesting that one proctor per 3 to 4 examinees is ideal (ProctorU, 2018; Live Science, 2008).
The costs increase massively as more humans are involved in the process. Furthermore, an increase in operators increases the risk of data leaks as well as the overall subjectivity of the proctoring taking place. Other risks include the lack of operator availability in busy exam seasons and the increased chance of disruptions to the delivery of online exams, for example through the inaccessibility of one operating centre.
Other logistic features, such as time of day (especially when considering the globalised world in which online education and assessment may take place, across different time zones), can become very difficult to navigate between the examinees and the operators.
With such considerations in mind, the world of online exam monitoring is far from perfect, with some research suggesting that online exam monitoring results in greater cost than reward for the institutes (Cluskey et al., 2011). There is a need for platforms to be developed that look beyond this simplistic yes/no, binary-output behaviour. At stake is not only the educational institute's finances, but also the credibility of the exams it administers as well as the mental well-being and peace of mind of the examinees. In a review assessing both manual and artificial intelligence (AI) based exam monitoring, the issue of striking a balance between trust and privacy arose frequently (Nigam et al., 2021). While AI exam monitoring may be less intrusive, it can be difficult to ensure trustworthiness using current AI models, with respect to assessing fraudulent behaviours, to the same degree as a human operator. Once again, this emphasises the need to develop these measures beyond their current capacity to improve their effectiveness and fairness, and suggests the need for a technological solution. The present disclosure seeks to address at least some of the above-described problems with known behavioural monitoring techniques and systems.
Summary
According to one aspect of the present invention there is provided a behavioural monitoring system for monitoring human behaviour to complete a task using a computer, or other mobile device, such as a tablet or mobile phone. The system comprises a camera for capturing one or more images of a user situated at a computer screen, the images including the eyes of the user. The system also comprises a gaze detection processor configured to process the captured images to determine a focus location on the computer screen at which the user's eyes are focussed and a first timestamp associated with the time when the user's eyes are focussed on the focus location. The system also comprises an input device location processor configured to determine or receive location data of a user-controllable cursor displayed within the computer screen, the location data specifying a current screen location of the cursor, and to determine a second timestamp relating to the time at which the cursor is at the current screen location. The system also comprises an alert processor configured to determine when the current location of the cursor and the focus location are aligned; to compare the first and second timestamps related to the aligned location; and to generate an alert if the first timestamp indicates a time after the second timestamp.
The alert processor may be configured to determine alignment when the current location of the cursor and the focus location are within a predetermined distance from each other on the computer screen.
The predetermined distance may be relatively small compared to the size of the computer screen. For example, the predetermined distance may be around 6 to 8% of the screen size.
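By way of illustration only, a minimal sketch of how such an alert processor might relate the two timestamp streams is given below (Python). The Sample structure, the 7% alignment tolerance and the per-cursor-event matching are assumptions made for the sketch rather than details taken from the disclosure; the alert condition follows the aspect described above, namely that the gaze only arrives at a location after the cursor is already there.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    x: float          # horizontal screen coordinate (pixels)
    y: float          # vertical screen coordinate (pixels)
    timestamp: float  # seconds since the start of the task

def aligned(gaze: Sample, cursor: Sample, screen_diag: float, tolerance: float = 0.07) -> bool:
    """Treat the gaze focus and the cursor as aligned when they are within a
    predetermined fraction of the screen size (here ~7% of the diagonal, an assumption)."""
    return math.hypot(gaze.x - cursor.x, gaze.y - cursor.y) <= tolerance * screen_diag

def timestamp_alerts(gaze_samples, cursor_samples, screen_diag):
    """Flag every cursor event for which the first aligned gaze sample carries a
    later timestamp, i.e. the eyes follow the cursor rather than lead it."""
    alerts = []
    for cursor in cursor_samples:
        for gaze in gaze_samples:
            if aligned(gaze, cursor, screen_diag):
                if gaze.timestamp > cursor.timestamp:
                    alerts.append((cursor, gaze))
                break  # only the first aligned gaze sample is considered per cursor event
    return alerts
```

In line with the content generator aspect described below, the alert could instead be raised only when this condition holds for each of a plurality of outputs presented on the screen.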
The behavioural monitoring system may further comprise a content generator configured to provide a plurality of instances of output on the screen. The alert processor may be configured to determine alignment of the cursor and the focus location in response to each output and to generate the alert if the first timestamp indicates a time after the second timestamp for each of the plurality of outputs.
The input device location processor may be configured to determine the current location of the cursor by detection of a cursor interaction with data or objects presented on the screen.
The input device may detect two-dimensional motion relative to a surface and translate that motion into motion of the cursor on the computer screen, and the input device location processor may detect actuation of a button of the input device as the cursor interaction.
The behavioural monitoring system may further comprise an input device operable by the user to control the position of the cursor on the computer screen and a transmitter for transmitting the position of the cursor as determined by the user input device to the input device processor.
According to another aspect of the present invention, there is provided a method of monitoring human behaviour to complete a task using a computer. The method comprises capturing one or more images of a user situated at a computer screen, the images including the eyes of the user. The method also comprises processing the captured images to determine a focus location on the computer screen at which the user's eyes are focussed and generating a first timestamp associated with the time when the user's eyes are focussed on the focus location. The method further comprises determining or receiving location data of a user-controllable cursor displayed within the computer screen, the location data specifying a current computer screen location of the cursor, and generating a second timestamp relating to the time at which the cursor is at the current computer screen location. The method also comprises establishing when the current location of the cursor and the focus location are aligned, comparing the first and second timestamps related to the equivalent location; and generating an alert if the first timestamp indicates a time after the second timestamp.
According to a further aspect of the present invention, there is provided a behavioural monitoring system for monitoring human behaviour to complete a task using a computer. The behavioural monitoring system comprises a camera for capturing one or more images of a user situated at a computer screen, the images including eyes of the user. The system also comprises a gaze detection processor configured to process the captured images to determine a focus location on the computer screen at which the user's eyes are focussed. The system further comprises an input device location processor configured to receive location data of a user-controllable cursor displayed within the computer screen, the location data specifying a current screen location of the cursor, and to determine a timestamp relating to the time at which the cursor is at the current screen location. The system additionally comprises an alert processor configured to determine if the focus location is within a predetermined distance of the current screen location of the cursor during a predetermined time period from the timestamp and to generate an alert if the distance between the focus location and the current location of the cursor remains greater than the predetermined distance during the predetermined time period.
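A sketch of the time-window variant of this aspect follows; it reuses the Sample structure and the aligned() helper from the previous sketch, and the two-second window is an illustrative assumption rather than a value taken from the disclosure.

```python
def gaze_reaches_cursor(gaze_samples, cursor_event, screen_diag,
                        window_s: float = 2.0, tolerance: float = 0.07) -> bool:
    """Return True if, within window_s seconds of the cursor timestamp, any gaze
    focus location comes within the predetermined distance of the cursor location."""
    deadline = cursor_event.timestamp + window_s
    for gaze in gaze_samples:
        if cursor_event.timestamp <= gaze.timestamp <= deadline:
            if aligned(gaze, cursor_event, screen_diag, tolerance):
                return True
    return False

# The alert processor would generate an alert whenever this returns False.
```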
According to yet a further aspect of the present invention, there is provided a behavioural monitoring system for monitoring human behaviour to complete a task using a computer. The system comprises a camera for capturing a plurality of images of a user situated at a computer screen, each image including the eyes of the user. The system also comprises a gaze detection processor configured to process the captured images to determine a focus location on the computer screen at which the user's eyes are focussed at a particular moment in time and a pattern construction processor configured to process a plurality of determined focus locations for the user over a time period and to create a determined pattern of movement of the user's eyes during the time period. The system further comprises a data store storing a first set of patterns of user eye movement relating to acceptable user behaviour, and a second set of patterns of user eye movement relating to unacceptable user behaviour. The system further comprises a pattern matching processor configured to compare the determined pattern to the first and second sets of patterns of eye movement stored in the data store; and to generate an alert if the determined pattern matches any of the second set of patterns stored in the data store.
The gaze detection processor may be configured to determine if the plurality of focus locations is within a screen area representing the location of the computer screen.
The first set of patterns may comprise a reading pattern located within the screen area.
The second set of patterns may comprise a reading pattern located outside the screen area.
The reading pattern may comprise a scan pattern comprised of a series of substantially parallel lines in one axis where an end of each line is connected to a beginning of an adjacent line in a raster to form a continuous line.
The reading pattern may comprise a series of fixation points separated in time and location.
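As a hedged illustration of how a pattern matching processor of this kind could compare a determined pattern against the stored sets, the sketch below classifies a sequence of fixation points by the distribution of its saccade directions (in the spirit of the polar histograms of Figures 19a and 19b). The histogram representation and the nearest-neighbour comparison are simplifications chosen for the sketch, not the claimed implementation.

```python
import numpy as np

def saccade_angle_histogram(fixations, bins: int = 16) -> np.ndarray:
    """Build a normalised histogram of saccade directions from a sequence of
    (x, y) fixation points."""
    pts = np.asarray(fixations, dtype=float)
    deltas = np.diff(pts, axis=0)
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])            # range [-pi, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def classify_pattern(fixations, acceptable_patterns, unacceptable_patterns) -> str:
    """Compare the observed pattern against the stored first (acceptable) and
    second (unacceptable) sets and return the label of the closest match."""
    observed = saccade_angle_histogram(fixations)
    best_label, best_dist = "unknown", float("inf")
    for label, patterns in (("acceptable", acceptable_patterns),
                            ("unacceptable", unacceptable_patterns)):
        for stored in patterns:
            dist = float(np.linalg.norm(observed - stored))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label  # an alert is raised when this returns "unacceptable"
```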
The behavioural monitoring system may further comprise a learning processor configured to use machine learning to utilise different eye movement patterns which occur during a period of acceptable behaviour and to categorise these patterns within the first set of patterns.
The gaze detection processor may be configured to detect a user's head movements during the period of acceptable behaviour and the learning processor is configured to use the learned head movements as part of the first set of patterns.
The learning processor may be configured to use machine learning to utilise different eye movement patterns which occur during a period of unacceptable behaviour and to categorise these patterns within the second set of patterns.
The gaze detection processor may be configured to detect a user's head movements during the period of unacceptable behaviour and the learning processor may be configured to use the learned head movements as part of the second set of patterns.
The behavioural monitoring system may further comprise a behavioural calibration processor arranged to provide the user with a series of test tasks using the computer screen over a test time period and to determine the first set of patterns using the output of the pattern construction processor over the test time period.
The data store may comprise a neural network, and the output of the pattern construction processor over the test time period may be used to train the neural network to recognise the user's acceptable behaviour to the test tasks.
The data store may be configured initially to provide the first set of patterns derived from the results of testing a plurality of different users and then at a later point in time to provide the first set of patterns derived from the results of testing the current user.
The camera may be configured to capture an image of the eyes and the face of the user and the gaze detection processor may be configured to process the image data relating to the face.
The camera may be configured to capture an image of a head of the user and the gaze detection processor is configured to process the image data relating to the positioning of the head.
The gaze detection processor may be configured to determine the focus location using facial landmarks from the captured image.
The facial landmarks may include one or more of corners of lips, tip of nose, and corners of each eye.
The gaze detection processor may be configured to detect locations of pupils of the user within the one or more images using a detected shape of an eyelid within the one or more images.
The behavioural monitoring system may further comprise a gaze calibration engine configured to output a plurality of stimuli at different locations on the computer screen, each stimulus being output for a predetermined period of time, and to receive a focus location corresponding to each stimulus as determined by the gaze detection processor. The gaze calibration engine may be configured to create a spatial relationship between the camera, the eyes of the user and the computer screen to enable specific user eye movements to be calibrated to specific locations on the computer screen.
The gaze calibration engine may be configured to output the plurality of stimuli sequentially, in random positions about the computer screen.
The gaze detection processor may be configured to process the plurality of captured images to determine the position of the user's head in response to the presentation of each of the plurality of stimuli.
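One simple way such a gaze calibration engine could turn the recorded responses to the stimuli into a usable spatial relationship is to fit a mapping from per-frame gaze features (for example pupil offsets and head-pose angles) to the known stimulus locations. The least-squares affine fit below is a minimal sketch under that assumption; the real system may use a richer model.

```python
import numpy as np

def fit_gaze_mapping(features, stimulus_points):
    """Fit an affine mapping from per-frame gaze features to the known stimulus
    locations on the screen, using least squares. `features` is an (n, k) array
    and `stimulus_points` an (n, 2) array of pixel targets."""
    X = np.hstack([np.asarray(features, dtype=float),
                   np.ones((len(features), 1))])      # add a bias column
    Y = np.asarray(stimulus_points, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coeffs                                      # shape (k + 1, 2)

def predict_focus(coeffs, feature_vector):
    """Map a new feature vector to an estimated (x, y) focus location on the screen."""
    x = np.append(np.asarray(feature_vector, dtype=float), 1.0)
    return x @ coeffs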
The camera may be configured to capture images at a capture rate of 25 to 30 images per second. The gaze detection processor may determine the focus location at the capture rate and the input device location processor may receive or determine the current location at the capture rate.
According to another aspect of the present invention, there is provided a method of monitoring human behaviour to complete a task using a computer. The method comprises capturing a plurality of images of a user situated at a computer screen, each image including eyes of the user. The method also involves processing the captured images to determine a focus location on the computer screen at which the user's eyes are focussed at a particular moment in time, processing a plurality of determined focus locations for the user over a time period and to create a determined pattern of movement of the user's eyes during the time period, and storing a first set of patterns of user eye movement relating to acceptable user behaviour, and a second set of patterns of user eye movement relating to unacceptable user behaviour. The method also involves comparing the determined pattern to the first and second sets of patterns of eye movement stored in the data store and generating an alert if the determined pattern matches any of the second set of patterns stored in the data store.
According to a further aspect of the present invention, there is provided a behavioural monitoring system for monitoring human behaviour to complete a task using a computer.
The system comprises a camera for capturing a plurality of images of a user situated at a computer screen, each image including eyes of the user. The system also comprises a gaze detection processor configured to process the captured images to determine a gaze direction in which the user's eyes are gazing at a particular moment in time and a screen view determining engine for determining an acceptable range of user gaze directions which indicate that the user is gazing at a location on the screen. The system further comprises a hotspot processor configured to compare the determined gaze direction of the user with the acceptable range of gaze directions and to determine if the determined gaze direction is outside the acceptable range of gaze directions, and to generate an alert if a period of time the determined gaze direction is outside the acceptable range is greater than a predetermined time period.
The hotspot processor may be configured to determine if the determined gaze direction is fixated on a single location/area outside a display area of the computer screen and if so, whether the time period of the fixation is greater than the predetermined time period.
The hotspot processor may be configured to determine if the determined gaze direction is repeatedly fixated on a single location/area outside the locations within the computer screen over a viewing time period.
The behavioural monitoring system may further comprise a further camera configured to capture a plurality of environment images of the working environment in which the user is positioned during the task. The working environment may include areas around the computer screen at which the user is working to complete the task. The system may further comprise an environment processor configured to process at least one of the plurality of environment images to determine acceptable and unacceptable objects within the working environment. The hotspot processor may be configured to compare the single location/area with the locations of objects determined by the environment processor during the time period and to generate an alert if the single location corresponds to the location of an unacceptable object.
The behavioural monitoring system may further comprise a gaze calibration engine configured to output a plurality of stimuli at different locations on the computer screen, each stimulus being output for a predetermined period of time, and to receive a gaze direction corresponding to each stimulus as determined by the gaze detection processor. The gaze calibration engine may be configured to create a spatial relationship between the camera, the eyes of the user and the computer screen to enable specific user eye movements to be calibrated to specific gaze directions determined by the stimuli on the computer screen.
According to a further aspect of the present invention, there is provided a method of monitoring human behaviour to complete a task using a computer. The method comprises capturing a plurality of images of a user situated at a computer screen, each image including eyes of the user. The method further comprises processing the captured images to determine a gaze direction in which the user's eyes are gazing at a particular moment in time. The method also comprises determining an acceptable range of user gaze directions which indicate that the user is gazing at a location on the screen, comparing the determined gaze direction of the user with the acceptable range of gaze directions, and determining if the determined gaze direction is outside the acceptable range of gaze directions. The method further comprises generating an alert if a period of time the determined gaze direction is outside the acceptable range is greater than a predetermined time period.
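A minimal sketch of the hotspot logic described in this aspect is given below; the five-second threshold and the (yaw, pitch) representation of gaze direction are assumptions made for illustration.

```python
def detect_off_screen_fixation(gaze_directions, acceptable_range, max_off_screen_s: float = 5.0):
    """Generate alerts when the gaze direction stays outside the acceptable range of
    on-screen directions for longer than a predetermined period.
    `gaze_directions` is an iterable of (timestamp, yaw, pitch) tuples ordered in time,
    and `acceptable_range` is a callable returning True for on-screen directions."""
    off_screen_since = None
    alerts = []
    for timestamp, yaw, pitch in gaze_directions:
        if acceptable_range(yaw, pitch):
            off_screen_since = None
        else:
            if off_screen_since is None:
                off_screen_since = timestamp
            elif timestamp - off_screen_since > max_off_screen_s:
                alerts.append(("off-screen gaze", off_screen_since, timestamp))
                off_screen_since = None  # avoid repeated alerts for the same episode
    return alerts
```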
According to yet a further aspect of the present invention, there is provided a behavioural monitoring system for monitoring human behaviour to complete a task using a computer over a predetermined time period. The system comprises a first camera configured to capture a first plurality of images of a user situated at a computer screen to monitor the user behaviour during the time period and a second camera configured to capture a second plurality of images of the working environment in which the user is positioned during the task. The working environment includes a surface at which the user is working to complete the task using the computer. The system further comprises an environment processor configured to process at least one of the second plurality of captured images to determine objects within the working environment at the beginning of the time period and to process at least some of the second plurality of captured images to identify objects present within the working environment during the time period until the end of the time period. The system also comprises an alert processor configured to compare the determined objects from the beginning of the task with objects within the working environment during the time period and to generate an alert if there has been any change in the number or type of objects positioned within the working environment during the time period.
The second camera may be configured to capture a 360-degree image or video of the environment at the start of the predetermined time period and the environment processor may be configured to build a three-dimensional representation of the environment.
In some embodiments, the second plurality of images comprises a first and second image of the environment taken from a first position and second position respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; and the environment processor is configured to: determine a displacement distance between the first and second positions; determine a first linear displacement distance of corresponding points of the first object between the first and second images; determine a second linear displacement distance of corresponding points of the second object between the first and second images; and determine locations of the first and second objects with respect to the first and second positions using the displacement distance, the first linear displacement distance and the second linear displacement distance using a stereoscopic matching function.
In another embodiment, the second plurality of images comprises a first and second image of the environment taken from a location in a first angular direction and a second angular direction respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; wherein the environment processor is configured to: determine an image displacement angle between the first and second angular directions; determine a first angular displacement of corresponding points of the first object between the first and second images; determine a second angular displacement of corresponding points of the second object between the first and second images; and determine locations of the first and second objects with respect to the location using the image displacement angle, the first angular displacement and the second angular displacement using an angular stereoscopic matching function.
The determined locations of the first and second objects as described above can be used to build a 3D model of the environment in a far more efficient manner than has been possible previously.
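For illustration, the standard stereoscopic relationship that such a matching function could rely on is depth = focal length x baseline / disparity, where the baseline is the displacement distance between the first and second positions and the disparity is the linear displacement of corresponding points between the two images. The sketch below applies it to two objects; the focal length, baseline and disparity values are purely illustrative numbers, not values from the disclosure.

```python
def object_depth(baseline_m: float, focal_length_px: float, disparity_px: float) -> float:
    """Standard stereo relationship: depth = focal_length * baseline / disparity.
    `disparity_px` is the linear displacement of corresponding points of an object
    between the first and second images."""
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive for a finite depth estimate")
    return focal_length_px * baseline_m / disparity_px

# Example with two objects captured in both images (all numbers illustrative):
baseline = 0.10          # camera moved 10 cm between the first and second positions
focal = 1400.0           # focal length of the second camera, in pixels
print(object_depth(baseline, focal, disparity_px=70.0))   # nearer object, ~2.0 m away
print(object_depth(baseline, focal, disparity_px=35.0))   # farther object, ~4.0 m away
```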
Preferably the environment processor is further configured to apply a low-pass filter to the first and second images to effect smoothing.
The second plurality of images may comprise images captured at random points in time during the predetermined time period.
The first camera may be a camera with a fixed position in relation to the screen.
The second camera may be a portable camera which can be appropriately configured to capture the working environment.
The behavioural monitoring system may further comprise a communications engine configured to transmit the captured images and/or alerts to a remote computer via a communications network.
According to an additional aspect of the present invention there is provided a method of monitoring human behaviour to complete a task using a computer over a predetermined time period. The method comprises capturing a first plurality of images of a user situated at a computer screen to monitor the user behaviour during the time period and capturing a second plurality of images of the working environment in which the user is positioned during the task. The working environment includes a surface at which the user is working to complete the task using the computer. The method further comprises processing at least one of the second plurality of captured images to determine objects within the working environment at the beginning of the time period and to process at least some of the second plurality of captured images to identify objects present within the working environment during the time period until the end of the time period. The method also comprises comparing the determined objects from the beginning of the task with objects within the working environment during the time period and generating an alert if there has been any change in the number or type of objects positioned within the working environment during the time period.
An embodiment described in this application offers a system that reaches far beyond the conventional methods of online exam monitoring through the use of cutting-edge technology and research. Some of the features of this system are:
* Advanced machine vision methods to extract a fairly accurate gazing location based on the head pose and pupil location in the 2D image;
* Advanced machine learning methods to extract profiles of gazing points and examinee behaviour;
* Advanced artificial intelligence to match profiles of gazing points and examinee behaviour with honest or dishonest patterns of activities; and
* Statistics and machine learning to create immediate and long-term profiles of activities for each examinee and the group of examinees.
All these features support a behavioural monitoring system that is more transparent to the users, leading to a better exam experience for the examinee and examiner while simultaneously providing the educational institute with a competitively priced yet more developed and insightful service. Advantageously, reducing reliance on live monitoring by humans, as well as pinpointing more accurate (and subsequently less frequent and vague) suspicious behaviour, goes a long way towards relieving some of the anxiety felt by students as a result of being watched/recorded (Gudino Paredes et al., 2021). Thus, the present disclosure serves to lead the way into a fairer world of online examination monitoring.
Furthermore, the present disclosure addresses several technical problems associated with the process of behavioural monitoring. Firstly, within a remote monitoring application how can the system automatically and efficiently determine whether the data input received from the subject being monitored is fraudulent? The fraudulent data itself may be almost identical to true data in terms of content so how can it be distinguished automatically? Whilst capturing images of a subject's behaviour is known, determining fraudulent behaviour using reliable data processing techniques which minimise the amount of processing power required is a problem which the present disclosure overcomes. For example, the use of timing information rather than wholly relying on image processing techniques, can significantly reduce the processing load on the server. Whilst this may not represent a significant problem for one subject being monitored it does represent a significant technical problem when scaled to monitoring tens of thousands of subjects simultaneously in spaced-apart geographical locations over a wide area communications network, as is typically required in remote invigilation situations.
Other methods of reducing the amount of image data that has to be processed can also be used. For example, limiting the portion of the captured image data that is processed at high resolution to the portion of the image which relates to the subject's eyes (for the gaze detection processor), rather than processing the entire image at high resolution, significantly reduces the amount of data that needs to be processed. In this case, the region of the captured image containing the subject's eyes is first determined from a coarse processing of the captured image (for example, determining a framework of the subject's head from several data points such that the location of the eyes is known) before the specific region of interest (the subject's eyes) is processed at higher resolution to determine the direction of gaze. Also, once the location of the subject's eyes has been determined, it will not vary significantly in subsequent frames (or can be tracked by monitoring the movement of a single point of the framework) and, as such, the region of interest for high-resolution processing can be tracked and adjusted from frame to frame without requiring a new determination of the region of interest in the captured image. Other simple image processing techniques can also be employed, such as a simple measurement of the distance between the location of a cursor on the screen and the user's focus on the screen, simple pattern matching using an AI processor trained on acceptable and unacceptable patterns, or timing of the direction of gaze.
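A simplified sketch of this coarse-to-fine approach is shown below. It uses an OpenCV Haar-cascade face detector purely as a stand-in; the disclosure does not specify a particular detector, and the downscale factor and eye-region proportions are assumptions.

```python
import cv2  # OpenCV, assumed available for illustration

def coarse_to_fine_eye_region(frame, face_detector, prev_eye_box=None, scale: float = 0.25):
    """Locate the eye region on a low-resolution copy of the frame (coarse pass), then
    return only that region at full resolution for gaze processing (fine pass). If an
    eye box from the previous frame is supplied, reuse it instead of re-detecting."""
    if prev_eye_box is not None:
        x, y, w, h = prev_eye_box
        return frame[y:y + h, x:x + w], prev_eye_box

    small = cv2.resize(frame, None, fx=scale, fy=scale)
    faces = face_detector.detectMultiScale(cv2.cvtColor(small, cv2.COLOR_BGR2GRAY))
    if len(faces) == 0:
        return None, None

    fx_, fy_, fw, fh = faces[0]
    # Scale the detection back to full resolution and keep only the band of the
    # face box that contains the eyes (proportions are an assumption).
    x, y, w, h = int(fx_ / scale), int(fy_ / scale), int(fw / scale), int(fh / scale)
    eye_box = (x, y + h // 5, w, h // 3)
    ex, ey, ew, eh = eye_box
    return frame[ey:ey + eh, ex:ex + ew], eye_box
```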
Furthermore, the issues of accuracy in determining the direction of gaze of a subject are addressed by creating a spatial framework between a camera, a screen and the subject's head before seeking to determine positioning of the subject's head and pupils within that framework. Such calibration is adaptive to each specific user and their particular positioning relative to their camera and screen and provides a framework from which accurate directional determinations can be made.
In the following description it is to be noted:
* The terms 'examiner' and 'teacher' are used interchangeably.
* The terms 'examinee', 'user', 'candidate', 'subject' and 'students' are used interchangeably.
* The terms 'exam' and 'task' are used interchangeably. However the term 'task' is more generalised including computer interaction activities which are not online examinations, for example interviews or behavioural monitoring of vehicle use.
* Unless otherwise specified, each section below represents a tree diagram.
Brief Description of the Drawings
In the drawings:
Figure 1a is a schematic diagram of a behavioural monitoring system with three candidates (users) each facing a monitor, a remote server, and a remote data store embodying the present invention;
Figure 1b is a schematic diagram of an alternative behavioural monitoring system with three candidates each facing a monitor, a web server, an analysis server and a remote data store embodying the present invention;
Figure 2 is a schematic diagram showing the set-up for a single candidate using the behavioural monitoring system of Figure 1a or 1b, comprising a monitor, webcam, smartphone, keyboard and mouse;
Figure 3 is a flow chart showing a method for operating the behavioural monitoring system of Figure 1a or 1b;
Figure 4 is a flowchart of a method for collecting preparatory data;
Figure 5 is a schematic diagram showing a candidate's gaze being calibrated by the behavioural monitoring system of Figure 1a or 1b to determine the area of the screen;
Figure 6 is a schematic diagram showing a calibration being carried out by the behavioural monitoring system of Figure 1 to extract the spatial location of the head and eyes;
Figure 7 is a schematic diagram showing a first representation of the user's computer and first and second objects in the user's environment being captured by the first camera at a first position;
Figure 8 is a schematic diagram showing a second representation of the user's computer and the first and second objects in the user's environment of Figure 7 captured by the first camera at a second position;
Figure 9 is a schematic diagram showing an aerial (plan) view of the first and second objects in the user's environment of Figures 7 and 8, and the fields of view of the first camera when positioned at the first position and the second position;
Figure 10 is a schematic diagram showing a first image of the first and second objects captured by the first camera at the first position and a second image of the first and second objects captured by the first camera at the second position;
Figure 11 is a schematic diagram showing different respective views of an object with an edge connected to a further object, captured by the first camera at respective first and second positions;
Figure 12 is a block diagram of a candidate's computer shown in Figure 2, comprising a local data store, monitor, camera, clock, communications engine, gaze calibration engine, gaze detection processor, content generator, input device location processor and alert processor, in accordance with an embodiment of the present invention;
Figure 13a is a schematic diagram showing a face mesh fitted to the face of the user to extract landmarks by the behavioural monitoring system of Figure 1a or 1b;
Figure 13b is a schematic diagram showing the landmarks obtained in Figure 13a in 3D space, the shown landmarks representing a subset of the landmarks extracted using toolboxes such as MediaPipe by the behavioural monitoring system of Figure 1a or 1b;
Figure 14 is a schematic diagram showing a calibration being performed by the behavioural monitoring system of Figure 1 to extract the spatial relationship between the camera, screen, and the user's head;
Figure 15a is a schematic diagram showing the orientation of the centre of the head and the eyes of the user;
Figure 15b is a schematic diagram showing the spatial relationship between the eyes and the centre of the head of the user;
Figure 16 shows an example array of events at different timepoints;
Figure 17 is a block diagram of a candidate's computer shown in Figure 2, comprising a local data store with stored patterns, a clock, a communications engine, a pattern matching processor, a behavioural calibration processor, a gaze calibration engine, a gaze detection processor, pattern construction processor, a learning processor, a monitor, mouse, keyboard, and camera, in accordance with a further embodiment of the present invention.
Figure 18 is a schematic diagram showing different patterns of eye movement on the screen (a reading pattern) and on the wall (an eye wandering pattern) relative to the user's head, as determined by the behavioural monitoring system of Figure 17;
Figure 19a is a polar histogram showing the angle of eye movement for patterns of reading behaviour as determined by the behavioural monitoring system of Figure 1a or 1b;
Figure 19b is a polar histogram showing the angle of eye movement for patterns of eye wandering behaviour as determined by the behavioural monitoring system of Figure 1a or 1b;
Figure 20 is a block diagram of a candidate's computer, comprising a local data store, a clock, a communications engine, a hot spot processor, an environment processor, a screen view determining engine, a gaze calibration engine, a gaze detection processor, a monitor, a mouse, a keyboard, a first camera and a second camera, in accordance with yet a further embodiment of the present invention;
Figure 21 is a schematic diagram showing the 3D representation of the environment and a fixation point on the wall as determined by the behavioural monitoring system of Figure 1 (see Figure 17 for further details), in which the patch on the wall at which the user is looking (dashed line) indicates hotspots resulting from elongated fixation on different areas of the wall;
Figure 22 is a block diagram of a candidate's computer shown in Figure 2, comprising a local data store, a clock, a communications engine, an alert processor, an environment processor, a monitor, a mouse, a keyboard, a first camera and a second camera, in accordance with an additional embodiment of the present invention;
Figure 23 is a schematic diagram showing a 3D representation of an environment around the user, comprising a camera, screen, smartphone on the user's desk, keyboard, mouse, sample objects on the desk, sample frames on the wall, and an open space on the wall;
Figure 24 is a flowchart of the steps involved in the collecting task data step of Figure 3; and
Figure 25 is a flowchart of a method for processing task data using the behavioural monitoring system of Figure 1.
Detailed Description of Exemplary Embodiments
Behavioural Monitoring System
There is presented a behavioural monitoring system that monitors human behaviour whilst one or more users are carrying out a task using a computer and generates an alert if one or more criteria are met indicating that a person has not complied with task requirements. For example, the behavioural monitoring system may be used to monitor how a candidate behaves whilst sitting an online examination and generate an alert to an invigilator if the system determines that the candidate may be cheating (behaving in a fraudulent manner).
In another example, the behavioural monitoring system may be used to monitor how an interviewee behaves whilst being interviewed remotely and generate an alert to the interviewer if the system determines that the interviewee is cheating (e.g., receiving external assistance). In a further example, the behavioural monitoring system may be used to monitor a driver of a vehicle and generate an alert to the driver or a third party if the system determines that the driver is not complying with one or more requirements of the highway code (e.g., not checking mirrors when overtaking another vehicle or not directing their gaze to the road ahead for more than a predefined time threshold). In yet a further example, the behavioural monitoring system may be used to monitor the participant of an online game and generate an alert if they are not adhering to one or more rules of the game. For example, this may be used during competitions such as an online chess match (for example, detecting use of a chess engine on a second computer) or when the participant is completing the game for monetary compensation.
There are several aspects in the present disclosure which all provide an improved technique and system for behavioural monitoring during a task. Each of these can be carried out by use of a computer with a monitor, a camera, a communications link to a central server, and optionally a smartphone (or equivalents thereof). As will be explained later, these basic elements can be configured with software applications to monitor a candidate.
The following description of the behavioural monitoring system will be described in reference to a candidate carrying out an online examination although it should be recognised that the system may be used in other situations, such as those provided as examples above.
The terms 'candidate' and 'user' are used interchangeably to refer to a person carrying out the task. Although the following description focuses on a single user carrying out a task, it should be recognised that, in a typical embodiment, multiple users would be carrying out tasks simultaneously, with one or more third parties monitoring their behaviour remotely.
A schematic diagram of the behavioural monitoring system is shown in Figure 1a. The system may comprise one or more users 100, each with a computer 102, a remote server 104 connected to the one or more users 100 via the internet 106, and a remote data store 108 comprising one or more data files 110, which store captured user behaviour from multiple locations. These files 110 can be used to determine trends across many different users and can inform better analysis by quantitatively determining boundaries of acceptable behaviour. An overseeing computer 112, which receives an alert that indicates whether the user is likely to have cheated or otherwise not complied with the relevant regulations, is also provided.
The behavioural monitoring system may include, at each user interaction station, a first camera 114 per user (such as an external or internal webcam or any other type of camera) arranged to take photos of at least the eyes of each user or record a video of at least the eyes of each user. The first camera 114 may for example be mounted or placed on or around the computer screen 103. The first camera 114 may capture the face, head, torso and/or body of each user 100. The behavioural monitoring system may further include a computer mouse 116 per user at each user interaction station. The behavioural monitoring system may optionally include other means that allow a user to interact with the computer, such as a keyboard 118, stylus, or digital pen 120. The behavioural monitoring system may further include, at each user interaction station, a second camera 114 per user arranged to take one or more photos or videos of the environment of each user 100. The second camera 114 may for example be the camera of a mobile phone, such as a smartphone. The environment of each user refers to the environment at each user interaction station, namely the environment in which they are to undertake the task, and may include the user's desk, one or more walls of the room in which the user is completing the task and any objects within the field of view, such as but not limited to a computer/laptop, one or more chairs, doors, windows, paper, stationery, food, food containers and/or drinks bottles.
As shown in Figure 1b, the behavioural monitoring system may comprise a web server 118 rather than a remote server 104. Data relating to the behaviour of the user 100 is transmitted from the user's computer 102 to the web server 118 via the internet 106. Such data may also be analysed at an analysis server 120 to determine aggregated data from multiple users 100 and trends in the ways that users 100 are reacting to information provided on their respective screens 103 and stored in a remote data store. This captured data can be analysed in real time to provide a concurrent view of reactions to the presented data and/or to set boundaries based on quantitative data for acceptable or unacceptable behaviour. For example, in an invigilation of an exam, an error in the paper may manifest itself as a common reaction by a plurality of users to being presented with that information and this would be captured in real time. Analysed data may be sent to the overseeing computer 112, together with a recommendation of what action to take, for example allocating more time to complete the examination.
Figure 2 is a diagram showing an example set-up for a single user 200 completing the task. The camera 214 is positioned to take images of the user 200 when they are facing a computer 202 which they are using to complete the task. For example in this embodiment, the camera 214 is positioned on the monitor 203 as shown in Figure 2. The further camera 214 is provided and positioned to capture images and/or videos of the user 200 and/or their environment, such as but not limited to the surface of the user's desk. The computer mouse 216 may be wired or wireless (as in this embodiment).
Method for Monitoring Human Behaviour
Figure 3 is a flowchart showing an example method for monitoring human behaviour. The process comprises two distinct stages: preparatory data collection and task data collection.
At Step 300, preparatory data is collected before being processed at Step 302. At Step 303, preparatory data or processed preparatory data is stored at the local data store and subsequently uploaded to the remote server 104. Steps 300 to 303 are repeated until all the preparatory data has been collected, processed, stored and uploaded. The second stage commences with task data being collected at Step 304. At Step 306, the task data is stored locally and subsequently uploaded to the remote server 104. At Step 308, the task data is analysed and the uploading of task data to the remote server 104 is completed. Human behavioural profile(s) are updated at Step 310. Data may continue to be updated as the human behavioural profile is being updated or once the human behavioural profile has been updated. At Step 312, the analysis of the task data is fed back to a party monitoring the data analysis, such as an examiner/invigilator. Steps 304 to 312 are repeated until all the task data has been collected, stored, uploaded, analysed, used to update the human behavioural profile(s) and the analysis has been fed back to the examiner/user monitoring the overseeing computer 112.
Collection of Preparatory Data
The collection of preparatory data may include capturing environment data, desk data and/or eye-calibration data. The collection of preparatory data is typically a one-time-only procedure for any given task and occurs before the task is started.
An example method for collecting the preparatory data is shown in Figure 4. At Step 400, the process of collecting environment data commences with a 360° video of the user's environment being captured using the second camera. At Step 402, a 360° video of the user's desk is captured using the second camera.
As will be discussed in further detail below, the 360° environment data is processed at Step 404 to construct a 3D representation of the environment. At Step 406, the 360° environment data is processed to extract items on the user's desk.
Next, eye calibration data is collected. This begins at Step 408, with eye calibration data being collected using the first camera. At Step 410, the eye calibration data is processed to create a frame of reference for the camera, the monitor/screen and the region in which the candidate's head is placed.
Capturing environment data may include capturing a video of the environment where the task is to be carried out. The video may for example be a 360° video. The video is typically captured using the second camera, which may be a smartphone. The second camera may remain stationary while the preparatory data is collected, or the user may be required to move the second camera (e.g., by moving the smartphone) to capture the environment preparatory data. For example, the user may be instructed to move the second camera around to capture each wall of the room in which they are carrying out the task, the ceiling, the corners of the room, the underside of their chair, the floor area below their chair, and/or any windows in the room. Alternatively, or additionally, the second camera may be positioned to capture a particular field of view. This may include at least the user, their desk and/or the computer they are using to carry out the task. The second camera may remain stationary while the data is collected.
Capturing desk data may include capturing a video of the content of the user's desk where the user is to sit or stand at while they undertake the task. The video may for example be a 360° video. The video is typically captured using the second camera, which may be a smartphone. The second camera may remain stationary while the preparatory desk data is collected, or the user may be required to move the camera (e.g., by moving the smartphone) to capture the preparatory desk data. For example, the user may be instructed to move the second camera around to show the top of their desk, the underside of their desk and/or floor area below the desk, behind the monitor they are using to carry out the task, underneath and behind their keyboard and/or underneath and/or behind any objects, such as a notebook.
The collection of eye-calibration data may include showing one or more stimuli (such as a spot) in different locations on the user's monitor screen and recording, at the first camera, the user's eye movement in response to that stimuli. The stimuli (e.g., dots) may be sequentially presented all over the screen and the user may be instructed to direct their gaze towards the stimuli such that images and/or videos of the user's eyes when looking at different parts of the screen are captured. The angle of the user's line of vision and/or head may change when a stimulus (e.g., a dot) appears in a different location on the screen. The camera may capture images and/or videos of the user's eyes and/or head. The collection of eye-calibration data may further include determining when a user reacted to the stimulus (e.g. moved their eye when the stimulus changed position on the screen). This may be used to determine the user's reaction time.
Figure 5 shows how the eye calibration data is collected to extract the spatial location of the user's head and eyes. A stimulus may appear at different positions on the computer screen 503. For example, a stimulus 504 (e.g., a spot) may appear on the computer screen 503 at a first location and sequentially move to different locations on the computer screen. In the example shown in Figure 5, a spot appears, in turn, in each corner of the screen 503, the centre of the screen 503 and the middle of each edge of the screen 503. The user 500 is instructed to direct their gaze to each spot and the system may record the position of their eyes and head each time the spot is at a different location on the screen 503.
Figure 6 shows how the eye calibration data is collected to extract the spatial relationship between the first camera 614, computer screen 603 and candidate's head 600. The user may for example be asked to look at points displayed on the computer screen 603 to calibrate the direction of the user's gaze. Figure 6 also shows how the user may move their head within a zone 605 (a volume) and be within the field of view of the first camera 614.
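A minimal sketch of how the nine stimulus positions of Figure 5 (corners, mid-edges and centre of the screen) might be generated and paired with recorded eye positions; the callback and data layout are illustrative assumptions, not part of the claimed system.

```python
# Minimal sketch: generate the nine calibration positions described above
# (corners, mid-edges and centre of the screen) as pixel coordinates, and
# pair each with a recorded eye/head measurement.

def calibration_points(screen_w: int, screen_h: int):
    """Return the nine stimulus positions as (x, y) pixel coordinates."""
    xs = [0, screen_w // 2, screen_w - 1]
    ys = [0, screen_h // 2, screen_h - 1]
    return [(x, y) for y in ys for x in xs]

def collect_calibration(record_eye_position, screen_w=1920, screen_h=1080):
    """record_eye_position(x, y) is an assumed callback that displays a dot
    at (x, y) and returns the measured eye/head pose for that stimulus."""
    samples = []
    for (x, y) in calibration_points(screen_w, screen_h):
        pose = record_eye_position(x, y)   # e.g. pupil centres + head angles
        samples.append(((x, y), pose))
    return samples
```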
Processing of Preparatory Data
At Step 302 of Figure 3, the preparatory data is processed. The preparatory data may be processed at the user's computer 102, the overseeing computer 112, the remote server 104 and/or the web server 118. Some preliminary data analysis is carried out on the user's computer, such as to code/compress the data that is to be transmitted and/or to generate a user warning for an easily detectable behaviour such as a mouse pointer being moved to a second monitor (in this case warning the user not to use a second monitor). Clearly, the more data processing that can be carried out on the user's computer, the less data has to be transmitted, which can reduce the communication burden. However, this has to be balanced against the security of the data as well as the capabilities of the user's computer.
The processing of the preparatory data may include processing the environment preparatory data to construct a three-dimensional (3D) representation of the environment where the user is going to undertake the task (e.g., walls, desk, computer/laptop, chair, doors and windows).
Processing the environment data may include using image recognition software to identify the objects in the field of view captured by the second camera and to determine their location within the environment. For example, this may include determining that there is a notepad, pen and water bottle on the desk next to the user. A model of the environment may be generated. The model may represent the user's environment in 360 degrees around the camera. The image recognition software may also identify objects that are not permitted, such as a piece of paper attached to a wall or the ceiling of the room in which the user is carrying out the task.
In one example, the 3D representation of the environment is created using at least two images of the environment. The displacement and/or rotation of the second camera from one frame to the other may be identified and points within the at least two images may be mapped to 3D positions within the environment. For example, considering a displacement technique, the further away a point is from the second camera lens, the less the point moves (from one image to another) with displacement of the camera. This concept is illustrated in Figures 7 to 10. Figure 7 shows a representation of the user's computer 603 and first and second objects 615, 616 in the user's environment captured by the second camera at a first position 619. Figure 8 shows a representation of the user's computer 603 and the first and second objects 615, 616 in the user's environment captured by the second camera at a second position 620. Figure 9 shows an aerial (plan) view of the first and second objects 615, 616 in the user's environment, and the fields of view 617, 618 of the second camera when positioned at the first position 619 and the second position 620 respectively. The marks 621, 622 show the centre point of the fields of view 617, 618 of the second camera when positioned at the first and second positions 619, 620 respectively. The distance between the first and second positions 619, 620 of the second camera is d_cam. Figure 10 shows a first image 623 of the first and second objects 615, 616 captured by the second camera at the first position 619 and a second image 624 of the first and second objects 615, 616 captured by the second camera at the second position 620. (These images are not representative of the actual images which would have been captured from the camera positions shown in Figure 9.) The distance between the first and second objects 615, 616 in the first image 623 is d1. The distance between the first object 615 in the first image 623 and the first object 615 in the second image 624 is shown as d_cube, and the distance between the second object 616 in the first image 623 and the second object 616 in the second image 624 is shown as d_cylinder. The distance between the first and second objects 615, 616 in the second image 624 is shown as d2. In Figures 7 to 10, the second object 616 (the cylinder) is positioned closer to the second camera. Therefore, for one parallel displacement of the camera, as shown by d_cam, the second object 616 moves more in the image (d_cylinder > d_cube) and the distance between the two objects changes (d1 ≠ d2). The distances between objects within the environment and the second camera can therefore be identified based on d1, d2, d_cylinder, d_cube and the displacement (and, in another embodiment, rotation) of the camera. The distance between the second camera and each object is a function of these parameters, which is well known in optical distance calculation, for example stereo matching.
Subsequently, the 3D representation of the environment can be generated in a relatively simple and efficient manner, without requiring detailed image processing and complex 3D model creation, but rather using such algorithmic calculations for positioning of objects within the 3D model of the environment. Considering that images are taken at discrete time points and are prone to noise, a smoothing algorithm (e.g., a low-pass filter) may be applied to calculate a rough estimate of the locations used in the calculations.
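A minimal sketch of the displacement idea, assuming the standard pinhole relationship used in stereo matching (distance ≈ focal length × camera displacement ÷ apparent pixel shift); the focal length and pixel shifts below are illustrative values only.

```python
# Minimal sketch of the displacement technique: for a camera translated by
# d_cam between two frames, an object's apparent shift (disparity, in pixels)
# is inversely proportional to its distance, Z ≈ f * d_cam / disparity.
# The focal length f (in pixels) is an assumed calibration value.

def estimate_distance(disparity_px: float, d_cam_m: float, focal_px: float) -> float:
    """Rough distance (metres) of an object from the camera."""
    if disparity_px <= 0:
        raise ValueError("object must shift between the two frames")
    return focal_px * d_cam_m / disparity_px

# Example with the cube/cylinder of Figures 7-10: the cylinder shifts more
# (d_cylinder > d_cube), so it is estimated to be nearer to the camera.
d_cube_px, d_cylinder_px = 40.0, 90.0    # apparent shifts in pixels (illustrative)
d_cam_m, focal_px = 0.10, 800.0          # 10 cm sideways move, assumed focal length
print(estimate_distance(d_cube_px, d_cam_m, focal_px))      # ~2.0 m (farther object)
print(estimate_distance(d_cylinder_px, d_cam_m, focal_px))  # ~0.9 m (nearer object)
```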
In a further example, edges and connected objects may be identified using at least two images of the environment. Surfaces of the objects can be created and the angles of the surfaces can be adjusted based on the changes in the position of the objects in the at least two image frames, captured when the second camera is positioned at different positions. With all the surfaces shown in the images constructed, a 3D representation of the environment can be generated. For example, as shown in Figure 11, whilst the real length of a wall, d_wall, is constant between a first and second image of the environment, the distance, d1, between two ends of the wall as they appear in the first image, and the distance, d2, between the same ends of the wall as they appear in the second image, are different across the two images. The values of d1, d2 and the displacement (and, in another embodiment, rotation) of the second camera can be used to determine the real length of the wall, d_wall, using similar techniques to those described in the previous example above. Generating the 3D representation of the environment in this way advantageously means that surfaces, such as walls, are constructed using far less processing power by using an algorithm instead of complex 3D models to build the environment. The need for smoothing (as mentioned above) is less for larger distances, sizes and areas.
In an embodiment where the second camera is rotated about a generally vertical axis, the apparent angular movement of objects nearer to the camera will be greater than those located at a further distance. This can be used in a similar manner to the displacement embodiment described above, to create 3D positional understanding of objects within the environment. In this case, the distances described in the displacement example above would be replaced by measured angles of apparent movement. Accordingly, the measured angle of movement between an object imaged from a first angular position and a second angular position can be compared to the actual degree of angular movement of the second camera. This could result in an angular ratio being determined, such as ΔAngle_object1 / ΔAngle_camera.
Similarly, the same measurement and calculation could be carried out for a second object (object 2) located at a different distance, namely ΔAngle_object2 / ΔAngle_camera. The larger the ratio, the closer the object is to the second camera. Furthermore, similar to the two examples described above, the distance between the second camera and each object is a function of these parameters (angular differences in this example) which is well known in optical distance calculation, and such a function can be used to determine distances between objects.
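A minimal sketch of the angular-ratio comparison described above, assuming the apparent angular movement of each object and the camera's own angular movement have already been measured; as the passage states, larger ratios are treated as nearer objects.

```python
# Minimal sketch of the rotation variant: each object's apparent angular
# movement is divided by the camera's angular movement, and objects are
# ordered nearest-first by that ratio. Values are illustrative only.

def angular_ratio(delta_angle_object_deg, delta_angle_camera_deg):
    return delta_angle_object_deg / delta_angle_camera_deg

def order_by_proximity(objects, delta_angle_camera_deg):
    """objects: mapping of object name -> apparent angular movement (degrees)."""
    return sorted(
        objects,
        key=lambda name: angular_ratio(objects[name], delta_angle_camera_deg),
        reverse=True,          # larger ratio -> nearer object
    )

print(order_by_proximity({"cube": 6.0, "cylinder": 11.0}, delta_angle_camera_deg=10.0))
# ['cylinder', 'cube'] -> the cylinder is estimated to be nearer the camera
```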
The desk preparatory data may also be processed to determine the items on the user's desk (desk content) and generate a 3D model of the user's desk surface. The items on the user's desk may be determined using image recognition software, such as the MediaPipe Toolbox, Amazon Web Services (AWS), Amazon Rekognition and OpenCV. This information may be used later to detect if any new item is added, for example, when objects are concealed under the keyboard when the preparatory desk data is collected and are revealed while the task is being carried out. To detect whether any new items are added, the objects identified in an image or video of the desk surface prior to the task beginning may be compared to one or more images or videos (or random sample images) taken during the task period.
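A minimal sketch of new-item detection, assuming an image-recognition step (e.g. one of the toolboxes named above) returns a set of object labels for each image; objects concealed during preparation and revealed during the task then appear as labels absent from the preparatory set.

```python
# Minimal sketch of new-item detection on the desk: compare the set of object
# labels recognised before the task with labels recognised in an image taken
# during the task. The label sets are assumed to come from any image-recognition
# pipeline; the values below are illustrative.

def new_items(preparatory_labels, task_labels):
    """Labels present during the task that were not on the desk beforehand."""
    return set(task_labels) - set(preparatory_labels)

before = {"keyboard", "mouse", "water bottle", "notepad"}
during = {"keyboard", "mouse", "water bottle", "notepad", "smartphone"}
print(new_items(before, during))   # {'smartphone'} -> flag for review
```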
Preparatory eye-calibration data is processed to create a frame of reference for the camera, computer screen, and the user's head and/or the space in which the user's head is likely to be positioned in. Eye-calibration data is processed at a gaze calibration engine. As stated above, the collection of eye calibration data involves displaying a sequence of stimuli (e.g. dots) at different positions on the computer screen. Each stimulus is output for a predetermined period of time. The gaze detection processor is configured to determine the direction of the user's gaze when each stimulus is output and the gaze calibration engine is configured to create a spatial relationship between the first camera, the eyes of the user and the computer screen to enable specific user eye movements to be calibrated to specific gaze directions determined by the stimuli on the computer screen.
The position of the user's eyes (e.g., pupils, iris, eyelid etc.) are also typically determined each time the stimulus (e.g., a dot on the computer screen) moves to a different location on the screen.
The data indicating the position of the eyes and/or head of the user when looking at the sequence of stimuli may be extrapolated to determine the expected eye and/or head positions when the user is directing their gaze towards parts of the screen where the stimulus had not been displayed. For example, by displaying stimuli at the edges of the screen, a bounding box for the direction of gaze is determined and, if the direction of gaze is within the bounding box, the user can be considered to be looking at the screen. This enables the direction of the user's gaze to be determined when they are looking at any part of the screen.
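A minimal sketch of the bounding-box check, assuming gaze directions are expressed as horizontal and vertical angles; the samples recorded for the edge stimuli define the box and later gaze samples are tested against it.

```python
# Minimal sketch: gaze angles recorded while the user looked at stimuli on the
# screen edges define a box of "on-screen" gaze directions; a new gaze sample
# inside that box is treated as looking at the screen. Angle values are illustrative.

def gaze_bounding_box(edge_samples):
    """edge_samples: list of (horiz_deg, vert_deg) recorded for edge stimuli."""
    hs = [h for h, _ in edge_samples]
    vs = [v for _, v in edge_samples]
    return (min(hs), max(hs), min(vs), max(vs))

def is_on_screen(gaze, box):
    h, v = gaze
    h_min, h_max, v_min, v_max = box
    return h_min <= h <= h_max and v_min <= v <= v_max

box = gaze_bounding_box([(-18, -10), (18, -10), (-18, 12), (18, 12)])
print(is_on_screen((5, 3), box))    # True  -> gaze directed at the screen
print(is_on_screen((30, 3), box))   # False -> gaze directed off the screen
```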
By determining the eye and/or head movements of the user when the stimulus is located at different positions on the screen, it is possible to determine the spatial relationship between the camera, the screen of the monitor and the eyes of the user. For example, if the direction of the user's gaze has moved by an angle of θ degrees and the distance between subsequent dots that appear on the screen is b, then the distance between the user's eyes and the screen, a, may be calculated using the following equation:

a = b / tan(θ)

Additionally or alternatively, data relating to the direction of the user's gaze may be calibrated at any point during the collection of preparatory or task data. The direction of the user's gaze may be calibrated based on the content of the screen, such as text or buttons appearing on the screen. In an example, when the user begins reading from the first line, the calibration may begin and detect the direction of the user's gaze as their eyes move along a line of text and jump to the next line or paragraph.
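A worked form of the relationship above, a = b / tan(θ); the distance and angle values are illustrative.

```python
# Worked example of a = b / tan(theta): if the gaze direction rotates by theta
# degrees when the dot moves a distance b across the screen, the eye-to-screen
# distance a follows directly.

import math

def eye_to_screen_distance(b_metres: float, theta_degrees: float) -> float:
    return b_metres / math.tan(math.radians(theta_degrees))

print(eye_to_screen_distance(b_metres=0.30, theta_degrees=25.0))  # approx. 0.64 m
```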
In another example, rather than displaying stimuli such as dots on the screen, the direction of the user's gaze may be calibrated using buttons which the user can select. For example, the user may be instructed to "Please click on "continue" to begin the test". When the user clicks on the button labelled "continue", the system may determine the direction of the user's gaze and use this to create a frame of reference for the camera, computer screen, and the user's head.
The calibration may be continuous. The calibration of the user's gaze may be carried out at multiple instances during the collection of preparatory and/or task data to adjust for changes in the user's behaviour, for example if the user is becoming tired during the course of the task, and to improve the calibration of the user's gaze throughout the task (e.g. as more data is collected).
The orientation of the user's head relative to the computer monitor and camera may also be determined. The positions of the user's eyes when focused on the screen during the collection of eye-calibration data may be used to calculate whether objects in the room are closer or further away than the user's monitor. The positions of the user's head and eyes (e.g., pupils, iris, eyelid etc.) when text is displayed on the screen may further be determined to monitor how the user's eyes move when they are reading and identify reading patterns of the user.
The preparatory data or processed preparatory data may also be coded/compressed at the user's computer 102 before being transmitted to the overseeing computer 112, remote server 104 and/or web server 118.
All or part of the preparatory data or processed preparatory data is stored locally at a local data store of the user's computer temporarily or permanently and/or uploaded or started to be uploaded, via the internet to the remote server and to the remote data store. It is advantageous to store the preparatory data or processed preparatory data at least temporarily at a local data store of the user's computer before it is uploaded to the remote server and remote data store because the internet speed can be too slow to transmit the data as it is being collected.
Some or all of the preparatory data or processed preparatory data may be transferred from the local data store of the user's computer 102 to the overseeing computer 112, remote server 104 and/or web server 118.
Processed eye-calibration data for a particular user may form part of a human behavioural profile for that user.
Human Behavioural Profile
The behavioural monitoring system may optionally comprise one or more human behavioural profiles. These human behavioural profiles are data profiles corresponding to user behaviour and may include an immediate individual profile, an aggregate individual profile, an immediate group profile and an aggregate group profile.
The human behavioural profiles may include patterns of user behaviour using the captured images and/or videos from the first and/or second cameras. The images and/or videos show the general behaviour of the user and multiple prior users and are used to create profiles of acceptable/normal (i.e., not cheating) behaviour while a task is carried out. The behavioural profiles may, for example, include data relating to how the user(s) typically moves their eyes when they are reading, the user's neutral facial expression, where and how the user looks while they are thinking, the user's head movements during the task, how often a user blinks and the relationship between the number of blinks and what the user is doing (for example, whether the user blinks less when their gaze wanders around the room). This information may be characterised by vectors and distributions, such as those in Figure 14. Some of the behavioural profiles are correlational. For example, the length of time a user spends wandering their eyes around the room may be correlated to the area they are looking at. A user may look at a wider area when they spend more time wandering their gaze.
By comparing the user's behaviour with their own normal behaviour and/or the normal behaviour of prior users, established by the human behavioural profiles, it is possible to identify whether the user has significantly deviated from their own normal behavioural profile and/or the normal behavioural profiles of a group of users. If the user's behaviour has deviated significantly from the normal behavioural profile(s), this can be an indication of potential cheating.
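A minimal sketch of such a deviation check, assuming each profile feature is stored as a mean and standard deviation and that a fixed number of standard deviations is used as the threshold; the feature names and values are illustrative.

```python
# Minimal sketch: a feature observed during the task is flagged when it
# deviates from the stored behavioural profile (mean, standard deviation)
# by more than a chosen number of standard deviations.

def deviates(observed: float, mean: float, std: float, threshold_sd: float = 3.0) -> bool:
    if std == 0:
        return observed != mean
    return abs(observed - mean) / std > threshold_sd

profile = {"blinks_per_minute": (17.0, 4.0), "gaze_wander_seconds": (2.5, 1.2)}
observation = {"blinks_per_minute": 16.0, "gaze_wander_seconds": 9.0}

flags = {feature: deviates(observation[feature], *profile[feature]) for feature in profile}
print(flags)   # gaze_wander_seconds flagged; blinks_per_minute within the normal range
```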
The behavioural profiles may be stored at the local data store 724 on the user's computer 702, at the server, and/or at the remote data store 108. At least part of the human behavioural profiles may be created at the overseeing computer 112.
The immediate individual behavioural profile may be a data profile of the individual behaviour of the user carrying out the task. An immediate individual behavioural profile may be created for all the users carrying out any given task (e.g., an examination). An immediate individual behavioural profile may, for example, include the time it takes for that user (e.g., examinee) to complete a particular task and/or the time each user spends not engaging with the hardware (i.e., not moving the mouse or pressing keys on the keyboard, which they may do when they are thinking).
To create an aggregate individual behavioural profile, the multiple behaviour profiles generated when a user has carried out multiple tasks may be aggregated. For example, a particular user may have attended more than one examination and so to create the aggregate individual behavioural profile for this user, the immediate individual behavioural profiles generated for each of their examinations may be combined. Aggregate individual behavioural profiles can help to determine individual behavioural baselines, for example to distinguish between one user who tends to generally think while looking at a single point, and another user who tends to look at a wider area while thinking.
The individual behavioural profiles are advantageous because different users can exhibit different behaviour while carrying out the same activity. For example, users with dyslexia may exhibit different behaviour whilst reading compared to those without dyslexia. Similarly, users with attention-deficit/hyperactivity disorder (ADHD) or obsessive-compulsive disorder (OCD) may exhibit different behaviour whilst carrying out a task than those without these conditions. Having a data profile that reflects the normal or typical behaviour for a particular user assists in identifying atypical behaviour for that user, which can indicate cheating.
To generate an immediate group profile, the immediate individual profiles for all users for a particular task (e.g., a particular exam or type of exam) may be aggregated. Immediate group profiles help to establish an expected type of behaviour, which is determined by the type of task required of the user. For example, a multiple choice question (MCQ) exam typically results in a very different type of captured user behaviour to that of an essay-writing exam. Having an aggregated profile provides a benchmark to compare immediate user behaviour results against to determine if the immediate user behaviour is abnormal.
An aggregate group profile may include the collective behavioural profiles of all users for all tasks carried out (e.g. all examinations carried out using the present system). The aggregate group profile may be formed by aggregating the immediate group profiles for all tasks carried out. The aggregate group profile can provide useful data regarding the micro-timing of the human-machine interaction for a particular task or group. For example, an aggregate group profile can provide useful aggregated data, such as the average period of time a user fixates on the correct answer of a multiple choice question, small pattern data such as the relationship between the speed of the mouse pointer and the distribution of the user's gaze fixation points following the pointer, and larger pattern data such as the duration of eye gaze wandering following time spent reading (e.g. information indicating whether the user wanders their gaze more after reading a longer section of text).
Here, micro-timing refers to all aspects of data recording that involves timing (e.g., the small pattern data mentioned above).
An aggregate group profile may, for example, be a profile for an aggregation of users belonging to multiple groups completing a particular question or task. For example, it may include all users completing a particular type of question such as a physics multiple choice question. In another example, an aggregate group profile may be a profile for an aggregation of users belonging to a single group completing multiple tasks. For example, it may include all users in a group corresponding to a particular subject (e.g. students studying English) and doing a variety of tasks (e.g. writing essays, answering multiple choice questions etc.). In a further example, an aggregate group profile may include a mixture of both, namely a profile for an aggregation of users belonging to multiple groups completing a particular question or task and users belonging to a single group completing multiple tasks.
For example, multiple choice questions typically have a generic eye movement: reading the question, reading an option (not necessarily the first one, it could be the longest one), jumping to another option, and then another and another. This pattern of jumps does not typically happen in 'fill-in-the-blanks' questions. Tasks related to different subjects can also show different user behaviour than others. A maths multiple choice question typically shows different user behaviour than a history multiple choice question because one (usually) requires more processing. Moreover, students that study a particular subject can show one type of typical user behaviour and students that study another subject can show another type of typical user behaviour. For example, medical students typically rely more on their memories while engineering students typically rely more on their analytical abilities, and so these different groups typically show different observed behaviours.
Collection of Task Data
At Step 304 of Figure 3, task data is collected. The collection of task data may include capturing video and/or image data, using the first and/or second camera, of at least the eyes of the user while the task is being carried out. It may also include capturing video and/or image data of the face and/or torso of the user while the task is being carried out, using the first and/or second camera. For example, the first camera may capture eye movements (including but not limited to movements of the pupil, iris and eyelid), facial expressions, movements associated with talking, head movements, shoulder movements and/or movements that indicate the user has moved from a seating to a standing position. The collection of task data may be continuous while the task is being carried out.
The collection of task data may also include collecting data corresponding to the user's interaction with the computer using a source of input. For example, this may include but is not limited to information indicating when and where the user is moving or has moved a mouse pointer or cursor, when and where the user is hovering the mouse pointer or cursor over a particular region of the screen, when and where on the computer screen the user has clicked the mouse, when the user pressed one or more keys on a keyboard, and when and where the user made contact with a touch screen or made contact with a digital pen.
At Step 306 of Figure 3, the task data is stored locally at a local data store of the user's computer and/or uploaded or started to be uploaded, via the internet, to the remote server and remote data store. For example, the data may be recorded locally while simultaneously being uploaded to a remote server. The amount of data to be transmitted may exceed the available internet bandwidth. The computer may continue to upload task data from the local data store to the remote server and remote data store after the task has ended.
The remote data store 108 may store all the captured user behaviour, a portion of the data captured and/or all or a portion of processed data. For example, the background of the captured images may be filtered out and only certain data may be stored, such as the data relating to the user (e.g. the user's facial features, head and/or body etc.). Alternatively, the background of the captured images may be blurred. The omission or blurring of the user's background advantageously means the data relating to the user behaviour can be stored without the data relating to any personal information or items identifiable in the background of the frames of the captured images.
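A minimal sketch of the background-blurring option, assuming a person-segmentation step (not shown) supplies a binary mask of pixels belonging to the user; only the masked region is kept sharp before the frame is stored.

```python
# Minimal sketch: blur everything outside an assumed person mask so that
# identifiable items behind the user are not retained in a recognisable form.

import cv2
import numpy as np

def blur_background(frame: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
    """frame: HxWx3 BGR image; person_mask: HxW array of 0/1 values (1 = user)."""
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    mask3 = np.repeat(person_mask[:, :, None], 3, axis=2).astype(frame.dtype)
    # keep user pixels from the original frame, background pixels from the blur
    return frame * mask3 + blurred * (1 - mask3)
```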
At Step 308 of Figure 3, the task data is analysed, for example to determine and identify any indicators that suggest the user has been cheating. Task data may be analysed continuously on the user's computer. Task data may be analysed as the user is carrying out the task or once the task has been completed. The analysis of the task data is discussed in more detail below.
All or a portion of the task data may be uploaded to the remote server 104 and remote data store 108. In one example, only sections of the data are uploaded to minimise the amount of required communications. For example, data that shows no indication of abnormal behaviour (such as cheating) may not be uploaded to the remote server and remote data store to reduce communications.
The human behavioural profile(s) may be updated using task data. As shown in Figure 3, the human behavioural profile may be updated repeatedly while the task data is analysed.
The analysed data may be sent to the overseeing computer 112 to feedback the analysis of the task data to the third party who is monitoring the results (e.g. an invigilator/examiner). The third party (e.g. the invigilator/examiner) may monitor the results in real time (on-line), as the user or users are carrying out the task, or after the task has been completed (off-line).
Micro Timing
In accordance with an aspect of the present embodiments, the analysis of task data may involve analysing the way a user looks at the monitor and how the user utilises the mouse and keyboard of the computer to interact with the computer. The system uses micro-timing to detect a mismatch between the user's behaviour and interactions with the computer (e.g., mouse or touchpad movements, keystrokes on a keyboard, movements with a stylus or digital pen etc.). Acceptable, non-cheating, behaviour may include a user movement and the effect of a computer interaction being in a predetermined order. A mismatch (e.g. the user behaviour and computer interaction not occurring in the correct order) could be an indication that the user is deviating from acceptable behaviour (e.g. cheating). For example, the user's eyes following a mouse pointer rather than leading the mouse pointer may indicate that a third party is controlling the mouse pointer rather than the intended user sitting an exam.
More specifically, using the clock of a computer, the system may determine when a user clicks or hovers their mouse pointer over a particular region of the screen. The system can also identify where on the screen the user has clicked or where the user is hovering their mouse on the screen. The system can further determine the direction of a user's gaze and therefore when the user's gaze is directed to the position on the screen where the user has clicked or the mouse pointer is or has been hovering. If it is identified that the user's gaze was only directed towards the position on the screen the user has clicked or the mouse is hovering after the mouse has been clicked and/or after the cursor arrives at this point, this may be flagged as an indication that the candidate is cheating, namely that a third party is controlling the mouse. Also, the system may identify that the user's gaze is following the movement of the mouse cursor around the screen, i.e. identify that the user's eye gaze is tracking the path of the cursor or mouse pointer. This may also indicate that the mouse is being controlled by another party, such as somebody out of view of the webcam or in a different room, for example using another monitor mirroring the candidate's screen and/or a wireless mouse. A single instance of the mouse arriving at a point before the candidate's gaze may be sufficient to flag a possible cheating incident. Alternatively, a plurality of such instances meeting a threshold may be required to flag a possible cheating incident.
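A minimal sketch of this order check, assuming per-click timestamps for cursor arrival and gaze arrival are already available from the clock and the gaze detection processor; the threshold on the number of instances is illustrative.

```python
# Minimal sketch of the micro-timing check: compare, for each click, the time
# the cursor arrived at the clicked region with the time the gaze first arrived
# there. If the gaze arrives after the cursor often enough, flag the behaviour.

def gaze_follows_cursor(events, min_instances: int = 3) -> bool:
    """events: list of (cursor_arrival_time, gaze_arrival_time) per click, in seconds."""
    late = sum(1 for cursor_t, gaze_t in events if gaze_t > cursor_t)
    return late >= min_instances

clicks = [(10.2, 10.9), (34.5, 35.1), (58.0, 57.6), (71.3, 72.0)]
print(gaze_follows_cursor(clicks))   # True -> possible third-party mouse control
```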
In another example, a mismatch between the user's behaviour and interactions with the computer may be detected when the direction of the user's gaze does not correctly follow the text appearing on the computer screen as it is being typed on a keyboard, particularly when the typing stops. This may indicate that the keyboard is being controlled by another party, such as a person using a wireless keyboard out of view of the webcam.
Additional data may be collected during different examination questions or when the distance between the mouse pointer and the gazing point is greater than a threshold to improve the accuracy of the system (i.e. to avoid the system flagging possible cheating incidents when no cheating has occurred). The number of times the system calculates the point on the screen the user has directed their gaze towards and identifies the location of the mouse pointer/cursor on the screen may at least partially depend on the specifications of the hardware used by the candidate sitting the exam. For example, cameras usually have a recording rate of 25 to 30 frames per second (fps) so the number of times the location of the candidate's gaze on the screen is determined may be calculated based on this rate. The 'point' on the screen that the system determines a user has directed their gaze towards and positioned the mouse pointer/cursor in may vary in size (screen area). For example, the size of the area may depend on the variance of the gaze detection. When the user is looking at a single point, the point on the screen the system detects the user is directing their gaze towards is typically not fixed and has some variation (jitter). This variation dictates the size of that area. The variation may depend on factors such as but not limited to the quality of the image of the user's eye (i.e., in terms of hardware such as resolution), environmental factors (e.g., lighting) and the eyes (e.g., how visible the sclera is in the image).
It is to be appreciated that using a combination of gaze detection and timing of mouse clicks minimises the amount of image processing required by the local computer and/or the remote server. This is because any solution which looks for movement of the cursor in a manner which indicates fraudulent remote control of the subject's computer requires a great deal of processing power and also may not be as reliable as the above-proposed method of the present embodiment. Furthermore, the accuracy of detection of such fraudulent behaviour is vastly improved by using such a combination of timing of mouse clicks and gaze detection timing.
If the system detects that potentially cheating behaviour has occurred, a warning message may be displayed to the user. For example, if the system detects that the mouse pointer has moved to a second monitor, and a second monitor is not permitted while a particular task is being carried out, a message may be displayed to the user warning them that they should not use the second monitor. This may occur while preparatory and/or task data is being collected.
Figure 12 shows a schematic diagram of a behavioural monitoring system configured to implement the micro timing aspect of the present invention. The system comprises a computer 702, a computer mouse 704 and/or keyboard 706, a first camera 708, such as a webcam or other face-camera, and a computer monitor or screen 710. The user's computer 702 comprises a communications engine 712, a gaze calibration engine 714 for configuring the system to determine where the user is looking, a gaze detection processor 716 to actually determine the direction of the user's gaze, a content generator 718 for outputting relevant content to display on the monitor 710, a clock 720, an input device location processor 722 for determining the location and timing of user input on the monitor 710 such as mouse location and a mouse click, a local data store 724 for storing the results of any analysis of the captured data and an alert processor 726 for determining whether the captured behaviour represents fraudulent behaviour or normal (acceptable) behaviour. The communications engine 712, in use, transmits data to a remote server or web server and/or receives data from a remote server or web server via a communications network such as the Internet.
The first camera 708 is configured to capture photos and/or videos of the eyes or face of the user during the collection of the eye calibration data and the task data and send data corresponding to the captured photos and/or videos to the user's computer 702. The gaze calibration engine 714 is configured to receive this data and determine the reading patterns of the user, the positions of the user's head and/or eyes when stimulus is displayed at specific locations on the computer monitor 710 and the spatial relationship between the first camera 708, the screen of the monitor 710 and the eyes of the user. The camera 708 is also configured to send data relating to the captured photos and/or videos to the gaze detection processor 716. The gaze calibration engine 714 is configured to send data relating to the spatial relationship between the first camera 708, the screen of the monitor 710 and the eyes of the user to the gaze detection processor 716.
The gaze detection processor 716 may be configured to receive this captured image/video data and determine the direction of the user's gaze and the location on the screen 710 which the user is directing their gaze towards. The location on the screen which the user is directing their gaze towards may be calculated by calculating the location and orientation of the user's head, determining the location of the pupils in the image of the user and using this to calculate the direction of the user's gaze and then mapping the direction of the user's gaze to the surface of the computer screen using the location and orientation of the user's head and the direction of the user's gaze. If the user's gaze is not directed to the computer screen, it can be mapped to what is behind the computer screen.
A user may exhibit different behaviour when looking at the same location of the screen. For example, a user may move their head to look at an area of the screen in one instance but move only their eyes to look at the same area in another instance. The system may determine the angle of the eyes, the angle of the head and use both to determine the direction of the user's gaze and/or the point on the screen to which they are directing their gaze, along with the location of the head.
Alternatively, the direction of the user's gaze and/or point which the user is directing their gaze towards can be determined holistically. A neural network can be used to determine the direction of the user's gaze and/or point which the user is directing their gaze towards directly from the image.
Once an image of the user has been captured (either a still image or video captured using the first camera), a face mesh is fitted to a model of the user's face, and tools such as the MediaPipe toolbox (Google) or Open CV (an open-source toolbox) may be used to detect face landmarks. Face landmarks are points on the user's face such as the corners of the lips, the tip of the nose, the top of the nose and the two corners of each eye. The identification of face landmarks of a user is used to determine the user's facial expressions and behaviour (e.g. indications that the user is squinting, reading, talking etc.).
Figures 13a and 13b are schematic diagrams showing representations of a face mesh 802 fitted to the face of the user 800. The representation of the user's face is generated using the image and/or video data of the user captured using the first camera, and the face mesh 802 may be mapped onto the representation of the face to extract facial features, landmarks (labelled by the spots in Figure 13b), and expressions. Alternatively, facial features and/or landmarks may be detected by scanning the entire face holistically and detecting the eyes and other features based on their relative positioning (e.g. two eyes on either side of a nose). In a further example, facial features and/or landmarks may be detected with fewer constraints, such as by simply searching for an eye in the image. Head and eye processing is then carried out in parallel based on the detected face landmarks to determine the positions of the user's head and eyes. The 3D location and 3D orientation of the user's head pose can be calculated, for example using configural processing. The pupil locations and their placement in relation to the centre of the user's head are also determined, for example using configural processing. Other facial features may also be determined using configural processing (e.g., shape of the eyelids). The shape of the user's eyelids can help determine where the user is looking because a person's top eyelid is more closed when they are looking downwards than when they are looking upwards. The relative gaze direction is then calculated, which is the direction of the user's gaze relative to their head. The absolute gaze direction may also be calculated based on the user's head pose, the location of the user's pupils and/or the relative gaze direction. Absolute gaze direction means the gaze direction in relation to the space, compared to relative gaze direction which is relative to the user's head. If the user's head is represented by a single point, the absolute gaze direction is the eye direction (in 3D) plus the head direction (in 3D). If the user's head is represented by a sphere, a translation is included too (from the eyes to the centre of the head and back).
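A minimal sketch, assuming the MediaPipe FaceMesh solution mentioned above is used to obtain face landmarks from a captured frame; the subsequent head-pose and pupil calculations are not shown.

```python
# Minimal sketch: detect face landmarks in a single BGR frame using MediaPipe's
# FaceMesh solution. Landmark coordinates are normalised to the image size.

import cv2
import mediapipe as mp

def face_landmarks(bgr_frame):
    """Return a list of normalised (x, y, z) landmark coordinates, or None if no face is found."""
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        results = mesh.process(rgb)
    if not results.multi_face_landmarks:
        return None
    return [(lm.x, lm.y, lm.z) for lm in results.multi_face_landmarks[0].landmark]
```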
It is to be appreciated that the use of configural processing provides a significant advantage in terms of speed of processing of image data. The high-resolution analysis of the subject's pupils to determine the gaze direction is restricted to an area where the eyes are expected to be. This area is determined from recognition of a feature of the subject's face and a knowledge of the general location of the eyes in relation to that feature. For example, if a corner of a mouth is detected, then the general area in which the eyes will be located can be used to limit the area in which high-resolution analysis is required to determine gaze direction.
Using the calculated absolute gaze direction and eye calibration data, the alert processor 726 is able to identify whether the user is directing their gaze towards the screen or not. The alert processor 726 may set the on-screen status value accordingly ("true" if the user's gaze is directed towards the screen and "false" if their gaze is directed off the screen). If the onscreen status is "true", that is the alert processor 726 has calculated that the user is directing their gaze towards the screen 710, the gaze location on the screen (i.e., the position on the screen in which the user is directing their gaze towards) may be calculated. The alert processor 726 may record in the local data store 724 that the gazing object for this user at this particular moment in time is the screen (the timing information is provided as a time stamp by the clock). If the on-screen status is "false", that is the alert processor 726 has determined that the user is not directing their gaze towards the screen 710, the gaze location (where the user is directing their gaze towards) may be calculated for this moment in time using the preparatory environment data relating to the physical environment the user is carrying out the task in. The alert processor 726 may set the gazing object accordingly (e.g., record that the user is directing their gaze at a wall, object or open space such as a door etc.).
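A minimal sketch of how the on-screen status and gazing object might be recorded for each gaze sample; the is_on_screen and environment_lookup callbacks are assumptions standing in for the gaze detection processor and the preparatory environment model.

```python
# Minimal sketch: map an absolute gaze direction either to the screen or, using
# the preparatory environment model, to the object behind/around the screen,
# and record the result with a time stamp from the clock.

import time

def log_gaze_sample(gaze_direction, is_on_screen, environment_lookup, data_store):
    """is_on_screen(gaze) -> bool and environment_lookup(gaze) -> object name
    are assumed callbacks; data_store is any list-like local store."""
    on_screen = is_on_screen(gaze_direction)
    record = {
        "timestamp": time.time(),      # time stamp provided by the clock
        "on_screen": on_screen,
        "gazing_object": "screen" if on_screen else environment_lookup(gaze_direction),
    }
    data_store.append(record)
    return record
```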
Figure 14 is a schematic diagram of the user 900 facing the computer screen 903, with the first camera 904 positioned above the computer monitor 903. Figure 14 shows a calibration being performed by the behavioural monitoring system of Figure 1 to extract the spatial relationship between the first camera 904, screen and user's head.
Figure 15a is a schematic diagram showing the orientation of the centre of the head of the user 1000 and the eyes of the user 1000.
Figure 15b is a schematic diagram showing the spatial relationship between the left and right eyes and head of the user 1000.
The system may further record the user's actions (e.g., mouse movements and clicks, their interaction with the keyboard etc.) using the input device location processor and the time at which each action is carried out. An action made by the user may be considered an 'event'.
The recording of events may be referred to as 'event tagging'. The recorded events/actions may be based on the input and output data from the user interacting with the computer hardware (e.g., the computer mouse, keyboard etc.). Events may be in the form of arrays of events for different timepoints. An example is shown in Figure 16, with the timestamp in column 1, order of the event in column 2 and type of the event in column 3. The event recorded may include the type of event (e.g., a mouse click or keystroke) and the effect of the event. For example, one event may involve an option being selected by a mouse click (mouse click + option) whereas another event may involve a user clicking in a text box (mouse click + textbox). Different event labels may be provided. For example, some events may be labelled as "goal directed behaviours". Such events may include clicking on an option in a multiple-choice question or pressing a key on a keyboard. Other events may for example be labelled as "change of questions on the screen", while others may be labelled as "beginning of the exam". The type of event can help to indicate whether the user is cheating. For example, if the user is found not to be looking directly at the part of the screen that allows them to click to move to the next question when they click the mouse, or their gaze follows the mouse rather than their mouse following their gaze when selecting to move to the next question, the system may identify that the user is not cheating because they are just moving to the next question. In contrast, if the user is exhibiting similar behaviour (e.g. their gaze follows the mouse rather than their mouse following their gaze) when answering a question, the system may flag that they are potentially cheating.
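A minimal sketch of event tagging with the three columns of Figure 16 (timestamp, order, type) plus an optional label; the label vocabulary shown is illustrative.

```python
# Minimal sketch of event tagging: each event carries a timestamp, an order
# number and a type, mirroring the columns of Figure 16, plus an optional label.

import time

class EventLog:
    def __init__(self):
        self.events = []

    def tag(self, event_type: str, label: str = ""):
        self.events.append({
            "timestamp": time.time(),
            "order": len(self.events) + 1,
            "type": event_type,
            "label": label,
        })

log = EventLog()
log.tag("mouse click + option", label="goal directed behaviour")
log.tag("change of questions on the screen")
log.tag("mouse click + textbox", label="goal directed behaviour")
print(log.events)
```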
Returning to Figure 12, the computer mouse 704 is in communication with the user's computer 702, via a wired or wireless connection, and sends data indicating where the user has moved their computer mouse 704 to, to control the location of a cursor or pointer on the screen 710, to the input device location processor 722. The keyboard 706 may be used as an alternative means of controlling where the cursor is located on the user's screen. The keyboard 706 may be in communication with the user's computer 702, via a wired or wireless connection and send data relating to which keys the user has selected to control the location of the cursor on the screen to the input device location processor 722. The input device location processor 722 is configured to receive this data from the computer mouse 704 and/or keyboard 706 and identify the location of the pointer or cursor on the user's computer screen 710. The clock 720 is configured to determine the time at which the mouse 704 arrives at a point on the screen 710 and the time at which the user's gaze arrives at that point on the screen 710. The local data store 724 is configured to receive and store this data. The alert processor 726 is configured to receive the time data indicating when a user's mouse pointer or cursor arrives at a particular location on the screen 710 and, when the user's gaze is directed towards this location, determine whether the direction of the user's gaze arrives at the location on the screen 710 before or after the mouse pointer. The alert processor 726 is configured to generate an alert signal if it is determined that the user's gaze arrives at the location on the screen 710 after the cursor. Such behaviour indicates that the user is potentially cheating because it suggests that a third party, and not the user, is controlling their computer mouse 704. The alert signal may be sent to the communications engine 712, which may be configured to send data, including the alert signal, to the remote server and/or web server. The remote data store 108 is configured to receive and store the alert signal indicating that the user is potentially cheating. The alert signal is sent to the overseeing computer 112 to inform the invigilator that an event, indicating that the user may have cheated, has occurred.
Pattern Monitoring
In accordance with a further aspect of the present embodiments, the analysis of task data may involve capturing patterns of human movement which deviate from acceptable patterns of movement and therefore indicate unacceptable behaviour such as cheating. Acceptable patterns of behaviour may include, but are not limited to, reading the computer screen, reading permitted material on the user's desk (e.g., if an examination paper has been printed out or the user has made notes whilst the task (e.g., exam) is being carried out) and eye wandering around the room without exhibiting eye patterns that suggest the user is reading or repeatedly directing their gaze to a particular region of the wall for more than a predetermined period of time. To detect acceptable and unacceptable patterns of behaviour, the system can match captured patterns of movement (e.g., eye movement) with known patterns, in particular reading patterns. When the user first begins the task, the user's behaviour may be compared to known eye movement patterns constructed from monitoring the behaviour of many users. In time, as the user's own behaviour (e.g., eye movements while reading) is captured and analysed using a behavioural calibration processor to determine personalised eye movement patterns (e.g. personalised reading patterns), the user's behaviour may be compared to their personalised reading patterns to detect unacceptable patterns of behaviour. For example, the user's eye movements may be captured (e.g. via the first camera) while instructions or questions are appearing on the screen and the system may determine that the user is directing their gaze towards the part of the screen displaying the questions. The information extracted from the captured video may include, but is not limited to, data relating to the small saccades as the eyes move from word to word, large saccades when the eyes move from the end of one line to the beginning of the next, fixation pauses between each saccade as the user processes the text, the average time a user takes to scan a word, the polar histograms shown in Figures 19a and 19b, the speed at which a user scans a particular line, the distance between two saccades, the number and length of reversal saccades (when reading a line a user occasionally jumps back a word or two to review - this happens more often for certain users, such as those with memory difficulties or learning disabilities), the vertical movement of the eyes (this is particularly pronounced for dyslexic people), and non-linear movements of the eyes (which can be dictated by languages - for example, the second verb in German comes at the end of a sentence or a negation happens at the end, and sometimes readers, especially non-native speakers, jump to the last word to follow the sentence better). Certain assumptions may be made, for example that sentences are arranged in horizontal or vertical lines, but other patterns of behaviour may be determined while the present system is being implemented. Some patterns of behaviour may be very different to others, for example some languages read from left to right, others read top to bottom etc. In native Japanese, text is read from top to bottom and right to left. The user's pursuit, vergence and vestibular eye movements may also be determined.
Comparing the user's eye movements to their personalised typical reading patterns is particularly advantageous since different individuals may exhibit different reading patterns, such as those with dyslexia or medical conditions (e.g., amblyopia). If a reading pattern of movement is detected as occurring in a direction that it should not be (e.g., away from the screen of the monitor such as towards the walls of the room they are in or the ceiling), then this may indicate potential cheating behaviour.
Patterns of eye and/or head movement identifying certain behaviours including reading, eye wandering, focussing, viewing a static image, thinking, scrolling, and watching videos may be determined using machine learning. The system may be trained to recognise patterns of eye and/or head movement indicating such behaviour using samples of data which have been captured when subjects have carried out these activities. Behaviours may for example be identified using changes in eye gaze, the movement of a candidate's head (e.g. looking out of a window), eyebrow movement, facial expression (e.g. frowning), hand gestures (e.g. a candidate touching their face), arm gestures and/or an indication that the user is leaning back in their chair (such that the candidate appears further away). Once these patterns of behaviour (captured head and/or eye movement) are learned by the system (using machine learning), new behaviour can be compared to the learned behaviour to determine the type of activity that the new behaviour most closely matches.
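A minimal sketch of matching new behaviour against learned patterns, under the assumption that each learned activity is summarised by a centroid of simple movement features; nearest-centroid matching here stands in for the machine-learning models the passage refers to and is not the actual training procedure.

```python
# Minimal sketch: assign a new window of eye/head movement features to the
# closest learned pattern (reading, wandering, etc.). Feature names, values
# and the distance measure are illustrative.

import math

def closest_pattern(sample: dict, patterns: dict) -> str:
    def distance(a, b):
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))
    return min(patterns, key=lambda name: distance(sample, patterns[name]))

patterns = {
    "reading":   {"saccade_rate": 3.5, "gaze_spread_deg": 4.0, "head_motion": 0.2},
    "wandering": {"saccade_rate": 1.0, "gaze_spread_deg": 25.0, "head_motion": 1.5},
}
sample = {"saccade_rate": 3.2, "gaze_spread_deg": 5.5, "head_motion": 0.3}
print(closest_pattern(sample, patterns))   # "reading"
```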
Figure 17 shows a schematic diagram of a behavioural monitoring system configured to implement the pattern monitoring aspect of the present invention. The system comprises a user's computer 1202, a computer mouse 1204, a keyboard, a first camera, and a computer monitor/screen 1210. The user's computer 1202 also comprises a clock 1212, a communications engine 1214, a pattern matching processor 1216, a local data store 1218 comprising one or more patterns, a behavioural calibration processor 1220, a gaze calibration engine 1222, a gaze detection processor 1224, a pattern construction processor 1226, and an artificial intelligence-based learning processor 1228. The system is configured to transmit and/or receive data to and/or from a remote server 108 or a web server 118.
The first camera 1208 (e.g. webcam) is configured to capture image and/or video data of the user. The gaze calibration engine 1222 is configured to receive the image and/or video data captured during the preparatory data collection and determine the orientation of the user's eyes and/or head relative to the computer monitor/screen 1210 and first camera 1208. The gaze detection processor 1224 is configured to receive the image and/or video data of the user and determine the direction of the user's gaze, as explained previously in the embodiment described with reference to Figure 12. The pattern construction processor 1226 is configured to receive the image and/or video data of the user and identify movements of the user's eyes and behaviours the user is exhibiting, for example when they carry out a task. Behaviours may for example be identified by fitting a face mesh to a model of the user's face and detecting landmarks using tools such as the MediaPipe toolbox or Open CV, as explained previously in the embodiment described above with reference to Figure 12. The system may for example identify furrowed brows, squinting, pursing of lips, eye movement, head movement and other behaviours a user may exhibit when carrying out a task. The learning processor 1228 is trained to recognise patterns of behaviour (e.g. reading, eye wandering, focusing, yawning, sneezing, talking etc.) using artificial intelligence. For example, the learning processor 1228 is trained using samples showing subjects carrying out known patterns of behaviour. In time, the learning processor 1228 can also use artificial intelligence to learn the patterns of behaviour of the specific user carrying out the task. The system may learn the behaviour a particular user exhibits when they are reading, or that a particular user has a slow reaction time. Learned patterns of behaviour are stored at the local data store 1218. The clock 1212 is configured to determine when the user was exhibiting a particular behaviour. The pattern matching processor is configured to compare the user's behaviour to the learned patterns of behaviour to identify the behaviour the user is exhibiting (e.g. to identify whether the user is reading, focusing, yawning etc.). The behavioural calibration processor 1220 is configured to map the type of behaviour the user is exhibiting while carrying out the task and where they were directing their gaze to at the time.
When it is determined that the user is exhibiting behaviour that indicates the user is reading and/or scanning whilst repeatedly directing their gaze to non-permitted regions (such as the ceiling or wall behind the user's computer) or directing their gaze to such regions for more than a predetermined time period, the system, using the pattern matching processor 1216, flags that the user is potentially cheating as such behaviour can indicate that the user is extracting information from text or figures arranged on the wall or ceiling of the room the user is carrying out the task in. Similarly, mapping the face mesh onto the representation of the user's face and monitoring the movement of the user's mouth can identify whether the user is talking, and hence potentially cheating by communicating with a third party in the room.
The system can also identify non-cheating behaviours. For example, if the system identifies that the user is yawning, this will not be considered potentially cheating behaviour. The communications engine 1214 is configured to transmit data to the remote server or web server and/or receive data from the remote server 104 or web server 118. For example, the communications engine 1214 may transmit data indicating whether the user has exhibited any potentially cheating behaviours. The data is stored at a remote server 104 and/or transmitted to an overseeing computer 112, to be received by a third party (e.g. an examiner/invigilator) who wishes to know whether the user appears to have cheated.
Figure 18 is a schematic diagram showing different patterns of eye movement on the screen 1306 and on the wall 1308. The computer screen 1306 shows the different points on the screen 1306 at which the user is directing their gaze as they read text on the computer screen 1306. The wall 1308 behind the user's computer screen 1306 shows a pattern of the user's gaze as they scan and read items 1304 (e.g. paper) attached to the wall 1308 behind the user's computer 1302. If such behaviour is detected, the system can flag that the user is potentially cheating.
Figure 19a is a polar histogram showing the angle of eye movement for patterns of reading behaviour as determined by the pattern processor of the behavioural monitoring system.
This pattern of eye movement suggests the user is focusing on a particular area and reading because there is little deviation in the angle of their gaze direction.
Figure 19b is a polar histogram showing the angle of eye movement for patterns of eye wandering behaviour as determined by the pattern processor of the behavioural monitoring system. This pattern of eye movement suggests the user is wandering their eyes around the room as there is significant deviation in the angle of their gaze direction.
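The reading/wandering distinction illustrated in Figures 19a and 19b can be summarised numerically by the spread of gaze-movement angles. The sketch below uses the mean resultant length from circular statistics as one possible measure of that spread; the 0.8 threshold is an illustrative assumption.

```python
# Sketch: classify "reading" vs "eye wandering" from the angular spread of
# successive gaze movements, as visualised in the polar histograms of
# Figures 19a/19b. The concentration threshold is an illustrative assumption.
import numpy as np

def movement_angles(gaze_points):
    """Angles of successive gaze displacements; gaze_points is an (N, 2) array."""
    deltas = np.diff(np.asarray(gaze_points, dtype=float), axis=0)
    return np.arctan2(deltas[:, 1], deltas[:, 0])

def classify_eye_movement(gaze_points, concentration_threshold=0.8):
    angles = movement_angles(gaze_points)
    # Mean resultant length: close to 1 when angles cluster tightly (little
    # deviation, consistent with reading), close to 0 when angles spread
    # around the circle (eye wandering).
    r = np.abs(np.mean(np.exp(1j * angles)))
    return "reading" if r >= concentration_threshold else "eye wandering"
```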
Hot Spot Detection

In accordance with yet a further aspect of the present embodiment, the analysis of task data may involve detecting hotspots or fixations, where the user looks away from their computer screen and towards a particular region repeatedly and/or for more than a predetermined amount of time. Such behaviour may indicate that there are sources of information available to the user (e.g. "cheat sheets" or another person assisting the user while they are sitting an exam). Such behaviour may be considered unacceptable, and its detection may identify that the user is potentially cheating.
Figure 20 shows a schematic diagram of a behavioural monitoring system configured to implement the hot spot detection aspect of the present application. The system comprises a user's computer 1502, a computer monitor/screen 1504, a computer mouse 1506, a computer keyboard 1508, a first camera 1510 and a second camera 1512. The user's computer 1502 comprises a clock 1514, a communications engine 1516, a screen view determining engine 1518, a local data store 1520, a hot spot processor 1522, an environment processor 1524, a gaze calibration engine 1526, and a gaze detection processor 1528. The system is configured to transmit and/or receive data to and/or from a remote server 104 or a web server 118.
The first camera 1510 is configured to capture image and/or video data of the user. The gaze calibration engine 1526 is configured to receive the image and/or video data of the user and identify the orientation of the user's eyes and/or head relative to the computer monitor/screen 1504 and first camera 1510. The gaze detection processor 1528 is configured to receive the image and/or video data of the user and determine the direction of the user's gaze, as explained previously in connection with the embodiment of Figure 12. The screen view determining engine 1518 is configured to determine an acceptable range of user gaze directions which indicate that the user is directing their gaze towards the screen. The hotspot processor 1522 is configured to compare the determined gaze direction of the user with the acceptable range of gaze directions and to generate an alert if the period of time for which the determined gaze direction is outside the acceptable range is greater than a predetermined time period. The hotspot processor 1522 is configured to determine if the user's gaze is fixated on a single location/area outside a display area of the computer screen 1504 and, if so, whether the time the user spends fixating on this area is greater than a threshold time period. The hotspot processor 1522 is also configured to determine if the user's gaze is repeatedly fixated on a single area away from the computer screen 1504 over a particular time period. The second camera 1512 is configured to capture a plurality of environment images (photos and/or video frames) of the working environment in which the user is positioned during the task (e.g. environment data). For example, this may include images of the walls, desk, windows and other objects in the environment (e.g. room) in which the user is carrying out the task. The environment processor 1524 is configured to process at least one of the plurality of environment images and identify one or more objects within the working environment. The hotspot processor 1522 is configured to use this information to determine what the user is looking at. The environment processor 1524 is further configured to determine whether viewing any of the objects represents an acceptable or unacceptable behaviour. This may for example be carried out using machine learning. For example, the system may be trained to identify objects using samples of known objects. The system may also store a record of objects that are acceptable and objects that are not acceptable, in some cases depending on where they are located. The hotspot processor 1522 may be configured to compare the single area the user is fixating their gaze upon with the locations of the identified objects and generate an alert if it corresponds to the location of an unacceptable object.
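A simplified sketch of the dwell-time check performed by the hotspot processor, assuming gaze directions are expressed as yaw/pitch angles and the screen view determining engine supplies the acceptable angular range; the fixation radius and dwell threshold are illustrative assumptions.

```python
# Sketch of the hotspot processor's dwell-time check: accumulate how long the
# gaze stays outside the acceptable on-screen range while remaining near the
# same off-screen location. Radius and threshold are illustrative assumptions.
import numpy as np

def detect_hotspot(samples, acceptable, fixation_radius=0.1, dwell_threshold_s=3.0):
    """samples: iterable of (t, yaw, pitch) gaze directions in radians.
    acceptable: ((yaw_min, yaw_max), (pitch_min, pitch_max)) covering the screen."""
    (ymin, ymax), (pmin, pmax) = acceptable
    anchor, start = None, None
    for t, yaw, pitch in samples:
        on_screen = ymin <= yaw <= ymax and pmin <= pitch <= pmax
        if on_screen:
            anchor, start = None, None
            continue
        point = np.array([yaw, pitch])
        if anchor is None or np.linalg.norm(point - anchor) > fixation_radius:
            anchor, start = point, t            # new candidate off-screen fixation
        elif t - start >= dwell_threshold_s:    # sustained fixation off screen
            return True, tuple(anchor)
    return False, None
```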
Figure 21 is a schematic diagram showing an example environment with a fixation point 1602 on the wall 1603 as determined by the behavioural monitoring system. The visualisation on the wall 1603 shows different regions the system has detected the user 1600 has been directing their gaze towards (e.g., for time periods exceeding a threshold), including a 'hotspot' 1602 in line with the user's current gaze (denoted by the dashed line). The hotspot 1602 indicates a region that the user has repeatedly fixated upon. Such a visualisation may be created by placing a 3D Gaussian blob on a reconstruction of the physical environment at every fixation point for a given unit of time. The sum of these blobs creates a heatmap 1604 which shows the location of the user's gaze over time. Hotter locations in this heatmap 1604 indicate where the user has fixated more (see hotspot 1602). Heat maps may be generated by the hotspot processor using gaze data and can be mapped to objects within the environment as determined using the environment data. A visual representation of the 3D space around the user can also be constructed from this data.
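A minimal sketch of this heatmap construction, simplified to a two-dimensional plane (such as the wall) rather than a full three-dimensional reconstruction of the environment; the grid resolution and Gaussian width are illustrative assumptions.

```python
# Sketch of the heatmap in Figure 21, simplified to a 2D plane: add a Gaussian
# blob per fixation and read off the hottest cell as the hotspot. Grid size
# and sigma are illustrative assumptions.
import numpy as np

def gaze_heatmap(fixations, grid=(100, 100), sigma=3.0):
    """fixations: iterable of (x, y) points in [0, 1] plane coordinates."""
    heat = np.zeros(grid)
    ys, xs = np.mgrid[0:grid[0], 0:grid[1]]
    for fx, fy in fixations:
        cx, cy = fx * (grid[1] - 1), fy * (grid[0] - 1)
        heat += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heat

def hottest_point(heat):
    """Return the plane coordinates of the most-fixated cell (the 'hotspot')."""
    iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
    return ix / (heat.shape[1] - 1), iy / (heat.shape[0] - 1)
```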
Environment Monitoring

In accordance with an additional aspect of the present embodiments, the behavioural monitoring system is configured to monitor the environment in which the user is interacting with the computer (e.g. the room in which they are sitting an exam). In particular, the system monitors the user's working environment separately from monitoring their behaviour within that environment. For example, a second camera (e.g. smartphone camera) may be placed on the user's desk to capture and record the content of the desk surface during the task (typically periodically, although where enough data storage is available this can be a continuous process). This content may for example be captured in the form of a wide-angle or 360-degree video. The content of the desk surface captured during the task is then compared with the content recorded prior to the beginning of the task (i.e. the preparatory environment data). For example, a snapshot (static image or short video) of the user's working surface (e.g. desk surface) at the start of a task (e.g. taken during the preparatory data collection) is compared with an image or video capturing the user's working environment while the task is being carried out. Objects (e.g. paper, books) that appear in the room during the task but were not present prior to its beginning could be sources of information that the user had concealed prior to the commencement of the exam and is using to cheat. Such objects would be noticed by continuous image capture (video) of the working environment. The video data from the second camera (e.g. smartphone) showing the desk surface is sent to the remote server before being processed to identify whether any new items have appeared on the desk during the course of the task (e.g. exam).
It is also possible to minimise the amount of data captured (and hence the required local storage and the amount of data to be transmitted) by configuring the second camera to capture images of the desk surface at random times throughout the task (e.g. exam). The risk of the user cheating is reduced as the user does not know when the second camera is going to take a picture.
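A small sketch of such randomised capture, assuming a capture function that grabs and stores a frame from the second camera; the number of snapshots is an illustrative assumption.

```python
# Sketch of randomised desk-surface snapshots over the task duration: the user
# cannot predict capture times. capture_fn and the snapshot count are assumptions.
import random
import time

def random_capture_schedule(duration_s, n_captures):
    """Return sorted, unpredictable capture offsets within the task duration."""
    return sorted(random.uniform(0, duration_s) for _ in range(n_captures))

def run_capture_loop(duration_s, n_captures, capture_fn):
    start = time.monotonic()
    for offset in random_capture_schedule(duration_s, n_captures):
        time.sleep(max(0.0, start + offset - time.monotonic()))
        capture_fn()  # e.g. grab a frame from the second camera and store it locally
```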
Figure 22 shows a schematic diagram of the behavioural monitoring system configured to implement the environment monitoring aspect of the present embodiment. The system comprises a user's computer 1702, a computer mouse 1704, a keyboard 1706, a first camera 1708, a second camera 1710, and a computer monitor/screen 1712. The user's computer comprises a communications engine 1714, an alert processor 1716, an environment processor 1718, a local data store 1720 and a clock 1722. The system is configured to transmit and/or receive data to and/or from a remote server 104 or a web server 118.
The first camera 1708 is configured to capture one or more images and/or videos of a user situated at a computer screen 1712 to monitor the user's behaviour while they carry out a task. The second camera 1710 is configured to capture one or more images/videos of the working environment in which the user is positioned during the task, which may include a surface the user is working at (e.g. a desk) to carry out the task. The second camera 1710 may for example record the surface the user is working at continuously or at intervals as they carry out the task. The second camera 1710 may for example be placed on the surface (e.g. desk) in a standing position, recording a video of the surface the user is working at. The environment processor 1718 is configured to process the one or more images/videos of the working environment to determine objects within the working environment at the beginning of the task and to process one or more images/videos of the working environment to determine objects present in the working environment while the task is being carried out. Objects may for example be identified using image recognition software. The clock 1722 is configured to identify when the presence of an object is detected and how long it remains visible for. The alert processor 1716 is configured to compare the objects identified at the beginning of the task with the objects identified while the task is being carried out and generate an alert if there has been a change in the number or type of objects present within the working environment.
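A minimal sketch of the comparison performed by the environment processor and alert processor, assuming an object-detection routine (detect_fn) that returns a list of object labels for an image; that routine is not specified here and is treated as a supplied assumption.

```python
# Sketch of the environment/alert processor pairing: compare objects detected
# at the start of the task with those detected later and raise an alert on any
# change. detect_fn stands in for whatever image recognition is used (assumption).
from collections import Counter

def environment_alert(baseline_image, later_image, detect_fn):
    baseline = Counter(detect_fn(baseline_image))   # e.g. ["keyboard", "notepad"]
    current = Counter(detect_fn(later_image))
    appeared = current - baseline       # new objects (e.g. a book placed on the desk)
    disappeared = baseline - current    # objects removed during the task
    if appeared or disappeared:
        return {"alert": True,
                "appeared": dict(appeared),
                "disappeared": dict(disappeared)}
    return {"alert": False}
```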
The second camera 1710 is configured to capture a 360-degree image or video of the environment at the beginning of the task and the environment processor is configured to build a three-dimensional representation of the environment.
The communications engine 1714 is configured to transmit the captured images and/or alerts to a remote server 104, remote data store 108 and/or overseeing computer 112, which may for example be monitored by an Examiner.
Figure 23 is a schematic diagram showing a representation of an example environment in which the user is carrying out the task. Figure 23 shows that the environment comprises a first camera 1802, a computer monitor/screen 1804, a second camera 1806 (e.g. smartphone), a keyboard 1808, a computer mouse 1810, sample objects 1812 on the user's desk (e.g. notepad and pen), open space on the wall 1813 and example objects 1814 on the wall, which the user may use to cheat.
Method for collecting task data

Figure 24 shows a method for using the behaviour monitoring system to collect task data. At Step 1902, data is collected, which involves receiving eye gaze data from the camera, receiving mouse data from the computer mouse and receiving video data from the second camera, such as a smartphone. This data is stored locally at a local data store at Step 1904, before being transmitted to a remote server at Step 1906, until there is no more task data to be transmitted to the remote server. Storing data locally before it is transmitted to the remote server means that data is recorded even if the internet connection drops or the amount of data to be transmitted exceeds the available bandwidth. Data may be uploaded from the local data store to the remote server after the task has ended.
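A minimal sketch of this store-locally-then-upload behaviour, assuming JSON samples written to disk and a supplied upload function; the file layout and transport mechanism are illustrative assumptions.

```python
# Sketch of the store-locally-then-upload flow of Figure 24: samples are always
# written to the local store first, then uploaded when the connection allows,
# so nothing is lost if the connection drops. Paths and upload_fn are assumptions.
import json
from pathlib import Path

LOCAL_STORE = Path("task_data")

def record_sample(sample: dict, index: int):
    LOCAL_STORE.mkdir(exist_ok=True)
    (LOCAL_STORE / f"sample_{index:06d}.json").write_text(json.dumps(sample))

def upload_pending(upload_fn):
    """Try to push locally stored samples to the remote server; keep any that fail."""
    for path in sorted(LOCAL_STORE.glob("sample_*.json")):
        try:
            upload_fn(path.read_bytes())   # hypothetical transport to the remote server
            path.unlink()                  # only delete after a successful upload
        except OSError:                    # assume transport failures surface as OSError
            break                          # connection dropped; retry later
```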
Figure 25 shows a method for using the behaviour monitoring system to process task data. The method comprises receiving, at Step 2002, eye gaze data from the gaze detection processor indicating the point X on the screen that the user is directing their gaze towards and determining, at Step 2004, the time, t1, at which the user's gaze is directed at point X. The method further involves receiving, at Step 2006, mouse and/or keyboard data indicating where the cursor or mouse pointer is positioned on the screen and determining, at Step 2008, the time, t2, when the cursor or mouse pointer is at point X on the screen. Point X may be the size of a single pixel or a group of pixels. The method additionally involves comparing, at Step 2010, the timestamps, t1 and t2, and if t1 is greater than t2, that is if the user's eye is directed to point X after the cursor or mouse pointer has arrived at point X, the method further involves sending, at Step 2012, a notification to the overseeing computer (e.g. an examiner/invigilator) that indicates that the user may be cheating. Optionally, the method may also include receiving, at Step 2014, video data from a second camera and identifying, at Step 2016, objects on the desk surface while the user is carrying out the task, for example using image recognition software. The method may also involve identifying, at Step 2018, objects in the user's working environment at the beginning of the task (preliminary video data) and objects in the user's working environment while the task is being carried out and determining, at Step 2020, whether the number or type of objects has changed. If any new objects are identified, the method may additionally include sending, at Step 2022, a further notification to the overseeing computer informing the third party monitoring the overseeing computer (e.g. the examiner) that the user may be cheating. If no new objects are identified, no notification is sent.
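A minimal sketch of the Step 2010 to Step 2012 comparison, assuming a supplied notification function; the optional timing tolerance is an illustrative assumption.

```python
# Sketch of the Step 2010-2012 check: if the gaze reaches point X only after
# the cursor has already arrived there (t1 > t2), a notification is sent to the
# overseeing computer. notify_fn and the tolerance are illustrative assumptions.
def check_gaze_follows_cursor(t1_gaze_at_x, t2_cursor_at_x, notify_fn, tolerance_s=0.0):
    if t1_gaze_at_x > t2_cursor_at_x + tolerance_s:
        notify_fn("Possible cheating: gaze arrived at point X after the cursor.")
        return True
    return False
```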
Having described several different embodiments, it is to be appreciated that the different functionality of each embodiment is complementary, in that the embodiments work on different aspects of the captured images to provide different aspects of monitoring of user behaviour. In this regard, the different elements shown in the embodiments of Figures 12, 17, 20 and 22 can be combined in different ways. For example, the functionality of the systems described in Figures 12 and 17 could be combined, as could the functionality of the systems described in Figures 12 and 20. A combination of all four embodiments could also be provided if the full functionality disclosed herein is desired. Other combinations of these different embodiments are also possible. The skilled person will appreciate that the functionality of any embodiment is determined by the individual functional elements which are provided.
Many modifications may be made to the specific embodiments described above without departing from the spirit and scope of the invention as defined in the accompanying claims. Features of one embodiment may also be used in other embodiments, either as an addition to such embodiment or as a replacement thereof.

Claims (20)

  1. A behavioural monitoring system for monitoring human behaviour to complete a task using a computer over a predetermined time period; the system comprising: a first camera configured to capture a first plurality of images of a user situated at a computer screen to monitor the user behaviour during the time period; a second camera configured to capture a second plurality of images of the working environment in which the user is positioned during the task, the working environment including a surface at which the user is working to complete the task using the computer; an environment processor configured to process at least one of the second plurality of captured images to determine objects within the working environment at the beginning of the time period and to process at least some of the second plurality of captured images to identify objects present within the working environment during the time period until the end of the time period; and an alert processor configured to compare the determined objects from the beginning of the task with objects within the working environment during the time period and to generate an alert if there has been any change in the number or type of objects positioned within the working environment during the time period.
  2. The behavioural monitoring system of Claim 1, wherein the second camera is configured to capture a 360-degree image or video of the environment at the start of the predetermined time period and the environment processor is configured to build a three-dimensional representation of the environment.
  3. The behavioural monitoring system of any preceding claim, wherein the second plurality of images comprises a first and second image of the environment taken from a first position and second position respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; wherein the environment processor is configured to: determine a displacement distance between the first and second positions; determine a first linear displacement distance of corresponding points of the first object between the first and second images; determine a second linear displacement distance of corresponding points of the second object between the first and second images; and determine locations of the first and second objects with respect to the first and second positions using the displacement distance, the first linear displacement distance and the second linear displacement distance using a stereoscopic matching function.
  4. The behavioural monitoring system of any of Claims 1 to 3, wherein the second plurality of images comprises a first and second image of the environment taken from a location in a first angular direction and a second angular direction respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; wherein the environment processor is configured to: determine an image displacement angle between the first and second angular directions; determine a first angular displacement of corresponding points of the first object between the first and second images; determine a second angular displacement of corresponding points of the second object between the first and second images; and determine locations of the first and second objects with respect to the location using the image displacement angle, the first angular displacement and the second angular displacement using an angular stereoscopic matching function.
  5. The behavioural monitoring system of Claim 3 or 4, wherein the determined locations of the first and second objects are used to build a 3D model of the environment.
  6. The behavioural monitoring system of any of Claims 3 to 5, wherein the environment processor is further configured to apply a low-pass filter to the first and second images.
  7. The behavioural monitoring system of any preceding claim, wherein the second plurality of images comprises images captured at random points in time during the predetermined time period.
  8. The behavioural monitoring system of any preceding claim, wherein the first camera is a camera with a fixed position in relation to the screen.
  9. The behavioural monitoring system of any preceding claim, wherein the second camera is a portable camera which can be appropriately configured to capture the working environment.
  10. The behavioural monitoring system of any preceding claim, further comprising a communications engine configured to transmit the captured images and/or alerts to a remote computer via a communications network.
  11. A method of monitoring human behaviour to complete a task using a computer over a predetermined time period; the method comprising: capturing a first plurality of images of a user situated at a computer screen to monitor the user behaviour during the time period; capturing a second plurality of images of the working environment in which the user is positioned during the task, the working environment including a surface at which the user is working to complete the task using the computer; processing at least one of the second plurality of captured images to determine objects within the working environment at the beginning of the time period and to process at least some of the second plurality of captured images to identify objects present within the working environment during the time period until the end of the time period; comparing the determined objects from the beginning of the task with objects within the working environment during the time period; and generating an alert if there has been any change in the number or type of objects positioned within the working environment during the time period.
  12. The method of Claim 11, further comprising capturing, at the second camera, a 360-degree image or video of the environment at the start of the predetermined time period and building, at the environment processor, a three-dimensional representation of the environment.
  13. The method of Claim 11 or 12, wherein capturing the second plurality of images comprises capturing a first and second image of the environment taken from a first position and second position respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; wherein the method further comprises: determining a displacement distance between the first and second positions; determining a first linear displacement distance of corresponding points of the first object between the first and second images; determining a second linear displacement distance of corresponding points of the second object between the first and second images; and calculating locations of the first and second objects with respect to the first and second positions using the displacement distance, the first linear displacement distance and the second linear displacement distance using a stereoscopic matching function.
  14. The method of Claim 11 or 12, wherein capturing the second plurality of images comprises capturing a first and second image of the environment taken from a location in a first angular direction and a second angular direction respectively, wherein representations of a first object and a second object are captured in each of the first image and second image; wherein the method further comprises: determining an image displacement angle between the first and second angular directions; determining a first angular displacement of corresponding points of the first object between the first and second images; determining a second angular displacement of corresponding points of the second object between the first and second images; and calculating locations of the first and second objects with respect to the location using the image displacement angle, the first angular displacement and the second angular displacement using an angular stereoscopic matching function.
  15. The method of Claim 13 or 14, further comprising using the determined locations of the first and second objects to build a 3D model of the environment.
  16. The method of any of Claims 13 to 15, further comprising applying a low-pass filter to the first and second images.
  17. The method of any of Claims 11 to 16, further comprising capturing the second plurality of images at random points in time during the predetermined time period.
  18. The method of any of Claims 11 to 17, wherein the first camera is a camera with a fixed position in relation to the screen.
  19. The method of any of Claims 11 to 18, further comprising capturing, at the second camera, the working environment, wherein the second camera is a portable camera.
  20. The method of any of Claims 11 to 19, further comprising transmitting, at a communications engine, the captured images and/or alerts to a remote computer via a communications network.

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886177B2 (en) * 2012-10-11 2018-02-06 Industry-Academic Cooperation Foundation, Yonsei University Method for increasing GUI response speed of user device through data preloading, and said user device
US20150046161A1 (en) * 2013-08-07 2015-02-12 Lenovo (Singapore) Pte. Ltd. Device implemented learning validation
EP3185748A4 (en) * 2014-08-27 2018-03-21 Eyessessment Technologies Ltd. Evaluating test taking
US10885802B2 (en) * 2015-08-07 2021-01-05 Gleim Conferencing, Llc System and method for validating honest test taking
CN112070641A (en) * 2020-09-16 2020-12-11 东莞市东全智能科技有限公司 Teaching quality evaluation method, device and system based on eye movement tracking

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150037781A1 (en) * 2013-08-02 2015-02-05 David S. Breed Monitoring device and system for remote test taking
US20180247555A1 (en) * 2014-08-22 2018-08-30 Intelligent Technologies International, Inc. Secure Testing Device, System and Method
CN111583732A (en) * 2020-06-03 2020-08-25 广州市南方人力资源评价中心有限公司 Evaluation state monitoring method, device and equipment based on head-mounted display equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Amr Jadi 'New Detection Cheating Method of Online-Exams during COVID-19 Pandemic', published in IJCSNS International Journal of Computer Science and Network Security, v.21 No.4 pp. 123-130 April 2021 *
Istiak Ahmad et al. 'A Novel Deep Learning-based Online Proctoring System using Face Recognition, Eye Blinking, and Object Detection Techniques', published in (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 10, pp. 847-854, 2021 *
Rory McCorkle's blog: 'The Future Of Online Proctoring: 3 Trends', published on November 18, 2019 *

