CN117653106A - Emotion classification system and method - Google Patents

Emotion classification system and method

Info

Publication number
CN117653106A
CN117653106A (application CN202211125523.XA)
Authority
CN
China
Prior art keywords
emotion
eye
data
computing device
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211125523.XA
Other languages
Chinese (zh)
Inventor
邹迪
刘亚林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong University Of Education
Original Assignee
Hong Kong University Of Education
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hong Kong University Of Education filed Critical Hong Kong University Of Education
Publication of CN117653106A publication Critical patent/CN117653106A/en
Pending legal-status Critical Current

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

An emotion classification system for identifying an emotion of a subject, comprising: an eye tracking device for capturing eye data relating to at least one eye of a subject; a computing device including a processing unit, a storage unit, a user interface, a display, and a communication interface; the computing device is in electronic communication with the eye tracking apparatus, wherein the computing device is configured to receive eye data from the eye tracking apparatus; processing the received ocular data by applying a machine learning model to the received ocular data; and automatically identifying the emotion of the subject based on processing the received ocular data.

Description

Emotion classification system and method
Technical Field
The present invention relates to an emotion classification system and method, and more particularly, to a non-invasive emotion classification system and method that outputs emotion of a subject based on eyes of the subject.
Background
In today's world, technology advances rapidly. Students and workers face ever-higher academic and workplace pressures as they strive to keep pace with technological progress. With the continued development of new technology, this pressure has increased, and the number of people (e.g., students, the working population) with psychological and emotional problems has grown. Examples of such emotions include depression, anxiety, and irritability. These problems can negatively impact an individual's work, learning, and physical and mental health, and these psychological and emotional problems ultimately lead to a reduction in quality of life.
Disclosure of Invention
The invention relates to an emotion classification system and method. The emotion classification method is performed by the emotion classification system and components thereof. The emotion classification system determines and outputs an emotion of a subject based on eye data of the subject. The system is for sensing (i.e., measuring) eye movements and/or position (i.e., eye data) of a subject and processing the sensed data using a trained machine learning engine to output emotion of the subject. The machine learning engine is to classify an emotion of the subject based on processing the measured ocular data.
The emotion classification system is configured to detect a plurality of emotions by processing the eye data. For example, the system is configured to determine whether the subject is depressed, bored, relaxed, etc., based on processing the collected eye data of the subject. The emotion classification system provides an automated system for determining an emotion of a subject. The system provides a convenient, easy to use and non-invasive method of emotion classification. The emotion classification system and method have realistic application scenarios, for example, determining the mental well-being of students in schools or colleges. Of course, the illustrated system may also be applied to other scenarios.
According to a first aspect of the present invention, there is provided an emotion classification system for identifying an emotion of a subject, comprising:
an eye tracking device for capturing eye data relating to at least one eye of a subject; and
a computing device including a processing unit, a storage unit, a user interface, a display, and a communication interface;
the computing device is in electronic communication with the eye tracking apparatus, wherein the computing device is to:
receiving eye data from the eye tracking device;
processing the received ocular data by applying a machine learning model to the received ocular data; and
based on the processing of the received ocular data, the emotion of the subject is automatically identified.
In an embodiment of the first aspect, the identified emotion is transmitted to and displayed on the display.
In an embodiment of the first aspect, the computing device is configured to identify at least one of 9 emotions of the subject, wherein the emotions include: depression, boredom, relaxation, sadness, calm, happiness, anxiety, tension or excitement.
In an embodiment of the first aspect, the computing means is for processing the received ocular data and outputting an emotion model, each emotion identified by the computing means being defined by the emotion model;
wherein the emotion model comprises a two-dimensional emotion model, a first dimension representing an arousal level of the emotion of the subject, a second dimension representing a valence level of the emotion of the subject, each dimension being categorized as one of three levels;
the display is for presenting the two-dimensional emotion model.
In an embodiment of the first aspect, the computing means is for identifying one of the 9 emotions by classifying each emotion based on the level of each dimension in the emotion model, wherein each emotion is represented by a combination of one of three levels of each dimension in the emotion model.
In an embodiment of the first aspect, the computing means is for expressing the identified emotion in a cyclic emotion model, the cyclic emotion model being presented on the display.
In an embodiment of the first aspect, the eye tracking device is for detecting eye data by a non-contact arrangement.
In an embodiment of the first aspect, the eye tracking apparatus comprises:
one or more light sources for emitting light to an eye of a subject; a detector for repeatedly capturing an image of the subject's eye; an evaluation unit for generating ocular data, wherein the ocular data of at least one eye comprises at least one or more of: gaze location, pupil diameter, and pupil radius; and a communication interface for transmitting the ocular data to the computing device.
In an embodiment of the first aspect, the computing means is for expressing the identified emotion in a Russell loop emotion model, wherein the Russell loop emotion model is presented on the display, the Russell loop emotion model representing each emotion at the valence and arousal levels detected for the subject based on the eye data, the Russell loop emotion model being represented as a circle comprising a vertical axis representing valence and a horizontal axis representing arousal, the detected emotion being represented on or within the circle.
In an embodiment of the first aspect, the computing means is for transmitting the identified emotion to a mobile device or a remote server over a wireless communication interface.
In an embodiment, the computing device is configured to process the ocular data by recording the ocular data to a neural network executing on the computing device, wherein the neural network is a convolutional neural network having at least three layers;
the first layer is a convolution layer for performing feature extraction on the received eye data, and the convolution layer is used for outputting a feature map modified by an activation function;
Wherein the second layer comprises a pooling layer, wherein the pooling layer receives the feature map from the convolution layer and the pooling layer is to perform feature selection and information filtering on data received from the convolution layer,
wherein the third layer is a fully connected layer, wherein each neuron in the fully connected layer is fully connected to all neurons of a previous layer, and the fully connected layer is used for recognizing emotion and outputting the recognized emotion.
In one embodiment, the eye tracking apparatus is for recording a plurality of samples of ocular data at a predetermined sampling rate, and the computing device is for processing the plurality of ocular data to automatically identify the emotion of the subject.
In one embodiment, the neural network is trained using a training dataset comprising a plurality of data points, and the system comprises a training database for storing the training dataset;
wherein each data point in the training database defines a relationship between the eye data and emotion.
In one embodiment, the neural network is used for training by:
introducing noise samples into the training database;
Randomly selecting data points with preset sample sizes from the training database;
recording the sample size to the neural network for processing; and
repeatedly recording the preset sample size to the neural network, and adjusting noise, neural network parameters and adding a network module to optimize network performance.
In an embodiment, there is provided an emotion classification method performed by an emotion classification system comprising an eye tracking apparatus and a computing device in communication with the eye tracking apparatus, the method comprising the steps of:
the eye tracking device captures eye data relating to at least one eye of a subject; and
the computing device receives eye data from the eye tracking apparatus;
processing the received ocular data by applying a machine learning model to the received ocular data; and
based on the processing of the received ocular data, the emotion of the subject is automatically identified.
In an embodiment, the identified emotion is displayed on a computing device display.
In one embodiment, the step of the computing device identifying the emotion comprises: the computing device identifying at least one of 9 emotions of the subject, wherein the emotions include: depression, boredom, relaxation, sadness, calm, happiness, anxiety, tension or excitement.
In one embodiment, the step of processing the ocular data comprises:
outputting an emotion model, wherein each emotion identified is defined by the emotion model;
wherein the emotion model comprises a two-dimensional emotion model, a first dimension representing an arousal level of the emotion of the subject, a second dimension representing a valence level of the emotion of the subject, each dimension being categorized as one of three levels; and
the displaying step includes: presenting the two-dimensional emotion model on the display.
In an embodiment, each of the 9 emotions is classified based on the level of each dimension in the emotion model, wherein each emotion is represented by a combination of one of the three levels of each dimension in the emotion model.
In an embodiment, the identified two-dimensional emotion is expressed in a cyclic emotion model, and the method comprises the step of presenting the cyclic emotion model on the display.
Drawings
Embodiments of the present invention will be described below, by way of example, with reference to the accompanying drawings. In the drawings:
FIG. 1 is a schematic diagram of an emotion classification system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a computing device of the emotion classification system of FIG. 1;
FIG. 3 shows a flow chart of a method of emotion classification according to an embodiment of the present invention;
FIGS. 4A-4E illustrate screen shots of an emotion classification system presented on a display of the emotion classification system;
FIG. 5 shows an example of a cyclic emotion model presented on a display of an emotion classification method;
FIG. 6 illustrates an example of a report generated by the emotion classification system;
FIG. 7 is an example of a two-dimensional arousal-valence model;
FIG. 8 illustrates an example method of creating training data sets from X volunteers through non-invasive ocular data acquisition; and
FIG. 9 illustrates an example method of training a neural network for emotion recognition.
Detailed Description
The present invention relates to an emotion classification system for identifying one or more emotions of a subject based on measured ocular data of the subject. The emotion classification system is an automatic, non-invasive emotion classification system that measures eye data of a subject using an eye tracking device. The ocular data is processed using an artificial intelligence model executed by the computing device to output the emotion of the subject (i.e., the user). The emotion of the object is output on a display of the computing device. The identified emotion may be transmitted to a mobile device of the subject, e.g., a smart phone. Alternatively, the identified emotion may be transmitted by the computing device to another remote device or remote server.
Referring to fig. 1, an embodiment of an emotion classification system is shown. An emotion classification system comprising: an eye tracking device for capturing eye data relating to at least one eye of a subject; and a computing device including a processing unit, a storage unit, a user interface, a display, and a communication interface; a computing device in electronic communication with the eye tracking apparatus, wherein the computing device is to: receiving ocular data from an ocular tracking device; processing the received ocular data by applying a machine learning model to the received ocular data; and automatically identifying the emotion of the subject based on the processing of the received eye data. The computing device is for identifying at least one of the 9 emotions of the subject.
The identified emotion is transmitted to and displayed on the display of the emotion classification system. The user interface is for receiving input from the subject. The display is used to display the identified emotion and is also used to present reports and other information to the subject.
In this embodiment, the emotion classification system is used to automatically capture eye-tracked eye data (i.e., eye-related data) from a single eye-tracking device. The eye tracking apparatus non-invasively captures eye data and transmits the captured eye data to the computing device for processing. The computing device automatically identifies the emotion of the subject by processing the ocular data using a neural network (e.g., a convolutional neural network). The display is for displaying the identified emotion of the subject. The emotion classification method is used to capture eye data, process the eye data using a neural network, and present the identified emotion on a display. The emotion classification method automatically captures eye data and automatically recognizes emotion based on the captured eye data.
An advantage of automatic emotion recognition is that an automatic system for recognizing the emotion of a subject is provided. The system provides a low cost and low invasive system for emotion recognition. The system uses an eye tracking device that functions as a plug-and-play system that can be coupled to a computing device. The eye tracking apparatus may be coupled to any computing device, such as a laptop computer, desktop computer, smart phone, tablet computer, or any other suitable computing device. Emotion classification systems and emotion classification methods may be used to help individual subjects recognize emotion and understand their mental state and allow the subjects to improve their mental state.
As shown in fig. 1, a schematic diagram of an emotion classification system 10 is shown. The emotion classification system includes: an eye tracking apparatus 20 and a computing device 100. The eye tracking device 20 is used to capture eye data (i.e., data related to the subject's eye). The eye tracking apparatus 20 is for electronic communication with a computing device. The eye tracking apparatus is used to transmit the captured eye data to the computing device 100.
The eye tracking apparatus 20 includes: one or more light sources for emitting light to an eye of a subject; a detector for repeatedly capturing an image of the subject's eye; an evaluation unit for generating eye data; and a communication interface for transmitting the ocular data to the computing device 100.
Fig. 1 also shows a schematic diagram of the components of eye tracking apparatus 20. Referring to fig. 1, the eye tracking apparatus 20 includes: at least one light source 22 (i.e., a lighting device) for illuminating the eye of the subject. Preferably, the eye tracking device includes a plurality of light sources 22 (i.e., a plurality of illumination devices). The light source 22 is used to deliver light to the eye of the subject in a predetermined pattern. The light source 22 irradiates the eyes of the subject by emitting infrared or near infrared light onto the eyes of the subject in a predetermined pattern. As shown in fig. 1, the dashed lines represent light projected onto the subject's eye 12.
The eye tracking apparatus further includes: a detector 24 for capturing an image of the subject's eye. The detector is a camera that repeatedly captures an image of the subject's eye. The detector 24 is preferably a high resolution (i.e., high definition) camera. The camera 24 is used to capture a plurality of images of the eye over a predetermined time. Camera 24 is a high-speed image capture rate camera, e.g., a camera that can capture hundreds of frames per second.
The eye tracking apparatus 20 further includes: an evaluation unit 26 in communication with the detector 24 (i.e., camera). The evaluation unit 26 includes: a processor for processing images captured by the camera to determine ocular data. The evaluation unit 26 is configured to generate ocular data, wherein the ocular data of at least one eye comprises at least one or more of: gaze location, pupil diameter, pupil radius, and eye position. The evaluation unit may be implemented as a chip or an integrated circuit. The evaluation unit 26 is adapted to perform appropriate image processing and gaze mapping algorithms on the captured image to identify the eyes 12 in the captured image and to calculate eye data.
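As an illustration only, a single sample of eye data produced by the evaluation unit 26 might be represented as follows; the field names, per-eye layout and units are assumptions made for this sketch and are not prescribed by the system.

```python
from dataclasses import dataclass

@dataclass
class EyeSample:
    """One eye data sample from the evaluation unit (illustrative fields only)."""
    gaze_x_left: float    # horizontal gaze coordinate of the left eye on the screen
    gaze_y_left: float    # vertical gaze coordinate of the left eye on the screen
    gaze_x_right: float   # horizontal gaze coordinate of the right eye on the screen
    gaze_y_right: float   # vertical gaze coordinate of the right eye on the screen
    pupil_diameter_left: float    # pupil diameter of the left eye, e.g. in millimetres
    pupil_diameter_right: float   # pupil diameter of the right eye, e.g. in millimetres
```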
The eye tracking apparatus 20 includes: a communication interface 28. Communication interface 28 is used to transmit ocular data to computing device 100. The communication interface 28 is a wireless interface that allows wireless communication of ocular data from the ocular tracking device 20 to the computing apparatus 100.
The eye tracking device 20 is used to capture eye data in a non-contact manner, i.e., the eye tracking device does not need to contact the subject (i.e., the user) in any way. The eye tracking device 20 provides a non-invasive or less invasive eye data capture device. Preferably, the eye tracking device is a screen-based eye tracking device 20. The screen-based eye tracking apparatus 20 is mounted on a display (i.e., screen) of the computing device 100, for example, on the top or bottom of the display or along one side of the display. One example of an eye tracking device 20 is a Tobii eye tracker device, e.g., a Tobii eye tracker 5.
Alternatively, the eye tracking device 20 may be a wearable eye tracking device. The wearable eye tracking device may be integrated into a pair of eyeglasses. The eyewear includes the components of the eye tracking apparatus 20 as previously described.
Fig. 2 shows a schematic diagram of an example computing device 100 of emotion classification system 10. In the illustrated embodiment, the computing device 100 includes: appropriate components necessary to receive, store, and execute appropriate computer instructions. These components may include: a processing Unit 102 comprising a central processing Unit (Central Processing Unit, CPU), a Math Co-processing Unit (Math Processor), a graphics processing Unit (Graphics Processing Unit, GPU) or a tensor processing Unit (Tensor Processing Unit, TPU) for tensor or multidimensional array computation or manipulation operations; a Read-Only Memory (ROM) 104; a Random-Access Memory (RAM) 106; input/output devices, such as disk drives 108; an input device (i.e., user interface) 110, such as an ethernet port, a USB port, etc.; a display 112, such as a liquid crystal display, a light emitting display, or any other suitable display; and a communication link 114 (i.e., a communication interface). The display 112 may be integrated with the computing device 100 as shown in fig. 2.
Alternatively, the display 112 may be a remote display that may be in wired or wireless communication with the processing unit 102 and/or other components of the computing device. The computing device 100 further includes: a user interface 110. The user interface 110 allows a user (i.e., an object) to input data. For example, the user interface 110 may include: a plurality of buttons or a dial or a combination thereof. Alternatively, the user interface 110 may further include: a touch screen or a combination of buttons/dials and a touch screen. The user interface 110 may be integrated on the display 112.
The computing device 100 may include instructions that may be stored in the ROM104, RAM106, or disk drive 108 and executed by the processing unit 102. A plurality of communication links 114 (i.e., communication interfaces) may be provided that may be variously connected to one or more computing devices, such as servers, personal computers, terminals, wireless or handheld computing devices, internet of things (Internet of Things) devices, smart devices, and edge computing devices. At least one of the plurality of communication links may be connected to an external computing network by a telephone line or other type of communication link. The computing device 100 may include a bluetooth module or a Wi-Fi module or any other communication module that may be used as a communication link to allow the computing device 100 to communicate with other equipment. For example, a communication link 114 (communication interface) may be used to communicate with the eye tracking device 20.
The computing apparatus 100 may include a storage device, such as a disk drive 108, which may include a solid state drive, a hard disk drive, an optical drive, a tape drive, or a remote or cloud-based storage device. Computing device 100 may use a single disk drive or multiple disk drives, or a remote storage service. The computing device 100 may also have an appropriate operating system 116 resident on a disk drive or in the ROM of the computing device 100.
The computing device 100 may also provide the necessary computing power to operate or interface with a machine learning network, such as a neural network, to provide various functions and outputs. The neural network may be implemented locally or may also be accessed or partially accessed through a server or cloud-based service. The machine learning network may also be untrained, partially trained or fully trained, and/or may also be retrained, modified or updated over time. In one example, the computing device 100 is used to implement a convolutional neural network. Preferably, the neural network is a deep convolutional neural network having a plurality of layers.
Computing device 100 includes one or more databases 120. In an example form of computing device 100, database 120 is used to store emotion models. The emotion model comprises a two-dimensional emotion model defining arousal and valence. The database also stores relationships between emotions and emotion models. The emotion model comprises a two-dimensional emotion model. The first dimension represents a arousal level of the emotion of the subject and the second dimension represents a valence level of the emotion of the subject. Each dimension is categorized into one of three classes: high, medium and low. The relationship between emotion and emotion model is used to identify emotion.
The computing device 100 includes a training dataset stored in a training database 122. The training dataset includes a plurality of data points. The data points in the training dataset define the relationship between the eye data and the emotion. The data in the training dataset is used to train a neural network for automatic emotion recognition. The training dataset preferably includes a large number of data points collected from individual subjects as part of the database creation process.
The computing device 100 (i.e., computer) may be implemented by any computing architecture, including: a portable computer, a tablet computer, a stand-alone Personal Computer (PC), a smart device (e.g., a smartwatch or smart glasses), an internet of things (Internet of Things, ioT) device, an edge computing device, a client/server architecture, a "dumb" terminal/host architecture, a cloud computing-based architecture, or any other suitable architecture. Alternatively, the computing device 100 may be implemented in a smart phone or tablet.
In another form, components of computing apparatus 100 may be implemented across multiple devices, for example, some components may be implemented on a smart phone and some may be implemented on a laptop or desktop computer. For example, the processing of ocular data may be performed on a laptop or desktop computer, and the smartphone may receive the processed ocular data and perform emotion recognition on the received ocular data. The smartphone may display the identified emotion on its display screen. Computing device 100 may be suitably programmed to implement the emotion classification method.
When using emotion classification system 10, the system is used to automatically recognize (i.e., identify) the emotion of the subject. The subject turns on the eye tracking apparatus 20 and the computing device 100. The eye tracking apparatus 20 is used to scan the subject's eyes and capture eye data, including at least gaze location and pupil diameter or pupil radius. The evaluation unit is used for generating the eye data. The captured eye data is transmitted to the computing device.
The computing device 100 is for receiving ocular data from the eye tracking apparatus 20. The computing device 100 is further for processing the received ocular data by applying a machine learning model to the received ocular data; and the machine learning model is a multi-layer convolutional neural network. The computing device 100 is for automatically identifying the emotion of the subject by processing the received ocular data using a neural network executing on the computing device. The computing device is for presenting the identified emotion on the display.
The computing device 100 is for identifying at least one of 9 emotions of a subject, wherein the emotions include: depression, boredom, relaxation, sadness, calm, happiness, anxiety, tension or excitement. Emotion classification system 10 may be used to identify an emotion of a subject in real-time and provide feedback on the identified emotion on a display. The computing apparatus 100 may be used to send the identified emotion to a mobile device (e.g., a smart phone) of the subject, or to a server or remote computing device. For example, the computing device may communicate the identified emotion to a psychologist, teacher, parent/guardian, doctor, or other person associated with the subject.
Emotion classification system 10 is also used to continuously identify the emotion of a subject. The system 10 is used to identify emotions as they change. The eye tracking device 20 is used to continuously measure ocular data and the computing device is used to process ocular data and identify the emotion of the subject. The detected emotional change may be displayed on a display.
The computing device 100 is configured to process the received eye data and output an emotion model. The emotion model includes a two-dimensional emotion model 700, as shown in fig. 7. The two-dimensional emotion model 700 may be presented on a display. Each emotion recognized by the computing device is defined by the emotion model. In the emotion model, a first dimension represents an arousal level of the emotion of the subject, and a second dimension represents a valence level of the emotion of the subject. Each dimension is categorized into one of three classes: high, medium and low, as shown in fig. 7. Optionally, the display 112 is used to present the two-dimensional emotion model. The emotion model corresponds to a scoring scale that scores the valence and arousal levels of the subject's emotion based on the detected eye data. The emotion model may not be presented.
The computing device is configured to determine arousal and valence based on the measured eye data. Specifically, the neural network is trained to identify arousal and valence based on detected changes in the pupil diameter and gaze direction of the subject. The eye tracking device 20 is used to detect changes in gaze direction and changes in pupil diameter. The computing device 100 and the neural network executing on the computing device are used to identify at least pupil diameter and output at least arousal and valence as part of emotion recognition. The neural network ultimately outputs an emotion based on the detected arousal value and valence value.
Referring to fig. 7, emotion model 700 includes a horizontal axis representing valence and a vertical axis representing arousal. Valence relates to how pleasant the triggering event is. Arousal relates to the level of autonomic activation produced by the event. In the emotion model, the computing device determines three levels for each parameter (valence and arousal): low, medium and high. The eye data is processed, and an arousal level and a valence level are determined by the neural network based on image processing of the eye data. As shown in fig. 7, after processing the eye data, an arousal level and a valence level are determined and plotted to create the emotion model.
The computing device is also configured to identify one of the 9 emotions by classifying each emotion based on the level of each dimension in emotion model 700. Each emotion is represented by a combination of one of three levels for each dimension in the emotion model. Preferably, the identified emotion is displayed on the display 112 or transmitted to another device, e.g., a remote device. In one example, the computing device is to express the identified emotion in a cyclic emotion model 500 as shown in fig. 5. The cyclic emotion model may optionally be presented on display 112.
Referring to fig. 5, cyclic emotion model 500 is a Russell cyclic emotion model, i.e., the Russell loop emotion model. As shown in fig. 5, the computing device uses the emotion model information to classify the arousal level and the valence level into quadrants on the annular emotion model. Based on the detected arousal and valence levels, the cyclic emotion model is used to determine a specific one of the 9 emotions. The 9 emotions that computing device 100 may recognize are: anxiety, tension, excitement, sadness, calm, happiness, depression, boredom and relaxation.
Referring to fig. 5, the emotions are shown on a cyclic emotion model 500. Alternatively, the cyclic emotion model 500 may be presented on the display, in which the particular identified emotion is indicated, for example, by a marker, a color, or other suitable means. Referring to fig. 5, each emotion is identified based on the arousal level and the valence level in the emotion model. For example, as shown in fig. 5: anxiety = high arousal, low valence; tension = high arousal, medium valence; excitement = high arousal, high valence; sadness = medium arousal, low valence; calm = medium arousal, medium valence; happiness = medium arousal, high valence; depression = low arousal, low valence; boredom = low arousal, medium valence; relaxation = low arousal, high valence. Optionally, the above-described relationship between emotion and the emotion model is stored in database 120. The relationship may be defined in a table, as illustrated in the sketch below.
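Purely as an illustration, the relationship between the arousal/valence levels and the 9 emotions could be encoded as a lookup table; the dictionary form and function name below are assumptions about one possible encoding of the relationship stored in database 120, not the patent's storage format.

```python
# Hypothetical encoding of the (arousal, valence) -> emotion relationship.
EMOTION_TABLE = {
    ("high", "low"): "anxiety",
    ("high", "medium"): "tension",
    ("high", "high"): "excitement",
    ("medium", "low"): "sadness",
    ("medium", "medium"): "calm",
    ("medium", "high"): "happiness",
    ("low", "low"): "depression",
    ("low", "medium"): "boredom",
    ("low", "high"): "relaxation",
}

def classify_emotion(arousal: str, valence: str) -> str:
    """Look up the emotion for a detected arousal level and valence level."""
    return EMOTION_TABLE[(arousal, valence)]

# Example: classify_emotion("high", "medium") returns "tension".
```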
Emotion classification system 10 is also used to generate a report 600 that includes at least the identified emotion and emotion model information, as well as other information. Fig. 6 shows an example of a report generated by emotion classification system 10. The report may include more detailed information that may be used by a medical professional, such as a psychologist or psychiatrist, to assess the subject. The report may include the identified emotions and the changes in the subject's emotion over time while using system 10.
Fig. 6 shows an example of a report. The report of fig. 6 shows the emotion identified over a period of time. Alternatively, the report may indicate the various emotions identified while the subject used the system 10. Referring to fig. 6, the report includes the detected arousal and valence levels. This is illustrated by section 602, which shows the probability assigned by the neural network to each valence level and arousal level. In the example shown, high arousal and medium valence are detected. The detected arousal and valence are included in table 604. A cyclic emotion model 606 is provided in the report that plots the arousal and valence data detected by processing the eye data using the neural network. The 9 emotion classifications, i.e., the relationships between the emotions and the emotion model, are recorded in table 608. The currently identified emotion is presented as a marker 610. In the illustrated example report, the marker 610 is a colored circle. The circle may be colored to represent the emotion, and each emotion may be represented by a unique color. The report may be downloaded to the subject's device. The report may also be sent to another device, such as the device of a medical professional. The report may also include a continuous trend of emotion as the system continues to detect it.
Fig. 3 illustrates an example emotion classification method 300. The method comprises a step 302, which includes: collecting eye data using the eye tracking device. The eye data relates to at least one eye and includes at least one or more of: gaze location, pupil diameter, and pupil radius.
The method proceeds to step 304, which includes data preprocessing. The preprocessing step 304 is performed by the computing device 100 and includes: the ocular data is filtered to remove noise. The preprocessing may further include: the data is normalized and optionally averaged.
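A minimal, hypothetical sketch of such a preprocessing step is given below; the moving-average window and the z-score normalization are assumptions chosen for illustration, not the specific filters used by the system.

```python
import numpy as np

def preprocess(samples: np.ndarray, window: int = 5) -> np.ndarray:
    """Hypothetical preprocessing of raw eye data shaped (n_samples, n_channels).

    A moving-average filter suppresses high-frequency noise, then each channel
    is z-score normalised.
    """
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda channel: np.convolve(channel, kernel, mode="same"),
        axis=0, arr=samples)
    return (smoothed - smoothed.mean(axis=0)) / (smoothed.std(axis=0) + 1e-8)
```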
Step 306 includes: the emotion of the subject is identified based on processing the ocular data. The step of identifying the emotion is performed by the computing device 100.
As previously described, the step 306 of identifying emotion includes: processing the eye data and determining an emotion model. The emotion model, which includes an arousal value and a valence value, is used to determine (i.e., identify) the emotion of the subject. The process of determining emotion using the emotion model is as described above. One of the 9 emotions is determined based on the combination of the arousal value and the valence value. Preferably, the emotion is determined using the Russell loop emotion model, which correlates arousal values and valence values with emotions.
The identified emotion is determined as a probability. One or more combinations of emotions may be identified, and the probability of each identified emotion may be identified. This is because the emotions of humans are often mixed together. The probability of the identified emotion is presented on the display.
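As an illustrative sketch, the probabilistic output could be handled as follows, assuming a 9-way probability vector from the classifier; the function and the emotion ordering are examples only.

```python
import numpy as np

# Illustrative ordering of the 9 emotions; any fixed ordering would do.
EMOTIONS = ["depression", "boredom", "relaxation", "sadness", "calm",
            "happiness", "anxiety", "tension", "excitement"]

def top_emotions(probabilities: np.ndarray, k: int = 3):
    """Return the k most probable emotions from a 9-way probability vector."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(EMOTIONS[i], float(probabilities[i])) for i in order]
```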
The method 300 further comprises step 308, which includes transmitting the identified emotion of the subject. Step 310 includes: displaying the identified emotion and/or transmitting the identified emotion to a remote device. The identified emotion is represented by a marker, such as a colored circle or other mark. Each emotion may be defined by a unique marker. A variety of emotions may be presented to the user, and the probability of each identified emotion is displayed on the display. In an example, multiple markers may be presented to indicate the various emotions that may be identified.
Optionally, the method may include a step 310 of generating a report. The report includes the two-dimensional data of the detected arousal and valence values (i.e., the emotion model). The report also includes the emotions detected over time and the detected changes in emotion. Preferably, the report indicates the emotion detected on a time scale, i.e., over a predetermined period of time. The time period may be the time during which the subject uses emotion classification system 10.
Preferably, the method 300 is repeated over a period of time to track the emotion of the subject as the user uses the system 10. The emotion of the subject is automatically detected, i.e. recognized. As a new emotion is detected, the detected emotion is updated. Emotion classification method 300 is performed by components of emotion classification system 10.
The emotion classification system utilizes a convolutional neural network for emotion recognition. The convolutional neural network performs feature extraction and classification on the eye data. The network also determines an emotion model (i.e., arousal and valence values in two dimensions) and identifies emotion based on the ocular data and the emotion model. Other deep neural network architectures are also contemplated.
In the depicted example, the convolutional neural network is stored in a disk drive or ROM or other memory unit of computing device 100 and is executed by the processor 102. The convolutional neural network for emotion recognition is a multi-layer neural network. In one example, the network includes at least three layers. The first layer is a convolutional layer for performing feature extraction on the received eye data. The convolution layer is used to output a feature map modified by an activation function. The second layer includes a pooling layer. The pooling layer receives the feature map from the convolution layer and is configured to perform feature selection and information filtering on the data received from the convolution layer. The third layer is a fully connected layer, wherein each neuron in the fully connected layer is fully connected to all neurons of the previous layer, and the fully connected layer is used to recognize the emotion and output the recognized emotion. The third layer is for determining the arousal and valence levels, based at least on changes in the pupil diameter of the subject, as part of emotion recognition.
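A minimal sketch of such a three-layer network is shown below in PyTorch; the input shape, channel counts, kernel size and pooling size are assumptions chosen for illustration, since the description only specifies the convolution, pooling and fully connected stages and the 9-emotion output.

```python
import torch
import torch.nn as nn

class EyeEmotionCNN(nn.Module):
    """Sketch of the convolution -> pooling -> fully connected structure."""

    def __init__(self, in_channels: int = 6, seq_len: int = 240, n_emotions: int = 9):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, 32, kernel_size=5, padding=2)  # feature extraction
        self.act = nn.ReLU()                                              # activation function
        self.pool = nn.MaxPool1d(kernel_size=2)                           # feature selection / filtering
        self.fc = nn.Linear(32 * (seq_len // 2), n_emotions)              # fully connected classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, samples), e.g. gaze and pupil channels over time
        x = self.pool(self.act(self.conv(x)))
        x = x.flatten(start_dim=1)
        return self.fc(x)  # logits over the 9 emotions
```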
The convolutional neural network is trained using a training data set. The training data set is created from eye data points collected from different users (i.e., subjects), and is stored in training database 122. The data in the training database is captured using a non-invasive data acquisition method.
Fig. 8 illustrates an example method of creating a training dataset through non-invasive ocular data acquisition. Referring to fig. 8, step 802 includes: x volunteers were organized to participate in the data acquisition process. X is an integer. The X volunteers may be organized by any suitable mechanism (e.g., by telephone call, text message, etc.). Depending on the particular application, different types of volunteers (e.g., different ethnicities, different sexes, etc.) may be selected to create the database.
The experimental equipment required for data acquisition includes: a computer, an eye tracking device (e.g., a Tobii eye tracker), an emotion self-evaluation questionnaire based on the Russell loop emotion model, M videos with sound, and N video games, where M and N are integers.
Step 804 includes: volunteers were requested to adjust their eye and mental states. Optionally, appropriate eye massage may be performed in advance in order to complete the subsequent experiments.
Step 806 includes: the volunteers were presented with M videos and eye data was recorded during each video viewing cycle. The ocular data may include parameters similar to those previously described, such as gaze direction, pupil diameter, etc.
Step 808 includes: the N video games are presented to the volunteer such that the volunteer plays the video game. Eye data was recorded while volunteers played N video games. Volunteers may be requested to complete N video games. Alternatively, step 808 may be performed concurrently with step 806. In some cases, volunteers may be required to watch video or play video games or both.
Step 810 includes: presenting and receiving a completed rating questionnaire from each volunteer, wherein the questionnaire requires the volunteers to score their emotions based on their overall emotion during viewing the videos and playing the video games. After each volunteer completes watching the videos and playing the games, the questionnaire is provided to the volunteer. Volunteers quantify all emotions in the two dimensions of valence and arousal. For each of valence and arousal, the questionnaire requires the volunteer to quantify the emotion into one of three levels: low, medium and high. The questionnaire information is used to associate the eye data (i.e., the eye data collected while the volunteers watch the videos and play the games) with the emotion model (i.e., valence and arousal).
To avoid eye strain of the subject (i.e., volunteer), the duration of each video and game is short, e.g., 2 minutes. Volunteers would be required to rest for at least 1 minute after completing one video or game and then enter the next video or video game. In addition, in order to unify the duration of the eye data recording, all video and game durations are unified as a time length T.
The eye tracking device records eye data from the volunteers at a sampling rate f. For each volunteer, the number of samples during each video and/or game is recorded; over a recording time T, T × f samples of eye data are recorded. During the initial training dataset creation phase, each sample of eye data recorded by the eye tracker contains 6 data points, comprising the two-dimensional coordinates of the volunteer's gaze on the computer screen and the pupil diameters of both eyes. Each value (gaze coordinates and pupil diameters) may be stored as a 4-byte floating point number. After one trial, i.e., one recording during data acquisition for the training dataset, T × f × 6 × 4 bytes are recorded for each volunteer. The above-described data recording may be performed during steps 806 and 808.
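The storage arithmetic can be illustrated with a small helper; the recording time and sampling rate used in the example are hypothetical values, not values prescribed by the data acquisition protocol.

```python
def bytes_per_trial(T: float, f: float, points_per_sample: int = 6,
                    bytes_per_value: int = 4) -> float:
    """Storage for one volunteer in one video/game trial: T * f * 6 * 4 bytes."""
    return T * f * points_per_sample * bytes_per_value

# Hypothetical example: a 120-second recording at 60 Hz gives
# 120 * 60 * 6 * 4 = 172,800 bytes (about 169 KB) per trial.
```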
Step 812 includes: interpolating and resetting the missing samples in the recorded data. Missing data samples can occur during eye closure, rapid saccades, blinks (where blank data is collected mid-blink), or glare caused by light or other factors that may produce erroneous data points.
Step 814 includes: the recorded pupil diameter values were normalized by replacing the pupil diameter with a fluctuation in pupil diameter. This normalizes the reading of pupil diameter within the sample. Other normalization techniques may be considered to handle varying pupil diameters.
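A hypothetical sketch of the interpolation and fluctuation-based normalization described in steps 812 and 814 is shown below; linear interpolation and mean subtraction are assumptions about one reasonable implementation, not the specific procedure used.

```python
import numpy as np

def interpolate_missing(pupil: np.ndarray) -> np.ndarray:
    """Linearly interpolate NaN samples caused by blinks, saccades or glare."""
    idx = np.arange(len(pupil))
    good = ~np.isnan(pupil)
    return np.interp(idx, idx[good], pupil[good])

def to_fluctuation(pupil: np.ndarray) -> np.ndarray:
    """Replace absolute pupil diameter with its fluctuation about the trial mean."""
    return pupil - np.nanmean(pupil)
```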
Preferably, as part of method 800, the X volunteers participate in M video viewings and N game sessions. A total of X × (M + N) eye data records are collected, each record being T × f × 6 × 4 bytes. X is a suitable preset number; for example, X may represent 2000 volunteers. The greater the number of volunteers, the larger the training data set in the training database. An advantage of a large dataset is that it can yield a better-trained neural network, and a better-trained network provides higher confidence and accuracy for emotion recognition. In one example, 14 videos and 4 video games are used.
Volunteers quantify all emotions in both the valence and arousal dimensions for grading. The quantification is based on the Russell loop emotion model. Each eye profile, i.e., each collected eye data record, corresponds to a valence level and an arousal level. Step 816 includes: storing the relationship between the eye data and the emotion model, for all data samples collected from the total number of volunteers, as the training database. In step 816, relationships between the measured eye data and the emotion model, i.e., the valence and arousal values, are created and stored as part of the training database. Valence and arousal are quantified at three levels: low, medium and high. Step 816 thus creates the training database. The training database may be expanded at any time by performing the method 800 on any number of new volunteers. Expanding the training database provides a wider data set that can be used to improve the performance of the neural network.
The neural network is trained using a training database, such as a multi-layer convolutional neural network as previously described. Training databases are created using the method 800. FIG. 9 illustrates an example method of training a neural network for emotion recognition. Referring to fig. 9, a neural network training method 900 includes a step 902. Step 902 includes: noise samples are added to the training database to expand the data set. The added noise aids in training and can produce a more robust neural network.
Step 904 includes: random sample sizes are randomly selected as training sets. Step 906 includes: the selected sample size is recorded to a neural network.
In one example, s% of the augmented training database is randomly selected and used as test data to evaluate the accuracy of the neural network. The remaining (100 − s)% of the data (i.e., the total training database minus the test data) is used as training data in step 906. The training data is recorded to the neural network in batches, and the model is trained by repeating this operation. For example, steps 904 and 906 may be repeated a set number of times to train the neural network; the training data set may be recorded to the neural network thousands of times. Because the test data is randomly selected, repeating steps 904 and 906 creates variation in the training data set, which improves the accuracy of the neural network. Optionally, the amount of noise and the neural network parameters (e.g., kernel size, number of pooling layers, etc.) may be varied to optimize neural network performance. Further optionally, a network module, e.g., a residual module, may be added during the training process.
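The procedure of steps 902 to 906 can be sketched as follows; the noise level, the test fraction s%, the optimizer and the batch size are assumptions made for illustration rather than parameters specified by the method.

```python
import numpy as np
import torch
import torch.nn as nn

def train(model: nn.Module, X: np.ndarray, y: np.ndarray,
          noise_std: float = 0.05, test_frac: float = 0.2,
          epochs: int = 10, batch_size: int = 64):
    """Sketch of steps 902-906: augment with noise, split, train in batches."""
    # step 902: expand the dataset with noisy copies of the samples
    X_aug = np.concatenate([X, X + np.random.normal(0.0, noise_std, X.shape)])
    y_aug = np.concatenate([y, y])
    # step 904: randomly hold out s% of the data as a test set
    perm = np.random.permutation(len(X_aug))
    n_test = int(test_frac * len(X_aug))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    Xt = torch.tensor(X_aug[train_idx], dtype=torch.float32)
    yt = torch.tensor(y_aug[train_idx], dtype=torch.long)
    # step 906: repeatedly record the training samples to the network in batches
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for i in range(0, len(Xt), batch_size):
            xb, yb = Xt[i:i + batch_size], yt[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    # evaluate accuracy on the held-out test data
    with torch.no_grad():
        Xv = torch.tensor(X_aug[test_idx], dtype=torch.float32)
        yv = torch.tensor(y_aug[test_idx], dtype=torch.long)
        accuracy = (model(Xv).argmax(dim=1) == yv).float().mean().item()
    return model, accuracy
```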
Step 908 includes: storing the best trained model. The best model may be stored in a storage unit, e.g., a memory unit, a hard drive, a flash drive, a solid state drive, etc. After repeated training, the best trained model can identify each of the three levels of valence and arousal based on processing the eye data. The model can also output the probabilities of the 9 emotions based on the valence and arousal levels. The 9 emotions that can be identified are: depression (low arousal, low valence); boredom (low arousal, medium valence); relaxation (low arousal, high valence); sadness (medium arousal, low valence); calm (medium arousal, medium valence); happiness (medium arousal, high valence); anxiety (high arousal, low valence); tension (high arousal, medium valence); and excitement (high arousal, high valence).
One exemplary use of emotion classification system 10 is to provide a non-invasive monitoring system for monitoring student emotion in a school or college. Students often suffer from, or are susceptible to, mental health problems due to increased academic pressure and increased social pressure, the latter driven by the prevalence of social media and cyberbullying. Psychological health problems of students often manifest as irritability, anxiety, etc., which are detrimental to the students' home life, learning, and overall health. The system provides a simple, inexpensive, non-invasive platform for emotion recognition and mental health tracking. The eye tracking apparatus may be connected to a computing device of the student. The neural network may be stored and executed by the student's computing device (e.g., a laptop computer or desktop computer). Thus, the system 10 can be easily implemented on student equipment, making it simple and cost effective to use.
Through emotion monitoring, the psychological state and psychological health of the student can be monitored, because emotion can be used as an index for measuring whether the psychological state is good or not. The change over time in emotion detected by system 10 executing the emotion classification method also provides an effective way of monitoring the mental health of a student. Computers have become a common facility in schools and colleges.
Referring to fig. 4A through 4E, an example use of emotion classification system 10 is shown. Fig. 4A-4E illustrate screen shots of an emotion classification system presented on a display of emotion classification system 10. These figures show the content displayed to the object, i.e. the user.
Referring to fig. 4A, the eye tracking device 20 is activated for eye calibration. An eye tracker activation screen 400 is presented. A screen with an "open eye tracker" button 402 is presented on the display 112 of the computing device 100. Clicking the button may present information about how to use the eye tracking device. This step may be used for calibration. The calibration step need only be performed the first time the subject uses the system 10, and calibration may be skipped.
Referring to fig. 4B, after activating the eye tracker, a set of instructions may be presented on how to use the eye tracking device. The user is presented with a button 404 to open the eye tracker manager (e.g., "open Tobii manager"). In this example, a Tobii eye tracker is used as the eye tracking device. A next icon 408 is presented on the display 112. The next icon 408 allows the subject to jump to the next step in eye tracking.
Referring to fig. 4C, an eye tracking screen 406 is displayed. The screen shows the time of eye tracking. In the example shown, eye tracking is performed by eye tracking device 20 for 2 minutes. Once activated, the eye tracking device 20 operates in the background. Screen 406 may be minimized when the object uses other applications on computing device 100. The eye tracking device continues to collect eye data (i.e., eye data) in the background. An icon of the next step 408 is presented. The icon of the next step allows the object to continue to perform the emotion recognition process.
Referring to fig. 4D, an emotion recognition screen 410 is presented on display 112. The eye data is recorded into the trained neural network for emotion recognition. The recognition result is presented on screen 410 in the two dimensions of arousal and valence; the valence and arousal results are given in separate tables. Referring to fig. 4D, high arousal and medium valence are detected based on processing the eye data from eye tracking device 20. This is represented as high arousal = 1 and medium valence = 1, as shown in table 412. Optionally, emotion recognition is performed periodically while the eye data is measured. In one example, emotion recognition may occur a few seconds after the eye data is measured. In one example, emotion recognition is performed in substantially real-time after the eye data is measured and recorded into the neural network. Alternatively, emotion detection may be performed after eye tracking has been performed over a period of time.
The identified emotions may be presented on the screen in colored circles or as another marker. As shown in fig. 4E, the identified emotion is displayed as a colored circle 416. The color of the circle corresponds to a particular emotion. Each emotion may be represented by a unique color. The identified emotion is represented by a arousal level and a valence level identified based on processing the ocular data. The colored circles provide an easily understood format. As shown in fig. 4E, emotions may be displayed in a single word on circle 416. Fig. 4E also shows a legend 418 of different colors indicating different possible emotions. The emotions may be displayed in any other suitable format using any indicia (e.g., color, text, numbers, etc.). The emotion classification system is used to detect one of 9 emotions as previously explained.
Referring to fig. 4D, an export report button 414 is presented. Pressing this button (i.e., interacting with the button) allows a report to be generated. The report may be in Excel format or PDF format or any other format. The report may be sent to another person, for example, a psychologist, via email. The generated report may be similar to report 600 shown in fig. 6. Based on deep convolutional neural networks, the described automatic emotion classification system 10 is used for accurate emotion classification and recognition.
Embodiments of the present invention may be low cost in that they do not involve complex measurement devices such as EEG. The system requires only a single eye tracking apparatus and a computing device. The system can be easily implemented in a school or college by connecting an eye tracking apparatus to a computer screen and installing software for performing the emotion classification method on the computer. The eye tracking device may be simply attached, for example by using a magnetic sticker. The system is easy to use because the emotion classification method can be performed while the subject sits in front of the eye tracking device and the display.
Embodiments of the present invention recognize 9 emotions. This enables a comprehensive indication of the emotion of the subject. The described emotion classification system and method is also advantageous in that emotion recognition may be performed continuously while the system is activated and the object interacts with the system. The emotion classification system measures ocular data of a single eye, making the system easier to use and user friendly. The eye, and in particular the pupil, can change significantly due to changes in emotion. This makes the system and method of the present invention substantially accurate.
It should also be appreciated that any suitable computing system architecture may be used where the methods and systems of the present invention are implemented in whole or in part by a computing system. This would include stand-alone computers, network computers, and dedicated hardware devices. Where the terms "computing system" and "computing device" are used, these terms are intended to cover any suitable arrangement of computer hardware capable of performing the described functions.
Those skilled in the art will appreciate that many changes and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Any reference to prior art contained herein should not be taken as an admission that the information is common general knowledge, unless otherwise indicated.

Claims (10)

1. A mood classification system for identifying a mood of a subject, comprising:
an eye tracking device for capturing eye data relating to at least one eye of a subject;
a computing device including a processing unit, a storage unit, a user interface, a display, and a communication interface;
the computing device is in electronic communication with the eye tracking apparatus, wherein the computing device is to:
receiving eye data from the eye tracking device;
processing the received ocular data by applying a machine learning model to the received ocular data; and
based on the processing of the received ocular data, the emotion of the subject is automatically identified.
2. The emotion classification system of claim 1, wherein the identified emotion is transmitted to and displayed on the display.
3. The emotion classification system of claim 2, wherein the computing device is configured to identify at least one of 9 emotions of the subject, wherein the emotions include: depression, boredom, relaxation, sadness, calm, happiness, anxiety, tension or excitement.
4. A mood classification system as claimed in claim 3, wherein the computing means is arranged to process the received ocular data and output a mood model, each mood identified by the computing means being defined by the mood model;
wherein the emotion model comprises a two-dimensional emotion model, a first dimension representing an arousal level of the emotion of the subject, a second dimension representing a valence level of the emotion of the subject, each dimension being categorized as one of three levels;
the display is for presenting the two-dimensional emotion model.
5. The emotion classification system of claim 4, wherein the computing device is configured to identify one of the 9 emotions by classifying each emotion based on the level of each dimension in the emotion model, wherein each emotion is represented by a combination of one of three levels of each dimension in the emotion model.
6. The emotion classification system of claim 5, wherein the computing device is configured to express the identified emotion in a cyclic emotion model that is presented on the display.
7. The emotion classification system of claim 6, wherein the eye tracking device is configured to detect eye data via a non-contact arrangement.
8. The emotion classification system of claim 7, wherein said eye tracking device comprises:
one or more light sources for emitting light to an eye of a subject;
a detector for repeatedly capturing an image of the subject's eye;
an evaluation unit for generating ocular data, wherein the ocular data of at least one eye comprises at least one or more of: gaze location, pupil diameter, and pupil radius; and
a communication interface for transmitting the ocular data to the computing device.
9. The emotion classification system of claim 7, wherein the computing device is configured to express the identified emotion in a Russell loop emotion model, wherein the Russell loop emotion model is presented on the display, the Russell loop emotion model representing each emotion at the valence and arousal levels detected for the subject based on the eye data, the Russell loop emotion model being represented as a circle including a vertical axis representing valence and a horizontal axis representing arousal, the detected emotion being represented on or within the circle.
10. The emotion classification system of claim 9, wherein the computing device is configured to transmit the identified emotion to a mobile device or a remote server over a wireless communication interface.
CN202211125523.XA 2022-08-09 2022-09-13 Emotion classification system and method Pending CN117653106A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
HK32022058164 2022-08-09
HK32022058164.1 2022-08-09

Publications (1)

Publication Number Publication Date
CN117653106A true CN117653106A (en) 2024-03-08

Family

ID=90097434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211125523.XA Pending CN117653106A (en) 2022-08-09 2022-09-13 Emotion classification system and method

Country Status (1)

Country Link
CN (1) CN117653106A (en)

Similar Documents

Publication Publication Date Title
US20220044824A1 (en) Systems and methods to assess cognitive function
US10409377B2 (en) Empathetic user interface, systems, and methods for interfacing with empathetic computing device
CN110678935A (en) Interactive adaptive learning and neurocognitive disorder diagnosis system applying face tracking and emotion detection and related methods thereof
CN108175425B (en) Analysis processing device and cognitive index analysis method for potential value test
CN107004373A (en) Information processor, information processing method and computer program
WO2023216680A9 (en) Forming system and forming method for digital biomarker, and brain health evaluation system based on digital biomarker
US10108784B2 (en) System and method of objectively determining a user's personal food preferences for an individualized diet plan
CN115607156B (en) Multi-mode-based psychological cognitive screening evaluation method, system and storage medium
EP3856012B1 (en) Visualized virtual agent
KR102100418B1 (en) Method and apparatus for improving mental condition
CN108492855A (en) A kind of apparatus and method for training the elderly's attention
Rathi et al. Analysis of user’s learning styles and academic emotions through web usage mining
Villegas-Ch et al. Identification of emotions from facial gestures in a teaching environment with the use of machine learning techniques
CN113288145A (en) Teaching device and method for training emotion control capability
Zhang et al. A survey on mobile affective computing
JP2022045493A (en) Signal processing apparatus, signal processing method, and signal processing program
US20230105077A1 (en) Method and system for evaluating and monitoring compliance, interactive and adaptive learning, and neurocognitive disorder diagnosis using pupillary response, face tracking emotion detection
CN117653106A (en) Emotion classification system and method
Utami et al. A Brief Study of The Use of Pattern Recognition in Online Learning: Recommendation for Assessing Teaching Skills Automatically Online Based
Kim et al. Mediating individual affective experience through the emotional photo frame
TWI785329B (en) Personal Temperament Potential Detection and Tracking System and Method
KR102383457B1 (en) Active artificial intelligence tutoring system that support teaching and learning and method for controlling the same
EP4290484A1 (en) Face-to-camera distance measurement for telemedical ophthalmological tests
Soygaonkar et al. A Survey: Strategies for detection of Autism Syndrome Disorder
Schoorl et al. Wrist movement classification for adaptive mobile phone based rehabilitation of children with motor skill impairments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination