WO2021225550A1 - Emotion recognition as feedback for reinforcement learning and as an indicator of the explanation need of users - Google Patents


Info

Publication number
WO2021225550A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
command
emotional state
change
feedback
Prior art date
Application number
PCT/TR2021/050424
Other languages
French (fr)
Inventor
Yaser Deniz IREN
Banu Emine AYSOLMAZ
Original Assignee
Iren Yaser Deniz
Aysolmaz Banu Emine
Application filed by Iren Yaser Deniz, Aysolmaz Banu Emine filed Critical Iren Yaser Deniz
Publication of WO2021225550A1 publication Critical patent/WO2021225550A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225: Feedback of the input speech

Abstract

The present disclosure relates to a method that uses the change in the emotional state of the user as feedback in computer systems (mainly artificial intelligence systems), and to the use of this emotional-state-change feedback as input to reinforcement learning. When a negative change in the user's emotional state is detected, the operation initiated by the user's command (or the outcome of that operation) is evaluated as unsatisfactory or suboptimal. The unsatisfactory command-operation-outcome trio is added to the machine learning training dataset for further training (e.g., for reinforcement learning). In addition, providing the user with an explanation after an unsatisfactory operation aims to increase trust in the system, improve the user's understanding of how the system works, and enable more efficient and effective use of the system.

Description

EMOTION RECOGNITION AS FEEDBACK FOR REINFORCEMENT LEARNING AND AS AN INDICATOR OF THE EXPLANATION NEED OF USERS
FIELD OF THE INVENTION
The present disclosure relates to a system and method that uses the change in the emotional state of the user as feedback in computer systems (mainly artificial intelligence systems), and to the use of this emotional-state-change feedback as input for further training of machine learning systems and for the explanation-providing faculties of those systems.
PRIOR ART
Today, artificial intelligence systems directly or indirectly serve users in a wide variety of cases, with examples including but not limited to home automation solutions, voice-controlled intelligent personal assistants, navigation devices, mobile phones, and call center systems. The scope of these services may include answering direct calls and understanding and replying to the users' verbal commands, as well as listening to a call and presenting possible replies/solutions/menu options to the call center agent to increase the agent's efficiency.
User interaction may have a wide variety of modalities such as audio, speech, gesture, facial expressions, haptics, keyboard-mouse-actions, touch-screens, and mobile devices. The details of such interaction methods and modalities are described in the Human Computer Interaction literature. Regardless of the modality, user interaction with computer systems generally takes the form of the user giving a command (e.g., making a request/running a query, etc.) or initiating an action (e.g., watching a movie, playing a song, reading a document, etc.), followed by the system executing an operation based on the command or initiated action, and the system sharing the outcomes of the executed operation (e.g., an answer, information, recommendation, decision, conclusion, etc.) with the user. For various reasons, users may be dissatisfied with the operation that the system executes or the outcomes it yields.
Prior art on feedback systems:
US8191004B2 explains a method to obtain user satisfaction feedback using an interactive display on software interfaces, in which the user indicates satisfaction by clicking on a satisfaction-scale display. The captured satisfaction score is communicated to the software designers and developers. In CN101517366B, the inventors describe a map system in which end-users can correct map errors themselves without external help from the developers of the system. The feedback from the user may also be used to refine the results or adapt the behavior of the system, specifically in machine learning systems. One example is US20180032890A1, in which the inventors describe a question-and-answer customer support system that asks for user feedback regarding the relevance of the query results displayed by the system, and further adapts the system to user needs. In the machine learning literature, algorithms that continuously learn and change behavior by acquiring new training data via user feedback/interaction are called active learning systems (Settles, 2009; Rubens et al., 2015). The common characteristic of these systems, which collect user feedback either for active learning or for error reporting, is that the user is required to take an additional action to provide the feedback. However, the era we live in calls for a seamless integration of user feedback into system operation, without needing the user to explicitly state or take action to provide feedback.
Various studies have been conducted to determine the emotional state of the user depending on the modality of the user-computer interaction method. Several examples from the academic literature are tone-of-voice (Turkle, 2005), speech-content, text (Batbaatar et al., 2015), biometric measurements (e.g., pulse, heart rate, EEG, sweating, skin conductivity, brain waves, and saliva composition), keyboard/mouse movements (Khanna and Sasikumar, 2010; Estrada et al., 2017), body language (Noroozi et al., 2018), facial expression (Barsoum et al., 2016), or a combination of several of the above-mentioned interaction modalities (Poria et al., 2016).
There are also many patents that cover emotion recognition methods on various modalities. CN104200804B discloses a method for combining multiple sources of data (i.e., text, audio, and video) for emotion recognition. CN106782602B describes a deep-learning model which is capable of detecting emotions from features extracted from human speech. US9020822B2 introduces an emotion recognition system that uses gist features based on auditory attention cues extracted from users' voices.
One key aspect of emotion recognition relates to its purpose, i.e., how the information on recognized emotions is used. KR20140032651A describes an emotion feedback service that detects the emotion of a user via the analysis of biometrics, facial expressions, and voice, and, if the user's emotions are in the negative range, provides content (i.e., photos, music, or videos) to the user to improve the user's emotional state. KR20170136538A describes a system that enables emotion recognition in video conferencing. By means of facial-expression recognition and speech-based features, the emotions of the video conference participants are measured. The system is capable of providing an alert if one of the participants' emotional states is recognized as angry, disoriented, irritated, or stressed. There are also examples of emotion recognition being used as an input to decision-making or decision-support. US20170270922A1 describes a smart home automation system that uses a two-step emotion recognition procedure, on tone of voice and on content of speech respectively, and then uses this information to improve the user's mood by changing the behavior of smart home devices accordingly. The novelty of this invention over the previous techniques is that it suggests the use of emotion recognition and related affective computing techniques for improving artificial intelligence training and as an indication of the user's need for an explanation.
Prior art on explainability:
A new challenge that arose with the advent of artificial intelligence systems is explainability. These systems are complex and often work as black boxes, since humans cannot discover the internal workings and logic behind them. When users cannot understand how an AI system reaches a decision concerning them, they develop concerns about whether the system works accurately, fairly, free from bias, and with respect for privacy (Wachter, Mittelstadt, and Floridi, 2017). Such concerns in turn lead to a lack of trust and decreased adoption of these systems. Explanations are mechanisms to help people understand how the system operates, increase trust in the system, and enable them to use the system more effectively and efficiently (Miller, 2019). In recent years, there has been ongoing work on developing explanations computationally for black-box systems (Guidotti et al., 2018; Adadi and Berrada, 2018) and on understanding whether explanations can help overcome user concerns about these systems (Mueller et al., 2019). Mainly, explanations are categorized as global and local explanations (Weller, 2019). While global explanations provide information on how the system works in general, local explanations clarify and justify a particular decision given to the user. In practice, local explanations are likely to be requested more often by users, since users are motivated to question a system when there is an undesirable or unexpected outcome (Brockner and Wiesenfeld, 1996). However, it is challenging for an AI system to detect when the user is faced with an unexpected outcome. In many cases, the system itself cannot recognize an error (e.g., a wrong transcription during speech-to-text conversion), and in other cases there is no exact right or wrong, yet the user can still be disappointed (e.g., movie recommendations).
Due to these difficulties, scholars have mostly evaluated the use of explanations for unexpected outcomes when the error can be inherently detected by the system (e.g., the collision of an autonomous car, Kohn et al. (2019)) or when the outcome is obviously undesired by a user (e.g., a rejected loan application, Fernandez et al. (2019)). However, there is a lack of solutions for identifying the need for explanations about unexpected outcomes when the outcome cannot be identified as explicitly erroneous or negative by the AI system but the user is nevertheless disappointed and dissatisfied.
The invention disclosed herein, the use of emotions to detect when an outcome is dissatisfactory for the user, can open up new possibilities for providing explanations. The need for explanations can be identified accurately and local explanations about the outcome can be developed precisely based on the detected dissatisfaction.
With the increasing number of users and the introduction of new human-computer interaction methods, it has become imperative for artificial intelligence systems to engage in a more seamless and intuitive interaction with their users. Thus, understanding their own mistakes or suboptimal operations, learning from these mistakes, and providing explanations regarding their operating mechanisms to ensure trust and effective usage are vital features for artificial intelligence systems, which can be achieved through the invention disclosed herein.
BRIEF SUMMARY OF THE DISCLOSURE
Throughout the interaction between the user and the system, when the user experiences dissatisfaction with the system due to an error or an undesirable or unexpected outcome that the system yields, the user may exhibit a change in his/her emotional state. This emotional state-change manifests in different forms based on the particular mode of interaction that the system provides. For instance, the user's tone of voice changes when repeating a voice command to a voice-controlled intelligent assistant after the system's suboptimal operation on the previous command. Thus, this emotional state-change can be measured automatically by using emotion recognition methods specific to the interaction method of the system.
This invention discloses a method that suggests the use of emotional state-changes paired together with the user commands as inputs/feedback to the system, thus, identifying the dissatisfaction of the user based on the detected emotion-change. In turn, when the system detects user dissatisfaction based on the user’s emotional state-change, it reacts in two ways to improve the user experience.
First, the system may identify the dissatisfaction as a user need for explanation; thus, the system can trigger its mechanisms to provide the user with an explanation of the command-operation-outcome trio for which it detects the user's dissatisfaction. This response helps the user understand how the system works and ensures better use in the future, as well as increased trust in and satisfaction with the system. Second, the system takes the necessary precautions not to repeat the unsatisfactory operation in the future by saving this command-operation-outcome trio as unsatisfactory. This unsatisfactory record is then used to update the training dataset of the machine learning/artificial intelligence system, which enables reinforcement learning in a natural and seamless way.
BRIEF DESCRIPTION OF THE DRAWINGS
The explanations of the figures that are provided to better convey the detailed description of the disclosure are given below.
Figure-1 displays a flow chart showing a generic user-computer interaction process according to the prior art.
Figure-2 displays a flow chart showing the user-computer interaction realized with the support of emotional-state detection by using the method described in the present disclosure.
Figure-3 displays an alternative flow chart showing the user-computer interaction realized with the support of emotional-state detection by using the method described in the present disclosure.
Figure-4 displays an alternative flow chart showing the user-computer interaction realized with the support of emotional-state detection by using the method described in the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
A generic user-computer interaction scenario consists of a sequence of steps: first, the user gives a command to the computer system (e.g., a query on a search engine, a voice command to turn off the lights), then the computer performs the operation to execute the command, and finally communicates the outcomes of the executed command to the user. This interaction generally continues by repeating these steps as the user continues to use the computer system.
This invention describes a method and a system that considers the user's emotions as they manifest throughout the interaction with the computer system. Each command (and the outcome of its execution) is recorded alongside the emotion expressed by the user while giving the command. The recorded emotion is compared against the previously recorded emotions of the user in the same interaction session as well as against the long-term interaction history. If a significant negative change in emotional state is observed, the system labels the command-operation-outcome trio as unsatisfactory. In that case the system takes two actions:
• It provides the user with an explanation regarding the execution of the operation and how and why the outcomes were reached.
• It adds the command-operation-outcome trio to the training dataset of the underlying machine learning/artificial intelligence models to be used in reinforcement learning.
This method identifies a negative change in emotional state based on a significant increase in negative emotional states. A non-exhaustive list of examples of such negative emotional states includes confusion, stress, anger, sadness, disappointment, dissatisfaction, frustration, contempt, and disgust. The identification of the change in the user's emotional state is performed by comparing the currently (or most recently) recognized emotional state against the recently detected sequence of emotional states, or against the statistical descriptives of the historically recognized emotional states of the user. If the change is higher than a threshold (towards either side of the mean value), it is considered significant. By default, this threshold is the standard deviation of the distribution of the historically recognized emotional states of the user. The threshold can be customized by the user; thus, the sensitivity of the method can be calibrated to fit the intention and comfortable use of various users.
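The default thresholding described above (one standard deviation of the user's historical emotional-state distribution) might be sketched as follows. This is a minimal illustration, not part of the disclosure: it assumes emotional states have already been reduced to a single valence score in [-1, 1] by a separate recognition module, and only the negative direction of the change is flagged, since that is what triggers the two system actions.

```python
import statistics

class EmotionChangeDetector:
    """Flags a significant negative emotional-state change by comparing
    the current valence score against the statistics of the user's
    previously recorded scores (illustrative sketch)."""

    def __init__(self, threshold_scale=1.0):
        # By default the threshold is one standard deviation of the
        # user's historical valence distribution; threshold_scale lets
        # the user recalibrate the sensitivity.
        self.history = []
        self.threshold_scale = threshold_scale

    def is_significant_negative_change(self, valence):
        if len(self.history) < 2:
            # Not enough history yet to estimate a distribution.
            self.history.append(valence)
            return False
        mean = statistics.mean(self.history)
        threshold = self.threshold_scale * statistics.stdev(self.history)
        significant_drop = (mean - valence) > threshold
        self.history.append(valence)
        return significant_drop

detector = EmotionChangeDetector()
for v in [0.2, 0.3, 0.25, 0.28]:        # a stable, mildly positive session
    detector.is_significant_negative_change(v)
print(detector.is_significant_negative_change(-0.8))  # True: sharp drop
```

In a full implementation the history could also be partitioned into the current interaction session and the long-term record, as the description above distinguishes.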
According to this disclosure, the term "emotional state" refers to the user's emotions, mood, affect, sentiment, arousal, emotional valence, and interpersonal stance. There are many publications in the psychology, neuroscience, and computer science literature about the definition and classification of these concepts. Some examples of such classifications are the following: Ekman's six-basic-emotion classification (Ekman, 1999), Russell and Feldman Barrett's circumplex emotional model (Russell and Feldman Barrett, 1999), Bradley's vector model with arousal and valence dimensions (Bradley et al., 1992), Tellegen and Watson's positive/negative (PANA) emotional model (Watson and Tellegen, 1985), Plutchik's emotional model (Plutchik, 2001), Mehrabian's PAD model that includes pleasure/arousal/dominance dimensions (Mehrabian, 1980), the 48-emotion classification made by HUMAINE (HUMAINE, 2006), Parrott's tree-structured emotion classification (Shaver et al., 1987; Parrott, 2001), and the 27-emotion categorization published by the University of California (Cowen and Keltner, 2017).
A plethora of techniques exists in the literature for identifying the user's emotional state, depending on the mode of interaction between the user and the system, and new emotion recognition techniques using different modalities are invented every day. Some prominent examples are tone of voice, speech content, text, biometric measurements (pulse, EEG, sweating, skin conductivity, brain waves, saliva composition, etc.), keyboard and mouse usage behavior, body language, facial expressions, or a combination of several of these. The method described in this disclosure can work with any modality and any emotional state detection technique, using any type of emotional state classification. The detection of emotional state can be implemented as a part of this invention or as a third-party module.
Computer systems, especially artificial intelligence and machine learning systems, often work as black boxes. The explanation of the operations performed by these systems and the outcomes they produce is deemed complex, difficult to understand, and sometimes even impossible to achieve. Numerous projects are currently underway for making artificial intelligence systems transparent and explicable, identifying and eliminating unwanted trends (bias) in the distribution of the data in the training dataset, and making the systems work in an equitable and fair way. New techniques are being developed to generate explanations about the operations and outcomes of these complex systems. However, these techniques require a mechanism to detect when they need to provide explanations to their users. The invention disclosed herein provides a method to detect when the user needs an explanation by comparing emotional state-changes across user commands, and to present explanations accordingly; in this way, it enables complex computer systems to become transparent.
Explanations can be any type of additional information, including but not limited to the details of the internal workings of the system, the distribution of the data, outliers, biases in the training data, the current configuration of the system, statistical facts regarding the general user population of the system or a subset of users similar to the current user, the user's emotional states, and generic tips for the usage of the system. The generation of the explanation can be implemented as a part of this invention or as a third-party module. Through the use of this method, the user will understand the system better, use it more efficiently, and have increased trust in the system. The detected emotional state-changes can also be used as feedback by application developers to solve problems and improve the system. As a result, a seamless user-computer interaction can be achieved.
The training of supervised machine learning involves the use of labelled training data. Assuming that the quality of the data is sufficient, statistical principles suggest that the use of more training data yields better machine learning performance. For this reason, in practice, machine learning/artificial intelligence systems are generally trained repeatedly and continuously to improve their performance. Such continuous training can be achieved in several ways. One approach is reinforcement learning, in which the artificial intelligence system attempts to improve its performance by optimizing rewards and punishments that are set according to the observation of its outcomes. Another similar concept is active learning, in which the artificial intelligence system gets explicit feedback from the users regarding the outcomes it produces, and uses this feedback to improve its performance. All active learning systems, and those reinforcement learning systems in which the rewards and punishments are set according to user feedback, rely on the existence of a mechanism for the user to provide feedback. This invention comprises such a mechanism, using the change in the emotional state of the user as feedback throughout the interaction. More specifically, when a negative change in the emotional state of the user is detected, the system labels the command-operation-outcome trio as unsatisfactory and uses it as feedback for reinforcement learning/active learning purposes. Conversely, a positive change or no change in the user's emotional state leads the system to label the command-operation-outcome trio as satisfactory. Thus, a continuous feedback cycle is formed in which the feedback is provided during the natural course of the interaction between the user and the system, without the need for an explicit action from the user.
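The labelling step of this continuous feedback cycle can be sketched as follows. The class and label names are illustrative assumptions; the emotion-change signal is assumed to be supplied by a separate recognition module, and how the accumulated trios are later consumed by the reinforcement/active learner is outside this sketch.

```python
class FeedbackCollector:
    """Accumulates labelled command-operation-outcome trios as training
    data, labelling each trio from the detected emotional-state change
    rather than from an explicit user action (illustrative sketch)."""

    def __init__(self):
        self.training_data = []

    def record(self, command, operation, outcome, negative_change):
        # A detected negative change marks the trio unsatisfactory (a
        # punishment signal for reinforcement/active learning); a
        # positive change or no change marks it satisfactory.
        label = "unsatisfactory" if negative_change else "satisfactory"
        self.training_data.append((command, operation, outcome, label))
        return label

collector = FeedbackCollector()
label = collector.record("play rock playlist", "speech recognition",
                         "misheard command", negative_change=True)
print(label)  # unsatisfactory
```

Because every interaction produces a label, the cycle yields training data during normal use, with no explicit feedback action required from the user.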
This invention forms a bridge between three components of an artificial intelligence system: the emotional-state-change recognition component, the explanation component, and the continuous learning component. The emotional-state-change recognition component is responsible for recognizing the user's emotional states by means of any method and based on any modality. The explanation component is responsible for providing an explanation regarding the operation and the outcome of the computer system during the interaction between the user and the computer system. The continuous learning component is responsible for labelling the satisfactory and unsatisfactory command-operation-outcome trios and using them for reinforcement learning/active learning. This invention and all three components described above can be implemented and produced at different levels, such as at the hardware level as embedded software, at the operating-system level as a native part of the operating system, at the client-software level as an application on a computer or an app on a mobile device or wearable, or at the cloud level as a service that operates remotely.
An exemplary application of the invention is indicated in Figure-2. After the user transmits a command (C1), the emotional state (ES1) related to this command is determined, followed by the second command (C2) and the determination of the emotional state of the second command (ES2). These commands are considered to be related to each other if the time elapsed between the user giving the two commands, or between observing the outcomes of the first command and giving the second command, is less than a predefined duration. If there is a significant negative emotional state change between ES1 and ES2, meaning the change is over a threshold, this is used as an indication of an unsatisfactory/erroneous operation of the system, which caused the user to be dissatisfied. In this case, an explanation is presented to the user about why the operation O1 was performed this way and the outcome OC1 was produced. In parallel, the command-operation-outcome trio (C1-O1-OC1) is added to the training dataset for reinforcement learning/active learning to improve the performance of the artificial intelligence system. Then, the system continues to work regularly, ready to receive new commands from the user.
An alternative exemplary application of the invention is indicated in Figure-3. After the user transmits a command (C1), the emotional state (ES1) related to this command is determined, followed by the second command (C2) and the determination of the emotional state of the second command (ES2). Then, the similarity between the two commands is calculated. If they are sufficiently similar to each other, or related to the same desired operation, and the time elapsed between the transmission of the two commands is less than a predefined duration, then these commands are considered to be related to each other. In the case that the commands are not related to each other, the system continues to operate by performing the operation (O2) for the user command (C2). In the case that the two commands are related to each other, the emotional state change between ES1 and ES2 is calculated. If there is a significant negative emotional state change between ES1 and ES2, meaning the change is over a threshold, this is used as an indication of an unsatisfactory/erroneous operation of the system, which caused the user to be dissatisfied. In this case, an explanation is presented to the user about why the operation O1 was performed this way and the outcome OC1 was produced. In parallel, the command-operation-outcome trio (C1-O1-OC1) is added to the training dataset for reinforcement learning/active learning to improve the performance of the artificial intelligence system. Then, the system continues to work regularly, ready to receive new commands from the user.
Another alternative exemplary application of the invention is indicated in Figure-4. After the user transmits a command (C1), the emotional state (ES1) related to this command is determined, followed by the second command (C2) and the determination of the emotional state of the second command (ES2). These commands are considered to be related to each other if the time elapsed between the user giving the two commands, or between observing the outcomes of the first command and giving the second command, is less than a predefined duration. If there is a significant negative emotional state change between ES1 and ES2, meaning the change is over a threshold, this is used as an indication of an unsatisfactory/erroneous operation of the system, which caused the user to be dissatisfied. In this case, an explanation is presented to the user about why the operation O1 was performed this way and the outcome OC1 was produced. In parallel, the command-operation-outcome trio (C1-O1-OC1) is added to the training dataset for reinforcement learning/active learning to improve the performance of the artificial intelligence system. Then, the system continues by performing the operation for the command (C2) and waits for new commands. If the emotional state change is not detected to be significant, the system likewise continues by performing the operation for the command (C2) and waits for new commands.
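The relatedness test used in the Figure-3 variant, where two commands are related if they are sufficiently similar and issued within a predefined duration, might be sketched as below. The similarity measure (a character-level `difflib` ratio) and the default thresholds are illustrative assumptions; a production system could use any semantic similarity method.

```python
import difflib

def commands_related(cmd1, cmd2, t1, t2,
                     max_gap_seconds=30.0, min_similarity=0.6):
    """Return True if two commands are considered related: issued
    within a predefined duration of each other and sufficiently
    similar (e.g., a repeated or rephrased command)."""
    within_window = (t2 - t1) <= max_gap_seconds
    similarity = difflib.SequenceMatcher(
        None, cmd1.lower(), cmd2.lower()).ratio()
    return within_window and similarity >= min_similarity

# A repeated voice command five seconds after the first attempt:
print(commands_related("Play my favorite rock music playlist",
                       "play my favorite rock music playlist",
                       t1=0.0, t2=5.0))  # True
```

When the test fails, the flows above simply execute the second command as a fresh, unrelated request; when it succeeds, the emotional-state comparison between ES1 and ES2 is performed.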
In brief: a method for using the change in the emotional state of the user as feedback in computer systems (mainly artificial intelligence systems), and for using the emotional-state-change feedback as input to reinforcement learning, involves the repeated actions of receiving a command from a user, performing an operation based on the user command, and presenting the outcome, and is characterized by the following steps:
• identification and registration of the emotional state of the user by means of an emotion detection module when a command is received from the user
• comparison of the emotional state-changes of the user among related commands, and, when the change is over a threshold:
• presentation of an explanation to the user about how and why the operation is performed and the outcome is produced
• adding the labelled command-operation-outcome trio to the training dataset to be used in the reinforcement learning of the underlying machine learning/artificial intelligence system.
An alternative method identifies the dissatisfaction of the user based on the negative emotional state-change of the user, by comparing the emotional states between related commands against a threshold, without the need for an explicit action from the user.
An alternative method adapts the threshold for identifying an unsatisfactory outcome based on the identified historical emotional state-changes of the user.
INDUSTRIAL USAGE/APPLICABILITY EXAMPLES
A typical industrial application of the invention is the following. Voice-controlled assistants recognize the voice commands of users. These assistants use automated speech recognition technologies to understand the voice commands. It is not uncommon for a voice-controlled assistant to fail to understand, or to misunderstand, a user command. In those cases, the user generally repeats the same command (or a similar command). The failure to appropriately understand and execute the command on the first attempt creates an emotional state change in the user, which is then reflected in the user's tone of voice when giving the next command. According to the underlying method of this invention, the change in the emotional state of the user is detected by using a speech-emotion-recognition technique, and if the change is greater than a threshold (e.g., two standard deviations below or above the mean), it is accepted as an indication of unsatisfactory performance in executing the previous command. Thus, the previous command-operation-outcome trio is marked as an unsatisfactory example to be used for further training of the automated speech recognition model/algorithm of the voice-controlled assistant. Moreover, once the unsatisfactory operation is understood, according to this invention, an explanation for the unsatisfactory performance can be given to the user.
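The two-standard-deviation rule in this example can be sketched as a mapping from the detected valence change to a reinforcement learning reward. The function name and the reward values (-1, 0, +1) are illustrative assumptions, not part of the disclosure.

```python
def reward_from_valence_change(delta, stdev, k=2.0):
    """Map a detected change in emotional valence (delta) to a
    reinforcement learning reward. A drop of more than k standard
    deviations of the user's historical distribution marks the previous
    command-operation-outcome trio as unsatisfactory (negative reward);
    a comparable rise marks it satisfactory (positive reward)."""
    if delta < -k * stdev:
        return -1.0  # punishment: unsatisfactory operation
    if delta > k * stdev:
        return 1.0   # reward: satisfactory operation
    return 0.0       # no significant change detected

# An angry repetition after a misunderstood command: large negative delta.
print(reward_from_valence_change(delta=-0.5, stdev=0.1))  # -1.0
```

The resulting reward can be attached to the stored trio so that the speech recognition model is penalized for the misrecognized command during further training.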
Scenario:
User: “Voice Assistant, play my favorite rock music playlist.”
Voice Assistant: “I did not quite understand that.”
User: (with a changed tone of voice) “Play my favorite rock music playlist.”
The emotional state change is recognized in consecutive commands.
The vocal features of the initial command and the response of the voice assistant are marked as an unsatisfactory example. These are set aside for active learning/reinforcement learning of the automated speech recognition module of the voice assistant, thus potentially improving the performance. In parallel, an explanation is given to the user.
Voice Assistant: (conveys the explanation) “I noticed that you were not satisfied with my previous performance. According to my analysis, the mistake probably happened due to background noise.”
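The bookkeeping behind this scenario, marking the command-operation-outcome trio as an unsatisfactory example and setting it aside for active/reinforcement learning, could be sketched as follows (all names hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One command-operation-outcome trio from the interaction log."""
    command_features: list   # vocal features of the user's command
    operation: str           # what the system did with the command
    outcome: str             # what was presented to the user
    label: str = "unlabelled"


@dataclass
class FeedbackCollector:
    """Queues labelled trios for retraining the recognition module."""
    training_queue: list = field(default_factory=list)

    def mark_unsatisfactory(self, trio):
        trio.label = "unsatisfactory"
        self.training_queue.append(trio)


collector = FeedbackCollector()
previous = Interaction([0.12, 0.87], "asked for clarification", "no playback")
collector.mark_unsatisfactory(previous)
print(previous.label, len(collector.training_queue))  # unsatisfactory 1
```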
The applications of this invention are not limited to this example with voice-controlled assistants and voice commands. It can be applied to any setting in which the user interacts with a computer system. Other typical examples are interactive remote-controlled TVs and game consoles that detect users’ body movements, hand gestures, and facial expressions to interact with them. In such a scenario, the system detects the change in emotional state based on the available interaction modalities, e.g., body language and facial expressions, and uses it both as feedback for further training of the system and as a trigger indicating the user’s need for an explanation.
Benefits: The invention has multiple contributions and benefits. First, it facilitates the automated continuous improvement of system performance by labelling satisfactory/unsatisfactory command-operation-outcome instances for active learning/reinforcement learning whenever a significant change in the user’s emotional state is detected. Secondly, it allows implicit and natural user feedback during the interaction: the user does not need to explicitly report the unsatisfactory performance of the system, which saves the user from taking additional actions (e.g., sending an error ticket, clicking on a flag icon that indicates the error). Thus, the natural style of the interaction is not broken while the user still implicitly gives feedback (i.e., with emotions) regarding the performance of the system. Thirdly, the explanation provided to the user increases the user’s understanding of how the system operates, allowing the user to interact with it more efficiently. Moreover, the explanations increase trust in the system and, in turn, its wider adoption. Finally, users appreciate that the system takes their emotions into consideration; the invention therefore leads to a closer and more intimate relationship between the user and the system.

Claims

1. A method for using the change in the emotional state of the user as feedback in computer systems (mainly artificial intelligence systems) and using the emotional-state-change feedback as input to reinforcement learning, which involves the repeated actions of receiving a command from a user, performing an operation based on the user command, and presenting the outcome and characterized by the following steps:
Identification and registration of the emotional state of the user by means of an emotion detection module when a command is received from the user
Comparison of the emotional state-changes of the user among related commands and, when the change is over a threshold:
• Presentation of an explanation to the user about how and why the operation is performed and the outcome is produced
• Adding the command-operation-outcome trio labels to the training dataset to be used in the reinforcement learning of the underlying machine learning/artificial intelligence system
2. A method characterized by the identification of the dissatisfaction of the user based on the negative emotional state-change of the user by comparing emotional states between related commands against a threshold without the need for an explicit action from the user, as described in Claim 1.
3. A method characterized by the adaptation of the threshold based on the identified historical emotional state-changes of the user for identifying an unsatisfactory outcome, as described in Claim 2.
PCT/TR2021/050424 2020-05-06 2021-05-04 Emotion recognition as feedback for reinforcement learning and as an indicator of the explanation need of users WO2021225550A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR202007039 2020-05-06
TR2020/07039 2020-05-06

Publications (1)

Publication Number Publication Date
WO2021225550A1 true WO2021225550A1 (en) 2021-11-11

Family

ID=78468193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2021/050424 WO2021225550A1 (en) 2020-05-06 2021-05-04 Emotion recognition as feedback for reinforcement learning and as an indicator of the explanation need of users

Country Status (1)

Country Link
WO (1) WO2021225550A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115101032A (en) * 2022-06-17 2022-09-23 北京有竹居网络技术有限公司 Method, apparatus, electronic device and medium for generating score of text

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment
CN109670030A (en) * 2018-12-30 2019-04-23 联想(北京)有限公司 Question and answer exchange method and device
WO2019146866A1 (en) * 2018-01-29 2019-08-01 삼성전자주식회사 Robot reacting on basis of user behavior and control method therefor
US20190271940A1 (en) * 2018-03-05 2019-09-05 Samsung Electronics Co., Ltd. Electronic device, external device capable of being combined with the electronic device, and a display method thereof



Similar Documents

Publication Publication Date Title
US11120895B2 (en) Systems and methods for mental health assessment
US10748644B2 (en) Systems and methods for mental health assessment
US11226673B2 (en) Affective interaction systems, devices, and methods based on affective computing user interface
Kächele et al. Inferring depression and affect from application dependent meta knowledge
WO2021081418A1 (en) Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions
Sheth et al. Cognitive services and intelligent chatbots: current perspectives and special issue introduction
Heimerl et al. Unraveling ml models of emotion with nova: Multi-level explainable ai for non-experts
Sun et al. Dynamic emotion modelling and anomaly detection in conversation based on emotional transition tensor
Griol et al. Modeling the user state for context-aware spoken interaction in ambient assisted living
Biancardi et al. A computational model for managing impressions of an embodied conversational agent in real-time
Galitsky et al. Chatbot components and architectures
Cortiñas-Lorenzo et al. Toward Explainable Affective Computing: A Review
WO2021225550A1 (en) Emotion recognition as feedback for reinforcement learning and as an indicator of the explanation need of users
Afzal et al. 26 Emotion Data Collection and Its Implications for Affective Computing
US20200257954A1 (en) Techniques for generating digital personas
US20230011923A1 (en) System for providing a virtual focus group facility
Kumano et al. Computational model of idiosyncratic perception of others' emotions
Lugrin et al. Modeling and evaluating a bayesian network of culture-dependent behaviors
Grafsgaard Multimodal affect modeling in task-oriented tutorial dialogue
Utami et al. A Brief Study of The Use of Pattern Recognition in Online Learning: Recommendation for Assessing Teaching Skills Automatically Online Based
Du et al. Multimodal emotion recognition based on feature fusion and residual connection
Ayoub Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks
Cafaro et al. Nonverbal behavior in multimodal performances
Wattearachchi et al. Emotional Keyboard: To Provide Adaptive Functionalities Based on the Current User Emotion and the Context.
Hupont et al. From a discrete perspective of emotions to continuous, dynamic, and multimodal affect sensing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21799665; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21799665; Country of ref document: EP; Kind code of ref document: A1