WO2020207041A1 - System and method for dynamically recommending inputs based on identification of user emotions - Google Patents

System and method for dynamically recommending inputs based on identification of user emotions

Info

Publication number
WO2020207041A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
input
module
gesture
content
Prior art date
Application number
PCT/CN2019/122695
Other languages
French (fr)
Inventor
Sumit Kumar Tiwary
Manoj Kumar
Yogiraj BANERJI
Govind JANARDHANAN
Tasleem Arif
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN201980095244.3A (CN113785539A)
Publication of WO2020207041A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, for supporting social networking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/0035 User-machine interface; Control console
    • H04N1/00352 Input means
    • H04N1/00381 Input by recognition or interpretation of visible user gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/52 Details of telephonic subscriber devices including functional features of a camera

Definitions

  • the present invention encompasses systems and methods for dynamically recommending input contents based on identification of user emotions, during a communication session between users over a communication network.
  • At least one user input is received in real-time from a user via an input module.
  • At least one portion of an image of the user is also received using a camera module.
  • The at least one portion of the user image indicates at least one gesture of the user in real-time.
  • at least one expression associated with the at least one gesture of the user is identified.
  • At least one emotional data is determined based on the at least one expression and the at least one user input.
  • the at least one input content is determined.
  • the at least one input content is thereafter recommended to the user, for selecting and using the recommended at least one input content along with the at least one user input, during the communication session.
  • hardware includes a combination of discrete components, an integrated circuit, an application specific integrated circuit, a field programmable gate array, other programmable logic devices and/or other suitable hardware as may be obvious to a person skilled in the art.
  • software includes one or more objects, agents, threads, lines of code, subroutines, separate software applications, or other suitable software structures as may be obvious to a skilled person.
  • software can include one or more lines of code or other suitable software structures operating in a general-purpose software application, such as an operating system, and one or more lines of code or other suitable software structures operating in a specific purpose software application.
  • “application” or “applications” or “apps” are the software applications residing in respective electronic communication devices and can be either pre-installed or can be downloaded and installed in said devices.
  • the applications include, but are not limited to, contact management application, calendar application, messaging applications, image and/or video modification and viewing applications, gaming applications, navigational applications, office applications, business applications, educational applications, health and fitness applications, medical applications, finance applications, social networking applications, and any other application.
  • the application uses “data” that can be created, modified or installed in an electronic device over time. The data includes, but is not limited to, contacts, calendar entries, call logs, SMS, images, videos, factory data, emails and data associated with one or more applications.
  • “Couple” and its cognate terms, such as “couples” and “coupled”, include a physical connection (such as a conductor), a virtual connection (such as through randomly assigned memory locations of a data memory device), a logical connection (such as through logical gates of a semiconducting device), other suitable connections, or a combination of such connections, as may be obvious to a skilled person.
  • “electronic communication device” includes, but is not limited to, a mobile phone, a wearable device, a smart phone, a set-top box, a smart television, a laptop, a general-purpose computer, a desktop, a personal digital assistant, a tablet computer, a mainframe computer, or any other computer-implemented electronic device that is capable of making transactions of communication messages or data, as may be known to a person skilled in the art.
  • ‘expression’ of the user is detected through facial expressions, hand movements, fingers movements, thumbs movements, head movement, leg movement etc.
  • the various facial features of the user include movements of eyes, nose, lips, eyebrows, jaw movements etc.
  • the expression can be classified into any particular category by using state-of-the-art classifiers suitable for specific expression recognition.
  • emotional data is any data pertaining to the expression of the users and emotions of the user being inputted through a text, an audio, an icon or any image.
  • the emotional data is determined by analysing the received user input for example, words, sentences, phrases, GIF, images, icons, etc along with the user expression.
  • the emotional data can be classified into any particular type and degree by using predefined categories of human emotions, wherein the predefined categories of human emotions may be stored locally on the device or on one or more remote servers.
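As a purely illustrative sketch of such predefined categories of human emotions (the table, names and degrees below are hypothetical assumptions, not values defined in the publication), a device-local category table with a simple lookup could look like this; a real system might refresh this table from a remote server:

```python
from enum import Enum

class EmotionType(Enum):
    HAPPY = "happy"
    SAD = "sad"
    ANGRY = "angry"
    SLEEPY = "sleepy"

# Hypothetical on-device table of predefined emotion categories and their degrees.
LOCAL_EMOTION_CATEGORIES = {
    EmotionType.HAPPY: ["low", "medium", "high", "very high"],
    EmotionType.SAD: ["low", "medium", "high", "very high"],
    EmotionType.ANGRY: ["low", "medium", "high", "very high"],
    EmotionType.SLEEPY: ["low", "medium", "high"],
}

def degrees_for(emotion: EmotionType) -> list[str]:
    """Return the allowed degrees for an emotion type, or an empty list if unknown."""
    return LOCAL_EMOTION_CATEGORIES.get(emotion, [])

print(degrees_for(EmotionType.ANGRY))  # ['low', 'medium', 'high', 'very high']
```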
  • depth camera is a type of camera that is capable of capturing the depth information of any scene or a video frame that is being received as an input.
  • Fig. 1 illustrates a system architecture for dynamically recommending at least one input content to a user in a communication session over a communication network, in accordance with exemplary embodiments of the present disclosure.
  • the system [100] comprises a data managing module [102] , a profile managing module [104] , a dynamic content module [106] , a messaging application module [112] , a processing module [110] , a camera module [108] , an input module [118] , an emotion detection module [114] and an expression identification module [116] .
  • the messaging application module [112] initiates any communication session between the users.
  • the messaging application module [112] is configured to initiate one or more third party messaging applications, social networking applications, instant messenger applications, online chat applications running on any portals etc., that require the users to input text, voice, or image and accordingly convey their messages during the respective communication sessions.
  • the messaging application module [112] may also be triggered by installation of any input devices including keyboard, mouse, joysticks etc.
  • the messaging application module [112] may also be triggered by a touch input received from the user.
  • the messaging application module [112] is communicably coupled to the processing module [110] and makes a request to the processing module [110] for identifying the expression and emotions of the user.
  • the processing module [110] is coupled to the expression identification module [116] and the emotion detection module [114] , which respectively identify the expression and the emotional data from the inputs being received.
  • the expression identification module [116] receives at least one portion of the image of the user that indicates at least one gesture of the user.
  • the expression identification module [116] identifies the at least one expression associated with the at least one gesture of the user.
  • the emotion detection module [114] identifies at least one emotional data based on the combination of at least one expression and the at least one user input.
  • the emotional data is further used by the processing module [110] to determine at least one input content that is recommended to the user via a display module.
  • the at least one input content may include but is not limited to an icon, an emoticon, an image, a sticker, a text, a set of texts, a word, a set of words etc. Therefore, the user may implement the recommended at least one input content in the transactions of the messages, using the messaging applications being executed through the respective device.
  • the messaging application module [112] invokes the processing module [110] by sending a request to analyze the at least one user input and the at least one portion of an image of the user.
  • the processing module [110] thereafter seeks the at least one portion of the image of the user from the camera module [108] and the at least one user input from the input module [118] , and accordingly determines the at least one input content to be recommended to the user based on the emotional data.
  • the processing module [110] therefore acts as a context analyser that analyses and processes the received inputs to identify the emotional context associated with the same.
  • the processing module [110] is coupled to the camera module [108] and the input module [118] for receiving the user image and the at least one user input respectively.
  • the camera module [108] may include a depth camera that is capable of capturing depth information of the real-time image of the user.
  • the user image received using the camera module [108] comprises the at least one portion indicating at least one gesture of the user in real-time.
  • the processing module [110] is communicably coupled to the expression identification module [116] and the emotion detection module [114] that are supported by Artificial Intelligence (AI) tools and frameworks to respectively identify expressions and emotions of the users while the user is typing any input, or making any gestures during any ongoing communication session.
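To make the data flow between these modules concrete, here is a minimal, hypothetical sketch (the class names, the keyword lexicon and the fusion rule are illustrative assumptions, not the publication's API) of a context analyser that combines the typed input with the identified expression to form an emotional data record; a real implementation would rely on trained AI models rather than keyword matching:

```python
from dataclasses import dataclass

# Tiny keyword lexicon standing in for AI-based text emotion analysis.
TEXT_EMOTION_HINTS = {
    "angry": ["how dare", "shut up", "go away"],
    "happy": ["great", "awesome", "good"],
    "sad": ["not good", "miss you", "sorry"],
}

@dataclass
class Expression:
    label: str        # e.g. "angry face", "happy face"
    intensity: float  # 0.0 .. 1.0 reported by the expression classifier

@dataclass
class EmotionalData:
    emotion_type: str  # e.g. "angry"
    degree: str        # e.g. "medium", "high", "very high"

def analyse_context(user_input: str, expression: Expression) -> EmotionalData:
    """Fuse the typed text and the facial expression into one emotional data record."""
    text = user_input.lower()
    text_emotion = "neutral"
    for emotion, hints in TEXT_EMOTION_HINTS.items():
        if any(hint in text for hint in hints):
            text_emotion = emotion
            break
    # If the face and the text agree, boost the degree; otherwise trust the face.
    face_emotion = expression.label.split()[0]  # "angry face" -> "angry"
    if face_emotion == text_emotion and expression.intensity > 0.7:
        degree = "very high"
    elif expression.intensity > 0.5:
        degree = "high"
    else:
        degree = "medium"
    return EmotionalData(emotion_type=face_emotion, degree=degree)

print(analyse_context("How dare you?", Expression("angry face", 0.9)))
# EmotionalData(emotion_type='angry', degree='very high')
```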
  • the input module [118] provides the at least one user input when the third-party messaging applications are invoked.
  • the at least one user input may include a word, a set of texts, sentences, phrases, voice data, stored images, live images, media etc.
  • the camera module [108] may receive the image of the user in real-time, wherein at least one portion of the user image indicates the at least one gesture of the user in real-time.
  • the at least one gesture is analysed by the expression identification module [116] to identify the expression of the user being conveyed through his/her gesture.
  • the emotion detection module [114] and the expression identification module [116] respectively receive the at least one user input and the at least one portion of the image and continuously analyse the received at least one user input and the at least one portion of the image to identify any change in the expression being conveyed by the user.
  • any change in the at least one user input or in the at least one gesture is simultaneously detected by the processing module [110] while receiving the at least one user input and the at least one portion of the image of the user.
  • the change in the at least one user input and the at least one gesture of the image of the user corresponds to a change in the mood of the user.
  • for example, when the user inputs the sentence “My trip was not good” , the emotion detection module [114] analyses that the user’s mood is ‘sad’ . However, when the user changes the input by deleting the word ‘not’ so that the sentence reads “My trip was good” , the emotion detection module [114] identifies that the user’s mood is ‘happy’ .
  • the changes can also be recorded in the event of any changes in the user’s gestures being made in real-time.
  • for example, the user may change the facial expression from an angry face to a happy expression. Therefore, any change in the expression is detected and accordingly analysed by the expression identification module [116] , which may be supported by artificial intelligence tools and mechanisms.
  • the expression identification module [116] is capable of identifying any change in the expression and accordingly determining the current mood of the user.
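A minimal sketch of this change tracking, assuming a simple callback-driven re-evaluation on every edit (the helper names and the toy classifier are hypothetical, not the publication's algorithm):

```python
from typing import Callable

def make_mood_tracker(classify: Callable[[str], str],
                      on_change: Callable[[str], None]) -> Callable[[str], None]:
    """Return a function that re-classifies the mood whenever the typed text changes
    and fires on_change only when the detected mood actually differs."""
    last_mood = {"value": None}

    def observe(current_text: str) -> None:
        mood = classify(current_text)
        if mood != last_mood["value"]:
            last_mood["value"] = mood
            on_change(mood)

    return observe

# Toy classifier standing in for the emotion detection module.
def toy_classify(text: str) -> str:
    return "sad" if "not good" in text.lower() else "happy"

observe = make_mood_tracker(toy_classify, lambda mood: print("mood changed to:", mood))
observe("My trip was not good")  # mood changed to: sad
observe("My trip was good")      # mood changed to: happy
```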
  • the processing module [110] also invokes the camera module [108] for receiving the image feed to be analyzed and processed and subsequently interacts with the expression identification module [116] for extracting the data and generating an expression-based system event.
  • the at least one user input is received and the at least one expression associated with the at least one gesture of the user is identified.
  • the at least one emotional data is identified by the emotion detection module [114] .
  • the processing module [110] determines the at least one input content by analyzing and processing the emotional data determined from the at least one user input along with the at least one expression.
  • the processing module [110] also requests the emotion detection module [114] to obtain the type and degree of the at least one emotional data.
  • the type of the at least one emotional data may include Happy, Sad, Angry, Sleepy etc.
  • the degree of the emotional data type may include the intensity of the mood type, such as Low, Medium, High, Very High, etc.
  • the type and degree of the at least one emotional data may be processed further to accurately determine the at least one input content that can subsequently be recommended to the user.
  • the user is typing “How dare you? ” and simultaneously, the camera module [108] captures the image of the user with an angry face.
  • the expression of the user is “angry face” .
  • the “angry face” expression along with the “How dare you? ” input are analysed together to determine an emotion data.
  • the emotional data may therefore be determined as “very angry” in this case.
  • the at least one input content suggested to the user may be a red-faced emoticon indicating an angry expression.
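As a purely illustrative sketch of this last step, mapping the detected (type, degree) pair onto a recommended input content such as an emoticon (the table below is a hypothetical example, not a mapping defined in the publication):

```python
# Hypothetical (emotion type, degree) -> recommended content table.
RECOMMENDATION_TABLE = {
    ("angry", "very high"): "\U0001F621",  # red, pouting face
    ("angry", "medium"): "\U0001F620",     # angry face
    ("happy", "high"): "\U0001F604",       # grinning face with smiling eyes
    ("sad", "high"): "\U0001F622",         # crying face
}

def recommend_content(emotion_type: str, degree: str) -> str:
    """Pick an input content for the detected emotional data, with a neutral fallback."""
    return RECOMMENDATION_TABLE.get((emotion_type, degree), "\U0001F610")  # neutral face

print(recommend_content("angry", "very high"))  # red-faced (pouting) emoticon
```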
  • the data managing module [102] is configured to manage the on-device data pertaining to the communication sessions conducted by the user through any electronic communication device(s).
  • the data managing module [102] may be located on the device.
  • the data managing module [102] performs the function of managing the data, including storing the data, formatting any text inputs, etc. It also stores any pre-processed data and performs the specific transactions of any data between corresponding modules within the system [100] .
  • the data managing module [102] also receives and stores the profile of the user as well as the contact information of a plurality of the user’s contacts. Each of the user’s contacts may have a certain degree of affinity with the user, and based on the degree of affinity, the user may use different types of contents while inputting any message during a communication session.
  • the data managing module [102] uses the users’ profile and contact information to accordingly manage the data pertaining to the communication sessions.
  • the data managing module [102] also performs formatting of the at least one user input based on at least one of: the emotional data, the at least one expression and the at least one user input.
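The formatting step could, for instance, wrap the typed text in simple style decorations derived from the emotional data; the sketch below is one hypothetical way to do it and is not taken from the publication:

```python
def format_input(text: str, emotion_type: str, degree: str) -> str:
    """Apply a simple, emotion-dependent text decoration (illustrative only)."""
    if emotion_type == "angry" and degree in ("high", "very high"):
        return text.upper() + "!!"    # shout the message
    if emotion_type == "happy":
        return text + " \U0001F60A"   # append a smiling face
    if emotion_type == "sad":
        return text + " ..."          # trail off
    return text

print(format_input("how dare you", "angry", "very high"))  # HOW DARE YOU!!
```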
  • the data managing module [102] is communicatively coupled to the profile managing module [104] for receiving the profile information of the user. Further, the profile managing module [104] also provides the user profile information to various third-party applications.
  • the profile managing module [104] uses various data, for example, call log data, message content data, and other types of data to determine and create the user profile with respect to other senders or receivers.
  • the profile information can also be used to customize the formatting options and accordingly generate personalized content for the user’s specific contacts and friends list.
  • the profile information can also be used to filter any content with respect to the specific contacts and friends list.
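A small sketch of such affinity-based filtering, assuming a per-contact affinity score kept by the profile managing module (the threshold, the fields and the "formality" attribute are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class ContentOption:
    content: str
    formality: str  # "casual" or "formal" (hypothetical attribute)

def filter_for_contact(options: list[ContentOption], affinity: float) -> list[ContentOption]:
    """Offer casual content (playful stickers, memes) only to close contacts;
    restrict low-affinity contacts such as business colleagues to formal content."""
    if affinity >= 0.7:
        return options  # close friends and family: show everything
    return [o for o in options if o.formality == "formal"]

options = [ContentOption("\U0001F923", "casual"), ContentOption("\U0001F44D", "formal")]
print([o.content for o in filter_for_contact(options, affinity=0.3)])  # thumbs-up only
```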
  • the dynamic content module [106] is coupled to the data managing module [102] and the messaging application module [112] through the profile managing module [104] .
  • the dynamic content module [106] dynamically searches for any data or content from any in-built storage module or database of the local device, and provides the same to the profile managing module [104] as well as to the data managing module [102] .
  • the searched data is further used by the messaging application module [112] in the communication sessions being conducted between the users.
  • the dynamic content module [106] also searches for any online data from various network servers located across any local or remote networks, for example; LAN, WAN, the Internet etc.
  • the system [100] is configured to receive at least one user input in real-time and also a real-time image of the user, wherein at least one portion of the image indicates gestures of the user.
  • the expression identification module [116] identifies at least one expression associated with the at least one gesture of the user.
  • the emotion detection module [114] identifies at least one emotional data based on the at least one expression and the at least one user input. The emotional data is used to determine the at least one input content that is thereby recommended to the user via a display module. The user is also prompted to select and use the at least one input content in combination with the at least one user input, during the communication session.
  • the display module comprises various elements including at least one of: a touch screen, any display screen, a graphical user interface module, etc.
  • Fig. 2 is a block diagram illustrating the system [100] elements for providing expression identification and tracking, in accordance with exemplary embodiments of the present disclosure.
  • the messaging application module [112] initiates the communication session for the user, and invokes the processing module [110] by sending a request to analyze the user inputs. Subsequently, the processing module [110] seeks inputs from the camera module [108] and once the processing module [110] receives the image of the user, the processing module [110] transmits the image of the user to the expression identification module [116] .
  • the image captured by the camera module [108] includes at least one portion of the user image that indicates at least one gesture of the user in real-time.
  • the expression identification module [116] continuously tracks the at least one gesture of the user as indicated by the user image, and analyses the same to identify the expression of the user being conveyed through his/her gesture. The expression identification module [116] further analyses any change in the expression of the user. The expression identification module [116] also identifies the types and degrees of the expression, for example happy, sad, angry, etc., and very happy, very sad, notably angry, etc.
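One way to turn a raw classifier confidence into the type-and-degree labels mentioned above is simple bucketing; the following sketch assumes a hypothetical classifier output of (label, score) and is not the publication's algorithm:

```python
def expression_with_degree(label: str, score: float) -> str:
    """Bucket a classifier confidence into a degree qualifier for the expression label."""
    if score >= 0.9:
        degree = "very"
    elif score >= 0.7:
        degree = "notably"
    elif score >= 0.4:
        degree = "somewhat"
    else:
        degree = "slightly"
    return f"{degree} {label}"

print(expression_with_degree("angry", 0.75))  # notably angry
print(expression_with_degree("happy", 0.95))  # very happy
```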
  • the messaging application module [112] is an application framework that along with the camera module [108] and the processing module [110] , supports execution of various software applications including at least one messaging application or any other software applications.
  • the messaging application module [112] may support a gaming application installed in the user device in which the user as a game player, may express different expressions while playing a game and accordingly sends to the other players, the messages related to the ongoing game.
  • the user may also express their emotions by using the input module [118] comprising at least one of: a keyboard, a mouse, a joystick, etc.
  • Fig. 3 is a block diagram illustrating the system [100] elements for identifying emotional data, in accordance with exemplary embodiments of the present disclosure.
  • the processing module [110] provides the received inputs, viz the at least one user input and the at least one portion of the user image to the emotion detection module [114] .
  • the emotion detection module [114] continuously tracks the user inputs received via the input module [118] and the camera module [108] , to identify any change in the expression being conveyed through said inputs.
  • the emotion detection module [114] analyses the change and accordingly identifies the current emotion of the user in the form of at least one emotional data.
  • the processing module [110] requests the emotion detection module [114] to provide the emotional data including the type and degree of the user’s emotion.
  • the emotion detection module [114] accordingly sends the at least one emotional data to the processing module [110] for determining the at least one input content that may be thereafter recommended to the user.
  • the at least one input content for example: an emoticon or a smiley, is used by the user during the communication sessions being conducted by the messaging application module [112] .
  • Fig. 4 is a block diagram illustrating the system [100] elements performing actions based on available history and profile information, in accordance with exemplary embodiments of the present disclosure.
  • the profile managing module [104] , the data managing module [102] and the dynamic content module [106] interact with each other to manage the data pertaining to the communication sessions between the users. The data is managed based on profile information and the history of user actions and call logs during several communication sessions.
  • the messaging application module [112] interacts with the profile managing module [104] , the data managing module [102] and the dynamic content module [106] for obtaining the profile information and call log history of the user, and any information regarding any updated profile information and degree of affinity with the other users.
  • the processing module [110] updates the at least one input content and recommends the same to the user. Therefore, the user may be recommended the at least one input content based on the available history and profile information.
  • Fig. 5a, 5b and 5c illustrate a scenario wherein the user is given an indication of the real-time image being fully or partially captured by the camera module [108] , in accordance with exemplary embodiments of the present disclosure.
  • An electronic communication device [506] having a camera module [108] , a display screen [508] and a user interface [510] is shown in Figures 5a, 5b and 5c.
  • the at least one portion of an image [502] of the user is received by the camera module [108] .
  • the at least one portion indicates at least one gesture of the user in real-time.
  • Figure 5a illustrates the at least one gesture of the user being indicated by the facial expression of the user.
  • the expression identification module [116] identifies the at least one expression associated with the at least one gesture of the user.
  • the expression of the user may not be detected.
  • the face of the user is not fully covered by the shaded region [504] .
  • the shaded region [504] indicates the coverage of the user body part by the camera module [108] .
  • the face of the user is not adequately captured by the camera module [108] . Therefore, in such an event the camera module [108] may not detect the user’s expression.
  • an indication is provided on the display screen [508] as to whether or not the at least one portion of the real-time image [502] is being adequately captured by the camera module [108] .
  • the user can accordingly adjust the communication device [506] to a suitable angle or control the camera angle such that an adequate portion of the image is covered by the camera module [108] and thereby trigger the process of recommendation of input content by the processing module [110] .
  • Figure 5b shows an exemplary scenario, wherein depending upon the coverage region [504] of face area, the user is facilitated to control the initiation and execution of the features as disclosed in the proposed invention.
  • the camera module [108] tries to capture the face of the user indicating at least one gesture of the user in real-time.
  • the user may also make the at least one gesture through his/her at least one body part including facial expressions, hand movements, finger movements, etc.
  • the at least one gesture captured by the camera module [108] is used to detect any expression of the user.
  • the camera does not fully capture the at least one body part of the user, i.e. his face.
  • the camera module [108] is unable to detect the expression of the user as the complete face of the user is not captured.
  • the user is given a first indication [512] via the display screen [508] or the display module that the complete face of the user is not being detected by the camera, and the user is also prompted through the GUI (graphical user interface) to adjust the camera to an accurate position or angle, to enable the identification of accurate emotions of the user. If the user moves the face while typing the user input (to cover the whole face for expression detection) , then the emotion-based formatting can be performed accurately without breaking the typing flow. This provides an advantage over the existing systems and methods which require user intervention for formatting the text inputs manually.
  • Figure 5c shows complete coverage of the face of the user that enables the identification of accurate emotions of the user.
  • a second indication [514] is given to the user for complete coverage of the at least one portion of the user image that is showing any gesture to indicate the current emotion of the user. Accordingly, the user types the inputs that are further analysed along with the expression of the face, and the real time input based on the user’s emotion is thereby recommended to the user.
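The adequacy check behind the first and second indications can be expressed as a coverage test on the detected face region; the sketch below assumes a hypothetical face detector that returns a bounding box and simply compares the visible area of that box against its full area (the threshold and message strings are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int

def visible_fraction(face: Box, frame_width: int, frame_height: int) -> float:
    """Fraction of the detected face bounding box that lies inside the camera frame."""
    face_area = max(0, face.right - face.left) * max(0, face.bottom - face.top)
    if face_area == 0:
        return 0.0
    vis_w = max(0, min(face.right, frame_width) - max(face.left, 0))
    vis_h = max(0, min(face.bottom, frame_height) - max(face.top, 0))
    return (vis_w * vis_h) / face_area

def coverage_indication(face: Box, frame_width: int, frame_height: int) -> str:
    """Return the on-screen hint: prompt to adjust the camera, or confirm full coverage."""
    if visible_fraction(face, frame_width, frame_height) < 0.95:
        return "Adjust the camera: your face is not fully visible"   # first indication
    return "Face detected: expressions are being tracked"            # second indication

# Face partially outside a 480x640 preview frame.
print(coverage_indication(Box(left=300, top=-80, right=520, bottom=200), 480, 640))
```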
  • Fig. 6 illustrates a scenario wherein the user is prompted to use an updated input based on any change in the emotional data, in accordance with exemplary embodiments of the present disclosure.
  • the user is typing a text with multiple types of emotions in a single text while also expressing different emotions.
  • the change in the expression is accordingly analysed by the expression identification module [116] in real-time and the updated at least one input content is thereby determined.
  • the user is automatically and dynamically prompted to use the updated at least one input content.
  • the expression identification module [116] and the emotion detection module [114] continuously track the gesture of the user and the at least one user input in real-time to identify any changes and subsequently determine the updated at least one input content for the user.
  • the invention encompasses dynamic segmentation of the received user inputs for providing a smooth analysis of multiple emotions expressed by the user through the text input along with the gestures, without breaking the experience of the user during the communication session.
  • Fig. 7 illustrates a scenario wherein the user is prompted to select from multiple options of the at least one input content based on the changes detected in the emotional data, in accordance with exemplary embodiments of the present disclosure.
  • the multiple emotion content may include various options such as image, GIF or video etc. that can have two or more segments depicting different layers of expression.
  • one GIF content may begin with a funny part and end with an angry part.
  • an image may depict a conversation where the top conversation in the image may be sad, followed by the bottom part of the image ending with an angry conversation.
  • the user can use a one-finger swipe gesture over the side icon to view single-emotion content, or a two-finger swipe gesture to view multiple-emotion content from the side icon.
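A toy dispatcher for the side-icon gesture described above (the event shape, i.e. a bare finger count, is an assumption made purely for illustration):

```python
def handle_side_icon_swipe(finger_count: int) -> str:
    """Map the swipe gesture on the side icon to the content view to open."""
    if finger_count == 1:
        return "open single-emotion content"
    if finger_count == 2:
        return "open multiple-emotion content"
    return "ignore gesture"

print(handle_side_icon_swipe(2))  # open multiple-emotion content
```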
  • Fig. 8 is a flowchart illustrating the method for dynamically recommending at least one input content to a user, in a communication session, over a communication network, in accordance with exemplary embodiments of the present disclosure.
  • at least one user input is received in real-time.
  • the at least one user input includes but is not limited to a text input, a speech input, a video input and an image input.
  • at least one portion of an image of the user is also received, wherein the at least one portion indicates at least one gesture of the user in real-time.
  • the at least one gesture includes but is not limited to facial expression and a behaviour pattern of the user.
  • a processing module [110] is configured to receive the at least one user input and the at least one portion of the user image via the input module [118] and the camera module [108] respectively.
  • At step 804 at least one expression associated with the at least one gesture of the user is identified.
  • the at least one expression is identified by the expression identification module [116] that is communicably coupled to the processing module [110] .
  • the processing module [110] is also communicably coupled to the emotion detection module [114] that is configured to identify at least one emotional data based on the at least one expression and the at least one user input.
  • the emotional data pertains to at least one type of human emotion, wherein the at least one type of human emotion has at least one degree.
  • the at least one user input may also be formatted based on at least one of: the emotional data, the at least one expression and the at least one user input.
  • the formatting of the at least one input may also be based on the degree of affinity of the user with the other users. For example, the user may use different contents while communicating with his/her family and friends or with business colleagues.
  • the information pertaining to the degree of affinity may be stored by the data managing module [102] and the profile managing module [104] .
  • an indication is also provided to the user, via the display module, whether or not the at least one portion of the real-time image is being adequately captured by the camera module [108] . In the event the camera module [108] is unable to capture the gesture of the user, the user may adjust the angle of his/her device such that the at least one portion of the user image, that indicates the gesture of the user, may be adequately captured and inputted to the processing module [110] .
  • the at least one input content is determined for the at least one user input, based on the at least one emotional data.
  • the at least one input content is determined by the processing module [110] .
  • the at least one input content includes, but is not limited to, an icon, an emoticon, a video, an audio, a graphic interchange format (GIF) content, and an image.
  • the at least one input content is recommended to the user via the display module, to select and use it in combination with the at least one user input, during the communication session.
  • the user is further recommended to replace the at least one user input with the at least one input content, in the event the user selects the at least one input content that is being displayed.
  • the at least one input content is recommended to the user while the user is typing the at least one user input in at least one text message during the communication session.
  • the at least one gesture of the user is continuously tracked in real-time to identify any changes. Accordingly, the emotional data is updated based on the changes identified in the at least one gesture and the user is recommended an updated at least one content based on the updated emotional data.
  • the present invention encompasses the recommendation of the at least one input content further based on segmentation of the user inputs received from the user, as illustrated in the sketch following these steps.
  • a beginning and an end of the at least one gesture received from the user is marked.
  • the at least one user input is automatically segmented to create one or more segments.
  • a corresponding emotional data is identified for each segment of the one or more segments based on the at least one gesture and the at least one user input.
  • the user is recommended the at least one input content based on the corresponding emotional data for each segment of the at least one user input.
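A compact sketch of the segmentation flow described in the preceding steps, assuming gesture events are time-aligned to character offsets in the typed text (the event format, the emoji table and the per-segment mapping are illustrative assumptions, not the publication's method):

```python
from dataclasses import dataclass

@dataclass
class GestureEvent:
    start: int       # character offset where the gesture began
    end: int         # character offset where the gesture ended
    expression: str  # e.g. "happy", "angry"

EMOJI = {"happy": "\U0001F604", "angry": "\U0001F620", "sad": "\U0001F622"}

def recommend_per_segment(text: str, gestures: list[GestureEvent]) -> list[tuple[str, str]]:
    """Split the user input at gesture boundaries and pair each segment with a
    recommended content derived from the gesture observed while it was typed."""
    recommendations = []
    for g in gestures:
        segment = text[g.start:g.end].strip()
        if segment:
            recommendations.append((segment, EMOJI.get(g.expression, "\U0001F610")))
    return recommendations

text = "The movie started out so funny but the ending made me furious"
gestures = [GestureEvent(0, 30, "happy"), GestureEvent(30, len(text), "angry")]
for segment, content in recommend_per_segment(text, gestures):
    print(segment, "->", content)
```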
  • Fig. 9 illustrates the framework [900] supporting the execution of the exemplary embodiments of the present invention.
  • the system and the method of the present invention may be implemented on various compatible frameworks such as the Android framework, the iOS framework, etc., that may include in-built AI chips (NPU, Neural Processing Unit) and faster machine learning technologies.
  • the expression detection and emotion identification may be coupled to the AI service framework, the depth camera and one or more Dot projectors provided in the communication devices.
  • the implementation of the AI framework facilitates easy identification of minor and subtle changes in the expression of the user, for example, minor changes in the face movement or facial expression of the user.
  • the implementation of the AI framework also facilitates better accuracy in tracking the user’s expression.
  • the various modules as disclosed herein, including the processing module [110] may be associated with at least one processor configured to perform data processing, input/output processing, and/or any other functionality that enables the working of the system [100] in accordance with the present disclosure.
  • a “processor” refers to any logic circuitry for processing instructions.
  • a processor may be a special purpose processor or a plurality of microprocessors, wherein one or more microprocessors may be associated with at least one controller, a microcontroller, Application Specific Integrated Circuits (ASICs) , Field Programmable Gate Array (FPGA) circuits, and any other type of integrated circuit (IC) , etc.
  • the at least one processor may be a local processor present in the vicinity of the system [100] .
  • the at least one processor may also be a processor at a remote location that processes the method steps as explained in the present disclosure.
  • the processor is also configured to fetch and execute computer-readable instructions and data stored in a memory or a data storage device.
  • the database may be implemented using a memory, any external storage device, an internal storage device for storing instructions to be executed, any information, and data, used by the system [100] to recommend the input options to a user during a communication session.
  • a “memory” or “repository” refers to any non-transitory media that stores data and/or instructions that cause a machine to operate in a specific manner.
  • the memory may include a volatile memory or a non-volatile memory.
  • Non-volatile memory includes, for example, magnetic disk, optical disk, solid state drives, or any other storage device for storing information and instructions.
  • Volatile memory includes, for example, a dynamic memory.
  • the memory may be a single or multiple, coupled or independent, and encompasses other variations and options of implementation as may be obvious to a person skilled in the art.
  • the processor, memory, and the system [100] are interconnected to each other, for example, using a communication bus.
  • the “communication bus” or a “bus” includes hardware, software and communication protocol used by the bus to facilitate transfer of data and/or instructions.
  • the communication bus facilitates transfer of data, information and content between these components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to a system [100] and a method for dynamically recommending at least one input content to a user in a communication session, over a communication network. At least one user input and at least one gesture of the user is received in real‐time. Thereafter, at least one emotional data is identified based on: at least one expression associated with the at least one gesture of the user, and the at least one user input. Based on the at least one emotional data, the at least one input content is determined and thereby recommended to the user to use the same during the communication session.

Description

SYSTEM AND METHOD FOR DYNAMICALLY RECOMMENDING INPUTS BASED ON IDENTIFICATION OF USER EMOTIONS

TECHNICAL FIELD
The present invention relates generally to management of user inputs in electronic communication sessions, and more particularly, to a system and a method for dynamically recommending input contents based on identification of user emotions.
BACKGROUND
This section is intended to provide information relating to the general state of the art and thus any approach/functionality described hereinbelow should not be assumed to be qualified as prior art merely by its inclusion in this section.
With advanced development of electronic communication devices, it has become possible for humans to establish communication sessions, via various calling and messaging applications, and convey their emotions while communicating with each other. The various calling and/or messaging applications or ‘apps’ are either pre‐installed in said electronic devices or can be downloaded and installed according to user requirements and preferences.
A typical electronic communication device, hereinafter ‘device’ , such as a mobile phone, a tablet device, a smart phone, a laptop etc., through the various ‘apps’ , facilitates the user (s) to express his/her emotions by providing various options including images, videos, graphical images such as GIF, texts, icons etc. These options can be inputted by the users during any communication sessions according to their requirements and preferences. For example, a smart phone may be equipped with one or more electronic messaging applications that may be configured to typically receive the user inputs in the form of texts, images and/or icons for facilitating the user to thereby convey their emotions to other users or recipients.
While the conventional communication methods and systems provide the aforementioned options to the users for conveying their emotions during any communication session, the users are typically required to manually insert the inputs to provide their emotional information. A major drawback of such systems is that the options available within the electronic messaging applications are often unable to communicate emotional information that accurately describes the intent and mood of the users, and hence, such conventional communication methods and systems require user intervention for effectively communicating the intended emotions of the user to the other users.
The above‐mentioned limitation of the conventional systems and methods is overcome by automatically providing the options based on the emotional information being detected from the users’ text‐based or voice‐based inputs. However, in such systems, the automatic options are not displayed to the user until a complete text‐based or voice‐based input, e.g. a complete sentence via text or speech, is received from the user. Thus, these systems are able to extract the emotional data only after the inputted sentence is completed by the user. This creates a break in the communication experience of the users, and in order to achieve better accuracy of emotion detection, the user must pause before completing the whole conversation for the right amount of text to be captured by the system.
United States patent publications US20170147202 and US20030110450 disclose solutions for expressing emotion information in a text message, based on the keyboard inputs and voice/audio inputs received from a user. However, text input parameters and voice input parameters such as typing speed and voice pitch may allow detection of only high-intensity emotions, and hence all types and all degrees of emotion are difficult to predict from the text and voice inputs. These types of inputs therefore require an additional input parameter of users’ gestures in order to accurately detect the emotion of the user in real time.
In various existing methods and systems for expressing emotions in users’ text messages, deep learning techniques are also used that detect emotional contexts within phrases or sets of text being inputted by the user. For example, the phrases “Go Away! ” and “Shut Up” etc. can be identified, by the state‐of‐the‐art deep learning techniques, as having an anger element, even if the word “anger” or any of its synonyms has not been used in said phrases. However, only a limited number of phrases pertaining to limited types of emotions can be detected by the deep‐learning techniques, which may not detect all kinds of emotional contexts with accuracy.
Therefore, there is a need to provide a solution to the above problem for automatically detecting the users’ emotions by analysing the inputs received from the users in real‐time, and managing the received inputs therein for providing accurate input options to the users to express their emotions during a communication session.
SUMMARY
This section is provided to introduce certain aspects of the present invention in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
In view of the afore‐mentioned drawbacks and limitations of the prior art, it is an object of the present invention to provide methods and systems for automatically analysing the inputs received from the users in real‐time. Another object of the invention is to manage the received user inputs in a manner such that accurate input options are provided to the users to express their emotions in any text messages during a communication session. Another object of the invention is to receive real‐time images having at least one gesture of the user. Yet another object of the present invention is to detect emotional data from the received inputs and the real‐time images. A further object of the present invention is to recommend to the users accurate input contents based on the emotional data for expressing emotions. Yet another object of the present invention is to facilitate the users to use additional input options for expressing their emotions in the text messages during a communication session.
In view of these and other objects, one aspect of the present invention may relate to a method for dynamically recommending at least one input content to a user, in a communication session, over a communication network, the method comprising the steps of: receiving, via an input module, at least one user input in real‐time; receiving, via a camera module, at least one portion of an image of the user, the at least one portion indicating at least one gesture of the user in real‐time; identifying, using an expression identification module, at least one expression associated with the at least one gesture of the user; identifying, using an emotion detection module, at least one emotional data based on the at least one expression and the at least one user input; determining, using a processing module, the at least one input content, for the at least one user input, based on the at least one emotional data; and recommending to the user, using a display module, to select and use the at least one input content along with the at least one user input, during the communication session.
Another aspect of the invention may relate to a system for dynamically recommending at least one input content to a user in a communication session over a communication network. The system comprises an input module configured to receive at least one user input in real‐time; a camera module, configured to receive at least one portion of an image of the user, the at least one portion indicating at least one gesture of the user in real‐time; an expression identification module configured to identify at least one expression associated with the at least one gesture of the user; an emotion detection module configured to identify at least one emotional data based on the at least one expression and the at least one user input; a processing module configured to determine the at least one input content, for the at least one user input, based on the at least one emotional data; and a display module configured to  recommend to the user, to select and use the at least one input content along with the at least one user input, during the communication session.
Another aspect of the invention may relate to a method and a system for continuously tracking the at least one gesture of the user in real-time to identify any changes; updating the emotional data based on the changes identified in the at least one gesture; and recommending an updated at least one content to the user, based on the updated emotional data.
Another aspect of the invention may relate to a method and a system for marking a beginning and an end of the at least one gesture received from the user; automatically segmenting the at least one user input to create one or more segments based on the beginning and the end of the at least one gesture; identifying a corresponding emotional data for each segment of the one or more segments based on the at least one gesture and the at least one user input; and recommending to the user the at least one input content based on the corresponding emotional data for each segment of the at least one user input.
Another aspect of the invention may relate to a method and a system for providing an indication to the user whether or not the at least one portion of the real‐time image is being adequately captured by the camera module.
BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein and constitute a part of the present invention, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components or circuitry commonly used to implement such components. The connections between the sub-components of a component have not been shown in the drawings for the sake of clarity; therefore, all sub-components shall be assumed to be connected to each other unless explicitly stated otherwise herein.
Fig. 1 illustrates a system architecture [100] for dynamically recommending at least one input content to a user in a communication session over a communication network, in accordance with exemplary embodiments of the present invention.
Fig. 2 is a block diagram illustrating the system [100] elements for providing expression identification and tracking, in accordance with exemplary embodiments of the present invention.
Fig. 3 is a block diagram illustrating the system [100] elements for identifying emotional data, in accordance with exemplary embodiments of the present invention.
Fig. 4 is a block diagram illustrating the system [100] elements performing actions based on available history and profile information, in accordance with exemplary embodiments of the present invention.
Fig. 5 illustrates a scenario wherein the user is given an indication of the real‐time image being captured, in accordance with exemplary embodiments of the present invention.
Fig. 6 illustrates a scenario wherein the user is prompted to use an updated input content based on any change in the emotional data, in accordance with exemplary embodiments of the present invention.
Fig. 7 illustrates a scenario wherein the user is prompted to use multiple input content options based on multiple emotions in a single text content, in accordance with exemplary embodiments of the present invention.
Fig. 8 is a flowchart illustrating the method for dynamically recommending at least one input content to a user in a communication session over a communication network, in accordance with exemplary embodiments of the present invention.
Fig. 9 illustrates the framework supporting the execution of the exemplary embodiments of the present invention.
DETAILED DESCRIPTION
In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, that embodiments of the present invention may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address any of the problems discussed above or might address only one of the problems discussed above.
The present invention encompasses systems and methods for dynamically recommending input contents based on identification of user emotions, during a communication session between users over a communication network. At least one user input is received in real‐time from a user via an input module. Along with the at least one user input, at least one portion of an image of the user is also received using a camera module. The at least one portion of the user image indicates at least one gesture of the user in real‐time. Thereafter, at least one expression associated with the at least one gesture of the user is identified. At least one emotional data is determined based on the at least one expression and the at least one user input. Subsequently, using the at least one emotional data, the at least one input content is determined. The at least one input content is thereafter recommended to the user, for selecting and using the recommended at least one input content along with the at least one user input, during the communication session.
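By way of illustration only, the flow described above may be sketched in Python; the class names mirror the disclosed modules, while the gesture labels, keyword set and content table are hypothetical placeholders rather than the actual identification logic.

```python
# Minimal orchestration sketch of the described flow (illustrative only).

class ExpressionIdentificationModule:
    def identify(self, gesture: str) -> str:
        # map a detected gesture in the image portion to an expression label
        return {"frown": "angry", "smile": "happy"}.get(gesture, "neutral")

class EmotionDetectionModule:
    POSITIVE_WORDS = {"good", "great", "happy"}

    def identify(self, expression: str, user_input: str) -> str:
        # fall back to the typed text when the face gives no clear signal
        if expression != "neutral":
            return expression
        words = set(user_input.lower().split())
        return "happy" if words & self.POSITIVE_WORDS else "neutral"

class ProcessingModule:
    CONTENT = {"angry": "angry emoticon", "happy": "smiling emoticon",
               "neutral": "thumbs-up sticker"}

    def determine(self, emotional_data: str) -> str:
        return self.CONTENT[emotional_data]

# one pass of the pipeline for a single keystroke and camera frame
expression = ExpressionIdentificationModule().identify("smile")
emotion = EmotionDetectionModule().identify(expression, "My trip was good")
print(ProcessingModule().determine(emotion))  # smiling emoticon
```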
As used herein, “hardware” includes a combination of discrete components, an integrated circuit, an application specific integrated circuit, a field programmable gate array, other programmable logic devices and/or other suitable hardware as may be obvious to a person skilled in the art.
As used herein, “software” includes one or more objects, agents, threads, lines of code, subroutines, separate software applications, or other suitable software structures as may be obvious to a skilled person. In one embodiment, software can include one or more lines of code or other suitable software structures operating in a general‐purpose software application, such as an operating system, and one or more lines of code or other suitable software structures operating in a specific purpose software application.
As used herein, “application” or “applications” or “apps” are the software applications residing in respective electronic communication devices and can be either pre‐installed or can be downloaded and installed in said devices. The applications include, but are not limited to, contact management application, calendar application, messaging applications, image and/or video modification and viewing applications, gaming applications, navigational applications, office applications, business applications, educational applications, health and fitness applications, medical applications, finance applications, social networking applications, and any other application. The application uses “data” that can be created, modified or installed in an electronic device over time. The data includes, but is not limited to, contacts, calendar entries, call logs, SMS, images, videos, factory data, emails and data associated with one or more applications.
As used herein, “couple” and its cognate terms, such as “couples” and “coupled” includes a physical connection (such as a conductor) , a virtual connection (such as through randomly assigned memory locations of data memory device) , a logical connection (such as through logical gates of semiconducting device) , other suitable connections, or a combination of such connections, as may be obvious to a skilled person.
As used herein, “electronic communication device” includes, but is not limited to, a mobile phone, a wearable device, smart phone, set‐top boxes, smart television, laptop, a general‐purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computer  implemented electronic device that is capable of making transactions of communication messages or data, as may be known to a person skilled in the art.
As used herein, an 'expression' of the user is detected through facial expressions, hand movements, finger movements, thumb movements, head movements, leg movements, etc. The facial features of the user include movements of the eyes, nose, lips, eyebrows, jaw, etc. The expression can be classified into a particular category by using state-of-the-art classifiers suitable for specific expression recognition.
As used herein, "emotional data" is any data pertaining to the expression of the user and the emotions of the user being inputted through a text, an audio, an icon or any image. The emotional data is determined by analysing the received user input, for example, words, sentences, phrases, GIFs, images, icons, etc., along with the user expression. The emotional data can be classified into a particular type and degree by using predefined categories of human emotions, wherein the predefined categories of human emotions may be stored locally on the device or on one or more remote servers.
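By way of a concrete (and purely hypothetical) illustration, emotional data carrying a type and a degree could be represented as follows; the category and intensity lists merely echo the examples given later in this description (Happy, Sad, Angry, Sleepy; Low, Medium, High, Very High) and are not the predefined categories themselves.

```python
from dataclasses import dataclass
from enum import Enum

class EmotionType(Enum):
    # example categories; the actual predefined set may differ
    HAPPY = "happy"
    SAD = "sad"
    ANGRY = "angry"
    SLEEPY = "sleepy"

class Degree(Enum):
    # example intensities of the emotion type
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4

@dataclass
class EmotionalData:
    emotion_type: EmotionType
    degree: Degree

sample = EmotionalData(EmotionType.ANGRY, Degree.VERY_HIGH)
print(sample)  # EmotionalData(emotion_type=<EmotionType.ANGRY: 'angry'>, degree=<Degree.VERY_HIGH: 4>)
```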
As used herein, a 'depth camera' is a type of camera that is capable of capturing the depth information of any scene or video frame that is being received as an input.
Fig. 1 illustrates a system architecture for dynamically recommending at least one input content to a user in a communication session over a communication network, in accordance with exemplary embodiments of the present disclosure. As shown in Fig. 1, the system [100] comprises a data managing module [102] , a profile managing module [104] , a dynamic content module [106] , a messaging application module [112] , a processing module [110] , a camera module [108] , an input module [118] , an emotion detection module [114] and an expression identification module [116] .
According to the embodiments of the present invention, the messaging application module [112] initiates any communication session between the users. The messaging application module [112] is configured to initiate one or more third-party messaging applications, social networking applications, instant messenger applications, online chat applications running on any portals, etc., that require the users to input text, voice or images and accordingly convey their messages during the respective communication sessions. The messaging application module [112] may also be triggered by the installation of any input devices including a keyboard, a mouse, joysticks, etc. The messaging application module [112] may also be triggered by a touch input received from the user.
In one embodiment of the present invention, the messaging application module [112] is communicably coupled to the processing module [110] and makes a request to the processing module [110] for identifying the expression and emotions of the user. The processing module [110] is coupled to the expression identification module [116] and the emotion detection module [114] , which respectively identify the expression and the emotional data from the inputs being received. According to the embodiments of the present invention, the expression identification module [116] receives at least one portion of the image of the user that indicates at least one gesture of the user. The expression identification module [116] identifies the at least one expression associated with the at least one gesture of the user. Further, the emotion detection module [114] identifies at least one emotional data based on the combination of the at least one expression and the at least one user input. The emotional data is further used by the processing module [110] to determine at least one input content that is recommended to the user via a display module. The at least one input content may include, but is not limited to, an icon, an emoticon, an image, a sticker, a text, a set of texts, a word, a set of words, etc. Therefore, the user may use the recommended at least one input content in the transactions of messages, using the messaging applications being executed on the respective device.
As disclosed above, the messaging application module [112] invokes the processing module [110] by sending a request to analyze the at least one user input and the at least one portion of an image of the user. The processing module [110] thereafter obtains the at least one portion of the image of the user from the camera module [108] and the at least one user input from the input module [118] , and accordingly determines the at least one input content to be recommended to the user based on the emotional data. The processing module [110] therefore acts as a context analyser that analyses and processes the received inputs to identify the emotional context associated with them. As discussed above, the processing module [110] is coupled to the camera module [108] and the input module [118] for receiving the user image and the at least one user input respectively. The camera module [108] may include a depth camera that is capable of capturing depth information of the real-time image of the user.
The user image received using the camera module [108] comprises the at least one portion indicating at least one gesture of the user in real-time. The processing module [110] is communicably coupled to the expression identification module [116] and the emotion detection module [114] , which are supported by Artificial Intelligence (AI) tools and frameworks to respectively identify the expressions and emotions of the user while the user is typing any input or making any gesture during an ongoing communication session.
In various embodiments of the present invention, the input module [118] provides the at least one user input when the third-party messaging applications are invoked. The at least one user input may include a word, a set of texts, sentences, phrases, voice data, stored images, live images, media, etc. Further, the camera module [108] may receive the image of the user in real-time, wherein at least one portion of the user image indicates the at least one gesture of the user in real-time. The at least one gesture is analysed by the expression identification module [116] to identify the expression of the user being conveyed through his/her gesture.
Further, the emotion detection module [114] and the expression identification module [116] respectively receive the at least one user input and the at least one portion of the image, and continuously analyse them to identify any change in the expression being conveyed by the user. In the event of any change in the expression in the received inputs, either via the input module [118] or via the camera module [108] , the processing module [110] detects the change while receiving the at least one user input and the at least one portion of the image of the user. A change in the at least one user input or in the at least one gesture of the image of the user corresponds to a change in the mood of the user. For example, when the user types the message sentence "My trip was not good" , the emotion detection module [114] determines that the user's mood is 'sad' . However, when the user changes the input by deleting the word 'not' so that the sentence reads "My trip was good" , the emotion detection module [114] identifies that the user's mood is 'happy' .
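A toy sketch of this per-edit re-evaluation is given below; the negation-aware keyword rule is an assumption that stands in for whatever trained text-emotion classifier the emotion detection module [114] actually uses, and it is written only to reproduce the "My trip was not good" example.

```python
# Toy re-evaluation of the typed text after each edit (illustrative only;
# a real emotion detection module would use a trained classifier).

def mood_from_text(sentence: str) -> str:
    words = sentence.lower().split()
    positive = any(w in {"good", "great", "nice"} for w in words)
    negated = "not" in words or "never" in words
    if positive and not negated:
        return "happy"
    if positive and negated:
        return "sad"
    return "neutral"

print(mood_from_text("My trip was not good"))  # sad
print(mood_from_text("My trip was good"))      # happy (user deleted "not")
```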
The changes can also be recorded in the event of any change in the user's gestures being made in real-time. For example, the user may change his/her facial expression from an angry face to a happy expression. Any such change in the expression is detected and accordingly analysed by the expression identification module [116] , which may be supported by artificial intelligence tools and mechanisms. According to the embodiments of the present invention, the expression identification module [116] is capable of identifying any change in the expression and accordingly determining the current mood of the user.
In various embodiments of the present invention, the processing module [110] also invokes the camera module [108] for receiving the image feed to be analyzed and processed and, subsequently, interacts with the expression identification module [116] for extracting the data and generating an expression-based system event. Once the at least one user input is received and the at least one expression associated with the at least one gesture of the user is identified, the at least one emotional data is identified by the emotion detection module [114] . Thereafter, the processing module [110] determines the at least one input content by analyzing and processing the emotional data determined from the at least one user input along with the at least one expression. The processing module [110] also requests the emotion detection module [114] to obtain the type and degree of the at least one emotional data. For example, the type of the at least one emotional data may include Happy, Sad, Angry, Sleepy, etc., and the degree of the emotional data type may indicate the intensity of the mood type, such as Low, Medium, High, Very High, etc. The type and degree of the at least one emotional data may be processed further to accurately determine the at least one input content that can subsequently be recommended to the user. For instance, the user is typing "How dare you? " and, simultaneously, the camera module [108] captures the image of the user with an angry face. In such a case, the expression of the user is "angry face" . Further, the "angry face" expression along with the "How dare you? " input are analysed together to determine the emotional data. The emotional data may therefore be determined as "very angry" in this case, and the at least one input content suggested to the user may be a red-faced emoticon indicating an angry expression.
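The following hypothetical sketch illustrates how the typed input and the identified expression could be fused into a (type, degree) pair and mapped to a recommended content; the rules and the content table are example assumptions, not the disclosed classifier.

```python
# Illustrative fusion of the identified expression with the typed input into
# a (type, degree) pair, and mapping that pair to a recommended content.

def emotional_data(expression: str, user_input: str) -> tuple[str, str]:
    text = user_input.lower()
    if expression == "angry" and ("how dare" in text or "!" in text):
        return ("angry", "very high")
    if expression == "angry":
        return ("angry", "medium")
    if expression == "happy":
        return ("happy", "high")
    return ("neutral", "low")

RECOMMENDATION = {
    ("angry", "very high"): "red-faced angry emoticon",
    ("angry", "medium"): "annoyed emoticon",
    ("happy", "high"): "grinning emoticon",
}

etype, degree = emotional_data("angry", "How dare you?")
print(RECOMMENDATION.get((etype, degree), "plain smiley"))  # red-faced angry emoticon
```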
The data managing module [102] is configured to manage the on-device data pertaining to the communication sessions conducted by the user through any electronic communication device (s) . The data managing module [102] may be located on the device. The data managing module [102] performs the function of managing the data, including storing the data, formatting any text inputs, etc. It also stores any pre-processed data and performs the specific transactions of any data between corresponding modules within the system [100] . For example, the data managing module [102] also receives and stores the profile of the user as well as the contact information of a plurality of the user's contacts. Each of the user's contacts may have a certain degree of affinity with the user, and based on the degree of affinity, the user may use different types of contents while inputting any message during a communication session. The data managing module [102] uses the user's profile and contact information to accordingly manage the data pertaining to the communication sessions. The data managing module [102] also performs formatting of the at least one user input based on at least one of: the emotional data, the at least one expression and the at least one user input.
The data managing module [102] is communicatively coupled to the profile managing module [104] for receiving the profile information of the user. Further, the profile managing module [104] also provides the user profile information to various third‐party applications. The profile managing module [104] uses various data, for example, call log data, message content data, and other types of data to determine and create the user profile with respect to other senders or receivers. The profile information can also be used to customize the formatting options and accordingly generate personalized content for the user’s specific contacts and friends list. The profile information can also be used to filter any content with respect to the specific contacts and friends list.
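One possible (assumed) way in which the degree of affinity stored in the profile could filter candidate contents is sketched below; the 0.0-1.0 affinity scale, the 0.7 threshold and the "casual" flag are illustrative assumptions.

```python
# Hypothetical filtering of candidate contents by the contact's degree of
# affinity taken from the user profile (assumed 0.0 - 1.0 scale).

PROFILE = {"alice": {"affinity": 0.9}, "manager": {"affinity": 0.3}}

CANDIDATES = [
    {"content": "winking GIF", "casual": True},
    {"content": "plain thumbs-up icon", "casual": False},
]

def filter_by_affinity(contact: str, candidates: list) -> list:
    affinity = PROFILE.get(contact, {}).get("affinity", 0.5)
    # close contacts get casual content as well; formal contacts only neutral content
    return [c["content"] for c in candidates if affinity >= 0.7 or not c["casual"]]

print(filter_by_affinity("alice", CANDIDATES))    # ['winking GIF', 'plain thumbs-up icon']
print(filter_by_affinity("manager", CANDIDATES))  # ['plain thumbs-up icon']
```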
The dynamic content module [106] is coupled to the data managing module [102] and the messaging application module [112] through the profile managing module [104] . The dynamic content module [106] dynamically searches for any data or content in any in-built storage module or database of the local device, and provides the same to the profile managing module [104] as well as to the data managing module [102] . The searched data is further used by the messaging application module [112] in the communication sessions being conducted between the users. The dynamic content module [106] also searches for any online data from various network servers located across any local or remote networks, for example, LAN, WAN, the Internet, etc.
Thus, the system [100] according to the embodiments of the invention is configured to receive at least one user input in real-time and also a real-time image of the user, wherein at least one portion of the image indicates gestures of the user. The expression identification module [116] identifies at least one expression associated with the at least one gesture of the user. The emotion detection module [114] identifies at least one emotional data based on the at least one expression and the at least one user input. The emotional data is used to determine the at least one input content that is thereby recommended to the user via a display module. The user is also prompted to select and use the at least one input content in combination with the at least one user input during the communication session. In various embodiments of the present invention, the display module comprises various elements including at least one of: a touch screen, any display screen, a graphical user interface module, etc.
Fig. 2 is a block diagram illustrating the system [100] elements for providing expression identification and tracking, in accordance with exemplary embodiments of the present disclosure. The messaging application module [112] initiates the communication session for the user and invokes the processing module [110] by sending a request to analyze the user inputs. Subsequently, the processing module [110] seeks inputs from the camera module [108] and, once the processing module [110] receives the image of the user, it transmits the image of the user to the expression identification module [116] . The image captured by the camera module [108] includes at least one portion of the user image that indicates at least one gesture of the user in real-time. The expression identification module [116] continuously tracks the at least one gesture of the user as indicated by the user image and analyses the same to identify the expression of the user being conveyed through his/her gesture. The expression identification module [116] further analyses any change in the expression of the user. The expression identification module [116] also identifies the types and degrees of the expression, for example, happy, sad, angry, etc., and very happy, very sad, terribly angry, etc.
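An illustrative sketch of such continuous tracking is given below: an expression-based event is raised only when the label identified for consecutive frames changes. The event structure and labels are assumptions for the example.

```python
# Sketch of continuous expression tracking: an expression-based event is
# raised only when the label changes between consecutive frames.

from typing import Iterable

def expression_events(frame_labels: Iterable[str]):
    previous = None
    for label in frame_labels:
        if label != previous:
            yield {"event": "expression_changed", "from": previous, "to": label}
            previous = label

for event in expression_events(["angry", "angry", "happy", "happy", "very happy"]):
    print(event)
# {'event': 'expression_changed', 'from': None, 'to': 'angry'}
# {'event': 'expression_changed', 'from': 'angry', 'to': 'happy'}
# {'event': 'expression_changed', 'from': 'happy', 'to': 'very happy'}
```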
In one embodiment of the present invention, the messaging application module [112] is an application framework that, along with the camera module [108] and the processing module [110] , supports execution of various software applications including at least one messaging application or any other software application. For example, the messaging application module [112] may support a gaming application installed in the user device, in which the user, as a game player, may express different expressions while playing a game and accordingly send to the other players messages related to the ongoing game. The user may also express his/her emotions by using the input module [118] comprising at least one of: a keyboard, a mouse, a joystick, etc.
Fig. 3 is a block diagram illustrating the system [100] elements for identifying emotional data, in accordance with exemplary embodiments of the present disclosure. The processing module [110] provides the received inputs, viz. the at least one user input and the at least one portion of the user image, to the emotion detection module [114] . As explained earlier, the emotion detection module [114] continuously tracks the user inputs received via the input module [118] and the camera module [108] to identify any change in the expression being conveyed through said inputs. In the event of any change being detected, the emotion detection module [114] analyses the change and accordingly identifies the current emotion of the user in the form of at least one emotional data. The processing module [110] requests the emotion detection module [114] to provide the emotional data including the type and degree of the user's emotion. The emotion detection module [114] accordingly sends the at least one emotional data to the processing module [110] for determining the at least one input content that may thereafter be recommended to the user. The at least one input content, for example an emoticon or a smiley, is used by the user during the communication sessions being conducted by the messaging application module [112] .
Fig. 4 is a block diagram illustrating the system [100] elements performing actions based on available history and profile information, in accordance with exemplary embodiments of the present disclosure. As illustrated in the figure, the profile managing module [104] , the data managing module [102] and the dynamic content module [106] interact with each other to manage the data pertaining to the communication sessions between the users. The data is managed based on profile information and the history of user actions and call logs across several communication sessions. Further, the messaging application module [112] interacts with the profile managing module [104] , the data managing module [102] and the dynamic content module [106] for obtaining the profile information and call log history of the user, any updated profile information, and the degree of affinity with the other users. Based on the data and the information provided to the messaging application module [112] , the processing module [110] updates the at least one input content and recommends the same to the user. Therefore, the user may be recommended the at least one input content based on the available history and profile information.
Figs. 5a, 5b and 5c illustrate a scenario wherein the user is given an indication of the real-time image being fully or partially captured by the camera module [108] , in accordance with exemplary embodiments of the present disclosure. An electronic communication device [506] having a camera module [108] , a display screen [508] and a user interface [510] is shown in Figs. 5a, 5b and 5c. As disclosed earlier, the at least one portion of an image [502] of the user is received by the camera module [108] . The at least one portion indicates at least one gesture of the user in real-time.
Figure 5a illustrates the at least one gesture of the user being indicated by the facial expression of the user. The expression identification module [116] identifies the at least one expression associated with the at least one gesture of the user. However, in the event the at least one portion is not adequately captured by the camera module [108] , the expression of the user may not be detected. For example, as shown in Figure 5a, the face of the user is not fully covered by the shaded region [504] . The shaded region [504] indicates the coverage of the user's body part by the camera module [108] . In the example shown in Figure 5a, the face of the user is not adequately captured by the camera module [108] , and therefore the camera module [108] may not detect the user's expression. According to the embodiments of the present invention, an indication is provided on the display screen [508] as to whether or not the at least one portion of the real-time image [502] is being adequately captured by the camera module [108] . The user can accordingly adjust the communication device [506] to a suitable angle or control the camera angle such that an adequate portion of the image is covered by the camera module [108] , thereby triggering the process of recommendation of input content by the processing module [110] .
Figure 5b shows an exemplary scenario wherein, depending upon the coverage region [504] of the face area, the user is facilitated to control the initiation and execution of the features disclosed in the proposed invention. Initially, when the user starts typing the input text via the input module [510] , the camera module [108] tries to capture the face of the user indicating at least one gesture of the user in real-time. The user may also make the at least one gesture through his/her at least one body part including facial expressions, hand movements, finger movements, etc. The at least one gesture captured by the camera module [108] is used to detect any expression of the user. In the scenario shown in Figure 5b, the camera does not fully capture the at least one body part of the user, i.e. his face. Therefore, the camera module [108] is unable to detect the expression of the user as the complete face of the user is not captured. In this scenario, the user is given a first indication [512] via the display screen [508] or the display module that the complete face of the user is not being detected by the camera, and the user is also prompted through the GUI (graphical user interface) to adjust the camera to an accurate position or angle, to enable the identification of accurate emotions of the user. If the user moves the face while typing the user input (to cover the whole face for expression detection) , the emotion-based formatting can be performed accurately without breaking the typing flow. This provides an advantage over the existing systems and methods, which require user intervention for formatting the text inputs manually.
Figure 5c shows complete coverage of the face of the user, which enables the identification of accurate emotions of the user. A second indication [514] is given to the user upon complete coverage of the at least one portion of the user image that shows any gesture indicating the current emotion of the user. Accordingly, the user types the inputs, which are further analysed along with the expression of the face, and the real-time input content based on the user's emotion is thereby recommended to the user.
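A simple, assumed way to approximate the "adequately captured" check is to compare the detected face bounding box with the visible frame; the (x, y, width, height) box format and the 0.85 visibility threshold below are illustrative assumptions.

```python
# Hypothetical adequacy check for the captured face region. The face box is
# assumed to be (x, y, width, height) in pixels.

def coverage_indication(face_box, frame_w, frame_h, threshold=0.85):
    x, y, w, h = face_box
    visible_w = max(0, min(x + w, frame_w) - max(x, 0))
    visible_h = max(0, min(y + h, frame_h) - max(y, 0))
    visible_ratio = (visible_w * visible_h) / float(w * h)
    if visible_ratio >= threshold:
        return "face adequately captured"           # cf. second indication [514]
    return "adjust camera angle to cover the face"  # cf. first indication [512]

print(coverage_indication((500, 100, 300, 300), 640, 480))  # adjust camera angle to cover the face
print(coverage_indication((150, 100, 300, 300), 640, 480))  # face adequately captured
```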
Fig. 6 illustrates a scenario wherein the user is prompted to use an updated input content based on any change in the emotional data, in accordance with exemplary embodiments of the present disclosure. As shown in the figure, the user is typing a text with multiple types of emotions in a single text and is also expressing different emotions. As the user changes his/her expression from one mood to another, the change in the expression is accordingly analysed by the expression identification module [116] in real-time and the updated at least one input content is thereby determined. The user is automatically and dynamically prompted to use the updated at least one input content. The expression identification module [116] and the emotion detection module [114] continuously track the gesture of the user and the at least one user input in real-time to identify any changes and subsequently determine the updated at least one input content for the user. The invention encompasses dynamic segmentation of the received user inputs for providing a smooth analysis of multiple emotions expressed by the user through the text input along with the gestures, without breaking the experience of the user during the communication session.
Fig. 7 illustrates a scenario wherein the user is prompted to select from multiple options of the at least one input content based on the changes detected in the emotional data, in accordance with exemplary embodiments of the present disclosure. Based on the sequence of emotions being detected, different types of content can be filtered to replace the text. The multiple-emotion content may include various options such as an image, a GIF or a video, etc., that can have two or more segments depicting different layers of expression. For example, one GIF content may begin with a funny part and end with an angry part. In another example, an image may depict a conversation where the top conversation in the image may be sad, followed by the bottom part of the image ending with an angry conversation. As shown in the figure, based on the text typed, the user can use a one-finger swipe gesture over the side icon to view single-emotion content, or a two-finger swipe gesture to view multiple-emotion content from the side icon.
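A hypothetical sketch of filtering multi-segment contents by the detected emotion sequence, and of splitting them between the one-finger and two-finger swipe views, is given below; the content library and its emotion sequences are example data.

```python
# Illustrative filter that keeps only the contents whose emotion sequence
# matches the sequence detected in the typed text (example data only).

LIBRARY = [
    {"content": "funny-then-angry GIF", "sequence": ["happy", "angry"]},
    {"content": "sad-then-angry image", "sequence": ["sad", "angry"]},
    {"content": "plain angry emoticon", "sequence": ["angry"]},
]

def match_sequence(detected):
    return [item["content"] for item in LIBRARY if item["sequence"] == detected]

print(match_sequence(["sad", "angry"]))                            # ['sad-then-angry image']
# one-finger swipe: single-emotion items; two-finger swipe: multi-emotion items
print([i["content"] for i in LIBRARY if len(i["sequence"]) == 1])  # ['plain angry emoticon']
print([i["content"] for i in LIBRARY if len(i["sequence"]) > 1])   # ['funny-then-angry GIF', 'sad-then-angry image']
```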
Fig. 8 is a flowchart illustrating the method for dynamically recommending at least one input content to a user, in a communication session, over a communication network, in accordance with exemplary embodiments of the present disclosure. At step 802, at least one user input is received in real-time. The at least one user input includes, but is not limited to, a text input, a speech input, a video input and an image input. Further, at least one portion of an image of the user is also received, wherein the at least one portion indicates at least one gesture of the user in real-time. The at least one gesture includes, but is not limited to, a facial expression and a behaviour pattern of the user. A processing module [110] is configured to receive the at least one user input and the at least one portion of the user image via the input module [118] and the camera module [108] respectively.
At step 804, at least one expression associated with the at least one gesture of the user is identified. The at least one expression is identified by the expression identification module [116] that is communicably coupled to the processing module [110] . Further, the processing module [110] is also communicably coupled to the emotion detection module [114] that is configured to identify at least one emotional data based on the at least one expression and the at least one user input. The emotional data pertains to at least one type of human emotion, wherein the at least one type of human emotion has at least one degree. According to the embodiments of the present invention, the at least one user input may also be formatted based on at least one of: the emotional data, the at least one expression and the at least one user input. The formatting of the at least one user input may also be based on the degree of affinity of the user with the other users. For example, the user may use different contents while communicating with his/her family and friends than while communicating with business colleagues. The information pertaining to the degree of affinity may be stored by the data managing module [102] and the profile managing module [104] . Further, an indication is also provided to the user, via the display module, whether or not the at least one portion of the real-time image is being adequately captured by the camera module [108] . In the event the camera module [108] is unable to capture the gesture of the user, the user may adjust the angle of his/her device such that the at least one portion of the user image, which indicates the gesture of the user, may be adequately captured and inputted to the processing module [110] .
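By way of an assumed illustration, formatting of the typed input based on the emotional data and the affinity with the recipient might look as follows; the specific formatting rules (appending an emoticon, upper-casing) are examples only, not the disclosed formatting scheme.

```python
# Hypothetical formatting of the typed input based on emotional data and
# the affinity with the recipient (rules are illustrative assumptions).

def format_input(text: str, emotion_type: str, degree: str, affinity: float) -> str:
    formatted = text
    if emotion_type == "happy" and affinity >= 0.7:
        formatted += " :-)"                 # casual decoration only for close contacts
    if emotion_type == "angry" and degree in {"high", "very high"}:
        formatted = formatted.upper()       # e.g. emphasise a strongly felt message
    return formatted

print(format_input("how dare you", "angry", "very high", 0.9))   # HOW DARE YOU
print(format_input("my trip was good", "happy", "medium", 0.9))  # my trip was good :-)
print(format_input("my trip was good", "happy", "medium", 0.2))  # my trip was good
```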
At step 806, the at least one input content is determined for the at least one user input, based on the at least one emotional data. The at least one input content is determined by the processing module [110] . According to the embodiments of the present invention, the at least one input content includes, but is not limited to, an icon, an emoticon, a video, an audio, a graphic interchange format (GIF) content, and an image.
At step 810, the at least one input content is recommended to the user via the display module, to select and use in combination with the at least one user input during the communication session. In one embodiment of the present invention, the user is further recommended to replace the at least one user input with the at least one content, in an event the user selects the at least one input content that is being displayed. According to the embodiments of the present invention, the at least one input content is recommended to the user while the user is typing the at least one user input in at least one text message during the communication session. Further, the at least one gesture of the user is continuously tracked in real-time to identify any changes. Accordingly, the emotional data is updated based on the changes identified in the at least one gesture and the user is recommended an updated at least one content based on the updated emotional data.
The present invention encompasses the recommendation of the at least one input content further based on segmentation of the user inputs received from the user. A beginning and an end of the at least one gesture received from the user are marked. Thereafter, based on the beginning and the end of the at least one gesture, the at least one user input is automatically segmented to create one or more segments. A corresponding emotional data is identified for each segment of the one or more segments based on the at least one gesture and the at least one user input. Subsequently, the user is recommended the at least one input content based on the corresponding emotional data for each segment of the at least one user input.
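An illustrative sketch of this segmentation is given below; the assumption that each marked gesture carries the text positions at which it began and ended, and the per-segment suggestion table, are made only for the example.

```python
# Illustrative segmentation of the typed input: each gesture carries the text
# positions at which it began and ended, and those marks split the text into
# segments, each with its own emotional data and suggestion.

def segment_input(text: str, gestures: list) -> list:
    segments = []
    for g in gestures:
        segments.append({
            "text": text[g["start"]:g["end"]],
            "emotion": g["expression"],   # per-segment emotional data
            "suggestion": {"happy": "smiling emoticon",
                           "angry": "angry emoticon"}.get(g["expression"], "plain icon"),
        })
    return segments

typed = "The party was great but the traffic made me furious"
gestures = [
    {"start": 0, "end": 19, "expression": "happy"},           # marked during the first clause
    {"start": 19, "end": len(typed), "expression": "angry"},  # marked during the second clause
]
for s in segment_input(typed, gestures):
    print(s["text"].strip(), "->", s["suggestion"])
# The party was great -> smiling emoticon
# but the traffic made me furious -> angry emoticon
```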
Fig. 9 illustrates the framework [900] supporting the execution of the exemplary embodiments of the present invention. The system and the method of the present invention may be implemented on various compatible frameworks such as the Android framework, the iOS framework, etc., that may include built-in AI chips (NPU - Neural Processing Unit) and faster machine learning technologies. According to the embodiments of the present invention, the expression detection and emotion identification may be coupled to the AI service framework, the depth camera and one or more dot projectors provided in the communication devices. The implementation of the AI framework facilitates easy identification of each of the minor and subtle changes in the expression of the user, for example, minor changes in the face movement or facial expression of the user. The implementation of the AI framework also facilitates obtaining better accuracy in tracking the user's expression.
The features of the present invention as described herein thus offer the following technical advantages over the conventional systems and methods: (1) accurate identification of the emotions of the user while the user is typing the input, (2) usage of the expression of the user as well as the user input for determining the emotions (i.e. emotional data) of the user, (3) segmentation of the user input based on a change in the gesture/expression of the user, (4) recommendation of more than one input content for one user input while the user is typing, and (5) usage of single-swipe and double-swipe gestures for ease of selection of input content by the user.
The various modules as disclosed herein, including the processing module [110] , may be associated with at least one processor configured to perform data processing, input/output processing, and/or any other functionality that enables the working of the system [100] in accordance with the present disclosure. As used herein, a "processor" refers to any logic circuitry for processing instructions. A processor may be a special purpose processor or a plurality of microprocessors, wherein one or more microprocessors may be associated with at least one controller, a microcontroller, Application Specific Integrated Circuits (ASICs) , Field Programmable Gate Array (FPGA) circuits, and any other type of integrated circuit (IC) , etc. The at least one processor may be a local processor present in the vicinity of the system [100] . The at least one processor may also be a processor at a remote location that processes the method steps as explained in the present disclosure. Among other capabilities, the processor is also configured to fetch and execute computer-readable instructions and data stored in a memory or a data storage device.
According to the embodiments of the present invention, the database may be implemented using a memory, any external storage device, or an internal storage device for storing instructions to be executed and any information and data used by the system [100] to recommend the input options to a user during a communication session. As used herein, a "memory" or "repository" refers to any non-transitory media that stores data and/or instructions that cause a machine to operate in a specific manner. The memory may include a volatile memory or a non-volatile memory. Non-volatile memory includes, for example, magnetic disks, optical disks, solid state drives, or any other storage device for storing information and instructions. Volatile memory includes, for example, a dynamic memory. The memory may be single or multiple, coupled or independent, and encompasses other variations and options of implementation as may be obvious to a person skilled in the art.
The processor, memory, and the system [100] are interconnected to each other, for example, using a communication bus. The “communication bus” or a “bus” includes hardware, software and communication protocol used by the bus to facilitate transfer of data and/or instructions. The communication bus facilitates transfer of data, information and content between these components.
While considerable emphasis has been placed herein on the disclosed embodiments, it will be appreciated that changes can be made to the embodiments without departing from the principles of the present invention. These and other changes in the embodiments of the present invention shall be within the scope of the present invention and it is to be understood that the foregoing descriptive matter is illustrative and non‐limiting.

Claims (26)

  1. A method for dynamically recommending at least one input content to a user, in a communication session, over a communication network, the method comprising the steps of:
    ‐ receiving, via an input module [118] , at least one user input in real‐time;
    ‐ receiving, via a camera module [108] , at least one portion of an image of the user, the at least one portion indicating at least one gesture of the user in real‐time;
    ‐ identifying, using an expression identification module [116] , at least one expression associated with the at least one gesture of the user;
    ‐ identifying, using an emotion detection module [114] , at least one emotional data based on the at least one expression and the at least one user input;
    ‐ determining, using a processing module [110] , the at least one input content, for the at least one user input, based on the at least one emotional data; and
    ‐ recommending to the user, using a display module, to select and use the displayed at least one input content in combination with the at least one user input, during the communication session.
  2. The method further comprising: replacing the at least one user input with at least one content in an event the user selects the at least one content.
  3. The method further comprising: changing the formatting of the at least one user input based on at least one of: the emotional data, the at least one expression and the at least one user input.
  4. The method further comprising: suggesting the at least one input content to the user while the user is typing the at least one user input in at least one text message during the communication session.
  5. The method further comprising:
    ‐ tracking continuously the at least one gesture of the user in real‐time to identify any changes;
    ‐ updating the emotional data based on the changes identified in the at least one gesture;
    ‐ recommending an updated at least one content to the user, based on the updated emotional data.
  6. The method further comprising:
    ‐ marking a beginning and an end of the at least one gesture received from the user;
    ‐ automatically segmenting the at least one user input to create one or more segments based on the beginning and the end of the at least one gesture; and
    ‐ identifying a corresponding emotional data for each segment of the one or more segments based on the at least one gesture and the at least one user input; and
    ‐ recommending to the user, the at least one input content based on the corresponding emotional data for each segment of the at least one user input.
  7. The method wherein the at least one user input includes at least one of: a text input, a speech input, a video input and an image input.
  8. The method wherein the at least one input content includes at least one of: an icon, an emoticon, a video, an audio, a graphic interchange format (GIF) content, and an image.
  9. The method wherein the at least one gesture includes but is not limited to facial expression and a behaviour pattern of the user.
  10. The method further comprising: providing on the display screen, an indication to the user whether or not the at least one portion of the real‐time image is being adequately captured by the camera module [108] .
  11. The method wherein the emotional data pertains to at least one type of human emotion, and wherein the at least one type of human emotion is having at least one degree.
  12. The method further comprising the step of filtering, the at least one content based on the emotional data.
  13. A system [100] for dynamically recommending at least one input content to a user in a communication session over a communication network, the system [100] comprising:
    ‐ an input module [118] configured to receive at least one user input in real‐time;
    ‐ a camera module [108] , configured to receive at least one portion of an image of the user, the at least one portion indicating at least one gesture of the user in real‐time;
    ‐ an expression identification module [116] configured to identify at least one expression associated with the at least one gesture of the user;
    ‐ an emotion detection module [114] configured to identify at least one emotional data based on the at least one expression and the at least one user input;
    ‐ a processing module [110] configured to determine the at least one input content, for the at least one user input, based on the at least one emotional data; and
    ‐ a display module configured to recommend to the user, to select and use the at least one input content in combination with the at least one user input, during the communication session.
  14. The system [100] , wherein the processing module [110] is further configured to:
    track continuously the at least one gesture of the user in real‐time to identify any changes;
    update emotional data based on the changes identified in the at least one gesture; and
    recommending an updated at least one content to the user, based on the updated emotional data.
  15. The system [100] , wherein the processing module [110] is further configured to:
    ‐ mark a beginning and an end of the at least one gesture received from the user;
    ‐ automatically perform segmentation of the at least one user input to create one or more segments based on the beginning and the end of the at least one gesture; and
    ‐ identify a corresponding emotional data for each segment of the one or more segments based on the at least one gesture and the at least one user input; and
    ‐ recommend to the user, the at least one input content based on the corresponding emotional data for each segment of the at least one user input.
  16. The system [100] wherein the at least one user input includes at least one of: a text input, a speech input, a video input and an image input.
  17. The system [100] , wherein the at least one input content includes at least one of: an icon, an emoticon, a video, an audio, a graphic interchange format (GIF) content, and an image.
  18. The system [100] , wherein the at least one gesture includes but is not limited to facial expression and a behaviour pattern of the user.
  19. The system [100] , wherein the camera module [108] comprises a depth camera that is capable of capturing depth information of the real‐time image of the user.
  20. The system [100] , wherein the processing module [110] is further configured to provide, using the display module, an indication to the user whether or not the at least one portion of the real‐time image is being adequately captured by the camera module [108] .
  21. The system [100] , wherein the emotional data pertains to at least one type of human emotion, and wherein the at least one type of human emotion is having at least one degree.
  22. The system [100] , the processing module [110] is further configured to, filter the at least one content based on the emotional data.
  23. The system [100] , further comprising a messaging application module [112] configured to facilitate the communication session.
  24. The system [100] , further comprises a data managing module [102] configured to manage at least one on‐device data pertaining to the communication session.
  25. The system [100] , further comprising a profile managing module [104] configured to provide profile information of the user.
  26. The system [100] , further comprising a dynamic content module [106] configured for searching any relevant data wherein the relevant data is stored in at least one local database and at least one online database across a network.
PCT/CN2019/122695 2019-04-10 2019-12-03 System and method for dynamically recommending inputs based on identification of user emotions WO2020207041A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201980095244.3A CN113785539A (en) 2019-04-10 2019-12-03 System and method for dynamically recommending input based on recognition of user emotion

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201911014498 2019-04-10
IN201911014498 2019-04-10

Publications (1)

Publication Number Publication Date
WO2020207041A1 true WO2020207041A1 (en) 2020-10-15

Family

ID=72750481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122695 WO2020207041A1 (en) 2019-04-10 2019-12-03 System and method for dynamically recommending inputs based on identification of user emotions

Country Status (2)

Country Link
CN (1) CN113785539A (en)
WO (1) WO2020207041A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147202A1 (en) * 2015-11-24 2017-05-25 Facebook, Inc. Augmenting text messages with emotion information

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072207A (en) * 2007-06-22 2007-11-14 腾讯科技(深圳)有限公司 Exchange method for instant messaging tool and instant messaging tool
CN102255827A (en) * 2011-06-16 2011-11-23 北京奥米特科技有限公司 Video chatting method, device and system
CN103297742A (en) * 2012-02-27 2013-09-11 联想(北京)有限公司 Data processing method, microprocessor, communication terminal and server
CN104753766A (en) * 2015-03-02 2015-07-01 小米科技有限责任公司 Expression sending method and device
US20180349686A1 (en) * 2017-05-31 2018-12-06 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method For Pushing Picture, Mobile Terminal, And Storage Medium
CN108062533A (en) * 2017-12-28 2018-05-22 北京达佳互联信息技术有限公司 Analytic method, system and the mobile terminal of user's limb action

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023186097A1 (en) * 2022-04-02 2023-10-05 维沃移动通信有限公司 Message output method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN113785539A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
US11983807B2 (en) Automatically generating motions of an avatar
US10311143B2 (en) Preventing frustration in online chat communication
CN110892395B (en) Virtual assistant providing enhanced communication session services
CN110869969B (en) Virtual assistant for generating personalized responses within a communication session
US11138207B2 (en) Integrated dynamic interface for expression-based retrieval of expressive media content
US10984226B2 (en) Method and apparatus for inputting emoticon
US10154071B2 (en) Group chat with dynamic background images and content from social media
CN111107392B (en) Video processing method and device and electronic equipment
US20210365749A1 (en) Image data processing method and apparatus, electronic device, and storage medium
CN110991427B (en) Emotion recognition method and device for video and computer equipment
US20190244427A1 (en) Switching realities for better task efficiency
US11055119B1 (en) Feedback responsive interface
US11928985B2 (en) Content pre-personalization using biometric data
CN107564526B (en) Processing method, apparatus and machine-readable medium
US11169667B2 (en) Profile picture management tool on social media platform
US20190325201A1 (en) Automated emotion detection and keyboard service
US20220092071A1 (en) Integrated Dynamic Interface for Expression-Based Retrieval of Expressive Media Content
JP2022088304A (en) Method for processing video, device, electronic device, medium, and computer program
CN105069013A (en) Control method and device for providing input interface in search interface
US10679042B2 (en) Method and apparatus to accurately interpret facial expressions in American Sign Language
WO2020207041A1 (en) System and method for dynamically recommending inputs based on identification of user emotions
KR20110012491A (en) System, management server, terminal and method for transmitting of message using image data and avatar
JP2018022479A (en) Method and system for managing electronic informed concent process in clinical trial
US20220222955A1 (en) Context-based shape extraction and interpretation from hand-drawn ink input
CN111400443A (en) Information processing method, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19924625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19924625

Country of ref document: EP

Kind code of ref document: A1