AU2021105574A4 - Real-time emotion detection and multimodal interview preparation system - Google Patents
- Publication number
- AU2021105574A4
- Authority
- AU
- Australia
- Prior art keywords
- interview
- candidate
- multimodal
- emotion detection
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Abstract
A real-time emotion detection and multimodal interview preparation cum conduction system analyzes a candidate's essays, resume, expressions and voice. It focuses on textual, audio and video analysis using deep learning, with the goal of detecting emotions through a multimodal approach. The need for a real-time emotion detection and interview preparation/conduction system is satisfied by the usage of textual analysis and sentiment analysis through the text- and video-based modules. The real-time emotion detection and multimodal interview preparation system includes: a textual analysis unit; a sentiment analysis unit; a processor; a camera module; and a display module.
[FIG. 3, text recovered from the drawing: resume review and document verification; the candidate attends the AI-based interview; the deep learning model scores the candidate in real time based on facial expressions, voice, context and correctness; a final report for the candidate is generated; an overall review of the sentiment analysis report is mailed post-interview to the program manager (and candidate) for the final call; based on the recommendations and final review, the candidate is either selected or rejected.]
FIG. 3
[FIG. 4, text recovered from the drawing: per-face emotional reports listing a score for each emotion, e.g., Angry, Disgust, Fear, Surprise and Neutral (Neutral 0.823 for one detected face).]
FIG. 4
Description
The present invention relates to real-time emotion detection and, in particular, to systems for real-time emotion detection.
The invention has been developed primarily for use as a real-time emotion detection and multimodal interview preparation cum conduction system and will be described hereinafter with reference to this application. However, it will be appreciated that the invention is not limited to this particular field of use.
Background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
The multitude of applications under the AI domain is one of the reasons behind its upsurge over the past decade. Various sub-branches like Machine Learning, Data Science and Deep Learning have been formulated for assistance in various domains. The field has introduced concepts like neural networks, which can be used to design lossless audio/video compression algorithms and multiple other functionalities that make computation more efficient. Artificial Intelligence in marketing is a revolutionary step towards catering to the needs of customers in a non-tedious and swift manner; it can be used to monitor customer behavior and analyze patterns in customers' orders. In the field of medicine, Convolutional Neural Networks (CNNs) can be used to build an algorithm that classifies the type of cardiac disorder with the assistance of wearable ECG monitoring devices and, on detection, notifies the hospital of the current situation of the patient. The extensive variance and unpredictability of stock value in a stock market can be toned down through machine learning algorithms created specifically for the purpose, such as classifiers built using linear regression.
The meaning and use of the word "interview" can be traced back to 1921, when Thomas Alva Edison conducted the first job interview: a written examination was attempted by the candidates. While this is when the term "interview" was first coined and used, similar concepts can be dated back much further. In reality, we can trace the process of interviewing someone back to the Middle Ages; manuscripts have been discovered that mimic a modern-day resume to some degree. At the very beginning, there was no concept of "jobs". Instead, the main target was survival, so the men searched for food while the women prepared it. As humans progressed, new industries and abilities were discovered, learned, and passed down from generation to generation. Craftsmen prepared apprentices for work, as evidenced by the remains and documents of ancient Rome, Greece, and Egypt. We can think of the word "apprentice" as being similar to today's interns and internships. When a tradesman did not have children and needed to pass on the family career in some way, these "apprentice work vacancies" were commonly held. As humans learned and the Industrial Revolution began, people gradually started to explore domains that were not similar to the ones handed down by their family members. At the same time, several factories and shops opened in developed countries, which necessitated the start of the interview process.
A number of different types of emotion detection systems are known in the prior art. For example, the following patents are provided for their supportive teachings and are all incorporated by reference.
US8209182B2 An emotion recognition system for assessing human emotional behavior from communication by a speaker includes a processing system configured to receive signals representative of the verbal and/or non-verbal communication. The processing system derives signal features from the received signals. The processing system is further configured to implement at least one intermediate mapping between the signal features and one or more elements of an emotional ontology in order to perform an emotion recognition decision. The emotional ontology provides a gradient representation of the human emotional behavior.
The proposed technology focuses on the analysis of textual, audio and video inputs using deep learning techniques. The technology works on emotion detection through a multimodal approach. This technology acts as an efficient and swift recruitment system based on the character and general aptitude of the candidates.
The above information is presented as background information only, to assist with an understanding of the present disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the present technology.
In view of the foregoing disadvantages inherent in the known types of emotion detection systems now present in the prior art, the present technology provides an improved system. As such, the general purpose of the present technology, which will be described subsequently in greater detail, is to provide a new and improved real-time emotion detection system, along with techniques for the conduction of interviews and recruitment drives, that has all the advantages of the prior art and none of the disadvantages.
It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.
It is an object of the invention in its preferred form to provide a real-time emotion detection and multimodal interview preparation cum conduction system.
The present technology relates to the field of real-time emotion detection and multimodal interview preparation cum conduction systems that analyze one's essays, resume, expression and voice. A deep learning model serves this purpose using three modes of input, that is, text, audio and video.
According to an aspect of the invention in a preferred form, there is provided a real-time emotion detection and multimodal interview preparation system, the system including: a textual analysis unit; a sentiment analysis unit; a processor; a camera module; and a display module.
Preferably, the textual analysis unit is used to analyze the textual information gathered from a resume.
Preferably, the task of the sentiment analysis unit is to capture the facial expression using the camera module, and analyze the emotions.
Preferably, the processor is a 64-bit quad core processor.
Preferably, the display module is an LCD monitor for displaying the video contents.
In view of the foregoing disadvantages inherent in the known types of emotion detection systems now present in the prior art, the present technology provides an improved, cost-effective and user-friendly real-time emotion detection technique. As such, the general purpose of the present technology, which will be described subsequently in greater detail, is to provide an improved technique for the conduction of interviews by analyzing the resumes of candidates, which has all the advantages of the prior art and none of the disadvantages.
The main objective of the proposed technology is to implement and design a real-time emotion detection and interview preparation/conduction system; this objective is satisfied by the usage of textual analysis and sentiment analysis through the text- and video-based modules. This system can be used by candidates to prepare for interviews, especially those conducted in the online medium, and organizations who want to hire candidates based on their skill set can set up parameters suitable for their hiring process. It provides a revolutionary platform to assist both sides in the hiring process and encourages candidates to learn from their mistakes, as a complete report of the interview is provided on the application for introspection. This would also be a good database for surveys and analysis in a large spectrum of fields varying from human psychology to the rate of employment.
Yet another object of the proposed technology is to focus on textual, audio and video analysis using deep learning, the main goal being emotion detection through the multimodal approach mentioned above. This data can be used to run multiple models to determine the parameters necessary for the evaluation of the candidate. With almost 15 lakh students graduating every year in the engineering stream, there is a dire requirement of jobs for many students in order to support their families. This requires an efficient and swift recruitment system based on the character and general aptitude of the candidate. The proposed multimodal interview system would fit seamlessly into the current void present in the recruitment procedure.
Yet another objective of the proposed technology is that the inputs are obtained through a web application. For instance, text mining is used to analyze all the required data from the resumes submitted on the web application, which is then used to create a unique questionnaire. The audio interview is used to recognize patterns in the candidate's conversation and determine traits like confidence, language proficiency, etc. The video input is obtained in MP4 or WAV file format, and the face can be recognized using HOGs (Histograms of Oriented Gradients) or YOLOv3.
Yet another aspect of the proposed technology is that the complicated task of emotion recognition is achieved by capturing regional pixels: the emotion anger is associated with the pixels linked to the eyebrows, while happiness is associated with the pixels linked to the eyes and mouth. Based on these, the current emotion of the candidate can be predicted.
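The last step of this prediction can be illustrated with a minimal sketch: given a per-emotion score report of the kind shown in FIG. 4, the predicted emotion is simply the highest-scoring class. The score values below are illustrative placeholders, not outputs of the actual model.

```python
def dominant_emotion(scores):
    """Return the highest-scoring emotion label from a per-emotion report."""
    return max(scores, key=scores.get)

# Scores shaped like the per-face report in FIG. 4 (values are illustrative).
report = {"Angry": 0.024, "Disgust": 0.0, "Fear": 0.02, "Surprise": 0.01, "Neutral": 0.823}
print(dominant_emotion(report))  # prints: Neutral
```

In practice the scores would come from the deep learning model's softmax output for each detected face.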
Yet another aspect of the proposed technology is that the proposed methodology is cost-effective and saves ample time for both the interviewer and the candidate. The data obtained through the resume is used to categorize the various skills of the candidate. Satisfactory results are obtained for both text-based and video-based sentiment analysis.
In this respect, before explaining at least one embodiment of the technology in detail, it is to be understood that the technology is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
These together with other objects of the technology, along with the various features of novelty which characterize the technology, are pointed out with particularity in the disclosure. For a better understanding of the technology, its operating advantages and the specific objects attained by its uses, reference should be had to the accompanying drawings and descriptive matter in which there are illustrated preferred embodiments of the technology.
The invention will be better understood and objects other than those set forth above will become apparent when consideration is given to the following detailed description thereof. Such description makes reference to the annexed drawings wherein:
FIG. 1 illustrates the steps required to conduct a typical interview in the classical way of a real-time emotion detection and multimodal interview preparation cum conduction system by analyzing one's essays, resume, expression and voice, according to the embodiment herein;
FIG. 2 illustrates the block diagram of the proposed system, according to the embodiment herein;
FIG. 3 illustrates the external flow of an AI-based interview, according to the embodiment herein;
FIG. 4 illustrates the emotion detection analysis, according to the embodiment herein;
FIG. 5 illustrates the perceived emotions of the candidates in graphical and percentage format, according to the embodiment herein; and
FIG. 6 illustrates the comparative graphical representation of emotions, according to the embodiment herein.
In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that the embodiments may be combined, or that other embodiments may be utilized and that structural and logical changes may be made without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and their equivalents.
While the present invention is described herein by way of example using several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is neither intended to be limited to the embodiments or drawings described, nor intended to represent the scale of the various components. Further, some components that may form a part of the invention may not be illustrated in certain figures, for ease of illustration, and such omissions do not limit the embodiments outlined in any way. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed; on the contrary, the invention covers all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. The headings are used for organizational purposes only and are not meant to limit the scope of the description or the claims. As used throughout this description, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Further, the words "a" or "an" mean "at least one", and the word "plurality" means more than one, unless otherwise mentioned. Furthermore, the terminology and phraseology used herein is solely for descriptive purposes and should not be construed as limiting in scope. Language such as "including," "comprising," "having," "containing," or "involving," and variations thereof, is intended to be broad and to encompass the subject matter listed thereafter, equivalents, and any additional subject matter not recited, and is not intended to exclude any other additives, components, integers or steps. Likewise, the term "comprising" is considered synonymous with the terms "including" or "containing" for applicable legal purposes.
Any discussion of documents, acts, materials, devices, articles and the like are included in the specification solely for the purpose of providing a context for the present invention.
In this disclosure, whenever an element or a group of elements is preceded by the transitional phrase "comprising", it is understood that we also contemplate the same element or group of elements with the transitional phrases "consisting essentially of", "consisting of", "selected from the group consisting of", "including", or "is" preceding the recitation of the element or group of elements, and vice versa.
Ever since the Industrial Revolution, candidates who graduated from the newly formed Education System had to undergo a formal test that could assess whether the candidate was fit for the job. The aim of the job interview process is to assist an employer in determining whether you're eligible for the job you're interviewing for, as well as to assist you in determining whether your values and goals match with those of the employer. This process varies by company and job, with some involving several face-to-face interviews and others making a decision after only one meeting.
While interview processes differ depending on the company's policies and procedures, most interviews follow a similar structure, which includes the following steps:
• Pre-screening
• Aptitude Round/First Interview
• Technical Round
• Final Interview with HR
• Decision of the Interview
Pre-Screening: Numerous employers hold an initial interview to determine if you are a suitable candidate for the job. The screening is usually fifteen to twenty minutes long and can be done over the phone or in person. This discussion is used to narrow down the pool of applicants that will be contacted for formal first interviews. This is one of the most crucial steps, since the company should filter through the applicants and retain only the potential candidates.
Aptitude/Technical Round: While most of these rounds are based solely on the marks scored by the candidate in the test, it is also important to note the involvement of HR here in preparing the questions and conducting the exam with decorum.
Final Interview with HR: This is the final decision-making step in the process, which decides whether the candidate who has passed all the previous tests is actually fit for the job.
Now, we have to understand that although the selection process and the way profiles get scrutinized vary based on the organization, the type of job and the qualifications, the core basis of the conduction and the base structure remain, more or less, the same.
Conducting an interview is not an easy task. It is, in fact, one of the most important tasks every company has to complete perfectly, since the incoming employees are the ones who will be doing the work and the ones responsible for the growth of the company. Failing to recruit promising candidates would mean a failure of the company as a whole. This is why every company takes recruiting very seriously: it is an important, and an expensive, task. Even though this process has been followed for a long time, it is not efficient for the current developing world; incorporating new technologies and strategies is a must. Here are a few disadvantages that most traditional companies face.
i. High Expenditure - Companies allocate funds separately under the HR department to conduct the Interviews. It is considered expensive because of the infrastructure, documents and the tasks involved.
ii. Human Resource - A plentiful number of individuals are required to conduct the interviews. This includes most of the employees working aside from their regular tasks for the various stages mentioned above.
iii. Biasing - One of the main facts that we cannot neglect is biasing. Humans are biased in some way or the other. At times, one might favor an individual over another for some reason, even if the latter has more skills than the one that got selected. This human biasing and partiality costs the company a lot in the long run.
iv. Misplacing - It is unfortunate to know that since most of the interviews are conducted in a manual process, at times, because of human errors, the documents or the report of some individuals might get misplaced or lost. This would result in the loss of a potential candidate because of a very simple reason and by the time they realize, it might be too late.
The proposed technology focuses on emotion detection through a multimodal approach that uses multiple models to determine the parameters necessary for the evaluation of the candidates. The proposed multimodal interview system would fit seamlessly into the current void present in the recruitment procedure.
The interview process is segmented into three modules: the text-based module, which uses input data from resumes submitted through the PWA; the audio-based module, which is capable of analyzing voice patterns to determine parameters like confidence and English proficiency; and the video module, which uses deep learning algorithms to detect and categorize complex human emotions like anger, fear, happiness and many more.
Text-based Interview:
• Input is obtained through the Progressive Web Application designed using Flask.
• A predefined template is shared with the candidate and the Resume would be in the same format.
• The process of character classification would be based on the big five psychological traits i.e., Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism. These 5 traits have been psychologically proven to be parameters to determine the character of the person.
• NLTK (Natural Language Toolkit) will be used for the text processing.
• After text processing and tokenization of the various data required to formulate the questionnaire, the interview can be conducted either through text-based questions, or the questions can be asked verbally through gTTS (Google Text-to-Speech).
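The text module's tokenize-and-categorize step can be sketched as follows. This is a simplified stand-in only: the category keyword lists are hypothetical placeholders, and a small regex tokenizer substitutes for NLTK's tokenizers.

```python
import re

# Illustrative category keyword lists; the real system would derive these
# from the predefined resume template rather than hard-code them.
CATEGORIES = {
    "hobbies": {"reading", "chess", "painting"},
    "experience": {"internship", "developer", "engineer"},
    "skills": {"python", "flask", "opencv"},
}

def tokenize(text):
    """Lowercase word tokenization (a toy stand-in for NLTK's word_tokenize)."""
    return re.findall(r"[a-z']+", text.lower())

def categorize_resume(text):
    """Bucket resume tokens into the categories used to build the questionnaire."""
    tokens = set(tokenize(text))
    return {cat: sorted(tokens & words) for cat, words in CATEGORIES.items()}

resume = "Software developer internship; skills: Python, Flask, OpenCV. Hobbies: chess."
profile = categorize_resume(resume)
print(profile["skills"])  # ['flask', 'opencv', 'python']
```

The resulting buckets would then seed the unique questionnaire described above.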
Video-based Interview
• A webcam is launched to obtain the visual data required for sentiment/emotion detection. OpenCV is the preferred package for this module.
• Facial identification can be carried out through the usage of HOGs (Histograms of Oriented Gradients) in OpenCV.
• Currently the most efficient model to follow for emotion detection through visual data is the Xception model, which is based on depthwise separable convolutions. It is preferred because of the minimal number of parameters involved, which in turn helps run the model on Google Colab much more efficiently.
• Various emotions can be detected through the facial landmarks of the candidate, i.e., anger is associated with the facial landmark of the eyebrows, and happiness with the pixels surrounding the facial landmark of the mouth.
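The HOG idea underlying the face detector can be illustrated with a toy sketch: for each cell of a grayscale patch, gradient magnitudes are accumulated into an orientation histogram. This is a conceptual illustration only, not the OpenCV implementation, and the patch values are made up.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one HOG cell.

    `cell` is a small 2-D list of grayscale intensities; border pixels are
    skipped so central differences stay inside the patch.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Fold the orientation into [0, 180) degrees (unsigned gradients).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist

# A patch with a pure vertical edge: all gradient energy is horizontal,
# so everything lands in the first (0-degree) bin.
patch = [[0, 0, 10, 10]] * 4
print(hog_cell_histogram(patch))
```

A real detector concatenates such histograms over a sliding window and feeds them to a classifier; in practice one would use OpenCV's or dlib's built-in detectors rather than this sketch.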
Parts of the proposed system:
The Text Interview:
The main package used for this module is NLTK. The Natural Language Toolkit is a Python package that can be used to work with human language data. It includes libraries for text processing such as stemming, tokenization, and classification. The main set of functions useful for our approach is tokenization. The algorithm needs to traverse the resume and obtain critical information related to the candidate, so all the information obtained is tokenized under various categories like hobbies, current occupation, previous experience and so on.
After all the data is obtained and tokenized, it is used to prepare a unique questionnaire for the candidate. The candidate is judged through multiple formats of questions, including multiple-choice questions and written questions. These written questions play a vital role in the process and provide the algorithm with the necessary data required for character assessment through the answers; as mentioned in the workflow, judgement is conducted based on the big five psychological traits. On obtaining the end result, and considering all the answers given by the candidate, a decision is made by the system based on a pre-defined criterion.
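The character-assessment step can be sketched as keyword scoring of written answers against the big five traits. The trait lexicon below is purely illustrative, since the actual pre-defined criterion is not specified in this document.

```python
# Hypothetical keyword lexicon for the big five traits; the real system's
# scoring criterion is not specified, so this is illustrative only.
TRAIT_LEXICON = {
    "openness": {"curious", "creative", "novel"},
    "conscientiousness": {"organized", "thorough", "deadline"},
    "extraversion": {"team", "presenting", "outgoing"},
    "agreeableness": {"helping", "supportive", "cooperative"},
    "neuroticism": {"stress", "anxious", "worried"},
}

def score_traits(answer):
    """Count trait-lexicon hits in a written answer (toy big-five scoring)."""
    words = set(answer.lower().split())
    return {trait: len(words & lexicon) for trait, lexicon in TRAIT_LEXICON.items()}

answer = "I am a curious and creative person who enjoys presenting to the team"
print(score_traits(answer))  # openness and extraversion score highest here
```

A deployed system would replace the keyword counts with a trained text classifier, but the output shape per trait would be similar.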
2. The Video Interview:
The model used is the Xception model, which uses depthwise separable convolution for facial emotion recognition. Depthwise separable convolution is the basis of the model we plan to use for the video interview. In the traditional method, convolution is applied to all the input channels at once, which can take an immense amount of time in processing and computation. Standard convolution also requires almost 9 times as many multiplications as depthwise separable convolution. It is therefore advantageous to use depthwise separable convolution, which applies convolution to a single input channel at a time.
The complete process of depthwise separable convolution can be subdivided into two phases: the filtering stage (depthwise convolution) and the combination stage (pointwise convolution).
Depthwise Convolution (Filtering stage):
Convolution is applied to a single input channel at a time, as represented in diagram (b). Considering Dk as the height and width of the kernel and M as the number of input channels, the number of parameters we deal with in this stage is Dk² · M.
Pointwise Convolution (Combination stage):
This is represented in the second part of the diagram: the kernel has height and width of one each, with M input channels and N output channels, so the number of parameters in this phase is N · M.
The combination of both phases gives the total number of parameters:

Number of parameters in depthwise separable convolution = M(Dk² + N)

Comparing this with standard convolution:

Number of parameters in depthwise separable convolution / Number of parameters in standard convolution = M(Dk² + N) / (N · Dk² · M) = 1/N + 1/Dk²
Diagram (a): Representation of standard convolution. Diagram (b): Representation of depthwise separable convolution.
In block (b), the first representation signifies the first phase, i.e., the filtering stage, and the second signifies the second phase, i.e., the combination stage.
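The two stages and the parameter ratio above can be checked numerically. The sketch below implements both stages directly in NumPy (valid padding, stride 1); a standard convolution whose kernel is the outer product of the two stage kernels must reproduce the separable output exactly:

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_k):
    """x: (H, W, M); depth_k: (Dk, Dk, M); point_k: (M, N)."""
    H, W, M = x.shape
    Dk = depth_k.shape[0]
    Ho, Wo = H - Dk + 1, W - Dk + 1
    # Filtering stage: one Dk x Dk filter applied per input channel.
    depth_out = np.zeros((Ho, Wo, M))
    for m in range(M):
        for i in range(Ho):
            for j in range(Wo):
                depth_out[i, j, m] = np.sum(
                    x[i:i+Dk, j:j+Dk, m] * depth_k[:, :, m])
    # Combination stage: 1x1 convolution mixing M channels into N.
    return depth_out @ point_k  # shape (Ho, Wo, N)

def standard_conv(x, k):
    """x: (H, W, M); k: (Dk, Dk, M, N) — all channels convolved at once."""
    H, W, M = x.shape
    Dk, _, _, N = k.shape
    Ho, Wo = H - Dk + 1, W - Dk + 1
    out = np.zeros((Ho, Wo, N))
    for n in range(N):
        for i in range(Ho):
            for j in range(Wo):
                out[i, j, n] = np.sum(x[i:i+Dk, j:j+Dk, :] * k[:, :, :, n])
    return out

# Parameter counts for Dk = 3, M = 3, N = 4:
M, N, Dk = 3, 4, 3
sep_params = M * (Dk**2 + N)   # M(Dk² + N) = 39
std_params = N * Dk**2 * M     # N · Dk² · M = 108
```

The ratio 39/108 equals 1/N + 1/Dk² for these values, matching the formula in the text.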
The above procedure is the process followed for facial emotion recognition. OpenCV can be used to analyze the data obtained from the webcam, and using the model, a conclusion about the candidate's mental state or emotions can be reached. This helps in determining the candidate's stress handling by asking certain questions. The end result is a culmination of various factors, and if the candidate falls under a certain threshold, they would not be hired.
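A hedged sketch of that threshold decision, assuming the model outputs logits over the seven FER-2013 emotion classes; the "negative emotion mass" stress proxy and the 0.5 threshold are illustrative assumptions, not the system's defined criterion:

```python
import numpy as np

# The seven classes of the FER-2013 benchmark commonly used with Xception.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def emotion_probabilities(logits):
    """Softmax over the model's raw outputs for one frame."""
    e = np.exp(logits - np.max(logits))
    return dict(zip(EMOTIONS, e / e.sum()))

def stress_score(prob_history):
    """Hypothetical stress proxy: mean probability mass on negative emotions."""
    negative = {"angry", "disgust", "fear", "sad"}
    return float(np.mean([sum(p[e] for e in negative) for p in prob_history]))

def passes_threshold(prob_history, threshold=0.5):
    """Candidate passes if average negative-emotion mass stays below threshold."""
    return stress_score(prob_history) < threshold
```

Per-frame probabilities accumulated over the interview feed `stress_score`, which is one input among the "various factors" the text mentions.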
The model developed can be subdivided into four main components based on their technicality: a front-end module, a verification module, a database module, and a back-end processor/compiler. This infrastructure takes care of almost all of the setup, scrutiny, and conduction of interviews. As part of the job interview, while using our product, candidates choose between a text-based, audio, or video-based interview, and send all the necessary information via the web application built with Flask. All of the data collected in the application is saved in the database, which is where the Firebase data is stored. The Firebase data is employed by the image verification component, which is made up of a Raspberry Pi with a camera module. A number of QR-code processes for the candidate's digital token occur on the back end rather than the front end. Before conducting the interview, the screening software verifies the identity of the applicant with the help of an interactive robot (optional). When the resume is submitted, the interview algorithm does some of the prep work for the interviewer to ensure that the correct questionnaire is created, and then gets the interview started.
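The back-end handling of the candidate's digital token might look like the following sketch. The HMAC scheme, secret, and payload format are assumptions for illustration; the source does not specify the actual token format:

```python
import hashlib
import hmac
import secrets

# Hypothetical signing key; in the real system this would stay on the back end.
SECRET = b"hypothetical-server-secret"

def issue_token(candidate_id):
    """Back end issues the digital token that gets encoded into the QR code."""
    nonce = secrets.token_hex(8)
    sig = hmac.new(SECRET, f"{candidate_id}:{nonce}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{candidate_id}:{nonce}:{sig}"

def verify_token(token):
    """Verification module checks the decoded QR payload before the interview."""
    candidate_id, nonce, sig = token.split(":")
    expected = hmac.new(SECRET, f"{candidate_id}:{nonce}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

The Raspberry Pi camera would decode the QR payload (e.g. with ZBar) and pass the string to `verify_token`; keeping signing and checking server-side matches the text's note that the QR-code processes occur on the back end.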
The results received after analysis of the submitted resume contain a percentage-based determination of how proficient the candidate is in the fields mentioned, based on their achievements in each domain and their depth of knowledge. This provides a decent overview of the candidate's areas of interest and domain proficiency, which makes it easier for the algorithm to formulate a unique questionnaire based on the candidate's interests. For the video-based interview, sentiment analysis is conducted using facial features and landmarks. The deep learning algorithm analyzes the candidate's face to predict how the candidate is feeling at the moment; this process is of immense assistance in conducting stress tests for the candidate and checking their pressure-handling capacity.
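The percentage-based proficiency determination could be sketched as simple keyword coverage per domain. The domains and keyword sets below are hypothetical placeholders for whatever the resume parser actually extracts:

```python
# Hypothetical domain vocabularies; a real system would derive these
# from job descriptions or a curated skills taxonomy.
DOMAIN_KEYWORDS = {
    "machine learning": {"tensorflow", "keras", "cnn", "classification"},
    "web": {"flask", "firebase", "rest", "javascript"},
}

def domain_proficiency(resume_tokens):
    """Return {domain: percentage of domain keywords found in the resume}."""
    tokens = {t.lower() for t in resume_tokens}
    return {domain: 100.0 * len(tokens & kws) / len(kws)
            for domain, kws in DOMAIN_KEYWORDS.items()}
```

These percentages then feed the questionnaire generator, biasing it toward the candidate's strongest domains.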
Using all the data obtained in the interview, i.e., the result of the questionnaire, the text-based analysis of the resume, and the sentiment analysis of the video interview, the company can determine whether a candidate is suitable for its requirements. The results are updated on Firebase, and the candidate can view them on the web application, including some suggestions on how he/she could have done better. The results are provided on the very day of the interview, and on selection the candidate can report to the company within a few days.
The software and hardware components used in this product were combined and utilized in different ways to meet diverse requirements and purposes to complete this project. Below are some of the proposed and utilized Hardware Components and Software, including various open-source libraries, packages and pre-trained models.
Hardware Stack:
i. Raspberry Pi 3B+: This variant of the RPi features a 1.4 GHz 64-bit quad-core processor, dual-band wireless LAN, Bluetooth 4.2/BLE support, faster Ethernet, and support for Power-over-Ethernet (with a separate PoE HAT).
ii. Camera module for Raspberry Pi, version 2: It features three video modes (1080p30, 720p60, and 640 × 480p60/90) and a Sony IMX219 sensor with a resolution of 3280 × 2464 pixels.
iii. An LCD monitor/display module
Software Stack:
i. Firebase: A Backend-as-a-Service (BaaS) platform that includes a real-time database and a slew of resources for rapidly developing applications.
ii. Android Studio + Kotlin: Android Studio is Google's official integrated development environment; Kotlin is the language used to write the code in it. Alternatively, Flutter could be used.
iii. OpenCV + ZBar: Used to scan and decode the QR code of the candidate's digital token. The interface also makes use of Google Text-to-Speech to read out the questions created by the machine learning/deep learning algorithm.
iv. YOLOv3: A real-time object detection method; the network offers a simplified object representation that simplifies the detection of real-world objects.
v. GPT-2: Capable of generating synthetic text samples, making it ideal for creating questionnaires.
vi. Keras, TensorFlow, Python, and Raspbian OS
Reference will now be made in detail to an exemplary embodiment of the present disclosure. Before describing the detailed embodiments in accordance with the present disclosure, it should be observed that the embodiment resides primarily in the combination and arrangement of the system according to an embodiment herein, as exemplified in FIG. 1.
FIG. 1 illustrates the various steps required to conduct a typical interview in the classical way. Here, candidates have to search for and apply to the interview that suits their needs. The interviewer schedules the interview and prepares the questionnaires accordingly. A later review process discloses the results once the interview has been executed.
FIG. 2 illustrates the block diagram of the proposed system, a real-time emotion detection and multimodal interview preparation cum conduction system that analyzes one's essays, resume, expression, and voice. The model developed is subdivided into four main components based on their technicality: a front-end module, a verification module, a database module, and a back-end processor/compiler. This infrastructure takes care of almost all of the setup, scrutiny, and conduction of interviews. As part of the job interview, candidates choose between a text-based, audio, or video-based interview and send all the necessary information via the web application built with Flask. All of the data collected in the application is saved in the database, which is where the Firebase data is stored. The Firebase data is employed by the image verification component, which is made up of a Raspberry Pi with a camera module. A number of QR-code processes for the candidate's digital token occur on the back end rather than the front end. Before conducting the interview, the screening software verifies the identity of the applicant with the help of an interactive robot (optional). When the resume is submitted, the interview algorithm does some of the prep work for the interviewer to ensure that the correct questionnaire is created, and then gets the interview started.
FIG. 3 illustrates the external flow of an AI-based interview in the real-time emotion detection and multimodal interview preparation cum conduction system that analyzes one's essays, resume, expression, and voice. In this scenario, the candidate attends the AI-based interview. The deep learning model scores the candidate based on facial expressions, voice, context, and correctness. This sentiment analysis, along with document verification and resume review, is used to generate a final report on the candidate. The interview report is mailed to the program manager, whereupon the candidate is either selected or rejected based on the recommendations and final review.
FIG. 4 illustrates the emotion detection analysis of the real-time emotion detection and multimodal interview preparation cum conduction system that analyzes one's essays, resume, expression, and voice. As shown in the figure, the results obtained for sentiment analysis through video depend on regional analysis; that is, the emotions on the face are determined by certain factors concerning various features of the face.
FIG. 5 illustrates the perceived emotions of the candidates in graphical and percentage format in the real-time emotion detection and multimodal interview preparation cum conduction system that analyzes one's essays, resume, expression, and voice. The perceived emotions of the candidate can be obtained in a graphical and percentage format. This information can be used to analyze the candidate's mental state during the interview and finally decide whether they are capable of stress handling.
FIG. 6 illustrates the comparative graphical representation of emotions in the real-time emotion detection and multimodal interview preparation cum conduction system that analyzes one's essays, resume, expression, and voice. Comparisons are conducted between candidates for reliable results. All the perceived emotions are plotted with respect to the probability of each emotion occurring over time. This provides suitable data for recruiters to hire candidates.
In the following description, for the purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of the arrangement of the system according to an embodiment herein. It will be apparent, however, to one skilled in the art that the present embodiment can be practiced without these specific details. In other instances, structures are shown in block diagram form only in order to avoid obscuring the present invention.
Claims (5)
1. A real-time emotion detection and multimodal interview preparation system, the system including:
a textual analysis unit;
a sentiment analysis unit;
a processor;
a camera module; and
a display module.
2. The system according to claim 1, wherein the textual analysis unit is used to analyze the textual information gathered from a resume.
3. The system according to any one of the preceding claims, wherein the task of the sentiment analysis unit is to capture the facial expression using the camera module, and analyze the emotions.
4. The system according to any one of the preceding claims, wherein the processor is a 64-bit quad core processor.
5. The system according to any one of the preceding claims, wherein the display module is an LCD monitor for displaying the video contents.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021105574A AU2021105574A4 (en) | 2021-08-16 | 2021-08-16 | Real-time emotion detection and multimodal interview preparation system |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021105574A4 true AU2021105574A4 (en) | 2021-10-21 |
Family
ID=78177056
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |