WO2023012818A1 - A non-invasive multimodal screening and assessment system for human health monitoring and a method thereof - Google Patents

Info

Publication number
WO2023012818A1
Authority
WO
WIPO (PCT)
Application number
PCT/IN2022/050687
Other languages
French (fr)
Inventor
Gopinath VARADHARAJAN
Pooja HEMMIGE SHWETHADRI
Vijaygopal Rengarajan
Original Assignee
Sparcolife Digital Healthcare Technologies Private Limited
Application filed by Sparcolife Digital Healthcare Technologies Private Limited
Publication of WO2023012818A1

Classifications

    • G16H 40/67: ICT specially adapted for the operation of medical equipment or devices for remote operation
    • A61B 5/0059: Measuring for diagnostic purposes or identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
    • A61B 5/165: Evaluating the state of mind, e.g. depression, anxiety
    • G16H 10/20: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G16H 15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H 30/20: ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
    • G16H 30/40: ICT specially adapted for processing medical images, e.g. editing
    • G16H 50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 80/00: ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring



Abstract

The present invention provides a non-invasive multimodal screening and assessment system (100) and a method thereof for human health monitoring. It utilizes facial expression-emotion recognition, which serves as an indicator of health disorders including depression, anxiety and trauma; emotion recognition is used to quantify emotions and design better treatment programs for patients. The present invention employs a wired or wireless camera (101) comprising a wired/wireless HD camera, a hardware interface such as a workstation or a kiosk (102), and other parts (107, 108, 109, 110, 111). The method for assessment and screening involves capturing video during the questionnaire session, detecting human emotions, conducting facial analysis and speech analysis, and classifying speech to text to identify emotions. The present system is a reliable source for health monitoring with high speed, accuracy and affordable cost.

Description

Title
A NON-INVASIVE MULTIMODAL SCREENING AND ASSESSMENT SYSTEM FOR HUMAN HEALTH MONITORING AND A METHOD THEREOF
FIELD OF THE INVENTION
The present invention generally relates to telehealth-based diagnosis and treatment. More particularly, the present invention relates to a non-invasive multimodal screening and assessment system for human health monitoring. Even more particularly, this invention relates to a non-invasive multimodal screening and assessment system for mental health and physical health based on multiple factors such as stress and anxiety management.
BACKGROUND OF THE INVENTION AND DESCRIPTION OF THE PRIOR ART
With the rising adoption of telehealth-based diagnosis and treatment, there is a rapidly increasing number of errors in diagnosis, resulting in an accentuated deterioration of the actual quality of care. The sheer lack of awareness, and reluctance to seek medical/professional help for assessment of a person's mental condition, contributes to the overall low adoption of mental health care. There are multiple helplines and telepsychiatry consultation platforms for screening and assessing the mental condition of human beings; however, owing to the shortage of mental health professionals and related infrastructure, the large population of patients cannot be served well and the quality of care diminishes. The present solutions also do not always pay attention to the overall appearance, mood, facial expression, body language and speech of the person, and are dependent on Mental, Neurological and Substance Use (MNS) conditions during clinical assessment. The present methods of monitoring a health condition such as mental health are predominantly based on analyzing the responses provided by the patient to a specific type of multiple-choice questionnaire that curates the responses in a physical mode. However, there is a high chance that the data may not be rightly captured, since the patient's mental state at the time of attempting the questionnaire is not taken into consideration; some disorders and stress levels exhibited while answering the questionnaire may therefore go unnoticed, and treatment protocols designed from the analysis of those responses further deteriorate the actual quality of care.
Reference may be made to US Patent 7,999,857 B2, entitled “Voice, lip-reading, face and emotion stress analysis, fuzzy logic intelligent camera system”, which discloses an intelligent camera security monitoring, fuzzy logic analysis and information reporting system that includes a video/audio camera, an integrated local controller, an interfaced plurality of sensors, and input/output means, and that collects and analyses data and information observations from a viewed scene and communicates these to a central controller. The central controller with fuzzy logic processor receives and stores these observations, and conducts a plurality of computer analysis techniques and technologies, including face, voice, lip-reading, emotion, movement, pattern recognition and stress analysis, to determine responses and the potential threat of/by a person, crowd, animal, action, activity or thing. Possible applications of the system are recognition of terrorists, criminals, and enraged or dangerous persons, as well as a person's level of intoxication or impairment by alcohol or drugs, via a new “Visual Response Measure”.
Reference may be made to US Patent Application US 2010/0189313 A1, entitled “System and method for using three dimensional infrared imaging to identify individuals”, which discloses calibrated infrared and range imaging sensors that produce a true-metric three-dimensional (3D) surface model of any body region within the fields of view of both sensors. Curvilinear surface features in both modalities are caused by internal and external anatomical elements. They are extracted to form 3D feature maps that are projected onto the skin surface. Skeletonised feature maps define subpixel intersections that serve as anatomical landmarks to aggregate multiple images into models of larger regions of the body, and to transform images into precise standard poses. Features are classified by origin, location and characteristics to produce annotations that are recorded with the images and feature maps in reference image libraries. The system provides an enabling technology for searchable medical image libraries.
Reference may be made to Indian Patent Application 201717031179, entitled “Method and system for real time visualization of individual health condition on a mobile device”, which discloses a method and technology to display 3D graphical output for a user using body sensor data and personal medical data in real time. Embodiments of the application describe the system and method for real-time visualization of an individual's health condition on a mobile device or other devices. The mobile device displays a specific organ by gathering the vital signs, nutrition, activity level and medical data of the person who is wearing or using the device.
Hence, there is a need for the development of a multimodal recognition system for tracking data in multiple recognition modes at a time, a data acquisition unit for delivering the questionnaire in audio/visual format and capturing the responses, the capability of analyzing the physiological and emotional state of the human being, and the accommodation of precisely working classifiers which classify the signal based on the condition to form a complete picture.
Given the aforesaid disadvantages in existing health monitoring systems, the present invention is a non-invasive multimodal screening and assessment system and methods for multifarious human health conditions, which include a wired or wireless image acquisition module such as a non-mydriatic fundus camera, a hardware interface, and a system and method of monitoring, screening and assessing health conditions to minimize the diagnostic error rate. All methods and systems described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
OBJECTIVES OF THE INVENTION
Primary objective of the invention is to provide a non-invasive multimodal screening and assessment system for human health monitoring and a method thereof.
Another objective of the present invention is to provide a method and system for facial expression-emotion recognition.
Yet another objective of the present invention is to provide a video analytics module for screening and assessment for early diagnosis of multifarious health conditions by analyzing Facial Emotion Recognition, Micro-expression Analysis, Speech Emotion Recognition, Body Language Analysis, Pupillometry / Pupillary Responses as a Biomarker.
Still another objective of the present invention is to provide a computing device-based system and method to provide standardized and personalized assessments that can help determine one or more health conditions of a given user.
Another objective of the present invention is to provide hardware interfaces such as a workstation or a kiosk coupled with a computing device-based system and method to provide advanced diagnostics following the results generated by the computing device-based system.
Yet another objective of the present invention is to provide a system and method for retinal imaging to detect multifarious health conditions using wired/wireless non-mydriatic fundus cameras.
Still another objective of the present invention is to provide a cloud-based data analytics and data management system.
These and other objects and advantages of the present invention will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides a non-invasive multimodal screening and assessment system for human health monitoring and a method thereof, used as a diagnostic tool utilizing video analytics as the primary modality for screening and assessment, retinal imaging-based biomarker identification as a secondary modality, and synchronized inputs from biofeedback and neurofeedback as an auxiliary modality. The said video analytics mainly comprises Facial Emotion Recognition, Micro-expression Analysis, Speech Emotion Recognition, Body Language Analysis and Pupillometry/Pupillary Responses as a Biomarker for determining anomalies in the mental and physical health condition. The above-mentioned modalities can be achieved by analyzing the data from the passive sensors together with artificial intelligence to identify the emotions and cognitive states of a person.
Facial expression-emotion recognition serves as an indicator of health disorders including depression, anxiety and trauma, and emotion recognition is used to quantify emotions and design better treatment programs for patients. The non-invasive multimodal screening and assessment system for human health monitoring comprises:
1. A wired or wireless camera, selected from a non-mydriatic fundus camera, a thermal camera and a high-resolution RGB camera, comprising a wired/wireless HD camera configured to accommodate:
An ocular eye scope,
An ophthalmic condensing lens,
A connectivity module,
Charging enclosure with light bar and
A charging port
2. A hardware interface such as a workstation or a kiosk comprising:
A touch screen panel
A video acquisition module for recording for facial expression/speech emotion recognition, with zoom, pan and tilt functionality
Connecting ports for Biofeedback and Neurofeedback Sensors selected from Brain Activity (EEG), Muscle Tension (EMG), Heart Rate (ECG), Respiration Rate, Pulse (BVP) and Pulse Oximetry, Skin Conductance (SC/GSR), Peripheral and Body Temperature, Eye Movement (EOG) and other biosensors.
The method for non-invasive multimodal screening and assessment comprises:
a. Capturing video while the person is answering the questionnaire. The system extracts image frames from the video and saves them in a testing image database.
b. Detecting human emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt.
c. Performing object and scene detection, facial analysis with sentiment tracking, image moderation detecting explicit content, face comparison, face recognition and celebrity recognition.
d. Speech recognition to identify emotions using frequency characteristics (such as accent shape, average pitch and pitch range), time-related features such as speech rate, frequency and voice quality parameters, and energy descriptors such as breathiness, brilliance, loudness, pause and pitch discontinuity.
e. Classifying speech to text to identify emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt.
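The frame-extraction part of step (a) is not detailed in the patent. As a minimal sketch, assuming a fixed-rate recording and a regular sampling interval (both assumptions), the frame numbers to save into the testing image database could be chosen like this:

```python
# Illustrative sketch (not from the patent): choose which video frames to
# extract for the testing image database, given a fixed sampling interval.
def frame_indices(duration_s: float, fps: float, every_s: float) -> list[int]:
    """Return the frame numbers to extract, one every `every_s` seconds."""
    total_frames = int(duration_s * fps)
    step = max(1, int(every_s * fps))
    return list(range(0, total_frames, step))

# e.g. a 10 s answer recorded at 30 fps, sampled every 2 s
print(frame_indices(10, 30, 2))  # [0, 60, 120, 180, 240]
```

A real implementation would pass these indices to a video decoder; only the sampling logic is shown here.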
In one aspect, the present invention discloses retinal image diagnostics using a wireless camera selected from a non-mydriatic fundus camera, a thermal camera and a high-resolution RGB camera, wherein the cloud-based retinal image processing engine explores the retinal neurovascular architecture and the retinal ganglion pathways linking to the Central Nervous System (CNS). The present invention also discloses an interface between ophthalmology, neurology and image processing which, with the help of retinal phenotyping, is able to detect and assess multiple candidate biomarkers including history of disease and disease progression.
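The patent does not describe the retinal image processing engine's algorithm. Vessel-related measures are among the commonly studied retinal biomarkers; purely as a deliberately naive illustration (the function name, threshold and toy image are all assumptions), such an engine could report the fraction of dark pixels in a grayscale fundus image as a crude vessel-density proxy:

```python
# Deliberately naive proxy, not the patent's method: retinal vessels appear
# darker than the surrounding fundus, so count the dark fraction of a
# grayscale pixel grid (values 0-255).
def vessel_density(pixels: list[list[int]], threshold: int = 80) -> float:
    """Fraction of pixels darker than `threshold`."""
    total = sum(len(row) for row in pixels)
    dark = sum(1 for row in pixels for p in row if p < threshold)
    return dark / total

# 3x3 toy "fundus image" with three dark (vessel-like) pixels
img = [[200, 60, 200],
       [ 70, 65, 200],
       [200, 200, 200]]
print(round(vessel_density(img), 2))  # 0.33
```

Production engines use segmentation networks and calibrated imaging; this only illustrates the kind of scalar biomarker that could be computed in the cloud.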
In an aspect of the present invention, the hardware interface such as a workstation or a kiosk transmits the data to the Cloud Server using an in-built GPRS Module/Ethernet Port/Wi-Fi.
In one aspect, the present invention provides a method and system for enabling improved adherence of drug intake during clinical trials, which also serves as a reliable real-time pharmacovigilance tool.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
Reference will be made to embodiments of the invention, examples of which may be illustrated in accompanying figures. These figures are intended to be illustrative, not limiting. Although the invention is generally described in the context of these embodiments, it should be understood that it is not intended to limit the scope of the invention to these particular embodiments.
FIG. 1 represents system components for non-invasive multimodal screening and assessment for enabling human health monitoring
FIG. 2 represents a block diagram for the method of the non-invasive multimodal screening and assessment for human health monitoring
FIG. 3 represents a flowchart for a method of predicting health condition from emotional signs, relying on observation of the face, gestures or body posture
FIG. 4 represents a graph of facial expression analysis captured during the questionnaire session of the human being
Although the specific features of the present invention are shown in some drawings and not in others, this is done for convenience only, as each feature may be combined with any or all of the other features in accordance with the present invention.
Table No. 1: Legend and Legend Description
DETAILED DESCRIPTION OF THE INVENTION
In the following detailed description, a reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that may be practiced is shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments and it is to be understood that other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
According to one embodiment, the present invention provides a non-invasive multimodal screening and assessment method and system (100) thereof for human health monitoring, which is used as a diagnostic tool utilizing video analytics as the primary modality for screening and assessment, retinal imaging-based biomarker identification as a secondary modality, and synchronized inputs from biofeedback and neurofeedback as an auxiliary modality. The said video analytics mainly comprises Facial Emotion Recognition, Micro-expression Analysis, Speech Emotion Recognition, Body Language Analysis and Pupillometry/Pupillary Responses as a Biomarker for determining anomalies in the mental and physical health condition. The above-mentioned modalities can be achieved by analyzing the data from the passive sensors together with artificial intelligence to identify the emotions and cognitive states of a person. Facial expression-emotion recognition serves as an indicator of health disorders including depression, anxiety and trauma, and emotion recognition can be used to quantify emotions and design better treatment programs for patients.
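The patent names seven emotion classes but does not specify a classifier. As a hedged sketch of a typical final stage (the logits and their source are assumptions), per-class scores from any feature extractor can be normalized into probabilities with a softmax and the top class reported:

```python
import math

# Hypothetical final classification stage over the seven emotion classes
# named in the patent; the upstream feature extractor is not shown.
EMOTIONS = ["anger", "fear", "disgust", "happiness",
            "sadness", "surprise", "contempt"]

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(logits: list[float]) -> str:
    """Return the emotion label with the highest probability."""
    probs = softmax(logits)
    return EMOTIONS[probs.index(max(probs))]

print(classify([0.1, 0.0, 0.2, 2.5, 0.3, 0.1, 0.0]))  # happiness
```

Any model producing per-class scores (a CNN over face crops, an audio model over speech features) could feed this stage.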
According to another aspect of the present invention, the non-invasive multimodal screening and assessment method and system (100) thereof for human health monitoring, represented in Fig 1, comprises: (a) a wired or wireless camera (101), selected from a non-mydriatic fundus camera, a thermal camera and a high-resolution RGB camera, comprising a wired/wireless HD camera configured to accommodate an ocular eye scope, an ophthalmic condensing lens (106) arranged coaxially to the light source, a connectivity module, a charging enclosure with light bar and a charging port; and (b) a hardware interface such as a workstation or a kiosk (102) characterized for data collection from the human for determining the health condition. The hardware interface such as a workstation or a kiosk (102) comprises a touch screen panel (103); a video and audio acquisition module (104) for recording for facial expression/speech emotion recognition, with zoom, pan and tilt functionality; and sensor ports (105) for biofeedback and neurofeedback sensors selected from Brain Activity (EEG), Muscle Tension (EMG), Heart Rate (ECG), Respiration Rate, Pulse (BVP) and Pulse Oximetry, Skin Conductance (SC/GSR), Peripheral and Body Temperature, Eye Movement (EOG) and other biosensors.
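The patent does not define a data model for these sensor channels. The structure below is an assumed illustration (field names are not the patent's) of one timestamped reading per biofeedback/neurofeedback channel, so the auxiliary modality can later be synchronized with the video stream:

```python
from dataclasses import dataclass
import time

# Assumed data model, not from the patent: one record per sensor channel
# listed above, timestamped for synchronization with the video modality.
@dataclass
class SensorReading:
    channel: str        # e.g. "EEG", "EMG", "ECG", "SC/GSR", "EOG"
    value: float
    unit: str
    timestamp: float    # seconds since the epoch

def make_reading(channel: str, value: float, unit: str) -> SensorReading:
    """Stamp a raw sensor value with the acquisition time."""
    return SensorReading(channel, value, unit, time.time())

r = make_reading("SC/GSR", 4.2, "microsiemens")
print(r.channel, r.value)
```

Keeping a common timestamped record type makes it straightforward to align all channels against the video frame timeline downstream.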
Another embodiment of the present invention discloses retinal image diagnostics using a wireless camera having the lens for retinal imaging, wherein the cloud-based retinal image processing engine explores the retinal neurovascular architecture and the retinal ganglion pathways linking to the Central Nervous System (CNS). The present invention also discloses an interface between ophthalmology, neurology and image processing which, with the help of retinal phenotyping, will be able to detect and assess multiple candidate biomarkers including history of disease and disease progression. In one embodiment of the present invention, the hardware interface (102) collects the data received from the data collection units (101, 103, 104, 105 and 106) and transmits it through the circuit board (107) to the data acquisition unit (108) for processing. The data is also transmitted to the Cloud Server (109, 110) by the workstation or kiosk (102) using the in-built GPRS Module/Ethernet Port/Wi-Fi. The results from the data acquisition unit can be displayed on a remotely working display device such as a mobile phone (111).
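The wire format for the kiosk-to-cloud transfer is not specified in the patent. As an assumption-laden sketch, the collected records could be wrapped in a JSON envelope with an integrity checksum before transmission over GPRS/Ethernet/Wi-Fi:

```python
import json
import hashlib

# Assumed envelope format (not the patent's): JSON body plus a SHA-256
# digest so the cloud server can detect corruption in transit.
def build_payload(session_id: str, records: list[dict]) -> str:
    body = json.dumps({"session": session_id, "records": records},
                      sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    return json.dumps({"body": body, "sha256": digest})

def verify_payload(payload: str) -> bool:
    """Recompute the digest on arrival and compare."""
    msg = json.loads(payload)
    return hashlib.sha256(msg["body"].encode()).hexdigest() == msg["sha256"]

p = build_payload("S001", [{"channel": "ECG", "value": 72}])
print(verify_payload(p))  # True
```

The actual transport (HTTP, MQTT, etc.) is orthogonal; only the packaging and integrity check are sketched.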
In yet another embodiment of the present invention, the method for non-invasive multimodal screening and assessment, represented in Fig. 2, comprises: (a) capturing video while the person answers the questionnaire, wherein the system (100) extracts image frames from the video and saves them in a testing image database; (b) detecting human emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt; (c) recognizing objects and scenes, performing facial analysis with sentiment tracking, moderating images by detecting explicit content, and performing face comparison, face recognition and celebrity recognition; (d) recognizing speech to identify emotions by using frequency characteristics (such as accent shape, average pitch and pitch range), time-related features (such as speech rate), and voice quality parameters and energy descriptors (such as breathiness, brilliance, loudness, pause and pitch discontinuity); and (e) classifying speech to text, characterized to identify emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt.
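By way of illustration only, step (d) above uses frequency characteristics such as average pitch. The disclosure does not specify a pitch algorithm; the autocorrelation approach, search band and function name below are assumptions introduced purely to sketch how such a descriptor could be computed:

```python
import numpy as np

def estimate_pitch(signal, sr, fmin=80.0, fmax=400.0):
    """Rough fundamental-frequency estimate via autocorrelation.

    Searches for the dominant periodicity between fmin and fmax Hz,
    a typical range for adult speech.
    """
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                     # remove DC offset
    corr = np.correlate(sig, sig, mode="full")
    corr = corr[len(corr) // 2:]               # keep non-negative lags only
    lag_min = int(sr / fmax)                   # shortest period of interest
    lag_max = int(sr / fmin)                   # longest period of interest
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag
```

For example, a 220 Hz sine sampled at 16 kHz yields an estimate within a few hertz of 220. A real system would apply this per voiced frame and average over the utterance to obtain the "average pitch" descriptor.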
In one embodiment, the present invention provides a method and system (100) for enabling improved adherence to drug intake during clinical trials, and also serves as a reliable real-time pharmacovigilance tool.
In one aspect, the present invention describes the working and data processing of the system. For each couple there are sessions of a particular duration, and each of these sessions is recorded by the experts using the bispectral camera. Basic metadata such as age, location and occupation is collected initially. In one embodiment of the present invention, measuring facial cutaneous temperature and assessing both its topographic and temporal distribution can provide insights into the person's autonomic activity. The main approaches to objectively measure these emotional signs rely on the observation of the face, gestures or body posture. Thermal IR imaging-based affective computing enables monitoring human physiological parameters and Autonomic Nervous System (ANS) activity in a non-contact manner and without constraining the subject.
In one embodiment of the present invention, collection of data points is enabled. The data collected by the expert includes thermal data, collected using a thermal infrared sensor; RGB video, collected using an RGB camera; audio signals; speech-to-text; and the topics discussed with the couples in the form of metadata. Once the thermal and RGB video is collected, it is processed by the method and system (100) of the present invention to sync the frame rates with each other; in order to detect the changes, frame 1 is then compared with a plurality of other frames collected throughout the session. A series of Gaussian filters is applied to enhance the frame quality. The visual sample with the synced frame rate is passed through a Drop Filter and a Light Filter. The Drop Filter is characterized to filter out the unwanted frames in both the thermal and the RGB data, whereas the Light Filter is characterized to eliminate excess environment light and to provide an ambient light setting in order to produce quality images. The audio recorded during the interview is processed simultaneously. During the recording of the session, the patient may pause, think, or even remain silent for some time; this space in the audio is synced with the RGB/thermal data to enhance emotional cues. The processed signal is then passed through the noise filter in order to eliminate the background noise from the speech signal. The visual and auditory signals and the text data obtained after filtering are then compressed and uploaded to the cloud. In one embodiment of the present invention, a method is provided for preparing the base deep learning model. For RGB data, the method utilizes AffectNet, the Extended Cohn-Kanade dataset and similar datasets as a base model and prepares a deep neural network model. The system (100) and the method disclosed in the present invention use subjects in a controlled environment with typical interactive questions.
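The frame-rate synchronization and Gaussian enhancement described above can be sketched minimally as follows. The function names, the nearest-frame mapping strategy, and the kernel radius are assumptions chosen for illustration; the disclosure does not specify these details:

```python
import numpy as np

def sync_indices(n_rgb, fps_rgb, fps_thermal):
    """For each RGB frame, the index of the nearest thermal frame in time,
    so the two streams can be compared frame-by-frame."""
    t = np.arange(n_rgb) / fps_rgb             # timestamp of each RGB frame
    return np.round(t * fps_thermal).astype(int)

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()                          # normalize to unit gain

def gaussian_blur(frame, sigma=1.0):
    """Separable Gaussian smoothing: filter rows, then columns."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, frame)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out
```

In practice successive passes with increasing sigma would approximate the "series of Gaussian filters" the description mentions; the Drop and Light filters are proprietary steps not modeled here.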
The captured RGB data is used as a transfer learning model from the base. For thermal data, appropriate models for feature extraction and an appropriate artificial neural network model are prepared. The method also includes converting audio inputs to Wave2Vector representations and extracting Mel-frequency cepstral coefficient features, and using the appropriate database of speech to detect an emotion and prepare a deep neural network classifier. The method also includes converting audio inputs to text for content, context and sentiment analysis.
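For illustration, Mel-frequency cepstral coefficient (MFCC) extraction of the kind referred to above can be sketched in plain NumPy. The frame length, hop size, FFT size and filter counts below are conventional defaults assumed for the example, not values specified by the invention:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_filters=26, n_coeffs=13, frame_len=400, hop=160):
    """One MFCC vector per 25 ms frame (10 ms hop at 16 kHz)."""
    n_fft = 512
    window = np.hamming(frame_len)
    frames = [signal[s:s + frame_len] * window
              for s in range(0, len(signal) - frame_len + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_e @ dct.T
```

A one-second 16 kHz signal yields a (98, 13) matrix of coefficients, which would feed the emotion classifier. Wave2Vector-style learned representations, by contrast, come from a pretrained neural encoder and are not reproduced here.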
In one embodiment of the present invention, a method is provided for transfer model preparation using subjects in a controlled environment. The method includes: preparing the base data model; feeding this base data to the base model for transfer learning; identifying emotional probabilities such as happy, sad, anger, disgust, fear, surprise and contempt; validating the data from the base data model against the predicted output and comparing with the expert's opinion; adjusting the back-propagation metrics based on expert/specialist feedback during these sessions; training until a model with an average of 70% accuracy across all modalities is prepared; and processing the input data. Once the input data is processed, the samples (visual, auditory and text) are sent to the prepared model in the cloud, and the output from each modality is provided as a percentage score for each mode.
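The final step above reports a percentage score per modality. A minimal sketch of combining per-modality emotion probabilities into such scores is shown below; the weighted-average fusion rule and function name are assumptions for illustration, as the disclosure does not state how modality outputs are combined:

```python
EMOTIONS = ["happy", "sad", "anger", "disgust", "fear", "surprise", "contempt"]

def fuse_modalities(per_modality_probs, weights=None):
    """Combine emotion probabilities from several modalities
    (e.g. visual, auditory, text) into one percentage score per emotion.

    per_modality_probs: {modality_name: {emotion: probability}}
    """
    if weights is None:
        weights = {m: 1.0 for m in per_modality_probs}  # equal weighting
    total_w = sum(weights.values())
    fused = {}
    for emotion in EMOTIONS:
        s = sum(weights[m] * probs.get(emotion, 0.0)
                for m, probs in per_modality_probs.items())
        fused[emotion] = round(100.0 * s / total_w, 1)  # percent score
    return fused
```

With equal weights, a visual score of 0.6 "happy" and an audio score of 0.2 "happy" fuse to 40.0%. Expert feedback during sessions could, in principle, be used to tune the per-modality weights.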
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating the preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
In another embodiment of the present invention, Fig. 3 represents a flowchart for a method of predicting a health condition based on emotional signs, relying on the observation of the face, gestures or body posture.
In another embodiment of the present invention, Fig. 4 represents a graph of facial expression analysis captured during the questionnaire session of the human being. The analysis is done by converting the captured videos to frames, followed by frame analysis. For each frame, the emotional score is calculated and plotted on the graph against the duration of the questionnaire session. The highest-scoring expression is treated as the result of the facial expression analysis.
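The "highest-scoring expression over the session" rule just described can be sketched as follows; summing per-frame scores and taking the maximum is one straightforward reading of Fig. 4, and the function name is an assumption for illustration:

```python
from collections import defaultdict

def dominant_expression(frame_scores):
    """Given one {emotion: score} dict per video frame, return the
    expression with the highest total score across the session."""
    totals = defaultdict(float)
    for scores in frame_scores:
        for emotion, score in scores.items():
            totals[emotion] += score
    return max(totals, key=totals.get)
```

For example, frames scoring [happy 0.7, happy 0.4, happy 0.9] against lower sadness scores yield "happy" as the session result.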
ADVANTAGES OF THE INVENTION
• The system (100) and method of the present invention significantly reduce the need for human intervention in the process of screening, assessment and monitoring, while also improving the accuracy of the assessment itself.
• The speed and accuracy of screening, assessment and monitoring makes it an affordable and better alternative to the present modalities.
• The system (100) and method can be installed in offices to monitor the stress level of employees and provide support for their well-being, in an effort to increase productivity.
• The system (100) and method of the present invention can be employed in the pharmaceuticals industry to monitor the adherence of the drug intake by the patient during the clinical trials.
• The system (100) and method of the present invention will be useful for determining the state of a person to identify fatigue and stress, in order to avoid accidents in an industrial environment or the armed forces.
• The system (100) and method of the present invention can help in identifying the learner’s affective state and learning state by analyzing facial expressions, which the teacher can then use to analyze the learner’s pattern and receptive ability and formulate reasonable teaching plans.
• The system (100) and method of the present invention can be employed in the hospitals not only to monitor the patients but also the doctor’s state of mind before performing surgical procedures to avoid medical errors.

Claims

We Claim,
1. A system for non-invasive multimodal screening and assessment for enabling human health monitoring (100), comprising: a. A wired or wireless camera (101), selected from a non-mydriatic fundus camera, a thermal camera, or a high-resolution RGB camera, wherein the camera further comprises an ocular eye scope, an ophthalmic condensing lens (106) with coaxially placed light source, a connectivity module, a charging enclosure with light bar and a charging port; b. A hardware interface in the form of a workstation or a kiosk (102), characterized for data collection from the human during questionnaire sessions; c. A circuit board (107), characterized for containing the electric circuit to collect and transmit the data from the input/data collection unit; d. A data acquisition and processing unit (108), characterized for processing of the data collected from the wired or wireless camera (101) and the hardware interface (102); e. A cloud storage and data store system (109), characterized for storing, analysing and communicating the relevant data to end users and clinicians; f. A cloud machine learning system (110); and, g. A display device (111) with an integrated application working remotely, which includes mobile phones and tablets.
2. The system for non-invasive multimodal screening and assessment for enabling human health monitoring (100), as claimed in claim 1, wherein the hardware interface in the form of a workstation or a kiosk (102) comprises: a. A touch screen panel (103), characterized for the feeding of input and display of output of the system (100); b. A video and audio acquisition module (104), characterized for recording facial expression and speech emotion recognition with zoom, pan and tilt functionalities; and, c. A plurality of connecting sensor ports (105), characterized for connecting a plurality of sensors to the hardware interface, including biofeedback and neurofeedback sensors selected from brain activity, muscle tension, heart rate, respiration rate, pulse, pulse oximetry, skin conductance, peripheral and body temperature and eye movement.
3. The system for non-invasive multimodal screening and assessment for enabling human health monitoring (100), as claimed in claim 1 wherein, the hardware interface (102) is configured to transmit data acquired from the plurality of sensors to a remote or local cloud server module through wired or wireless means.
4. The system for non-invasive multimodal screening and assessment for enabling human health monitoring (100), as claimed in claim 1, wherein the system further comprises a cloud-based retinal image processing engine configured to analyze a retinal image captured by the camera and explore the retinal neurovascular architecture and the retinal ganglion pathways linking to the Central Nervous System, which enables studying an interface between ophthalmology, neurology and image processing which, with the help of retinal phenotyping, detects and assesses a plurality of candidate biomarkers including history of disease and disease progression.
5. A method for non-invasive multimodal screening and assessment for enabling human health monitoring, comprising: a. Capturing a video while the person answers a questionnaire, and capturing a plurality of other biomarkers using a plurality of sensors; b. Extracting a plurality of image frames from the video and storing them in a testing image database; c. Detecting human emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt, as well as micro expressions, using image analysis; d. Recognizing objects and detecting scenes, performing facial analysis with sentiment tracking, moderating images by detecting explicit content, comparing faces, and performing facial recognition; e. Recognizing speech to identify the emotions by using frequency characteristics such as accent shape, average pitch and pitch range, time-related features such as speech rate and speech frequency, and voice quality parameters and energy descriptors such as breathiness, brilliance, loudness, pause and pitch discontinuity; and, f. Classifying speech to text, characterized to identify emotions such as anger, fear, disgust, happiness, sadness, surprise and contempt.
6. The method for non-invasive multimodal screening and assessment for enabling human health monitoring as claimed in claim 5, wherein measuring facial cutaneous temperature and assessing its topographic and temporal distribution provides insights about a person’s autonomic activity, and wherein, the main approaches to objectively measure these emotional signs rely on the observation of the face, gestures or body posture, and wherein, Thermal IR imaging-based affective computing enables monitoring human physiological parameters and Autonomic Nervous System (ANS) activity in a non-contact manner and without constraining the subject.
7. The method for non-invasive multimodal screening and assessment for enabling human health monitoring as claimed in claim 5, wherein the collection of a plurality of data points is enabled, which includes: a. Thermal data, wherein the thermal data is collected using a thermal infrared sensor and RGB video is collected using an RGB camera; and, b. Audio data, wherein the audio data includes audio signals, speech-to-text and audio recordings of the topics discussed with the human subjects in the form of metadata, wherein, the thermal data and the RGB video are processed to sync the frame rates with each other in order to detect the changes, and wherein, a series of Gaussian filters is applied to enhance the frame quality, and wherein, the visual sample with the synced frame rate is passed through a Drop Filter and a Light Filter, and wherein, the Drop Filter is characterized to filter out the unwanted frames in both the thermal and the RGB data, whereas the Light Filter is characterized to eliminate the excess environment light and to provide an ambient light setting in order to provide quality images, and wherein, the audio data is simultaneously processed and suitably synced with RGB or thermal data to enhance emotional cues, and wherein, the processed signal is then passed through the noise filter in order to eliminate the background noise from the speech signal, and wherein, the visual and auditory signals and the text data obtained after filtering are then compressed and uploaded to a cloud storage module.
8. The method for non-invasive multimodal screening and assessment for enabling human health monitoring as claimed in claim 5, wherein a method is provided for preparing a base deep learning model, and wherein, for RGB data, the method utilizes AffectNet, the Extended Cohn-Kanade dataset and similar datasets as a base model and prepares a deep neural network model, and wherein, the captured RGB data is used as a transfer learning model from the base, and wherein, for thermal data, appropriate models for feature extraction and an appropriate artificial neural net model are prepared, and wherein, the method also includes converting audio inputs to Wave2Vector and extracting Mel-frequency cepstral coefficient features and using the appropriate database of speech to detect an emotion and prepare a deep neural network classifier, and wherein, the method also includes converting audio inputs to text for content, context and sentiment analysis.
9. The method for non-invasive multimodal screening and assessment for enabling human health monitoring as claimed in claim 5, wherein the method is provided for transfer model preparation using subjects in a controlled environment, and wherein, the method includes: preparing the base data model; feeding this base data to the base model for transfer learning; identifying emotional probabilities such as happy, sad, anger, disgust, fear, surprise and contempt; validating the data from the base data model against the predicted output and comparing with the expert’s opinion; adjusting the back-propagation metrics based on expert/specialist feedback during the sessions; training the data till a model with average 70% accuracy across all modalities is prepared; processing the input data; and, uploading the information and the samples (visual, auditory and text) to the prepared model in the cloud computing module, and providing the output from each modality as a percentage score for each mode.
PCT/IN2022/050687 2021-07-31 2022-07-30 A non-invasive multimodal screening and assessment system for human health monitoring and a method thereof WO2023012818A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202141034568 2021-07-31
IN202141034568 2021-07-31

Publications (1)

Publication Number Publication Date
WO2023012818A1 true WO2023012818A1 (en) 2023-02-09

Family

ID=85155358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2022/050687 WO2023012818A1 (en) 2021-07-31 2022-07-30 A non-invasive multimodal screening and assessment system for human health monitoring and a method thereof

Country Status (1)

Country Link
WO (1) WO2023012818A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116636847A (en) * 2023-06-02 2023-08-25 南京航空航天大学 Emotion assessment method and system based on wrist wearable equipment
CN116994718A (en) * 2023-09-28 2023-11-03 南京元域绿洲科技有限公司 VR technology-based mental disorder auxiliary treatment method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016025323A1 (en) * 2014-08-10 2016-02-18 Autonomix Medical, Inc. Ans assessment systems, kits, and methods
CN108652648A (en) * 2018-03-16 2018-10-16 合肥数翼信息科技有限公司 Depression monitoring device for depression of old people
US20200066405A1 (en) * 2010-10-13 2020-02-27 Gholam A. Peyman Telemedicine System With Dynamic Imaging


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116636847A (en) * 2023-06-02 2023-08-25 南京航空航天大学 Emotion assessment method and system based on wrist wearable equipment
CN116636847B (en) * 2023-06-02 2024-07-30 南京航空航天大学 Emotion assessment method and system based on wrist wearable equipment
CN116994718A (en) * 2023-09-28 2023-11-03 南京元域绿洲科技有限公司 VR technology-based mental disorder auxiliary treatment method
CN116994718B (en) * 2023-09-28 2023-12-01 南京元域绿洲科技有限公司 VR technology-based mental disorder auxiliary treatment method


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22852510

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22852510

Country of ref document: EP

Kind code of ref document: A1