WO2021229600A1 - Auscultation system for guiding a user to perform auscultation on a subject

Auscultation system for guiding a user to perform auscultation on a subject

Info

Publication number
WO2021229600A1
Authority
WO
WIPO (PCT)
Prior art keywords
auscultation
stethoscope
subject
sites
guide device
Prior art date
Application number
PCT/IN2021/050448
Other languages
French (fr)
Inventor
Vikram ARIA NARAYAN
Original Assignee
Aria Narayan Vikram
Priority date
Filing date
Publication date
Application filed by Aria Narayan Vikram
Publication of WO2021229600A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/02 Stethoscopes

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Acoustics & Sound (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to an auscultation system for guiding a user to perform auscultation on a subject. The auscultation system includes a guide device which employs a computer vision model to identify one or more auscultation sites on the subject, and an augmented reality (AR) module to display the one or more auscultation sites on a screen of the guide device. The auscultation system also includes a stethoscope communicatively coupled to the guide device, to capture sound signals obtained from positioning the stethoscope on the subject at the one or more auscultation sites. The guide device is configured to recalculate position of the one or more auscultation sites using the computer vision model if the sound signals captured are not satisfactory, and to further direct the user to readjust the position of the stethoscope until the sound signals captured by the stethoscope are satisfactory.

Description

AUSCULTATION SYSTEM FOR GUIDING A USER TO PERFORM
AUSCULTATION ON A SUBJECT
FIELD OF THE INVENTION
[0001] The invention generally relates to auscultation in healthcare. Specifically, the invention relates to an Artificial Intelligence (AI)-enabled auscultation system for guiding a user to perform auscultation on a subject using Augmented Reality (AR) and to return a diagnosis and/or a screening report to the user.
BACKGROUND OF THE INVENTION
[0002] Auscultation is the act of listening to sounds of a human body using a stethoscope, to diagnose diseases. Auscultation can be used to diagnose a wide variety of diseases. Additionally, auscultation is non-invasive and inexpensive compared to other diagnostic methods such as, but not limited to, electrocardiograms (ECGs or EKGs) and X-rays.
[0003] Auscultation is an art that requires substantial tacit knowledge that can only be gained with practical experience. However, due to the growing population, there is a paucity of healthcare workers in many parts of the world today. Furthermore, there is an uneven distribution of health workers such that regions of higher poverty have lower numbers of healthcare workers. Owing to this, not many experienced professional healthcare workers are available to perform auscultation on patients.
[0004] Conventional auscultation methods employ traditional acoustic stethoscopes. Such acoustic stethoscopes transmit a signal that can sometimes be corrupted by noise, leading to a faulty diagnosis. Further improvements in this area have led to the rise of digital stethoscopes, which help overcome the problem of faulty diagnosis. Additionally, the reliability of a diagnosis inferred after auscultation depends on the expertise and hearing of the doctor. The diagnostician must be well-trained in positioning the stethoscope at various spots on the body. Moreover, the diagnostician must have significant experience in classifying these sounds. This skill requires significant time and mentorship to refine.
[0005] Recently, there has been a development of machine learning algorithms to automatically diagnose diseases when these models are supplied with digital signals from the stethoscope. However, the drawback with these solutions is that they still require a trained healthcare worker to operate the stethoscope.
[0006] Therefore, there exists a need in the field for a novel auscultation system which can be operated by any untrained volunteer, thereby democratizing access to quality healthcare.
BRIEF DESCRIPTION OF THE FIGURES
[0007] The accompanying figures where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the invention.
[0008] FIG. 1 illustrates an auscultation system for guiding a user to perform auscultation on a subject in accordance with an embodiment of the invention.
[0009] FIG. 2 illustrates a flow diagram of various method steps involved in the working of the auscultation system in accordance with an embodiment of the invention.
[0010] Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0011] Before describing in detail embodiments that are in accordance with the invention, it should be observed that the embodiments reside primarily in combinations of method steps and system components for an Artificial Intelligence (AI)-enabled auscultation system for guiding a user to perform auscultation on a subject using Augmented Reality (AR) and to return a diagnosis and/or a screening report to the user.
[0012] Accordingly, the system components and method steps have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
[0013] The terms “a” or “an”, as used herein, are defined as one or more than one. The term plurality, as used herein, is defined as two or more than two. The term another, as used herein, is defined as at least a second or more. The terms including and/or having, as used herein, are defined as comprising (i.e., open language). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The terms program, software application, and the like as used herein, are defined as a sequence of instructions designed for execution on a computer system. A program, computer program, or software application may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
[0014] Various embodiments of the invention disclose an auscultation system for guiding a user to perform auscultation on a subject. The auscultation system includes a guide device which further includes a computer vision model configured to identify one or more auscultation sites on the subject. The guide device also includes an augmented reality (AR) module configured to display the one or more auscultation sites on a screen of the guide device. In an embodiment, the AR module employs a deep learning model, TensorFlow Lite PoseNet, to detect key locations on the subject’s body to identify the one or more auscultation sites. The AR module overlays the one or more auscultation sites on the subject’s body and the one or more auscultation sites are highlighted on the screen of the guide device.
[0015] The auscultation system also includes a stethoscope communicatively coupled to the guide device. The stethoscope can be, but need not be limited to, a digital stethoscope and an acoustic stethoscope. The stethoscope is configured to capture sound signals obtained from positioning the stethoscope on the subject at the one or more auscultation sites. Further, the guide device is configured to check if sound signals captured using the stethoscope are satisfactory based on location of the one or more auscultation sites. If the sound signals captured are not satisfactory, the guide device recalculates position of the one or more auscultation sites using the computer vision model, and further directs the user to readjust the position of the stethoscope until the sound signals captured by the stethoscope are satisfactory.
[0016] In accordance with an embodiment, the guide device further directs the user to place the stethoscope at the one or more auscultation sites again to test for vocal resonance after performing a preliminary auscultation on the subject. Voice commands are used to direct the subject to pronounce a plurality of phrases and the corresponding sounds from the subject are recorded using the stethoscope. The one or more auscultation sites are recalibrated and customized for accurate placement of the stethoscope based on movement of the stethoscope and the subject’s body dimensions.
[0017] The auscultation system further includes a diagnostic module configured to interpret sound signals collected from the stethoscope at the one or more auscultation sites using one or more machine learning models. The one or more machine learning models classify the sound signals and predict various medical conditions/disorders. The medical conditions/disorders can be, but need not be limited to, lung disorders, cardiovascular and gastrointestinal diseases or conditions. In an embodiment, the diagnostic module employs a K-nearest neighbors machine learning model to identify abnormalities in the sound signals recorded by the stethoscope and returns a diagnosis and/or a screening report to the user.
[0018] FIG. 1 illustrates an auscultation system 100 for guiding a user to perform auscultation on a subject in accordance with an embodiment of the invention.
[0019] As illustrated in FIG. 1, auscultation system 100, inter alia, comprises a memory 102 (such as, but not limited to, a non-transitory or a machine readable memory), and a processor 104 (such as, but not limited to, a programmable electronic microprocessor, microcontroller, or similar device) communicatively coupled to memory 102. Memory 102 and processor 104 further communicate with various components of auscultation system 100 via a communication module 106.
[0020] Communication module 106 may be configured to transmit data between modules, engines, databases, memories, and other components of auscultation system 100 for use in performing the functions discussed herein. Communication module 106 may include one or more communication types and utilizes various communication methods for communication within auscultation system 100.
[0021] Auscultation system 100 includes a guide device 108 that can be, but need not be limited to, a smartphone, and an augmented reality (AR) device such as mixed reality smart glasses. Guide device 108 includes a computer vision model 110 configured to identify one or more auscultation sites on the subject.
[0022] Guide device 108 also includes an AR module 112 configured to display the one or more auscultation sites on a screen of guide device 108. AR module 112 overlays the one or more auscultation sites on the subject’s body and the one or more auscultation sites are highlighted on the screen of guide device 108.
[0023] Auscultation system 100 includes a stethoscope 114 communicatively coupled to guide device 108. Stethoscope 114 can be, but need not be limited to, a digital stethoscope and an acoustic stethoscope. Stethoscope 114 is configured to capture sound signals obtained from positioning stethoscope 114 on the subject at the one or more auscultation sites using AR module 112 which guides placement of stethoscope 114 on the subject’s body.
[0024] Once the process of auscultation using stethoscope 114 is executed, the information/sound signals collected are relayed back to guide device 108 via appropriate communication technologies such as wired communication and wireless communication including, but not limited to, Bluetooth, Wi-Fi, and Near-Field Communication (NFC).
[0025] Guide device 108 is further configured to check if sound signals captured using stethoscope 114 are satisfactory based on location of the one or more auscultation sites. If the sound signals captured are not satisfactory, guide device 108 recalculates position of the one or more auscultation sites using computer vision model 110, and further directs the user to readjust the position of stethoscope 114 until the sound signals captured by stethoscope 114 are satisfactory.
[0026] In accordance with an embodiment, AR module 112 employs a deep learning model, TensorFlow Lite PoseNet, to detect key locations such as, but not limited to, hips and shoulders on the subject’s body, to identify the one or more auscultation sites. For instance, AR module 112, based on detecting the key locations on the subject’s body, identifies nine lung auscultation sites. This helps in guiding the user or a volunteer in placing stethoscope 114 on the subject’s body and collecting lung sounds using stethoscope 114.
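By way of illustration only, the sketch below shows how such key-point detection could be wired up on the guide device. The model file name, the input preprocessing, and the assumption that the pose model returns 17 COCO-ordered keypoints directly (as MoveNet-style single-pose TFLite models do) are not taken from this disclosure; a PoseNet model proper outputs heatmaps and offsets that would require an additional decoding step, which is omitted here.

```python
# Minimal sketch, not the disclosed implementation. Assumes a single-pose
# TFLite model ("pose_model.tflite", a hypothetical file) whose output is a
# [1, 1, 17, 3] array of (y, x, score) keypoints in COCO order.
import numpy as np
import tensorflow as tf

COCO_KEYPOINTS = {"left_shoulder": 5, "right_shoulder": 6,
                  "left_hip": 11, "right_hip": 12}

interpreter = tf.lite.Interpreter(model_path="pose_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def detect_torso_keypoints(frame_rgb: np.ndarray) -> dict:
    """Return pixel (x, y) coordinates of the shoulders and hips in a frame."""
    h, w, _ = frame_rgb.shape
    in_h, in_w = input_details[0]["shape"][1], input_details[0]["shape"][2]
    img = tf.image.resize(frame_rgb, (in_h, in_w))          # resize to model input
    img = tf.cast(img[tf.newaxis, ...], input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], img.numpy())
    interpreter.invoke()
    kps = interpreter.get_tensor(output_details[0]["index"])[0, 0]  # (17, 3)
    return {name: (float(kps[idx, 1]) * w, float(kps[idx, 0]) * h)  # (x, y) px
            for name, idx in COCO_KEYPOINTS.items()}
```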
[0027] Guide device 108 further directs the user to place stethoscope 114 at the one or more auscultation sites again to test for vocal resonance after performing a preliminary auscultation on the subject. Voice commands are used to direct the subject to pronounce a plurality of phrases (such as, but not limited to, “ninety-nine” and “blue balloons”) and the corresponding sounds from the subject are recorded using stethoscope 114.
[0028] The one or more auscultation sites are recalibrated and customized for accurate placement of stethoscope 114 based on movement of stethoscope 114 and the subject’s body dimensions. In accordance with an embodiment, the one or more auscultation sites are recalibrated using the deep learning model which recalculates positions of the one or more auscultation sites or points for each frame of a live video feed.
[0029] Auscultation system 100 further includes a diagnostic module 116 configured to interpret sound signals collected from stethoscope 114 at the one or more auscultation sites using one or more machine learning models 118. One or more machine learning models 118 classify the sound signals and predict various medical conditions/disorders. The medical conditions/disorders can be, but need not be limited to, lung disorders, cardiovascular and gastrointestinal diseases or conditions.
[0030] In accordance with an embodiment, one or more machine learning models 118 classify the sounds, and make a diagnosis of “bronchophony”, “egophony” or “normal” and return this diagnosis to the user.
[0031] In accordance with another embodiment, diagnostic module 116 employs a K-nearest neighbors machine learning model to identify abnormalities in the sound signals recorded by stethoscope 114 and returns the diagnosis to the user. For instance, the K-nearest neighbors (KNN) machine learning model is used to identify abnormalities such as, but not limited to, wheezes and crackles in the lung sounds and this diagnosis is returned to the user.
[0032] In accordance with an exemplary embodiment, diagnosis of lung diseases using diagnostic module 116 is disclosed. Once lung sounds are collected by stethoscope 114 and sent to diagnostic module 116 (implemented as an application (app), for example), diagnostic module 116 performs a screening of the sounds captured and classifies the sounds as being normal or abnormal.
[0033] In order to perform the screening and classification of the sounds, one or more machine learning models 118 are trained using training data from an open sourced respiratory sounds database available in a web-based data science environment. For instance, the training data includes 920 annotated recordings of varying length (10 seconds to 90 seconds) taken from 126 patients. The database has a total of 6898 respiratory cycles, including normal breath sounds and adventitious sounds (crackles and wheezes). The data also includes clear recordings as well as recordings with background noise in order to simulate real-life conditions. The patients span all age groups, including children, adults and the elderly.
[0034] In accordance with an embodiment, a KNN machine learning model or classifier is used which separates sounds based on their proximity to other sounds. This proximity is determined on the basis of statistics derived from Mel Frequency Cepstral Coefficients (MFCCs) which represent perceptually meaningful sound features. In order to analyze respiratory sounds, statistical features (mean and standard deviation) are taken from the extracted MFCCs. These act as features for the KNN classifier.
[0035] An audio data preprocessing pipeline for training the model is as follows.
[0036] Each sound file has an associated label file. The label file contains the following information: start time of breath cycle, end time of breath cycle, whether crackles are present (represented by 0 or 1), and whether wheezes are present (represented by 0 or 1).
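As a concrete illustration of the label format just described, the small parser below assumes each annotation file stores one breath cycle per line as whitespace-separated columns (start time, end time, crackles flag, wheezes flag); the delimiter and file layout are assumptions chosen for the sketch, not requirements of the disclosure.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

@dataclass
class BreathCycle:
    start: float     # start time of breath cycle, seconds
    end: float       # end time of breath cycle, seconds
    crackles: int    # 0 or 1
    wheezes: int     # 0 or 1

def read_label_file(path: Path) -> List[BreathCycle]:
    """Parse one annotation file: one 'start end crackles wheezes' row per cycle."""
    cycles = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        start, end, crackles, wheezes = line.split()
        cycles.append(BreathCycle(float(start), float(end),
                                  int(crackles), int(wheezes)))
    return cycles
```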
[0037] The sound files are then loaded into a numpy array format using Librosa, a Python package/audio library for music and audio analysis.
[0038] The sound files are then split based on breath cycles. The sound clips and associated labels are split up into training and validation data (the training data consists of 70% of the total data and the validation data consists of 30% of the total data). The validation and training data are split randomly. 50 MFCCs are obtained for each sound clip, using a built-in Librosa function.
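A hedged sketch of this loading and splitting step follows. It reuses read_label_file from the previous sketch and assumes each recording <name>.wav sits next to its <name>.txt label file in a folder named "respiratory_db"; the folder name and pairing convention are illustrative assumptions.

```python
import librosa
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split

def breath_cycle_clips(wav_path: Path):
    """Yield (clip, sample_rate, is_abnormal) tuples, one per annotated cycle."""
    y, sr = librosa.load(wav_path, sr=None)          # keep native sample rate
    for cyc in read_label_file(wav_path.with_suffix(".txt")):
        clip = y[int(cyc.start * sr):int(cyc.end * sr)]
        abnormal = int(cyc.crackles or cyc.wheezes)  # adventitious sound present
        yield clip, sr, abnormal

clips, rates, labels = [], [], []
for wav in sorted(Path("respiratory_db").glob("*.wav")):   # hypothetical folder
    for clip, sr, abnormal in breath_cycle_clips(wav):
        clips.append(clip); rates.append(sr); labels.append(abnormal)

# Random 70/30 split into training and validation data, as described above.
idx_train, idx_val = train_test_split(np.arange(len(clips)),
                                      test_size=0.3, random_state=0)
```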
[0039] The statistical mean and standard deviation measures are then derived from the MFCCs obtained above in order to reduce the time dependent frequencies into a single vector with 100 components. The feature vector thus obtained is standardized by removing the mean and scaling the vector to unit variance.
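Continuing from the previous sketch, the feature step below computes 50 MFCCs per clip with Librosa, reduces them to a single 100-component mean/standard-deviation vector, and standardizes the result with scikit-learn's StandardScaler (consistent with the "Standard Scaler" mentioned later in this description).

```python
import librosa
import numpy as np
from sklearn.preprocessing import StandardScaler

def mfcc_feature_vector(clip: np.ndarray, sr: int, n_mfcc: int = 50) -> np.ndarray:
    """50 MFCCs -> per-coefficient mean and std -> single 100-component vector."""
    mfcc = librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=n_mfcc)     # (50, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (100,)

# Build design matrices for the split produced earlier, then standardize
# (zero mean, unit variance). The fitted scaler is reused at prediction time.
X = np.stack([mfcc_feature_vector(c, r) for c, r in zip(clips, rates)])
y = np.asarray(labels)
scaler = StandardScaler().fit(X[idx_train])
X_train, X_val = scaler.transform(X[idx_train]), scaler.transform(X[idx_val])
y_train, y_val = y[idx_train], y[idx_val]
```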
[0040] Once the KNN model (with nearest neighbors parameter of 3) is trained on the training data and validated on the validation data, the model is used to make predictions on respiratory sounds recorded through stethoscope 114.
[0041] Below is a brief description of the windowing method used for each recording to avoid the need for annotation of breath cycles.
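The corresponding training step, under the same assumptions as the preceding sketches, could be as simple as the following.

```python
from sklearn.neighbors import KNeighborsClassifier

# KNN classifier with the nearest-neighbors parameter set to 3, trained on the
# standardized training features and checked against the held-out 30%.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("validation accuracy:", knn.score(X_val, y_val))
```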
[0042] An audio sample is converted into a numpy array format using the Librosa audio library.
[0043] The audio file is then split into smaller time chunk windows disregarding length of breath cycles. The time chunk windows vary in time lengths to simulate real-life breath cycle durations.
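One way such windowing could look is sketched below; the 2-5 second range and the uniform random draw of window lengths are illustrative assumptions, since the description only states that the window lengths vary to mimic breath-cycle durations.

```python
import numpy as np

def split_into_windows(y: np.ndarray, sr: int,
                       min_s: float = 2.0, max_s: float = 5.0,
                       seed: int = 0) -> list:
    """Split a recording into variable-length chunks that mimic breath-cycle
    durations. The 2-5 s range and random length choice are assumptions."""
    rng = np.random.default_rng(seed)
    windows, pos = [], 0
    while pos < len(y):
        length = int(rng.uniform(min_s, max_s) * sr)
        windows.append(y[pos:pos + length])
        pos += length
    return windows
```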
[0044] The audio clips are then passed through the following preprocessing pipeline which includes generating MFCCs, obtaining means and standard deviations in a single vector for these MFCCs, and transforming these MFCC statistics using the Standard Scaler created during training.
[0045] The vectors representing the sound clips, created through the preprocessing pipeline above, are then passed to the KNN model. If the KNN model predicts any of the clips as containing an adventitious breath sound, the whole recording is classified as abnormal.
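Putting the prediction-time pieces together, and reusing the helpers and fitted objects from the sketches above, the screening of a new recording might look like this:

```python
import librosa
import numpy as np

def screen_recording(wav_path, knn, scaler) -> str:
    """Window a recording, featurize each window, and flag the whole recording
    as abnormal if any window is predicted to contain an adventitious sound."""
    y, sr = librosa.load(wav_path, sr=None)
    windows = [w for w in split_into_windows(y, sr) if len(w) > 0]
    feats = np.stack([mfcc_feature_vector(w, sr) for w in windows])
    preds = knn.predict(scaler.transform(feats))   # reuse the training-time scaler
    return "abnormal" if preds.any() else "normal"
```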
[0046] FIG. 2 illustrates a flow diagram of various method steps involved in the working of auscultation system 100 in accordance with an embodiment of the invention.
[0047] A volunteer uses stethoscope 114 (for example, a digital stethoscope) in conjunction with guide device 108. At step 202, the volunteer positions guide device 108 towards a patient. In accordance with an embodiment of the invention, using AR module 112, various auscultation points are overlaid on the patient’s body. The auscultation points are highlighted on the screen of guide device 108 as well.
[0048] In order to identify the auscultation points, the deep learning model of AR module 112 identifies key points such as, but not limited to, the shoulders and hips, on a supplied human image. For instance, using these pre-generated points, nine lung auscultation points are generated using calculations derived through consultation with medical professionals. The specific procedure is as follows.
[0049] Distances of the auscultation points from the pre-generated key points (such as hips and shoulders) are obtained by analyzing data from medical auscultation procedures. Statistical measures such as mean and standard deviation across auscultation sites, marked by different medical professionals, are then calculated. Finally, the auscultation sites or points are generated using the pre-generated key points in conjunction with the above statistical measures.
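The mechanics of that mapping could be expressed as below. The offset table is placeholder data invented purely for illustration; in the disclosure, the real offsets are statistical measures derived from sites marked by medical professionals. The sketch also assumes a roughly upright subject facing the camera and reuses the keypoint dictionary produced by the pose-detection sketch earlier.

```python
import numpy as np

# Placeholder offsets: lateral fraction of shoulder width and fraction of the
# shoulder-to-hip axis, measured from the shoulder midpoint. These numbers are
# illustrative only; the disclosed system derives them from clinical data.
LUNG_SITE_OFFSETS = [(-0.25, 0.15), (0.25, 0.15),
                     (-0.30, 0.35), (0.30, 0.35),
                     (-0.30, 0.55), (0.30, 0.55),
                     (-0.25, 0.75), (0.25, 0.75),
                     (0.00, 0.45)]  # nine sites

def lung_auscultation_points(kp: dict) -> list:
    """Map torso keypoints (from the pose model) to nine overlay coordinates."""
    ls, rs = np.array(kp["left_shoulder"]), np.array(kp["right_shoulder"])
    lh, rh = np.array(kp["left_hip"]), np.array(kp["right_hip"])
    shoulder_mid = (ls + rs) / 2
    hip_mid = (lh + rh) / 2
    torso_axis = hip_mid - shoulder_mid          # scales with body dimensions
    shoulder_width = np.linalg.norm(rs - ls)
    return [tuple(shoulder_mid
                  + np.array([dx * shoulder_width, 0.0])
                  + dy * torso_axis)
            for dx, dy in LUNG_SITE_OFFSETS]
```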
[0050] In an ensuing step 204, the sound signals collected by stethoscope 114 are sent to guide device 108. At step 206, guide device 108 checks if the sound signals captured by stethoscope 114 are satisfactory based on the location. If the sound signals captured are not satisfactory, at step 208, the position of an auscultation point is recalculated, and the volunteer is directed to readjust the position of stethoscope 114 until the sound signals received are satisfactory. The auscultation points are recalibrated using the deep learning model which recalculates positions of the auscultation points for each frame of a live video feed.
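The disclosure does not state how the guide device decides that a capture is "satisfactory"; purely as a placeholder, a crude signal-level check such as the following could stand in for that decision at step 206.

```python
import numpy as np

def signal_is_satisfactory(clip: np.ndarray, rms_floor: float = 0.01,
                           clip_ceiling: float = 0.99) -> bool:
    """Placeholder quality check: reject near-silent or heavily clipped captures.
    The heuristic and its thresholds are assumptions for illustration only and
    presume audio normalized to the [-1, 1] range."""
    rms = float(np.sqrt(np.mean(np.square(clip))))
    clipped_fraction = float(np.mean(np.abs(clip) >= clip_ceiling))
    return rms >= rms_floor and clipped_fraction < 0.01
```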
[0051] At step 210, guide device 108 checks if all the auscultation points have been examined. If all the auscultation points have not been examined, the volunteer is directed to move on to the next auscultation point highlighted on the screen of guide device 108 for continuing the process of auscultation.
[0052] Finally, if all the auscultation points have been examined, at step 212, one or more machine learning models 118 are employed for processing the information to provide a diagnosis by predicting the presence or absence of medical conditions such as, but not limited to, heart murmurs, pneumonia, and abdominal bruits.
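Tying the steps of FIG. 2 together, a loose end-to-end sketch is given below. Here record_at_site is a hypothetical callback standing in for the stethoscope and guide-device plumbing, the recalculation of site positions at step 208 is omitted for brevity, and the helpers come from the earlier sketches.

```python
import numpy as np

def run_screening_session(sites, record_at_site, knn, scaler) -> str:
    """Loose sketch of the FIG. 2 loop: visit every highlighted site, re-record
    until the capture passes the quality check, then screen the pooled audio."""
    captured = []
    for site in sites:                        # steps 202/210: iterate over points
        while True:
            clip, sr = record_at_site(site)   # step 204: sound sent to guide device
            if signal_is_satisfactory(clip):  # step 206: satisfactory?
                break                         # else step 208: readjust and retry
        captured.append((clip, sr))
    feats = np.stack([mfcc_feature_vector(c, sr) for c, sr in captured])
    preds = knn.predict(scaler.transform(feats))        # step 212: classify
    return "abnormal" if preds.any() else "normal"
```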
[0053] The present invention is advantageous in that it provides a system for AI-enabled auscultation using AR. The use of AR with AI ensures that the spots for auscultation can be found easily, thus making the process of auscultation faster and more efficient. Moreover, the use of AR to guide a volunteer while he or she is performing auscultation on the patient allows even an untrained layperson to operate the system. Using a stethoscope connected to a smartphone app which uses AR and AI, even untrained volunteers can perform screening for lung disorders with an accuracy comparable to that of a medical professional.
[0054] Furthermore, extraction of key points such as hips and shoulders before applying the algorithm for stethoscope placement allows for extremely accurate collection of sounds as auscultation sites are customized according to the subject’s body dimensions. This can be used by medical students for training and educational purposes. It can also be used by untrained volunteers as the entire end-to-end process of auscultation, from stethoscope placement to diagnosis, is largely automated by the system of the present invention.
[0055] Additionally, the invention allows for a wide range of diseases to be diagnosed with high accuracy. As additional data is collected, the machine learning model accuracy improves, and diagnosis becomes better over time. Therefore, the invention provides access to quality healthcare, with an accuracy of diagnosis that is independent of the skill and expertise of the person performing auscultation.
[0056] Those skilled in the art will realize that the above recognized advantages and other advantages described herein are merely exemplary and are not meant to be a complete rendering of all of the advantages of the various embodiments of the present invention.
[0057] The system as described in the invention, or any of its components, may be embodied in the form of a computing device. The computing device can be, for example, but not limited to, a general-purpose computer, a programmed microprocessor, a micro-controller, a peripheral integrated circuit element, and other devices or arrangements of devices, which are capable of implementing the steps that constitute the method of the invention. The computing device includes a processor, a memory, a nonvolatile data storage, a display, and a user interface.
[0058] In the foregoing specification, specific embodiments of the present invention have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention.

Claims

I/We Claim:
1. An auscultation system (100) for guiding a user to perform auscultation on a subject, the auscultation system (100) comprising:
   a guide device (108), wherein the guide device (108) comprises:
      a computer vision model (110) configured to identify one or more auscultation sites on the subject; and
      an augmented reality module (112) configured to display the one or more auscultation sites on a screen of the guide device (108); and
   a stethoscope (114) communicatively coupled to the guide device (108), wherein the stethoscope (114) is configured to capture sound signals obtained from positioning the stethoscope (114) on the subject at the one or more auscultation sites,
   wherein the guide device (108) is configured to:
      check if sound signals captured using the stethoscope (114) are satisfactory based on location of the one or more auscultation sites;
      recalculate position of the one or more auscultation sites using the computer vision model (110) if the sound signals captured are not satisfactory; and
      direct the user to readjust the position of the stethoscope (114) until the sound signals captured by the stethoscope (114) are satisfactory.
2. The auscultation system (100) as claimed in claim 1, wherein the augmented reality module (112) employs a deep learning model, TensorFlow Lite PoseNet, to detect key locations on the subject’s body to identify the one or more auscultation sites.
3. The auscultation system (100) as claimed in claim 1, wherein the augmented reality module (112) overlays the one or more auscultation sites on the subject’s body and the one or more auscultation sites are highlighted on the screen of the guide device (108).
4. The auscultation system (100) as claimed in claim 1, wherein the stethoscope (114) is one of a digital stethoscope and an acoustic stethoscope.
5. The auscultation system (100) as claimed in claim 1, wherein the guide device (108) further directs the user to place the stethoscope (114) at the one or more auscultation sites again to test for vocal resonance after performing a preliminary auscultation on the subject, wherein voice commands are used to direct the subject to pronounce a plurality of phrases and the corresponding sounds from the subject are recorded using the stethoscope (114).
6. The auscultation system (100) as claimed in claim 1, wherein the one or more auscultation sites are recalibrated and customized for accurate placement of the stethoscope (114) based on at least one of movement of the stethoscope (114) and the subject’s body dimensions.
7. The auscultation system (100) as claimed in claim 1 further comprises a diagnostic module (116) configured to interpret sound signals collected from the stethoscope (114) at the one or more auscultation sites using one or more machine learning models (118), wherein the one or more machine learning models (118) classify the sound signals and predict various medical conditions/disorders.
8. The auscultation system (100) as claimed in claim 7, wherein the diagnostic module (116) employs a K-nearest neighbors machine learning model to identify abnormalities in the sound signals recorded by the stethoscope (114) and returns a diagnosis and/or a screening report to the user.
PCT/IN2021/050448 2020-05-12 2021-05-10 Auscultation system for guiding a user to perform auscultation on a subject WO2021229600A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202041019944 2020-05-12
IN202041019944 2020-05-12

Publications (1)

Publication Number Publication Date
WO2021229600A1 (en) 2021-11-18

Family

ID=78525405

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050448 WO2021229600A1 (en) 2020-05-12 2021-05-10 Auscultation system for guiding a user to perform auscultation on a subject

Country Status (1)

Country Link
WO (1) WO2021229600A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019040000A (en) * 2017-08-24 2019-03-14 国立大学法人千葉大学 Auscultation training system
TW202011896A (en) * 2017-09-28 2020-04-01 聿信醫療器材科技股份有限公司 Electronic stethoscope systems, input unit and method for monitoring a biometric characteristic
US20190279768A1 (en) * 2018-03-06 2019-09-12 James Stewart Bates Systems and methods for audio medical instrument patient measurements

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21802914

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21802914

Country of ref document: EP

Kind code of ref document: A1