WO2014045257A1 - System and method for determining a person's breathing - Google Patents

System and method for determining a person's breathing

Info

Publication number
WO2014045257A1
Authority
WO
WIPO (PCT)
Prior art keywords
breathing
person
audio signal
expiration
inspiration
Prior art date
Application number
PCT/IB2013/058782
Other languages
French (fr)
Inventor
David Paul Walker
Anna Mary BARNEY
Anne BRUTON
Judith HOLLOWAY
Dragana NIKOLIC
Jane Lucas
Original Assignee
Koninklijke Philips N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips N.V. filed Critical Koninklijke Philips N.V.
Publication of WO2014045257A1 publication Critical patent/WO2014045257A1/en

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B5/0803 Recording apparatus specially adapted therefor
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/003 Detecting lung or respiration noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B2562/00 Details of sensors; Constructional details of sensor housings or probes; Accessories for sensors
    • A61B2562/02 Details of sensors specially adapted for in-vivo measurements
    • A61B2562/0204 Acoustic sensors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/0002 Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B5/0816 Measuring devices for examining respiratory frequency
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/113 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb occurring during breathing
    • A61B5/1135 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb occurring during breathing by monitoring thoracic expansion
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/68 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
    • A61B5/6887 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient mounted on external non-worn devices, e.g. non-medical devices
    • A61B5/6898 Portable consumer electronic devices, e.g. music players, telephones, tablet computers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the invention relates to a system and method for determining a person's breathing.
  • the invention further relates to a Smartphone comprising the system, and to a computer program product for causing a processor system to perform the method.
  • Such measurement may provide data which characterizes the person's breathing, and which, in turn, enables further analysis and/or diagnosis based on said data.
  • a person may be requested to breathe into a mouth piece of a recording device that can measure the amount of air that is exhaled or the peak flow of the exhaled air.
  • Such a device is referred to as a spirometer, and such type of measurement as spirometry.
  • Another example is the use of a respiratory inductive plethysmograph, in short RIP, being a flexible belt that is worn around the person's chest and which comprises electronic circuitry for measuring expansion and contraction of the person's rib cage.
  • WO 2010/015865 describes a breathing monitor having a sensor for sensing airflow from an individual's breath and for converting it to electronic signals, the sensor comprising a microphone without a wind shield and a temperature-sensing device, and the sensor being held by a flexible boom in proximity to the individual's nose or mouth.
  • the breathing monitor further comprises a processor to which the electronic signals are fed, the processor filtering the signals and differentiating between breathing signals corresponding to expiration and/or inspiration, and sound signals corresponding to external sounds and/or the individual's voice, and providing an output signal corresponding to the breathing signals, said output signal representing one or more breathing characteristics of the individual.
  • a problem of the aforementioned breathing monitor is that, although being less intrusive, it is still too cumbersome to use for determining a person's breathing.
  • a first aspect of the invention provides a system for determining a person's breathing, comprising:
  • an input for obtaining an audio signal, the audio signal comprising a sound component constituting a speech recording of a person
  • a speech analyzer for analyzing the sound component by extracting a set of audio features from the audio signal which characterize the sound component
  • a breathing analyzer for establishing breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component
  • a method for determining a person's breathing, comprising:
  • the audio signal comprising a sound component constituting a speech recording of a person
  • a computer program product comprising instructions for causing a processor system to perform the method.
  • the audio signal comprises at least a sound component, being a signal component of the audio signal which represents a recording of the person speaking.
  • the sound component thus represents the sounds as generated by the person during speech and includes both the speaking sounds and the breathing sounds.
  • the audio signal may comprise other components, e.g., a background noise component or a pseudo-sound component, but which are generally considered undesirable and thus typically avoided as much as possible in generating or obtaining the audio signal.
  • the audio signal may be obtained from a microphone, which may or may not be part of the system.
  • the sound component of the audio signal is analyzed in that a set of audio features are extracted from the audio signal which together, at least to a certain degree, describe the sound component in the particular audio signal.
  • the set of audio features may constitute a set of speech features, in that many or all of the audio features may be well suited to characterizing human speech. Multiple different audio features are extracted, together yielding a set of audio features.
  • the set of audio features is used to generate breathing measures.
  • the breathing measures are measures which are derived from the set of audio features and which allow one or more breathing cycles of the person during the speech recording to be identified.
  • the term breathing cycle refers to a cycle of inspiration and expiration, also commonly known as respiratory cycle.
  • the breathing measures may be indicative of the one or more breathing cycles in that they define, e.g., a period and a type of breathing occurring during the period, i.e., inspiration or expiration.
  • the breathing measures are obtained by using the set of audio features to differentiate between inspiration and expiration in the sound component. Hence, depending on a value of one or more of the set of features, it is concluded whether the set of audio features indicates an inspiration or expiration.
  • breathing measures are made available, e.g., for use in a further analysis and/or diagnosis, either directly or in the form of one or more breathing parameters.
  • breathing parameter refers to a parameter characterizing the person's breathing, and which is derivable from the breathing measures. Such breathing parameters are known per se.
  • one or more characteristics of the speech recording of the person are quantified in the form of data.
  • the inventors have recognized that such characteristics alone are indicative of the breathing of a person, and that, in fact, it is not needed to record the airflow from a person's breath.
  • the recording of airflow results in pseudo-sounds being recorded, which are caused by, e.g., the airflow moving a membrane of the microphone and thus mimicking, to a certain degree, a movement caused by sound waves.
  • pseudo-sounds differ from actual sounds such as speech in that, e.g., pseudo-sounds travel at the speed of the airflow rather than at the speed of sound.
  • By establishing the breathing measures using the set of audio features to differentiate between inspiration and expiration in the sound component, it is not needed to use or rely on such pseudo-sounds. Rather, the breathing measures are derived directly from a sound component in the audio signal representing the person's vocal sounds.
  • a conventional microphone can be used, without a need to position the conventional microphone in a path of the airflow, to remove wind shielding from the conventional microphone, and/or to use a temperature sensor.
  • the inventors have further recognized that the speaking sounds and the breathing sounds are both indicative of the person's breathing. By determining the person's breathing directly from the sound component representing the person's vocal sounds, it is not needed to separate the breathing sounds from the speaking sounds in the audio signal.
  • the breathing sounds complement and/or reaffirm the speaking sounds in being indicative of the person's breathing, thereby providing improved breathing measures.
  • a person's breathing may be determined from any speech recording.
  • the breathing analyzer is arranged for differentiating between inspiration and expiration in the sound component by applying an inspiration and expiration classifier to the set of audio features.
  • the analysis is thus based on a classification, in which an inspiration and expiration classifier is applied to the set of audio features so as to determine whether the set of audio features indicates an inspiration or an expiration.
  • the inspiration and expiration classifier has been trained using the set of audio features.
  • Hence, instead of using heuristics to determine the classifier, a training-based approach is used to obtain the classifier. Training-based approaches have been determined to be well suited for this particular purpose.
  • the speech analyzer is arranged for i) obtaining a plurality of segments of the audio signal, and ii) extracting the set of audio features from each one of the plurality of segments; and the breathing analyzer is arranged for establishing the breathing measures by classifying each one of the plurality of segments as either inspiration or expiration based on the respective set of audio features.
  • the breathing measures are thus obtained by performing the process of extracting the set of audio features and differentiating between inspiration and expiration for each of the plurality of segments separately.
  • the breathing analyzer is arranged for further classifying each one of the plurality of segments according to a type of inspiration or expiration.
  • the set of audio features is thus analyzed to additionally obtain a type of inspiration or expiration, the type being, e.g., a quiet expiration, an expiration with voice, etc.
  • the inventors have recognized that also the type of inspiration and/or expiration can be derived from the sound component via the extracted set of audio features.
  • more detailed breathing measures are obtained, thereby improving the further analysis and/or diagnosis.
  • each one of the plurality of segments partially overlaps in time with at least another one of the plurality of segments.
  • By using segments which partially overlap in time with another segment, more accurate breathing measures are obtained.
  • the set of audio features comprises at least one of: a temporal audio feature, a spectral moment feature, a Mel-Frequency Cepstral Coefficient (MFCC) feature, a Perceptual Linear Predictive Cepstral Coefficient (PLP-CC) feature, and a prosody feature.
  • the speech analyzer is arranged for band-pass filtering and/or de-trending the audio signal before extracting the set of audio features from the audio signal.
  • the band-pass filtering and de-trending each allow the set of audio features to be more accurately extracted from the audio signal.
  • a background noise component which may be present in the audio signal is suppressed or reduced.
  • the breathing parameter is one of: a mean inhalation rate; an inhalation duration variability; a mean exhalation rate; an exhalation variability; a number of breaths per minute; and a ratio between inspiration duration and expiration duration.
  • Said breathing parameters are derivable from the breathing measures and well suited for use in further analysis, diagnosis and/or feedback to the person.
  • the breathing analyzer is arranged for establishing the person's lung function based on the breathing measures.
  • People with an impaired lung function will prioritize breathing over speaking.
  • a general practitioner may listen to a person speaking in order to obtain a first order indication of a respiratory impairment.
  • the lung function can be established more accurately and in an automatic manner.
  • the breathing analyzer is further arranged for attributing the person's lung function to a lung disease.
  • It is known, e.g., from "Speech breathing in patients with lung disease", Am Rev Respir Dis, Vol. 147 (5), pp. 1199-1206, 1993, that breathing patterns during speech differ from those of quiet respiration and vary with different types of lung disease.
  • Advantageously, a lung disease may be determined directly from speech, without a need for breathing into a mouth piece or wearing a belt.
  • In a further aspect, a Smartphone is provided comprising the system set forth, the Smartphone comprising a microphone for telephony, wherein the microphone is arranged for obtaining the audio signal for use in analyzing the person's breathing.
  • a Smartphone offers an unobtrusive and convenient way of recording a person's speech.
  • the Smartphone may be used for continuously monitoring a person's breathing.
  • the Smartphone may unobtrusively monitor the person's breathing during, e.g., a telephone call, a video call, a voice dictation, or a voice-based input.
  • a software application may be provided comprising instructions which, upon execution, cause the Smartphone to determine the breathing measures according to the present invention.
  • obtaining the audio signal comprises prompting the person to, at least one of: describe an occurrence, describe an object, describe a picture, and read out aloud a passage of text.
  • the person may thus be actively prompted to speak so as to enable the breathing measures to be determined from a recording of the speech.
  • Fig. 1 shows a system being an embodiment of the present invention
  • Fig. 2 shows a method being an embodiment of the present invention
  • Fig. 3 shows a computer program product for performing the method
  • Fig. 4a shows an audio signal comprising a sound component
  • Fig. 4b shows breathing measures obtained from the sound component
  • Fig. 5 shows a methodology for obtaining trained classifiers.
  • Fig. 1 shows a system 100 for determining a person's breathing.
  • the system 100 comprises an input 120 for obtaining an audio signal 122.
  • the audio signal 122 comprises a sound component constituting a speech recording of a person.
  • the input 120 may take various forms, as is illustrated in Fig. 1.
  • the input 120 may be a microphone input connectable to a microphone 110 so as to obtain the audio signal 122 in real-time from the microphone 110.
  • the input 120 may be a data input connectable to a data storage 112 comprising the audio signal 122 as stored data.
  • the input 120 may be a network interface for obtaining the audio signal 122 from a network 114, e.g., a Local Area Network (LAN) or the internet.
  • the system 100 further comprises a speech analyzer 140.
  • the speech analyzer 140 is arranged for analyzing the sound component by extracting a set of audio features 142 from the audio signal 122 which characterizes the sound component. For obtaining the audio signal 122, the speech analyzer 140 is shown to be connected to the input 120.
  • the system 100 further comprises a breathing analyzer 160.
  • the breathing analyzer 160 is arranged for establishing breathing measures 162 indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features 142 to differentiate between inspiration and expiration in the sound component. For obtaining the set of audio features 142, the breathing analyzer 160 is shown to be connected to the speech analyzer 140.
  • the system 100 further comprises an output 180 for providing the breathing measures 162 or a breathing parameter 164 derived from the breathing measures.
  • the output 180 is shown to be connected to the breathing analyzer 160.
  • the breathing analyzer 160 may be arranged for deriving the breathing parameter 164 from the breathing measures 162 and providing the breathing parameter 164 to the output 180 instead of, or in addition to, the breathing measures 162.
  • the output 180 may take various forms, as is illustrated in Fig. 1.
  • the output 180 may be a display output connectable to a display 190 for showing the breathing measures 162, the breathing parameter 164, a visual indicator derived therefrom, etc.
  • the output 180 may be a data output connectable to a data storage 192 for storing the breathing measures 162 or the breathing parameter 164 thereon.
  • the output 180 may be a network interface for providing the breathing measures 162 or the breathing parameter 164 via a network 194, e.g., via the internet.
  • the output 180 may be arranged for providing the breathing measures 162 or the breathing parameter 164 to a telehealth system via the network 194.
  • the input 120 obtains the audio signal 122.
  • the speech analyzer 140 analyzes a sound component of the audio signal 122 by extracting a set of audio features 142 from the audio signal 122 which characterizes the sound component.
  • the breathing analyzer 160 establishes breathing measures 162 indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features 142 to differentiate between inspiration and expiration in the sound component.
  • the breathing analyzer 160 may derive a breathing parameter 164 from the breathing measures.
  • the output 180 then provides the breathing measures 162 or the breathing parameter 164, e.g., to another part of the system 100 or to another system.
  • Fig. 2 shows a method 200 for determining a person's breathing.
  • the method 200 may correspond to the aforementioned operation of the system 100. It is noted, however, that the method 200 may also be performed separately from said system 100.
  • the method 200 comprises, in a step titled "OBTAINING AN AUDIO SIGNAL”, obtaining 210 an audio signal, the audio signal comprising a sound component constituting a speech recording of a person.
  • the method 200 further comprises, in a step titled "EXTRACTING SET OF AUDIO FEATURES", extracting 220 a set of audio features from the audio signal which characterizes the sound component.
  • the method 200 further comprises, in a step titled “ESTABLISHING BREATHING MEASURES BASED ON AUDIO FEATURES”, establishing 230 breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component.
  • the method 200 further comprises, in a step titled "PROVIDING THE BREATHING MEASURES", providing 240 the breathing measures or a breathing parameter derived from the breathing measures.
  • Fig. 3 shows a computer program product 260 comprising instructions for causing a processor system to perform the method according to the present invention.
  • the computer program product 260 may be comprised on a computer readable medium 250, for example as a series of machine readable physical marks and/or as a series of elements having different electrical, e.g., magnetic, or optical properties or values.
  • Fig. 4a shows a schematic representation of an audio signal 122 comprising a sound component 124.
  • the schematic representation is in the form of a waveform.
  • the audio signal 122 is constituted primarily by the sound component 124 in that the audio signal 122 corresponds to a clean recording of the person speaking, i.e., without having recorded significant background noise.
  • the waveform shown thus represents both the audio signal 122 and the sound component 124.
  • Fig. 4b shows a result of the present invention, in that it schematically illustrates breathing measures 162 as determined from the sound component 124.
  • In Fig. 4b, EX denotes an expiration and IN denotes an inspiration. Vertical dashed lines mark the labeled beginning and end of each breathing phase; said vertical dashed lines are also depicted in Fig. 4a to illustrate corresponding parts of the sound component 124.
  • Fig. 4b thus shows an alternating pattern of both labeled (vertical dashed lines) and detected (horizontal thick lines) expirations EX and inspirations IN.
  • the breathing measures 162 of Fig. 4b may be determined as follows from the sound component 124 of Fig. 4a.
  • the speech analyzer 140 may be arranged for obtaining a plurality of segments of the audio signal 122, e.g., by dividing the audio signal 122 into said segments so as to obtain a plurality of adjoining segments. Such segments may also be referred to as frames or windows. Each segment thus has a certain length, e.g., 50 ms.
  • Each one of the plurality of segments may overlap in time with at least another one of the plurality of segments, e.g., by 25 milliseconds.
  • the speech analyzer 140 may be arranged for dividing the audio signal 122 into adjoining segments which do not overlap.
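  • As an illustration of the segmentation described above, a minimal sketch in Python follows; the 16 kHz sampling rate and all function names are assumptions for illustration, not taken from this document.

    import numpy as np

    def segment_signal(x, fs, seg_len_s=0.050, overlap=0.5):
        # Length of one segment in samples, e.g., 50 ms.
        seg_len = int(round(seg_len_s * fs))
        # Hop between segment starts; overlap=0.5 gives a 25 ms hop,
        # while overlap=0.0 gives adjoining, non-overlapping segments.
        hop = max(1, int(round(seg_len * (1.0 - overlap))))
        starts = np.arange(0, len(x) - seg_len + 1, hop)
        segments = np.stack([x[s:s + seg_len] for s in starts])
        return segments, starts

    # Usage: segments, starts = segment_signal(audio, fs=16000)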
  • the speech analyzer 140 may be further arranged for extracting a set of audio features 142 from each one of the plurality of segments.
  • audio feature refers to a mathematical expression, which, when using the audio signal 122 or a segment thereof as input, provides an output which characterizes one or more aspects of the audio signal 122 or the segment.
  • An example of an audio feature is Loudness, with other features being further discussed in reference to Table 1.
  • the breathing analyzer 160 may be arranged for establishing the breathing measures 162 by classifying each one of the plurality of segments as either inspiration IN or expiration EX based on the respective set of audio features 142.
  • a classification-based approach may be used.
  • the breathing analyzer 160 may be arranged for differentiating between inspiration IN and expiration EX in the sound component 124 by applying an inspiration and expiration classifier to the set of audio features 142.
  • the inspiration and expiration classifier may be manually designed, i.e., based on heuristics.
  • For example, where an expiration may occur either with speech or as a quiet expiration, the inspiration and expiration classifier may be designed to take both situations into account by suitably defining heuristics for the classification based on the set of audio features 142.
  • the inspiration and expiration classifier may be trained, i.e., using an offline training methodology.
  • the training may make use of the set of audio features 142, which is typically the case but is not a limitation. For example, similar but non-identical audio features may be used.
  • the training may also comprise selecting the set of audio features 142, e.g., from a larger set of audio features 142. Therefore, in essence, the set of audio features 142 may constitute a subset of audio features 142. In the following, an example of such training is described. It will be appreciated, however, that many alternatives are known from the field of machine learning which may be advantageously used as well, e.g., neural networks, decision tree learning, etc.
  • Fig. 5 shows an example of the offline training methodology.
  • Training data, also referred to as control data, is obtained from one or more subjects via two different methods which may be used individually or jointly.
  • a first method 310 uses a RIP signal 302 of the subject, i.e., a signal obtained from a respiratory inductive plethysmograph.
  • a second method 320 is based on the manual detection of inspiration and expiration in a speech signal 304 of the same subject, i.e., an audio signal comprising a sound component.
  • In the second method, breathing measures are obtained by, e.g., listening to the speech signal 304 and setting markers at a beginning and end of each inspiration phase under investigation. Based on those markers, each sample of the speech signal 304 may be labeled as 'IN' if it belongs to an inspiration phase or as 'EX' if it belongs to an expiration phase, as sketched below.
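  • A minimal sketch of this labeling step, assuming the markers are available as (start, end) sample indices of each inspiration phase (a hypothetical data format, chosen here for illustration):

    import numpy as np

    def label_samples(n_samples, inspiration_intervals):
        # Every sample defaults to expiration 'EX'; samples falling inside
        # a marked inspiration interval are relabeled 'IN'.
        labels = np.full(n_samples, 'EX', dtype=object)
        for start, end in inspiration_intervals:
            labels[start:end] = 'IN'
        return labels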
  • From either method, one or more breathing measures 312 are obtained.
  • the speech signal 304 together with the detected or labeled breathing phase 312 constitutes training data used in the subsequent training of the classifiers. It is noted that the above process may be repeated for more than one subject.
  • Before using the speech signal 304 in the aforementioned training, it may be pre-processed by band-pass filtering and/or de-trending the speech signal 304. This pre-processing may also be performed by the system 100, in that the speech analyzer 140 may be arranged for band-pass filtering and/or de-trending the audio signal 122 before extracting the set of audio features 142 from the audio signal.
  • the speech signal 304 may be de-trended by subtracting its mean and band-pass filtered, e.g., using a fourth-order Butterworth filter in the frequency range from 60 Hz to 5000 Hz.
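  • A sketch of this pre-processing in Python with SciPy, assuming a sampling rate above 10 kHz so that the 5000 Hz band edge lies below the Nyquist frequency; the use of zero-phase filtering is a choice made here, not specified in this document:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def preprocess(x, fs):
        # De-trend by subtracting the mean of the signal.
        x = x - np.mean(x)
        # Fourth-order Butterworth band-pass filter, 60 Hz to 5000 Hz.
        sos = butter(4, [60.0, 5000.0], btype='bandpass', fs=fs, output='sos')
        # Zero-phase filtering avoids shifting breathing events in time.
        return sosfiltfilt(sos, x)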
  • the speech signal 304 may be decomposed into segments before use in the training.
  • the speech signal 304 may be divided into segments of 50 ms length with a 50% overlap between adjacent segments.
  • the class labels i.e., the aforementioned 'IN' for an inspiration phase and 'EX' for an expiration phase, may be assigned to each segment based on the labels that are present predominantly within that segment. For example, if more than 50% of the samples within one segment are labeled 'IN', the segment in its entirety may be labeled 'IN' and thus considered as constituting or being part of an inspiration IN phase. In addition, so-termed transitional segments may be eliminated.
  • the segments in close proximity to a beginning and an end of each breathing phase may be omitted from the training. Close proximity may be defined by 100 ms for inspiration phases and 500 ms for expiration phases. Thus, segments which are considered to be in close proximity may not be considered in the training.
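  • The majority labeling and the elimination of transitional segments may be sketched as follows; the (label, start, end) representation of the breathing phases is an assumption for illustration:

    import numpy as np

    def label_segments(sample_labels, starts, seg_len):
        # A segment is labeled 'IN' if more than 50% of its samples are 'IN'.
        return np.array(['IN' if np.mean(sample_labels[s:s + seg_len] == 'IN') > 0.5
                         else 'EX' for s in starts])

    def non_transitional_mask(starts, seg_len, phases, fs):
        # phases: list of (label, start_sample, end_sample) tuples.
        # Segments whose centre lies within 100 ms of an inspiration-phase
        # boundary, or within 500 ms of an expiration-phase boundary, are
        # excluded from the training set.
        margin = {'IN': int(0.100 * fs), 'EX': int(0.500 * fs)}
        centres = starts + seg_len // 2
        keep = np.ones(len(starts), dtype=bool)
        for label, p0, p1 in phases:
            m = margin[label]
            keep &= ~((np.abs(centres - p0) < m) | (np.abs(centres - p1) < m))
        return keep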
  • the training phase involves performing the steps as shown in the "CLASSIFICATION" block 350, in which the speech signal 304 is used as input, having been optionally pre-processed as noted above.
  • In a step titled "Feature generation" 360, instantaneous audio features are computed for each segment of the speech signal 304.
  • the audio features may be audio features which are known per se from the field of audio analysis and the more specific field of speech analysis.
  • the audio features may correspond to those described by Florian Eyben et al. in "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", Proc. ACM Multimedia, ACM, Florence, Italy, ISBN 978-1-60558-933-6, pp. 1459-1462, 25.-29.10.2010, and together constitute a feature space.
  • a 71-dimensional feature vector 142 may be obtained for each of the segments, corresponding to the 71 audio features selected from the openSMILE toolkit.
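  • By way of illustration, per-segment features may be computed with the openSMILE toolkit's Python wrapper; the ComParE_2016 feature set below is a stand-in, since the 71 selected features are not reproduced in this document:

    import opensmile

    # Low-level descriptors from a standard openSMILE feature set serve
    # here only as a stand-in for the 71 features selected in the document.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
    )

    def extract_feature_vector(segment, fs):
        # process_signal returns one row of descriptors per analysis frame;
        # averaging over frames yields a single vector for the segment.
        llds = smile.process_signal(segment, fs)
        return llds.mean(axis=0).to_numpy()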
  • the 71-dimensional feature vector 142 may be provided in the form of training data to a step titled "Feature selection and extraction" 370.
  • An example of such a set of audio features 142 is shown in Table 1 below.
  • the set of audio features 142 is constituted by a plurality of temporal audio features, spectral moment features, Mel-Frequency Cepstral Coefficient (MFCC) features, Perceptual Linear Predictive Cepstral Coefficient (PLP-CC) features and prosody features.
  • Table 1: Example of a set of audio features
  • a further selection may be made from the 71 audio features so as to obtain a smaller set of audio features 144 for use in establishing the breathing measures. This may involve normalizing the 71-dimensional feature vector 142 by z-score and determining a significance of each of the audio features.
  • Said significance may be obtained using an independent evaluation criterion for binary classification based on the two-sample t-test.
  • Said t-test may be applied to each feature, and p-values may be obtained for each feature which serve as a measure of its discriminative power.
  • the p-values may be compared and used to sort the features accordingly.
  • For labeling the two classes in this evaluation, timing information from the labeled breathing phase 312 may be used.
  • a subset of, e.g., 25 audio features may be selected comprising the 25 highest ranked features.
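  • A sketch of this feature selection step; the names X and y for the feature matrix and the class labels are illustrative assumptions:

    import numpy as np
    from scipy.stats import ttest_ind

    def select_features(X, y, k=25):
        # X: (n_segments, n_features) feature matrix; y: 'IN'/'EX' labels.
        # z-score normalization of each feature.
        Xz = (X - X.mean(axis=0)) / X.std(axis=0)
        # Two-sample t-test per feature; a small p-value indicates high
        # discriminative power between the two classes.
        _, p = ttest_ind(Xz[y == 'IN'], Xz[y == 'EX'], axis=0)
        # Indices of the k highest ranked (lowest p-value) features.
        return np.argsort(p)[:k]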
  • the set of audio features 144 is then provided for use in training the classifiers in a step titled "Train classifier” 375.
  • an inspiration and expiration classifier 376 may be trained using the set of audio features 144 and the labeled breathing phase 312.
  • the training may also be performed on the full set of audio features, e.g., the 71-dimensional feature vector obtained in the "Feature generation" step 360.
  • the training itself may be based on a statistical classification method such as Naive Bayes (NB) classification, linear discriminant models, support vector machine (SVM) models, etc. All of the aforementioned methods have been verified to work well for this purpose.
  • a Naïve Bayes classification may be used to predict class membership probabilities, i.e., the probability that a given sample or segment belongs to a particular class, being in this case either inspiration IN or expiration EX.
  • the set of audio features 142 may be modeled using a kernel smoothing density estimate.
  • As a result of the statistical classification method, a trained inspiration and expiration classifier 376 is obtained.
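  • A minimal sketch of such a Naive Bayes classifier with kernel smoothing density estimates, using one Gaussian kernel density estimate per class and feature; the class name and implementation details are illustrative assumptions:

    import numpy as np
    from scipy.stats import gaussian_kde

    class KernelNaiveBayes:
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            # Prior class probabilities from relative class frequencies.
            self.priors_ = {c: np.mean(y == c) for c in self.classes_}
            # One univariate kernel density estimate per (class, feature).
            self.kdes_ = {c: [gaussian_kde(X[y == c, j])
                              for j in range(X.shape[1])]
                          for c in self.classes_}
            return self

        def predict(self, X):
            # Log-posterior per class (up to a constant); the naive
            # assumption sums per-feature log densities.
            log_post = np.stack(
                [np.log(self.priors_[c])
                 + sum(np.log(kde(X[:, j]) + 1e-300)
                       for j, kde in enumerate(self.kdes_[c]))
                 for c in self.classes_], axis=1)
            return self.classes_[np.argmax(log_post, axis=1)]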
  • the trained inspiration and expiration classifier 376 may then be provided for use in testing the classification in a step titled "Classification".
  • the testing of the classification may be as follows.
  • the trained inspiration and expiration classifier 376 may be applied to a set of audio features 145 derived from a testing speech signal 306.
  • the set of audio features 145 may be derived in the same manner from the testing speech signal 306 as described previously in reference to the speech signal 304 used for training, i.e., by computing a 71-dimensional feature vector 143 from the testing speech signal 306, and deriving a smaller set of audio features 145 from the 71-dimensional feature vector 143.
  • breathing measures 382 are obtained which are indicative of one or more breathing cycles of the person's breathing during the testing speech signal 306, e.g., a set of predicted breathing phases 382.
  • the system 100 may then use the trained inspiration and expiration classifier as follows.
  • the speech analyzer 140 may pre-process the audio signal 122 by de-trending and band-pass filtering the audio signal, e.g., by using the fourth-order Butterworth filter in the frequency range from 60 Hz to 5000 Hz as applied during training.
  • the audio signal 122 may be decomposed into segments of 50 ms, each having a 50% overlap.
  • the set of audio features 142 may be calculated as used during the training, e.g., the set of audio features 142 as shown in Table 1.
  • Each segment of the audio signal 122 may then be classified as either inspiration IN or expiration EX using the trained inspiration and expiration classifier.
  • the Naive Bayes method may compute the posterior probability of the segment belonging to each class using the prior class probabilities, which were estimated based on the relative frequencies of the classes in the training data, and then classify that segment according to the largest posterior probability. As a result, a classification for each of the segments is obtained, which together may be used in generating the breathing measures 162.
  • the breathing measures 162 may be further post-processed, e.g., based on heuristics, to improve their quality. For example, clear inconsistencies, such as an inspiratory phase being too short or too long, may be removed or avoided.
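  • One possible post-processing heuristic is sketched below; the 150 ms minimum inspiration duration is an assumed threshold, not a value taken from this document:

    import numpy as np

    def remove_short_inspirations(labels, hop_s=0.025, min_in_s=0.150):
        # Relabel inspiratory runs shorter than min_in_s as expiration.
        labels = labels.copy()
        i = 0
        while i < len(labels):
            j = i
            while j < len(labels) and labels[j] == labels[i]:
                j += 1
            if labels[i] == 'IN' and (j - i) * hop_s < min_in_s:
                labels[i:j] = 'EX'
            i = j
        return labels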
  • The classification described thus far is binary, i.e., differentiates only between inspiration and expiration.
  • However, the breathing analyzer 160 may also be arranged for further classifying each one of the plurality of segments according to a type of inspiration or expiration. For example, an expiration may be classified as a quiet expiration or as an expiration with speech. Consequently, the classification may be non-binary, i.e., differentiate not only between inspiration and expiration but also between the specific type of inspiration and/or expiration.
  • the breathing analyzer 160 may further derive a breathing parameter 164 from the breathing measures 162, e.g., a mean inhalation rate, an inhalation duration variability, a mean exhalation rate, an exhalation variability, a number of breaths per minute or a ratio between inspiration duration and expiration duration. It is noted that the calculation of such parameters is known per se from the field of respiratory medicine.
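  • A sketch of deriving such parameters from the per-segment labels; run lengths times the 25 ms hop serve as duration estimates, and duration-based variability is used here as a stand-in for the rate-based parameters named above:

    import numpy as np

    def breathing_parameters(labels, hop_s=0.025):
        # Collapse the per-segment labels into runs of (label, duration).
        runs, i = [], 0
        while i < len(labels):
            j = i
            while j < len(labels) and labels[j] == labels[i]:
                j += 1
            runs.append((labels[i], (j - i) * hop_s))
            i = j
        in_d = np.array([d for l, d in runs if l == 'IN'])
        ex_d = np.array([d for l, d in runs if l == 'EX'])
        total_min = len(labels) * hop_s / 60.0
        return {
            'breaths_per_minute': len(in_d) / total_min,
            'mean_inhalation_s': in_d.mean(),
            'inhalation_duration_variability_s': in_d.std(),
            'mean_exhalation_s': ex_d.mean(),
            'exhalation_variability_s': ex_d.std(),
            'inspiration_expiration_ratio': in_d.sum() / ex_d.sum(),
        }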
  • the breathing analyzer 160 may also be arranged for establishing the person's lung function based on the breathing measures 162. For that purpose, one or more breathing parameters 164 may be calculated which may be compared against those in a medical database. As such, an estimate of the person's lung function may be obtained.
  • the breathing analyzer 160 may be arranged for attributing the person's lung function to a lung disease.
  • a medical database may be accessed comprising characteristics of lung diseases. The attributing may be based on the breathing measures 162 or one or more breathing parameters 164.
  • the breathing analyzer 160 may be arranged for i) deriving a set of breathing parameters from the breathing measures 162, ii) analyzing the set of breathing parameters to obtain one or more characteristics of the lung function, and iii) searching the medical database for a lung disease which matches the one or more characteristics, thereby attributing the lung function to the lung disease.
  • the system 100 may be arranged for distinguishing between lung diseases such as chronic obstructive pulmonary disease (COPD), asthma, pneumonia, emphysema, etc.
  • the present invention may be incorporated in a Smartphone. Since a Smartphone already comprises a microphone for phone calls, the microphone can be used for obtaining the audio signal 122 for use in analyzing the person's breathing.
  • the function of the speech analyzer 140 and the breathing analyzer 160 may be performed in software, e.g., by a software application running on the Smartphone.
  • the Smartphone may be arranged for determining the person's breathing from so-termed free speech or conversational speech.
  • the person is then not required to speak specifically for the purpose of enabling the Smartphone to establish the breathing measures, but rather speaks for a different purpose, e.g., a phone conversation.
  • the person may therefore interact with the Smartphone in a usual manner while the audio signal 122 is being analyzed unobtrusively in the background.
  • the Smartphone may also be arranged for continuously analyzing an audio signal 122 from the microphone.
  • the Smartphone when sitting on, e.g., a desk, may continuously record any speech in its vicinity and establish therefrom the breathing measures.
  • speaker recognition software may be used to identify the particular person.
  • the Smartphone may be arranged for analyzing speech provided specifically for the purpose of voice dictation or voice control.
  • any pre-processing of the audio signal 122 may be optimized in accordance with the type of Smartphone.
  • the upper limit of the band-pass filtering may be selected to be 4000 Hz or lower, i.e., below the Nyquist frequency.
  • the band-pass filtering may also be omitted in its entirety. Instead, a high-pass filtering may be used.
  • the Smartphone may also be used to prompt the person to speak so as to be able to obtain the breathing measures 162 from a recording of the speech.
  • the Smartphone may be arranged for showing the person a picture and prompting the person to describe the picture.
  • the Smartphone may also be arranged for prompting the person to describe an object or an occurrence, e.g., the person's day so far.
  • the Smartphone may also be arranged for prompting the person to read out aloud a passage of text, e.g., recent news, a joke, a story such as a bedtime story, etc.
  • the Smartphone may also be arranged for requiring the person to speak as part of a game, e.g., a speech controlled game, or a game requiring the person to speak for a predetermined period without deviation, repetition or hesitation.
  • the system 100 may also be incorporated into a handheld device other than a Smartphone, such as a tablet.
  • the system 100 may be incorporated into another type of home electronics, such as a home television.
  • the system 100 may also be embodied as a standalone dedicated device which is arranged for recording a person speaking and then calculating and displaying, e.g., a so-termed speech breathing factor which indicates the respiratory health or relative respiratory health of the person.
  • the system 100 may be connectable to a telehealth system such as Philips Motiva.
  • Philips Motiva is an example of an interactive healthcare platform that connects patients with chronic conditions to their healthcare providers - via a home television and an internet connection. Motiva automates disease management activities, and engages patients with personalized daily interactions and education delivered through the home television.
  • the system 100 may be incorporated into the aforementioned Smartphone or other personal device, with the output 180 of the system 100 being arranged for providing the breathing measures 162 or the breathing parameter 164 derived from the breathing measures via the internet to the telehealth system.
  • the system 100 may be incorporated into a remote part of the telehealth system.
  • the functions of the speech analyzer 140 and breathing analyzer 160 may be performed remotely, with only the speech of the person being recorded locally, i.e., at a location of the person such as a home.
  • an alarm, actionable feedback or a motivational message which may be derived from an analysis of the breathing measures 162, may be provided to the person locally using, e.g., a home television.
  • It is noted that the term speech recording refers to speech being available in data form. Consequently, it does not imply the speech being prerecorded, i.e., it may equally refer to the speech being recorded and made available in real-time. It is further noted that determining a person's breathing refers to establishing data indicative of the breathing based on a study or investigation, with the subject of the study or investigation being here the audio signal comprising the sound component.
  • the invention also applies to computer programs, particularly computer programs on or in a carrier, adapted to put the invention into practice.
  • the program may be in the form of source code, object code, a code intermediate source and object code such as in a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention.
  • a program may have many different architectural designs.
  • a program code implementing the functionality of the method or system according to the invention may be sub-divided into one or more sub-routines. Many different ways of distributing the functionality among these sub-routines will be apparent to the skilled person.
  • the subroutines may be stored together in one executable file to form a self-contained program.
  • Such an executable file may comprise computer-executable instructions, for example, processor instructions and/or interpreter instructions (e.g. Java interpreter instructions).
  • one or more or all of the sub-routines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time.
  • the main program contains at least one call to at least one of the sub-routines.
  • the sub-routines may also comprise function calls to each other.
  • An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each processing step of at least one of the methods set forth herein. These instructions may be sub-divided into subroutines and/or stored in one or more files that may be linked statically or dynamically.
  • Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each means of at least one of the systems and/or products set forth herein. These instructions may be sub-divided into sub-routines and/or stored in one or more files that may be linked statically or dynamically.
  • the carrier of a computer program may be any entity or device capable of carrying the program.
  • the carrier may include a storage medium, such as a ROM, for example, a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example, a hard disk.
  • the carrier may be a transmissible carrier such as an electric or optical signal, which may be conveyed via electric or optical cable or by radio or other means.
  • the carrier may be constituted by such a cable or other device or means.
  • the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted to perform, or used in the performance of, the relevant method.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Informatics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Physiology (AREA)
  • Pulmonology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

System (100) for determining a person's breathing, comprising: - an input (120) for obtaining an audio signal (122), the audio signal comprising a sound component (124) constituting a speech recording of a person; - a speech analyzer (140) for analyzing the sound component (124) by extracting a set of audio features (142, 144) from the audio signal (122) which characterizes the sound component; - a breathing analyzer (160) for establishing breathing measures (162) indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features (142, 144) to differentiate between inspiration (IN) and expiration (EX) in the sound component (124); and - an output (180) for providing the breathing measures (162) or a breathing parameter (164) derived from the breathing measures.

Description

SYSTEM AND METHOD FOR DETERMINING A PERSON'S BREATHING
FIELD OF THE INVENTION
The invention relates to a system and method for determining a person's breathing. The invention further relates to a Smartphone comprising the system, and to a computer program product for causing a processor system to perform the method.
BACKGROUND OF THE INVENTION
In the field of health care, it is considered to be desirable to be able to measure a person's breathing. Such measurement may provide data which characterizes the person's breathing, and which, in turn, enables further analysis and/or diagnosis based on said data.
Several ways are known of measuring a person's breathing.
For example, a person may be requested to breathe into a mouth piece of a recording device that can measure the amount of air that is exhaled or the peak flow of the exhaled air. Such a device is referred to as a spirometer, and such type of measurement as spirometry. Another example is the use of a respiratory inductive plethysmograph, in short RIP, being a flexible belt that is worn around the person's chest and which comprises electronic circuitry for measuring expansion and contraction of the person's rib cage.
Disadvantageously, the above ways of measuring a person's breathing are intrusive and cumbersome as they require breathing into a mouth piece or wearing a belt.
WO 2010/015865 describes a breathing monitor having a sensor for sensing airflow from an individual's breath and for converting it to electronic signals, the sensor comprising a microphone without a wind shield and a temperature-sensing device, and the sensor being held by a flexible boom in proximity to the individual's nose or mouth. The breathing monitor further comprises a processor to which the electronic signals are fed, the processor filtering the signals and differentiating between breathing signals corresponding to expiration and/or inspiration, and sound signals corresponding to external sounds and/or the individual's voice, and providing an output signal corresponding to the breathing signals, said output signal representing one or more breathing characteristics of the individual.
A problem of the aforementioned breathing monitor is that, although being less intrusive, it is still too cumbersome to use for determining a person's breathing.
SUMMARY OF THE INVENTION
It would be advantageous to have a system or method which is more convenient to use for determining a person's breathing.
To better address this concern, a first aspect of the invention provides a system for determining a person's breathing, comprising:
an input for obtaining an audio signal, the audio signal comprising a sound component constituting a speech recording of a person;
a speech analyzer for analyzing the sound component by extracting a set of audio features from the audio signal which characterize the sound component;
- a breathing analyzer for establishing breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component; and
an output for providing the breathing measures or a breathing parameter derived from the breathing measures.
In a further aspect of the invention, a method is provided for determining a person's breathing, comprising:
obtaining an audio signal, the audio signal comprising a sound component constituting a speech recording of a person;
- analyzing the sound component by extracting a set of audio features from the audio signal which characterize the sound component;
establishing breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component; and
- providing the breathing measures or a breathing parameter derived from the breathing measures.
In a further aspect of the invention, a computer program product is provided comprising instructions for causing a processor system to perform the method.
The above measures involve obtaining an audio signal of a person speaking, i.e., a speech recording of the person. As a result, the audio signal comprises at least a sound component, being a signal component of the audio signal which represents a recording of the person speaking. The sound component thus represents the sounds as generated by the person during speech and includes both the speaking sounds and the breathing sounds. In addition, the audio signal may comprise other components, e.g., a background noise component or a pseudo-sound component, which are generally considered undesirable and thus typically avoided as much as possible in generating or obtaining the audio signal. The audio signal may be obtained from a microphone, which may or may not be part of the system.
The sound component of the audio signal is analyzed in that a set of audio features is extracted from the audio signal which together, at least to a certain degree, describe the sound component in the particular audio signal. In essence, the set of audio features may constitute a set of speech features, in that many or all of the audio features may be well suited to characterizing human speech. Multiple different audio features are extracted, together yielding a set of audio features.
The set of audio features is used to generate breathing measures. The breathing measures are measures which are derived from the set of audio features and which allow one or more breathing cycles of the person during the speech recording to be identified. Here, the term breathing cycle refers to a cycle of inspiration and expiration, also commonly known as respiratory cycle. The breathing measures may be indicative of the one or more breathing cycles in that they define, e.g., a period and a type of breathing occurring during the period, i.e., inspiration or expiration. The breathing measures are obtained by using the set of audio features to differentiate between inspiration and expiration in the sound component. Hence, depending on a value of one or more of the set of features, it is concluded whether the set of audio features indicates an inspiration or expiration.
The breathing measures are made available, e.g., for use in a further analysis and/or diagnosis, either directly or in the form of one or more breathing parameters. Here, the term breathing parameter refers to a parameter characterizing the person's breathing, and which is derivable from the breathing measures. Such breathing parameters are known per se.
By extracting a set of audio features from the audio signal which characterize the sound component, one or more characteristics of the speech recording of the person are quantified in the form of data. The inventors have recognized that such characteristics alone are indicative of the breathing of a person, and that, in fact, it is not needed to record the airflow from a person's breath. It is noted that the recording of airflow results in pseudo-sounds being recorded, which are caused by, e.g., the airflow moving a membrane of the microphone and thus mimicking, to a certain degree, a movement caused by sound waves. However, such pseudo-sounds differ from actual sounds such as speech in that, e.g., pseudo-sounds travel at the speed of the airflow rather than at the speed of sound. By establishing the breathing measures by using the set of audio features to differentiate between inspiration and expiration in the sound component, it is not needed to use or rely on such pseudo-sounds to establish the breathing measures. Rather, the breathing measures are derived directly from a sound component in the audio signal representing the person's vocal sounds.
Advantageously, it is not needed to provide a sensor that is capable of detecting the airflow from a person's breath. Instead, a conventional microphone can be used, without a need to position the conventional microphone in a path of the airflow, to remove wind shielding from the conventional microphone, and/or to use a temperature sensor.
The inventors have further recognized that the speaking sounds and the breathing sounds are both indicative of the person's breathing. By determining the person's breathing directly from the sound component representing the person's vocal sounds, it is not needed to separate the breathing sounds from the speaking sounds in the audio signal.
Advantageously, the breathing sounds complement and/or reaffirm the speaking sounds in being indicative of the person's breathing, thereby providing improved breathing measures. Advantageously, a person's breathing may be determined from any speech recording.
Optionally, the breathing analyzer is arranged for differentiating between inspiration and expiration in the sound component by applying an inspiration and expiration classifier to the set of audio features. The analysis is thus based on a classification, in which an inspiration and expiration classifier is applied to the set of audio features so as to determine whether the set of audio features indicates an inspiration or an expiration.
Optionally, the inspiration and expiration classifier has been trained using the set of audio features. Hence, instead of using heuristics to determine the classifier, a training-based approach is used to obtain the classifier. Training-based approaches have been determined to be well suited for this particular purpose.
Optionally, the speech analyzer is arranged for i) obtaining a plurality of segments of the audio signal, and ii) extracting the set of audio features from each one of the plurality of segments; and the breathing analyzer is arranged for establishing the breathing measures by classifying each one of the plurality of segments as either inspiration or expiration based on the respective set of audio features. The breathing measures are thus obtained by performing the process of extracting the set of audio features and differentiating between inspiration and expiration for each of the plurality of segments separately.
Optionally, the breathing analyzer is arranged for further classifying each one of the plurality of segments according to a type of inspiration or expiration. The set of audio features is thus analyzed to additionally obtain a type of inspiration or expiration, the type being, e.g., a quiet expiration, an expiration with voice, etc. The inventors have recognized that the type of inspiration and/or expiration can also be derived from the sound component via the extracted set of audio features. Advantageously, more detailed breathing measures are obtained, thereby improving the further analysis and/or diagnosis.
Optionally, each one of the plurality of segments partially overlaps in time with at least another one of the plurality of segments. By using segments which partially overlap in time with another segment, more accurate breathing measures are obtained.
Optionally, the set of audio features comprises at least one of: a temporal audio feature, a spectral moment feature, a Mel-Frequency Cepstral Coefficient (MFCC) feature, a Perceptual Linear Predictive Cepstral Coefficient (PLP-CC) feature, and a prosody feature. The aforementioned features have been determined to each be well-suited for use in a set of features that enables discrimination between inspiration and expiration.
Optionally, the speech analyzer is arranged for band-pass filtering and/or de-trending the audio signal before extracting the set of audio features from the audio signal. Advantageously, the band-pass filtering and de-trending each allow the set of audio features to be more accurately extracted from the audio signal. Advantageously, a background noise component which may be present in the audio signal is suppressed or reduced.
Optionally, the breathing parameter is one of:
- a mean inhalation rate;
- an inhalation duration variability;
- a mean exhalation rate;
- an exhalation variability;
- a number of breaths per minute; and
- a ratio between inspiration duration and expiration duration.
Said breathing parameters are derivable from the breathing measures and are well suited for use in further analysis, diagnosis and/or feedback to the person.
Optionally, the breathing analyzer is arranged for establishing the person's lung function based on the breathing measures. People with an impaired lung function will prioritize breathing over speaking. A general practitioner may listen to a person speaking in order to obtain a first order indication of a respiratory impairment. By automatically obtaining the breathing measures from a speech recording and analyzing said breathing measures, the lung function can be established more accurately and in an automatic manner.
Optionally, the breathing analyzer is further arranged for attributing the person's lung function to a lung disease. It is known per se that breathing measures during speech differ from those of quiet respiration and vary with different types of lung disease, e.g., from "Speech breathing in patients with lung disease", Am Rev Respir Dis, Vol. 147 (5), pp. 1199-1206, 1993. Advantageously, said lung disease may be determined directly from speech, without a need for breathing into a mouthpiece or wearing a belt.
Optionally, a Smartphone is provided comprising the system set forth, the Smartphone comprising a microphone for telephony, wherein the microphone is arranged for obtaining the audio signal for use in analyzing the person's breathing. A Smartphone offers an unobtrusive and convenient way of recording a person's speech. Advantageously, the Smartphone may be used for continuously monitoring a person's breathing. Advantageously, the Smartphone may unobtrusively monitor the person's breathing during, e.g., a telephone call, a video call, a voice dictation, or a voice-based input. Advantageously, a software application may be provided comprising instructions which, upon execution, cause the Smartphone to determine the breathing measures according to the present invention.
Optionally, obtaining the audio signal comprises prompting the person to at least one of: describe an occurrence, describe an object, describe a picture, and read out aloud a passage of text. The person may thus be actively prompted to speak so as to enable the breathing measures to be determined from a recording of the speech.
It will be appreciated by those skilled in the art that two or more of the above- mentioned embodiments, implementations, and/or aspects of the invention may be combined in any way deemed useful.
Modifications and variations of the method, the computer program product and/or the Smartphone, which correspond to the described modifications and variations of the system, can be carried out by a person skilled in the art on the basis of the present description.
The invention is defined in the independent claims. Advantageous embodiments are defined in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter. In the drawings,
Fig. 1 shows a system being an embodiment of the present invention;
Fig. 2 shows a method being an embodiment of the present invention;
Fig. 3 shows a computer program product for performing the method;
Fig. 4a shows an audio signal comprising a sound component;
Fig. 4b shows breathing measures obtained from the sound component; and
Fig. 5 shows a methodology for obtaining trained classifiers.
DETAILED DESCRIPTION OF EMBODIMENTS
Fig. 1 shows a system 100 for determining a person's breathing. The system 100 comprises an input 120 for obtaining an audio signal 122. The audio signal 122 comprises a sound component constituting a speech recording of a person. The input 120 may take various forms, as is illustrated in Fig. 1. For example, the input 120 may be a microphone input connectable to a microphone 110 so as to obtain the audio signal 122 in real-time from the microphone 110. Alternatively or additionally, the input 120 may be a data input connectable to a data storage 112 comprising the audio signal 122 as stored data.
Alternatively or additionally, the input 120 may be a network interface for obtaining the audio signal 122 from a network 114, e.g., a Local Area Network (LAN) or the internet.
The system 100 further comprises a speech analyzer 140. The speech analyzer 140 is arranged for analyzing the sound component by extracting a set of audio features 142 from the audio signal 122 which characterizes the sound component. For obtaining the audio signal 122, the speech analyzer 140 is shown to be connected to the input 120. The system 100 further comprises a breathing analyzer 160. The breathing analyzer 160 is arranged for establishing breathing measures 162 indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features 142 to differentiate between inspiration and expiration in the sound component. For obtaining the set of audio features 142, the breathing analyzer 160 is shown to be connected to the speech analyzer 140.
The system 100 further comprises an output 180 for providing the breathing measures 162 or a breathing parameter 164 derived from the breathing measures. For obtaining the breathing measures 162 or the breathing parameter 164, the output 180 is shown to be connected to the breathing analyzer 160. The breathing analyzer 160 may be arranged for deriving the breathing parameter 164 from the breathing measures 162 and providing the breathing parameter 164 to the output 180 instead of, or in addition to, the breathing measures 162. The output 180 may take various forms, as is illustrated in Fig. 1. For example, the output 180 may be a display output connectable to a display 190 for showing the breathing measures 162, the breathing parameter 164, a visual indicator derived therefrom, etc. Alternatively or additionally, the output 180 may be a data output connectable to a data storage 192 for storing the breathing measures 162 or the breathing parameter 164 thereon. Alternatively or additionally, the output 180 may be a network interface for providing the breathing measures 162 or the breathing parameter 164 via a network 194, e.g., via the internet. For example, the output 180 may be arranged for providing the breathing measures 162 or the breathing parameter 164 to a telehealth system via the network 194.
An operation of the system 100 may be briefly explained as follows. The input 120 obtains the audio signal 122. The speech analyzer 140 analyzes a sound component of the audio signal 122 by extracting a set of audio features 142 from the audio signal 122 which characterizes the sound component. The breathing analyzer 160 establishes breathing measures 162 indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features 142 to differentiate between inspiration and expiration in the sound component. The breathing analyzer 160 may derive a breathing parameter 164 from the breathing measures. The output 180 then provides the breathing measures 162 or the breathing parameter 164, e.g., to another part of the system 100 or to another system.
Fig. 2 shows a method 200 for determining a person's breathing. The method 200 may correspond to the aforementioned operation of the system 100. It is noted, however, that the method 200 may also be performed separately from said system 100.
The method 200 comprises, in a step titled "OBTAINING AN AUDIO SIGNAL", obtaining 210 an audio signal, the audio signal comprising a sound component constituting a speech recording of a person. The method 200 further comprises, in a step titled "EXTRACTING SET OF AUDIO FEATURES", analyzing 220 the sound component by extracting a set of audio features from the audio signal which characterizes the sound component. The method 200 further comprises, in a step titled "ESTABLISHING BREATHING MEASURES BASED ON AUDIO FEATURES", establishing 230 breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component. The method 200 further comprises, in a step titled "PROVIDING THE BREATHING MEASURES", providing 240 the breathing measures or a breathing parameter derived from the breathing measures.
Fig. 3 shows a computer program product 260 comprising instructions for causing a processor system to perform the method according to the present invention. The computer program product 260 may be comprised on a computer readable medium 250, for example as a series of machine readable physical marks and/or as a series of elements having different electrical, e.g., magnetic, or optical properties or values.
The operation of the system 100 may be explained in more detail as follows. Fig. 4a shows a schematic representation of an audio signal 122 comprising a sound component 124. The schematic representation is in the form of a waveform. In this particular example, the audio signal 122 is constituted primarily by the sound component 124 in that the audio signal 122 corresponds to a clean recording of the person speaking, i.e., without having recorded significant background noise. Hence, the waveform shown is both of the audio signal 122 as well as of the sound component 124.
Fig. 4b shows a result of the present invention, in that it schematically illustrates breathing measures 162 as determined from the sound component 124. Here, an expiration is denoted by EX, an inspiration is denoted by IN, and a length of each expiration and inspiration is denoted by a length on the horizontal axis as well as two vertical dashed lines marking a start and an end of each expiration and inspiration. Said vertical dashed lines are also depicted in Fig. 4a to illustrate corresponding parts of the sound component 124. Fig. 4b thus shows an alternating pattern of both labeled (vertical dashed lines) and detected (horizontal thick lines) expirations EX and inspirations IN.
The breathing measures 162 of Fig. 4b may be determined as follows from the sound component 124 of Fig. 4a. The speech analyzer 140 may be arranged for obtaining a plurality of segments of the audio signal 122, e.g., by dividing the audio signal 122 into said segments so as to obtain a plurality of adjoining segments. Such segments may also be referred to as frames or windows. Each segment thus has a certain length, e.g., 50 milliseconds. Each one of the plurality of segments may overlap in time with at least another one of the plurality of segments, e.g., by 25 milliseconds. However, this is not a limitation. For example, the speech analyzer 140 may be arranged for dividing the audio signal 122 into adjoining segments which do not overlap. The speech analyzer 140 may be further arranged for extracting a set of audio features 142 from each one of the plurality of segments. Here, it is noted that the term audio feature refers to a mathematical expression which, when using the audio signal 122 or a segment thereof as input, provides an output which characterizes one or more aspects of the audio signal 122 or the segment. An example of an audio feature is Loudness, with other features being further discussed in reference to Table 1.
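By way of illustration, such a segmentation may be sketched in Python as follows; the function name segment_signal and the placeholder signal are assumptions of this example, and the 50 ms / 25 ms values merely follow the example above.

import numpy as np

def segment_signal(x, fs, win_ms=50, overlap_ms=25):
    """Divide an audio signal into (optionally overlapping) segments.

    x: 1-D array of audio samples; fs: sampling frequency in Hz.
    With win_ms=50 and overlap_ms=25, each segment is 50 ms long and
    overlaps its neighbor by 25 ms, as in the example above.
    """
    win = int(fs * win_ms / 1000)
    hop = win - int(fs * overlap_ms / 1000)
    return [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]

# Toy usage with a placeholder signal instead of a real speech recording:
fs = 8000
audio = np.random.randn(2 * fs)         # 2 s of noise as a stand-in
segments = segment_signal(audio, fs)
print(len(segments), len(segments[0]))  # number of segments, samples per segment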
The breathing analyzer 160 may be arranged for establishing the breathing measures 162 by classifying each one of the plurality of segments as either inspiration IN or expiration EX based on the respective set of audio features 142. For that purpose, a classification-based approach may be used. For example, the breathing analyzer 160 may be arranged for differentiating between inspiration IN and expiration EX in the sound component 124 by applying an inspiration and expiration classifier to the set of audio features 142. Hence, based on one or more values of the set of audio features 142, the segment is classified as either inspiration IN or expiration EX. The inspiration and expiration classifier may be manually designed, i.e., based on heuristics.
For example, it is known that speech is most likely to occur during exhalation. Hence, exhalation segments are more likely to contain the spectral energies associated with speech. Nevertheless, certain audio features may be indicative of an inspiration taking place during the speech. Hence, the inspiration and expiration classifier may be designed to take both situations into account by suitably defining heuristics which classify only the former situation as expiration EX based on the set of audio features 142. Another example is that certain vocal sounds occurring during a pause in the speaking may indicate that an inspiration takes place; the classifier may be designed to take this into account as well.
Alternatively or additionally, the inspiration and expiration classifier may be trained, i.e., using an offline training methodology. The training may make use of the set of audio features 142, which is typically the case but is not a limitation. For example, similar but non-identical audio features may be used. The training may also comprise selecting the set of audio features 142, e.g., from a larger set of audio features 142. Therefore, in essence, the set of audio features 142 may constitute a subset of audio features 142. In the following, an example of such training is described. It will be appreciated, however, that many alternatives are known from the field of machine learning which may be advantageously used as well, e.g., neural networks, decision tree learning, etc.
Fig. 5 shows an example of the offline training methodology. Here, training data, also referred to as control data, is obtained by performing the steps shown in the "CONTROL DATA" block 300. The training data is obtained from one or more subjects via two different methods which may be used individually or jointly. A first method 310 uses a RIP signal 302 of the subject, i.e., a signal obtained from a respiratory inductive plethysmograph during the speaking of the subject. Based on the RIP signal 302, breathing measures are detected in a step 310 titled "BP detection". It is noted that instead of using a RIP signal 302, any other independent system for automatically obtaining breathing measures indicative of one or more breathing cycles may be used, e.g., a pneumotachograph or a strain gauge applied to the subject's chest. Additionally or alternatively, a second method 320 is based on the manual detection of inspiration and expiration in a speech signal 304 of the same subject, i.e., an audio signal comprising a sound component. Here, in a step titled "BP detection", breathing measures are obtained by, e.g., listening to the speech signal 304 and setting markers at a beginning and end of each inspiration phase under investigation. Based on those markers, each sample of the speech signal 304 may be labeled 'IN' if it belongs to an inspiration phase or 'EX' if it belongs to an expiration phase.
As a result of either or both of said methods 310, 320, one or more breathing measures 312 are obtained. The speech signal 304 together with the detected or labeled breathing phase 312 constitutes training data used in the subsequent training of the classifiers. It is noted that the above process may be repeated for more than one subject.
Before using the speech signal 304 in the aforementioned training, it may be pre-processed by band-pass filtering and/or de-trending the speech signal 304. This pre-processing may also be performed by the system 100, in that the speech analyzer 140 may be arranged for band-pass filtering and/or de-trending the audio signal 122 before extracting the set of audio features 142 from the audio signal. The speech signal 304 may be de-trended by subtracting its mean and band-pass filtered, e.g., using a fourth-order Butterworth filter in the frequency range from 60 Hz to 5000 Hz. Moreover, the speech signal 304 may be decomposed into segments before use in the training. For example, the speech signal 304 may be divided into segments of 50 ms length with a 50% overlap between adjacent segments. The class labels, i.e., the aforementioned 'IN' for an inspiration phase and 'EX' for an expiration phase, may be assigned to each segment based on the labels that are predominantly present within that segment. For example, if more than 50% of the samples within one segment are labeled 'IN', the segment in its entirety may be labeled 'IN' and thus considered as constituting or being part of an inspiration IN phase. In addition, so-termed transitional segments may be eliminated. Here, in order to reduce a risk of assigning an incorrect class label to a particular speech segment, the segments in close proximity to a beginning and an end of each breathing phase may be omitted from the training. Close proximity may be defined as 100 ms for inspiration phases and 500 ms for expiration phases. Thus, segments which are considered to be in close proximity may not be considered in the training.
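A minimal sketch of this pre-processing and majority-vote labeling, assuming NumPy and SciPy and a sampling frequency above 10 kHz (so that the 5000 Hz cutoff lies below the Nyquist frequency), may look as follows; the function names are illustrative only.

import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(x, fs):
    """De-trend by subtracting the mean, then band-pass filter 60-5000 Hz
    with a fourth-order Butterworth filter, as described above."""
    x = x - np.mean(x)
    nyq = fs / 2
    b, a = butter(4, [60 / nyq, 5000 / nyq], btype="band")
    return filtfilt(b, a, x)

def majority_label(sample_labels):
    """Label a segment 'IN' if more than 50% of its sample labels are 'IN'."""
    frac_in = np.mean(np.asarray(sample_labels) == "IN")
    return "IN" if frac_in > 0.5 else "EX"

# Toy usage; transitional segments near the phase boundaries (100 ms / 500 ms)
# would additionally be discarded before training, as noted above.
fs = 16000
speech = np.random.randn(fs)        # 1 s placeholder for the speech signal 304
clean = preprocess(speech, fs)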
The training phase involves performing the steps as shown in the "CLASSIFICATION" block 350, in which the speech signal 304 is used as input, having been optionally pre-processed as noted above. Here, in a step titled "Feature generation" 360, instantaneous audio features are computed for each segment of the speech signal 304. The audio features may be audio features which are known per se from the field of audio analysis and the more specific field of speech analysis. For example, the audio features may correspond to those described by Florian Eyben et al. in "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", Proc. ACM Multimedia, ACM, Florence, Italy, ISBN 978-1-60558-933-6, pp. 1459-1462, 25-29 October 2010, and together constitute a feature space. As such, a 71-dimensional feature vector 142 may be obtained for each of the segments, corresponding to the 71 audio features selected from the openSMILE toolkit.
The 71-dimensional feature vector 142 may be provided in the form of training data to a step titled "Feature selection and extraction" 370. An example of such a set of audio features 142 is shown in Table 1 below. Here, the set of audio features 142 is constituted by a plurality of temporal audio features, spectral moment features, Mel-Frequency Cepstral Coefficient (MFCC) features, Perceptual Linear Predictive Cepstral Coefficient (PLP-CC) features and prosody features.
Number   Name                       Dimension
Temporal Features
1        Mean                       1
2        Standard deviation         1
3        Data range                 1
4        RMS energy                 1
5        LOG energy                 1
6        Intensity                  1
7        Loudness                   1
8        Zero crossing rate         1
Spectral Moment Features
9        Spectral centroid          1
10       Spectral skewness          1
11       Spectral kurtosis          1
MFCC Features
12-24    MFCC                       13
25-37    1st order delta MFCC       13
38-50    2nd order delta MFCC       13
PLP-CC Features
51-56    PLP-CC                     6
57-62    1st order delta PLP-CC     6
63-68    2nd order delta PLP-CC     6
Prosody Features
69       Fundamental frequency      1
70       Voicing probability        1
71       Loudness contour           1

Table 1. Example of a set of audio features
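By way of illustration, a subset of the features of Table 1 may be computed per segment as follows; here librosa is used as a stand-in for the openSMILE extractor named above, PLP-CC and prosody features are omitted, and the frame parameters merely mirror the 50 ms / 25 ms example.

import numpy as np
import librosa

fs = 16000
audio = np.random.randn(2 * fs).astype(np.float32)  # placeholder speech recording

win = int(0.050 * fs)   # 50 ms analysis window, matching the segmentation above
hop = int(0.025 * fs)   # 25 ms hop, i.e., 50% overlap

# Temporal features (a selection from rows 1-8 of Table 1):
frames = librosa.util.frame(audio, frame_length=win, hop_length=hop)
rms = np.sqrt(np.mean(frames ** 2, axis=0))                     # RMS energy
zcr = librosa.feature.zero_crossing_rate(audio, frame_length=win, hop_length=hop)[0]

# Spectral moment and MFCC features (a selection from rows 9 and 12-50):
centroid = librosa.feature.spectral_centroid(y=audio, sr=fs, n_fft=win, hop_length=hop)[0]
mfcc = librosa.feature.mfcc(y=audio, sr=fs, n_mfcc=13, n_fft=win, hop_length=hop)
d_mfcc = librosa.feature.delta(mfcc)                            # 1st order delta MFCC
dd_mfcc = librosa.feature.delta(mfcc, order=2)                  # 2nd order delta MFCC

# Align frame counts (librosa pads the spectral features) and stack into one
# feature vector per segment:
n = min(rms.shape[0], zcr.shape[0], centroid.shape[0], mfcc.shape[1])
X = np.vstack([rms[:n], zcr[:n], centroid[:n],
               mfcc[:, :n], d_mfcc[:, :n], dd_mfcc[:, :n]]).T
print(X.shape)  # (n_segments, 42) here, versus 71 dimensions for the full set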
Optionally, a further selection may be made from the 71 audio features so as to obtain a smaller set of audio features 144 for use in establishing the breathing measures. This may involve normalizing the 71-dimensional feature vector 142 by z-score transformation and ranking said feature vector according to individual feature significance. Said significance may be obtained using an independent evaluation criterion for binary classification based on the two-sample t-test. Said t-test may be applied to each feature and p-values may be obtained for each feature which serve as a measure of its discriminative power. The p-values may be compared and used to sort the features accordingly. For that purpose, timing information from the labeled breathing phase 312 may be used. As a result, a subset of, e.g., 25 audio features may be selected, comprising the 25 highest ranked features.
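This ranking and selection may be sketched as follows using SciPy; the feature matrix below is randomly generated purely as a placeholder for the actual training data.

import numpy as np
from scipy import stats

# X: (n_segments, 71) feature matrix; y: labels, 1 for 'IN' and 0 for 'EX'.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 71))          # placeholder for the training features
y = rng.integers(0, 2, size=500)        # placeholder for the breathing phase labels

Xz = stats.zscore(X, axis=0)            # z-score transformation per feature
# Two-sample t-test per feature; the p-value measures its discriminative power.
_, p = stats.ttest_ind(Xz[y == 1], Xz[y == 0], axis=0)
ranking = np.argsort(p)                 # most discriminative features first
selected = ranking[:25]                 # keep the 25 highest ranked features
X_sel = Xz[:, selected]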
The set of audio features 144 is then provided for use in training the classifiers in a step titled "Train classifier" 375. Here, an inspiration and expiration classifier 376 may be trained using the set of audio features 144 and the labeled breathing phase 312.
Optionally, the training may also be performed on the full set of audio features, e.g., the 71-dimensional feature vector obtained from the training data in the "Feature generation" step 360. The training itself may be based on a statistical classification method such as Naive Bayes (NB) classification, linear discriminant models, support vector machine (SVM) models, etc. All of the aforementioned methods have been verified to work well for this purpose. For example, a Naive Bayes classification may be used to predict class membership probabilities, i.e., the probability that a given sample or segment belongs to a particular class, being in this case either inspiration IN or expiration EX. Here, the set of audio features 142 may be modeled using a kernel smoothing density estimate. As a result of the statistical classification method, a trained inspiration and expiration classifier 376 is obtained. The trained inspiration and expiration classifier 376 may then be provided for use in testing the classification in a step titled "Classification" 380.
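Training such a classifier may be sketched as follows with scikit-learn, continuing the selection sketch above; note that GaussianNB models each feature with a Gaussian likelihood, whereas a kernel smoothing density estimate is mentioned above, so this is a simplifying substitution.

from sklearn.naive_bayes import GaussianNB

# X_sel and y are taken from the feature selection sketch above.
clf = GaussianNB()   # class priors default to the relative class frequencies
clf.fit(X_sel, y)    # y: 1 = inspiration IN, 0 = expiration EX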
The testing of the classification may be as follows. The trained inspiration and expiration classifier 376 may be applied to a set of audio features 145 derived from a testing speech signal 306. The set of audio features 145 may be derived from the testing speech signal 306 in the same manner as described previously in reference to the speech signal 304 used for training, i.e., by computing a 71-dimensional feature vector 143 from the testing speech signal 306, and deriving a smaller set of audio features 145 from the 71-dimensional feature vector 143. As a result of the classification 380, breathing measures 382 are obtained which are indicative of one or more breathing cycles of the person's breathing during the testing speech signal 306, e.g., a set of predicted breathing phases 382.
After the training, the system 100 may use the trained inspiration and expiration classifier as follows. The speech analyzer 140 may pre-process the audio signal 122 by de-trending and band-pass filtering the audio signal, e.g., by using the fourth-order Butterworth filter in the frequency range from 60 Hz to 5000 Hz as applied during training. Moreover, the audio signal 122 may be decomposed into segments of 50 ms, each having a 50% overlap. For each segment, the same set of audio features 142 may be calculated as used during the training, e.g., the set of audio features 142 as shown in Table 1. Each segment of the audio signal 122 may then be classified as either inspiration IN or expiration EX using the trained inspiration and expiration classifier. For example, the Naive Bayes method may compute the posterior probability of the segment belonging to each class using the prior class probabilities, which were estimated based on the relative frequencies of the classes in the training data, and then classify that segment according to the largest posterior probability. As a result, a classification for each of the segments is obtained, which together may be used in generating the breathing measures 162.
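The run-time classification just described may be sketched as follows, continuing the sketches above (rng, selected and clf are reused); the new feature matrix is again a random placeholder standing in for features extracted from the audio signal 122.

import numpy as np
from scipy import stats

X_new = rng.normal(size=(40, 71))       # placeholder for features of 40 new segments
# In practice the z-score would reuse the mean and standard deviation of the
# training data rather than those of the new recording:
X_new = stats.zscore(X_new, axis=0)[:, selected]
posteriors = clf.predict_proba(X_new)   # class membership probabilities
labels = np.where(posteriors[:, 1] > posteriors[:, 0], "IN", "EX")  # classes_ = [0, 1]
print(labels[:10])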
It is noted that the breathing measures 162 may be further post-processed, e.g., based on heuristics, to improve their quality. For example, clear inconsistencies, such as an inspiratory phase being too short or too long, may be removed or avoided.
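By way of illustration, such a heuristic may be sketched as follows; the minimum phase length and the relabeling rule are assumptions of this example, not prescribed by the description above.

def remove_short_phases(labels, min_len):
    """Relabel runs of 'IN' shorter than min_len segments as 'EX'.

    A minimal post-processing sketch: implausibly short inspiratory
    phases are treated as misclassifications and absorbed into the
    surrounding expiration.
    """
    labels = list(labels)
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if labels[i] == "IN" and (j - i) < min_len:
            labels[i:j] = ["EX"] * (j - i)
        i = j
    return labels

print(remove_short_phases(["EX", "IN", "EX", "IN", "IN", "IN", "EX"], min_len=2))
# ['EX', 'EX', 'EX', 'IN', 'IN', 'IN', 'EX'] - the isolated 'IN' is removed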
Moreover, it is noted that in the above, the classification is a binary classification in that a segment is either classified as inspiration IN or expiration EX. The breathing analyzer 160, however, may also be arranged for further classifying each one of the plurality of segments according to a type of inspiration or expiration. For example, an expiration may be classified as a quiet expiration or as an expiration with speech. Consequently, the classification may be non-binary, i.e., differentiate not only between inspiration and expiration but also between specific types of inspiration and/or expiration.
The breathing analyzer 160 may further derive a breathing parameter 164 from the breathing measures 162, e.g., a mean inhalation rate, an inhalation duration variability, a mean exhalation rate, an exhalation variability, a number of breaths per minute or a ratio between inspiration duration and expiration duration. It is noted that the calculation of such parameters is known per se from the field of respiratory medicine. The breathing analyzer 160 may also be arranged for establishing the person's lung function based on the breathing measures 162. For that purpose, one or more breathing parameters 164 may be calculated which may be compared against those in a medical database. As such, an estimate of the person's lung function may be obtained. Moreover, the breathing analyzer 160 may be arranged for attributing the person's lung function to a lung disease. For that purpose, a medical database may be accessed comprising characteristics of lung diseases. The attributing may be based on the breathing measures 162 or one or more breathing parameters 164. For example, the breathing analyzer 160 may be arranged for i) deriving a set of breathing parameters from the breathing measures 162, ii) analyzing the set of breathing parameters to obtain one or more characteristics of the lung function, and iii) searching the medical database for a lung disease which matches the one or more characteristics, thereby attributing the lung function to the lung disease. As such, the system 100 may be arranged for distinguishing between lung diseases such as chronic obstructive pulmonary disease (COPD), asthma, pneumonia, emphysema, etc.
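By way of illustration, a few of these breathing parameters may be derived from the per-segment labels as follows; the parameter definitions used here are simplifying assumptions, and actual clinical definitions may differ.

import numpy as np

def breathing_parameters(labels, hop_s=0.025):
    """Derive simple breathing parameters from a sequence of segment labels.

    hop_s is the time step between consecutive segments (25 ms for the
    50%-overlapping 50 ms segments used above).
    """
    # Collect run lengths of consecutive identical labels as phase durations.
    phases = []
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        phases.append((labels[i], (j - i) * hop_s))
        i = j
    in_dur = np.array([d for l, d in phases if l == "IN"])
    ex_dur = np.array([d for l, d in phases if l == "EX"])
    total_s = len(labels) * hop_s
    return {
        "breaths_per_minute": 60.0 * len(in_dur) / total_s,  # one breath per inspiration
        "inhalation_duration_variability": in_dur.std(),
        "ie_ratio": in_dur.mean() / ex_dur.mean(),           # inspiration/expiration ratio
    }

# Toy input; a real label sequence would span many seconds of speech.
print(breathing_parameters(["IN", "IN", "EX", "EX", "EX", "IN", "EX", "EX"]))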
The present invention may be incorporated in a Smartphone. Since a Smartphone already comprises a microphone for phone calls, the microphone can be used for obtaining the audio signal 122 for use in analyzing the person's breathing. Moreover, the functions of the speech analyzer 140 and the breathing analyzer 160 may be performed in software, e.g., by a software application running on the Smartphone. The Smartphone may be arranged for determining the person's breathing from so-termed free speech or conversational speech. Here, the person is not required to speak specifically for the purpose of enabling the Smartphone to establish the breathing measures, but rather speaks for a different purpose, e.g., a phone conversation. The person may therefore interact with the Smartphone in a usual manner while the audio signal 122 is being analyzed unobtrusively in the background.
The Smartphone may also be arranged for continuously analyzing an audio signal 122 from the microphone. As such, the Smartphone, when sitting on, e.g., a desk, may continuously record any speech in its vicinity and establish therefrom the breathing measures. To ensure that the speech of a particular person is recorded, e.g., of a patient who is prone to respiratory impairment, speaker recognition software may be used to identify the particular person. Alternatively or additionally, the Smartphone may be arranged for analyzing speech provided specifically for the purpose of voice dictation or voice control.
It is noted that any pre-processing of the audio signal 122 may be optimized in accordance with the type of Smartphone. For example, in case the Smartphone has a sampling frequency of 8000 Hz, the upper limit of the band-pass filtering may be selected to be 4000 Hz or lower, i.e., below the Nyquist frequency. In another example, the band-pass filtering may be omitted in its entirety and a high-pass filtering used instead.
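By way of example, the upper cutoff may be chosen in dependence on the sampling frequency as follows; the 0.95 safety margin is an assumption of this sketch.

from scipy.signal import butter

fs = 8000                            # e.g., a Smartphone sampling at 8 kHz
upper = min(5000.0, 0.95 * fs / 2)   # keep the cutoff below the Nyquist frequency
b, a = butter(4, [60 / (fs / 2), upper / (fs / 2)], btype="band")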
The Smartphone may also be used to prompt the person to speak so as to be able to obtain the breathing measures 162 from a recording of the speech. For example, the Smartphone may be arranged for showing the person a picture and prompting the person to describe the picture. The Smartphone may also be arranged for prompting the person to describe an object or an occurrence, e.g., the person's day so far. The Smartphone may also be arranged for prompting the person to read out aloud a passage of text, e.g., recent news, a joke, a story such as a bedtime story, etc. The Smartphone may also be arranged for requiring the person to speak as part of a game, e.g., a speech controlled game, or a game requiring the person to speak for a predetermined period without deviation, repetition or hesitation.
It is noted that the system 100 may also be incorporated into a handheld device other than a Smartphone, such as a tablet. Alternatively, the system 100 may be incorporated into another type of home electronics, such as a home television. The system 100 may also be embodied as a standalone dedicated device which is arranged for recording a person speaking and then calculating and displaying, e.g., a so-termed speech breathing factor which indicates the respiratory health or relative respiratory health of the person.
The system 100 may be connectable to a telehealth system such as a Philips Motiva. Philips Motiva is an example of an interactive healthcare platform that connects patients with chronic conditions to their healthcare providers - via a home television and an internet connection. Motiva automates disease management activities, and engages patients with personalized daily interactions and education delivered through the home television. For example, the system 100 may be incorporated into the aforementioned Smartphone or other personal device, with the output 180 of the system 100 being arranged for providing the breathing measures 162 or the breathing parameter 164 derived from the breathing measures via the internet to the telehealth system. Alternatively, the system 100 may be incorporated into a remote part of the telehealth system. As such, the functions of the speech analyzer 140 and breathing analyzer 160 may be performed remotely, with only the speech of the person being recorded locally, i.e., at a location of the person such as a home. Here, an alarm, actionable feedback or a motivational message, which may be derived from an analysis of the breathing measures 162, may be provided to the person locally using, e.g., a home television.
In general, it is noted that the term speech recording refers to speech being available in data form. Consequently, it does not imply the speech being prerecorded, i.e., it may equally refer to the speech being recorded and made available in real-time. It is further noted that determining a person's breathing refers to establishing data indicative of the breathing based on a study or investigation, with the subject of the study or investigation being here the audio signal comprising the sound component.
It will be appreciated that the invention also applies to computer programs, particularly computer programs on or in a carrier, adapted to put the invention into practice. The program may be in the form of a source code, an object code, a code intermediate between source and object code such as in a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. It will also be appreciated that such a program may have many different architectural designs. For example, a program code implementing the functionality of the method or system according to the invention may be sub-divided into one or more sub-routines. Many different ways of distributing the functionality among these sub-routines will be apparent to the skilled person. The sub-routines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer-executable instructions, for example, processor instructions and/or interpreter instructions (e.g. Java interpreter instructions). Alternatively, one or more or all of the sub-routines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time. The main program contains at least one call to at least one of the sub-routines. The sub-routines may also comprise function calls to each other. An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each processing step of at least one of the methods set forth herein. These instructions may be sub-divided into sub-routines and/or stored in one or more files that may be linked statically or dynamically.
Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each means of at least one of the systems and/or products set forth herein. These instructions may be sub-divided into sub-routines and/or stored in one or more files that may be linked statically or dynamically.
The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example, a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example, a hard disk. Furthermore, the carrier may be a transmissible carrier such as an electric or optical signal, which may be conveyed via electric or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such a cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted to perform, or used in the performance of, the relevant method.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.


CLAIMS:
1. System (100) for determining a person's breathing, comprising:
an input (120) for obtaining an audio signal (122), the audio signal comprising a sound component (124) constituting a speech recording of a person;
a speech analyzer (140) for analyzing the sound component (124) by extracting a set of audio features (142, 144) from the audio signal (122) which characterizes the sound component;
a breathing analyzer (160) for establishing breathing measures (162) indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features (142, 144) to differentiate between inspiration (IN) and expiration (EX) in the sound component (124); and
an output (180) for providing the breathing measures (162) or a breathing parameter (164) derived from the breathing measures.
2. System (100) according to claim 1, wherein the breathing analyzer (160) is arranged for differentiating between inspiration (IN) and expiration (EX) in the sound component (124) by applying an inspiration and expiration classifier (376) to the set of audio features (142, 144).
3. System (100) according to claim 2, wherein the inspiration and expiration classifier (376) has been trained (350) using the set of audio features (142, 144).
4. System (100) according to claim 2, wherein:
the speech analyzer (140) is arranged for i) obtaining a plurality of segments of the audio signal (122), and ii) extracting the set of audio features (142, 144) from each one of the plurality of segments; and
the breathing analyzer (160) is arranged for establishing the breathing measures (162) by classifying each one of the plurality of segments as either inspiration (IN) or expiration (EX) based on the respective set of audio features (142, 144).
5. System (100) according to claim 4, wherein the breathing analyzer (160) is arranged for further classifying each one of the plurality of segments according to a type of inspiration (IN) or expiration (EX).
6. System (100) according to claim 4, wherein each one of the plurality of segments partially overlaps in time with at least another one of the plurality of segments.
7. System (100) according to claim 1, wherein the set of audio features (142, 144) comprises at least one of: a temporal audio feature, a spectral moment feature, a Mel- Frequency Cepstral Coefficient (MFCC) feature, a Perceptual Linear Predictive Cepstral Coefficient (PLP-CC) feature, and a prosody feature.
8. System (100) according to claim 1, wherein the speech analyzer (140) is arranged for band-pass filtering and/or de-trending the audio signal before extracting the set of audio features (142, 144) from the audio signal (122).
9. System (100) according to claim 1, wherein the breathing parameter (164) is one of:
- a mean inhalation rate;
- an inhalation duration variability;
- a mean exhalation rate;
- an exhalation variability;
- a number of breaths per minute; and
- a ratio between inspiration duration and expiration duration.
10. System (100) according to claim 1, wherein the breathing analyzer (160) is arranged for establishing the person's lung function based on the breathing measures (162).
11. System (100) according to claim 10, wherein the breathing analyzer (160) is further arranged for attributing the person's lung function to a lung disease.
12. Smartphone comprising the system (100) according to claim 1, the Smartphone comprising a microphone (110) for telephony, wherein the microphone is arranged for obtaining the audio signal (122) for use in analyzing the person's breathing.
13. Method (200) for determining a person's breathing, comprising:
obtaining (210) an audio signal, the audio signal comprising a sound component constituting a speech recording of a person;
analyzing (220) the sound component by extracting a set of audio features from the audio signal which characterizes the sound component;
establishing (230) breathing measures indicative of one or more breathing cycles of the person's breathing during the speech recording by using the set of audio features to differentiate between inspiration and expiration in the sound component; and
providing (240) the breathing measures or a breathing parameter derived from the breathing measures.
14. Method (200) according to claim 13, wherein obtaining (210) the audio signal comprises prompting the person to at least one of: describe an occurrence, describe an object, describe a picture, and read out aloud a passage of text.
15. Computer program product (260) comprising instructions for causing a processor system to perform the method according to claim 13.
PCT/IB2013/058782 2012-09-24 2013-09-23 System and method for determining a person's breathing WO2014045257A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261704688P 2012-09-24 2012-09-24
US61/704,688 2012-09-24

Publications (1)

Publication Number Publication Date
WO2014045257A1 true WO2014045257A1 (en) 2014-03-27

Family

ID=49709780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/058782 WO2014045257A1 (en) 2012-09-24 2013-09-23 System and method for determining a person's breathing

Country Status (1)

Country Link
WO (1) WO2014045257A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070005357A1 (en) * 2005-06-29 2007-01-04 Rosalyn Moran Telephone pathology assessment
JP2008086741A (en) * 2005-12-26 2008-04-17 Akira Tomono Respiration detection type chemical substance presenting device and respiration detector
WO2010015865A1 (en) 2008-08-08 2010-02-11 Healthsmart Limited Breathing monitor and method for monitoring breating

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Speech breathing in patients with lung disease", AM REV RESPIR DIS, vol. 147, no. 5, 1993, pages 1199 - 1206
DIMA RUINSKIY ET AL: "An Effective Algorithm for Automatic Detection and Exact Demarcation of Breath Sounds in Speech and Song Signals", IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, USA, vol. 15, no. 3, 1 March 2007 (2007-03-01), pages 838 - 850, XP011165561, ISSN: 1558-7916, DOI: 10.1109/TASL.2006.889750 *
FLORIAN EYBEN ET AL.: "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", PROC. ACM MULTIMEDIA, ACM, FLORENCE, ITALY, 25 October 2010 (2010-10-25), pages 1459 - 1462, XP002718736

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10796805B2 (en) 2015-10-08 2020-10-06 Cordio Medical Ltd. Assessment of a pulmonary condition by speech analysis
US10825464B2 (en) 2015-12-16 2020-11-03 Dolby Laboratories Licensing Corporation Suppression of breath in audio signals
US10925548B2 (en) 2016-08-23 2021-02-23 Koninklijke Philips N.V. Device, system and method for detection of an asthma attack or asthma of a subject
AU2019356224B2 (en) * 2018-10-11 2022-07-14 Cordio Medical Ltd. Estimating lung volume by speech analysis
US10847177B2 (en) 2018-10-11 2020-11-24 Cordio Medical Ltd. Estimating lung volume by speech analysis
CN112822976B (en) * 2018-10-11 2024-05-07 科蒂奥医疗公司 Estimating lung volume by speech analysis
CN112822976A (en) * 2018-10-11 2021-05-18 科蒂奥医疗公司 Estimation of lung volume by speech analysis
WO2020075015A1 (en) * 2018-10-11 2020-04-16 Cordio Medical Ltd. Estimating lung volume by speech analysis
JP2022502189A (en) * 2018-10-11 2022-01-11 コルディオ メディカル リミテッド Estimating lung volume by speech analysis
JP7385299B2 (en) 2018-10-11 2023-11-22 コルディオ メディカル リミテッド Estimation of lung volume by speech analysis
US11011188B2 (en) 2019-03-12 2021-05-18 Cordio Medical Ltd. Diagnostic techniques based on speech-sample alignment
US11024327B2 (en) 2019-03-12 2021-06-01 Cordio Medical Ltd. Diagnostic techniques based on speech models
WO2021099279A1 (en) * 2019-11-18 2021-05-27 Koninklijke Philips N.V. Speech-based breathing prediction
US11752288B2 (en) 2019-11-18 2023-09-12 Koninklijke Philips N.V. Speech-based breathing prediction
US11484211B2 (en) 2020-03-03 2022-11-01 Cordio Medical Ltd. Diagnosis of medical conditions using voice recordings and auscultation
US11417342B2 (en) 2020-06-29 2022-08-16 Cordio Medical Ltd. Synthesizing patient-specific speech models
EP3964134A1 (en) * 2020-09-02 2022-03-09 Hill-Rom Services PTE. LTD. Lung health sensing through voice analysis
CN114287913A (en) * 2020-10-08 2022-04-08 国际商业机器公司 Multi-modal spirometric measurements for respiratory rate instability prediction
CN112472066A (en) * 2020-11-25 2021-03-12 陈向军 Breathing disorder monitoring terminal, monitor and system

Similar Documents

Publication Publication Date Title
WO2014045257A1 (en) System and method for determining a person's breathing
US11315687B2 (en) Method and apparatus for training and evaluating artificial neural networks used to determine lung pathology
JP6780182B2 (en) Evaluation of lung disease by voice analysis
US9814438B2 (en) Methods and apparatus for performing dynamic respiratory classification and tracking
US11304624B2 (en) Method and apparatus for performing dynamic respiratory classification and analysis for detecting wheeze particles and sources
Cosentino et al. Quantitative laughter detection, measurement, and classification—A critical survey
CN112822976B (en) Estimating lung volume by speech analysis
US11529072B2 (en) Method and apparatus for performing dynamic respiratory classification and tracking of wheeze and crackle
JP2006071936A (en) Dialogue agent
US20220054039A1 (en) Breathing measurement and management using an electronic device
Castillo-Escario et al. Entropy analysis of acoustic signals recorded with a smartphone for detecting apneas and hypopneas: A comparison with a commercial system for home sleep apnea diagnosis
JP2013123495A (en) Respiratory sound analysis device, respiratory sound analysis method, respiratory sound analysis program, and recording medium
Yahya et al. Automatic detection and classification of acoustic breathing cycles
Asatani et al. Classification of respiratory sounds using improved convolutional recurrent neural network
US10426426B2 (en) Methods and apparatus for performing dynamic respiratory classification and tracking
Routray Automatic measurement of speech breathing rate
Verde et al. An m-health system for the estimation of voice disorders
Ozdemir et al. A time-series approach to predict obstructive sleep apnea (OSA) Episodes
CN114730629A (en) Speech-based respiratory prediction
JP2012024527A (en) Device for determining proficiency level of abdominal breathing
Castro et al. Real-time identification of respiratory movements through a microphone
Nallanthighal et al. COVID-19 detection based on respiratory sensing from speech
Chang Speech Analysis Methodologies towards Unobtrusive Mental Health Monitoring
Eedara et al. An algorithm for automatic respiratory state classifications using tracheal sound analysis
US20210282736A1 (en) Respiration rate detection metholody for nebulizers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 13799385; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 13799385; Country of ref document: EP; Kind code of ref document: A1