WO2023095136A1 - Subject diagnosis using speech analysis - Google Patents


Info

Publication number
WO2023095136A1
Authority
WO
WIPO (PCT)
Prior art keywords
vocal
diagnosis
subject
recording
features
Prior art date
Application number
PCT/IL2022/051253
Other languages
French (fr)
Inventor
Julie CWIKEL
Dan VILENCHIK
Alison Stern PEREZ
Ruslan SERGIENKO
Rachel ABRAMOVICH
Original Assignee
B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University filed Critical B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University
Publication of WO2023095136A1 publication Critical patent/WO2023095136A1/en

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/48 Other medical applications
    • A61B5/4803 Speech analysis specially adapted for diagnostic purposes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B2562/00 Details of sensors; Constructional details of sensor housings or probes; Accessories for sensors
    • A61B2562/02 Details of sensors specially adapted for in-vivo measurements
    • A61B2562/0204 Acoustic sensors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/168 Evaluating attention deficit, hyperactivity
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/68 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
    • A61B5/6887 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient mounted on external non-worn devices, e.g. non-medical devices
    • A61B5/6898 Portable consumer electronic devices, e.g. music players, telephones, tablet computers

Definitions

  • the present disclosure, in some embodiments thereof, relates to diagnosis of a subject and, more particularly, but not exclusively, to diagnosis of a mental state of the subject using a speech recording of the subject.
  • Machine Learning techniques have apparently been used to predict mental states using various forms of data such as biological tests, questionnaires, video, and vocal recordings [1].
  • Post-Traumatic Stress Disorder (PTSD)
  • suicide risk [4]
  • psychosis [5]
  • bipolar disorder [6]
  • depression [7, 8].
  • Study [9] has apparently identified the presence of pain, but apparently not pain level and/or diagnostic characteristics.
  • Example 1 A method, implemented by computer circuitry, of diagnosis of a plurality of medical conditions for a subject comprising: obtaining a vocal recording of said subject having a vocal recording time duration; processing said vocal recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than said vocal recording time duration; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; feeding, to a trained machine learning model, said vocal features for each of said plurality of vocal recording portions, to determine, for each of said plurality of vocal recording portions, an intermediate diagnosis for each of said plurality of medical conditions, thereby obtaining a plurality of intermediate diagnoses for said plurality of vocal recording portions; determining, using said plurality of intermediate diagnoses, for said subject, a diagnosis for the plurality of medical conditions.
  • Example 2 The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to said trained machine learning model for determining of said plurality of intermediate diagnoses.
  • Example 3 The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to a second trained machine learning model to determine a second intermediate diagnosis; wherein said determining comprises using said second intermediate diagnosis to provide said diagnosis for said plurality of medical conditions.
  • Example 4 The method according to any one of claims 2-3, wherein said obtaining linguistic features includes: obtaining a textual transcript of said vocal recording of said subject; and extracting said linguistic features from said textual transcript.
  • Example 5 The method according to claim 4, wherein said extracting said linguistic features includes extracting a plurality of linguistic features sets, one set for each vocal recording time portion.
  • Example 6 The method according to any one of claims 1-5, wherein said obtaining comprises obtaining a vocal recording which includes at least a portion where said subject describes a potentially emotive subject.
  • Example 7 The method according to any one of claims 1-6, wherein said determining, using said plurality of intermediate diagnoses comprises applying a rule to the diagnoses.
  • Example 8 The method according to claim 7, wherein said rule includes indicating presence of a condition where above a threshold proportion of said intermediate diagnoses indicate presence of the medical condition.
  • Example 9 The method according to claim 8, wherein said threshold proportion is different for different medical conditions of said plurality of medical conditions.
  • Example 10 The method according to any one of claims 1-9, wherein said determining, using said plurality of intermediate diagnoses comprises feeding said plurality of intermediate diagnoses to a second trained machine learning model to provide said diagnosis for the plurality of medical conditions.
  • Example 11 The method according to any one of claims 1-10, wherein said processing comprises removing one or more of silent portions, non-subject speech, and noise.
  • Example 13 The method according to any one of claims 2-12, wherein said linguistic features include prevalence of one or more type of word.
  • Example 14 The method according to claim 13, wherein said type of word includes words having a tense, for one or more tense.
  • Example 15 The method according to any one of claims 13-14, wherein said type of word includes a type of pronoun, for one or more pronoun type.
  • Example 16 The method according to any one of claims 13-15, wherein said type of word includes a pronoun, for one or more pronouns.
  • Example 17 The method according to any one of claims 13-16, wherein said type of word includes an emotion word, identified from a list of emotion words.
  • Example 18 The method according to any one of claims 13-17, wherein said type of word includes repetition of a word or phrase.
  • Example 19 The method according to any one of claims 13-18, wherein said type of word includes an emphasis word, identified from a list of emphasis words.
  • Example 20 The method according to any one of claims 14-19, wherein said prevalence includes a prevalence per minute of the type of word.
  • Example 21 The method according to any one of claims 14-20, wherein said prevalence includes a proportion of words of the narrative being said type of word.
  • Example 22 The method according to any one of claims 14-21, wherein one or more of said word types is identified according to a list.
  • Example 23 The method according to any one of claims 1-22, wherein said trained machine learning model is trained by: obtaining a plurality of voice recordings, each recording having a diagnosis label; processing each of said plurality voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a time duration of a corresponding voice recording of said plurality of voice recordings; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; training the machine learning model, using said diagnoses labels and said vocal features.
  • Example 24 A method, implemented by computer circuitry, of training a machine learning model for diagnosis comprising: receiving a plurality of voice recordings, each recording having a diagnosis label and corresponding to a single patient; processing each of said plurality of voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than said corresponding vocal recording time duration; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; training the machine learning model, using said diagnosis labels, said vocal features for each of said plurality of vocal recording portions.
  • Example 25 A method, implemented by computer circuitry, of training a machine learning model for diagnosis comprising: obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions; obtaining a plurality of voice recordings each associated with a diagnosis label of said plurality of diagnosis labels; obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels; extracting one or more vocal features from each of said plurality of voice recordings; training said machine learning model using said diagnosis labels, said one or more vocal features per diagnosis label, and said one or more linguistic features per diagnosis label.
  • Example 26 A method, implemented by computer circuitry, of training machine learning models for diagnosis comprising: obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions; obtaining a plurality of voice recordings each associated with a diagnosis label of said plurality of diagnosis labels; obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels; extracting one or more vocal features from each of said plurality of voice
  • SUBSTITUTE SHEET (RULE 26) recordings; training a first machine learning model using said diagnosis labels and said one or more vocal features per diagnosis label; training a second machine learning model using said diagnosis labels and said one or more linguistic features per diagnosis label.
  • Example 27 The method according to claim 26, comprising training a third machine learning model using said first machine learning model and said second machine learning model and said diagnosis labels.
  • Example 28 The method according to any one of claims 26-27, wherein said extracting one or more vocal features comprises, for each voice recording of said plurality of voice recordings: processing said voice recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a duration of said voice recording; and extracting, for each of said plurality of vocal recording portions, vocal features of said subject.
  • Example 29 The method according to any one of claims 26-28, wherein said obtaining said one or more linguistic features includes: obtaining a textual transcript of each vocal recording of said plurality of vocal recordings; and extracting said one or more linguistic features from said textual transcript, for each textual transcript.
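  • For illustration only, the following is a minimal Python sketch of the flow of Example 1 combined with the threshold rule of Examples 8-9. The portion duration, the placeholder feature extractor, the model interface, and the per-condition thresholds are assumptions of the sketch, not the claimed implementation.

```python
# Illustrative sketch of the Example 1 flow with the Example 8-9 threshold rule.
# `extract_vocal_features` and `model` are hypothetical placeholders.
import numpy as np

def split_recording(signal: np.ndarray, sr: int, portion_s: float = 10.0):
    """Cut a 1-D audio signal into non-overlapping portions of portion_s seconds."""
    step = int(portion_s * sr)
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]

def diagnose(signal, sr, extract_vocal_features, model, thresholds):
    """thresholds: per-condition proportion cut-offs (Example 9 lets these differ)."""
    portions = split_recording(signal, sr)
    feats = np.stack([extract_vocal_features(p, sr) for p in portions])
    # One intermediate (binary, per-condition) diagnosis per portion.
    intermediate = model.predict(feats)        # shape: (n_portions, n_conditions)
    positive_proportion = intermediate.mean(axis=0)
    # A condition is indicated when enough portions flag it (Example 8).
    return positive_proportion > np.asarray(thresholds)
```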
  • Some embodiments of the present disclosure are embodied as a system, method, or computer program product.
  • some embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” and/or “system.”
  • Implementation of the method and/or system of some embodiments of the present disclosure can involve performing and/or completing selected tasks manually, automatically, or a combination thereof. According to actual instrumentation and/or equipment of some embodiments of the method and/or system of the present disclosure, several selected tasks could be implemented by hardware, by software or by firmware and/or by a combination thereof, e.g., using an operating system.
  • hardware for performing selected tasks according to some embodiments of the present disclosure could be implemented as a chip or a circuit.
  • selected tasks according to some embodiments of the present disclosure could be implemented as a plurality of software instructions being executed by a computational device e.g., using any suitable operating system.
  • one or more tasks according to some exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions.
  • the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage e.g., for storing instructions and/or data.
  • a network connection is provided as well.
  • User interface/s e.g., display/s and/or user input device/s are optionally provided.
  • These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart steps and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer (e.g., in a memory, local and/or hosted at the cloud), other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium can be used to produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be run by one or more computational device to cause a series of operational steps to be performed e.g., on the computational device, other programmable apparatus and/or other devices to produce a computer implemented process such that the instructions which execute provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • Some of the methods described herein are generally designed only for use by a computer, and may not be feasible and/or practical to perform purely manually, by a human expert.
  • a human expert who wanted to manually perform similar tasks might be expected to use different methods, e.g., making use of expert knowledge and/or the pattern recognition capabilities of the human brain, which would potentially be more efficient than manually going through the steps of the methods described herein.
  • FIG. 1 illustrates a system, according to some embodiments of the disclosure.
  • FIG. 2 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
  • FIG. 3 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • FIG. 4 illustrates data flow with respect to system elements, for training of a ML model, according to some embodiments of the disclosure.
  • FIG. 5 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
  • FIG. 6 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • FIG. 7 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
  • FIG. 8 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
  • FIG. 9 is a method of ML model training, according to some embodiments of the disclosure.
  • FIG. 10 is a method of training a ML model, according to some embodiments of the disclosure.
  • FIG. 11 is a flow chart of methods of ML training, according to some embodiments of the disclosure.
  • FIG. 12 is a method of normalizing speech for an individual, according to some embodiments of the disclosure.
  • FIG. 13 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • the present disclosure, in some embodiments thereof, relates to diagnosis of a subject and, more particularly, but not exclusively, to diagnosis of a mental state of the subject using a speech recording of the subject.
  • a broad aspect of some embodiments of the disclosure relates to diagnosing a subject using both vocal features of the subject (e.g. vocal features of speech of the subject) and linguistic features of language used by the subject.
  • one or more machine learning model provides a diagnosis of the subject using vocal recording/s of the subject.
  • vocal features are extracted from speech recording/s of the subject and linguistic feature/s are identified in text transcript/s of the recording/s.
  • An aspect of some embodiments of the disclosure relates to diagnosing a subject for a plurality of mental states (also herein termed “mental conditions”, “conditions”, “clinical states”, and “states”), using both vocal features of the subject and linguistic features of language used by the subject. Potential benefits of multi-morbidity diagnosis include identification of previously unknown co-morbidities, potentially enabling improved treatment and/or outcome/s for the subject. For example, in some embodiments, a subject suffering from a physical illness is diagnosed for a range of mental health conditions e.g. using a single recording.
  • subjects suffering from a health condition potentially including both physical and mental aspects are diagnosed (e.g. using vocal recordings and the machine learning model), diagnosis of existence and/or severity of the different aspects in some embodiments, enabling improved treatment.
  • a potential advantage of using both vocal features and linguistic features of a subject in diagnosis of the subject is increased accuracy in the diagnosis e.g. as opposed to a diagnosis performed using a single type of feature.
  • using both vocal and linguistic features potentially enables diagnosis of clinical states for different subject populations e.g. different gender and/or age and/or language and/or culture and/or suffering from different clinical states and/or groups of clinical states.
  • the diagnosis is repeated at different times, e.g. to provide assessment of a subject over time.
  • linguistic feature/s are also fed to the machine learning model which provides the diagnosis of the subject.
  • linguistic feature/s are evaluated separately, and the two evaluations (a first and a second evaluation) are used to provide a diagnosis of the subject.
  • a first evaluation is provided by a machine learning model fed the vocal feature/s and a second evaluation is based on identified linguistic feature/s (e.g. the second evaluation, in some embodiments, provided by a machine learning model fed the linguistic feature/s).
  • An aspect of some embodiments of the disclosure relates to dividing vocal recordings of a subject into a plurality of snippets (each snippet includes a temporal portion of the associated vocal recording) and one or both of training a machine learning model for diagnosis of subjects using snippets and diagnosis of a subject using snippets.
  • extracted feature/s from the snippets are used in training of and/or prediction using the machine learning model.
  • a vocal recording of speech of a subject is divided into snippets, from which are extracted snippet feature/s which are then fed to a machine learning model which produces a diagnosis for each snippet.
  • processing includes, in some embodiments, use of a rule and/or a second ML model.
  • diagnosis is for a plurality of conditions, where for each condition, the machine learning model provides a set of diagnoses (a diagnosis for each condition) for each snippet, and then the sets of diagnoses are used to provide a single set of diagnoses for the patient.
  • the subject voice recording is obtained as part of screening and/or diagnosis and/or treatment of the subject.
  • subject/s presenting themselves for and/or receiving treatment are screened (e.g. for mental health status and/or complications) by obtaining a speech recording of the subject.
  • recordings of a same subject are acquired over time, the diagnoses providing a picture of subject recovery (or otherwise) over time.
  • a subject suffering from a first medical condition is screened for other medical conditions, by acquiring voice recording/s and diagnosing the subject for one or more additional medical conditions (e.g. according to method/s as described in this document).
  • a potential advantage of which is improvement of prognosis of the first medical condition. For example, a subject suffering from a cancer diagnosis who is also depressed (and/or suffering from other mental health condition/s) may have a less positive prognosis for the cancer diagnosis; screening cancer sufferers for mental health issues, which are then treated, potentially improves the cancer prognosis.
  • diagnosis using voice recordings provides faster diagnosis and/or diagnosis for more individuals (e.g. as opposed to diagnosis by a mental health professional), for example, potentially enabling health care authorities to direct mental health services to the correct individuals.
  • voice recordings and diagnosis of emotional state of the subject are used for other purposes than for diagnosis of health conditions.
  • voice recordings are used to determine if a subject is displaying correct emotional responses. For example, for screening of caregivers for potential abuse upon a child (or other dependent) presenting with an injury. For example, screening of suspected criminals. For example, for screening of subjects for risk e.g. risk of aggressive behavior.
  • video and/or photograph/s of a subject are used in the diagnosis e.g. in addition to vocal recording/s.
  • diagnosis e.g. as described within this document is used as a single-use diagnostic tool. In some embodiments, diagnosis e.g. as described within this document is used as a comparative assessment tool e.g. to enable comparison between subjects and/or a same subject at different times.
  • methods as described in this document provide an acute assessment, for instance, upon an individual's arrival at a hospital, e.g. after an accident or potentially traumatic event, the diagnosis potentially enabling care providers to gauge a level of susceptibility to post-traumatic symptoms or full-blown PTSD.
  • the diagnosis is performed for a same subject at different times.
  • the diagnosis e.g. including a speech recording (e.g. recording an answer to a request for the relevant narrative) performed before and after one or more intervention e.g. one or more of a therapy, program, study, potentially provides evidence of change occurring in the processing of the narrated experience e.g. evidence of efficacy of the intervention.
  • diagnosis uses subject narration of ‘same’ memories of ‘same’ events, experienced by a single individual, over a substantial period of time (e.g. 1 day to 1-5 years).
  • the diagnoses potentially providing connections between narrative content and/or form, and/or one or more of the subject’s coping, processing, and psychosocial functioning.
  • a subject is asked to describe different events, the different recordings providing different diagnoses e.g. according to method/s described in this document.
  • these different diagnoses are used (e.g. by a healthcare practitioner e.g. a therapist) to identify those events which are one or more of more acute, prominent, or emotionally charged e.g. enabling identification of severity of different issues and/or triggers and/or efficacy of treatment for specific issues and/or triggers.
  • a diagnosis includes a diagnosis of how problematic and/or traumatic an event and/or experience was for the subject. Where, in some embodiments, this diagnosis is used to determine which subjects should receive preventative care (e.g. mental health care) to potentially prevent development of mental health issues (e.g. PTSD, depression, anxiety) associated with the event.
  • diagnosis and/or diagnoses using method/s described in this document provide one or more of:
  • FIG. 1 illustrates a system 100, according to some embodiments of the disclosure.
  • system 100 includes processing and memory circuitry (PMC) 102.
  • one or more method as described in this document is performed partially or fully by PMC 102, for example, one or more feature of the methods described in one or more of FIG. 3, FIG. 5, FIG. 6, FIG. 7, FIG. 10, FIG. 11, and FIG. 12.
  • system 100 includes a pre-processing module 104 (e.g. hosted by PMC 102).
  • system 100 includes a machine learning model 106 (e.g. hosted by PMC 102).
  • system 100 includes and/or has connectivity to one or more of a personal electronic device 108 (e.g. a cell phone, tablet, laptop), the cloud 110, electronic device 112 (e.g. medical device e.g. screening and/or diagnosis-bot), microphone 114.
  • vocal recording of a subject is acquired by microphone 114 and/or a microphone of personal electronic device 108 and/or a microphone of electronic device 112.
  • one or more of personal electronic device 108, electronic device 112, and cloud 110 hosts one or more portion of PMC 102.
  • portion/s of PMC 102 are hosted as an application running on hardware of one or both of personal electronic device 108 (e.g. smartphone application) and electronic device 112 (e.g. desktop application).
  • device/s 108, 112 access update/s to and/or feature/s of PMC 102 hosted by cloud 110.
  • one or more of personal electronic device 108 and electronic device 112 include one or more user interface, through which a user (e.g. subject, healthcare professional, subject caregiver) inputs information received by PMC 102 and/or is shown output/s of PMC 102 (e.g. output of ML model 106).
  • system 100 includes one or more sensor 109.
  • sensor/s 109 sense one or more physical feature of a subject (e.g. while the subject is being recorded to provide a subject vocal recording), for example, measurement including one or more of heartbeat feature/s (e.g. heart rate, heart rate variability), blood pressure, temperature.
  • the processor of PMC 102 is configured to implement at least one machine learning model 106.
  • the machine learning model 106 includes a neural network (NN).
  • the machine learning model 106 includes a deep neural network (DNN).
  • the processor executes several computer-readable instructions implemented on a computer-readable memory of PMC 102, wherein execution of the computer-readable instructions enables data processing by machine learning model 106.
  • machine learning model 106 enables processing of data provided (e.g. including subject vocal feature/s and/or subject linguistic feature/s), for outputting one or more diagnosis of the subject.
  • the processor of PMC 102 is configured to implement a plurality of different machine learning models 106, for example, referring to FIG. 8, both ML model 810 and second ML model 814.
  • the layers of the machine learning model 106 are, in some embodiments, organized in accordance with Convolutional Neural Network (CNN) architecture, Recurrent Neural Network architecture, Recursive Neural Networks architecture, or Generative Adversarial Network (GAN) architecture.
  • at least some of the layers are organized in a plurality of DNN subnetworks.
  • Each layer of the DNN includes multiple basic computational elements (CE), typically referred to in the art as dimensions, neurons, or nodes.
  • computational elements of a given layer can be connected with CEs of a preceding layer and/or a subsequent layer.
  • Each connection between a CE of a preceding layer and a CE of a subsequent layer is associated with a weighting value.
  • a given CE can receive inputs from CEs of a previous layer via the respective connections, each given connection being associated with a weighting value which can be applied to the input of the given connection.
  • the weighting values can determine the relative strength of the connections and thus the relative influence of the respective inputs on the output of the given CE.
  • the given CE can be configured to compute an activation value (e.g., the weighted sum of the inputs) and further derive an output by applying an activation function to the computed activation.
  • the activation function can be, for example, an identity function, a deterministic function (e.g., linear, sigmoid, threshold, or the like), a stochastic function, or other suitable function.
  • the output from the given CE can be transmitted to CEs of a subsequent layer via the respective connections.
  • each connection at the output of a CE can be associated with a weighting value which can be applied to the output of the CE prior to being received as an input of a CE of a subsequent layer.
  • in addition to the weighting values, there can be threshold values (including limiting functions) associated with the connections and CEs.
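  • As a concrete illustration of the computational element just described (weighted sum of inputs followed by an activation), the following is a minimal sketch; the sigmoid is just one of the activation functions the text names.

```python
import numpy as np

def ce_output(inputs: np.ndarray, weights: np.ndarray, bias: float = 0.0) -> float:
    """One computational element: weighted sum of the inputs, then an activation."""
    activation_value = float(np.dot(weights, inputs)) + bias  # weighted sum
    return 1.0 / (1.0 + np.exp(-activation_value))            # sigmoid activation
```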
  • FIG. 2 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
  • one or more obtained and/or received subject voice recording/s 216 are processed, by a voice recording processing module 203 to determine vocal feature/s 222.
  • vocal feature/s 222 are pre-determined vocal features.
  • the subject voice recording/s include one or more feature as described in step 300 FIG. 3.
  • processing module 203 performs, and/or vocal feature/s 222 include, one or more feature as described regarding step 304 FIG. 3.
  • voice recording processing module 203 (e.g. prior to extracting vocal feature/s) performs one or more pre-processing operation on the voice recording/s 216.
  • pre-processing is performed using a pre-trained ML model, e.g. referring back to FIG. 1, where a pre-processing ML model is hosted, in some embodiments, by pre-processing module 104.
  • subject language data 218 of the subject is processed, by a language processing module 205, to provide linguistic feature/s 224, which linguistic feature/s are, in some embodiments, pre-determined.
  • subject language data 218 includes one or more feature of subject language data as described regarding step 302 FIG. 3.
  • language processing module performs and/or linguistic feature/s include one or more feature as described regarding step 306 FIG. 3.
  • ML model/s 206 provide a multi-morbidity diagnosis 228 for the subject, using linguistic feature/s 224 and vocal feature/s 222 which are fed to the model, for example, the providing including one or more feature as described regarding step 312 FIG. 3.
  • subjective self-ratings 220 (which, in some embodiments, include one or more feature as described regarding self-ratings step 310 FIG. 3) are additionally fed to the ML model 206 e.g. and used in providing the multi-morbidity diagnosis 228.
  • ML model 206 is a single ML model which receives both linguistic features/s 224 and vocal feature/s 222 and optional self-rating/s 220 to provide diagnosis 228.
  • ML model 206 includes a plurality of models e.g. including one or more feature as described and/or illustrated regarding FIG. 13. For example, a vocal feature ML model which receives vocal feature/s 222, and a linguistic feature ML model which receives linguistic features 224.
  • the vocal feature ML model and the linguistic feature ML model each produce a diagnosis, where the two diagnoses are fed to another ML model to provide a single diagnosis for the subject and/or a single diagnosis is provided by applying a rule to the two diagnoses.
  • self-rating/s 220 are fed to one or more of the vocal feature and linguistic feature ML models to be used for providing one or more of the diagnoses from the vocal feature and linguistic feature ML models.
  • PMC 102 hosts data 216, 218, 220, 222, 224, 228 and/or system elements 203, 205, 206.
  • pre-processing module 104 hosts one or both pre-processing modules 203, 205 and/or ML model 206 corresponds to ML model 106.
  • FIG. 3 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • subject voice recording/s are received, which recording/s include audio recording.
  • a single voice recording is received.
  • the voice recording includes 10 seconds to 10 minutes of recording, or 1-5 minutes of recording, and/or a recording from which at least 1 minute of continuous vocalization (e.g. speech) of the subject is extractable (e.g. by removal of pauses and/or noise and/or extraneous speech).
  • a speech recording suitable for vocal method/s described in this document includes at least 10 seconds to 3 minutes, or about 1 minute, or lower or higher or intermediate ranges or durations of recorded speech of a subject.
  • a speech recording suitable for linguistic method/s described in this document includes at least 1 minute to 10 minutes, or about 5 minutes or lower or higher or intermediate ranges or durations of recorded speech of a subject.
  • the subject voice recording includes speech of the subject.
  • the subject voice recording 216 includes subject vocalizations e.g. one or more of emotional sounds (crying, sighing), vocalization accompanying non-verbal communication e.g. via sign language, singing, humming, coughing.
  • subject language data is obtained and/or received. For example, according to one or more feature of step 1018 FIG. 10.
  • subject language data includes a text script/s of the subject voice recording/s.
  • subject language data includes text written by the subject e.g. one or more of: written answers to questions, social media posts, and email correspondence.
  • subject language data is acquired using one or more feature of Dr. Perez's (2014) Narrative Method for Assessment of Psychosocial Processing (NMAPP) e.g. including one or more feature as described in reference [16].
  • a subject is asked to describe an experience.
  • the experience is a potentially emotive and/or psychologically charged event.
  • the subject is asked (e.g. by a human and/or electronic interviewer, verbally and/or by display of text): “Tell me the story of what you experienced.”
  • a potential advantage of the subject providing a narrative is that the process of narrating may be therapeutic and cathartic in and of itself. Indeed, it may assist the individual in organization of what may be very chaotic events and experiences, and in the process of the creation of coherence out of what may be highly incoherent facets of experience - particularly in the acute post-event context.
  • vocal feature/s are extracted from the voice recording.
  • one or more vocal features include frequency-domain representation of the voice signal (e.g. obtained using Fast Fourier Transform (FFT)).
  • the vocal features include (e.g. only) frequency-domain values of the subject speech recording where those frequencies not corresponding to human speech are absent.
  • one or more vocal features are extracted using openSMILE open-source software.
  • exemplary vocal features include one or more of:
  • spectral aspects of the vocal signal e.g. pitch e.g. pitch range
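  • As one hedged illustration of a frequency-domain vocal feature, the sketch below keeps only FFT magnitudes in a nominal human-speech band; the 80-8000 Hz limits are an assumption made here for illustration, and in practice a toolkit such as openSMILE, which the text mentions, supplies standard feature sets.

```python
import numpy as np

def speech_band_spectrum(signal: np.ndarray, sr: int,
                         low_hz: float = 80.0, high_hz: float = 8000.0):
    """Frequency-domain representation restricted to a nominal speech band."""
    spectrum = np.abs(np.fft.rfft(signal))                 # FFT magnitudes
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)       # bin frequencies in Hz
    band = (freqs >= low_hz) & (freqs <= high_hz)          # drop non-speech bins
    return freqs[band], spectrum[band]
```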
  • linguistic feature/s are identified in the language data.
  • identifying linguistic features uses one or more feature of Dr. Perez's (2014) Narrative Method for Assessment of Psychosocial Processing (NMAPP) e.g. including one or more feature as described in reference [16].
  • Natural Language Processing (NLP) technique/s are used to extract one or more linguistic features e.g. techniques including one or more feature as described in one or more of references [4, 6, 7, 10, 14, 15] and/or one or more feature of Linguistic Inquiry and Word Count (LIWC) or word2vec programs (e.g. as described in reference [18]).
  • linguistic features include prevalence of one or more feature of language used by the subject. Where, regarding exemplary linguistic features described hereinbelow, examples of each linguistic feature are found in table 1.
  • a linguistic feature includes a prevalence of one or more pronoun and/or pronoun type.
  • prevalence is, in some embodiments, the prevalence of the pronoun (and/or pronoun type) per minute and/or as a proportion of words used, and/or as a proportion of pronouns used.
  • pronouns are identified using a list of pronouns, data entries of the list, in some embodiments, having an associated category.
  • exemplary pronoun types include, for example, one or more of: personal (I, we, you, he, she, it, they), demonstrative (this, these, that, those), relative (who, which, that, as), indefinite (each, all, everyone, either, one, both, any, such, somebody), interrogative (who, which, what), reflexive (myself, herself), possessive (mine, yours, his, hers, theirs).
  • a linguistic feature includes a tense of language used, optionally with respect to a time of occurrence of the story being narrated. For example, prevalence (e.g. words per minute, and/or as a proportion of words used, and/or as a proportion of words having a tense), and/or the proportion of language used in a description which has matching and/or mismatching tense, e.g. prevalence of past tense words used in description of a past event, prevalence of present tense words used in description of a past event, prevalence of future tense words used in description of a past event.
  • a linguistic feature includes a prevalence of “emotion words”.
  • prevalence, in some embodiments, is emotion words per minute and/or the proportion of emotion words used and/or the proportion of words of one or more type used (e.g. proportion of adjectives and/or adverbs and/or verbs and/or nouns being emotion words).
  • identification of emotion words is via a list (e.g. a pre-defined list).
  • a linguistic feature includes a prevalence of extra-linguistic expressions of emotion. For example, non-speech vocalizations e.g. crying, sighing, tongue clicking. For example, non-vocal noise generation e.g. hitting the table, stamping, clapping.
  • For example, in some embodiments, prevalence includes a number of such expressions, and/or duration, and/or proportion of the narration including such expressions.
  • a linguistic feature includes a prevalence of repetition. For example, a prevalence of repeated words and/or phrases. For example, as a proportion of a time of the narrative and/or as a proportion of the words used.
  • a linguistic feature includes a prevalence of emphasis used.
  • Emphasis, in some embodiments, includes use of words from a list of emphasis words (e.g. prevalence as a proportion of the narrative time and/or words).
  • Emphasis, in some embodiments, includes non-verbal gesture/s (e.g. a prevalence thereof).
  • Emphasis, in some embodiments, includes a volume of the speech, e.g. as compared to other portion/s of the speech recording.
  • a linguistic feature includes a prevalence of passive and/or active language used. For example, as a proportion of a time of the narrative and/or as a proportion of the words and/or phrases used.
  • a linguistic feature includes one or more type of language, e.g. where a phrase of language used is categorized as having one or more type.
  • exemplary types including, for example, one or more of metaphor, poetic talk, euphemism, personalized / belonging talk, distancing / distanced talk, generalized talk, acceptability talk, meta-talk, talk outside narrative, markers specific to text, agency / locus of control, guilt talk, ambivalence, argumentation (with social discourse).
  • phrase/s are categorized using a machine learning model trained to identify types of language used, where in some embodiments, once identified, a linguistic feature includes prevalence (e.g. as a proportion of the text) of one or more type of language used.
  • in some embodiments, lists (e.g. pre-defined lists) are used to identify word types, the prevalence of each word type being a linguistic feature.
  • the lists include a list of pronouns, emotion words, and emphasis words.
  • one or more linguistic feature is determined for one or more specific portion of narrative. For example, an opening phrase and/or portion of a narrative. For example, a closing phrase and/or portion.
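  • A minimal sketch of the prevalence-style linguistic features described above (per minute and as a proportion of words used), assuming tiny stand-in word lists; the pre-defined lists the text describes would be far larger.

```python
EMOTION_WORDS = {"afraid", "happy", "angry", "sad"}          # illustrative stand-in
FIRST_PERSON_PRONOUNS = {"i", "me", "my", "mine", "myself"}  # illustrative stand-in

def prevalence_features(transcript: str, duration_min: float) -> dict:
    """Prevalence of word types per minute and as a proportion of words used."""
    words = transcript.lower().split()
    n_words = max(len(words), 1)
    features = {}
    for name, wordlist in (("emotion", EMOTION_WORDS),
                           ("first_person", FIRST_PERSON_PRONOUNS)):
        count = sum(w.strip(".,!?") in wordlist for w in words)
        features[name + "_per_min"] = count / duration_min
        features[name + "_proportion"] = count / n_words
    return features
```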
  • subject self-ratings are obtained and/or received.
  • the subject provides rating for pain, depression, and anxiety.
  • the rating is a 10-point rating scale based on the SUDS (subjective units of distress) measure e.g. as described in reference [27].
  • the subject provides subjective self-ratings more than one time e.g. before and after vocal recordings of the subject are acquired.
  • self-rating repetition is used to provide an average self-rating. In some embodiments, self-rating repetition is used to provide a measure as to how traumatic the experience of providing a vocal recording is e.g. according to a theory that a recently distressed subject will rate their level of distress in a self-rating as higher.
  • self-rating repetition is used to determine if self-rating provides a sufficiently consistent diagnosis independent of the emotional state of the subject.
  • multi-morbidity diagnosis includes a binary indication as to whether the subject has one or more medical condition. Additionally or alternatively, for one or more medical condition, in some embodiments, multi-morbidity diagnosis includes a level of severity of the condition.
  • a plurality of machine learning model/s each provide a diagnosis (e.g. a multi-morbidity diagnosis) where, in some embodiments, the plurality of diagnoses are combined using another machine learning model and/or a rule, for example, as described regarding FIG. 13.
  • a diagnosis is produced using the model where, in some embodiments, the model produces a number (e.g. between 0-100) indicating how likely it is that the subject suffers from a particular condition and, according to a rule (e.g. pre-trained cutoff), the subject is diagnosed as having the condition, e.g. as belonging to a sick (e.g. depressed) or healthy (e.g. non-depressed) group.
  • the cut-off is a different number.
  • the number provided by the ML model is within a different range (e.g. 0-50).
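  • The score-to-diagnosis cut-off just described might look like the following sketch; the condition names and cut-off values are illustrative only, not values from the disclosure.

```python
def apply_cutoffs(scores: dict, cutoffs: dict, default_cutoff: float = 50.0) -> dict:
    """scores: condition -> model output in 0-100; returns binary diagnoses."""
    return {cond: score >= cutoffs.get(cond, default_cutoff)
            for cond, score in scores.items()}

# Example (illustrative values):
# apply_cutoffs({"depression": 72, "anxiety": 31},
#               {"depression": 60, "anxiety": 55})
# -> {"depression": True, "anxiety": False}
```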
  • FIG. 4 illustrates data flow with respect to system elements 403, 405, 406, for training of a ML model 406, according to some embodiments of the disclosure.
  • like elements of FIG. 4 correspond to like elements of FIG. 2 (e.g. element 416 FIG. 4 corresponding to element 216 FIG. 2).
  • data is for a plurality of subjects (e.g. as opposed to FIG. 2 where, in some embodiments, data elements include data for a single subject).
  • a machine learning model 406 is fed vocal feature/s 422, linguistic feature/s 424, associated subjects’ labels 426, and optional subjects’ self-rating/s 420 and is then trained using the received data.
  • element 406 is a plurality of ML models.
  • For example, a vocal feature ML model which is fed vocal feature/s 422 and associated subjects’ labels 426, and a linguistic feature ML model which is fed linguistic features 424 and associated subjects’ labels 426, where the vocal feature ML model and the linguistic feature ML model are each trained to produce a diagnosis.
  • one or both of the vocal feature ML model and the linguistic feature ML model are fed and trained using self-rating/s 420.
  • a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
  • PMC 102 hosts data 416, 418, 420, 422, 424, 428 and/or system elements 403, 405, 406.
  • pre-processing module 104 hosts one or both pre-processing modules 403, 405 and/or ML model 406 corresponds to ML model 106.
  • FIG. 5 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
  • a data set of a plurality of labeled subject voice recordings is obtained and/or received, where each subject recording includes one or more feature of subject voice recording/s as described regarding step 300 FIG. 3.
  • the data set includes both subjects with clinical states (e.g. depression, anxiety, chronic pain) and normative persons without diagnosed mental health condition/s.
  • a label indicates presence of one or more condition for each respective subject.
  • the label includes binary diagnosis (subject has the condition, subject does not have the condition) for one or more condition.
  • the label indicates a level of severity of one or more condition.
  • medical conditions include one or more psychological conditions. For example, one or more of; depression, anxiety, chronic pain syndrome, post-traumatic stress disorder (PTSD), sleep disturbances, eating disorder/s, fibromyalgia, attention deficit and hyperactivity disorder (ADHD), attention deficit disorder (ADD).
  • a label (or diagnosis) includes a subject’s relationship to an experience and/or event. For example, a quantification as to how traumatic the event was.
  • a label (e.g. a multiple-condition label) is determined for each subject.
  • labeling of subjects includes diagnosing the subject using one or more standardized assessment questionnaire (e.g. including one or more standardized psychological questionnaire).
  • Standard demographic and/or health behavior questions e.g. smoking, self-rated health, use of medications
  • Generalized anxiety disorder (GAD)
  • Pain diagnosis labeling uses, for example, one or more of: the Short Form McGill Pain Questionnaire as described in reference [24]; the PHQ-15 (brief medical questionnaire) as described in reference [25]; the PCS (pain catastrophizing scale) as described in reference [26].
  • the subject provides answers to questionnaire questions (written and/or inputted into an electronic device). In some embodiments, questionnaire questions are administered to the subject (e.g. verbally) by a healthcare professional (e.g. psychiatrist), who then determines a diagnosis label for the subject, e.g. the label including a diagnosis for one or more medical condition.
  • subject language data includes one or more feature of subject language data as described regarding step 302 FIG. 3.
  • vocal features are extracted from the received recordings, where step 504, in some embodiments, includes one or more feature of step 304 FIG. 3.
  • at step 506, in some embodiments, linguistic features are identified in the language data, where step 506, in some embodiments, includes one or more feature of step 306 FIG. 3.
  • the ML model is trained to provide diagnoses for more than one condition, using the extracted vocal feature/s, linguistic feature/s, and label.
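  • One plausible way to realize “diagnoses for more than one condition” from combined vocal and linguistic features is a multi-output classifier, sketched below with scikit-learn; the feature concatenation and the random-forest choice are assumptions of the sketch, not the disclosure’s prescription.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

def train_multicondition(vocal_feats, linguistic_feats, labels):
    """vocal_feats: (n, d_v); linguistic_feats: (n, d_l);
    labels: (n, n_conditions) binary multi-condition diagnosis labels."""
    X = np.hstack([vocal_feats, linguistic_feats])  # one feature row per recording
    model = MultiOutputClassifier(RandomForestClassifier(n_estimators=200))
    return model.fit(X, labels)                     # one classifier per condition
```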
  • a dataset of labeled subject data (e.g. including vocal feature/s and/or linguistic feature/s) is split into training and test sets.
  • 50-95% of the data is used as training data, the remaining 5-50% being used as test data.
  • 80% of the data is used as training data and 20% as test data.
  • one or more ML classifier is trained on the training data where, in some embodiments, accuracy of each classifier is evaluated using the test data.
  • the ML classifier includes one or more of Neural Networks, Random Forest, logistic regression, k-nearest neighbors, and boosting methods.
  • the ML classifier is evaluated using one or more of F1 score, AUC, accuracy, precision, and recall, all computed using k-fold cross-validation.
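  • A sketch of the split-train-evaluate recipe described in the last few bullets, using scikit-learn; the 80/20 split and the classifier families shown follow the text, while the exact metric wiring is an assumption of this illustration.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def evaluate_classifiers(X, y, k: int = 5):
    """80/20 split, then F1 and AUC via k-fold cross-validation on the training set."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    candidates = {
        "random_forest": RandomForestClassifier(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(),
        "boosting": GradientBoostingClassifier(),
    }
    results = {}
    for name, clf in candidates.items():
        results[name] = {
            "f1": cross_val_score(clf, X_train, y_train, cv=k, scoring="f1").mean(),
            "auc": cross_val_score(clf, X_train, y_train, cv=k, scoring="roc_auc").mean(),
        }
    return results, (X_test, y_test)  # held-out data for a final accuracy check
```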
  • one or more measurement of subject physical feature is obtained and/or received.
  • physical feature/s and/or obtaining thereof includes one or more feature as described regarding sensor/s 109 FIG. 1.
  • the measured physical feature/s are fed to the machine learning model e.g. in addition to vocal feature/s.
  • a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
  • a first ML model is trained to provide diagnoses for more than one condition, using the extracted vocal feature/s and label.
  • a second ML model is trained to provide diagnoses for more than one condition, using the extracted linguistic feature/s and label.
  • a third ML model is trained to provide diagnoses for more than one condition, using diagnoses provided by the first and second ML models, and the label.
  • FIG. 6 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • step 600 of FIG. 6 includes one or more feature of step 300 of FIG. 3.
  • each speech recording is processed into a plurality of snippets.
  • each snippet includes a temporal portion of the associated vocal recording.
  • a speech recording is divided into a plurality of portions, the snippets not overlapping temporally.
  • snippets overlap each other.
  • a continuous recording is cut into snippets of 1 second to 1 minute, or 1-30 seconds, or 5-20 seconds, or about 10 seconds, or lower or higher or intermediate durations or ranges.
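  • A sketch of snippet cutting with optional temporal overlap, following the durations above; a hop shorter than the snippet length yields the overlapping case, and the ~10 s default is one of the example durations in the text.

```python
import numpy as np

def cut_snippets(signal: np.ndarray, sr: int,
                 snippet_s: float = 10.0, hop_s: float = 10.0):
    """Cut audio into snippets; hop_s < snippet_s gives overlapping snippets."""
    win, hop = int(snippet_s * sr), int(hop_s * sr)
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]

# Non-overlapping 10 s snippets: cut_snippets(sig, sr, 10.0, 10.0)
# 50% overlap:                   cut_snippets(sig, sr, 10.0, 5.0)
```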
  • steps 604 and 606 each include one or more feature of steps 304 and 308 FIG. 3, respectively, and/or steps 504 and 508 FIG. 5, respectively, where the steps are carried out, in some embodiments, for each snippet to provide, at step 608, a diagnosis for each snippet, also herein termed an “intermediate diagnosis” (where, in some embodiments, the diagnosis is a multi-morbidity diagnosis e.g. as described regarding step 312 FIG. 3 and/or multi-morbidity diagnosis 228 FIG. 2).
  • both vocal features and linguistic features are fed to a ML model.
  • only vocal features are fed to the ML model or only linguistic features are fed to the ML model.
  • each of vocal features and linguistic features are fed to a separate model (e.g. as described regarding FIG. 13).
  • the plurality of diagnoses are fed to a rule and/or a second ML model to provide, at 612, a single subject diagnosis, which, in some embodiments, is a multi-morbidity diagnosis.
  • a rule includes a threshold number and/or proportion where, if this number and/or proportion of snippet diagnoses indicates the subject has a particular condition, the rule outputs a positive diagnosis for the condition.
  • different diagnostic conditions have different rules.
  • a rule includes, for example, averaging of the plurality of snippet diagnoses.
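By way of non-limiting illustration, a minimal Python sketch of two candidate rules described in the bullets above (a per-condition proportion threshold, and averaging) follows; the threshold values are illustrative assumptions only.

```python
import numpy as np

def threshold_rule(snippet_diagnoses: np.ndarray, proportion: float = 0.5) -> bool:
    """Positive if at least `proportion` of snippet diagnoses are positive."""
    return snippet_diagnoses.mean() >= proportion

def averaging_rule(snippet_scores: np.ndarray, cutoff: float = 0.5) -> bool:
    """Average continuous per-snippet scores, then apply a cutoff."""
    return snippet_scores.mean() >= cutoff

# Different diagnostic conditions may use different thresholds:
rules = {"depression": 0.4, "anxiety": 0.6}   # illustrative values only
snippet_votes = np.array([1, 0, 1, 1, 0, 1])  # per-snippet binary diagnoses
diagnosis = {c: threshold_rule(snippet_votes, p) for c, p in rules.items()}
```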
  • FIG. 7 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
  • a plurality of labeled subject speech recordings are received. For example, according to one or more feature of step 500, FIG. 5.
  • the speech recordings are each processed into a plurality of snippets e.g. according to one or more feature of step 606.
  • vocal features and/or linguistic features are extracted from the snippets.
  • extraction of vocal feature/s from the snippets includes one or more feature of vocal feature extraction as described regarding step 604 FIG. 6.
  • a ML model for subject diagnosis is trained using the vocal features (and/or linguistic features) extracted and labels for a plurality of snippets. For example, where a single labeled recording provides a plurality of labeled snippets, increasing data for training of the ML model.
  • FIG. 7 is performed twice, a first time to train a vocal feature ML model using vocal features extracted from snippets and a second time to train a linguistic feature ML model using linguistic features extracted from snippets.
  • a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
  • FIG. 8 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
  • a voice recording 802 (including one or more feature of voice recording 216 FIG. 2) of a subject is processed by a voice recording processing module 804 to provide a plurality of voice recording snippets 806. Where, in some embodiments, voice recording processing module 804 performs one or more feature of step 604, FIG. 6.
  • Plurality of snippets 806, in some embodiments, are processed (e.g. according to one or more feature of step 605 FIG. 6) by a snippet processing module to produce a plurality of sets of voice feature/s 808, e.g. a set of voice feature/s corresponding to each snippet.
  • a ML model 810 produces a plurality of diagnoses 812, e.g. a diagnosis (in some embodiments a multi-condition diagnosis) for each snippet.
  • a rule and/or second ML model 814 is fed the plurality of intermediate diagnoses 812 to provide a single subject diagnosis 816 for the individual subject who produced vocal recording 802.
  • a structure similar to that of FIG. 8 is used to provide a diagnosis of a subject, but where the features extracted from the snippets are linguistic features.
  • a diagnosis is determined by receiving subject diagnosis 816 and a diagnosis provided using linguistic feature extraction from snippets and an associated machine learning model, the two diagnoses then being used to provide a single diagnosis of the subject (e.g. using another ML model and/or a rule).
  • FIG. 9 is a method of ML model training, according to some embodiments of the disclosure.
  • labeled sets of snippet diagnoses are received.
  • a set of snippet diagnoses includes a plurality of diagnoses, one for each snippet, where the set has a single label (e.g. the snippets being portions of a recording of a patient having that label).
  • a ML model is trained using the plurality of labeled sets of snippet diagnoses.
  • FIG. 10 is a method of training a ML model, according to some embodiments of the disclosure.
  • a plurality of subject recordings are received. For example, according to one or more feature of step 300 FIG. 3.
  • labels are received, e.g. a label for each recording.
  • labels are determined, for example, according to steps 1002, 1006, 1008.
  • at step 1002, subject evaluation questionnaires are received, for example, including one or more feature of step 300 FIG. 3.
  • subjects’ diagnosis labels are determined using the subject evaluation questionnaires (e.g. using the questionnaires only).
  • at step 1006, subjects’ subjective rating/s are received, e.g. including one or more feature of step 310 FIG. 3.
  • subjects’ diagnosis labels are determined using both the subject evaluation questionnaires received at step 1002 and the subjects’ subjective evaluation received at step 1006.
  • the vocal recordings are pre-processed.
  • one or more feature of pre-processing is performed by a ML model.
  • one or more portion of pre-processing is performed by publicly (and/or commercially) available software e.g. openSMILE software.
  • pre-processing includes identifying and/or removing silent portions of the speech recordings. Where, in some embodiments, pre-processing includes removing noise from the vocal recordings, noise including, for example, extraneous speech (e.g. speech of an interviewer) and/or background noise.
  • pre-processing includes checking and/or verifying that the speech recording includes speech of one subject.
  • a ML model is used to one or more of check, verify, and remove speech that is not associated with the subject (e.g. the subject being the individual speaking for a majority of the time of the recording).
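By way of non-limiting illustration, a minimal Python sketch of one possible form of the silence-removal pre-processing described above follows, using simple frame-energy thresholding; the disclosure also mentions openSMILE and ML-based pre-processing, so this thresholding approach and its parameters are assumptions for illustration only.

```python
import numpy as np

def remove_silence(samples: np.ndarray, sample_rate: int = 16000,
                   frame_ms: int = 30, energy_threshold: float = 1e-4) -> np.ndarray:
    frame_len = int(sample_rate * frame_ms / 1000)
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if np.mean(frame ** 2) > energy_threshold:  # keep frames with speech energy
            voiced.append(frame)
    return np.concatenate(voiced) if voiced else np.array([])
```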
  • each speech recording is split into snippets, for example, according to one or more feature of step 702 FIG. 7.
  • subject self-rating/s are received, for example, including one or more feature of subject self-rating/s as described regarding step 310, FIG. 3.
  • vocal feature/s are extracted from each snippet. For example, according to one or more feature of step 704 FIG. 7.
  • subject language data is received.
  • subject language data includes a text script/s of the subject voice recording/s extracted at step 1018a.
  • the extracting includes feeding the subject voice recording/s to a ML model or other software.
  • subject language data includes subject textual data, obtained and/or received at step 1018b.
  • subject textual data includes, for example, text written by the subject, e.g. one or more of: written answers to questions, social media posts, and email correspondence.
  • linguistic feature/s in language data and/or text from speech recordings are identified. For example, according to one or more feature of step 506 FIG. 5.
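By way of non-limiting illustration, a minimal Python sketch of identifying simple linguistic features in a transcript (e.g. prevalence per minute and proportion of word types) follows; the word lists here are illustrative assumptions, not the disclosure's lists.

```python
import re

EMOTION_WORDS = {"happy", "sad", "afraid", "angry"}    # illustrative list only
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}  # illustrative list only

def linguistic_features(transcript: str, duration_minutes: float) -> dict:
    words = re.findall(r"[a-z']+", transcript.lower())
    n = max(len(words), 1)
    emotion = sum(w in EMOTION_WORDS for w in words)
    first_person = sum(w in FIRST_PERSON for w in words)
    return {
        "emotion_per_minute": emotion / max(duration_minutes, 1e-6),
        "emotion_proportion": emotion / n,          # proportion of all words
        "first_person_proportion": first_person / n,
    }
```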
  • a ML model (or more than one ML model e.g. as described regarding ML model/s 206 and/or step 312 FIG. 3) is trained using labeled extracted vocal features and optionally, one or both of linguistic features (e.g. extracted at step 1020) and subject self-evaluation (e.g. received at step 1006).
  • a sub-set of vocal feature/s and/or linguistic feature/s for the ML model are selected, based on the training.
  • the ML model is provided e.g. for use in a system e.g. system 100 FIG. 1
  • FIG. 11 is a flow chart of methods of ML training, according to some embodiments of the disclosure.
  • FIG. 11 in some embodiments, illustrates exemplary embodiments of when, in a process of ML training, linguistic feature/s are identified from subject speech recordings.
  • Steps 1104, 1106, and 1108, in some embodiments, illustrate an embodiment where vocal features are extracted 1106 from snippets 1104 to train a ML model 1108.
  • Steps 1110, 1112, 1114, 1116, in some embodiments, illustrate an embodiment where linguistic features are extracted from the entire recording 1110, and where vocal features are extracted from snippets 1114 to train a ML model 1116.
  • Steps 1118, 1120, 1122, 1124, in some embodiments, illustrate an embodiment where linguistic features 1120 and vocal features 1122 are both extracted from snippets to train a ML model 1124.
  • Arrow 1126, in some embodiments, illustrates a further embodiment where linguistic features are identified in the entire recordings 1110 and in the snippets 1120 where, in some embodiments, both types of linguistic feature are used to train a ML model 1124.
  • FIG. 12 is a method of comparing speech for an individual, according to some embodiments of the disclosure.
  • a first speech recording of a subject is obtained.
  • the first speech recording includes non-emotive content.
  • a vocal recording of a subject verbally performing a cognitive task potentially increases reliability and/or accuracy of recording and/or detecting baseline linguistic feature/s and/or vocal feature/s for the specific subject.
  • the subject is asked to describe an event which isn’t potentially emotive (or has low likelihood of being so), e.g. “Tell me the story of your experience of the weather today.”
  • vocal and/or linguistic features of the first recording are identified to provide a first feature set (e.g. a non-emotive feature set), for example, including one or more feature of step 304 FIG. 3 and/or step 306 FIG. 3.
  • a second speech recording of a subject is obtained.
  • the second speech recording includes potentially emotive content (also herein termed “emotional subject matter”), e.g. used to provide an emotive feature set.
  • the subject is asked to describe a potentially emotive event, e.g. “Tell me the story of what you experienced.”
  • the subject is asked to verbally recount a potentially traumatic event and/or describe a potentially traumatic subject.
  • For example, when screening in order to identify those subjects likely to suffer long-lasting mental distress associated with a situation (e.g. health situation, e.g. injury, e.g. childbirth), in some embodiments, the subject is asked to describe the health situation and/or event/s which led to the health situation.
  • vocal and/or linguistic features of the second recording are identified to provide a second feature set (e.g. an emotive feature set), for example, including one or more feature of step 304 FIG. 3 and/or step 306 FIG. 3.
  • vocal feature/s and/or linguistic features of the emotive feature set are normalized for the individual subject, using the first feature set to normalize the second feature set, e.g. the non-emotive feature set to normalize the emotive feature set.
  • For example, vocal pitch and/or volume differ between individuals and are also affected by emotional state; non-emotive parameters (e.g. for pitch and/or volume) provide a per-subject baseline, while emotive parameters are used, e.g., to determine emotional and/or mental state of the subject with respect to the emotive subject matter.
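By way of non-limiting illustration, a minimal Python sketch of normalizing a subject's emotive feature set by that subject's own non-emotive baseline follows; the relative-change formula and the example feature values are assumptions for illustration only.

```python
import numpy as np

def normalize_by_baseline(emotive: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    # One simple choice: relative change versus the non-emotive recording,
    # so between-subject differences in pitch/volume are factored out.
    return (emotive - baseline) / (np.abs(baseline) + 1e-9)

baseline_features = np.array([180.0, 62.0])   # e.g. mean pitch (Hz), volume (dB)
emotive_features = np.array([220.0, 70.0])
print(normalize_by_baseline(emotive_features, baseline_features))
```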
  • FIG. 13 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
  • subject voice recording/s are obtained and/or received e.g. according to one or more feature of step 500 FIG. 5.
  • pre-determined vocal feature/s are extracted from the voice recording e.g. according to one or more feature of step 504 FIG. 5.
  • a vocal feature ML model is trained to provide a first diagnosis for more than one condition, using the extracted vocal feature/s and label.
  • training of the vocal feature ML model includes one or more feature of ML training as described regarding step 508 FIG. 5.
  • labeled subject language data is received e.g. according to one or more feature of step 502 FIG. 5.
  • linguistic features are identified in the language data e.g. according to one or more feature of step 506 FIG. 5.
  • a linguistic feature ML model is trained to provide a second diagnosis for more than one condition, using the identified linguistic feature/s and label.
  • training of the linguistic feature ML model includes one or more feature of ML training as described regarding step 508 FIG. 5.
  • the first and second diagnoses are combined, e.g. by another ML model, to provide a single diagnosis of the subject.
  • combining of the two diagnoses includes using IBM® SPSS® Statistics software (version 25) to analyze a distribution of the diagnoses from the two analyses (i.e., present/not present and level of pain, depression, or anxiety).
  • the diagnoses are entered for analysis by conventional bivariate and/or multivariate analyses.
  • independent variables include one or more of age, gender, vocal feature/s, and narrative feature/s.
  • multivariate analysis/es includes discriminant function analysis, e.g. to form a parsimonious set of variables for detection of a plurality of outcome measures, e.g. depression, anxiety, or pain.
  • log-linear regression is used to reduce a number of independent variables e.g. to amplify statistical power.
  • diagnoses from both methods are analyzed using standard epidemiological methods for testing the efficacy of a screening test. These analyses include, for example, calculation of sensitivity and/or specificity, and/or positive and/or negative predictive values of the test e.g. according to one or more feature as described in reference [28].
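By way of non-limiting illustration, a minimal Python sketch of the standard screening-test statistics mentioned above (sensitivity, specificity, and positive/negative predictive values), computed from a 2x2 confusion matrix, follows; the counts shown are illustrative only.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

print(screening_metrics(tp=30, fp=10, fn=5, tn=55))   # illustrative counts
```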
  • analysis of the two diagnoses is carried out using a ML pipeline, where the ML algorithm itself selects which features are important or not as part of the training procedure.
  • a machine learning model was trained using the publicly available Distress Analysis Interview Corpus (DAIC) data set.
  • DAIC: Distress Analysis Interview Corpus.
  • the extracted features included frequency-domain representation of the voice signal (e.g. obtained using Fast Fourier Transform (FFT)).
  • FFT: Fast Fourier Transform.
  • openSMILE software was used to process speech recordings providing a frequency domain signal of the speech, where frequency bands not corresponding to human speech were removed.
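By way of non-limiting illustration, a minimal Python sketch of obtaining a frequency-domain representation with an FFT and discarding bands outside human speech follows; the 80-8000 Hz band is an illustrative assumption, and per the bullet above the disclosure used openSMILE for this step.

```python
import numpy as np

def speech_band_spectrum(samples: np.ndarray, sample_rate: int = 16000,
                         low_hz: float = 80.0, high_hz: float = 8000.0) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(samples))              # magnitude spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Keep only frequency bins corresponding to human speech:
    return spectrum[(freqs >= low_hz) & (freqs <= high_hz)]
```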
  • Prediction using the model included the model receiving a voice sample and producing a number between 0-100 and, according to a pre-trained cut-off, classifying the subject as sick (e.g. depressed) or healthy (e.g. non-depressed).
  • Training was used to select a subset of vocal features to improve diagnosis.
  • SUMC: Soroka University Medical Center.
  • classifiers employed included Random Forest and Neural Networks; better results were obtained with Neural Networks.
  • The method produced lower sensitivity (correctly identifying those with either depression or anxiety) compared with specificity (correctly identifying those without). Without wanting to be bound by theory, it is theorized that the lower sensitivity is associated with the small sample size and proportion of identified cases, which is further reduced in the train-test division of data.
  • the second-person feminine singular pronoun was used when the narrator, regardless of gender, was describing difficult, stressful, and/or upsetting sensory experiences and aftereffects.
  • Results for depression and anxiety are presented below in Tables 2 and 3, where an increase in accuracy of the ML model with data amount (number of patients) is also illustrated.
  • Table 4 illustrates generalizability (the ability of the ML model to predict condition/s for a group of people where the ML model has been trained using a different group of people) of ML methods of the present disclosure, in some embodiments.
  • Table 4 illustrates sensitivity, specificity, and accuracy of diagnosis for a range of different populations being used as training data and test data.
  • Tables 5-10 illustrate application of different rules for providing a single diagnosis for a subject, based on a set of diagnoses, each associated with a different snippet of a vocal recording of the subject. For example, referring to the figures where the rule corresponds to step 610 FIG. 6 and/or to step 706 FIG. 7 and/or step 814 FIG. 8.
  • a rule providing the largest accuracy is selected.
  • a rule providing the highest sensitivity and/or specificity is selected e.g. depending on the application.
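By way of non-limiting illustration, a minimal Python sketch of selecting, from candidate proportion thresholds, the rule maximizing accuracy on labeled sets of snippet diagnoses follows; per the bullets above, sensitivity or specificity may be maximized instead depending on the application, and the candidate grid is an illustrative assumption.

```python
import numpy as np

def best_threshold(snippet_sets: list[np.ndarray], labels: np.ndarray,
                   candidates=np.linspace(0.1, 0.9, 9)) -> float:
    def accuracy(p):
        # Apply the proportion-threshold rule to each subject's snippet set.
        preds = np.array([s.mean() >= p for s in snippet_sets])
        return (preds == labels.astype(bool)).mean()
    return max(candidates, key=accuracy)
```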
  • the coded transcripts were analyzed to provide higher-order Wegner pattern recognition and interpretations.
  • emotionality in content (e.g., use of words such as “happy” and “sad”)
  • emotionality in form e.g., use of emphasis, repetition, flattening, and dialogue
  • extra-linguistic cues e.g., crying, sighing, pausing
  • Range format should not be construed as an inflexible limitation on the scope of the present disclosure. Accordingly, descriptions including ranges should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within the stated range and/or subrange, for example, 1, 2, 3, 4, 5, and 6. Whenever a numerical range is indicated within this document, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • certain features of the present disclosure which are (e.g., for clarity) described in the context of separate embodiments, may also be provided in combination in a single embodiment.
  • various features of the present disclosure which are (e.g., for brevity) described in a context of a single embodiment, may also be provided separately or in any suitable sub-combination or may be suitable for use with any other described embodiment.
  • Features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Pathology (AREA)
  • Mathematical Physics (AREA)
  • Veterinary Medicine (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Hospice & Palliative Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Child & Adolescent Psychology (AREA)
  • Multimedia (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Fuzzy Systems (AREA)
  • Physiology (AREA)

Abstract

A method, implemented by computer circuitry, of diagnosis of a plurality of medical conditions for a subject including: obtaining a vocal recording of the subject having a vocal recording time duration; processing the vocal recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than the vocal recording time duration; extracting, for each of the plurality of vocal recording portions, vocal features of the subject; feeding, to a trained machine learning model, the vocal features for each of the plurality of vocal recording portions, to determine, for each of the plurality of vocal recording portions, an intermediate diagnosis for each of the plurality of medical conditions, thereby obtaining a plurality of intermediate diagnoses; and determining, using the plurality of intermediate diagnoses, for the subject, a diagnosis for the plurality of medical conditions.

Description

SUBJECT DIAGNOSIS USING SPEECH ANALYSIS
TECHNOLOGICAL FIELD
The present disclosure, in some embodiments thereof, relates to diagnosis of a subject and, more particularly, but not exclusively, to diagnosis of mental state of the subject using speech recording of the subject.
BACKGROUND
Background art, each of which is incorporated in its entirety by reference, includes the list below. In the following document these references are referred to by number, e.g. using the relevant reference number/s in square brackets: [number].
1. Ramos-Lima, L.F., et al., The use of machine learning techniques in trauma-related disorders: a systematic review. Journal of Psychiatric Research, 2020. 121: p. 159-172.
2. Marmar, C.R., et al., Speech-based markers for posttraumatic stress disorder in US veterans. Depression and Anxiety, 2019. 36(7): p. 607-616.
3. Vergyri, D., et al. Speech-based assessment of PTSD in a military population using diverse feature classes, in Sixteenth annual conference of the international speech communication association. 2015.
4. Belouali, A., et al., Acoustic and language analysis of speech for suicide ideation among US veterans. medRxiv, 2020.
5. Corcoran, C.M. and G.A. Cecchi, Using Language Processing and Speech Analysis for the Identification of Psychosis and Other Disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2020. 5(8): p. 770-779.
6. Aldeneh, Z., et al., Identifying Mood Episodes Using Dialogue Features from Clinical Interviews. arXiv preprint arXiv:1910.05115, 2019.
7. Al Hanai, T., M.M. Ghassemi, and J.R. Glass. Detecting Depression with Audio/Text Sequence Modeling of Interviews, in Interspeech. 2018.
8. Lam, G., H. Dongyan, and W. Lin. Context-aware deep learning for multimodal depression detection, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019: IEEE.
9. Oshrat, Y., et al. Speech prosody as a biosignal for physical pain detection, in Proc. 8th Speech Prosody. 2016.
10. Dham, S., A. Sharma, and A. Dhall, Depression scale recognition from audio, visual and text analysis. arXiv preprint arXiv:1709.05865, 2017.
11. Scherer, S., et al. Investigating voice quality as a speaker-independent indicator of depression and PTSD, in Interspeech. 2013.
12. Wortwein, T. and S. Scherer. What really matters-An information gain analysis of questions and reactions in automated PTSD screenings, in 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). 2017: IEEE.
13. Priya, A., S. Garg, and N.P. Tigga, Predicting Anxiety, Depression and Stress in Modern Life using Machine Learning Algorithms. Procedia Computer Science, 2020. 167: p. 1258-1267.
14. Corcoran, C.M. and G. Cecchi, Using language processing and speech analysis for the identification of psychosis and other disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2020.
15. Dinkel, H., M. Wu, and K. Yu, Text-based depression detection: What triggers an alert. arXiv preprint arXiv:1904.05154, 2019.
16. Perez, A.S., "You are Not Normal If You Won't be Scared": A Psychosociolinguistic Study of Coping and Narrative Processing in the Discourse of Israeli Bus Drivers who Experienced Terror Attacks. 2014, Ben-Gurion University of the Negev.
17. Perez, A.S., Y. Tobin, and S. Sagy, "There is no fear in my lexicon" vs. "You are not normal if you won't be scared." Beyond Narrative Coherence. Philadelphia: John Benjamins, 2010: p. 121-146.
18. Mikolov, T., et al., Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs.CL], 2013.
19. Radloff, L.S., The CES-D scale: A self-report depression scale for research in the general population. Applied psychological measurement, 1977. 1(3): p. 385-401.
20. Brewin, C.R., et al., Brief screening instrument for post-traumatic stress disorder. The British Journal of Psychiatry, 2002. 181(2): p. 158-162.
21. Spitzer, R.L., et al., A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine, 2006. 166(10): p. 1092-1097.
22. Buysse, D.J., et al., The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Research, 1989. 28(2): p. 193-213.
23. Cohen, S., T. Kamarck, and R. Mermelstein, A global measure of perceived stress. Journal of health and social behavior, 1983: p. 385-396.
24. Melzack, R., The short-form McGill pain questionnaire. Pain, 1987. 30(2): p. 191-197.
25. Kroenke, K., R.L. Spitzer, and J.B. Williams, The PHQ-15: validity of a new measure for evaluating the severity of somatic symptoms. Psychosomatic Medicine, 2002. 64(2): p. 258-266.
26. Robbins, J.M. and L.J. Kirmayer, Attributions of common somatic symptoms. Psychological Medicine, 1991. 21: p. 1029-1045.
27. Tanner, B.A., Validity of global physical and emotional SUDS. Applied Psychophysiology and Biofeedback, 2012. 37(1): p. 31-34.
28. Cwikel, J. and K. Ritchie, The short GDS: Evaluation in a heterogeneous, multilingual population. Clinical Gerontologist, 1989. 8(2): p. 63-83.
29. Spector-Mersel, G. (2011). Mechanisms of selection in claiming narrative identities: A model for interpreting narratives. Qualitative Inquiry, 17(2), 172-185.
30. Saxbe, D., Horton, K. T., & Tsai, A. B. (2018). The Birth Experiences Questionnaire: A brief measure assessing psychosocial dimensions of childbirth. Journal of Family Psychology, 32(2), 262-268.
Acknowledgement of the above references herein is not to be inferred as meaning that these are in any way relevant to the patentability of the presently disclosed subject matter.
Machine Learning techniques have apparently been used to predict mental states using various forms of data such as biological tests, questionnaires, video, and vocal recordings [1].
Many voice analysis studies focus on one clinical state, such as Post Traumatic Stress Disorder (PTSD) [2, 3], suicide risk [4], psychosis [5], bipolar disorder [6], or depression [7, 8].
Study [9] has, apparently, identified the presence of pain but not, apparently, pain level and/or diagnostic characteristics.
Research [3] has used prosodic features including speech speed, pitch, intensity, pause lengths, and spectral features to detect, for example, either depression or PTSD.
Some studies (e.g., [8, 10]) investigating vocal features of depression have used the same dataset (the Distress Analysis Interview Corpus (DAIC)), which garnered both vocal and visual features from automated interviews, transcribed text, and video recordings.
Studies by Scherer and colleagues [11, 12] used a virtual machine interviewer to administer standard clinical measures of PTSD and depression and, by analyzing the audio files of how people answered the questions, apparently detected two simultaneous clinical states.
Another study [13] analyzed answers to questionnaires to distinguish between stress, anxiety, and depression.
GENERAL DESCRIPTION
Following is a non-exclusive list of some exemplary embodiments of the disclosure. The present disclosure also includes embodiments which include fewer than all the features in an example and embodiments using features from multiple examples, even if not listed below.
Example 1. A method, implemented by computer circuitry, of diagnosis of a plurality of medical conditions for a subject comprising: obtaining a vocal recording of said subject having a vocal recording time duration; processing said vocal recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than said vocal recording time duration; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; feeding, to a trained machine learning model, said vocal features for each of said plurality of vocal recording portions, to determine, for each of said plurality of vocal recording portions, an intermediate diagnosis for each of said plurality of medical conditions, thereby obtaining a plurality of intermediate diagnoses for said plurality of vocal recording portions; and determining, using said plurality of intermediate diagnoses, for said subject, a diagnosis for the plurality of medical conditions.
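By way of non-limiting illustration, a minimal Python sketch of the overall flow of Example 1 follows; the `cut`, `extract`, and `combine` helpers and the per-condition `model.predict` interface are hypothetical stand-ins for the steps named in the example, not the disclosure's implementation.

```python
import numpy as np

def diagnose(recording: np.ndarray, model, conditions: list[str],
             cut, extract, combine) -> dict:
    # `model.predict` is assumed to return one intermediate diagnosis per
    # condition for a feature vector, e.g. {"depression": 1, "anxiety": 0}.
    portions = cut(recording)                  # shorter vocal recording portions
    features = [extract(p) for p in portions]  # vocal features per portion
    intermediate = [model.predict(f) for f in features]  # per-portion diagnoses
    # Collapse the intermediate diagnoses into one diagnosis per condition:
    return {c: combine([d[c] for d in intermediate]) for c in conditions}
```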
Example 2. The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to said trained machine learning model for determining of said plurality of intermediate diagnoses.
Example 3. The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to a second trained machine learning model to determine a second intermediate diagnosis; wherein said determining comprises using said second intermediate diagnosis to provide said diagnosis for said plurality of medical conditions.
Example 4. The method according to any one of claims 2-3, wherein said obtaining linguistic features includes: obtaining a textual script of said vocal recording of said subject; and extracting said linguistic features from said textual script.
Example 5. The method according to claim 4, wherein said extracting said linguistic features includes extracting a plurality of linguistic feature sets, one set for each vocal recording time portion.
Example 6. The method according to any one of claims 1-5, wherein said obtaining comprises obtaining a vocal recording which includes at least a portion where said subject describes a potentially emotive subject.
Example 7. The method according to any one of claims 1-6, wherein said determining, using said plurality of intermediate diagnoses comprises applying a rule to the diagnoses.
Example 8. The method according to claim 7, wherein said rule includes indicating presence of a condition where above a threshold proportion of said intermediate diagnoses indicate presence of the medical condition.
Example 9. The method according to claim 8, wherein said threshold proportion is different for different medical conditions of said plurality of medical conditions.
Example 10. The method according to any one of claims 1-9, wherein said determining, using said plurality of intermediate diagnoses comprises feeding said plurality of intermediate diagnoses to a second trained machine learning model to provide said diagnosis for the plurality of medical conditions.
Example 11. The method according to any one of claims 1-10, wherein said processing comprises removing one or more of silent portions, non-subject speech, and noise.
Example 12. The method according to any one of claims 1-11, wherein said vocal features include one or more spectral feature of said vocal recording.
Example 13. The method according to any one of claims 2-12, wherein said linguistic features include prevalence of one or more type of word.
Example 14. The method according to claim 13, wherein said type of word includes words having a tense, for one or more tense.
Example 15. The method according to any one of claims 13-14, wherein said type of word includes a type of pronoun, for one or more pronoun type.
Example 16. The method according to any one of claims 13-15, wherein said type of word includes a pronoun, for one or more pronouns.
Example 17. The method according to any one of claims 13-16, wherein said type of word includes an emotion word, identified from a list of emotion words.
Example 18. The method according to any one of claims 13-17, wherein said type of word includes repetition of a word or phrase.
Example 19. The method according to any one of claims 13-18, wherein said type of word includes an emphasis word, identified from a list of emphasis words.
Example 20. The method according to any one of claims 14-19, wherein said prevalence includes a prevalence per minute of the type of word.
Example 21. The method according to any one of claims 14-20, wherein said prevalence includes a proportion of words of the narrative being said type of word.
Example 22. The method according to any one of claims 14-21, wherein one or more of said word types is identified according to a list.
Example 23. The method according to any one of claims 1-22, wherein said trained machine learning model is trained by: obtaining a plurality of voice recordings, each recording having a diagnosis label; processing each of said plurality of voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a time duration of a corresponding voice recording of said plurality of voice recordings; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; training the machine learning model, using said diagnosis labels and said vocal features.
Example 24. A method, implemented by computer circuitry, of training a machine learning model for diagnosis comprising: receiving a plurality of voice recordings, each recording having a diagnosis label and corresponding to a single patient; processing each of said plurality of voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than said corresponding vocal recording time duration; extracting, for each of said plurality of vocal recording portions, vocal features of said subject; training the machine learning model using said diagnosis labels and said vocal features for each of said plurality of vocal recording portions.
Example 25. A method, implemented by computer circuitry, of training a machine learning model for diagnosis comprising: obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions; obtaining a plurality of voice recordings each associated with a diagnosis label of said plurality of diagnosis labels; obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels; extracting one or more vocal features from each of said plurality of voice recordings; training said machine learning model using said diagnosis labels, said one or more vocal features per diagnosis label, and said one or more linguistic features per diagnosis label.
Example 26. A method, implemented by computer circuitry, of training machine learning models for diagnosis comprising: obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions; obtaining a plurality of voice recordings each associated with a diagnosis label of said plurality of diagnosis labels; obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels; extracting one or more vocal features from each of said plurality of voice
recordings; training a first machine learning model using said diagnosis labels and said one or more vocal features per diagnosis label; training a second machine learning model using said diagnosis labels and said one or more linguistic features per diagnosis label.
Example 27. The method according to claim 26, comprising training a third machine learning model using said first machine learning model and said second machine learning model and said diagnosis labels.
Example 28. The method according to any one of claims 26-27, wherein said extracting one or more vocal features comprises, for each voice recording of said plurality of voice recordings: processing said voice recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a duration of said voice recording; and extracting, for each of said plurality of vocal recording portions, vocal features of said subject.
Example 29. The method according to any one of claims 26-28, wherein said obtaining said one or more linguistic features includes: obtaining a textual script of each said vocal recording of said plurality of vocal recordings; and extracting said one or more linguistic features from said textual script, for each textual script.
Unless otherwise defined, all technical and/or scientific terms used within this document have meaning as commonly understood by one of ordinary skill in the art/s to which the present disclosure pertains. Methods and/or materials similar or equivalent to those described herein can be used in the practice and/or testing of embodiments of the present disclosure, and exemplary methods and/or materials are described below. Regarding exemplary embodiments described below, the materials, methods, and examples are illustrative and are not intended to be necessarily limiting.
Some embodiments of the present disclosure are embodied as a system, method, or computer program product. For example, some embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro code, etc.) or an embodiment combining
software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” and/or “system.”
Implementation of the method and/or system of some embodiments of the present disclosure can involve performing and/or completing selected tasks manually, automatically, or a combination thereof. According to actual instrumentation and/or equipment of some embodiments of the method and/or system of the present disclosure, several selected tasks could be implemented by hardware, by software or by firmware and/or by a combination thereof, e.g., using an operating system.
For example, hardware for performing selected tasks according to some embodiments of the present disclosure could be implemented as a chip or a circuit. As software, selected tasks according to some embodiments of the present disclosure could be implemented as a plurality of software instructions being executed by a computational device e.g., using any suitable operating system.
In some embodiments, one or more tasks according to some exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage e.g., for storing instructions and/or data. Optionally, a network connection is provided as well. User interface/s e.g., display/s and/or user input device/s are optionally provided.
Some embodiments of the present disclosure may be described below with reference to flowchart illustrations and/or block diagrams, for example, illustrating exemplary methods and/or apparatus (systems) and/or computer program products according to embodiments of the present disclosure. It will be understood that each step of the flowchart illustrations and/or block of the block diagrams, and/or combinations of steps in the flowchart illustrations and/or blocks in the block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart steps and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable
medium that can direct a computer (e.g., in a memory, local and/or hosted at the cloud), other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium can be used to produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be run by one or more computational device to cause a series of operational steps to be performed e.g., on the computational device, other programmable apparatus and/or other devices to produce a computer implemented process such that the instructions which execute provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Some of the methods described herein are generally designed only for use by a computer and may not be feasible and/or practical to perform purely manually, by a human expert. A human expert wanting to manually perform similar tasks might be expected to use different methods, e.g., making use of expert knowledge and/or the pattern-recognition capabilities of the human brain, which are potentially more efficient than manually going through the steps of the methods described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to better understand the subject matter that is disclosed herein and to exemplify how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
FIG. 1 illustrates a system, according to some embodiments of the disclosure;
FIG. 2 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure;
FIG. 3 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure;
FIG. 4 illustrates data flow with respect to system elements, for training of a ML model, according to some embodiments of the disclosure;
FIG. 5 is a flow chart of a method of ML model training, according to some embodiments of the disclosure;
FIG. 6 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure;
FIG. 7 is a flow chart of a method of ML model training, according to some embodiments of the disclosure;
FIG. 8 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure;
FIG. 9 is a method of ML model training, according to some embodiments of the disclosure;
FIG. 10 is a method of training a ML model, according to some embodiments of the disclosure;
FIG. 11 is a flow chart of methods of ML training, according to some embodiments of the disclosure;
FIG. 12 is a method of normalizing speech for an individual, according to some embodiments of the disclosure; and
FIG. 13 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
In some embodiments, although non-limiting, in different figures like numerals are used to refer to like elements; for example, element 216 in FIG. 2 corresponds to element 416 in FIG. 4.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure, in some embodiments thereof, relates to diagnosis of a subject and, more particularly, but not exclusively, to diagnosis of mental state of the subject using speech recording of the subject.
Overview
A broad aspect of some embodiments of the disclosure relates to diagnosing a subject using both vocal features of the subject (e.g. vocal features of speech of the subject) and linguistic features of language used by the subject. In some embodiments, one or more machine learning model provides a diagnosis of the subject using vocal recording/s of the subject. Where, in some embodiments, vocal features are extracted from speech recording/s of the subject and linguistic feature/s are identified in text transcript/s of the recording/s.
An aspect of some embodiments of the disclosure relates to diagnosing a subject for a plurality of mental states (also herein termed “mental conditions”, “conditions”, “clinical states”, and “states”), using both vocal features of the subject and linguistic features of language used by the subject. A potential benefit of multi-morbidity diagnosis is identification of previously unknown co-morbidities, potentially enabling improved treatment and/or outcome/s for the subject. For example, in some embodiments, a subject suffering from a physical illness is diagnosed for a range of mental health conditions e.g. using a single recording.
For example, in some embodiments, subjects suffering from a health condition potentially including both physical and mental aspects, are diagnosed (e.g. using vocal recordings and the machine learning model), diagnosis of existence and/or severity of the different aspects in some embodiments, enabling improved treatment.
For example, for conditions such as chronic pain syndrome and irritable bowel syndrome which include physical aspects and mental aspects which (it is theorized) impact each other, individual patient treatment, based on the diagnosis, in some embodiments, allows focus of treatment on the aspect having more impact on the quality of life of the subject.
A potential advantage of using both vocal features and linguistic features of a subject in diagnosis of the subject is increased accuracy in the diagnosis e.g. as opposed to a diagnosis performed using a single type of feature. Where, in some embodiments, using both vocal and linguistic features potentially enables diagnosis of clinical states for different subject populations e.g. different gender and/or age and/or language and/or culture and/or suffering from different clinical states and/or groups of clinical states.
In some embodiments, the diagnosis is repeated at different times e.g. to provide assessment of a subject over time.
In some embodiments, linguistic feature/s are also fed to the machine learning model which provides the diagnosis of the subject. Alternatively, or additionally, in some embodiments, linguistic feature/s are evaluated separately, and the two evaluations (a first and a second evaluation) are used to provide a diagnosis of the subject. Where, in some embodiments, a first evaluation is provided by a machine learning model fed the vocal feature/s and a second evaluation is based on identified linguistic feature/s (e.g. the second evaluation, in some embodiments, provided by a machine learning model fed the linguistic feature/s).
An aspect of some embodiments of the disclosure relates to dividing vocal recordings of a subject into a plurality of snippets (each snippet includes a temporal portion of the associated vocal recording) and one or both of training a machine learning model for diagnosis of subjects using snippets and diagnosis of a subject using snippets.
Where, in some embodiments, extracted feature/s from the snippets are used in training of and/or prediction using the machine learning model.
In some embodiments, a vocal recording of speech of a subject is divided into snippets, from which are extracted snippet feature/s which are then fed to a machine learning model which produces a diagnosis for each snippet.
The plurality of diagnoses are then processed to provide a single diagnosis of the subject. Where processing includes, in some embodiments, use of a rule and/or a second ML model.
In some embodiments, diagnosis is for a plurality of conditions, where for each condition, the machine learning model provides a set of diagnoses (a diagnosis for each condition) for each snippet, and then the sets of diagnoses are used to provide a single set of diagnoses for the patient.
In some embodiments, the subject voice recording is obtained as part of screening and/or diagnosis and/or treatment of the subject.
Where, for example, in some embodiments, subject/s presenting themselves for and/or receiving treatment (e.g. at a hospital) are screened (e.g. for mental health status and/or complications) by obtaining a speech recording of the subject.
In some embodiments, recordings of a same subject are acquired over time, diagnoses providing a picture of subject recovery or otherwise with time.
In some embodiments, a subject suffering from a first medical condition is screened for other medical conditions, by acquiring voice recording/s and diagnosing the subject for one or more additional medical conditions (e.g. according to method/s as described in this document), a potential advantage of which is improvement of prognosis of the first medical condition. For example, a subject suffering from a cancer diagnosis who is also depressed (and/or suffering from other mental health condition/s) may have a less positive prognosis for the cancer diagnosis; screening cancer sufferers for mental health issues, which are then treated, potentially improves the cancer prognoses.
Potentially, using voice recordings provides faster diagnosis and/or diagnosis for more individuals (e.g. as opposed to diagnosis by a mental health professional), for example, potentially enabling health care authorities to divert mental health services to the correct individuals.
In some embodiments, voice recordings and diagnosis of emotional state of the subject are used for other purposes than for diagnosis of health conditions. For example, in some embodiments, voice recordings are used to determine if a subject is displaying correct emotional responses. For example, for screening of caregivers for potential abuse upon a child (or other dependent) presenting with an injury. For example, screening of suspected criminals. For example, for screening of subjects for risk e.g. risk of aggressive behavior.
Optionally, in some embodiments, video and/or photograph/s of a subject are used in the diagnosis e.g. in addition to vocal recording/s.
In some embodiments, diagnosis e.g. as described within this document is used as a single-use diagnostic tool. In some embodiments, diagnosis e.g. as described within this document is used as a comparative assessment tool e.g. to enable comparison between subjects and/or a same subject at different times.
In some embodiments, methods as described in this document provide an acute assessment, for instance, upon an individual's arrival at a hospital e.g. after an accident or potentially traumatic event, the diagnosis potentially enabling care providers to gauge a level of susceptibility to post-traumatic symptoms or full-blown PTSD.
In some embodiments, the diagnosis is performed for a same subject at different times.
For example, in some embodiments, the diagnosis e.g. including a speech recording (e.g. recording an answer to a request for the relevant narrative) performed before and after one or more intervention e.g. one or more of a therapy, program, or study, potentially provides evidence of change occurring in the processing of the narrated experience e.g. evidence of efficacy of the intervention.
For example, in some embodiments, diagnosis uses subject narration of ‘same’ memories of ‘same’ events, experienced by a single individual, over a substantial period of time (e.g. 1 day to 1-5 years), the diagnoses potentially providing connections between narrative content and/or form, and/or one or more of the subject’s coping, processing, and psychosocial functioning.
In some embodiments, a subject is asked to describe different events, the different recordings providing different diagnoses e.g. according to method/s described in this document. In some embodiments, these different diagnoses are used (e.g. by a healthcare practitioner e.g. a therapist) to identify those events which are one or more of more acute, prominent, or emotionally charged e.g. enabling identification of severity of different issues and/or triggers and/or efficacy of treatment for specific issues and/or triggers.
In some embodiments, a diagnosis includes a diagnosis of how problematic and/or traumatic an event and/or experience was for the subject. Where, in some embodiments, this diagnosis is used to determine which subjects should receive preventative care (e.g. mental health care) to potentially prevent development of mental health issues (e.g. PTSD, depression, anxiety) associated with the event. Potentially, diagnosis and/or diagnoses using method/s described in this document provide one or more of:
1) an assessment of level of trauma or stress as related to the narrated experience (or event, state, or process);
2) an assessment of a process of emotional processing of the experience;
3) an assessment of an individual's emotional attributions with regard to the experience, such as guilt, shame, and responsibility;
4) an assessment of an individual's emotional and functional well-being subsequent to the experience e.g. as described in reference [17];
5) an assessment of issue/s of psychological blockage and/or hindrances to the coping process;
6) an assessment of a level of acute distress;
7) an assessment of a degree of healthy vs. less functional processing of a current life stressor at the time of narration;
8) an assessment of capacity of outlining a reliable picture of where the individual stands;
9) an assessment of current and/or potential clinical symptoms;
10) a prediction of a recovery trajectory;
11) a prediction of susceptibility to risk factors; and
12) an identification of individuals at risk and/or in need.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
FIG. 1 illustrates a system 100, according to some embodiments of the disclosure.
In some embodiments, system 100 includes processing and memory circuitry (PMC) 102. Where, in some embodiments, one or more method as described in this document is performed partially or fully by PMC 102, for example, one or more feature of the methods described in one or more of FIG. 3, FIG. 5, FIG. 6, FIG. 7, FIG. 10, FIG. 11, and FIG. 12.
In some embodiments, system 100 includes a pre-processing module 104 (e.g. hosted by PMC 102).
In some embodiments, system 100 includes a machine learning model 106 (e.g. hosted by PMC 102).
In some embodiments, system 100 includes and/or has connectivity to one or more of a personal electronic device 108 (e.g. a cell phone, tablet, laptop), the cloud 110, electronic device 112 (e.g. medical device e.g. screening and/or diagnosis-bot), and microphone 114.
In some embodiments, vocal recording of a subject is acquired by microphone 114 and/or a microphone of personal electronic device 108 and/or a microphone of electronic device 112.
In some embodiments, one or more of personal electronic device 108, electronic device 112, and cloud 110 hosts one or more portion of PMC 102.
Where, for example, portion/s of PMC 102 are hosted as an application running on hardware of one or both of personal electronic device 108 (e.g. smartphone application) and electronic device 112 (e.g. desktop application). Where, in some embodiments, device/s 108, 112 access update/s to and/or feature/s of PMC 102 hosted by cloud 110.
In some embodiments, one or more of personal electronic device 108 and electronic device 112 include one or more user interface, through which a user (e.g. subject, healthcare professional, subject caregiver) inputs information e.g. received by PMC 102, and/or is displayed output/s of PMC 102 (e.g. output of ML model 106).
In some embodiments, system 100 includes one or more sensor 109. Where, in some embodiments, sensor/s 109 sense one or more physical feature of a subject (e.g. while the subject is being recorded to provide a subject vocal recording), for example, measurement including one or more of heartbeat feature/s (e.g. heart rate, heart rate variability), blood pressure, and temperature.
As visible in FIG. 1, the processor of PMC 102, in some embodiments, is configured to implement at least one machine learning model 106. In some embodiments, the machine learning model 106 includes a neural network (NN). In some embodiments, the machine learning model 106 includes a deep neural network (DNN).
In some embodiments, the processor executes several computer-readable instructions implemented on a computer-readable memory of PMC 102, wherein execution of the computer-readable instructions enables data processing by machine learning model 106. As explained hereinafter, machine learning model 106 enables processing of data provided (e.g. including subject vocal feature/s and/or subject linguistic feature/s), for outputting one or more diagnosis of the subject.
Note that in some embodiments, the processor of PMC 102 is configured to implement a plurality of different machine learning models 106, for example, referring to FIG. 8, both ML model 810 and 2nd ML model 816.
By way of non-limiting example, the layers of the machine learning model 106 are, in some embodiments, organized in accordance with Convolutional Neural Network (CNN) architecture, Recurrent Neural Network architecture, Recursive Neural Networks architecture, or Generative Adversarial Network (GAN) architecture. In some embodiments, at least some of the layers are organized in a plurality of DNN subnetworks. Each layer of the DNN, in some embodiments, includes multiple basic computational elements (CE), typically referred to in the art as dimensions, neurons, or nodes.
Generally, computational elements of a given layer can be connected with CEs of a preceding layer and/or a subsequent layer. Each connection between a CE of a preceding layer and a CE of a subsequent layer is associated with a weighting value. A given CE can receive inputs from CEs of a previous layer via the respective connections, each given connection being associated with a weighting value which can be applied to the input of the given connection. The weighting values can determine the relative strength of the connections and thus the relative influence of the respective inputs on the output of the given CE. The given CE can be configured to compute an activation value (e.g., the weighted sum of the inputs) and further derive an output by applying an activation function to the computed activation. The activation function can be, for example, an identity function, a deterministic function (e.g., linear, sigmoid, threshold, or the like), a stochastic function, or other suitable function. The output from the given CE can be transmitted to CEs of a subsequent layer via the respective connections. Likewise, as above, each connection at the output of a CE can be associated with a weighting value which can be applied to the output of the CE prior to being received as an input of a CE of a subsequent layer. Further to the weighting values, there can be threshold values (including limiting functions) associated with the connections and CEs.
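By way of illustration only, a minimal Python sketch of such a layer of computational elements follows; the layer width, weights, and choice of activation function are assumptions made for the example and are not values taken from the disclosure.

```python
import numpy as np

def dense_layer(inputs, weights, biases, activation=np.tanh):
    """One layer of computational elements (CEs).

    Each CE computes a weighted sum of its inputs (its activation value)
    and derives an output by applying an activation function, as described
    above; the activation may instead be an identity, sigmoid, threshold, etc.
    """
    activation_value = inputs @ weights + biases  # weighted sum per CE
    return activation(activation_value)

# Illustrative forward pass: 4 input features -> 3 CEs (random weights).
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # e.g. four vocal features
w = rng.normal(size=(4, 3))   # connection weighting values
b = np.zeros(3)               # optional per-CE thresholds/biases
print(dense_layer(x, w, b))
```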
FIG. 2 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
In some embodiments, one or more obtained and/or received subject voice recording/s 216 (e.g. including speech of the subject/s) are processed, by a voice recording processing module 203 to determine vocal feature/s 222. Where, in some embodiments, vocal feature/s 222 are pre-determined vocal features.
Where, in some embodiments, the subject voice recording/s include one or more feature as described regarding step 300 FIG. 3. Where, in some embodiments, processing module 203 performs and/or vocal feature/s 222 include one or more feature as described regarding step 304 FIG. 3.
Optionally, in some embodiments, voice recording processing module 203 (e.g. prior to extracting vocal feature/s) performs one or more pre-processing operation on the voice recording/s 216. For example, according to one or more feature as described regarding step 1010 FIG. 10. In some embodiments, pre-processing is performed using a pre-trained ML model, e.g. referring back to FIG. 1, where a pre-processing ML model is hosted, in some embodiments, by pre-processing module 104.
In some embodiments, subject language data 218 of the subject is processed, by a language processing module 205, to provide linguistic feature/s 224, which linguistic feature/s are, in some embodiments, pre-determined.
Where, in some embodiments, subject language data 218 includes one or more feature of subject language data as described regarding step 302 FIG. 3. Where, in some embodiments, language processing module performs and/or linguistic feature/s include one or more feature as described regarding step 306 FIG. 3.
In some embodiments, ML model/s 206 provide a multi-morbidity diagnosis 228 for the subject, using linguistic feature/s 224 and vocal feature/s 222 which are fed to the model, for example, the providing including one or more feature as described regarding step 312 FIG. 3.
Optionally, in some embodiments, subjective self-ratings 220 (which, in some embodiments, include one or more feature as described regarding self-ratings step 310 FIG. 3) are additionally fed to the ML model 206, e.g. and used in providing the multi-morbidity diagnosis 228.
In some embodiments, ML model 206 is a single ML model which receives both linguistic feature/s 224 and vocal feature/s 222 and optional self-rating/s 220 to provide diagnosis 228.
In some embodiments, ML model 206 includes a plurality of models e.g. including one or more feature as described and/or illustrated regarding FIG. 13. For example, a vocal feature ML model which receives vocal feature/s 222, and a linguistic feature ML model which receives linguistic features 224. In some embodiments, the vocal feature ML model and the linguistic feature ML model each produce a diagnosis, where the two diagnoses are fed to another ML model to provide a single diagnosis for the subject and/or a single diagnosis is provided by applying a rule to the two diagnoses. Where, in some embodiments, optionally, self-rating/s 220 are fed to one or more of the vocal feature and linguistic feature ML models to be used for providing one or more of the diagnoses from the vocal feature and linguistic feature ML models.
Referring back to FIG. 1, in some embodiments, PMC 102 hosts data 216, 218, 220, 222, 224, 228 and/or system elements 203, 205, 206. Where, in some embodiments, pre-processing module 104 hosts one or both pre-processing modules 203, 205 and/or ML model 206 corresponds to ML model 106.
FIG. 3 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
At 300, in some embodiments, subject voice recording/s are received, which recording/s include audio recording. In some embodiments, a single voice recording is received. Where, in some embodiments, the voice recording includes 10 seconds - 10 minutes of recording, or 1-5 minutes of recording, and/or a recording from which at least 1 minute of continuous vocalization (e.g. speech) of the subject is extractable (e.g. by removal of pauses and/or noise and/or extraneous speech).
In some embodiments, a speech recording suitable for vocal method/s described in this document (e.g. extraction of vocal feature/s) includes at least 10 seconds - 3 minutes, or about 1 minute, or lower or higher or intermediate ranges or durations of recorded speech of a subject.
In some embodiments, a speech recording suitable for linguistic method/s described in this document (e.g. extraction of linguistic feature/s) includes at least 1 minute to 10 minutes, or about 5 minutes or lower or higher or intermediate ranges or durations of recorded speech of a subject.
In some embodiments, the subject voice recording includes speech of the subject. Alternatively (e.g. in the case of a non-verbal subject) or additionally, in some embodiments, the subject voice recording 216 includes subject vocalizations e.g. one or more of emotional sounds (e.g. crying, sighing), vocalization accompanying non-verbal communication (e.g. via sign language), singing, humming, and coughing.
At 302, in some embodiments, subject language data is obtained and/or received. For example, according to one or more feature of step 1018 FIG. 10.
Where, in some embodiments, subject language data includes a text script/s of the subject voice recording/s. Where, alternatively or additionally, in some embodiments, subject language data includes text written by the subject e.g. one or more of; written answers to questions, social media posts, and email correspondence.
In an exemplary embodiment, subject language data is acquired using one or more feature of Dr. Perez's (2014) Narrative Method for Assessment of Psychosocial Processing (NMAPP) e.g. including one or more feature as described in reference [16].
In an exemplary embodiment, a subject is asked to describe an experience. Where, in some embodiments, the experience is a potentially emotive and/or psychologically charged event. Where, for example, in some embodiments, the subject is asked (e.g. by a human and/or electronic interviewer, verbally and/or by display of text): “Tell me the story of what you experienced.”
A potential advantage of the subject providing a narrative is that the process of narrating may be therapeutic and cathartic in and of itself. Indeed, it may assist the individual in organization of what may be very chaotic events and experiences, and in the process of the creation of coherence out of what may be highly incoherent facets of experience - particularly in the acute post-event context.
At 304, in some embodiments, vocal feature/s are extracted from the voice recording.
In some embodiments, one or more vocal features include frequency-domain representation of the voice signal (e.g. obtained using Fast Fourier Transform (FFT)). In an exemplary embodiment, the vocal features include (e.g. only) frequency-domain values of the subject speech recording where those frequencies not corresponding to human speech are absent.
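A minimal sketch of obtaining such a frequency-domain representation follows, assuming a mono recording sampled at a known rate and an illustrative speech band of roughly 80-8000 Hz; the band limits are an assumption for the example, not values stated in the disclosure.

```python
import numpy as np

def speech_band_spectrum(signal, sr, fmin=80.0, fmax=8000.0):
    """Magnitude spectrum of a voice signal via FFT, keeping only
    frequency bins inside an assumed human-speech band."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return freqs[keep], spectrum[keep]

# Illustrative use on a synthetic 1-second "voice" signal.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.1 * np.random.randn(sr)
freqs, mags = speech_band_spectrum(signal, sr)
print(freqs.shape, mags.shape)
```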
In some embodiments, one or more vocal features are extracted using openSMILE open-source software.
In some embodiments, exemplary vocal features (e.g. alternatively or additionally to the frequency-domain representation of the voice signal) include one or more of:
• spectral aspects of the vocal signal e.g. pitch e.g. pitch range;
• speech speed e.g. as quantified by words per minute and/or syllables per minute;
• pause lengths.
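By way of illustration, pitch range and pause lengths of the kind listed above could be estimated as sketched below; the librosa library, the placeholder file path, and the 30 dB silence threshold are assumptions for the example (the disclosure itself names openSMILE for vocal feature extraction).

```python
import librosa
import numpy as np

# Load a subject recording (the path is a placeholder).
y, sr = librosa.load("subject_recording.wav", sr=None, mono=True)

# Pitch and pitch range via the pYIN estimator.
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = f0[~np.isnan(f0)]  # drop unvoiced frames
pitch_range_hz = float(f0.max() - f0.min()) if f0.size else 0.0

# Pause lengths: gaps between non-silent intervals (30 dB threshold assumed).
intervals = librosa.effects.split(y, top_db=30)
pauses = [(intervals[i + 1][0] - intervals[i][1]) / sr
          for i in range(len(intervals) - 1)]

print(pitch_range_hz, float(np.mean(pauses)) if pauses else 0.0)
```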
At 306, in some embodiments, linguistic feature/s are identified in the language data.
In some embodiments, identifying linguistic features uses one or more feature of Dr. Perez's (2014) Narrative Method for Assessment of Psychosocial Processing (NMAPP) e.g. including one or more feature as described in reference [16].
Alternatively or additionally, in some embodiments, Natural Language Processing (NLP) technique/s are used to extract one or more linguistic features e.g. techniques including one or more feature as described in one or more of references [4, 6, 7, 10, 14, 15] and/or one or more feature of the Linguistic Inquiry and Word Count (LIWC) or word2vec programs (e.g. as described in reference [18]).
In some embodiments, identifying linguistic features uses one or more feature as described in reference [29].
In some embodiments, linguistic features include prevalence of one or more feature of language used by the subject. Where, regarding exemplary linguistic features described hereinbelow, examples of each linguistic feature are found in table 1.
For example, in some embodiments, a linguistic feature includes a prevalence of one or more pronoun and/or pronoun type. Where prevalence is, in some embodiments, the prevalence of the pronoun (and/or pronoun type) per minute and/or as a proportion of words used, and/or as a proportion of pronouns used.
In some embodiments, pronouns are identified using a list of pronouns, data entries of the list, in some embodiments, having an associated category.
Where exemplary pronoun types include, for example, one or more of: personal (I, we, you, he, she, it, they), demonstrative (this, these, that, those), relative (who, which, that, as), indefinite (each, all, everyone, either, one, both, any, such, somebody), interrogative (who, which, what), reflexive (myself, herself), possessive (mine, yours, his, hers, theirs).
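A minimal sketch of computing such pronoun-prevalence features follows; the pronoun lists are abbreviated, illustrative versions of the types above, not the full lists contemplated by the disclosure.

```python
import re

# Illustrative, non-exhaustive pronoun lists (assumed for the example).
PRONOUN_TYPES = {
    "personal": {"i", "we", "you", "he", "she", "it", "they"},
    "demonstrative": {"this", "these", "that", "those"},
    "reflexive": {"myself", "herself", "himself", "ourselves"},
    "possessive": {"mine", "yours", "his", "hers", "theirs"},
}

def pronoun_prevalence(text, duration_minutes):
    """Prevalence per pronoun type, both per minute and as a
    proportion of all words used, as described above."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    features = {}
    for ptype, pronouns in PRONOUN_TYPES.items():
        count = sum(w in pronouns for w in words)
        features[f"{ptype}_per_minute"] = count / duration_minutes
        features[f"{ptype}_proportion"] = count / total if total else 0.0
    return features

print(pronoun_prevalence("I told her that we would do it ourselves.", 0.05))
```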
For example, in some embodiments, a linguistic feature includes a tense of language used, optionally with respect to a time of occurrence of the story being narrated. For example, prevalence (e.g. words per minute and/or as a proportion of words used and/or of words having tense) of one or more of past, present, and future tenses. For example, proportion of language used in description which has matching and/or mismatching tense, e.g. prevalence of past tense words used in description of a past event, e.g. prevalence of present tense words used in description of a past event, e.g. prevalence of future tense words used in description of a past event.
For example, in some embodiments, a linguistic feature includes a prevalence of “emotion words”. Where prevalence, in some embodiments, is emotion words per minute and/or proportion of emotion words used and/or proportion of words of one or more type used (e.g. proportion of adjectives and/or adverbs and/or verbs and/or nouns being emotion words). Where, in some embodiments, identification of emotion words is via a list (e.g. a pre-defined list).
For example, in some embodiments, a linguistic feature includes a prevalence of extra-linguistic expressions of emotion. For example, non-speech vocalizations e.g. crying, sighing, tongue clicking. For example, non-vocal noise generation e.g. hitting the table, stamping, clapping. Where, for one or more expression of emotion, prevalence includes a number of such expressions, and/or duration, and/or proportion of the narration including such expressions.
For example, in some embodiments, a linguistic feature includes a prevalence of repetition. For example, a prevalence of repeated words and/or phrases. For example, as a proportion of a time of the narrative and/or as a proportion of the words used.
For example, in some embodiments, a linguistic feature includes a prevalence of emphasis used. Emphasis, in some embodiments, including use of words of a list of emphasis words (e.g. prevalence as a proportion of the narrative time and/or words). Emphasis, in some embodiments, including non-verbal gesture/s (e.g. a prevalence of). Emphasis, in some embodiments, including a volume of the speech e.g. as compared to other portion/s of the speech recording.
For example, in some embodiments, a linguistic feature includes a prevalence of passive and/or active language used. For example, as a proportion of a time of the narrative and/or as a proportion of the words and/or phrases used.
For example, in some embodiments, a linguistic feature includes a type of language used, e.g. where a phrase of language used is categorized as having one or more type. Exemplary types include, for example, one or more of: metaphor, poetic talk, euphemism, personalized/belonging talk, distancing/distanced talk, generalized talk, acceptability talk, meta-talk, talk outside narrative, markers specific to text, agency/locus of control, guilt talk, ambivalence, and argumentation (with social discourse). In some embodiments, phrase/s are categorized using a machine learning model trained to identify types of language used, where, in some embodiments, once identified, a linguistic feature includes prevalence (e.g. as a proportion of the text) of one or more type of language used.
In an exemplary embodiment, lists (e.g. pre-defined lists) of word types are used, the prevalence of each word type being a linguistic feature. Where, in some embodiments, the lists include a list of pronouns, emotion words, and emphasis words.
In some embodiments, one or more linguistic feature is determined for one or more specific portion of narrative. For example, an opening phrase and/or portion of a narrative. For example, a closing phrase and/or portion.
Table 1
[Table 1, giving examples of each linguistic feature described hereinabove, is reproduced as images in the original publication.]
At 310, in some embodiments, subject self-ratings are obtained and/or received. In some embodiments, the subject provides ratings for pain, depression, and anxiety. In some embodiments, the rating is a 10-point rating scale based on the SUDS (subjective units of distress) measure e.g. as described in reference [27].
Optionally, in some embodiments, the subject provides subjective self-ratings more than one time e.g. before and after vocal recordings of the subject are acquired.
In some embodiments, self-rating repetition is used to provide an average self-rating. In some embodiments, self-rating repetition is used to provide a measure as to how traumatic the experience of providing a vocal recording is, e.g. according to a theory that a recently distressed subject will rate their level of distress in a self-rating as higher.
In some embodiments, self-rating repetition is used to determine if self-rating provides a sufficiently consistent diagnosis independent of the emotional state of the subject.
At 312, in some embodiments, machine learning model/s provide a multi-morbidity diagnosis. Where, in some embodiments, the multi-morbidity diagnosis includes a binary indication as to whether the subject has one or more medical condition. Additionally or alternatively, for one or more medical condition, in some embodiments, the multi-morbidity diagnosis includes a level of severity of the condition.
In some embodiments, a plurality of machine learning model/s each provide a diagnosis (e.g. a multi-morbidity diagnosis) where, in some embodiments, the plurality of diagnoses are combined using another machine learning model and/or a rule, for example, as described regarding FIG. 13.
In some embodiments, a diagnosis is produced using the model where, in some embodiments, the model produces a number (e.g. between 0-100) indicating how likely it is that the subject suffers from a particular condition and, according to a rule (e.g. a pre-trained cut-off), the subject is diagnosed as having the condition, e.g. as belonging to a sick (e.g. depressed) or healthy (e.g. non-depressed) class. Where, in some embodiments, for each condition the cut-off is a different number. Where, in some embodiments, for each condition the number provided by the ML model is within a different range (e.g. 0-50).
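By way of illustration, applying per-condition cut-offs to such model scores could look like the following sketch; the score ranges and cut-off values are assumed for the example, whereas in practice they would be established during training.

```python
# Illustrative per-condition cut-offs (assumed values; in practice these
# would be pre-trained cut-offs, possibly with different ranges per condition).
CUTOFFS = {"depression": 50.0, "anxiety": 40.0, "chronic_pain": 60.0}

def apply_cutoffs(scores, cutoffs=CUTOFFS):
    """Turn per-condition likelihood scores (e.g. 0-100) into a
    binary multi-morbidity diagnosis using per-condition cut-offs."""
    return {cond: scores.get(cond, 0.0) >= cut for cond, cut in cutoffs.items()}

print(apply_cutoffs({"depression": 72.3, "anxiety": 12.0, "chronic_pain": 61.5}))
```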
FIG. 4 illustrates data flow with respect to system elements 403, 405, 406, for training of a ML model 406, according to some embodiments of the disclosure.
In some embodiments, like elements of FIG. 4 correspond to like elements of FIG. 2 (e.g. element 416 FIG. 4 corresponding to element 216 FIG. 2). Where, in some embodiments, for each data element 416, 418, 420, 426, 422, 424 of FIG. 4, data is for a plurality of subjects (e.g. as opposed to FIG. 2 where, in some embodiments, data elements include data for a single subject).
In some embodiments, a machine learning model 406 is fed vocal feature/s 422, linguistic feature/s 424, associated subjects’ labels 426, and optional subjects’ self- rating/s 420 and is then trained using the received data.
For example, as described regarding ML model/s 206 FIG. 2, in some embodiments, element 406 is a plurality of ML models.
For example, a vocal feature ML model which is fed vocal feature/s 422 and associated subjects’ labels 426, and a linguistic features ML model which is fed linguistic features 424 and associated subjects’ labels 426. In some embodiments, the vocal feature ML model and the linguistic feature ML model are each trained to produce a diagnosis. Optionally, in some embodiments, one or both of the vocal feature ML model and the linguistic feature ML model are fed and trained using self-rating/s 420.
In some embodiments, a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
Referring back to FIG. 1, in some embodiments, PMC 102 hosts data 416, 418, 420, 422, 424, 426 and/or system elements 403, 405, 406. Where, in some embodiments, pre-processing module 104 hosts one or both pre-processing modules 403, 405 and/or ML model 406 corresponds to ML model 106.
FIG. 5 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
At 500, in some embodiments, a data set of a plurality of labeled subject voice recordings is obtained and/or received. Where each subject recording includes one or more feature of subject voice recording/s as described regarding step 300 FIG. 3.
In some embodiments, the data set includes both subjects having clinical states (e.g. depression, anxiety, chronic pain) and normative persons without diagnosed mental health condition/s.
In some embodiments, a label indicates presence of one or more condition for each respective subject. Where, in some embodiments, the label includes binary diagnosis (subject has the condition, subject does not have the condition) for one or more condition. Alternatively or additionally, in some embodiments, the label indicates a level of severity of one or more condition.
In some embodiments, medical conditions include one or more psychological conditions. For example, one or more of; depression, anxiety, chronic pain syndrome, post-traumatic stress disorder (PTSD), sleep disturbances, eating disorder/s, fibromyalgia, attention deficit and hyperactivity disorder (ADHD), attention deficit disorder (ADD).
In some embodiments, a label (or diagnosis) includes a subject’s relationship to an experience and/or event. For example, a quantification as to how traumatic the event was.
In some embodiments, a label (e.g. a multiple-condition label) is determined for a subject.
Where, for example, labeling of subjects includes diagnosing the subject using one or more standardized assessment questionnaire (e.g. including one or more standardized psychological questionnaire).
For example, one or more of:
• Standard demographic and/or health behavior questions (e.g. smoking, self-rated health, use of medications);
• The CES-D Depression Scale as described in reference [19];
• The Brief Screening Instrument for PTSD as described in reference [20];
• Generalized anxiety disorder (GAD), using the short form GAD-7 as described in reference [21]
• The Pittsburgh Sleep Quality Index as described in reference [22]
• A measure of perceived stress as described in reference [23]
• Pain diagnosis labeling uses, for example, one or more of:
o The Short Form McGill Pain Questionnaire as described in reference [24]
o The PHQ15 (brief medical questionnaire) as described in reference [25]
o The PCS (pain catastrophizing scale) as described in reference [26]
In some embodiments, the subject provides answers to questionnaire questions (written and/or inputted into an electronic device). In some embodiments, the subject (e.g. verbally) provides answers to questionnaire questions to a healthcare professional who, in some embodiments, discusses questionnaire answer/s with the subject and/or amends one or more answer/s.
Alternatively or additionally, in some embodiments, a healthcare professional (e.g. psychiatrist) provides a diagnosis label for the subject e.g. the label including a diagnosis for one or more medical condition.
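As an illustration of questionnaire-based labeling, the sketch below maps two questionnaire scores to binary labels; the cut-offs shown (CES-D >= 16, GAD-7 >= 10) are commonly cited screening thresholds used here as assumptions, not values specified in the disclosure.

```python
def label_subject(cesd_score, gad7_score):
    """Map questionnaire scores to binary condition labels.

    Cut-offs are illustrative assumptions: CES-D >= 16 and GAD-7 >= 10
    are widely used screening thresholds, but the disclosure does not
    fix specific values.
    """
    return {
        "depression": cesd_score >= 16,
        "anxiety": gad7_score >= 10,
    }

print(label_subject(cesd_score=21, gad7_score=7))
```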
At 502, labeled subject language data is received. In some embodiments, subject language data includes one or more feature of subject language data as described regarding step 302 FIG. 3.
At 504, in some embodiments, vocal features are extracted from the received recordings. Where step 504, in some embodiments, includes one or more feature of step 304 FIG. 3.
At 506, in some embodiments, linguistic features are identified in the language data. Where step 506, in some embodiments, includes one or more feature of step 306 FIG. 3.
At 508, in some embodiments, the ML model is trained to provide diagnoses for more than one condition, using the extracted vocal feature/s, linguistic feature/s, and label.
Where, in some embodiments, a dataset of labeled subject data (e.g. including vocal feature/s and/or linguistic feature/s) is split into training and test sets. In some embodiments, 50-95% of the data is used as training data, the remaining 50%-5% being used as test data. In an exemplary embodiment, 80% of the data is used as training data and 20% as test data.
In some embodiments, one or more ML classifier is trained on the training data where, in some embodiments, accuracy of each classifier is evaluated using the test data. Where, in some embodiments, the ML classifier includes one or more of Neural Networks, Random Forest, logistic regression, k-nearest neighbors, and boosting methods. In some embodiments, the ML classifier is evaluated using one or more of F1 score, AUC, accuracy, precision, and recall, all computed using k-fold cross-validation.
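A minimal sketch of this split-train-evaluate loop follows, using scikit-learn; the synthetic feature matrix is a stand-in assumption for the extracted vocal and/or linguistic features, and the classifier settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.neural_network import MLPClassifier

# X: one row of vocal/linguistic features per subject; y: condition labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)

# 80/20 train/test split, as in the exemplary embodiment above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for clf in (RandomForestClassifier(), LogisticRegression(max_iter=1000),
            MLPClassifier(max_iter=1000)):
    # k-fold cross-validation (k=5 here) over the metrics named above.
    scores = cross_validate(clf, X_train, y_train, cv=5,
                            scoring=("f1", "roc_auc", "accuracy",
                                     "precision", "recall"))
    print(type(clf).__name__, round(scores["test_f1"].mean(), 3))
```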
Optionally, in some embodiments, one or more measurement of subject physical feature is obtained and/or received. For example, where physical feature/s and/or obtaining thereof includes one or more feature as described regarding sensor/s 109 FIG. 1. In some embodiments, the measured physical feature/s (e.g. after optionally being pre-processed) are fed to the machine learning model e.g. in addition to vocal feature/s.
In some embodiments, a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
In some embodiments, a first ML model is trained to provide diagnoses for more than one condition, using the extracted vocal feature/s and label.
In some embodiments, a second ML model is trained to provide diagnoses for more than one condition, using the extracted linguistic feature/s and label.
In some embodiments, a third ML model is trained to provide diagnoses for more than one condition, using the diagnoses provided by the first and second ML models, and the label.
FIG. 6 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
In some embodiments, step 600 of FIG. 6 includes one or more feature of step 300 of FIG. 3.
At 602, in some embodiments, each speech recording is processed into a plurality of snippets.
Where, in some embodiments, each snippet includes a temporal portion of the associated vocal recording. In some embodiments, a speech recording is divided into a plurality of portions, the snippets not overlapping temporally. Alternatively, in some embodiments, snippets overlap each other.
In an exemplary embodiment, a continuous recording is cut into snippets of 1 second - 1 minute, or 1-30 seconds, or 5-20 seconds, or about 10 seconds, or lower or higher or intermediate durations or ranges.
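A minimal sketch of such snippet cutting follows, assuming the recording is available as a raw waveform array; the 10-second default reflects the exemplary value above, and zero overlap corresponds to the non-overlapping variant (a positive overlap gives the overlapping variant).

```python
import numpy as np

def split_into_snippets(signal, sr, snippet_s=10.0, overlap_s=0.0):
    """Cut a continuous recording into fixed-length snippets.

    With overlap_s == 0 the snippets do not overlap temporally;
    a positive overlap_s yields temporally overlapping snippets.
    """
    step = int((snippet_s - overlap_s) * sr)
    size = int(snippet_s * sr)
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]

# Illustrative: a 65-second recording at 16 kHz -> six 10-second snippets.
recording = np.zeros(65 * 16000)
print(len(split_into_snippets(recording, 16000)))
```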
In some embodiments, steps 604 and 606 each include one or more feature of steps 304 and 308 FIG. 3, respectively, and/or of steps 504 and 508 FIG. 5, respectively, where the steps are carried out, in some embodiments, for each snippet to provide, at step 608, a diagnosis for each snippet, also herein termed an “intermediate diagnosis” (where, in some embodiments, the diagnosis is a multi-morbidity diagnosis e.g. as described regarding step 312 FIG. 3 and/or multi-morbidity diagnosis 228 FIG. 2).
It should be noted that, although regarding (at least) the methods of FIG. 3 and FIG. 5 both vocal features and linguistic features are fed to a ML model, in some embodiments, only vocal features are fed to the ML model, or only linguistic features are fed to the ML model, or each of vocal features and linguistic features is fed to a separate model (e.g. as described regarding FIG. 13).
At 610, in some embodiments, the plurality of diagnoses are fed to a rule and/or a second ML model to provide, at 612, a single subject diagnosis, which, in some embodiments, is a multi-morbidity diagnosis.
For example, where a diagnosis is a binary diagnosis, in some embodiments, a rule includes a threshold number and/or proportion where, if this number and/or proportion of snippet diagnoses indicates the subject has a particular condition, the rule outputs a positive diagnosis for the condition. In some embodiments, different diagnostic conditions have different rules.
For example, where a diagnosis includes a measure of severity of and/or likelihood of having a condition, in some embodiments, a rule includes, for example, averaging of the plurality of snippet diagnoses.
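By way of illustration, the per-condition threshold rule described above (with averaging noted as the variant for severity scores) could be applied as sketched below; the threshold values are assumptions for the example.

```python
import numpy as np

def combine_snippet_diagnoses(snippet_diagnoses, thresholds):
    """Combine per-snippet intermediate diagnoses into one subject diagnosis.

    For binary diagnoses: positive if the proportion of positive snippets
    meets a per-condition threshold. Severity scores could instead be
    averaged across snippets, as described above.
    """
    subject_diagnosis = {}
    for condition, threshold in thresholds.items():
        votes = [d[condition] for d in snippet_diagnoses]
        subject_diagnosis[condition] = np.mean(votes) >= threshold
    return subject_diagnosis

snippets = [{"depression": 1, "anxiety": 0},
            {"depression": 1, "anxiety": 1},
            {"depression": 0, "anxiety": 0}]
print(combine_snippet_diagnoses(snippets,
                                {"depression": 0.5, "anxiety": 0.7}))
```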
FIG. 7 is a flow chart of a method of ML model training, according to some embodiments of the disclosure.
At 700, in some embodiments, a plurality of labeled subject speech recordings are received. For example, according to one or more feature of step 500, FIG. 5.
At 702, in some embodiments, the speech recordings are each processed into a plurality of snippets e.g. according to one or more feature of step 602 FIG. 6.
At 704, in some embodiments, vocal features and/or linguistic features are extracted from the snippets. Where, in some embodiments, extraction of vocal feature/s from the snippets includes one or more feature of vocal feature extraction as described regarding step 604 FIG. 6.
At 706, in some embodiments, a ML model for subject diagnosis is trained using the vocal features (and/or linguistic features) extracted and labels for a plurality of snippets. For example, where a single labeled recording provides a plurality of labeled snippets, increasing data for training of the ML model.
In some embodiments, FIG. 7 is performed twice, a first time to train a vocal feature ML model using vocal features extracted from snippets and a second time to train a linguistic feature ML model using linguistic features extracted from snippets.
In some embodiments, a ML pipeline is used, where the ML algorithm itself selects which features (e.g. of a plurality of vocal features and/or a plurality of linguistic features) are important or not as part of the training procedure.
FIG. 8 illustrates data flow with respect to system elements, for prediction using a ML model, according to some embodiments of the disclosure.
In some embodiments, a voice recording 802 (including one or more feature of voice recording 216 FIG. 2) of a subject is processed by a voice recording processing module 804 to provide a plurality of voice recording snippets 806. Where, in some embodiments, voice recording processing module 804 performs one or more feature of step 602, FIG. 6.
The plurality of snippets 806, in some embodiments, are processed (e.g. according to one or more feature of step 604 FIG. 6) by a snippet processing module to produce a plurality of sets of voice feature/s 808, e.g. a set of voice feature/s corresponding to each snippet.
In some embodiments, a ML model 810 produces a plurality of diagnoses 812, e.g. a diagnosis (in some embodiments a multi-condition diagnosis) for each snippet.
A rule and/or second ML model 814, in some embodiments, is fed the plurality of intermediate diagnoses 812 to provide a single subject diagnosis 816 for the individual subject who produced vocal recording 802.
In some embodiments, a structure similar to that of FIG. 8 is used to provide a diagnosis of a subject, but where the features extracted from the snippets are linguistic features. In some embodiments, a diagnosis is determined by receiving subject diagnosis 816 along with a diagnosis provided using linguistic features extracted from snippets and an associated machine learning model, and the two diagnoses are used to provide a single diagnosis of the subject (e.g. using another ML model and/or a rule).
FIG. 9 is a method of ML model training, according to some embodiments of the disclosure.
At 900, in some embodiments, labeled sets of snippet diagnoses are received. Where, in some embodiments, a set of snippet diagnoses includes a plurality of diagnoses, one for each snippet, where the set has a single label (e.g. the snippets being portions of a recording of a patient having that label).
At 902, in some embodiments, a ML model is trained using the plurality of labeled sets of snippet diagnoses.
FIG. 10 is a method of training a ML model, according to some embodiments of the disclosure.
At 1000, in some embodiments, a plurality of subject recordings are received. For example, according to one or more feature of step 300 FIG. 3.
In some embodiments, labels are received, e.g. a label for each recording.
Alternatively or additionally, labels are determined, for example, according to steps 1002, 1006, 1008.
Where, in some embodiments, at step 1002, results of subject evaluation questionnaires are received, for example, including one or more feature of the questionnaires described regarding step 500 FIG. 5.
In some embodiments, at step 1008, subjects’ diagnosis labels are determined using (e.g. only) the subject evaluation questionnaires.
At 1006, optionally, in some embodiments, subjects’ subjective rating/s (e.g. as described regarding step 310 FIG. 3) are received. Where, in some embodiments, at step 1008, subjects’ diagnosis labels are determined using both the subject evaluation questionnaires received at step 1002 and the subjects’ subjective rating/s received at step 1006.
At 1010, in some embodiments, the vocal recordings are pre-processed. Optionally, in some embodiments, one or more feature of pre-processing is performed by a ML model. Additionally and/or alternatively, in some embodiments, one or more portion of pre-processing is performed by publicly (and/or commercially) available software e.g. openSMILE software.
Where, in some embodiments, pre-processing includes identifying and/or removing silent portions of the speech recordings. Where, in some embodiments, pre-processing includes removing noise from the vocal recordings, noise including, for example, extraneous speech (e.g. speech of an interviewer) and/or background noise.
In some embodiments, pre-processing includes checking and/or verifying that the speech recording includes speech of one subject. Where, in some embodiments, a ML model is used to one or more of check, verify, and remove speech that is not associated with the subject (e.g. the subject being the individual speaking for a majority of the time of the recording).
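A minimal sketch of one such pre-processing step, removal of silent portions by a frame-energy threshold, follows; the frame length and relative threshold are assumptions for the example, not values from the disclosure.

```python
import numpy as np

def remove_silence(signal, sr, frame_s=0.02, rel_threshold=0.05):
    """Drop frames whose RMS energy falls below a fraction of the
    recording's peak RMS (threshold values are illustrative assumptions)."""
    frame = int(frame_s * sr)
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    keep = rms >= rel_threshold * rms.max()
    return frames[keep].reshape(-1)

# Illustrative: 1 s of silence followed by 1 s of tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
noisy = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 200 * t)])
print(len(noisy), len(remove_silence(noisy, sr)))
```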
At 1012, in some embodiments, each speech recording is split into snippets, for example, according to one or more feature of step 702 FIG. 7.
At 1014, in some embodiments, subject self-rating/s are received, subject self- rating/s, for example, including one or more feature of subject self-rating/s as described regarding step 310, FIG. 3.
At 1016, in some embodiments, vocal feature/s are extracted from each snippet. For example, according to one or more feature of step 704 FIG. 7.
At 1018, in some embodiments, subject language data is received.
Where, in some embodiments, subject language data includes a text script/s of the subject voice recording/s extracted at step 1018a. Where, in some embodiments, the extracting includes feeding the subject voice recording/s to a ML model or other software which produces text scripts from speech recordings.
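By way of illustration, one publicly available speech-to-text tool could be used as sketched below; the disclosure does not name a specific transcription tool, so the choice of openai-whisper, the model size, and the file path are all assumptions.

```python
# Requires: pip install openai-whisper (an assumed, publicly available
# speech-to-text tool; the disclosure does not mandate a specific one).
import whisper

model = whisper.load_model("base")                   # small general model
result = model.transcribe("subject_recording.wav")   # placeholder path
text_script = result["text"]                         # the text script
print(text_script[:200])
```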
Alternatively or additionally, in some embodiments, subject language data includes subject textual data, obtained and/or received at step 1018b. Where the subject textual data includes, for example, text written by the subject e.g. one or more of: written answers to questions, social media posts, and email correspondence.
At 1020, in some embodiments, linguistic feature/s in language data and/or text from speech recordings are identified. For example, according to one or more feature of step 506 FIG. 5.
At 1022, in some embodiments, a ML model (or more than one ML model e.g. as described regarding ML model/s 206 and/or step 312 FIG. 3) is trained using labeled extracted vocal features and optionally, one or both of linguistic features (e.g. extracted at step 1020) and subject self-evaluation (e.g. received at step 1006).
At 1024, in some embodiments, a sub-set of vocal feature/s and/or linguistic feature/s for the ML model are selected, based on the training.
At 1026, in some embodiments, the ML model is provided e.g. for use in a system e.g. system 100 FIG. 1.
FIG. 11 is a flow chart of methods of ML training, according to some embodiments of the disclosure.
FIG. 11, in some embodiments, illustrates exemplary embodiments of when, in a process of ML training, linguistic feature/s are identified from subject speech recordings.
Steps 1104, 1106, and 1108, in some embodiments, illustrate an embodiment where vocal features are extracted 1106 from snippets 1104 to train a ML model 1108.
Steps 1110, 1112, 1114, 1116, in some embodiments, illustrate an embodiment where linguistic features are extracted from the entire recording 1110, and where vocal features are extracted from snippets 1114 to train a ML model 1116.
Steps 1118, 1120, 1122, 1124, in some embodiments, illustrate an embodiment where linguistic features 1120 and vocal features 1122 are both extracted from snippets to train a ML model 1124.
Arrow 1126, in some embodiments, illustrates a further embodiment where linguistic features are identified in the entire recordings 1110 and in the snippets 1120 where, in some embodiments, both types of linguistic feature are used to train a ML model 1124.
FIG. 12 is a method of comparing speech for an individual, according to some embodiments of the disclosure.
At 1200, in some embodiments, a first speech recording of a subject is obtained.
In some embodiments, the first speech recording includes non-emotive content.
For example, where the subject is recorded whilst talking about non-emotive topics, for example, where the subject reads a written text. In an exemplary embodiment, the subject is asked to verbally perform a cognitive task, for example, to describe suitable clothing for a weather situation, or to recite lists of categories. Without wanting to be bound by theory, it is theorized that a vocal recording of a subject verbally performing a cognitive task potentially increases reliability and/or accuracy of recording and/or detecting baseline linguistic feature/s and/or vocal feature/s for the specific subject.
In some embodiments, the subject is asked to describe an event which isn’t potentially emotive (or has a low likelihood of being so), e.g.: “Tell me the story of your experience of the weather today.”
At 1202, in some embodiments, vocal and/or linguistic features of the first recording are identified to provide a first feature set (e.g. a non-emotive feature set), for example, including one or more feature of step 304 FIG. 3 and/or step 306 FIG. 3.
At 1204, in some embodiments, a second speech recording of a subject is obtained.
Where, in some embodiments, the second speech recording includes potentially emotive content (also herein termed “emotional subject matter”), used to provide an emotive feature set.
In an exemplary embodiment, the subject is asked to describe a potentially emotive event “Tell me the story of what you experienced.”
Where, for example, the subject is asked to verbally recount a potentially traumatic event and/or describe a potentially traumatic subject. For example, when screening in order to identify those subjects likely to suffer long-lasting mental distress associated with a situation (e.g. health situation, e.g. injury, e.g. childbirth), in some embodiments, the subject is asked to describe the health situation and/or event/s which led to the health situation.
At 1206, in some embodiments, vocal and/or linguistic features of the second recording are identified to provide a second feature set (e.g. an emotive feature set), for example, including one or more feature of step 304 FIG. 3 and/or step 306 FIG. 3.
At 1208, in some embodiments, vocal feature/s and/or linguistic feature/s of the emotive feature set are normalized, e.g. for the individual subject, using the first feature set to normalize the second feature set, e.g. the non-emotive feature set being used to normalize the emotive feature set.
For example, referring to vocal pitch and/or volume: in some embodiments, vocal pitch and/or volume differ between individuals but are affected by emotional state. In some embodiments, non-emotive parameters (e.g. for pitch and/or volume) are used to normalize emotive parameters, e.g. to determine emotional and/or mental state of the subject, e.g. with respect to the emotive subject matter.
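A minimal sketch of such per-subject normalization follows, z-scoring each emotive feature against the subject's own non-emotive baseline; the feature names and values are illustrative assumptions.

```python
import numpy as np

def normalize_emotive_features(emotive, baseline):
    """Z-score each emotive-narrative feature against the subject's own
    non-emotive (baseline) recording, so that between-subject differences
    in, e.g., habitual pitch or volume are factored out."""
    normalized = {}
    for name, values in emotive.items():
        base = np.asarray(baseline[name], dtype=float)
        mu, sigma = base.mean(), base.std() or 1.0  # guard zero variance
        normalized[name] = (np.asarray(values, dtype=float) - mu) / sigma
    return normalized

baseline = {"pitch_hz": [180, 185, 190], "volume_db": [60, 61, 59]}
emotive = {"pitch_hz": [220, 230, 210], "volume_db": [70, 72, 68]}
print(normalize_emotive_features(emotive, baseline))
```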
Exemplary statistical methods for analysis:
FIG. 13 is a flow chart of a method of diagnosis using a ML model, according to some embodiments of the disclosure.
At 1300, in some embodiments, subject voice recording/s are obtained and/or received e.g. according to one or more feature of step 500 FIG. 5.
At 1302, in some embodiments, pre-determined vocal feature/s are extracted from the voice recording e.g. according to one or more feature of step 504 FIG. 5.
At 1304, in some embodiments, a vocal feature ML model is trained to provide a first diagnosis for more than one condition, using the extracted vocal feature/s and label. Where, in some embodiments, training of the vocal feature ML model includes one or more feature of ML training as described regarding step 508 FIG. 5.
At 1306, labeled subject language data is received e.g. according to one or more feature of step 502 FIG. 5.
At 1308, in some embodiments, linguistic features are identified in the language data e.g. according to one or more feature of step 506 FIG. 5.
At 1310, in some embodiments, a linguistic feature ML model is trained to provide a second diagnosis for more than one condition, using the extracted linguistic feature/s and label. Where, in some embodiments, training of the linguistic feature ML model includes one or more feature of ML training as described regarding step 508 FIG. 5.
At 1312, in some embodiments, the first and second diagnoses are combined e.g. by another ML model e.g. to provide a single diagnosis of the subject.
Where, in some embodiments, combining of the two diagnoses includes using IBM® SPSS® Statistics software (version 25) to analyze a distribution of the diagnoses from the two analyses (i.e., present/not present and level of pain, depression, or anxiety). In some embodiments, the diagnoses are entered for analysis by conventional bivariate and/or multivariate analyses. In some embodiments, independent variables include one or more of age, gender, vocal feature/s, and narrative feature/s. In some embodiments, multivariate analysis/es include discriminant function analysis e.g. to form a parsimonious set of variables for detection of a plurality of outcome measures, e.g. depression, anxiety, or pain.
In some embodiments, log-linear regression is used to reduce a number of independent variables e.g. to amplify statistical power.
In some embodiments, diagnoses from both methods are analyzed using standard epidemiological methods for testing the efficacy of a screening test. These analyses include, for example, calculation of sensitivity and/or specificity, and/or positive and/or negative predictive values of the test e.g. according to one or more feature as described in reference [28].
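By way of illustration, these standard screening-test measures can be computed from confusion-matrix counts as sketched below; the counts shown are invented example values, not study results.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard epidemiological screening-test measures computed from
    true/false positive and true/false negative counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

print(screening_metrics(tp=40, fp=10, tn=45, fn=5))
```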
In some embodiments, analysis of the two diagnoses is carried out using a ML pipeline, where the ML algorithm itself selects which features are important or not as part of the training procedure.
Exemplary preliminary vocal analysis (YAMS)
A machine learning model was trained using the publicly available Distress Analysis Interview Corpus (DAIC) data set.
Where 1-minute voice samples were acquired from the data set, vocal features were extracted using openSMILE software, and the ML model was trained using the extracted vocal features and diagnosis labels for depression from the data set.
The extracted features included a frequency-domain representation of the voice signal (e.g. obtained using Fast Fourier Transform (FFT)). Where openSMILE software was used to process the speech recordings, providing a frequency-domain signal of the speech, where frequency bands not corresponding to human speech were removed.
Prediction using the model included the model receiving a voice sample, producing a number between 0-100, and, according to a pre-trained cut-off, diagnosing the subject as belonging to a sick (e.g. depressed) or healthy (e.g. non-depressed) person.
Training was used to select a subset of vocal features to improve diagnosis.
The ML model process was tested in a study of 30 subjects, each providing a narrative, where the subjects were all from SUMC (Soroka University Medical Center) medical staff, collected during May and June of 2020. Of the 30 subjects, 16 were working in Corona wards and 14 were working in maternity wards. 6 additional adult subjects from the community who were coping with Covid-19 stress during the first lockdown increased the subject numbers to N=36. 10 subjects out of 36 were labeled as having anxiety, 7 out of 36 were labeled as having depression, and 6 of these subjects were also labeled as having anxiety.
Recording of the narratives used the REDCap® data recording platform. Vocal features were extracted using the openSMILE open-source software.
Different types of classifiers were employed including Random Forest and Neural Networks. Better results were obtained with Neural Networks.
Results showed lower sensitivity for the method (sensitivity - correctly identifying those with either depression or anxiety) compared with specificity (correctly identifying those without). Without wanting to be bound by theory, it is theorized that the lower sensitivity is associated with the small sample size and proportion of identified cases, which is further reduced in the train-test division of data.
The selection of cases in test-train division for the ML classifier analysis was done randomly.
Referring to average accuracy on the test samples, among those cases that were positive for both anxiety and depression (n=6), the accuracy was higher (n=4), illustrating a potential increase in accuracy with comorbidity, e.g. of anxiety and depression rather than anxiety alone. Without wanting to be bound by theory, it is theorized that co-morbidity of anxiety and depression is a cue for a more severe case of anxiety (e.g. refer to reference [51]), which is reflected in a stronger signal in the voice markers.
Preliminary experimental results (NMAPP method):
Analysis of the SUMC dataset (e.g. script transcripts of the recordings) yielded over 60 codes, and significant patterns in both the form and content of the transcripts have been uncovered, including:
• discursive patterns related to pronouns and tenses, e.g. the second-person feminine singular pronoun was used when the narrator, regardless of gender, was describing difficult, stressful, and/or upsetting sensory experiences and aftereffects;
• markers of emphasis, repetition, and other discursive mechanisms of sharpening and flattening serve as "flags" to identify distress in the narratives.
These distinctive markers appear to correlate with quantitative measures of distress, anxiety, pain, and depression.
Analyses of the pre-test Soroka University Medical Center (SUMC) data (n=30), of the correlations between the word markers and the clinical labels, show that total word counts, first person word counts, you-masculine, you-feminine, and we-pronoun word counts, and negative emotional word counts were correlated with measures of one or more of anxiety, depression, and pain.
Additional experimental results
Data was acquired for the following populations:
1. chronic pain patients
2. health care workers
3. community residents
4. persons in treatment
Data was labeled using questionnaires: CES-D (depression) with a clinical cut-off, GAD-7 (anxiety), and SUDS (subjective units of distress).
Results for depression and anxiety are presented below in tables 2 and 3, where increase in accuracy of the ML model with data amount (number of patients) is also illustrated.
Table 2 - Depression
[Table 2 is reproduced as an image in the original publication.]
Table 3 - Anxiety
[Table 3 is reproduced as an image in the original publication.]
Generalizability
Table 4, below, illustrates generalizability (the ability of the ML model to predict condition/s for a group of people where the ML model has been trained using a different group of people), in some embodiments, of ML methods of the present disclosure. Table 4 illustrates sensitivity, specificity, and accuracy of diagnosis for a range of different populations being used as training data and test data.
Table 4 - Data from different populations used for ML model training and test sets
[Table 4 is reproduced as an image in the original publication.]
Exemplary rule selection
Tables 5-10 illustrate application of different rules for providing a single diagnosis for a subject, based on a set of diagnoses, each associated with a different snippet of a vocal recording of the subject. For example, referring to the figures, the rule corresponds to step 610 FIG. 6 and/or to step 706 FIG. 7 and/or to element 814 FIG. 8.
Tables 5-10, in some embodiments, illustrate that use of snippets and/or rules increases accuracy.
In some embodiments, a rule providing the largest accuracy is selected.
In some embodiments, a rule providing the highest sensitivity and/or specificity is selected e.g. depending on the application.
For example, for a high urgency application (e.g. to identify a suicidal subject), a higher sensitivity is used, e.g. over 95%, which, in some embodiments, is associated with a reduction in specificity, in other words, more false positives.
Table 5 - Depression - 111 subjects, where 46% of the subjects were depressed - SUDS only, no voice
[Table 5 is reproduced as an image in the original publication.]
Table 6 - Depression - 111 subjects, where 46% of the subjects were depressed - SUDS + voice
[Table 6 is reproduced as an image in the original publication.]
Table 7 - Anxiety - 93 patients, where 45% of the subjects were anxious
[Table 7 is reproduced as an image in the original publication.]
Table 8 - Anxiety - 111 patients, where 50% of the subjects were anxious
[Table 8 is reproduced as an image in the original publication.]
Table 9 - Depression - 93 patients, where 42% of the subjects were depressed
[Table 9 is reproduced as an image in the original publication.]
Table 10 - Depression - 111 patients, where 46% of the subjects were depressed
[Table 10 is reproduced as an image in the original publication.]
NMAPP additional experimental results
454 audio files, of lengths ranging from 2 to 120 minutes, were analyzed, including narratives from women describing their birth stories, along with participants’ responses to the Birth Experience Questionnaire (BEQ) as described in reference [30].
Coding of the transcripts was performed using the Atlas.ti qualitative analysis program.
The coded transcripts were analyzed to provide higher-order discursive pattern recognition and interpretations.
The analysis was tested using BEQ data.
In the analysis of the transcripts, four components were extracted:
1. opening sentence(s);
2. closing sentence(s);
3. a description of the birth moment; and
4. an interpretive “title,” determined by the overarching theme or repeated/emphasized content.
Components 1-4 were evaluated according to:
1. use of tense
2. use of pronoun “voice” and associated prepositions
3. use of passive vs. active verbs and verb forms
4. emotionality in content (e.g., use of words such as “happy” and “sad”)
5. emotionality in form (e.g., use of emphasis, repetition, flattening, and dialogue) and in extra-linguistic cues (e.g., crying, sighing, pausing)
6. use of “talk perspective” (i.e., level of personalization vs. distancing vs. universalization in the discourse)
7. specific narrative content (e.g., referring to the birth as traumatic).
Evaluation of the components was statistically compared to the BEQ data. Results showed that those participants whose BEQ responses indicated higher levels of distress, trauma, and lack of satisfaction with their birth experiences were also assessed, according to components 1-4 and their analysis, as highest in distress and difficulty processing the experience.
Significant correlations were found between the women’s quantitative and qualitative responses in six analytical categories: 1) trauma; 2) agency and ownership; 3) support; 4) attachment to the baby; 5) emotionality; and 6) expectations of the birth.
General
As used within this document, the term “about” refers to ±20%.
The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.
The term “consisting of” means “including and limited to”.
As used herein, singular forms, for example, “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.
Within this application, various quantifications and/or expressions may include use of ranges. Range format should not be construed as an inflexible limitation on the scope of the present disclosure. Accordingly, descriptions including ranges should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within the stated range and/or subrange, for example, 1, 2, 3, 4, 5, and 6. Whenever a numerical range is indicated within this document, it is meant to include any cited numeral (fractional or integral) within the indicated range.
It is appreciated that certain features which are (e.g., for clarity) described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure which are (e.g., for brevity) described in the context of a single embodiment may also be provided separately or in any suitable sub-combination, or may be suitable for use with any other described embodiment. Features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the present disclosure has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, this application intends to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All references (e.g., publications, patents, patent applications) mentioned in this specification are herein incorporated in their entirety by reference into the specification, e.g., as if each individual publication, patent, or patent application was individually indicated to be incorporated herein by reference. Citation or identification of any reference in this application should not be construed as an admission that such reference is available as prior art to the present disclosure. In addition, any priority document(s) and/or documents related to this application (e.g., co-filed) are hereby incorporated herein by reference in its/their entirety.
Where section headings are used in this document, they should not be interpreted as necessarily limiting.
CLAIMS:
1. A method, implemented by computer circuitry, of diagnosis of a plurality of medical conditions for a subject, comprising:
obtaining a vocal recording of said subject having a vocal recording time duration;
processing said vocal recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than said vocal recording time duration;
extracting, for each of said plurality of vocal recording portions, vocal features of said subject;
feeding, to a trained machine learning model, said vocal features for each of said plurality of vocal recording portions, to determine, for each of said plurality of vocal recording portions, an intermediate diagnosis for each of said plurality of medical conditions, thereby obtaining a plurality of intermediate diagnoses for said plurality of vocal recording portions; and
determining, using said plurality of intermediate diagnoses, for said subject, a diagnosis for the plurality of medical conditions.
2. The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to said trained machine learning model for determining of said plurality of intermediate diagnoses.
3. The method according to claim 1, comprising obtaining linguistic features of said subject; and feeding said linguistic features to a second trained machine learning model to determine a second intermediate diagnosis; wherein said determining comprises using said second intermediate diagnosis to provide said diagnosis for said plurality of medical conditions.
4. The method according to any one of claims 2-3, wherein said obtaining linguistic features includes: obtaining a textual script of said vocal recording of said subject; and extracting said linguistic features from said textual script.
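For illustration only, one way to obtain such a textual script is off-the-shelf speech recognition; the `speech_recognition` package and its Google recognizer below are assumptions of this sketch, not the method recited by the claim.

```python
import speech_recognition as sr

def transcribe(wav_path: str) -> str:
    """Return a textual script of the recording stored at wav_path."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)   # read the whole file
    return recognizer.recognize_google(audio)  # any speech-to-text backend would do
```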
5. The method according to claim 4, wherein said extracting said linguistic features includes extracting a plurality of linguistic feature sets, one set for each vocal recording portion.
6. The method according to any one of claims 1-5, wherein said obtaining comprises obtaining a vocal recording which includes at least a portion where said subject describes a potentially emotive subject.
7. The method according to any one of claims 1-6, wherein said determining, using said plurality of intermediate diagnoses, comprises applying a rule to said intermediate diagnoses.
8. The method according to claim 7, wherein said rule includes indicating presence of a medical condition where more than a threshold proportion of said intermediate diagnoses indicates presence of the medical condition.
9. The method according to claim 8, wherein said threshold proportion is different for different medical conditions of said plurality of medical conditions.
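As a non-limiting sketch of the rule of claims 7-9, the aggregation could look like the following; the condition names and threshold values are hypothetical.

```python
# Hypothetical per-condition thresholds; claim 9 allows them to differ.
THRESHOLDS = {"condition_a": 0.5, "condition_b": 0.7}

def aggregate_by_rule(intermediate: list) -> dict:
    """intermediate: one {condition: 0 or 1} dict per vocal recording portion.
    A condition is indicated present when the proportion of positive
    intermediate diagnoses exceeds that condition's threshold."""
    result = {}
    for condition, threshold in THRESHOLDS.items():
        positives = sum(d[condition] for d in intermediate)
        result[condition] = positives / len(intermediate) > threshold
    return result
```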
10. The method according to any one of claims 1-9, wherein said determining, using said plurality of intermediate diagnoses, comprises feeding said plurality of intermediate diagnoses to a second trained machine learning model to provide said diagnosis for the plurality of medical conditions.
11. The method according to any one of claims 1-10, wherein said processing comprises removing one or more of silent portions, non-subject speech, and noise.
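For illustration only, a crude energy-based stand-in for the silence removal of claim 11; the frame length and threshold are assumptions of this sketch.

```python
import numpy as np

def remove_silence(x: np.ndarray, frame: int = 400, rel_thresh: float = 0.05) -> np.ndarray:
    """Drop frames whose RMS energy falls below a fraction of the loudest frame's RMS."""
    n = len(x) // frame
    if n == 0:                  # recording shorter than one frame: nothing to do
        return x
    frames = x[: n * frame].reshape(n, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    keep = rms > rel_thresh * rms.max()
    return frames[keep].reshape(-1)
```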
12. The method according to any one of claims 1-11, wherein said vocal features include one or more spectral features of said vocal recording.
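By way of non-limiting illustration, one possible spectral feature set per portion; the librosa calls are real, but the particular feature choice is an assumption of this sketch.

```python
import librosa
import numpy as np

def spectral_features(portion: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Summarize a float waveform portion by its mean MFCCs and mean spectral centroid."""
    mfcc = librosa.feature.mfcc(y=portion, sr=sr, n_mfcc=13)        # shape (13, frames)
    centroid = librosa.feature.spectral_centroid(y=portion, sr=sr)  # shape (1, frames)
    return np.concatenate([mfcc.mean(axis=1), centroid.mean(axis=1)])
```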
13. The method according to any one of claims 2-12, wherein said linguistic features include prevalence of one or more type of word.
14. The method according to claim 13, wherein said type of word includes words having a tense, for one or more tenses.
15. The method according to any one of claims 13-14, wherein said type of word includes a type of pronoun, for one or more pronoun types.
16. The method according to any one of claims 13-15, wherein said type of word includes a pronoun, for one or more pronouns.
17. The method according to any one of claims 13-16, wherein said type of word includes an emotion word, identified from a list of emotion words.
18. The method according to any one of claims 13-17, wherein said type of word includes repetition of a word or phrase.
19. The method according to any one of claims 13-18, wherein said type of word includes an emphasis word, identified from a list of emphasis words.
20. The method according to any one of claims 14-19, wherein said prevalence includes a prevalence per minute of the type of word.
21. The method according to any one of claims 14-20, wherein said prevalence includes a proportion of words of the narrative being said type of word.
22. The method according to any one of claims 14-21, wherein one or more of said word types is identified according to a list.
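For illustration only, the prevalence features of claims 13-22 could be computed as below; the word lists are illustrative stand-ins for the lists the claims refer to.

```python
# Hypothetical word lists standing in for the lists of claims 15-19 and 22.
FIRST_PERSON_PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our"}
EMOTION_WORDS = {"afraid", "angry", "sad", "happy", "scared"}

def prevalence_features(transcript: str, duration_minutes: float) -> dict:
    """Count word-type occurrences, then express each as per-minute and proportional prevalence."""
    words = transcript.lower().split()
    counts = {
        "first_person": sum(w in FIRST_PERSON_PRONOUNS for w in words),
        "emotion": sum(w in EMOTION_WORDS for w in words),
    }
    features = {}
    for name, count in counts.items():
        features[f"{name}_per_min"] = count / duration_minutes        # prevalence per minute (claim 20)
        features[f"{name}_proportion"] = count / max(len(words), 1)   # proportion of words (claim 21)
    return features
```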
23. The method according to any one of claims 1-22, wherein said trained machine learning model is trained by:
obtaining a plurality of voice recordings, each recording having a diagnosis label;
processing each of said plurality of voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a time duration of a corresponding voice recording of said plurality of voice recordings;
extracting, for each of said plurality of vocal recording portions, vocal features; and
training the machine learning model using said diagnosis labels and said vocal features.
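As a non-limiting sketch of the portion-level training of claims 23-24, every portion inherits the diagnosis label of its source recording; scikit-learn and the random-forest choice are assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_portion_model(recordings, labels, split_into_portions, extract_features):
    """Build one training example per vocal recording portion, labeled with the recording's diagnosis."""
    X, y = [], []
    for recording, label in zip(recordings, labels):
        for portion in split_into_portions(recording):
            X.append(extract_features(portion))
            y.append(label)
    model = RandomForestClassifier()  # any classifier supporting the label format would do
    model.fit(np.array(X), np.array(y))
    return model
```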
24. A method, implemented by computer circuitry, of training a machine learning model for diagnosis, comprising:
receiving a plurality of voice recordings, each recording having a diagnosis label and corresponding to a single patient;
processing each of said plurality of voice recordings into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a time duration of the corresponding voice recording;
extracting, for each of said plurality of vocal recording portions, vocal features of the corresponding patient; and
training the machine learning model using said diagnosis labels and said vocal features for each of said plurality of vocal recording portions.
25. A method, implemented by computer circuitry, of training a machine learning model for diagnosis, comprising:
obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions;
obtaining a plurality of voice recordings, each associated with a diagnosis label of said plurality of diagnosis labels;
obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels;
extracting one or more vocal features from each of said plurality of voice recordings; and
training said machine learning model using said diagnosis labels, said one or more vocal features per diagnosis label, and said one or more linguistic features per diagnosis label.
26. A method, implemented by computer circuitry, of training machine learning models for diagnosis, comprising:
obtaining a plurality of diagnosis labels, each diagnosis label including a diagnosis for a plurality of medical conditions;
obtaining a plurality of voice recordings, each associated with a diagnosis label of said plurality of diagnosis labels;
obtaining one or more linguistic features, each associated with a diagnosis label of said plurality of diagnosis labels;
extracting one or more vocal features from each of said plurality of voice recordings;
training a first machine learning model using said diagnosis labels and said one or more vocal features per diagnosis label; and
training a second machine learning model using said diagnosis labels and said one or more linguistic features per diagnosis label.
27. The method according to claim 26, comprising training a third machine learning model using said first machine learning model, said second machine learning model, and said diagnosis labels.
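By way of non-limiting illustration, the three-model arrangement of claims 26-27 resembles stacking; the sketch below handles a single condition (repeat per condition for the claimed plurality), and the scikit-learn estimators are assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def train_stacked(X_vocal, X_linguistic, y):
    """First model on vocal features, second on linguistic features,
    third (meta) model on their predicted probabilities."""
    first = RandomForestClassifier().fit(X_vocal, y)
    second = RandomForestClassifier().fit(X_linguistic, y)
    meta_inputs = np.hstack([first.predict_proba(X_vocal)[:, 1:],        # positive-class probability
                             second.predict_proba(X_linguistic)[:, 1:]])
    third = LogisticRegression().fit(meta_inputs, y)
    return first, second, third
```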
28. The method according to any one of claims 26-27, wherein said extracting one or more vocal features comprises, for each voice recording of said plurality of voice recordings:
processing said voice recording into a plurality of vocal recording portions, each vocal recording portion having a shorter time duration than a duration of said voice recording; and
extracting, for each of said plurality of vocal recording portions, vocal features.
29. The method according to any one of claims 26-28, wherein said obtaining said one or more linguistic features includes: obtaining a textual script of each voice recording of said plurality of voice recordings; and extracting said one or more linguistic features from said textual script, for each voice recording of said plurality of voice recordings.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163282704P 2021-11-24 2021-11-24
US63/282,704 2021-11-24

Publications (1)

Publication Number Publication Date
WO2023095136A1 (en) 2023-06-01

Family

ID=86538977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2022/051253 WO2023095136A1 (en) 2021-11-24 2022-11-24 Subject diagnosis using speech analysis

Country Status (1)

Country Link
WO (1) WO2023095136A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190311815A1 (en) * 2017-05-05 2019-10-10 Canary Speech, LLC Medical assessment based on voice

Similar Documents

Publication Publication Date Title
Pulido et al. Alzheimer's disease and automatic speech analysis: a review
Luz et al. Alzheimer’s dementia recognition through spontaneous speech
Fraser et al. Predicting MCI status from multimodal language data using cascaded classifiers
Morales et al. A cross-modal review of indicators for depression detection systems
Eni et al. Estimating autism severity in young children from speech signals using a deep neural network
Rana et al. Automated screening for distress: A perspective for the future
GB2567826A (en) System and method for assessing physiological state
Al-Hameed et al. A new diagnostic approach for the identification of patients with neurodegenerative cognitive complaints
Arif et al. Classification of anxiety disorders using machine learning methods: a literature review
Ntracha et al. Detection of mild cognitive impairment through natural language and touchscreen typing processing
Janse et al. Identifying nonwords: Effects of lexical neighborhoods, phonotactic probability, and listener characteristics
Yamada et al. Atypical repetition in daily conversation on different days for detecting alzheimer disease: evaluation of phone-call data from a regular monitoring service
Tremblay et al. Age-related deficits in speech production: From phonological planning to motor implementation
Farrús et al. Acoustic and prosodic information for home monitoring of bipolar disorder
Brennan et al. Predictive sentence comprehension during story-listening in autism spectrum disorder
Teferra et al. Acoustic and linguistic features of impromptu speech and their association with anxiety: validation study
Diaz-Asper et al. Acceptability of collecting speech samples from the elderly via the telephone
Kishimoto et al. Understanding psychiatric illness through natural language processing (UNDERPIN): Rationale, design, and methodology
Lalitha et al. Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech.
Karan et al. An investigation about the relationship between dysarthria level of speech and the neurological state of Parkinson’s patients
Hudenko et al. Listeners prefer the laughs of children with autism to those of typically developing children
Yamada et al. A mobile application using automatic speech analysis for classifying Alzheimer's disease and mild cognitive impairment
von Polier et al. Predicting adult attention deficit hyperactivity disorder (ADHD) using vocal acoustic features
WO2023095136A1 (en) Subject diagnosis using speech analysis
Shalu et al. Depression status estimation by deep learning based hybrid multi-modal fusion model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22898107

Country of ref document: EP

Kind code of ref document: A1